Metadata-Version: 2.1
Name: dask-expr
Version: 1.1.1
Summary: High Level Expressions for Dask
Maintainer-email: Matthew Rocklin
License: BSD
Project-URL: Source code, https://github.com/dask-contrib/dask-expr/
Keywords: dask pandas
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: System :: Distributed Computing
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE.txt
Requires-Dist: dask ==2024.5.1
Requires-Dist: pyarrow >=7.0.0
Requires-Dist: pandas >=2

Dask Expressions
================

Dask DataFrames with query optimization.

This is a rewrite of Dask DataFrame that includes query optimization and
generally improved organization.

More in our blog posts:

- [Dask Expressions overview](https://blog.dask.org/2023/08/25/dask-expr-introduction)
- [TPC-H benchmark results vs. Dask DataFrame](https://blog.coiled.io/blog/dask-expr-tpch-dask.html)

Example
-------

```python
import dask_expr as dx

# Build a lazy example DataFrame of synthetic time series data
df = dx.datasets.timeseries()
df.head()

# Nothing is computed until .compute() is called
df.groupby("name").x.mean().compute()
```

Query Representation
--------------------

Dask-expr encodes user code in an expression tree:

```python
>>> df.x.mean().pprint()
Mean:
  Projection: columns='x'
    Timeseries: seed=1896674884
```

This expression tree is optimized and rewritten before execution:

```python
>>> df.x.mean().optimize().pprint()
Div:
  Sum:
    Fused(375f9):
    | Projection: columns='x'
    |   Timeseries: dtypes={'x': <class 'float'>} seed=1896674884
  Count:
    Fused(375f9):
    | Projection: columns='x'
    |   Timeseries: dtypes={'x': <class 'float'>} seed=1896674884
```

Stability
---------

Dask-expr has been the default backend for dask.DataFrame since version 2024.3.0.

API Coverage
------------

Dask-Expr covers almost all of the Dask DataFrame API. The only missing features are:

- named GroupBy Aggregations
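
The rewrite shown under Query Representation, where `Mean` becomes `Div` of `Sum` and `Count`, can be illustrated with a small standalone sketch. This is a toy model in plain Python, not dask-expr's implementation: the classes `Column`, `Sum`, `Count`, `Div`, `Mean` and the functions `optimize`, `evaluate`, and `pformat` are all hypothetical names chosen for illustration, and the data is a plain tuple rather than a partitioned DataFrame.

```python
from dataclasses import dataclass


# Hypothetical toy nodes; dask-expr's real expression classes are richer.
@dataclass(frozen=True)
class Column:
    """Leaf node: a named column holding plain Python numbers."""
    name: str
    values: tuple


@dataclass(frozen=True)
class Sum:
    child: object


@dataclass(frozen=True)
class Count:
    child: object


@dataclass(frozen=True)
class Mean:
    child: object


@dataclass(frozen=True)
class Div:
    left: object
    right: object


def optimize(expr):
    """Rewrite every Mean(x) into Div(Sum(x), Count(x)), recursing through the tree."""
    if isinstance(expr, Mean):
        child = optimize(expr.child)
        return Div(Sum(child), Count(child))
    if isinstance(expr, (Sum, Count)):
        return type(expr)(optimize(expr.child))
    if isinstance(expr, Div):
        return Div(optimize(expr.left), optimize(expr.right))
    return expr  # leaves are returned unchanged


def evaluate(expr):
    """Evaluate a tree whose reductions sit directly above a Column."""
    if isinstance(expr, Column):
        return expr.values
    if isinstance(expr, Sum):
        return sum(evaluate(expr.child))
    if isinstance(expr, Count):
        return len(evaluate(expr.child))
    if isinstance(expr, Div):
        return evaluate(expr.left) / evaluate(expr.right)
    if isinstance(expr, Mean):
        values = evaluate(expr.child)
        return sum(values) / len(values)
    raise TypeError(f"unsupported node: {expr!r}")


def pformat(expr, indent=0):
    """Render the tree one node per line, indenting children by two spaces."""
    pad = " " * indent
    if isinstance(expr, Column):
        return f"{pad}Column: name={expr.name!r}"
    children = (expr.left, expr.right) if isinstance(expr, Div) else (expr.child,)
    lines = [f"{pad}{type(expr).__name__}:"]
    lines.extend(pformat(child, indent + 2) for child in children)
    return "\n".join(lines)


x = Column("x", (1.0, 2.0, 3.0))
query = Mean(x)
plan = optimize(query)

print(pformat(query))
print(pformat(plan))
print(evaluate(plan))
```

Splitting a mean into a sum and a count is useful in a partitioned setting because both pieces can be reduced per partition and combined afterwards, whereas a mean of per-partition means would be wrong for unequal partition sizes. The toy above only demonstrates the tree rewrite itself.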