Eventual Inc Daft Versions

Distributed DataFrame for Python designed for the cloud, powered by Rust

v0.2.24

2 weeks ago

Changes

✨ New Features

  • [FEAT] Allow returning of pyarrow arrays from UDFs @jaychia (#2252)
  • [FEAT] Add left, right, and outer joins @kevinzwang (#2166)
  • [FEAT] Add rpad and lpad expressions @murex971 (#2157)
  • [FEAT] AWS Profile override in S3Config @samster25 (#2243)
  • [FEAT] Add unpivot @kevinzwang (#2204)
  • [FEAT] Add string repeat functionality @murex971 (#2198)
  • [FEAT] Approximate quantile aggregation (pulled into main) @jaychia (#2179)
  • [FEAT] pivot @colin-ho (#2183)
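
For readers unfamiliar with the join types named above, the following is a minimal plain-Python sketch of what inner, left, right, and outer joins return. It deliberately does not use Daft's API: `join_rows` and its parameters are hypothetical names introduced only for illustration, and it assumes the two inputs share no column names other than the key.

```python
from collections import defaultdict


def join_rows(left, right, key, how):
    """Join two lists of dict rows on `key` (hypothetical helper, not Daft's API).

    Rows with no match on the other side are kept or dropped depending on
    `how`; kept rows get None for the missing side's columns.
    """
    if how not in {"inner", "left", "right", "outer"}:
        raise ValueError(f"unsupported join type: {how}")

    # Non-key columns on each side, used to fill None for unmatched rows.
    left_cols = sorted({c for row in left for c in row} - {key})
    right_cols = sorted({c for row in right for c in row} - {key})

    # Index the right side by key so each left row can find its matches.
    right_index = defaultdict(list)
    for row in right:
        right_index[row[key]].append(row)

    out, matched_keys = [], set()
    for lrow in left:
        matches = right_index.get(lrow[key], [])
        if matches:
            matched_keys.add(lrow[key])
            out.extend({**lrow, **rrow} for rrow in matches)
        elif how in {"left", "outer"}:
            # Left and outer joins keep unmatched left rows.
            out.append({**lrow, **{c: None for c in right_cols}})

    if how in {"right", "outer"}:
        # Right and outer joins keep unmatched right rows.
        for rrow in right:
            if rrow[key] not in matched_keys:
                out.append({**{c: None for c in left_cols}, **rrow})
    return out


left = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}]
right = [{"id": 2, "y": 20}, {"id": 3, "y": 30}]
left_joined = join_rows(left, right, "id", "left")
outer_joined = join_rows(left, right, "id", "outer")
```

With these inputs, the left join keeps `id` 1 and 2, the right join keeps `id` 2 and 3, and the outer join keeps all three keys.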

🚀 Performance Improvements

  • [PERF] Adaptive Query Execution @samster25 (#2176)
  • [PERF]: swap out json_deserializer for simd_json @universalmind303 (#2228)
  • [PERF] Evaluate only true/false side of if_else if predicate is boolean @colin-ho (#2222)
  • [PERF] enable metadata preservation across materialization points @samster25 (#2216)

👾 Bug Fixes

  • [BUG] Fix tab completion on expression namespaced accessors @jaychia (#2251)
  • [BUG] route abfss to AzureBlob @samster25 (#2244)

📖 Documentation

  • [CHORE] Skip demo notebook @jaychia (#2248)
  • [FEAT] Add rpad and lpad expressions @murex971 (#2157)
  • [DOCS] Add user guide for read_sql @colin-ho (#2226)
  • [FEAT] Add unpivot @kevinzwang (#2204)
  • [DOCS] Add read_hudi in the api docs @xushiyan (#2225)
  • [FEAT] Add string repeat functionality @murex971 (#2198)
  • [DOCS] LinkedIn Big Data meetup tutorial @jaychia (#2223)
  • [FEAT] Approximate quantile aggregation (pulled into main) @jaychia (#2179)
  • [DOCS] Add read_lance docs @jaychia (#2218)
  • [FEAT] pivot @colin-ho (#2183)
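
The pivot and unpivot entries listed in this release refer to reshaping between wide and long table layouts. The plain-Python sketch below shows the general idea only. It does not call Daft, the function and parameter names are hypothetical, and the `pivot` helper assumes at most one value per group and pivot value, so it skips the aggregation step a real pivot needs.

```python
def unpivot(rows, ids, values, variable_name="variable", value_name="value"):
    """Wide to long: emit one output row per (input row, value column) pair."""
    out = []
    for row in rows:
        for col in values:
            long_row = {i: row[i] for i in ids}
            long_row[variable_name] = col   # which column the value came from
            long_row[value_name] = row[col]  # the value itself
            out.append(long_row)
    return out


def pivot(rows, group_by, pivot_col, value_col):
    """Long to wide: one output row per group, one column per pivot value.

    Assumption: each (group, pivot value) pair appears at most once, so no
    aggregation function is applied.
    """
    grouped = {}
    for row in rows:
        group_key = tuple(row[g] for g in group_by)
        wide_row = grouped.setdefault(group_key, dict(zip(group_by, group_key)))
        wide_row[row[pivot_col]] = row[value_col]
    return list(grouped.values())


wide = [
    {"year": 2020, "jan": 10, "feb": 20},
    {"year": 2021, "jan": 30, "feb": 40},
]
long = unpivot(wide, ids=["year"], values=["jan", "feb"],
               variable_name="month", value_name="inches")
round_trip = pivot(long, group_by=["year"], pivot_col="month", value_col="inches")
```

Under the stated assumption, pivoting the unpivoted rows reproduces the original wide table.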

🧰 Maintenance

  • [CHORE] Drop Python 3.7 @samster25 (#2250)
  • [CHORE] Improve timestamp repr @colin-ho (#2245)
  • [CHORE] Allow multiple group_bys for pivot @colin-ho (#2242)
  • [CHORE] Skip demo notebook @jaychia (#2248)
  • [CHORE] Return &str for expression name @colin-ho (#2224)
  • [CHORE] Mount provision.py for iceberg integration tests @jaychia (#2232)
  • [CHORE]: remove trait aliases @universalmind303 (#2229)

⬆️ Dependencies

  • Bump serde from 1.0.198 to 1.0.200 @dependabot (#2239)
  • Bump csv-async from 1.2.6 to 1.3.0 @dependabot (#2238)

v0.2.23

3 weeks ago

Changes

✨ New Features

  • [FEAT] Read from LanceDB @jaychia (#2195)
  • .sqrt() expression @dmaymay (#2180)

👾 Bug Fixes

  • [BUG] Propagate errors when hitting them in parquet byte stream @samster25 (#2214)

📖 Documentation

  • .sqrt() expression @dmaymay (#2180)
  • [Docs] Add Dask to DF Comparison FAQ @avriiil (#2210)

🧰 Maintenance

  • [CHORE]: statically link liblzma @universalmind303 (#2213)

v0.2.22

4 weeks ago

Changes

This is the last release that will support Python 3.7, which reached end-of-life (EOL) about a year ago.

✨ New Features

  • [FEAT] Rust side exceptions for Transient Errors @samster25 (#2197)
  • [FEAT] Enable anonymous S3 access for Delta @jaychia (#2206)
  • [FEAT] Improve Hudi support for more scenarios @xushiyan (#2149)
  • [FEAT] try python 3.11 for releases @samster25 (#2184)
  • [FEAT] Timestamp Truncation @colin-ho (#2158)
  • [FEAT] Enhance temporal arithmetic functionalities @colin-ho (#2146)
  • [FEAT] Add logarithmic expressions @murex971 (#2168)
  • [PERF] Move with_column and exclude function logic to Rust side, add with_columns @kevinzwang (#2167)
  • [FEAT] Allow for variadic kwargs in UDFs @jaychia (#2162)
  • [FEAT] hide stack traces of wrappers for pytest / ipython @samster25 (#2159)
  • [FEAT] Improve query building in read_sql @colin-ho (#2144)

🚀 Performance Improvements

  • [PERF] Move with_column and exclude function logic to Rust side, add with_columns @kevinzwang (#2167)
  • [PERF] Refactor TreeNode to be native to Arc<TreeNode> @samster25 (#2175)

👾 Bug Fixes

  • [FEAT] Improve Hudi support for more scenarios @xushiyan (#2149)
  • [BUG] bump arrow2 to use copy ptr instead of from_raw_parts @samster25 (#2194)
  • [BUG] Fix empty inputs case for string kernels @jaychia (#2165)
  • [BUG] Fix tuple inputs in UDF @jaychia (#2161)

📖 Documentation

  • [DOCS] Add Hudi integration entry @xushiyan (#2208)
  • [FEAT] Improve Hudi support for more scenarios @xushiyan (#2149)
  • [FEAT] Timestamp Truncation @colin-ho (#2158)
  • [FEAT] Add logarithmic expressions @murex971 (#2168)
  • [DOCS] add Dask migration guide @avriiil (#2169)
  • [CHORE] enable codespell and fix misspellings @samster25 (#2177)
  • [PERF] Move with_column and exclude function logic to Rust side, add with_columns @kevinzwang (#2167)

🧰 Maintenance

  • [CHORE] Upgrade Rust toolchain to 2024 04 01 @samster25 (#2192)
  • [CHORE] upgrade dask again for publish pipeline @samster25 (#2193)
  • [CHORE] remove fsspec http filesystem @samster25 (#2191)
  • [CHORE] add ray min version for windows to remove upper pin on pyarrow @samster25 (#2189)
  • [CHORE] upgrade dask for tests for 3.11 @samster25 (#2186)
  • [CHORE] disable macos check for testing in publish pipeline @samster25 (#2185)
  • [CHORE] upgrade publish pipeline python version to 3.12 @samster25 (#2182)
  • [CHORE] use python 3.9 for publishing @samster25 (#2181)
  • [CHORE] enable codespell and fix misspellings @samster25 (#2177)
  • [CHORE] Add usr msg for df.explain(show_all=True) @avriiil (#2081)
  • [CHORE] improve recursive listing error msg @avriiil (#2145)
  • [CHORE] Enforce physical types in logical array @jaychia (#2160)
  • [CHORE] Fix style build on main @jaychia (#2163)
  • [CHORE] unify expr children around expr ref @samster25 (#2156)
  • [CHORE] Rearrange modules in daft-plan crate @samster25 (#2151)
  • [CHORE] collect all tests before running pytest so check for import errors @samster25 (#2150)
  • [CHORE] Bottom Up Logical To Physical translation @samster25 (#2147)

⬆️ Dependencies

  • Bump comfy-table from 7.1.0 to 7.1.1 @dependabot (#2199)
  • Bump num-traits from 0.2.17 to 0.2.18 @dependabot (#2200)
  • Bump slackapi/slack-github-action from 1.25.0 to 1.26.0 @dependabot (#2174)
  • Bump regex from 1.10.3 to 1.10.4 @dependabot (#2172)
  • Bump serde_json from 1.0.108 to 1.0.116 @dependabot (#2173)

v0.2.21

1 month ago

Changes

✨ New Features

  • [FEAT] Add S3Config.from_env functionality @jaychia (#2137)
  • deltalake _delta_lake.py: Allow Glue catalog cross account access @pang-wu (#2113)
  • [FEAT] Enable Ruff @samster25 (#2121)
  • [FEAT] Implements other trigonometry expressions @MeepoWin (#2123)
  • [FEAT] exp expression implementation @MeepoWin (#2115)
  • [FEAT] sin/cos/tan expression implementation @reswqa (#2112)
  • [CHORE] Using uv in MakeFile @MeepoWin (#2114)
  • [FEAT] Add option to S3Config to force virtual addressing @samster25 (#2106)
  • [FEAT] fill_null expression @colin-ho (#2089)
  • [FEAT] Add basic list aggregations @kevinzwang (#2032)
  • [FEAT] Allow sql alchemy connection factory as input to read_sql @colin-ho (#2071)
  • [FEAT] Add daft-sketch subcrate and arrow2 serialization functionality @jaychia (#2090)

👾 Bug Fixes

  • [BUG] Fix reading partition key columns in DeltaLake @jaychia (#2118)

📖 Documentation

  • [CHORE] Fix underlines in README @jaychia (#2143)
  • [DOCS] Update iceberg integration docs to add writes @jaychia (#2110)
  • [DOCS] Create CODE_OF_CONDUCT.md @samster25 (#2101)
  • [CHORE] Skip deltalake notebooks for CI @jaychia (#2097)
  • [CHORE] Add link to good first issues in readme @colin-ho (#2088)
  • [DOCS] Fix docs typo @avriiil (#2075)
  • [DOCS] Typos in user guide @avriiil (#2079)
  • [DOCS] Fix typos on 10-min tutorial @avriiil (#2082)
  • [DOCS] Add ml batch inference tutorials @jaychia (#2057)
  • [CHORE] Fix autolabeller CI step for forks @jaychia (#2138)

🧰 Maintenance

  • [CHORE] Fix underlines in README @jaychia (#2143)
  • [CHORE] Split labelling and update release CI steps @jaychia (#2142)
  • [CHORE] Fix the labeller CI step which is not triggering @jaychia (#2141)
  • [CHORE] Fixing readthedocs build @jaychia (#2135)
  • [CHORE] Fix documentation build with uv @jaychia (#2134)
  • [CHORE] Fix build command @MeepoWin (#2126)
  • [CHORE] Rename virtual env folder to .venv @MeepoWin (#2122)
  • [CHORE] refactors for ruff [1/n] @samster25 (#2120)
  • [CHORE] FunctionExpr and exp @samster25 (#2119)
  • [CHORE] FunctionEvaluator directly receive FunctionExpr @MeepoWin (#2117)
  • [CHORE] Update .gitignore for JetBrains IDE and pyenv user @MeepoWin (#2116)
  • [CHORE] Refactor string kernels @colin-ho (#2087)
  • [CHORE] Skip deltalake notebooks for CI @jaychia (#2097)
  • [CHORE] Add link to good first issues in readme @colin-ho (#2088)
  • [CHORE] fix empty data and pattern case in str expressions @murex971 (#2085)

⬆️ Dependencies

  • Bump bytes from 1.5.0 to 1.6.0 @dependabot (#2131)
  • Bump futures from 0.3.28 to 0.3.30 @dependabot (#2130)
  • Bump isbang/compose-action from 1.5.1 to 2.0.0 @dependabot (#2091)
  • Bump dyn-clone from 1.0.16 to 1.0.17 @dependabot (#2093)
  • Bump tokio from 1.33.0 to 1.37.0 @dependabot (#2092)

v0.2.20

1 month ago

Changes

✨ New Features

  • [FEAT] improve error message for s3 streaming error @samster25 (#2055)
  • [FEAT] Add str.replace expression @colin-ho (#2048)
  • [FEAT] Enable str.split using regex pattern @colin-ho (#2044)
  • [FEAT] Support Hudi reader @xushiyan (#2011)
  • [FEAT] Add find functionality for string @murex971 (#2046)
  • [FEAT] round expression implementation @sherlockbeard (#2041)
  • [FEAT] Add str.extract_all expression @colin-ho (#2038)
  • [FEAT] Add str.right() function @murex971 (#2031)
  • [FEAT] Sign expression implementation @sherlockbeard (#2037)
  • [FEAT] drop psutil in favor of our own tool @samster25 (#2035)
  • [FEAT] Allow passing on_error="null" to ignore decoding errors in image decode @jaychia (#2033)
  • [FEAT] Add str.extract() function @colin-ho (#2020)
  • [FEAT] Add str.left() function @murex971 (#2027)

🚀 Performance Improvements

  • [PERF] [Delta Lake] Add IO multithreading arg to daft.read_delta_lake(). @clarkzinzow (#2029)

👾 Bug Fixes

  • [BUG] Allow for writes to s3a and s3n paths @jaychia (#2054)
  • [BUG] Fix if_else series naming from predicate broadcast @colin-ho (#2051)
  • [BUG] enable dependabot for iceberg int tests @samster25 (#2042)
  • [BUG] Fix all-null ImageArray length issues @jaychia (#2034)
  • [BUG] produce only a single sdist @samster25 (#2078)

📖 Documentation

  • [FEAT] Add str.replace expression @colin-ho (#2048)
  • [DOCS] Add docs for write_iceberg @jaychia (#2053)
  • [FEAT] Add str.extract_all expression @colin-ho (#2038)
  • [FEAT] Add str.extract() function @colin-ho (#2020)
  • [CHORE] Add global aggregation docs and error on improper aggregation usage @kevinzwang (#2025)
  • [DOCS] Fix typos and broken links @kaytsui (#2052)

🧰 Maintenance

  • [CHORE] Remove autouse from gen_tpch fixture @colin-ho (#2049)
  • [CHORE] Refactor sql tpch tests @colin-ho (#2047)
  • [CHORE] Add tpch test for read sql @colin-ho (#2026)
  • [CHORE] Add column range stats from read_sql @colin-ho (#2015)
  • [CHORE] upgrade upload/download artifact github action @samster25 (#2043)
  • [CHORE] Exclude Twitter and LinkedIn from broken link checker @colin-ho (#2030)
  • [CHORE] Add global aggregation docs and error on improper aggregation usage @kevinzwang (#2025)
  • [CHORE] Use docker compose instead of docker-compose in builds @jaychia (#2077)

⬆️ Dependencies

  • Bump release-drafter/release-drafter from 5 to 6 @dependabot (#2065)
  • Bump nick-fields/retry from 2 to 3 @dependabot (#2066)
  • Bump openssl-sys from 0.9.93 to 0.9.102 @dependabot (#2062)
  • Bump async-trait from 0.1.74 to 0.1.79 @dependabot (#2061)
  • Bump base64 from 0.21.5 to 0.22.0 @dependabot (#2059)
  • Bump async-compression from 0.4.5 to 0.4.7 @dependabot (#2060)
  • Bump lxml from 4.9.3 to 5.1.0 @dependabot (#1764)
  • Bump slackapi/slack-github-action from 1.24.0 to 1.25.0 @dependabot (#1822)
  • Bump actions/setup-python from 4 to 5 @dependabot (#1717)
  • Bump conda-incubator/setup-miniconda from 2 to 3 @dependabot (#1666)

v0.2.19

2 months ago

Changes

✨ New Features

  • [FEAT] iceberg writes unpartitioned @samster25 (#2016)
  • [FEAT] Add str.match() function @colin-ho (#2007)
  • [FEAT] read_sql @colin-ho (#1943)

👾 Bug Fixes

  • [BUG] Fix connector-x and psycopg dependencies for CI @colin-ho (#2017)
  • [BUG] Disable Numeric and String comparison @samster25 (#2019)
  • [BUG] deltalake read pq splitting bug @jaychia (#2013)

📖 Documentation

  • [DOCS] [Hotfix] [Delta Lake] Break data skipping optimizations into different section. @clarkzinzow (#2018)
  • [CHORE] Fix is_in docs @colin-ho (#2022)
  • [FEAT] Add str.match() function @colin-ho (#2007)
  • [FEAT] read_sql @colin-ho (#1943)

🧰 Maintenance

  • [CHORE] Unskip tensor tests in test_parquet_roundtrip.py @jaychia (#2024)
  • [CHORE] Fix is_in docs @colin-ho (#2022)

v0.2.18

2 months ago

Changes

✨ New Features

  • [FEAT] Top level global expressions @kevinzwang (#2000)
  • [FEAT] Add str.capitalize() function @murex971 (#2003)
  • [FEAT] Support reading Parquet files with Field ID @jaychia (#1990)
  • [FEAT] Enable JQ style JSON accessors on strings @colin-ho (#2001)
  • [FEAT] [Catalogs] [Delta Lake] Add support for AWS Glue Catalog and Databricks Unity Catalog integrations to Delta Lake reader @clarkzinzow (#1991)
  • [FEAT] Enable UDF to handle arbitrary number of Daft series @gmweaver (#1984)

👾 Bug Fixes

  • [BUG] skip metadata check for field equality @samster25 (#2006)
  • [BUG] Fix struct getters on logical types @jaychia (#2008)
  • [BUG] Filter out marker files from glob scan @colin-ho (#1999)

📖 Documentation

  • [DOCS] Fix stale docs in df.write_csv @jaychia (#2012)
  • [FEAT] Enable JQ style JSON accessors on strings @colin-ho (#2001)

🧰 Maintenance

  • [CHORE] [Hotfix] Remove pyarrow upper bound for Windows. @clarkzinzow (#2002)
  • [CHORE] [Catalogs] [Delta Lake] Add test coverage for Delta Lake reads on Azure. @clarkzinzow (#1970)
  • [CHORE] [Repartitioning] Refactor + hide PartitionSpec and rename to ClusteringSpec. @clarkzinzow (#1961)
  • [CHORE] Simplify cast to schema @jaychia (#1982)
  • [CHORE] Disables anonymous mode for S3 accesses in DeltaLake @jaychia (#1975)
  • [CHORE] Set DAFT_ANALYTICS_ENABLED=0 in nightly tests @jaychia (#1972)

v0.2.17

2 months ago

Changes

✨ New Features

  • [FEAT] Add str.reverse() function @nsalerni (#1957)
  • [FEAT] Add str.lower() function @nsalerni (#1938)
  • [FEAT] MapArray @colin-ho (#1959)
  • [FEAT] any_value groupby aggregation @kevinzwang (#1941)
  • [FEAT] adding floor function @chandbud5 (#1960)
  • [FEAT] Expose coerce_int96_timestamp_unit flag on top level daft.read_parquet call @samster25 (#1936)
  • [FEAT] Time Array @colin-ho (#1892)
  • [FEAT] Add str.lstrip() and str.rstrip() functions @nsalerni (#1944)
  • [FEAT] Add str.upper() function @nsalerni (#1942)

🚀 Performance Improvements

  • [PERF] scan task in memory estimate @samster25 (#1901)
  • [PERF] Spread scan tasks over Ray cluster. @clarkzinzow (#1950)

📖 Documentation

  • [DOCS] [Delta Lake] Add user guide for Delta Lake reads. @clarkzinzow (#1969)
  • [Catalogs] [Delta Lake] Add initial support for reading from Delta Lake. @clarkzinzow (#1879)
  • [DOCS] Fix notebooks by falling back on null for URL downloads @jaychia (#1951)
  • [DOCS] Add documentation for using and developing Daft on Ray @kevinzwang (#1896)
  • [DOCS] Update schema hints documentation @jaychia (#1935)

🧰 Maintenance

  • [CHORE] Remove non-MicroPartition and non-ScanOperator paths @clarkzinzow (#1946)
  • [CHORE] Populate previews only when show() or __repr__() is called @colin-ho (#1889)
  • [CHORE] Update segment endpoint @jaychia (#1902)

v0.2.16

3 months ago

Changes

✨ New Features

  • [FEAT] perform head operation instead of list when given a file without regex or / @samster25 (#1891)

🚀 Performance Improvements

  • [PERF] Parallel glob @samster25 (#1897)

v0.2.15

3 months ago

Changes

👾 Bug Fixes

  • [BUG] dont create dirs if non local fs @samster25 (#1888)
  • [BUG] Fix Ray autoscaling from zero worker CPUs @kevinzwang (#1884)
  • [BUG] Attempt to skip IMDS if region or credentials are provided @samster25 (#1886)
  • [BUG] [Query Planner] Properly track ascending/descending sort order for range partitioning and sorting. @clarkzinzow (#1862)
  • [BUG] Fix bug with merge tasks that allows for tasks larger than max size allowed @samster25 (#1882)

📖 Documentation

  • [DOCS] Improve .str.concat docs @jaychia (#1887)

🧰 Maintenance

  • [CHORE] Left-align headers in reprs @jaychia (#1880)