Modin Versions

Modin: Scale your Pandas workflows by changing a single line of code

0.25.0

6 months ago

This release introduces the modin.utils.execute function to improve the benchmarking experience and includes a new version of HDK, 0.9. It also includes performance optimizations for sort_values, value_counts, 2D setitem, and several other operations, as well as many bug fixes.
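Since the new function triggers lazy computations and waits for them to complete, a benchmark timer can call it before stopping the clock. The sketch below shows one way this could be wired up. The helper `timed` and its `wait` parameter are hypothetical names introduced here for illustration; only `modin.utils.execute` comes from this release.

```python
import time


def timed(build, wait=None):
    """Run build() and return (result, seconds), waiting for deferred work.

    `build` is a zero-argument callable returning the object to benchmark.
    `wait` is a hypothetical hook; by default it is modin.utils.execute.
    """
    if wait is None:
        # Imported lazily so the helper can be defined without Modin installed.
        from modin.utils import execute as wait

    start = time.perf_counter()
    result = build()
    wait(result)  # trigger lazy computations and block until they finish
    return result, time.perf_counter() - start


# Usage with Modin >= 0.25.0 installed (shown as comments, not executed here):
#   import modin.pandas as pd
#   df = pd.DataFrame({"a": range(1_000_000)})
#   sorted_df, seconds = timed(lambda: df.sort_values("a"))
```

Passing the wait step in as a parameter keeps the timing logic independent of the engine, so the same helper can be exercised with a stand-in callable.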

Key Features and Updates Since 0.24.0

Stability and Bugfixes
- FIX-#4507: Do not call ray.get() inside of the kernel executing call queues (#6633)
- FIX-#6585: Avoid FutureWarnings in rolling unless necessary (#6586)
- FIX-#6600: Fix usage of list of UDF functions in Series.groupby.agg (#6613)
- FIX-#6602: Refactor join to avoid distributing a dict object warning (#6612)
- FIX-#6604: HDK: Added support for list to DataFrame.agg() (#6606)
- FIX-#6607: Fix incorrect cache after .sort_values() (#6608)
- FIX-#6624: Add FutureWarnings for first/last/bool (#6625)
- FIX-#6628: Allow groupby.diff() for dates (#6631)
- FIX-#6632: Return Series instead of DataFrame for groupby.apply in case of experimental groupby (#6649)
- FIX-#6635: HDK: read_csv(): treat object dtype as string (#6636)
- FIX-#6637: Fix skiprows parameter usage for read_excel (#6638)
- FIX-#6642: Fix modin.numpy.array.sum on HDK (#6643)
- FIX-#6647: Added init file to make modin/experimental/sql/hdk/query.py part of modin package (#6646)
- FIX-#6651: Make sure Series.between works correctly (#6656)
- FIX-#6680: Specify navigation_with_keys=True to fix docs build (#6681)
Performance enhancements
- PERF-#2813: Distributed from_pandas() for numerical data in Ray (#6640)
- PERF-#5533: Improved sort_values by reducing the number of partitions (#6589)
- PERF-#6362: Implement 2D setitem without to-pandas conversion (#6618)
- PERF-#6614: HDK: Use MODIN_CPUS instead of os.cpu_count() for the fragment size calculation (#6615)
- PERF-#6629: HDK: Avoid LazyProxyCategoricalDtype materialization on merge (#6630)
- PERF-#6645: Avoid label synchronization for dot operation (#6644)
- PERF-#6653: value_counts(): Eliminate redundant sorting. (#6654)
- PERF-#6661: Do not convert columns dtypes if the new dtypes are the same (#6662)
Refactor Codebase
- REFACTOR-#6622: Don't use deprecated random_integers func (#6623)
Update testing suite
- TEST-#5489: Allow for pytest to print warnings in tests output (#6621)
Documentation improvements
- DOCS-#4085: Replace vague links to actual names of the pages/sections in docs (#4096)
- DOCS-#6658: Add a note how to enable object spilling in a multi-node Ray cluster (#6659)
New Features
- FEAT-#5221: Add execute to trigger lazy computations and wait for them to complete (#6648)
- FEAT-#5634: Introduce materialize parameter for partition.ip func (#6650)
- FEAT-#6675: Bump pyhdk version to 0.9 (#6676)
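The entries above appear to share one pattern: a category prefix, an issue number, a short summary, and the pull request number in parentheses. A minimal sketch of parsing such a line, assuming that pattern holds, might look like this. `Entry` and `parse_entry` are illustrative names, not part of Modin.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the entries in these notes:
#   - KIND-#ISSUE: summary text (#PR)
ENTRY = re.compile(
    r"^-\s*(?P<kind>[A-Z]+)-#(?P<issue>\d+):\s*(?P<summary>.*?)\s*\(#(?P<pr>\d+)\)\s*$"
)


class Entry(NamedTuple):
    kind: str      # e.g. FIX, PERF, FEAT
    issue: int     # issue number (0 for the #0000 placeholder)
    summary: str   # free-text description
    pr: int        # pull request number


def parse_entry(line: str) -> Optional[Entry]:
    """Return the parsed entry, or None if the line does not fit the pattern."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    return Entry(match["kind"], int(match["issue"]), match["summary"], int(match["pr"]))


entry = parse_entry("- FIX-#6607: Fix incorrect cache after .sort_values() (#6608)")
print(entry.kind, entry.issue, entry.pr)
```

Lines that do not match, such as section headings or entries without an issue number, come back as None rather than raising.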

Contributors

@AndreyPavlenko @Egor-Krivov @Garra1980 @YarShev @anmyachev @dchigarev

0.24.1

7 months ago

Hotfix for sort_values.

Key Features and Updates Since 0.24.0

Stability and Bugfixes
- FIX-#6604: HDK: Added support for list to DataFrame.agg() (#6606)
- FIX-#6607: Fix incorrect cache after .sort_values() (#6608)

Contributors

@AndreyPavlenko @dchigarev

0.24.0

7 months ago

This release upgrades the pandas version to 2.1, raises the minimum supported Python version to 3.9, and introduces ModinDataLoader to improve interaction with PyTorch. It fixes several issues with the interchange protocol, resolving known compatibility issues with Plotly, Seaborn, and Altair, and includes a new version of HDK, 0.8. It also includes other new features and many bug fixes.

Key Features and Updates Since 0.23.0

Stability and Bugfixes
- FIX-#0000: Don't test experimental xgboost with Ray nightly build (#6424)
- FIX-#0000: Fix xgboost tests with ray>2.6.0 (#6425)
- FIX-#1930: Fix one of the cases of heterogeneous data for read_csv (#5507)
- FIX-#4347: read_excel: defaults to pandas for unsupported types of 'io' (#6462)
- FIX-#4580: Fix access by row label in query and eval (#6488)
- FIX-#4687: Change Column.null_count to return a built-in int instead of NumPy scalar (#6526)
- FIX-#5164: Fix unwrap_partitions for virtual partitions when axis=None (#6560)
- FIX-#5536: Remove branch disabling __getattribute__ for experimental mode (#6529)
- FIX-#5627: Stop checking temp_df.dtype == 'category' (#6360)
- FIX-#5972: Compute correct dtype for Series.str.find/index/rfind/rindex (#6426)
- FIX-#6219: Don't default to pandas for copy on empty DataFrame/Series objects (#6371)
- FIX-#6299: __array__ method always returns array of vanilla numpy (#6300)
- FIX-#6334: Improve error message if HDK isn't installed in the environment (#6358)
- FIX-#6347: Remove 'modin in the cloud' experimental feature (#6408)
- FIX-#6364: Make reshuffling work with BenchmarkMode.put(True) (#6365)
- FIX-#6367: Enable support for groupby.size() in reshuffling groupby (#6370)
- FIX-#6368: Apply deferred indices before map-reduce groupby (#6369)
- FIX-#6372: Precompute dtypes for sum operation (#6421)
- FIX-#6375: Don't initialize engines at import time (#6374)
- FIX-#6386: Don't make unnecessary astype calls for modin.array.sum op (#6395)
- FIX-#6392: Compute dtypes for the DataFrame.mean() result (#6520)
- FIX-#6394: Preserve dtypes for __setitem__ op when using not hashable key (#6547)
- FIX-#6396: Set __factory to None in case of any problems during initialization (#6397)
- FIX-#6402: Allow datetime and timedelta types in diff (#6403)
- FIX-#6405: Apply disable_logging to __getattr__ (#6406)
- FIX-#6410: Add a link to @modin_project twitter (#6411)
- FIX-#6414: Fix read_feather with pyarrow<11.0 (#6415)
- FIX-#6427: Make code compatible with flake8==6.1.0 (#6428)
- FIX-#6429: Exclude pymssql==2.2.8 from environments (#6430)
- FIX-#6436: Support ~ in paths in IO functions correctly (#6448)
- FIX-#6443: Cast boolean columns before sum|mean|median groupby aggregations (#6444)
- FIX-#6446: Stop requiring modin-xgboost approval (#6447)
- FIX-#6456: Create fake xgboost module for building docs (#6457)
- FIX-#6459: Support fastparquet>=2023.1.0 (#6458)
- FIX-#6465: Fix groupby.apply() for UDFs that change the output's shape (#6506)
- FIX-#6479: HDK CalciteBuilder: Do not call is_bool_dtype() for categorical (#6480)
- FIX-#6483: Default to pandas for __array_ufunc__ (#6486)
- FIX-#6509: Fix 'reshuffling' in case of a string key (#6510)
- FIX-#6514: test_sort_cols_str from test_dataframe.py crashed on HDK 0.7.0 and python 3.9 (#6515)
- FIX-#6516: HDK: test_dataframe.py crashes if Calcite is disabled (#6517)
- FIX-#6518: Fix interchange protocol for string columns (#6523)
- FIX-#6519: Consider botocore as an optional dependency (#6521)
- FIX-#6532: Fix read_excel so that it doesn't use rich_text param for old openpyxl (#6534)
- FIX-#6535: Pin s3fs<2023.9.0 (#6536)
- FIX-#6537: Unpin s3fs<2023.9.0 (#6544)
- FIX-#6540: Correct handling of range indices and index names in read_parquet (#6545)
- FIX-#6541: Fix ValueError: buffer source array is read-only for iloc (#6538)
- FIX-#6549: Remove usage of dfsql module (#6550)
- FIX-#6552: Avoid FutureWarnings in groupby unless necessary (#6595)
- FIX-#6553: Fix read_csv with iterator=True (#6554)
- FIX-#6558: Normalize the number of partitions after .read_parquet() (#6559)
- FIX-#6561: Remove MODIN_OMNISCI_* env vars in favor of MODIN_HDK_* (#6562)
- FIX-#6565: Don't implement map function via applymap (#6566)
- FIX-#6572: Execute simple queries row-wise in pandas backend (#6575)
- FIX-#6582: Avoid FutureWarnings in bfill/backfill/ffill/pad unless necessary (#6599)
- FIX-#6587: Use different env files for unidist engine for windows and linux (#6588)
- FIX-#6601: sort_values shouldn't affect source dataframe/series (#6603)
Performance enhancements
- PERF-#6332: Don't materialize axes in concat operation (#6381)
- PERF-#6373: Preserve dtypes cache for _repartition (#6376)
- PERF-#6378: Use numpy.array operations in internals of iloc/loc operation (#6393)
- PERF-#6388: Avoid masking in __getitem__ when the number of rows to be taken > 90% (#6423)
- PERF-#6398: Improved performance of list-like objects insertion into DataFrames (#6476)
- PERF-#6433: Implement .dropna() using map-reduce pattern (#6472)
- PERF-#6437: Preserve dtypes for reindex (#6438)
- PERF-#6464: Improve reshuffling for multi-column groupby in low-cardinality cases (#6533)
- PERF-#6466: Verify indices equality without triggering any computations (#6491)
- PERF-#6478: Do not propagate new columns if they're identical to the previous ones (#6481)
- PERF-#6524: Add a 'column' shape hint for the results of qc.to_datetime() (#6525)
- PERF-#6583: Remove redundant index reassignment in query() (#6584)
- PERF-#6590: Chunk axes independently in .from_pandas() (#6591)
Refactor Codebase
- REFACTOR-#4278: Remove unused arguments from BasePandasDataset.apply (#6451)
- REFACTOR-#4902: Use isort (#6551)
- REFACTOR-#6470: Remove Patcher internal class (#6471)
- REFACTOR-#6489: Enforce API-layer bool/integer argument for __invert__ (#6490)
- REFACTOR-#6569: Use contextlib.nullcontext instead of custom one (#6570)
- REFACTOR-#6576: Don't use deprecated is_int64_dtype and is_period_dtype function (#6577)
Update testing suite
- TEST-#0000: Download ray wheel for python 3.9 (#6513)
- TEST-#2008: Reduce runtime of CI checks a lot (#6356)
- TEST-#4270: Revert disabling time_groupby_agg_nunique ASV bench (#6564)
- TEST-#4348: Use psycopg2-binary for testing and developing purpose (#6573)
- TEST-#4477: Add tests for df.eval with scalar and groupby.transform call in the expr (#6546)
- TEST-#4643: Add interchange test for empty dataframe (#6454)
- TEST-#5008: Set benchmark mode within unit test instead of with environment variable (#6359)
- TEST-#6349: Update minimum versions for test dependencies in general environments (#6350)
- TEST-#6439: Create HDK environment manually for ASV (#6431)
- TEST-#6449: Run tests in test_dmatrix.py only for Ray engine (#6450)
- TEST-#6460: Don't use repr to force materialization (#6461)
- TEST-#6469: Pin numexpr<2.8.5 (#6474)
- TEST-#6477: Update ASV to 0.5.1 (#6432)
- TEST-#6497: Remove boto3 from environments to speedup creation (#6496)
- TEST-#6505: Update python version for ASV benchmarks on HDK (#6504)
- TEST-#6593: Adapt tests for pandas 2.1.1 (#6592)
Documentation improvements
- DOCS-#0000: Update CI link in README to show only pushes (#6531)
- DOCS-#6416: Fix import path for spreadsheet feature (#6581)
- DOCS-#6419: Clarify read_parquet supported parameters (#6420)
- DOCS-#6452: Update copyright year (#6453)
New Features
- FEAT-#1611: Add some datetime extraction functions for HDK (#6568)
- FEAT-#5645: Add support for modin's numpy array in dataframe.insert function (#6400)
- FEAT-#6139: DataLoader interplay. (#6140)
- FEAT-#6377: HDK: Do not keep reference to arrow table imported to HDK (#6380)
- FEAT-#6389: Make sure git ignores logs in .modin folder (#6390)
- FEAT-#6401: Support compression param and more file extensions in to_parquet (#6404)
- FEAT-#6407: Update minimum dependency versions (#6342)
- FEAT-#6417: Add support for filters to read_parquet (#6442)
- FEAT-#6434: HDK: Do not convert dictionary columns to string when importing arrow tables (#6435)
- FEAT-#6440: Use different HDK parameters for different queries (#6441)
- FEAT-#6484: HDK: Add support for nlargest/nsmallest groupby aggregation (#6485)
- FEAT-#6500: HDK: Add support for datetime64 to int64 cast (#6501)
- FEAT-#6502: HDK: Add enable_multifrag_execution_result=1 HDK launch parameter (#6503)
- FEAT-#6511: Update the minimum supported python version up to 3.9 (#6508)
- FEAT-#6522: Update to pandas 2.1.0 (#6512)
- FEAT-#6527: HDK: Add support for the quantile group by aggregation. (#6528)
- FEAT-#6597: Bump pyhdk version to 0.8 (#6598)
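FIX-#6436 above concerns `~` in paths passed to IO functions. As a reminder of what tilde expansion means, and not a reproduction of Modin's actual fix, the standard library's os.path.expanduser can resolve such a path before it is handed to a reader. `resolve_user_path` is a hypothetical helper used only for this sketch.

```python
import os


def resolve_user_path(path):
    """Expand a leading ~ to the current user's home directory.

    Anything that is not a str or os.PathLike (file handles, buffers)
    is returned unchanged.
    """
    if isinstance(path, (str, os.PathLike)):
        return os.path.expanduser(os.fspath(path))
    return path


print(resolve_user_path("~/data.csv"))
```

Paths without a leading `~` pass through untouched, so the helper is safe to apply to every path-like argument.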

Contributors

@AndreyPavlenko @RehanSD @YarShev @anmyachev @dchigarev @mvashishtha @vnlitvinov @abykovsk @zmbc @noloerino @rentruewang

0.23.1

8 months ago

Modin 0.23.1

This release contains fixes that improve Modin's performance for both the NumPy and pandas APIs and removes the Modin in the Cloud experimental feature. It also includes upgrades to Modin's testing suite that significantly speed up CI.

Key Features and Updates Since 0.23.0

Stability and Bugfixes
- FIX-#0000: don't test experimental xgboost with Ray nightly build (#6424)
- FIX-#0000: fix xgboost tests with ray>2.6.0 (#6425)
- FIX-#1930: Fix one of the cases of heterogeneous data for read_csv (#5507)
- FIX-#4580: Fix access by row label in query and eval (#6488)
- FIX-#5627: Stop checking temp_df.dtype == 'category' (#6360)
- FIX-#5972: compute correct dtype for Series.str.find/index/rfind/rindex (#6426)
- FIX-#6219: don't default to pandas for 'copy' on empty DataFrame/Series objects (#6371)
- FIX-#6299: __array__ method always returns array of vanilla numpy (#6300)
- FIX-#6334: improve error message if hdk isn't installed in the environment (#6358)
- FIX-#6347: remove 'modin in the cloud' experimental feature (#6408)
- FIX-#6364: Make reshuffling work with 'BenchmarkMode.put(True)' (#6365)
- FIX-#6367: Enable support for 'groupby.size()' in reshuffling groupby (#6370)
- FIX-#6368: Apply deferred indices before map-reduce groupby (#6369)
- FIX-#6372: precompute dtypes for 'sum' operation (#6421)
- FIX-#6375: don't initialize engines at import time (#6374)
- FIX-#6386: don't make unnecessary 'astype' calls for modin.array.sum op (#6395)
- FIX-#6396: set '__factory' to 'None' in case of any problems during initialization (#6397)
- FIX-#6402: Allow datetime and timedelta types in diff (#6403)
- FIX-#6405: Apply disable_logging to __getattr__ (#6406)
- FIX-#6410: add a link to @modin_project twitter (#6411)
- FIX-#6414: fix 'read_feather' with pyarrow<11.0 (#6415)
- FIX-#6427: make code compatible with flake8==6.1.0 (#6428)
- FIX-#6429: exclude pymssql==2.2.8 from environments (#6430)
- FIX-#6436: Support ~ in paths in IO functions correctly (#6448)
- FIX-#6443: Cast boolean columns before sum|mean|median groupby aggregations (#6444)
- FIX-#6456: create fake xgboost module for building docs (#6457)
- FIX-#6459: support fastparquet>=2023.1.0 (#6458)
- FIX-#6483: Default to pandas for __array_ufunc__ (#6486)
Performance enhancements
- PERF-#6437: preserve dtypes for 'reindex' (#6438)
Update testing suite
- TEST-#2008: Reduce runtime of CI checks a lot (#6356)
- TEST-#6349: Update minimum versions for test dependencies in general environments (#6350)
- TEST-#6469: pin numexpr<2.8.5 (#6474)
New Features
- FEAT-#6407: update minimum dependency versions (#6342)
Uncategorized improvements
- Release version 0.23.1 (#6495)

Contributors

@AndreyPavlenko @RehanSD @YarShev @anmyachev @dchigarev @mvashishtha @vnlitvinov

0.23.0

10 months ago

Modin 0.23.0

This release upgrades the pandas version to 2.0. It also includes a '.corr' speed-up, new features, and bug fixes.

Key Features and Updates Since 0.22.0

Stability and Bugfixes
- FIX-#1851: Squash multiple LogicalProject nodes (#6306)
- FIX-#3371: Remove pandas patch level pin (#6211)
- FIX-#4048: support sqlalchemy objects in con parameter for to_sql (#5940)
- FIX-#4485: fix 'clip' with list-like bounds and axis=None (#6344)
- FIX-#4954: defaults to pandas in read_json in case of rows having different columns (#5946)
- FIX-#5077: fix 'Series.rename_axis' signature (#6324)
- FIX-#5461: fix groupby if dataframe has empty partitions (#6307)
- FIX-#6035: Fall back to Pandas, when merging unsupported column types (#6036)
- FIX-#6085: HDK: Implemented support for datetime64 dtypes serialization (#6086)
- FIX-#6208: HDK: Added support for median aggregation (#6209)
- FIX-#6215: Process '.corr(numeric_only=False)' parameter at the qc level (#6242)
- FIX-#6218: Fix read_excel and unpin openpyxl (#6247)
- FIX-#6229: fix Series.equals/DataFrame.equals with NA entries (#6270)
- FIX-#6232: support DataFrame.cov(numeric_only=False) without fallback to pandas (#6262)
- FIX-#6237: Log errors only from deepest modin layer (#6238)
- FIX-#6245: support datetime64 with different resolutions types for HDK (#6255)
- FIX-#6246: fix 'groupby(..., as_index=False).agg(...)' case (#6263)
- FIX-#6258: Fix series to_dict (#6260)
- FIX-#6259: Fix astype("category") causing read-only buffer error (#6267)
- FIX-#6273: fix DataFrame.min/max/mean/median/skew/kurt with axis=None (#6275)
- FIX-#6297: fix experimental numpy.argmax/argmin with Nans in data (#6298)
- FIX-#6309: do not materialize axes for 'rank' operation (#6310)
- FIX-#6313: update MIN_RAY_VERSION var: 1.4.0 -> 1.13.0 (#6314)
- FIX-#6317: fix syntax error in 'push-to-master.yml' (#6318)
- FIX-#6336: pin 'pydantic<2' to fix CI (#6337)
- FIX-#6338: fix TypeError: WorksheetReader.__init__() got an unexpected keyword argument 'rich_text' (#6339)
- FIX-#6341: call _filter_empties only if shapes are different on particular axis (#6333)
- FIX-#6352: Fix the HdkOnNativeDataframePartition._width_cache property computation (#6353)
- FIX-#6354: Skip bad and pre-release versions (#6355)
Performance enhancements
- PERF-#4560: Implement '.corr()' method using MapReduce pattern (#6193)
- PERF-#6319: remove '__make_init_labels_args' explicit calls that materialize axes (#6312)
Refactor Codebase
- REFACTOR-#0000: Remove OmnisciWorker as unused (#6278)
- REFACTOR-#0000: rename 'exc' -> 'err' (#6252)
- REFACTOR-#6279: HDK DataFrame should not have more than one partition (#6280)
- REFACTOR-#6329: deprecate cloud feature (#6330)
Update testing suite
- TEST-#6282: Reduce copy-pasteness in ci.yml (#6283)
- TEST-#6308: add to_numpy ASV bench (#6305)
- TEST-#6315: increase 'install_timeout' for ASV benchmarks: 600 -> 6000 sec (#6316)
New Features
- FEAT-#5684: Use TreeReduce implementation for 'pivot_table' in certain cases (#6089)
- FEAT-#5759: Implement lazy Arrow execution for the HDK engine (#6251)
- FEAT-#5936: support pandas 2.0.2 (#5995)
- FEAT-#6048: add wait method for Dask/Ray/Unidist wrappers (#6049)
- FEAT-#6191: Implement groupby.rolling API (#6292)
- FEAT-#6253: add 'dtype_backend' parameter support for read_parquet/read_feather (#6264)
- FEAT-#6256: HDK: Add support for DataFrameGroupBy.head/tail() (#6257)
- FEAT-#6284: Do not convert HDK query execution result to arrow. (#6286)
- FEAT-#6296: Add additional pyhdk launch parameters (#6303)
- FEAT-#6322: Give a warning only if the major or minor part of pandas version are different (#6323)
- FEAT-#6325: Add GPU execution option for HDK backend (#6326)
- FEAT-#6327: Bump pyhdk version to 0.7 (#6328)
- FEAT-#6351: Add a simple heuristic for fragment size when running on a GPU (#6346)
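FEAT-#6322 above limits the pandas version-mismatch warning to differences in the major or minor component. The check might look roughly like the following sketch. The function name and the sample version strings are assumptions made for illustration, not Modin's actual code.

```python
def major_minor_differs(installed, expected):
    """Return True if the major or minor parts of two version strings differ.

    Patch-level differences (the third component and beyond) are ignored.
    """
    def major_minor(version):
        return tuple(version.split(".")[:2])

    return major_minor(installed) != major_minor(expected)


print(major_minor_differs("2.0.2", "2.0.3"))  # False: only the patch level differs
print(major_minor_differs("2.0.2", "2.1.0"))  # True: the minor version differs
```

Under this rule a patch-only mismatch stays silent, which fits the removal of the pandas patch level pin listed in the same release.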

Contributors

@AndreyPavlenko @YarShev @alexbaden @anmyachev @dchigarev @kurapov-peter @mvashishtha @vnlitvinov

0.22.3

10 months ago

Patch release whose main purpose is to pin pydantic<2 to resolve Ray issues, plus a few bug fixes.

Key Features and Updates Since 0.22.2

Stability and Bugfixes
- FIX-#5461: fix groupby if dataframe has empty partitions (#6307)
- FIX-#6035: Fall back to Pandas, when merging unsupported column types (#6036)
- FIX-#6297: fix experimental numpy.argmax/argmin with Nans in data (#6298)
- FIX-#6309: do not materialize axes for 'rank' operation (#6310)
- FIX-#6313: update MIN_RAY_VERSION var: 1.4.0 -> 1.13.0 (#6314)
- FIX-#6336: pin 'pydantic<2' to fix CI (#6337)

Contributors

@AndreyPavlenko @anmyachev

0.23.0rc0

10 months ago

This release includes support for pandas 2.0, a '.corr' speed-up, new features, and bug fixes.

Note: this is a release candidate. If everything goes well, we'll release Modin 0.23.0 in two weeks.

Key Features and Updates Since 0.22.0

Stability and Bugfixes
- FIX-#3371: Remove pandas patch level pin (#6211)
- FIX-#4954: Defaults to pandas in read_json in case of rows having different columns (#5946)
- FIX-#6215: Process '.corr(numeric_only=False)' parameter at the qc level (#6242)
- FIX-#6218: Fix read_excel and unpin openpyxl (#6247)
- FIX-#6232: Support DataFrame.cov(numeric_only=False) without fallback to pandas (#6262)
- FIX-#6237: Log errors only from deepest modin layer (#6238)
- FIX-#6245: Support datetime64 with different resolutions types for HDK (#6255)
- FIX-#6246: Fix 'groupby(..., as_index=False).agg(...)' case (#6263)
- FIX-#6258: Fix series to_dict (#6260)
- FIX-#6259: Fix astype("category") causing read-only buffer error (#6267)
- FIX-#6273: Fix DataFrame.min/max/mean/median/skew/kurt with axis=None (#6275)
Performance enhancements
- PERF-#4560: Implement '.corr()' method using MapReduce pattern (#6193)
New Features
- FEAT-#5759: Implement lazy Arrow execution for the HDK engine (#6251)
- FEAT-#5936: Support pandas 2.0.2 (#5995)
- FEAT-#6048: Add wait method for Dask/Ray/Unidist wrappers (#6049)
- FEAT-#6253: Add 'dtype_backend' parameter support for read_parquet/read_feather (#6264)
- FEAT-#6256: HDK: Add support for DataFrameGroupBy.head/tail() (#6257)

Contributors

@AndreyPavlenko @YarShev @anmyachev @dchigarev @mvashishtha @vnlitvinov

0.22.2

10 months ago

This release includes several bug fixes.

Key Features and Updates Since 0.22.1

Stability and Bugfixes
- FIX-#6258: Fix series to_dict (#6260)
- FIX-#6259: Fix astype("category") causing read-only buffer error (#6267)

Contributors

@mvashishtha

0.22.1

11 months ago

This release includes a bug fix.

Key Features and Updates Since 0.22.0

Stability and Bugfixes
- FIX-#6237: Log errors only from deepest modin layer (#6238)

Contributors

@mvashishtha

0.22.0

11 months ago

This release includes support for pyhdk=0.6, a few performance enhancements, new features and bug fixes.

Key Features and Updates Since 0.21.0

Stability and Bugfixes
- FIX-#6104: Stop selecting same column twice for repr (#6210)
- FIX-#6199: make sure read_html returns a list of DataFrames (#6200)
- FIX-#6201: align groupby objects signatures with pandas (#6202)
- FIX-#6212: Fix '.read_feather()' failure if the file contains index metadata (#6213)
- FIX-#6216: make sure 'infer_objects' returns DataFrame (#6217)
- FIX-#5722: Use full axis function when casting to "category" (#6222)
- FIX-#5889: HDK: Combine multiple lazy concat operations into a single one and replace recursion with iteration (#5932)
Performance enhancements
- PERF-#6126: Remove redundant '.fillna(0)' at the end of '.size()' and '.count()' (#6127)
- PERF-#6224: Use 'Map' operator to retrieve categorical codes (#6230)
Refactor Codebase
- REFACTOR-#5916: Align Python engine's API with other engines (#6214)
New Features
- FIX-#6189: Bump pyhdk version to 0.6 (#6190)
- FEAT-#6225: Allow set_index to take an object to be handled by the backend (a backend index, etc) (#6228)
Dependencies
- FIX-#6072: unpin pyarrow and xfail test_read_parquet_pandas_index test (#6223)

Contributors

@mvashishtha @AndreyPavlenko @anmyachev @dchigarev @jkew @YarShev