Mars Versions

Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.

v0.8.3

2 years ago

These are the release notes for v0.8.3. See here for the complete list of solved issues and merged PRs.

Enhancements

Stop inferring outputs when args provided (#2761)
Remove deprecation warnings when importing mars.tensor (#2790)
[Ray] New ray actor creation model (#2794)

Bug fixes

Fix long exception of asyncio.gather (#2753)
Fix wrong result of df.merge (#2777)
Fix DataFrame initializer when Mars object exists in list (#2778)
Fix duplicate decrement of object references (#2789, thanks @Catch-Bull!)
[Ray] Support Ray client mode (#2796)

Tests

Increase test stability for command-line tests (#2786)

v0.9.0b2

2 years ago

These are the release notes for v0.9.0b2. See here for the complete list of solved issues and merged PRs.

New Features

Metric
- Add metric framework (#2742, thanks @zhongchun!)
- Add prometheus metric implementation (#2752, thanks @zhongchun!)
- Add ray metrics implementation (#2749, thanks @zhongchun!)
- Add common metrics (#2760, thanks @zhongchun!)

Enhancements

Simplify rechunk implementation (#2745)
Stop inferring outputs when args provided (#2759)
Add broadcast merge support for DataFrame (#2772)
Remove deprecation warnings when importing mars.tensor (#2788)
Optimize in-process actor calls (#2763)
[Ray] New Ray actor creation model (#2783)
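
As a rough illustration of the idea behind the broadcast merge listed above (#2772): when one side of a join is small, it can be copied to every chunk of the large side so that each chunk joins locally, without shuffling data between chunks. The pure-Python sketch below is not Mars code. The function name `broadcast_merge` and the list-of-dicts chunk layout are assumptions made only for illustration.

```python
from collections import defaultdict


def broadcast_merge(large_chunks, small_rows, on):
    """Inner-join every chunk of a large table with one small table.

    Hypothetical helper: the small table is indexed once and that index is
    reused by ("broadcast" to) every chunk, so no chunk needs rows from
    another chunk.
    """
    # Build a lookup from join key to the matching rows of the small table.
    index = defaultdict(list)
    for row in small_rows:
        index[row[on]].append(row)

    merged_chunks = []
    for chunk in large_chunks:
        merged = []
        for left in chunk:
            # Rows without a match are dropped (inner-join semantics).
            for right in index.get(left[on], []):
                merged.append({**left, **right})
        merged_chunks.append(merged)
    return merged_chunks


large = [
    [{"k": 1, "a": "x"}, {"k": 2, "a": "y"}],
    [{"k": 1, "a": "z"}, {"k": 3, "a": "w"}],
]
small = [{"k": 1, "b": 10}, {"k": 2, "b": 20}]
result = broadcast_merge(large, small, on="k")
```

Each output chunk depends only on its own input chunk plus the small table, which is what makes this strategy attractive when one side is small.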

Bug fixes

Fix duplicate decrement of object references (#2741, thanks @Catch-Bull!)
Fix long exception of asyncio.gather (#2748)
Fix NameError: name 'pq' is not defined if pyarrow is not installed (#2751)
Fix empty band_subtasks and most_calls in profiling when the slow duration is large (#2755)
Fix the wrong result of df.merge (#2774)
Fix DataFrame initializer when Mars object exists in list (#2770)
[Ray] Support Ray client mode (#2773)

Tests

Increase test stability for command-line tests (#2779)

v0.8.2

2 years ago

These are the release notes for v0.8.2. See here for the complete list of solved issues and merged PRs.

New Features

DataFrame
- Support inclusive argument for pd.date_range (#2721)
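
For readers unfamiliar with the `inclusive` argument of pd.date_range, it controls whether the start and end points are kept, with the values "both", "neither", "left" and "right". The sketch below mimics that behaviour for daily ranges in plain Python. The helper name `daily_range` is hypothetical and this is not how Mars or pandas implement it.

```python
from datetime import date, timedelta


def daily_range(start, end, inclusive="both"):
    """Return daily dates from start to end, honouring an `inclusive` flag."""
    if inclusive not in {"both", "neither", "left", "right"}:
        raise ValueError(f"unsupported inclusive value: {inclusive!r}")

    days = (end - start).days
    points = [start + timedelta(days=i) for i in range(days + 1)]

    # "right" and "neither" exclude the start point.
    if inclusive in {"neither", "right"} and points and points[0] == start:
        points = points[1:]
    # "left" and "neither" exclude the end point.
    if inclusive in {"neither", "left"} and points and points[-1] == end:
        points = points[:-1]
    return points


start, end = date(2022, 1, 1), date(2022, 1, 4)
left_closed = daily_range(start, end, inclusive="left")
```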

Enhancements

Optimize eval-setitem expressions as single eval expressions (#2699)
[Ray] Refine Ray dataset integration (#2712)
[Ray] Refine Ray dataset integration (#2726)
Add support for reading partitioned parquet for fastparquet (#2729)
Fix duplicate exceptions in log (#2736)

Bug fixes

Fix sort_values for empty DataFrame or Series (#2686)
Eliminate redundant eval node in optimization (#2688)
Avoid iterative tiling for df.loc[:, fields] (#2689)
Fix use_arrow_dtype parameter for read_parquet (#2702)
Fix error on dependent DataFrame setitems (#2703)
Fix estimate_pandas_size on pd.MultiIndex (#2710)
Import vineyard.data.pickle to make members available (#2716)
Fix shuffle when ndim of input tensors are different (#2728)

v0.9.0b1

2 years ago

These are the release notes for v0.9.0b1. See here for the complete list of solved issues and merged PRs.

Highlights

A new coloring-based fusion algorithm is introduced in #2719. Performance is expected to improve significantly compared with previous releases. However, unexpected behavior may occur; feel free to reach out to us if you encounter any.
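
To give a feel for what a coloring-based fusion pass can look like, here is a deliberately simplified sketch. It is not the algorithm from #2719. The rule used here is an assumption chosen for illustration: a node inherits a color when all of its predecessors share that color, and otherwise it starts a new color. Nodes with the same color are then fused into one group. The function name `color_and_fuse` is hypothetical.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def color_and_fuse(predecessors):
    """Group DAG nodes by color.

    `predecessors` maps each node to the set of nodes it depends on.
    Returns a list of groups (lists of nodes) that share a color.
    """
    colors = {}
    next_color = 0
    # Visit nodes so that every predecessor is colored before its successors.
    for node in TopologicalSorter(predecessors).static_order():
        pred_colors = {colors[p] for p in predecessors.get(node, ())}
        if len(pred_colors) == 1:
            # All inputs come from one group: extend that group.
            colors[node] = pred_colors.pop()
        else:
            # Source node, or inputs from several groups: start a new group.
            colors[node] = next_color
            next_color += 1

    groups = defaultdict(list)
    for node, color in colors.items():
        groups[color].append(node)
    return list(groups.values())


# a -> c -> d -> e -> f, with b as a second input of e
graph = {"c": {"a"}, "d": {"c"}, "e": {"d", "b"}, "f": {"e"}}
groups = color_and_fuse(graph)
```

In this example the chain a, c, d is fused, b stays on its own, and e starts a new group (joined by f) because its inputs come from two different colors.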

New Features

DataFrame
- Support inclusive argument for pd.date_range (#2718)
Others
- Add cibuildwheel with Linux AArch64 wheel build support (#2672, thanks @odidev!)

Enhancements

Refine failure recovery log and exception (#2633)
Optimize eval-setitem expressions as single eval expressions (#2695)
Auto merge small chunks when df.groupby().apply(func) is doing aggregation (#2708)
Optimize GroupBy's aggregation algorithm (#2696)
[Ray] Refine Ray dataset integration (#2705)
Improve profiling (#2629)
Add support for reading partitioned parquet for fastparquet (#2724)
Introduce coloring based fusion algorithm (#2719)
Fix duplicate exceptions in log (#2723)

Bug fixes

Fix sort_values for empty DataFrame or Series (#2681)
Eliminate redundant eval node in optimization (#2683)
Avoid iterative tiling for df.loc[:, fields] (#2685)
[hotfix][Ray] Fix Ray dataset compatibility (#2693)
Fix use_arrow_dtype parameter for read_parquet (#2698)
Fix error on dependent DataFrame setitems (#2701)
Fix estimate_pandas_size for pd.MultiIndex (#2707)
Import vineyard.data.pickle to make members available (#2714)
Fix shuffle when ndim of input tensors are different (#2727)

Documentation

Add Slack invite link (#2704, thanks @yuyiming!)

v0.8.1

2 years ago

These are the release notes for v0.8.1. See here for the complete list of solved issues and merged PRs.

New Features

DataFrame
- Add support for GroupBy.{ffill, bfill, fillna} (#2657, thanks @Marascax!)
- Add nunique support for DataFrameGroupBy (#2667)
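
The semantics of the two GroupBy features above can be sketched in plain Python. This illustrates what a group-wise forward fill and a group-wise distinct count compute, under the assumption that missing values are represented as `None`. It is not Mars or pandas code, and the helper names are hypothetical.

```python
def groupby_ffill(keys, values):
    """Forward-fill missing values using the last value seen in the same group."""
    last_seen = {}
    filled = []
    for key, value in zip(keys, values):
        if value is None:
            # Stays None when the group has no earlier value.
            value = last_seen.get(key)
        else:
            last_seen[key] = value
        filled.append(value)
    return filled


def groupby_nunique(keys, values):
    """Count distinct non-missing values per group."""
    seen = {}
    for key, value in zip(keys, values):
        group = seen.setdefault(key, set())
        if value is not None:
            group.add(value)
    return {key: len(group) for key, group in seen.items()}


keys = ["a", "b", "a", "b", "a"]
values = [1, None, None, 2, 3]
filled = groupby_ffill(keys, values)
distinct = groupby_nunique(keys, values)
```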

Enhancements

Add support for HTTP request rewriter (#2665)
Add merging small files support for md.{read_parquet, read_csv} (#2669)
Optimize filtering DataFrame with its fields (#2668)

Bug fixes

Allow specifying multiple supervisor processes (#2625)
Fix backward compatibility for pandas 1.0 (#2630)
Fix NotImplementedError for mo.batch when single call not implemented (#2637)
Fix compatibility for pandas 1.4 (#2652)
Fix IndexError raised by aggregation of DataFrameGroupBy (#2653)
Fix df.loc[:] to make sure same index_value key generated (#2654)
Fix aggregation with comparison (#2655)
Fix the wrong index_value generated by df.loc[:] (#2666)
Fix as_index when calling groupby-agg (#2678)

v0.9.0a2

2 years ago

These are the release notes for v0.9.0a2. See here for the complete list of solved issues and merged PRs.

New Features

DataFrame
- Add support for GroupBy.{ffill, bfill, fillna} (#2639, thanks @Marascax!)
- Add nunique support for DataFrameGroupBy (#2662)
Others
- Add wheel support for Python 3.10 and drop Python 3.6 (#2622)

Enhancements

Add merging small files support for md.{read_parquet, read_csv} (#2661)
Add support for HTTP request rewriter (#2664)
Optimize filtering DataFrame with its fields (#2571)
Add pyproject.toml to configure build packages (#2674)

Bug fixes

Fix backward compatibility for pandas 1.1 and 1.2 (#2624)
Fix backward compatibility for pandas 1.0 (#2628)
Fix NotImplementedError for mo.batch when single call not implemented (#2635)
Fix IndexError raised by aggregation of DataFrameGroupBy (#2641)
Fix compatibility for pandas 1.4 (#2650)
Fix df.loc[:] to make sure same index_value key generated (#2643)
Fix aggregation with comparison (#2647)
Fix the wrong index_value generated by df.loc[:] (#2658)
Fix optimizing DataFrame query with timestamp in conditions (#2671)
Fix as_index when calling agg on SeriesGroupBy (#2676)

v0.8.0

2 years ago

These are the release notes for v0.8.0. See here for the complete list of solved issues and merged PRs.

These release notes cover only the differences from v0.8.0rc1; for all highlights and changes, please refer to the release notes of the pre-releases:

alpha1, alpha2, alpha3, beta1, beta2, rc1

New Features

Tensor
- Implements mt.bincount (#2552)
DataFrame
- Support Series.median (#2570, thanks @perfumescent!)
Learn
- Add mars.learn.metrics.multilabel_confusion_matrix and derivative metrics (#2568)
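
As a reminder of what a bincount operation such as mt.bincount computes, the sketch below reproduces the semantics of numpy.bincount (including its `weights` and `minlength` parameters) in plain Python. It is an illustration of the result, not the Mars implementation, which works on chunked tensors.

```python
def bincount(values, weights=None, minlength=0):
    """Count occurrences of each non-negative integer in `values`.

    With `weights`, each occurrence adds its weight instead of 1.
    The result has at least `minlength` bins.
    """
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative integers")

    size = max(max(values, default=-1) + 1, minlength)
    if weights is None:
        counts = [0] * size
        for v in values:
            counts[v] += 1
    else:
        counts = [0.0] * size
        for v, w in zip(values, weights):
            counts[v] += w
    return counts


counts = bincount([0, 1, 1, 3, 2, 1, 7])
weighted = bincount([0, 1, 1, 2], weights=[0.5, 1.0, 1.0, 2.0])
```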

Enhancements

Implement web API of get_infos (#2564)
Reduce time cost of cpu_percent() calls (#2572)
Stop calling user funcs when dtypes is specified (#2596)
Supports adding Mars extensions via setup entrypoints (#2598)
[Ray] Refine Mars on Ray usability (#2606)
Reduce estimation time cost (#2607)
Skip details of shuffled chunks in meta (#2609)
Reduce the time cost of fetching tileable data (#2616)
Reduce RPC cost of oscar by removing unnecessary tasks (#2613)
Use batched request to apply for slots (#2615)

Bug fixes

Fix index of series.apply when the result index is unchanged (#2563)
Fix DataFrame getitem when duplicate columns exist (#2582)
Upgrade required version of vineyard (#2593)
Fix progress always being 0 or 100% (#2595)
Fix None dtype for some unary tensor functions (#2604)
Make Proxima work with latest Mars (#2605, thanks @yuyiming!)
Fix tests for cudf 21.10 (#2608)
Fix duplicate decref of subtask input chunk (#2614, thanks @Catch-Bull!)

v0.9.0a1

2 years ago

These are the release notes for v0.9.0a1. See here for the complete list of solved issues and merged PRs.

New Features

Tensor
- Implements mt.bincount (#2548)
DataFrame
- Support Series.median() (#2566, thanks @perfumescent!)
Learn
- Add mars.learn.metrics.multilabel_confusion_matrix and derivative metrics (#2554)
Services
- Add basic profiling support for supervisor (#2586)

Enhancements

Add app_queue in new_cluster (#2550, thanks @xxxxsk!)
Implement web API of get_infos (#2558)
Reduce time cost of cpu_percent() calls (#2567)
Reduce estimation time cost (#2577)
[Ray] Refine Mars on Ray usability (#2580)
[Ray] Refine Ray dataset integration (#2579)
Optimize tileable graph construction (#2583)
Stop calling user funcs when dtypes is specified (#2587)
Supports adding Mars extensions via setup entrypoints (#2589)
Skip details of shuffled chunks in meta (#2600)
Reduce the time cost of fetching tileable data (#2594)
Use batched request to apply for slots (#2601)
Reduce RPC cost of oscar by removing unnecessary tasks (#2597)

Bug fixes

Fix index of series.apply when the result index is unchanged (#2557)
Stop using asdict to handle dataclasses (#2561)
Fix tests under cudf 21.10 (#2608)
Fix DataFrame getitem when duplicate columns exist (#2581)
Upgrade required version of vineyard (#2588)
Fix progress always being 0 or 100% (#2591)
Make Proxima work with latest Mars (#2599, thanks @yuyiming!)
Fix None dtype for some unary tensor functions (#2603)
Fix duplicate decref of subtask input chunk (#2611, thanks @Catch-Bull!)

Documentation

Add a document about how to implement a Mars operand (#2562)

v0.7.5

2 years ago

These are the release notes for v0.7.5. See here for the complete list of solved issues and merged PRs.

New Features

Tensor
- Add preliminary implementations for ufunc methods (#2513)
- Add partial support for setitem with fancy indexing (#2544)
DataFrame
- Implements md.get_dummies (#2534, thanks @hoarjour!)
Learn
- Add make_regression support for learn module (#2517)
- Implements mars.learn.preprocessing.LabelEncoder (#2545)
Services
- Add web API for scheduling (#2535)
Web
- Display tileable properties on web (#2539, thanks @RandomY-2!)
Others
- Add experimental support for CUDA under WSL for Windows 11 (#2543)
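
For context on the md.get_dummies feature listed above: get_dummies-style functions turn a categorical column into one indicator column per category (one-hot encoding). The plain-Python sketch below shows the shape of that result. The helper is hypothetical, and details such as column order and the use of 0/1 rather than booleans are assumptions rather than a statement about Mars's output.

```python
def get_dummies(values, prefix=None):
    """One-hot encode a list of hashable, sortable category values.

    Returns a dict mapping column name -> list of 0/1 indicators.
    """
    columns = {}
    for category in sorted(set(values)):
        name = f"{prefix}_{category}" if prefix else str(category)
        columns[name] = [1 if v == category else 0 for v in values]
    return columns


encoded = get_dummies(["b", "a", "b"])
prefixed = get_dummies(["b", "a", "b"], prefix="col")
```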

Enhancements

Reduce indentation of frontend code (#2541)

Bug fixes

Fix output of df.groupby(as_index=False).size() (#2508)
Fix reduction result on empty series (#2522)
Fix df.loc when df is empty (#2526)
[Ray] Fix serializing lambdas in web (#2529)
Fix df.loc when providing empty list (#2532)

Documentation

Add doc for reading CSV in OSS (#2530, thanks @Catch-Bull!)

v0.8.0rc1

2 years ago

These are the release notes for v0.8.0rc1. See here for the complete list of solved issues and merged PRs.

New Features

Tensor
- Add preliminary implementations for ufunc methods (#2510)
- Add partial support for setitem with fancy indexing (#2453)
DataFrame
- Support md.get_dummies() (#2323, thanks @hoarjour!)
Learn
- Add make_regression support for learn module (#2515)
- Implements fit and predict methods for bagging (#2516)
- Implements mars.learn.ensemble.IsolationForest (#2531)
- Implements mars.learn.preprocessing.LabelEncoder (#2542)
Services
- Add web API for scheduling (#2533)
Web
- Display tileable properties on web (#2525, thanks @RandomY-2!)
Others
- Support mutable tensor on oscar (#2432, thanks @Coco58323!)
- Add experimental support for CUDA under WSL for Windows 11 (#2538)

Enhancements

Use black to enforce code style (#2492)
Reduce indentation of frontend code (#2540)

Bug fixes

Fix output of df.groupby(as_index=False).size() (#2507)
[Ray] Fix serializing lambdas in web (#2512)
Fix reduction result on empty series (#2520)
Fix DataFrame.loc when df is empty (#2524)
Fix df.loc when providing empty list (#2528)

Documentation

Add doc for reading CSV in OSS (#2514, thanks @Catch-Bull!)