
Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.

v0.8.3

2 years ago

These are the release notes for v0.8.3. See here for the complete list of resolved issues and merged PRs.

Enhancements

  • Stop inferring outputs when args provided (#2761)
  • Remove deprecation warnings when importing mars.tensor (#2790)
  • [Ray] New ray actor creation model (#2794)

Bug fixes

  • Fix long exception of asyncio.gather (#2753)
  • Fix wrong result of df.merge (#2777)
  • Fix DataFrame initializer when Mars object exists in list (#2778)
  • Fix duplicate dec object ref (#2789, thanks @Catch-Bull!)
  • [Ray] Support Ray client mode (#2796)

Tests

  • Increase test stability for command-line tests (#2786)

v0.9.0b2

2 years ago

These are the release notes for v0.9.0b2. See here for the complete list of resolved issues and merged PRs.

New Features

  • Metric
    • Add metric framework (#2742, thanks @zhongchun!)
    • Add prometheus metric implementation (#2752, thanks @zhongchun!)
    • Add ray metrics implementation (#2749, thanks @zhongchun!)
    • Add common metrics (#2760, thanks @zhongchun!)

Enhancements

  • Simplify rechunk implementation (#2745)
  • Stop inferring outputs when args provided (#2759)
  • Add broadcast merge support for DataFrame (#2772)
  • Remove deprecation warnings when importing mars.tensor (#2788)
  • Optimize in-process actor calls (#2763)
  • [Ray] New ray actor creation model (#2783)
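
For readers unfamiliar with the term, a broadcast merge generally avoids shuffling the larger table by sending a full copy of the smaller table to every chunk of the larger one. The following is a minimal plain-Python sketch of that idea for an inner join, not the implementation from #2772; the function name `broadcast_merge` and the row-as-dict layout are assumptions made for illustration.

```python
def broadcast_merge(left_chunks, right_rows, key):
    """Inner-join each chunk of a large table against a full copy of a small one."""
    # Build the lookup once. In a distributed setting this small table would
    # be shipped (broadcast) to every worker that holds a left chunk.
    lookup = {}
    for row in right_rows:
        lookup.setdefault(row[key], []).append(row)

    result_chunks = []
    for chunk in left_chunks:
        joined = []
        for row in chunk:
            # Rows without a match on the right side are dropped (inner join).
            for match in lookup.get(row[key], []):
                joined.append({**row, **match})
        result_chunks.append(joined)
    return result_chunks


left = [[{"k": 1, "a": "x"}], [{"k": 2, "a": "y"}, {"k": 3, "a": "z"}]]
right = [{"k": 1, "b": 10}, {"k": 2, "b": 20}]
merged = broadcast_merge(left, right, "k")
```

Each left chunk is joined independently, so the chunking of the large table is preserved in the output.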

Bug fixes

  • Fix duplicate dec object ref (#2741, thanks @Catch-Bull!)
  • Fix long exception of asyncio.gather (#2748)
  • Fix NameError: name 'pq' is not defined if pyarrow is not installed (#2751)
  • Fix profiling band_subtasks and most_calls being empty when the slow duration is large (#2755)
  • Fix the wrong result of df.merge (#2774)
  • Fix DataFrame initializer when Mars object exists in list (#2770)
  • [Ray] Support Ray client mode (#2773)

Tests

  • Increase test stability for command-line tests (#2779)

v0.8.2

2 years ago

These are the release notes for v0.8.2. See here for the complete list of resolved issues and merged PRs.

New Features

  • DataFrame
    • Support inclusive argument for pd.date_range (#2721)
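
As background, the `inclusive` argument decides whether the endpoints of a date range are kept. The sketch below reproduces those semantics for daily ranges using only the standard library; it is not Mars or pandas code, the helper name `daily_range` is hypothetical, and the accepted values ("both", "left", "right", "neither") are taken from the pandas `date_range` documentation.

```python
from datetime import date, timedelta


def daily_range(start, end, inclusive="both"):
    """Return daily dates from start to end, honoring endpoint inclusion."""
    if inclusive not in ("both", "left", "right", "neither"):
        raise ValueError(f"invalid inclusive value: {inclusive!r}")
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    # "right" and "neither" exclude the start point.
    if inclusive in ("right", "neither") and days and days[0] == start:
        days = days[1:]
    # "left" and "neither" exclude the end point.
    if inclusive in ("left", "neither") and days and days[-1] == end:
        days = days[:-1]
    return days


first, last = date(2022, 1, 1), date(2022, 1, 4)
closed_both = daily_range(first, last)
closed_left = daily_range(first, last, inclusive="left")
```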

Enhancements

  • Optimize eval-setitem expressions as single eval expressions (#2699)
  • [Ray] Refine raydataset integration (#2712)
  • [Ray] Refine ray dataset integration (#2726)
  • Add support for reading partitioned parquet for fastparquet (#2729)
  • Fix duplicate exceptions in log (#2736)

Bug fixes

  • Fix sort_values for empty DataFrame or Series (#2686)
  • Eliminate redundant eval node in optimization (#2688)
  • Avoid iterative tiling for df.loc[:, fields] (#2689)
  • Fix use_arrow_dtype parameter for read_parquet (#2702)
  • Fix error on dependent DataFrame setitems (#2703)
  • Fix estimate_pandas_size on pd.MultiIndex (#2710)
  • Import vineyard.data.pickle to make members available (#2716)
  • Fix shuffle when ndim of input tensors are different (#2728)

v0.9.0b1

2 years ago

These are the release notes for v0.9.0b1. See here for the complete list of resolved issues and merged PRs.

Highlights

  • A new coloring-based fusion algorithm is introduced in #2719. Performance is expected to increase significantly compared to previous releases. However, unexpected behavior may occur; feel free to reach out to us if you run into any.

New Features

  • DataFrame
    • Support inclusive argument for pd.date_range (#2718)
  • Others
    • Add cibuildwheel with Linux AArch64 wheel build support (#2672, thanks @odidev!)

Enhancements

  • Refine failure recovery log and exception (#2633)
  • Optimize eval-setitem expressions as single eval expressions (#2695)
  • Auto merge small chunks when df.groupby().apply(func) is doing aggregation (#2708)
  • Optimize GroupBy's aggregation algorithm (#2696)
  • [Ray] Refine ray dataset integration (#2705)
  • Improve profiling (#2629)
  • Add support for reading partitioned parquet for fastparquet (#2724)
  • Introduce coloring based fusion algorithm (#2719)
  • Fix duplicate exceptions in log (#2723)

Bug fixes

  • Fix sort_values for empty DataFrame or Series (#2681)
  • Eliminate redundant eval node in optimization (#2683)
  • Avoid iterative tiling for df.loc[:, fields] (#2685)
  • [hotfix][Ray] Fix ray dataset compatibility (#2693)
  • Fix use_arrow_dtype parameter for read_parquet (#2698)
  • Fix error on dependent DataFrame setitems (#2701)
  • Fix estimate_pandas_size for pd.MultiIndex (#2707)
  • Import vineyard.data.pickle to make members available (#2714)
  • Fix shuffle when ndim of input tensors are different (#2727)

Documentation

  • Add Slack invite link (#2704, thanks @yuyiming!)

v0.8.1

2 years ago

These are the release notes for v0.8.1. See here for the complete list of resolved issues and merged PRs.

New Features

  • DataFrame
    • Add support for GroupBy.{ffill, bfill, fillna} (#2657, thanks @Marascax!)
    • Add nunique support for DataFrameGroupBy (#2667)
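
To illustrate what a grouped forward fill does, here is a minimal plain-Python sketch: a missing value is replaced by the most recent non-missing value seen in the same group. This shows the general semantics only, not Mars's implementation; the function name `groupby_ffill` and the use of `None` for missing values are assumptions.

```python
def groupby_ffill(keys, values):
    """Forward-fill None values using the last value seen for the same key."""
    last_seen = {}
    filled = []
    for key, value in zip(keys, values):
        if value is None:
            # Stays None when the group has no earlier non-missing value.
            value = last_seen.get(key)
        else:
            last_seen[key] = value
        filled.append(value)
    return filled


keys = ["a", "b", "a", "b", "a"]
values = [1, None, None, 2, None]
filled = groupby_ffill(keys, values)
```

A backward fill (`bfill`) follows the same logic applied to the reversed sequence.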

Enhancements

  • Add support for HTTP request rewriter (#2665)
  • Add merging small files support for md.{read_parquet, read_csv} (#2669)
  • Optimize filtering DataFrame with its fields (#2668)

Bug fixes

  • Allow specifying multiple supervisor processes (#2625)
  • Fix backward compatibility for pandas 1.0 (#2630)
  • Fix NotImplementedError for mo.batch when single call is not implemented (#2637)
  • Fix compatibility for pandas 1.4 (#2652)
  • Fix IndexError raised by aggregation of DataFrameGroupBy (#2653)
  • Fix df.loc[:] to make sure same index_value key generated (#2654)
  • Fix aggregation with comparison (#2655)
  • Fix the wrong index_value generated by df.loc[:] (#2666)
  • Fix as_index when calling groupby-agg (#2678)

v0.9.0a2

2 years ago

These are the release notes for v0.9.0a2. See here for the complete list of resolved issues and merged PRs.

New Features

  • DataFrame
    • Add support for GroupBy.{ffill, bfill, fillna} (#2639, thanks @Marascax!)
    • Add nunique support for DataFrameGroupBy (#2662)
  • Others
    • Add wheel support for Python 3.10 and drop Python 3.6 (#2622)

Enhancements

  • Add merging small files support for md.{read_parquet, read_csv} (#2661)
  • Add support for HTTP request rewriter (#2664)
  • Optimize filtering DataFrame with its fields (#2571)
  • Add pyproject.toml to config build packages (#2674)

Bug fixes

  • Fix backward compatibility for pandas 1.1 and 1.2 (#2624)
  • Fix backward compatibility for pandas 1.0 (#2628)
  • Fix NotImplementedError for mo.batch when single call is not implemented (#2635)
  • Fix IndexError raised by aggregation of DataFrameGroupBy (#2641)
  • Fix compatibility for pandas 1.4 (#2650)
  • Fix df.loc[:] to make sure same index_value key generated (#2643)
  • Fix aggregation with comparison (#2647)
  • Fix the wrong index_value generated by df.loc[:] (#2658)
  • Fix optimizing DataFrame query with timestamp in conditions (#2671)
  • Fix as_index when calling agg on SeriesGroupBy (#2676)

v0.8.0

2 years ago

These are the release notes for v0.8.0. See here for the complete list of resolved issues and merged PRs.

These release notes only cover the differences from v0.8.0rc1; for all highlights and changes, please refer to the release notes of the pre-releases:

alpha1, alpha2, alpha3, beta1, beta2, rc1

New Features

  • Tensor
    • Implements mt.bincount (#2552)
  • DataFrame
    • Support Series.median (#2570, thanks @perfumescent!)
  • Learn
    • Add mars.learn.metrics.multilabel_confusion_matrix and derivative metrics (#2568)
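
For reference, `bincount` counts how often each non-negative integer occurs in its input. The following is a minimal pure-Python sketch of that behavior, not the Mars implementation; the `minlength` parameter mirrors the one documented for `numpy.bincount`, and the list-based return type is an assumption made to keep the example dependency-free.

```python
def bincount(values, minlength=0):
    """Count occurrences of each non-negative integer in values."""
    if any(v < 0 for v in values):
        raise ValueError("bincount expects non-negative integers")
    # One bin per integer from 0 up to the largest value, padded to minlength.
    size = max(max(values, default=-1) + 1, minlength)
    counts = [0] * size
    for v in values:
        counts[v] += 1
    return counts


counts = bincount([0, 1, 1, 3])
padded = bincount([2], minlength=5)
```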

Enhancements

  • Implement web API of get_infos (#2564)
  • Reduce time cost of cpu_percent() calls (#2572)
  • Stop calling user funcs when dtypes is specified (#2596)
  • Supports adding Mars extensions via setup entrypoints (#2598)
  • [Ray] Refine mars on ray usability (#2606)
  • Reduce estimation time cost (#2607)
  • Skip details of shuffled chunks in meta (#2609)
  • Reduce the time cost of fetching tileable data (#2616)
  • Reduce RPC cost of oscar by removing unnecessary tasks (#2613)
  • Use batched request to apply for slots (#2615)

Bug fixes

  • Fix index series.apply when result index unchanged (#2563)
  • Fix DataFrame getitem when duplicate columns exist (#2582)
  • Upgrade required version of vineyard (#2593)
  • Fix progress always being 0 or 100% (#2595)
  • Fix None dtype for some unary tensor functions (#2604)
  • Make Proxima work with latest Mars (#2605, thanks @yuyiming!)
  • Fix tests for cudf 21.10 (#2608)
  • Fix duplicate decref of subtask input chunk (#2614, thanks @Catch-Bull!)

v0.9.0a1

2 years ago

These are the release notes for v0.9.0a1. See here for the complete list of resolved issues and merged PRs.

New Features

  • Tensor
    • Implements mt.bincount (#2548)
  • DataFrame
    • Support Series.median() (#2566, thanks @perfumescent!)
  • Learn
    • Add mars.learn.metrics.multilabel_confusion_matrix and derivative metrics (#2554)
  • Services
    • Add basic profiling support for supervisor (#2586)

Enhancements

  • Add app_queue in new_cluster (#2550, thanks @xxxxsk!)
  • Implement web API of get_infos (#2558)
  • Reduce time cost of cpu_percent() calls (#2567)
  • Reduce estimation time cost (#2577)
  • [Ray] Refine mars on ray usability (#2580)
  • [Ray] Refine raydataset integration (#2579)
  • Optimize tileable graph construction (#2583)
  • Stop calling user funcs when dtypes is specified (#2587)
  • Supports adding Mars extensions via setup entrypoints (#2589)
  • Skip details of shuffled chunks in meta (#2600)
  • Reduce the time cost of fetching tileable data (#2594)
  • Use batched request to apply for slots (#2601)
  • Reduce RPC cost of oscar by removing unnecessary tasks (#2597)

Bug fixes

  • Fix index series.apply when result index unchanged (#2557)
  • Stop using asdict to handle dataclasses (#2561)
  • Fix tests under cudf 21.10 (#2608)
  • Fix DataFrame getitem when duplicate columns exist (#2581)
  • Upgrade required version of vineyard (#2588)
  • Fix progress always being 0 or 100% (#2591)
  • Make Proxima work with latest Mars (#2599, thanks @yuyiming!)
  • Fix None dtype for some unary tensor functions (#2603)
  • Fix duplicate decref of subtask input chunk (#2611, thanks @Catch-Bull!)

Documentation

  • Add a document about how to implement a Mars operand (#2562)

v0.7.5

2 years ago

These are the release notes for v0.7.5. See here for the complete list of resolved issues and merged PRs.

New Features

  • Tensor
    • Add preliminary implementations for ufunc methods (#2513)
    • Add partial support for setitem with fancy indexing (#2544)
  • DataFrame
    • Implements md.get_dummies (#2534, thanks @hoarjour!)
  • Learn
    • Add make_regression support for learn module (#2517)
    • Implements mars.learn.preprocessor.LabelEncoder (#2545)
  • Services
    • Add web API for scheduling (#2535)
  • Web
    • Display tileable properties on web (#2539, thanks @RandomY-2!)
  • Others
    • Add experimental support for CUDA under WSL for Windows 11 (#2543)
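
As a reminder of what the two encoders listed above compute, here is a minimal pure-Python sketch: `get_dummies` turns a categorical column into one 0/1 indicator column per category, while a label encoder maps each category to an integer code. This illustrates the general semantics only, not the Mars code; the dict-of-lists return type is an assumption, and the sorted ordering of categories follows the convention used by pandas and scikit-learn.

```python
def get_dummies(values):
    """One-hot encode values: one 0/1 indicator column per category."""
    categories = sorted(set(values))
    return {
        cat: [1 if v == cat else 0 for v in values]
        for cat in categories
    }


def label_encode(values):
    """Map each category to its position in the sorted list of categories."""
    classes = sorted(set(values))
    index = {c: i for i, c in enumerate(classes)}
    return classes, [index[v] for v in values]


colors = ["red", "blue", "red"]
dummies = get_dummies(colors)
classes, codes = label_encode(colors)
```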

Enhancements

  • Reduce indentation of frontend code (#2541)

Bug fixes

  • Fix output of df.groupby(as_index=False).size() (#2508)
  • Fix reduction result on empty series (#2522)
  • Fix df.loc when df is empty (#2526)
  • [Ray] Fix serializing lambdas in web (#2529)
  • Fix df.loc when providing empty list (#2532)

Documentation

  • Add doc for reading csv in oss (#2530, thanks @Catch-Bull!)

v0.8.0rc1

2 years ago

These are the release notes for v0.8.0rc1. See here for the complete list of resolved issues and merged PRs.

New Features

  • Tensor
    • Add preliminary implementations for ufunc methods (#2510)
    • Add partial support for setitem with fancy indexing (#2453)
  • DataFrame
    • Support md.get_dummies() (#2323, thanks @hoarjour!)
  • Learn
    • Add make_regression support for learn module (#2515)
    • Implements fit and predict methods for bagging (#2516)
    • Implements mars.learn.ensemble.IsolationForest (#2531)
    • Implements mars.learn.preprocessor.LabelEncoder (#2542)
  • Services
    • Add web API for scheduling (#2533)
  • Web
    • Display tileable properties on web (#2525, thanks @RandomY-2!)
  • Others
    • Support mutable tensor on oscar (#2432, thanks @Coco58323!)
    • Add experimental support for CUDA under WSL for Windows 11 (#2538)

Enhancements

  • Use black to enforce code style (#2492)
  • Reduce indentation of frontend code (#2540)

Bug fixes

  • Fix output of df.groupby(as_index=False).size() (#2507)
  • [Ray] Fix serializing lambdas in web (#2512)
  • Fix reduction result on empty series (#2520)
  • Fix DataFrame.loc when df is empty (#2524)
  • Fix df.loc when providing empty list (#2528)

Documentation

  • Add doc for reading csv in oss (#2514, thanks @Catch-Bull!)