Mars Project: Mars Versions

Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.

v0.10.0

1 year ago

What's Changed

Full Changelog: https://github.com/mars-project/mars/compare/v0.10.0a1...v0.10.0

v0.9.0

1 year ago

These are the release notes for v0.9.0. See here for the complete list of solved issues and merged PRs.

These notes cover only the differences from v0.9.0rc3; for all highlights and changes, please refer to the release notes of the pre-releases:

alpha1 alpha2 beta1 beta2 rc1 rc2 rc3

Changes that break compatibility

From v0.9 on, support for Python 3.6 is dropped.

Highlights

  • Performance has been optimized throughout this version; feedback is welcome.

New Features

  • Oscar
    • Stop importing main module when starting Mars local cluster (#3113)
  • Tensor
    • Integrate special error functions (#3062)
    • Integrate part of scipy elliptic functions and integrals (#3112)
  • DataFrame
    • Support sort=True for Groupby (#3063, thanks @sak2002!)

Enhancements

  • Dump remote tracebacks to make local ones more friendly (#3030)
  • Optimize import speed for Mars package (#3035)
  • [Ray] Implement ray task executor progress (#3065)
  • Shuffle both sides at the same time for md.merge (#3066)
  • Refine ThreadedServiceContext.get_chunks_meta usage (#3067)
  • Do not aggressively choose tree method in tile of groupby for distributed setting (#3070)
  • Disable bloom filter in merge for now (#3071)
  • [Ray] Implements get_chunks_result for Ray execution context (#3072)
  • Use tell when removing mapper data after execution (#3073)
  • Assign reducer ops in task assigner to make them more balanced across cluster (#3075)
  • [Ray] Destroy Ray executor when the task finishes (#3074)
  • Combine tree and shuffle methods in DataFrameGroupBy.agg tile (#3077)
  • [Ray] Implements get_chunks_meta for Ray execution context (#3076)
  • Use OS-designated ports instead of random ports to create sub pools (#3087)
  • Call immutable web API only once when previous call blocks (#3088)
  • Unify DataFrameGroupByAgg's tile logic for auto method (#3094)
  • [Ray] Support basic subtask retry and lineage reconstruction (#3097)
  • Simplify argument passing in actor batch calls (#3100)
  • [Ray] Implements get_total_n_cpu for Ray execution context (#3104)
  • Optimize performance of transfer (#3105)
  • Add n_reducers and reducer_ordinal to shuffle operands (#3107)
  • [Ray] Implement cancel method on Ray task executor (#3093)
  • [Ray] Create RayTaskState actor as needed by default (#3114)
  • [Ray] Implement gc for ray task executor context (#3116)
  • Optimize serializable memory (#3126)
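
The entry on OS-designated ports (#3087) does not describe its mechanism in these notes. A common way to let the operating system choose a port is to bind to port 0 and read back the assigned number. The sketch below shows that general technique only; the function name `bind_os_assigned_port` is hypothetical and this is not Mars's actual sub pool code.

```python
import socket
from typing import Tuple


def bind_os_assigned_port(host: str = "127.0.0.1") -> Tuple[socket.socket, int]:
    """Bind a TCP socket to port 0 so the OS picks a currently free port.

    Hypothetical helper for illustration; not part of the Mars API.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0 asks the OS to assign any free port
    port = sock.getsockname()[1]  # read back the port the OS actually chose
    return sock, port


if __name__ == "__main__":
    sock, port = bind_os_assigned_port()
    print(f"OS assigned port {port}")
    sock.close()
```

Compared with picking a random number and hoping it is free, this avoids the window in which another process could already hold the chosen port.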

Bug fixes

  • Patch pandas to make pickle compatible between 1.2 and 1.3 (#3050)
  • Fix errors when deleting mapper data (#3064)
  • Fix chunk index error in auto_merge_chunks (#3068)
  • Fix recursive_tile, which may cause duplicated tiling for one tileable (#3069)
  • [Ray] Fix ray worker failover (#3115)
  • [Ray] Fix pandas schema parsing when reading Ray dataset (#3117)
  • [Ray] Fix auto scale-in hang (#3125)
  • [Metric] Fix prometheus metric backend (#3127)
  • Fix mt.{cumsum, cumprod} when the first chunk is empty (#3136)
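
To illustrate the kind of edge case behind the `mt.{cumsum, cumprod}` fix (#3136), here is a minimal pure-Python sketch of a cumulative sum over chunked data in which the first chunk is empty. It assumes a simple "carry the running total between chunks" scheme; the function name `chunked_cumsum` is hypothetical and the code does not reproduce Mars's tiling logic.

```python
from itertools import accumulate
from typing import List


def chunked_cumsum(chunks: List[List[float]]) -> List[List[float]]:
    """Cumulative sum across chunks, carrying the running total forward.

    Hypothetical illustration; empty chunks produce empty outputs and
    leave the carried total unchanged.
    """
    result: List[List[float]] = []
    carry = 0.0  # total of all elements seen in earlier chunks
    for chunk in chunks:
        if not chunk:
            # An empty chunk has no last element to carry, so skip it
            # instead of indexing into it.
            result.append([])
            continue
        # accumulate(..., initial=carry) yields carry first; drop it.
        sums = list(accumulate(chunk, initial=carry))[1:]
        result.append(sums)
        carry = sums[-1]
    return result


if __name__ == "__main__":
    print(chunked_cumsum([[], [1, 2], [3]]))
```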

Tests

  • Check initialization of serializables on CI (#3013)
  • [Ray] Optimize Ray CI execution time and stability (#3121)
  • Update pytest imports for test_special.py (#3131)
  • [Ray] Fix flaky test test_optional_supervisor_node (#3135)

Others

  • Build web code before CIBW when deploying to PyPI (#3016)

v0.10.0a1

1 year ago

These are the release notes for v0.10.0a1. See here for the complete list of solved issues and merged PRs.

New Features

  • Oscar
    • Stop importing main module when starting Mars local cluster (#3110)
  • Tensor
    • Integrate special error functions (#3060)
    • Integrate part of scipy elliptic functions and integrals (#3111)
  • DataFrame
    • Support sort=True for Groupby (#2959, thanks @sak2002!)

Enhancements

  • Disable bloom filter in merge for now (#2967)
  • [Ray] Implement ray task executor progress (#3008)
  • Dump remote tracebacks to make local ones more friendly (#3028)
  • Use tell when removing mapper data after execution (#3027)
  • Optimize import speed for Mars package (#3022)
  • Do not aggressively choose tree method in tile of groupby for distributed setting (#3032)
  • [Ray] Implements get_chunks_result for Ray execution context (#3023)
  • Refine ThreadedServiceContext.get_chunks_meta usage (#3037)
  • Shuffle both sides at the same time for md.merge (#3041)
  • Assign reducer ops in task assigner to make them more balanced across cluster (#3048)
  • [Ray] Destroy Ray executor when the task finishes (#3049)
  • [Ray] Implements get_chunks_meta for Ray execution context (#3052)
  • [Ray] Support basic subtask retry and lineage reconstruction (#2969)
  • Combine tree and shuffle methods in DataFrameGroupBy.agg tile (#3051)
  • [Ray] Implements get_total_n_cpu for Ray execution context (#3059)
  • [Ray] Implement cancel method on Ray task executor (#3044)
  • Use OS-designated ports instead of random ports to create sub pools (#3053)
  • Unify DataFrameGroupByAgg's tile logic for auto method (#3084)
  • Simplify router cleanup when pools or clusters end (#3086)
  • Call immutable web API only once when previous call blocks (#3085)
  • [Ray] Create RayTaskState actor as needed by default (#3081)
  • [Ray] Implement gc for ray task executor context (#3061)
  • Simplify argument passing in actor batch calls (#3098)
  • Optimize performance of transfer (#3091)
  • Add n_reducers and reducer_ordinal to shuffle operands (#3055)
  • Optimize serializable memory (#3120)

Bug fixes

  • Fix errors when deleting mapper data (#3018)
  • Fix recursive_tile, which may cause duplicated tiling for one tileable (#3021)
  • Fix error message when sparse data format not supported (#3046)
  • Patch pandas to make pickle compatible between 1.2 and 1.3 (#3047)
  • Fix chunk index error in auto_merge_chunks (#3057)
  • [Ray] Fix ray worker failover (#3080)
  • [Metric] Fix prometheus metric backend (#3124)
  • Fix mt.{cumsum, cumprod} when the first chunk is empty (#3134)

Tests

  • Check initialization of serializables on CI (#3007)
  • Use @pytest_asyncio.fixture instead of @pytest.fixture for async fixtures (#3025)
  • Change code owners to Mars PMC maintainers (#3031)
  • [Ray] Fix ray executor progress test (#3033)
  • [Ray] Optimize Ray CI execution time and stability (#3102)
  • Make test_session_set_progress more stable under Ray tests (#3103)
  • Update pytest imports for test_special.py (#3129)
  • [Ray] Fix flaky test test_optional_supervisor_node (#3133)

Others

  • Build web code before CIBW when deploying to PyPI (#3014)
  • Make PyPI user name configurable (#3130)

v0.8.7

1 year ago

These are the release notes for v0.8.7.

Bug fixes

  • Fixes missing web packages in Linux wheels (#3014)

v0.8.6

1 year ago

These are the release notes for v0.8.6. See here for the complete list of solved issues and merged PRs.

New Features

  • Tensor
    • Implementing Ellipsoidal Harmonics Functions (#2927, thanks @shantam-8!)

Enhancements

  • Add support for dask.persist (#2990, thanks @loopyme!)
  • Optimize gen subtask graph (#3006)
  • Ignore broadcaster's locality when assigning subtasks (#2994)

Bug fixes

  • Fix task hang when error object cannot be pickled (#2913)
  • Fix potential KeyError in actor_ref calls when running with multiple processes (#2962)
  • Wrap errors in operand execution to protect scheduling service (#2971)
  • Fix dtype of series result for DataFrame.apply (#2979)
  • Fix default config to ensure storage backends configured (#2989)
  • Fix potential empty chunks when creating DataFrame from pandas (#2991)
  • Fix incorrect result for df.sort_values when specifying multiple ascending (#3006)
  • Fix missing extra_params when constructing operands (#3006)
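
For readers unfamiliar with the `df.sort_values` case fixed above, "multiple ascending" refers to passing one sort direction per key, as in pandas' `sort_values(by=[...], ascending=[True, False])`. The expected semantics can be sketched in pure Python using stable sorts. The helper `sort_rows` is hypothetical and only illustrates the intended ordering, not how Mars implements it.

```python
from typing import Any, Dict, List, Sequence


def sort_rows(
    rows: Sequence[Dict[str, Any]],
    by: Sequence[str],
    ascending: Sequence[bool],
) -> List[Dict[str, Any]]:
    """Sort rows by several keys, each with its own direction.

    Hypothetical reference helper for illustration only.
    """
    if len(by) != len(ascending):
        raise ValueError("by and ascending must have the same length")
    ordered = list(rows)
    # Python's sort is stable (also with reverse=True), so sorting by the
    # least significant key first and the most significant key last gives
    # the combined multi-key order.
    for key, asc in reversed(list(zip(by, ascending))):
        ordered.sort(key=lambda row: row[key], reverse=not asc)
    return ordered


if __name__ == "__main__":
    data = [{"a": 1, "b": 2}, {"a": 1, "b": 3}, {"a": 0, "b": 1}]
    # Column "a" ascending, ties broken by column "b" descending.
    print(sort_rows(data, by=["a", "b"], ascending=[True, False]))
```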

Tests

  • Fix version mismatch between kubernetes and minikube (#2988)

v0.9.0rc3

1 year ago

These are the release notes for v0.9.0rc3. See here for the complete list of solved issues and merged PRs.

New Features

  • Tensor
    • Implementing Ellipsoidal Harmonics Functions (#2891, thanks @shantam-8!)
  • Services
    • Support worker meta service (#2909)
    • Basic Ray execution backend (#2921)

Enhancements

  • Add execution API to enable customization of Mars Task Service (#2894)
  • Optimize serialization performance (#2914)
  • Skip adding band in meta when fetching shuffle data (#2922)
  • Store complete meta on worker and update supervisor meta via fetching from workers (#2912)
  • Use cython to accelerate core serialization (#2924)
  • Refine lifecycle api to support incref or decref with ref counts (#2926)
  • Ignore fetch operands when assigning initial nodes (#2929)
  • Use cython to accelerate message serialization (#2932)
  • Ignore broadcaster's locality when assigning subtasks (#2943)
  • Allow spawning serialization to threads for large objects (#2944)
  • Add metrics and event report for Ray channels (#2936)
  • Add more logs about execution info (#2940)
  • Add support for dask.persist (#2953, thanks @loopyme!)
  • Remove should_be_monotonic property (#2949)
  • Add metrics on operand and subtask executions (#2947, thanks @zhongchun!)
  • [Ray] Optimize Ray fetcher by querying in remote node (#2957)
  • Improve deploy backend (#2958)
  • Support reporting tile progress (#2954)
  • Add logic key for tileable graph (#2961, thanks @zhongchun!)
  • [Ray] Loads the subtask inputs from meta (#2976)
  • New ExecutionConfig API (#2968)
  • Fix speculative execution compatibility with coloring (#2995)
  • Make functions that may take long run in thread for lifecycle tracker (#2992)
  • Optimize metric configs (#2996, thanks @zhongchun!)
  • Expand the ability of resource evaluator (#2997, thanks @zhongchun!)
  • Optimize gen subtask graph (#3004)
  • [Ray] Ray execution state (#3002)

Bug fixes

  • Fix parameter issue of worker actor pool (#2911, thanks @zhongchun!)
  • Fix default config to ensure storage backends configured (#2935)
  • Wrap errors in operand execution to protect scheduling service (#2964)
  • Fix dtype of series result for DataFrame.apply (#2978)
  • Fix potential data leak for shuffle tasks (#2975)
  • Fix potential empty chunks when creating DataFrame from pandas (#2987)
  • [Ray] Support new ray cluster through ray client (#2981)
  • Fix missing extra_params when constructing operands (#2999)
  • Fix msg_to_simple_str in Ray backend and add tests (#3003)
  • Fix incorrect result for df.sort_values when specifying multiple ascending (#2984)

Documentation

  • Add development documents for metrics (#2955, thanks @zhongchun!)

Tests

  • Add TPC-H benchmarks (#2937)
  • Fix Ray cases (#2983)
  • Fix version mismatch between kubernetes and minikube (#2986)
  • Allow selecting TPC queries (#3005)

v0.8.5

2 years ago

These are the release notes for v0.8.5. See here for the complete list of solved issues and merged PRs.

New Features

  • Web
    • Add stack display page on Mars Web (#2881)

Enhancements

  • Avoid printing too many messages in Oscar (#2880)
  • [Ray] Use main pool as owner when autoscale disabled (#2903)

Bug fixes

  • Fix XGBoost when some workers do not have evals data (#2863)
  • Raise ActorNotExist when no supervisors available (#2869)
  • Fix dtype infer in DataFrame arithmetic on datetime consts (#2880)
  • Fix duplicate node iteration in GraphAssigner (#2880)
  • Fix timeout for wait_task (#2890)
  • Make sure errors can be raised in Actor.__pre_destroy__ (#2892)

Tests

  • Upgrade azure-pipelines to Python 3.9 (#2886)
  • Adapt to official cancel of GitHub Actions (#2903)

v0.9.0rc2

2 years ago

These are the release notes for v0.9.0rc2. See here for the complete list of solved issues and merged PRs.

New Features

  • Web
    • Add stack display page on Mars Web (#2876)

Enhancements

  • Avoid printing too many messages in Oscar (#2871)
  • Expand slot scheduler to resource scheduler (#2846, thanks @zhongchun!)
  • Optimized iterative tiling by pruning unrelated chunks (#2874)
  • Optimize DataFrameIsin's tile (#2864)
  • Add benchmark for serialization (#2901)
  • [Ray] Ray client channel get recv when first complied (#2740, thanks @Catch-Bull!)
  • Use bloom filter to optimize df.merge execution (#2895)
  • Stop recording all mapper meta (#2900)
  • [Ray] Use main pool as owner when autoscale disabled (#2878)
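
The notes do not explain how a Bloom filter helps `df.merge` (#2895). The general idea is that a compact, probabilistic set built from one side's join keys can discard rows on the other side that cannot possibly match, before any expensive shuffle. The sketch below shows that idea in pure Python; the class, the function `prefilter_keys`, and all parameters are hypothetical, and this is not Mars's implementation.

```python
import hashlib
from typing import Any, Iterable, Iterator, List


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, n_bits: int = 1024, n_hashes: int = 3) -> None:
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item: Any) -> Iterator[int]:
        data = repr(item).encode("utf-8")
        for seed in range(self.n_hashes):
            # A different salt per seed gives independent-looking hashes.
            digest = hashlib.blake2b(
                data, digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.n_bits

    def add(self, item: Any) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: Any) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def prefilter_keys(left_keys: Iterable[Any], right_keys: Iterable[Any]) -> List[Any]:
    """Keep only right-side keys that might appear on the left side."""
    bloom = BloomFilter()
    for key in left_keys:
        bloom.add(key)
    return [key for key in right_keys if key in bloom]


if __name__ == "__main__":
    print(prefilter_keys([1, 2, 3], [2, 3, 4, 5]))
```

Because false positives are possible, the surviving rows still need the real join afterwards; the filter only reduces how much data has to be moved.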

Bug fixes

  • Fix XGBoost when some workers do not have evals data (#2861)
  • Fix duplicate node iteration in GraphAssigner (#2857)
  • Raise ActorNotExist when no supervisors available (#2859)
  • Fix dtype infer in DataFrame arithmetic on datetime consts (#2879)
  • Fix timeout for wait_task (#2883)
  • Make sure error can be raised in Actor.__pre_destroy__ (#2887)

Tests

  • Upgrade azure-pipelines to Python 3.9 (#2862)
  • Adapt to official cancel of GitHub Actions (#2902)

v0.9.0rc1

2 years ago

These are the release notes for v0.9.0rc1. See here for the complete list of solved issues and merged PRs.

New Features

  • Tensor
    • Implements mars.tensor.setdiff1d (#2823)
  • Learn
    • Added support for mars.learn.metrics.roc_auc_score (#2832)
  • Services
    • A speculative execution based task scheduler (#2576)
  • Metric
    • [Ray] Add metric for Ray object store (#2776, thanks @Catch-Bull!)
  • Others
    • Use versioneer to manage release versions (#2806)
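
The new `mars.tensor.setdiff1d` and `mars.learn.metrics.roc_auc_score` share their names with `numpy.setdiff1d` and scikit-learn's `roc_auc_score`. Assuming they follow the same semantics (an assumption, not something these notes state), the expected results can be described with the small pure-Python reference functions below. Both function names are hypothetical and neither calls Mars.

```python
from typing import List, Sequence


def setdiff1d_reference(ar1: Sequence[int], ar2: Sequence[int]) -> List[int]:
    """Sorted unique values of ar1 that are not in ar2 (NumPy-style semantics)."""
    return sorted(set(ar1) - set(ar2))


def roc_auc_reference(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Binary ROC AUC as the fraction of (positive, negative) pairs in which
    the positive sample scores higher; ties count as half a pair."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(setdiff1d_reference([1, 2, 3, 2, 4, 1], [3, 4, 5, 6]))
    print(roc_auc_reference([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise loop is quadratic and is meant only as a readable definition for checking small cases by hand.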

Enhancements

  • Support generating a DOT file for subtask graph (#2803)
  • Support generating dtypes, index_value etc lazily for DataFrame chunks (#2756)
  • [Ray] Enable fault tolerance for Ray by default (#2801)
  • Improve subtask details in logs (#2836)
  • Accurate resource management for global slot manager (#2732)
  • Configure nthread of XGBoost jobs (#2844)
  • Improved performance of mars.learn.metrics.{roc_curve, roc_auc_score} (#2838)
  • Bump minimist and nanoid in Mars UI due to security alerts (#2849)
  • Fix store duplicate chunk and meta per subtask (#2845)

Bug fixes

  • Fix default value of gpu property for some operands (#2811)
  • Fix the failure on Vineyard CI by ensuring the input tensor chunk is a NumPy ndarray (#2817)
  • Fix race condition of set_subtask_result (#2784)
  • Fix duplicate subtask submit (#2815)
  • Change StorageHandlerActor to stateful (#2824)
  • Fix running xgboost on Ray cluster (#2826)
  • Fix FileSystem.ls for OSS (#2837)
  • Stop fetching data when pure dependencies specified (#2840)
  • Fix dirty version number caused by versioneer when building with cibuildwheel (#2855)

Tests

  • [Ray] Refine ray tests (#2793)
  • Build docker images cronically (#2804)
  • Introduce asv benchmark (#2798)

v0.8.4

2 years ago

These are the release notes for v0.8.4. See here for the complete list of solved issues and merged PRs.

New Features

  • Tensor
    • Implements mars.tensor.setdiff1d (#2829)
  • Learn
    • Added support for mars.learn.metrics.roc_auc_score (#2841)
  • Others
    • Use versioneer to manage release versions (#2807)
    • Use cibuildwheel to release wheels (#2854)

Enhancements

  • Support generating a DOT file for subtask graph (#2818)
  • Enhance subtask details in logs (#2842)
  • Configure cores of XGBoost jobs (#2847)
  • Improved performance of mars.learn.metrics.{roc_curve, roc_auc_score} (#2850)
  • Fix store duplicate chunk and meta per subtask (#2851)
  • Bump minimist and nanoid in Mars UI due to security alerts (#2851)

Bug fixes

  • Fix race condition of set_subtask_result (#2819)
  • Fix duplicate subtask submit (#2819)
  • Fix the failure on Vineyard CI by ensuring the input tensor chunk is a NumPy ndarray (#2819)
  • Fix default value of gpu property for some operands (#2820)
  • Fix running xgboost on Ray cluster (#2830)
  • Change StorageHandlerActor to stateful (#2830)
  • Fix FileSystem.ls for OSS (#2842)
  • Stop fetching data when pure dependencies specified (#2843)

Tests

  • [Ray] Refine ray tests (#2810)
  • Build docker images cronically (#2807)