Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
The Ray 2.10 release brings important stability improvements and enhancements to Ray Data, with Ray Data becoming generally available (GA).
- Added `num_replicas="auto"` (#42613).
- Added `max_queued_requests` (#42950).
- `max_ongoing_requests` (`max_concurrent_queries`) is also now strictly enforced (#42947). This can be turned off by setting `RAY_SERVE_ENABLE_QUEUE_LENGTH_CACHE=0`.
- Renamed: `max_concurrent_queries` -> `max_ongoing_requests`, `target_num_ongoing_requests_per_replica` -> `target_ongoing_requests`, `downscale_smoothing_factor` -> `downscaling_factor`, `upscale_smoothing_factor` -> `upscaling_factor`.
- The default for `max_ongoing_requests` will change from 100 to 5; the default for `target_ongoing_requests` will change from 1 to 2.
- Added `ScalingConfig(accelerator_type)`.
- Refactored `XGBoostTrainer` and `LightGBMTrainer` to no longer depend on `xgboost_ray` and `lightgbm_ray`. A new, more flexible API will be released in a future release.
- Deprecated `local_dir` and `RAY_AIR_LOCAL_CACHE_DIR`.
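The renames above are mechanical. As a small illustration (this helper is not part of Ray; it is purely a sketch of the old-to-new mapping):

```python
# Deprecated Ray Serve option names (pre-2.10) mapped to their new names,
# as listed in the release notes above.
SERVE_OPTION_RENAMES = {
    "max_concurrent_queries": "max_ongoing_requests",
    "target_num_ongoing_requests_per_replica": "target_ongoing_requests",
    "downscale_smoothing_factor": "downscaling_factor",
    "upscale_smoothing_factor": "upscaling_factor",
}

def migrate_serve_options(options: dict) -> dict:
    """Return a copy of `options` with deprecated keys renamed."""
    return {SERVE_OPTION_RENAMES.get(key, key): value for key, value in options.items()}
```

For example, `migrate_serve_options({"max_concurrent_queries": 5})` yields `{"max_ongoing_requests": 5}`; unrecognized keys pass through unchanged.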
🎉 New Features:
- Added `num_rows_per_file` parameter to file-based writes (#42694)
- Added `DataIterator.materialize` (#43210)
- Skip `schema` call in `DataIterator.to_tf` if `tf.TypeSpec` is provided (#42917)
- Added `Dataset.write_bigquery` (#42584)

💫 Enhancements:
- Updated `ImageDatasource` to use `Image.BILINEAR` as the default image resampling filter (#43484)
- `ray.data.from_huggingface` (#42599)
- Removed the `Stage` class and related usages (#42685)

🔨 Fixes:
- `OutputSplitter` (#43740)
- `OpBufferQueue` (#43015)
- `Limit` operators (#42958)
- Fixed `Dataset.streaming_split` for job hanging (#42601)

📖 Documentation:
🎉 New Features:
- Added `ScalingConfig(accelerator_type)` for improved worker scheduling (#43090)

💫 Enhancements:
- `train_func` for setup/teardown logic (#43209)
- Added `DEFAULT_NCCL_SOCKET_IFNAME` to simplify network configuration (#42808)

🔨 Fixes:
- `memory` resource requirements (#42999)
- Prefer `Path.as_posix` over `os.path.join` (#42037)
- `RayFSDPStrategy` (#43594)
- `RayTrainReportCallback` (#42751)
- `get_latest_checkpoint` returns None (#42953)

📖 Documentation:
- `train_loop_config` (#43691)
- Noted in the `ray.train.report` docstring that it is not a barrier (#42422)
- `prepare_data_loader` shuffle behavior and `set_epoch` (#41807)

🏗 Architecture refactoring:
- Reimplemented `XGBoostTrainer` and `LightGBMTrainer` as `DataParallelTrainer`. Removed dependency on `xgboost_ray` and `lightgbm_ray`. (#42111, #42767, #43244, #43424)
- Deprecated `local_dir` and `RAY_AIR_LOCAL_CACHE_DIR`. Add isolation between driver and distributed worker artifacts so that large files written by workers are not uploaded implicitly. Results are now only written to `storage_path`, rather than having another copy in the user's home directory (`~/ray_results`). (#43369, #43403, #43689)
- Split `ray.train.torch.get_device` into another `get_devices` API for multi-GPU worker setup (#42314)
- `storage_path` (#42853, #43179)
- `SyncConfig` (#42909)
- Removed the `preprocessor` argument from Trainers (#43146, #43234)
- Deprecated `MosaicTrainer` and removed `SklearnTrainer` (#42814)

💫 Enhancements:
- `TBXLogger` for logging images (#37822)
- Allowed `Experiment(config)` to handle RLlib `AlgorithmConfig` (#42816, #42116)

🔨 Fixes:
- Fixed `reuse_actors` error on actor cleanup for function trainables (#42951)
- Prefer `Path.as_posix` over `os.path.join` (#42037)

📖 Documentation:
🏗 Architecture refactoring:
- Deprecated `local_dir` and `RAY_AIR_LOCAL_CACHE_DIR`. Add isolation between driver and distributed worker artifacts so that large files written by workers are not uploaded implicitly. Results are now only written to `storage_path`, rather than having another copy in the user's home directory (`~/ray_results`). (#43369, #43403, #43689)
- Deprecated `SyncConfig` and `chdir_to_trial_dir` (#42909)
- `storage_path` (#42853, #43179)
- `NevergradSearch` (#42305)
- Added `checkpoint_dir` and `reporter` deprecation notices (#42698)

🎉 New Features:
- Added `max_queued_requests` (#42950).
- Added `num_replicas="auto"` (#42613).

🏗 API Changes:
- Renamed `max_concurrent_queries` to `max_ongoing_requests`
- Renamed `target_num_ongoing_requests_per_replica` to `target_ongoing_requests`
- Renamed `downscale_smoothing_factor` to `downscaling_factor`
- Renamed `upscale_smoothing_factor` to `upscaling_factor`
- The default for `max_ongoing_requests` will change from 100 to 5.
- The default for `target_ongoing_requests` will change from 1 to 2.

💫 Enhancements:
- Added `RAY_SERVE_LOG_ENCODING` env to set the global logging behavior for Serve (#42781).
- `max_ongoing_requests` (`max_concurrent_queries`) is also now strictly enforced (#42947). This can be turned off by setting `RAY_SERVE_ENABLE_QUEUE_LENGTH_CACHE=0`.
- You can now set `max_ongoing_requests=1` for autoscaling deployments and still upscale properly, because requests queued at handles are properly taken into account for autoscaling. This can be turned off by setting `RAY_SERVE_COLLECT_AUTOSCALING_METRICS_ON_HANDLE=0`.
- Replacement replicas are now started eagerly; set `RAY_SERVE_EAGERLY_START_REPLACEMENT_REPLICAS=0` to revert to the old behavior.
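Putting the new names together, a deployment's options under Ray 2.10 might look like the following sketch (the values and the `**options` usage are illustrative, not taken from the release notes):

```python
# Illustrative Ray Serve 2.10 deployment options using the new (renamed) fields.
options = {
    "num_replicas": "auto",       # autoscaling preset added in 2.10 (#42613)
    "max_ongoing_requests": 5,    # renamed from max_concurrent_queries
    "max_queued_requests": 100,   # added in 2.10 (#42950); value illustrative
}

# These would typically be passed to the deployment decorator, e.g.:
#   @serve.deployment(**options)
#   class MyModel: ...
```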
🔨 Fixes:
- Fixed `KeyError` on disconnects (#43713).

📖 Documentation:
🎉 New Features:
💫 Enhancements:
🔨 Fixes:
- Fixed `policy_to_train` logic (#41529), fixed multi-GPU for PPO on the new API stack (#44001), Issue 40347: (#42090)

📖 Documentation:
🎉 New Features:
💫 Enhancements:
- `get_task()` now accepts `ObjectRef` (#43507)

🔨 Fixes:
📖 Documentation:
💫 Enhancements:
- Added a `heap_memory` param to the `setup_ray_cluster` API, and changed the default per-Ray-worker-node config and the default Ray-head-node config for the global Ray cluster (#42604)

🔨 Fixes:
Many thanks to all those who contributed to this release!
@ronyw7, @xsqian, @justinvyu, @matthewdeng, @sven1977, @thomasdesr, @veryhannibal, @klebster2, @can-anyscale, @simran-2797, @stephanie-wang, @simonsays1980, @kouroshHakha, @Zandew, @akshay-anyscale, @matschaffer-roblox, @WeichenXu123, @matthew29tang, @vitsai, @Hank0626, @anmyachev, @kira-lin, @ericl, @zcin, @sihanwang41, @peytondmurray, @raulchen, @aslonnie, @ruisearch42, @vszal, @pcmoritz, @rickyyx, @chrislevn, @brycehuang30, @alexeykudinkin, @vonsago, @shrekris-anyscale, @andrewsykim, @c21, @mattip, @hongchaodeng, @dabauxi, @fishbone, @scottjlee, @justina777, @surenyufuz, @robertnishihara, @nikitavemuri, @Yard1, @huchen2021, @shomilj, @architkulkarni, @liuxsh9, @Jocn2020, @liuyang-my, @rkooo567, @alanwguo, @KPostOffice, @woshiyyya, @n30111, @edoakes, @y-abe, @martinbomio, @jiwq, @arunppsg, @ArturNiederfahrenhorst, @kevin85421, @khluu, @JingChen23, @masariello, @angelinalg, @jjyao, @omatthew98, @jonathan-anyscale, @sjoshi6, @gaborgsomogyi, @rynewang, @ratnopamc, @chris-ray-zhang, @ijrsvt, @scottsun94, @raychen911, @franklsf95, @GeneDer, @madhuri-rai07, @scv119, @bveeramani, @anyscalesam, @zen-xu, @npuichigo
This patch release contains fixes for Ray Core, Ray Data, and Ray Serve.
🔨 Fixes:
🔨 Fixes:
- Skip `schema` call in `to_tf` if `tf.TypeSpec` is provided (#42917)

🔨 Fixes:
Many thanks to all those who contributed to this release!
@rynewang, @GeneDer, @alexeykudinkin, @edoakes, @c21, @rkooo567
This patch release contains fixes for Ray Core, Ray Data, and Ray Serve.
🔨 Fixes:
🔨 Fixes:
- `ParquetDatasource._estimate_files_encoding_ratio()` (https://github.com/ray-project/ray/pull/42759) (https://github.com/ray-project/ray/pull/42774)

🔨 Fixes:
Many thanks to all those who contributed to this release!
@c21, @raulchen, @can-anyscale, @edoakes, @peytondmurray, @scottjlee, @aslonnie, @architkulkarni, @GeneDer, @Zandew, @sihanwang41
This patch release contains fixes for Ray Core, Ray Data, and Ray Serve.
🔨 Fixes:
🔨 Fixes:
🔨 Fixes:
🎉 New Features:
- Added `concurrency` argument to replace `ComputeStrategy` in map-like APIs (#41461)
- `map_groups` (#40778)

💫 Enhancements:
- `OpState.outqueue_num_blocks` (#41748)
- `StreamingOutputsBackpressurePolicy` (#41637)
- Allowed `ConcurrencyCapBackpressurePolicy._cap_multiplier` to be set to 1.0 (#41222)
- Added `StatsManager` to manage `_StatsActor` remote calls (#40913)
- Added `max_retry_cnt` parameter for BigQuery write (#41163)

🔨 Fixes:
- Fixed `Dataset.context` not being sealed after creation (#41569)
- Fixed bug where `DataContext` is not propagated when using `streaming_split` (#41473)
- Fixed `BigQueryDatasource` fault tolerance bugs (#40986)

📖 Documentation:
- Added `ray.data.read_databricks_tables` doc (#41366)
- Added `read_json` docs example for setting PyArrow block size when reading large files (#40533)
- Added `AllToAllAPI` to dataset methods (#40842)

🎉 New Features:
- `Result` from cloud storage (#40622)

💫 Enhancements:
- `Result.from_path` (#40684)
- `ReportCheckpointCallback` to delete temporary directory (#41033)

🔨 Fixes:
- `RayTrainReportCallback` to ensure synchronous reporting. (#40875)
- Restore `Result`s properly from moved storage path (#40647)

📖 Documentation:
🏗 Architecture refactoring:
🎉 New Features:
- `Result` from cloud storage (#40622)

💫 Enhancements:

🔨 Fixes:
- Restore `Result`s properly from moved storage path (#40647)

📖 Documentation:
- `MLflowLoggerCallback` and `setup_mlflow` (#37854)

🏗 Architecture refactoring:
- Removed `TuneClient`/`TuneServer` APIs (#41469)
- Removed `Searcher`s (#41414)
- (`air.remote_storage`, etc.) (#40207)

🎉 New Features:
💫 Enhancements:
- Define `__del__` in the deployment to execute custom clean up steps.

🔨 Fixes:
🎉 New Features:
- `MultiAgentEpisode` class introduced. Basis for upcoming multi-agent EnvRunner, which will replace RolloutWorker APIs. (#40263, #40799)
- `SingleAgentEnvRunner` (w/o Policy/RolloutWorker APIs). CI learning tests added. (#39732, #41074, #41075)
- Added `on_workers_recreated` callback to Algorithm, which is triggered after workers have failed and been restarted. (#40354)

💫 Enhancements:
- `rllib_contrib` cleanups: #40939, #40744, #40789, #40444, #37271

🔨 Fixes:
- Fixed a bug where a checkpoint written when `AlgorithmConfig.rl_module_spec` was NOT a `@property` yet breaks when trying to load from this checkpoint. (#41157)

📖 Documentation:
🎉 New Features:
- Renamed `ObjectRefGenerator` -> `DynamicObjectRefGenerator`

💫 Enhancements:
- Added `__ray_call__` default actor method (#41534)

🔨 Fixes:
💫 Enhancements:
🔨 Fixes:
- `run_init` for TPU command runner

📖 Documentation:
💫 Enhancements:
🎉 New Features:
Many thanks to all those who contributed to this release!
@justinvyu, @zcin, @avnishn, @jonathan-anyscale, @shrekris-anyscale, @LeonLuttenberger, @c21, @JingChen23, @liuyang-my, @ahmed-mahran, @huchen2021, @raulchen, @scottjlee, @jiwq, @z4y1b2, @jjyao, @JoshTanke, @marxav, @ArturNiederfahrenhorst, @SongGuyang, @jerome-habana, @rickyyx, @rynewang, @batuhanfaik, @can-anyscale, @allenwang28, @wingkitlee0, @angelinalg, @peytondmurray, @rueian, @KamenShah, @stephanie-wang, @bryanjuho, @sihanwang41, @ericl, @sofianhnaide, @RaffaGonzo, @xychu, @simonsays1980, @pcmoritz, @aslonnie, @WeichenXu123, @architkulkarni, @matthew29tang, @larrylian, @iycheng, @hongchaodeng, @rudeigerc, @rkooo567, @robertnishihara, @alanwguo, @emmyscode, @kevin85421, @alexeykudinkin, @michaelhly, @ijrsvt, @ArkAung, @mattip, @harborn, @sven1977, @liuxsh9, @woshiyyya, @hahahannes, @GeneDer, @vitsai, @Zandew, @evalaiyc98, @edoakes, @matthewdeng, @bveeramani
The Ray 2.8.1 patch release contains fixes for the Ray Dashboard.
Additional context can be found here: https://www.anyscale.com/blog/update-on-ray-cves-cve-2023-6019-cve-2023-6020-cve-2023-6021-cve-2023-48022-cve-2023-48023
🔨 Fixes:
- [core][state][log] Cherry pick changes to prevent state API from reading files outside the Ray log directory (#41520)
- [Dashboard] Migrate Logs page to use state api. (#41474) (#41522)
This release features stability improvements and API clean-ups across the Ray libraries.
- RLlib contributed algorithms moved to `rllib_contrib` (still available within RLlib for Ray 2.8).

🎉 New Features:
- `Dataset.map` and `Dataset.flat_map` (#40010)

💫 Enhancements:
- `DatasetPipeline` (#40129)
- Removed `BulkExecutor` code path (#40200)
- `Dataset` parameters and methods (#40385)
- `_StatsActor` (#40118)
- `Dataset.unique()` (#40016)
- `sample_boundaries` in `SortTaskSpec` (#39581)

🔨 Fixes:
- Fixed `_StatsActor` errors with `PandasBlock` (#40481)
- `do_write` (#40422)
- `get_object_locations` for metrics (#39884)
- Replaced `.pieces` with updated `.fragments` (#39523)
- `Preprocessor` that have been fit in older versions (#39173)
- `convert_udf_returns_to_numpy` (#39188)
- `RefBundles` (#39016)

📖 Documentation:
🎉 New Features:
💫 Enhancements:
- `pytorch_lightning` and `lightning` (#39841, #40266)
- Propagated `DataContext` to `RayTrainWorkers` (#40116)

🔨 Fixes:
📖 Documentation:
🏗 Architecture refactoring:
- Deprecated `LightningTrainer`, `AccelerateTrainer`, `TransformersTrainer` (#40163)
- `DatasetConfig` (#39963)
- `DatasetPipeline` (#40159)

💫 Enhancements:
- Raise an error when `Tuner.restore()` is called on an instance (#39676)
🏗 Architecture refactoring:

💫 Enhancements:
- The Serve REST API is now served on the dashboard port (`8265`); the dashboard agent port (`52365`) is deprecated. The support will be removed in a future version.
- Deprecated `InputNode` and `DAGDriver`.
- Deprecated `Deployment.deploy()`, `Deployment.delete()`, `Deployment.get_handle()`.
- Deprecated `serve.get_deployment` and `serve.list_deployments`.

🔨 Fixes:
- The `dedicated_cpu` and `detached` options in `serve.start()` have been fully disallowed.
- `grpc_options` on `serve.start()` was only allowing a `gRPCOptions` object in Ray 2.7.0. Dictionaries are now allowed to be used as `grpc_options` in the `serve.start()` call.

💫 Enhancements:
- `rllib_contrib` algorithms (A2C, A3C, AlphaStar #36584, AlphaZero #36736, ApexDDPG #36596, ApexDQN #36591, ARS #36607, Bandits #36612, CRR #36616, DDPG, DDPPO #36620, Dreamer(V1), DT #36623, ES #36625, LeelaChessZero #36627, MA-DDPG #36628, MAML, MB-MPO #36662, PG #36666, QMix #36682, R2D2, SimpleQ #36688, SlateQ #36710, and TD3 #36726) all produce warnings now if used. See here for more information on the `rllib_contrib` efforts. (36620, 36628, 3

🔨 Fixes:
🎉 New Features:
- `ray start --runtime-env-agent-port` is officially supported. (#39919)

🔨 Fixes:
📖 Documentation:

💫 Enhancements:

📖 Documentation:
🔨 Fixes:
🎉 New Features:
Many thanks to all who contributed to this release!
@scottjlee, @chappidim, @alexeykudinkin, @ArturNiederfahrenhorst, @stephanie-wang, @chaowanggg, @peytondmurray, @maxpumperla, @arvind-chandra, @iycheng, @JalinWang, @matthewdeng, @wfangchi, @z4y1b2, @alanwguo, @Zandew, @kouroshHakha, @justinvyu, @yuanchen8911, @vitsai, @hongchaodeng, @allenwang28, @caozy623, @ijrsvt, @omus, @larrylian, @can-anyscale, @joncarter1, @ericl, @lejara, @jjyao, @Ox0400, @architkulkarni, @edoakes, @raulchen, @bveeramani, @sihanwang41, @WeichenXu123, @zcin, @Codle, @dimakis, @simonsays1980, @cadedaniel, @angelinalg, @luv003, @JingChen23, @xwjiang2010, @rynewang, @Yicheng-Lu-llll, @scrivy, @michaelhly, @shrekris-anyscale, @xxnwj, @avnishn, @woshiyyya, @aslonnie, @amogkam, @krfricke, @pcmoritz, @liuyang-my, @jonathan-anyscale, @rickyyx, @scottsun94, @richardliaw, @rkooo567, @stefanbschneider, @kevin85421, @c21, @sven1977, @GeneDer, @matthew29tang, @RocketRider, @LaynePeng, @samhallam-reverb, @scv119, @huchen2021
- Added the `application` tag to the `ray_serve_num_http_error_requests` metric
- Added the "Error QPS per Application" panel in the Ray Dashboard
- `Trial.node_ip` property (#40028)
- Fixed a bug where `ray start` would occasionally fail with `ValueError: acceleratorType should match v(generation)-(cores/chips).`
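The `acceleratorType` message above refers to TPU accelerator-type strings such as `v4-8`. A throwaway check of that shape (illustrative only; this is not Ray's validation code) could look like:

```python
import re

# Matches TPU accelerator types of the form v<generation>-<cores/chips>,
# e.g. "v2-8" or "v4-16" (the format named in the ValueError above).
_TPU_TYPE = re.compile(r"^v\d+[a-z]*-\d+$")

def looks_like_tpu_type(value: str) -> bool:
    return _TPU_TYPE.fullmatch(value) is not None
```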
🔨 Fixes:
- "Error QPS per Application" panel in the Ray Dashboard

🔨 Fixes:
🔨 Fixes:
Thanks
Many thanks to all those who contributed to this release!
@chaowanggg, @allenwang28, @shrekris-anyscale, @GeneDer, @justinvyu, @can-anyscale, @edoakes, @architkulkarni, @rkooo567, @rynewang, @rickyyx, @sven1977
The Ray 2.7 release brings important stability improvements and enhancements to Ray libraries, with Ray Train and Ray Serve becoming generally available. Ray 2.7 is accompanied by a GA release of KubeRay.
- A new `DeploymentHandle` API to unify various existing Handle APIs, a high-performance gRPC proxy to serve gRPC requests through Ray Serve, along with various stability and usability improvements.

Take a look at our refreshed documentation and the Ray 2.7 migration guide and let us know your feedback!
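Per the notes in this document, the new `DeploymentHandle` API can be opted into globally via the `RAY_SERVE_ENABLE_NEW_HANDLE_API` environment variable (the commented usage below is an illustrative sketch, not code from the release notes):

```python
import os

# Global opt-in to the new DeploymentHandle API (Ray 2.7), as described above.
os.environ["RAY_SERVE_ENABLE_NEW_HANDLE_API"] = "1"

# With the flag set, handles returned by Serve use the new API, e.g. (sketch):
#   handle = serve.run(app)
#   response = handle.remote(payload)  # DeploymentHandle call
#   result = response.result()         # block on the final result
```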
🏗 Architecture refactoring:
🎉 New Features:
- Fused `Read` and `Map` operator (zero-copy fusion) (#38789)
- Added `Dataset.write_images` to write images (#38228)
- Added `Dataset.write_sql()` to write SQL databases (#38544)
- `Dataset.map()` and `flat_map()` (#38606)

💫 Enhancements:
- `FileBasedDataSource` (#39493)
- `ArrowBlock` building time for blocks of size 1 (#38988)
- Added `partition_filter` parameter to `read_parquet` (#38479)
- `Dataset.take()` and related methods (#38677)
- Deferred `reader.get_read_tasks` until execution (#38373)
- Made `iter_batches` an Iterable (#37881)
- `Dataset.to_pandas()` (#37420)
- Added `Dataset.to_dask()` parameter to toggle consistent metadata check (#37163)
- Added `Datasource.on_write_start` (#38298)
- Supported `DatasetDict` as input into `from_huggingface()` (#37555)

🔨 Fixes:
- `Preprocessor` that have been fit in older versions (#39488)
- `RefBundles` (#39085)
- `local_uri` to all non-Parquet data sources (#38719)
- Added `ctx` parameter to `Datasource.write` (#38688)
- `map_batches` over empty blocks (#38161)
- `ActorPool` `map_batches` (#38110)
- Added `tif` file extension to `ImageDatasource` (#38129)
- Removed `_block_udf` from `FileBasedDatasource` reads (#38111)

📖 Documentation:
🤝 API Changes
- New `train.Checkpoint` class that unifies interaction with remote storage such as S3, GS, and HDFS. The changes follow the proposal in [REP35] Consolidated persistence API for Ray Train/Tune (#38452, #38481, #38581, #38626, #38864, #38844)
- Deprecated `preprocessor` arg to `Trainer` (#38640)
- Deprecated `Result.log_dir` (#38794)

💫 Enhancements:
🔨 Fixes:
🏗 Architecture refactoring:
📖 Documentation:
🤝 API Changes
- New `train.Checkpoint` class that unifies interaction with remote storage such as S3, GS, and HDFS. The changes follow the proposal in [REP35] Consolidated persistence API for Ray Train/Tune (#38452, #38481, #38581, #38626, #38864, #38844)
- Deprecated `Result.log_dir` (#38794)

💫 Enhancements:
🔨 Fixes:
🏗 Architecture refactoring:
🎉 New Features:
- Added a new `DeploymentHandle` API that will replace the existing `RayServeHandle` and `RayServeSyncHandle` APIs in a future release. You are encouraged to migrate to the new API to avoid breakages in the future. To opt in, either use `handle.options(use_new_handle_api=True)` or set the global environment variable `export RAY_SERVE_ENABLE_NEW_HANDLE_API=1`. See https://docs.ray.io/en/latest/serve/model_composition.html for more details.
- Added `get_app_handle` that gets a handle used to send requests to an application. The API uses the new `DeploymentHandle` API.
- Added `get_deployment_handle` that gets a handle that can be used to send requests to any deployment in any application.
- Added `serve.status` which can be used to get the status of proxies and Serve applications (and their deployments and replicas). This is the pythonic equivalent of the CLI `serve status`.
- A `--reload` option has been added to the `serve run` CLI.

💫 Enhancements:
- `serve.start` and `serve.run` have a few small changes and deprecations in preparation for this, see https://docs.ray.io/en/latest/serve/api/index.html for details.
- Added a metric (`ray_serve_num_ongoing_http_requests`) to track the number of ongoing requests in each proxy
- Added `RAY_SERVE_MULTIPLEXED_MODEL_ID_MATCHING_TIMEOUT_S` flag to wait until the model matching.

🔨 Fixes:
- Fixed `asyncio.Event`s not being removed in the long poll host: https://github.com/ray-project/ray/pull/38516.
- Fixed a bug where `ray_serve_deployment_queued_queries` wouldn't decrement when clients disconnected: https://github.com/ray-project/ray/pull/37965.

📖 Documentation:
🎉 New Features:
💫 Enhancements:
🔨 Fixes:
📖 Documentation:
🎉 New Features:
- The Ray streaming generator improves on the `num_returns="dynamic"` generator. The API could be used by specifying `num_returns="streaming"`. The API has been used for Ray Data and Ray Serve to support streaming use cases. See the test script to learn how to use the API. The documentation will be available in a few days.

💫 Enhancements:
- `pip install ray` doesn't require the Python grpcio dependency anymore.
- `ray job submit` now exits with `1` if the job fails instead of `0`. To get the old behavior back, you may use `ray job submit ... || true`. (#38390)
- `get_assigned_resources` in pg will return the name of the original resources instead of formatted name (#37421)
- `${ENV_VAR}` now can be replaced. Previous versions only supported a limited number of env vars. (#36187)

🔨 Fixes:
- Previously, when using `ray start --node-ip-address=...`, the driver also had to specify `ray.init(_node_ip_address)`. Now Ray finds the node ip address automatically. (#37644)
- `ray.init`: https://github.com/ray-project/ray/issues/26019
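The `${ENV_VAR}` substitution described above can be pictured with a small stand-in (this is an illustration of the semantics only, not Ray's implementation; here unknown variables are left untouched, which may differ from Ray's actual behavior):

```python
import re

def expand_env_vars(value: str, env: dict) -> str:
    """Replace ${VAR} references with values from `env`; unknown
    variables are left untouched (illustrative stand-in only)."""
    return re.sub(r"\$\{(\w+)\}", lambda m: env.get(m.group(1), m.group(0)), value)
```

For example, `expand_env_vars("${HOME}/data", {"HOME": "/root"})` returns `"/root/data"`.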
💫 Enhancements:

📖 Documentation:
Thanks
Many thanks to all those who contributed to this release!
@simran-2797, @can-anyscale, @akshay-anyscale, @c21, @EdwardCuiPeacock, @rynewang, @volks73, @sven1977, @alexeykudinkin, @mattip, @Rohan138, @larrylian, @DavidYoonsik, @scv119, @alpozcan, @JalinWang, @peterghaddad, @rkooo567, @avnishn, @JoshKarpel, @tekumara, @zcin, @jiwq, @nikosavola, @seokjin1013, @shrekris-anyscale, @ericl, @yuxiaoba, @vymao, @architkulkarni, @rickyyx, @bveeramani, @SongGuyang, @jjyao, @sihanwang41, @kevin85421, @ArturNiederfahrenhorst, @justinvyu, @pleaseupgradegrpcio, @aslonnie, @kukushking, @94929, @jrosti, @MattiasDC, @edoakes, @PRESIDENT810, @cadedaniel, @ddelange, @alanwguo, @noahjax, @matthewdeng, @pcmoritz, @richardliaw, @vitsai, @Michaelvll, @tanmaychimurkar, @smiraldr, @wfangchi, @amogkam, @crypdick, @WeichenXu123, @darthhexx, @angelinalg, @chaowanggg, @GeneDer, @xwjiang2010, @peytondmurray, @z4y1b2, @scottsun94, @chappidim, @jovany-wang, @jaidisido, @krfricke, @woshiyyya, @Shubhamurkade, @ijrsvt, @scottjlee, @kouroshHakha, @allenwang28, @raulchen, @stephanie-wang, @iycheng
The Ray 2.6.3 patch release contains fixes for Ray Serve, and Ray Core streaming generators.
🔨 Fixes:
🔨 Fixes:
- Fixed `serve run` help message (#37859) (#38018)
- Decrement `ray_serve_deployment_queued_queries` when client disconnects (#37965) (#38020)

📖 Documentation: