SkyPilot: Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution—all with a simple interface.
We are excited to release SkyPilot v0.5.0, which introduces many new features and enhancements, including:
and more!
any_of or ordered in resources), allowing users to significantly enlarge the resource pool and get higher availability.
best disk tier for the best performance and cost, so you can choose the best disk for any cloud. (#2434)
SkyServe is a serving system on top of SkyPilot that deploys and scales any HTTP services across one or more regions or clouds, with autoscaling, load balancing, and more.
Other Enhancements
Kubernetes support received a number of new features and enhancements.
sky local up (#2890)
Other Enhancements
KUBECONFIG env var for config file specification (#3169)
SkyPilot now supports 13 cloud providers, including 4 new provider-contributed clouds: VMware vSphere, RunPod, Fluidstack and Cudo Compute.
New Features
Enhancements
Fixes
New Features
Enhancements
Fixes
--disk-size for Custom Machine Images (#2718)
Enhancements
Fixes
sky check (#3038)
New Features
sky status --endpoints CLI (#3199)
sky show-gpus (#2583, #2892, #2933, #2946, #3083, #3149, #3113)
--commit and --version for sky CLI (#2720, #2731, #2733)
Enhancements
--disk-tier none override (#2906)
sky check improvement (#3174, #3212, #3160)
Fixes
sky_logs and mounting directory (#2667, #2845)
sky logs with --sync-down (#2660)
Deprecations
cpunode/gpunode/tpunode, hide admin (#2800)
Local cloud which is now replaced by Kubernetes support (#3037, #3186)
New Features
Enhancements
~/.ssh/generated/ssh instead of directly editing ~/.ssh/config (#2706, #3069)
Fixes
New Features
Enhancements
Fixes
~/.sky/config.yaml for spot jobs (#2876)
New Features
Enhancements
Fixes
Full Changelog: https://github.com/skypilot-org/skypilot/compare/v0.4.0...v0.5.0
New contributors: @rtalaricw, @jackyk02, @Vaibhav2001, @rohanvaidya45, @Shrinandan, @manishiitg, @amitkumarj441, @tgaddair, @aseriesof-tubes, @changxiaohui, @thams, @kishb87, @PratikKumar125, @mmcclean, @dtran24, @davidwagnerkc, @mjibril, @kbrgl, @msehsah1, @JungleCatSW, @Ying1123
Many thanks to all contributors who contributed to this release!
Contributors: @Michaelvll, @concretevitamin, @cblmemo, @romilbhardwaj, @MaoZiming, @landscapepainter, @sunny0826, @suquark, @Vaibhav2001, @infwinston, @hemildesai, @asaiacai, @Shrinandan, @kishb87, @rtalaricw, @iojw, @aseriesof-tubes, @manishiitg, @jackyk02, @mmcclean, @thams, @amitkumarj441, @rohanvaidya45, @saihtaungkham, @tgaddair, @davidwagnerkc, @PratikKumar125, @dtran24, @changxiaohui, @mjibril, @kbrgl, @msehsah1, @JungleCatSW, @Ying1123
This is a patch release to ship bug fixes faster to our users! It contains many feature updates and bug fixes, including the new provisioner for AWS, fixes for OOM and credential issues in long-running spot jobs, and some additional improvements.
Detailed changelog coming up in v0.5!
We are excited to release SkyPilot v0.4.0, which brings a host of new features and improvements, including Kubernetes support, native container support, the ability to open ports, and more.
sky check and sky launch --cloud kubernetes to run your task on Kubernetes.
ports field. These ports are publicly accessible and can be used for hosting LLM inference endpoints, Jupyter notebooks, web servers, TensorBoard, and other services.
setup and run commands can now be executed directly in that container. This allows you to wrap your environment in a container and run it on any cloud with SkyPilot.
SkyPilot now supports 8 clouds, including community-contributed support for two new clouds:
SkyPilot now also supports IBM COS buckets (#1966).
--ip flag for sky status returns the public IP address of the cluster (e.g., sky status --ip mycluster). Use this to access services such as LLM inference endpoints, Jupyter notebooks, and more.
file_mounts can be dynamically defined with environment variables (docs, example), and environment variables can be set through a dotenv file with the new --env-file flag (#2296).
sky status updates for stopped clusters are 10x faster (#2288), and the job queue is more memory efficient (#1636).
pip install skypilot-nightly (#1446)
Below is a detailed list of changes.
sky spot dashboard: you can now see all your spot jobs in a GUI (#2103, #2136)
sky status can now show the head IP of the cluster with the -a or --ip flags (#2305, #2563)
sky down/stop/start defaults to a unique cluster if it exists, and sky cancel without a cluster cancels the latest task (#2325)
sky check output is now friendlier with more hints for disabled clouds (#2002, #2017, #2196, #2114, #2221, #2377)
sky down progress bar now reflects clusters that failed to terminate (#1595, #2005)
--cpus is provided (#2037)
sky launch is interrupted (#2206, #2252)
ports field (docs, #2210, #2477)
image_id - tasks can now be run inside Docker containers (docs, #1910)
--clone-disk-from flag (#2098)
sky launch by caching cluster IP address (#2400)
sky status --refresh for STOPPED cluster is 10x faster (#2079)
sky spot launch will now exclude files listed in .gitignore (#2018)
sky storage CLI (#2063, #2177)
<2.0 (#2157)
>3.13, != 5.4.* to avoid issues with Cython 3 (#2256, #2514)
<= 2.6.3 is supported on local machines (#2401)
pycryptodome, oauth2client are no longer required (#2515)
New contributors: @JGoo1, @tobi, @HysunHe, @blucz, @shethhriday29, @MaoZiming, @ksasi, @pushmatrix, @hzeng-0, @saihtaungkham, @fozziethebeat, @n10dollar, @asaiacai, @mtaku3, @gbmarc1, @alex000kim, @steve-marmalade, @xzrderek, @sunny0826.
Many thanks to all contributors who contributed to this release!
@Michaelvll, @concretevitamin, @romilbhardwaj, @cblmemo, @HysunHe, @landscapepainter, @shethhriday29, @infwinston, @alex000kim, @suquark, @sunny0826, @gbmarc1, @MaoZiming, @xzrderek, @tobi, @steve-marmalade, @saihtaungkham, @pushmatrix, @n10dollar, @mtaku3, @ksasi, @hzeng-0, @fozziethebeat, @blucz, @asaiacai, @WoosukKwon, @JGoo1, @mraheja, @iojw, @hemildesai, @ewzeng, @aviweit, @Saikrishna-Achalla, @Cohen-J-Omer
This patch release brings many bug fixes and features, including new mechanics for stop/down, callbacks for spot jobs, and a critical dependency fix for PyYAML after the release of Cython 3.
Detailed changelog coming up in v0.4!
This is a patch release to ship bug fixes faster to our users! It contains many feature updates and bug fixes, covering the pydantic dependency issue, disk cloning, file mounts, and cloud-specific improvements.
Detailed changelog coming up in v0.4!
This is a patch release to ship several important enhancements and bug fixes:
Enhancements
sky launch --gpus h100
rm -rf ~/.sky/catalogs/v5/lambda
FAILED_SETUP error (#1998)
Fixes
$PWD/~/sky_logs in some cases (#2009)
sky spot launch --retry-until-up to make it actually retry until up (#2004)
sky check has never been called (#2017)
Full Changelog: https://github.com/skypilot-org/skypilot/compare/v0.3.0...v0.3.1
We are excited to release SkyPilot v0.3, the most significant release thus far in the project's history.
v0.3 focuses on:
See the release blog post for a deep-dive into highlights.
Release notes below are as compared to v0.2 (full changelog).
sky check to set it up. Docs here.
sky cost-report; fine-grained optimizer; user identity; AWS SSO; private IP-only VPCs; Ray runtime is decoupled from user's Ray clusters; ...
New Features
sky cost-report: show the estimated cost of launched clusters (#1301, #1621, #1780, #1680, #1788)
sky launch / YAML resources: field
--cpus support https://github.com/skypilot-org/skypilot/pull/1622
--memory support https://github.com/skypilot-org/skypilot/pull/1746
--disk-tier support https://github.com/skypilot-org/skypilot/pull/1812
--detach-setup and --detach-run to sky launch https://github.com/skypilot-org/skypilot/pull/1379
--retry-until-up, --region, --zone, and --idle-minutes-to-autostop for interactive nodes https://github.com/skypilot-org/skypilot/pull/1297
sky status/sky.status() on specific clusters https://github.com/skypilot-org/skypilot/pull/1568
--region in sky show-gpus https://github.com/skypilot-org/skypilot/pull/1187
image_id field under resources https://github.com/skypilot-org/skypilot/pull/1384
Enhancements
sky show-gpus
sky show-gpus <gpu>:<num> (same syntax as sky launch --gpus) https://github.com/skypilot-org/skypilot/pull/1924
sky down -p bypass identity mismatch errors. https://github.com/skypilot-org/skypilot/pull/1892
Fixes
sky {cpu,gpu,tpu}node commands correctly reuse an existing cluster if possible https://github.com/skypilot-org/skypilot/pull/1787
New Features
sky status (#1270, #1467, #1691)
sky spot queue -a (#1655)
Enhancements
sky spot launch defaults -r/--retry-until-up to True. https://github.com/skypilot-org/skypilot/pull/1781
sky start on the spot controller resets the default autostop https://github.com/skypilot-org/skypilot/pull/1453
sky spot queue displays job states with colors (#1473)
sky spot queue no longer shows a cached (and possibly stale) version of the jobs (#1742)
sky down on spot controller when in-progress spot jobs exist https://github.com/skypilot-org/skypilot/pull/1667
FAILED_SETUP for spot jobs that fail during setup (#1479)
CANCELLING for spot jobs that are being cancelled (#1785)
SKYPILOT_JOB_ID the same for all recoveries of the same job https://github.com/skypilot-org/skypilot/pull/1400
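Because SKYPILOT_JOB_ID stays the same across recoveries of a job, a task can use it to key state that must survive a preemption. The sketch below is one way a training script might do that; the function name, the base directory, and the "local-run" fallback are hypothetical choices for illustration, and the ID value in the usage lines is made up.

```python
import os
from pathlib import Path


def checkpoint_dir(base: str, env=os.environ) -> Path:
    """Return a checkpoint directory keyed by the job ID.

    Hypothetical helper: `base` and the "local-run" fallback are
    assumptions of this sketch, not part of SkyPilot.
    """
    # The variable is read from the task's environment; when it is absent
    # (for example, when running outside a spot job) a fixed name is used.
    job_id = env.get("SKYPILOT_JOB_ID", "local-run")
    return Path(base) / job_id


# Two "recoveries" that see the same ID resolve to the same directory,
# so the second one can find the checkpoints written by the first.
first = checkpoint_dir("/checkpoints", {"SKYPILOT_JOB_ID": "example-id"})
second = checkpoint_dir("/checkpoints", {"SKYPILOT_JOB_ID": "example-id"})
```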
Fixes
-n) possibly overwriting each other https://github.com/skypilot-org/skypilot/pull/1782
ssh_proxy_command if specified https://github.com/skypilot-org/skypilot/pull/1792
Robustness is enhanced for TPUs in various modes: VMs, pods, spot (#1500, #1279, #1359, #1483, #1562, ...).
Enhancements
apt install ... in setup may non-deterministically fail due to APT lock being held by background unattended upgrades
cloud-init ensures unattended-upgrade is disabled at boot (#1949, #1954); for other clouds we kill the processes (#1347)
Fixes
New Features
source of a storage mount, e.g., source: [~/mydir/myfile.txt, ~/datasets] https://github.com/skypilot-org/skypilot/pull/1311 #1677
Enhancements
.git folder for cloud storage mounts https://github.com/skypilot-org/skypilot/pull/1494
file_mounts destination path is a relative path, it is treated as being under workdir #1315
Fixes
sky storage delete for externally deleted buckets https://github.com/skypilot-org/skypilot/pull/1875
New Features
Enhancements
sky launch/start
ray[default]>=2.2.0,<=2.4.0 to fix some dependency conflicts with click/grpcio/protobuf
SKY_NUM_GPUS_PER_NODE https://github.com/skypilot-org/skypilot/pull/1337
PYTHONUNBUFFERED=1 in task execution to disable Python output buffering by default https://github.com/skypilot-org/skypilot/pull/1290
Fixes
.ssh/config more robust (#1763, #1683)
New Features
Enhancements
Fixes
Enhancements
sky check https://github.com/skypilot-org/skypilot/pull/1772
skypilot-user tags to VMs on these clouds. https://github.com/skypilot-org/skypilot/pull/1593
Fixes
Enhancements
skypilot-user tags to VMs on these clouds. https://github.com/skypilot-org/skypilot/pull/1593
Fixes
New Features
Enhancements
mv ~/.sky/catalogs/v5/gcp ~/.sky/catalogs/v5/gcp.backup so that new catalogs will be auto-fetched
Fixes
New contributors: @dongreenberg, @turian, @scruel, @vivekkhimani, @stephenbalaban, @landscapepainter, @cblmemo, @Saikrishna-Achalla, @datlife, @Cohen-J-Omer (IBM Cloud support!), @zetavg.
Many thanks to all contributors who contributed to this release!
@Michaelvll, @concretevitamin, @romilbhardwaj, @infwinston, @ewzeng, @michaelzhiluo, @WoosukKwon, @iojw, @sumanthgenz, @landscapepainter, @suquark, @dongreenberg, @cblmemo, @mraheja, @vivekkhimani, @turian, @stephenbalaban, @scruel, @lhqing, @datlife, @Saikrishna-Achalla, @Cohen-J-Omer, @zetavg
Another patch release to ship bug fixes faster to our users! This release contains many fixes, including those for managed spot and cloud-specific improvements.
Detailed changelog coming up in v0.3!
This patch release brings more bug fixes, including fixes for cloud-specific networking, VPC configuration, and managed spot.
Detailed changelog coming up in v0.3!