Cortex Versions

Production infrastructure for machine learning at scale

v0.42.1

1 year ago

New features

Bug fixes

Misc

v0.42.0

2 years ago

New features

Bug fixes

Misc

v0.41.0

2 years ago

New features

Misc

Bug fixes

Nucleus Model Server

We have released v0.1.0 of the Nucleus model server!

Nucleus is a model server for TensorFlow and generic Python models. It is compatible with Cortex clusters, Kubernetes clusters, and other container-based deployment platforms. Nucleus can also be run locally via Docker Compose.

Some of Nucleus's features include:

  • Generic Python models (PyTorch, ONNX, scikit-learn, MLflow, NumPy, pandas, etc.)
  • TensorFlow models
  • CPU and GPU support
  • Serve models directly from S3 paths
  • Configurable multiprocessing and multithreading
  • Multi-model endpoints
  • Dynamic server-side request batching
  • Automatic model reloading when new model versions are uploaded to S3
  • Model caching based on LRU policy (on disk and memory)
  • HTTP and gRPC support
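To illustrate one of the features above, here is a minimal sketch of dynamic server-side request batching: individual requests are queued and handled together, bounded by a maximum batch size and a maximum wait time. This is not Nucleus's actual implementation; the class, method, and parameter names are hypothetical.

```python
import queue
import threading
import time

class DynamicBatcher:
    """Illustrative sketch of dynamic server-side request batching
    (hypothetical names; not Nucleus's real API)."""

    def __init__(self, handler, max_batch_size=8, batch_interval=0.05):
        self.handler = handler                # callable: list of inputs -> list of outputs
        self.max_batch_size = max_batch_size  # flush when this many requests are queued
        self.batch_interval = batch_interval  # max seconds to wait for a batch to fill
        self.requests = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def predict(self, payload):
        """Called per request; blocks until the batched result is ready."""
        slot = {"input": payload, "event": threading.Event()}
        self.requests.put(slot)
        slot["event"].wait()
        return slot["output"]

    def _loop(self):
        while True:
            batch = [self.requests.get()]  # block until the first request arrives
            deadline = time.monotonic() + self.batch_interval
            # fill the batch until it is full or the wait deadline passes
            while len(batch) < self.max_batch_size:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self.requests.get(timeout=remaining))
                except queue.Empty:
                    break
            outputs = self.handler([slot["input"] for slot in batch])
            for slot, output in zip(batch, outputs):
                slot["output"] = output
                slot["event"].set()

# usage: a stand-in "model" that doubles its input, called from several threads
batcher = DynamicBatcher(lambda xs: [x * 2 for x in xs])
results = []
threads = [threading.Thread(target=lambda i=i: results.append(batcher.predict(i)))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # [0, 2, 4, 6]
```

The point of the deadline loop is the latency/throughput trade-off: a request never waits longer than `batch_interval`, but under load the handler sees full batches.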

v0.40.0

2 years ago

New features

Misc

v0.39.1

2 years ago

Bug fixes

v0.39.0

2 years ago

New features

Reliability

Bug fixes

Docs

Misc

v0.38.0

2 years ago

New features

Bug fixes

Misc

v0.37.0

2 years ago

New features

Breaking changes

  • The cortex cluster scale command has been replaced by the cortex cluster configure command.
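As a sketch of the migration, cluster settings that were previously changed via cortex cluster scale are now changed by editing the cluster configuration file and re-applying it. The cluster.yaml filename and the config-file argument are assumptions, not confirmed by these notes.

```shell
# Before v0.37 (removed):
#   cortex cluster scale ...

# From v0.37 onward: edit your cluster configuration file,
# then apply the changes (filename is illustrative):
cortex cluster configure cluster.yaml
```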

Bug fixes

Docs

Misc

v0.36.0

2 years ago

New features

Breaking changes

  • With this release, we have generalized Cortex to exclusively support running arbitrary Docker containers for all workload types (Realtime, Async, Batch, and Task). This enables the use of any model server, programming language, etc. As a result, the API configuration has been updated: the predictor section has been removed, the pod section has been added, and the autoscaling parameters have been modified slightly (depending on the workload type). See the updated docs for Realtime, Async, Batch, and Task. If you'd like to see examples of Dockerizing Python applications, see our test/apis folder.
  • The cortex prepare-debug command has been removed; Cortex now exclusively runs Docker containers, which can be run locally via docker run.
  • The cortex patch command has been removed; its behavior is now identical to cortex deploy.
  • The cortex logs command now prints a CloudWatch Insights URL with a pre-populated query that can be executed to show logs from your workloads, since this is the recommended approach in production. If you wish to stream logs from a single (randomly selected) pod, you can use cortex logs --random-pod (keep in mind that these logs will not include some system logs related to your workload).
  • gRPC support has been temporarily removed; we are working on adding it back in v0.37.
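To make the predictor-to-pod change above concrete, here is an illustrative fragment of the new pod-based API configuration. The API name, image, and field values are hypothetical; consult the updated workload docs for the authoritative schema.

```yaml
# Illustrative v0.36-style Realtime API configuration (values are hypothetical)
- name: my-api            # hypothetical API name
  kind: RealtimeAPI
  pod:                    # replaces the former "predictor" section
    port: 8080
    containers:
      - name: api
        image: <your-docker-image>   # any containerized model server
        compute:
          cpu: 1
```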

Bug fixes

Docs

Misc

v0.35.0

3 years ago

New features

Breaking changes

  • The Python client methods which deploy Python classes have been separated from the deploy() method. Now, deploy() is used only to deploy project folders, and deploy_realtime_api(), deploy_async_api(), deploy_batch_api(), and deploy_task_api() are for deploying Python classes. (docs)
  • The name of the bucket that Cortex uses for internal purposes is no longer configurable. During cluster creation, Cortex will auto-generate the bucket name (and create the bucket if it doesn't exist). During cluster deletion, the bucket will be emptied (unless the --keep-aws-resources flag is provided to cortex cluster down). Users' files should not be stored in the Cortex internal bucket.
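A sketch of the Python client split described above. The method names deploy(), deploy_realtime_api(), etc. come from these notes, but the argument names and the Handler class shape are assumptions; the cortex Python package must be installed for this to run.

```python
import cortex

cx = cortex.client()  # assumes a configured client environment

# deploy() is now only for deploying a project folder
# (argument names are illustrative):
cx.deploy(api_spec={"name": "my-api", "kind": "RealtimeAPI"},
          project_dir="./my-api")

# Python classes are deployed via the workload-specific methods:
class Handler:
    def __init__(self, config):
        pass

    def handle_post(self, payload):
        return payload

cx.deploy_realtime_api(api_spec={"name": "my-api", "kind": "RealtimeAPI"},
                       handler=Handler)
```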

Bug fixes

Misc