ClearML Agent Versions

ClearML Agent - ML-Ops made easy. ML-Ops scheduler & orchestration solution

v1.3.0

2 years ago

New Features and Improvements

Support private repos from requirements.txt file (#107, thanks @nielstenboom!)
Bump PyJWT version due to "Key confusion through non-blocklisted public key formats" vulnerability
Add support for additional command line arguments in k8s glue example
Add Python 3.10 support

Bug Fixes

Fix git unsafe directory issue (disable check on cached vcs folder)
Fix dynamic GPUs with "all" GPUs on the same worker
Fix broken pytorch setuptools incompatibility (force setuptools < 59 if torch is below 1.11)
Fix setuptools requirement issue by making sure that if we have "setuptools" in the original required packages, we preserve the line in the pip freeze list
Fix optional priority packages to always compare lower case package names
Fix potential requirements installation failure by making pygobject an optional package (i.e. if installation fails continue the Task package environment setup)
Fix repository URL contains credentials even when agent.force_git_ssh_protocol: true
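
The setuptools fix above (force setuptools < 59 if torch is below 1.11) can be illustrated with a minimal sketch. It assumes plain dotted version strings; `parse_version` and `setuptools_constraint` are hypothetical helper names used for illustration, not the agent's actual code.

```python
import re
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric part of a version, e.g. '1.10.2+cu113' -> (1, 10, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if not match:
        raise ValueError("unrecognized version: {!r}".format(version))
    return tuple(int(part) for part in match.group(0).split("."))


def setuptools_constraint(torch_version: Optional[str]) -> Optional[str]:
    """Return a pip requirement limiting setuptools, or None if no limit is needed."""
    if torch_version is None:
        return None  # torch is not requested, nothing to constrain
    if parse_version(torch_version) < (1, 11):
        return "setuptools<59"  # rule taken from the release note
    return None
```

Comparing integer tuples avoids the string-comparison trap where "1.9" would sort after "1.11".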

v1.2.3

2 years ago

Bug Fixes

Fix PYTHONPATH is overwritten when executing a task (append to it instead)
Fix pytorch package is reinstalled when the same version is already installed
Fix copying configuration sets an empty worker name
Protect dynamic GPUs from failing to parse worker GPU index
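
The PYTHONPATH fix above (append instead of overwrite) roughly corresponds to the following sketch. `extend_pythonpath` is a hypothetical helper written for illustration and is not taken from the agent's source.

```python
import os
from typing import Mapping, Optional


def extend_pythonpath(new_path: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return a PYTHONPATH value that keeps existing entries and appends new_path."""
    env = os.environ if env is None else env
    existing = env.get("PYTHONPATH", "")
    # Drop empty entries so an unset or empty variable does not yield a stray separator.
    entries = [entry for entry in existing.split(os.pathsep) if entry]
    if new_path not in entries:
        entries.append(new_path)
    return os.pathsep.join(entries)
```

Using `os.pathsep` keeps the result valid on both POSIX (":") and Windows (";").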

v1.2.2

2 years ago

Bug Fixes

Fix CLEARML_AGENT_SKIP_PIP_VENV_INSTALL fails to find python executable
Fix apt-get update failure causes apt-get install not to be executed

v1.2.1

2 years ago

New Features and Improvements

Update S3 bucket verify option for minio #83 (thanks @pshowbs!)
Add environment variable for request method #91 (thanks @mmiller-max!)
Add additional k8s-glue dockerfiles #94 (thanks @xadcoh!)
Update default docker image to nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04
Add support for custom docker image resolving using the agent.default_docker.match_rules configuration setting (see here)
Add agent.force_git_root_python_path configuration setting to force adding the git repository root folder to the PYTHONPATH (if set, the working directory is not added to the PYTHONPATH)
Add build --force-docker command line argument to allow ignoring task container data
Add agent.poetry_version configuration setting to specify poetry version (and force installation of poetry if missing, see here)
Add custom build script support
Add extra configurations when starting daemon
Add agent.package_manager.force_original_requirements configuration option, allowing to only use original requirements produced by local execution (note that using this configuration option prevents editing installed packages using the UI)
Add support for the CLEARML_AGENT_PROPAGATE_EXITCODE environment variable. Set this variable to 1 to allow ClearML Agent to return a nonzero exit code on failure
Update clearml-agent init (use app.clear.ml as default server, add git token references)
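
The CLEARML_AGENT_PROPAGATE_EXITCODE behavior listed above can be pictured with a small wrapper sketch. Only the variable name and the value 1 come from these notes. The helper names and the assumption that the wrapper otherwise reports 0 are illustrative.

```python
import os
import subprocess
import sys
from typing import Mapping, Sequence


def resolve_exit_code(child_returncode: int, env: Mapping[str, str]) -> int:
    """Decide which exit code a wrapper process should report."""
    # Assumption: without the variable set to 1, failures are reported as 0.
    propagate = env.get("CLEARML_AGENT_PROPAGATE_EXITCODE", "0").strip() == "1"
    return child_returncode if propagate else 0


def run_wrapped(command: Sequence[str]) -> int:
    """Run a child command and return the exit code the wrapper should use."""
    completed = subprocess.run(list(command))
    return resolve_exit_code(completed.returncode, os.environ)
```

A caller would typically finish with `sys.exit(run_wrapped([sys.executable, "train.py"]))`, where `train.py` stands in for any script.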

Bug Fixes

Fix virtualenv python interpreter used #98 (thanks @idantene!)
Fix typing package incorrectly required for Python>3.5 #103 (thanks @Honzys!)
Fix symbolic links not copied from cached VCS into working copy (on Windows, the link's content is copied instead of the original symbolic link) #89
Fix agent fails to check out code from main branch when branch/commit is not explicitly specified https://github.com/allegroai/clearml/issues/551
Fix git+git:// requirements
Fix default_python calculation (and verbosity)
Fix using deprecated abc support (Python 3.10 compatibility)
Fix no default value for CLEARML_API_DEFAULT_REQ_METHOD causes ValueError if not specified
Fix agent.hide_docker_command_env_vars mode to include URL passwords and handle environment vars containing docker commands
Fix conda package manager listed packages with local links (@ file://) should ignore the local package if it does not exist
Fix cuda patch version support in conda
Fix agent attempts to check out code when in standalone mode
Fix FORCE_LOCAL_CLEARML_AGENT_WHEEL environment variable handling when running from a Windows host
Fix user-provided " is unnecessarily replaced with \\"
Fix token is not propagated to docker in case credentials are not available
Fix PyTorch aarch64 and windows support
Fix VCS packages are reinstalled when the same commit version is already installed
Fix git packages are installed even if commit is given and is preinstalled when using cached virtual environment
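
The agent.hide_docker_command_env_vars fix above mentions masking URL passwords in logged docker commands. The following is a minimal sketch of that idea, not the agent's implementation: the function names, the mask string, and the `MY_SECRET` variable are all hypothetical.

```python
from typing import List, Sequence
from urllib.parse import urlsplit, urlunsplit

MASK = "********"


def mask_url_password(value: str) -> str:
    """Replace the password component of a URL, leaving other text untouched."""
    parts = urlsplit(value)
    if not parts.scheme or parts.password is None:
        return value
    host = parts.hostname or ""
    if parts.port is not None:
        host = "{}:{}".format(host, parts.port)
    netloc = "{}:{}@{}".format(parts.username or "", MASK, host)
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def mask_docker_env(command: Sequence[str], secret_names: Sequence[str]) -> List[str]:
    """Mask `-e NAME=value` arguments whose NAME is secret, and passwords in URL values."""
    masked = []
    previous = None
    for arg in command:
        if previous in ("-e", "--env") and "=" in arg:
            name, value = arg.split("=", 1)
            value = MASK if name in secret_names else mask_url_password(value)
            arg = "{}={}".format(name, value)
        masked.append(arg)
        previous = arg
    return masked
```

For example, `mask_docker_env(["docker", "run", "-e", "MY_SECRET=abc", "image"], ["MY_SECRET"])` keeps the command shape but hides the value.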

v1.1.2

2 years ago

Bug Fixes

This release fixes the six conflict with the new pathlib2 version 2.3.7 and up.

v1.1.1

2 years ago

Features and Bug Fixes

Add support for truncating task log file after reporting to server using agent.truncate_task_output_files configuration setting
Fix PyJWT resiliency support
Fix --stop checking default queue tag (#80)
Fix queue tag default does not exist and --queue not specified (try queue named "default")
Fix Python 3.5 compatibility
Fix Python 2.7 support for PyTorch

1.1.0

2 years ago

Breaking Changes

Disable default demo server (available by setting the CLEARML_NO_DEFAULT_SERVER=0 environment variable)
Change k8s glue default pod label to CLEARML=agent (instead of TRAINS=agent)

Features

Add poetry cache into docker mapping #74
Allow rewriting SSH URLs (see here), refers to #72, #42
Add docker environment arguments log masking support, customizable using the agent.hide_docker_command_env_vars configuration value (see here) #67
Add support for naming docker containers using the agent.docker_container_name_format configuration option to set a name format (disabled by default) https://github.com/allegroai/clearml/issues/412
k8s glue
- Remove queue name from pod name, add queue name and ID to pod labels #64
- Update task status_message for non-responsive or hanging pods
- Support the agent.docker_force_pull configuration option for scheduled pods
- Add docker example for running the k8s glue as a pod in a k8s cluster
Add agent.ignore_requested_python_version configuration option to ignore any requested python version (default false, see here)
Add agent.docker_internal_mounts configuration option to control containers internal mounts (non-root containers, see here)
Add support for -r requirements.txt in the Installed Packages section
Add support for CLEARML_AGENT_INITIAL_CONNECT_RETRY_OVERRIDE environment variable to override initial server connection behavior (defaults to true, allows boolean value or an explicit number specifying the number of connect retries)
Add support for CLEARML_AGENT_DISABLE_SSH_MOUNT environment variable allowing to disable the auto .ssh mount into the docker
Add support for CLEARML_AGENT_SKIP_PIP_VENV_INSTALL environment variable to skip Python virtual env installation on execute and allow providing a custom venv binary
Add support for CLEARML_AGENT_VENV_CACHE_PATH environment variable to allow overriding venv cache folder configuration
Add support for CLEARML_AGENT_EXTRA_DOCKER_ARGS environment variable to allow overriding extra docker args configuration
Add support for environment variables containing bash-style string lists using shlex
Add printout when using ClearML key/secret from environment variables
Increase worker keep-alive timeout to 10 minutes instead of 1 minute
Update documentation
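
Two of the environment-variable features above lend themselves to a short illustration: bash-style string lists parsed with shlex, and a value that accepts either a boolean or an explicit number (as described for CLEARML_AGENT_INITIAL_CONNECT_RETRY_OVERRIDE). The helper names and the accepted boolean spellings below are assumptions; only `shlex.split` itself is standard library behavior.

```python
import shlex
from typing import List, Mapping, Union


def env_string_list(name: str, env: Mapping[str, str]) -> List[str]:
    """Split an environment variable into arguments using shell quoting rules."""
    return shlex.split(env.get(name, ""))


def env_bool_or_int(
    name: str, env: Mapping[str, str], default: Union[bool, int] = True
) -> Union[bool, int]:
    """Read a variable that may hold a boolean word or an explicit integer."""
    raw = env.get(name, "").strip().lower()
    if not raw:
        return default
    if raw in ("true", "yes", "on"):  # assumed spellings
        return True
    if raw in ("false", "no", "off"):  # assumed spellings
        return False
    return int(raw)  # raises ValueError for anything that is not a number
```

With shell quoting rules, a value such as `--ipc=host -v "/my data:/data"` splits into three arguments, and the quoted path keeps its space.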

Bug Fixes

Fix auto mount SSH_AUTH_SOCK into docker #45
Fix package manager configuration documentation #78
Fix support for spaces in docker arguments https://github.com/allegroai/clearml/issues/358
Fix standalone script with pre-existing conda venv
Fix PyYAML v5.4, v5.4.1 versions not supported
Fix parsing VCS links starting with git+git@ (notice git+git:// was already supported)
Fix Python package with git+git:// links or git+ssh:// conversion
Fix --services-mode if the execute agent fails when starting to run with error code 0
Fix --stop with dynamic gpus
Fix support for unicode standalone scripts, changing default ascii encoding to UTF-8
Fix venv cache cannot reinstall package from git with http credentials
Fix PYTHONIOENCODING environment variable is overwritten when already defined
k8s glue
- Fix support for multiple k8s glue instances with pod limits
- Fix task container handling fails parsing docker image
- Fix task container is not set when using default image/arguments
- Fix task container image arguments are used when no image is specified
- Fix task container arguments not supported when template is not provided
- Fix agent.extra_docker_bash_script not applied correctly
- Fix task runtime properties are removed when re-enqueuing task
- Fix error is not thrown when failing to push task to queue

1.0.0

3 years ago

Features

Add conda and pip environment debug prints (using --debug)
Add support for PyJWT v2
Change the default conda channel order, so it pulls the correct pytorch package
Improve k8s glue support
- Support k8s glue container env vars merging
- Add number of pods limit to k8s glue using the max_pods_limit argument (use --max-pods switch in the k8s glue example)
- Add k8s glue default restartPolicy=Never to template to prevent pods from restarting
Add --stop switch support for dynamic gpus
Verify docker command exists when running in docker mode
Add support for terminating dockers on sig_term in dynamic mode
Add stopping message on Task process termination
Add agent.docker_install_opencv_libs configuration option to enable automatic opencv libs install for faster docker spin-up (default: true, see here)
Add support for new container base setup script feature
Bump virtualenv dependency version (support v>=16,<21)
Add support for dynamic gpus opportunistic scheduling (with min/max gpus per queue)
Deprecate venv_update in configuration (replaced by the more robust venvs_cache)
Add Python 3.9 to the support table

Bug Fixes

Fix agent can return non-zero error code and pods will end up restarting forever #56
Fix poetry support #57
Fix cuda version from driver does not return minor version
Fix requirements local path replace back when using cache
Fix k8s glue
- Fix broken k8s glue docker args parsing
- Fix empty env prevents override when merging template
Fix venv cache crash on bad symbolic links
Fix no docker arguments provided

0.17.2

3 years ago

Features

Add virtual environment caching
- Supports venv caching both in standard and docker mode
- Configurable using the agent.venvs_cache configuration section
- Disabled by default, enable here
Add support for --services-mode with venvs
Add agent.force_git_ssh_user configuration value (default git, see here) #42
Add agent.ignore_requested_python_version configuration option for multi python environments (default false)
Add agent.enable_task_env configuration option to set the OS environment based on the Environment section of the Task (default false, see here)
K8s glue
- Add support for detecting and deleting k8s pods that fail to start
- Allow providing namespace in k8s glue and k8s glue example
- Add base-pod-number parameter to k8s glue and example
Change agent.default_docker.image to nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 (see here)
Use shared git cache for multiple agents on the same machine
Upgrade pynvml, add detection of CUDA version from driver level
Update agent and services docker files
Update documentation

Bug Fixes

Fix docker --network returns None
Fix docker mode without venvs cache dir
Fix applying git diff on a newly added file
Fix environment variables CLEARML_WEB_HOST/CLEARML_FILES_HOST not passed to running tasks (or updated on the config object)
Fix --detached command line option not supported on Windows (ignore and issue warning)
Fix file not found error (errno 2) interpreted as aborted (i.e. Ctrl-C)
Fix from clearml runtime diff patching
Fix cache to take cuda version into account
Fix CPU mode
Fix multi instances on Windows
Fix conda support for git+http links
Fix k8s glue does not pass docker environment variables, remove deprecated flags
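
The virtual environment caching feature and the fix making the cache take the CUDA version into account suggest a cache key built from everything that affects the installed environment. The sketch below is one plausible way to build such a key. The hashing scheme and the `venv_cache_key` name are assumptions and do not describe how the agent stores its cache.

```python
import hashlib
from typing import Iterable, Optional


def venv_cache_key(
    requirements: Iterable[str], python_version: str, cuda_version: Optional[str]
) -> str:
    """Build a deterministic key from requirements, Python version and CUDA version."""
    # Sort so that the order of requirement lines does not change the key.
    normalized = sorted(line.strip() for line in requirements if line.strip())
    payload = "\n".join([python_version, cuda_version or "cpu"] + normalized)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Two tasks with identical requirements but different CUDA versions then get different keys, so a GPU build of a package is not reused on a machine with another CUDA version.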

0.17.1

3 years ago

ClearML-Agent (formerly allegro trains-agent)

Features and Bug Fixes

Fix support for pip virtual-environment on Windows
Fix support for conda using repository requirements.txt (empty "Installed Packages" section)