ClearML Agent Versions

ClearML Agent - ML-Ops made easy. ML-Ops scheduler & orchestration solution

v1.3.0

2 years ago

New Features and Improvements

  • Support private repos from requirements.txt file (#107, thanks @nielstenboom!)
  • Bump PyJWT version due to "Key confusion through non-blocklisted public key formats" vulnerability
  • Add support for additional command line arguments in k8s glue example
  • Add Python 3.10 support

Bug Fixes

  • Fix git unsafe directory issue (disable check on cached vcs folder)
  • Fix dynamic GPUs with "all" GPUs on the same worker
  • Fix broken pytorch setuptools incompatibility (force setuptools < 59 if torch is below 1.11)
  • Fix setuptools requirement issue by making sure that if we have "setuptools" in the original required packages, we preserve the line in the pip freeze list
  • Fix optional priority packages (always compare lowercase package names)
  • Fix potential requirements installation failure by making pygobject an optional package (i.e. if installation fails continue the Task package environment setup)
  • Fix repository URL contains credentials even when agent.force_git_ssh_protocol: true
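
The setuptools pin for older PyTorch versions can be pictured with a small sketch. This is not the agent's actual code: `parse_version` and `setuptools_constraint` are hypothetical helpers, and only the thresholds (setuptools < 59 when torch is below 1.11) come from the note above.

```python
import re


def parse_version(version: str) -> tuple:
    """Parse the leading numeric components, e.g. '1.10.2+cu113' -> (1, 10, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if not match:
        raise ValueError(f"unparsable version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def setuptools_constraint(torch_version: str):
    """Return a pip constraint for setuptools, or None when no pin is needed.

    Hypothetical helper: pins setuptools below 59 for torch versions below 1.11.
    """
    if parse_version(torch_version) < (1, 11):
        return "setuptools<59"
    return None
```

The tuple comparison treats `1.11.0` as not lower than `1.11`, so the pin applies only to releases strictly older than 1.11.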

v1.2.3

2 years ago

Bug Fixes

  • Fix PYTHONPATH is overwritten when executing a task (append to it instead)
  • Fix pytorch package is reinstalled when the same version is already installed
  • Fix copying configuration sets an empty worker name
  • Protect dynamic GPUs from failing to parse worker GPU index
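
The PYTHONPATH fix (append instead of overwrite) can be illustrated with a minimal sketch. The function name `extend_pythonpath` and the example paths are assumptions made for illustration and are not taken from the agent's source.

```python
import os


def extend_pythonpath(env: dict, new_path: str) -> dict:
    """Return a copy of env with new_path appended to PYTHONPATH.

    Existing entries are preserved and the new entry is added only once.
    """
    result = dict(env)
    existing = result.get("PYTHONPATH", "")
    parts = [p for p in existing.split(os.pathsep) if p]
    if new_path not in parts:
        parts.append(new_path)
    result["PYTHONPATH"] = os.pathsep.join(parts)
    return result
```

Working on a copy of the environment keeps the parent process untouched, which is usually what is wanted when preparing the environment for a child process.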

v1.2.2

2 years ago

Bug Fixes

  • Fix CLEARML_AGENT_SKIP_PIP_VENV_INSTALL fails to find python executable
  • Fix apt-get update failure causes apt-get install not to be executed

v1.2.1

2 years ago

New Features and Improvements

  • Update S3 bucket verify option for minio #83 (thanks @pshowbs!)
  • Add environment variable for request method #91 (thanks @mmiller-max!)
  • Add additional k8s-glue dockerfiles #94 (thanks @xadcoh!)
  • Update default docker image to nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04
  • Add support for custom docker image resolving using the agent.default_docker.match_rules configuration setting (see here)
  • Add agent.force_git_root_python_path configuration setting to force adding the git repository root folder to the PYTHONPATH (if set, the working directory is not added to the PYTHONPATH)
  • Add build --force-docker command line argument to allow ignoring task container data
  • Add agent.poetry_version configuration setting to specify poetry version (and force installation of poetry if missing, see here)
  • Add custom build script support
  • Add extra configurations when starting daemon
  • Add agent.package_manager.force_original_requirements configuration option, allowing to only use original requirements produced by local execution (note that using this configuration option prevents editing installed packages using the UI)
  • Add support for the CLEARML_AGENT_PROPAGATE_EXITCODE environment variable. Set this variable to 1 to allow ClearML Agent to return a nonzero exit code on failure
  • Update clearml-agent init (use app.clear.ml as default server, add git token references)
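
The effect of CLEARML_AGENT_PROPAGATE_EXITCODE can be sketched roughly as follows. The helper `agent_exit_code` is hypothetical, and the exact values the agent accepts may differ; the sketch only assumes the documented behavior that setting the variable to 1 lets a nonzero exit code through.

```python
def agent_exit_code(task_exit_code: int, env: dict) -> int:
    """Decide the exit code reported by a wrapper process.

    Assumption: only the value "1" enables propagation; anything else
    (including an unset variable) keeps the exit code at 0.
    """
    propagate = env.get("CLEARML_AGENT_PROPAGATE_EXITCODE", "").strip() == "1"
    return task_exit_code if propagate else 0
```

A scheduler that reacts to process exit codes (for example, a CI job) would then see task failures only when the variable is set.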

Bug Fixes

  • Fix virtualenv python interpreter used #98 (thanks @idantene!)
  • Fix typing package incorrectly required for Python>3.5 #103 (thanks @Honzys!)
  • Fix symbolic links not copied from cached VCS into working copy (on Windows, the link's content is copied instead of the original symbolic link) #89
  • Fix agent fails to check out code from main branch when branch/commit is not explicitly specified https://github.com/allegroai/clearml/issues/551
  • Fix git+git:// requirements
  • Fix default_python calculation (and verbosity)
  • Fix using deprecated abc support (Python 3.10 compatibility)
  • Fix no default value for CLEARML_API_DEFAULT_REQ_METHOD causes ValueError if not specified
  • Fix agent.hide_docker_command_env_vars mode to include URL passwords and handle environment vars containing docker commands
  • Fix conda package manager listed packages with local links (@ file://) should ignore the local package if it does not exist
  • Fix cuda patch version support in conda
  • Fix agent attempts to check out code when in standalone mode
  • Fix FORCE_LOCAL_CLEARML_AGENT_WHEEL environment variable handling when running from a Windows host
  • Fix user-provided " is unnecessarily replaced with \\"
  • Fix token is not propagated to docker in case credentials are not available
  • Fix PyTorch aarch64 and windows support
  • Fix VCS packages are reinstalled when the same commit version is already installed
  • Fix git packages are installed even if commit is given and is preinstalled when using cached virtual environment
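
The idea behind agent.hide_docker_command_env_vars (masking sensitive environment values and URL passwords before a docker command is logged) can be sketched as below. This is an illustrative approximation, not the agent's implementation: `mask_docker_command`, the `SENSITIVE_KEYS` set, and the mask string are all assumptions.

```python
import re

# Assumed example keys; the agent's real key list is not shown in these notes.
SENSITIVE_KEYS = frozenset({"CLEARML_API_SECRET_KEY", "AWS_SECRET_ACCESS_KEY"})

# Matches the password part of scheme://user:password@host
URL_PASSWORD = re.compile(r"(?P<prefix>\w+://[^:/@\s]+:)(?P<password>[^@/\s]+)(?=@)")

MASK = "********"


def mask_docker_command(args, sensitive_keys=SENSITIVE_KEYS):
    """Return a copy of a docker argument list that is safe to print."""
    masked = []
    previous = None
    for arg in args:
        # Only inspect values that directly follow -e / --env.
        if previous in ("-e", "--env") and "=" in arg:
            key, value = arg.split("=", 1)
            if key in sensitive_keys:
                value = MASK
            else:
                value = URL_PASSWORD.sub(r"\g<prefix>" + MASK, value)
            arg = f"{key}={value}"
        masked.append(arg)
        previous = arg
    return masked
```

Only the printed copy is masked; the original argument list would still be used to launch the container.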

v1.1.2

2 years ago

Bug Fixes

  • This release fixes the six conflict with the new pathlib2 version 2.3.7 and up.

v1.1.1

2 years ago

Features and Bug Fixes

  • Add support for truncating task log file after reporting to server using agent.truncate_task_output_files configuration setting
  • Fix PyJWT resiliency support
  • Fix --stop checking default queue tag (#80)
  • Fix queue tag default does not exist and --queue not specified (try queue named "default")
  • Fix Python 3.5 compatibility
  • Fix Python 2.7 support for PyTorch

1.1.0

2 years ago

Breaking Changes

  • Disable default demo server (available by setting the CLEARML_NO_DEFAULT_SERVER=0 environment variable)
  • Change k8s glue default pod label to CLEARML=agent (instead of TRAINS=agent)

Features

  • Add poetry cache into docker mapping #74
  • Allow rewriting SSH URLs (see here), refers to #72, #42
  • Add docker environment arguments log masking support, customizable using the agent.hide_docker_command_env_vars configuration value (see here) #67
  • Add support for naming docker containers using the agent.docker_container_name_format configuration option to set a name format (disabled by default) https://github.com/allegroai/clearml/issues/412
  • k8s glue
    • Remove queue name from pod name, add queue name and ID to pod labels #64
    • Update task status_message for non-responsive or hanging pods
    • Support the agent.docker_force_pull configuration option for scheduled pods
    • Add docker example for running the k8s glue as a pod in a k8s cluster
  • Add agent.ignore_requested_python_version configuration option to ignore any requested python version (default false, see here)
  • Add agent.docker_internal_mounts configuration option to control containers internal mounts (non-root containers, see here)
  • Add support for -r requirements.txt in the Installed Packages section
  • Add support for CLEARML_AGENT_INITIAL_CONNECT_RETRY_OVERRIDE environment variable to override initial server connection behavior (defaults to true, allows boolean value or an explicit number specifying the number of connect retries)
  • Add support for CLEARML_AGENT_DISABLE_SSH_MOUNT environment variable allowing to disable the auto .ssh mount into the docker
  • Add support for CLEARML_AGENT_SKIP_PIP_VENV_INSTALL environment variable to skip Python virtual env installation on execute and allow providing a custom venv binary
  • Add support for CLEARML_AGENT_VENV_CACHE_PATH environment variable to allow overriding venv cache folder configuration
  • Add support for CLEARML_AGENT_EXTRA_DOCKER_ARGS environment variable to allow overriding extra docker args configuration
  • Add support for environment variables containing bash-style string lists using shlex
  • Add printout when using ClearML key/secret from environment variables
  • Increase worker keep-alive timeout to 10 minutes instead of 1 minute
  • Update documentation
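
Parsing bash-style string lists from environment variables with shlex can look like the sketch below. `shlex.split` is the standard-library call; the helper name `parse_env_list` and the example docker arguments are assumptions chosen for illustration.

```python
import shlex


def parse_env_list(env: dict, name: str):
    """Split a bash-style string from env[name] into a list of arguments.

    Returns None when the variable is not set, so callers can fall back
    to the configuration file value.
    """
    raw = env.get(name)
    if raw is None:
        return None
    return shlex.split(raw)
```

For example, a value such as `--ipc=host -v "/data/my files:/mnt/data"` is split into three arguments, with the quoted path kept as a single item despite the space.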

Bug Fixes

  • Fix auto mount SSH_AUTH_SOCK into docker #45
  • Fix package manager configuration documentation #78
  • Fix support for spaces in docker arguments https://github.com/allegroai/clearml/issues/358
  • Fix standalone script with pre-existing conda venv
  • Fix PyYAML v5.4, v5.4.1 versions not supported
  • Fix parsing VCS links starting with git+git@ (notice git+git:// was already supported)
  • Fix Python package with git+git:// links or git+ssh:// conversion
  • Fix --services-mode if the execute agent fails when starting to run with error code 0
  • Fix --stop with dynamic gpus
  • Fix support for unicode standalone scripts, changing default ascii encoding to UTF-8
  • Fix venv cache cannot reinstall package from git with http credentials
  • Fix PYTHONIOENCODING environment variable is overwritten when already defined
  • k8s glue
    • Fix support for multiple k8s glue instances with pod limits
    • Fix task container handling fails parsing docker image
    • Fix task container is not set when using default image/arguments
    • Fix task container image arguments are used when no image is specified
    • Fix task container arguments not supported when template is not provided
    • Fix agent.extra_docker_bash_script not applied correctly
    • Fix task runtime properties are removed when re-enqueuing task
    • Fix error is not thrown when failing to push task to queue
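
The PYTHONIOENCODING fix above (do not overwrite a value that is already defined) follows a common pattern, sketched here. The function `child_environment` is hypothetical, and the `utf-8` default is an assumption based on the UTF-8 note in the same list.

```python
def child_environment(parent_env: dict) -> dict:
    """Build the environment for a child process.

    PYTHONIOENCODING is set only when the user has not defined it already.
    """
    env = dict(parent_env)
    env.setdefault("PYTHONIOENCODING", "utf-8")  # assumed default value
    return env
```

Using `setdefault` rather than plain assignment is what preserves a user-supplied value.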

1.0.0

3 years ago

Features

  • Add conda and pip environment debug prints (using --debug)
  • Add support for PyJWT v2
  • Change the default conda channel order, so it pulls the correct pytorch package
  • Improve k8s glue support
    • Support k8s glue container env vars merging
    • Add number of pods limit to k8s glue using the max_pods_limit argument (use --max-pods switch in the k8s glue example)
    • Add k8s glue default restartPolicy=Never to template to prevent pods from restarting
  • Add --stop switch support for dynamic gpus
  • Verify docker command exists when running in docker mode
  • Add support for terminating dockers on sig_term in dynamic mode
  • Add stopping message on Task process termination
  • Add agent.docker_install_opencv_libs configuration option to enable automatic opencv libs install for faster docker spin-up (default: true, see here)
  • Add support for new container base setup script feature
  • Bump virtualenv dependency version (support v>=16,<21)
  • Add support for dynamic gpus opportunistic scheduling (with min/max gpus per queue)
  • Deprecate venv_update in configuration (replaced by the more robust venvs_cache)
  • Add Python 3.9 to the support table

Bug Fixes

  • Fix agent can return non-zero error code and pods will end up restarting forever #56
  • Fix poetry support #57
  • Fix cuda version from driver does not return minor version
  • Fix requirements local path replace back when using cache
  • Fix k8s glue
    • Fix broken k8s glue docker args parsing
    • Fix empty env prevents override when merging template
  • Fix venv cache crash on bad symbolic links
  • Fix no docker arguments provided

0.17.2

3 years ago

Features

  • Add virtual environment caching
    • Supports venv caching both in standard and docker mode
    • Configurable using the agent.venvs_cache configuration section
    • Disabled by default, enable here
  • Add support for --services-mode with venvs
  • Add agent.force_git_ssh_user configuration value (default git, see here) #42
  • Add agent.ignore_requested_python_version configuration option for multi python environments (default false)
  • Add agent.enable_task_env configuration option to set the OS environment based on the Environment section of the Task (default false, see here)
  • K8s glue
    • Add support for detecting and deleting k8s pods that fail to start
    • Allow providing namespace in k8s glue and k8s glue example
    • Add base-pod-number parameter to k8s glue and example
  • Change agent.default_docker.image to nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 (see here)
  • Use shared git cache for multiple agents on the same machine
  • Upgrade pynvml, add detection of CUDA version from driver level
  • Update agent and services docker files
  • Update documentation

Bug Fixes

  • Fix docker --network returns None
  • Fix docker mode without venvs cache dir
  • Fix applying git diff on a newly added file
  • Fix environment variables CLEARML_WEB_HOST/CLEARML_FILES_HOST not passed to running tasks (or updated on the config object)
  • Fix --detached command line option not supported on Windows (ignore and issue warning)
  • Fix file not found error (errno 2) interpreted as aborted (i.e. Ctrl-C)
  • Fix from clearml runtime diff patching
  • Fix cache to take cuda version into account
  • Fix CPU mode
  • Fix multi instances on Windows
  • Fix conda support for git+http links
  • Fix k8s glue does not pass docker environment variables, remove deprecated flags
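
Why a virtual environment cache should take the CUDA version into account can be seen from a small sketch of a cache key. This is a hypothetical illustration, not the agent's caching scheme: `venv_cache_key` and the choice of SHA-256 are assumptions.

```python
import hashlib


def venv_cache_key(requirements, python_version: str, cuda_version: str) -> str:
    """Derive a cache key from everything that affects the installed packages.

    Including the CUDA version keeps environments built for different CUDA
    releases from being reused for one another.
    """
    normalized = sorted(line.strip() for line in requirements if line.strip())
    payload = "\n".join(
        [f"python={python_version}", f"cuda={cuda_version}", *normalized]
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Two tasks with identical requirements but different CUDA versions then map to different cache entries, while reordering the requirement lines does not change the key.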

0.17.1

3 years ago

ClearML-Agent (formerly allegro trains-agent)

Features and Bug Fixes

  • Fix support for pip virtual-environment on Windows
  • Fix support for conda using repository requirements.txt (empty "Installed Packages" section)