ClearML Agent Versions

ClearML Agent - ML-Ops made easy. ML-Ops scheduler & orchestration solution

v1.8.0

1 month ago

New Features

  • Add CLEARML_AGENT_FORCE_POETRY environment variable to allow forcing poetry even when using pip requirements manager
  • Add CLEARML_AGENT_FORCE_TASK_INIT environment variable to allow runtime patching of script even if no repository is specified and the code is running a preinstalled docker
  • Improve venv cache handling:
    • Add FileLock readonly mode, default is write mode (i.e. exclusive lock, preserving behavior)
    • Venv cache now uses a readonly lock when copying folders from the venv cache into the target folder. This enables multiple concurrent readers with a single writer
    • Do not lock the cache folder if we do not need to delete old entries
    • Add agent.venvs_cache.lock_timeout to control the venv cache folder lock timeout (in seconds, default 30)
  • Add protection for truncate() call
  • Move configuration sanitization settings to the default config file
  • Add queue ID report before pulling task
  • Improve GPU monitoring for MIGs
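
The readonly/write lock distinction in the venv cache items above can be illustrated with a minimal sketch. It assumes a POSIX system and uses the standard `fcntl.flock` call; it is not the agent's FileLock implementation, and the helper name `try_lock` and the lock file path are hypothetical.

```python
import fcntl
import os
import tempfile

def try_lock(path, readonly=False):
    """Open `path` and try to take a non-blocking flock on it.

    Returns the open file object on success, or None if the lock is
    currently held in a conflicting mode.
    """
    # Shared lock for readers, exclusive lock for writers (the default).
    mode = fcntl.LOCK_SH if readonly else fcntl.LOCK_EX
    handle = open(path, "a+")
    try:
        fcntl.flock(handle, mode | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    return handle

# Hypothetical lock file standing in for the cache folder lock.
lock_path = os.path.join(tempfile.mkdtemp(), "cache.lock")

reader_a = try_lock(lock_path, readonly=True)
reader_b = try_lock(lock_path, readonly=True)  # a second reader is admitted
writer = try_lock(lock_path)                   # refused while readers hold the lock

for handle in (reader_a, reader_b):
    handle.close()                             # closing releases the shared locks

writer_after = try_lock(lock_path)             # the exclusive lock now succeeds
```

A real implementation would block up to a timeout instead of failing immediately, which is the role the notes give to agent.venvs_cache.lock_timeout.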

Bug Fixes

  • Use correct Python version in Poetry init (#179, thanks @nfzd!)
  • Fix queue handling in K8sIntegration and k8s_glue_example.py (#183, thanks @FeU-aKlos!)
  • Fix FileNotFoundException crash in find_python_executable_for_version (#192, thanks @ae-ae!)
  • Fix temp console pipe log files not being deleted after Task execution is completed (important for long-running services agents, to avoid accumulating temp files on the host machine)
  • Fix agent.enable_git_ask_pass does not show in configuration dump
  • Fix pip is returned as a pip version if no value exists in agent.package_manager.pip_version
  • Fix Python 3.12 support by removing distutils imports
  • Fix IOError on file lock when using shared folder
  • Fix torch resolver settings applied to PytorchRequirement instance are not used
  • Fix comment lines (starting with #) are not ignored in docker startup bash script
  • Fix dynamic GPU sometimes misses the initial print

v1.7.0

4 months ago

New Features

  • Add agent.docker_args_extra_precedes_task and agent.protected_docker_extra_args configuration settings to prevent the same switch from being used by both agent.extra_docker_args and a Task's docker args
  • Add agent.resource_monitoring.disk_use_path configuration option to allow monitoring a different volume than the one containing the home folder
  • Change default agent.enable_git_ask_pass to true
  • Add example and support for pre-built containers including services-mode support with overrides CLEARML_AGENT_FORCE_CODE_DIR and CLEARML_AGENT_FORCE_EXEC_SCRIPT
  • Add CLEARML_AGENT_SERVICE_TASK=1 environment variable in case we're running a service task
  • Add CLEARML_AGENT_TEMP_STDOUT_FILE_DIR to allow specifying temp dir used for storing agent log files and temporary log files (daemon and execution)
  • Update GPU stats and pynvml support
  • Add git clone verbosity using CLEARML_AGENT_GIT_CLONE_VERBOSE environment variable
  • k8s glue
    • Add status reason when aborting before moving to k8s_scheduler queue
    • When cleaning up pending pods, verify task is still aborted and pod is still pending before deleting the pod
    • Set worker ID in k8s pod execution

Bug Fixes

  • Fix agent.package_manager.poetry_install_extra_args are used in all Poetry commands and not just in install (#173)
  • Fix if process return code is SIGKILL (-9 or 137) and abort callback was called, do not mark as failed but as aborted
  • Fix agent.git_host setting will cause git@domain URLs to not be replaced by SSH URLs since furl cannot parse them to obtain host
  • Fix an environment variable that should be set with a numerical value of 0 (i.e. end up as "0" or "0.0") is set to an empty string
  • Fix agent.package_manager.extra_index_url URLs are not sanitized in configuration printout
  • Fix recursion issue when deep-copying a session
  • k8s glue
    • Fix k8s glue configuration might be contaminated when changed during apply
    • Fix KeyError if container does not contain the arguments field
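
The SIGKILL fix above can be read as a small decision rule. The following sketch assumes a POSIX system; the function name `final_status` and the status strings are hypothetical and only illustrate the rule as the note describes it, not the agent's actual code.

```python
import signal

# Return codes that indicate the process was killed by SIGKILL:
# -9 as reported by subprocess, or 137 (128 + 9) as reported by a shell.
SIGKILL_CODES = {-int(signal.SIGKILL), 128 + int(signal.SIGKILL)}

def final_status(return_code, abort_requested):
    """Map a process return code to a (hypothetical) task status string."""
    if return_code == 0:
        return "completed"
    # A kill that follows an abort request is an abort, not a failure.
    if return_code in SIGKILL_CODES and abort_requested:
        return "aborted"
    return "failed"
```

Under this rule a kill with no preceding abort request is still reported as a failure.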

v1.6.1

8 months ago

Bug Fixes

  • Fix requests requirement lower constraint breaks backwards compatibility for Python 3.6

v1.6.0

8 months ago

New Features

  • Upgrade requests library (https://github.com/allegroai/clearml-agent/pull/162, thanks @jday1!)
  • Add support for controlling PyTorch resolving mode using the CLEARML_AGENT_PACKAGE_PYTORCH_RESOLVE environment variable and agent.package_manager.pytorch_resolve configuration setting with none (no resolving), pip (sets extra index based on cuda and lets pip resolve) or direct (the previous parsing algorithm that does the matching and downloading), default is pip (#152)
  • Add backwards compatibility in standalone mode using the CLEARML_AGENT_STANDALONE_CONFIG_BC environment variable
  • Add CLEARML_AGENT_DOCKER_AGENT_REPO alias for the FORCE_CLEARML_AGENT_REPO environment variable
  • Show a better message for agent init when an existing clearml.conf is found
  • Add support for task field injection into container docker name using the agent.docker_container_name_format_fields configuration setting
  • Add support for adding additional labels to docker containers using the CLEARML_AGENT_EXTRA_DOCKER_LABELS environment variable
  • Add support for setting file mode in files applied by the agent (using the files configuration option) using the mode property
  • Add support for skipping agent pip upgrade in the default k8s pod container bash script using the CLEARML_AGENT_NO_UPDATE environment variable
  • Add support for additional pip install flags when installing dependencies using the CLEARML_EXTRA_PIP_INSTALL_FLAGS environment variable and agent.package_manager.extra_pip_install_flags configuration option
  • Add support for extra docker arguments referencing machines environment variables using the agent.docker_allow_host_environ configuration option, allowing users to use $ENV in the task docker arguments (e.g. -e HOST_NAME=$HOST_NAME)
  • Add support for k8s jobs execution (as opposed to only pods)
  • Update default docker image versions
  • Add Python 3.11 support
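
To illustrate what referencing the machine's environment variables in docker arguments means, here is a minimal sketch built on the standard `os.path.expandvars`. The function `expand_docker_args` and its `allow_host_environ` flag are hypothetical stand-ins for the agent.docker_allow_host_environ option; the agent's own expansion logic may differ.

```python
import os
import shlex

def expand_docker_args(args, allow_host_environ=False):
    """Split a docker argument string and optionally expand $VARS from the host."""
    tokens = shlex.split(args)
    if not allow_host_environ:
        return tokens  # leave $VARS untouched
    return [os.path.expandvars(token) for token in tokens]

os.environ["HOST_NAME"] = "worker-01"  # example value set only for this demo
print(expand_docker_args("-e HOST_NAME=$HOST_NAME", allow_host_environ=True))
# prints ['-e', 'HOST_NAME=worker-01']
```

Note that `os.path.expandvars` leaves references to undefined variables unchanged.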

Bug Fixes

  • Fix git+ssh:// links inside installed packages not being properly converted to authenticated https:// and vice versa
  • Fix pip version required in the "Installed Packages" is now preserved and reinstalled
  • Fix various agent paths not loaded correctly if an empty string or null is used (should be disabled, not converted to .)
  • Fix docker container backwards compatibility for API<2.13
  • Fix default docker match rules resolver (used incorrect field "container" instead of "image")
  • Fix task docker argument might be passed twice (might cause an error with flags such as --network and --ipc)
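
The git+ssh:// to authenticated https:// conversion mentioned above could look roughly like the sketch below. It is a simplified illustration using `urllib.parse`; the function name `ssh_to_https` is hypothetical, scp-style links (git@host:org/repo) are not handled, and any SSH port is dropped.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def ssh_to_https(link, user, token):
    """Rewrite git+ssh://git@host/path as git+https://user:token@host/path."""
    parts = urlsplit(link)
    if parts.scheme != "git+ssh":
        return link  # leave every other link untouched
    # Percent-encode credentials so characters such as '/' or '@' stay valid.
    credentials = "{}:{}".format(quote(user, safe=""), quote(token, safe=""))
    netloc = "{}@{}".format(credentials, parts.hostname)
    return urlunsplit(("git+https", netloc, parts.path, parts.query, parts.fragment))
```

The path, including any @revision suffix, and the #egg fragment are carried over unchanged.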

v1.5.2

1 year ago

New Features and Improvements

  • Switch services agent entrypoint shell from sh to bash (#141, thanks @InCogNiTo124!)
  • Improve poetry support
    • Add poetry cwd support (#142, thanks @nielstenboom!)
    • Add agent.package_manager.poetry_install_extra_args configuration option
  • Do not allow request exceptions (keep retrying, throw error only on the initial login call)

Bug Fixes

  • Fix agent update version (#132, thanks @achaiah!)
  • Fix login uses GET with payload which breaks when trying to connect a server running in GCP
  • Fix clearml-agent build --docker stuck on certain containers
  • Fix build fails when target is relative path
  • Fix pinging running task (change default to once a minute)
  • Fix _ is allowed in k8s label names
  • Fix k8s glue does not delete pending pods if the tasks they represent were aborted
  • Reintroduce CLEARML_AGENT_SERVICES_DOCKER_RESTART accidentally reverted by a previous merge
  • Fix git+ssh:// links inside installed packages not being converted properly to HTTPS-authenticated links

v1.5.1

1 year ago

New Features and Improvements

  • Upgrade requirements for attrs, jsonschema, pyparsing, six and pyjwt (#129)
  • Add default output URI selection to clearml-agent init
  • Add agent.disable_task_docker_override configuration option to disable docker override specified in executing tasks
  • Add CLEARML_AGENT_FORCE_SYSTEM_SITE_PACKAGES env var (default true) to allow overriding default system_site_packages: true behavior when running tasks in containers (docker mode and k8s-glue)

Bug Fixes

  • Fix using deprecated types validator argument raises an error (deprecated even before jsonschema 3.0.0 and unsupported since 4.0.0)
  • Fix pip support allowing multiple pip version constraints (by default, one for < Python 3.10 and one for >= Python 3.10)
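
Selecting a pip constraint by interpreter version, as in the last fix, can be sketched as follows. The constraint values in the table are hypothetical placeholders, not the agent's defaults, and `pip_requirement` is an illustrative name.

```python
import sys

# Hypothetical constraint table: (pip version spec, predicate on the Python version).
PIP_CONSTRAINTS = [
    ("<21", lambda v: v < (3, 10)),
    ("<23", lambda v: v >= (3, 10)),
]

def pip_requirement(python_version=None):
    """Return the pip requirement string matching the given (major, minor) version."""
    version = tuple(python_version or sys.version_info[:2])
    for spec, applies in PIP_CONSTRAINTS:
        if applies(version):
            return "pip" + spec
    return "pip"  # no constraint matched
```

Calling `pip_requirement()` with no argument uses the running interpreter's version.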

v1.5.0

1 year ago

New Features and Improvements

  • Add option to crash agent on exception using agent.crash_on_exception configuration setting (#123, thanks @nielstenboom!)
  • Improve venv cache disabled message
  • Upgrade packages for better Python 3.10 support
  • Remove future package dependency (Python 2 is not supported for clearml-agent)
  • Change default pip version used to pip<21 for better Python 3.10 support
  • Add support for operator != in package version (mostly for better pytorch resolving)
  • Add support for PyTorch new extra_index_url repo (find the correct index url based on the cuda version, and let pip do the rest)
  • Make venv caching the default behavior
  • Add support for CLEARML_AGENT_DOCKER_ARGS_HIDE_ENV environment variable (see agent.hide_docker_command_env_vars config option)
  • Ping executing tasks to make sure the server does not consider them stale (set using the agent.task_ping_interval_sec configuration option, defaults to every 120 seconds)

Bug Fixes

  • Fix docker extra arguments showing up in configuration printout
  • Fix an issue with running on Python 3.10 / 3.11
  • Fix cached git token prevents cloning repository (using agent.enable_git_ask_pass forcing the agent to use GIT_ASKPASS for user/password when cloning/fetching repositories)
  • Fix setting CLEARML_API_DEFAULT_REQ_METHOD raises an error
  • Fix get_task_session() may cause an old copy of the APIClient to be used containing a reference to the previous session
  • K8s Glue
    • Fix agent.system_site_packages is not turned on by default in k8s glue
    • Make sure git_user/pass is passed to the task pod
    • Remove support for kubectl run

v1.4.1

1 year ago

Improvements

  • Add warning if venv cache is disabled
  • Add agent.disable_ssh_mount configuration option (same as the CLEARML_AGENT_DISABLE_SSH_MOUNT environment variable)

Bug Fixes

  • Fix docker command for monitoring child agents
  • Fix --gpus all not reporting GPU stats on worker machine

v1.4.0

1 year ago

New Features and Improvements

  • Add support for MIG devices (use 0:1 for GPU 0 slice 1, or use 0.1)
  • Add agent.enable_git_ask_pass to improve passing user/pass to git commands
  • Add docker ssh_ro_folder (default: /.ssh) and change docker ssh_folder (default: ~/.ssh)
  • Allow overriding pytorch lookup page (See torch_page, torch_nightly_page and torch_url_template_prefix under the agent.package_manager configuration settings)
  • Add support for abort callback registration
  • K8s glue
    • Add CLEARML_K8S_GLUE_START_AGENT_SCRIPT_PATH environment variable to allow customizing the agent startup script location
    • Add debug environment variable CLEARML_AGENT_DEBUG_INFO
    • Add CLEARML_AGENT_CHILD_AGENTS_COUNT_CMD environment variable to allow overriding child agent count command in k8s
    • Refactor template handling
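
The MIG notation from the first item (0:1 or 0.1 for GPU 0, slice 1) can be parsed with a short helper. This is a sketch only; `parse_gpu_spec` is a hypothetical name and the agent's own parsing may accept other forms.

```python
import re

def parse_gpu_spec(spec):
    """Return (gpu_index, mig_slice); mig_slice is None for a whole GPU."""
    # Accept "N", "N:M" or "N.M".
    match = re.fullmatch(r"(\d+)(?:[:.](\d+))?", spec.strip())
    if match is None:
        raise ValueError("unrecognised GPU spec: {!r}".format(spec))
    gpu, mig = match.groups()
    return int(gpu), (int(mig) if mig is not None else None)
```

For example, `parse_gpu_spec("0:1")` and `parse_gpu_spec("0.1")` both yield `(0, 1)`.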

Bug Fixes

  • Fix Python 3.10+ support
  • Fix use_credentials_chain is missing in config file example
  • Fix Git PAT messages
  • Fix home folder in clearml.conf to ~ (instead of /root)
  • Fix docker mode uses ~/.clearml/venvs-builds as default for easier user-mode containers
  • Fix package @ file:// with quoted (URL style) links should not be ignored
  • Fix name not escaped as regex (all services "get_all" use regex for name)
  • Fix second .ssh temp mount fails if container changes the files inside
  • Fix GCP load balancer does not forward GET request body (allow changing default request action to PUT/POST/GET. See api.http.default_method or CLEARML_API_DEFAULT_REQ_METHOD)
  • K8s glue
    • Fix resolving k8s pending queue may cause a queue with a UUID name to be created
    • Fix template namespace should override default namespace
    • Fix extra_bash_init_cmd location in initial bash script
    • Fix debug mode
  • Fix documentation (#117)
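
The regex-escaping fix above addresses a common pitfall: a name passed unescaped to a regex-based lookup can match unrelated entries. The sketch below shows the effect with Python's `re` module; the sample names and the `exact_name_pattern` helper are hypothetical, and anchoring with ^ and $ is an assumption about how an exact-name query would be built.

```python
import re

def exact_name_pattern(name):
    """Build a regex that matches `name` literally and nothing else."""
    return "^" + re.escape(name) + "$"

names = ["gpu.queue", "gpuXqueue", "gpu.queue (copy)"]

# Unescaped: '.' acts as a wildcard, so "gpuXqueue" also matches.
unescaped = [n for n in names if re.search("^gpu.queue$", n)]

# Escaped: only the literal name matches.
escaped = [n for n in names if re.search(exact_name_pattern("gpu.queue"), n)]
```

Here `unescaped` contains two names while `escaped` contains only `gpu.queue`.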

v1.3.0

1 year ago

New Features and Improvements

  • Support private repos from requirements.txt file (#107, thanks @nielstenboom!)
  • Bump PyJWT version due to "Key confusion through non-blocklisted public key formats" vulnerability
  • Add support for additional command line arguments in k8s glue example
  • Add Python 3.10 support

Bug Fixes

  • Fix git unsafe directory issue (disable check on cached vcs folder)
  • Fix dynamic GPUs with "all" GPUs on the same worker
  • Fix broken pytorch setuptools incompatibility (force setuptools < 59 if torch is below 1.11)
  • Fix setuptools requirement issue by making sure that if we have "setuptools" in the original required packages, we preserve the line in the pip freeze list
  • Fix optional priority packages to always compare lowercase package names
  • Fix potential requirements installation failure by making pygobject an optional package (i.e. if installation fails continue the Task package environment setup)
  • Fix repository URL contains credentials even when agent.force_git_ssh_protocol: true
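
The setuptools rule from the list above (force setuptools < 59 if torch is below 1.11) amounts to a version comparison. A minimal sketch, with hypothetical helper names and a deliberately simple version parser in place of a full version-parsing library:

```python
def version_tuple(version):
    """Parse leading numeric components, e.g. '1.10.2+cu113' -> (1, 10, 2)."""
    parts = []
    for piece in version.split("+")[0].split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric component
        parts.append(int(piece))
    return tuple(parts)

def setuptools_requirement(torch_version):
    """Pin setuptools below 59 for torch versions older than 1.11."""
    if version_tuple(torch_version) < (1, 11):
        return "setuptools<59"
    return "setuptools"
```

Comparing tuples rather than strings avoids treating 1.9 as newer than 1.11.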