Toil Versions

A scalable, efficient, cross-platform (Linux/macOS) and easy-to-use workflow engine in pure Python.

releases/6.0.0

4 months ago

NOTE!

We now have a config file! https://toil.readthedocs.io/en/latest/running/cliOptions.html#the-config-file

Breaking Changes

  • Removed the parasol batch system
  • Removed the TES batch system (this is now a plugin)
  • Removed our WDL compiler in favor of an interpreter (we still support WDL, we just do it differently now)
  • We no longer support Python 3.7

CWL

  • Support CWL 1.2.1 (#4682)
  • CWL Pipefish compatibility (#4636)
  • Support per-task preemptibility in CWL (#4551)
  • Fix configargparse in CWL (#4618)
  • cwl: use the latest commit from the proposed CWL v1.2.1 branch (#4565)
  • Upgrade cwltool to avoid broken galaxy-tool-util release. (#4639)
  • Implement a better config file system for CWL/WDL options (#4666)
  • Allow working with remote files in CWL and WDL workflows (#4690)
  • Make cwl mutually exclusive groups exist only when cwl is not suppressed (#4725)
  • Log more usefully for CWL workflows (#4736)

WDL

  • Simplify WDL Toil job graphs (#4524)
  • More WDL and Slurm documentation (#4558)
  • Improve WDL documentation (#4732)
  • Add String to File functionality into toil-wdl-runner (#4589)
  • Run WDL output through Toil export system to support URIs (#4579)
  • Allow the WDL output section to reference itself (#4592)
  • Ensure sibling files in toil-wdl-runner (#4610)
  • Make WDLOutputJob collect all task outputs (#4602)
  • Report errors in WDL using MiniWDL's error location printer (#4637)
  • Remove the WDL compiler. (#4679)
  • Implement a better config file system for CWL/WDL options (#4666)
  • Allow working with remote files in CWL and WDL workflows (#4690)
  • Strip leading whitespace from WDL commands (#4720)

Misc

  • Add config file support (#4569)
  • Support Python 3.11 and drop Python 3.7 (#4646)
  • Move TES batch system to a plugin (#4650)
  • Turn batch system tests back on (#4649)
  • Separate out integration tests to run on a schedule (#4612)
  • Avoid concurrent modification in cluster scaler tests (#4600)
  • Remove old buckets from AWS (#4588)
  • Tests: only request a single core (#4572)
  • Reduce the number of assert statements (#4590)
  • take any nvidia-smi exception as not having gpu (#4611)
  • More resiliency (#4395)
  • Remove usage of the deprecated pkg_resources (#4701)
  • Make sure cwltool always knows we have an outdir to fix #4698 (#4699)
  • AWS jobStoreTest: re-use delete_s3_bucket from toil.lib.aws (#4700)
  • Only count output file usage when using the file store (#4692)
  • Remove the parasol batch system. (#4678)
  • Move around reqs and move aws dev libraries to aws (#4664)
  • Make sure the --batchLogsDir exists if it is set (#4635)
  • Update EC2 instances and EC2 update script. (#4745)
  • remove extraneous dependency on old 'mock' (#4739)
  • Point CI at the new public URLs for stuff we host
  • Add __init__.py to options folder (#4723)

Bug Fixes

  • Lower redirect log level to fix #4526 (#4578)
  • Fix mypy from being broken by new boto types (#4577)
  • Fix CI on local Gitlab runners (#4571)
  • Banish ghost jobs (#4563)
  • Stop deleting chained-to jobs which fail as orphaned jobs (#4557)
  • Fix pickling error when jobstate file doesn't exist and fix threading error when lock file exists then disappears (#4575)
  • Fix #3867 and try to explain but not crash when bad things happen to our mutex file (#4656)
  • Fix CI Appliance Builds (#4655)
  • Tolerate a failed AMI polling attempt (#4727)
  • Add pure Python fallback for getDirSizeRecursively() (#4753)
  • Don't mark inputs (or outputs) executable for no reason (#4728)
  • Fix scheduled CI tests (#4742)
  • Fix --printJobInfo (#4709)

Thank you to our contributors: @stxue1 , @w-gao, @DailyDreaming , @mr-c , @adamnovak , @glennhickey, @misterbrandonwalker, and @a-detiste !

releases/5.12.0

9 months ago

WDL

  • Virtualize filenames as in-container paths from point of view of WDL command (#4527)
  • Add WDL conformance tests to CI (#4530)
  • Use less memory in the Giraffe WDL test (#4541)

Version Upgrades

  • Upgrade to cwltool 3.1.20230601100705 (#4500)
  • Update mock requirement from <5,>=4.0.3 to >=4.0.3,<6 (#4366)

Misc

  • Anonymous access to Google Storage (#4518)
  • Reorder config so that default settings are applied first (#4528)
  • Add a way to forward accelerators to Docker containers (#4492)

Bug Fixes

  • Fix test failures without docker installed (#4544)
  • Prevent certain tests from being run twice in CI (#4529)
  • Drop external Docker builder (#4523)
  • Fix CI lint test (#4533)
  • Grab AWS group policies on top of user (#4505)
  • Grab accelerator set off the end of the list instead of by index (#4506)
  • Fix RtD build (#4491)
  • Include tests (#4499)

Thank you to our contributors: @stxue1 , @DailyDreaming , @mr-c , @adamnovak , and @tjni !

releases/5.11.0

11 months ago

Breaking Changes

  • Imported files will be symlinked by default, unless the user sets --noLinkImports or the workflow imports with symlink=False. (#3949)
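
The practical difference between a linked and a copied import can be sketched in plain Python. This is an illustration only, not Toil's implementation: `import_file` is a hypothetical stand-in, and its `symlink` parameter merely mirrors the `symlink=False` keyword mentioned above.

```python
import os
import shutil


def import_file(src: str, dest_dir: str, symlink: bool = True) -> str:
    """Hypothetical helper: place src into dest_dir by link or by copy."""
    dest = os.path.join(dest_dir, os.path.basename(src))
    if symlink:
        # Linking is cheap, but the import stays tied to the original file.
        os.symlink(os.path.abspath(src), dest)
    else:
        # Copying costs time and disk space, but the import is independent.
        shutil.copyfile(src, dest)
    return dest
```

With linking as the default, a workflow that later moves or edits the source file would see the change through the link, which is the usual reason to opt out.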

WDL

  • Toil will now stop if it encounters an error polling a possible import URL for a WDL workflow input file. (#4479)
  • WDL workflows will be protected against imported files with no basenames. (#4477)

Misc

  • Toil batch system ID numbers for issued jobs now start at 1. (#4482)
  • Attempts to import files from URLs when the implementing job store is missing an extra are now better reported. (#4479)
  • Include tests in the source distribution that gets published to PyPI (#4499)

Bug Fixes

  • Toil should no longer crash when a delete wins a race against a load in FileJobStore (#4484)
  • Prevent local root jobs (such as WDLRootJob) from being run twice. (#4482)
  • Slurm and other grid batch system jobs will now have more informative names (#4472)
  • WDL workflows can no longer import "" as a File. (#4477)

Thank you to our contributors: @stxue1, @DailyDreaming, @mr-c, @adamnovak

releases/5.10.0

1 year ago

Changelog

Highlighted Features Added

  • Add a --caching option which explicitly states whether to use caching with a workflow. Uses a default value depending on whether or not we are using the file job store if not specified. (#4218)
  • New prototype WDL runner python -m toil.wdl.wdltoil using MiniWDL (#3468)
  • MiniWDL-based WDL implementation can now run the vg Giraffe WDL workflow (#4353)
  • Toil now tests against our own tiny set of WDL conformance tests (#4351)
  • Toil can run the HPRC assembly WDL workflows (#4435)
  • Toil can now use Mesos roles (#4455)

Breaking Changes

  • Replace "preemptable" with "preemptible", add example of using --defaultPreemptible flag to Preemptibility documentation (#1951)

CWL

  • CWL: run all ExpressionTools on the Leader node, instead of submitting separate jobs (#4157)

Kubernetes

  • Kubernetes batch system: Delete jobs individually when batch delete fails (#3403)
  • Documentation for running a Toil leader for a Kubernetes workflow outside Kubernetes now covers examples and common problems for running CWL workflows (document toil-cwl-runner + "Running the Leader Outside Kubernetes" #3422)
  • Kubernetes batch system: support --maxCores, --maxDisk, and --maxMemory (#2864)
  • Add tutorial for Kubernetes launch cluster (#3743)

Dependencies

  • Require htcondor 10 exactly (#4315)
  • Toil jobs now have a local parameter which determines if they should run on the leader. (#4388)

Misc

  • The offline tests can now be run in parallel (#3493)
  • Code updated to be more idiomatic for Python 3.7 (#4295)
  • Support for a --network for toil launch-cluster for Google cloud (#4196)
  • Support for a --use_private_ip for toil launch-cluster to dial nodes by private IP instead of public IP (#4196)
  • GPU scheduling should now be supported on Slurm (#4308)
  • Toil now supports a --batchLogsDir option and TOIL_BATCH_LOGS_DIR environment variable, to provide a directory other than the work dir where Toil will instruct HPC batch systems to save their captured job logs.
  • htcondor batch system should now work again, and will retry connections
  • Updated the --coalesceStatusCalls help documentation to reflect the current state of https://github.com/DataBiosphere/toil/issues/4431 (#4437)
  • Toil no longer trusts XDG_RUNTIME_DIR under Slurm (fixes some of the issues behind #4395 when Slurm is configured not to follow the XDG spec) (#4435)
  • Toil now puts its lock files for Singularity cache directories for WDL in those directories (#4435)
  • Toil's WDL interpreter can now use local-to-the-leader jobs for evaluating WDL code that doesn't need appreciable resources (#4388)
  • Toil now tolerates more possible exceptions related to the panasas network file system (#4440)
  • Type hinting to functions in resource.py (#938)
  • Added return type to inVirtualEnv() in __init__.py (#938)
  • Added None checks to some function bodies (#938)
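
As a rough sketch of how the --batchLogsDir option and the TOIL_BATCH_LOGS_DIR environment variable listed above might combine, the helper below picks a log directory. The function name and the precedence order (option first, then environment variable, then work dir) are assumptions made for illustration, not Toil's actual code.

```python
import os
from typing import Mapping, Optional


def resolve_batch_logs_dir(
    cli_value: Optional[str],
    work_dir: str,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Hypothetical helper: choose where captured batch job logs should go."""
    # Assumed precedence: explicit option, then environment, then the work dir.
    return cli_value or env.get("TOIL_BATCH_LOGS_DIR") or work_dir
```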

Bug Fixes

  • Stop crashing when predefined batch job exit reasons are used and need to go into the message bus log file (#4321)
  • Added import subprocess to restore the behavior of #588. (#4429)
  • Toil will no longer use the stored message bus path from an old execution of a workflow when deciding where to save the message bus log when restarting a workflow (#4438)
  • Fix --custom-net mutual exclusivity bug. (#4458)

Thank you to our contributors: @stxue1 , @DailyDreaming , @mr-c , @adamnovak , @jfennick , @misterbrandonwalker , @w-gao , @stephanaime , @glennhickey , @Hexotical , @manabuishii , @gmloose , @boukn , and @thiagogenez !

releases/5.9.2

1 year ago

Changelog

Bug Fixes

  • Change build tag import (#4329)

Thank you to our contributors: @adamnovak , @Hexotical !

releases/5.9.0

1 year ago

Changelog

Bug Fixes

  • Fix --provisioner and --metrics together (#4328)
  • Ignore incorrect type hint from boto3, remove json.loads (#4330)
  • Warn about missing --bypass-file-store with in-place update (#4337)
  • Replace prepareHTSubmission with prepareSubmission in HTCondor (#4319)
  • Merge "Google fixes" (#4293)
  • Support (only) current htcondor (#4320)
  • Delete k8s jobs individually when batch delete fails (#4306)

Misc

  • Update aws spot documentation (#4310)
  • Enable parallel testing (#3493)
  • Add documentation for running CWL workflows on non-Toil-managed Kubernetes clusters (#4332)
  • Export all slurm args by default (#4237)
  • Allow for subclasses of base types in messages (#4322)
  • Non cache default (#4299)

Dependencies

  • Bump mypy from 0.982 to 0.991 (#4345)
  • Bump schema-salad>=8.4.20230128170514,<9 to schema-salad>=8.3.20220913105718,<8.4 (#4342) (#4341)
  • Bump cwltool from 3.1.20221008225030 to 3.1.20221201130942 (#4338)
  • Bump pyupgrade to 3.7 (#4295)

Thank you to our contributors: @adamnovak , @Hexotical , @w-gao, @mr-c , @gmloose , @boukn , and @thiagogenez !

releases/5.8.0

1 year ago

Changelog

Highlighted Features Added

  • Toil server now exposes workflow tasks via WES (#4046).
  • Toil server now has a --wes_dialect agc option that will hide any tasks that don't have Amazon Batch job IDs, and put the IDs in the task names for those that do (#4047).
  • Toil jobs now accept an accelerators requirement, like accelerators=1 or accelerators={'kind': 'gpu', 'brand': 'nvidia', 'count': 2} (#4163)
  • Include total requested cores for each job type in toil stats (#4173)
  • Toil jobs now expose job.accelerators to workflow
  • Add prefix suffix params to AbstractFileStore.getLocalTempFile and AbstractFileStore.getLocalTempFileName (#4273)
  • CWL: --no-compute-checksum, --strict-cpu-limit, --disable-validate, and --fast-parser are now available
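
The two accelerator forms quoted above (a bare count and a dict) can be thought of as the same kind of request. The sketch below expands either form into a dict; `normalize_accelerator` is a hypothetical helper, and treating a bare count as "that many GPUs of any brand" is an assumption rather than documented behavior.

```python
from typing import Dict, Union

AcceleratorSpec = Dict[str, Union[str, int]]


def normalize_accelerator(spec: Union[int, AcceleratorSpec]) -> AcceleratorSpec:
    """Hypothetical helper: expand either accepted form into a full dict."""
    if isinstance(spec, int):
        # Assumption: a bare count means that many GPUs of any brand.
        return {"kind": "gpu", "count": spec}
    if "kind" not in spec or "count" not in spec:
        raise ValueError("accelerator dict needs 'kind' and 'count'")
    # Return a copy so callers cannot mutate the original requirement.
    return dict(spec)
```

Under these assumptions, `normalize_accelerator(1)` and `normalize_accelerator({'kind': 'gpu', 'brand': 'nvidia', 'count': 2})` both yield dicts with the same keys, which keeps downstream scheduling code to a single case.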

Breaking Changes

  • Toil's built-in autoscaler now guesses that some memory and disk space on nodes will not actually be available for jobs; pass --assumeZeroOverhead to revert to the old behavior (#2103)

CWL

  • CWL job unit and display names have been changed to make more sense as task names, and management of them has been unified into a CWLNamedJob. (#4046/#4047)
  • CWL CUDARequirement is parsed by cwltool and turned into a requirement for the minimum requested number of nvidia GPU accelerators (#3982)
  • fix false warning when outputSource contains only one None value (#4300)

Kubernetes

  • KubernetesBatchSystem can add nvidia.com/gpu and amd.com/gpu resource requests for jobs that request those accelerators (#4163)
  • KubernetesBatchSystem can request GPUs by model key, if nodes are labeled appropriately (#4163)

Dependencies

Misc

  • Toil WES server now accepts requests that leave out workflow_params. (#4037)
  • The MessageBus has been expanded to use pypubsub, and now has MessageInbox and MessageOutbox objects to represent connections to it. (#4046/#4047)
  • ToilMetrics now rides on the MessageBus rails. (#4046/#4047)
  • Toil workflows now have a --writeMessages option, which takes a file to which a line-oriented stream of MessageBus messages will be written. Reading this file will allow you to recover the current state of the workflow. (#4046/#4047)
  • Add code for warning check to be used when launching cluster with AWS. (#3514)
  • Use a CI prebake image for gitlab testing. (#4185)
  • Toil clusters now have /var/tmp as the default temporary directory, since they often make large temporary files (#4148)
  • Adds basic testing for slurm using a slurm docker cluster by running sample workflows. (#3856)
  • Add message bus documentation (#4239)
  • SingleMachineBatchSystem can schedule nvidia GPU accelerators, limiting the concurrent jobs to no more than there are accelerators to support, and setting CUDA_VISIBLE_DEVICES in the tasks' environments to tell them which nvidia GPU(s) to use. (#4163)
  • AWSBatchBatchSystem can use AWS Batch's GPU resource to provide nvidia GPU accelerators (#4163)
  • Toil jobs no longer need to re-run after their child/followOn/service jobs in order to delete themselves. (#3188)
  • Message bus is now thread safe (#4276)
  • Docker build has been updated with new Aventer Mesos deb URL (fixes #4290)
  • docker binary in the container has been updated to that included in the Ubuntu repos (fixes #4282)
  • Singularity in the appliance has been updated to 3.10 which is >=3.9, for cgroups v2 support.
  • Base Ubuntu container image for the appliance has been updated to 22.04, which has a new enough libc for Debian's Singularity 3.10 debs.
  • Safer type usage checking for systems without boto3 installed
  • Tests are now more runnable post-installation. Temporary paths are not selected based upon the location of the tests themselves. (#4287)
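
Since SingleMachineBatchSystem sets CUDA_VISIBLE_DEVICES in each task's environment, a task can inspect that variable to see which GPUs it was given. A minimal sketch follows; the helper name is hypothetical, and treating an unset variable as "no GPUs assigned" is an assumption of this example.

```python
import os
from typing import List, Mapping


def visible_gpus(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the GPU identifiers listed in CUDA_VISIBLE_DEVICES, if any."""
    raw = env.get("CUDA_VISIBLE_DEVICES", "")
    # The variable holds a comma-separated list of device identifiers.
    return [part.strip() for part in raw.split(",") if part.strip()]
```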

Bug Fixes

  • Only use /var/run/user if XDG tells us we have it in our session. Otherwise we will try other places, including /run/lock/toil. (#4170)
  • toil destroy-cluster: terminate stopped instances when destroying the cluster (#4271)
  • fileJobStore: handle arbitrary os.link errors to work on some filesystems (#2232)

Thank you to our contributors!

releases/5.7.1

1 year ago

Changelog

Highlighted Features Added

  • AWS Batch Batch System (#3956)
  • AGC Integration (#4039) + More AGC integration (#4067) + AGC megabranch (#4113)
  • Scale TES to be able to run reasonably-sized workflows on Funnel on Kubernetes with the AWS job store (#3927)

CWL

  • Run CWL conformance tests via WES (#4052)
  • Implement and test CWL loadContents from URLs to fix #4125 (#4126)
  • Add CWL tests under ARM (#4038)
  • Cache results of cwltool version lookup (#4141)

Misc

  • SGE batch system change to support serial jobs. (#4022)
  • Performance testing for Graviton instances (#4123)
  • Stop waiting on hostpath volumes to exist (#4146)
  • Catch and warn about jobs going away too slowly on FileJobStore (#4149)
  • Add documentation for the type-checking hooks (#4117)
  • Pod murder bot (#4060)
  • Contrib hook scripts (#4105)
  • Allow newer google-cloud-storage (#4114)
  • Use environment variable to set parallel partition name (#4096)
  • Register pytest markers (#4103)
  • Mention --export=ALL for SLURM environments (#4100) (#4102)
  • Allow persisting workflow state in WES server across container recreation (#4082)
  • Change toil kill to use the job store shared file API to find pid.log (#4075)
  • Bring back kill loop in the single_machine batch system but with a timeout (#4070)
  • Reorganize Locking (#4059)
  • Add and test preemptability constraints (#4044)
  • Enhanced types (#3975)
  • Use an init process that reaps zombies on toil clusters (#3974)
  • Add launch cluster support for ARM (#3971)
  • Feat: square bracket to period separator (#4008)
  • Add AGC health check endpoint (#3997)
  • Tolerate and require typed Werkzeug (#4011)
  • Add more static URLs for Singularity debs (#4007)

Bug Fixes

  • Update WES set up docs (#4027)
  • Add real time logs (#4031)
  • Fail fast if Docker builder is missing (#4001)
  • Make Toil version be reported as a string in WES (#4013)
  • Fix assorted typos within assorted comments (#4023)
  • Make file store case insensitive (#4153)
  • Pre-lex commands for qsub (#4150)
  • Update Cactus and exclude broken networkx (#4107)
  • Make toil kill work when the leader is on another machine (#4084)
  • Wrong filename in output (#4139)
  • Tolerate a missing VersionID key to fix #4129 (#4130)
  • Only import from typing_extensions on old Python where we install it (#4090)
  • Allow missing username and fix Docker build (#4077)
  • Leave more time for concurrency measurement to fix #4012 (#4068)
  • Stop people asking for ARM Mesos clusters to fix #4057 (#4058)

Thank you to our contributors: @mr-c, @adamnovak, @w-gao, @jonathanxu18, @Hexotical, @gmloose, @kannon92, @douglowe, @gcapes, and @pmiddend!

releases/5.6.0

2 years ago

Changelog

Highlighted Features Added

  • Integrate ARM Docker builds to make multi-arch images. (#3802)
  • WES support and server mode. (#3779)
  • TES batch system prototype. (#3821)
  • Support for new resource syntax in PBSPro. (#3048)
  • Toil now looks for lost jobs every minute instead of every hour (#3948)

Breaking Changes

  • Remove --disableCaching's true/false argument. (#3869)

CWL

  • CWL helper jobs: better disk & memory requirements. (#3834)
  • CWL: safer test path generation for post-install testing. (#3818)
  • More detailed names, which show up in job names sent to BatchSystem schedulers. #3941 (was #3893)
  • If you use scatter and collect files, duplicates are correctly dealt with and renamed. (#3968)
  • CWL: at the end of a job, ask cwltool to cleanup (#3965)

Kubernetes

  • Assign Kubernetes jobs an explicit TTL. (#3936)
  • Wait for node creation. (#3934)

Dependencies

  • Enable Dependabot updates. (#3827)
  • Multiple consolidated dependabot updates. (#3851)
  • Update addict requirement from <2.3,>=2.2.1 to >=2.2.1,<2.5. (#3861)
  • Bump cwltool to 3.1.20211107152837. (#3833 #3866 #3909)
  • Bump cwltest from 2.1.20210626101542 to 2.2.20210901154959. (#3848)
  • Bump flake8 from 3.8.4 to 4.0.1. (#3847)
  • Allow more docker-py versions. (#3860)
  • Remove pyyaml dependency. (#3858)

Misc

  • Update cleanup script. (#3937)
  • Remove use of sys.maxsize. (#3824)
  • Spelling fixes. (#3814)
  • Add codeql-analysis for Python. (#3825)
  • Update jobstore function names. (#3809)
  • Move AMI functions to lib. (#3810)
  • Add "make pyupgrade" (py36-plus). (#3805)
  • Type hints. (#3930)
  • Coalesce status calls in slurm. (#3822)
  • Python logging takes format values as *args. (#3852)
  • Change quick test to a 10 minute timeout. (#3843)
  • Add make uninstall to makefile. (#3883)
  • Fix toil kill to find shared pid.log file (with unit test). #3941 (was #3932)
  • Update documentation. (#3947)
  • Remove remains of Travis. (#3976)

Bug Fixes

  • Stop checkpoints from being reissued multiple times. (#3931)
  • Don't consult LSF config when explicitly defining memory units. (#3820)
  • Robustly remove state dirs. (#3836)
  • Fix exception checking for exit_code. (#3830)
  • Use exitStatus instead of exitReason for batch exit type comparison. (#3839)
  • Update cwltest to improve K8 runs. (#3935)
  • Consolidated CI Fixes. (#3887)
  • Fix CWL conformance tests. (#3891)
  • Toil-managed cluster scaling should work again with --metrics. (#3943)

Thank you to our contributors: @mr-c, @adamnovak, @w-gao, @jonathanxu18, @Hexotical, @tmooney, @nikhil, @kannon92, @douglowe, @mhpopescu, @Phhere, and @gmloose!

releases/5.5.0

2 years ago

Changelog

CWL

  • Add podman support; and other fixes from recent cwltool #3799
  • Add streaming feature for cwltoil #3694
  • Warn users if a different cwltool version is installed #3686
  • Turn on all Kubernetes CWL tests that are expected to work on Singularity #3720
  • Fix CWL in toil docs jobstore usage #3728
  • DOC: update versions of CWL support
  • Allow filestore bypass #3652

Misc

  • Numerous Type Hints. #3705 #3701 #3693 #3691 #3663 #3688 #3642 #3684 #3682 #3680 #3666 #3675
  • Single source of truth for job state #3776
  • Do not set default for statePollingWait #3774
  • Use absolute local paths when exportFile/importFile do not detect a schema #3767
  • Multi-zone balancing within regions for AWS autoscaling groups #3746
  • Migrate cloud-config to ignition #3488
  • 🎡 Wheel Of Issues 🎰 #3760
  • Add back addBatchSystemFactory function #3754
  • Redirect stderr to /dev/null of lsf conf queries #3751
  • Set number of cores based on job.cores for OpenMP applications #3739
  • Google jobstore batching #3740
  • Locations of CLI option docs
  • Add AWS provisioner storage system #3727
  • Set cls.bucket #3726
  • Stream vs download jobs #3722
  • Update Toil's main python test version to 3.8. #3669
  • Move Travis tests to Gitlab #3675

Bug Fixes

  • Don't leak symlinks #3795
  • Prevent exception from being raised when modifying dir permissions for clean up #3778
  • Fix scontrol output parsing #3793
  • Fix AttributeError #3742
  • Workaround for S3 in us-east-1 #3710
  • Time data format #3708
  • Fix leader.py batch system std files prefix glob #3679

Thank you to our contributors: @mr-c, @adamnovak, @w-gao, @jonathanxu18, @Hexotical, @ionox0, @gmloose, @juanesarango, @mhpopescu, @mberacochea, @nikhil!