Releases: DataBiosphere/toil

7.0.0

21 May 22:25

What's Changed

6.1.0

08 May 18:55
3f9cba3

Highlighted Features Added

  • WDL and CWL task standard output and standard error logs that are not captured by the workflow will now be logged at INFO level and stored in the --writeLogs/--writeLogsGzip directory. (#4657)
  • Use a default log limit of 100MiB (#4788)

Breaking Changes

  • Stats and logging system again uses job display name (#4755)
  • --disableProgress is once again a flag that doesn't take an argument (#4758)

CWL

  • Don't clear out user-provided values for the --default-container option (#4730)

WDL

  • WDL job names now include numbers for scatters (#4755)
  • Multi-line WDL placeholder substitutions no longer interfere with de-indenting WDL command blocks (chanzuckerberg/miniwdl#665)
  • Standard error for failed tasks is now always logged to the worker log somewhere (#4781)

Kubernetes

Dependencies

  • Deps: removed the ruamel.yaml.string plugin dependency for a simpler solution (#4760)

Misc

  • Toil will no longer warn about a missing XDG_RUNTIME_DIR (#4769)
  • Read the Docs and CI docs builds should have Graphviz installed (pending CI image rebuild) (#4734)
  • Add more Python3.12 compatibility by replacing the one function from distutils that we use, strtobool(). (#4765)
  • Set default cache folders to be accessible between toil-wdl-runner workflows (Same as MiniWDL/Singularity defaults) (#4761)
  • Set toil-wdl-runner cache folders on Toil managed clusters to be at /var/lib/toil (#4761)
  • Fall back to assuming machine has 1 core when CPU count is unavailable. (#4545)
  • FileJobStore now supports filenames that get modified when percent-encoded (#4779)
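
The strtobool() replacement noted above can be illustrated with a small standalone function. This is a minimal sketch and not Toil's actual code: the accepted strings follow the documented behavior of distutils.util.strtobool, but returning a real bool (the distutils original returned 1 or 0) is a choice made here for illustration.

```python
def strtobool(value: str) -> bool:
    """Convert a truthy or falsy string to a bool.

    Accepts the same strings as distutils.util.strtobool, case-insensitively.
    Assumption: returns True/False rather than the 1/0 of the original.
    """
    normalized = value.lower()
    if normalized in {"y", "yes", "t", "true", "on", "1"}:
        return True
    if normalized in {"n", "no", "f", "false", "off", "0"}:
        return False
    # Anything else is rejected, as in the distutils version.
    raise ValueError(f"invalid truth value {value!r}")
```

Because distutils was removed from the standard library in Python 3.12, a local helper like this avoids the import entirely.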

Thank you to our contributors:

@DailyDreaming @mr-c @stxue1 @adamnovak @app/dependabot

Full Changelog: releases/6.0.0...releases/6.1.0

6.0.0

16 Jan 19:40

NOTE!

We now have a config file! https://toil.readthedocs.io/en/latest/running/cliOptions.html#the-config-file

Breaking Changes

  • Removed the parasol batch system
  • Removed the TES batch system (this is now a plugin)
  • Removed our WDL compiler in favor of an interpreter (we still support WDL, we just do it differently now)
  • We no longer support python3.7

CWL

  • Support CWL 1.2.1 (#4682)
  • CWL Pipefish compatibility (#4636)
  • Support per-task preemptibility in CWL (#4551)
  • Fix configargparse in CWL (#4618)
  • cwl: use the latest commit from the proposed CWL v1.2.1 branch (#4565)
  • Upgrade cwltool to avoid broken galaxy-tool-util release. (#4639)
  • Implement a better config file system for CWL/WDL options (#4666)
  • Allow working with remote files in CWL and WDL workflows (#4690)
  • Make cwl mutually exclusive groups exist only when cwl is not suppressed (#4725)
  • Log more usefully for CWL workflows (#4736)

WDL

  • Simplify WDL Toil job graphs (#4524)
  • More WDL and Slurm documentation (#4558)
  • Improve WDL documentation (#4732)
  • Add String to File functionality into toil-wdl-runner (#4589)
  • Run WDL output through Toil export system to support URIs (#4579)
  • Allow the WDL output section to reference itself (#4592)
  • Ensure sibling files in toil-wdl-runner (#4610)
  • Make WDLOutputJob collect all task outputs (#4602)
  • Report errors in WDL using MiniWDL's error location printer (#4637)
  • Remove the WDL compiler. (#4679)
  • Implement a better config file system for CWL/WDL options (#4666)
  • Allow working with remote files in CWL and WDL workflows (#4690)
  • Strip leading whitespace from WDL commands (#4720)

Misc

  • Add config file support (#4569)
  • Support Python3.11 and drop Python 3.7 (#4646)
  • Move TES batch system to a plugin (#4650)
  • Turn batch system tests back on (#4649)
  • Separate out integration tests to run on a schedule (#4612)
  • Avoid concurrent modification in cluster scaler tests (#4600)
  • Remove old buckets from AWS (#4588)
  • Tests: only request a single core (#4572)
  • Reduce the number of assert statements (#4590)
  • Treat any nvidia-smi exception as not having a GPU (#4611)
  • More resiliency (#4395)
  • Remove usage of the deprecated pkg_resources (#4701)
  • Make sure cwltool always knows we have an outdir to fix #4698 (#4699)
  • AWS jobStoreTest: re-use delete_s3_bucket from toil.lib.aws (#4700)
  • Only count output file usage when using the file store (#4692)
  • Remove the parasol batch system. (#4678)
  • Move around reqs and move aws dev libraries to aws (#4664)
  • Make sure the --batchLogsDir exists if it is set (#4635)
  • Update EC2 instances and EC2 update script. (#4745)
  • remove extraneous dependency on old 'mock' (#4739)
  • Point CI at the new public URLs for stuff we host
  • Add __init__.py to options folder (#4723)

Bug Fixes

  • Lower redirect log level to fix #4526 (#4578)
  • Fix mypy from being broken by new boto types (#4577)
  • Fix CI on local Gitlab runners (#4571)
  • Banish ghost jobs (#4563)
  • Stop deleting chained-to jobs which fail as orphaned jobs (#4557)
  • Fix pickling error when jobstate file doesn't exist and fix threading error when lock file exists then disappears (#4575)
  • Fix #3867 and try to explain but not crash when bad things happen to our mutex file (#4656)
  • Fix CI Appliance Builds (#4655)
  • Tolerate a failed AMI polling attempt (#4727)
  • Add pure Python fallback for getDirSizeRecursively() (#4753)
  • Don't mark inputs (or outputs) executable for no reason (#4728)
  • Fix scheduled CI tests (#4742)
  • Fix --printJobInfo (#4709)
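
To show what a pure Python fallback for a recursive directory size might look like, here is a minimal sketch. It is not Toil's getDirSizeRecursively(): the function name dir_size_recursively is hypothetical, and it sums apparent file sizes, whereas the real implementation may measure differently (for example, allocated disk blocks).

```python
import os
import stat


def dir_size_recursively(root: str) -> int:
    """Return the total apparent size, in bytes, of regular files under root."""
    total = 0
    seen = set()  # (device, inode) pairs, so hard-linked files count once
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
            if not stat.S_ISREG(info.st_mode):
                continue  # ignore symlinks, sockets, and other special files
            key = (info.st_dev, info.st_ino)
            if key in seen:
                continue
            seen.add(key)
            total += info.st_size
    return total
```

Tolerating OSError matters on a job store directory, where files can be deleted by other processes while the walk is in progress.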

Thank you to our contributors: @stxue1 , @w-gao, @DailyDreaming , @mr-c , @adamnovak , @glennhickey, @misterbrandonwalker, and @a-detiste !

5.12.0

27 Jul 03:19
6d5a5b8

WDL

  • Virtualize filenames as in-container paths from point of view of WDL command (#4527)
  • Add WDL conformance tests to CI (#4530)
  • Use less memory in the Giraffe WDL test (#4541)

Version Upgrades

  • Upgrade to cwltool 3.1.20230601100705 (#4500)
  • Update mock requirement from <5,>=4.0.3 to >=4.0.3,<6 (#4366)

Misc

  • Anonymous access to Google Storage (#4518)
  • Reorder config so that default settings are applied first (#4528)
  • Add a way to forward accelerators to Docker containers (#4492)

Bug Fixes

  • Fix test failures without docker installed (#4544)
  • Prevent certain tests from being run twice in CI (#4529)
  • Drop external Docker builder (#4523)
  • Fix CI lint test (#4533)
  • Grab AWS group policies on top of user (#4505)
  • Grab accelerator set off the end of the list instead of by index (#4506)
  • Fix RtD build (#4491)
  • Include tests (#4499)

Thank you to our contributors: @stxue1 , @DailyDreaming , @mr-c , @adamnovak , and @tjni !

5.11.0

15 Jun 15:17

Breaking Changes

  • Imported files will be symlinked by default, unless the user sets --noLinkImports or the workflow imports with symlink=False. (#3949)

WDL

  • Toil will now stop if it encounters an error polling a possible import URL for a WDL workflow input file. (#4479)
  • WDL workflows will be protected against imported files with no basenames. (#4477)

Misc

  • Toil batch system ID numbers for issued jobs now start at 1. (#4482)
  • Attempts to import files from URLs when the implementing job store is missing an extra are now better reported. (#4479)
  • Include tests in the source distribution that gets published to PyPI (#4499)

Bug Fixes

  • Toil should no longer crash when a delete wins a race against a load in FileJobStore (#4484)
  • Prevent local root jobs (such as WDLRootJob) from being run twice. (#4482)
  • Slurm and other grid batch system jobs will now have more informative names (#4472)
  • WDL workflows can no longer import "" as a File. (#4477)

Thank you to our contributors: @stxue1, @DailyDreaming, @mr-c, @adamnovak

5.10.0

18 May 09:03
21422a3

Changelog

Highlighted Features Added

  • Add a --caching option which explicitly states whether to use caching with a workflow. Uses a default value depending on whether or not we are using the file job store if not specified. (#4218)
  • New prototype WDL runner python -m toil.wdl.wdltoil using MiniWDL (#3468)
  • MiniWDL-based WDL implementation can now run the vg Giraffe WDL workflow (#4353)
  • Toil now tests against our own tiny set of WDL conformance tests (#4351)
  • Toil can run the HPRC assembly WDL workflows (#4435)
  • Toil can now use Mesos roles (#4455)

Breaking Changes

  • Replace "preemptable" with "preemptible", add example of using --defaultPreemptible flag to Preemptibility documentation (#1951)

CWL

  • CWL: run all ExpressionTools on the Leader node, instead of submitting separate jobs (#4157)

Kubernetes

  • Kubernetes batch system: Delete jobs individually when batch delete fails (#3403)
  • Documentation for running a Toil leader for a Kubernetes workflow outside Kubernetes now covers examples and common problems for running CWL workflows (document toil-cwl-runner + "Running the Leader Outside Kubernetes" #3422)
  • Kubernetes batch system: support --maxCores, --maxDisk, and --maxMemory (#2864)
  • Add tutorial for Kubernetes launch cluster (#3743)

Dependencies

  • Require htcondor 10 exactly (#4315)
  • Toil jobs now have a local parameter which determines if they should run on the leader. (#4388)

Misc

  • The offline tests can now be run in parallel (#3493)
  • Code updated to be more idiomatic for Python3.7 (#4295)
  • Support for a --network for toil launch-cluster for Google cloud (#4196)
  • Support for a --use_private_ip for toil launch-cluster to dial nodes by private IP instead of public IP (#4196)
  • GPU scheduling should now be supported on Slurm (#4308)
  • Toil now supports a --batchLogsDir option and TOIL_BATCH_LOGS_DIR environment variable, to provide a directory other than the work dir where Toil will instruct HPC batch systems to save their captured job logs.
  • htcondor batch system should now work again, and will retry connections
  • Updated the --coalesceStatusCalls help documentation to reflect the current state of #4431 (#4437)
  • Toil no longer trusts XDG_RUNTIME_DIR under Slurm (fixes some of the issues behind #4395 when Slurm is configured not to follow the XDG spec) (#4435)
  • Toil now puts its lock files for Singularity cache directories for WDL in those directories (#4435)
  • Toil's WDL interpreter can now use local-to-the-leader jobs for evaluating WDL code that doesn't need appreciable resources (#4388)
  • Toil now tolerates more possible exceptions related to the panasas network file system (#4440)
  • Type hinting to functions in resource.py (#938)
  • Added return type to inVirtualEnv() in __init__.py (#938)
  • Added None checks to some function bodies (#938)

Bug Fixes

  • Stop crashing when predefined batch job exit reasons are used and need to go into the message bus log file (#4321)
  • Added import subprocess to restore the behavior of #588. (#4429)
  • Toil will no longer use the stored message bus path from an old execution of a workflow when deciding where to save the message bus log when restarting a workflow (#4438)
  • Fix --custom-net mutual exclusivity bug. (#4458)

Thank you to our contributors: @stxue1, @DailyDreaming, @mr-c, @adamnovak, @jfennick, @misterbrandonwalker, @w-gao, @stephanaime, @glennhickey, @Hexotical, @manabuishii, @gmloose, @boukn, and @thiagogenez!

5.9.2

04 Feb 05:38

Changelog

Bug Fixes

  • Change build tag import (#4329)

Thank you to our contributors: @adamnovak , @Hexotical !

5.9.0

03 Feb 06:04
8155e0a

Changelog

Bug Fixes

  • Fix --provisioner and --metrics together (#4328)
  • Ignore incorrect type hint from boto3, remove json.loads (#4330)
  • Warn about missing --bypass-file-store with in-place update (#4337)
  • Replace prepareHTSubmission with prepareSubmission in HTCondor (#4319)
  • Merge "Google fixes" (#4293)
  • Support (only) current htcondor (#4320)
  • Delete k8s jobs individually when batch delete fails (#4306)

Misc

  • Update aws spot documentation (#4310)
  • Enable parallel testing (#3493)
  • Add documentation for running CWL workflows on non-Toil-managed Kubernetes clusters (#4332)
  • Export all slurm args by default (#4237)
  • Allow for subclasses of base types in messages (#4322)
  • Non cache default (#4299)

Dependencies

  • Bump mypy from 0.982 to 0.991 (#4345)
  • Bump schema-salad>=8.4.20230128170514,<9 to schema-salad>=8.3.20220913105718,<8.4 (#4342) (#4341)
  • Bump cwltool from 3.1.20221008225030 to 3.1.20221201130942 (#4338)
  • Bump pyupgrade to 3.7 (#4295)

Thank you to our contributors: @adamnovak , @Hexotical , @w-gao, @mr-c , @gmloose , @boukn , and @thiagogenez !

5.8.0

04 Jan 23:01
79792b7

Changelog

Highlighted Features Added

  • Toil server now exposes workflow tasks via WES (#4046).
  • Toil server now has a --wes_dialect agc option that will hide any tasks that don't have Amazon Batch job IDs, and put the IDs in the task names for those that do (#4047).
  • Toil jobs now accept an accelerators requirement, like accelerators=1 or accelerators={'kind': 'gpu', 'brand': 'nvidia', 'count': 2} (#4163)
  • Include total requested cores for each job type in toil stats (#4173)
  • Toil jobs now expose job.accelerators to the workflow
  • Add prefix and suffix parameters to AbstractFileStore.getLocalTempFile and AbstractFileStore.getLocalTempFileName (#4273)
  • CWL: --no-compute-checksum, --strict-cpu-limit, --disable-validate, and --fast-parser are now available
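
The two shapes of the accelerators requirement mentioned above (a bare count, or a dict with keys such as kind, brand, and count) can be illustrated with a small normalizer. This is a sketch only and does not use Toil's own parsing code: the function name normalize_accelerator is hypothetical, and treating a bare integer as a count of generic GPUs is an assumption.

```python
from typing import Any, Dict, Union


def normalize_accelerator(spec: Union[int, Dict[str, Any]]) -> Dict[str, Any]:
    """Return an accelerator requirement in dict form.

    Assumption: a bare integer means that many accelerators of kind 'gpu'.
    """
    if isinstance(spec, bool):
        # bool is a subclass of int, but True/False is not a sensible count.
        raise TypeError("accelerator count must be an int, not a bool")
    if isinstance(spec, int):
        if spec < 0:
            raise ValueError("accelerator count cannot be negative")
        return {"kind": "gpu", "count": spec}
    if isinstance(spec, dict):
        if "kind" not in spec or "count" not in spec:
            raise ValueError("accelerator dict needs 'kind' and 'count' keys")
        return dict(spec)  # copy, so the caller's dict is left untouched
    raise TypeError(f"unsupported accelerator spec: {spec!r}")
```

Under these assumptions, normalize_accelerator(1) and normalize_accelerator({'kind': 'gpu', 'brand': 'nvidia', 'count': 2}) both yield dicts that downstream code can handle uniformly.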

Breaking Changes

  • Toil's built-in autoscaler now guesses that some memory and disk space on nodes will not actually be available for jobs; pass --assumeZeroOverhead to revert to the old behavior (#2103)

CWL

  • CWL job unit and display names have been changed to make more sense as task names, and management of them has been unified into a CWLNamedJob. (#4046/#4047)
  • CWL CUDARequirement is parsed by cwltool and turned into a requirement for the minimum requested number of nvidia GPU accelerators (#3982)
  • fix false warning when outputSource contains only one None value (#4300)

Kubernetes

  • KubernetesBatchSystem can add nvidia.com/gpu and amd.com/gpu resource requests for jobs that request those accelerators (#4163)
  • KubernetesBatchSystem can request GPUs by model key, if nodes are labeled appropriately (#4163)

Dependencies

Misc

  • Toil WES server now accepts requests that leave out workflow_params. (#4037)
  • The MessageBus has been expanded to use pypubsub, and now has MessageInbox and MessageOutbox objects to represent connections to it. (#4046/#4047)
  • ToilMetrics now rides on the MessageBus rails. (#4046/#4047)
  • Toil workflows now have a --writeMessages option, which takes a file to which a line-oriented stream of MessageBus messages will be written. Reading this file will allow you to recover the current state of the workflow. (#4046/#4047)
  • Add code for warning check to be used when launching cluster with AWS. (#3514)
  • Use a CI prebake image for gitlab testing. (#4185)
  • Toil clusters now have /var/tmp as the default temporary directory, since they often make large temporary files (#4148)
  • Adds basic testing for slurm using a slurm docker cluster by running sample workflows. (#3856)
  • Add message bus documentation (#4239)
  • SingleMachineBatchSystem can schedule nvidia GPU accelerators, limiting the concurrent jobs to no more than there are accelerators to support, and setting CUDA_VISIBLE_DEVICES in the tasks' environments to tell them which nvidia GPU(s) to use. (#4163)
  • AWSBatchBatchSystem can use AWS Batch's GPU resource to provide nvidia GPU accelerators (#4163)
  • Toil jobs no longer need to re-run after their child/followOn/service jobs in order to delete themselves. (#3188)
  • Message bus is now thread safe (#4276)
  • Docker build has been updated with new Aventer Mesos deb URL (fixes #4290)
  • docker binary in the container has been updated to that included in the Ubuntu repos (fixes #4282)
  • Singularity in the appliance has been updated to 3.10 which is >=3.9, for cgroups v2 support.
  • Base Ubuntu container image for the appliance has been updated to 22.04, which has a new enough libc for Debian's Singularity 3.10 debs.
  • Safer type usage checking for systems without boto3 installed
  • Tests are now more runnable post-installation. Temporary paths are not selected based upon the location of the tests themselves. (#4287)
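
Since SingleMachineBatchSystem sets CUDA_VISIBLE_DEVICES for tasks, a task may want to check which devices it was given. The following is a minimal sketch using only the standard library; the helper name visible_cuda_devices is hypothetical, and it relies only on the general convention that the variable holds a comma-separated list of device identifiers.

```python
import os
from typing import List, Mapping, Optional


def visible_cuda_devices(
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[List[str]]:
    """Return the identifiers listed in CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset (no restriction was applied) and
    an empty list when it is set but empty (no devices are visible).
    """
    env = os.environ if environ is None else environ
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    # Entries are kept as strings because they may be indices or UUIDs.
    return [item.strip() for item in raw.split(",") if item.strip()]
```

Passing a mapping explicitly, as in visible_cuda_devices({"CUDA_VISIBLE_DEVICES": "0,2"}), makes the helper easy to exercise without touching the real environment.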

Bug Fixes

  • Only use /var/run/user if XDG tells us we have it in our session. Otherwise we will try other places, including /run/lock/toil. (#4170)
  • toil destroy-cluster: terminate stopped instances when destroying the cluster (#4271)
  • fileJobStore: handle arbitrary os.link errors to work on some filesystems (#2232)

Thank you to our contributors!

5.7.1

19 Jul 05:19
b5cae96

Changelog

Highlighted Features Added

  • AWS Batch Batch System (#3956)
  • AGC Integration (#4039) + More AGC integration (#4067) + AGC megabranch (#4113)
  • Scale TES to be able to run reasonably-sized workflows on Funnel on Kubernetes with the AWS job store (#3927)

CWL

  • Run CWL conformance tests via WES (#4052)
  • Implement and test CWL loadContents from URLs to fix #4125 (#4126)
  • Add CWL tests under ARM (#4038)
  • Cache results of cwltool version lookup (#4141)

Misc

  • SGE batch system change to support serial jobs. (#4022)
  • Performance testing for Graviton instances (#4123)
  • Stop waiting on hostpath volumes to exist (#4146)
  • Catch and warn about jobs going away too slowly on FileJobStore (#4149)
  • Add documentation for the type-checking hooks (#4117)
  • Pod murder bot (#4060)
  • Contrib hook scripts (#4105)
  • Allow newer google-cloud-storage (#4114)
  • Use environment variable to set parallel partition name (#4096)
  • Register pytest markers (#4103)
  • Mention --export=ALL for SLURM environments (#4100) (#4102)
  • Allow persisting workflow state in WES server across container recreation (#4082)
  • Change toil kill to use the job store shared file API to find pig.log (#4075)
  • Bring back kill loop in the single_machine batch system but with a timeout (#4070)
  • Reorganize Locking (#4059)
  • Add and test preemptability constraints (#4044)
  • Enhanced types (#3975)
  • Use an init process that reaps zombies on toil clusters (#3974)
  • Add launch cluster support for ARM (#3971)
  • Feat: square bracket to period separator (#4008)
  • Add AGC health check endpoint (#3997)
  • Tolerate and require typed Werkzeug (#4011)
  • Add more static URLs for Singularity debs (#4007)

Bug Fixes

  • Update WES set up docs (#4027)
  • Add real time logs (#4031)
  • Fail fast if Docker builder is missing (#4001)
  • Make Toil version be reported as a string in WES (#4013)
  • Fix assorted typos within assorted comments (#4023)
  • Make file store case insensitive (#4153)
  • Pre-lex commands for qsub (#4150)
  • Update Cactus and exclude broken networkx (#4107)
  • Make toil kill work when the leader is on another machine (#4084)
  • Wrong filename in output (#4139)
  • Tolerate a missing VersionID key to fix #4129 (#4130)
  • Only import from typing_extensions on old Python where we install it (#4090)
  • Allow missing username and fix Docker build (#4077)
  • Leave more time for concurrency measurement to fix #4012 (#4068)
  • Stop people asking for ARM Mesos clusters to fix #4057 (#4058)

Thank you to our contributors: @mr-c, @adamnovak, @w-gao, @jonathanxu18, @Hexotical, @gmloose, @kannon92, @douglowe, @gcapes, and @pmiddend!