Remove the WDL compiler. #4679

Merged: 10 commits, Dec 5, 2023
5 changes: 1 addition & 4 deletions .gitlab-ci.yml
@@ -61,7 +61,6 @@ stages:
- main_tests
- integration


lint:
rules:
- if: $CI_PIPELINE_SOURCE != "schedule"
@@ -75,7 +74,6 @@ lint:
- make docs
# - make diff_pydocstyle_report


cwl_dependency_is_stand_alone:
rules:
- if: $CI_PIPELINE_SOURCE != "schedule"
@@ -85,15 +83,14 @@ cwl_dependency_is_stand_alone:
- ${MAIN_PYTHON_PKG} -m virtualenv venv && . venv/bin/activate && make prepare && make develop extras=[cwl]
- make test threads="${TEST_THREADS}" marker="${MARKER}" tests=src/toil/test/docs/scriptsTest.py::ToilDocumentationTest::testCwlexample


wdl_dependency_is_stand_alone:
rules:
- if: $CI_PIPELINE_SOURCE != "schedule"
stage: linting_and_dependencies
script:
- pwd
- ${MAIN_PYTHON_PKG} -m virtualenv venv && . venv/bin/activate && make prepare && make develop extras=[wdl]
- make test threads="${TEST_THREADS}" marker="${MARKER}" tests=src/toil/test/wdl/toilwdlTest.py::ToilWdlTest::testMD5sum
- make test threads="${TEST_THREADS}" marker="${MARKER}" tests=src/toil/test/wdl/wdltoil_test.py::WDLTests::test_MD5sum

quick_test_offline:
rules:
9 changes: 0 additions & 9 deletions contrib/admin/mypy-with-ignore.py
@@ -33,14 +33,6 @@ def main():
'src/toil/__init__.py',
'src/toil/deferred.py',
'src/toil/version.py',
'src/toil/wdl/utils.py',
'src/toil/wdl/wdl_synthesis.py',
'src/toil/wdl/wdl_analysis.py',
'src/toil/wdl/wdl_functions.py',
'src/toil/wdl/toilwdl.py',
'src/toil/wdl/versions/draft2.py',
'src/toil/wdl/versions/v1.py',
'src/toil/wdl/versions/dev.py',
'src/toil/provisioners/abstractProvisioner.py',
'src/toil/provisioners/gceProvisioner.py',
'src/toil/provisioners/__init__.py',
@@ -104,7 +96,6 @@ def ignore(file_path):
if file_path.startswith(prefix):
return True
return False


filtered_files_to_check = []
for file_path in all_files_to_check:
137 changes: 4 additions & 133 deletions docs/running/wdl.rst
@@ -14,6 +14,9 @@ You can run WDL workflows with ``toil-wdl-runner``. Currently,
workflow, and has support for workflows in WDL 1.0 or later (which are required
to declare a ``version``, and which use ``inputs`` and ``outputs`` sections).

.. tip::
The last release of Toil that supported unversioned, ``draft-2`` WDL workflows was `5.12.0`_.

Toil is, for compatible workflows, a drop-in replacement for the `Cromwell`_ WDL runner.
Instead of running a workflow with Cromwell::

@@ -39,6 +42,7 @@ workflow, you can do::

toil-wdl-runner https://raw.githubusercontent.com/DataBiosphere/toil/36b54c45e8554ded5093bcdd03edb2f6b0d93887/src/toil/test/wdl/miniwdl_self_test/self_test.wdl https://raw.githubusercontent.com/DataBiosphere/toil/36b54c45e8554ded5093bcdd03edb2f6b0d93887/src/toil/test/wdl/miniwdl_self_test/inputs.json

.. _`5.12.0`: https://github.com/DataBiosphere/toil/releases/tag/releases%2F5.12.0
.. _`Cromwell`: https://github.com/broadinstitute/cromwell#readme

Writing WDL with Toil
Expand Down Expand Up @@ -126,137 +130,4 @@ Toil is not yet fully conformant with the WDL specification, but it inherits mos

.. _`MiniWDL`: https://github.com/chanzuckerberg/miniwdl/#miniwdl

Using the Old WDL Compiler
--------------------------

Up through Toil 5.9.2, ``toil-wdl-runner`` worked by compiling the WDL code to
a Toil Python workflow, and executing that. The old compiler is
still available as ``toil-wdl-runner-old``.

The compiler implements:
* Scatter
* Many Built-In Functions
* Docker Calls
* Handles Priority, and Output File Wrangling
* Currently Handles Primitives and Arrays

The compiler DOES NOT implement:
* Robust cloud autoscaling
* WDL files that ``import`` other WDL files (including URI handling for 'http://' and 'https://')

Recommended best practice when running wdl files with ``toil-wdl-runner-old`` is to first use the Broad's wdltool for syntax validation and generating
the needed json input file. Full documentation can be found in the repository_, and a precompiled jar binary can be
downloaded here: wdltool_ (this requires java7_).

.. _repository: https://github.com/broadinstitute/wdltool
.. _wdltool: https://github.com/broadinstitute/wdltool/releases
.. _java7: http://www.oracle.com/technetwork/java/javase/downloads/java-archive-downloads-javase7-521261.html

That means two steps. First, make sure your wdl file is valid and devoid of syntax errors by running::

java -jar wdltool.jar validate example_wdlfile.wdl

Second, generate a complementary json file if your wdl file needs one. This json will contain keys for every necessary
input that your wdl file needs to run::

java -jar wdltool.jar inputs example_wdlfile.wdl

When this json template is generated, open the file, and fill in values as necessary by hand. WDL files all require
json files to accompany them. If no variable inputs are needed, a json file containing only '{}' may be required.

Once a wdl file is validated and has an appropriate json file, workflows can be compiled and run using::

toil-wdl-runner-old example_wdlfile.wdl example_jsonfile.json

Toil WDL Compiler Options
~~~~~~~~~~~~~~~~~~~~~~~~~
``-o`` or ``--outdir``: Specifies the output folder, and defaults to the current working directory if
not specified by the user.

``--dev_mode``: Creates "AST.out", which holds a printed AST of the wdl file and "mappings.out", which holds the
printed task, workflow, csv, and tsv dictionaries generated by the parser. Also saves the compiled toil python workflow
file for debugging.

Any number of arbitrary options may also be specified. These options will not be parsed immediately, but passed down
as toil options once the wdl/json files are processed. For valid toil options, see the documentation:
http://toil.readthedocs.io/en/latest/running/cliOptions.html

Compiler Example: ENCODE Example from ENCODE-DCC
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For this example, we will run a WDL draft-2 workflow. This version is too old
to be supported by ``toil-wdl-runner``, so we will need to use
``toil-wdl-runner-old``.

To follow this example, you will need docker installed. The original workflow can be found here:
https://github.com/ENCODE-DCC/pipeline-container

We've included the wdl file and data files in the toil repository needed to run this example. First, download
the example code_ and unzip. The file needed is "testENCODE/encode_mapping_workflow.wdl".

Next, use wdltool_ (this requires java7_) to validate this file::

java -jar wdltool.jar validate encode_mapping_workflow.wdl

Next, use wdltool to generate a json file for this wdl file::

java -jar wdltool.jar inputs encode_mapping_workflow.wdl

This json file once opened should look like this::

{
"encode_mapping_workflow.fastqs": "Array[File]",
"encode_mapping_workflow.trimming_parameter": "String",
"encode_mapping_workflow.reference": "File"
}

You will need to edit this file to replace the types (like ``Array[File]``) with values of those types.

The trimming_parameter should be set to 'native'.

For the file parameters, download the example data_ and unzip. Inside are two data files required for the run::

ENCODE_data/reference/GRCh38_chr21_bwa.tar.gz
ENCODE_data/ENCFF000VOL_chr21.fq.gz

Editing the json to include these as inputs, the json should now look something like this::

{
"encode_mapping_workflow.fastqs": ["/path/to/unzipped/ENCODE_data/ENCFF000VOL_chr21.fq.gz"],
"encode_mapping_workflow.trimming_parameter": "native",
"encode_mapping_workflow.reference": "/path/to/unzipped/ENCODE_data/reference/GRCh38_chr21_bwa.tar.gz"
}

The wdl and json files can now be run using the command::

toil-wdl-runner-old encode_mapping_workflow.wdl encode_mapping_workflow.json

This should deposit the output files in the user's current working directory (to change this, specify a new directory
with the ``-o`` option).

.. _code: https://toil-datasets.s3.amazonaws.com/wdl_templates.zip
.. _data: https://toil-datasets.s3.amazonaws.com/ENCODE_data.zip

Compiler Example: GATK Examples from the Broad
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Terra hosts some example documentation for using early, pre-1.0 versions of WDL, originally authored by the Broad:
https://support.terra.bio/hc/en-us/sections/360007347652?name=wdl-tutorials

One can follow along with these tutorials, write their own old-style WDL files following the directions and run them using either
Cromwell or Toil's old WDL compiler. For example, in tutorial 1, if you've followed along and named your wdl file 'helloHaplotypeCall.wdl',
then once you've validated your wdl file using wdltool_ (this requires java7_) using::

java -jar wdltool.jar validate helloHaplotypeCaller.wdl

and generated a ``json`` file (and subsequently typed in appropriate file paths and variables) using::

java -jar wdltool.jar inputs helloHaplotypeCaller.wdl

.. note::
Absolute filepath inputs are recommended for local testing with the Toil WDL compiler.

then the WDL script can be compiled and run using::

toil-wdl-runner-old helloHaplotypeCaller.wdl helloHaplotypeCaller_inputs.json
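
With ``toil-wdl-runner-old`` gone, unversioned ``draft-2`` workflows need Toil 5.12.0 or earlier. The following is a minimal sketch, not part of this PR or of Toil, of how one might spot such a file before passing it to ``toil-wdl-runner``. It relies on the rule stated in the docs that WDL 1.0 or later must declare a ``version``, and it assumes that declaration is the first line that is neither blank nor a comment. The helper name ``declared_wdl_version`` is hypothetical.

```python
from typing import Optional


def declared_wdl_version(wdl_text: str) -> Optional[str]:
    """Return the declared WDL version, or None for an unversioned file.

    Hypothetical helper: it only inspects the first line that is neither
    blank nor a comment, assuming a version statement would appear there.
    """
    for raw_line in wdl_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            # Skip blank lines and comments before the first statement.
            continue
        parts = line.split()
        if parts[0] == "version" and len(parts) >= 2:
            return parts[1]
        # The first real statement is something else: treat as unversioned.
        return None
    return None


if __name__ == "__main__":
    print(declared_wdl_version("version 1.0\nworkflow hello {}\n"))
    print(declared_wdl_version("workflow hello {}\n"))
```

A result of ``None`` suggests a draft-2 workflow that the current runner is not expected to accept.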


1 change: 0 additions & 1 deletion setup.py
@@ -116,7 +116,6 @@ def run_setup():
'cwltoil = toil.cwl.cwltoil:cwltoil_was_removed [cwl]',
'toil-cwl-runner = toil.cwl.cwltoil:main [cwl]',
'toil-wdl-runner = toil.wdl.wdltoil:main [wdl]',
'toil-wdl-runner-old = toil.wdl.toilwdl:main [wdl]',
'toil-wes-cwl-runner = toil.server.cli.wes_cwl_runner:main [server]',
'_toil_mesos_executor = toil.batchSystems.mesos.executor:main [mesos]',
'_toil_contained_executor = toil.batchSystems.contained_executor:executor']})
2 changes: 1 addition & 1 deletion src/toil/server/cli/wes_cwl_runner.py
@@ -147,7 +147,7 @@ def parse_params(self, workflow_params_file: str) -> Dict[str, Any]:

:param workflow_params_file: The URL or path to the CWL input file.
"""
loader = schema_salad.ref_resolver.Loader(
loader = schema_salad.ref_resolver.Loader( # type:ignore
{"location": {"@type": "@id"}, "path": {"@type": "@id"}}
)

49 changes: 17 additions & 32 deletions src/toil/test/utils/toilDebugTest.py
@@ -1,4 +1,3 @@
"""A set of test cases for toilwdl.py"""
# Copyright (C) 2015-2021 Regents of the University of California
#
# Licensed under the Apache License, Version 2.0 (the "License");
@@ -15,7 +14,7 @@
import logging
import os
import subprocess
from pathlib import Path
import tempfile

import pytest

@@ -26,36 +25,34 @@
logger = logging.getLogger(__name__)


@pytest.fixture
def workflow_debug_jobstore(tmp_path: Path) -> str:
jobStorePath = str(tmp_path / "toilWorkflowRun")
def workflow_debug_jobstore() -> str:
job_store_path = os.path.join(tempfile.mkdtemp(), "toilWorkflowRun")
subprocess.check_call(
[
python,
os.path.abspath("src/toil/test/utils/ABCWorkflowDebug/debugWorkflow.py"),
jobStorePath,
job_store_path,
]
)
return jobStorePath
return job_store_path


@slow
def testJobStoreContents(workflow_debug_jobstore: str):
def testJobStoreContents():
"""
Test toilDebugFile.printContentsOfJobStore().

Runs a workflow that imports 'B.txt' and 'mkFile.py' into the
jobStore. 'A.txt', 'C.txt', 'ABC.txt' are then created. This checks to
make sure these contents are found in the jobStore and printed.
"""
jobStoreDir = workflow_debug_jobstore
contents = ["A.txt", "B.txt", "C.txt", "ABC.txt", "mkFile.py"]

subprocess.check_call(
[
python,
os.path.abspath("src/toil/utils/toilDebugFile.py"),
jobStoreDir,
workflow_debug_jobstore(),
"--logDebug",
"--listFilesInJobStore=True",
]
@@ -78,7 +75,7 @@ def testJobStoreContents(workflow_debug_jobstore: str):
os.remove(jobstoreFileContents)


def fetchFiles(symLink, jobStoreDir: str, outputDir):
def fetchFiles(symLink: bool, jobStoreDir: str, outputDir: str):
"""
Fn for testFetchJobStoreFiles() and testFetchJobStoreFilesWSymlinks().

@@ -99,8 +96,8 @@ def fetchFiles(symLink, jobStoreDir: str, outputDir):
"*C.txt",
"*ABC.txt",
"*mkFile.py",
"--localFilePath=" + outputDir,
"--useSymlinks=" + str(symLink),
f"--localFilePath={outputDir}",
f"--useSymlinks={symLink}",
]
print(cmd)
subprocess.check_call(cmd)
@@ -114,22 +111,10 @@ def fetchFiles(symLink, jobStoreDir: str, outputDir):


# expected run time = 4s
def testFetchJobStoreFiles(tmp_path: Path, workflow_debug_jobstore: str) -> None:
"""Test toilDebugFile.fetchJobStoreFiles() without using symlinks."""
outputDir = tmp_path / "testoutput"
outputDir.mkdir()
fetchFiles(
symLink=False, jobStoreDir=workflow_debug_jobstore, outputDir=str(outputDir)
)


# expected run time = 4s
def testFetchJobStoreFilesWSymlinks(
tmp_path: Path, workflow_debug_jobstore: str
) -> None:
"""Test toilDebugFile.fetchJobStoreFiles() using symlinks."""
outputDir = tmp_path / "testoutput"
outputDir.mkdir()
fetchFiles(
symLink=True, jobStoreDir=workflow_debug_jobstore, outputDir=str(outputDir)
)
def testFetchJobStoreFiles() -> None:
"""Test toilDebugFile.fetchJobStoreFiles() symlinks."""
job_store_dir = workflow_debug_jobstore()
output_dir = os.path.join(os.path.dirname(job_store_dir), "testoutput")
os.makedirs(output_dir, exist_ok=True)
for symlink in (True, False):
fetchFiles(symLink=symlink, jobStoreDir=job_store_dir, outputDir=output_dir)
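
The rewritten test swaps the pytest fixture for a plain helper and covers both symlink modes in one loop. As a rough illustration of the directory layout this implies, here is a standalone sketch that does not invoke Toil. The function names ``make_job_store_path`` and ``plan_fetch_runs`` are hypothetical and exist only for this example.

```python
import os
import tempfile
from typing import List, Tuple


def make_job_store_path() -> str:
    """Return a job store path inside a fresh temporary directory.

    The path itself is not created; only its parent directory exists.
    """
    return os.path.join(tempfile.mkdtemp(), "toilWorkflowRun")


def plan_fetch_runs(job_store_dir: str) -> List[Tuple[bool, str]]:
    """Create the shared output directory and list one run per symlink mode."""
    # The output directory is a sibling of the job store, as in the test.
    output_dir = os.path.join(os.path.dirname(job_store_dir), "testoutput")
    os.makedirs(output_dir, exist_ok=True)
    return [(symlink, output_dir) for symlink in (True, False)]


if __name__ == "__main__":
    job_store = make_job_store_path()
    for use_symlinks, out_dir in plan_fetch_runs(job_store):
        print(f"--localFilePath={out_dir}", f"--useSymlinks={use_symlinks}")
```

One trade-off worth noting: ``tempfile.mkdtemp()`` leaves its directory behind unless the caller deletes it, whereas the removed ``tmp_path`` fixture left the directory's lifetime to pytest.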