Skip to content
This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

Attempt to make the GATK tasks a) auto-detect GATK version and b) swi… #365

Merged
merged 5 commits into from
Aug 22, 2019

Conversation

tfenne
Copy link
Member

@tfenne tfenne commented Aug 22, 2019

…tch command lines accordingly.

@nh13 This isn't quite done yet - firstly I haven't tested any of this and secondly I've only gone through HC and GenotypeGVCF tasks so far to adapt for 4+. That said, I'd like feedback on the approach I've taken before I go much further.

@tfenne tfenne requested a review from nh13 August 22, 2019 15:10
@tfenne
Copy link
Member Author

tfenne commented Aug 22, 2019

@clintval You might be interested in this given your recent set of tweets. This isn't as elegant as it could be, but is an attempt to handle GATK 3.x vs. 4.x with a single set of task classes.

@clintval
Copy link
Member

I support this PR ❤️

@clintval
Copy link
Member

clintval commented Aug 22, 2019

I spent some time thinking about how I would solve this problem in a typed workflow DSL like Dagr. I originally found inspiration in a little library that helps with data class migrations. I dabbled with this problem in Snakemake but I never arrived on a working solution. I'm certain there is an elegant solution out there that would empower the user!

I'm thinking a first-class feature upon ProcessTask that would enable:

Backward-in-time Support

Dagr maintains a compatibility interface with older versions of the executable (similar to the aim of this PR).

Forward-in-time Support

Client written compatibility that provides a way to migrate an existing ProcessTask to a new version of the executable. Think overrides, deprecations, or translation of arguments. We would need new language features to achieve this. A concise implementation that reduces boilerplate could be tricky to conceive! If these client created migrations work well, they can be pulled into Dagr source as time marches on, bettering support for all life-time versions of an executable.


My free 2¢ idea

two-cents

Thanks for listening!

Copy link
Member

@nh13 nh13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I really like this pattern. I think there's going to be a lot of if/else if/...` logic if we follow this pattern more generally for tasks. Let's discuss this offline.

val jar = gatkJar
if (jar == GatkTask.GatkJarPath) GatkTask.GatkMajorVersion
else GatkTask.majorVersion(jar)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto about this being something to bake into JarTask. We could also have a trait too, since not everything is a JAR.

protected def addWalkerArgs(buffer: ListBuffer[Any]): Unit

/** Helper function to select an argument name for pre vs. post V4 naming. */
protected def either(preV4Name: String, postV4Name: String): String = if (gatkMajorVersion < 4) preV4Name else postV4Name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems brittle if ever the GATK API changes. And also brittle if this becomes a pattern we adopt for other tools who change their command line API more frequently.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed ... I think I might just go back to having two separate argument generation sections for 3 vs 4 for now.

@@ -50,9 +50,13 @@ class GenotypeGvcfs private (ref: PathToFasta,
val dbSnpVcf: Option[PathToVcf] = None)
extends GatkTask("GenotypeGVCFs", ref, intervals=intervals) {

if (gatkMajorVersion >= 4 && gvcfs.length > 1) throw new IllegalStateException(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

require?

@nh13 nh13 assigned tfenne and unassigned nh13 Aug 22, 2019
@tfenne tfenne force-pushed the tf_gatk_multi_version_support branch from f567c0a to 1bd30ce Compare August 22, 2019 21:40
Copy link
Member

@nh13 nh13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor quibbles, fix/resolve and approved.

case n if n >= 4 =>
// Args for version 4
buffer.append("-I", bam.toAbsolutePath)
buffer.append("-O", vcf.toAbsolutePath)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In GenotypeGvcfs you have a section before this block that adds common arguments:

// Args that are common to all versions

I'd recommend moving common args similarly, or not have common args in GenotypeGvcfs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I almost did it, but in fact there's only 1 common arg, and it felt not worth it. Note that output is -o vs.-O. But I can move the -I argument out.

case n if n < 4 =>
buffer.append("--out", vcf.toAbsolutePath.toString)
case n if n >= 4 =>
buffer.append("--output", vcf.toAbsolutePath.toString)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need .toString here and elsewhere since we call it later?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea why there were .toStrings there in the first place. Removed.

@@ -51,6 +51,8 @@ class Mutect1(val tumorBam: PathToBam,
) extends GatkTask(walker="MuTect", ref=ref, intervals=Some(intervals)) {

override protected def addWalkerArgs(buffer: ListBuffer[Any]): Unit = {
require(gatkMajorVersion < 4, s"Mutect v1 is not supported in GATK v${gatkMajorVersion}.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This require is in the addWalkerArgs method, but in IndelRealigner and RealignerTargetCreator it's in the constructor. The former will check when we run the task, while the latter when we build the task. Up to you.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. I'll move this one up because failing earlier seems better.

@codecov-io
Copy link

codecov-io commented Aug 22, 2019

Codecov Report

Merging #365 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #365   +/-   ##
=======================================
  Coverage   91.95%   91.95%           
=======================================
  Files          31       31           
  Lines        1156     1156           
  Branches       65       65           
=======================================
  Hits         1063     1063           
  Misses         93       93

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2d84ae1...6503f2c. Read the comment docs.

@tfenne tfenne merged commit 2b0bab7 into master Aug 22, 2019
@tfenne tfenne deleted the tf_gatk_multi_version_support branch August 23, 2019 13:32
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants