Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added handling for indels within CollectHsMetrics and CollectTargetedPcrMetrics. #1273

Merged
merged 6 commits into from
Feb 1, 2019

Conversation

tfenne
Copy link
Collaborator

@tfenne tfenne commented Jan 23, 2019

Description

When the INCLUDE_INDELS parameter is set to true:

  1. Deletions within reads are treated as covering the reference bases if the preceding base is high quality
  2. Inserted bases are counted as "on-target" but don't contribute coverage to any target base(s)

There are two motivations for this change:

  1. I want to be able to use CollectHsMetrics' per-base coverage output to assess how well an assay is covering target bases. If the sample being sequenced has one or more short deletions within the targets this results in a precipitous drop in target coverage. However, I'd argue that those bases in the reference/target are covered, and if covered sufficiently will allow for a deletion to be called in variant calling.
  2. When the het sensitivity code was introduced a bug was introduced that would lead to reads being double-counted in some places if a read overlapped more than one target. E.g. I would expect that, other than for rounding issues, PCT_USABLE_BASES_ON_TARGET + PCT_EXC_* to add up to 1.0. However I would frequently see this adding up to substantially over 1.0.

I've done my best to re-organize the code to make it easier to follow and maintain in future.

Checklist

Content

  • Added or modified tests to cover changes and any new functionality
  • Edited the README / documentation (if applicable)
  • All tests passing on Travis

Review

  • Final thumbs-up from reviewer
  • Rebase, squash and reword as applicable

…PcrMetrics.

When the INCLUDE_INDELS parameter is set to true:
  1. Deletions within reads are treated as covering the reference bases if the preceding base is high quality
  2. Inserted bases are counted as "on-target" but don't contribute coverage to any target base(s)
@tfenne tfenne requested a review from yfarjoun January 23, 2019 22:27
@tfenne
Copy link
Collaborator Author

tfenne commented Jan 23, 2019

@yfarjoun I'm not really done with this PR, I plan to add some additional tests, but wanted to get this up for you, and anyone else who is interested, to look at and see if you agree with direction. Thanks!

@coveralls
Copy link

coveralls commented Jan 23, 2019

Coverage Status

Coverage increased (+0.01%) to 81.493% when pulling 283341e on tfenne:tf_hs_metrics_with_indels into 6ad57dc on broadinstitute:master.

Copy link
Contributor

@yfarjoun yfarjoun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. thanks for the tests. I have a few comments but nothing major.

isOnTarget = true;
// Then go through the per-target/per-base hq and unfiltered coverage
// The cutoff of >= 2 is because even the unfilteredCoverage doesn't want Q1 or Q0 bases!
if (qual > 2 && incrementPerTargetCoverage && onTarget) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (qual > 2 && incrementPerTargetCoverage && onTarget) {
if (qual >= 2 && incrementPerTargetCoverage && onTarget) {

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll fix the comment, the code is actually correct here - the het sensitivity code only wants bases > q2, not >= q2.

if (qual > 2 && incrementPerTargetCoverage && onTarget) {
for (final Interval target : targets) {
if (refPos >= target.getStart() && refPos <= target.getEnd()) {
final int targetOffset = refPos - target.getStart();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactor into overlapsTarget and use below in 619 as well?

 if (refPos >= target.getStart() && refPos <= target.getEnd()) {


// Now run CollectHsMetrics four times
final File out = Files.createTempFile("hsmetrics.", ".txt").toFile();
runPicardCommandLine(Arrays.asList("INCLUDE_INDELS=false", "SAMPLE_SIZE=0", "TI="+ts.getPath(), "BI="+bs.getPath(), "O="+out.getPath(), "I="+withDelBam.getAbsolutePath()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might be more readable with a data-provider...no?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally find the whole TestNG data provider stuff makes tests harder to read, which is why I avoid it.

}

// Write things out to file
final File dir = IOUtil.createTempDir("hsmetrics.", ".test");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to delete the directory yourself.

@@ -548,63 +550,79 @@ public void acceptRecord(final SAMRecord record) {
rec = record;
}

// Find the target overlaps
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you are here, in line 331 above, can you replace the use of getNotPrimaryAlignmentFlag() with the non-deprecated isSecondaryAlignment ?

@@ -548,63 +550,79 @@ public void acceptRecord(final SAMRecord record) {
rec = record;
}

// Find the target overlaps
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, do you think that the clipping for overlapping bases should be modified in any way if we take indels into account?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the clipping is probably fine. The clipping is based on reference positions IIRC, which should be ok.

final int len = cig.getLength();

for (int i=0; i<len; ++i) {
if (op.isAlignment() || (this.includeIndels && op.isIndel())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this line be swapped with the previous line as it seems to encompass all of the for-loop and doesn't depend on i.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole if is encompassed, but not the statements on lines 612 and 613 that update readOffset and refOffset. I did at one point try inverting and adding an else that updated the offsets using the length of the element and saw no performance increase, so went back to how it is now.


/** Writes the contents of a SAMRecordSetBuilder out to a file. */
File writeBam(final SAMRecordSetBuilder builder, final File f) {
final SAMFileWriter out = new SAMFileWriterFactory().makeSAMOrBAMWriter(builder.getHeader(), false, f);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use try-with-resources?

@tfenne
Copy link
Collaborator Author

tfenne commented Jan 31, 2019

Thanks for the review @yfarjoun - I've implemented most of your suggestions and commented where I haven't.

Copy link
Contributor

@yfarjoun yfarjoun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I suspect that this can make a big difference for small panels....

@tfenne tfenne merged commit 42c8ee0 into broadinstitute:master Feb 1, 2019
@tfenne tfenne deleted the tf_hs_metrics_with_indels branch February 1, 2019 21:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants