-
Notifications
You must be signed in to change notification settings - Fork 369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added handling for indels within CollectHsMetrics and CollectTargetedPcrMetrics. #1273
Added handling for indels within CollectHsMetrics and CollectTargetedPcrMetrics. #1273
Conversation
…PcrMetrics. When the INCLUDE_INDELS parameter is set to true: 1. Deletions within reads are treated as covering the reference bases if the preceding base is high quality 2. Inserted bases are counted as "on-target" but don't contribute coverage to any target base(s)
@yfarjoun I'm not really done with this PR, I plan to add some additional tests, but wanted to get this up for you, and anyone else who is interested, to look at and see if you agree with direction. Thanks! |
…unPicardCommandLine("I=", input) vs. runPicardCommandLine("I=" + input).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall. thanks for the tests. I have a few comments but nothing major.
src/main/java/picard/analysis/directed/TargetMetricsCollector.java
Outdated
Show resolved
Hide resolved
src/main/java/picard/analysis/directed/TargetMetricsCollector.java
Outdated
Show resolved
Hide resolved
isOnTarget = true; | ||
// Then go through the per-target/per-base hq and unfiltered coverage | ||
// The cutoff of >= 2 is because even the unfilteredCoverage doesn't want Q1 or Q0 bases! | ||
if (qual > 2 && incrementPerTargetCoverage && onTarget) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (qual > 2 && incrementPerTargetCoverage && onTarget) { | |
if (qual >= 2 && incrementPerTargetCoverage && onTarget) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll fix the comment, the code is actually correct here - the het sensitivity code only wants bases > q2, not >= q2.
if (qual > 2 && incrementPerTargetCoverage && onTarget) { | ||
for (final Interval target : targets) { | ||
if (refPos >= target.getStart() && refPos <= target.getEnd()) { | ||
final int targetOffset = refPos - target.getStart(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactor into overlapsTarget and use below in 619 as well?
if (refPos >= target.getStart() && refPos <= target.getEnd()) {
|
||
// Now run CollectHsMetrics four times | ||
final File out = Files.createTempFile("hsmetrics.", ".txt").toFile(); | ||
runPicardCommandLine(Arrays.asList("INCLUDE_INDELS=false", "SAMPLE_SIZE=0", "TI="+ts.getPath(), "BI="+bs.getPath(), "O="+out.getPath(), "I="+withDelBam.getAbsolutePath())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might be more readable with a data-provider...no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally find the whole TestNG data provider stuff makes tests harder to read, which is why I avoid it.
} | ||
|
||
// Write things out to file | ||
final File dir = IOUtil.createTempDir("hsmetrics.", ".test"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to delete the directory yourself.
@@ -548,63 +550,79 @@ public void acceptRecord(final SAMRecord record) { | |||
rec = record; | |||
} | |||
|
|||
// Find the target overlaps |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you are here, in line 331 above, can you replace the use of getNotPrimaryAlignmentFlag()
with the non-deprecated isSecondaryAlignment
?
@@ -548,63 +550,79 @@ public void acceptRecord(final SAMRecord record) { | |||
rec = record; | |||
} | |||
|
|||
// Find the target overlaps |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, do you think that the clipping for overlapping bases should be modified in any way if we take indels into account?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the clipping is probably fine. The clipping is based on reference positions IIRC, which should be ok.
final int len = cig.getLength(); | ||
|
||
for (int i=0; i<len; ++i) { | ||
if (op.isAlignment() || (this.includeIndels && op.isIndel())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this line be swapped with the previous line as it seems to encompass all of the for-loop and doesn't depend on i
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole if is encompassed, but not the statements on lines 612 and 613 that update readOffset
and refOffset
. I did at one point try inverting and adding an else that updated the offsets using the length of the element and saw no performance increase, so went back to how it is now.
|
||
/** Writes the contents of a SAMRecordSetBuilder out to a file. */ | ||
File writeBam(final SAMRecordSetBuilder builder, final File f) { | ||
final SAMFileWriter out = new SAMFileWriterFactory().makeSAMOrBAMWriter(builder.getHeader(), false, f); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use try-with-resources?
Co-Authored-By: tfenne <[email protected]>
Thanks for the review @yfarjoun - I've implemented most of your suggestions and commented where I haven't. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I suspect that this can make a big difference for small panels....
Description
When the INCLUDE_INDELS parameter is set to true:
There are two motivations for this change:
PCT_USABLE_BASES_ON_TARGET
+PCT_EXC_*
to add up to 1.0. However I would frequently see this adding up to substantially over 1.0.I've done my best to re-organize the code to make it easier to follow and maintain in future.
Checklist
Content
Review