Yf stratify chimeric #1269

Merged
yfarjoun merged 7 commits into master from yf_stratify_chimeric
Mar 8, 2019

yfarjoun
Contributor

Description

Adds two stratifiers to CollectSamErrorMetrics that allow exploring the data with respect to chimeric reads and soft-clipped bases.


Checklist (never delete this)

Never delete this, it is our record that procedure was followed. If you find that for whatever reason one of the checklist points doesn't apply to your PR, you can leave it unchecked but please add an explanation below.

Content

  • Added or modified tests to cover changes and any new functionality
  • Edited the README / documentation (if applicable)
  • All tests passing on Travis

Review

  • Final thumbs-up from reviewer
  • Rebase, squash and reword as applicable

For more detailed guidelines, see https:/broadinstitute/picard/wiki/Guidelines-for-pull-requests

@coveralls

coveralls commented Jan 19, 2019

Coverage decreased (-0.007%) to 81.473% when pulling 14c76b9 on yf_stratify_chimeric into b15d007 on master.

Contributor

@fleharty fleharty left a comment

@yfarjoun
Minor changes requested, mostly in descriptions.

@@ -74,7 +74,7 @@
 "<p>" +
 "The resulting metric file will be named according to a provided prefix and a suffix which is generated " +
 " automatically according to the error metric. " +
-"The tool cal collect multiple metrics in a single pass and there should be hardly any " +
+"The tool can collect multiple metrics in a single pass and there [ be hardly any " +

replace "[" with "should"

return -1;
}
return cigar.getCigarElements().stream()
.filter(e->e.getOperator()== CigarOperator.S)

spaces around operators -> and ==

POST_DINUC(() -> postDiNucleotideStratifier, "Stratifies bases by the read base at the previous cycle, and the current reference base."),
HOMOPOLYMER_LENGTH(() -> homoPolymerLengthStratifier, "Stratifies bases based on the length of homopolymer they are part of (only accounts for bases that were read prior to the current base)."),
HOMOPOLYMER(() -> homopolymerStratifier, "Stratifies bases based on the length of homopolymer, the base that the homopolymer is comprised of, and the reference base."),
GC_CONTENT(() -> gcContentStratifier, "The gc content of their read."),

their -> the

Also, gc should be "GC"

READ_ORDINALITY(() -> readOrdinalityStratifier, "The read ordinality (i.e. first or second)."),
READ_BASE(() -> currentReadBaseStratifier, "the base in the original reading direction."),
READ_DIRECTION(() -> readDirectionStratifier, "The alignment direction of the read (encoded as + or -)."),
PAIR_ORIENTATION(() -> readOrientationStratifier, "The reads orientation and ordinality. (into F1R2 or F2R1) Assumes reads are \"innies\"."),

either "reads" should have a possessive or be singular. I would vote for "read".
Not sure I understand what "into" means.
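
For readers with the same question, one common reading of F1R2 and F2R1 can be sketched from SAM flag bits alone. This is a minimal illustration, not the stratifier in this PR: the class, enum, and method names are hypothetical, the flag values come from the SAM specification, and the mapping assumes "innie" pairs, as the description does. F1R2 means read 1 aligns forward and read 2 aligns reverse; F2R1 is the opposite.

```java
import java.util.Optional;

/** Hypothetical sketch; names are illustrative and not taken from Picard. */
final class PairOrientationSketch {
    // SAM flag bits, as defined in the SAM specification.
    static final int READ_PAIRED = 0x1;
    static final int READ_REVERSE_STRAND = 0x10;
    static final int FIRST_OF_PAIR = 0x40;

    enum PairOrientation { F1R2, F2R1 }

    /**
     * Classifies a single read from its flags, assuming the mate points the
     * opposite way ("innie" pairs). Returns empty for unpaired reads.
     */
    static Optional<PairOrientation> classify(final int flags) {
        if ((flags & READ_PAIRED) == 0) {
            return Optional.empty();
        }
        final boolean firstOfPair = (flags & FIRST_OF_PAIR) != 0;
        final boolean reverseStrand = (flags & READ_REVERSE_STRAND) != 0;
        // Read 1 forward or read 2 reverse both imply F1R2; otherwise F2R1.
        return Optional.of(firstOfPair != reverseStrand
                ? PairOrientation.F1R2
                : PairOrientation.F2R1);
    }
}
```

Under this reading, both mates of a pair land in the same bin: a forward read 1 (flag 99) and its reverse read 2 (flag 147) are both F1R2.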

READ_BASE(() -> currentReadBaseStratifier, "the base in the original reading direction."),
READ_DIRECTION(() -> readDirectionStratifier, "The alignment direction of the read (encoded as + or -)."),
PAIR_ORIENTATION(() -> readOrientationStratifier, "The reads orientation and ordinality. (into F1R2 or F2R1) Assumes reads are \"innies\"."),
PAIR_PROPERNESS(() -> readPairednessStratifier, "The properness of the reads alignment. Looks for indications of chimerism."),

"reads" should be possessive or singular. I vote for "read".

private static Integer stratifySoftClippedBases(final SAMRecord sam) {
final Cigar cigar = sam.getCigar();
if (cigar == null) {
return -1;

Could you declare a constant for this?
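
A rough, self-contained sketch of what the requested constant might look like is below. It is not the code merged here: it works on a raw CIGAR string instead of htsjdk's `Cigar`, and the class name, the `NO_CIGAR` constant, and the method name are hypothetical. It also assumes the soft-clip count is the sum of the lengths of all `S` elements, which the snippets above suggest but do not show in full.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch; not the implementation from this pull request. */
final class SoftClipSketch {
    /** Named sentinel for reads without a usable CIGAR, replacing the bare -1. */
    static final int NO_CIGAR = -1;

    // One CIGAR element: a length followed by a SAM operator character.
    private static final Pattern CIGAR_ELEMENT = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** Sums the lengths of all soft-clip (S) elements in a CIGAR string. */
    static int countSoftClippedBases(final String cigar) {
        if (cigar == null || cigar.isEmpty() || cigar.equals("*")) {
            return NO_CIGAR;
        }
        int softClipped = 0;
        final Matcher matcher = CIGAR_ELEMENT.matcher(cigar);
        while (matcher.find()) {
            if (matcher.group(2).equals("S")) {
                softClipped += Integer.parseInt(matcher.group(1));
            }
        }
        return softClipped;
    }
}
```

With this sketch, `countSoftClippedBases("5S90M5S")` counts both clipped ends, and a missing CIGAR (`null` or `"*"`) yields `NO_CIGAR`, so callers compare against a name instead of a magic number.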

PAIR_PROPERNESS(() -> readPairednessStratifier, "The properness of the reads alignment. Looks for indications of chimerism."),
REFERENCE_BASE(() -> referenceBaseStratifier, "The reference base in the read's direction."),
PRE_DINUC(() -> preDiNucleotideStratifier, "The read base at the previous cycle, and the current reference base."),
POST_DINUC(() -> postDiNucleotideStratifier, "The read base at the previous cycle, and the current reference base."),

I think "previous" should be "subsequent"

READ_GROUP(() -> readgroupStratifier, "The read-group id of the read."),
CYCLE(() -> baseCycleStratifier, "The machine cycle during which the base was read."),
BINNED_CYCLE(() -> binnedReadCycleStratifier, "The binned machine cycle. Similar to CYCLE, but binned into 5 evenly spaced ranges across the size of the read. This stratifier may produce confusing results when used on datasets with variable sized reads."),
SOFT_CLIPS(() -> softClipsLengthStratifier, "The number of softclipped bases their read has."),

"their" -> "the"

@yfarjoun
Contributor Author

Thanks for the review @fleharty. I think I addressed all your points.

Contributor

@fleharty fleharty left a comment

I just have a question about whether or not F1F2 and R1R2 should be included in the documentation, and a final.
Otherwise I approve. You'll need to find another reviewer anyway.

@yfarjoun yfarjoun merged commit 904a385 into master Mar 8, 2019
@yfarjoun yfarjoun deleted the yf_stratify_chimeric branch March 8, 2019 20:22

4 participants