Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

liftovervcf without sorting the variants #1306

Merged
merged 2 commits into from
Apr 22, 2019

Conversation

lindenb
Copy link
Contributor

@lindenb lindenb commented Apr 10, 2019

Description

I'm trying to liftover gnomad/genome to hg38 but it takes to much time and memory. I'd rather have the resulting VCF file without sorting and then possibly sort later with bcftools.

I added an option DISABLE_SORT to liftover vcf:

  • now the sorter miight be null if DISABLE_SORT=true
  • I moved the variable out to a class member accept
  • I added a test that just write with/without DISABLE_SORT

now vcfliftover can write to stdout, variants are not sorted:

$ java -jar build/libs/picard.jar  LiftoverVcf INPUT=testdata/picard/vcf/LiftOver/testLiftoverBiallelicIndels.vcf OUTPUT=/dev/stdout CHAIN=testdata/picard/vcf/LiftOver/test.over.chain REJECT=/dev/null  REFERENCE_SEQUENCE=testdata/picard/vcf/LiftOver/dummy.reference.fasta DISABLE_SORT=true 2> /dev/null 
##fileformat=VCFv4.2
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=END,Number=1,Type=Integer,Description="Stop position of the interval">
##INFO=<ID=ReverseComplementedAlleles,Number=0,Type=Flag,Description="The REF and the ALT alleles have been reverse complemented in liftover since the mapping from the previous reference to the current one was on the negative strand.">
##INFO=<ID=SwappedAlleles,Number=0,Type=Flag,Description="The REF and the ALT alleles have been swapped in liftover due to changes in the reference. It is possible that not all INFO annotations reflect this swap, and in the genotypes, only the GT, PL, and AD fields have been modified. You should check the TAGS_TO_REVERSE parameter that was used during the LiftOver to be sure.">
##contig=<ID=chr1,length=540>
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	Sample1
chr1	539	.	A	AAGGG	15676.17	PASS	END=539;ReverseComplementedAlleles	GT	0/0
chr1	421	.	CA	C	724.43	PASS	END=422;ReverseComplementedAlleles	GT	0/1
chr1	469	.	A	T	100	PASS	END=469;ReverseComplementedAlleles	GT	0/1
chr1	469	.	A	T	100	PASS	END=469;ReverseComplementedAlleles	GT	./.
chr1	421	.	CA	C	100	PASS	END=422;ReverseComplementedAlleles	GT	1/.

Checklist (never delete this)

Never delete this, it is our record that procedure was followed. If you find that for whatever reason one of the checklist points doesn't apply to your PR, you can leave it unchecked but please add an explanation below.

Content

  • Added or modified tests to cover changes and any new functionality
  • Edited the README / documentation (if applicable)
  • All tests passing on Travis

Review

  • Final thumbs-up from reviewer
  • Rebase, squash and reword as applicable

For more detailed guidelines, see https:/broadinstitute/picard/wiki/Guidelines-for-pull-requests

Copy link
Contributor

@yfarjoun yfarjoun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice addition @lindenb! thanks!

Lifting gnomad is indeed a pain....it's interesting that you think that sorting it with a different tool will be better...

I've put in a few comments, but overall it looks great! thanks again.

src/main/java/picard/vcf/LiftoverVcf.java Outdated Show resolved Hide resolved

//try to open with / without index
try (final VCFFileReader liftReader = new VCFFileReader(liftOutputFile, !disableSort)) {
// nothing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at least count the variants and make sure that you got the right number....

@@ -250,7 +253,10 @@
);

private VariantContextWriter rejects;
/** the output VariantContextWriter */
private VariantContextWriter accept;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, can you rename this accepts (with s) or change this and rejects to acceptedRecords and rejectedRecords ?

.modifyOption(Options.ALLOW_MISSING_FIELDS_IN_HEADER, ALLOW_MISSING_FIELDS_IN_HEADER)
.setOutputFile(OUTPUT)
.setReferenceDictionary(walker.getSequenceDictionary())
;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep ; in same line.

@@ -477,7 +508,11 @@ private void trackLiftedVariantContig(Map<String, Long> map, String contig) {
private void addAndTrack(final VariantContext toAdd, final VariantContext source) {
trackLiftedVariantContig(liftedBySourceContig, source.getContig());
trackLiftedVariantContig(liftedByDestContig, toAdd.getContig());
sorter.add(toAdd);
if (sorter != null) { //we're sorting the variants
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (sorter != null) { //we're sorting the variants
if (!DISABLE_SORT) { //we're sorting the variants

// nothing
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

too many NLs

for (final VariantContext ctx : sorter) {
out.add(ctx);
progress.record(ctx.getContig(), ctx.getStart());

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra nl

log.info("Writing out sorted records to final VCF.");

for (final VariantContext ctx : sorter) {
this.accept.add(ctx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation is off.

@@ -477,7 +508,11 @@ private void trackLiftedVariantContig(Map<String, Long> map, String contig) {
private void addAndTrack(final VariantContext toAdd, final VariantContext source) {
trackLiftedVariantContig(liftedBySourceContig, source.getContig());
trackLiftedVariantContig(liftedByDestContig, toAdd.getContig());
sorter.add(toAdd);
if (sorter != null) { //we're sorting the variants
sorter.add(toAdd);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use spaces (4) not tabs for indent.

@lindenb
Copy link
Contributor Author

lindenb commented Apr 11, 2019

@yfarjoun back to you.

Thank you for the review , I hope I removed all the formatting problems

shouldn't the CREATE_INDEX be implicit?

well one can imagine that you want a sorted VCF without index. I added a test

        if (CREATE_INDEX && DISABLE_SORT) {
            log.error("CREATE_INDEX=true and DISABLE_SORT=true are mutually exclusive.");
            return 1;
         }

@yfarjoun
Copy link
Contributor

what I meant is that your test only tests the explicit invocation, and not the implicit one...

Copy link
Contributor

@yfarjoun yfarjoun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

almost, a few extra newlines and one misformatted } else {

@@ -287,6 +287,11 @@ protected int doWork() {
IOUtil.assertFileIsWritable(OUTPUT);
IOUtil.assertFileIsWritable(REJECT);


if (CREATE_INDEX && DISABLE_SORT) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this type of input validation should be in customCommandLineValidation()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in fact I realized that CREATED_INDEX is only for BAM-related tools, for LiftoverVCF, the index is always created event if CREATED_INDEX=false

@@ -181,6 +181,9 @@
@Argument(doc = "INFO field annotations that should be deleted when swapping reference with variant alleles.", optional = true)
public Collection<String> TAGS_TO_DROP = new ArrayList<>(LiftoverUtils.DEFAULT_TAGS_TO_DROP);

@Argument(doc = "Output VCF file will be written on the fly but it won't be sorted and indexed.", optional = true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@Argument(doc = "Output VCF file will be written on the fly but it won't be sorted and indexed.", optional = true)
@Argument(doc = "Output VCF file will be written on the fly but it won't be sorted nor indexed.", optional = true)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

“or” actually 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not native english speaker, nor do I care that much :-D

out.writeHeader(outHeader);
this.acceptedRecords = new VariantContextWriterBuilder()
.modifyOption(Options.ALLOW_MISSING_FIELDS_IN_HEADER, ALLOW_MISSING_FIELDS_IN_HEADER)
.modifyOption(Options.INDEX_ON_THE_FLY,!DISABLE_SORT)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.modifyOption(Options.INDEX_ON_THE_FLY,!DISABLE_SORT)
.modifyOption(Options.INDEX_ON_THE_FLY, !DISABLE_SORT)

@lindenb
Copy link
Contributor Author

lindenb commented Apr 11, 2019

what I meant is that your test only tests the explicit invocation, and not the implicit one...

Sorry, I'm not sure I understand 🤔 . for example in a test like LiftoverVcfTest/testWriteOriginalPosition how is the condition implicit/explicit tested ?

@yfarjoun
Copy link
Contributor

I guess I mean that you explicitly told the program to CREATE_INDEX rather than it doing whatever it thinks

@yfarjoun yfarjoun merged commit a6c6d88 into broadinstitute:master Apr 22, 2019
@lindenb lindenb deleted the pl_vcfliftover_nosort branch April 23, 2019 09:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants