-
-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
variant calling with paired-end sequencing #968
Comments
We did add support for consensus calling the overlapping bases in a read pair which is turned on by default in both I hope that helps! |
Thank you! So the information coming from the overlapping mate read is taken into account, but in any case two separate consensus sequences will be reported in the output file. Can you confirm that my understanding is correct? I think this is an important point to consider when calling variants on the consensus sequences: as far as I can see, the real number of distinct template molecules supporting a mutation might be lower than the number of consensus sequences carrying the mutation, simply because the same molecule is represented twice in the consensus file. |
You are correct that two read pairs are output. But variant callers (like GATK) do not count the same bace twice when reporting a mutation from an overlapping read pair. This is regardless of whether the read is a raw read pair or made from a consensus. |
Dear fgbio developers
We are using fgbio for a sequencing project, and I need a clarification regarding the consensus generation. Reading from the documentation, I understand that “this tool calls each end of a pair independently, and does not jointly call bases that overlap within a pair”.
If I understand correctly, this basically means that the two ends of a read pair will generate two separate consensus sequences, even if they are tagged with the same UMI and map to the same genomic location (they may even overlap).
If that’s the case, when I call the variants on these consensus sequences, the two ends of the same read pair will count as two molecules. Correct? So, looking at the final VCF file, one cannot distinguish a variant detected in two different molecules from a variant that is detected in the two ends of a read pair. Is my reasoning correct?
Thanks!
The text was updated successfully, but these errors were encountered: