Allow multiple trim lengths or read structure in TrimFastq #927

Open
bkohrn opened this issue Aug 14, 2023 · 6 comments

@bkohrn

bkohrn commented Aug 14, 2023

It would be nice if we could either give multiple output read lengths (one per file) or a read structure (similar to what exists in FastqToBam) in TrimFastq.

@nh13
Member

nh13 commented Aug 14, 2023

@bkohrn it wouldn't be too hard to change this line to take in multiple lengths, and then propagate that further:

@arg(flag='l', doc="Length to trim reads to.") val length: Int,
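
A minimal sketch of what "taking multiple lengths" could look like, assuming the single Int becomes a Seq[Int] that holds either one length for every input or one length per input FASTQ. The object TrimLengths and the helper resolve are hypothetical names for illustration and are not taken from fgbio:

```scala
object TrimLengths {
  /** Hypothetical helper: expands the user-supplied lengths to one length per input FASTQ. */
  def resolve(lengths: Seq[Int], numInputs: Int): Seq[Int] = {
    require(lengths.nonEmpty, "At least one length is required.")
    require(lengths.forall(_ >= 0), "Lengths must be non-negative.")
    lengths match {
      // A single length applies to every input, matching the current one-length behaviour.
      case Seq(single) => Seq.fill(numInputs)(single)
      // Otherwise there must be exactly one length per input file.
      case many =>
        require(many.length == numInputs, s"Expected 1 or $numInputs lengths, found ${many.length}.")
        many
    }
  }
}
```

Accepting a single value as shorthand for "all inputs" would keep existing command lines working, which is one reason to prefer it over always requiring one length per file.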

nh13 added a commit that referenced this issue Aug 16, 2023
* feat: allow TrimFastq to specify a length per input FASTQ

See: #927

Co-authored-by: Clint Valentine <[email protected]>
@eboyden

eboyden commented Sep 1, 2023

+1 for read structure, so that 5' ends can be modified without having to run DemuxFastqs or FastqToBam
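
To illustrate the read-structure idea, here is a toy sketch, assuming the usual fgbio-style notation of a length (or + for "all remaining bases") followed by a one-letter segment type, e.g. 8S+T to skip the first 8 bases and keep the rest as template. ToyReadStructure, Segment, parse and extract are hypothetical names, and this is not fgbio's own ReadStructure implementation:

```scala
object ToyReadStructure {
  /** One segment; a length of None means "all remaining bases" (written as '+'). */
  final case class Segment(length: Option[Int], kind: Char)

  private val SegmentPattern = """(\d+|\+)([A-Z])""".r

  /** Parses strings such as "8S+T" into segments, rejecting anything that does not round-trip. */
  def parse(structure: String): Seq[Segment] = {
    val segments = SegmentPattern.findAllMatchIn(structure).map { m =>
      Segment(if (m.group(1) == "+") None else Some(m.group(1).toInt), m.group(2).head)
    }.toList
    val rebuilt = segments.map(s => s.length.fold("+")(_.toString) + s.kind).mkString
    require(segments.nonEmpty && rebuilt == structure, s"Invalid read structure: $structure")
    require(segments.dropRight(1).forall(_.length.isDefined), "Only the last segment may use '+'.")
    segments
  }

  /** Returns the bases covered by segments of the given kind (template 'T' by default). */
  def extract(bases: String, segments: Seq[Segment], kind: Char = 'T'): String = {
    var offset = 0
    val kept = new StringBuilder
    segments.foreach { seg =>
      // Clamp to the read length so short reads do not throw.
      val end = seg.length.fold(bases.length)(len => math.min(offset + len, bases.length))
      if (seg.kind == kind) kept.append(bases.substring(offset, end))
      offset = end
    }
    kept.toString
  }
}
```

A real FASTQ trimmer would apply the same offsets to the quality string so that bases and qualities stay the same length.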

@nh13
Member

nh13 commented Sep 1, 2023

What's the objection to just using FastqToBam instead of TrimFastq?

@eboyden

eboyden commented Sep 1, 2023

Only that it requires BAM output. If you have downstream steps that require FASTQ input (e.g. 3' quality or dovetail trimming, or certain aligners such as BWA), you need an additional step to convert the uBAM back to FASTQ, that's all. It's probably lower on the triage list than some other things, especially if there's a faster or easier way to implement the OP's request.

@bkohrn
Author

bkohrn commented Sep 1, 2023

Mostly it matters when the end goal is FASTQ files to use in further analysis. Say I have a 10X run that was sequenced as 150 x 10 x 10 x 150 bp reads, and I need to trim it down, as FASTQ files, to 26 x 10 x 10 x 90 bp reads. If I use FastqToBam, I then need a second step to convert back to FASTQ files before I can proceed with analysis. Not that it isn't doable (I did it last time, using samtools fastq to convert back to FASTQ files), but it would probably save some time to be able to do the trimming in one step rather than two.
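
The 150 x 10 x 10 x 150 to 26 x 10 x 10 x 90 case can be sketched as trimming each of the four reads to its own length. This is only an illustration: FastqRecord here is a stand-in case class defined for the example, TrimExample and trimAll are hypothetical names, and reads shorter than the target are simply left unchanged, which is an assumption rather than a statement about what TrimFastq does:

```scala
/** Stand-in FASTQ record for this example only. */
final case class FastqRecord(name: String, bases: String, quals: String) {
  require(bases.length == quals.length, "Bases and qualities must be the same length.")

  /** Trims bases and qualities together; reads already short enough are returned as-is. */
  def trimmedTo(len: Int): FastqRecord = {
    require(len >= 0, "Trim length must be non-negative.")
    if (bases.length <= len) this
    else copy(bases = bases.take(len), quals = quals.take(len))
  }
}

object TrimExample {
  /** Applies one trim length per read, e.g. the four reads of a 10X template. */
  def trimAll(recs: Seq[FastqRecord], lengths: Seq[Int]): Seq[FastqRecord] = {
    require(recs.length == lengths.length, "Need exactly one length per read.")
    recs.zip(lengths).map { case (rec, len) => rec.trimmedTo(len) }
  }
}
```

For example, calling TrimExample.trimAll on four records of 150, 10, 10 and 150 bases with Seq(26, 10, 10, 90) yields records of 26, 10, 10 and 90 bases, with the two 10 bp index reads passed through untouched.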

@nh13
Member

nh13 commented Sep 1, 2023

I'd just pipe into samtools fastq like you're doing, to be honest. There's no need to compress the intermediate output, so set the compression level to 0, i.e. fgbio --compression 0 FastqToBam ... | samtools fastq ..., which can speed it up.
