"fasterq-dump.3.0.7 err: the input data is missing the QUALITY-column" BUT sra QUALITY column is present. #851

Open
GabeAl opened this issue Sep 8, 2023 · 1 comment

GabeAl commented Sep 8, 2023

This misbehavior occurs with fasterq-dump 3.0.7 on the accession below and a few others, affecting about 7% of biosample-based accessions. SAMN08049698 and some other biosample-based accessions fail, but most work fine, including ones within the same project. Interestingly, the underlying run in this case (SRR6468610) still dumps fine.

Why this is a bug:

  1. The .sra file does appear to contain quality scores.
  2. Each of these biosamples has exactly one underlying RUN accession, which is in turn the only FASTQ data associated with it.
  3. The underlying run accession, when supplied directly, works as expected for these cases, including the example.
  4. Other biosample accessions from the same study, submitted at the same time with the same modality (even from the same subject!), work perfectly. In this case that would be SAMN08049628, another biosample that dumps directly to FASTQ without problems.

A workaround would be to use Entrez Direct (or something similar) to translate the biosample IDs into Run IDs, dump the Run IDs, and then rename (or concatenate) the output back to the biosample IDs. But that's quite a workaround for the couple of files this fails on, so I thought I'd report it here.
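The workaround described above could be scripted roughly as follows. This is a minimal, untested sketch, not part of sra-tools: it assumes Entrez Direct (`esearch`, `efetch`) and `fasterq-dump` are on `PATH`, that `efetch -format runinfo` returns CSV with the Run accession in the first column, and that searching the `sra` database by biosample accession finds that biosample's runs. The script name `biosample_dump.sh` and the function names `runs_from_runinfo` and `dump_biosample` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical workaround sketch (not part of sra-tools): resolve a biosample
# to its Run accession(s) with Entrez Direct, dump each run with fasterq-dump,
# then concatenate the per-run FASTQ files under the biosample name.
# Assumes esearch, efetch and fasterq-dump are installed and on PATH.
set -euo pipefail

# Read `efetch -format runinfo` CSV on stdin and print the Run accessions
# (first column). Header and blank lines are dropped because they do not
# match the SRR/ERR/DRR accession pattern.
runs_from_runinfo() {
  awk -F, '$1 ~ /^[SED]RR[0-9]+$/ { print $1 }'
}

# Usage: dump_biosample BIOSAMPLE OUTDIR
dump_biosample() {
  local biosample="$1" outdir="$2" runs run suffix f
  mkdir -p "$outdir"

  # Assumption: an sra-database search on the biosample accession finds its runs.
  runs=$(esearch -db sra -query "$biosample" | efetch -format runinfo | runs_from_runinfo)
  if [[ -z "$runs" ]]; then
    echo "no runs found for $biosample" >&2
    return 1
  fi

  # Dump each run individually; the run accessions are what work directly.
  for run in $runs; do
    fasterq-dump "$run" --split-3 -O "$outdir"
  done

  # --split-3 writes RUN_1.fastq / RUN_2.fastq for mates and RUN.fastq for
  # unpaired reads; merge each kind across runs under the biosample name.
  for suffix in _1 _2 ""; do
    local files=()
    for run in $runs; do
      f="$outdir/$run$suffix.fastq"
      if [[ -f "$f" ]]; then
        files+=("$f")
      fi
    done
    if (( ${#files[@]} > 0 )); then
      cat "${files[@]}" > "$outdir/$biosample$suffix.fastq"
    fi
  done
}

# Only act when executed directly with arguments, so the functions can be sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -gt 0 ]]; then
  dump_biosample "$@"
fi
```

For example, `./biosample_dump.sh SAMN08049698 out/` would dump the underlying run(s) into `out/` and leave merged FASTQ files named after the biosample next to the per-run files. Filtering on the accession pattern, rather than on line position, keeps the CSV header and any blank lines out of the run list.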

GabeAl changed the title from "fasterq-dump.3.0.7 err: the input data is missing the QUALITY-column" to ""fasterq-dump.3.0.7 err: the input data is missing the QUALITY-column" BUT quality column is present" (Sep 12, 2023)
GabeAl changed the title from ""fasterq-dump.3.0.7 err: the input data is missing the QUALITY-column" BUT quality column is present" to ""fasterq-dump.3.0.7 err: the input data is missing the QUALITY-column" BUT sra QUALITY column is present." (Sep 12, 2023)

mortunco commented Jan 9, 2024

I am having the same problem. I am running fasterq-dump in a Snakemake workflow. Weirdly, when I run it without specifying a temp directory (so the default is used) OR when I point the temp directory at my home directory (~/temp-fasterq-dump), I get no errors. I am thinking it might be a permission or directory-lock problem?

Important notes

  1. I download the SRA file from the AWS S3 bucket, then run fasterq-dump on the local SRA file to convert it to FASTQ.
  2. Despite the error, FASTQ files 1 and 2 are generated.

$ code/sratoolkit.3.0.7-centos_linux64/bin/fasterq-dump raw-data/testproject-001/GSM3148577_BC10_TUMOR1/SRR7191904.sra --threads 8 --split-3 -O raw-data/testproject-001/GSM3148577_BC10_TUMOR1 -t raw-data/testproject-001/GSM3148577_BC10_TUMOR1/temp-fasterq-dump
spots read      : 41,812,717
reads read      : 83,625,434
reads written   : 83,625,434
2024-01-09T18:32:54 fasterq-dump.3.0.7 err: the input data is missing the QUALITY-column