Generate comprehensive comparisons between RNAseq, Mutation, and Clinical Matrix #17
You should be able to use the sample_ids from … Also, I usually wait until the last moment to drop samples, meaning we could probably process the clinical matrix without this information?
dhimmel added a commit to dhimmel/cancer-data that referenced this issue Aug 24, 2016
Keeps only samples with type equal to "Primary Tumor". This filters out multiple samples from the same patient, which could cause an issue for machine learning due to dependent observations (discussed in cognoma#10). This filter reduced the number of samples with expression and mutation from 7,705 to 7,306.

Closes cognoma#10: all variables in PANCAN_clinicalMatrix that could help with sample selection or serve as covariates are extracted to `data/samples.tsv`. Relies on documentation of PANCAN_clinicalMatrix variables provided by the Xena Browser team in cognoma#14.

Closes cognoma#17: only sample_ids with expression, mutation, and clinical data are output to `data/`.
clairemcleod pushed a commit that referenced this issue Aug 25, 2016
* Extract sample info from PANCAN_clinicalMatrix

  Keeps only samples with type equal to "Primary Tumor". This filters out multiple samples from the same patient, which could cause an issue for machine learning due to dependent observations (discussed in #10). This filter reduced the number of samples with expression and mutation from 7,705 to 7,306.

  Closes #10: all variables in PANCAN_clinicalMatrix that could help with sample selection or serve as covariates are extracted to `data/samples.tsv`. Relies on documentation of PANCAN_clinicalMatrix variables provided by the Xena Browser team in #14.

  Closes #17: only sample_ids with expression, mutation, and clinical data are output to `data/`.

* Retain primary blood cancers

  Retain cancers whose type is "Primary Blood Derived Cancer - Peripheral Blood". See #20 (comment)
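The sample-type filter described in these commits could look roughly like the sketch below. It is not the code from the referenced commits: the column names `sample_id` and `sample_type`, the helper `select_samples`, and the toy rows are all assumptions made for illustration. Only the two retained type strings come from the commit messages.

```python
import csv
import io

# Sample types named in the commit messages above.
RETAINED_TYPES = {
    "Primary Tumor",
    "Primary Blood Derived Cancer - Peripheral Blood",
}


def select_samples(clinical_tsv, measured_ids,
                   id_column="sample_id", type_column="sample_type"):
    """Return IDs of clinical rows whose type is retained and that were measured.

    The column names are hypothetical; adjust them to the real
    PANCAN_clinicalMatrix headers.
    """
    reader = csv.DictReader(io.StringIO(clinical_tsv), delimiter="\t")
    return [
        row[id_column]
        for row in reader
        if row[type_column] in RETAINED_TYPES and row[id_column] in measured_ids
    ]


# Toy clinical table with made-up sample IDs.
toy_clinical = (
    "sample_id\tsample_type\n"
    "A-01\tPrimary Tumor\n"
    "A-06\tMetastatic\n"
    "B-03\tPrimary Blood Derived Cancer - Peripheral Blood\n"
    "C-01\tPrimary Tumor\n"
)
# IDs assumed to have both expression and mutation data.
measured = {"A-01", "A-06", "B-03"}

selected = select_samples(toy_clinical, measured)
print(selected)
```

In this toy table, `A-06` is dropped because of its type and `C-01` because it has no measurements, which mirrors the two conditions stated in the commit message.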
We need to generate a comparison of the sample IDs across the three data sources, showing which IDs exist in all three. It would be good to subset the clinical matrix to only samples that are measured by RNAseq and to file a pull request with this report.
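One way to sketch such a comparison is with plain set operations. The function name `compare_sample_ids`, the report keys, and the toy IDs below are assumptions for illustration, not part of the project's code.

```python
def compare_sample_ids(expression_ids, mutation_ids, clinical_ids):
    """Summarize how sample IDs overlap across the three data sources."""
    sources = {
        "expression": set(expression_ids),
        "mutation": set(mutation_ids),
        "clinical": set(clinical_ids),
    }
    # IDs present in every source.
    report = {"shared": sorted(set.intersection(*sources.values()))}
    # IDs that appear in exactly one source and in no other.
    for name, ids in sources.items():
        others = set().union(
            *(other for key, other in sources.items() if key != name)
        )
        report["only_" + name] = sorted(ids - others)
    return report


# Toy IDs, made up for the example.
report = compare_sample_ids(
    expression_ids=["S1", "S2", "S3"],
    mutation_ids=["S2", "S3", "S4"],
    clinical_ids=["S1", "S2", "S3", "S5"],
)
for key, ids in report.items():
    print(key, ids)
```

The `shared` list is the candidate set for subsetting the clinical matrix, while the `only_*` lists show which samples would be lost from each source.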