Extract detailed mutation information for TCGA samples #15

Open
gwaybio opened this issue Aug 9, 2016 · 2 comments

@gwaybio
Member

gwaybio commented Aug 9, 2016

While discussing cognoma with a cancer biologist and collaborator, we realized that a big win we could deliver relatively easily is classification performance (or classification scores) broken down by mutation type for an input gene. This would be extremely useful for a researcher interested in determining the pathogenicity of a particular mutation.

I believe that cognoma is an ideal way to approach this problem. When a gene is mutated, the severity of the downstream impact typically varies with the particular mutation. For a particularly severe mutation, a classifier trained to detect an inactivation signature may output a higher score for samples carrying it than for samples with a less severe mutation.

In my eyes, this issue bypasses the machine learning group: they will still work with the previously defined Y matrices. However, for the backend to serve each sample's mutation information from the database so that the frontend can visualize the results, we need to know how to parse this information.

I looked briefly at the information embedded in the PANCAN mutation data, particularly the columns labeled HGVSc and HGVSp. These columns store specific mutation calls in a standard notation. More information about these standards is provided by the HGVS website.
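As a rough illustration of what parsing the HGVSp column could look like, here is a minimal sketch that handles only simple three-letter substitutions such as `p.Arg175Pro`. The names `parse_hgvsp` and `ProteinChange` are hypothetical, and the regular expression is an assumption about the subset of notation worth supporting first; frameshifts, deletions, and other HGVS forms are deliberately returned as `None`.

```python
import re
from typing import NamedTuple, Optional

# Matches simple substitutions in three-letter notation, e.g. "p.Arg175Pro".
# The alternate residue may also be "Ter" or "*" (stop) or "=" (synonymous).
# Frameshifts, indels, and splice variants are intentionally not covered.
_HGVSP_SIMPLE = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2}|\*|=)$")


class ProteinChange(NamedTuple):
    ref: str       # reference amino acid, e.g. "Arg"
    position: int  # 1-based residue position
    alt: str       # replacement residue, "Ter"/"*" for stop, "=" for no change


def parse_hgvsp(value: str) -> Optional[ProteinChange]:
    """Parse a simple HGVSp substitution; return None for anything else."""
    match = _HGVSP_SIMPLE.match(value.strip())
    if match is None:
        return None
    ref, position, alt = match.groups()
    return ProteinChange(ref, int(position), alt)
```

Returning `None` for unsupported notation keeps unusual calls visible to the caller instead of guessing at their meaning.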

@dhimmel changed the title from "Visualize classifier performance across mutation types" to "Extract detailed mutation information for TCGA samples" on Aug 9, 2016
@gwaybio
Member Author

gwaybio commented Aug 9, 2016

To follow up: today, the data group should focus on extracting a table of the format:

| sample_id | gene_mutation | DNA_change | protein_change |
| --- | --- | --- | --- |
| TCGA___ | TP53 | c.___ | p.Arg175Pro |
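A minimal sketch of that extraction, assuming the mutation file is tab-delimited and uses MAF-style column names (`Tumor_Sample_Barcode`, `Hugo_Symbol`, `HGVSc`, `HGVSp`). Those input names are an assumption and have not been checked against the PANCAN file; `extract_mutation_table` and the example row, including its sample barcode, are invented for illustration.

```python
import csv
import io

# Assumed MAF-style input columns mapped to the output columns listed above.
COLUMN_MAP = {
    "Tumor_Sample_Barcode": "sample_id",
    "Hugo_Symbol": "gene_mutation",
    "HGVSc": "DNA_change",
    "HGVSp": "protein_change",
}


def extract_mutation_table(maf_text, delimiter="\t"):
    """Return one dict per mutation call, keeping only the four target columns."""
    reader = csv.DictReader(io.StringIO(maf_text), delimiter=delimiter)
    rows = []
    for record in reader:
        # Missing or empty input fields become empty strings in the output.
        rows.append({out: (record.get(src) or "") for src, out in COLUMN_MAP.items()})
    return rows


# Illustrative input only: the barcode and the row are made up.
example_maf = (
    "Hugo_Symbol\tTumor_Sample_Barcode\tVariant_Classification\tHGVSc\tHGVSp\n"
    "TP53\tTCGA-XX-0001\tMissense_Mutation\tc.524G>C\tp.Arg175Pro\n"
)
table = extract_mutation_table(example_maf)
```

Keeping the column mapping in one dictionary means only `COLUMN_MAP` needs to change if the real file uses different headers.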

@dhimmel
Member

dhimmel commented Aug 10, 2016

Note that the gene_mutation column should be Entrez GeneIDs.
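One way this might be handled, assuming the source file carries an `Entrez_Gene_Id` column as MAF-style files commonly do (an assumption about the PANCAN data, not something confirmed in this thread), is to normalize that field to an integer and treat `0` or non-numeric values as missing. The helper name `entrez_id_or_none` is hypothetical.

```python
from typing import Optional


def entrez_id_or_none(raw: str) -> Optional[int]:
    """Convert a raw Entrez_Gene_Id field to a positive int, or None if unusable."""
    try:
        gene_id = int(raw.strip())
    except ValueError:
        # Empty strings, gene symbols, or other non-numeric values.
        return None
    # A value of 0 is commonly used as a placeholder when no gene ID applies.
    return gene_id if gene_id > 0 else None
```

For TP53, whose Entrez GeneID is 7157, the `gene_mutation` value would then be `7157` rather than the symbol.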
