Current Xena PANCAN_mutation dataset is missing some samples and variables from a previous release #16
Comments
Agreed. If you have any questions regarding data from UCSC Xena, the Google Group is the most effective way to get the message to us. Jing
Thanks @jingchunzhu, your group continues to be very helpful! For the cognoma community, here is a link to the UCSC Xena Google Group discussion about this issue.
@gwaygenomics, with @jingchunzhu's latest reply (quoted below), what's the status of this issue? Basically, should it be closed or is the issue ongoing (and what's the next step to progress forward)?
Yes. These are not from the same release. However, I am surprised by "lacking several columns"; could you say how the columns are different? The change in sample number is not surprising because we periodically update TCGA data. This particular dataset is compiled by the Xena team at UCSC. In almost all cases, TCGA has multiple versions of mutation calls from several sequencing and analysis groups (Broad, WashU, BCM, and UCSC); in addition, there are curated and automated calls, and there are different sequencing platforms. So we made an internal decision on which datasets to include, and the exact selection has changed over time: not drastically, but there are changes, and they affect sample numbers.
Starting in 2016, we store our release data on AWS S3, which means that all versions of data from 2016 onward will be on S3. We plan to continue doing so as long as there are resources to sustain it. The .json files that are part of each data release store the version information. Our previous data releases are not on S3. Do you need the previous version that you retrieved on June 12? We can send it to you directly. Jing
@dhimmel - I responded to @jingchunzhu on the Google Group, but the message was not posted. Not sure what happened here. My post listed the columns that differ between the two versions; the older version had many more columns. Perhaps @jingchunzhu is looking into it before passing my comment through the moderators?
@gwaygenomics good to know. Give it time: there is a delay between posting and the message appearing (perhaps an approval stage with a poor user experience). I actually posted a suggestion to move the Google Group to GitHub issues to avoid these blocks, although that post is also currently hidden.
I don't see either of the two messages. Not sure what's going on. Sorry.
> Perhaps @jingchunzhu is looking into it before passing my comment
Greg, I don't know if the message will show up at all. Can you email me the post that did not get through?
@jingchunzhu, every time I post to the Google Group there is a substantial delay until it appears. I'm pretty sure the messages will show up if we wait.
@jingchunzhu I'm starting to think that @gwaygenomics's posts and mine may actually be permanently missing this time. Is the Google Group moderated? If so, can you confirm that our posts are not waiting on approval?
Yes, it is moderated. I think the delay is because Mary is on vacation until next Monday, so all incoming posts are in the to-be-approved queue. I will ask her to give me approval permission after she comes back, or see whether Google Groups has a feature that lets certain people or accounts bypass moderation.
So it seems like one of the reasons for the missing samples could be the upgrade to hg19. I'm content with the fluctuation in sample number; we'll work with whatever the latest release from Xena contains. @gwaygenomics, it seems that there is still one outstanding question before we can close this issue. You mention that a previous release of
@dhimmel @jingchunzhu Sorry for the late response. The exact variables are
Thanks!
I have had this issue in the past (see the Zenodo file), and it looks like the current PANCAN_mutation file from Xena has fewer samples and fewer columns than a previous version.
One of the missing columns is the specific nucleotide mutation, and its absence is preventing us from completing #15.
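To make the column and sample differences between two downloads of the mutation file concrete, a minimal sketch using only the Python standard library might look like the following. It assumes both releases are tab-separated files with a header row and one column holding sample IDs; the column name `sample`, the function names, and the toy data are hypothetical and are not taken from Xena's documentation.

```python
import csv
import io
from typing import Dict, Set, TextIO, Tuple


def summarize(handle: TextIO, sample_column: str = "sample") -> Tuple[Set[str], Set[str]]:
    """Return (column names, distinct sample IDs) for a tab-separated table.

    Assumes the first row is a header. The default sample column name
    "sample" is a hypothetical placeholder; adjust it to the real header.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    columns = set(reader.fieldnames or [])
    if sample_column not in columns:
        raise KeyError(f"no {sample_column!r} column in header")
    samples = {row[sample_column] for row in reader}
    return columns, samples


def diff_versions(old: TextIO, new: TextIO, sample_column: str = "sample") -> Dict[str, Set[str]]:
    """Report columns and samples present in one release but not the other."""
    old_cols, old_samples = summarize(old, sample_column)
    new_cols, new_samples = summarize(new, sample_column)
    return {
        "columns_dropped": old_cols - new_cols,
        "columns_added": new_cols - old_cols,
        "samples_dropped": old_samples - new_samples,
        "samples_added": new_samples - old_samples,
    }


if __name__ == "__main__":
    # Toy data standing in for two releases (hypothetical columns and IDs).
    # Real files would be opened with open(path, newline="") instead.
    old = io.StringIO("sample\tgene\treference\talt\nS1\tTP53\tC\tT\nS2\tKRAS\tG\tA\n")
    new = io.StringIO("sample\tgene\nS1\tTP53\n")
    for key, values in diff_versions(old, new).items():
        print(key, sorted(values))
```

Set differences are used so the report is symmetric: anything dropped from the older release and anything newly added both show up, which covers both the sample-count changes and the missing columns discussed in this thread.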
It may be good to ask the UCSC Xena Google Group directly. They have been helpful in the past (see #14).