I'm also trying to download a file that seems to be available only through AWS. The instructions say I need to get my AWS key from https://bionimbus-pdc.opensciencedatacloud.org/, but that website seems to be down. Is there somewhere else to get the key? Thanks
Greetings Jordan,
I can confirm that the website you linked is down. Some time ago things moved to https://icgc.bionimbus.org/. Documentation for what you seem to be attempting can be found here: Data Download - ICGC DCC Docs
If I'm terribly off base, please let me know, along with any additional details you might have.
Dan Biber
University of Chicago - CTDS
Senior Scientific Support Analyst
We managed to find the file at: Bionimbus PDC for ICGC
But these instructions seemed to indicate we'd only be able to find that file on AWS: https://dcc.icgc.org/releases/PCAWG/consensus_snv_indel
Ah, I see. I'm glad you were able to find the file. Those instructions definitely seem very outdated.
Even though you already found the file -- with the thought of helping "tomorrow's" Gen3 Forum users, I'm going to add some additional info about finding the TCGA PCAWG files on PDC Bionimbus:
The TCGA PCAWG files are now accessible via the PDC data portal (https://icgc.bionimbus.org), and files are controlled access under dbGaP Study "phs000178". You can use the following link to request access if you don't already have it:
https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000178.v11.p8
Once you have access, log into the PDC data portal using your eRA Commons ID and password. The files can then be browsed and downloaded using our file exploration GUI (Bionimbus PDC for ICGC). The file explorer shows a table of files, which you can subset using the filters on the left. Here you can discover file metadata (such as the GUID, file name, format, and type) and download a file in the browser by clicking its GUID in the table. You can also export a file download manifest for the files listed in the table, for bulk download using the gen3-client.
Alternatively, you can use the exploration tool at https://dcc.icgc.org/ to select files in the PDC and export a file download manifest. That export is a shell script, so you'll then need to convert it to a gen3-client manifest (JSON) using the dcc_to_gen3.py script, which you can get from GitHub: https://github.com/uc-cdis/pdc_tools/archive/1.0.tar.gz
If you have a file's object ID (GUID), you can download the file by entering the GUID after "files/" in the URL, for example:
https://icgc.bionimbus.org/files/0e8a845d-a4f4-40bc-890b-5472702d087c
The last portion of the URL is the file's object_id (or GUID). For instance:
0e8a845d-a4f4-40bc-890b-5472702d087c is the GUID for final_consensus_passonly.snv_mnv_indel.tcga.controlled.maf.gz
b4a167ea-a1b1-4627-a7c6-dfa79809d98c is the GUID for MEI_vcfs.tcga.controlled.tar.gz
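For anyone scripting this, the URL pattern above can be sketched in Python. This is only an illustration: the helper name file_url is made up for this post, and the check assumes GUIDs are plain UUIDs like the two listed above. Downloading a controlled-access file still requires being logged in to the portal.

```python
import uuid

# Portal base URL as given in this thread.
PORTAL = "https://icgc.bionimbus.org"


def file_url(guid: str, portal: str = PORTAL) -> str:
    """Build the portal URL for a file GUID (hypothetical helper).

    Assumes the GUID is a plain UUID, as in the examples above;
    uuid.UUID raises ValueError for anything that is not one.
    """
    parsed = uuid.UUID(guid)
    return f"{portal.rstrip('/')}/files/{parsed}"


if __name__ == "__main__":
    # GUID of final_consensus_passonly.snv_mnv_indel.tcga.controlled.maf.gz
    print(file_url("0e8a845d-a4f4-40bc-890b-5472702d087c"))
```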
The gen3-client is our command-line tool for downloading the files. It can download a single file or bulk download the files listed in a manifest.
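As a rough sketch of the bulk route, the Python below builds a minimal manifest and the matching gen3-client command line. Several things here are assumptions rather than anything confirmed in this thread: that a manifest is a JSON list of entries keyed by "object_id", that the subcommand is download-multiple with --profile, --manifest and --download-path flags, and the profile and file names, which are placeholders. Please check `gen3-client download-multiple --help` for the options in your installed version.

```python
import json


def manifest_json(guids):
    """Return manifest text: a JSON list of {"object_id": guid} entries.

    Assumption: gen3-client only needs "object_id" per entry; manifests
    exported from the portal may carry additional fields.
    """
    return json.dumps([{"object_id": g} for g in guids], indent=2)


def download_command(profile, manifest_path, out_dir):
    """Return the argv list for a bulk download (flags are assumptions)."""
    return [
        "gen3-client",
        "download-multiple",
        f"--profile={profile}",
        f"--manifest={manifest_path}",
        f"--download-path={out_dir}",
    ]


if __name__ == "__main__":
    guids = [
        "0e8a845d-a4f4-40bc-890b-5472702d087c",
        "b4a167ea-a1b1-4627-a7c6-dfa79809d98c",
    ]
    # "manifest.json", "my-profile" and "downloads" are placeholder names.
    with open("manifest.json", "w") as fh:
        fh.write(manifest_json(guids))
    print(" ".join(download_command("my-profile", "manifest.json", "downloads")))
```

The script only prints the command rather than running it, so it can be reviewed before anything is downloaded.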
You can also query for files using GraphQL on the query page (Bionimbus PDC for ICGC). This is a more technical but also more reliable method of finding the files. For example, to get the object_id (aka GUID) for a file_name:
Click “Switch to Graph Model” and paste the following into the box on the left (query filenames using the ‘quick_search’ argument):
{datanode(quick_search: "snv_mnv_indel.tcga.controlled.maf.gz") {object_id}}
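If you want to generate that query for several file names, a small Python sketch follows. The function names are made up for illustration, and it only builds the query text and a JSON request body of the usual {"query": ...} shape. Where and how to send it (endpoint and authentication) is not covered here, so pasting the result into the query page is the safe route.

```python
import json


def datanode_query(file_name_fragment: str) -> str:
    """Build the quick_search query shown above for a file name fragment.

    json.dumps adds the surrounding double quotes and escapes any quotes
    or backslashes inside the search string.
    """
    return "{datanode(quick_search: %s) {object_id}}" % json.dumps(file_name_fragment)


def graphql_payload(file_name_fragment: str) -> str:
    """Wrap the query in a JSON body of the common {"query": ...} form."""
    return json.dumps({"query": datanode_query(file_name_fragment)})


if __name__ == "__main__":
    print(datanode_query("snv_mnv_indel.tcga.controlled.maf.gz"))
    print(graphql_payload("MEI_vcfs.tcga.controlled.tar.gz"))
```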
If you have further questions, it might be better to reach out to us by email at support@datacommons.io. We can dive into the details a bit more safely than on this public forum.
Finally, do you know about the Slack channel that's available for the Gen3 community? You can sign up to join it by completing this form: Sign up to join our Gen3-Community on Slack! It's a great place to ask questions of other Gen3 users applying the platform in many different contexts. We hope to see you there!
-- Sara