Using Gen3 for linking multiple datasets

Hi Gen3 community, I work for a research foundation, and we are currently exploring NHGRI AnVIL as an ecosystem for bioinformatics. We have explored Terra and run a few analyses using publicly available datasets and workflows.

Here's the challenge we are facing: our researchers work in different labs developing mouse models, zebrafish models, and cell lines from patient-collected samples. They have WGS, ChIP-Seq, single-cell, and RNA-Seq sequencing data generated from these models. We have also collected patients' genetic WES testing data, and we have survey and natural history data with health records.

But all these datasets are siloed and not linkable, so we need to de-identify, link, and centralize them.

Can you guide us on how a Gen3 data commons can provide this linkability and centralization? Also, how does Gen3 support de-identification? Looking forward to hearing your thoughts.

Thanks.

Hi, Nikhil_Shingte,

Thanks for reaching out! I'm a little confused, so I wanted to clarify some points before I try to answer.

It sounds like you work for a research foundation that has labs working on many different types of human samples and animal models, producing a variety of genetic and clinical data that is currently not de-identified. Your research foundation is interested in finding a cloud-based workspace/data commons where these researchers can share their lab's data with each other (and possibly external users?) and also be able to analyze the genetic data using bioinformatic pipelines.

I think you are looking at the AnVIL ecosystem to provide this analysis space through Terra, but you would first need to prepare your datasets -- de-identify, "centralize," and make them "linkable." (I'm not sure what these last two pieces mean. Could you explain more?)

It's also possible that you are instead talking about creating your own Gen3 data commons that is like AnVIL, but holds all your researchers' data instead of the AnVIL data. (Is that what you mean?)

Thanks for helping me better understand your goals so I can be sure I'm answering the right question.

-- Sara