Project and Program

Is it possible to add other nodes as root nodes other than Program and Project?

Greetings Monica_Beato_Coelho,

Thank you for your inquiry. These nodes are required for a Gen3 instance to function properly. I've verified this with my colleagues here at Gen3 and checked the documentation here.
I will add that these two nodes, as well as the Core Metadata Collection node, are required. The required fields in each of these three nodes must also be kept. Fields beyond the required ones can be added.
Lastly, any nodes or fields below the project level are customizable.

Hi Dan,
thank you for your reply.

We are still in exploration mode with Gen3 as a possible framework to help manage dataset metadata for Pharma R&D, and we would need a model as flexible as possible to accommodate multiple use cases:

  • For some of our use cases, Program and Study are not applicable (for example, external Real World Data - datasets acquired from a vendor and moved into our environment).
  • In other cases, the node relationships could be better modeled as: Organizational Unit > Disease > Program > Study > Experiment - in this case the concept of the project doesn't exist.

The Gen3 documentation here also states that "Project, Study, and Subject nodes are administrative nodes that are required for any Gen3 data commons" - so is the subject node also required?

Could we at least rename the Program and Project nodes? Or would that prevent the Gen3 instance from working? I'm still not sure that would be a good idea, but I was thinking that if we could rename the root nodes to something else, we could have just 1 dummy Renamed_Program node and 1 dummy Renamed_Project node, and then create the nodes below them as flexibly as needed.

Thank you for your help

Hi, Monica,

I'm interested in trying to better understand how you want to use Gen3 so we can advise you appropriately. Is your primary goal to find a way to manage metadata for your studies? Or is it to find a way to analyze and visualize your data from many studies collectively to identify new information?

Thanks!

Hi Sara,
We are exploring whether Gen3 would help us manage metadata for our datasets. That metadata would be used by downstream platforms such as our data catalog platform.
Thank you!

Hi, Monica! Thanks for sharing that. I consulted with several others, and they argue that Gen3 is a great option for managing metadata. We have several services for that: sheepdog, peregrine, guppy, Tube, and metadata service, for example. Plus, the discovery and exploration pages do make it easier to search across a series of resources to locate the items of interest.

We can't change the program and project nodes (that is, you cannot rename them) -- but you can use them for, e.g., managing access to your data within your organization. Beyond that, though, you can create any data model you want. When you are designing the data model, you will want to try to create a framework that’s flexible and can ingest many different types of data, if there is a lot of variability in metadata across the studies you want to include. You can see an example of this in the data model for BloodPAC: data.bloodpac.org/dd, which is able to host a large variety of different types of experimental data. Then, you will harmonize what you can ("program" and "project" first to manage access, then Organizational Unit > Disease > Program (with a distinguishing name from the earlier unchangeable Program node) > Study > Experiment … subject, demographics, etc.), while any data type that is unique gets its own node/properties (e.g., if one study has special survey data, or some kind of DNA sequencing).
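As a rough illustration of the layering described above, here is a minimal Python sketch that models the hierarchy as parent links and walks from any node back to the root. This is not a Gen3 dictionary or any Gen3 API: every node name below `project` (including `research_program`, a made-up name for the custom Program-like node) and the helper `path_to_root` are assumptions chosen only for the example.

```python
# Hypothetical node hierarchy: "program" and "project" are the fixed Gen3 nodes;
# everything below "project" is an illustrative, made-up custom model.
PARENT = {
    "program": None,                      # fixed root node
    "project": "program",                 # fixed node, used for access control
    "organizational_unit": "project",     # custom nodes start here
    "disease": "organizational_unit",
    "research_program": "disease",        # distinct name to avoid clashing with "program"
    "study": "research_program",
    "experiment": "study",
}


def path_to_root(node, parent=PARENT):
    """Return the chain of nodes from the root down to `node`."""
    path = [node]
    seen = {node}
    while parent[path[-1]] is not None:
        nxt = parent[path[-1]]
        if nxt in seen:
            raise ValueError(f"cycle detected at node {nxt!r}")
        seen.add(nxt)
        path.append(nxt)
    return list(reversed(path))


if __name__ == "__main__":
    print(" > ".join(path_to_root("experiment")))
```

Under these assumptions, every custom node traces back through `project` to `program`, which is the constraint discussed in this thread.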

To your specific question:

The Gen3 documentation here also states that "Project, Study, and Subject nodes are administrative nodes that are required for any Gen3 data commons" - so is the subject node also required?

We believe the only nodes that are truly required are program, project, and core_metadata_collection - not study and subject. Thank you for highlighting that inaccuracy in our documentation -- we will make a note to correct that!
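For anyone drafting a custom model, a quick sanity check along these lines can flag a missing required node early. This is only a sketch based on the three node names given above. The function `missing_required_nodes` and the sample node lists are hypothetical and are not part of any Gen3 tool.

```python
# The three node names come from this thread; the helper itself is illustrative.
REQUIRED_NODES = {"program", "project", "core_metadata_collection"}


def missing_required_nodes(node_names):
    """Return the required node names absent from `node_names`, sorted."""
    return sorted(REQUIRED_NODES - set(node_names))


if __name__ == "__main__":
    # Hypothetical draft model that omits core_metadata_collection.
    draft = ["program", "project", "study", "experiment"]
    print(missing_required_nodes(draft))
```

Note that `study` and `subject` are deliberately not in the required set, matching the correction above.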

Finally - do you know about our Slack channel that's available for the Gen3 community? You can sign up to join this community Slack channel by completing this form: Sign up to join our Gen3-Community on Slack! It's a great place to ask questions of other Gen3 users who are applying the platform in many different contexts. These users may also be able to share their experiences with data sets as variable as yours. We hope to see you there!

-- Sara

Hi Sara,
Thank you for your insights! I will join Slack and explore on our Gen3 test instance whether we can work with this limitation of the metadata model.