A few "can gen3 do this" questions

Dear GEN3 community,

I am trying to use GEN3 to build a repository that I have in mind, but I cannot find answers to a few of my questions in the videos and documentation that I have gone through so far. I would appreciate it if someone could point me to the right features to check out:

  1. "Group of files". I would like to use a single ID to specify and download a group of files (e.g. a bam file and its index), something like 'gen3-client .... group-id'. What is the best way to do this?

  2. "Multiple groups of properties". In my case the same data could be used for multiple purposes. For example, a bam file could have "study=name1" and "study=name2", each with its own metadata. What is the best way to create multiple "views" of the same file? That is to say, the same data could be searched by more than one set of metadata.

  3. "Filter or augment data". This is another "view" of the data, but applied to the data itself. For example, if the bam file contains data for a whole genome, can I specify a filter function and some parameters to retrieve only chromosome 1? The filter could be something like an Object Lambda function for AWS S3.

Many thanks in advance,
Bo

Hi @BoPeng,
welcome to the Forum and thank you for your questions!

  1. a) You can select files in the File Explorer and export them as a file manifest, which you can then pass to the gen3-client to download the files. The client will always need a GUID to download a data file, as described here. b) You can also query <commons-url>/index/index, find your files of interest, and write a script that produces the list of GUIDs to use with the client.

  2. Please define which metadata you would want to search for. Depending on that, you could set up the data dictionary so that you have multiple parent nodes linking to the data file node.

  3. At this time, BAM slicing is not implemented in Gen3, but we are aware of this need and have started to scope how we could implement it in the future. There are no concrete plans at this time.
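
As a rough illustration of option 1b, here is a minimal Python sketch of the "script" step. It assumes that the index endpoint returns JSON with a "records" list whose entries carry a "did" (the GUID) and a "file_name", and that the client's manifest is a JSON list of entries with an "object_id" field; please verify both against your own commons. The function name `build_manifest`, the example records, and the GUID values are made up for illustration.

```python
import json


def build_manifest(records, name_prefix):
    """Collect manifest entries for records whose file name starts with name_prefix.

    `records` is assumed to look like the "records" list of the index
    response: dicts with a "did" (GUID) and a "file_name".
    """
    manifest = []
    for rec in records:
        file_name = rec.get("file_name") or ""
        if file_name.startswith(name_prefix):
            # "object_id" is the assumed manifest key that holds the GUID.
            manifest.append({"object_id": rec["did"], "file_name": file_name})
    return manifest


# Hypothetical records standing in for a real index response.
example_records = [
    {"did": "guid-0001", "file_name": "sample1.bam"},
    {"did": "guid-0002", "file_name": "sample1.bam.bai"},
    {"did": "guid-0003", "file_name": "sample2.bam"},
]

# Group a bam file with its index by their shared file name prefix.
group = build_manifest(example_records, "sample1.bam")
print(json.dumps(group, indent=2))
```

Saving the printed JSON to a file would give you something you can hand to users as one "group" and feed to the client's manifest-based download. Matching on a file name prefix is only one possible grouping rule; any other field in the records could be used instead.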

Hope this helps!

Many thanks for your response.

  1. Great to know about the manifest feature. I can create manifests programmatically and provide them to users.
  2. I will look into "multiple parent nodes" linking to the data file node.
  3. I was referring to a general solution for data filtering (e.g. selecting SNP markers or drawing random samples), augmentation (e.g. adding gene names), conversion (e.g. decompression), or even simulation. If I have a "plugin" for manipulating data, where can I inject it into GEN3 so that applying the plugin is transparent to users?

Hey!

  1. At this time, it is unfortunately not yet possible to manually add tools to Gen3. We are actively developing the product (workspaces) that would, in the future, be the place for "inserting" the tools you mentioned.

Thanks! That was very helpful.