Staffing (and otherwise resourcing) a Gen3 Implementation

colinodden · January 14, 2020, 2:12am

Hi. I'm writing from Ohio State University, where I'm weighing options for building a data commons in our college of medicine. Put simply, I wonder how those of you who've implemented Gen3 would estimate the personnel and capital resource costs of doing so. I'm more interested in your experiences at your institution than in speculation (which could of course be well-informed and carefully considered!) about what implementation elsewhere (e.g. OSU) would cost.

I'm also interested in how you've balanced effort between Gen3-focused staff and personnel who might be more embedded in projects, especially when it comes to responsibility for submitting data (this could be a silly question - if so, thanks for your patience but please say so!)

Thanks in advance!

Viktorija · January 15, 2020, 3:02pm

Hi, @colinodden! Welcome to the forum!

The team would need a software engineer to install and maintain the system, a data scientist to create and update the data dictionaries, and data submitters to submit the data. The load for these roles can change over time. In the initial step, a software engineer sets up the system. Once it is ready, a data scientist creates a dictionary, and the software engineer helps deploy it to the system (rolling the system) on every iteration. Later, data submitters submit data; depending on the volume, this might be one person or several, and the dictionary creator could also be involved in this process. A project manager can help keep the process organized.

Also, there are expenses for hosting and serving data. You might find it useful to check the Amazon Web Services calculator or the Google Cloud Platform calculator for a rough estimate.

Brian_Walsh · June 4, 2021, 3:44pm

Re. the AWS calculator: Is there a list of instance types, services, etc. that are typically used for a commons?

Viktorija · June 4, 2021, 6:52pm

Hi @Brian_Walsh,
You and I discussed this in Slack, but I'll repeat the information here because it might be useful for other forum members. Here is an estimate from our DevOps:

3 small Postgres RDS servers
an EKS cluster
the worker nodes on the cluster - depending on how you scale up our services and size the nodes, that will probably be at least 5 medium-sized EC2 VMs
an Elasticsearch cluster
an S3 bucket
whatever egress bandwidth is needed to support clients downloading data out of the commons
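
For anyone who wants to turn the list above into a rough monthly figure, here is a minimal Python sketch. The names `LineItem`, `build_inventory`, and `monthly_total` are made up for illustration, and the unit prices are placeholders, not real AWS prices; replace them with numbers from the pricing calculator for your region and instance sizes. Egress is treated as a single monthly lump sum, which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    name: str
    quantity: int
    unit_monthly_usd: float  # look this up in the pricing calculator


def build_inventory(prices, worker_nodes=5):
    """Build the component list from the post; quantities follow the estimate above."""
    quantities = {
        "rds_postgres_small": 3,           # 3 small Postgres RDS servers
        "eks_cluster": 1,                  # EKS control plane
        "ec2_worker_medium": worker_nodes, # at least 5 medium-sized EC2 VMs
        "elasticsearch_cluster": 1,
        "s3_bucket": 1,                    # storage cost depends on data volume
        "egress_bandwidth": 1,             # lump-sum monthly egress (simplification)
    }
    return [LineItem(name, qty, prices[name]) for name, qty in quantities.items()]


def monthly_total(items):
    """Sum quantity * unit price over all line items."""
    return sum(item.quantity * item.unit_monthly_usd for item in items)


# PLACEHOLDER values only: NOT real AWS prices.
PLACEHOLDER_PRICES = {
    "rds_postgres_small": 100.0,
    "eks_cluster": 100.0,
    "ec2_worker_medium": 100.0,
    "elasticsearch_cluster": 100.0,
    "s3_bucket": 100.0,
    "egress_bandwidth": 100.0,
}

if __name__ == "__main__":
    inventory = build_inventory(PLACEHOLDER_PRICES)
    for item in inventory:
        print(f"{item.name}: {item.quantity} x {item.unit_monthly_usd:.2f}")
    print(f"Estimated monthly total (USD): {monthly_total(inventory):.2f}")
```

Changing `worker_nodes` shows how the total moves as the cluster scales; storage and egress will likely dominate for data-heavy commons, so those two entries deserve the most careful estimates.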

Also, I want to share this document that provides valuable hints and questions for Gen3 commons planning steps: How to Stand Up a Data Commons Draft Outline - Google Docs
