Hi. I'm writing from Ohio State University, where I'm weighing options for building a data commons in our college of medicine. Put simply, I'd like to know how those of you who have implemented Gen3 would estimate the personnel and capital costs of doing so. I'm more interested in your experiences at your own institutions than in speculation (which could, of course, be well-informed and carefully considered!) about what an implementation elsewhere (e.g. OSU) would cost.
I'm also interested in how you've balanced effort between Gen3-focused staff and personnel who are more embedded in projects, especially when it comes to responsibility for submitting data. (This could be a silly question; if so, thanks for your patience, but please say so!)
Thanks in advance!
Hi, @colinodden! Welcome to the forum!
The team would need a software engineer to install and maintain the system, a data scientist to create and update data dictionaries, and data submitters to submit the data. The load on each role changes over time. Initially, the software engineer sets up the system. Once it is ready, the data scientist creates a dictionary, and on every iteration the software engineer helps deploy it to the system (rolls the system). Later, data submitters submit data; depending on the volume, that might be one person or several, and the dictionary creator could also be involved. A project manager can help keep this process organized.
There are also expenses for hosting and serving data. For a rough estimate, you might find the AWS Pricing Calculator or the Google Cloud Pricing Calculator useful.
Re. the AWS calculator: Is there a list of instance types, services, etc. that are typically used for a commons?
Hi @Brian_Walsh,
You and I discussed this on Slack, but I'll repeat the information here because it might be useful for other forum members. Here is an estimate from our DevOps team:
- 3 small Postgres RDS servers
- an EKS cluster
- worker nodes for the cluster: depending on how you scale our services and size the nodes, probably at least 5 medium-sized EC2 VMs
- an Elasticsearch cluster
- an S3 bucket
- enough egress bandwidth to support clients downloading data out of the commons
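To turn a list like this into a rough monthly number, one approach is to multiply each component's quantity by a unit price taken from the AWS Pricing Calculator and add everything up. The sketch below assumes exactly the components listed above (3 RDS instances, 1 EKS cluster, a configurable number of worker nodes defaulting to 5, 1 Elasticsearch cluster, S3 storage, and egress). The `LineItem` class, the `estimate` function, the `PLACEHOLDER_COSTS` values, and the 1,000 GB / 500 GB usage figures are all hypothetical: the prices are made-up round numbers, not real AWS rates.

```python
from dataclasses import dataclass


@dataclass
class LineItem:
    name: str
    quantity: float   # instance count, or GB for usage-based items
    unit_cost: float  # USD per unit per month (placeholder; use real values from the AWS Pricing Calculator)

    @property
    def monthly_cost(self) -> float:
        return self.quantity * self.unit_cost


# ILLUSTRATIVE PLACEHOLDERS ONLY: made-up round numbers, NOT real AWS prices.
PLACEHOLDER_COSTS = {
    "rds_small": 100.0,    # per small Postgres RDS instance per month
    "eks_cluster": 100.0,  # per EKS cluster per month
    "ec2_medium": 100.0,   # per medium EC2 worker node per month
    "es_cluster": 100.0,   # per Elasticsearch cluster per month
    "s3_per_gb": 0.1,      # per GB stored per month
    "egress_per_gb": 0.1,  # per GB downloaded out of the commons
}


def estimate(unit_costs: dict[str, float], s3_gb: float, egress_gb: float,
             worker_nodes: int = 5) -> tuple[list[LineItem], float]:
    """Build one line item per component from the DevOps list and sum them."""
    items = [
        LineItem("Postgres RDS (small)", 3, unit_costs["rds_small"]),
        LineItem("EKS cluster", 1, unit_costs["eks_cluster"]),
        LineItem("EC2 worker node (medium)", worker_nodes, unit_costs["ec2_medium"]),
        LineItem("Elasticsearch cluster", 1, unit_costs["es_cluster"]),
        LineItem("S3 storage (GB-month)", s3_gb, unit_costs["s3_per_gb"]),
        LineItem("Egress (GB)", egress_gb, unit_costs["egress_per_gb"]),
    ]
    total = sum(item.monthly_cost for item in items)
    return items, total


if __name__ == "__main__":
    # Hypothetical usage: 1,000 GB stored and 500 GB downloaded per month.
    items, total = estimate(PLACEHOLDER_COSTS, s3_gb=1000, egress_gb=500)
    for item in items:
        print(f"{item.name:<28} {item.monthly_cost:>10.2f}")
    print(f"{'Total per month':<28} {total:>10.2f}")
```

Keeping one line item per component makes it easy to see which part of the list dominates the bill, and to rerun the estimate with a different number of worker nodes or a different expected download volume once real prices are plugged in.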
Also, I want to share this document, which provides valuable hints and questions for planning a Gen3 commons: "How to Stand Up a Data Commons" (draft outline, Google Docs).