Hi. I'm reading about Gen3 and considering its use in a data commons for diabetes research. After reading about Tube, I'm wondering about the choice of a graph db: you convert the graph to a SQL db in order to do queries efficiently from the web interface. What advantages, then, does the graph db itself offer that make it worth regularly going through this conversion process? Thanks. I'm much more of a dev than an analytics person, so I may be missing some obvious concepts.
-M
Hi, Michael!
We're glad to hear you're considering using the Gen3 framework! I wonder whether your question might be answered in part with a response we provided to another recent question here on the forum. I've pasted the relevant response here for convenience, but you can look at the whole question over here.
The data we're representing has a hierarchical relationship, and that's easily mapped to a graph representation. So, having a programmatic representation that is also graph-based made sense. If we were using SQL instead of GraphQL, every query would require us to write a ton of joins. This would take a lot of programming energy, not to mention possibly making the system more vulnerable to programming errors and less robust to updates. With a graph model, we can set users up to query on relationships between several entities in a graph-based manner, which is simpler because we can more easily describe the path to take through the model for the query. Also, GraphQL provides a more structured way to add new analysis entities (e.g., workflows, data outputs) once the commons exist and begin to grow.
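To make the "ton of joins" point concrete, here is a minimal sketch in Python. The tables (project, subject, sample), their columns, and the GraphQL-style field names are all hypothetical and chosen only for illustration; they are not the actual Gen3 data model or Peregrine's schema. It shows that walking a three-level hierarchy in SQL takes one join per hop, while a graph-style query just spells out the path.

```python
import sqlite3

# Hypothetical three-level hierarchy: project -> subject -> sample.
# Table and column names are illustrative, not the Gen3 schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE subject (id INTEGER PRIMARY KEY, project_id INTEGER, label TEXT);
CREATE TABLE sample  (id INTEGER PRIMARY KEY, subject_id INTEGER, tissue TEXT);
INSERT INTO project VALUES (1, 'diabetes-study');
INSERT INTO subject VALUES (10, 1, 'subj-A'), (11, 1, 'subj-B');
INSERT INTO sample  VALUES (100, 10, 'blood'), (101, 10, 'pancreas'), (102, 11, 'blood');
""")

# Relational version: every hop through the hierarchy is an explicit join.
SQL_QUERY = """
SELECT p.name, s.label, m.tissue
FROM project p
JOIN subject s ON s.project_id = p.id
JOIN sample  m ON m.subject_id = s.id
WHERE p.name = ?
ORDER BY m.id
"""
rows = conn.execute(SQL_QUERY, ("diabetes-study",)).fetchall()

# Graph-style version of the same question (hypothetical field names).
# The query describes the path through the model; the service resolves
# the relationships. It is shown here as text only and is not executed.
GRAPH_STYLE_QUERY = """
{
  project(name: "diabetes-study") {
    subjects {
      label
      samples { tissue }
    }
  }
}
"""

for row in rows:
    print(row)
```

Each extra level in the hierarchy adds another join to the SQL version, whereas the graph-style query only gains another nested field.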
As you noted, the backend is a relational database set up in Postgres. Peregrine, our GraphQL service, allows you to construct queries against Postgres. Additionally, our ETL tool (Tube) can flatten and store structured data in Elasticsearch, allowing for faster searches. This flattened data can be searched using Guppy, which allows you to use GraphQL queries on data in Elasticsearch.
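As a rough picture of what "flattening" means here, the following Python sketch turns a nested record into one flat document per sample. The record shape, the field names, and the flatten function are assumptions made for illustration; this is not Tube's actual code or output format.

```python
# Hypothetical nested record, shaped like the hierarchy a graph model holds.
project = {
    "name": "diabetes-study",
    "subjects": [
        {"label": "subj-A", "samples": [{"tissue": "blood"}, {"tissue": "pancreas"}]},
        {"label": "subj-B", "samples": [{"tissue": "blood"}]},
    ],
}


def flatten(project):
    """Return one flat document per sample, copying parent fields down."""
    docs = []
    for subject in project["subjects"]:
        for sample in subject["samples"]:
            docs.append(
                {
                    "project": project["name"],
                    "subject": subject["label"],
                    "tissue": sample["tissue"],
                }
            )
    return docs


docs = flatten(project)

# Once flattened, a search is a filter over self-contained documents:
# no relationships have to be traversed at query time.
blood = [d for d in docs if d["tissue"] == "blood"]
print(len(docs), len(blood))
```

The trade-off in this sketch is that parent fields are duplicated across documents, and the flat copy has to be regenerated when the source data changes.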
If you haven't yet seen this resource in the documentation, I encourage you to check out this page: Gen3 - Technical Intro
Let me know your next questions to continue the conversation!
Also - Do you know about our Slack channel that's available for the Gen3 community? You can sign up to join this community Slack channel by completing this form: Sign up to join our Gen3-Community on Slack! It's a great place to ask questions of other Gen3 users applying the platform in many different contexts. We hope to see you there!
-- Sara
Thanks Sara. There's something I'm not understanding: you're saying that using GraphQL allows for more straightforward design and data ingestion, and that flattening the data into Elasticsearch allows for faster searches. Are those searches faster because the data has been flattened, because it's running on Elasticsearch, or both? Otherwise, I think I understand my question better now.