You are a data scientist working with clinicians at a hospital or a medical informatics company. The clinicians want a tool that lets them slice and dice terabytes of hospital record data to analyze broad patterns, such as the correlation between diabetes and heart failure. Currently, each select query with several filtering parameters takes several hours to process because the data is voluminous and varied. It is also fragmented across silos in multiple databases or schemas. A traditional relational data warehouse approach fails to scale and cannot respond in human time (seconds or minutes), because the required filtering and joins cannot be completed in that time with traditional tools. This makes the user experience cumbersome and unresponsive, which significantly hinders accurate analytics.
At the University of Washington Center for Web and Data Science, we are developing a web application that allows data scientists and domain experts to explore the data without knowing how to write SQL queries. The application uses jQuery to provide a drag-and-drop feature for populating the values of a field; selecting a value generates statistics, and visualizations built with D3.js enhance the user experience. Such tools let users explore the data efficiently and identify good datasets. The goal is for the application to be independent of how the data is stored at the back end, whether in a database or in a distributed file system spread across multiple nodes.
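One way to picture the back-end independence goal is a small sketch in which the user interface emits plain field/value selections and each storage layer decides how to evaluate them. This is not the application's actual code: the names `FieldFilter`, `Backend`, `SqlBackend`, and `InMemoryBackend` are hypothetical, SQLite stands in for "a database", and a list of dictionaries stands in for records read from some other store.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Protocol


@dataclass(frozen=True)
class FieldFilter:
    """One UI selection: a field and the value chosen for it (hypothetical type)."""
    field: str
    value: Any


class Backend(Protocol):
    """Anything that can count the records matching a set of selections."""
    def count(self, filters: List[FieldFilter]) -> int: ...


class SqlBackend:
    """Evaluates selections by generating a parameterized SQL query."""

    def __init__(self, conn: sqlite3.Connection, table: str,
                 allowed_fields: Iterable[str]) -> None:
        self.conn = conn
        self.table = table  # assumed to come from trusted configuration
        self.allowed_fields = set(allowed_fields)

    def count(self, filters: List[FieldFilter]) -> int:
        # Field names cannot be bound as parameters, so check them
        # against a whitelist before placing them in the query text.
        for f in filters:
            if f.field not in self.allowed_fields:
                raise ValueError(f"unknown field: {f.field!r}")
        where = " AND ".join(f"{f.field} = ?" for f in filters) or "1 = 1"
        sql = f"SELECT COUNT(*) FROM {self.table} WHERE {where}"
        params = [f.value for f in filters]
        return self.conn.execute(sql, params).fetchone()[0]


class InMemoryBackend:
    """Evaluates the same selections over plain dictionaries."""

    def __init__(self, rows: Iterable[Dict[str, Any]]) -> None:
        self.rows = list(rows)

    def count(self, filters: List[FieldFilter]) -> int:
        return sum(
            1 for row in self.rows
            if all(row.get(f.field) == f.value for f in filters)
        )
```

Under these assumptions, the front end would only ever build a list of `FieldFilter` objects and call `count`, so swapping one storage layer for another would not require any change to the drag-and-drop or visualization code.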
The demo given as part of the presentation will take an in-depth look at how this tool is being developed, which kinds of databases we currently support, and what challenges we are facing.