What is a Datathon?
What is an ICU Datathon?
A Datathon is a voluntary, sprint-like event in which data scientists and domain experts work side by side to tackle major questions in a field through the analysis of big data. It is typically organized as a competition between many teams and is often held over a weekend. ICU Datathons do not differ much from this general model: teams of physicians, data scientists, statisticians and engineers are formed, and each attempts to solve a current issue in the Intensive Care Unit (ICU) using data from the MIMIC database, ANZICS APS, or JIPAD. The themes (clinical questions) are proposed by physicians, usually members of the national ICU society of the host country, before the Datathon takes place, while the teams are formed just before or at the event itself.
Datathon Kyoto will be held from Friday, 8th of March 2019 to Sunday, 10th of March. The first day will offer hands-on workshops and lectures. Saturday and part of Sunday will be for “hacking”, which in a health Datathon means applying machine learning to health data. Participants with various backgrounds will work together with the shared goal of addressing a research question. At the end of Sunday, teams present their analyses. A Scientific Committee will select and award the three best projects based on clinical relevance, the novelty of the topic, the methodology, and the quality of the presentation.
The IT infrastructure is managed by experts from the Massachusetts Institute of Technology, the National University of Singapore and other societies. A team drawn from various Japanese institutes sets up the database and the connections to the servers. These facilitators also support the teams during the competition.
The event is sponsored by companies and institutions. In past Datathons, companies like Google, Philips, General Electric, Hitachi and others (see past events) were involved with their national and international representatives.
Figure 1 Opening conference during the Madrid Datathon, held in December 2017
How does it work?
The opening ceremony is usually led by members of the MIT team, who welcome the participants, introduce the rules and the subject of the event, and explain the tools and databases available during the competition.
The teams are then formed and assigned a clinical task; how each team completes it varies. The core of the Datathon is the hacking phase, which runs from Saturday morning through Sunday afternoon. Teams thus have less than 2 days to tackle the clinical question they were assigned. Afterwards, they present their results in front of both the public and the judges. The board decides on the winning team and the runners-up and announces them during the closing ceremony, where they are presented with their awards. The goal of the Datathon is ultimately to create interdisciplinary collaborations in critical care and to promote the use of advanced machine learning techniques in healthcare.
Before and/or during the competition, MIT experts, invited speakers and representatives of companies and institutions give presentations and talks on the subject; these make up the conference part of the event, which is usually held on Friday.
Figure 2 MIT-LCP Dr. Johnson during a presentation, Tokyo 2018
Figure 3 Teams' final presentations, AP-HP, Paris 2018
Figure 4 In the middle of a hands-on seminar, Tokyo 2018
Figure 5 MIT Data Experts, Datathon conclusion, Beijing 2017
Figure 6 MIT Data Experts, Datathon conclusion, Tokyo 2018
MIMIC-III (Medical Information Mart for Intensive Care III) is a large, freely available database of deidentified data from the Beth Israel Deaconess Medical Center, collected between 2001 and 2012. It is a relational database that contains over 40 tables holding different kinds of information about more than 40,000 Intensive Care Unit patients.
The database is freely available to researchers worldwide and is increasingly popular in the research community, as shown by the ever-growing number of publications that use it.
The data stored in MIMIC is made available to the participating teams so that they can work on their assigned project. Engineers and data scientists first query MIMIC-III to extract data, and then use machine learning algorithms to build models. They work side by side with clinicians to account for confounders, choose clinically coherent parameters and, overall, provide their best solution to the assigned problem.
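As a rough illustration of the "query first" step described above, the sketch below joins ICU stays to hospital admissions and compares in-hospital mortality between short and long stays. It is a minimal sketch under stated assumptions, not the workflow used at the event: the table and column names (admissions, icustays, hadm_id, los, hospital_expire_flag) are meant to mirror the MIMIC-III schema, the rows are synthetic, the 3-day threshold is arbitrary, and an in-memory SQLite database stands in for whatever server the organizers actually provide.

```python
import sqlite3

# Synthetic stand-in for two MIMIC-III-style tables. Real MIMIC-III data
# requires approved access through PhysioNet; these rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE admissions (
    subject_id INTEGER,
    hadm_id INTEGER PRIMARY KEY,
    hospital_expire_flag INTEGER  -- 1 if the patient died in hospital
);
CREATE TABLE icustays (
    icustay_id INTEGER PRIMARY KEY,
    hadm_id INTEGER,
    los REAL                      -- ICU length of stay in days
);
""")

admissions = [(1, 100, 0), (2, 101, 1), (3, 102, 0),
              (4, 103, 1), (5, 104, 0), (6, 105, 0)]
icustays = [(1000, 100, 1.5), (1001, 101, 9.0), (1002, 102, 2.0),
            (1003, 103, 7.5), (1004, 104, 8.0), (1005, 105, 0.8)]
conn.executemany("INSERT INTO admissions VALUES (?, ?, ?)", admissions)
conn.executemany("INSERT INTO icustays VALUES (?, ?, ?)", icustays)

# Group ICU stays by an arbitrary 3-day threshold (an assumption made for
# illustration) and compute the in-hospital mortality rate per group.
QUERY = """
SELECT CASE WHEN i.los >= 3 THEN 'long' ELSE 'short' END AS stay_group,
       COUNT(*) AS n_stays,
       AVG(a.hospital_expire_flag) AS mortality_rate
FROM icustays AS i
JOIN admissions AS a ON a.hadm_id = i.hadm_id
GROUP BY stay_group
ORDER BY stay_group
"""

results = {group: (n, rate) for group, n, rate in conn.execute(QUERY)}

for group, (n, rate) in results.items():
    print(f"{group}: {n} stays, mortality rate {rate:.2f}")
```

A crude association like this is exactly where clinicians on the team come in: a longer stay may simply reflect sicker patients, so the grouping would need to be adjusted for severity and other confounders before any model is built on it.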
LCP - The MIT Laboratory for Computational Physiology
The Laboratory for Computational Physiology is recognized for its expertise in the application of machine learning and signal processing to the secondary use of health data. Over the past decade, LCP, Beth Israel Deaconess Medical Center (BIDMC) and Philips Healthcare, with support from the National Institute of Biomedical Imaging and Bioengineering, have partnered to build and maintain the Medical Information Mart for Intensive Care (MIMIC) database. This public-access database, which now holds clinical data from over 60,000 stays in BIDMC intensive care units, has been meticulously de-identified and is freely shared online with the research community via PhysioNet. It is an unparalleled research resource; close to 10,000 researchers from more than 70 countries have free access to the clinical data under data use agreements.
“LCP research incorporates physiology, computer science, engineering, and applied mathematics. Using modern approaches to modeling, signal processing, pattern recognition, and machine learning, the lab’s researchers develop and refine methods for analyzing data and for generating predictive models that will aid in patient care.”
The LCP is led by Distinguished Professor Roger Mark. Its research comprises two major projects:
- PhysioNet: one of the world’s largest, most comprehensive, and most widely used repositories of freely available recorded physiologic signals and high-resolution clinical ICU data.
- Critical Care Informatics: improving health care by generating new clinical knowledge, new monitoring technology and decision support through the application of data science and machine learning to large collections of critical care data. Within this area of research, the LCP developed the MIMIC (Medical Information Mart for Intensive Care) Database, comprising deidentified health data associated with around 40,000 critical care patients.
The LCP is the principal organizer of the Datathons, with its team of experts led by Dr. Celi, founder of Sana and Clinical Research Director at MIT-LCP, who is both a registered physician and an expert data scientist.
MIT Critical Care