MIT-Vietnam Clinical Intelligence Datathon 2026
April 2025
Background
In recent years, artificial intelligence (AI) has demonstrated growing potential in improving
health system performance through enhanced diagnostics, resource optimization, and clinical
decision support. However, the benefits of AI remain unevenly distributed, with low- and
middle-income countries (LMICs) frequently excluded from both the development and imple-
mentation of these technologies. This disparity stems from multiple systemic challenges: limited
access to local datasets, a shortage of technical expertise, and the absence of ethical frameworks
adapted to local contexts. These limitations have reinforced a cycle of dependency on externally
developed models that may not reflect the epidemiological and sociocultural characteristics of
LMIC populations.
Vietnam presents a compelling context to address these inequities. With an evolving health
system, a growing burden of noncommunicable diseases (NCDs), and increasing investment in
digital transformation, the country is positioned to lead regionally in the ethical and context-
sensitive use of AI in health. Despite this potential, critical challenges remain, including geo-
graphic disparities in healthcare access, underdeveloped clinical informatics infrastructure, and
limited opportunities for interdisciplinary training in AI for healthcare.
The MIT-Vietnam Clinical Intelligence Datathon seeks to address these structural gaps
through an immersive, problem-solving platform that convenes Vietnamese healthcare profes-
sionals, data scientists, students, social scientists, and policymakers. Guided by the Critical
Data Village model developed at MIT, the event emphasizes co-creation, equity, and sustain-
ability. This Datathon represents a novel application of participatory research methods to
advance AI-enabled health solutions tailored to the needs and realities of LMIC settings.
Goals and Objectives
The overarching goal of the MIT-Vietnam Clinical Intelligence Datathon is to foster equitable
and context-responsive innovation in health AI through interdisciplinary collaboration, skill-
building, and community engagement. To achieve this, the Datathon is structured around the
following objectives:
Objective 1: Knowledge and Skill Development
Equip over 100 participants from Vietnam and international institutions with practical
skills in data analysis, machine learning, and ethical considerations for health AI through
workshops, mentorship, and applied team-based problem solving.
Objective 2: Contextualized Model Development
Facilitate the development of AI solutions using de-identified patient datasets from Viet-
namese healthcare institutions, ensuring that prototypes are grounded in the epidemio-
logical and systemic realities of local clinical practice.
Objective 3: Institutional Partnership and Capacity Building
Establish lasting collaborations between local hospitals, universities, and international
research institutions to support ongoing data access, workforce training, and joint publi-
cations.
Objective 4: Dissemination and Policy Integration
Produce a Datathon proceedings report, academic manuscripts, and policy briefs to in-
form national digital health strategies and facilitate the scale-up of promising prototypes
through existing Ministry of Health programs.
Program Design and Methods
The Datathon will be organized as a two-day, in-person event hosted in Hanoi in January
2026. It will be preceded by a virtual onboarding module and pre-event data ethics seminar
to ensure participants are adequately prepared. The Datathon itself will be structured around
collaborative team-based innovation, supported by technical and clinical mentorship.
Participant Profile: Participants will be organized into 15 interdisciplinary teams, each
composed of 7 individuals representing diverse domains—medicine, engineering, data science,
public health, pharmacy, and social science. Selection will prioritize regional representation,
gender equity, and diversity of experience.
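One way to spread the listed domains across 15 teams of 7 is a simple round-robin deal, sketched below. This is only an illustration: the function name `form_teams`, the `(name, domain)` roster structure, and the placeholder roster are assumptions, and the sketch does not model the regional, gender, or experience priorities described above.

```python
from collections import defaultdict


def form_teams(participants, n_teams=15):
    """Deal participants into teams so each domain is spread across teams.

    participants: list of (name, domain) tuples (hypothetical structure).
    """
    by_domain = defaultdict(list)
    for name, domain in participants:
        by_domain[domain].append(name)

    teams = [[] for _ in range(n_teams)]
    slot = 0  # one running counter keeps team sizes within one of each other
    for domain in sorted(by_domain):
        for name in by_domain[domain]:
            teams[slot % n_teams].append((name, domain))
            slot += 1
    return teams


# Placeholder roster: 105 made-up participants cycling through the six domains.
DOMAINS = ["medicine", "engineering", "data science",
           "public health", "pharmacy", "social science"]
roster = [(f"participant_{i}", DOMAINS[i % len(DOMAINS)]) for i in range(105)]
teams = form_teams(roster)
```

With 15 teams of 7, the roster holds 105 people, which is consistent with the "over 100 participants" target in Objective 1.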
Datathon Activities:
• Keynote sessions from regional and global leaders in digital health
• Technical workshops on data preprocessing, machine learning pipelines, and AI model
validation
• Mentorship sessions with experts from Harvard, MIT, and leading Vietnamese institutions
• Structured team time for dataset exploration, hypothesis generation, model building, and
iterative refinement
• Final pitch presentations evaluated by a panel of clinicians, AI researchers, and health
system leaders
Data Infrastructure: The event will use a comprehensive set of both local and interna-
tional datasets to support robust analysis and model development. Local data sources comprise
electronic health records (EHR) obtained from leading Vietnamese healthcare institutions, in-
cluding Saint Paul Hospital, University Medical Center (UMC) Ho Chi Minh City, and Hue
Central Hospital. Additionally, imaging data—specifically chest X-rays—has been sourced
from VinUniversity. To enhance generalizability and facilitate benchmarking against global
standards, the project will also utilize established, publicly available international datasets such
as MIMIC-IV, eICU, and MIMIC-CXR. The integration of these diverse datasets ensures a
rich foundation for conducting clinically relevant and scalable data science research aimed at
improving healthcare outcomes.
Post-Event Activities: Following the Datathon, selected teams will be invited to continue
prototype development through a virtual incubation program. Outputs will be documented in a
publicly accessible white paper and submitted to peer-reviewed journals. Institutional partners
will be encouraged to consider pilot testing solutions within their settings, with technical support
from the Critical Data network.
Event Timeline
Saturday, January 24, 2026
7:00 AM–8:00 AM Registration and Team Introductions
8:00 AM–8:15 AM Welcome and Overview
8:15 AM–8:45 AM Keynote Speaker
8:45 AM–9:00 AM Presentation
9:00 AM–9:15 AM Brief Overview of Datasets
9:15 AM–12:00 PM Datathon Teamwork
12:00 PM–1:00 PM Lunch
1:00 PM–4:00 PM Datathon Teamwork
4:00 PM–5:00 PM Team Report
5:00 PM–6:00 PM Optional: Extra Datathon Teamwork
Sunday, January 25, 2026
7:00 AM–8:00 AM Continental Breakfast
8:00 AM–12:00 PM Datathon Teamwork
12:00 PM–1:00 PM Lunch
1:00 PM–2:30 PM Datathon Team Presentations, Judging
2:30 PM–3:00 PM Closing Remarks
Research Topics
The Datathon and subsequent research activities will center on a diverse set of critical issues
in clinical and translational medicine. Participants will be encouraged to pursue projects that
not only demonstrate methodological rigor but also contribute to broader societal goals, such
as reducing disparities and improving system-wide care delivery. The following thematic areas
will guide team focus and mentor alignment:
Clinical Applications
Projects may involve diagnostic support, treatment optimization, patient monitoring, and risk
stratification across inpatient, outpatient, emergency, and public health settings. Emphasis will
be placed on models and tools that can be translated into real-world clinical workflows and
decision-making.
Cross-Cutting Themes
Health Equity: Participants are encouraged to examine disparities in healthcare access, out-
comes, and representation within datasets. Solutions should explore mechanisms to improve
inclusivity in data collection and model performance across demographic subgroups.
AI Harm and Ethics: Teams may investigate algorithmic biases, unintended consequences
of predictive models, and the broader ethical implications of deploying AI in clinical practice.
This includes assessing transparency, explainability, and trust in AI systems used in healthcare
delivery.
Disease Areas of Focus: While the Datathon is open to a wide range of health challenges,
specific support will be available for projects focusing on cardiovascular diseases, oncology,
endocrinology, hematology, and pharmacotherapy and medication safety.
Grading Criteria
Project submissions will be evaluated across four domains: Clinical, Data Science, Presenta-
tion, and Evaluation. Each domain includes specific components to guide scoring and provide
structure for judges’ feedback.
1. Clinical (1–5 points per subcategory)
The clinical dimension of the evaluation focuses on the project’s originality and its practical
relevance. Teams will be scored on the uniqueness and creativity of their approach to solving a
healthcare challenge. Judges will also assess how well the proposed solution could influence or
improve current clinical practices. High-scoring teams are expected to present ideas that not
only address a clear clinical need but also demonstrate potential for real-world application and
improvement in patient care.
2. Data Science (1–5 points per subcategory)
In the data science category, judges will examine both the innovation and technical soundness
of the analytical approach. Scoring will consider how well the team interprets and manages the
data provided, including handling of missing data and choice of modeling techniques. Teams
will also be evaluated on the appropriateness of their methodological choices: whether they
selected tools and models that align with the problem at hand, regardless of their complexity.
Projects that are methodologically rigorous and well-matched to the clinical question will be
rewarded.
3. Presentation (1–5 points per subcategory)
This category assesses the clarity, completeness, and interdisciplinary balance of the team’s
presentation. Judges will evaluate how well the clinical problem is described, particularly in
terms of importance, burden, and the role data science can play in addressing it. Additionally,
the strength of the data science framing—including the team’s articulation of their framework,
methods, and analysis—will be considered. Clear and compelling presentation of results is
essential, regardless of the performance of the solution itself. Teams are also expected to strike
an effective balance between clinical and technical content, showing an integrated understanding
across disciplines.
4. Evaluation (Yes/No questions)
The evaluation section explores the future potential and impact of the project. Judges will assess
whether the team has articulated a clear plan to continue development beyond the Datathon.
Further, the project should demonstrate its contribution to three key areas: patient outcomes
and risk stratification, health equity and diversity, and treatment optimization. Responses to
these components will be recorded as Yes or No. Teams that can show clear alignment with
these goals will be considered to have higher potential for real-world relevance and sustainability.
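The four domains above can be tallied with a small score sheet, sketched below. The subcategory names are paraphrased from the criteria text, and the aggregation (a plain sum of the 1–5 scores, with Yes answers counted separately) is an assumption; the proposal does not state how judges combine scores.

```python
# Subcategory names are paraphrased from the criteria text; the actual
# judging form and the aggregation rule are assumptions.
SCORED_DOMAINS = {
    "clinical": ["originality", "clinical_impact"],
    "data_science": ["innovation", "data_handling", "method_fit"],
    "presentation": ["clinical_framing", "ds_framing", "results_clarity", "balance"],
}
YES_NO_ITEMS = ["continuation_plan", "patient_outcomes",
                "health_equity", "treatment_optimization"]


def score_team(scores, answers):
    """Validate one judge's sheet and return subtotals per scored domain.

    scores:  {domain: {subcategory: int from 1 to 5}}
    answers: {evaluation item: True for Yes, False for No}
    """
    subtotals = {}
    for domain, subcategories in SCORED_DOMAINS.items():
        total = 0
        for sub in subcategories:
            value = scores[domain][sub]
            if not 1 <= value <= 5:
                raise ValueError(f"{domain}/{sub} must be between 1 and 5")
            total += value
        subtotals[domain] = total
    yes_count = sum(1 for item in YES_NO_ITEMS if answers[item])
    return {"subtotals": subtotals,
            "points": sum(subtotals.values()),
            "yes_count": yes_count}


# Made-up sheet for illustration only.
example_scores = {
    "clinical": {"originality": 4, "clinical_impact": 5},
    "data_science": {"innovation": 3, "data_handling": 4, "method_fit": 5},
    "presentation": {"clinical_framing": 4, "ds_framing": 4,
                     "results_clarity": 3, "balance": 5},
}
example_answers = {"continuation_plan": True, "patient_outcomes": True,
                   "health_equity": False, "treatment_optimization": True}
result = score_team(example_scores, example_answers)
```

Keeping the Yes/No count apart from the point total mirrors the text, which records the Evaluation domain as Yes or No rather than on the 1–5 scale.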
Partnership Strategy
Data Contributors: Participating hospitals and universities will receive institutional recogni-
tion and be included as co-authors on outputs where applicable. Data-sharing will be governed
by formal data use agreements and subject to local IRB approval.
Funding and Implementation Partners: Financial support will be sought from phil-
anthropic foundations, international donors, and corporate sponsors committed to equitable
digital health transformation. Sponsors will not influence the scientific content or outcomes of
the Datathon but may be recognized through branded materials and engagement opportunities.
Ministry and Policy Engagement: The Datathon will be developed in consultation
with the Vietnamese Ministry of Health to ensure alignment with national priorities under the
Digital Health Strategy (2020–2025). A steering committee will include representatives from
both governmental and academic sectors to support integration into broader health system
reform efforts.
Budget Estimate
Category                                    Estimated Cost (USD)
International Travel and Accommodation      $6,000
Venue Rental, Audio-Visual, and Supplies    $2,000
Catering and Participant Meals              $1,500
Award Fund and Certificates                 $1,500
Event Branding and Printing                 $1,000
Administrative and Volunteer Support        $2,500
Total Estimated Budget                      $14,500
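The line items in the budget estimate can be checked against the stated total with a few lines of arithmetic. The sketch below uses only the figures from the table; the per-participant figure assumes the 15 teams of 7 described under Participant Profile and is illustrative, not a number stated in the proposal.

```python
# Line items copied from the budget estimate (USD).
budget_usd = {
    "International Travel and Accommodation": 6000,
    "Venue Rental, Audio-Visual, and Supplies": 2000,
    "Catering and Participant Meals": 1500,
    "Award Fund and Certificates": 1500,
    "Event Branding and Printing": 1000,
    "Administrative and Volunteer Support": 2500,
}

total_usd = sum(budget_usd.values())

# Share of the total per category, in percent, rounded to one decimal place.
shares_pct = {name: round(100 * cost / total_usd, 1)
              for name, cost in budget_usd.items()}

# Assumption: 15 teams of 7 participants, per the Participant Profile.
participants = 15 * 7
cost_per_participant = round(total_usd / participants, 2)

print(f"Total: ${total_usd:,}")
for name, pct in shares_pct.items():
    print(f"{name}: {pct}%")
```

The sum of the six line items matches the stated total of $14,500.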
Expected Outcomes and Impact
The MIT-Vietnam Clinical Intelligence Datathon is expected to achieve both immediate and
long-term impacts across the Vietnamese health innovation ecosystem. Immediate outcomes
include increased technical competency among local clinicians and students, production of pro-
totype AI models adapted to Vietnamese health priorities, and strengthened institutional re-
lationships between local and global partners. Longer-term outcomes include the development
of sustainable AI training modules integrated into university curricula, increased participation
of LMIC institutions in global health data science research, and policy integration of successful
models into public sector digital health initiatives.
The event will be evaluated through a pre- and post-survey of participant knowledge and
confidence, qualitative feedback from stakeholders, and follow-up tracking of team progress.
Dissemination will include academic papers, conference presentations, and engagement with
local policymakers.
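The pre- and post-survey comparison could be summarized per participant as a paired change in score. The sketch below is a minimal illustration: the function name `paired_change`, the 1–5 confidence scale, and the sample responses are all made up for the example, and the proposal does not specify the survey instrument or the analysis method.

```python
from statistics import mean, stdev


def paired_change(pre, post):
    """Summarize per-participant change between pre- and post-event scores.

    pre, post: equal-length lists, one score per participant in the same order.
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must have one score per participant")
    diffs = [after - before for before, after in zip(pre, post)]
    return {
        "n": len(diffs),
        "mean_change": mean(diffs),
        "sd_change": stdev(diffs) if len(diffs) > 1 else 0.0,
        "improved": sum(1 for d in diffs if d > 0),
    }


# Made-up responses on a hypothetical 1-5 confidence scale.
pre_scores = [2, 3, 3, 4, 2]
post_scores = [4, 4, 3, 5, 3]
summary = paired_change(pre_scores, post_scores)
```

Pairing responses by participant, rather than comparing group averages, keeps the summary tied to individual change, which is what a pre/post design measures.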
Conclusion
The MIT-Vietnam Clinical Intelligence Datathon 2026 represents a strategic and timely initiative
to advance ethical, inclusive, and context-sensitive health AI innovation in Southeast Asia.
Through capacity building, interdisciplinary collaboration, and sustained partnerships, the
event aims to contribute meaningfully to national and global efforts toward equitable digital
health transformation. By empowering Vietnamese institutions and professionals to lead in
this space, the Datathon aligns with international commitments to universal health coverage,
digital equity, and the Sustainable Development Goals.