
Devi Kanumilli: West Windsor Plainsboro High School North

Ishaan Arya: West Windsor Plainsboro High School North

Wasim Vorvoi: South Brunswick High School

Tracks:

03: AI That Actually Helps People

05: Open-ended

Inspiration

One of us spent a summer working with a healthcare research group, and it struck us how many good clinical ideas die on a whiteboard, untried. An intern reviewing data might spot a pattern, for example a way to predict which patients will be readmitted from a few basic labs, but the idea would never come to fruition. Turning an insight into an actual model requires technical knowledge, a data scientist, verified and cleaned data, fluency in several complicated frameworks, and careful handling of data privacy. Privacy matters especially in healthcare, where failing to comply with privacy regulations can lead to major lawsuits and financial devastation.

Existing tools for building models fall short in several important ways. Many require data to be uploaded to the cloud, where the provider can access and analyze it, which puts personally identifiable information at risk. These platforms are also often highly technical and complex, with steep learning curves. Together, these factors create a high barrier to entry.

Our solution was built to remove that barrier. The user describes their idea in plain language, and our product produces a working prototype locally, along with an honest evaluation of its effectiveness and accuracy. Sensitive information never leaves the laptop.

What it does

DocLab's training runs entirely locally, on your machine. You describe your insight and your goal in plain language.

From there it:

  • finds a relevant dataset, but only from a list we curated by hand ahead of time (we made it so it can't go off and grab some random dataset for privacy reasons, and it can't make one up),
  • shows you a plain-English plan and makes you approve it before anything runs,
  • trains the model locally while you watch the progress, and
  • hands back the metrics plus a model card a doctor can actually read.
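The first two steps (curated datasets only, nothing runs without approval) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not DocLab's actual code: the catalog entries are placeholders, and `resolve_dataset`, `build_plan`, and `start_training` are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical catalog: these ids are placeholders, not DocLab's real curated list.
CURATED_DATASETS = {
    "example-tabular": "tabular",
    "example-images": "image",
    "example-notes": "text",
}


@dataclass
class Plan:
    dataset_id: str
    task: str
    approved: bool = False  # flipped only when the user accepts the plan


def resolve_dataset(dataset_id: str) -> str:
    """Return the task type for a curated dataset, and refuse anything else."""
    if dataset_id not in CURATED_DATASETS:
        raise ValueError(f"{dataset_id!r} is not in the curated list")
    return CURATED_DATASETS[dataset_id]


def build_plan(dataset_id: str) -> Plan:
    """Build an unapproved plan; fails if the dataset was never vetted."""
    return Plan(dataset_id=dataset_id, task=resolve_dataset(dataset_id))


def start_training(plan: Plan) -> str:
    """Stand-in for launching a job; blocks until the plan is approved."""
    if not plan.approved:
        raise PermissionError("plan must be approved before training")
    return f"training {plan.task} model on {plan.dataset_id}"
```

Because the lookup is a plain dictionary, a URL or an invented dataset name simply has no entry, so it fails before a plan ever exists.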

It covers three kinds of data. Tabular data goes through XGBoost, and we always report the accuracy next to a majority-class baseline so you can see if the model is doing anything. Images go through a small CNN, and if the dataset is tiny the card warns you that the model may be overfitting, so the reported accuracy may not hold up on new data. Text goes through a small language model fine-tuned with LoRA, scored with ROUGE-L plus three real before-and-after examples, so trust comes from evidence rather than blind faith.
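For readers who haven't met ROUGE-L, here is a minimal sketch of the idea: it scores a generated text by the longest common subsequence (LCS) of words it shares with a reference. This version assumes lowercase whitespace tokenization and reports the F1 form; an actual pipeline would more likely use a library implementation with its own tokenizer, so treat the function names as illustrative.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 between a reference text and a generated candidate."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of the candidate that matches
    recall = lcs / len(ref)      # share of the reference that is covered
    return 2 * precision * recall / (precision + recall)
```

For "the cat sat on the mat" against "the cat is on the mat", the LCS is five words out of six on each side, so precision, recall, and F1 all come out to 5/6.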

Every run gets saved to a local history you can go back to, and every model card says, in no uncertain terms, that this is for research and prototyping and not for actual patient care.

THIS IS NOT MEANT FOR VIBE-TRAINING AN AI! AI research is complex, and this does not try to replace it. This is SOLELY for proof-of-concept prototyping that brings non-technical people into the process.

How we built it

DocLab screenshot

It's a desktop app with four components that work together.

The front end is React + Tailwind running inside Tauri, with five screens: where you type your goal, where you review the plan, the training screen, the results, and the history. We were careful to keep the wording clinical ("predict," "classify," "summarize") and keep model names out of the main flow.

Rust handles the orchestration. It owns the job, the file paths, and a SQLite database that records every experiment. The actual training is a separate Python program that uses XGBoost, PyTorch, and Transformers with LoRA depending on what you asked for.

Boring solutions are often the most secure. The Rust and Python programs don't talk over any live connection. The whole conversation happens locally, through files on disk, which keeps things secure and the training effectively air-gapped.

Rust   ──writes──►  plan.json     ──►  Python worker
Python ──writes──►  metrics.json  ──►  Rust

Communicating through two files on disk keeps the two halves cleanly decoupled, and it leaves a paper trail that can be inspected when troubleshooting.
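A minimal sketch of that handshake from the Python side. Only the file names `plan.json` and `metrics.json` come from the description above. The `task` and `status` fields are illustrative, and the write-then-rename step is an assumption added so a reader never sees a half-written file; the writeup doesn't say how the real worker handles this.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path: Path, payload: dict) -> None:
    """Write to a temp file in the same folder, then rename it into place."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=2)
    os.replace(tmp_name, path)  # replaces the target in a single step


def run_worker(job_dir: Path) -> None:
    """Read plan.json, do the work, and answer with metrics.json."""
    plan = json.loads((job_dir / "plan.json").read_text(encoding="utf-8"))
    # Placeholder for the real training step.
    metrics = {"task": plan["task"], "status": "ok"}
    write_json_atomic(job_dir / "metrics.json", metrics)
```

The orchestrator only has to wait for `metrics.json` to appear and parse it, and both files stay on disk afterward as the paper trail.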

We built it in order rather than all at once. We got the tabular path rock solid first, then added images and text on top of the exact same loop instead of rebuilding it each time. On a Mac, deep learning workloads utilize the GPU via MPS by default, falling back to the CPU as a form of graceful degradation.
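The MPS-then-CPU fallback can be expressed in a few lines of PyTorch. This is a sketch of the general pattern rather than the project's exact code; `pick_device` is a hypothetical helper name, and the import guard is there only so the snippet also runs on a machine without PyTorch.

```python
def pick_device() -> str:
    """Prefer Apple's MPS backend when PyTorch reports it, otherwise use the CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch installed, so there is nothing to accelerate
    # Older PyTorch builds have no torch.backends.mps at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```

The returned string can be passed straight to `torch.device(...)` and then to `model.to(...)`, so the rest of the training loop never needs to know which backend it got.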

Challenges we ran into

The biggest one was resisting the urge to make the agent feel "smart." It would have been more impressive to let it search the web for datasets, but for healthcare that's a terrible idea, so we locked it to the curated list. It physically cannot pick something we didn't vet.

The second was that accuracy lies. A model that just guesses the most common class can look 85% accurate and be completely useless. We ended up making the baseline comparison a real feature instead of a footnote, and if the model barely beats it, the card says the model may not be learning anything. Same idea drove the small-dataset warning on the image side.
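The baseline check is simple enough to sketch in plain Python. The function names and the `min_lift` threshold of two percentage points are assumptions for illustration; the writeup doesn't state what margin counts as "barely beats it."

```python
from collections import Counter


def majority_baseline(labels: list[int]) -> float:
    """Accuracy of always predicting the most common class."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


def accuracy(labels: list[int], predictions: list[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def sanity_check(labels: list[int], predictions: list[int], min_lift: float = 0.02) -> dict:
    """Report accuracy next to the baseline and flag models that barely beat it."""
    base = majority_baseline(labels)
    acc = accuracy(labels, predictions)
    lift = acc - base
    return {
        "accuracy": acc,
        "baseline": base,
        "lift": lift,
        "may_not_be_learning": lift < min_lift,  # assumed threshold
    }
```

With 85 negatives and 15 positives, a model that always predicts the negative class scores 0.85 accuracy against a 0.85 baseline, so its lift is zero and the flag is raised.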

Getting three different runtimes to agree on anything was a PAIN. We had to nail down the exact file formats and a fixed set of error codes early, mostly so that when Python failed, the user saw an error message in the UI instead of the whole thing crashing (we had a lot of that).
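One way to get that behavior on the Python side is to catch every failure and turn it into a structured result. The sketch below is hypothetical: the actual error codes aren't listed here, so `ErrorCode` and its members, `run_safely`, and the result fields are all made up for illustration.

```python
from enum import Enum


class ErrorCode(str, Enum):
    # Hypothetical codes; the real fixed set isn't given in this writeup.
    BAD_PLAN = "BAD_PLAN"
    DATASET_MISSING = "DATASET_MISSING"
    TRAINING_FAILED = "TRAINING_FAILED"


def run_safely(train, plan: dict) -> dict:
    """Run a training callable and always return a JSON-serializable result."""
    try:
        return {"status": "ok", "metrics": train(plan)}
    except KeyError as exc:
        code, detail = ErrorCode.BAD_PLAN, f"missing field {exc}"
    except FileNotFoundError as exc:
        code, detail = ErrorCode.DATASET_MISSING, str(exc)
    except Exception as exc:  # last resort so the UI still gets a message
        code, detail = ErrorCode.TRAINING_FAILED, str(exc)
    return {"status": "error", "code": code.value, "detail": detail}
```

Whatever goes wrong, the caller gets a dictionary it can write out and the UI gets a known code to map to a readable message.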

Accomplishments that we're proud of

All three paths actually work end to end. We expected to ship only the tabular one, but images and text both work on the same machinery, and a full run finishes in a couple of minutes on a laptop (mostly because we train for only a few epochs).

We also kept the privacy promise. It's local because of how it's built: the whole model is trained on the user's computer.

And we like that a clinician (the user) doesn't need much technical detail. The whole thing speaks their language up front while still being completely reproducible underneath.

What we learned

Mostly that the limits we put on ourselves made the thing better. Forcing the agent to stick to a fixed dataset list and forcing the two halves to talk only through files made the whole system way easier to reason about than something more "autonomous" would have been.

We also came around on the idea that in medicine, a model telling you "this probably isn't learning anything" is worth more than one showing a big confident number. We spent almost as much time on the sanity checks as on the models themselves.

And keeping the layers genuinely separate paid off faster than we expected. Adding the CNN and the LoRA path didn't mean touching the Rust orchestration at all, because they only ever communicated through those JSON files.

We both got better at Rust, but we still depended on Claude Code and Cursor to push through.

What's next for DocLab

We'd like to widen the dataset marketplace a lot, with more curated public and synthetic datasets across more conditions and types of data.

Beyond that, we want to support more kinds of tasks (regression, multi-class, image segmentation) without changing the "just describe it" ease of access. We also want the model cards to go deeper, with things like calibration and fairness breakdowns across patient subgroups, so a clinician can actually poke at why a model behaves the way it does.

Longer term, we want project workspaces that group related experiments so you can watch a hypothesis evolve over many runs, and eventually some carefully privacy-preserving way to fine-tune on de-identified real data, always keeping the one rule we started with: training NEVER leaves the machine.
