Shipping ML a hospital could actually use: building to FHIR

There’s a graveyard of healthcare ML projects that work beautifully in a Jupyter notebook and never touch a real patient. I wanted Heredicheck, a hereditary disease-risk model, to avoid that fate. What decides whether a health model lives or dies usually isn’t accuracy. It’s interoperability.

The notebook-to-nowhere problem

A typical research model takes a clean CSV in and prints a number out. Real clinical systems don’t speak CSV. They speak FHIR (Fast Healthcare Interoperability Resources), a standard for representing patients, conditions, observations, and family history as structured, exchangeable resources.

If your model can’t accept FHIR in and emit FHIR out, integrating it means a custom data-wrangling project for every hospital that wants it. That cost is why so many promising models never deploy.
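To make "FHIR in" concrete, here is a minimal sketch of the first step an integration might take: receiving a FHIR Bundle and sorting its resources by type. It is an illustration, not Heredicheck's actual code. The function name resources_by_type and the example data are made up, and it assumes the Bundle arrives as already-parsed JSON.

```python
from collections import defaultdict


def resources_by_type(bundle: dict) -> dict:
    """Group the resources inside a FHIR Bundle by their resourceType.

    Hypothetical helper for illustration; assumes `bundle` is parsed FHIR JSON.
    """
    if bundle.get("resourceType") != "Bundle":
        raise ValueError("expected a FHIR Bundle")
    grouped = defaultdict(list)
    for entry in bundle.get("entry", []):
        resource = entry.get("resource")
        # Skip entries that carry no resource or no type.
        if resource and "resourceType" in resource:
            grouped[resource["resourceType"]].append(resource)
    return dict(grouped)


# Made-up example data, trimmed to the fields this sketch reads.
example_bundle = {
    "resourceType": "Bundle",
    "type": "collection",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "p1"}},
        {
            "resource": {
                "resourceType": "FamilyMemberHistory",
                "id": "fmh-1",
                "status": "completed",
                "patient": {"reference": "Patient/p1"},
            }
        },
    ],
}

grouped = resources_by_type(example_bundle)
print(sorted(grouped))
```

A real integration would also validate the resources against the FHIR version the hospital runs, which this sketch skips.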

Designing inputs and outputs around the standard

Instead of inventing my own schema and bolting on a converter later, I started from the FHIR resources the model would need to read and write:

Patient - the subject of the prediction.
FamilyMemberHistory - the relationships and conditions that drive hereditary risk.
Observation - where a computed risk score belongs, as a first-class result rather than a bespoke field.
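
As a rough sketch of the output side, a risk score wrapped as an Observation could look like the following. The function name risk_observation, the code text label, and the choice to report the score as a unitless number between 0 and 1 are all assumptions for illustration. A real deployment would use whatever coding the receiving system expects.

```python
from datetime import datetime, timezone


def risk_observation(patient_id: str, risk: float) -> dict:
    """Wrap a model risk score (0..1) as a minimal FHIR Observation dict.

    Hypothetical helper; the code label below is a placeholder, not a real
    terminology code.
    """
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must be between 0 and 1")
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {"text": "Hereditary disease risk score (placeholder label)"},
        "subject": {"reference": f"Patient/{patient_id}"},
        "effectiveDateTime": datetime.now(timezone.utc).isoformat(),
        "valueQuantity": {"value": round(risk, 3)},
    }


observation = risk_observation("p1", 0.4237)
print(observation["subject"]["reference"], observation["valueQuantity"]["value"])
```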

Modelling the problem this way had a nice side effect: it forced clarity about what the model actually consumed. The FHIR resources became the contract between the ML and the outside world.

Why a graph neural network fit

Hereditary risk is relational - it propagates through family ties and shared genetic features. That maps almost directly onto FamilyMemberHistory, so the modelling approach (a graph neural network) and the data standard reinforced each other. The graph the GNN reasons over is essentially the family graph FHIR already describes.
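
To show how close the two representations are, here is a hedged sketch that turns a list of FamilyMemberHistory resources into nodes and edges. It uses plain dicts and tuples rather than any particular GNN library, and the function name family_graph, the node layout, and the example records are invented for illustration.

```python
def family_graph(patient_id: str, histories: list) -> tuple:
    """Build (nodes, edges) for one patient from FamilyMemberHistory dicts.

    Hypothetical sketch: nodes map an id to its relationship and condition
    labels; edges are (patient_id, relative_id, relationship_code) tuples.
    """
    nodes = {patient_id: {"relationship": "self", "conditions": []}}
    edges = []
    for index, fmh in enumerate(histories):
        if fmh.get("resourceType") != "FamilyMemberHistory":
            continue
        # Keep only records that belong to this patient.
        if fmh.get("patient", {}).get("reference") != f"Patient/{patient_id}":
            continue
        codings = fmh.get("relationship", {}).get("coding", [])
        relation = codings[0].get("code", "unknown") if codings else "unknown"
        member_id = fmh.get("id", f"relative-{index}")
        conditions = [
            c.get("code", {}).get("text", "unspecified")
            for c in fmh.get("condition", [])
        ]
        nodes[member_id] = {"relationship": relation, "conditions": conditions}
        edges.append((patient_id, member_id, relation))
    return nodes, edges


ROLE_SYSTEM = "http://terminology.hl7.org/CodeSystem/v3-RoleCode"

# Made-up records; MTH and FTH are the v3-RoleCode codes for mother and father.
example_histories = [
    {
        "resourceType": "FamilyMemberHistory",
        "id": "fmh-1",
        "status": "completed",
        "patient": {"reference": "Patient/p1"},
        "relationship": {"coding": [{"system": ROLE_SYSTEM, "code": "MTH"}]},
        "condition": [{"code": {"text": "example condition A"}}],
    },
    {
        "resourceType": "FamilyMemberHistory",
        "id": "fmh-2",
        "status": "completed",
        "patient": {"reference": "Patient/p1"},
        "relationship": {"coding": [{"system": ROLE_SYSTEM, "code": "FTH"}]},
    },
    {
        "resourceType": "FamilyMemberHistory",
        "id": "fmh-3",
        "status": "completed",
        "patient": {"reference": "Patient/p2"},
        "relationship": {"coding": [{"system": ROLE_SYSTEM, "code": "MTH"}]},
    },
]

nodes, edges = family_graph("p1", example_histories)
print(edges)
```

Each FamilyMemberHistory record relates one relative to one patient, so this sketch produces a star centred on the patient. Linking relatives to each other would need extra logic that is not shown here.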

Explainability is part of the interface

A risk score with no context is both alarming and useless. A clinician needs to know why. So the output wasn’t just a number: the frontend (Next.js + Framer Motion) paired each prediction with a plain-language account of the contributing factors. Treating explanation as a UI requirement, not an afterthought, is what made the result trustworthy.
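
One hypothetical way to produce that plain-language account is sketched below. It assumes the model, or an attribution method run on it, yields a list of (factor description, signed weight) pairs, and it simply reports the largest ones. The function name explain, the weights, and the wording are all illustrative assumptions, not what the Heredicheck frontend actually rendered.

```python
def explain(risk: float, factors: list, top_k: int = 3) -> str:
    """Turn a risk score and signed factor weights into one readable sentence.

    Hypothetical sketch: `factors` is a list of (description, weight) pairs,
    where a positive weight pushed the score up and a negative one pushed it down.
    """
    # Rank by absolute influence and keep the strongest few.
    ranked = sorted(factors, key=lambda item: abs(item[1]), reverse=True)[:top_k]
    parts = [
        f"{name} ({'raises' if weight > 0 else 'lowers'} risk)"
        for name, weight in ranked
    ]
    return f"Estimated risk {risk:.0%}. Main contributing factors: " + "; ".join(parts) + "."


# Made-up factors and weights, for illustration only.
example_factors = [
    ("mother with example condition A", 0.30),
    ("no affected siblings", -0.05),
    ("father with example condition B", 0.20),
]

summary = explain(0.42, example_factors, top_k=2)
print(summary)
```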

What I’d tell my past self

Pick the integration standard before the model architecture. The standard constrains everything downstream; discovering it late means rework.
The boring interoperability work is the moat. Anyone can train a classifier. Making it honestly speak FHIR is what makes it deployable.
Build the explanation alongside the prediction, not as a later “nice to have.”

Accuracy gets you a good demo. Speaking the standard your users already use is what gets you into production.