A ClinicalBERT-based model that assesses readmission risk at discharge by encoding discharge summaries and recent clinical notes.

The Challenges
Early-stage data scarcity and noisy clinical text made modeling difficult. Ensuring safety and avoiding bias were critical constraints.
Limited labeled data for readmission outcomes in early stages
High variability in unstructured clinical notes
Risk of bias across patient cohorts (heart failure [HF], chronic kidney disease [CKD] severity)
Ensuring clinical safety and avoiding over-reliance on model outputs
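One way to watch for cohort-level bias is to compare discrimination per cohort rather than only overall. The sketch below is illustrative and is not the project's actual audit code: the function names, the record layout, and the toy scores are all assumptions, and only the cohort labels come from the list above.

```python
from collections import defaultdict

def auc(labels, scores):
    """Rank-based AUC: chance that a random positive outscores a random negative.

    Ties count as half a win. Returns None when a class is missing,
    because AUC is undefined in that case.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auc_by_cohort(records):
    """records: iterable of (cohort, label, score) tuples (hypothetical layout)."""
    grouped = defaultdict(lambda: ([], []))
    for cohort, label, score in records:
        grouped[cohort][0].append(label)
        grouped[cohort][1].append(score)
    return {cohort: auc(ls, ss) for cohort, (ls, ss) in grouped.items()}

# Toy, made-up data purely to exercise the functions.
records = [
    ("HF", 1, 0.9), ("HF", 0, 0.2), ("HF", 1, 0.6), ("HF", 0, 0.7),
    ("CKD", 1, 0.8), ("CKD", 0, 0.3), ("CKD", 0, 0.1), ("CKD", 1, 0.4),
]
cohort_auc = auc_by_cohort(records)
```

A large gap between cohorts would be a prompt for closer review, not proof of bias on its own, since small cohorts give noisy estimates.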
The Strategy
We combined NLP with structured data and kept the model assistive, not prescriptive. Validation was done in shadow mode before any clinical exposure.
Fine-tuned ClinicalBERT on discharge summaries with outcome labels
Combined embeddings with structured features (length of stay [LOS], prior admissions, lab results)
Used risk-tiering (low/medium/high) instead of direct decisioning
Deployed in shadow mode for real-world validation
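The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the deployed pipeline: the note embedding is passed in as a plain list (standing in for a ClinicalBERT output), the logistic head, its weights, and the tier cut-offs are hypothetical placeholders, and every function name is invented for the example.

```python
import math

# Hypothetical tier cut-offs; real thresholds would be chosen with
# clinicians on validation data.
LOW_MAX, MEDIUM_MAX = 0.2, 0.5

def fuse(text_embedding, los_days, prior_admissions, labs):
    """Concatenate a note embedding with structured features into one vector."""
    return (list(text_embedding)
            + [float(los_days), float(prior_admissions)]
            + [float(v) for v in labs])

def predict_risk(features, weights, bias):
    """Placeholder logistic head standing in for the trained classifier."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def risk_tier(prob):
    """Map a probability to a tier instead of an automated decision."""
    if prob < LOW_MAX:
        return "low"
    if prob < MEDIUM_MAX:
        return "medium"
    return "high"

def score_in_shadow(patient_id, features, weights, bias, log):
    """Shadow mode: record the prediction for later comparison with outcomes.

    Nothing is returned, so nothing reaches the clinical workflow.
    """
    prob = predict_risk(features, weights, bias)
    log.append({"patient_id": patient_id, "risk": prob, "tier": risk_tier(prob)})
    return None

# Toy, made-up inputs.
features = fuse([0.1, -0.3], los_days=5, prior_admissions=2, labs=[1.4])
weights = [0.5, 0.5, 0.1, 0.4, 0.2]
shadow_log = []
shown_to_clinician = score_in_shadow("patient-001", features, weights, -1.0, shadow_log)
```

Keeping the tiering step separate from the probability makes it possible to adjust cut-offs without retraining, and the shadow log can later be joined with observed readmissions for validation.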
The Result
The system cut manual review time and discriminated readmission risk well in validation. It enabled focused care interventions and kept improving as real-world feedback accumulated.
Identified high-risk patients within minutes, versus ~1 hour of manual review (~80% time reduction)
Achieved ~0.78–0.82 AUC in readmission prediction during validation
Enabled prioritization of top ~20–30% high-risk patients for intervention
Established a continuous feedback loop that improves model calibration over time
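For illustration, prioritizing the top-scoring share of patients and checking calibration could look like the sketch below. It is an assumption-laden example, not the project's monitoring code: the function names, the 25% default fraction (picked from within the ~20–30% range above), the bin count, and the toy data are all hypothetical.

```python
def top_fraction(scores_by_patient, fraction=0.25):
    """Return patient ids in the highest-scoring `fraction` of the cohort."""
    ranked = sorted(scores_by_patient, key=scores_by_patient.get, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    return ranked[:k]

def brier_score(labels, probs):
    """Mean squared error between predicted risk and observed outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)

def calibration_table(labels, probs, n_bins=5):
    """Per non-empty bin: (mean predicted risk, observed readmission rate, count)."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))
    table = []
    for b in bins:
        if b:
            mean_p = sum(p for _, p in b) / len(b)
            rate = sum(y for y, _ in b) / len(b)
            table.append((mean_p, rate, len(b)))
    return table

# Toy, made-up scores and outcomes.
scores = {"p1": 0.9, "p2": 0.1, "p3": 0.6, "p4": 0.3}
outcomes = {"p1": 1, "p2": 0, "p3": 1, "p4": 0}

priority_list = top_fraction(scores, fraction=0.5)
ids = list(scores)
brier = brier_score([outcomes[i] for i in ids], [scores[i] for i in ids])
table = calibration_table([outcomes[i] for i in ids], [scores[i] for i in ids])
```

Recomputing the Brier score and the calibration table on each new batch of observed outcomes is one simple way to tell whether calibration is drifting and recalibration is due.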


