
Built for Tuva Analytics Teams

In-Warehouse Predictive Intelligence

Built by healthcare data engineers for analytics teams, Illuminate Predictive Models removes the need to master ML pipelines, point-in-time feature engineering, or model deployment. Instead of spending six figures on vendor risk scores or months building custom ML infrastructure, you get production-ready predictions that run inside your existing dbt workflow.

  • No external ML platform required: runs inside your existing dbt workflow
  • No opaque scoring logic: transparent features, diagnostics, and model metadata
  • No selection bias: trained on your population, your cost structure, and your data completeness

What You Can Predict

Out-of-the-box spend and utilization models, plus configurable targets for your own workflows.

Total Spend

Expected paid amount per member over customizable time horizons

Inpatient Utilization

Predicted encounter rates for acute inpatient admissions

Emergency Department Visits

ED encounter probability and expected frequency

SNF Utilization

Skilled nursing facility encounter predictions

Custom Targets

Fully configurable target policy for any encounter type and time horizon

Overview

Illuminate Predictive Models makes it easy to train and deploy healthcare risk models without having to build ML infrastructure from scratch or depend on opaque third-party vendors. We train gradient-boosted models directly in your data warehouse on your own claims data, producing calibrated spend and utilization predictions as dbt tables with no external infrastructure required.

Comprehensive Feature Engineering

  • Demographics: Age, sex, race, state, enrollment tenure, and cold-start indicators
  • Utilization History: Paid amounts and encounter counts across 3/6/12-month lookback windows by encounter type
  • Chronic Conditions: CMS chronic condition assignments from both claims mart and raw diagnosis codes
  • HCC Risk Scores: Hierarchical Condition Category assignments normalized across payers and plan versions
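To make the lookback-window idea concrete, here is a minimal stdlib-only sketch of point-in-time feature construction: paid amounts are summed over 3/6/12-month windows ending strictly before the anchor month, so no future claims leak into the features. The record layout and column names are illustrative, not the package's actual implementation.

```python
from datetime import date

def lookback_features(claims, member_id, anchor, windows=(3, 6, 12)):
    """Sum paid amounts per lookback window, counting only claims
    strictly before the anchor date (no lookahead bias)."""
    feats = {}
    for months in windows:
        # window start: `months` calendar months before the anchor month
        month_index = (anchor.year * 12 + anchor.month - 1) - months
        start = date(month_index // 12, month_index % 12 + 1, 1)
        feats[f"paid_{months}m"] = sum(
            c["paid"] for c in claims
            if c["member"] == member_id and start <= c["date"] < anchor
        )
    return feats

claims = [
    {"member": "A", "date": date(2024, 11, 15), "paid": 120.0},
    {"member": "A", "date": date(2024, 5, 2), "paid": 80.0},
    {"member": "B", "date": date(2024, 12, 1), "paid": 50.0},
]
print(lookback_features(claims, "A", date(2025, 1, 1)))
```

The same pattern extends to encounter counts per encounter type; the key property is that every feature is a function of data available as of the anchor month only.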

Calibrated Probability Outputs

  • Count Thresholds: P(Y >= k), the probability of at least 1, 2, 3, or 5 encounters in a given category
  • Spend Percentiles: P(spend in top k%), the probability a member falls in the top 1% or 5% of spenders
  • Isotonic Calibration: Predictions calibrated to match aggregate actuals for reliable population-level estimates
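Isotonic calibration fits a monotone step function mapping raw scores to observed outcome rates, which is what makes aggregate predictions match aggregate actuals. A toy pool-adjacent-violators implementation (illustrative only, not the package's code):

```python
def pav_calibrate(scores, outcomes):
    """Pool-adjacent-violators: returns calibrated probabilities (one per
    input, in score order) that are monotone non-decreasing and equal the
    observed outcome rate within each pooled block."""
    pairs = sorted(zip(scores, outcomes))
    blocks = [[y, 1] for _, y in pairs]  # each block: [outcome sum, count]
    i = 0
    while i < len(blocks) - 1:
        if blocks[i][0] / blocks[i][1] > blocks[i + 1][0] / blocks[i + 1][1]:
            # monotonicity violated: merge adjacent blocks and back up
            blocks[i][0] += blocks[i + 1][0]
            blocks[i][1] += blocks[i + 1][1]
            del blocks[i + 1]
            i = max(i - 1, 0)
        else:
            i += 1
    calibrated = []
    for total, count in blocks:
        calibrated.extend([total / count] * count)
    return calibrated

# raw model scores vs. binary outcomes (e.g., did the member have >= 1 ED visit?)
print(pav_calibrate([0.2, 0.4, 0.6, 0.8], [0, 1, 0, 1]))
```

Because pooled blocks average to the observed rate, the mean calibrated probability equals the population outcome rate, which is the property that makes population-level estimates reliable.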

Clinical and Operational Insights

  • Point-in-time feature construction with no lookahead bias or data leakage
  • Person-level train/test splits that prevent information leakage from overlapping monthly windows
  • Claims lag adjustment to account for incomplete recent claims data
  • Feature importance and fill-rate diagnostics to catch data quality issues early
  • Model registry with signature-based reuse to avoid unnecessary retraining
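The person-level split mentioned above can be done deterministically by hashing the person ID, so every monthly window for a given member lands on the same side of the split. A sketch assuming a simple 80/20 split (the hashing scheme is illustrative, not necessarily what the package uses):

```python
import hashlib

def split_side(person_id, test_fraction=0.2):
    """Deterministically assign a person to 'train' or 'test' by hashing
    their ID, so all of that person's monthly rows stay together."""
    digest = hashlib.sha256(person_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "test" if bucket < test_fraction * 100 else "train"

# monthly anchor rows for two members
rows = [("p1", "2024-01"), ("p1", "2024-02"), ("p2", "2024-01")]
assignments = {pid: split_side(pid) for pid, _ in rows}
# every row for p1 gets the same assignment, regardless of month
print(assignments)
```

Splitting by row instead of by person would let overlapping 3/6/12-month windows from the same member appear on both sides of the split, silently inflating test metrics.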

Purpose-Built for Tuva Users

  • Runs entirely within your dbt workflow with no Jupyter, Airflow, or external ML platforms
  • Trained on your population, your cost structure, and your data completeness, with no selection bias
  • Separate models per data source for multi-payer environments
  • PHI-safe summary exports for non-technical stakeholders
  • Versioned model artifacts with full audit trail

Differentiation: Build vs Vendor vs Illuminate

Feature | Build In-House | Vendor Risk Scores | Illuminate Predictive Models
Training Data | Your own claims population, but requires substantial engineering investment | National averages that may not match your data | Your own claims population with no selection bias
Infrastructure | Pipeline orchestration, model hosting, serving, and monitoring all owned by your team | Separate ML platform, API integrations, or file transfers | Runs in your warehouse via dbt with zero external dependencies
Calibration | Must be designed and maintained internally | Requires manual adjustment factors for your population | Automatically calibrated to your actuals
Transparency | High if your team invests in diagnostics and documentation | Black-box scores with limited explainability | Full feature importance, fill rates, and diagnostics
Customization | Flexible but costly to build and maintain | Fixed model outputs, vendor-controlled roadmap | Configure targets, horizons, features, and thresholds via dbt vars
Updates | Dependent on internal roadmap and staffing | Annual or semi-annual vendor refresh cycles | Retrain anytime on fresh data with a single dbt run
Integration | Custom data products required for activation and BI | CSV drops, API calls, or proprietary formats | Native dbt tables in your warehouse, ready for downstream analytics

Quickstart Path

  1. Add Tuva and illuminate_predictive_models to packages.yml and run dbt deps.
  2. Set minimal vars in dbt_project.yml (for example ml_enabled: true).
  3. Run dbt run --select package:illuminate_predictive_models.
  4. Validate outputs in your ML schema before downstream operationalization.
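In practice, steps 1 and 2 amount to a few lines of YAML. The package coordinates, version pins, and git URL below are placeholders (only ml_enabled comes from the quickstart above), so check the package documentation for the real values:

```yaml
# packages.yml -- coordinates and versions shown are illustrative placeholders
packages:
  - package: tuva_health/the_tuva_project
    version: [">=0.1.0"]
  - git: "https://github.com/your-org/illuminate_predictive_models.git"  # hypothetical source
    revision: main

# dbt_project.yml (excerpt)
vars:
  ml_enabled: true
```

After dbt deps, dbt run --select package:illuminate_predictive_models builds the output tables in your ML schema.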

Core Output Contract

Output Table | Description
train_model_registry | Train/reuse status, artifact URI, diagnostics, and model metadata for the current run
predict_values | Predicted values by person, anchor month, target definition, and prediction horizon
predict_probabilities_long | Threshold and percentile probability outputs, including P(Y >= k) and spend top-percent probabilities
train_metrics_long | Train/test evaluation metrics, including MAE, RMSE, R², AUC, Brier score, and log loss
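As a sanity check on the evaluation outputs, the headline metrics can be recomputed directly from predictions and actuals. A stdlib-only sketch with toy numbers (illustrative, not the package's evaluation code):

```python
import math

def mae(actual, pred):
    """Mean absolute error: average magnitude of the prediction miss."""
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)

def rmse(actual, pred):
    """Root mean squared error: penalizes large misses more heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def brier(outcomes, probs):
    """Brier score: mean squared error between predicted probabilities
    and 0/1 outcomes; lower is better-calibrated."""
    return sum((y - p) ** 2 for y, p in zip(outcomes, probs)) / len(outcomes)

actual_spend = [100.0, 300.0, 0.0]
pred_spend = [120.0, 250.0, 30.0]
print(round(mae(actual_spend, pred_spend), 2))
print(round(rmse(actual_spend, pred_spend), 2))
print(round(brier([1, 0, 0], [0.7, 0.2, 0.1]), 3))
```

Recomputing a metric or two against predict_values is a quick way to validate outputs before downstream operationalization.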

Bring Predictive Modeling Into Your Existing Tuva Workflow

Keep your data, logic, and operational analytics in one place. Illuminate Predictive Models helps your team move from retrospective reporting to proactive risk targeting without adding a separate ML platform.

Book a Demo