Regression

Medical Insurance

Uses age, BMI, and smoking status to predict a person's annual medical insurance charges.

Raw Dataset (original data)
Columns: age, sex, bmi, children, smoker, region, charges
Cleaned Dataset (the version we feed to the model)
Columns: Age, BMI, Smoker_Yes, Charges
The Code (how we built this model)
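import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Load the raw data
df = pd.read_csv('medical_insurance_original.csv')

# The raw dataset's headers are lowercase (age, bmi, smoker, charges), so rename
# them to the names used below. Columns already named this way are left alone.
df = df.rename(columns={'age': 'Age', 'bmi': 'BMI', 'smoker': 'Smoker', 'charges': 'Charges'})

# Keep only the three predictors and the target
df = df[['Age', 'BMI', 'Smoker', 'Charges']]

# Drop rows with missing values
df = df.dropna()

# One-hot encode Smoker; drop_first=True leaves a single indicator column
df = pd.get_dummies(df, drop_first=True)

# Separate the features from the target
X = df.drop('Charges', axis=1)
y = df['Charges']

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training split only, then apply it to both splits
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Fit an ordinary least-squares linear regression
model = LinearRegression()
model.fit(X_train, y_train)

# Report R^2 on the held-out test split
y_pred = model.predict(X_test)
print(r2_score(y_test, y_pred))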
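A note on units: the script fits the model on min-max-scaled features, so model.coef_ is expressed per unit of scaled feature (0 to 1), not per year of age or per BMI point. One way to get an equation in raw units is to undo the scaling algebraically. The sketch below assumes the default MinMaxScaler range of 0 to 1; the helper name to_original_units is hypothetical, and whether this page derived its equation exactly this way is an assumption.

```python
def to_original_units(scaled_coefs, scaled_intercept, data_min, data_max):
    """Convert a linear model fit on min-max-scaled features to raw units.

    Min-max scaling maps x to (x - min) / (max - min), so the fitted model is
    y = b + sum(w_i * (x_i - min_i) / (max_i - min_i)).
    Expanding gives a raw-unit slope of w_i / (max_i - min_i) and an
    intercept of b - sum(slope_i * min_i).
    Assumes every feature has max > min (no constant columns).
    """
    coefs = [w / (hi - lo) for w, lo, hi in zip(scaled_coefs, data_min, data_max)]
    intercept = scaled_intercept - sum(c * lo for c, lo in zip(coefs, data_min))
    return coefs, intercept


# With the objects from the script above, the call would look like:
# coefs, intercept = to_original_units(
#     model.coef_, model.intercept_, scaler.data_min_, scaler.data_max_)
```

Both models make identical predictions; only the units of the coefficients change.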
Under the Hood (the equation the model learned)
Charges = −$11,708 + $259 × Age + $326 × BMI + $23,675 × Smoker

Smoking is by far the strongest predictor: being a smoker (Smoker = 1 rather than 0) adds nearly $24K per year to your predicted charges, the same increase as about 91 extra years of age or about 73 extra BMI points.
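To see how the terms combine, the equation above can be evaluated directly. This is a minimal sketch: the function name predict_charges and the example inputs are hypothetical and are not rows from the dataset; only the four coefficients come from the equation.

```python
# Coefficients copied from the equation above (dollars per year).
INTERCEPT = -11_708
PER_YEAR_OF_AGE = 259
PER_BMI_POINT = 326
SMOKER_SURCHARGE = 23_675


def predict_charges(age, bmi, smoker):
    """Evaluate the learned linear equation for one person."""
    return (INTERCEPT
            + PER_YEAR_OF_AGE * age
            + PER_BMI_POINT * bmi
            + SMOKER_SURCHARGE * int(smoker))


# Hypothetical inputs: a 40-year-old with a BMI of 30.
print(predict_charges(40, 30, smoker=False))  # 8432
print(predict_charges(40, 30, smoker=True))   # 32107
```

The two results differ by exactly the smoker coefficient, since the model is linear and has no interaction terms.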
