Medical Insurance — Machine Learning Showcase

Try It Yourself

Enter values below, then hit Predict to see what the model says.

Raw Dataset (original data)

age	sex	bmi	children	smoker	region	charges

Cleaned Dataset (the version we feed to the model)

Age	BMI	Smoker_Yes	Charges

The Code (how we built this model)

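import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Load the raw data (age, sex, bmi, children, smoker, region, charges)
df = pd.read_csv('medical_insurance_original.csv')

# The raw headers are lowercase; rename the columns we need so they match
# the cleaned dataset, then keep only those columns
df = df.rename(columns={'age': 'Age', 'bmi': 'BMI', 'smoker': 'Smoker', 'charges': 'Charges'})
df = df[['Age', 'BMI', 'Smoker', 'Charges']]

# Drop rows with missing values
df = df.dropna()

# One-hot encode Smoker; drop_first=True leaves a single indicator column
# (shown as Smoker_Yes in the cleaned dataset)
df = pd.get_dummies(df, drop_first=True)

# Separate the features from the target
X = df.drop('Charges', axis=1)
y = df['Charges']

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale every feature to [0, 1]; fit on the training set only to avoid leakage
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Fit an ordinary least squares linear regression
model = LinearRegression()
model.fit(X_train, y_train)

# Score the model on the held-out rows (R², where 1.0 is a perfect fit)
y_pred = model.predict(X_test)
print(r2_score(y_test, y_pred))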
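
The last line of the script prints R², the share of the variation in charges that the model explains on the held-out rows. As a rough sketch of what that number means, here is the same calculation in plain Python. The function name r_squared and the sample values are made up for illustration and are not results from this model.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - (sum of squared errors) / (total variation around the mean)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical actual vs. predicted charges, for illustration only
actual = [1000, 2000, 3000, 4000]
predicted = [1100, 1900, 3200, 3800]
print(round(r_squared(actual, predicted), 4))
```

A score of 1.0 means every prediction is exact, and a score of 0 means the model does no better than always guessing the average charge.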

Under the Hood (the equation the model learned)

Charges = −$11,708 + $259 × Age + $326 × BMI + $23,675 × Smoker

Smoking is the nuclear variable: being a smoker adds nearly $24K/year to your predicted charges, the same as about 91 extra years of age ($23,675 / $259) or about 73 extra BMI points ($23,675 / $326).
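
One caveat when reading the equation: the training code scales every feature with MinMaxScaler, so the coefficients stored in the fitted model are per scaled unit, not per year of age or per BMI point. The sketch below shows one way to convert them back to raw units. The function name, the feature ranges, and the scaled coefficients are hypothetical values chosen so the arithmetic lands on the equation above; they are not read from the real model.

```python
def unscale_linear_model(scaled_intercept, scaled_coefs, feature_mins, feature_maxs):
    """Convert a linear model fit on min-max scaled features to raw units.

    Min-max scaling maps x to (x - min) / (max - min), so each raw-unit slope
    is the scaled coefficient divided by the feature's range, and the
    intercept absorbs the shift caused by each feature's minimum.
    """
    raw_intercept = scaled_intercept
    raw_coefs = []
    for coef, lo, hi in zip(scaled_coefs, feature_mins, feature_maxs):
        slope = coef / (hi - lo)
        raw_coefs.append(slope)
        raw_intercept -= slope * lo
    return raw_intercept, raw_coefs


# Hypothetical training ranges: Age 18 to 64, BMI 16 to 53, Smoker 0 to 1
mins = [18, 16, 0]
maxs = [64, 53, 1]

# Hypothetical coefficients as they would look on the scaled features
intercept, coefs = unscale_linear_model(-1830, [11914, 12062, 23675], mins, maxs)
print(intercept, coefs)
```

With scikit-learn, the inputs would come from model.intercept_, model.coef_, scaler.data_min_ and scaler.data_max_.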

Try the Equation Yourself

Age (years)

BMI

Smoker

Predicted Result —
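
The same calculation can be done by hand or in a few lines of Python. This is a minimal sketch that plugs values into the equation above. The function name predict_charges is made up for illustration, and the rounded coefficients are copied from the equation, so results can differ slightly from the full model.

```python
def predict_charges(age, bmi, smoker):
    """Evaluate Charges = -11,708 + 259*Age + 326*BMI + 23,675*Smoker.

    smoker is True or False and is converted to the 1/0 indicator.
    """
    return -11708 + 259 * age + 326 * bmi + 23675 * (1 if smoker else 0)


print(predict_charges(40, 30, False))  # 8432
print(predict_charges(40, 30, True))   # 32107
```

Because the equation is a straight line, it can go negative at the low end: an 18-year-old non-smoker with a BMI of 16 comes out at −$1,830, so treat predictions near the edges of the data with caution.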