Raw Dataset (original data)
| age | sex | bmi | children | smoker | region | charges |
| --- | --- | --- | --- | --- | --- | --- |
Cleaned Dataset (the version we feed to the model)
| Age | BMI | Smoker_Yes | Charges |
| --- | --- | --- | --- |
The Code (how we built this model)
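import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Load the raw data and keep only the columns the model uses.
# Column lookup is case-sensitive, so these names must match the CSV header exactly.
df = pd.read_csv('medical_insurance_original.csv')
df = df[['Age', 'BMI', 'Smoker', 'Charges']]
df = df.dropna()

# One-hot encode Smoker; drop_first=True leaves a single Smoker_Yes column.
df = pd.get_dummies(df, drop_first=True)

# Separate the features from the target.
X = df.drop('Charges', axis=1)
y = df['Charges']

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train the model and report R^2 on the held-out rows.
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(r2_score(y_test, y_pred))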
Under the Hood (the equation the model learned)
Charges = −$11,708 + $259 × Age + $326 × BMI + $23,675 × Smoker
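Smoking is the nuclear variable: being a smoker (Smoker = 1 instead of 0) adds nearly $24K/year to your predicted charges. That is the same jump as about 91 extra years of age (at $259 each) or about 73 extra BMI points (at $326 each).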
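The code above fits the model on MinMax-scaled features, so the fitted `model.coef_` values are per scaled unit (0 to 1), not dollars per year of age or per BMI point. The page does not show how the dollar equation was produced. One plausible route is to undo the scaling algebraically, sketched below. The helper name `unscale_linear_model` and the example numbers are hypothetical and are not taken from the dataset. With a fitted pipeline, the inputs would come from `model.coef_`, `model.intercept_`, `scaler.data_min_`, and `scaler.data_max_`.

```python
def unscale_linear_model(scaled_coefs, scaled_intercept, data_min, data_max):
    """Convert a linear model fit on MinMax-scaled inputs to raw-unit form.

    Assumes the default feature_range of (0, 1), i.e.
    x_scaled = (x - min) / (max - min), and that max > min for every feature.
    """
    raw_coefs = []
    raw_intercept = scaled_intercept
    for w, lo, hi in zip(scaled_coefs, data_min, data_max):
        raw_w = w / (hi - lo)        # dollars per raw unit of this feature
        raw_coefs.append(raw_w)
        raw_intercept -= raw_w * lo  # fold the "- min" shift into the intercept
    return raw_coefs, raw_intercept


# Illustrative numbers only (not from the insurance data):
# two features with ranges [0, 50] and [10, 20].
coefs, intercept = unscale_linear_model(
    scaled_coefs=[100.0, 50.0],
    scaled_intercept=10.0,
    data_min=[0.0, 10.0],
    data_max=[50.0, 20.0],
)
print(coefs, intercept)  # [2.0, 5.0] -40.0
```

Both forms give the same prediction for any input. At x = (25, 15), the scaled form gives 10 + 100 × 0.5 + 50 × 0.5 = 85 and the raw form gives −40 + 2 × 25 + 5 × 15 = 85.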
Try the Equation Yourself
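A minimal sketch for plugging numbers into the equation above by hand. The function name `predict_charges` is made up for illustration. The coefficients are the rounded ones shown in the equation, so results may differ slightly from what the fitted model itself would return.

```python
def predict_charges(age, bmi, smoker):
    """Evaluate Charges = -11,708 + 259*Age + 326*BMI + 23,675*Smoker."""
    smoker_flag = 1 if smoker else 0  # Smoker is 1 for smokers, 0 otherwise
    return -11708 + 259 * age + 326 * bmi + 23675 * smoker_flag


# A 40-year-old with a BMI of 30, as a non-smoker and as a smoker.
print(predict_charges(40, 30, smoker=False))  # 8432
print(predict_charges(40, 30, smoker=True))   # 32107
```

The two results differ by exactly the smoker coefficient, $23,675. Because the equation is a straight line with a negative intercept, it can return a negative number for young, low-BMI non-smokers (for example, age 18 and BMI 20 gives −526), so treat predictions near the edges of the data with caution.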