Regression

House Sale Prices

Uses living area, overall quality rating, and basement size to predict the sale price of a home.

Raw Dataset (original data)
Unnamed: 0 | Order | PID | MS SubClass | MS Zoning | Lot Frontage | Lot Area | Street | Alley | Lot Shape | Land Contour | Utilities | Lot Config | Land Slope | Neighborhood | Condition 1 | Condition 2 | Bldg Type | House Style | Overall Qual | Overall Cond | Year Built | Year Remod/Add | Roof Style | Roof Matl | Exterior 1st | Exterior 2nd | Mas Vnr Type | Mas Vnr Area | Exter Qual | Exter Cond | Foundation | Bsmt Qual | Bsmt Cond | Bsmt Exposure | BsmtFin Type 1 | BsmtFin SF 1 | BsmtFin Type 2 | BsmtFin SF 2 | Bsmt Unf SF | Total Bsmt SF | Heating | Heating QC | Central Air | Electrical | 1st Flr SF | 2nd Flr SF | Low Qual Fin SF | Gr Liv Area | Bsmt Full Bath | Bsmt Half Bath | Full Bath | Half Bath | Bedroom AbvGr | Kitchen AbvGr | Kitchen Qual | TotRms AbvGrd | Functional | Fireplaces | Fireplace Qu | Garage Type | Garage Yr Blt | Garage Finish | Garage Cars | Garage Area | Garage Qual | Garage Cond | Paved Drive | Wood Deck SF | Open Porch SF | Enclosed Porch | 3Ssn Porch | Screen Porch | Pool Area | Pool QC | Fence | Misc Feature | Misc Val | Mo Sold | Yr Sold | Sale Type | Sale Condition | SalePrice
Cleaned Dataset (the version we feed to the model)
Gr Liv Area | Overall Qual | Total Bsmt SF | SalePrice
The Code (how we built this model)
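import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Load the raw dataset.
df = pd.read_csv('house_prices_original.csv')

# Keep the three predictors and the target.
df = df[['Gr Liv Area', 'Overall Qual', 'Total Bsmt SF', 'SalePrice']]

# Drop rows with a missing value in any of the kept columns.
df = df.dropna()

# One-hot encode categorical columns, if any. The four kept columns are
# expected to be numeric, in which case this leaves df unchanged.
df = pd.get_dummies(df, drop_first=True)

# Separate the features from the target.
X = df.drop('SalePrice', axis=1)
y = df['SalePrice']

# Hold out 20% of the rows for testing; the fixed seed makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale each feature to [0, 1]. The scaler is fit on the training split only
# and then applied unchanged to the test split.
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Fit an ordinary least squares linear regression.
model = LinearRegression()
model.fit(X_train, y_train)

# Report R^2 on the held-out test split.
y_pred = model.predict(X_test)
print(r2_score(y_test, y_pred))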
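
Because the code fits the model on min-max scaled inputs, its coefficients are per unit of scaled feature (0 to 1), not per square foot or per quality point. The page does not say how the dollar figures in the equation were obtained; the sketch below shows one way to convert scaled coefficients back to raw units. The function name `unscale_linear_model` and the example numbers are made up for illustration.

```python
def unscale_linear_model(coef, intercept, data_min, data_max):
    """Convert coefficients learned on min-max scaled inputs to raw units.

    Min-max scaling maps x to (x - min) / (max - min), so
        y = b + sum(w_j * (x_j - min_j) / (max_j - min_j))
    which rearranges to a raw slope of w_j / (max_j - min_j) and a raw
    intercept of b - sum(w_j * min_j / (max_j - min_j)).
    """
    raw_coef = []
    raw_intercept = float(intercept)
    for w, lo, hi in zip(coef, data_min, data_max):
        span = hi - lo
        if span == 0:
            raise ValueError("constant feature: cannot unscale")
        raw_coef.append(w / span)
        raw_intercept -= w * lo / span
    return raw_coef, raw_intercept


# Made-up example: the raw relationship is y = 10 + 2*x1 + 3*x2, with x1
# ranging over [0, 100] and x2 over [1, 10] in the training data. On scaled
# inputs the same line has coefficients [200, 27] and intercept 13.
raw_coef, raw_intercept = unscale_linear_model(
    coef=[200.0, 27.0],
    intercept=13.0,
    data_min=[0.0, 1.0],
    data_max=[100.0, 10.0],
)
print(raw_coef, raw_intercept)
```

With the fitted objects from the code above, the inputs would be `model.coef_`, `model.intercept_`, `scaler.data_min_`, and `scaler.data_max_`.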
Under the Hood (the equation the model learned)
Price = −$110,127 + $55 × Sq Ft + $26,223 × Quality Rating + $46 × Basement Sq Ft

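Quality rating has by far the largest coefficient: holding the other two inputs fixed, each point on the 1–10 scale adds about $26K to the predicted price. Living area adds $55 per sq ft and basement area adds $46 per sq ft.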
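
The equation can be evaluated directly. The sketch below hard-codes the rounded coefficients shown above, so its output may differ slightly from the fitted model. The function name `predict_price` and the example house are hypothetical.

```python
def predict_price(living_area_sqft, quality_rating, basement_sqft):
    """Evaluate the rounded equation shown above; returns dollars."""
    return (-110_127
            + 55 * living_area_sqft
            + 26_223 * quality_rating
            + 46 * basement_sqft)


# Hypothetical house: 1,500 sq ft of living area, quality rating 6,
# and a 1,000 sq ft basement.
price = predict_price(1500, 6, 1000)
print(f"${price:,.0f}")  # $175,711

# Raising the quality rating by one point, with the other inputs fixed,
# changes the prediction by exactly the quality coefficient.
print(predict_price(1500, 7, 1000) - price)  # 26223
```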
