Decision Trees

We do two things: pick a prediction target and pick the features used to predict it

We might also drop any rows with missing values

data = my_data.dropna(axis=0)
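As a rough sketch of what dropna does, here is a tiny made-up DataFrame (the values and the two column names are assumptions for illustration only, not the real dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; values are invented for this example.
my_data = pd.DataFrame({
    "rooms": [3, 2, np.nan, 4],
    "price": [500000.0, np.nan, 410000.0, 720000.0],
})

# axis=0 drops rows (not columns) that contain at least one missing value.
data = my_data.dropna(axis=0)

print(len(my_data), "rows before,", len(data), "rows after")
```

Only the rows with a value in every column survive, and they keep their original index labels.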

We store the prediction target from the DataFrame in a Series, by convention called y

y = data.price

We then select the columns that we're going to use as our features for the prediction

We can pick any columns that help determine the target. Different choices of features will not always give the same results, so we can compare them.

features = ['rooms', 'bathroom', 'acres', 'lattitude', 'longitude']

By convention we then store the features we're using in a variable called X

X = data[features]
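A small sketch of the difference between the two selections, assuming a made-up DataFrame with only three columns (not the real dataset): selecting one column gives a Series, selecting a list of columns gives a DataFrame.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset.
data = pd.DataFrame({
    "rooms": [3, 2, 4],
    "acres": [0.2, 0.1, 0.5],
    "price": [500000, 410000, 720000],
})

# A single column is returned as a pandas Series.
y = data.price

# A list of column names returns a DataFrame with just those columns.
features = ["rooms", "acres"]
X = data[features]

print(type(y).__name__, y.shape)
print(type(X).__name__, X.shape)
```

Note that data.price only works when the column name is a valid Python name; data["price"] does the same thing and works for any column name.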

scikit-learn is a library commonly used for creating models

It has several different types of models

Building a model typically has the following steps: define the model, fit it to the data, then use it to predict

Basic model

from sklearn.tree import DecisionTreeRegressor

model = DecisionTreeRegressor(random_state=1)

model.fit(X, y)

Above we import DecisionTreeRegressor from scikit-learn

We create a decision tree regressor and store it in model

random_state is an integer seed for the random choices made while building the tree, so that we get the same model on every run

We then fit the model by passing in our features and prediction target

We can then get predicted values for our prediction target

model.predict(X.head())

The predictions are returned as a NumPy array, with one value for each row passed in

e.g. [1035000. 1465000. 1600000. 1876000. 1636000.]
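To see the whole flow in one place, here is a minimal sketch on made-up data (the numbers and the two feature columns are assumptions, not the real dataset). It fits a tree, predicts for the first five rows, and then fits a second tree with the same random_state to check that the predictions repeat.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Made-up features and prices, for illustration only.
X = pd.DataFrame({
    "rooms": [2, 3, 3, 4, 5, 5],
    "acres": [0.1, 0.2, 0.3, 0.3, 0.5, 0.8],
})
y = pd.Series([300000, 420000, 450000, 560000, 700000, 790000], name="price")

# Define and fit the model.
model = DecisionTreeRegressor(random_state=1)
model.fit(X, y)

# Predict for the first five rows; the result is a NumPy array.
predictions = model.predict(X.head())
print(type(predictions).__name__)
print(predictions)

# A second model with the same seed, fitted on the same data,
# should give the same predictions.
repeat = DecisionTreeRegressor(random_state=1).fit(X, y)
same = np.array_equal(predictions, repeat.predict(X.head()))
print("same predictions with same random_state:", same)
```

The array has one prediction per row passed to predict, in the same order as the rows.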