k-NN strengths: good for small datasets, good as a baseline, easy to explain.
```python
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

# X_train, X_test, y_train, y_test come from an earlier train/test split
training_accuracy = []
test_accuracy = []
# try n_neighbors from 1 to 10
neighbors_settings = range(1, 11)

for n_neighbors in neighbors_settings:
    # build the model
    clf = KNeighborsClassifier(n_neighbors=n_neighbors)
    clf.fit(X_train, y_train)
    # record training set accuracy
    training_accuracy.append(clf.score(X_train, y_train))
    # record generalization accuracy
    test_accuracy.append(clf.score(X_test, y_test))

plt.plot(neighbors_settings, training_accuracy, label="training accuracy")
plt.plot(neighbors_settings, test_accuracy, label="test accuracy")
plt.legend()
```
The `score` method of `KNeighborsRegressor` returns the R² score, known as the coefficient of determination. It is at most 1 (a perfect prediction) and can even be negative for a model that predicts worse than the mean of the targets.

Linear model strengths: go-to as a first algorithm to try, good for very large datasets, good for very high-dimensional data.
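A quick sketch of reading the R² score off a fitted `KNeighborsRegressor` (the data here is a synthetic noisy sine curve, standing in for the book's wave dataset):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split

# synthetic 1-D regression data (an assumption; any small regression set works)
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(-3, 3, size=(40, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=40)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
reg = KNeighborsRegressor(n_neighbors=3).fit(X_train, y_train)
# score() on a regressor returns R^2, not accuracy
print("Test set R^2: {:.2f}".format(reg.score(X_test, y_test)))
```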
$$ \hat{y} = w[0] \cdot x[0] + w[1] \cdot x[1] + \dots + w[p] \cdot x[p] + b $$
```python
import mglearn
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

X, y = mglearn.datasets.make_wave(n_samples=60)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
lr = LinearRegression().fit(X_train, y_train)
```
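The learned weights w and offset b from the formula above are stored in the fitted model's `coef_` and `intercept_` attributes. A self-contained sketch (using synthetic data with a known slope of 0.5 in place of the wave dataset, so the numbers here are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# synthetic data: y ≈ 0.5 * x + noise (an assumption for illustration)
rng = np.random.RandomState(42)
X = rng.uniform(-3, 3, size=(60, 1))
y = 0.5 * X.ravel() + rng.normal(scale=0.3, size=60)

lr = LinearRegression().fit(X, y)
# coef_ holds w[0..p] (one entry per feature); intercept_ holds b
print("w:", lr.coef_)
print("b:", lr.intercept_)
```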
<aside> 💡 To prevent overfitting, we penalize the magnitude of the feature coefficients in addition to minimizing the error between predicted and actual observations. These are called 'regularization' techniques.
</aside>
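One common form of this is L2 regularization, as used by scikit-learn's `Ridge`. A minimal sketch on synthetic data (the data and the `alpha` value are assumptions for illustration), comparing coefficient magnitudes with and without the penalty:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# few samples, many features: a setting where plain least squares overfits
rng = np.random.RandomState(0)
X = rng.normal(size=(30, 20))
y = X[:, 0] + rng.normal(scale=0.5, size=30)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)  # alpha controls the penalty strength

# the L2 penalty shrinks the coefficient vector toward zero
print("OLS   ||w||:", np.linalg.norm(ols.coef_))
print("Ridge ||w||:", np.linalg.norm(ridge.coef_))
```

Larger `alpha` means stronger shrinkage; `alpha=0` recovers ordinary least squares.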