Nearest neighbors

Good as a baseline for small datasets; the predictions are easy to explain.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
import matplotlib.pyplot as plt

# setup for the loop: cancer dataset, one accuracy list per curve
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=66)
training_accuracy = []
test_accuracy = []
neighbors_settings = range(1, 11)

for n_neighbors in neighbors_settings:
    # build the model
    clf = KNeighborsClassifier(n_neighbors=n_neighbors)
    clf.fit(X_train, y_train)
    # record training set accuracy
    training_accuracy.append(clf.score(X_train, y_train))
    # record generalization accuracy
    test_accuracy.append(clf.score(X_test, y_test))
plt.plot(neighbors_settings, training_accuracy, label="training accuracy")
plt.plot(neighbors_settings, test_accuracy, label="test accuracy")
plt.legend()

Linear models

Go-to first algorithm to try; works well on very large datasets and on very high-dimensional data.

Linear models for regression

$$ \hat{y} = w[0] \cdot x[0] + w[1] \cdot x[1] + \dots + w[p] \cdot x[p] + b $$
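The prediction is just a dot product of the weights w with the features x, plus the intercept b. A minimal sketch with made-up weights (the values here are illustrative, not learned):

```python
import numpy as np

# hypothetical parameters for p = 3 features
w = np.array([0.5, -1.2, 2.0])
b = 0.3
x = np.array([1.0, 0.0, 2.0])

# y_hat = w[0]*x[0] + w[1]*x[1] + w[2]*x[2] + b, i.e. a dot product plus intercept
y_hat = np.dot(w, x) + b
print(y_hat)  # 0.5*1.0 + (-1.2)*0.0 + 2.0*2.0 + 0.3 = 4.8
```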

import mglearn
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = mglearn.datasets.make_wave(n_samples=60)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
lr = LinearRegression().fit(X_train, y_train)
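After fitting, the learned slope(s) are stored in `lr.coef_` and the offset b in `lr.intercept_`. A self-contained sketch (using synthetic 1-D data in place of mglearn's wave dataset, so the numbers are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# synthetic 1-D regression data standing in for make_wave
rng = np.random.RandomState(42)
X = rng.uniform(-3, 3, size=(60, 1))
y = 0.5 * X[:, 0] + rng.normal(scale=0.3, size=60)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
lr = LinearRegression().fit(X_train, y_train)

print("coef_:", lr.coef_)            # learned slope per feature, shape (n_features,)
print("intercept_:", lr.intercept_)  # learned offset b
print("test R^2:", lr.score(X_test, y_test))
```

`score` returns R² for regressors, so 1.0 would be a perfect fit on the test set.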

<aside> 💡 To prevent overfitting, ‘regularization’ techniques penalize the magnitude of the feature coefficients in addition to minimizing the error between predicted and actual observations.

</aside>
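In scikit-learn, ridge regression (L2 penalty) and lasso (L1 penalty) implement this idea. A sketch on made-up data with more features than the signal needs, where plain least squares tends to overfit (the `alpha` values are arbitrary starting points, not tuned):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import train_test_split

# synthetic data: 30 features, but only the first two carry signal
rng = np.random.RandomState(0)
X = rng.normal(size=(60, 30))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=60)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Ridge adds alpha * ||w||^2 to the loss; Lasso adds alpha * ||w||_1,
# which can drive some coefficients exactly to zero
for model in (LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=0.1)):
    model.fit(X_train, y_train)
    print(type(model).__name__, "test R^2:", model.score(X_test, y_test))
```

Larger `alpha` means stronger shrinkage of the coefficients; `alpha=0` recovers ordinary least squares.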