How to make a KNN model in Python

RandomResearchAI
2 min read · Jul 21, 2023

Building a K-Nearest Neighbors (KNN) classifier and evaluating it on held-out data involves the following steps:

1. Import necessary libraries.
2. Load and preprocess the dataset.
3. Split the dataset into training and testing sets.
4. Create and train the KNN model using the training data.
5. Evaluate the model’s performance using the testing data.

Let’s walk through each step in Python:

# Step 1: Import necessary libraries

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Step 2: Load and preprocess the dataset

# Load your dataset here: assign your feature matrix to 'X' and your target variable to 'y'.

# Make sure 'X' contains only numeric features and 'y' contains the corresponding labels.

# Example: X, y = load_your_dataset()

# Step 3: Split the dataset into training and testing sets

# Use the train_test_split function from scikit-learn to split the dataset.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 4: Create and train the KNN model using the training data

# Instantiate the KNeighborsClassifier and specify the number of neighbors (k).

knn_model = KNeighborsClassifier(n_neighbors=5) # You can change the value of 'k' as needed.

knn_model.fit(X_train, y_train)

# Step 5: Evaluate the model's performance using the testing data

# Make predictions on the test data.

y_pred = knn_model.predict(X_test)

# Calculate the accuracy of the model.

accuracy = accuracy_score(y_test, y_pred)

print("Accuracy: {:.2f}%".format(accuracy * 100))

# Display the classification report and confusion matrix.

print("Classification Report:")

print(classification_report(y_test, y_pred))

print("Confusion Matrix:")

print(confusion_matrix(y_test, y_pred))

In this example, we used scikit-learn’s `train_test_split` function to split the dataset into training and testing sets. The `test_size` parameter specifies the fraction of the data held out for testing (`0.2` reserves 20%), while the `random_state` parameter fixes the seed used for shuffling so that the split is reproducible. We then created a KNN model with `KNeighborsClassifier`, setting `n_neighbors` to the number of neighbors that vote on each prediction.
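Because the walkthrough leaves `X` and `y` as placeholders, here is a minimal runnable sketch of the same split-and-fit steps. It assumes scikit-learn's bundled Iris dataset as a stand-in for your own data; that dataset choice is not part of the original example.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: Iris (150 rows, 4 numeric features) stands in for "your dataset".
X, y = load_iris(return_X_y=True)

# test_size=0.2 holds out 20% of the rows; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)

# Splitting again with the same random_state reproduces the same partition.
X_train_again, _, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)
same_split = (X_train == X_train_again).all()
print("Same split on second call:", same_split)

# Fit KNN with k=5 and score it on the held-out rows.
knn_model = KNeighborsClassifier(n_neighbors=5)
knn_model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, knn_model.predict(X_test))
print("Accuracy: {:.2f}%".format(accuracy * 100))
```

Changing `random_state` to another integer gives a different but equally reproducible split.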

After training the model with the training data, we used it to make predictions on the testing data (`X_test`). We then calculated the accuracy of the model by comparing the predicted labels (`y_pred`) with the true labels (`y_test`). Additionally, we used the `classification_report` and `confusion_matrix` functions to further evaluate the model’s performance.
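To make the evaluation step concrete, the sketch below computes accuracy by hand on a tiny set of made-up labels and checks it against `accuracy_score`. The label arrays are illustrative assumptions, not output from the model above.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted labels for six samples across three classes.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_hat = np.array([0, 1, 1, 1, 2, 0])

# Accuracy is the share of positions where prediction and truth agree (4 of 6 here).
manual_accuracy = np.mean(y_true == y_hat)
sk_accuracy = accuracy_score(y_true, y_hat)
print(manual_accuracy, sk_accuracy)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_hat)
print(cm)

# The diagonal counts correct predictions, so trace / total equals accuracy.
accuracy_from_cm = cm.trace() / cm.sum()
print(accuracy_from_cm)
```

The off-diagonal entries show which classes get confused with each other, which a single accuracy number cannot tell you.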

Remember to replace `X` and `y` with your actual feature matrix and target variable, respectively. Additionally, feel free to adjust the value of `n_neighbors` and compare the results to find a value that works well for your specific dataset.
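One common way to compare values of `n_neighbors` is cross-validation on the training set, sketched below. The candidate list, the 5-fold setting, and the Iris stand-in dataset are all assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: Iris stands in for your own X and y.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hypothetical candidate values for k; odd numbers reduce tied votes.
candidate_ks = [1, 3, 5, 7, 9, 11]

# Score each k with 5-fold cross-validation on the training data only,
# so the test set stays untouched until the final check.
mean_scores = {}
for k in candidate_ks:
    scores = cross_val_score(
        KNeighborsClassifier(n_neighbors=k), X_train, y_train, cv=5
    )
    mean_scores[k] = scores.mean()

best_k = max(mean_scores, key=mean_scores.get)
print("Best k by cross-validation:", best_k)

# Refit on the full training set with the chosen k, then evaluate once.
final_model = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
test_accuracy = final_model.score(X_test, y_test)
print("Test accuracy: {:.2f}%".format(test_accuracy * 100))
```

Choosing `k` on the training folds and not on the test set keeps the final accuracy an honest estimate of performance on unseen data.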
