Scikit-learn Code-Mixed Beginner Guide
What is Scikit-learn?
Scikit-learn is a powerful Python library for building supervised and unsupervised machine learning models.
It makes data preprocessing, model building, evaluation, and related tasks straightforward.
1. Loading Data
from sklearn import datasets
iris = datasets.load_iris()
X = iris.data
y = iris.target
Here X holds the features (e.g. petal length and width), and y holds the labels (flower type).
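To make the roles of X and y concrete, the loaded dataset can be inspected directly. This is a minimal sketch that only uses attributes of the object returned by load_iris.

```python
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data
y = iris.target

print(X.shape)             # (150, 4): 150 flowers, 4 measurements each
print(y.shape)             # (150,): one integer label per flower
print(iris.feature_names)  # names of the 4 feature columns in X
print(iris.target_names)   # class names that the integers in y stand for
```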
2. Train-Test Split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
This step divides the data into training and testing parts. The model learns only from the training data, and its
performance is measured on the test data.
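The sketch below checks the sizes produced by the split. It also passes stratify=y, which is an addition not used in the guide's own call: it asks train_test_split to keep the class proportions the same in both parts, which can help on small datasets.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)

# stratify=y is an optional extra (an assumption here, not part of the guide's call)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
print(np.bincount(y_test))          # [10 10 10]: each class equally represented
```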
3. Preprocessing
Standardization:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
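As a rough check of what standardization does, the sketch below confirms that each training column ends up with mean close to 0 and standard deviation close to 1. The test set is transformed with the statistics learned from the training set, so its values are only approximately centered. The names X_train_scaled and X_test_scaled are chosen here for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training data only
X_test_scaled = scaler.transform(X_test)        # reuse those statistics on the test data

print(X_train_scaled.mean(axis=0))  # approximately 0 for every column
print(X_train_scaled.std(axis=0))   # approximately 1 for every column
print(X_test_scaled.mean(axis=0))   # near 0, but not exactly
```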
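Missing Value Handling:
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy='mean')
X_train = imputer.fit_transform(X_train)
# Fill the test set with the means learned from the training set
X_test = imputer.transform(X_test)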
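The Iris data has no missing values, so the effect of the imputer is easier to see on a tiny array. The array X_toy below is hypothetical, made up purely for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical toy data with one missing value in each column
X_toy = np.array([[1.0, 10.0],
                  [np.nan, 20.0],
                  [3.0, np.nan]])

imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X_toy)

print(imputer.statistics_)  # [ 2. 15.]: column means of the non-missing values
print(X_filled)             # each NaN replaced by the mean of its column
```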
Normalization:
from sklearn.preprocessing import Normalizer
normalizer = Normalizer()
X_train = normalizer.fit_transform(X_train)
# Apply the same row-wise scaling to the test set
X_test = normalizer.transform(X_test)
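Unlike StandardScaler, which works column by column, Normalizer rescales each row (each sample) to unit length, using the L2 norm by default. A small sketch on a hypothetical two-row array:

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Hypothetical toy rows, chosen so the arithmetic is easy to follow
X_toy = np.array([[3.0, 4.0],
                  [1.0, 0.0]])

X_norm = Normalizer().fit_transform(X_toy)  # default norm="l2"

print(X_norm)                          # [[0.6 0.8] [1.  0. ]]
print(np.linalg.norm(X_norm, axis=1))  # every row now has length 1
```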
Binarization:
from sklearn.preprocessing import Binarizer
binarizer = Binarizer(threshold=0.0)
X_bin = binarizer.fit_transform(X_train)
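Binarizer maps every value greater than the threshold to 1 and everything else to 0. The sketch below uses a hypothetical one-row array; note that on standardized data a threshold of 0.0 amounts to asking whether a value is above the training mean.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Hypothetical toy row: one negative value, one exactly at the threshold, one positive
X_toy = np.array([[-1.0, 0.0, 2.5]])

X_bin_toy = Binarizer(threshold=0.0).fit_transform(X_toy)
print(X_bin_toy)  # [[0. 0. 1.]]: only values strictly above 0.0 become 1
```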
Label Encoding:
from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
y = encoder.fit_transform(y)
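The Iris labels are already integers, so label encoding matters mostly when the labels arrive as strings. A minimal sketch with a hypothetical list of string labels:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical string labels, for illustration only
labels = ["setosa", "virginica", "versicolor", "setosa"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

print(encoder.classes_)                    # ['setosa' 'versicolor' 'virginica'] (sorted)
print(encoded)                             # [0 2 1 0]
print(encoder.inverse_transform(encoded))  # back to the original strings
```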
4. Model Building
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)
Here the model is trained using the KNN (k-nearest neighbors) algorithm.
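To see how KNN decides, the sketch below trains on a hypothetical one-dimensional dataset with two well-separated groups. With n_neighbors=3, a new point takes the majority class of its three closest training points.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: class 0 near the origin, class 1 around 10 to 12
X_toy = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_toy, y_toy)

pred = model.predict([[1.5], [10.5]])
print(pred)  # [0 1]

# kneighbors returns the distances and training-set indices of the nearest points
dist, idx = model.kneighbors([[1.5]])
print(idx)   # indices of the three training points closest to 1.5
```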
5. Prediction
y_pred = model.predict(X_test)
For each flower in the test data, the model predicts which class it belongs to.
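The predictions come back as integer class indices. The following sketch, which repeats the earlier steps so it runs on its own and skips preprocessing for brevity, maps them to flower names and also shows predict_proba, which for KNN reports the share of neighbors voting for each class.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(y_pred[:5])                     # integer class indices
print(iris.target_names[y_pred[:5]])  # the same predictions as flower names

proba = model.predict_proba(X_test[:5])
print(proba)                          # one row per sample, one column per class
```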
6. Evaluation
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
print("Accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
This step shows how well the model predicts.
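Accuracy and the confusion matrix are closely related: the diagonal of the matrix counts correct predictions, so accuracy equals the diagonal sum divided by the total. A small sketch with hypothetical labels, chosen so the numbers can be checked by hand:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted labels (one mistake: a class-1 sample predicted as 2)
y_true = [0, 0, 1, 1, 2, 2]
y_hat = [0, 0, 1, 2, 2, 2]

cm = confusion_matrix(y_true, y_hat)
acc = accuracy_score(y_true, y_hat)

print(cm)                      # rows are true classes, columns are predicted classes
print(acc)                     # 5 correct out of 6
print(cm.trace() / cm.sum())   # same value, computed from the matrix
```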
7. Grid Search
from sklearn.model_selection import GridSearchCV
params = {"n_neighbors": [1, 2, 3, 4]}
grid = GridSearchCV(model, param_grid=params, cv=4)
grid.fit(X_train, y_train)
print(grid.best_score_)
print(grid.best_params_)
Grid search tries every listed parameter value with cross-validation and reports the best-scoring one.
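After fitting, the grid object can be used like a model. With the default refit=True, GridSearchCV retrains the best configuration on the full training set, so the winner can be scored on the held-out test data. The sketch repeats the setup so it runs on its own and skips preprocessing for brevity.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

params = {"n_neighbors": [1, 2, 3, 4]}
grid = GridSearchCV(KNeighborsClassifier(), param_grid=params, cv=4)
grid.fit(X_train, y_train)

print(grid.best_params_)                    # the winning value of n_neighbors
print(grid.cv_results_["mean_test_score"])  # mean cross-validation score per candidate
test_score = grid.score(X_test, y_test)     # accuracy of the refitted best model
print(test_score)
```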
8. PCA
from sklearn.decomposition import PCA
pca = PCA(n_components=0.95)
X_pca = pca.fit_transform(X_train)
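PCA is a dimensionality reduction technique. With n_components=0.95 it keeps just enough components to explain
95% of the variance in the data.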
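When n_components is a fraction, the number of components PCA actually kept can be read back after fitting. The sketch below standardizes the training data first, as in the preprocessing step, and inspects the result; the name X_train_scaled is chosen here for illustration.

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train_scaled = StandardScaler().fit_transform(X_train)

pca = PCA(n_components=0.95)
X_pca = pca.fit_transform(X_train_scaled)

print(pca.n_components_)                    # how many components were kept
print(pca.explained_variance_ratio_)        # variance share of each kept component
print(pca.explained_variance_ratio_.sum())  # at least 0.95 by construction
print(X_pca.shape)                          # same rows, fewer columns
```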
Practice Task
1. Load the Titanic dataset and preprocess it (handle missing values, encode categorical data).
2. Build a model (e.g. Decision Tree, Logistic Regression).
3. Check the accuracy.
4. Use grid search to find the best parameters.
5. Print the confusion matrix and classification report.