Data Preprocessing, Cross Validation
and NLP (Notes)
1. Data Preprocessing Techniques
Definition:
Data preprocessing is the first and an important step in machine learning, in which raw
data is cleaned, organized, and converted into a usable format.
Importance:
ML models do not learn well from dirty, incomplete, or inconsistent data.
Clean data improves accuracy, overall performance, and training speed.
It lowers the risk of bias and overfitting.
Main Techniques:
Data Cleaning: Fixing or removing incorrect, missing, or duplicate data.
(Example: filling missing values with the mean or median)
Data Integration: Combining data from multiple sources into one consistent
dataset. (Example: merging on shared keys)
Data Reduction: Reducing the size of the data without losing important information.
(Example: PCA for dimensionality reduction)
Data Transformation: Converting data into a format suitable for mining or analysis.
(Example: normalization)
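The cleaning and integration steps above can be sketched in plain Python. This is a minimal illustration, not a recommended pipeline: the records, the column names (id, age, total) and the helper functions are all hypothetical, and a real project would more likely use a dataframe library.

```python
from statistics import mean, median

# Hypothetical raw records: one duplicate row and one missing age.
customers = [
    {"id": 1, "age": 30},
    {"id": 2, "age": None},
    {"id": 2, "age": None},  # exact duplicate of the previous row
    {"id": 3, "age": 50},
]
orders = [
    {"id": 1, "total": 120.0},
    {"id": 3, "total": 80.0},
]


def drop_duplicates(rows):
    """Data cleaning: keep the first occurrence of each identical row."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def fill_missing(rows, column, strategy=mean):
    """Data cleaning: replace None in `column` with the mean (or median)."""
    known = [r[column] for r in rows if r[column] is not None]
    fill = strategy(known)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def merge_on_key(left, right, key):
    """Data integration: inner join, keeping keys present in both sources."""
    lookup = {r[key]: r for r in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]


clean = fill_missing(drop_duplicates(customers), "age")
clean_median = fill_missing(drop_duplicates(customers), "age", strategy=median)
merged = merge_on_key(clean, orders, "id")
```

Here the missing age is filled with the mean of the known ages, and only customers 1 and 3 survive the merge because customer 2 has no order. The median is usually the safer fill value when the column contains outliers.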
Transformation techniques:
Min-Max Scaler
Standard Scaler
Max Abs Scaler
Robust Scaler
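To make the difference between these four scalers concrete, here is a small sketch of the formula behind each one, applied to a single numeric column. The function names and the sample values are made up for illustration. The sketch assumes the column is not constant, since a constant column would cause a division by zero. In practice these notes most likely refer to the scikit-learn classes MinMaxScaler, StandardScaler, MaxAbsScaler and RobustScaler, which apply the same ideas column by column.

```python
from statistics import mean, median, pstdev, quantiles


def min_max_scale(xs):
    """Rescale to the range [0, 1] using the column minimum and maximum."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def standard_scale(xs):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def max_abs_scale(xs):
    """Divide by the largest absolute value, giving values in [-1, 1]."""
    biggest = max(abs(x) for x in xs)
    return [x / biggest for x in xs]


def robust_scale(xs):
    """Subtract the median and divide by the interquartile range (Q3 - Q1)."""
    q1, _, q3 = quantiles(xs, n=4, method="inclusive")
    med = median(xs)
    return [(x - med) / (q3 - q1) for x in xs]


# Hypothetical column with one large outlier.
values = [1.0, 2.0, 3.0, 4.0, 100.0]

min_max = min_max_scale(values)
standard = standard_scale(values)
max_abs = max_abs_scale(values)
robust = robust_scale(values)
```

With the outlier 100.0 in the column, min-max and max-abs scaling squeeze the first four values close to zero, while robust scaling keeps them spread out because the median and interquartile range are barely affected by the outlier.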
2. Cross Validation
Definition:
Cross-validation is a statistical technique that estimates how well an ML model performs
on unseen data.
Key Benefits:
Hyperparameter Tuning: Helps in choosing the best parameters.
Model Evaluation: Gives a realistic performance metric.
Model Selection: Makes it possible to pick the best model.
Overfitting Detection: Shows whether the model performs well only on the training
data.
K-Fold Cross Validation:
The data is split into K parts (folds).
In each round, K-1 folds are used for training and 1 fold for testing.
The process repeats K times, so each fold is the test set exactly once, and the scores are averaged.
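The K-fold procedure described above can be sketched without any library. The helper names, the toy "predict the training mean" model and the data below are assumptions made for illustration. The sketch also splits the data in its original order, whereas in practice the samples are usually shuffled first.

```python
from statistics import mean


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_val_score(xs, ys, k, fit, score):
    """Train on K-1 folds, score on the held-out fold, and average the scores."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(
            score(model, [xs[i] for i in test_idx], [ys[i] for i in test_idx])
        )
    return mean(scores), scores


# Toy model (hypothetical): always predict the mean of the training targets.
def fit_mean(train_x, train_y):
    return mean(train_y)


def mae(model, test_x, test_y):
    return mean(abs(y - model) for y in test_y)


xs = list(range(10))
ys = [2.0 * x for x in xs]
avg_mae, fold_maes = cross_val_score(xs, ys, k=5, fit=fit_mean, score=mae)
```

The returned pair holds the averaged score and the per-fold scores. A large spread between folds is a hint that the estimate is unstable or that the data order matters.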
K-Fold in Regression:
The model predicts continuous values.
Metrics: MAE, R²
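As a rough illustration of the two regression metrics, the sketch below computes MAE (the mean absolute difference between true and predicted values) and R² (1 minus the ratio of the residual sum of squares to the total sum of squares). The function names and the sample values are hypothetical. In K-fold cross validation these would be computed on each test fold and then averaged.

```python
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def r2_score(y_true, y_pred):
    """1 - SS_res / SS_tot; 1.0 is a perfect fit, 0.0 matches predicting the mean."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical targets and predictions for one test fold.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

fold_mae = mean_absolute_error(y_true, y_pred)  # 0.5
fold_r2 = r2_score(y_true, y_pred)              # about 0.9
```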
K-Fold in Classification:
The model predicts categories or labels.
Metrics: Accuracy, Precision, Recall, F1-Score, etc.
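The classification metrics listed above can all be derived from four counts: true positives, false positives, false negatives and true negatives. The following sketch handles the binary case only. The function names and the sample labels are assumptions for illustration, and returning 0.0 when a denominator is zero is one common convention, not the only one.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels and predictions for one test fold.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

metrics = classification_metrics(y_true, y_pred)
```

For this sample there are 3 true positives, 1 false positive, 1 false negative and 3 true negatives, so all four metrics come out to 0.75.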
3. What Is NLP (Natural Language Processing)?
Definition:
NLP is a subfield of AI that helps computers understand, interpret, and generate
human language.
Components of NLP:
NLU (Natural Language Understanding): Understanding language, such as detecting
intent and parsing grammar.
NLG (Natural Language Generation): Producing meaningful text, such as
summarization, text completion, and chatbot replies.
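The split between NLU and NLG can be shown with a deliberately tiny chatbot sketch: a keyword-based intent detector stands in for the understanding step, and fixed reply templates stand in for the generation step. The intents, keywords and templates are invented for this example. Real NLU and NLG systems typically rely on statistical or neural models rather than keyword rules.

```python
import re

# Hypothetical intents and keywords, chosen only for this illustration.
INTENT_KEYWORDS = {
    "greeting": {"hello", "hi", "salam"},
    "weather": {"weather", "rain", "temperature"},
}

# Hypothetical reply templates, one per intent plus a fallback.
REPLY_TEMPLATES = {
    "greeting": "Hello! How can I help you?",
    "weather": "I cannot check live weather, but I can explain forecasts.",
    "unknown": "Sorry, I did not understand that.",
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def detect_intent(text):
    """NLU step: map the user's text to the first intent with a matching keyword."""
    tokens = set(tokenize(text))
    for intent, keywords in INTENT_KEYWORDS.items():
        if tokens & keywords:
            return intent
    return "unknown"


def generate_reply(text):
    """NLG step: produce a reply from the template for the detected intent."""
    return REPLY_TEMPLATES[detect_intent(text)]


reply = generate_reply("Will it rain today?")
```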
NLP Tools:
NLTK: For basic NLP tasks (educational purposes).
spaCy: A fast, industrial-strength NLP toolkit.