NLP CrossValidation DataPreprocessing UrduEnglish

The document discusses data preprocessing techniques essential for machine learning, emphasizing the importance of cleaning, integrating, reducing, and transforming data to improve model accuracy and performance. It also covers cross-validation methods, particularly K-Fold, which help in model evaluation, hyperparameter tuning, and detecting overfitting. Additionally, it introduces Natural Language Processing (NLP) as a subfield of AI focused on enabling computers to understand and generate human language, detailing its components and tools.

Uploaded by sharibahmed497

Data Preprocessing, Cross Validation, and NLP (Urdu-English Mix Notes)


1. Data Preprocessing Techniques
Definition:
Data preprocessing is the first and one of the most important steps in machine
learning. In it, raw data is cleaned, organized, and converted into a usable format.

Importance:
• ML models cannot learn properly from dirty, incomplete, or inconsistent data.
• Accuracy, performance, and training speed improve.
• The risk of bias and overfitting is reduced.

1.1 Main Techniques:

• Data Cleaning: Fix or remove incorrect, missing, or duplicate data.
(Example: fill missing values with the mean or median.)
• Data Integration: Combine data from multiple sources into one consistent dataset.
(Example: merge tables on shared keys.)
• Data Reduction: Reduce the size of the data without losing important information.
(Example: PCA for dimensionality reduction.)
• Data Transformation: Convert data into a format suitable for mining and analysis.
(Example: normalization.)
• Transformation techniques:
    • Min-Max Scaler: rescales each feature to a fixed range, typically [0, 1].
    • Standard Scaler: centers each feature at mean 0 with unit variance.
    • Max Abs Scaler: divides each feature by its largest absolute value.
    • Robust Scaler: centers on the median and scales by the interquartile range.
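
The cleaning and transformation steps above can be sketched in standard-library Python. This is a minimal illustration rather than a reference implementation: the function names (`fill_missing`, `min_max_scale`, and so on) and the sample column are hypothetical, and every scaler assumes a non-constant numeric column so that its denominator is non-zero. The formulas follow the usual definitions of these scalers; scikit-learn offers classes named MinMaxScaler, StandardScaler, MaxAbsScaler, and RobustScaler that apply them per feature.

```python
from statistics import mean, median, pstdev, quantiles


def fill_missing(values, strategy="median"):
    """Data cleaning: replace None with the mean or median of observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed) if strategy == "median" else mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(xs):
    """Map values linearly onto [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def standard_scale(xs):
    """Subtract the mean and divide by the population standard deviation."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def max_abs_scale(xs):
    """Divide by the largest absolute value, giving results in [-1, 1]."""
    biggest = max(abs(x) for x in xs)
    return [x / biggest for x in xs]


def robust_scale(xs):
    """Subtract the median and divide by the interquartile range (Q3 - Q1)."""
    q1, _, q3 = quantiles(xs, n=4, method="inclusive")
    centre = median(xs)
    return [(x - centre) / (q3 - q1) for x in xs]


# Hypothetical column with one missing value and one outlier (100.0).
raw = [2.0, None, 4.0, 6.0, 100.0]
clean = fill_missing(raw)  # the median of 2, 4, 6, 100 is 5.0
print(clean)
print(min_max_scale(clean))
print(standard_scale(clean))
print(max_abs_scale(clean))
print(robust_scale(clean))
```

On this sample the outlier squeezes the min-max and max-abs results for the other values close to 0, while the robust scaler keeps them spread around 0, because the median and interquartile range are less sensitive to extreme values.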

2. Cross Validation


Definition:
Cross-validation is a statistical technique that tests an ML model's performance on
unseen data by repeatedly holding out part of the dataset for evaluation.

2.1 Key Benefits:

• Hyperparameter Tuning: Helps in choosing the best parameters.
• Model Evaluation: Gives a realistic performance metric.
• Model Selection: Lets you pick the best model among candidates.
• Overfitting Detection: Lets you check whether the model performs well only on the
training data.

2.2 K-Fold Cross Validation:

• The data is split into K parts (folds).
• In each round, K-1 folds are used for training and 1 fold for testing.
• The process is repeated K times, and the scores are averaged.
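
The three steps above can be sketched as follows. This is a minimal standard-library illustration under stated assumptions: the helper names (`k_fold_indices`, `cross_validate`, `fit_mean`, `mae_score`) and the toy data are hypothetical, the folds are taken in order without shuffling, and the "model" is only a baseline that predicts the mean of the training targets.

```python
from statistics import mean


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs so every sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        yield indices[:start] + indices[stop:], indices[start:stop]
        start = stop


def cross_validate(xs, ys, k, fit, score):
    """Train on K-1 folds, score on the held-out fold, and average over K rounds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        test_x = [xs[i] for i in test_idx]
        test_y = [ys[i] for i in test_idx]
        scores.append(score(model, test_x, test_y))
    return mean(scores), scores


# Hypothetical baseline "model": always predict the mean of the training targets.
def fit_mean(train_x, train_y):
    return mean(train_y)


def mae_score(model, test_x, test_y):
    """Mean absolute error of the constant prediction on the held-out fold."""
    return mean([abs(y - model) for y in test_y])


xs = list(range(10))
ys = [2.0 * x + 1.0 for x in xs]
avg_score, fold_scores = cross_validate(xs, ys, k=5, fit=fit_mean, score=mae_score)
print(fold_scores)
print(avg_score)
```

In practice the rows are often shuffled before splitting; that step is left out here to keep the fold boundaries easy to follow.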

2.3 K-Fold in Regression:

• The model predicts continuous values.
• Metrics: MAE (mean absolute error), R²
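
As a rough sketch of how the two regression metrics are computed on one held-out fold, assuming the standard definitions (MAE as the average absolute error, R² as 1 minus the ratio of residual to total sum of squares). The function names and the sample values are made up for illustration.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions (lower is better)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """1 - SS_res / SS_tot: the share of target variance explained by the predictions."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical targets and predictions for a single fold.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 10.0]
print(mean_absolute_error(y_true, y_pred))  # 0.5
print(r2_score(y_true, y_pred))             # approximately 0.925
```

With K-Fold, these values would be computed once per fold and then averaged.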

2.4 K-Fold in Classification:

• The model predicts categories or labels.
• Metrics: Accuracy, Precision, Recall, F1-Score, etc.
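
A minimal sketch of the classification metrics for a binary problem, assuming the usual definitions based on true positives (TP), false positives (FP), and false negatives (FN). The function name `binary_classification_metrics` and the sample labels are hypothetical.

```python
def binary_classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Fall back to 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels for one fold: TP = 3, FP = 2, FN = 1, TN = 2.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]
metrics = binary_classification_metrics(y_true, y_pred)
print(metrics)
```

Here accuracy is 5/8, precision is 3/5, and recall is 3/4, which shows why a single metric rarely tells the whole story.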

3. What Is NLP (Natural Language Processing)?


Definition:
NLP is a subfield of AI that helps computers understand, interpret, and generate
human language.

3.1 Components of NLP:

• NLU (Natural Language Understanding): Understanding language, for example
detecting intent or parsing grammar.
• NLG (Natural Language Generation): Producing meaningful text, for example
summarization, text completion, or chatbot replies.
• NLP Tools:
    • NLTK: For basic NLP tasks (educational purposes).
    • spaCy: A fast, industrial-level NLP toolkit.
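
To make the NLU side concrete, here is a toy sketch of tokenization and keyword-based intent detection using only Python's `re` module. It is an illustration under loud assumptions: the intents, keywords, and function names are invented for this example, and this is not how NLTK or spaCy work internally. Those toolkits provide far more capable tokenizers and parsers.

```python
import re

# Hypothetical intents and trigger words, invented for this illustration.
INTENT_KEYWORDS = {
    "book_flight": {"book", "flight", "ticket"},
    "get_weather": {"weather", "forecast", "rain"},
}


def tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def detect_intent(text):
    """Return the intent whose keywords overlap most with the tokens, else 'unknown'."""
    tokens = set(tokenize(text))
    best_intent, best_overlap = "unknown", 0
    for intent, keywords in INTENT_KEYWORDS.items():
        overlap = len(tokens & keywords)
        if overlap > best_overlap:
            best_intent, best_overlap = intent, overlap
    return best_intent


print(tokenize("Please book a flight to Lahore!"))
print(detect_intent("Please book a flight to Lahore!"))
print(detect_intent("Will it rain tomorrow?"))
print(detect_intent("Tell me a joke."))
```

A real NLU system would typically replace the keyword sets with a trained classifier, but the pipeline shape (tokenize, then map tokens to an intent) stays similar.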
