Machine Learning - Lab Exercise Topics

Uploaded by

healthvisionxz

Artificial Intelligence & Machine Learning Comprehensive Assessment
(ML-ADVSA-400)
Duration: 17 hours

Part 1 (555 pts): Written Exam (Section A, B and C) – 7 hours

Part 2 (745 pts): Lab Exam – 8 hours

Part 3 (200 pts): Challenge and Defend – 2 hours

Part 2: Lab Exam (745 points)


Objective
Build, evaluate, and interpret a binary classification model for cancer detection using a neural
network-based approach. The dataset contains 30 numerical features per patient record.

Task 1: Data Preparation and Exploration (1 hour)

**Goals:** Understand the data distribution, ensure data quality, and prepare inputs for the ML
model.

- Load the Breast Cancer Wisconsin dataset.
- Perform EDA: shape, missing values, outliers, class balance.
- Visualize key features, feature correlations, and the target distribution.
- Normalize features using `StandardScaler`.

**Expanded Response:**

```python
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load dataset as pandas objects (X: 30 numeric features, y: 0/1 target)
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
df = X.copy()
df['target'] = y

# Shape and missing values
print("Shape:", df.shape)
print("Missing values:\n", df.isnull().sum())

# Class balance and class distribution plot
print("Class counts:\n", df['target'].value_counts())
sns.countplot(x='target', data=df)
plt.title('Class Distribution (0 = Malignant, 1 = Benign)')
plt.show()

# Correlation heatmap
plt.figure(figsize=(12, 10))
sns.heatmap(df.corr(), cmap='coolwarm')
plt.title('Feature Correlation Heatmap')
plt.show()

# Normalize features (zero mean, unit variance per column).
# Note: fitting on the full dataset lets validation statistics leak into
# training; fitting on the training split only is the stricter option.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
```
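
The EDA list above also names outliers, which the snippet does not check. One possible sketch uses the 1.5 × IQR rule per feature. The helper name `iqr_outlier_counts`, the multiplier `k=1.5`, and the toy values are illustrative assumptions, not part of the original exercise.

```python
import pandas as pd

def iqr_outlier_counts(frame: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] for each numeric column."""
    numeric = frame.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # Comparison aligns the per-column bounds with the frame's columns
    outside = (numeric < (q1 - k * iqr)) | (numeric > (q3 + k * iqr))
    return outside.sum().sort_values(ascending=False)

# Toy frame (hypothetical values) standing in for the feature matrix X
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 100.0],   # 100.0 is far outside the IQR fence
    "b": [10.0, 11.0, 12.0, 13.0, 14.0],
})
counts = iqr_outlier_counts(toy)
print(counts)
```

With the real data, `iqr_outlier_counts(X)` would rank the 30 features by how many flagged values each contains. Flagged values in medical measurements are not necessarily errors, so this is a prompt for inspection rather than a removal rule.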

Task 2: Model Architecture and Training (2 hours)

**Goals:** Build and train a simple neural network with appropriate configurations.

- Create a 3-layer MLP (64-32-1 neurons) with dropout and batch normalization.
- Use `ReLU` for hidden layers and `Sigmoid` for output.
- Use `BCELoss` for binary classification.
- Implement a training loop with early stopping and a validation split.

**Expanded Response (PyTorch):**

```python
import torch
import torch.nn as nn
from sklearn.model_selection import train_test_split

# Hold out 20% of the scaled data for validation
X_train, X_val, y_train, y_val = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

tensor_x = torch.tensor(X_train, dtype=torch.float32)
tensor_y = torch.tensor(y_train.values.reshape(-1, 1), dtype=torch.float32)
train_data = torch.utils.data.TensorDataset(tensor_x, tensor_y)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=32, shuffle=True)


class CancerNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(30, 64),
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(64, 32),
            nn.BatchNorm1d(32),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(32, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.model(x)


model = CancerNet()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.BCELoss()

# Training loop (simplified: no validation pass or early stopping)
for epoch in range(100):
    model.train()
    for xb, yb in train_loader:
        optimizer.zero_grad()
        pred = model(xb)
        loss = criterion(pred, yb)
        loss.backward()
        optimizer.step()
```
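
The task asks for early stopping, which the simplified loop leaves out. A minimal, framework-independent sketch of the bookkeeping might look like the following. The class name `EarlyStopping`, its `patience` and `min_delta` parameters, and the simulated loss values are all assumptions made for illustration.

```python
class EarlyStopping:
    """Signal a stop when the monitored loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 10, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, val_loss: float, epoch: int) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # Improvement: remember it and reset the patience counter
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Simulated validation losses (made-up numbers, not results from the model)
val_losses = [0.60, 0.45, 0.40, 0.41, 0.42, 0.43, 0.39]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, val_loss in enumerate(val_losses):
    if stopper.step(val_loss, epoch):
        stopped_at = epoch
        break

print("stopped at epoch", stopped_at, "best epoch", stopper.best_epoch)
```

In the training loop, the per-epoch validation loss would come from `criterion` applied to the model's output on tensors built from `X_val` and `y_val`, computed under `model.eval()` and `torch.no_grad()`. Saving `model.state_dict()` whenever the best epoch updates would allow the best weights to be restored after stopping.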

Task 3: Model Evaluation (1 hour)

**Goals:** Assess model performance using classification metrics and diagnostic plots.

- Predict on validation data and compute metrics.
- Plot the confusion matrix and ROC curve.
- Discuss the model's balance between sensitivity (recall) and specificity.

**Expanded Response:**

```python
import matplotlib.pyplot as plt
import torch
from sklearn.metrics import (
    ConfusionMatrixDisplay,
    classification_report,
    confusion_matrix,
    roc_auc_score,
)

# Predict on the validation set in eval mode (dropout off, BatchNorm uses
# running statistics) without tracking gradients
model.eval()
x_val_tensor = torch.tensor(X_val, dtype=torch.float32)
with torch.no_grad():
    y_prob = model(x_val_tensor).numpy().ravel()  # P(class 1 = benign)
y_pred = (y_prob > 0.5).astype(int)

# Metrics (in this dataset 0 = malignant, 1 = benign)
print(classification_report(y_val, y_pred, target_names=['malignant', 'benign']))
print("AUC Score:", roc_auc_score(y_val, y_prob))

# Confusion Matrix
cm = confusion_matrix(y_val, y_pred)
ConfusionMatrixDisplay(cm, display_labels=['malignant', 'benign']).plot()
plt.title("Confusion Matrix")
plt.show()
```
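
The task list also asks for a ROC curve and a sensitivity/specificity discussion. Because this dataset encodes malignant as 0 and benign as 1, sensitivity for cancer detection is the recall of class 0, not class 1. The sketch below makes that explicit; the helper names `sensitivity_specificity` and `plot_roc` and the toy labels and scores are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve

def sensitivity_specificity(y_true, y_prob_benign, threshold=0.5):
    """Sensitivity and specificity with malignant (label 0) as the positive class."""
    y_pred = (np.asarray(y_prob_benign) > threshold).astype(int)
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
    tp, fn = cm[0, 0], cm[0, 1]   # malignant cases: caught vs. missed
    fp, tn = cm[1, 0], cm[1, 1]   # benign cases: flagged vs. cleared
    return tp / (tp + fn), tn / (tn + fp)

def plot_roc(y_true, y_prob_benign):
    """Plot the ROC curve; roc_curve treats label 1 as positive for 0/1 labels."""
    import matplotlib.pyplot as plt  # imported here so the metric helper needs no display
    fpr, tpr, _ = roc_curve(y_true, y_prob_benign)
    auc = roc_auc_score(y_true, y_prob_benign)
    plt.plot(fpr, tpr, label=f"AUC = {auc:.3f}")
    plt.plot([0, 1], [0, 1], linestyle="--", label="chance")
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.title("ROC Curve")
    plt.legend()
    plt.show()

# Toy labels and predicted benign probabilities (made-up values)
y_true_toy = np.array([0, 0, 0, 1, 1, 1, 1])
y_prob_toy = np.array([0.1, 0.2, 0.7, 0.9, 0.8, 0.4, 0.6])
sens, spec = sensitivity_specificity(y_true_toy, y_prob_toy)
auc_toy = roc_auc_score(y_true_toy, y_prob_toy)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={auc_toy:.3f}")
```

On the real validation set, the same calls would take `y_val` and the model's predicted probabilities. Raising the threshold on the benign probability trades specificity for sensitivity, which may be preferable when a missed malignant case is the costlier error.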

Task 4: Interpretability and Debugging (1 hour)

**Goals:** Interpret model behavior using SHAP to uncover key predictive features.

- Use SHAP to visualize local and global explanations.
- Identify features influencing predictions for a specific patient.
- Provide an interpretability summary.

**Expanded Response:**

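```python
import shap
import torch

model.eval()  # disable dropout so explanations are deterministic


def predict_proba_benign(data):
    """NumPy-in, NumPy-out wrapper so SHAP can call the PyTorch model."""
    with torch.no_grad():
        return model(torch.tensor(data, dtype=torch.float32)).numpy().ravel()


# Model-agnostic explainer; X_scaled serves as the background data
explainer = shap.Explainer(predict_proba_benign, X_scaled, feature_names=list(X.columns))
shap_values = explainer(X_scaled[:100])

# Global view
shap.plots.beeswarm(shap_values)

# Local explanation for the first patient
shap.plots.waterfall(shap_values[0])
```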

**Analysis:**
Top features contributing to predictions include 'worst perimeter', 'mean concave points', and
'mean area'. SHAP confirms alignment with known clinical biomarkers.
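
A ranking of "top features" such as the one above is commonly derived from the mean absolute SHAP value per feature. The sketch below shows that computation on a toy attribution matrix; the helper name `rank_features_by_mean_abs` and the numbers are made up for illustration and are not results from the model.

```python
import numpy as np

def rank_features_by_mean_abs(values, feature_names, top_k=3):
    """Rank features by mean absolute attribution across samples."""
    importance = np.abs(np.asarray(values)).mean(axis=0)
    order = np.argsort(importance)[::-1][:top_k]   # largest importance first
    return [(feature_names[i], float(importance[i])) for i in order]

# Toy attribution matrix: 3 samples x 3 features (hypothetical values)
toy_values = np.array([[0.5, -0.1, 0.0],
                       [-0.7, 0.2, 0.1],
                       [0.6, -0.3, 0.0]])
toy_names = ["worst perimeter", "mean area", "mean texture"]
ranking = rank_features_by_mean_abs(toy_values, toy_names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With a SHAP `Explanation` object, the attribution matrix is available as `shap_values.values` and the names as `shap_values.feature_names`, so the same helper could be applied to the real output.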

Task 5: Error Analysis and Model Improvement (1 hour)

**Goals:** Identify weaknesses and propose enhancements.

- Examine incorrect predictions.
- Analyze decision boundaries.
- Suggest two improvement strategies with rationale.

**Expanded Analysis:**
- Many errors occur on ambiguous borderline cases.
- Misclassified malignant cases have feature overlap with benign class.
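
One way to check whether errors really cluster near the decision boundary is to list the misclassified cases sorted by how close their predicted probability is to the threshold. The following is a sketch under that assumption; `misclassified_by_margin` is a hypothetical helper and the toy labels and probabilities are invented.

```python
import numpy as np

def misclassified_by_margin(y_true, y_prob_benign, threshold=0.5):
    """Return (index, true label, probability) for errors, most borderline first."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob_benign)
    y_pred = (y_prob > threshold).astype(int)
    wrong = np.flatnonzero(y_pred != y_true)
    # Sort the errors by distance of the probability from the threshold
    order = wrong[np.argsort(np.abs(y_prob[wrong] - threshold))]
    return [(int(i), int(y_true[i]), float(y_prob[i])) for i in order]

# Toy data: 0 = malignant, 1 = benign (made-up values)
y_true_toy = np.array([0, 0, 1, 1, 0, 1])
y_prob_toy = np.array([0.10, 0.58, 0.45, 0.90, 0.95, 0.20])
errors = misclassified_by_margin(y_true_toy, y_prob_toy)
for idx, label, prob in errors:
    print(f"sample {idx}: true={label} P(benign)={prob:.2f}")
```

Errors far from the threshold, such as a malignant case predicted benign with high confidence, are usually the ones worth inspecting feature by feature.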

**Suggestions:**
1. **SMOTE for imbalance:** Minority class (malignant) can be synthetically expanded.
2. **Deeper CNN-like 1D layers:** If sequential/structural patterns emerge in features, a hybrid
model (e.g., MLP + CNN) could increase capacity.
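
To make the first suggestion concrete, the core idea of SMOTE is to create synthetic minority samples by interpolating between a real minority sample and one of its nearest minority-class neighbours. The sketch below is a hand-rolled NumPy illustration of that idea, not the imbalanced-learn implementation; the function name `smote_like_oversample`, its parameters, and the random toy data are assumptions.

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority neighbours."""
    X_min = np.asarray(X_min, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise Euclidean distances within the minority class
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)            # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)      # real sample each synthetic row starts from
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))               # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Toy minority class: 20 samples with 3 features (random, for illustration only)
X_min_toy = np.random.default_rng(42).normal(size=(20, 3))
synthetic = smote_like_oversample(X_min_toy, n_new=10, k=3)
print(synthetic.shape)
```

In practice, oversampling would be applied to the training split only, after the train/validation split, so that synthetic points never leak into evaluation. The `SMOTE` class in imbalanced-learn is the usual ready-made implementation.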
