
Python Ds ML Guide

The document provides an overview of key concepts in Python for Data Science and Machine Learning, focusing on data selection in pandas using loc and iloc, data wrangling techniques, and basic NumPy operations. It also covers machine learning processes with scikit-learn, including data splitting, preprocessing, and model fitting. Code snippets illustrate how these concepts are applied in practice.

Uploaded by freyalivanna
Copyright © All Rights Reserved

Python for Data Science & Machine Learning

1. Data Selection in pandas: loc vs iloc

- `loc` is label-based: it selects rows/columns using labels (names).

- `iloc` is integer-position based: it selects rows/columns using index positions.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35]
}, index=['a', 'b', 'c'])

print(df.loc['a'])   # Row with index label 'a'
print(df.iloc[0])    # First row (position 0)
```
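The two indexers also differ when slicing. A minimal sketch on the same three-row DataFrame: a `loc` slice includes its end label, while an `iloc` slice excludes its end position, as Python list slices do. `loc` also accepts a boolean mask for the rows together with a column label.

```python
import pandas as pd

# Same DataFrame as above: string index labels 'a', 'b', 'c'
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35]
}, index=['a', 'b', 'c'])

# loc slices by label and INCLUDES the end label: rows 'a' and 'b'
by_label = df.loc['a':'b', 'name']

# iloc slices by position and EXCLUDES the end position: rows 0 and 1
by_position = df.iloc[0:2, 0]

# loc with a boolean row mask plus a column label
older = df.loc[df['age'] > 28, 'name']

# A single cell: label pair vs. position pair
print(df.loc['c', 'age'])    # 35
print(df.iloc[2, 1])         # 35
print(by_label.tolist())     # ['Alice', 'Bob']
print(by_position.tolist())  # ['Alice', 'Bob']
print(older.tolist())        # ['Bob', 'Charlie']
```

Both slices return the same two rows here only because labels 'a' and 'b' happen to sit at positions 0 and 1.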

2. Data Wrangling with pandas

- Handling missing values

- Renaming, filtering, grouping

- Applying functions to columns

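```python
# Uses the df defined in section 1. Each call returns a new DataFrame
# or Series; assign the result (e.g. df = df.dropna()) to keep the change.
df.dropna()                         # Remove rows containing any NaN
df.fillna(0)                        # Replace NaNs with 0
df.rename(columns={'age': 'Age'})   # Rename a column
df[df['age'] > 25]                  # Keep rows where age > 25
df.groupby('name').mean()           # Mean of numeric columns per name
df['age'].apply(lambda x: x + 1)    # Apply a function to each value
```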
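The calls above run on a DataFrame with no missing values. A minimal sketch of the same ideas on data that does have gaps; the `sales` DataFrame and its values are hypothetical, and filling with the column median is one common choice rather than a rule.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps, made up for illustration
sales = pd.DataFrame({
    'region': ['north', 'south', 'north', 'south', 'east'],
    'units': [10, np.nan, 7, 4, np.nan],
})

# Count missing values per column before deciding how to handle them
missing_per_column = sales.isna().sum()

# dropna() returns a new DataFrame; sales itself is unchanged
complete_rows = sales.dropna()

# Fill gaps in 'units' with that column's median instead of a constant 0
filled = sales.fillna({'units': sales['units'].median()})

# Group by region and compute several aggregates in one call
summary = filled.groupby('region')['units'].agg(['sum', 'mean', 'count'])

print(missing_per_column)
print(len(sales), len(complete_rows))  # 5 3
print(summary)
```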

3. NumPy Basics

NumPy provides the `ndarray`, a fixed-type array that supports fast, vectorized numerical operations.



```python
import numpy as np

arr = np.array([1, 2, 3])

print(arr.mean())            # Mean of the elements: 2.0
print(np.arange(0, 10, 2))   # Values from 0 up to (not including) 10 in steps of 2
```
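Vectorized operations apply to every element without an explicit Python loop, and broadcasting lets arrays of compatible shapes combine. A short illustrative sketch; the values in `matrix` are arbitrary.

```python
import numpy as np

# Element-wise arithmetic on the whole array, no loop needed
arr = np.array([1, 2, 3])
doubled = arr * 2                 # [2 4 6]

# A boolean mask selects the elements that satisfy a condition
values = np.arange(10)
evens = values[values % 2 == 0]   # [0 2 4 6 8]

# Broadcasting: a (2, 3) matrix minus a length-3 row subtracts per column
matrix = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
col_means = matrix.mean(axis=0)   # [2.5 3.5 4.5]
centered = matrix - col_means     # each column now has mean 0

print(doubled)
print(evens)
print(centered)
```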

4. Machine Learning with scikit-learn

- Splitting data

- Preprocessing

- Fitting models

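```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load features X and class labels y
X, y = load_iris(return_X_y=True)

# Hold out 20% for testing; random_state makes the split reproducible
# and stratify keeps the class proportions the same in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the scaler on the training data only, then apply the same transform to the test set
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = LogisticRegression()
model.fit(X_train, y_train)

# score() returns the mean accuracy on the test set
print(model.score(X_test, y_test))
```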
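A common refinement, sketched here as one option rather than the guide's own method: bundle the scaler and the model into a scikit-learn pipeline and evaluate it with cross-validation. The pipeline refits `StandardScaler` inside each training fold, so the held-out fold never influences the scaling. `max_iter=1000` and `cv=5` are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Scaling and the classifier are fitted together on each training fold
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5-fold cross-validation returns one accuracy score per fold
scores = cross_val_score(pipe, X, y, cv=5)

print(scores)
print(f"mean accuracy: {scores.mean():.3f}")
```

Averaging over several folds gives a steadier estimate than the single 80/20 split above.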
