Python Data Analysis
STEP 1: Install Python
For Windows:
1. Go to [Link]
2. Download Python (latest version)
3. IMPORTANT: Check "Add Python to PATH" during installation
4. Click "Install Now"
For Mac:
1. Open Terminal
2. Type: python3 --version (to check if already installed)
3. If not installed, download from [Link]
For Linux:
Python is usually pre-installed. Open terminal and type:
python3 --version
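Once Python is installed, you can also check its version from inside Python itself (in IDLE, VS Code, or Jupyter). This is a minimal sketch. The 3.10 minimum it checks is an assumption made for this guide, not an official requirement of any library.

```python
# Check the Python version from inside Python.
import platform
import sys

# Assumed minimum for this tutorial; adjust if you need to.
MIN_VERSION = (3, 10)

print("Python version:", platform.python_version())

if sys.version_info >= MIN_VERSION:
    print("OK: this Python is recent enough for the examples.")
else:
    print("Consider installing a newer Python before continuing.")
```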
STEP 2: Install Required Libraries
Open Command Prompt (Windows) or Terminal (Mac/Linux) and run these commands one by one (on Mac/Linux, use pip3 instead of pip if the pip command is not found):
pip install numpy
Press Enter, wait for it to finish, then:
pip install pandas
Press Enter, wait for it to finish, then:
pip install matplotlib
Press Enter, wait for it to finish, then:
pip install seaborn
Or install all four at once:
pip install numpy pandas matplotlib seaborn
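To confirm that the installation worked, you can try importing each library and printing its version. This is a sketch using only the standard library (`importlib` and `importlib.metadata`); the list of names simply mirrors the four packages installed above.

```python
# Verify that each library from STEP 2 can be imported and show its version.
import importlib
from importlib import metadata

# For these four libraries the import name matches the pip package name.
LIBRARIES = ["numpy", "pandas", "matplotlib", "seaborn"]

status = {}
for name in LIBRARIES:
    try:
        importlib.import_module(name)
        status[name] = metadata.version(name)
    except ImportError:
        # Not installed for the Python running this code
        status[name] = None

for name, version in status.items():
    if version:
        print(f"{name:<12} OK (version {version})")
    else:
        print(f"{name:<12} MISSING -> run: pip install {name}")
```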
STEP 3: Choose Your Code Editor
Option 1: IDLE (Comes with Python)
Already installed with Python
Simple and easy for beginners
To open it: search for "IDLE" on your computer
Option 2: Visual Studio Code (Recommended)
Download from: [Link]
Install the Python extension (from the Extensions panel)
Create a new file with the .py extension
Option 3: Jupyter Notebook (Best for Data Analysis)
Install by running:
pip install jupyter
Start it by typing:
jupyter notebook
READY-TO-RUN CODE EXAMPLES
Example 1: Hello World with NumPy
Just copy this entire code and paste it in your editor, then run!
# This is your first data analysis program!
import numpy as np
# Create an array of numbers
numbers = np.array([10, 20, 30, 40, 50])
print("My numbers:", numbers)
print("Average:", numbers.mean())
print("Maximum:", numbers.max())
print("Minimum:", numbers.min())
print("Sum:", numbers.sum())
# Do some math
doubled = numbers * 2
print("Doubled:", doubled)
squared = numbers ** 2
print("Squared:", squared)
Expected Output:
My numbers: [10 20 30 40 50]
Average: 30.0
Maximum: 50
Minimum: 10
Sum: 150
Doubled: [ 20  40  60  80 100]
Squared: [ 100  400  900 1600 2500]
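Why use a NumPy array instead of a normal Python list? Math on an array is applied to every element at once, while the same operator on a list does something different. This small sketch compares the two.

```python
import numpy as np

plain_list = [10, 20, 30, 40, 50]
numbers = np.array(plain_list)

# A plain Python list repeats itself when multiplied by 2
list_times_two = plain_list * 2  # -> [10, 20, 30, 40, 50, 10, 20, 30, 40, 50]

# A NumPy array multiplies every element instead (vectorized math)
array_times_two = numbers * 2

print("List * 2: ", list_times_two)
print("Array * 2:", array_times_two)

# Getting the same result from a list needs a loop or comprehension
loop_doubled = [n * 2 for n in plain_list]
print("Loop:     ", loop_doubled)
```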
📝 Example 2: Simple Data Table with Pandas
Copy and paste this complete code:
import pandas as pd
# Create a simple grade table
data = {
'Student': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
'Math': [85, 92, 78, 95, 88],
'English': [90, 85, 92, 88, 94],
'Science': [88, 90, 85, 92, 91]
}
# Make it into a table
df = pd.DataFrame(data)
# Show the table
print("GRADE TABLE:")
print(df)
print("\n")
# Calculate average for each student
df['Average'] = (df['Math'] + df['English'] + df['Science']) / 3
print("WITH AVERAGES:")
print(df)
print("\n")
# Find the best student
best_student = df.loc[df['Average'].idxmax(), 'Student']
print(f"Best student: {best_student}")
# Class statistics
print(f"\nClass average in Math: {df['Math'].mean()}")
print(f"Highest Math score: {df['Math'].max()}")
print(f"Lowest Math score: {df['Math'].min()}")
Expected Output:
GRADE TABLE:
   Student  Math  English  Science
0    Alice    85       90       88
1      Bob    92       85       90
2  Charlie    78       92       85
3    David    95       88       92
4     Emma    88       94       91
WITH AVERAGES:
   Student  Math  English  Science    Average
0    Alice    85       90       88  87.666667
1      Bob    92       85       90  89.000000
2  Charlie    78       92       85  85.000000
3    David    95       88       92  91.666667
4     Emma    88       94       91  91.000000
Best student: David
Class average in Math: 87.6
Highest Math score: 95
Lowest Math score: 78
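Adding the three columns by hand works, but pandas can also average across columns for each row with `mean(axis=1)`. This sketch computes the same Average column that way and then ranks the students; it reuses the grade data from Example 2.

```python
import pandas as pd

data = {
    'Student': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
    'Math': [85, 92, 78, 95, 88],
    'English': [90, 85, 92, 88, 94],
    'Science': [88, 90, 85, 92, 91],
}
df = pd.DataFrame(data)

subjects = ['Math', 'English', 'Science']
# axis=1 means "across the columns", i.e. one average per student (row).
# Adding a new subject only requires adding its name to the list.
df['Average'] = df[subjects].mean(axis=1)

# Rank students from highest to lowest average
ranking = df.sort_values('Average', ascending=False)[['Student', 'Average']]
print(ranking.round(2).to_string(index=False))
```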
📝 Example 3: Simple Bar Chart
Copy and paste this complete code:
import matplotlib.pyplot as plt
# Data for chart
fruits = ['Apples', 'Bananas', 'Oranges', 'Grapes', 'Mangoes']
sales = [45, 60, 38, 52, 48]
# Create bar chart
plt.figure(figsize=(10, 6))
plt.bar(fruits, sales, color='skyblue', edgecolor='black', linewidth=1.5)
# Add labels and title
plt.xlabel('Fruits', fontsize=12)
plt.ylabel('Sales (units)', fontsize=12)
plt.title('Fruit Sales This Week', fontsize=14, fontweight='bold')
# Add value labels on top of bars (bar number i sits at x position i)
for i, value in enumerate(sales):
    plt.text(i, value + 1, str(value), ha='center', fontweight='bold')
# Show the chart
plt.show()
What you'll see: a sky-blue bar chart of this week's fruit sales, with each bar's value printed above it.
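If no chart window appears (for example on a remote machine), you can save the chart to an image file instead. This sketch uses Matplotlib's non-interactive "Agg" backend and `savefig`; the file name `fruit_sales.png` is just an example.

```python
import os

import matplotlib
matplotlib.use("Agg")  # draw to files only, no window needed
import matplotlib.pyplot as plt

fruits = ['Apples', 'Bananas', 'Oranges', 'Grapes', 'Mangoes']
sales = [45, 60, 38, 52, 48]

fig, ax = plt.subplots(figsize=(10, 6))
ax.bar(fruits, sales, color='skyblue', edgecolor='black', linewidth=1.5)
ax.set_xlabel('Fruits')
ax.set_ylabel('Sales (units)')
ax.set_title('Fruit Sales This Week', fontweight='bold')

# Example file name; change it to whatever you like
OUTPUT_FILE = "fruit_sales.png"
fig.savefig(OUTPUT_FILE, dpi=100, bbox_inches="tight")
plt.close(fig)  # free the memory used by the figure

print("Chart saved to", os.path.abspath(OUTPUT_FILE))
```

Open the saved PNG with any image viewer to see the chart.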
Example 4: Working with Real Data (CSV File)
First, let's create a sample CSV file:
import pandas as pd
# Create sample data
employee_data = {
'Name': ['John', 'Sarah', 'Mike', 'Lisa', 'Tom', 'Emma', 'David', 'Anna'],
'Department': ['Sales', 'IT', 'Sales', 'HR', 'IT', 'Sales', 'HR', 'IT'],
'Age': [28, 32, 35, 29, 41, 26, 38, 30],
'Salary': [50000, 75000, 55000, 48000, 82000, 52000, 62000, 71000],
'Experience': [3, 7, 8, 4, 15, 2, 10, 6]
}
df = pd.DataFrame(employee_data)
# Save to CSV file
df.to_csv('[Link]', index=False)
print("CSV file created: [Link]")
print("\nPreview:")
print(df)
Now, let's read and analyze it:
import pandas as pd
import matplotlib.pyplot as plt
# Read the CSV file
df = pd.read_csv('[Link]')
print("📊 EMPLOYEE DATA ANALYSIS\n")
print("=" * 50)
# 1. Show the data
print("\n1. EMPLOYEE DATA:")
print(df)
# 2. Basic statistics
print("\n2. BASIC STATISTICS:")
print(f"Total Employees: {len(df)}")
print(f"Average Salary: ${df['Salary'].mean():,.2f}")
print(f"Average Age: {df['Age'].mean():.1f} years")
print(f"Average Experience: {df['Experience'].mean():.1f} years")
# 3. Department breakdown
print("\n3. EMPLOYEES PER DEPARTMENT:")
print(df['Department'].value_counts())
# 4. Salary by department
print("\n4. AVERAGE SALARY BY DEPARTMENT:")
dept_salary = df.groupby('Department')['Salary'].mean()
print(dept_salary)
# 5. Find highest paid employee
highest_paid = df.loc[df['Salary'].idxmax()]
print(f"\n5. HIGHEST PAID EMPLOYEE:")
print(f"Name: {highest_paid['Name']}")
print(f"Department: {highest_paid['Department']}")
print(f"Salary: ${highest_paid['Salary']:,}")
# 6. Create visualization
plt.figure(figsize=(12, 5))
# Plot 1: Salary by Department
plt.subplot(1, 2, 1)
dept_salary.plot(kind='bar', color=['#FF6B6B', '#4ECDC4', '#45B7D1'],
                 edgecolor='black')
plt.title('Average Salary by Department', fontweight='bold', fontsize=12)
plt.xlabel('Department')
plt.ylabel('Average Salary ($)')
plt.xticks(rotation=45)
plt.grid(axis='y', alpha=0.3)
# Plot 2: Age Distribution
plt.subplot(1, 2, 2)
plt.hist(df['Age'], bins=6, color='#95E1D3', edgecolor='black', alpha=0.7)
plt.title('Age Distribution', fontweight='bold', fontsize=12)
plt.xlabel('Age')
plt.ylabel('Number of Employees')
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()
print("\n✅ Analysis complete!")
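The department breakdown above computes one number (the mean) per group. With `agg()`, pandas can compute several statistics per department in one step. This sketch rebuilds the same employee data in memory so it runs without the CSV file.

```python
import pandas as pd

# Same sample data as in Example 4, created in memory
employee_data = {
    'Name': ['John', 'Sarah', 'Mike', 'Lisa', 'Tom', 'Emma', 'David', 'Anna'],
    'Department': ['Sales', 'IT', 'Sales', 'HR', 'IT', 'Sales', 'HR', 'IT'],
    'Age': [28, 32, 35, 29, 41, 26, 38, 30],
    'Salary': [50000, 75000, 55000, 48000, 82000, 52000, 62000, 71000],
    'Experience': [3, 7, 8, 4, 15, 2, 10, 6],
}
df = pd.DataFrame(employee_data)

# Group rows by department, then compute four statistics on Salary
summary = df.groupby('Department')['Salary'].agg(['count', 'mean', 'min', 'max'])
print(summary)
```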
Example 5: Interactive Data Filtering
Copy and paste this complete code:
import pandas as pd
# Create sample product data
products = pd.DataFrame({
    'Product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Headphones',
                'Webcam', 'Speaker', 'USB Drive', 'Hard Drive', 'Router'],
    'Category': ['Computer', 'Accessory', 'Accessory', 'Computer', 'Accessory',
                 'Accessory', 'Accessory', 'Storage', 'Storage', 'Network'],
    'Price': [899, 25, 75, 299, 89, 65, 120, 15, 85, 110],
    'Stock': [15, 150, 80, 25, 60, 45, 30, 200, 40, 35],
    'Rating': [4.5, 4.2, 4.3, 4.7, 4.4, 4.0, 4.1, 3.9, 4.3, 4.2]
})
print("🛍️PRODUCT INVENTORY SYSTEM\n")
print("=" * 70)
# Show all products
print("\n📦 ALL PRODUCTS:")
print(products.to_string(index=False))
# Filter: Products under $100
print("\n💰 PRODUCTS UNDER $100:")
cheap_products = products[products['Price'] < 100]
print(cheap_products[['Product', 'Price']].to_string(index=False))
# Filter: High rated products (4.3 and above)
print("\n⭐ HIGH RATED PRODUCTS (4.3+):")
top_rated = products[products['Rating'] >= 4.3]
print(top_rated[['Product', 'Rating']].to_string(index=False))
# Filter: Low stock products (less than 50)
print("\n⚠️ LOW STOCK ALERT (Less than 50 units):")
low_stock = products[products['Stock'] < 50]
print(low_stock[['Product', 'Stock']].to_string(index=False))
# Filter: Accessories only
print("\n🖱️ ACCESSORIES:")
accessories = products[products['Category'] == 'Accessory']
print(accessories[['Product', 'Price']].to_string(index=False))
# Multiple filters: Cheap AND high rated
print("\n🎯 BEST VALUE (Under $100 AND rating 4.2+):")
best_value = products[(products['Price'] < 100) & (products['Rating'] >= 4.2)]
print(best_value[['Product', 'Price', 'Rating']].to_string(index=False))
# Summary statistics
print("\n📊 SUMMARY STATISTICS:")
print(f"Total Products: {len(products)}")
print(f"Average Price: ${products['Price'].mean():.2f}")
most_expensive = products.loc[products['Price'].idxmax(), 'Product']
cheapest = products.loc[products['Price'].idxmin(), 'Product']
print(f"Most Expensive: {most_expensive} (${products['Price'].max()})")
print(f"Cheapest: {cheapest} (${products['Price'].min()})")
print(f"Average Rating: {products['Rating'].mean():.2f} stars")
inventory_value = (products['Price'] * products['Stock']).sum()
print(f"Total Inventory Value: ${inventory_value:,.2f}")
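When combining filters, each condition needs its own parentheses, because in Python the `&` operator binds more tightly than `<` and `>=`. pandas also offers `query()`, which writes the same filter as a string. This sketch (using a trimmed copy of the product table) shows that both give the same rows.

```python
import pandas as pd

products = pd.DataFrame({
    'Product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Headphones',
                'Webcam', 'Speaker', 'USB Drive', 'Hard Drive', 'Router'],
    'Price': [899, 25, 75, 299, 89, 65, 120, 15, 85, 110],
    'Rating': [4.5, 4.2, 4.3, 4.7, 4.4, 4.0, 4.1, 3.9, 4.3, 4.2],
})

# Boolean-mask version: parentheses around each condition are required
mask = (products['Price'] < 100) & (products['Rating'] >= 4.2)
best_value = products[mask]

# query() version: the same filter as a string, using "and"
best_value_q = products.query("Price < 100 and Rating >= 4.2")

print(best_value_q.to_string(index=False))
```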
Example 6: Simple Data Visualization Dashboard
Copy and paste this complete code:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Create sample monthly sales data
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
sales = [45000, 52000, 48000, 61000, 58000, 67000]
expenses = [30000, 32000, 31000, 38000, 36000, 41000]
profit = [s - e for s, e in zip(sales, expenses)]
# Create a dashboard with multiple charts
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# No emoji in the chart title: Matplotlib's default font has no emoji glyphs
fig.suptitle('BUSINESS DASHBOARD - First Half 2024', fontsize=16,
             fontweight='bold')
# Chart 1: Sales trend
axes[0, 0].plot(months, sales, marker='o', linewidth=3, markersize=10,
color='#2ecc71')
axes[0, 0].set_title('Monthly Sales Trend', fontweight='bold')
axes[0, 0].set_ylabel('Sales ($)')
axes[0, 0].grid(True, alpha=0.3)
axes[0, 0].fill_between(months, sales, alpha=0.3, color='#2ecc71')
# Chart 2: Profit bars
axes[0, 1].bar(months, profit, color='#3498db', edgecolor='black',
linewidth=1.5)
axes[0, 1].set_title('Monthly Profit', fontweight='bold')
axes[0, 1].set_ylabel('Profit ($)')
axes[0, 1].grid(axis='y', alpha=0.3)
for i, v in enumerate(profit):
    axes[0, 1].text(i, v + 500, f'${v:,}', ha='center', fontweight='bold')
# Chart 3: Sales vs Expenses comparison
x = np.arange(len(months))
width = 0.35
axes[1, 0].bar(x - width/2, sales, width, label='Sales', color='#2ecc71',
edgecolor='black')
axes[1, 0].bar(x + width/2, expenses, width, label='Expenses',
color='#e74c3c', edgecolor='black')
axes[1, 0].set_title('Sales vs Expenses', fontweight='bold')
axes[1, 0].set_ylabel('Amount ($)')
axes[1, 0].set_xticks(x)
axes[1, 0].set_xticklabels(months)
axes[1, 0].legend()
axes[1, 0].grid(axis='y', alpha=0.3)
# Chart 4: Profit margin percentage
profit_margin = [(p/s)*100 for p, s in zip(profit, sales)]
axes[1, 1].plot(months, profit_margin, marker='s', linewidth=3, markersize=10,
color='#9b59b6')
axes[1, 1].set_title('Profit Margin %', fontweight='bold')
axes[1, 1].set_ylabel('Margin (%)')
axes[1, 1].grid(True, alpha=0.3)
axes[1, 1].axhline(y=np.mean(profit_margin), color='r', linestyle='--',
                   label=f'Average: {np.mean(profit_margin):.1f}%')
axes[1, 1].legend()
plt.tight_layout()
[Link]()
# Print summary
print("\n💼 BUSINESS SUMMARY (Jan-Jun 2024)")
print("=" * 50)
print(f"Total Sales: ${sum(sales):,}")
print(f"Total Expenses: ${sum(expenses):,}")
print(f"Total Profit: ${sum(profit):,}")
print(f"Average Monthly Sales: ${np.mean(sales):,.2f}")
print(f"Average Profit Margin: {np.mean(profit_margin):.1f}%")
print(f"Best Month: {months[sales.index(max(sales))]} (${max(sales):,})")
print(f"Growth Rate: {((sales[-1] - sales[0]) / sales[0] * 100):.1f}%")
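The summary above uses Python lists and list comprehensions. If you store the numbers as NumPy arrays instead, profit and margin become one-line, element-by-element calculations, and `np.argmax` finds the best month's position. This sketch reproduces part of the summary that way.

```python
import numpy as np

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
sales = np.array([45000, 52000, 48000, 61000, 58000, 67000])
expenses = np.array([30000, 32000, 31000, 38000, 36000, 41000])

# Element-wise math replaces the list comprehensions
profit = sales - expenses
profit_margin = profit / sales * 100

best = np.argmax(sales)  # position of the largest sales value
growth = (sales[-1] - sales[0]) / sales[0] * 100

print(f"Total Profit: ${int(profit.sum()):,}")
print(f"Average Profit Margin: {profit_margin.mean():.1f}%")
print(f"Best Month: {months[best]} (${int(sales[best]):,})")
print(f"Growth Rate: {growth:.1f}%")
```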
Quick Troubleshooting
Problem: "ModuleNotFoundError: No module named 'numpy'"
Solution: You need to install the library. Open terminal/command prompt:
pip install numpy
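If pip reports that the library is installed but Python still raises this error, your editor may be running a different Python than the one pip installed into. A quick way to check is to run this sketch inside that editor and install with that exact interpreter.

```python
import sys

# The full path of the Python interpreter running this code
print("This code is running on:", sys.executable)

# Installing through this interpreter puts the library where it can be found
print(f'Install with:  "{sys.executable}" -m pip install numpy')
```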
Problem: "pip is not recognized"
Solution: Python was not added to PATH. Reinstall Python and check "Add Python to PATH", or (on Windows) run pip through the Python launcher instead: py -m pip install numpy
Problem: Code doesn't run
Solution:
1. Make sure you saved the file with .py extension
2. Run it by pressing F5 (in IDLE) or clicking the Run button (in VS Code)
3. Or open a terminal in the file's folder and type: python [Link] (use python3 on Mac/Linux)
Problem: Chart doesn't show
Solution: Add this at the end of your code:
plt.show()
What to Do Next
1. Start with Example 1 - Get comfortable with NumPy
2. Try Example 2 - Learn about data tables
3. Move to Example 3 - Create your first chart
4. Practice Example 4 - Work with CSV files
5. Experiment! - Change numbers, add more data, try different colors