
Python Data Analysis

This document provides a comprehensive guide for performing data analysis using Python, including installation steps for Python and necessary libraries like NumPy, Pandas, Matplotlib, and Seaborn. It includes example code for basic data analysis tasks, such as creating arrays, data tables, visualizations, and working with CSV files. Additionally, it offers troubleshooting tips for common issues encountered during setup and execution.

Uploaded by lucasmmartin02
Copyright © All Rights Reserved

STEP 1: Install Python


For Windows:

1. Go to [Link]
2. Download Python (latest version)
3. IMPORTANT: Check "Add Python to PATH" during installation
4. Click "Install Now"

For Mac:

1. Open Terminal
2. Type: python3 --version (to check if already installed)
3. If not installed, download from [Link]

For Linux:

Python is usually pre-installed. Open terminal and type:

python3 --version
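To check the version from inside Python itself (handy when a computer has more than one Python installed), a small sketch like the following may help. The python_version_ok helper and its (3, 8) default minimum are illustrative assumptions, not requirements stated in this guide; check each library's documentation for its actual minimum version.

```python
import sys

def python_version_ok(minimum=(3, 8)):
    """Return True if the running interpreter is at least `minimum`.

    The (3, 8) default is an illustrative assumption, not an official
    requirement of any library in this guide.
    """
    # sys.version_info[:2] is a (major, minor) tuple, e.g. (3, 12)
    return sys.version_info[:2] >= tuple(minimum)

# sys.version starts with the version number, e.g. "3.12.1 (main, ...)"
print("Python version:", sys.version.split()[0])
print("Version OK:", python_version_ok())
```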

STEP 2: Install Required Libraries


Open Command Prompt (Windows) or Terminal (Mac/Linux) and paste these commands one by one (on Mac/Linux, use pip3 if pip is not found):

pip install numpy

Press Enter, wait for it to finish, then:

pip install pandas

Press Enter, wait for it to finish, then:

pip install matplotlib

Press Enter, wait for it to finish, then:

pip install seaborn

Or install all of them at once:


pip install numpy pandas matplotlib seaborn
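Once the installs finish, you can confirm from Python that each library is importable and see which version you got. This is a minimal sketch using only the standard library (importlib); check_libraries is a hypothetical helper name, and it assumes each library's import name matches its pip package name, which holds for these four.

```python
import importlib.util
from importlib import metadata

def check_libraries(names):
    """Map each import name to its installed version, or None if missing.

    Assumes the import name and the pip package name are the same.
    """
    results = {}
    for name in names:
        if importlib.util.find_spec(name) is None:
            results[name] = None  # not importable by this Python
            continue
        try:
            results[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Importable (e.g. part of the standard library) but has no pip metadata
            results[name] = "installed (version unknown)"
    return results

for lib, version in check_libraries(["numpy", "pandas", "matplotlib", "seaborn"]).items():
    print(f"{lib:<12}", version or f"NOT INSTALLED - run: pip install {lib}")
```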

STEP 3: Choose Your Code Editor


Option 1: IDLE (Comes with Python)

• Already installed with Python
• Simple and easy for beginners
• To open it, search for "IDLE" on your computer

Option 2: Visual Studio Code (Recommended)

• Download from: [Link]
• Install the Python extension
• Create a new file with the .py extension

Option 3: Jupyter Notebook (Best for Data Analysis)

Install by running:

pip install jupyter

Start it by typing:

jupyter notebook

READY-TO-RUN CODE EXAMPLES


Example 1: Hello World with NumPy

Copy this entire code, paste it into your editor, then run it!

# This is your first data analysis program!
import numpy as np

# Create an array of numbers
numbers = np.array([10, 20, 30, 40, 50])

print("My numbers:", numbers)

# Basic statistics (each one is a method of the array)
print("Average:", numbers.mean())
print("Maximum:", numbers.max())
print("Minimum:", numbers.min())
print("Sum:", numbers.sum())

# Do some math: the operation is applied to every element at once
doubled = numbers * 2
print("Doubled:", doubled)

squared = numbers ** 2
print("Squared:", squared)

Expected Output:

My numbers: [10 20 30 40 50]
Average: 30.0
Maximum: 50
Minimum: 10
Sum: 150
Doubled: [ 20  40  60  80 100]
Squared: [ 100  400  900 1600 2500]
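One thing that makes NumPy useful for analysis: arithmetic on an array applies to every element, while the same operator on a plain Python list does something different. A short comparison (the variable names are just for illustration):

```python
import numpy as np

plain_list = [10, 20, 30]
array = np.array(plain_list)

# On a plain list, * 2 repeats the list
print(plain_list * 2)   # [10, 20, 30, 10, 20, 30]

# On a NumPy array, * 2 multiplies every element
print(array * 2)        # [20 40 60]

# Two arrays of the same length combine element by element
print(array + np.array([1, 2, 3]))   # [11 22 33]
```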

📝 Example 2: Simple Data Table with Pandas

Copy and paste this complete code:

import pandas as pd

# Create a simple grade table
data = {
    'Student': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
    'Math': [85, 92, 78, 95, 88],
    'English': [90, 85, 92, 88, 94],
    'Science': [88, 90, 85, 92, 91]
}

# Make it into a table (a pandas DataFrame)
df = pd.DataFrame(data)

# Show the table
print("GRADE TABLE:")
print(df)
print()  # blank line

# Calculate average for each student
df['Average'] = (df['Math'] + df['English'] + df['Science']) / 3
print("WITH AVERAGES:")
print(df)
print()

# Find the best student: idxmax() gives the row label of the highest
# average, and .loc looks up the 'Student' value in that row
best_student = df.loc[df['Average'].idxmax(), 'Student']
print(f"Best student: {best_student}")

# Class statistics
print(f"\nClass average in Math: {df['Math'].mean()}")
print(f"Highest Math score: {df['Math'].max()}")
print(f"Lowest Math score: {df['Math'].min()}")

Expected Output:
GRADE TABLE:
   Student  Math  English  Science
0    Alice    85       90       88
1      Bob    92       85       90
2  Charlie    78       92       85
3    David    95       88       92
4     Emma    88       94       91

WITH AVERAGES:
   Student  Math  English  Science    Average
0    Alice    85       90       88  87.666667
1      Bob    92       85       90  89.000000
2  Charlie    78       92       85  85.000000
3    David    95       88       92  91.666667
4     Emma    88       94       91  91.000000

Best student: David

Class average in Math: 87.6
Highest Math score: 95
Lowest Math score: 78
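Adding the three subject columns by hand works, but the formula has to be edited whenever a subject is added. A variation using pandas' built-in row-wise mean (axis=1) and sort_values to rank the students; it rebuilds the same grade table so it runs on its own, and the subjects list is an illustrative name:

```python
import pandas as pd

data = {
    'Student': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
    'Math': [85, 92, 78, 95, 88],
    'English': [90, 85, 92, 88, 94],
    'Science': [88, 90, 85, 92, 91],
}
df = pd.DataFrame(data)

subjects = ['Math', 'English', 'Science']
# mean(axis=1) averages across the columns of each row, so adding a
# subject only means adding its name to `subjects`
df['Average'] = df[subjects].mean(axis=1)

# Rank students from highest to lowest average
ranking = df.sort_values('Average', ascending=False)[['Student', 'Average']]
print(ranking.to_string(index=False))

# Per-subject class averages in one call (mean down each column)
print(df[subjects].mean())
```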

📝 Example 3: Simple Bar Chart

Copy and paste this complete code:

import matplotlib.pyplot as plt

# Data for chart
fruits = ['Apples', 'Bananas', 'Oranges', 'Grapes', 'Mangoes']
sales = [45, 60, 38, 52, 48]

# Create bar chart
plt.figure(figsize=(10, 6))
plt.bar(fruits, sales, color='skyblue', edgecolor='black', linewidth=1.5)

# Add labels and title
plt.xlabel('Fruits', fontsize=12)
plt.ylabel('Sales (units)', fontsize=12)
plt.title('Fruit Sales This Week', fontsize=14, fontweight='bold')

# Add value labels on top of bars (bar number i sits at x position i)
for i, value in enumerate(sales):
    plt.text(i, value + 1, str(value), ha='center', fontweight='bold')

# Show the chart
plt.show()

What you'll see: A sky-blue bar chart of this week's fruit sales, with each bar's value printed above it!
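Bar charts are often easier to read when the bars are sorted by value. A minimal sketch that reorders the fruit data while keeping each name paired with its number; sort_for_chart is a hypothetical helper, and the 2% label offset is an arbitrary choice:

```python
fruits = ['Apples', 'Bananas', 'Oranges', 'Grapes', 'Mangoes']
sales = [45, 60, 38, 52, 48]

def sort_for_chart(labels, values, descending=True):
    """Return labels and values reordered by value, keeping each pair together."""
    pairs = sorted(zip(labels, values), key=lambda pair: pair[1], reverse=descending)
    sorted_labels = [label for label, _ in pairs]
    sorted_values = [value for _, value in pairs]
    return sorted_labels, sorted_values

sorted_fruits, sorted_sales = sort_for_chart(fruits, sales)

# An offset relative to the tallest bar scales with the data,
# unlike a fixed +1 (2% is an arbitrary choice)
label_offset = 0.02 * max(sorted_sales)

print(sorted_fruits)   # ['Bananas', 'Grapes', 'Mangoes', 'Apples', 'Oranges']
print(sorted_sales)    # [60, 52, 48, 45, 38]
```

To use it, pass sorted_fruits and sorted_sales to plt.bar and plt.text in place of fruits and sales, and place each label at value + label_offset instead of value + 1.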

Example 4: Working with Real Data (CSV File)


First, let's create a sample CSV file:

import pandas as pd

# Create sample data
employee_data = {
    'Name': ['John', 'Sarah', 'Mike', 'Lisa', 'Tom', 'Emma', 'David', 'Anna'],
    'Department': ['Sales', 'IT', 'Sales', 'HR', 'IT', 'Sales', 'HR', 'IT'],
    'Age': [28, 32, 35, 29, 41, 26, 38, 30],
    'Salary': [50000, 75000, 55000, 48000, 82000, 52000, 62000, 71000],
    'Experience': [3, 7, 8, 4, 15, 2, 10, 6]
}

df = pd.DataFrame(employee_data)

# Save to CSV file (index=False leaves the row numbers out of the file)
df.to_csv('[Link]', index=False)
print("CSV file created: [Link]")
print("\nPreview:")
print(df)
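Why index=False? By default, to_csv also writes the DataFrame's row numbers (its index) as an extra, unnamed first column. Calling to_csv without a file name returns the CSV text as a string, which makes the difference easy to see on a two-row table (this tiny table is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Sarah'], 'Salary': [50000, 75000]})

# With no file name, to_csv() returns the CSV text instead of writing a file
without_index = df.to_csv(index=False)
with_index = df.to_csv()  # index=True is the default

print(without_index)
# Name,Salary
# John,50000
# Sarah,75000

print(with_index)
# ,Name,Salary
# 0,John,50000
# 1,Sarah,75000
```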

Now, let's read and analyze it:

import pandas as pd
import matplotlib.pyplot as plt

# Read the CSV file
df = pd.read_csv('[Link]')

print("📊 EMPLOYEE DATA ANALYSIS\n")
print("=" * 50)

# 1. Show the data
print("\n1. EMPLOYEE DATA:")
print(df)

# 2. Basic statistics
print("\n2. BASIC STATISTICS:")
print(f"Total Employees: {len(df)}")
print(f"Average Salary: ${df['Salary'].mean():,.2f}")
print(f"Average Age: {df['Age'].mean():.1f} years")
print(f"Average Experience: {df['Experience'].mean():.1f} years")

# 3. Department breakdown
print("\n3. EMPLOYEES PER DEPARTMENT:")
print(df['Department'].value_counts())

# 4. Salary by department: group the rows by department, then average each group
print("\n4. AVERAGE SALARY BY DEPARTMENT:")
dept_salary = df.groupby('Department')['Salary'].mean()
print(dept_salary)

# 5. Find highest paid employee (idxmax gives the row label, .loc fetches that row)
highest_paid = df.loc[df['Salary'].idxmax()]
print("\n5. HIGHEST PAID EMPLOYEE:")
print(f"Name: {highest_paid['Name']}")
print(f"Department: {highest_paid['Department']}")
print(f"Salary: ${highest_paid['Salary']:,}")

# 6. Create visualization: one figure holding two side-by-side plots
plt.figure(figsize=(12, 5))

# Plot 1: Salary by Department (position 1 in a 1-row, 2-column grid)
plt.subplot(1, 2, 1)
dept_salary.plot(kind='bar', color=['#FF6B6B', '#4ECDC4', '#45B7D1'],
                 edgecolor='black')
plt.title('Average Salary by Department', fontweight='bold', fontsize=12)
plt.xlabel('Department')
plt.ylabel('Average Salary ($)')
plt.xticks(rotation=45)
plt.grid(axis='y', alpha=0.3)

# Plot 2: Age Distribution (position 2 in the same grid)
plt.subplot(1, 2, 2)
plt.hist(df['Age'], bins=6, color='#95E1D3', edgecolor='black', alpha=0.7)
plt.title('Age Distribution', fontweight='bold', fontsize=12)
plt.xlabel('Age')
plt.ylabel('Number of Employees')
plt.grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.show()

print("\n✅ Analysis complete!")
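Step 4 computes a single statistic per department. To see several at once, groupby can be combined with agg. A sketch that rebuilds the employee table in memory (so it does not depend on the CSV file) and also uses named aggregation, which lets you choose the output column names (employees, avg_salary and avg_experience are illustrative names):

```python
import pandas as pd

employee_data = {
    'Name': ['John', 'Sarah', 'Mike', 'Lisa', 'Tom', 'Emma', 'David', 'Anna'],
    'Department': ['Sales', 'IT', 'Sales', 'HR', 'IT', 'Sales', 'HR', 'IT'],
    'Age': [28, 32, 35, 29, 41, 26, 38, 30],
    'Salary': [50000, 75000, 55000, 48000, 82000, 52000, 62000, 71000],
    'Experience': [3, 7, 8, 4, 15, 2, 10, 6],
}
df = pd.DataFrame(employee_data)

# agg() with a list computes several statistics for each department
summary = df.groupby('Department')['Salary'].agg(['count', 'mean', 'min', 'max'])
print(summary)

# Named aggregation: new_column=(source_column, statistic)
by_dept = df.groupby('Department').agg(
    employees=('Name', 'count'),
    avg_salary=('Salary', 'mean'),
    avg_experience=('Experience', 'mean'),
)
print(by_dept.sort_values('avg_salary', ascending=False))
```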

Example 5: Interactive Data Filtering

Copy and paste this complete code:

import pandas as pd

# Create sample product data
products = pd.DataFrame({
    'Product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Headphones',
                'Webcam', 'Speaker', 'USB Drive', 'Hard Drive', 'Router'],
    'Category': ['Computer', 'Accessory', 'Accessory', 'Computer', 'Accessory',
                 'Accessory', 'Accessory', 'Storage', 'Storage', 'Network'],
    'Price': [899, 25, 75, 299, 89, 65, 120, 15, 85, 110],
    'Stock': [15, 150, 80, 25, 60, 45, 30, 200, 40, 35],
    'Rating': [4.5, 4.2, 4.3, 4.7, 4.4, 4.0, 4.1, 3.9, 4.3, 4.2]
})

print("🛍️ PRODUCT INVENTORY SYSTEM\n")
print("=" * 70)

# Show all products
print("\n📦 ALL PRODUCTS:")
print(products.to_string(index=False))

# Filter: Products under $100
print("\n💰 PRODUCTS UNDER $100:")
cheap_products = products[products['Price'] < 100]
print(cheap_products[['Product', 'Price']].to_string(index=False))

# Filter: High rated products (4.3 and above)
print("\n⭐ HIGH RATED PRODUCTS (4.3+):")
top_rated = products[products['Rating'] >= 4.3]
print(top_rated[['Product', 'Rating']].to_string(index=False))

# Filter: Low stock products (less than 50)
print("\n⚠️ LOW STOCK ALERT (Less than 50 units):")
low_stock = products[products['Stock'] < 50]
print(low_stock[['Product', 'Stock']].to_string(index=False))

# Filter: Accessories only
print("\n🖱️ ACCESSORIES:")
accessories = products[products['Category'] == 'Accessory']
print(accessories[['Product', 'Price']].to_string(index=False))

# Multiple filters: Cheap AND high rated
# (wrap each condition in parentheses and join them with &)
print("\n🎯 BEST VALUE (Under $100 AND rating 4.2+):")
best_value = products[(products['Price'] < 100) & (products['Rating'] >= 4.2)]
print(best_value[['Product', 'Price', 'Rating']].to_string(index=False))

# Summary statistics
print("\n📊 SUMMARY STATISTICS:")
print(f"Total Products: {len(products)}")
print(f"Average Price: ${products['Price'].mean():.2f}")
print(f"Most Expensive: {products.loc[products['Price'].idxmax(), 'Product']} "
      f"(${products['Price'].max()})")
print(f"Cheapest: {products.loc[products['Price'].idxmin(), 'Product']} "
      f"(${products['Price'].min()})")
print(f"Average Rating: {products['Rating'].mean():.2f} stars")
print(f"Total Inventory Value: ${(products['Price'] * products['Stock']).sum():,.2f}")
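Every filter above follows the same pattern: a comparison such as products['Price'] < 100 produces a True/False value for each row (often called a boolean mask), and indexing with that mask keeps only the True rows. A short sketch on a shortened copy of the product table, also showing isin() for matching several categories and why Python's and keyword cannot be used here:

```python
import pandas as pd

# Shortened copy of the product table, for illustration
products = pd.DataFrame({
    'Product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'USB Drive'],
    'Category': ['Computer', 'Accessory', 'Accessory', 'Computer', 'Storage'],
    'Price': [899, 25, 75, 299, 15],
    'Rating': [4.5, 4.2, 4.3, 4.7, 3.9],
})

# Each comparison gives one True/False value per row
is_cheap = products['Price'] < 100
is_rated = products['Rating'] >= 4.2
print(is_cheap.tolist())   # [False, True, True, False, True]

# Combine masks with & (and), | (or), ~ (not). When writing the conditions
# inline, wrap each in parentheses: & binds more tightly than < and >=
best_value = products[is_cheap & is_rated]
print(best_value['Product'].tolist())   # ['Mouse', 'Keyboard']

# isin() matches any of several values
computers_and_storage = products[products['Category'].isin(['Computer', 'Storage'])]
print(computers_and_storage['Product'].tolist())   # ['Laptop', 'Monitor', 'USB Drive']

# Python's `and` cannot combine Series; pandas raises ValueError
try:
    products[(products['Price'] < 100) and (products['Rating'] >= 4.2)]
except ValueError as err:
    print("ValueError:", err)
```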

Example 6: Simple Data Visualization Dashboard

Copy and paste this complete code:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Create sample monthly sales data
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
sales = [45000, 52000, 48000, 61000, 58000, 67000]
expenses = [30000, 32000, 31000, 38000, 36000, 41000]
profit = [s - e for s, e in zip(sales, expenses)]

# Create a dashboard: a 2x2 grid of charts, where axes[row, col] picks one chart
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# No emoji in chart text: matplotlib's default font has no emoji glyphs
fig.suptitle('BUSINESS DASHBOARD - First Half 2024', fontsize=16,
             fontweight='bold')

# Chart 1: Sales trend
axes[0, 0].plot(months, sales, marker='o', linewidth=3, markersize=10,
                color='#2ecc71')
axes[0, 0].set_title('Monthly Sales Trend', fontweight='bold')
axes[0, 0].set_ylabel('Sales ($)')
axes[0, 0].grid(True, alpha=0.3)
axes[0, 0].fill_between(months, sales, alpha=0.3, color='#2ecc71')

# Chart 2: Profit bars, with each value printed above its bar
axes[0, 1].bar(months, profit, color='#3498db', edgecolor='black',
               linewidth=1.5)
axes[0, 1].set_title('Monthly Profit', fontweight='bold')
axes[0, 1].set_ylabel('Profit ($)')
axes[0, 1].grid(axis='y', alpha=0.3)
for i, v in enumerate(profit):
    axes[0, 1].text(i, v + 500, f'${v:,}', ha='center', fontweight='bold')

# Chart 3: Sales vs Expenses side by side; each bar is shifted half a
# bar width left or right of the month's position
x = np.arange(len(months))
width = 0.35
axes[1, 0].bar(x - width/2, sales, width, label='Sales', color='#2ecc71',
               edgecolor='black')
axes[1, 0].bar(x + width/2, expenses, width, label='Expenses',
               color='#e74c3c', edgecolor='black')
axes[1, 0].set_title('Sales vs Expenses', fontweight='bold')
axes[1, 0].set_ylabel('Amount ($)')
axes[1, 0].set_xticks(x)
axes[1, 0].set_xticklabels(months)
axes[1, 0].legend()
axes[1, 0].grid(axis='y', alpha=0.3)

# Chart 4: Profit margin percentage, with a dashed line at the average
profit_margin = [(p / s) * 100 for p, s in zip(profit, sales)]
axes[1, 1].plot(months, profit_margin, marker='s', linewidth=3, markersize=10,
                color='#9b59b6')
axes[1, 1].set_title('Profit Margin %', fontweight='bold')
axes[1, 1].set_ylabel('Margin (%)')
axes[1, 1].grid(True, alpha=0.3)
axes[1, 1].axhline(y=np.mean(profit_margin), color='r', linestyle='--',
                   label=f'Average: {np.mean(profit_margin):.1f}%')
axes[1, 1].legend()

plt.tight_layout()
plt.show()

# Print summary
print("\n💼 BUSINESS SUMMARY (Jan-Jun 2024)")
print("=" * 50)
print(f"Total Sales: ${sum(sales):,}")
print(f"Total Expenses: ${sum(expenses):,}")
print(f"Total Profit: ${sum(profit):,}")
print(f"Average Monthly Sales: ${np.mean(sales):,.2f}")
print(f"Average Profit Margin: {np.mean(profit_margin):.1f}%")
print(f"Best Month: {months[sales.index(max(sales))]} (${max(sales):,})")
print(f"Growth Rate (Jan to Jun): {((sales[-1] - sales[0]) / sales[0] * 100):.1f}%")
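The dashboard keeps each series in a separate Python list. Putting them into one DataFrame instead lets pandas compute the derived columns and answer questions like "best month" directly. A sketch with the same six months of data; the column names are illustrative:

```python
import pandas as pd

monthly = pd.DataFrame({
    'Sales': [45000, 52000, 48000, 61000, 58000, 67000],
    'Expenses': [30000, 32000, 31000, 38000, 36000, 41000],
}, index=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'])

# New columns are computed for every month at once
monthly['Profit'] = monthly['Sales'] - monthly['Expenses']
monthly['Margin %'] = monthly['Profit'] / monthly['Sales'] * 100

# pct_change() gives month-over-month growth; the first month has no
# previous month, so its value is NaN
monthly['Sales growth %'] = monthly['Sales'].pct_change() * 100

print(monthly.round(1))
print("Best month:", monthly['Sales'].idxmax())
print("Total profit:", monthly['Profit'].sum())
```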

Quick Troubleshooting
Problem: "ModuleNotFoundError: No module named 'numpy'"

Solution: You need to install the library. Open terminal/command prompt:

pip install numpy
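If the error persists after pip install, pip may have installed the library into a different Python than the one running your script (common when several Pythons are installed). Running pip through the interpreter itself avoids that mismatch. A minimal sketch that prints which Python is running and the matching install command; pip_command_for_this_python is a hypothetical helper name:

```python
import sys

def pip_command_for_this_python(*packages):
    """Return a pip command that installs into the running interpreter."""
    # The quotes guard against spaces in the path (e.g. under "Program Files")
    return f'"{sys.executable}" -m pip install ' + " ".join(packages)

print("This script runs on:", sys.executable)
print("Install missing libraries with:")
print(pip_command_for_this_python("numpy"))
```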

Problem: "pip is not recognized"

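Solution: Python was not added to PATH. Reinstall Python and check "Add Python to PATH", or run pip through Python instead: python -m pip install numpy (on Windows, py -m pip install numpy also works).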

Problem: Code doesn't run

Solution:

1. Make sure you saved the file with a .py extension
2. Run it by pressing F5 (in IDLE) or clicking the Run button
3. Or open a terminal and type: python [Link] (use python3 on Mac/Linux)

Problem: Chart doesn't show

Solution: Make sure import matplotlib.pyplot as plt is at the top, and add this at the end of your code:

plt.show()

What to Do Next
1. Start with Example 1 - Get comfortable with NumPy
2. Try Example 2 - Learn about data tables
3. Move to Example 3 - Create your first chart
4. Practice Example 4 - Work with CSV files
5. Explore Examples 5 and 6 - Filter data and build a dashboard
6. Experiment! - Change numbers, add more data, try different colors
