Python Data Analysis

This document provides a comprehensive guide for performing data analysis using Python, including installation steps for Python and necessary libraries like NumPy, Pandas, Matplotlib, and Seaborn. It includes example codes for basic data analysis tasks, such as creating arrays, data tables, visualizations, and working with CSV files. Additionally, it offers troubleshooting tips for common issues encountered during setup and execution.

Uploaded by

lucasmmartin02

Python Data Analysis

STEP 1: Install Python

For Windows:

1. Go to [Link]
2. Download Python (latest version)
3. IMPORTANT: Check "Add Python to PATH" during installation
4. Click "Install Now"

For Mac:

1. Open Terminal
2. Type: python3 --version (to check if already installed)
3. If not installed, download from [Link]

For Linux:

Python is usually pre-installed. Open terminal and type:

python3 --version

STEP 2: Install Required Libraries

Open Command Prompt (Windows) or Terminal (Mac/Linux) and paste these commands one by one:

pip install numpy

Press Enter, wait for it to finish, then:

pip install pandas

Press Enter, wait for it to finish, then:

pip install matplotlib

Press Enter, wait for it to finish, then:

pip install seaborn

OR Install all at once:

pip install numpy pandas matplotlib seaborn
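
To confirm that the installs worked, a small check like the one below can help. It is a minimal sketch: the helper name `missing_packages` is made up for this example, and it only tests whether Python can locate each package, not whether the package works correctly.

```python
import importlib.util


def missing_packages(names):
    """Return the names that Python cannot find in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


required = ["numpy", "pandas", "matplotlib", "seaborn"]
missing = missing_packages(required)

if missing:
    print("Not installed yet:", ", ".join(missing))
    print("Try: pip install " + " ".join(missing))
else:
    print("All required libraries are installed.")
```

Save it as a .py file and run it with the same Python you installed in Step 1, since each Python installation keeps its own set of libraries.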

STEP 3: Choose Your Code Editor

Option 1: IDLE (Comes with Python)

• Already installed with Python
• Simple and easy for beginners
• To open: search for "IDLE" on your computer

Option 2: Visual Studio Code (Recommended)

• Download from: [Link]
• Install the Python extension
• Create a new file with a .py extension

Option 3: Jupyter Notebook (Best for Data Analysis)

Install by running:

pip install jupyter

Start it by typing:

jupyter notebook

READY-TO-RUN CODE EXAMPLE

Example 1: Hello World with NumPy

Just copy this entire code and paste it in your editor, then run!

# This is your first data analysis program!

import numpy as np

# Create an array of numbers

numbers = np.array([10, 20, 30, 40, 50])

print("My numbers:", numbers)

print("Average:", numbers.mean())
print("Maximum:", numbers.max())
print("Minimum:", numbers.min())
print("Sum:", numbers.sum())

# Do some math
doubled = numbers * 2
print("Doubled:", doubled)

squared = numbers ** 2
print("Squared:", squared)

Expected Output:

My numbers: [10 20 30 40 50]

Average: 30.0
Maximum: 50
Minimum: 10
Sum: 150
Doubled: [ 20  40  60  80 100]
Squared: [ 100  400  900 1600 2500]
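
The "Do some math" step works because NumPy applies arithmetic to every element of an array. A plain Python list behaves differently, which the short comparison below illustrates. This is only a sketch; the variable names `plain_list` and `repeated` are made up for the comparison.

```python
import numpy as np

plain_list = [10, 20, 30, 40, 50]
numbers = np.array(plain_list)

# Multiplying a list repeats it; multiplying an array scales each element.
repeated = plain_list * 2
doubled = numbers * 2

print("List * 2:", repeated)
print("Array * 2:", doubled)
```

The list version ends up with ten items (the original five, twice), while the array version keeps five items, each doubled.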

📝 Example 2: Simple Data Table with Pandas

Copy and paste this complete code:

import pandas as pd

# Create a simple grade table

data = {
    'Student': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
    'Math': [85, 92, 78, 95, 88],
    'English': [90, 85, 92, 88, 94],
    'Science': [88, 90, 85, 92, 91]
}

# Make it into a table

df = pd.DataFrame(data)

# Show the table

print("GRADE TABLE:")
print(df)
print("\n")

# Calculate average for each student

df['Average'] = (df['Math'] + df['English'] + df['Science']) / 3
print("WITH AVERAGES:")
print(df)
print("\n")

# Find the best student

best_student = df.loc[df['Average'].idxmax(), 'Student']
print(f"Best student: {best_student}")

# Class statistics
print(f"\nClass average in Math: {df['Math'].mean()}")
print(f"Highest Math score: {df['Math'].max()}")
print(f"Lowest Math score: {df['Math'].min()}")

Expected Output:
GRADE TABLE:
   Student  Math  English  Science
0    Alice    85       90       88
1      Bob    92       85       90
2  Charlie    78       92       85
3    David    95       88       92
4     Emma    88       94       91


WITH AVERAGES:
   Student  Math  English  Science    Average
0    Alice    85       90       88  87.666667
1      Bob    92       85       90  89.000000
2  Charlie    78       92       85  85.000000
3    David    95       88       92  91.666667
4     Emma    88       94       91  91.000000


Best student: David

Class average in Math: 87.6
Highest Math score: 95
Lowest Math score: 78
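
The per-student average above is written out by hand as a sum divided by 3. As an alternative sketch, pandas can average across columns with `mean(axis=1)`, which avoids updating the formula when a subject is added. The list name `subjects` is an assumption made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Student': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
    'Math': [85, 92, 78, 95, 88],
    'English': [90, 85, 92, 88, 94],
    'Science': [88, 90, 85, 92, 91],
})

subjects = ['Math', 'English', 'Science']

# axis=1 averages across the columns of each row (one value per student)
df['Average'] = df[subjects].mean(axis=1)

# The default axis=0 averages down each column (one value per subject)
subject_means = df[subjects].mean()

print(df[['Student', 'Average']])
print(subject_means)
```

Both approaches give the same student averages; the second call shows the class average for every subject at once instead of only Math.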

📝 Example 3: Simple Bar Chart

Copy and paste this complete code:

import matplotlib.pyplot as plt

# Data for chart
fruits = ['Apples', 'Bananas', 'Oranges', 'Grapes', 'Mangoes']
sales = [45, 60, 38, 52, 48]

# Create bar chart
plt.figure(figsize=(10, 6))
plt.bar(fruits, sales, color='skyblue', edgecolor='black', linewidth=1.5)

# Add labels and title
plt.xlabel('Fruits', fontsize=12)
plt.ylabel('Sales (units)', fontsize=12)
plt.title('Fruit Sales This Week', fontsize=14, fontweight='bold')

# Add value labels on top of bars
for i, value in enumerate(sales):
    plt.text(i, value + 1, str(value), ha='center', fontweight='bold')

# Show the chart
plt.show()

What you'll see: A sky-blue bar chart of fruit sales, with the value printed above each bar!

Example 4: Working with Real Data (CSV File)

First, let's create a sample CSV file:

import pandas as pd

# Create sample data

employee_data = {
    'Name': ['John', 'Sarah', 'Mike', 'Lisa', 'Tom', 'Emma', 'David', 'Anna'],
    'Department': ['Sales', 'IT', 'Sales', 'HR', 'IT', 'Sales', 'HR', 'IT'],
    'Age': [28, 32, 35, 29, 41, 26, 38, 30],
    'Salary': [50000, 75000, 55000, 48000, 82000, 52000, 62000, 71000],
    'Experience': [3, 7, 8, 4, 15, 2, 10, 6]
}

df = pd.DataFrame(employee_data)

# Save to CSV file

df.to_csv('[Link]', index=False)
print("CSV file created: [Link]")
print("\nPreview:")
print(df)

Now, let's read and analyze it:

import pandas as pd
import matplotlib.pyplot as plt

# Read the CSV file

df = pd.read_csv('[Link]')

print("📊 EMPLOYEE DATA ANALYSIS\n")

print("=" * 50)

# 1. Show the data

print("\n1. EMPLOYEE DATA:")
print(df)

# 2. Basic statistics
print("\n2. BASIC STATISTICS:")
print(f"Total Employees: {len(df)}")
print(f"Average Salary: ${df['Salary'].mean():,.2f}")
print(f"Average Age: {df['Age'].mean():.1f} years")
print(f"Average Experience: {df['Experience'].mean():.1f} years")

# 3. Department breakdown
print("\n3. EMPLOYEES PER DEPARTMENT:")
print(df['Department'].value_counts())

# 4. Salary by department
print("\n4. AVERAGE SALARY BY DEPARTMENT:")
dept_salary = df.groupby('Department')['Salary'].mean()
print(dept_salary)

# 5. Find highest paid employee

highest_paid = df.loc[df['Salary'].idxmax()]
print(f"\n5. HIGHEST PAID EMPLOYEE:")
print(f"Name: {highest_paid['Name']}")
print(f"Department: {highest_paid['Department']}")
print(f"Salary: ${highest_paid['Salary']:,}")

# 6. Create visualization
plt.figure(figsize=(12, 5))

# Plot 1: Salary by Department
plt.subplot(1, 2, 1)
dept_salary.plot(kind='bar', color=['#FF6B6B', '#4ECDC4', '#45B7D1'],
                 edgecolor='black')
plt.title('Average Salary by Department', fontweight='bold', fontsize=12)
plt.xlabel('Department')
plt.ylabel('Average Salary ($)')
plt.xticks(rotation=45)
plt.grid(axis='y', alpha=0.3)

# Plot 2: Age Distribution
plt.subplot(1, 2, 2)
plt.hist(df['Age'], bins=6, color='#95E1D3', edgecolor='black', alpha=0.7)
plt.title('Age Distribution', fontweight='bold', fontsize=12)
plt.xlabel('Age')
plt.ylabel('Number of Employees')
plt.grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.show()

print("\n✅ Analysis complete!")
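
Step 4 of the analysis computes only the mean salary per department. As a possible extension, `agg` can return several statistics in one table. The sketch below rebuilds just the two columns it needs from the sample data instead of reading the CSV file, and the name `summary` is made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['Sales', 'IT', 'Sales', 'HR', 'IT', 'Sales', 'HR', 'IT'],
    'Salary': [50000, 75000, 55000, 48000, 82000, 52000, 62000, 71000],
})

# One row per department, one column per statistic
summary = df.groupby('Department')['Salary'].agg(['count', 'mean', 'min', 'max'])
print(summary)
```

Each name in the list passed to `agg` becomes a column, so adding 'median' or 'std' to the list extends the table without extra code.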

Example 5: Interactive Data Filtering

Copy and paste this complete code:

import pandas as pd

# Create sample product data

products = pd.DataFrame({
    'Product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Headphones',
                'Webcam', 'Speaker', 'USB Drive', 'Hard Drive', 'Router'],
    'Category': ['Computer', 'Accessory', 'Accessory', 'Computer', 'Accessory',
                 'Accessory', 'Accessory', 'Storage', 'Storage', 'Network'],
    'Price': [899, 25, 75, 299, 89, 65, 120, 15, 85, 110],
    'Stock': [15, 150, 80, 25, 60, 45, 30, 200, 40, 35],
    'Rating': [4.5, 4.2, 4.3, 4.7, 4.4, 4.0, 4.1, 3.9, 4.3, 4.2]
})

print("🛍️ PRODUCT INVENTORY SYSTEM\n")

print("=" * 70)

# Show all products

print("\n📦 ALL PRODUCTS:")
print(products.to_string(index=False))
# Filter: Products under $100
print("\n💰 PRODUCTS UNDER $100:")
cheap_products = products[products['Price'] < 100]
print(cheap_products[['Product', 'Price']].to_string(index=False))

# Filter: High rated products (4.3 and above)

print("\n⭐ HIGH RATED PRODUCTS (4.3+):")
top_rated = products[products['Rating'] >= 4.3]
print(top_rated[['Product', 'Rating']].to_string(index=False))

# Filter: Low stock products (less than 50)

print("\n⚠️ LOW STOCK ALERT (Less than 50 units):")
low_stock = products[products['Stock'] < 50]
print(low_stock[['Product', 'Stock']].to_string(index=False))

# Filter: Accessories only

print("\n🖱️ ACCESSORIES:")
accessories = products[products['Category'] == 'Accessory']
print(accessories[['Product', 'Price']].to_string(index=False))

# Multiple filters: Cheap AND high rated

print("\n🎯 BEST VALUE (Under $100 AND rating 4.2+):")
best_value = products[(products['Price'] < 100) & (products['Rating'] >= 4.2)]
print(best_value[['Product', 'Price', 'Rating']].to_string(index=False))

# Summary statistics
print("\n📊 SUMMARY STATISTICS:")
print(f"Total Products: {len(products)}")
print(f"Average Price: ${products['Price'].mean():.2f}")
print(f"Most Expensive: {products.loc[products['Price'].idxmax(), 'Product']} (${products['Price'].max()})")
print(f"Cheapest: {products.loc[products['Price'].idxmin(), 'Product']} (${products['Price'].min()})")
print(f"Average Rating: {products['Rating'].mean():.2f} stars")
print(f"Total Inventory Value: ${(products['Price'] * products['Stock']).sum():,.2f}")
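
The filters above combine conditions only with `&` (AND). The sketch below shows three related patterns on a five-product subset of the same data: `|` for OR, `~` for NOT, and `isin` for matching several values. The thresholds and variable names are chosen only for illustration.

```python
import pandas as pd

products = pd.DataFrame({
    'Product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'USB Drive'],
    'Category': ['Computer', 'Accessory', 'Accessory', 'Computer', 'Storage'],
    'Price': [899, 25, 75, 299, 15],
    'Stock': [15, 150, 80, 25, 200],
})

# OR: expensive items, or items with a lot of stock
pricey_or_plentiful = products[(products['Price'] > 250) | (products['Stock'] > 100)]

# NOT: everything that is not an accessory
not_accessories = products[~(products['Category'] == 'Accessory')]

# isin: match any of several categories
selected = products[products['Category'].isin(['Computer', 'Storage'])]

print(pricey_or_plentiful[['Product', 'Price', 'Stock']].to_string(index=False))
print(not_accessories[['Product', 'Category']].to_string(index=False))
print(selected[['Product', 'Category']].to_string(index=False))
```

Each condition needs its own parentheses because `&`, `|`, and `~` bind more tightly than comparisons such as `<` and `>=`.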

Example 6: Simple Data Visualization Dashboard

Copy and paste this complete code:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Create sample monthly sales data

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
sales = [45000, 52000, 48000, 61000, 58000, 67000]
expenses = [30000, 32000, 31000, 38000, 36000, 41000]
profit = [s - e for s, e in zip(sales, expenses)]

# Create a dashboard with multiple charts

fig, axes = plt.subplots(2, 2, figsize=(14, 10))
fig.suptitle('📊 BUSINESS DASHBOARD - First Half 2024', fontsize=16,
             fontweight='bold')

# Chart 1: Sales trend

axes[0, 0].plot(months, sales, marker='o', linewidth=3, markersize=10,
color='#2ecc71')
axes[0, 0].set_title('Monthly Sales Trend', fontweight='bold')
axes[0, 0].set_ylabel('Sales ($)')
axes[0, 0].grid(True, alpha=0.3)
axes[0, 0].fill_between(months, sales, alpha=0.3, color='#2ecc71')

# Chart 2: Profit bars

axes[0, 1].bar(months, profit, color='#3498db', edgecolor='black',
linewidth=1.5)
axes[0, 1].set_title('Monthly Profit', fontweight='bold')
axes[0, 1].set_ylabel('Profit ($)')
axes[0, 1].grid(axis='y', alpha=0.3)
for i, v in enumerate(profit):
    axes[0, 1].text(i, v + 500, f'${v:,}', ha='center', fontweight='bold')

# Chart 3: Sales vs Expenses comparison

x = np.arange(len(months))
width = 0.35
axes[1, 0].bar(x - width/2, sales, width, label='Sales', color='#2ecc71',
edgecolor='black')
axes[1, 0].bar(x + width/2, expenses, width, label='Expenses',
color='#e74c3c', edgecolor='black')
axes[1, 0].set_title('Sales vs Expenses', fontweight='bold')
axes[1, 0].set_ylabel('Amount ($)')
axes[1, 0].set_xticks(x)
axes[1, 0].set_xticklabels(months)
axes[1, 0].legend()
axes[1, 0].grid(axis='y', alpha=0.3)

# Chart 4: Profit margin percentage

profit_margin = [(p/s)*100 for p, s in zip(profit, sales)]
axes[1, 1].plot(months, profit_margin, marker='s', linewidth=3, markersize=10,
color='#9b59b6')
axes[1, 1].set_title('Profit Margin %', fontweight='bold')
axes[1, 1].set_ylabel('Margin (%)')
axes[1, 1].grid(True, alpha=0.3)
axes[1, 1].axhline(y=np.mean(profit_margin), color='r', linestyle='--',
                   label=f'Average: {np.mean(profit_margin):.1f}%')
axes[1, 1].legend()

plt.tight_layout()
plt.show()

# Print summary
print("\n💼 BUSINESS SUMMARY (Jan-Jun 2024)")
print("=" * 50)
print(f"Total Sales: ${sum(sales):,}")
print(f"Total Expenses: ${sum(expenses):,}")
print(f"Total Profit: ${sum(profit):,}")
print(f"Average Monthly Sales: ${np.mean(sales):,.2f}")
print(f"Average Profit Margin: {np.mean(profit_margin):.1f}%")
print(f"Best Month: {months[sales.index(max(sales))]} (${max(sales):,})")
print(f"Growth Rate: {((sales[-1] - sales[0]) / sales[0] * 100):.1f}%")
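
The growth rate in the summary compares only January with June. A month-over-month view can be sketched with plain Python by pairing each month with the one before it. The list name `growth` is an assumption made for this example.

```python
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
sales = [45000, 52000, 48000, 61000, 58000, 67000]

# Pair each month with the previous one and compute the percent change
growth = [
    (current - previous) / previous * 100
    for previous, current in zip(sales, sales[1:])
]

# January has no previous month, so the report starts at February
for month, pct in zip(months[1:], growth):
    print(f"{month}: {pct:+.1f}%")
```

The `+` in the format specifier prints an explicit sign, which makes months with falling sales easy to spot.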

Quick Troubleshooting
Problem: "ModuleNotFoundError: No module named 'numpy'"

Solution: You need to install the library. Open terminal/command prompt:

pip install numpy

Problem: "pip is not recognized"

Solution: Python was not added to PATH. Reinstall Python and check "Add Python to PATH" during installation.

Problem: Code doesn't run

Solution:

1. Make sure you saved the file with a .py extension
2. Run it by pressing F5 (in IDLE) or clicking the Run button
3. Or open a terminal and type: python [Link]

Problem: Chart doesn't show

Solution: Add this at the end of your code:

plt.show()
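
If no window appears even with that line (for example, when the code runs on a machine without a screen), saving the chart to an image file is a possible workaround. The sketch below assumes a non-interactive setup and selects the "Agg" backend for that reason; the file name `fruit_sales.png` is hypothetical.

```python
import os

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: draws to files, opens no window
import matplotlib.pyplot as plt

fruits = ['Apples', 'Bananas', 'Oranges', 'Grapes', 'Mangoes']
sales = [45, 60, 38, 52, 48]

plt.figure(figsize=(10, 6))
plt.bar(fruits, sales, color='skyblue', edgecolor='black')
plt.title('Fruit Sales This Week')

output_file = 'fruit_sales.png'  # hypothetical file name
plt.savefig(output_file, dpi=150, bbox_inches='tight')
plt.close()

print("Chart saved to", os.path.abspath(output_file))
```

Open the saved file with any image viewer. On a normal desktop setup, leave out the `matplotlib.use("Agg")` line so that `plt.show()` can still open a window.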

What to Do Next
1. Start with Example 1 - Get comfortable with NumPy
2. Try Example 2 - Learn about data tables
3. Move to Example 3 - Create your first chart
4. Practice Example 4 - Work with CSV files
5. Experiment! - Change numbers, add more data, try different colors
