Complete Matplotlib Pyplot Guide for Data Visualization
Table of Contents
1. Introduction and Setup
2. Basic Plot Types
3. Customizing Plots
4. Axes and Subplots
5. Advanced Plot Types
6. Styling and Themes
7. Annotations and Text
8. 3D Plotting
9. Interactive Features
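10. Saving and Exporting
11. Performance and Optimization
12. Working with DataFrames
13. Statistical Plots
14. Advanced Customization Techniques
15. Best Practices and Common Patterns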
1. Introduction and Setup
What is Matplotlib?
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in
Python. Pyplot is its state-based interface that provides a MATLAB-like plotting framework.
Basic Setup
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# Enable inline plotting in Jupyter notebooks
%matplotlib inline
Key Concepts
● Figure: The entire window or page that everything is drawn on
● Axes: The area on which data is plotted (subplot)
● Artist: Everything visible on the figure (lines, text, etc.)
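The Figure → Axes → Artist hierarchy can be inspected directly. This is a minimal sketch. The `matplotlib.use("Agg")` line is an assumption added only so it runs without a display; drop it in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# One Figure containing one Axes
fig, ax = plt.subplots()

# plot() returns a list of Line2D artists; unpack the single line
x = np.linspace(0, 10, 100)
line, = ax.plot(x, np.sin(x))
title = ax.set_title("Hierarchy demo")  # the title is a Text artist

# Each artist knows the Axes and Figure it belongs to
print(line.axes is ax, ax.figure is fig, ax in fig.axes)
```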
2. Basic Plot Types
2.1 Line Plots
Basic Syntax
plt.plot(x, y, format_string, **kwargs)
Parameters
● x: Array-like, x-axis data
● y: Array-like, y-axis data
● format_string: String specifying color, marker, linestyle (e.g., 'ro-')
● label: String, legend label
● linewidth or lw: Float, line width
● linestyle or ls: String, line style ('-', '--', '-.', ':')
● color or c: Color specification
● marker: Marker style ('o', 's', '^', etc.)
● markersize or ms: Float, marker size
● alpha: Float (0-1), transparency
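A format string is shorthand for the `color`, `marker` and `linestyle` keywords listed above. The sketch below draws one line each way and compares the resulting properties (Agg backend only for headless runs):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import numpy as np

x = np.linspace(0, 10, 20)
fig, ax = plt.subplots()

# Shorthand: color 'r', marker 'o', solid line '-'
short, = ax.plot(x, np.sin(x), 'ro-')
# Explicit keyword arguments describing the same style
explicit, = ax.plot(x, np.cos(x), color='r', marker='o', linestyle='-')

# Compare the resulting line properties
for line in (short, explicit):
    print(mcolors.to_hex(line.get_color()), line.get_marker(), line.get_linestyle())
```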
Implementation Examples
# Simple line plot
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.plot(x, y)
plt.title('Basic Line Plot')
plt.xlabel('X values')
plt.ylabel('Y values')
plt.show()
# Multiple lines with customization
plt.figure(figsize=(10, 6))
plt.plot(x, np.sin(x), 'b-', label='sin(x)', linewidth=2)
plt.plot(x, np.cos(x), 'r--', label='cos(x)', linewidth=2)
plt.plot(x, np.tan(x), 'g:', label='tan(x)', alpha=0.7)
plt.legend()
plt.grid(True)
plt.ylim(-2, 2)
plt.show()
2.2 Scatter Plots
Basic Syntax
plt.scatter(x, y, s=None, c=None, **kwargs)
Parameters
● s: Float or array-like, marker sizes
● c: Array-like or color, marker colors
● marker: Marker style
● cmap: Colormap name
● alpha: Transparency
● edgecolors: Edge color
● linewidths: Edge line width
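Note that `s` is a marker *area* in points squared, not a diameter. To get markers of a given diameter, square it. Values passed to `c` are mapped through `cmap`. This is a sketch on random data, with the Agg backend assumed for headless runs:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.standard_normal(50), rng.standard_normal(50)
values = rng.random(50)           # mapped to colors via the colormap
diameters = np.full(50, 10.0)     # desired marker diameter in points

fig, ax = plt.subplots()
# s is an area in points**2, so square the diameter
sc = ax.scatter(x, y, s=diameters**2, c=values, cmap='viridis')
fig.colorbar(sc, ax=ax)

# The collection stores the sizes and the color data it was given
print(sc.get_sizes()[:3], sc.get_array()[:3])
```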
Implementation
# Basic scatter plot
np.random.seed(42)
x = np.random.randn(100)
y = np.random.randn(100)
colors = np.random.rand(100)
sizes = 1000 * np.random.rand(100)
plt.figure(figsize=(10, 6))
plt.scatter(x, y, c=colors, s=sizes, alpha=0.6, cmap='viridis')
plt.colorbar()
plt.title('Scatter Plot with Variable Colors and Sizes')
plt.xlabel('X values')
plt.ylabel('Y values')
plt.show()
2.3 Bar Charts
Basic Syntax
plt.bar(x, height, width=0.8, **kwargs)
plt.barh(y, width, height=0.8, **kwargs) # Horizontal bars
Parameters
● x: Array-like, bar positions
● height: Array-like, bar heights
● width: Float or array-like, bar widths
● bottom: Float or array-like, bottom positions (for stacking)
● color: Color specification
● edgecolor: Edge color
● linewidth: Edge line width
● align: Alignment ('center', 'edge')
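The `bottom` parameter generalizes to any number of stacked series if you keep a running total. This is a minimal sketch. 'Product C' is hypothetical data added for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

categories = ['Q1', 'Q2', 'Q3', 'Q4']
# Sales per product; 'Product C' is hypothetical
series = {
    'Product A': [20, 35, 30, 35],
    'Product B': [25, 25, 15, 30],
    'Product C': [10, 12, 18, 14],
}

fig, ax = plt.subplots()
bottom = np.zeros(len(categories))   # running top of each stack
containers = []
for name, vals in series.items():
    bars = ax.bar(categories, vals, bottom=bottom, label=name)
    containers.append(bars)
    bottom += np.asarray(vals)       # next series starts where this one ends
ax.legend()

# Height of each full stack
print(bottom)
```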
Implementation
# Vertical bar chart
categories = ['A', 'B', 'C', 'D', 'E']
values = [23, 45, 56, 78, 32]
plt.figure(figsize=(10, 6))
bars = plt.bar(categories, values, color=['red', 'blue', 'green', 'orange', 'purple'])
plt.title('Bar Chart Example')
plt.xlabel('Categories')
plt.ylabel('Values')
# Add value labels on bars
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height + 1,
             f'{height}', ha='center', va='bottom')
plt.show()
# Horizontal bar chart
plt.figure(figsize=(10, 6))
plt.barh(categories, values, color='skyblue')
plt.title('Horizontal Bar Chart')
plt.xlabel('Values')
plt.ylabel('Categories')
plt.show()
# Stacked bar chart
categories = ['Q1', 'Q2', 'Q3', 'Q4']
values1 = [20, 35, 30, 35]
values2 = [25, 25, 15, 30]
plt.figure(figsize=(10, 6))
plt.bar(categories, values1, label='Product A')
plt.bar(categories, values2, bottom=values1, label='Product B')
plt.title('Stacked Bar Chart')
plt.xlabel('Quarters')
plt.ylabel('Sales')
plt.legend()
plt.show()
2.4 Histograms
Basic Syntax
plt.hist(x, bins=None, **kwargs)
Parameters
● x: Array-like, data values
● bins: Integer or array-like, number of bins or bin edges
● range: Tuple, range of values to include
● density: Boolean, normalize to show probability density
● cumulative: Boolean, cumulative histogram
● histtype: String, histogram type ('bar', 'step', 'stepfilled')
● orientation: String, 'horizontal' or 'vertical'
● color: Color specification
● alpha: Transparency
● edgecolor: Edge color
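`hist` also returns the computed bin values and bin edges. With `density=True`, the bar heights times the bin widths sum to 1. A short check on synthetic normal data (Agg backend for headless runs):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(100, 15, 1000)

fig, ax = plt.subplots()
# hist returns the bin values, the bin edges, and the drawn patches
counts, edges, patches = ax.hist(data, bins=30, density=True)

# With density=True, bar heights times bin widths sum to 1
area = np.sum(counts * np.diff(edges))
print(len(counts), len(edges), round(area, 6))
```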
Implementation
# Generate sample data: 1000 draws from a normal distribution (mean 100, std 15)
np.random.seed(42)
data = np.random.normal(100, 15, 1000)
plt.figure(figsize=(12, 4))
# Basic histogram
plt.subplot(1, 3, 1)
plt.hist(data, bins=30, color='skyblue', edgecolor='black')
plt.title('Basic Histogram')
plt.xlabel('Values')
plt.ylabel('Frequency')
# Normalized histogram (probability density)
plt.subplot(1, 3, 2)
plt.hist(data, bins=30, density=True, color='lightgreen', alpha=0.7)
plt.title('Normalized Histogram')
plt.xlabel('Values')
plt.ylabel('Density')
# Cumulative histogram
plt.subplot(1, 3, 3)
plt.hist(data, bins=30, cumulative=True, color='coral', alpha=0.7)
plt.title('Cumulative Histogram')
plt.xlabel('Values')
plt.ylabel('Cumulative Frequency')
plt.tight_layout()
plt.show()
2.5 Pie Charts
Basic Syntax
plt.pie(x, labels=None, **kwargs)
Parameters
● x: Array-like, wedge sizes
● labels: List, wedge labels
● colors: List, wedge colors
● autopct: String or function, label format
● startangle: Float, starting angle
● explode: Array-like, wedge separation
● shadow: Boolean, drop shadow
● textprops: Dict, text properties
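`autopct` can also be a function. Matplotlib calls it with each wedge's percentage, so it can show raw counts as well. This is a sketch using the sizes from the example that follows (Agg backend for headless runs):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

sizes = [30, 25, 20, 15, 10]
labels = ['A', 'B', 'C', 'D', 'E']
total = sum(sizes)

def pct_and_count(pct):
    # Matplotlib passes the wedge's percentage; convert back to a raw count
    return f"{pct:.1f}%\n({pct * total / 100:.0f})"

fig, ax = plt.subplots()
# With autopct set, pie returns (wedges, label texts, autopct texts)
wedges, texts, autotexts = ax.pie(sizes, labels=labels, autopct=pct_and_count,
                                  startangle=90)
print([t.get_text() for t in autotexts])
```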
Implementation
# Basic pie chart
sizes = [30, 25, 20, 15, 10]
labels = ['A', 'B', 'C', 'D', 'E']
colors = ['gold', 'lightcoral', 'lightskyblue', 'lightgreen', 'pink']
explode = (0.1, 0, 0, 0, 0) # explode first slice
plt.figure(figsize=(10, 8))
plt.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%',
startangle=90, explode=explode, shadow=True)
plt.title('Pie Chart Example')
plt.axis('equal') # Equal aspect ratio ensures circular pie
plt.show()
3. Customizing Plots
3.1 Colors and Colormaps
Color Specifications
# Different ways to specify colors
plt.plot(x, y, color='red') # Named colors
plt.plot(x, y, color='r') # Single letter
plt.plot(x, y, color='#FF0000') # Hex codes
plt.plot(x, y, color=(1, 0, 0)) # RGB tuple
plt.plot(x, y, color=(1, 0, 0, 0.5)) # RGBA tuple
Colormaps
# Common colormaps
colormaps = ['viridis', 'plasma', 'inferno', 'magma',
'Blues', 'Reds', 'Greens', 'coolwarm', 'seismic']
# Using colormaps
plt.scatter(x, y, c=y, cmap='viridis')  # c needs one value per point; here each point is colored by its y value
plt.colorbar(label='Color Scale')
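Under the hood, a colormap maps numbers in [0, 1] to RGBA colors, and a `Normalize` instance first rescales the data into that range. This sketch assumes Matplotlib 3.5 or later for the `matplotlib.colormaps` registry:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
import numpy as np

values = np.array([0.0, 2.5, 5.0, 7.5, 10.0])
# Map the data range [0, 10] onto [0, 1], then look colors up in the colormap
norm = Normalize(vmin=0, vmax=10)
cmap = matplotlib.colormaps['viridis']   # registry available in Matplotlib >= 3.5
rgba = cmap(norm(values))               # one RGBA row per value

fig, ax = plt.subplots()
sc = ax.scatter(values, values, c=values, cmap=cmap, norm=norm)
fig.colorbar(sc, ax=ax, label='Color Scale')
print(rgba.shape)
```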
3.2 Markers and Line Styles
Marker Styles
markers = ['o', 's', '^', 'v', '<', '>', 'd', 'p', 'h', '*', '+', 'x']
# Line styles
linestyles = ['-', '--', '-.', ':']
# Format strings combine color, marker, and line style
plt.plot(x, y, 'ro-') # Red circles with solid line
plt.plot(x, y, 'b^--') # Blue triangles with dashed line
3.3 Labels and Titles
Comprehensive Labeling
plt.figure(figsize=(10, 6))
plt.plot(x, y)
# Titles and labels with customization
plt.title('Main Title', fontsize=16, fontweight='bold', pad=20)
plt.xlabel('X-axis Label', fontsize=12, fontweight='bold')
plt.ylabel('Y-axis Label', fontsize=12, fontweight='bold')
# Figure-level title (placed above the Axes title) using suptitle
plt.suptitle('Figure Title', fontsize=18, y=0.98)
# Custom text positioning
plt.text(0.5, 0.5, 'Custom Text', transform=plt.gca().transAxes,
fontsize=12, ha='center', va='center')
plt.show()
3.4 Legends
Legend Customization
plt.figure(figsize=(10, 6))
plt.plot(x, np.sin(x), label='sin(x)')
plt.plot(x, np.cos(x), label='cos(x)')
# Legend with customization
plt.legend(loc='upper right', # Location
frameon=True, # Frame on/off
fancybox=True, # Rounded corners
shadow=True, # Drop shadow
ncol=1, # Number of columns
fontsize=12, # Font size
title='Functions', # Legend title
title_fontsize=14) # Title font size
plt.show()
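When the legend would cover data, `bbox_to_anchor` can place it outside the Axes. Its coordinates are in Axes fractions, so `(1.02, 1)` is just right of the top-right corner. A minimal sketch (Agg backend for headless runs):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, np.sin(x), label='sin(x)')
ax.plot(x, np.cos(x), label='cos(x)')

# Anchor the legend's upper-left corner just right of the Axes (axes coordinates)
leg = ax.legend(loc='upper left', bbox_to_anchor=(1.02, 1), borderaxespad=0)

# When saving, bbox_inches='tight' keeps the outside legend inside the image
print([t.get_text() for t in leg.get_texts()])
```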
3.5 Grid Customization
Grid Options
plt.figure(figsize=(10, 6))
plt.plot(x, y)
# Grid customization
plt.grid(True, # Enable grid
linestyle='-', # Line style
linewidth=0.5, # Line width
alpha=0.7, # Transparency
color='gray') # Color
# Fine control over major and minor grids
plt.grid(True, which='major', linestyle='-', alpha=0.7)
plt.grid(True, which='minor', linestyle=':', alpha=0.4)
plt.minorticks_on()
plt.show()
4. Axes and Subplots
4.1 Figure and Axes Management
Creating Figures
# Method 1: pyplot interface
plt.figure(figsize=(12, 8))
plt.plot(x, y)
plt.show()
# Method 2: Object-oriented interface
fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(x, y)
plt.show()
# Figure parameters
fig = plt.figure(figsize=(12, 8), # Size in inches
dpi=100, # Dots per inch
facecolor='white', # Background color
edgecolor='black') # Edge color
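`figsize` is in inches, so the rendered pixel size is figsize × dpi. For the figure above this gives 12 × 100 = 1200 by 8 × 100 = 800 pixels. A quick check with the Agg canvas (assumed here so the sketch runs headless):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(12, 8), dpi=100)

# Pixel size = size in inches x dots per inch
width_px, height_px = fig.get_size_inches() * fig.dpi
fig.canvas.draw()                        # render once with the Agg canvas
print(width_px, height_px, fig.canvas.get_width_height())
```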
4.2 Subplots
Basic Subplots
# Method 1: plt.subplot()
plt.figure(figsize=(15, 10))
plt.subplot(2, 2, 1) # 2 rows, 2 columns, position 1
plt.plot(x, np.sin(x))
plt.title('sin(x)')
plt.subplot(2, 2, 2)
plt.plot(x, np.cos(x))
plt.title('cos(x)')
plt.subplot(2, 2, 3)
plt.plot(x, np.tan(x))
plt.title('tan(x)')
plt.subplot(2, 2, 4)
plt.scatter(x[::10], np.sin(x[::10]))
plt.title('Scatter')
plt.tight_layout()
plt.show()
# Method 2: plt.subplots()
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
axes[0, 0].plot(x, np.sin(x))
axes[0, 0].set_title('sin(x)')
axes[0, 1].plot(x, np.cos(x))
axes[0, 1].set_title('cos(x)')
axes[1, 0].plot(x, np.tan(x))
axes[1, 0].set_title('tan(x)')
axes[1, 0].set_ylim(-5, 5)
axes[1, 1].scatter(x[::10], np.sin(x[::10]))
axes[1, 1].set_title('Scatter')
plt.tight_layout()
plt.show()
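`plt.subplots` can also link axis limits across panels. With `sharex`/`sharey`, zooming or setting limits on one panel updates all of them. `axes.flat` iterates the grid in row-major order. This is a sketch with arbitrary functions chosen for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
# sharex/sharey link the limits of all panels
fig, axes = plt.subplots(2, 2, figsize=(10, 6), sharex=True, sharey=True)

funcs = [np.sin, np.cos, np.tanh, np.arctan]
# axes.flat iterates the 2x2 array in row-major order
for ax, f in zip(axes.flat, funcs):
    ax.plot(x, f(x))
    ax.set_title(f.__name__)

axes[0, 0].set_xlim(0, 5)          # changes every panel's x-range
print(axes[1, 1].get_xlim())
```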
Advanced Subplot Layouts
# Using GridSpec for complex layouts
from matplotlib.gridspec import GridSpec
fig = plt.figure(figsize=(15, 10))
gs = GridSpec(3, 3, figure=fig)
# Different sized subplots
ax1 = fig.add_subplot(gs[0, :]) # Top row, all columns
ax1.plot(x, np.sin(x))
ax1.set_title('Full width plot')
ax2 = fig.add_subplot(gs[1, 0]) # Middle left
ax2.plot(x, np.cos(x))
ax2.set_title('cos(x)')
ax3 = fig.add_subplot(gs[1:, 1:]) # Bottom right (2x2)
ax3.scatter(np.random.randn(100), np.random.randn(100))
ax3.set_title('Large scatter plot')
plt.tight_layout()
plt.show()
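For layouts like the GridSpec example, `plt.subplot_mosaic` describes the grid with names instead of slices. Repeating a name spans that panel, and `"."` leaves a cell empty. This is a sketch assuming Matplotlib 3.5 or later (for `layout="constrained"`):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
# Each string names a panel; repeating a name spans that panel across cells
layout = [
    ["top",  "top", "top"],
    ["left", "big", "big"],
    [".",    "big", "big"],   # "." leaves a cell empty
]
fig, axd = plt.subplot_mosaic(layout, figsize=(10, 8), layout="constrained")

axd["top"].plot(x, np.sin(x))
axd["left"].plot(x, np.cos(x))
axd["big"].scatter(np.random.randn(100), np.random.randn(100))
print(sorted(axd))
```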
4.3 Axis Customization
Axis Limits and Scales
plt.figure(figsize=(15, 5))
# Linear scale
plt.subplot(1, 3, 1)
plt.plot(x, np.exp(x/10))
plt.xlim(0, 10)
plt.ylim(0, 10)
plt.title('Linear Scale')
# Log scale
plt.subplot(1, 3, 2)
plt.plot(x, np.exp(x/10))
plt.yscale('log')
plt.title('Log Y Scale')
# Both axes log scale
plt.subplot(1, 3, 3)
plt.loglog(x[1:], x[1:]**2)
plt.title('Log-Log Scale')
plt.tight_layout()
plt.show()
Tick Customization
plt.figure(figsize=(12, 6))
plt.plot(x, np.sin(x))
# Custom tick locations and labels
plt.xticks(np.arange(0, 11, 2), # Tick positions
['Zero', 'Two', 'Four', 'Six', # Custom labels
'Eight', 'Ten'],
rotation=45, # Rotation
fontsize=12) # Font size
plt.yticks(np.arange(-1, 1.2, 0.5))
# Hide x-axis tick marks and labels (note: this also hides the custom labels set above)
plt.tick_params(axis='x', # Which axis
which='both', # Major and minor ticks
bottom=False, # Tick marks on bottom
top=False, # Tick marks on top
labelbottom=False) # Labels on bottom
plt.show()
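The same tick customization in the object-oriented interface is shown below. Passing `labels=` to `set_xticks` assumes Matplotlib 3.5 or later. Tick labels are finalized at draw time, so the sketch draws once before reading them back.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

# Positions and labels in one call (the labels argument needs Matplotlib >= 3.5)
ax.set_xticks(np.arange(0, 11, 2), labels=['Zero', 'Two', 'Four', 'Six', 'Eight', 'Ten'])
ax.tick_params(axis='x', labelrotation=45, labelsize=12)

fig.canvas.draw()   # tick labels are finalized at draw time
print([t.get_text() for t in ax.get_xticklabels()])
```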
5. Advanced Plot Types
5.1 Heatmaps
Using imshow()
# Create sample data
data = np.random.rand(10, 10)
plt.figure(figsize=(12, 5))
# Basic heatmap
plt.subplot(1, 2, 1)
plt.imshow(data, cmap='hot', interpolation='nearest')
plt.colorbar(label='Values')
plt.title('Basic Heatmap')
# Customized heatmap
plt.subplot(1, 2, 2)
im = plt.imshow(data, cmap='coolwarm', aspect='auto')
plt.colorbar(im, shrink=0.8)
plt.title('Customized Heatmap')
# Add value annotations
for i in range(data.shape[0]):
    for j in range(data.shape[1]):
        plt.text(j, i, f'{data[i, j]:.2f}',
                 ha='center', va='center', color='black')
plt.tight_layout()
plt.show()
5.2 Contour Plots
Contour and Contourf
# Create meshgrid
x = np.linspace(-3, 3, 50)
y = np.linspace(-3, 3, 50)
X, Y = np.meshgrid(x, y)
Z = np.exp(-(X**2 + Y**2))
plt.figure(figsize=(15, 5))
# Contour lines
plt.subplot(1, 3, 1)
contour = plt.contour(X, Y, Z, levels=10)
plt.clabel(contour, inline=True, fontsize=8)
plt.title('Contour Lines')
# Filled contour
plt.subplot(1, 3, 2)
plt.contourf(X, Y, Z, levels=20, cmap='viridis')
plt.colorbar(label='Values')
plt.title('Filled Contour')
# Combined
plt.subplot(1, 3, 3)
plt.contourf(X, Y, Z, levels=20, cmap='viridis', alpha=0.7)
contour = plt.contour(X, Y, Z, levels=10, colors='black', alpha=0.4)
plt.clabel(contour, inline=True, fontsize=8)
plt.colorbar(label='Values')
plt.title('Combined')
plt.tight_layout()
plt.show()
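An integer `levels` is only a target: Matplotlib picks up to that many "nice" values. To contour at exact values, pass an array instead. A sketch on the same Gaussian surface (Agg backend for headless runs):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-3, 3, 50)
y = np.linspace(-3, 3, 50)
X, Y = np.meshgrid(x, y)
Z = np.exp(-(X**2 + Y**2))

# Passing an array fixes the exact contour values
levels = np.linspace(0.1, 0.9, 9)
fig, ax = plt.subplots()
cs = ax.contour(X, Y, Z, levels=levels, colors='black')
ax.clabel(cs, inline=True, fontsize=8)
print(cs.levels)
```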
5.3 Box Plots
Box Plot Syntax
plt.boxplot(x, labels=None, **kwargs)  # Matplotlib >= 3.9 renames labels to tick_labels
Implementation
# Generate sample data
np.random.seed(42)
data1 = np.random.normal(100, 10, 200)
data2 = np.random.normal(90, 20, 200)
data3 = np.random.normal(80, 5, 200)
data = [data1, data2, data3]
plt.figure(figsize=(12, 6))
# Basic box plot
plt.subplot(1, 2, 1)
bp = plt.boxplot(data, labels=['Group A', 'Group B', 'Group C'])
plt.title('Basic Box Plot')
plt.ylabel('Values')
# Customized box plot
plt.subplot(1, 2, 2)
bp = plt.boxplot(data,
labels=['Group A', 'Group B', 'Group C'],
patch_artist=True, # Fill boxes
notch=True, # Notched boxes
showmeans=True, # Show means
meanline=True) # Mean as line
# Color the boxes
colors = ['lightblue', 'lightgreen', 'lightcoral']
for patch, color in zip(bp['boxes'], colors):
    patch.set_facecolor(color)
plt.title('Customized Box Plot')
plt.ylabel('Values')
plt.tight_layout()
plt.show()
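`boxplot` returns a dict of the drawn artists, which is how the example above colors the boxes. The sketch below reads the median lines back and checks them against `np.median`. It sets tick labels with `set_xticks(..., labels=...)` (Matplotlib 3.5 or later) to sidestep the `labels`/`tick_labels` rename.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
data = [rng.normal(100, 10, 200), rng.normal(90, 20, 200), rng.normal(80, 5, 200)]

fig, ax = plt.subplots()
bp = ax.boxplot(data)   # boxes are drawn at positions 1, 2, 3
ax.set_xticks([1, 2, 3], labels=['Group A', 'Group B', 'Group C'])

# The returned dict holds the drawn artists: 'boxes', 'medians', 'whiskers', 'caps', 'fliers', ...
box_medians = [line.get_ydata()[0] for line in bp['medians']]
print(np.allclose(box_medians, [np.median(d) for d in data]))
```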
5.4 Violin Plots
Violin Plot Implementation
plt.figure(figsize=(10, 6))
# Matplotlib draws violin plots natively with violinplot(); no seaborn needed
# Reuses the three groups in `data` from the box plot example
positions = [1, 2, 3]
parts = plt.violinplot(data, positions=positions, showmeans=True, showmedians=True)
# Customize colors
for pc in parts['bodies']:
    pc.set_facecolor('lightblue')
    pc.set_alpha(0.7)
plt.xticks(positions, ['Group A', 'Group B', 'Group C'])
plt.title('Violin Plot')
plt.ylabel('Values')
plt.show()
5.5 Error Bars
Error Bar Implementation
# Sample data with errors
x = np.arange(0, 10, 1)
y = np.exp(-x/10.0)
yerr = 0.1 * y
xerr = 0.1
plt.figure(figsize=(12, 6))
# Basic error bars
plt.subplot(1, 2, 1)
plt.errorbar(x, y, yerr=yerr, xerr=xerr, fmt='o-')
plt.title('Basic Error Bars')
plt.xlabel('X values')
plt.ylabel('Y values')
# Customized error bars
plt.subplot(1, 2, 2)
plt.errorbar(x, y, yerr=yerr, xerr=xerr,
fmt='s-', # Square markers, solid line
capsize=5, # Error bar cap size
capthick=2, # Cap thickness
ecolor='red', # Error bar color
elinewidth=2, # Error bar width
alpha=0.7) # Transparency
plt.title('Customized Error Bars')
plt.xlabel('X values')
plt.ylabel('Y values')
plt.tight_layout()
plt.show()
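Error bars need not be symmetric. Passing a 2×N array to `yerr` gives separate lower (row 0) and upper (row 1) extents. This is a sketch with hypothetical uncertainties of 5% below and 15% above each point:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 10, 1)
y = np.exp(-x / 10.0)
# Hypothetical asymmetric uncertainties: row 0 = below, row 1 = above
lower = 0.05 * y
upper = 0.15 * y
yerr = np.vstack([lower, upper])      # shape (2, N)

fig, ax = plt.subplots()
container = ax.errorbar(x, y, yerr=yerr, fmt='o', capsize=4)
print(container.has_yerr, container.has_xerr)
```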
6. Styling and Themes
6.1 Built-in Styles
Available Styles
# List available styles
print(plt.style.available)
# Apply a style (each call overrides settings from earlier calls, so pick one)
# 'seaborn-v0_8' is the style name on Matplotlib >= 3.6
plt.style.use('seaborn-v0_8') # Seaborn style
plt.style.use('ggplot') # ggplot style
plt.style.use('classic') # Classic matplotlib
plt.style.use('dark_background') # Dark theme
# Temporary style context
with plt.style.context('seaborn-v0_8'):
    plt.plot(x, y)
    plt.show()
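`plt.style.context` is scoped: its settings apply inside the `with` block and the previous rcParams come back on exit. A quick check using the `dark_background` style (Agg backend for headless runs):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

before = plt.rcParams['axes.facecolor']
with plt.style.context('dark_background'):
    # Settings apply only inside the block
    inside = plt.rcParams['axes.facecolor']
    fig, ax = plt.subplots()
    ax.plot(np.linspace(0, 1, 10))
after = plt.rcParams['axes.facecolor']
print(before, inside, after)
```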
6.2 Custom Styling
RC Parameters
# Modify global parameters
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 14
plt.rcParams['lines.linewidth'] = 2
plt.rcParams['grid.alpha'] = 0.3
# Or use rcParams context
with plt.rc_context({'font.size': 16, 'lines.linewidth': 3}):
    plt.plot(x, y)
    plt.show()
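`rc_context` can also decorate a function, so the settings apply only while that function runs. A minimal sketch using `matplotlib.rc_context` (Agg backend for headless runs):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

default_width = plt.rcParams['lines.linewidth']

# Used as a decorator, rc_context applies the settings only during each call
@matplotlib.rc_context({'lines.linewidth': 3, 'font.size': 16})
def make_plot():
    fig, ax = plt.subplots()
    line, = ax.plot(np.linspace(0, 1, 10))
    plt.close(fig)
    return line.get_linewidth()

width_inside = make_plot()
print(width_inside, plt.rcParams['lines.linewidth'])
```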
Font Customization
# Font properties
font_title = {'family': 'serif',
'weight': 'bold',
'size': 16}
font_label = {'family': 'sans-serif',
'weight': 'normal',
'size': 12}
plt.figure(figsize=(10, 6))
plt.plot(x, np.sin(x))
plt.title('Custom Font Title', fontdict=font_title)
plt.xlabel('X Label', fontdict=font_label)
plt.ylabel('Y Label', fontdict=font_label)
plt.show()
7. Annotations and Text
7.1 Text and Annotations
Adding Text
plt.figure(figsize=(10, 6))
plt.plot(x, np.sin(x))
# Simple text
plt.text(5, 0.5, 'Simple Text', fontsize=12)
# Text with box
plt.text(7, -0.5, 'Boxed Text', fontsize=12,
bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.8))
# Annotation with arrow
plt.annotate('Maximum', xy=(np.pi/2, 1), xytext=(3, 1.2),
arrowprops=dict(arrowstyle='->', color='red'),
fontsize=12, ha='center')
plt.title('Text and Annotations')
plt.show()
Advanced Annotations
plt.figure(figsize=(12, 8))
x = np.linspace(0, 2*np.pi, 100)
y = np.sin(x)
plt.plot(x, y)
# Different arrow styles
arrow_styles = ['->', '-|>', '<->', '<|-|>']
positions = [(1, 0.8), (2, 0.5), (3, 0.2), (4, -0.2)]
for i, (pos, style) in enumerate(zip(positions, arrow_styles)):
    plt.annotate(f'Style {i+1}', xy=pos, xytext=(pos[0], pos[1]+0.3),
                 arrowprops=dict(arrowstyle=style, color=f'C{i}'),
                 fontsize=10, ha='center')
plt.title('Different Arrow Styles')
plt.show()
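The annotations above give `xytext` in data coordinates, so the label moves if the axis limits change. Setting `textcoords='offset points'` instead places the text a fixed distance from the annotated point. A sketch (Agg backend for headless runs):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

# xy is in data coordinates; the text sits 30 points right and 20 points up from it,
# so the offset stays constant regardless of axis limits
ann = ax.annotate('Maximum', xy=(np.pi / 2, 1), xytext=(30, 20),
                  textcoords='offset points',
                  arrowprops=dict(arrowstyle='->', color='red'))
ax.set_ylim(-1.5, 1.5)
print(ann.xycoords, ann.get_anncoords())
```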
7.2 Mathematical Expressions
LaTeX in Matplotlib
plt.figure(figsize=(10, 6))
# Mathematical expressions
plt.plot(x, np.sin(x), label=r'$y = \sin(x)$')
plt.plot(x, np.cos(x), label=r'$y = \cos(x)$')
plt.title(r'Trigonometric Functions: $f(x) = \sin(x)$ and $g(x) = \cos(x)$')
plt.xlabel(r'$x$ (radians)')
plt.ylabel(r'$f(x)$')
# Complex mathematical expression
plt.text(5, 0.7, r'$\int_0^{2\pi} \sin(x) dx = 0$', fontsize=14,
bbox=dict(boxstyle='round', facecolor='lightblue'))
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
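The `r` prefix on these strings matters. Without it, Python interprets escapes such as `\t` before Matplotlib sees the text, so `\theta` would start with a TAB character. A small demonstration:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

plain = '$\theta$'    # '\t' is interpreted by Python as a TAB character
raw = r'$\theta$'     # raw string keeps the backslash for mathtext

fig, ax = plt.subplots()
ax.set_title(raw)     # renders the Greek letter theta
fig.canvas.draw()
print(len(plain), len(raw), '\t' in plain)
```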
8. 3D Plotting
8.1 3D Setup
from mpl_toolkits.mplot3d import Axes3D  # Optional on Matplotlib >= 3.2, where projection='3d' works without it
# Creating 3D subplot
fig = plt.figure(figsize=(12, 9))
ax = fig.add_subplot(111, projection='3d')
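Once a 3D Axes exists, `view_init` sets the camera angle. `elev` is the angle above the x-y plane and `azim` is the rotation around the z axis, both in degrees. A sketch with a helix (Agg backend for headless runs):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')

t = np.linspace(0, 4 * np.pi, 100)
ax.plot(np.cos(t), np.sin(t), t)
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')

# Camera: elevation above the x-y plane and azimuth around the z axis (degrees)
ax.view_init(elev=30, azim=45)
print(ax.name, ax.elev, ax.azim)
```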
8.2 3D Plot Types
3D Line and Scatter Plots
fig = plt.figure(figsize=(15, 5))
# 3D Line plot
ax1 = fig.add_subplot(1, 3, 1, projection='3d')
t = np.linspace(0, 4*np.pi, 100)
x = np.cos(t)
y = np.sin(t)
z = t
ax1.plot(x, y, z)
ax1.set_title('3D Line Plot')
# 3D Scatter plot
ax2 = fig.add_subplot(1, 3, 2, projection='3d')
n = 100
x = np.random.randn(n)
y = np.random.randn(n)
z = np.random.randn(n)
colors = np.random.rand(n)
ax2.scatter(x, y, z, c=colors, marker='o')
ax2.set_title('3D Scatter Plot')
# 3D Surface plot
ax3 = fig.add_subplot(1, 3, 3, projection='3d')
x = np.linspace(-5, 5, 50)
y = np.linspace(-5, 5, 50)
X, Y = np.meshgrid(x, y)
Z = np.sin(np.sqrt(X**2 + Y**2))
surf = ax3.plot_surface(X, Y, Z, cmap='viridis', alpha=0.7)
ax3.set_title('3D Surface Plot')
plt.tight_layout()
plt.show()
9. Interactive Features
9.1 Interactive Backend
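# Enable an interactive backend (run one of these in a notebook cell)
# Widget backend for JupyterLab / Notebook 7 (requires the ipympl package)
%matplotlib widget
# or: classic Jupyter Notebook backend (not supported in JupyterLab)
%matplotlib notebook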
9.2 Event Handling
def onclick(event):
    # Clicks outside the Axes have xdata/ydata = None, so skip them
    if event.inaxes is None:
        return
    print(f'Button: {event.button}, x: {event.xdata:.2f}, y: {event.ydata:.2f}')
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))
# Connect event handler
cid = fig.canvas.mpl_connect('button_press_event', onclick)
plt.show()
# Disconnect when done
# fig.canvas.mpl_disconnect(cid)
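Handlers like `onclick` can be exercised without a GUI by building a `MouseEvent` and passing it to the canvas callbacks. This is only a testing sketch: normally the GUI backend creates these events. It assumes the Agg backend and converts data coordinates to pixels with `transData`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; events are simulated below
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.backend_bases import MouseEvent

clicks = []

def onclick(event):
    # Clicks outside any Axes have xdata/ydata set to None; skip them
    if event.inaxes is None:
        return
    clicks.append((event.button, event.xdata, event.ydata))

fig, ax = plt.subplots()
x = np.linspace(0, 10, 100)
ax.plot(x, np.sin(x))
cid = fig.canvas.mpl_connect('button_press_event', onclick)
fig.canvas.draw()

# Simulate a left click at data point (5, 0) by converting it to display (pixel) coordinates
px, py = ax.transData.transform((5, 0))
fig.canvas.callbacks.process('button_press_event',
                             MouseEvent('button_press_event', fig.canvas, px, py, button=1))

# A click at pixel (0, 0) falls outside the Axes and is ignored by the handler
fig.canvas.callbacks.process('button_press_event',
                             MouseEvent('button_press_event', fig.canvas, 0, 0, button=1))

fig.canvas.mpl_disconnect(cid)
print(clicks)
```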
10. Saving and Exporting
10.1 Saving Figures
Save Methods
plt.figure(figsize=(10, 6))
plt.plot(x, np.sin(x))
plt.title('Sample Plot for Saving')
# Different formats and options
plt.savefig('plot.png', # Filename
dpi=300, # Resolution
bbox_inches='tight', # Tight bounding box
facecolor='white', # Background color
transparent=False, # Transparent background
format='png') # File format
# Other formats
plt.savefig('plot.pdf', format='pdf', bbox_inches='tight')
plt.savefig('plot.svg', format='svg', bbox_inches='tight')
plt.savefig('plot.eps', format='eps', bbox_inches='tight')
plt.show()
Format-Specific Options
# These calls save the current figure: run them before plt.show(), after which it may be blank
# PNG with high DPI for publications
plt.savefig('high_res.png', dpi=600, bbox_inches='tight',
facecolor='white', edgecolor='none')
# PDF with metadata
plt.savefig('plot_with_metadata.pdf',
bbox_inches='tight',
metadata={'Title': 'My Plot', 'Author': 'Data Scientist'})
# SVG for web use
plt.savefig('web_plot.svg', format='svg', bbox_inches='tight')
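`savefig` also accepts any binary file-like object, which is handy for web responses or embedding without touching the disk. There is no file extension to infer from, so the sketch passes `format` explicitly:

```python
import io
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, np.sin(x))

# Write into an in-memory buffer; format is given because there is no filename
buf = io.BytesIO()
fig.savefig(buf, format='png', dpi=150, bbox_inches='tight')
plt.close(fig)
png_bytes = buf.getvalue()   # e.g. to send in an HTTP response or encode as base64
print(png_bytes[:8])
```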
10.2 Multiple Figures Management
# Create multiple figures
fig1 = plt.figure(figsize=(8, 6))
plt.plot(x, np.sin(x))
plt.title('Figure 1')
fig2 = plt.figure(figsize=(8, 6))
plt.plot(x, np.cos(x))
plt.title('Figure 2')
# Save specific figures
fig1.savefig('sine_plot.png', dpi=300, bbox_inches='tight')
fig2.savefig('cosine_plot.png', dpi=300, bbox_inches='tight')
# Make fig1 the current figure (note: plt.show() still displays every open figure)
plt.figure(fig1.number)
plt.show()
# Close figures to save memory
plt.close(fig1)
plt.close(fig2)
# Or close all
plt.close('all')
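pyplot tracks open figures by number, and optionally by label. `plt.get_fignums()` lists what is still open, which is useful for confirming that figures were actually closed. A sketch (the label `'sine'` is arbitrary):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

plt.close('all')                      # start from a clean state
fig1 = plt.figure(num='sine')         # a figure can be identified by a label...
fig2 = plt.figure()                   # ...or by an auto-assigned number
n_open = len(plt.get_fignums())

plt.figure('sine')                    # re-activates the existing figure; no new one is made
same = plt.gcf() is fig1

plt.close('all')
n_after = len(plt.get_fignums())
print(n_open, same, n_after)
```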
11. Performance and Optimization
11.1 Large Dataset Handling
Efficient Plotting Techniques
# For large datasets, use sampling or aggregation
large_x = np.random.randn(1000000)
large_y = np.random.randn(1000000)
# Method 1: Sample data
sample_size = 10000
indices = np.random.choice(len(large_x), sample_size, replace=False)
plt.scatter(large_x[indices], large_y[indices], alpha=0.5)
plt.title('Sampled Large Dataset')
plt.show()
# Method 2: Use hexbin for density plots
plt.figure(figsize=(10, 6))
plt.hexbin(large_x, large_y, gridsize=50, cmap='Blues')
plt.colorbar(label='Count')
plt.title('Hexbin Plot of Large Dataset')
plt.show()
# Method 3: 2D histogram
plt.figure(figsize=(10, 6))
plt.hist2d(large_x, large_y, bins=100, cmap='Blues')
plt.colorbar(label='Count')
plt.title('2D Histogram of Large Dataset')
plt.show()
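The aggregation idea can also be applied before Matplotlib sees the data. Bin with NumPy once, then draw only the grid. In this sketch, note that `np.histogram2d` indexes counts as `[x, y]`, so the array is transposed for `pcolormesh`:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
large_x = rng.standard_normal(n)
large_y = rng.standard_normal(n)

# Aggregate once with NumPy: only a 100x100 grid is handed to Matplotlib
counts, xedges, yedges = np.histogram2d(large_x, large_y, bins=100)

fig, ax = plt.subplots(figsize=(8, 6))
# histogram2d returns counts indexed [x, y]; pcolormesh expects [row=y, col=x]
mesh = ax.pcolormesh(xedges, yedges, counts.T, cmap='Blues')
fig.colorbar(mesh, ax=ax, label='Count')
print(counts.shape, int(counts.sum()))
```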
11.2 Animation Basics
Simple Animation
from matplotlib.animation import FuncAnimation
# Set up figure and axis
fig, ax = plt.subplots(figsize=(10, 6))
ax.set_xlim(0, 2*np.pi)
ax.set_ylim(-1.5, 1.5)
line, = ax.plot([], [], 'b-')
ax.set_title('Animated Sine Wave')
ax.set_xlabel('x')
ax.set_ylabel('sin(x)')
# Animation function
def animate(frame):
    x = np.linspace(0, 2*np.pi, 1000)
    y = np.sin(x + frame/10)
    line.set_data(x, y)
    return line,
# Create animation
anim = FuncAnimation(fig, animate, frames=200, interval=50, blit=True)
plt.show()
# Save as GIF with the Pillow writer (saving MP4 instead requires ffmpeg)
# anim.save('sine_wave.gif', writer='pillow', fps=20)
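A runnable version of the save step is sketched below. It uses `PillowWriter` (Pillow is a Matplotlib dependency) and writes a short 20-frame GIF into a temporary directory. The file location is an assumption for the sketch.

```python
import os
import tempfile
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation, PillowWriter

fig, ax = plt.subplots(figsize=(4, 3))
ax.set_xlim(0, 2 * np.pi)
ax.set_ylim(-1.5, 1.5)
x = np.linspace(0, 2 * np.pi, 200)
line, = ax.plot([], [], 'b-')

def init():
    line.set_data([], [])
    return line,

def animate(frame):
    line.set_data(x, np.sin(x + frame / 10))
    return line,

anim = FuncAnimation(fig, animate, init_func=init, frames=20, interval=50, blit=True)
out_path = os.path.join(tempfile.mkdtemp(), "sine_wave.gif")
anim.save(out_path, writer=PillowWriter(fps=20))
plt.close(fig)
print(out_path, os.path.getsize(out_path))
```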
12. Working with DataFrames
12.1 Pandas Integration
Plotting from DataFrames
# Create sample DataFrame
dates = pd.date_range('2023-01-01', periods=100)
df = pd.DataFrame({
'date': dates,
'value1': np.cumsum(np.random.randn(100)) + 100,
'value2': np.cumsum(np.random.randn(100)) + 50,
'category': np.random.choice(['A', 'B', 'C'], 100)
})
# Direct plotting from DataFrame
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
# Time series plot
axes[0, 0].plot(df['date'], df['value1'], label='Value 1')
axes[0, 0].plot(df['date'], df['value2'], label='Value 2')
axes[0, 0].set_title('Time Series')
axes[0, 0].legend()
axes[0, 0].tick_params(axis='x', rotation=45)
# Histogram
axes[0, 1].hist(df['value1'], bins=20, alpha=0.7, label='Value 1')
axes[0, 1].hist(df['value2'], bins=20, alpha=0.7, label='Value 2')
axes[0, 1].set_title('Histograms')
axes[0, 1].legend()
# Box plot by category
categories = df['category'].unique()
data_by_cat = [df[df['category'] == cat]['value1'].values for cat in categories]
axes[1, 0].boxplot(data_by_cat, labels=categories)
axes[1, 0].set_title('Box Plot by Category')
# Scatter plot
scatter = axes[1, 1].scatter(df['value1'], df['value2'],
c=df.index, cmap='viridis', alpha=0.7)
axes[1, 1].set_xlabel('Value 1')
axes[1, 1].set_ylabel('Value 2')
axes[1, 1].set_title('Scatter Plot')
plt.colorbar(scatter, ax=axes[1, 1])
plt.tight_layout()
plt.show()
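pandas can also draw straight onto an existing Matplotlib Axes via `DataFrame.plot(ax=...)`. It returns that Axes, so further customization uses the usual methods. A sketch with random-walk data similar to the DataFrame above:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    'date': pd.date_range('2023-01-01', periods=100),
    'value1': np.cumsum(rng.standard_normal(100)) + 100,
    'value2': np.cumsum(rng.standard_normal(100)) + 50,
})

fig, ax = plt.subplots(figsize=(10, 5))
# pandas draws with Matplotlib and returns the Axes it drew on
returned = df.plot(x='date', y=['value1', 'value2'], ax=ax)
ax.set_ylabel('Value')
print(returned is ax, len(ax.get_lines()))
```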
12.2 Grouped Data Visualization
# Grouped data analysis
grouped_data = df.groupby('category')['value1'].agg(['mean', 'std'])
# Bar plot with error bars
fig, ax = plt.subplots(figsize=(10, 6))
x_pos = np.arange(len(grouped_data.index))
bars = ax.bar(x_pos, grouped_data['mean'], yerr=grouped_data['std'],
capsize=5, alpha=0.7, color=['red', 'green', 'blue'])
ax.set_xlabel('Category')
ax.set_ylabel('Mean Value')
ax.set_title('Mean Values by Category with Error Bars')
ax.set_xticks(x_pos)
ax.set_xticklabels(grouped_data.index)
# Add value labels on bars
for bar, mean_val in zip(bars, grouped_data['mean']):
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 1,
            f'{mean_val:.1f}', ha='center', va='bottom')
plt.show()
13. Statistical Plots
13.1 Distribution Plots
Q-Q Plots and Probability Plots
from scipy import stats
# Generate sample data
np.random.seed(42)
normal_data = np.random.normal(0, 1, 1000)
uniform_data = np.random.uniform(0, 1, 1000)
exponential_data = np.random.exponential(1, 1000)
fig, axes = plt.subplots(2, 3, figsize=(18, 12))
# Histograms with theoretical distributions
axes[0, 0].hist(normal_data, bins=50, density=True, alpha=0.7, color='blue')
x = np.linspace(-4, 4, 100)
axes[0, 0].plot(x, stats.norm.pdf(x, 0, 1), 'r-', linewidth=2, label='Theoretical')
axes[0, 0].set_title('Normal Distribution')
axes[0, 0].legend()
axes[0, 1].hist(uniform_data, bins=50, density=True, alpha=0.7, color='green')
axes[0, 1].axhline(y=1, color='red', linestyle='-', linewidth=2, label='Theoretical')
axes[0, 1].set_title('Uniform Distribution')
axes[0, 1].legend()
axes[0, 2].hist(exponential_data, bins=50, density=True, alpha=0.7, color='orange')
x = np.linspace(0, 6, 100)
axes[0, 2].plot(x, stats.expon.pdf(x, scale=1), 'r-', linewidth=2, label='Theoretical')
axes[0, 2].set_title('Exponential Distribution')
axes[0, 2].legend()
# Q-Q plots
stats.probplot(normal_data, dist="norm", plot=axes[1, 0])
axes[1, 0].set_title('Q-Q Plot: Normal Data vs Normal Distribution')
stats.probplot(uniform_data, dist="norm", plot=axes[1, 1])
axes[1, 1].set_title('Q-Q Plot: Uniform Data vs Normal Distribution')
stats.probplot(exponential_data, dist="norm", plot=axes[1, 2])
axes[1, 2].set_title('Q-Q Plot: Exponential Data vs Normal Distribution')
plt.tight_layout()
plt.show()
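Without `plot=`, `stats.probplot` returns the Q-Q points and a least-squares line. The correlation `r` of that fit is a compact numeric summary of how straight the Q-Q plot is. A sketch (requires SciPy, as the example above does):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
normal_data = rng.normal(0, 1, 1000)
exponential_data = rng.exponential(1, 1000)

# Returns (theoretical quantiles, ordered data), (slope, intercept, r)
(osm, osr), (slope, intercept, r_norm) = stats.probplot(normal_data, dist="norm")
_, (_, _, r_expon) = stats.probplot(exponential_data, dist="norm")

# r close to 1 means the points lie near a straight line (good fit to the normal)
print(round(r_norm, 4), round(r_expon, 4))
```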
13.2 Correlation and Regression
Correlation Heatmap and Regression Plots
# Create correlated data
np.random.seed(42)
n_vars = 5
n_obs = 200
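# Generate a valid correlation matrix. A random symmetric matrix with a unit
# diagonal is not guaranteed to be positive semi-definite, but A @ A.T is;
# rescaling by its diagonal then puts 1s on the diagonal
A = np.random.rand(n_vars, n_vars)
cov = A @ A.T
d = np.sqrt(np.diag(cov))
corr_matrix = cov / np.outer(d, d)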
# Generate multivariate normal data
data = np.random.multivariate_normal(np.zeros(n_vars), corr_matrix, n_obs)
df_corr = pd.DataFrame(data, columns=[f'Var{i+1}' for i in range(n_vars)])
# Calculate correlation matrix
corr = df_corr.corr()
fig, axes = plt.subplots(1, 2, figsize=(15, 6))
# Correlation heatmap
im = axes[0].imshow(corr, cmap='coolwarm', vmin=-1, vmax=1)
axes[0].set_xticks(range(len(corr.columns)))
axes[0].set_yticks(range(len(corr.columns)))
axes[0].set_xticklabels(corr.columns, rotation=45)
axes[0].set_yticklabels(corr.columns)
axes[0].set_title('Correlation Heatmap')
# Add correlation values to cells
for i in range(len(corr.columns)):
    for j in range(len(corr.columns)):
        axes[0].text(j, i, f'{corr.iloc[i, j]:.2f}',
                     ha="center", va="center", color="black")
plt.colorbar(im, ax=axes[0])
# Regression plot
x = df_corr['Var1']
y = df_corr['Var2']
axes[1].scatter(x, y, alpha=0.6)
# Add regression line
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
line = slope * x + intercept
axes[1].plot(x, line, 'r-', linewidth=2,
label=f'y = {slope:.2f}x + {intercept:.2f}\nR² = {r_value**2:.3f}')
axes[1].set_xlabel('Var1')
axes[1].set_ylabel('Var2')
axes[1].set_title('Regression Plot')
axes[1].legend()
plt.tight_layout()
plt.show()
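If SciPy is unavailable, `np.polyfit` with degree 1 gives the same least-squares slope and intercept as `stats.linregress`. The sketch below fits synthetic data with a known line (y = 2x + 1 plus noise). It sorts x so the fitted line is drawn left to right.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, 200)   # synthetic data with known slope/intercept

# Degree-1 polynomial fit; coefficients come back highest power first
slope, intercept = np.polyfit(x, y, 1)
r = np.corrcoef(x, y)[0, 1]

fig, ax = plt.subplots()
ax.scatter(x, y, alpha=0.6)
x_line = np.sort(x)
ax.plot(x_line, slope * x_line + intercept, 'r-',
        label=f'y = {slope:.2f}x + {intercept:.2f}\nR² = {r**2:.3f}')
ax.legend()
```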
14. Advanced Customization Techniques
14.1 Custom Color Schemes
Creating Custom Colormaps
from matplotlib.colors import LinearSegmentedColormap, ListedColormap
# Create custom colormap from colors
colors = ['#FF0000', '#FFFF00', '#00FF00', '#00FFFF', '#0000FF']
custom_cmap = LinearSegmentedColormap.from_list('custom', colors, N=256)
# Create discrete colormap
discrete_colors = ['red', 'blue', 'green', 'orange', 'purple']
discrete_cmap = ListedColormap(discrete_colors)
# Test custom colormaps
fig, axes = plt.subplots(1, 3, figsize=(18, 5))
# Continuous custom colormap
x = np.linspace(-3, 3, 100)
y = np.linspace(-3, 3, 100)
X, Y = np.meshgrid(x, y)
Z = np.exp(-(X**2 + Y**2))
im1 = axes[0].contourf(X, Y, Z, levels=20, cmap=custom_cmap)
axes[0].set_title('Custom Continuous Colormap')
plt.colorbar(im1, ax=axes[0])
# Discrete colormap
categories = np.random.randint(0, 5, (20, 20))
im2 = axes[1].imshow(categories, cmap=discrete_cmap)
axes[1].set_title('Custom Discrete Colormap')
plt.colorbar(im2, ax=axes[1], ticks=range(5))
# Color cycle for line plots: set it on this Axes directly, because changing
# rcParams['axes.prop_cycle'] does not affect Axes that already exist
axes[2].set_prop_cycle(color=discrete_colors)
for i in range(5):
    axes[2].plot(np.random.randn(50).cumsum(), label=f'Series {i+1}')
axes[2].set_title('Custom Color Cycle')
axes[2].legend()
plt.tight_layout()
plt.show()
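With a `ListedColormap` alone, integer categories are spread over the default normalization. Pairing it with `BoundaryNorm` and bin edges halfway between the integers gives each category exactly one color and centers the colorbar ticks on the bands. A sketch with random category data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import BoundaryNorm, ListedColormap

discrete_colors = ['red', 'blue', 'green', 'orange', 'purple']
cmap = ListedColormap(discrete_colors)
# Bin edges -0.5, 0.5, ..., 4.5: each integer 0..4 falls into its own bin
norm = BoundaryNorm(np.arange(-0.5, 5.5, 1), cmap.N)

categories = np.random.default_rng(0).integers(0, 5, (20, 20))
fig, ax = plt.subplots()
im = ax.imshow(categories, cmap=cmap, norm=norm)
# Ticks at 0..4 now sit in the middle of each color band
fig.colorbar(im, ax=ax, ticks=range(5))
print([int(norm(v)) for v in range(5)])
```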
14.2 Advanced Text and Annotation Formatting
Rich Text and Mathematical Expressions
fig, ax = plt.subplots(figsize=(12, 8))
# Sample plot
x = np.linspace(0, 10, 100)
y = np.exp(-x/5) * np.cos(2*x)
ax.plot(x, y, 'b-', linewidth=2)
# Various text formatting options
ax.text(2, 0.8, r'$e^{-x/5}\cos(2x)$', fontsize=20,
bbox=dict(boxstyle='round,pad=0.3', facecolor='yellow', alpha=0.7))
# Multi-line text with different styles
multiline_text = r'''This is a multi-line annotation
$\alpha = \frac{\beta}{\gamma}$
Bold: $\mathbf{vector}$
Italic: $\mathit{variable}$'''
ax.text(6, 0.5, multiline_text, fontsize=12, verticalalignment='top',
bbox=dict(boxstyle='square,pad=0.5', facecolor='lightblue', alpha=0.8))
# Fancy arrow annotation
ax.annotate('Local Maximum', xy=(3.09, 0.54), xytext=(4.5, 0.85),
fontsize=12, ha='center',
arrowprops=dict(arrowstyle='->', lw=2, color='red',
connectionstyle='arc3,rad=0.2'))
# Custom annotation box
ax.annotate('Exponential Decay', xy=(8, 0.1), xytext=(5, -0.3),
fontsize=12, ha='center',
bbox=dict(boxstyle='round,pad=0.3', facecolor='lightgreen'),
arrowprops=dict(arrowstyle='-|>', lw=2, color='green'))
ax.set_title(r'Function: $f(x) = e^{-x/5}\cos(2x)$', fontsize=16)
ax.set_xlabel('x', fontsize=14)
ax.set_ylabel('f(x)', fontsize=14)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
14.3 Custom Tick Formatters
Advanced Tick Formatting
from matplotlib.ticker import FuncFormatter, MultipleLocator
from matplotlib.dates import DateFormatter, MonthLocator
import datetime as dt
fig, axes = plt.subplots(2, 2, figsize=(16, 12))
# Custom number formatting
def currency_formatter(x, pos):
    return f'${x:.0f}K'
def percentage_formatter(x, pos):
    return f'{x:.1f}%'
# Example 1: Financial data
x = np.arange(2020, 2025)
revenue = [120, 135, 150, 175, 200]
axes[0, 0].plot(x, revenue, 'o-', linewidth=2, markersize=8)
axes[0, 0].yaxis.set_major_formatter(FuncFormatter(currency_formatter))
axes[0, 0].set_title('Revenue Growth')
axes[0, 0].set_ylabel('Revenue')
axes[0, 0].grid(True, alpha=0.3)
# Example 2: Percentage data
categories = ['Q1', 'Q2', 'Q3', 'Q4']
growth_rates = [5.2, 7.8, 12.1, 15.6]
axes[0, 1].bar(categories, growth_rates, color='lightgreen')
axes[0, 1].yaxis.set_major_formatter(FuncFormatter(percentage_formatter))
axes[0, 1].set_title('Quarterly Growth Rates')
axes[0, 1].set_ylabel('Growth Rate')
# Example 3: Scientific notation
x = np.logspace(1, 6, 50)
y = 1/x
axes[1, 0].loglog(x, y)
axes[1, 0].set_title('Power Law Relationship')
axes[1, 0].set_xlabel('Input (log scale)')
axes[1, 0].set_ylabel('Output (log scale)')
axes[1, 0].grid(True, which="both", ls="-", alpha=0.3)
# Example 4: Date formatting
dates = [dt.datetime(2023, i, 1) for i in range(1, 13)]
values = np.random.randn(12).cumsum() + 100
axes[1, 1].plot(dates, values, 'o-')
axes[1, 1].xaxis.set_major_locator(MonthLocator(interval=2))
axes[1, 1].xaxis.set_major_formatter(DateFormatter('%b\n%Y'))
axes[1, 1].set_title('Time Series with Date Formatting')
axes[1, 1].tick_params(axis='x', rotation=0)
plt.tight_layout()
plt.show()
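Formatters are plain callables mapping (tick value, position) to a label string, so they can be tested directly. `StrMethodFormatter` covers simple `str.format` cases, and its field must be named `x`. Since Matplotlib 3.3, `set_major_formatter` also accepts a bare function and wraps it in a `FuncFormatter`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter, StrMethodFormatter

def currency_formatter(x, pos):
    return f'${x:.0f}K'

# Formatters are callables: (tick value, tick position) -> label string
money = FuncFormatter(currency_formatter)
thousands = StrMethodFormatter('{x:,.0f}')   # the format field must be named 'x'
print(money(120, 0), thousands(1234567))

# A plain function can be passed directly; it is wrapped in a FuncFormatter
fig, ax = plt.subplots()
ax.plot([2020, 2021, 2022], [120, 135, 150])
ax.yaxis.set_major_formatter(currency_formatter)
```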
15. Best Practices and Common Patterns
15.1 Creating Publication-Ready Figures
Professional Figure Setup
def setup_publication_figure():
    """Setup matplotlib for publication-quality figures"""
    plt.rcParams.update({
        'font.size': 12,
        'font.family': 'serif',
        # Fall back to DejaVu Serif (bundled with Matplotlib) if Times New Roman is missing
        'font.serif': ['Times New Roman', 'DejaVu Serif'],
        'axes.linewidth': 1.2,
        'axes.spines.top': False,
        'axes.spines.right': False,
        'xtick.direction': 'out',
        'ytick.direction': 'out',
        'xtick.major.size': 6,
        'xtick.minor.size': 3,
        'ytick.major.size': 6,
        'ytick.minor.size': 3,
        'legend.frameon': False,
        'figure.dpi': 300
    })
# Apply settings
setup_publication_figure()
# Create publication figure
fig, ax = plt.subplots(figsize=(8, 6))
x = np.linspace(0, 10, 100)
ax.plot(x, np.sin(x), label='sin(x)', linewidth=2)
ax.plot(x, np.cos(x), label='cos(x)', linewidth=2)
ax.set_xlabel('Time (s)')
ax.set_ylabel('Amplitude')
ax.set_title('Trigonometric Functions')
ax.legend(loc='upper right')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('publication_figure.pdf', dpi=300, bbox_inches='tight')
plt.show()
# Reset to default
plt.rcdefaults()
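Updating `plt.rcParams` globally and then calling `plt.rcdefaults()` also throws away any settings made earlier in the session, such as a style applied at the top of a notebook. A scoped alternative is `plt.rc_context`, which restores the previous values when the `with` block exits. The sketch below reuses a subset of the settings above; `PUBLICATION_RC` is a hypothetical name, and the figure is saved to an in-memory buffer so the example writes no files.

```python
import io

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical subset of the publication settings shown above
PUBLICATION_RC = {
    'font.size': 12,
    'font.family': 'serif',
    'axes.spines.top': False,
    'axes.spines.right': False,
    'legend.frameon': False,
    'savefig.dpi': 300,
}

spines_top_before = plt.rcParams['axes.spines.top']

with plt.rc_context(PUBLICATION_RC):
    # Inside the block the overrides are active
    spines_top_inside = plt.rcParams['axes.spines.top']
    fig, ax = plt.subplots(figsize=(8, 6))
    x = np.linspace(0, 10, 100)
    ax.plot(x, np.sin(x), label='sin(x)')
    ax.plot(x, np.cos(x), label='cos(x)')
    ax.legend(loc='upper right')
    buf = io.BytesIO()
    # Save inside the block so savefig.dpi applies; pass a filename to write to disk
    fig.savefig(buf, format='pdf', bbox_inches='tight')

# On exit the previous rcParams values are restored
spines_top_after = plt.rcParams['axes.spines.top']
plt.close(fig)
```

Settings are read when artists are created and when the file is saved, so both steps belong inside the `with` block.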
15.2 Error Handling and Debugging
Common Issues and Solutions
# Common matplotlib issues and solutions
# 1. Memory management for multiple figures
def create_and_save_plots():
    for i in range(10):
        fig, ax = plt.subplots()
        ax.plot(np.random.randn(100))
        ax.set_title(f'Plot {i}')
        fig.savefig(f'plot_{i}.png')
        plt.close(fig)  # Important: close figure to free memory
# 2. Handling different data types
def robust_plotting(x_data, y_data):
    try:
        # Convert to float arrays (numeric input only) so np.isnan works;
        # non-numeric values raise ValueError here
        x = np.asarray(x_data, dtype=float)
        y = np.asarray(y_data, dtype=float)
        # Check for valid data
        if len(x) != len(y):
            raise ValueError("x and y must have the same length")
        # Drop points where either coordinate is NaN
        mask = ~(np.isnan(x) | np.isnan(y))
        x_clean = x[mask]
        y_clean = y[mask]
        if len(x_clean) == 0:
            raise ValueError("No valid data points")
        plt.figure()
        plt.plot(x_clean, y_clean)
        plt.show()
    except (ValueError, TypeError) as e:
        # Report bad input; unrelated errors still propagate
        print(f"Plotting error: {e}")
# 3. Backend issues
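def check_backend():
    print(f"Current backend: {plt.get_backend()}")
    try:  # Matplotlib >= 3.9 exposes a backend registry
        from matplotlib.backends import backend_registry
        print(f"Built-in backends: {backend_registry.list_builtin()}")
    except ImportError:  # Older releases keep the list in rcsetup
        from matplotlib import rcsetup
        print(f"Built-in backends: {rcsetup.all_backends}")
    # Switch backend if needed
    # plt.switch_backend('Agg')  # For non-interactive use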
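For scripts that generate many figures, an alternative to remembering `plt.close(fig)` is to bypass pyplot's figure manager: a `matplotlib.figure.Figure` created directly is never registered with pyplot, so it is freed like any other Python object once it goes out of scope. This is a sketch; `save_plots_without_pyplot` is a hypothetical helper, and it renders PNGs into in-memory buffers instead of files so the example has no side effects.

```python
import io

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.figure import Figure


def save_plots_without_pyplot(n_plots=10, seed=0):
    """Render n_plots PNG images without registering any figure with pyplot."""
    rng = np.random.default_rng(seed)
    images = []
    for i in range(n_plots):
        fig = Figure()  # not tracked by pyplot, so no plt.close() is needed
        ax = fig.subplots()
        ax.plot(rng.standard_normal(100))
        ax.set_title(f'Plot {i}')
        buf = io.BytesIO()
        # Pass a filename such as f'plot_{i}.png' instead to write to disk
        fig.savefig(buf, format='png')
        images.append(buf.getvalue())
    return images


open_before = len(plt.get_fignums())
pngs = save_plots_without_pyplot(3)
open_after = len(plt.get_fignums())  # unchanged: pyplot never saw these figures
```

The trade-off is that such figures cannot be shown with `plt.show()`; the pattern suits batch export and web servers rather than interactive work.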
15.3 Performance Tips
Optimizing Matplotlib Performance
# Performance optimization techniques
# 1. Use appropriate plot types for data size
def efficient_plotting(data_size):
    x = np.random.randn(data_size)
    y = np.random.randn(data_size)
    if data_size < 10000:  # Rough cut-off; tune it for your data and machine
        # Use scatter for small datasets
        plt.scatter(x, y, alpha=0.6)
    else:
        # Use hexbin or hist2d for large datasets: they draw one cell per bin
        # instead of one marker per point
        plt.hexbin(x, y, gridsize=50)
        plt.colorbar()
# 2. Batch operations and avoid loops
def batch_plotting():
    # Bad: Multiple plot calls
    # for i in range(n):
    #     plt.plot(x[i], y[i])
    # Good: Single call with 2D array
    data = np.random.randn(5, 100)
    plt.plot(data.T)  # Transpose to plot each row
# 3. Use blitting for animations
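from matplotlib.animation import FuncAnimation

def fast_animation():
    fig, ax = plt.subplots()
    line, = ax.plot([], [])
    # Blitting redraws only the returned artists, so fix the axis limits up front
    ax.set_xlim(0, 2*np.pi)
    ax.set_ylim(-1.1, 1.1)
    x = np.linspace(0, 2*np.pi, 100)  # Computed once and reused by every frame
    def animate(frame):
        # Update the existing line instead of creating a new one
        line.set_data(x, np.sin(x + frame/10))
        return line,  # Return artists for blitting
    # Enable blitting for better performance
    anim = FuncAnimation(fig, animate, frames=200, blit=True, interval=50)
    return anim  # Keep a reference; an unreferenced animation is garbage-collected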
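When the number of lines reaches the hundreds or thousands, even a single `plt.plot(data.T)` call still creates one `Line2D` artist per line. A commonly used alternative is `matplotlib.collections.LineCollection`, which draws all of them as one artist. The sketch below assumes every line shares the same x values; `plot_many_lines` is a hypothetical helper name and the random-walk data is synthetic.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection


def plot_many_lines(ax, x, ys, **kwargs):
    """Draw each row of ys against x as a single LineCollection artist."""
    ys = np.asarray(ys)
    # segments has shape (n_lines, n_points, 2): one (x, y) polyline per row
    segments = np.stack([np.broadcast_to(x, ys.shape), ys], axis=-1)
    collection = LineCollection(segments, **kwargs)
    ax.add_collection(collection)
    ax.autoscale_view()  # Rescale the view to include the new data
    return collection


x = np.linspace(0, 10, 100)
ys = np.random.default_rng(0).standard_normal((500, 100)).cumsum(axis=1)

fig, ax = plt.subplots()
lc = plot_many_lines(ax, x, ys, linewidths=0.5, alpha=0.3)
```

One shared artist means one set of style properties; per-line colors are still possible by passing a list to `colors`.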
16. Integration with Other Libraries
16.1 Seaborn Integration
Using Matplotlib with Seaborn
import seaborn as sns
# Set seaborn style but use matplotlib for plotting
sns.set_style("whitegrid")
sns.set_palette("husl")
# Load seaborn's example 'tips' dataset (downloaded on first use, so it needs internet access)
tips = sns.load_dataset("tips")
# Use matplotlib with seaborn styling
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
# Matplotlib plots with seaborn styling
axes[0, 0].hist(tips['total_bill'], bins=20, alpha=0.7)
axes[0, 0].set_title('Total Bill Distribution')
axes[0, 0].set_xlabel('Total Bill ($)')
axes[0, 1].scatter(tips['total_bill'], tips['tip'], alpha=0.6)
axes[0, 1].set_title('Tips vs Total Bill')
axes[0, 1].set_xlabel('Total Bill ($)')
axes[0, 1].set_ylabel('Tip ($)')
# Box plot by category
days = tips['day'].unique()
for i, day in enumerate(days):
    day_data = tips[tips['day'] == day]['total_bill']
    axes[1, 0].boxplot(day_data, positions=[i], widths=0.6)
axes[1, 0].set_xticklabels(days)
axes[1, 0].set_title('Total Bill by Day')
# Combined seaborn and matplotlib
sns.scatterplot(data=tips, x='total_bill', y='tip', hue='time', ax=axes[1, 1])
axes[1, 1].set_title('Tips by Time of Day')
plt.tight_layout()
plt.show()
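The per-day loop in the box-plot panel can also be written as a single `boxplot` call that receives one array per group, with the tick labels set afterwards. A sketch, assuming plain arrays of values and group labels (DataFrame columns such as `tips['total_bill']` and `tips['day']` also work, since they are converted with `np.asarray`); `grouped_boxplot` is a hypothetical helper, the data below is synthetic, and `set_xticks(..., labels=...)` needs Matplotlib 3.5 or later.

```python
import numpy as np
import matplotlib.pyplot as plt


def grouped_boxplot(ax, values, groups, order=None, **kwargs):
    """Draw one box per group with a single Axes.boxplot call."""
    values = np.asarray(values)
    groups = np.asarray(groups)
    if order is None:
        # Keep groups in order of first appearance
        order = list(dict.fromkeys(groups.tolist()))
    data = [values[groups == g] for g in order]
    result = ax.boxplot(data, **kwargs)
    # boxplot places the boxes at positions 1..n; label those ticks with group names
    ax.set_xticks(range(1, len(order) + 1), labels=order)
    return result


# Synthetic stand-in for tips['total_bill'] and tips['day']
rng = np.random.default_rng(1)
days = np.repeat(['Thur', 'Fri', 'Sat', 'Sun'], 25)
bills = rng.gamma(shape=4.0, scale=5.0, size=days.size)

fig, ax = plt.subplots()
parts = grouped_boxplot(ax, bills, days, order=['Thur', 'Fri', 'Sat', 'Sun'], widths=0.6)
ax.set_title('Total Bill by Day')
```

Passing `order` explicitly avoids depending on the order in which groups happen to appear in the data.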
16.2 Plotly Integration
Converting Between Matplotlib and Plotly
# Create matplotlib figure
fig_mpl, ax = plt.subplots()
x = np.linspace(0, 10, 100)
ax.plot(x, np.sin(x), label='sin(x)')
ax.plot(x, np.cos(x), label='cos(x)')
ax.set_title('Matplotlib Figure')
ax.legend()
# Note: Matplotlib has no built-in export to Plotly; converting this figure
# means installing Plotly and re-creating each line as a Plotly trace
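Since the snippet above stops before any conversion, one hedged way to carry the data across by hand is to read each line back from the Matplotlib Axes (`get_lines`, `Line2D.get_xdata`, `get_ydata`, `get_label`) and rebuild it as a Plotly scatter trace. The trace-dict keys (`type`, `mode`, `x`, `y`, `name`) follow Plotly's figure schema; `axes_to_plotly_traces` is a hypothetical helper, and the Plotly step runs only if the `plotly` package is installed. Colors, line styles and fonts are not carried over.

```python
import numpy as np
import matplotlib.pyplot as plt


def axes_to_plotly_traces(ax):
    """Turn every Line2D on a Matplotlib Axes into a Plotly-style trace dict."""
    traces = []
    for line in ax.get_lines():
        traces.append({
            'type': 'scatter',
            'mode': 'lines',
            'x': np.asarray(line.get_xdata()).tolist(),
            'y': np.asarray(line.get_ydata()).tolist(),
            'name': line.get_label(),
        })
    return traces


# Same figure as above
fig_mpl, ax = plt.subplots()
x = np.linspace(0, 10, 100)
ax.plot(x, np.sin(x), label='sin(x)')
ax.plot(x, np.cos(x), label='cos(x)')
ax.set_title('Matplotlib Figure')

traces = axes_to_plotly_traces(ax)

try:
    import plotly.graph_objects as go
except ImportError:
    go = None  # Plotly not installed; the traces remain plain dicts

if go is not None:
    fig_plotly = go.Figure(data=traces)
    fig_plotly.update_layout(title=ax.get_title())
    # fig_plotly.show()  # Opens an interactive view
```

Working from the data rather than from rendered output keeps the conversion independent of Matplotlib's drawing internals, at the cost of re-doing any styling in Plotly.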
Summary and Quick Reference
Essential Functions Quick Reference
# Basic plotting
plt.plot(x, y) # Line plot
plt.scatter(x, y) # Scatter plot
plt.bar(x, height) # Bar chart
plt.hist(x) # Histogram
plt.pie(x, labels=labels) # Pie chart
# Customization
plt.title('Title') # Set title
plt.xlabel('X Label') # X-axis label
plt.ylabel('Y Label') # Y-axis label
plt.legend() # Add legend
plt.grid(True) # Add grid
plt.xlim(0, 10) # Set x limits
plt.ylim(0, 10) # Set y limits
# Subplots
fig, axes = plt.subplots(2, 2) # Create subplots
plt.subplot(2, 2, 1) # Create or select subplot 1 of a 2x2 grid
# Saving
plt.savefig('plot.png') # Save figure
plt.show() # Display figure
plt.close() # Close figure
Common Parameters
● figsize: Figure size (width, height) in inches
● dpi: Resolution in dots per inch
● alpha: Opacity, from 0 (fully transparent) to 1 (fully opaque)
● color or c: Color specification
● linewidth or lw: Line width
● linestyle or ls: Line style ('-', '--', '-.', ':')
● marker: Marker style ('o', 's', '^', etc.)
● markersize or ms: Marker size
● label: Legend label
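The short aliases in this list (`c`, `lw`, `ls`, `ms`) set the same properties as their long names. The sketch below passes them to `plot` and reads the values back through the `Line2D` getters, and does the same for `figsize` and `dpi` on the figure; the particular values are arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4), dpi=100)  # 6 x 4 inches at 100 dots per inch
x = np.linspace(0, 1, 20)
line, = ax.plot(x, x**2, c='red', lw=2, ls='--', marker='o', ms=4,
                alpha=0.5, label='demo')

# Read the properties back with the long-name getters
print(line.get_color(), line.get_linewidth(), line.get_linestyle())
print(line.get_marker(), line.get_markersize(), line.get_alpha(), line.get_label())
print(fig.get_size_inches(), fig.dpi)
```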
This comprehensive guide covers all major aspects of Matplotlib pyplot for data visualization. Practice
with these examples and gradually incorporate more advanced techniques as you become comfortable
with the basics.