0% found this document useful (0 votes)
45 views1 page

Pandas DataFrame Notes - 12pages-Pages-1

This cheat sheet provides a quick reference for using the pandas DataFrame object in Python, including how to import necessary modules, create DataFrames from various data sources (CSV, Excel, MySQL), and the structure of DataFrames and Series. It outlines conventions for naming variables and includes code examples for common operations. The document serves as a guide for efficiently managing and manipulating data using pandas.

Uploaded by

Sàazón Kasula
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
45 views1 page

Pandas DataFrame Notes - 12pages-Pages-1

This cheat sheet provides a quick reference for using the pandas DataFrame object in Python, including how to import necessary modules, create DataFrames from various data sources (CSV, Excel, MySQL), and the structure of DataFrames and Series. It outlines conventions for naming variables and includes code examples for common operations. The document serves as a guide for efficiently managing and manipulating data using pandas.

Uploaded by

Sàazón Kasula
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 1

Cheat Sheet: The pandas DataFrame Object

Preliminaries Get your data into a DataFrame

Always start by importing these Python modules Instantiate an empty DataFrame


import numpy as np df = DataFrame()
import matplotlib.pyplot as plt
import pandas as pd Load a DataFrame from a CSV file
from pandas import DataFrame, Series df = pd.read_csv('file.csv') # often works
Note: these are the recommended import aliases df = pd.read_csv('file.csv', header=0,
Note: you can put these into a PYTHONSTARTUP file index_col=0, quotechar='"', sep=':',
na_values = ['na', '-', '.', ''])
Note: refer to pandas docs for all arguments
Cheat sheet conventions Get data from inline CSV text to a DataFrame
from io import StringIO
Code examples data = """, Animal, Cuteness, Desirable
# Code examples are found in yellow boxes row-1, dog, 8.7, True
row-2, cat, 9.5, True
In the code examples, typically I use: row-3, bat, 2.6, False"""
 s to represent a pandas Series object; df = pd.read_csv(StringIO(data), header=0,
index_col=0, skipinitialspace=True)
 df to represent a pandas DataFrame object;
 idx to represent a pandas Index object. Note: skipinitialspace=True allows for a pretty layout
 Also: t – tuple, l – list, b – Boolean, i – integer,
a – numpy array, st – string, d – dictionary, etc. Load DataFrames from a Microsoft Excel file
# Each Excel sheet in a Python dictionary
workbook = pd.ExcelFile('file.xlsx')
d = {} # start with an empty dictionary
The conceptual model for sheet_name in workbook.sheet_names:
df = workbook.parse(sheet_name)
d[sheet_name] = df
DataFrame object: is a two-dimensional table of data
with column and row indexes (something like a spread Note: the parse() method takes many arguments like
sheet). The columns are made up of Series objects. read_csv() above. Refer to the pandas documentation.

Column index (df.columns) Load a DataFrame from a MySQL database


import pymysql
from sqlalchemy import create_engine
engine = create_engine('mysql+pymysql://'
Series of data

Series of data

Series of data

Series of data
Series of data

Series of data

Series of data
(df.index)

+'USER:PASSWORD@HOST/DATABASE')
Row index

df = pd.read_sql_table('table', engine)

Data in Series then combine into a DataFrame


# Example 1 ...
s1 = Series(range(6))
s2 = s1 * s1
A DataFrame has two Indexes: s2.index = s2.index + 2 # misalign indexes
 Typically, the column index (df.columns) is a list of df = pd.concat([s1, s2], axis=1)
strings (variable names) or (less commonly) integers
 Typically, the row index (df.index) might be: # Example 2 ...
o Integers - for case or row numbers; s3 = Series({'Tom':1, 'Dick':4, 'Har':9})
o Strings – for case names; or s4 = Series({'Tom':3, 'Dick':2, 'Mar':5})
o DatetimeIndex or PeriodIndex – for time series df = pd.concat({'A':s3, 'B':s4 }, axis=1)
Note: 1st method has in integer column labels
Series object: an ordered, one-dimensional array of Note: 2nd method does not guarantee col order
data with an index. All the data in a Series is of the
same data type. Series arithmetic is vectorised after first Get a DataFrame from a Python dictionary
aligning the Series index for each of the operands. # default --- assume data is in columns
df = DataFrame({
s1 = Series(range(0,4)) # -> 0, 1, 2, 3 'col0' : [1.0, 2.0, 3.0, 4.0],
s2 = Series(range(1,5)) # -> 1, 2, 3, 4 'col1' : [100, 200, 300, 400]
s3 = s1 + s2 # -> 1, 3, 5, 7 })

Version 30 April 2017 - [Draft – Mark Graph – mark dot the dot graph at gmail dot com – @Mark_Graph on twitter]
1

You might also like