PandasGUIA PYTHON-04

The document discusses the Pandas library in Python which provides data structures and analysis tools. It covers Pandas Series and DataFrames, I/O methods like reading/writing CSV, Excel and SQL databases. It also covers data selection, applying functions, data alignment and summarizing DataFrames.

Python For Data Science
Pandas Basics Cheat Sheet
Learn Pandas Basics online at www.DataCamp.com

Pandas

The Pandas library is built on NumPy and provides easy-to-use data structures and data analysis tools for the Python programming language.

Use the following import convention:
>>> import pandas as pd

> I/O

Read and Write to CSV
>>> pd.read_csv('file.csv', header=None, nrows=5)
>>> df.to_csv('myDataFrame.csv')

Read and Write to Excel
>>> pd.read_excel('file.xlsx')
>>> df.to_excel('dir/myDataFrame.xlsx', sheet_name='Sheet1')

Read multiple sheets from the same file
>>> xlsx = pd.ExcelFile('file.xls')
>>> df = pd.read_excel(xlsx, 'Sheet1')

Read and Write to SQL Query or Database Table
>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite:///:memory:')
>>> pd.read_sql("SELECT * FROM my_table;", engine)
>>> pd.read_sql_table('my_table', engine)
>>> pd.read_sql_query("SELECT * FROM my_table;", engine)
read_sql() is a convenience wrapper around read_sql_table() and read_sql_query()
>>> df.to_sql('myDf', engine)

> Retrieving Series/DataFrame Information

Basic Information
>>> df.shape #(rows,columns)
>>> df.index #Describe index
>>> df.columns #Describe DataFrame columns
>>> df.info() #Info on DataFrame
>>> df.count() #Number of non-NA values

Summary
>>> df.sum() #Sum of values
>>> df.cumsum() #Cumulative sum of values
>>> df.min()/df.max() #Minimum/maximum values
>>> df.idxmin()/df.idxmax() #Minimum/Maximum index value
>>> df.describe() #Summary statistics
>>> df.mean() #Mean of values
>>> df.median() #Median of values

> Applying Functions

>>> f = lambda x: x*2
>>> df.apply(f) #Apply function
>>> df.applymap(f) #Apply function element-wise (renamed DataFrame.map in pandas 2.1)

> Pandas Data Structures

Series
A one-dimensional labeled array capable of holding any data type

Index
a    3
b   -5
c    7
d    4

>>> s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])

DataFrame
A two-dimensional labeled data structure with columns of potentially different types

        Columns
Index   Country   Capital     Population
0       Belgium   Brussels    11190846
1       India     New Delhi   1303171035
2       Brazil    Brasília    207847528

>>> data = {'Country': ['Belgium', 'India', 'Brazil'],
            'Capital': ['Brussels', 'New Delhi', 'Brasília'],
            'Population': [11190846, 1303171035, 207847528]}
>>> df = pd.DataFrame(data,
                      columns=['Country', 'Capital', 'Population'])

> Data Alignment

Internal Data Alignment
NA values are introduced in the indices that don't overlap:
>>> s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd'])
>>> s + s3
a    10.0
b     NaN
c     5.0
d     7.0

Arithmetic Operations with Fill Methods
You can also do the internal data alignment yourself with the help of the fill methods:
>>> s.add(s3, fill_value=0)
a    10.0
b    -5.0
c     5.0
d     7.0
>>> s.sub(s3, fill_value=2)
>>> s.div(s3, fill_value=4)
>>> s.mul(s3, fill_value=3)

> Selection
Also see NumPy Arrays

Getting
>>> s['b'] #Get one element
-5
>>> df[1:] #Get subset of a DataFrame
  Country    Capital  Population
1   India  New Delhi  1303171035
2  Brazil   Brasília   207847528

Selecting, Boolean Indexing & Setting

By Position
>>> df.iloc[0, 0] #Select single value by row & column
'Belgium'
>>> df.iat[0, 0]
'Belgium'

By Label
>>> df.loc[0, 'Country'] #Select single value by row & column labels
'Belgium'
>>> df.at[0, 'Country']
'Belgium'
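
The position-based and label-based selectors can be compared side by side. The following is a minimal sketch that rebuilds the sheet's example DataFrame; the variable names (by_position, one_cell_frame, and so on) are illustrative only. It also shows one detail worth knowing: the accessors take square brackets, and scalar selectors return a single value while list selectors keep the two-dimensional shape.

```python
import pandas as pd

# Example data taken from the cheat sheet.
data = {'Country': ['Belgium', 'India', 'Brazil'],
        'Capital': ['Brussels', 'New Delhi', 'Brasília'],
        'Population': [11190846, 1303171035, 207847528]}
df = pd.DataFrame(data, columns=['Country', 'Capital', 'Population'])

# By position: row 0, column 0.
by_position = df.iloc[0, 0]
fast_position = df.iat[0, 0]      # scalar-only accessor

# By label: row label 0, column label 'Country'.
by_label = df.loc[0, 'Country']
fast_label = df.at[0, 'Country']  # scalar-only accessor

# List selectors return a 1x1 DataFrame rather than a scalar.
one_cell_frame = df.iloc[[0], [0]]

print(by_position, fast_position, by_label, fast_label)
print(type(one_cell_frame).__name__, one_cell_frame.shape)
```

The at and iat accessors only ever return one value, so they are a reasonable choice when a single cell is wanted; loc and iloc also accept slices and lists.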

By Label/Position
df.ix was removed in pandas 1.0; use df.iloc (position) or df.loc (label) instead.
>>> df.iloc[2] #Select single row of subset of rows
Country           Brazil
Capital         Brasília
Population     207847528
>>> df.loc[:,'Capital'] #Select a single column of subset of columns
0     Brussels
1    New Delhi
2     Brasília
>>> df.loc[1,'Capital'] #Select rows and columns
'New Delhi'

Boolean Indexing
>>> s[~(s > 1)] #Series s where value is not >1
>>> s[(s < -1) | (s > 2)] #s where value is <-1 or >2
>>> df[df['Population']>1200000000] #Use filter to adjust DataFrame

Setting
>>> s['a'] = 6 #Set index a of Series s to 6

> Dropping

>>> s.drop(['a', 'c']) #Drop values from rows (axis=0)
>>> df.drop('Country', axis=1) #Drop values from columns (axis=1)

> Asking For Help

>>> help(pd.Series.loc)

> Sort & Rank

>>> df.sort_index() #Sort by labels along an axis
>>> df.sort_values(by='Country') #Sort by the values along an axis
>>> df.rank() #Assign ranks to entries
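
As a small illustration of the Sort & Rank calls, the sketch below applies them to the sheet's example DataFrame. The result names (by_country, pop_rank, by_index_desc) are hypothetical, and ranking is shown on the Population column alone to keep the output easy to read.

```python
import pandas as pd

# Example data taken from the cheat sheet.
data = {'Country': ['Belgium', 'India', 'Brazil'],
        'Capital': ['Brussels', 'New Delhi', 'Brasília'],
        'Population': [11190846, 1303171035, 207847528]}
df = pd.DataFrame(data, columns=['Country', 'Capital', 'Population'])

# Sort rows alphabetically by the values in 'Country';
# the original index labels travel with their rows.
by_country = df.sort_values(by='Country')

# Rank the populations: 1.0 is the smallest value.
pop_rank = df['Population'].rank()

# Sort by index labels, here in descending order.
by_index_desc = df.sort_index(ascending=False)

print(by_country)
print(pop_rank)
print(by_index_desc)
```

None of these calls modifies df; each returns a new object, so the result has to be assigned if it is needed later.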

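
Returning to the Data Alignment section, the difference between plain arithmetic and the fill methods is easiest to see with both results next to each other. This is a minimal sketch using the sheet's s and s3; the names plain and filled are illustrative.

```python
import pandas as pd

s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd'])

# Label 'b' exists only in s, so the aligned sum has no partner value
# for it and the result is NaN.
plain = s + s3

# fill_value=0 substitutes 0 for the missing partner before adding,
# so 'b' becomes -5 + 0.
filled = s.add(s3, fill_value=0)

print(plain)
print(filled)
```

The result is floating point in both cases, because the aligned sum has to be able to hold NaN.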