
Data Exploration Lab

The document provides a detailed guide on installing data analysis and visualization tools such as Python and R, including step-by-step instructions for setting up Anaconda and R on Windows. It also covers practical exercises for data exploration and visualization using libraries like Pandas, Numpy, and Matplotlib, with examples of creating arrays, data frames, and various plots. The document emphasizes the importance of these tools for analyzing datasets, including email data, and visualizing insights.

Uploaded by gurudevanaids
Copyright © All Rights Reserved
PRACTICAL EXERCISES

Example No. 1: Install the data analysis and visualization tools: R / Python / Tableau Public / Power BI.

1. Installing Python using Anaconda

Python is a popular language for scientific computing, and it is great for general-purpose programming as well. Installing all of the scientific packages used in this lesson individually can be cumbersome, so we recommend the all-in-one installer Anaconda.

Windows
• Open https://www.anaconda.com/products/individual in your web browser.
• Download the Anaconda Python 3 installer for Windows.
• Double-click the executable and install Python 3 using the recommended settings. Make sure that the "Register Anaconda as my default Python 3.x" option is checked; it should be checked by default in the latest version of Anaconda.
• Verify the installation: click Start, then search for and select Anaconda Prompt from the menu. A window should pop up where you can type commands, for example to check your Conda installation:

conda --help

2. Required Python Packages

The following packages are needed for this workshop:
• Pandas
• Jupyter Notebook
• NumPy
• Matplotlib
• plotnine

All packages apart from plotnine are installed automatically with Anaconda, and Anaconda can be used as a package manager to install the missing plotnine:

conda install -y -c conda-forge plotnine

This installs the latest version of plotnine into your conda environment.

To install the packages with Miniconda:

conda install -y numpy pandas matplotlib jupyter
conda install -c conda-forge plotnine

If you work in a separate environment for this lesson (named python-ecology-lesson here), activate it with:

conda activate python-ecology-lesson

You can deactivate the environment with:

conda deactivate

Launch a Jupyter notebook

After installing either Anaconda or Miniconda and the workshop packages, launch a Jupyter notebook by typing this command in the terminal:

jupyter notebook

The notebook should open automatically in your browser.
If it does not, or if you wish to use a different browser, open this link: http://localhost:8888.

Installing R on Windows OS

To install R on Windows OS:
1. Go to the CRAN website.
2. Click on "Download R for Windows".
3. Click on the "install R for the first time" link to download the R executable (.exe) file.
4. Run the R executable file to start the installation, and allow the app to make changes to your device.
5. Select the installation language (Select Setup Language: "Select the language to use during the installation").
6. Follow the instructions of the setup wizard.

[Screenshot: Setup - R for Windows 4.1.2, Information page: "Please read the following important information before continuing. When you are ready to continue with Setup, click Next." The page shows the GNU General Public License, Version 2, June 1991.]

[Screenshot: Completing the R for Windows 4.1.2 Setup Wizard. Click Finish to exit Setup.]

R has now been successfully installed on your Windows OS. Open the R GUI to start writing R code.

Example No. 2: Perform exploratory data analysis (EDA) with datasets like an email data set. Export all your emails as a dataset, import them inside a Pandas data frame, visualize them and get different insights from the data.

The CData Python Connector for Email enables you to use pandas and other modules to analyze and visualize live Email data in Python.

Step 1: Download the Email dataset from https://www.kaggle.com/code/jaykrishna/topic-modeling-enron-email-dataset/data

Step 2: Import the needed packages

import os, sys, email, re
import numpy as np
import pandas as pd

Step 3:

# Read the data into a DataFrame
emails_df = pd.read_csv('../input/emails.csv')
print(emails_df.shape)
emails_df.head()

Output

   file                      message
0  allen-p/_sent_mail/1.     Message-ID: <18782981.1075855378110.JavaMail.e...
1  allen-p/_sent_mail/10.    Message-ID: <15464986.1075855378456.JavaMail.e...
2  allen-p/_sent_mail/100.   Message-ID: <24216240.1075855687451.JavaMail...
3  allen-p/_sent_mail/1000.  Message-ID: <13505866.1075863688222.JavaMail...
4  allen-p/_sent_mail/1001.  Message-ID: <30922949.1075863688243.JavaMail...

Step 4:

## Helper functions
def get_text_from_email(msg):
    '''To get the content from email objects'''
    parts = []
    for part in msg.walk():
        if part.get_content_type() == 'text/plain':
            parts.append(part.get_payload())
    return ''.join(parts)

def split_email_addresses(line):
    '''To separate multiple email addresses'''
    if line:
        addrs = line.split(',')
        addrs = frozenset(map(lambda x: x.strip(), addrs))
    else:
        addrs = None
    return addrs

# Parse the emails into a list of email objects
messages = list(map(email.message_from_string, emails_df['message']))
emails_df.drop('message', axis=1, inplace=True)
# Get fields from parsed email objects
keys = messages[0].keys()
for key in keys:
    emails_df[key] = [doc[key] for doc in messages]
# Parse content from emails
emails_df['content'] = list(map(get_text_from_email, messages))
# Split multiple email addresses
emails_df['From'] = emails_df['From'].map(split_email_addresses)
emails_df['To'] = emails_df['To'].map(split_email_addresses)
# Extract the root of 'file' as 'user'
emails_df['user'] = emails_df['file'].map(lambda x: x.split('/')[0])
del messages
emails_df.head()

Example No. 3: Working with NumPy arrays, Pandas data frames, and basic plots using Matplotlib.

Creating a NumPy Array

import numpy as np

# Creating a single-dimensional array
a = np.array([1, 2, 3])  # Calling the array function
print(a)
[1 2 3]

# Creating a multi-dimensional array
# Each set of elements within a square bracket indicates a row
# Array of two rows and two columns
b = np.array([[1, 2], [3, 4]])
print(b)
[[1 2]
 [3 4]]

# Creating an ndarray by wrapping a list
list1 = [1, 2, 3, 4, 5]  # Creating a list
arr = np.array(list1)  # Wrapping the list
print(arr)
[1 2 3 4 5]

# Creating an array of numbers of a specified range
arr1 = np.arange(10, 100)  # Array of numbers from 10 up to and excluding 100
print(arr1)
[10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57
 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81
 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99]

# Creating a 5x5 array of zeroes
arr2 = np.zeros((5, 5))
print(arr2)
[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]
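The calls above only create arrays. As a rough illustration of working with them, here is a minimal sketch using standard NumPy operations (reshape, elementwise arithmetic, broadcasting, aggregation along an axis, and boolean masks). The array values and variable names are illustrative and are not part of the original exercise.

```python
import numpy as np

# Reshape a 1-D range into a 2-D grid: 3 rows x 4 columns holding 1..12
grid = np.arange(1, 13).reshape(3, 4)
print(grid.shape)  # (3, 4)

# Elementwise arithmetic: every element is doubled
doubled = grid * 2

# Broadcasting: the length-4 array is added to each of the 3 rows
offsets = np.array([0, 10, 20, 30])
shifted = grid + offsets

# Aggregation along an axis
col_sums = grid.sum(axis=0)    # axis=0 collapses the rows: one sum per column
row_means = grid.mean(axis=1)  # axis=1 collapses the columns: one mean per row

# Boolean mask: keep only the even elements (the result is flattened to 1-D)
evens = grid[grid % 2 == 0]

print(doubled)
print(shifted)
print(col_sums)   # column sums are 15, 18, 21, 24
print(row_means)  # row means are 2.5, 6.5, 10.5
print(evens)      # even elements 2, 4, 6, 8, 10, 12
```

The axis argument is the part most often confused: axis=0 aggregates down the rows (one result per column), while axis=1 aggregates across the columns (one result per row).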
Pandas

Pandas is an open-source Python library providing efficient, easy-to-use data structures and data analysis tools. The name Pandas is derived from "Panel Data", an econometrics term for multidimensional data sets. Pandas is well suited for many different kinds of data:
• Tabular data with heterogeneously-typed columns.
• Ordered and unordered time series data.
• Arbitrary matrix data with row and column labels.
• Any other form of observational/statistical data sets.

The data need not be labeled at all to be placed into a pandas data structure.

1. Series

To import the library:

import pandas as pd

series = pd.Series()  # The Series() function creates a new Series
print(series)
Series([], dtype: float64)

(In pandas 2.0 and later, an empty Series has dtype object instead of float64.)

# Creating a series from an ndarray
# Note that indexes are assigned automatically if not specified
arr = np.array([10, 20, 30, 40, 50])
series1 = pd.Series(arr)
print(series1)
0    10
1    20
2    30
3    40
4    50
dtype: int64

# Creating a series from a Python dict
# Note that the keys of the dictionary are used to assign indexes during conversion
data = {'a': 10, 'b': 20, 'c': 30}
series2 = pd.Series(data)
print(series2)
a    10
b    20
c    30
dtype: int64

# Retrieving a part of the series using slicing
print(series1[1:4])
1    20
2    30
3    40
dtype: int64

2. DataFrames
1. A DataFrame is a 2D data structure in which data is aligned in a tabular fashion, in rows and columns.
2. A DataFrame can be created using the constructor pandas.DataFrame(data, index, columns, dtype, copy), where:
3. data can be of multiple types such as ndarray, list, constants, Series, dict, etc.
4. index and columns are the row and column labels of the DataFrame; they default to np.arange(n) if no labels are passed.
5. dtype is the data type of each column.
6. copy controls whether the input data is copied.

Creating a DataFrame

# Converting a list into a DataFrame
list1 = [10, 20, 30, 40]
table = pd.DataFrame(list1)
print(table)
    0
0  10
1  20
2  30
3  40

# Creating a DataFrame from a list of dictionaries
data = [{'a': 1, 'b': 2}, {'a': 2, 'b': 4, 'c': 8}]
table1 = pd.DataFrame(data)
print(table1)
# NaN (not a number) is stored in areas where no data is provided
   a  b    c
0  1  2  NaN
1  2  4  8.0

# Creating a DataFrame from a list of dictionaries and accompanying row indices
table2 = pd.DataFrame(data, index=['first', 'second'])
# Dict keys become column labels
print(table2)
        a  b    c
first   1  2  NaN
second  2  4  8.0

# Converting a dictionary of series into a DataFrame
data1 = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
         'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
table3 = pd.DataFrame(data1)
print(table3)
# The resultant index is the union of all the series indexes passed
   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4

DataFrame - Addition & Deletion of Columns

# A new column can be added to a DataFrame when the data is passed as a Series
table3['three'] = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(table3)
   one  two  three
a  1.0    1   10.0
b  2.0    2   20.0
c  3.0    3   30.0
d  NaN    4    NaN

# DataFrame columns can be deleted using the del statement
del table3['one']
print(table3)
   two  three
a    1   10.0
b    2   20.0
c    3   30.0
d    4    NaN

# DataFrame columns can be deleted using the pop() method
table3.pop('two')
print(table3)
   three
a   10.0
b   20.0
c   30.0
d    NaN

DataFrame - Selection of Rows

# DataFrame rows can be selected by passing the row label to the loc indexer
print(table3.loc['c'])
three    30.0
Name: c, dtype: float64

# Row selection can also be done using the row position with iloc
print(table3.iloc[2])
three    30.0
Name: c, dtype: float64

Matplotlib
1. Matplotlib is a Python library specially designed for the development of graphs, charts, etc., in order to provide interactive data visualisation.
2. Matplotlib is inspired by the MATLAB software and reproduces many of its features.

# Import Matplotlib submodule for plotting
import matplotlib.pyplot as plt

Plotting in Matplotlib

plt.plot([1, 2, 3, 4])  # List of vertical coordinates of the points plotted
plt.show()  # Displays plot
# Implicit X-axis values from 0 to (N-1), where N is the length of the list

# We can specify the values for both axes
x = range(5)  # Sequence of values for the x-axis
# X-axis values specified - [0,1,2,3,4]
plt.plot(x, [x1**2 for x1 in x])  # Vertical coordinates of the points plotted: y = x^2
plt.show()

Multiline Plots

# Multiple functions can be drawn on the same plot
x = range(5)
plt.plot(x, [x1 for x1 in x])
plt.plot(x, [x1*x1 for x1 in x])
plt.plot(x, [x1*x1*x1 for x1 in x])
plt.show()

Adding a Legend

# Legends explain the meaning of each line in the graph
x = np.arange(5)
plt.plot(x, x, label='linear')
plt.plot(x, x*x, label='square')
plt.plot(x, x*x*x, label='cube')
plt.grid(True)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title("Polynomial Graph")
plt.legend()
plt.show()

[Figure: Polynomial Graph, X-axis vs Y-axis]

Matplotlib provides many types of plot formats for visualising information:
1. Scatter Plot
2. Histogram
3. Bar Graph
4. Pie Chart

Histogram

# Histograms display the distribution of a variable over a range of frequencies or values
y = np.random.randn(100, 100)  # 100x100 array of a Gaussian distribution
plt.hist(y)  # Function to plot the histogram takes the dataset as the parameter
plt.show()

Bar Chart

import matplotlib.pyplot as plt
import numpy as np
plt.bar([1, 2, 3], heights)  # x positions of the bars; the bar heights are illegible in the scanned original
plt.show()

Pie Chart

plt.figure(figsize=(3, 3))  # Size of the plot in inches
x = [40, 20, 5]  # Proportions of the sectors
labels = ['Bikes', 'Cars', 'Buses']
plt.pie(x, labels=labels)
plt.show()

Scatter Plot

Scatter plots display values for two sets of data, visualised as a collection of points.

# Two sets of uniformly distributed random values plotted
x = np.random.rand(1000)
y = np.random.rand(1000)
plt.scatter(x, y)
plt.show()

Example No. 4: Explore various variable and row filters in R for cleaning data. Apply various plot features in R on sample data sets and visualize.

To perform data extraction and data manipulation in R, we use the 'Census Income' dataset from the UCI Machine Learning Repository, which contains the income information of over 48,000 individuals, taken from the 1994 US census.

To import the dataset:

census <- read.csv("C:\\Users\\Intellipaat-Team\\Desktop\\census.income.csv")
class(census)
[1] "data.frame"
dim(census)
[1] 30162    15
names(census)
 [1] "age"            "workclass"      "fnlwgt"         "education"      "education.num"
 [6] "marital.status" "occupation"     "relationship"   "race"           "sex"
[11] "capital.gain"   "capital.loss"   "hours.per.week" "native.country" "X"
head(census)  # First six rows and all columns (only the first five columns are shown here)
  age        workclass fnlwgt education education.num
1  39        State-gov  77516 Bachelors            13
2  50 Self-emp-not-inc  83311 Bachelors            13
3  38          Private 215646   HS-grad             9
4  53          Private 234721      11th             7
5  28          Private 338409 Bachelors            13
6  37          Private 284582   Masters            14

To remove leading and trailing whitespace from the character columns, we use the mutate_if() and str_trim() functions from the dplyr and stringr packages, respectively.

library(dplyr)
library(stringr)
census %>% mutate_if(is.character, str_trim) -> census

After performing the above operation, all the leading and trailing whitespaces will be removed.
Convert the above columns back to factors to get back to the original structure:

# Convert character columns back to factors
census$workclass <- as.factor(census$workclass)
census$occupation <- as.factor(census$occupation)
census$native.country <- as.factor(census$native.country)
census$education <- as.factor(census$education)
census$marital.status <- as.factor(census$marital.status)
census$relationship <- as.factor(census$relationship)
census$race <- as.factor(census$race)
census$sex <- as.factor(census$sex)
census$X <- as.factor(census$X)

Data Extraction in R

Data must be in a clean and tidy format. First, use the base R functions to extract rows and columns from a data frame. In this example, we will use the indexing features in R to perform data extraction on the 'census' dataset. For example:

# Select columns age, education, sex
mycol <- c("age", "education", "sex")
census[mycol]
  age education    sex
1  39 Bachelors   Male
2  50 Bachelors   Male
3  38   HS-grad   Male
4  53      11th   Male
5  28 Bachelors Female
6  37   Masters Female
(first six rows shown)

# First row and 2nd and 3rd columns
census[1, 2:3]
  workclass fnlwgt
1 State-gov  77516

# First 6 rows and second column as a data frame
census[1:6, 2, drop = FALSE]
         workclass
1        State-gov
2 Self-emp-not-inc
3          Private
4          Private
5          Private
6          Private

# Element at 5th row, tenth column
census[5, 10]
[1] Female
Levels: Female Male

# Exclude columns age, occupation, and race
mycols <- names(census) %in% c("age", "occupation", "race")
newdata <- census[!mycols]

# Exclude 3rd and 5th column
newdata <- census[c(-3, -5)]

# Delete columns fnlwgt and education.num
census$fnlwgt <- census$education.num <- NULL

# Selecting rows based on column values
newdata <- census[which(census$sex == "Female" & census$age > 65), ]

Example No. 5: Perform time series analysis and apply the various visualization techniques.

There are different types of visualizations for time series data. They are:
1. Line Plots.
2. Histograms and Density Plots.
3. Box and Whisker Plots.
4. Heat Maps.
5. Lag Plots or Scatter Plots.
6. Autocorrelation Plots.

This dataset describes the minimum daily temperatures over 10 years (1981-1990) in the city of Melbourne, Australia. The units are in degrees Celsius and there are 3,650 observations. The source of the data is credited as the Australian Bureau of Meteorology.

Step 1: Download the dataset "daily-min-temperatures.csv" from https://github.com/jbrownlee/Datasets.

Step 2: Import the packages and read the dataset.

from pandas import read_csv
from matplotlib import pyplot
# squeeze=True was removed from read_csv in pandas 2.0;
# .squeeze('columns') turns the single-column DataFrame into a Series
series = read_csv('/content/daily-min-temperatures.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
print(series.head())

Step 3: Time Series Line Plot

from pandas import read_csv
from matplotlib import pyplot
series = read_csv('/content/daily-min-temperatures.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
series.plot()
pyplot.show()

Step 4: Plot the observations as dots instead of a connected line (style 'k.' draws black dots).

series.plot(style='k.')
pyplot.show()

Step 5: Group data by year.

The Minimum Daily Temperatures dataset spans 10 years.
We can group the data by year and create a line plot for each year for direct comparison. A plot of this contrived DataFrame is created with each column visualized as a subplot, with legends removed to cut back on the clutter.

from pandas import read_csv
from pandas import DataFrame
from pandas import Grouper
from matplotlib import pyplot
series = read_csv('/content/daily-min-temperatures.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
groups = series.groupby(Grouper(freq='A'))
years = DataFrame()
for name, group in groups:
    years[name.year] = group.values
years.plot(subplots=True, legend=False)
pyplot.show()

Step 6: Time Series Histogram and Density Plots

The following code creates a histogram plot of the observations in the Minimum Daily Temperatures dataset.

from pandas import read_csv
from matplotlib import pyplot
series = read_csv('/content/daily-min-temperatures.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
series.hist()
pyplot.show()

Step 7: Density plot of the Minimum Daily Temperatures dataset.
from pandas import read_csv
from matplotlib import pyplot
series = read_csv('/content/daily-min-temperatures.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
series.plot(kind='kde')
pyplot.show()

Step 8: Time Series Box and Whisker Plots by Interval

Box and whisker plots can be created and compared for each interval in a time series, such as years, months, or days.

from pandas import read_csv
from pandas import DataFrame
from pandas import Grouper
from matplotlib import pyplot
series = read_csv('/content/daily-min-temperatures.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
groups = series.groupby(Grouper(freq='A'))
years = DataFrame()
for name, group in groups:
    years[name.year] = group.values
years.boxplot()
pyplot.show()

Step 9: A box and whisker plot is created for each month-column in the newly constructed DataFrame.

# Create a boxplot of monthly data
from pandas import read_csv
from pandas import DataFrame
from pandas import Grouper
from pandas import concat
from matplotlib import pyplot
series = read_csv('/content/daily-min-temperatures.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
one_year = series['1990']
groups = one_year.groupby(Grouper(freq='M'))
months = concat([DataFrame(x[1].values) for x in groups], axis=1)
months.boxplot()
pyplot.show()

Step 10: Time Series Heat Maps

Create a heat map of the Minimum Daily Temperatures data.
The matshow() function from the matplotlib library is used, as no heat map support is provided directly in Pandas.

from pandas import read_csv
from pandas import DataFrame
from pandas import Grouper
from matplotlib import pyplot
series = read_csv('/content/daily-min-temperatures.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
groups = series.groupby(Grouper(freq='A'))
years = DataFrame()
for name, group in groups:
    years[name.year] = group.values
years = years.T  # transpose so that each row is one year
pyplot.matshow(years, interpolation=None, aspect='auto')
pyplot.show()

Step 11: Heat map comparing the months of the year in 1990. Each column represents one month, with rows representing the days of the month from 1 to 31.

from pandas import read_csv
from pandas import DataFrame
from pandas import Grouper
from pandas import concat
from matplotlib import pyplot
series = read_csv('/content/daily-min-temperatures.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
one_year = series['1990']
groups = one_year.groupby(Grouper(freq='M'))
months = concat([DataFrame(x[1].values) for x in groups], axis=1)
months = DataFrame(months)
months.columns = range(1, 13)
pyplot.matshow(months, interpolation=None, aspect='auto')
pyplot.show()

Step 12: Time Series Lag Scatter Plots

A lag plot shows each observation against the previous observation. Points clustered along a diagonal line from the bottom-left to the top-right suggest a positive correlation; a ball in the middle or a spread across the plot suggests a weak or no relationship.
# Create a lag scatter plot
from pandas import read_csv
from matplotlib import pyplot
from pandas.plotting import lag_plot
series = read_csv('/content/daily-min-temperatures.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
lag_plot(series)
pyplot.show()

We can also plot the relationship between each observation and several of its lag values. The code below creates a scatter plot of each observation against each of its values in the previous week (lags 1 to 7).

from pandas import read_csv
from pandas import DataFrame
from pandas import concat
from matplotlib import pyplot
series = read_csv('/content/daily-min-temperatures.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
values = DataFrame(series.values)
lags = 7
columns = [values]
for i in range(1, (lags + 1)):
    columns.append(values.shift(i))
dataframe = concat(columns, axis=1)
columns = ['t+1']
for i in range(1, (lags + 1)):
    columns.append('t-' + str(i))
dataframe.columns = columns
pyplot.figure(1)
for i in range(1, (lags + 1)):
    ax = pyplot.subplot(240 + i)
    ax.set_title('t+1 vs t-' + str(i))
    pyplot.scatter(x=dataframe['t+1'].values, y=dataframe['t-' + str(i)].values)
pyplot.show()
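Lag scatter plots show the relationship visually; the same relationship can be summarised as a single number by correlating the series with a shifted copy of itself. The sketch below assumes a synthetic daily series (a yearly sine cycle plus Gaussian noise) as a hypothetical stand-in for the temperature data, so it runs without the CSV file. It uses pandas Series.shift() and Series.corr(); Series.autocorr(lag) computes the same value.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Melbourne data: 10 years of daily values
# built from a yearly sine cycle plus Gaussian noise (not the real dataset)
rng = np.random.default_rng(0)
days = pd.date_range("1981-01-01", periods=3650, freq="D")
t = np.arange(len(days))
values = 11 + 4 * np.sin(2 * np.pi * t / 365.25) + rng.normal(0, 2, len(days))
series = pd.Series(values, index=days)

# Pearson correlation between each observation and the one `lag` days earlier
lag_corr = {}
for lag in range(1, 8):
    lagged = series.shift(lag)           # value from `lag` days earlier, aligned with today
    lag_corr[lag] = series.corr(lagged)  # the NaNs introduced by shift() are excluded

for lag, r in lag_corr.items():
    print(f"t vs t-{lag}: r = {r:.3f}")
```

Values close to 1 correspond to points hugging the diagonal in the lag plots, and values close to 0 correspond to the shapeless "ball". With the real data, the series read from daily-min-temperatures.csv can be used in place of the synthetic one.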
Step 13: Time Series Autocorrelation Plots

from pandas import read_csv
from matplotlib import pyplot
from pandas.plotting import autocorrelation_plot
series = read_csv('/content/daily-min-temperatures.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
autocorrelation_plot(series)
pyplot.show()

Example No. 6: Perform data analysis and representation on a map using various map data sets with mouse rollover effect, user interaction, etc.

Multilayer interactive map

Step 1: Setup and Data Download

Folium supports creating maps with multiple layers. Recent versions of GeoPandas have built-in support to create interactive folium maps from a GeoDataFrame. We create a multi-layer interactive map using 2 vector datasets.

%%capture
if 'google.colab' in str(get_ipython()):
    !apt install libspatialindex-dev
    !pip install fiona shapely pyproj rtree mapclassify
    !pip install geopandas

import os
import folium
from folium import Figure
import geopandas as gpd

data_folder = 'data'
output_folder = 'output'
if not os.path.exists(data_folder):
    os.mkdir(data_folder)
if not os.path.exists(output_folder):
    os.mkdir(output_folder)

Step 2: Import the data set

def download(url):
    filename = os.path.join(data_folder, os.path.basename(url))
    if not os.path.exists(filename):
        from urllib.request import urlretrieve
        local, _ = urlretrieve(url, filename)
        print('Downloaded ' + local)

filename = 'karnataka.gpkg'
data_url = 'https://github.com/spatialthoughts/python-dataviz-web/raw/main/data/osm/'
download(data_url + filename)

Step 3: Using GeoPandas explore()

We can use the explore() method to create an interactive folium map from a GeoDataFrame. When you call explore(), a folium object is created. We can save that object and use it to display or add more layers to the map.

data_pkg_path = 'data'
filename = 'karnataka.gpkg'
path = os.path.join(data_pkg_path, filename)
roads_gdf = gpd.read_file(path, layer='karnataka_highways')
districts_gdf = gpd.read_file(path, layer='karnataka_districts')
state_gdf = gpd.read_file(path, layer='karnataka')
m = districts_gdf.explore()

bounds = districts_gdf.total_bounds
bounds

Output
array([74.05096229, 11.58237791, 78.58829529, 18.47673602])

Step 4: The explore() function takes an m parameter, where we can supply an existing folium map to which to render the GeoDataFrame.

fig = Figure(width=800, height=400)
m = folium.Map()
m.fit_bounds([[bounds[1], bounds[0]], [bounds[3], bounds[2]]])
districts_gdf.explore(m=m)
fig.add_child(m)

[Output: interactive map of the Karnataka districts]

Step 5: Folium supports a variety of basemaps. Let's change the basemap to use Stamen Terrain tiles. Additionally, we can change the styling using the color and style_kwds parameters. (Stamen tiles are now served by Stadia Maps and may require an API key; if 'Stamen Terrain' does not load, use another basemap such as 'OpenStreetMap'.)

fig = Figure(width=800, height=400)
m = folium.Map(tiles='Stamen Terrain')
m.fit_bounds([[bounds[1], bounds[0]], [bounds[3], bounds[2]]])
districts_gdf.explore(
    m=m,
    color='black',
    style_kwds={'fillOpacity': 0.3, 'weight': 0.5},
)
fig.add_child(m)

Step 6: The GeoDataFrame contains roads of different categories, as given in the ref column. Let's add a category column and use it to apply different styles to each category of road.

def get_category(row):
    ref = str(row['ref'])
    if 'NH' in ref:
        return 'NH'
    elif 'SH' in ref:
        return 'SH'
    else:
        return 'NA'

roads_gdf['category'] = roads_gdf.apply(get_category, axis=1)
roads_gdf

[Output: the roads_gdf table with the new category column]

Step 7: Create Multi-layer Maps

When you call explore(), a folium object is created. You can save that object and add more layers to the same object.

fig = Figure(width=800, height=400)
m = folium.Map(tiles='Stamen Terrain')
m.fit_bounds([[bounds[1], bounds[0]], [bounds[3], bounds[2]]])
districts_gdf.explore(
    m=m,
    color='black',
    style_kwds={'fillOpacity': 0.3, 'weight': 0.5},
    name='districts',
    tooltip=False)
roads_gdf.explore(
    m=m,
    column='category',
    categories=['NH', 'SH'],
    cmap=['#1f78b4', '#e31a1c'],
    categorical=True,
    name='highways'
)
fig.add_child(m)

[Output: multi-layer map of the districts and highways]

Example No. 7: Build cartographic visualization involving various countries.

7.1 Cartographic visualization from basemap

Step 1: Robinson Projection

!pip install "basemap==1.3.0b1"
from mpl_toolkits.basemap import Basemap
import numpy as np
import matplotlib.pyplot as plt
# lon_0 is the central longitude of the projection.
# resolution = 'c' means use crude resolution coastlines.
m = Basemap(projection='robin', lon_0=0, resolution='c')
m.drawcoastlines()
m.fillcontinents(color='red', lake_color='coral')
# draw parallels and meridians.
m.drawparallels(np.arange(-90., 120., 30.))
m.drawmeridians(np.arange(0., 360., 60.))
m.drawmapboundary(fill_color='green')
plt.title("Robinson Projection")
plt.show()

[Figure: Robinson Projection]

Step 2: Gall-Peters Projection

Note: called without a projection argument, Basemap() uses its default cylindrical equidistant ('cyl') projection, so the code below draws that projection.

from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
map = Basemap()
map.drawcoastlines()
plt.savefig('test.png')  # save before show(); after show() the figure may already be closed
plt.show()
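The Gall-Peters projection is the cylindrical equal-area projection with standard parallels at 45°. As a minimal sketch of the underlying math (plain Python, not a Basemap call), the forward projection on a sphere of radius R is x = Rλ/√2 and y = R√2·sin φ, with longitude λ and latitude φ in radians. The helper name gall_peters is hypothetical. The check at the end shows the equal-area property: the whole map has area 4πR², the surface area of the sphere.

```python
import math

def gall_peters(lon_deg, lat_deg, radius=1.0):
    """Hypothetical helper: project (longitude, latitude) in degrees to Gall-Peters (x, y).

    x = R * lambda / sqrt(2),  y = R * sqrt(2) * sin(phi)
    """
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = radius * lam / math.sqrt(2)
    y = radius * math.sqrt(2) * math.sin(phi)
    return x, y

# Map corners: longitude spans -180..180 and latitude spans -90..90
x_max, y_max = gall_peters(180, 90)
map_area = (2 * x_max) * (2 * y_max)  # width * height of the projected map

# Equal-area check: the map area matches the sphere's surface area 4*pi*R^2
print(math.isclose(map_area, 4 * math.pi))  # True

# Equal latitude steps give shrinking y steps toward the poles
for lat in (0, 30, 60, 90):
    print(lat, round(gall_peters(0, lat)[1], 4))
```

The shrinking vertical spacing near the poles is the price of preserving area, which is why Gall-Peters maps look vertically stretched near the equator.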
Step 3: Draw great circle between NY and London

from mpl_toolkits.basemap import Basemap
import numpy as np
import matplotlib.pyplot as plt
# create new figure, axes instances.
fig = plt.figure()
ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])
# setup mercator map projection.
m = Basemap(llcrnrlon=-100., llcrnrlat=20., urcrnrlon=20., urcrnrlat=60.,
            rsphere=(6378137.00, 6356752.3142),
            resolution='l', projection='merc',
            lat_0=40., lon_0=-20., lat_ts=20.)
# nylat, nylon are lat/lon of New York
nylat = 40.78; nylon = -73.98
# lonlat, lonlon are lat/lon of London.
lonlat = 51.53; lonlon = 0.08
# draw great circle route between NY and London
m.drawgreatcircle(nylon, nylat, lonlon, lonlat, linewidth=5, color='r')
m.drawcoastlines()
m.fillcontinents()
# draw parallels
m.drawparallels(np.arange(10, 90, 20), labels=[1, 1, 0, 1])
# draw meridians
m.drawmeridians(np.arange(-180, 180, 30), labels=[1, 1, 0, 1])
ax.set_title('Great Circle from New York to London')
plt.show()
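drawgreatcircle draws the shortest route between the two points over the globe. The length of that route can be estimated with the haversine formula. The sketch below is an illustration that assumes a spherical Earth of radius 6371 km. The Basemap call above uses the ellipsoid given in rsphere, so its distances differ slightly. haversine_km is a hypothetical helper, not a Basemap function.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean spherical radius


def haversine_km(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


# Same coordinates as in the Basemap example above.
nylat, nylon = 40.78, -73.98
lonlat, lonlon = 51.53, 0.08
print(round(haversine_km(nylat, nylon, lonlat, lonlon)))  # about 5,580 km
```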
Step 4: Draw day-night terminator

import numpy as np
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
from datetime import datetime
# miller projection
map = Basemap(projection='mill', lon_0=180)
# plot coastlines, draw label meridians and parallels.
map.drawcoastlines()
map.drawparallels(np.arange(-90, 90, 30), labels=[1, 0, 0, 0])
map.drawmeridians(np.arange(map.lonmin, map.lonmax + 30, 60), labels=[0, 0, 0, 1])
# fill continents 'coral' (with zorder=0), color wet areas 'aqua'
map.drawmapboundary(fill_color='aqua')
map.fillcontinents(color='coral', lake_color='aqua')
# shade the night areas, with alpha transparency so the
# map shows through. Use current time in UTC.
date = datetime.utcnow()
CS = map.nightshade(date)
plt.title('Day/Night Map for %s (UTC)' % date.strftime("%d %b %Y %H:%M:%S"))
plt.show()

Step 5: Contour lines over filled continent background

from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np
# set up orthographic map projection with
# perspective of satellite looking down at 45N, 100W.
# use low resolution coastlines.
map = Basemap(projection='ortho', lat_0=45, lon_0=-100, resolution='l')
# draw coastlines, country boundaries, fill continents.
map.drawcoastlines(linewidth=0.25)
map.drawcountries(linewidth=0.25)
map.fillcontinents(color='coral', lake_color='aqua')
# draw the edge of the map projection region (the projection limb)
map.drawmapboundary(fill_color='green')
# draw lat/lon grid lines every 30 degrees.
map.drawmeridians(np.arange(0, 360, 30))
map.drawparallels(np.arange(-90, 90, 30))
# make up some data on a regular lat/lon grid.
nlats = 73; nlons = 145; delta = 2.*np.pi/(nlons-1)
lats = (0.5*np.pi - delta*np.indices((nlats, nlons))[0, :, :])
lons = (delta*np.indices((nlats, nlons))[1, :, :])
wave = 0.75*(np.sin(2.*lats)**8*np.cos(4.*lons))
mean = 0.5*np.cos(2.*lats)*((np.sin(2.*lats))**2 + 2.)
# compute native map projection coordinates of lat/lon grid.
x, y = map(lons*180./np.pi, lats*180./np.pi)
# contour data over the map.
cs = map.contour(x, y, wave+mean, 15, linewidths=1.5)
plt.title('contour lines over filled continent background')
plt.show()

Step 6: Mercator Projection

from mpl_toolkits.basemap import Basemap
import numpy as np
import matplotlib.pyplot as plt
# llcrnrlat,llcrnrlon,urcrnrlat,urcrnrlon
# are the lat/lon values of the lower left and upper right corners
# of the map.
# lat_ts is the latitude of true scale.
# resolution = 'c' means use crude resolution coastlines.
m = Basemap(projection='merc', llcrnrlat=-80, urcrnrlat=80,
            llcrnrlon=-180, urcrnrlon=180, lat_ts=20, resolution='c')
m.drawcoastlines()
m.fillcontinents(color='yellow', lake_color='aqua')
# draw parallels and meridians.
m.drawparallels(np.arange(-90., 91., 30.))
m.drawmeridians(np.arange(-180., 181., 60.))
m.drawmapboundary(fill_color='green')
plt.title("Mercator Projection")
plt.show()
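The Mercator map in Step 6 stops at llcrnrlat=-80 and urcrnrlat=80 because the Mercator y coordinate grows without bound towards the poles. lat_ts sets the parallel along which the map is true to scale. Below is a sketch of the spherical formulas, assuming a sphere of radius 6371 km. mercator and scale_factor are illustrative names, not Basemap functions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed spherical radius (Basemap may use an ellipsoid)


def mercator(lon_deg, lat_deg, lat_ts_deg=0.0, radius=EARTH_RADIUS_KM):
    """Spherical Mercator (x, y) in km, true to scale along +/-lat_ts."""
    k0 = math.cos(math.radians(lat_ts_deg))
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = radius * k0 * lam
    y = radius * k0 * math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y


def scale_factor(lat_deg, lat_ts_deg=0.0):
    """Map scale relative to the globe at a given latitude (1.0 at lat_ts)."""
    return math.cos(math.radians(lat_ts_deg)) / math.cos(math.radians(lat_deg))


# With lat_ts=20 as in Step 6: exact scale at 20 degrees, growing towards the
# 80 degree map edges; at the poles y would be infinite.
for lat in (0, 20, 60, 80):
    y = mercator(0, lat, lat_ts_deg=20)[1]
    print(lat, round(y), round(scale_factor(lat, lat_ts_deg=20), 2))
```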
7.2 Visualization for multiple datasets: India

Step 1: Installing GeoPandas and Shapely

!pip install geopandas
!pip install pyshp

Step 2: Importing the libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import geopandas as gpd
import shapefile as shp
from shapely.geometry import Point
sns.set_style('whitegrid')

Step 3: Download the mapping data
https://github.com/Princenihith/Maps_with_python

Step 4: Load the data

fp = r'/content/india-polygon.shp'
map_df = gpd.read_file(fp)
map_df_copy = gpd.read_file(fp)
map_df.head()

     id                        st_nm                                           geometry
0  None  Andaman and Nicobar Islands  MULTIPOLYGON (((93.84831 7.24028, 93.92705 7.0...
1  None            Arunachal Pradesh  POLYGON ((95.23643 26.68105, 95.19594 27.03612...
2  None                        Assam  POLYGON ((95.19594 27.03612, 95.08795 26.94578...
3  None                        Bihar  POLYGON ((88.11357 26.54028, 88.28006 26.37640...
4  None                   Chandigarh  POLYGON ((76.4208 30.76124, 76.83758 30.72552...
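Step 7 attaches the landslide counts to these polygons with DataFrame.join, which performs a left join by default. Every polygon in map_df is kept, and states without a matching name get NaN, which is later replaced by 0. Any CSV name that does not exactly match st_nm is silently dropped. The stdlib sketch below mimics that behaviour with a few hypothetical names and counts, so mismatches can be spotted before plotting. left_join_counts is an illustrative helper, not a pandas function.

```python
def left_join_counts(polygon_names, counts, fill=0):
    """Mimic map_df.set_index('st_nm').join(state_df.set_index('State')).

    Returns (joined, unmatched). joined maps every polygon name to its count,
    or to `fill` when the name is missing from `counts`. unmatched lists the
    data names that matched no polygon and were therefore dropped.
    """
    joined = {name: counts.get(name, fill) for name in polygon_names}
    unmatched = sorted(set(counts) - set(polygon_names))
    return joined, unmatched


# Hypothetical data for illustration only.
polygons = ["Assam", "Bihar", "Arunachal Pradesh", "Chandigarh"]
landslides = {"Assam": 12, "Aruniichal Pradesh": 3, "Bihar": 1}

joined, unmatched = left_join_counts(polygons, landslides)
print(joined)     # Arunachal Pradesh and Chandigarh get 0
print(unmatched)  # ['Aruniichal Pradesh'] -> needs a replace() fix first
```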
Step 5: Plotting the Shapefiles

map_df.plot()

Step 6: Adding better data insights into the map

ls_df = pd.read_csv('/content/globallandslides.csv')
pd.set_option('display.max_columns', None)
# keep Indian records only; copy() avoids SettingWithCopyWarning on the next line
ls_df = ls_df[ls_df.country_name == "India"].copy()
ls_df["Year"] = pd.to_datetime(ls_df["event_date"]).dt.year
ls_df = ls_df[ls_df.landslide_category == "landslide"].copy()
# fix state-name spellings so they match st_nm in the shapefile
ls_df["admin_division_name"] = ls_df["admin_division_name"].replace({
    "Nagaland": "Nagaland",
    "Meghalaya": "Meghalaya",
    "Tamil Nadu": "Tamil Nadu",
    "Karnataka": "Karnataka",
    "Gujarat": "Gujarat",
    "Aruniichal Pradesh": "Arunachal Pradesh",
})
state_df = ls_df["admin_division_name"].value_counts()
state_df = state_df.to_frame()
state_df.reset_index(level=0, inplace=True)
state_df.columns = ['State', 'Count']
state_df.at[15, "Count"] = 69
state_df.at[0, "State"] = "Jammu and Kashmir"
state_df.at[20, "State"] = "Delhi"
state_df.drop(7)

Output: a State/Count table with one row per state, from Jammu and Kashmir (index 0) down to Bihar, with row 7 omitted.

Step 7: Merge the data

merged = map_df.set_index('st_nm').join(state_df.set_index('State'))
merged['Count'] = merged['Count'].replace(np.nan, 0)
merged.head()
Output: the first five rows of merged, indexed by st_nm (Andaman and Nicobar Islands, Arunachal Pradesh, Assam, Bihar, Chandigarh), with the id, geometry and Count columns.

Step 8: Plotting the data on the Shapefile

fig, ax = plt.subplots(1, figsize=(10, 10))
ax.axis('off')
ax.set_title('Number of landslides in India state-wise',
             fontdict={'fontsize': '20', 'fontweight': '10'})
# Plot the figure
merged.plot(column='Count', cmap='YlOrRd', linewidth=0.8, ax=ax,
            edgecolor='0', legend=True,
            legend_kwds={'label': "Number of landslides"})

Example No.8: Perform EDA on Wine Quality Data Set

Download the dataset:
https://github.com/aniruddhachoudhury/Red-Wine-Quality/blob/master/winequality-red.csv
Step 1: Importing some essential libraries in Python

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Step 2: Read the dataset

train_df = pd.read_csv("/content/winequality-red.csv")
train_df.sample(6)

Step 3: Check for null elements in the data

train_df.isnull().sum()

fixed acidity           0
volatile acidity        0
citric acid             0
residual sugar          0
chlorides               0
free sulfur dioxide     0
total sulfur dioxide    0
density                 0
pH                      0
sulphates               0
alcohol                 0
quality                 0
dtype: int64

Step 4: Statistical summary, excluding NaN values

train_df.describe()

Step 5: Exploratory Data Analysis

plt.figure(figsize=(15, 15))
sns.heatmap(train_df.corr(), color="k", annot=True)

Step 6: Quality of the Wine (from the Data)
The following features are positively correlated: total sulfur dioxide with free sulfur dioxide; fixed acidity with density and citric acid; alcohol with quality.
The following features are inversely correlated: fixed acidity with pH; citric acid with pH and with volatile acidity.

sns.countplot(x='quality', data=train_df)

sns.countplot(x='alcohol', data=train_df)

Step 7: Different Plots

Relations between fixed acidity and quality

plt.figure(figsize=(15, 5))
sns.swarmplot(x="quality", y="fixed acidity", data=train_df)
plt.title('fixed acidity and quality')

Relations between alcohol & chlorides

sns.swarmplot(x="alcohol", y="chlorides", data=train_df)
plt.title('Relations between alcohol & chlorides')

Step 8: Plot

Step 8.1: Relations between fixed acidity and quality

plt.figure(figsize=(15, 5))
sns.boxplot(x="quality", y="fixed acidity", data=train_df)

Step 8.2: Relationship between alcohol & chlorides

plt.figure(figsize=(15, 5))
sns.boxplot(x="alcohol", y="chlorides", data=train_df)

train_df.groupby('quality')['fixed acidity'].mean().plot.line()
plt.ylabel("fixed acidity")
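Each box in the boxplots of Steps 8.1 and 8.2 summarises one group of rows. The box runs from the first to the third quartile, and the inner line is the median. With seaborn's default whis=1.5, the whiskers end at the most extreme points lying within 1.5 times the IQR of the box, and anything beyond is drawn as an outlier. Below is a stdlib sketch of those numbers for one hypothetical group. box_summary is an illustrative name. statistics.quantiles with method='inclusive' is assumed to match the linear interpolation of NumPy's default percentile.

```python
import statistics


def box_summary(values, whis=1.5):
    """Quartiles, whisker ends and outliers as a boxplot draws them."""
    data = sorted(values)
    # 'inclusive' uses the same (n - 1) * p linear interpolation as NumPy's default
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),   # most extreme point still inside the fences
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical fixed-acidity readings for a single quality level.
group = [6.1, 6.8, 7.0, 7.1, 7.3, 7.4, 7.6, 7.9, 8.2, 12.5]
print(box_summary(group))
```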
Step 8.3: Relations between alcohol and chlorides

plt.figure(figsize=(10, 4))
sns.barplot(x="alcohol", y="chlorides", data=train_df)

Step 8.4: Relations between volatile acidity and quality

plt.figure(figsize=(10, 4))
sns.barplot(x="quality", y="volatile acidity", data=train_df)

Step 8.5: Relations between quality and volatile acidity

train_df.groupby('quality')['volatile acidity'].mean().plot.line()
plt.ylabel("volatile acidity")

Step 8.6: Relation between quality and sulphates

plt.figure(figsize=(10, 4))
sns.barplot(x="quality", y="sulphates", data=train_df)

Step 8.7: Group by

train_df.groupby('quality')['sulphates'].mean().plot.line()
plt.ylabel("sulphates")

Step 8.8: Relation between quality and sulphates

sns.boxplot(x="quality", y="sulphates", data=train_df)
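The heatmap in Step 5 colours each cell with train_df.corr(), which computes Pearson's correlation coefficient r by default. A positive r means two columns tend to rise together. A negative r means one tends to fall as the other rises. This is how the positive and inverse relationships in Step 6 are read off the heatmap. Below is a stdlib sketch of r on two hypothetical columns. pearson_r is an illustrative helper, and statistics.correlation (Python 3.10+) should return the same value.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Hypothetical readings: acidity rises while pH falls.
fixed_acidity = [6.5, 7.0, 7.8, 8.5, 9.9, 11.2]
ph = [3.55, 3.48, 3.40, 3.31, 3.20, 3.12]
print(round(pearson_r(fixed_acidity, ph), 3))  # negative: an inverse relationship
```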
