
Unit 3 Advance Python

The document provides an overview of using Python for artificial intelligence, focusing on handling CSV files, libraries like NumPy and Pandas for data manipulation, and the Scikit-learn library for machine learning. It explains how to read and write CSV files, create and manipulate data structures like Series and DataFrames, and implement machine learning algorithms such as K-Nearest Neighbors. Additionally, it covers the process of splitting datasets, training models, and evaluating their accuracy.

Uploaded by jk6424540
Copyright © All Rights Reserved

USING PYTHON IN ARTIFICIAL INTELLIGENCE

UNDERSTANDING CSV file (Comma Separated Values)

- CSV files are delimited text files that store tabular data (data stored in rows and columns).
- Each line in a CSV file is a data record. Each record consists of one or more fields (columns), separated by commas.
- Working with CSV files in Python:

- Importing the library:
import csv

- Opening a file in reading mode (newline="" is recommended by the csv module for both reading and writing; it prevents blank lines between rows on Windows):
file = open("student.csv", "r", newline="")

- Opening a file in writing mode:
file = open("student.csv", "w", newline="")

- Closing a file:
file.close()

- Writing rows:
wr = csv.writer(file)
wr.writerow([12, "Kalesh", 480])

- Reading rows:
details = csv.reader(file)
for rec in details:
    print(rec)

Sample Program-10

Write a Program to open a csv file students.csv and display its details
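One possible solution to Sample Program-10 is sketched below. It makes a few assumptions: the column layout (roll number, name, marks) follows the writerow example above, and the three records and the header row are hypothetical data. The sketch writes them first so that it runs on its own. display_csv is a helper name introduced only for this illustration.

```python
import csv

# Hypothetical sample records (roll number, name, marks) so the sketch is self-contained
records = [[12, "Kalesh", 480], [13, "Meena", 455], [14, "Arjun", 470]]

# Write a header row and the records; newline="" avoids blank lines on Windows
with open("students.csv", "w", newline="") as file:
    wr = csv.writer(file)
    wr.writerow(["Roll", "Name", "Marks"])
    wr.writerows(records)

def display_csv(path):
    """Print every record of a CSV file and return the rows as lists of strings."""
    rows = []
    with open(path, "r", newline="") as file:
        details = csv.reader(file)
        for rec in details:
            print(rec)
            rows.append(rec)
    return rows

rows = display_csv("students.csv")
```

The `with` statement closes the file automatically, so no explicit file.close() is needed. Note that csv.reader returns every field as a string, so the marks 480 come back as '480'.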

INTRODUCING LIBRARIES

- A library in Python is a collection of reusable modules or functions that provide specific functionality.
- For example, the "math" library contains numerous functions such as sqrt(), pow(), fabs() and sin(), which facilitate mathematical operations and calculations.
- To use a library in a program, it must first be imported, for example with the statement "import math".
- This allows us to access the functions provided by the math library, for example math.sqrt(16).
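Here is a short sketch of importing the math library and calling a few of its functions. The built-in abs() works without any import. math.fabs() is the math library's own absolute-value function, and it always returns a float.

```python
import math

# Functions from the math library are called as math.<function>
root = math.sqrt(16)          # square root
power = math.pow(2, 3)        # 2 raised to the power 3, returned as a float
absolute = math.fabs(-7)      # absolute value, returned as a float
sine = math.sin(math.pi / 2)  # sine of an angle given in radians

print(root, power, absolute, sine)
```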

NUMPY

- NumPy, which stands for Numerical Python, is a powerful Python library for numerical computing.
- NumPy provides the ndarray (N-dimensional array) data structure, which represents arrays of any dimension. These arrays are homogeneous: all elements of an array share one data type, which can be any of several numerical types (integers, floats, etc.).
- NumPy can be installed using Python's package manager, pip:
pip install numpy
- Creating a NumPy array:
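The notes do not show the array-creation code, so here is a minimal sketch of the usual np.array() call. It creates a 1-D and a 2-D array and shows that mixing integers and floats gives a single common (float) type, which is what "homogeneous" means in practice. The values are arbitrary examples.

```python
import numpy as np

# 1-D array from a Python list
a = np.array([10, 20, 30, 40])

# 2-D array (2 rows, 3 columns) from a list of lists
b = np.array([[1, 2, 3], [4, 5, 6]])

# Mixing ints and floats: NumPy converts every element to one common type (float)
c = np.array([1, 2.5, 3])

print(a.ndim, a.shape)  # number of dimensions and size along each dimension
print(b.ndim, b.shape)
print(c.dtype)
print(a * 2)            # arithmetic is applied element by element
```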

PANDAS ("Panel Data" and "Python Data")

- The Pandas library is a Python tool for data analysis and manipulation.
- Pandas is well suited for working with tabular data, such as CSV files or SQL tables.
- Pandas can be installed using:
pip install pandas
- Pandas provides two main data structures for manipulating data: Series and DataFrame.

Series

- A Series is a one-dimensional array containing a sequence of values of any data type (int, float, list, string, etc.).
- By default, the values have numeric data labels starting from zero.
- The data label associated with a particular value is called its index.
- We can also use values of other data types (for example, strings) as the index.
Create a simple Pandas Series from a list:

import pandas as pd
a = ['Mark', 'Justin', 'John', 'Vicky']
myvar = pd.Series(a)
print(myvar)
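The notes mention that labels other than 0, 1, 2, ... can be used as the index. Here is a small sketch of that. The subject names and marks are hypothetical sample data.

```python
import pandas as pd

# Default index: labels 0, 1, 2, 3
names = pd.Series(["Mark", "Justin", "John", "Vicky"])

# Custom index: marks of one student, labelled with subject names
marks = pd.Series([90, 85, 77], index=["Maths", "Science", "Hindi"])
print(marks)

# Values can be accessed by their label...
print(marks["Science"])
# ...or by position with .iloc
print(marks.iloc[0])
```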

DataFrame

- A DataFrame is a two-dimensional data structure, like a table.
- It contains rows and columns, and therefore has both a row index and a column index.
- Each column can hold a different type of value, such as numeric, string or boolean, as in the tables of a database.

Creation of DataFrame

import pandas as pd

lst = [10,20,30,40,50]
df = pd.DataFrame(lst)
print(df)
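A DataFrame can also be built from a dictionary, where each key becomes a column name. This is a sketch with hypothetical data. It shows that each column keeps its own type, as described above.

```python
import pandas as pd

# Each dictionary key becomes a column name; each list holds that column's values
data = {"Name": ["Rajat", "Amrita", "Rose"],
        "Age": [15, 16, 15],
        "Passed": [True, True, False]}
df = pd.DataFrame(data)
print(df)
print(df.dtypes)  # numeric, string (object) and boolean columns side by side
```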

Dealing with Rows and Columns

import pandas as pd
data = [ [90, 92, 89, 81, 94], [91, 81, 91, 71, 95], [97, 96, 88, 67, 99] ]
columns = ['Rajat', 'Amrita', 'Meenakshi', 'Rose', 'Karthika']
index = ['Maths', 'Science', 'Hindi']
Result = pd.DataFrame(data, index=index, columns=columns)
print(Result)

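Working with DataFrames

- Accessing DataFrame elements (one row, by its label):
Result.loc['Science']
- Adding a new column to a DataFrame (one value per row):
Result['Fathima'] = [89, 78, 76]
- Adding a new row to a DataFrame (one value per column; after adding 'Fathima' there are six columns):
Result.loc['English'] = [90, 92, 89, 80, 90, 88]
- Deleting rows from a DataFrame:
Result.drop('Hindi', axis=0)
- Deleting columns from a DataFrame:
Result.drop(['Rajat', 'Meenakshi', 'Karthika'], axis=1)
- drop() returns a new DataFrame and leaves Result unchanged. To keep the change, assign the result, e.g. Result = Result.drop('Hindi', axis=0).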
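The operations above can be run in order on the Result DataFrame from "Dealing with Rows and Columns". The sketch rebuilds that DataFrame so it runs on its own, and it checks the shape after each step.

```python
import pandas as pd

data = [[90, 92, 89, 81, 94], [91, 81, 91, 71, 95], [97, 96, 88, 67, 99]]
columns = ['Rajat', 'Amrita', 'Meenakshi', 'Rose', 'Karthika']
index = ['Maths', 'Science', 'Hindi']
Result = pd.DataFrame(data, index=index, columns=columns)

# Access one row by its label
science = Result.loc['Science']

# New column: one value per existing row (3 subjects)
Result['Fathima'] = [89, 78, 76]

# New row: one value per column (now 6 students)
Result.loc['English'] = [90, 92, 89, 80, 90, 88]

# drop() returns new DataFrames; Result itself is not modified
without_hindi = Result.drop('Hindi', axis=0)
fewer_students = Result.drop(['Rajat', 'Meenakshi', 'Karthika'], axis=1)

print(Result.shape, without_hindi.shape, fewer_students.shape)
```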

Understanding Missing Values

- Missing (or "not available") data occurs when no information is provided for a value.
- In a DataFrame, missing values are stored as NaN (Not a Number).
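Below is a sketch of how missing values show up and how they are commonly handled with the standard pandas methods isnull(), fillna() and dropna(). The marks table is hypothetical. Both None and np.nan end up stored as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical marks where some entries were never recorded
marks = pd.DataFrame({"Maths": [90, 85, 77],
                      "Science": [88, np.nan, 91],
                      "Hindi": [None, 72, 80]})

print(marks)                 # missing entries are displayed as NaN
print(marks.isnull().sum())  # number of missing values in each column

filled = marks.fillna(0)     # replace every NaN with 0
complete = marks.dropna()    # keep only rows with no missing values
```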

Attributes of DataFrames

For a DataFrame named Teacher:

- Displaying row indexes: Teacher.index
- Displaying column indexes: Teacher.columns
- Displaying the data type of each column: Teacher.dtypes
- Displaying the data as a NumPy array: Teacher.values
- Displaying the total number of rows and columns as (rows, columns): Teacher.shape
- Displaying the first n rows (here n = 2): Teacher.head(2)
- Displaying the last n rows (here n = 2): Teacher.tail(2)
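The notes never define Teacher, so this sketch builds a hypothetical Teacher DataFrame (names, subjects and years of experience are made up) purely to show what each attribute returns.

```python
import pandas as pd

# Hypothetical Teacher DataFrame used only to demonstrate the attributes
Teacher = pd.DataFrame({"Name": ["Anil", "Bina", "Chetan", "Divya"],
                        "Subject": ["Maths", "Science", "Hindi", "English"],
                        "Experience": [12, 8, 15, 5]},
                       index=["T1", "T2", "T3", "T4"])

print(Teacher.index)    # row labels
print(Teacher.columns)  # column labels
print(Teacher.dtypes)   # data type of each column
print(Teacher.values)   # data as a NumPy array
print(Teacher.shape)    # (rows, columns)
print(Teacher.head(2))  # first 2 rows
print(Teacher.tail(2))  # last 2 rows
```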

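Importing a CSV file to a DataFrame

import pandas as pd
# read results.csv, using its 'Subject' column as the row index
df = pd.read_csv('results.csv', index_col='Subject')
print(df)

Exporting a DataFrame to a CSV file

import pandas as pd
# data and index as defined in the "Dealing with Rows and Columns" example
df = pd.DataFrame(data, index=index)
# write the row labels as the first column, headed 'Subject'
df.to_csv('results.csv', index_label='Subject')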
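Export and import fit together as a round trip. This sketch writes the Result DataFrame from the earlier example to results.csv and reads it back. Because index_label and index_col both use 'Subject', the subject names come back as the row index.

```python
import pandas as pd

data = [[90, 92, 89, 81, 94], [91, 81, 91, 71, 95], [97, 96, 88, 67, 99]]
columns = ['Rajat', 'Amrita', 'Meenakshi', 'Rose', 'Karthika']
index = ['Maths', 'Science', 'Hindi']
Result = pd.DataFrame(data, index=index, columns=columns)

# Export: row labels are written as a first column headed 'Subject'
Result.to_csv('results.csv', index_label='Subject')

# Import: use the 'Subject' column as the row index again
df = pd.read_csv('results.csv', index_col='Subject')
print(df)
```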
SCIKIT-LEARN (SKLEARN)
- Scikit-learn (sklearn) is one of the most widely used and robust libraries for machine learning in Python.
- Sklearn is built on NumPy, SciPy and Matplotlib.

Key Features:

- Offers a wide range of supervised and unsupervised learning algorithms.
- Provides tools for model selection, evaluation and validation.
- Supports various tasks such as classification, regression, clustering, dimensionality reduction and more.
- Install scikit-learn with:
pip install scikit-learn

load_iris (In sklearn.datasets)

- Iris is a widely used dataset for classification tasks.
- It comprises measurements of several characteristics of iris flowers (sepal length, sepal width, petal length and petal width), along with the species of iris to which each flower belongs.
- The dataset contains 150 flowers, 50 from each of three species: setosa, versicolor and virginica.

from sklearn.datasets import load_iris   # import the function that loads the Iris dataset
iris = load_iris()                       # call load_iris() to load the Iris dataset
X = iris.data                            # features (input data)
y = iris.target                          # target variable (output)

- X holds the features: each row is the feature vector of one flower. The feature vectors contain the input data for the machine learning model.
- y holds the target variable: the output (the species label) we want to predict with the model.
Sample output – First 10 rows of X

Here, each row represents a sample (i.e., an iris flower), and each column represents a feature (i.e., a
measurement of the flower). For example, the first row [ 5.1 3.5 1.4 0.2] corresponds to an iris flower with
the following measurements:
- Sepal length: 5.1 cm
- Sepal width: 3.5 cm
- Petal length: 1.4 cm
- Petal width: 0.2 cm
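The sample output above can be reproduced with a short sketch. It prints the first 10 rows of X, together with the names of the four feature columns and the three species labels that load_iris provides.

```python
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data    # one row per flower, one column per measurement
y = iris.target  # species of each flower, encoded as 0, 1 or 2

print(X.shape)             # (samples, features)
print(iris.feature_names)  # names of the four measurement columns
print(iris.target_names)   # species names for labels 0, 1 and 2
print(X[:10])              # first 10 rows of X
```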

train_test_split (In sklearn.model_selection)

- Datasets are usually split into a training set and a testing set.
- The training set is used to train the model, and the testing set is used to test it.
- A common splitting ratio is 80:20 (training 80%, testing 20%).
from sklearn.model_selection import train_test_split   # import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

- X_train, y_train: the feature vectors and target variables of the training set, respectively.
- X_test, y_test: the feature vectors and target variables of the testing set, respectively.
- test_size = 0.2: specifies that 20% of the data will be used for testing and the remaining 80% for training.
- random_state = 1: ensures reproducibility by fixing the random seed, so the same split is generated every time you run the code.
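The effect of test_size and random_state can be checked directly. With 150 Iris samples and test_size=0.2, the split gives 30 test rows and 120 training rows. Repeating the call with the same random_state reproduces exactly the same split.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% of the samples for testing; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

print(X_train.shape, X_test.shape)  # training and testing feature matrices
print(len(y_train), len(y_test))
```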

KNeighborsClassifier (In sklearn.neighbors)

Scikit-learn offers a wide range of machine learning (ML) algorithms with a consistent interface for fitting models, making predictions and evaluating them with metrics such as accuracy and recall. Here we are going to use the k-nearest neighbors (KNN) classifier.
# KNeighborsClassifier is a supervised learning algorithm used for classification tasks
from sklearn.neighbors import KNeighborsClassifier

# Create an instance of the KNeighborsClassifier class
knn = KNeighborsClassifier(n_neighbors=3)

# Train the model using the fit method
knn.fit(X_train, y_train)

# Use the trained model to make predictions on new, unseen data
y_pred = knn.predict(X_test)

- n_neighbors=3 indicates that the classifier will consider the 3 nearest neighbors when making predictions. This is a hyperparameter that can be tuned to improve the performance of the classifier.
- fit() constructs a representation of the training data that allows the model to make predictions based on the input features.
- The knn object contains the trained model, so predict() can be called on new, unseen data.
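To see what "the 3 nearest neighbors" means, the sketch below first fits the classifier on a tiny hypothetical one-feature dataset, where the answer can be checked by hand. The kneighbors() method reports which training rows were used. It then repeats the fit and predict steps from above on the Iris split.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Tiny hypothetical dataset: small values are class 0, large values are class 1
X_toy = [[1], [2], [3], [10], [11], [12]]
y_toy = [0, 0, 0, 1, 1, 1]

toy_knn = KNeighborsClassifier(n_neighbors=3)
toy_knn.fit(X_toy, y_toy)

# 2.5 is nearest to 2, 3 and 1 (all class 0); 10.5 is nearest to 10, 11 and 12 (all class 1)
toy_pred = toy_knn.predict([[2.5], [10.5]])
distances, indices = toy_knn.kneighbors([[2.5]])  # the 3 nearest training rows
print(toy_pred, indices)

# The same steps on the Iris data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)  # one predicted label per test row
print(y_pred[:5], y_test[:5])
```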

Calculating Accuracy: metrics

The accuracy_score function calculates the accuracy of the model by comparing the predicted target values (y_pred) with the actual target values (y_test). The accuracy score is the proportion of correctly predicted instances out of all instances in the testing set.

from sklearn import metrics
accuracy = metrics.accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
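To show what accuracy_score computes, this sketch uses hypothetical labels (not taken from the Iris model). It counts the correct predictions by hand, computing accuracy = correct predictions / total predictions, and compares the result with metrics.accuracy_score.

```python
from sklearn import metrics

# Hypothetical true and predicted labels for 8 test samples
y_test = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 1, 1, 2, 1, 0, 2, 2]

# accuracy = correct predictions / total predictions
correct = sum(t == p for t, p in zip(y_test, y_pred))
manual_accuracy = correct / len(y_test)

accuracy = metrics.accuracy_score(y_test, y_pred)
print(correct, manual_accuracy, accuracy)
```

Here 6 of the 8 predictions match, so both values are 0.75.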
Now we can use the trained model to predict the species of new samples. Each sample lists sepal length, sepal width, petal length and petal width, in cm.
sample = [[5, 5, 3, 2], [2, 4, 3, 5]]
preds = knn.predict(sample)
pred_species = []
for p in preds:
    pred_species.append(iris.target_names[p])
print("Predictions:", pred_species)
