ML Assignment 1

Machine Learning

Uploaded by 927623mca060
Copyright © All Rights Reserved

DEPARTMENT OF MASTER OF COMPUTER APPLICATIONS

SELF STUDY ASSIGNMENT – 1

MCB1724– MACHINE LEARNING USING

Name : KAVIN N

Register No.: 927623MCA022

Year / Sec : I / -MCA

Marks Awarded:

Criterion                        Max. Marks    Awarded
Objective and Headline           2
Problem Analysis                 3
Solution and Results             4
Technology                       4
Descriptive Statistics           4
Conclusion and References        3
Total                            20
Assignment Topic: INTRODUCTION TO POPULAR PYTHON LIBRARIES FOR ML: NUMPY, PANDAS, MATPLOTLIB
CO: CO1, CO2
PO addressed: PO1, PO2, PO3, PO5, PO6, PO7, PO8, PO9, PO11, PO12
BTL Level: BTL 4

TITLE: INTRODUCTION TO POPULAR PYTHON LIBRARIES FOR ML (NUMPY, PANDAS, MATPLOTLIB)

Objective:
NumPy:

NumPy aims to make numerical operations on large arrays and matrices efficient. It supports
mathematical and logical operations on whole arrays and provides a large collection of
high-level mathematical functions that operate on them.
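A minimal sketch of the element-wise mathematical and logical operations described above. The arrays and their values are illustrative assumptions; only standard NumPy functions are used.

```python
import numpy as np

# Two small arrays of the same shape (values are illustrative).
a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

# Mathematical operations apply element by element.
total = a + b           # array([11, 22, 33, 44])
scaled = a * 2.5        # array([ 2.5,  5. ,  7.5, 10. ])
roots = np.sqrt(a)      # a high-level mathematical function applied to the whole array

# Logical operations also work element-wise and return boolean arrays.
is_even = a % 2 == 0                      # array([False,  True, False,  True])
in_range = np.logical_and(a > 1, a < 4)   # array([False,  True,  True, False])

# A matrix is simply a 2-D array; the @ operator performs matrix multiplication.
m = np.array([[1, 2], [3, 4]])
product = m @ m         # array([[ 7, 10], [15, 22]])

print(total, scaled, roots, is_even, in_range, product, sep="\n")
```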

pandas:

The main objective of pandas is to provide easy-to-use data structures and data analysis tools
for Python. It simplifies the process of working with structured data, such as tabular data and
time series data, by offering powerful data manipulation and analysis capabilities.
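To make "tabular data and time series data" concrete, here is a small sketch assuming made-up sales figures and daily readings; the column names and values are hypothetical. It builds a DataFrame for the tabular case and a date-indexed Series that is resampled into 3-day totals for the time series case.

```python
import pandas as pd

# Tabular data: a small, made-up sales table (column names are hypothetical).
sales = pd.DataFrame({
    "region": ["North", "South", "North", "East"],
    "units": [120, 80, 95, 60],
})
print(sales)

# Time series data: daily readings indexed by date (values are illustrative).
dates = pd.date_range("2024-01-01", periods=6, freq="D")
readings = pd.Series([5, 7, 6, 8, 9, 4], index=dates)

# Resample the daily series into 3-day totals: Jan 1-3 and Jan 4-6.
three_day = readings.resample("3D").sum()
print(three_day)
```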

Matplotlib:

Matplotlib's primary objective is to create high-quality static, animated, and interactive
visualizations in Python. It enables users to generate a wide variety of plots and charts to
explore and communicate data effectively, facilitating data visualization tasks in scientific
computing and data analysis.

Problem Analysis:

NumPy:

Before NumPy, numerical computations on large datasets in Python were slow. Standard Python
lists do not support vectorized operations, so even basic element-wise arithmetic required
explicit for-loops or list comprehensions that process one element at a time in the
interpreter. This was a significant bottleneck for scientific computing tasks such as linear
algebra, signal processing, and statistical analysis.
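The contrast between looping and vectorized code can be sketched as follows. The data size is an arbitrary assumption, and the timings printed by `timeit` depend on the machine, so no particular speed-up is claimed here; the point is that both approaches compute the same values while the vectorized one runs its loop in compiled code.

```python
import timeit

import numpy as np

values = list(range(1_000))

# Plain-Python approach: a list comprehension visits each element in the interpreter.
squares_loop = [v * v for v in values]

# NumPy approach: one vectorized expression; the loop runs inside compiled code.
arr = np.array(values)
squares_vec = arr * arr

# Rough timing comparison (results vary by machine and NumPy build).
loop_time = timeit.timeit(lambda: [v * v for v in values], number=100)
vec_time = timeit.timeit(lambda: arr * arr, number=100)
print(f"loop: {loop_time:.4f}s  vectorized: {vec_time:.4f}s")
```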
pandas:

Working with structured data, such as CSV files or database tables, in Python often required
writing complex code and using multiple libraries. Without pandas, tasks like loading,
cleaning, transforming, and analyzing tabular data were cumbersome and error-prone.
Standard Python data structures like lists or dictionaries lacked the functionality and
expressiveness needed for efficient data manipulation and analysis.
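A small comparison, assuming an in-memory CSV with one missing score (the names and scores are invented for illustration). With only the standard library, type conversion and missing values must be handled by hand; `pd.read_csv` infers the numeric type, turns the empty field into NaN, and `mean()` skips NaN by default.

```python
import csv
import io

import pandas as pd

# An in-memory CSV stands in for a file on disk (data is illustrative).
raw = "name,score\nasha,81\nravi,\nmeena,74\n"

# Standard library: every value is a string; skip blanks and convert manually.
rows = list(csv.DictReader(io.StringIO(raw)))
scores = [float(r["score"]) for r in rows if r["score"] != ""]
manual_mean = sum(scores) / len(scores)

# pandas: types are inferred and the empty field becomes NaN, which mean() skips.
df = pd.read_csv(io.StringIO(raw))
pandas_mean = df["score"].mean()

print(manual_mean, pandas_mean)  # both 77.5
```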

Matplotlib:

Before Matplotlib, creating high-quality visualizations in Python was challenging and
required stitching together various low-level plotting functions and libraries. There was no
comprehensive plotting library that provided a wide range of plotting options and
customization features. As a result, data scientists and researchers spent significant time and
effort on creating and fine-tuning plots, hindering the exploration and communication of data
insights.

Solution and Results:

NumPy:

Solution: NumPy addresses the inefficiency of numerical computations in Python by
introducing the ndarray, a powerful n-dimensional array object. It provides vectorized
operations, allowing mathematical operations to be performed on entire arrays at once,
eliminating the need for explicit looping. Additionally, NumPy offers a vast collection of
mathematical functions optimized for array operations, including linear algebra, Fourier
transforms, and random number generation.
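A brief tour of the features listed above: the ndarray with its shape and dtype, a vectorized reduction, linear algebra (`np.linalg`), Fourier transforms (`np.fft`), and random number generation (`np.random.default_rng`). All inputs are illustrative assumptions, and the seed is fixed only so the random draw is reproducible.

```python
import numpy as np

# ndarray: an n-dimensional array with one dtype and a shape.
grid = np.arange(12).reshape(3, 4)
print(grid.ndim, grid.shape, grid.dtype)

# Vectorized reduction over the whole array, with no explicit loop.
col_sums = grid.sum(axis=0)        # array([12, 15, 18, 21])

# Linear algebra: determinant and inverse of a 2x2 matrix.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
det = np.linalg.det(A)             # about 5.0
A_inv = np.linalg.inv(A)

# Fourier transform of a short signal.
spectrum = np.fft.fft(np.array([1.0, 0.0, -1.0, 0.0]))

# Random number generation with a seeded Generator for reproducibility.
rng = np.random.default_rng(seed=42)
samples = rng.normal(loc=0.0, scale=1.0, size=5)
print(col_sums, det, spectrum, samples, sep="\n")
```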

Results: With NumPy, developers can write concise and efficient code for numerical
computations, significantly speeding up the execution of scientific computing tasks and
machine learning algorithms. The ability to perform vectorized operations on large arrays
enables faster data processing and analysis, leading to improved productivity and
performance in data-driven applications.

pandas:

Solution: pandas simplifies working with structured data in Python by introducing two main
data structures: DataFrame and Series. DataFrame represents tabular data with rows and
columns, similar to a spreadsheet or SQL table, while Series represents a one-dimensional
labeled array. pandas provides a wide range of functions for data manipulation, including
indexing, filtering, grouping, and aggregation, as well as handling missing data and
time series data.
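The following sketch exercises the pandas features named above: a labeled Series, a DataFrame, filling a missing value, boolean filtering, and grouping with aggregation. The names, departments, and salary figures are hypothetical, and filling NaN with the column mean is just one possible strategy, chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Series: a one-dimensional labeled array.
ages = pd.Series([23, 31, 27], index=["asha", "ravi", "meena"])
print(ages["ravi"])  # label-based indexing

# DataFrame: tabular data with labeled rows and columns (values are illustrative).
df = pd.DataFrame({
    "dept": ["ML", "Web", "ML", "Web", "ML"],
    "salary": [50.0, 42.0, np.nan, 45.0, 58.0],
})

# Missing data: fill the NaN with the mean of the known salaries (48.75).
df["salary"] = df["salary"].fillna(df["salary"].mean())

# Filtering with a boolean condition.
high = df[df["salary"] > 48]

# Grouping and aggregation: mean salary and row count per department.
by_dept = df.groupby("dept")["salary"].agg(["mean", "count"])
print(by_dept)
```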
Results: Using pandas, developers can load, clean, transform, and analyze datasets with little
code, streamlining data preprocessing and exploration in machine learning workflows. The
intuitive API enables faster iteration and experimentation, which helps in building more
reliable machine learning models.

Matplotlib:

Solution: Matplotlib offers a comprehensive plotting toolkit for creating static, animated, and
interactive visualizations in Python. It provides a MATLAB-like interface for generating a
wide variety of plots and charts, including line plots, scatter plots, bar plots, histograms, and
more. Matplotlib allows fine-grained control over every aspect of the plot, such as colors,
labels, axes, and annotations, enabling users to create publication-quality visualizations
tailored to their specific needs.
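A sketch of the plot types and fine-grained controls listed above, drawn on one figure with four Axes. The data is synthetic. The non-interactive "Agg" backend and the in-memory PNG are assumptions made so the example runs without a display; in a notebook or desktop session you would normally skip `matplotlib.use` and call `plt.show()` or `fig.savefig("file.png")` instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 50)
rng = np.random.default_rng(0)

# One figure with four Axes, one per plot type.
fig, axes = plt.subplots(2, 2, figsize=(8, 6))
axes[0, 0].plot(x, np.sin(x), color="tab:blue", label="sin(x)")        # line plot
axes[0, 1].scatter(x, np.cos(x) + rng.normal(0, 0.1, x.size), s=10)    # scatter plot
axes[1, 0].bar(["A", "B", "C"], [3, 7, 5], color="tab:orange")         # bar plot
axes[1, 1].hist(rng.normal(size=500), bins=20)                         # histogram

# Fine-grained control: title, axis label, legend, and an annotation.
axes[0, 0].set_title("Line")
axes[0, 0].set_xlabel("x")
axes[0, 0].legend()
axes[0, 0].annotate("peak", xy=(np.pi / 2, 1.0), xytext=(3, 0.5),
                    arrowprops={"arrowstyle": "->"})
fig.tight_layout()

# Render to an in-memory PNG instead of a window or a file.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```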

Results: By using Matplotlib, developers can effectively explore and communicate data
insights through visualizations, enhancing understanding and interpretation. Whether it's
exploring data distributions, comparing trends, or presenting model performance, Matplotlib's
flexibility and customization options empower users to create informative and compelling
visualizations that drive decision-making and insight generation.

Technology and methodology:

NumPy:

Technology: NumPy is used from Python, but its core, including the ndarray object and its
element-wise functions, is implemented in C. For linear algebra it relies on optimized
low-level libraries such as BLAS (Basic Linear Algebra Subprograms) and LAPACK (Linear
Algebra Package), whose implementations are typically written in C or Fortran. These
libraries provide efficient implementations of matrix and vector operations.
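One way to see the BLAS/LAPACK layer from Python is sketched below. `np.show_config()` prints the build configuration, including which BLAS/LAPACK libraries this NumPy was built against (for example OpenBLAS or MKL), and its output varies by installation. According to the NumPy documentation, `np.linalg.solve` computes its result with the LAPACK routine `_gesv`. The 2x2 system is an illustrative assumption.

```python
import numpy as np

# Print build information, including the linked BLAS/LAPACK libraries.
np.show_config()

# Solve 3x + y = 9 and x + 2y = 8; np.linalg.solve delegates to LAPACK.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)
print(x)  # [2. 3.]

# Matrix products of float arrays are typically dispatched to BLAS.
C = A @ A.T
```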

Methodology: NumPy follows an array-oriented computing methodology, where
mathematical operations are applied to entire arrays rather than to individual elements. This
approach enables efficient vectorized computation and facilitates numerical analysis and
manipulation of large datasets.
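Array-oriented computing in practice: the sketch below standardizes every column of a small feature matrix (subtract the column mean, divide by the column standard deviation) without a Python loop. It relies on NumPy broadcasting, which stretches the per-column vectors across all rows. The 4x3 dataset is invented for illustration.

```python
import numpy as np

# 4 samples (rows) x 3 features (columns); values are illustrative.
X = np.array([
    [1.0, 200.0, 0.5],
    [2.0, 180.0, 0.7],
    [3.0, 220.0, 0.6],
    [4.0, 200.0, 0.2],
])

# Column-wise statistics in one call each (axis=0 reduces down the rows).
mu = X.mean(axis=0)
sigma = X.std(axis=0)

# Broadcasting: the shape-(3,) vectors are applied to all 4 rows at once.
X_std = (X - mu) / sigma
print(X_std.mean(axis=0).round(6))  # approximately zero for each column
```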

pandas:

Technology: pandas is written mainly in Python and builds on the NumPy library, storing
column data in NumPy arrays and reusing NumPy's fast array operations. Performance-critical
parts of pandas, such as file parsing, grouping, and joining, are implemented in Cython (a
compiled, Python-like language) and C so that large datasets can be handled efficiently.
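A short check of how closely pandas sits on top of NumPy, using an illustrative two-column DataFrame. Each column carries a NumPy dtype, `to_numpy()` returns an ndarray (here upcast to `float64`, because one column holds integers and the other floats), and NumPy functions such as `np.log` can be applied directly to a Series while keeping its index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

# Each column has a NumPy dtype.
print(df.dtypes)

# Convert the whole frame to a NumPy ndarray; int and float columns upcast to float64.
arr = df.to_numpy()
print(type(arr), arr.dtype)  # <class 'numpy.ndarray'> float64

# NumPy functions work on pandas objects and keep the labels.
logs = np.log(df["b"])
```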

Methodology: pandas is centered on data manipulation and analysis. It provides data
structures like DataFrame and Series, which allow users to work with structured data in a
tabular format. pandas emphasizes ease of use and productivity, offering high-level functions
for data cleaning, transformation, and analysis.
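The cleaning, transformation, and analysis steps mentioned above can be chained together. The sketch below assumes a small hypothetical table of city temperatures that contains stray whitespace, inconsistent capitalization, a duplicate row, and a missing city. It uses `dropna`, `assign`, `astype`, `drop_duplicates`, and `groupby`. The order of the steps is one reasonable choice, not a prescribed workflow.

```python
import pandas as pd

# Raw records with messy text, a duplicate, and a missing city (illustrative data).
raw = pd.DataFrame({
    "city": [" Chennai", "madurai", "Chennai ", "madurai", None],
    "temp_c": ["31", "29", "33", "29", "30"],
})

clean = (
    raw
    .dropna(subset=["city"])                                    # cleaning: drop rows with no city
    .assign(city=lambda d: d["city"].str.strip().str.title(),   # cleaning: normalize text
            temp_c=lambda d: d["temp_c"].astype(float))         # transformation: string -> float
    .drop_duplicates()                                           # cleaning: remove repeated rows
)

# Analysis: average temperature per city.
summary = clean.groupby("city")["temp_c"].mean()
print(summary)
```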
Matplotlib:

Technology: Matplotlib is implemented mainly in Python and makes extensive use of NumPy for
numerical computation. Some performance-critical parts, such as the Agg raster renderer used
to produce PNG images, are written in C++. For interactive windows, Matplotlib uses backends
built on GUI toolkits such as Tk or Qt.

Methodology: Matplotlib takes a flexible, customizable approach to data visualization. It
allows users to create a wide range of static, animated, and interactive visualizations to
explore and communicate data insights effectively. Matplotlib supports both a procedural
(pyplot) interface and an object-oriented (Figure and Axes) interface, catering to different
user preferences and requirements.
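The two plotting styles can be compared side by side. The sketch draws the same curve once with the procedural pyplot functions, which act on the "current" figure, and once by calling methods on explicit Figure and Axes objects. The "Agg" backend and in-memory PNG buffers are assumptions chosen so the example runs without a display. In an interactive session you would usually show the plots or save them to files instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook or desktop session
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# Procedural (pyplot, MATLAB-like) style: functions act on the current figure and axes.
plt.figure()
plt.plot(x, y)
plt.title("Procedural style")
plt.xlabel("x")
plt.ylabel("x squared")
procedural_png = io.BytesIO()
plt.savefig(procedural_png, format="png")
plt.close()

# Object-oriented style: create Figure and Axes objects and call their methods.
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title("Object-oriented style")
ax.set_xlabel("x")
ax.set_ylabel("x squared")
oo_png = io.BytesIO()
fig.savefig(oo_png, format="png")
plt.close(fig)
```

The object-oriented style is often preferred in larger scripts because each call names the Axes it changes, which avoids surprises when several figures are open.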

Statistical data:

NumPy:

NumPy provides functions for common statistical calculations: measures of central tendency
(np.mean, np.median), dispersion (np.std, np.var), and percentiles (np.percentile). NumPy
has no built-in mode function; the mode can be found by counting values with np.unique or
computed with scipy.stats.mode. These functions operate efficiently on NumPy arrays, making
them suitable for large datasets.

import numpy as np

data = np.array([1, 2, 3, 4, 5])

mean = np.mean(data)      # 3.0
median = np.median(data)  # 3.0
std_dev = np.std(data)    # population standard deviation (ddof=0), about 1.414

print(mean, median, std_dev)
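The remaining measures mentioned above (variance, percentiles, and the mode) can be sketched in the same way. The sample data is an illustrative assumption. The mode is computed by counting values with `np.unique(..., return_counts=True)`, because NumPy itself has no mode function. `ddof=1` is passed to `np.std` to get the sample rather than the population standard deviation.

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

variance = np.var(data)              # population variance (ddof=0): 4.0
sample_std = np.std(data, ddof=1)    # sample standard deviation (ddof=1)
q1, q3 = np.percentile(data, [25, 75])

# No dedicated mode function in NumPy: count each unique value instead.
values, counts = np.unique(data, return_counts=True)
mode = values[np.argmax(counts)]     # 4 (appears three times)

print(variance, sample_std, q1, q3, mode)
```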

pandas:

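import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})

# count, mean, std, min, 25%, 50%, 75% and max for each numeric column
summary_stats = df.describe()

quantile = df['A'].quantile(0.5)  # Median: 3.0

print(summary_stats)
print(quantile)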
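One detail worth checking when mixing the two libraries: pandas `std()` (and therefore the `std` row of `describe()`) uses the sample formula with `ddof=1`, while `np.std` defaults to the population formula with `ddof=0`. The sketch below reuses the same column `A` to show the difference and how passing `ddof` explicitly makes the results agree.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})

# pandas defaults to the sample standard deviation (ddof=1): sqrt(2.5), about 1.581
pandas_std = df['A'].std()
# NumPy defaults to the population standard deviation (ddof=0): sqrt(2.0), about 1.414
numpy_std = np.std(df['A'].to_numpy())

# Passing ddof explicitly makes the two agree.
same = np.isclose(df['A'].std(ddof=0), numpy_std)

# Other single-number summaries of the same column.
print(df['A'].mean(), df['A'].median(), df['A'].var())
```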

Matplotlib:

import matplotlib.pyplot as plt

data = [1, 2, 2, 3, 3, 3, 4, 4, 5]

plt.hist(data, bins=5)

plt.xlabel('Value')
plt.ylabel('Frequency')

plt.title('Histogram of Data')

plt.show()
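To see the numbers behind the histogram, the counts can be computed directly. `plt.hist` returns a tuple `(n, bins, patches)` holding the per-bin counts, the bin edges, and the drawn bars, and it computes its bins with `np.histogram`. The sketch below uses `np.histogram` on the same data with `bins=5`, which gives five equal-width bins from 1 to 5. The last bin includes its right edge.

```python
import numpy as np

data = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# Five equal-width bins spanning min(data)..max(data); the last bin is closed on the right.
counts, edges = np.histogram(data, bins=5)
print(counts)  # [1 2 3 2 1]
print(edges)
```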

Conclusion:

NumPy facilitates efficient numerical operations with its array-oriented computing. pandas
simplifies data manipulation and analysis through its DataFrame and Series structures.
Matplotlib provides a flexible platform for creating a wide range of visualizations to explore
and communicate data insights effectively. Together, these libraries form a powerful trio that
empowers users to tackle data-driven challenges with ease and clarity in Python.

References:

1. https://www.almabetter.com/bytes/tutorials/python/popular-python-libraries

2. https://towardsdatascience.com/top-5-machine-learning-libraries-in-python-e36e3e0e02af
