
Unit II 01 Course Work

Uploaded by

victor.seelan

UNIT II

Computational Foundations – Key Topics

• Basic Python Programming:
Introduction to Python using Jupyter Notebooks, focusing on core
syntax, data structures, control flow, and writing scripts for data
tasks.

• Scientific Computing Libraries:
Usage of packages like NumPy (numerical
computations), SciPy (scientific methods), and Matplotlib (data
visualization), enabling efficient statistical analysis and plotting.

• Data Preprocessing Techniques:
Techniques to handle missing values, label encoding, one-hot
encoding, and data standardization. These steps are crucial to
prepare raw data for analysis and modeling.

• Data Wrangling:
Using Pandas for manipulation and cleaning of datasets: merging
tables, filtering, transformation of columns, and restructuring data
frames.

• Machine Learning Basics:
Introduction to classical machine learning methods using Scikit-learn,
including workflow for model fitting, prediction, evaluation,
and selection of appropriate algorithms.

1. Introduction to Python for Data Science

• Installing Python and Jupyter Notebooks
• Introduction to Python Syntax
• Variables, Data Types, Control Structures (loops, conditionals)
• Functions and Modules
• File Handling

2. Scientific Computing with Python

• NumPy:
  o Arrays and Array Operations
  o Indexing, Slicing, and Broadcasting
  o Linear Algebra and Random Number Generation
• SciPy:
  o Mathematical and Statistical Functions
  o Optimization and Integration

3. Data Visualization

• Matplotlib:
  o Line, Bar, Scatter, Histogram, and Box Plots
  o Plot Customization (titles, labels, legends)

4. Data Preprocessing Techniques

• Handling Missing Data:
  o Imputation Methods (mean, median, mode)
  o Deletion Methods
• Encoding Categorical Data:
  o Label Encoding
  o One-Hot Encoding
• Feature Scaling:
  o Standardization (Z-score)
  o Normalization (Min-Max)
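
The preprocessing topics listed above can be sketched in plain Python. This is only a minimal illustration using the standard library and made-up sample values. The helper names (impute_mean, label_encode, one_hot, standardize, min_max) are hypothetical and are not functions from Pandas or Scikit-learn. The Z-score here assumes the population standard deviation, since the outline does not say which convention the course follows.

```python
from statistics import mean, pstdev

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def label_encode(categories):
    """Map each distinct category to an integer code (codes follow sorted order)."""
    codes = {cat: i for i, cat in enumerate(sorted(set(categories)))}
    return [codes[c] for c in categories], codes

def one_hot(categories):
    """Build one 0/1 indicator row per item, one column per distinct category."""
    columns = sorted(set(categories))
    rows = [[1 if c == col else 0 for col in columns] for c in categories]
    return rows, columns

def standardize(values):
    """Z-score: subtract the mean, divide by the population standard deviation.
    Assumes the values are not all identical (otherwise the deviation is zero)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def min_max(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1.
    Assumes the values are not all identical."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Made-up sample data for illustration only.
ages = impute_mean([20, None, 40])       # the gap is filled with the mean of 20 and 40
encoded, mapping = label_encode(["red", "green", "red"])
rows, columns = one_hot(["red", "green", "red"])
scaled = min_max(ages)
z = standardize(ages)

print(ages, encoded, rows, scaled)
```

Label encoding gives each category a single integer, which can suggest an ordering that does not exist; one-hot encoding avoids that at the cost of one extra column per category.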

5. Data Wrangling Using Pandas

• DataFrames and Series
• Importing/Exporting Data (CSV, Excel)
• Merging, Grouping, Filtering, and Aggregating Data
• DateTime Processing

6. Introduction to Machine Learning with Scikit-Learn

• Overview of Supervised and Unsupervised Learning
• Basic Models:
  o Linear Regression
  o Logistic Regression
  o K-Nearest Neighbors
• Model Evaluation Metrics:
  o Accuracy, Precision, Recall, F1 Score
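
As a rough illustration of the evaluation metrics named above, the sketch below computes them for a binary classifier from true and predicted labels. It is written in plain Python so the formulas are visible. The function name classification_metrics and the sample labels are hypothetical, and returning 0.0 when a denominator is zero is an assumption made here for simplicity.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    tn = sum(1 for t, p in pairs if t != positive and p != positive)  # true negatives

    accuracy = (tp + tn) / len(pairs)                  # share of all predictions that are right
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # of predicted positives, how many are real
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # of real positives, how many were found
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)            # harmonic mean of precision and recall
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Made-up labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In practice, Scikit-learn provides these metrics in sklearn.metrics as accuracy_score, precision_score, recall_score, and f1_score.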

Introduction to Python using Jupyter Notebooks:

Several popular notebook platforms offer good environments in which to learn, practice,
and explore Python. Jupyter Notebook is a widely used open-source web application for
creating and sharing documents that contain live code, equations, visualizations, and
narrative text. It supports over 40 programming languages and is well suited to tasks
such as data analysis, machine learning, and visualization.

Google Colab is a free cloud-based notebook environment developed by Google and built
on Jupyter. It requires no installation and provides free, usage-limited access to
GPUs/TPUs, along with seamless integration with Google Drive.

Kaggle Kernels (now called Kaggle Notebooks) is another online platform tailored for
data science and machine learning projects. It offers preloaded datasets, GPU support,
and a collaborative cloud environment.

Deepnote is designed for collaborative data science work, enabling real-time
collaboration and version control, making it suitable for teams.

Visual Studio Code (VS Code) Notebooks allows users to work with Jupyter
notebooks directly within the IDE, benefiting from rich extensions and powerful debugging
tools.

Azure Notebooks, hosted by Microsoft, was a cloud-based Jupyter service that supported
Python execution with features like integration with Azure services and collaboration
tools. The original hosted preview service has since been retired.

Lastly, Binder is a tool that converts GitHub repositories into interactive, shareable
Jupyter notebooks that can run directly in the cloud without any setup. Each of these
platforms caters to different needs, from beginners to advanced users, and collectively they
provide robust tools for learning Python, data science, and machine learning.

Some of the environments to run Python programs:

• IDLE (Integrated Development and Learning Environment): Bundled with the
standard Python installation. Simple and lightweight; ideal for beginners.
• PyCharm: Full-featured Python IDE by JetBrains. Ideal for software
development and debugging.
• Spyder: Scientific Python Development Environment. Included with
Anaconda; good for data science tasks.
• Anaconda Navigator: GUI for managing Python environments and tools like
Jupyter and Spyder. Excellent for data science workflows.
• Thonny: Simple IDE designed for beginners. Easy to use, with a built-in
debugger.
• Replit: Online IDE for running Python in a browser. Useful for quick tests and
collaborative coding.
• Terminal / Command Line (CLI): Run Python scripts directly using python
filename.py. Useful for scripting and automation tasks.
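
To make the command-line option concrete, here is a minimal sketch of a script that could be saved under the placeholder name filename.py used above. The function names greet and main are hypothetical, chosen only for this illustration.

```python
# filename.py: a tiny script meant to be run from the terminal
import sys

def greet(name):
    """Return a greeting for the given name."""
    return f"Hello, {name}!"

def main(argv):
    # Use the first command-line argument if one was given, else a default.
    name = argv[1] if len(argv) > 1 else "world"
    print(greet(name))

# This guard runs main() only when the file is executed as a script,
# not when it is imported as a module.
if __name__ == "__main__":
    main(sys.argv)
```

Running python filename.py Ada in a terminal would print Hello, Ada! and running it with no argument would print Hello, world!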

Variables in Python

Definition

A variable is a name that refers to a value stored in memory. In Python, a variable
is created automatically when you assign it a value; there is no need to declare its
type explicitly because Python is dynamically typed.
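
A short sketch of what dynamic typing means in practice. The variable names and values below are made up for illustration.

```python
count = 10                     # created on assignment; no type declaration
first_type = type(count).__name__
print(first_type)              # int

count = "ten"                  # the same name can later hold a value of another type
second_type = type(count).__name__
print(second_type)             # str

price = 19.99                  # float
is_valid = True                # bool
print(type(price).__name__, type(is_valid).__name__)   # float bool
```

The type belongs to the value, not to the name, which is why count can hold an integer first and a string later.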
