UNIT II
Computational Foundations – Key Topics
Basic Python Programming:
Introduction to Python using Jupyter Notebooks, focusing on core
syntax, data structures, control flow, and writing scripts for data
tasks.
Scientific Computing Libraries:
Use of packages such as NumPy (numerical computation), SciPy
(scientific methods), and Matplotlib (data visualization) for
efficient statistical analysis and plotting.
Data Preprocessing Techniques:
Techniques to handle missing values, label encoding, one-hot
encoding, and data standardization. These steps are crucial to
prepare raw data for analysis and modeling.
Data Wrangling:
Using Pandas to manipulate and clean datasets: merging tables,
filtering rows, transforming columns, and restructuring
DataFrames.
Machine Learning Basics:
Introduction to classical machine learning methods using
scikit-learn, including the workflow for model fitting, prediction,
evaluation, and selection of appropriate algorithms.
1. Introduction to Python for Data Science
Installing Python and Jupyter Notebooks
Introduction to Python Syntax
Variables, Data Types, Control Structures (loops, conditionals)
Functions and Modules
File Handling
2. Scientific Computing with Python
NumPy:
o Arrays and Array Operations
o Indexing, Slicing, and Broadcasting
o Linear Algebra and Random Number Generation
SciPy:
o Mathematical and Statistical Functions
o Optimization and Integration
3. Data Visualization
Matplotlib:
o Line, Bar, Scatter, Histogram, and Box Plots
o Plot Customization (titles, labels, legends)
4. Data Preprocessing Techniques
Handling Missing Data:
o Imputation Methods (mean, median, mode)
o Deletion Methods
Encoding Categorical Data:
o Label Encoding
o One-Hot Encoding
Feature Scaling:
o Standardization (Z-score)
o Normalization (Min-Max)
5. Data Wrangling Using Pandas
DataFrames and Series
Importing/Exporting Data (CSV, Excel)
Merging, Grouping, Filtering, and Aggregating Data
DateTime Processing
6. Introduction to Machine Learning with Scikit-Learn
Overview of Supervised and Unsupervised Learning
Basic Models:
o Linear Regression
o Logistic Regression
o K-Nearest Neighbors
Model Evaluation Metrics:
o Accuracy, Precision, Recall, F1 Score
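The four evaluation metrics that close the outline above can be illustrated with a small sketch in plain Python. The function name classification_metrics and the label lists are made up for illustration; in practice the sklearn.metrics module of scikit-learn provides equivalent functions.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("at least one label is required")

    # Count the cells of the confusion matrix
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (one common convention)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels, for illustration only
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Here there are 3 true positives, 3 true negatives, 1 false positive, and 1 false negative, so all four metrics come out to 0.75.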
Introduction to Python using Jupyter Notebooks:
Several notebook platforms provide convenient environments for learning, practicing,
and exploring Python. Jupyter Notebook is a widely used open-source web application that
allows users to create and share documents containing live code, equations, visualizations,
and narrative text. Through its language kernels it supports over 40 programming languages,
and it is well suited to tasks such as data analysis, machine learning, and visualization.
Google Colab is a free cloud-based notebook environment developed by Google and
built on Jupyter. It requires no installation and provides free, usage-limited access to
GPUs/TPUs, along with integration with Google Drive.
Kaggle Notebooks (formerly Kaggle Kernels) is another online platform tailored for
data science and machine learning projects. It offers preloaded datasets, GPU support, and a
collaborative cloud environment.
Deepnote is designed for team-based data science work, offering real-time
collaboration and version control.
Visual Studio Code (VS Code) Notebooks allows users to work with Jupyter
notebooks directly within the IDE, benefiting from rich extensions and powerful debugging
tools.
Azure Notebooks, hosted by Microsoft, is a cloud-based Jupyter service that supports
Python execution with features like integration with Azure services and collaboration tools.
Lastly, Binder is a tool that converts GitHub repositories into interactive, shareable
Jupyter notebooks that can run directly in the cloud without any setup. Each of these
platforms caters to different needs, from beginners to advanced users, and collectively they
provide robust tools for learning Python, data science, and machine learning.
Some environments for running Python programs:
IDLE (Integrated Development and Learning Environment): Included with the
standard Python installation; simple and lightweight, ideal for beginners.
PyCharm: Full-featured Python IDE by JetBrains; well suited to software
development and debugging.
Spyder (Scientific Python Development Environment): Included with
Anaconda; good for data science tasks.
Anaconda Navigator: GUI for managing Python environments and launching
tools such as Jupyter and Spyder; well suited to data science workflows.
Thonny: Simple IDE designed for beginners; easy to use, with a built-in
debugger.
Replit: Online IDE for running Python in a browser; useful for quick tests and
collaborative coding.
Terminal / Command Line (CLI): Run Python scripts directly using python
filename.py; useful for scripting and automation tasks.
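As a minimal sketch of the command-line route, the script below could be saved as filename.py and run with python filename.py (on some systems the command is python3). The function names summarize and main and the sample numbers are made up for illustration.

```python
# Save as filename.py, then run from a terminal with:  python filename.py

def summarize(values):
    """Return the count, total, and mean of a list of numbers."""
    total = sum(values)
    return {"count": len(values), "total": total, "mean": total / len(values)}


def main():
    # Sample data, for illustration only
    sample = [2.0, 4.0, 6.0]
    result = summarize(sample)
    print(f"count={result['count']} total={result['total']} mean={result['mean']}")
    return result


# This guard runs main() only when the file is executed as a script,
# not when it is imported as a module from another file.
if __name__ == "__main__":
    main()
```

The same file also works inside any of the IDEs listed above, since each of them ultimately hands the script to the Python interpreter.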
Variables in Python
Definition
A variable is a name that refers to a value stored in memory. In Python, a variable is
created when a value is first assigned to it. No explicit type declaration is needed because
Python is dynamically typed: the type belongs to the value, not to the name.
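A short sketch of what dynamic typing means in practice is given below. The variable names count, a, and b are arbitrary and chosen only for illustration.

```python
# Assignment creates the variable; no type declaration is needed
count = 10
print(type(count).__name__)   # int

# The same name can later be bound to a value of a different type
count = "ten"
print(type(count).__name__)   # str

# Two names can refer to the same object in memory
a = [1, 2, 3]
b = a            # b is another name for the same list, not a copy
b.append(4)
print(a)         # [1, 2, 3, 4]
print(a is b)    # True
```

The last part shows that assignment binds a name to an object rather than copying the value, which matters for mutable types such as lists.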