KCA University
Nairobi, Kenya
BDS 3203 PROGRAMMING FOR DATA SCIENCE
Prepared By:
Dr. Linus Aloo, PhD
E-mail: linusaloo88@[Link]
Phone: 0754188380
Course Text
1. Peter Morgan, “Data Analysis from Scratch with Python: Step by Step Guide,” AI Sciences LLC, 1st Edition, 2016.
2. Roger D. Peng, “R Programming for Data Science,” Lean Publishing, 2015.
3. Vijay Kotu and Bala Deshpande, “Data Science: Concepts and Practice,” Elsevier, Second Edition, 2019.
4. Sinan Ozdemir, “Principles of Data Science,” Packt Publishing, 1st Ed., 2016.
References
1. L. Igual and S. Seguí, “Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications,” Springer, ISBN: 978-3-319-50016-4.
2. Jesús Rogel-Salazar, “Data Science and Analytics with Python,” CRC Press, Taylor & Francis Group, 1st Ed., 2017.
BSD 3203 Programming for Data Science
1/24/2025
Lecture Notes By Dr. Linus A. Aloo
• BDS 3203 PROGRAMMING FOR DATA SCIENCE
Course Outline
BSD 3203 Programming for Data Science
Contact Hours: 52 hrs
Pre-requisite: Principles of Data Science
Purpose/Aim: The main aim of this course is to learn how to use tools such as Python and R for acquiring, cleaning, analyzing, exploring, and visualizing data; making data-driven inferences and decisions; and effectively communicating results.
Course Objectives:
1. To identify and use available R packages and associated open-source software to meet given scientific objectives.
2. To design and write efficient programs using R and Python to perform routine and specialized data manipulation, management, and analysis tasks.
3. To write Python and R programs that can perform data visualizations.
Course Content
Introduction:
• Introduction to R for data science, introduction to Python for data science
• Installation and configuration of R and Python programming environment
• Applied machine learning
• Best-practice software engineering techniques
Programming with R for data science:
• Explore R language fundamentals, including basic syntax, variables, and types
• How to create functions and use control flow
• Details on reading and writing data in R
• Work with data in R
• Create and customize visualizations using ggplot2
• Perform predictive analytics using R
Programming with Python for data science:
• The NumPy package for scientific computing
• The pandas data analysis library, including reading and writing of CSV files
• The Jupyter and PyDev development environments
• The Matplotlib 2D plotting library
• Understanding the shell
• Using Git and GitHub
Learning & Teaching Methods: Lectures, tutorials and computer laboratory exercises
Instructional Tools: Classroom with audio-visual aids, computer laboratory and Internet access
Course Assessment:
Type | Weighting (%)
Examination | 50
Continuous Assessment | 50
Total | 100
Recommended Reading (Author, Title, Publisher):
• Joel Grus (2015), Data Science from Scratch: First Principles with Python, O’Reilly Media
Additional Reading:
• Matt Harrison (2016), Learning the Pandas Library: Python Tools for Data Munging, Analysis, and Visualization, CreateSpace Independent Publishing Platform
• Wickham, H. (2009), ggplot2, Springer. [Link] cord=b7232787~S39a
• Peng, R.D. (2015), Exploratory Data Analysis with R. [Link]
Other Support Material: A variety of multimedia systems and electronic information resources as prescribed by the lecturer; various application manuals, URL search and journals.
Learning & Teaching Methodologies
Lectures, tutorials and planning exercises
Instructional Materials/Equipment
Classroom with audio visual aids
Computer laboratory
Course Assessment
BSD3203 EXPLORATORY DATA ANALYSIS
Course Assessment Type | Weighting (%)
Lab Ass I | 5
Lab Ass II | 5
Lab Ass III | 10
CATs (CAT 1 & 2) | 30
End-of-Semester Exam | 50
Total | 100
CHAPTER ONE: INTRODUCTION TO R AND PYTHON FOR DATA SCIENCE
1.0. Objectives
•Understand the role of R and Python in data science.
•Learn key features and benefits of each language.
•Explore basic syntax and functionality.
•Understand when to choose R or Python for a data science
project.
1.1. What is Data Science?
•Definition: Data science is the discipline of extracting
knowledge and insights from structured and unstructured data.
•Key Components:
• Data Collection
• Data Cleaning
• Exploratory Data Analysis (EDA)
• Statistical Modeling and Machine Learning
• Data Visualization
• Communication of Results
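The key components listed above can be walked through end to end on a toy dataset. The following is a minimal sketch using only the Python standard library; the records, the field names (`hours`, `score`), and the cleaning rule (drop rows with missing values) are assumptions made for illustration and are not part of the course material.

```python
from statistics import mean, stdev

# 1. Data collection: a small hypothetical dataset (hours studied, exam score).
raw_records = [
    {"hours": "2", "score": "50"},
    {"hours": "4", "score": "60"},
    {"hours": "", "score": "70"},      # missing value, removed during cleaning
    {"hours": "6", "score": "70"},
    {"hours": "8", "score": "80"},
]

# 2. Data cleaning: drop incomplete rows and convert text to numbers.
clean = [
    (float(r["hours"]), float(r["score"]))
    for r in raw_records
    if r["hours"] and r["score"]
]

# 3. Exploratory data analysis: simple summary statistics.
hours = [h for h, _ in clean]
scores = [s for _, s in clean]
print("rows kept:", len(clean))
print("mean score:", mean(scores), "std dev:", round(stdev(scores), 2))

# 4. Statistical modeling: fit score = intercept + slope * hours by least squares.
h_bar, s_bar = mean(hours), mean(scores)
slope = sum((h - h_bar) * (s - s_bar) for h, s in clean) / sum(
    (h - h_bar) ** 2 for h in hours
)
intercept = s_bar - slope * h_bar

# 5. Visualization: a crude text bar chart (a plotting library would be used in practice).
for h, s in clean:
    print(f"{h:>4.0f} h | " + "#" * int(s // 10))

# 6. Communication of results: state the finding in plain language.
summary = f"Each extra hour of study is associated with about {slope:.1f} more marks."
print(summary)
```

In real projects each numbered step would typically rely on dedicated libraries (for example pandas for cleaning and Matplotlib or ggplot2 for plots); the structure of the workflow stays the same.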
Fig. 1.1: Steps for performing EDA. Source: [Link]
1.2. Why Learn R and Python?
•R:
• Tailored for statistical analysis and visualization.
• Extensive packages for data manipulation and modeling (e.g., ggplot2,
dplyr).
• Strong community support for statistical methodologies.
•Python:
• General-purpose programming language.
• Wide variety of libraries for data science (e.g., pandas, NumPy, scikit-learn).
• Excellent for integration with web apps and production environments.
•Both are open-source and widely used in industry and academia.
1.3. Introduction to R
•Overview:
• Developed for statistical computing.
• Interactive environment ideal for exploratory data analysis.
•Key Features:
• Built-in statistical functions.
• Flexible and high-quality visualization tools.
• Extensive CRAN repository.
•Use Cases:
• Hypothesis testing, linear and nonlinear modeling, time-series
analysis.
1.4. Basic R Syntax
Example Code:
Data Structures in R:
•Vectors, Matrices, Data Frames, Lists.
1.5. Introduction to Python
•Overview:
• General-purpose language with a focus on simplicity and readability.
• Excellent for scripting, automation, and data analysis.
•Key Features:
• Extensive libraries for data science (e.g., pandas, Matplotlib,
seaborn).
• Integration with machine learning and deep learning frameworks (e.g.,
TensorFlow, PyTorch).
• Broad support for file formats and databases.
•Use Cases:
• Data manipulation, machine learning, web scraping, and deployment.
1.6. Basic Python Syntax
Example Code:
Data Structures in Python:
•Lists, Tuples, Dictionaries, Sets.
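As an illustration of basic Python syntax and the four data structures named above, here is a minimal sketch. It is not the example from the original slide; the variable names, the sample values, and the grading scale are hypothetical.

```python
# Variables and basic types
course = "BDS 3203"          # str
students = 40                # int
pass_mark = 0.5              # float

# A function with control flow
def grade(score):
    """Return a letter grade for a score between 0 and 100 (hypothetical scale)."""
    if score >= 70:
        return "A"
    elif score >= 50:
        return "B"
    return "C"

# List: ordered and mutable
scores = [82, 45, 67]
scores.append(90)

# Tuple: ordered and immutable
point = (3, 4)

# Dictionary: key-value mapping
student = {"name": "Amina", "score": 82}

# Set: unordered collection of unique items
languages = {"R", "Python", "R"}   # the duplicate "R" is stored once

# List comprehension: apply the function to every score
grades = [grade(s) for s in scores]
print(grades)
print(len(languages))
```

Lists and dictionaries are the structures met most often in data work; tuples suit fixed records, and sets suit membership tests and de-duplication.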
1.7. Comparing R and Python
Feature | R | Python
Strength | Statistical analysis | General-purpose programming
Visualization | Advanced (ggplot2, lattice) | Flexible (Matplotlib, seaborn)
Ease of Learning | Moderate (domain-specific) | Easy (general-purpose)
Machine Learning | Moderate (caret, mlr) | Excellent (scikit-learn, TensorFlow)
Community Support | Strong (statistical focus) | Strong (general-purpose)
1.8. Choosing the Right Tool
•Use R when:
• The focus is on statistical modeling.
• Advanced visualization is required.
•Use Python when:
• A broader programming application is needed.
• Integration with web apps or production systems is essential.
•Use Both:
• Leverage the strengths of each language via interoperability tools
(e.g., reticulate in R or rpy2 in Python).
1.9. Interoperability Between R and Python
•Why Combine R and Python?
•Leverage R’s statistical and visualization capabilities with
Python’s machine learning and integration strengths.
•Tools for Interoperability:
•reticulate package in R.
•rpy2 library in Python.
•Example Workflow:
•Use R for data visualization and Python for model deployment.
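A combined workflow can also be sketched without either package by exchanging a file: Python writes its output to CSV, and R reads the file for plotting. This file-based hand-off is an assumption made for illustration; it is a simpler alternative to reticulate or rpy2, not a demonstration of them. The file name `predictions.csv`, the column names, and the values are hypothetical.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical model output produced on the Python side.
predictions = [
    {"id": 1, "predicted": 0.91},
    {"id": 2, "predicted": 0.34},
    {"id": 3, "predicted": 0.58},
]

def export_for_r(rows, path):
    """Write rows to a CSV file that R can load with read.csv()."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["id", "predicted"])
        writer.writeheader()
        writer.writerows(rows)
    return path

out_dir = Path(tempfile.mkdtemp())
out_file = export_for_r(predictions, out_dir / "predictions.csv")

# On the R side, the same file could then be read and plotted, for example:
#   preds <- read.csv("predictions.csv")
#   plot(preds$id, preds$predicted)
print(out_file.read_text())
```

File exchange keeps the two languages fully decoupled, at the cost of an extra read/write step; the interoperability packages avoid that step by sharing objects in memory.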
1.10. Advanced Features and Libraries
•R Advanced Features:
• Shiny for interactive dashboards.
• R Markdown for reproducible reports.
•Python Advanced Features:
• Streamlit for interactive web apps.
• FastAPI for building APIs.
•Libraries to Explore:
• R: tidyr, lubridate, caret.
• Python: PyTorch, Flask, BeautifulSoup.
1.11. Getting Started with R and Python
•Install R:
•Download from R Project.
•IDE: RStudio for enhanced functionality.
•Install Python:
•Download from [Link].
•IDEs: Jupyter Notebook, PyCharm, or VS Code.
•Recommended Resources:
•Online tutorials, MOOCs, and documentation.
1.12. Installation and Configuration of R and Python Programming Environment
•Objective:
• Understand how to set up R and Python on your system.
• Learn the key tools and configurations needed for
programming.
•Agenda:
• Installation of R
• Installation of Python
• IDEs and Tools Setup
• Configurations and Best Practices
Installing R
•Steps:
1. Download the R installer:
•Go to CRAN.
•Select your operating system (Windows/Mac/Linux).
2. Install R:
•Run the downloaded installer.
•Follow the on-screen instructions.
•Choose default settings unless specific configurations are needed.
3. Verify Installation:
•Open the R console.
•Type version to check the R version.
Installing Python
•Steps:
1. Download the Python installer:
•Go to [Link].
•Select your operating system.
2. Install Python:
•Run the downloaded installer.
•Enable "Add Python to PATH".
•Choose default or customize installation as needed.
3. Verify Installation:
•Open a terminal or command prompt.
•Type python --version or python3 --version.
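The installation can also be checked from inside Python itself, which helps when several Python versions are installed side by side. A minimal sketch follows; the minimum version `(3, 8)` is a hypothetical threshold chosen only for illustration.

```python
import platform
import sys

# Version of the interpreter that is running this script, e.g. "3.12.1".
python_version = platform.python_version()

# Full path of the interpreter; useful when several Pythons are installed.
interpreter_path = sys.executable

print("Python version:", python_version)
print("Interpreter:", interpreter_path)

# Hypothetical minimum version, chosen for illustration only.
MINIMUM = (3, 8)
if sys.version_info[:2] < MINIMUM:
    print("Warning: this Python is older than", ".".join(map(str, MINIMUM)))
```

If the path printed is not the installation you expected, the PATH setting is the first thing to check.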
IDEs and Tools for R and Python
•R IDEs:
• RStudio (Highly recommended)
• Download from RStudio.
• Install and configure with R.
•Python IDEs:
• PyCharm
• Visual Studio Code (VS Code)
• Jupyter Notebook/ JupyterLab
•Tips:
• Ensure compatibility with installed R/Python versions.
Configuring R and Python Environments
•For R:
•Set up libraries using install.packages("package_name").
•Configure RStudio settings (e.g., themes, fonts).
•For Python:
•Use virtual environments for package management:
•Create: python -m venv env_name
•Activate: source env_name/bin/activate (Mac/Linux) or
env_name\Scripts\activate (Windows)
•Install libraries with pip: pip install package_name.
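The same environment can be created from Python code with the standard-library `venv` module, and a script can check whether it is running inside a virtual environment. This is a minimal sketch: the temporary folder is an assumption that keeps the demo self-cleaning, and `with_pip=False` is chosen only so the example runs quickly and offline.

```python
import sys
import tempfile
import venv
from pathlib import Path

def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix

# Create an environment in a temporary folder; "env_name" mirrors the
# placeholder used in the commands above.
env_dir = Path(tempfile.mkdtemp()) / "env_name"
venv.create(env_dir, with_pip=False)   # with_pip=False keeps the demo fast and offline

# Every environment created by venv contains a pyvenv.cfg file.
print("Environment created:", (env_dir / "pyvenv.cfg").exists())
print("Currently inside a virtual environment:", in_virtualenv())
```

For day-to-day work the command-line form (`python -m venv env_name`) is usually more convenient; the programmatic form is mainly useful in setup scripts.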
Best Practices
•R:
•Regularly update R and packages: update.packages().
•Organize projects using RStudio projects.
•Python:
•Use requirements.txt for dependencies.
•Follow PEP 8 coding guidelines (use linters like flake8).
•General:
• Backup your work regularly.
• Keep software up to date.
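To make the Python dependency practice concrete, the sketch below lists the packages installed in the current environment as `name==version` lines, the pinned format that pip accepts in a dependency file. It is similar in spirit to the output of `pip freeze` but is not a replacement for it; the function name `freeze` is hypothetical.

```python
from importlib import metadata

def freeze():
    """Return installed packages as sorted 'name==version' lines."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:                       # skip entries with broken metadata
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)

requirements = freeze()
print("\n".join(requirements[:5]))     # preview the first few entries

# To save the list, write it to a file, for example:
#   Path("requirements.txt").write_text("\n".join(requirements) + "\n")
```

Recording exact versions this way lets a classmate or a lab machine recreate the same environment later.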
Troubleshooting Tips
•Common Issues:
• R or Python not recognized in PATH.
• Library/package installation errors.
•Solutions:
• Check installation paths.
• Reinstall problematic libraries.
• Refer to official documentation or community forums.
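A quick way to diagnose the PATH issue above is to ask the operating system where it finds each tool. The following is a minimal sketch using the standard library; the list of executable names is an assumption and may differ on your system (for example, `py` on some Windows installations).

```python
import shutil

def find_on_path(command):
    """Return the full path of a command-line tool, or None if it is not on PATH."""
    return shutil.which(command)

# Executable names commonly used for R and Python; adjust for your system.
for tool in ["R", "Rscript", "python", "python3", "pip"]:
    location = find_on_path(tool)
    status = location if location else "NOT FOUND on PATH"
    print(f"{tool:<8} {status}")
```

A tool reported as not found is either not installed or installed in a folder that is missing from PATH; checking the installation path distinguishes the two cases.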
Resources
•R:
• CRAN Documentation: [Link]
• RStudio Support: [Link]
•Python:
• Official Python Docs: [Link]/doc/
• PyPI Packages: [Link]
•General:
• Stack Overflow: [Link]
1.13. Hands-on Exercise
•Task: Perform basic data analysis using R and Python.
•R Exercise:
•Load a dataset using read.csv().
•Perform summary statistics with summary().
•Create a basic plot with plot().
•Python Exercise:
•Load a dataset using pandas.
•Perform summary statistics with df.describe().
•Create a basic plot with Matplotlib.
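One possible starting point for the Python exercise is sketched below. It assumes pandas is installed and uses a small inline dataset in place of a real file; the column names and values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical inline data; in practice this would be a file path such as "data.csv".
csv_text = """name,hours,score
Amina,2,50
Brian,4,60
Chege,6,70
Dorcas,8,80
"""

# Load the dataset into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Summary statistics (count, mean, std, min, quartiles, max) for numeric columns.
summary = df.describe()
print(summary)

# Individual statistics are also available per column.
print("Mean score:", df["score"].mean())
```

With Matplotlib installed, a basic plot could then be drawn with `df.plot(x="hours", y="score")`; it is left out of the sketch to keep the dependencies minimal.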
1.14. Summary
•R and Python are powerful tools for data science.
•Each has unique strengths and applications.
•Learning both can maximize your data analysis
capabilities.
•Practice with real-world datasets to deepen your
understanding.