Lecture 1 - Chapter 1-Introduction

The document outlines the course BDS 3203 Programming for Data Science at KCA University, detailing its objectives, content, and assessment methods. It emphasizes the use of R and Python for data manipulation, analysis, and visualization, along with the installation and configuration of programming environments. The course aims to equip students with practical skills in data science through lectures, tutorials, and hands-on exercises.


KCA University

Nairobi, Kenya

BDS 3203 PROGRAMMING FOR DATA SCIENCE

Prepared By:
Dr. Linus Aloo, PhD
E-mail: linusaloo88@[Link]
Phone: 0754188380
Course Text

1. Peter Morgan, “Data Analysis from Scratch with Python: Step by Step Guide,” AI Sciences LLC, 1st Edition, 2016.

2. Roger D. Peng, “R Programming for Data Science,” Lean Publishing, 2015.


3. Vijay Kotu and Bala Deshpande, “Data Science: Concepts and Practice,” Elsevier, Second Edition, 2019.
4. Sinan Ozdemir, “Principles of Data Science,” Packt Publishing, 1st Ed., 2016.
References

1. Laura Igual and Santi Seguí, “Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications,” Springer, ISBN: 978-3-319-50016-4.

2. Jesús Rogel-Salazar, “Data Science and Analytics with Python,” CRC Press, Taylor & Francis Group, 1st Ed., 2017.

BDS 3203 Programming for Data Science
1/24/2025
Lecture Notes By Dr. Linus A. Aloo
Course Outline

Contact Hours: 52 hrs
Pre-requisite: Principles of Data Science
Purpose/Aim: The main aim of this course is to learn how to use tools such as Python and R for acquiring, cleaning, analyzing, exploring, and visualizing data; making data-driven inferences and decisions; and effectively communicating results.
Course Objectives:
1. To identify and use available R packages and associated open-source software to meet given scientific objectives.
2. To design and write efficient programs using R and Python to perform routine and specialized data manipulation, management, and analysis tasks.
3. To write Python and R programs that perform data visualizations.

Course Content:
• Introduction: Introduction to R for data science, introduction to Python for data science
• Installation and configuration of R and Python programming environment
• Applied machine learning
• Best-practice software engineering techniques
• Programming with R for data science:
  • Explore R language fundamentals, including basic syntax, variables, and types
  • How to create functions and use control flow
  • Details on reading and writing data in R
  • Work with data in R
  • Create and customize visualizations using ggplot2
  • Perform predictive analytics using R
• Programming with Python for data science:
  • The NumPy package for scientific computing
  • The pandas data analysis library, including reading and writing of CSV files
  • The Jupyter and PyDev development environments
  • The Matplotlib 2D plotting library
• Understanding the shell
• Using Git and GitHub
Learning & Teaching Methods: Lectures, tutorials and computer laboratory exercises
Instructional Tools: Classroom with audio-visual aids, computer laboratory and Internet access

Course Assessment:
Type                     Weighting (%)
Examination              50
Continuous Assessment    50
Total                    100

Recommended Reading:
• Joel Grus (2015), Data Science from Scratch: First Principles with Python, O’Reilly Media.

Additional Reading:
• Matt Harrison (2016), Learning the Pandas Library: Python Tools for Data Munging, Analysis, and Visualization, CreateSpace Independent Publishing Platform.
• Wickham, H. (2009), ggplot2, Springer. [Link] cord=b7232787~S39a
• Peng, R.D. (2015), Exploratory Data Analysis with R. [Link]

Other Support Material: A variety of multimedia systems and electronic information resources as prescribed by the lecturer; various application manuals, URL searches and journals.

Learning & Teaching Methodologies
Lectures, tutorials and planning exercises
Instructional Materials/Equipment
Classroom with audio visual aids
Computer laboratory
Course Assessment
BSD3203 EXPLORATORY DATA ANALYSIS

Course Assessment Type      Weighting (%)
Lab Ass I                   5%
Lab Ass II                  5%
Lab Ass III                 10%
CATs (CAT 1 & 2)            30%
End of Semester Exams       50%
Total                       100%
CHAPTER ONE: INTRODUCTION TO R AND PYTHON FOR DATA SCIENCE
1.0. Objectives

•Understand the role of R and Python in data science.
•Learn key features and benefits of each language.
•Explore basic syntax and functionality.
•Understand when to choose R or Python for a data science project.

1.1. What is Data Science?
•Definition: Data science is the discipline of extracting
knowledge and insights from structured and unstructured data.
•Key Components:
• Data Collection
• Data Cleaning
• Exploratory Data Analysis (EDA)
• Statistical Modeling and Machine Learning
• Data Visualization
• Communication of Results
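
The key components above can be traced end to end on a toy example. The following is a minimal sketch using only the Python standard library; the dataset, field names, and the choice of a least-squares line as the "model" are illustrative assumptions, not part of the course material.

```python
from statistics import mean

# 1. Data collection: a tiny hand-made dataset (hypothetical values).
raw_records = [
    {"hours": 1, "score": 50},
    {"hours": 2, "score": 55},
    {"hours": None, "score": 70},  # incomplete record
    {"hours": 3, "score": 60},
    {"hours": 4, "score": 65},
]

# 2. Data cleaning: drop records that have a missing field.
clean = [r for r in raw_records if None not in r.values()]

# 3. Exploratory data analysis: simple summary statistics.
hours = [r["hours"] for r in clean]
scores = [r["score"] for r in clean]
mean_hours, mean_score = mean(hours), mean(scores)

# 4. Statistical modeling: least-squares line score = intercept + slope * hours.
slope = sum((x - mean_hours) * (y - mean_score) for x, y in zip(hours, scores)) / sum(
    (x - mean_hours) ** 2 for x in hours
)
intercept = mean_score - slope * mean_hours

# 5-6. Visualization and communication, reduced here to a one-line text report.
report = f"Each extra hour is associated with {slope:.1f} more points (n={len(clean)})."
print(report)
```

In practice each step would use a dedicated library (for example pandas for cleaning and Matplotlib for plotting); the sketch only shows how the stages feed into one another.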

Fig. 1.1: Steps for performing EDA. Source: [Link]


1.2. Why Learn R and Python?
•R:
• Tailored for statistical analysis and visualization.
• Extensive packages for data manipulation and modeling (e.g., ggplot2, dplyr).
• Strong community support for statistical methodologies.
•Python:
• General-purpose programming language.
• Wide variety of libraries for data science (e.g., pandas, NumPy, scikit-learn).
• Excellent for integration with web apps and production environments.
•Both are open-source and widely used in industry and academia.
1.3. Introduction to R
•Overview:
• Developed for statistical computing.
• Interactive environment ideal for exploratory data analysis.
•Key Features:
• Built-in statistical functions.
• Flexible and high-quality visualization tools.
• Extensive CRAN repository.
•Use Cases:
• Hypothesis testing, linear and nonlinear modeling, time-series analysis.
1.4. Basic R Syntax

Example Code:

•Data Structures in R:
• Vectors, Matrices, Data Frames, Lists.
1.5. Introduction to Python
•Overview:
• General-purpose language with a focus on simplicity and readability.
• Excellent for scripting, automation, and data analysis.
•Key Features:
• Extensive libraries for data science (e.g., pandas, Matplotlib, seaborn).
• Integration with machine learning and deep learning frameworks (e.g., TensorFlow, PyTorch).
• Broad support for file formats and databases.
•Use Cases:
• Data manipulation, machine learning, web scraping, and deployment.
1.6. Basic Python Syntax
•Example Code:

•Data Structures in Python:
• Lists, Tuples, Dictionaries, Sets.
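
A minimal sketch of basic Python syntax and the four data structures listed above. The variable names and most values are illustrative assumptions (only the course code, contact hours, and exam weighting are taken from the course outline), and this is not the original slide's example.

```python
# Variables and basic types
course_code = "BDS 3203"   # str
contact_hours = 52         # int
exam_weight = 0.5          # float (50% of the final mark)

# List: ordered and mutable
languages = ["R", "Python"]
languages.append("SQL")    # "SQL" is added only to show append()

# Tuple: ordered and immutable
version = (3, 12)

# Dictionary: maps keys to values
libraries = {"R": "ggplot2", "Python": "pandas"}

# Set: unordered collection of unique items (the duplicate is dropped)
tools = {"Jupyter", "RStudio", "Jupyter"}


def describe(language):
    """Return a short sentence naming an example library for a language."""
    if language in libraries:
        return f"{language} uses {libraries[language]}"
    return f"{language} has no library listed"


# Control flow: loop over the list and call the function
for language in languages:
    print(describe(language))
```

Indentation, rather than braces, marks the body of the function, the `if` branch, and the loop.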
1.7. Comparing R and Python

Feature              R                              Python
Strength             Statistical analysis           General-purpose programming
Visualization        Advanced (ggplot2, lattice)    Flexible (Matplotlib, seaborn)
Ease of Learning     Moderate (domain-specific)     Easy (general-purpose)
Machine Learning     Moderate (caret, mlr)          Excellent (scikit-learn, TensorFlow)
Community Support    Strong (statistical focus)     Strong (general-purpose)

1.8. Choosing the Right Tool
•Use R when:
• The focus is on statistical modeling.
• Advanced visualization is required.
•Use Python when:
• A broader programming application is needed.
• Integration with web apps or production systems is essential.
•Use Both:
• Leverage the strengths of each language via interoperability tools (e.g., reticulate in R or rpy2 in Python).

1.9. Interoperability Between R and Python

•Why Combine R and Python?
• Leverage R’s statistical and visualization capabilities with Python’s machine learning and integration strengths.
•Tools for Interoperability:
• reticulate package in R.
• rpy2 library in Python.
•Example Workflow:
• Use R for data visualization and Python for model deployment.
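
As a dependency-free illustration of handing work from Python to R, the sketch below calls the `Rscript` command-line program that ships with R. This is a swapped-in technique, not reticulate or rpy2: it assumes R is installed and on PATH, and the helper name `run_r_expression` is hypothetical.

```python
import shutil
import subprocess


def run_r_expression(expression):
    """Evaluate an R expression with Rscript and return what it printed.

    Returns None when Rscript is not installed or not on PATH.
    """
    rscript = shutil.which("Rscript")
    if rscript is None:
        return None
    completed = subprocess.run(
        [rscript, "--vanilla", "-e", expression],  # --vanilla skips user profiles
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if R exits with an error
    )
    return completed.stdout


# Ask R for the mean of a small vector and read the answer back in Python.
output = run_r_expression("cat(mean(c(2, 4, 6)))")
print("Rscript not found" if output is None else f"R computed: {output}")
```

reticulate and rpy2 go further by sharing objects in memory instead of exchanging text, which is usually preferable once both languages are set up.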

1.10. Advanced Features and Libraries

•R Advanced Features:
• Shiny for interactive dashboards.
• R Markdown for reproducible reports.
•Python Advanced Features:
• Streamlit for interactive web apps.
• FastAPI for building APIs.
•Libraries to Explore:
• R: tidyr, lubridate, caret.
• Python: PyTorch, Flask, BeautifulSoup.
1.11. Getting Started with R and Python

•Install R:
•Download from R Project.
•IDE: RStudio for enhanced functionality.
•Install Python:
•Download from [Link].
•IDEs: Jupyter Notebook, PyCharm, or VS Code.
•Recommended Resources:
•Online tutorials, MOOCs, and documentation.

1.12. Installation and Configuration of R and Python Programming Environment

•Objective:
• Understand how to set up R and Python on your system.
• Learn the key tools and configurations needed for programming.
•Agenda:
• Installation of R
• Installation of Python
• IDEs and Tools Setup
• Configurations and Best Practices


Installing R
•Steps:
1. Download the R installer:
• Go to CRAN.
• Select your operating system (Windows/Mac/Linux).
2. Install R:
• Run the downloaded installer.
• Follow the on-screen instructions.
• Choose default settings unless specific configurations are needed.
3. Verify Installation:
• Open the R console.
• Type version to check the R version.
Installing Python
•Steps:
1. Download the Python installer:
• Go to [Link].
• Select your operating system.
2. Install Python:
• Run the downloaded installer.
• Enable "Add Python to PATH".
• Choose default or customize installation as needed.
3. Verify Installation:
• Open a terminal or command prompt.
• Type python --version or python3 --version.
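
The verification step can also be done from inside Python, which additionally shows which interpreter is actually running. A minimal sketch; the minimum version `(3, 8)` is a hypothetical threshold chosen for illustration, since the course does not state one.

```python
import platform
import sys

# Same information as `python --version`, plus the interpreter's location.
print("Python version:", platform.python_version())
print("Interpreter path:", sys.executable)

# Hypothetical minimum version for illustration only.
MINIMUM = (3, 8)
meets_minimum = sys.version_info >= MINIMUM
print("Meets minimum:", meets_minimum)
```

If the interpreter path is not the one you just installed, another Python installation is ahead of it on PATH.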
IDEs and Tools for R and Python
•R IDEs:
• RStudio (Highly recommended)
• Download from RStudio.
• Install and configure with R.
•Python IDEs:
• PyCharm
• Visual Studio Code (VS Code)
• Jupyter Notebook / JupyterLab
•Tips:
• Ensure compatibility with installed R/Python versions.
Configuring R and Python Environments
•For R:
• Set up libraries using install.packages("package_name").
• Configure RStudio settings (e.g., themes, fonts).
•For Python:
• Use virtual environments for package management:
• Create: python -m venv env_name
• Activate: source env_name/bin/activate (Mac/Linux) or env_name\Scripts\activate (Windows)
• Install libraries with pip: pip install package_name.
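
After activating an environment it is worth confirming that Python is really running inside it. The sketch below checks this and rebuilds the activation commands listed above; the helper names `in_virtual_environment` and `activation_command` are hypothetical, and the check assumes an environment created with the standard `venv` module.

```python
import sys
from pathlib import PurePosixPath, PureWindowsPath


def in_virtual_environment():
    """True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


def activation_command(env_name, windows=sys.platform.startswith("win")):
    """Return the shell command that activates the environment env_name."""
    if windows:
        return str(PureWindowsPath(env_name, "Scripts", "activate"))
    return "source " + str(PurePosixPath(env_name, "bin", "activate"))


print("Inside a virtual environment:", in_virtual_environment())
print("Activate with:", activation_command("env_name"))
```

Running `pip install package_name` while the check prints `False` installs into the system-wide Python rather than the project environment.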

Best Practices
•R:
• Regularly update R, and update installed packages with update.packages().
• Organize projects using RStudio projects.
•Python:
• Use requirements.txt for dependencies.
• Follow PEP 8 coding guidelines (use linters like flake8).
•General:
• Back up your work regularly.
• Keep software up to date.
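
A dependency file of the kind mentioned under the Python best practices is a plain-text list of pinned packages, one `name==version` per line. The sketch below builds such lines from the current environment with the standard library; the function name `freeze` is hypothetical, and the output only approximates what `pip freeze` reports.

```python
from importlib import metadata


def freeze():
    """Return sorted 'name==version' lines for the installed distributions."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries whose metadata has no name
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


requirements = freeze()
# Show the first few pins; writing them to a file would give a requirements list.
print("\n".join(requirements[:5]))
```

The usual workflow is to save this list with `pip freeze > requirements.txt` and to recreate the environment elsewhere with `pip install -r requirements.txt`.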
Troubleshooting Tips
•Common Issues:
• R or Python not recognized in PATH.
• Library/package installation errors.
•Solutions:
• Check installation paths.
• Reinstall problematic libraries.
• Refer to official documentation or community forums.
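
The first common issue, a program not being recognized in PATH, can be diagnosed with a short script. A minimal sketch using `shutil.which` from the standard library; the helper name `check_on_path` and the list of command names are assumptions and may differ on your system.

```python
import shutil


def check_on_path(commands):
    """Map each command name to its resolved location, or None if not found."""
    return {name: shutil.which(name) for name in commands}


# Command names are typical defaults, not guaranteed for every installation.
results = check_on_path(["python", "python3", "pip", "R", "Rscript"])
for name, location in results.items():
    status = location if location else "NOT FOUND on PATH"
    print(f"{name:8} {status}")
```

A command reported as not found is either not installed or installed in a directory that still needs to be added to PATH.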

Resources
•R:
• CRAN Documentation: [Link]
• RStudio Support: [Link]
•Python:
• Official Python Docs: [Link]/doc/
• PyPI Packages: [Link]
•General:
• Stack Overflow: [Link]

1.13. Hands-on Exercise
•Task: Perform basic data analysis using R and Python.
•R Exercise:
•Load a dataset using [Link]().
•Perform summary statistics with summary().
•Create a basic plot with plot().
•Python Exercise:
•Load a dataset using pandas.
•Perform summary statistics with [Link]().
•Create a basic plot with Matplotlib.
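
One possible shape for the Python exercise is sketched below. It assumes pandas and Matplotlib are installed, uses a small made-up dataset held in a string instead of a real CSV file, and takes `describe()` as the summary-statistics call; the column names are hypothetical.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data standing in for a CSV file on disk.
CSV_TEXT = """student,hours_studied,exam_score
A,2,52
B,4,61
C,6,70
D,8,83
E,10,94
"""

# 1. Load the dataset (pd.read_csv accepts a file path or a file-like object).
df = pd.read_csv(io.StringIO(CSV_TEXT))

# 2. Summary statistics for the numeric columns.
summary = df.describe()
print(summary)

# 3. A basic scatter plot, written to an in-memory PNG.
fig, ax = plt.subplots()
ax.scatter(df["hours_studied"], df["exam_score"])
ax.set_xlabel("Hours studied")
ax.set_ylabel("Exam score")
ax.set_title("Exam score vs. hours studied")
png_buffer = io.BytesIO()
fig.savefig(png_buffer, format="png")
plt.close(fig)
```

With a real dataset, replace `io.StringIO(CSV_TEXT)` with the path to the CSV file and pass a filename to `fig.savefig` to keep the plot.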

1.14. Summary

•R and Python are powerful tools for data science.
•Each has unique strengths and applications.
•Learning both can maximize your data analysis capabilities.
•Practice with real-world datasets to deepen your understanding.
