FDC104: Programming for Data Analysis and Scientific Computing

Lecture 1: Course Introduction & Environment Setup

© 2021 Datapot. All rights reserved.


Lecture overview

Topics
• What is data science?
• Course objectives and overview
• Example of a data science project

Activities
• Set up the Python programming environment
• Write and run a first Python program

Why?
Jobs?


Why?

• Why do I love data science?


• Why are you here?

Course Introduction

Section 1: What is data science?



History

• Thousands of years ago, science was purely empirical: people counted stars.

[Figures: Khufu, Khafre, Menkaure; Constellation Orion; Pyramid (Antarctica)]

History (cont)

• People also counted crops, and used the data they gathered to build machines that describe the phenomena.

Examples: Stonehenge, Antikythera mechanism
History (cont)

• A few hundred years ago: theoretical approaches, which try to derive equations that describe general phenomena.

History (cont)

• About a hundred years ago: computational approaches

History (cont)

• And then … data science: “Data Science is a multidisciplinary field that uses
scientific methods, processes, algorithms and systems to extract knowledge
and insights from structured and unstructured data.” – Wikipedia.

• Inter-disciplinary
• Data and task focused
• Adaptable to changes in the environment and needs

The Potential of Data Science
• Business Analytics: getting insights from business data
• Disease Diagnosis: detecting malaria from blood smears
• Drug Discovery: quickly discovering new drugs
• Agriculture: precision agriculture


The Data Science Process

Ask an interesting question
• What is the scientific goal?
• What would you do if you had all of the data?
• What do you want to predict or estimate?

Get the Data
• How were the data sampled?
• Which data are relevant?
• Are there privacy issues?

Explore the Data
• Plot the data.
• Are there anomalies or egregious issues?
• Are there patterns?

Model the Data
• Build a model.
• Fit the model.
• Validate the model.

Communicate/Visualize the Results
• What did we learn?
• Do the results make sense?
• Can we effectively tell a story?
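The five steps of the data science process can be sketched end to end in a few lines of Python. This is only an illustration, not a project from the course: the data are synthetic (a made-up relation y = 2x + 1 plus noise), the function names are hypothetical, and the model is a plain least-squares line built with the standard library so that nothing needs to be installed.

```python
import random
import statistics


def get_data(n=200, seed=0):
    """Get the data: synthetic (x, y) pairs from a made-up noisy line."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]
    return xs, ys


def explore(xs, ys):
    """Explore the data: a few summary numbers (a plot would go here)."""
    return {
        "n": len(xs),
        "x_min": min(xs),
        "x_max": max(xs),
        "y_mean": statistics.fmean(ys),
    }


def fit_line(xs, ys):
    """Model the data: least-squares fit of y = slope * x + intercept."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def validate(xs, ys, slope, intercept):
    """Validate the model: mean absolute error on data not used for fitting."""
    errors = [abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)]
    return statistics.fmean(errors)


def run_pipeline():
    """Question: how does y change with x? Run every step in order."""
    xs, ys = get_data()
    summary = explore(xs, ys)
    split = int(0.75 * len(xs))  # first 75% for fitting, the rest for validation
    slope, intercept = fit_line(xs[:split], ys[:split])
    mae = validate(xs[split:], ys[split:], slope, intercept)
    return summary, slope, intercept, mae


if __name__ == "__main__":
    summary, slope, intercept, mae = run_pipeline()
    # Communicate the results: report what was learned in plain words.
    print(f"Explored {summary['n']} points, x in "
          f"[{summary['x_min']:.2f}, {summary['x_max']:.2f}]")
    print(f"Fitted line: y = {slope:.2f} * x + {intercept:.2f}")
    print(f"Mean absolute error on held-out data: {mae:.2f}")
```

In a real project each function would be replaced by something heavier (loading files, plotting, a library model), but the order of the steps stays the same.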
Lecture 1: Course Introduction & Environment Setup

Section 2: Course objectives and overview



Course objectives

This course is designed to help you start your data science career
journey. After completing this course, you should be able to:
• Define basic concepts in programming for data analytics and scientific computing.
• Use the Python programming language to import & read different types of data
• Use the Python programming language and libraries (Pandas) to clean data
• Use the Python programming language to analyze & visualize data.
• Use libraries (scikit-learn) to build and evaluate basic machine learning models that perform inference on data

Course outline

• Lecture 1: Course overview & environment setup
• Lecture 2 + 3: Python Basics
• Lecture 4 + 5: Importing & loading data with Python
• Lecture 6 + 7: Data manipulation with Pandas library
• Lecture 8: Data Visualization
• Lecture 9 + 10: Model Development (Linear and Logistic from scratch)
• Lecture 11: AI Application 1
• Lecture 12: AI Application 2 and Course wrap-up

Course prerequisites

• General Required Knowledge
  • There are no prerequisites for this course
• Preferred Knowledge
  • Basic knowledge of algebra and calculus
  • Basic knowledge of statistics & probability
• Other requirements
  • You must have a computer to do the coding work

Course Introduction

Section 3: Course policies and projects



Graded Components

Attendance: 10%
- You will get the full 10% if you attend all classes.
- You will not be graded for the course if you miss 20% of the classes (3 classes).

Exercises & Homework: 40%
- You will need to finish all exercises & homework for each module.
- All homework assignments are weighted equally.
- Due at the beginning of the next lecture.

Course Project: 50%
- You will work on course projects in groups (3-4 students).
- You can choose one of the predefined projects or propose your own project.
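As a rough illustration of how the three weights combine, the sketch below computes a final score. It rests on assumptions the grading policy does not state: every component is scored on a 0-100 scale, and attendance is credited proportionally. The function name `final_grade` is hypothetical.

```python
def final_grade(attendance, homework_scores, project):
    """Combine the graded components into a final score out of 100.

    Assumptions (not part of the stated policy): all inputs are on a
    0-100 scale and attendance is credited proportionally.
    """
    if not homework_scores:
        raise ValueError("at least one homework score is required")
    # All homework assignments are weighted equally, so take a plain average.
    homework = sum(homework_scores) / len(homework_scores)
    # Weights from the course policy: 10% / 40% / 50%.
    return 0.10 * attendance + 0.40 * homework + 0.50 * project


if __name__ == "__main__":
    # Full attendance, two homework scores, one project score.
    print(round(final_grade(100, [80, 90], 70), 1))
```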

Course Projects

There will be a group project (3-4 students)

• You will be provided with several pre-defined projects that you could use for the course project.
• In some cases, you can use your own (public) data set and your own project definition (to be approved by the instructors)
• Project topics will be announced in the next lecture
• At the end of the course, you will need to give a presentation and submit a report in the form of a Jupyter Notebook.

Tools for the course

JupyterHub
• Exercises
• Homework
• Grades

Canvas
• Lectures & materials
• Discussion
• Assignments
• Grades
Activity - Set up the Python Programming Environment

Why Python
• One of the most popular programming languages
• Used in many fields, especially in data and AI

How do we program with Python
• Anaconda: the world's most popular data science platform
• Jupyter Notebooks: a powerful tool for interactively developing & presenting data science projects.
• JupyterHub: a hub server for Jupyter Notebooks



Activity – Install Anaconda

Follow along documents
• Windows: https://docs.anaconda.com/anaconda/install/windows/
• macOS: https://docs.anaconda.com/anaconda/install/mac-os/
• Linux: https://docs.anaconda.com/anaconda/install/linux/
• Check your setup
• Create a conda environment
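One possible way to check the setup is to ask Python itself which interpreter is running and whether the libraries used later in the course can be imported. The sketch below is an assumption about what to check, not an official course script: the package list (`pandas`, `sklearn` for scikit-learn, `notebook` for Jupyter Notebook) and the function name `check_setup` are illustrative.

```python
import importlib.util
import sys


def check_setup(packages=("pandas", "sklearn", "notebook")):
    """Report the running Python and whether each package is importable."""
    report = {
        "python": sys.version.split()[0],  # version string of this interpreter
        "executable": sys.executable,      # path of the interpreter in use
    }
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_setup().items():
        print(f"{key}: {value}")
```

If the executable path does not point inside the Anaconda installation, the command line is probably using a different Python than the one Anaconda installed.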



Multiple conda environments
• Create a new environment:
conda create -n fdc104 python=3.8

• Use the new environment
Step 1: activate the environment created above
conda activate fdc104
Step 2: register the environment as a Jupyter kernel
conda install ipykernel
python -m ipykernel install --user --name fdc104

• Remove an environment
conda remove --name fdc104 --all



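Activity – Write & Run your first program

import random
import sys


def say_hello(user):
    # Greetings in different languages
    prefix_dict = {
        1: "Hello ",
        2: "Xin Chao ",
        3: "ni hao ",
    }
    # Pick one greeting at random (randint includes both endpoints)
    key = random.randint(1, 3)
    prefix = prefix_dict[key]
    print(prefix + user)


if __name__ == "__main__":
    # The user name is the first command-line argument;
    # fall back to a default so the script also runs without one
    user = sys.argv[1] if len(sys.argv) > 1 else "world"
    say_hello(user)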



Activity – Run & Use Jupyter Notebooks
• Jupyter Notebook is already included in Anaconda
• Run Jupyter Notebook: open a command line and run "jupyter notebook"
• Jupyter Notebook will open in a browser



Activity – Using JupyterHub
• We prepared a Jupyter Notebook server for you to use for exercises and assignments
• You do not need to install anything, just use the notebook.
• Access:
• First login:
  • Username: Your email
  • Input your password (this will be your account's password)



Activity – What does a real data science project look like?
• Example of a real data science project



Course Introduction

Wrap-up



Summary

In summary, in this lecture, you learned how to:
• Recognize the purpose of this programming course
• Recognize the course structure
• Recognize the process of data science projects
• Configure a Python programming environment and run a Python program

Thank you
