Lecture-4 5

Uploaded by

mmiti2330122
Copyright
© © All Rights Reserved
Lecture 5: Data Acquisition and Sources
Md Ashiqur Rahman, Lecturer

Department of Computer Science and Engineering


United International University
“It is a capital mistake to theorize before one has data”
- Sherlock Holmes
Data Acquisition

Definition
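Data acquisition (also called data collection) is the process of gathering
data. It is distinct from data mining, which looks for patterns in data that
has already been collected. Ideally, we have a question in mind before we
collect the data.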

❑ Questions that we consider when acquiring data:


○ What type of data is needed to achieve the goal?
○ How much data is needed?
○ Where and how can this data be found?
○ What legal and privacy concerns should be considered?
Type of Data

❑ Structured Data

❑ Semi-Structured Data

❑ Unstructured Data

Now let’s identify the data type of each of the following illustrations.
Type of Data(cont.)

Properties         | Structured                           | Semi-Structured                           | Unstructured
Flexibility        | Schema dependent, tabular            | Not fully tabular, but marked up          | No specific rules
Technology         | Relational database                  | XML, RDF (Resource Description Framework) | Character or binary data
Version Management | Versioning over tuples, rows, tables | Versioning over tuples or graphs          | Versioned as a whole
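
To make the three categories in the table concrete, the sketch below holds the same kind of information in each form. The student record is invented for illustration, and CSV and JSON stand in here for "tabular" and "marked-up" data; the slides themselves name relational databases and XML/RDF.

```python
import csv
import io
import json

# Hypothetical record, invented for illustration only.
structured = "name,age\nRahim,21\n"                         # tabular: fixed columns
semi_structured = '{"name": "Rahim", "age": 21, "tags": ["cse"]}'  # marked up, flexible fields
unstructured = "Rahim is a 21-year-old CSE student who enjoys football."  # free text

# Structured data maps directly onto rows and named columns.
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured data carries its own markup, so fields can vary per record.
record = json.loads(semi_structured)

# Unstructured data has no schema; here we can only treat it as raw characters.
word_count = len(unstructured.split())

print(rows[0]["name"], record["tags"], word_count)
```

Note that the CSV reader returns every value as text, while the JSON parser keeps the number as a number; the free text needs further processing before any field can be extracted at all.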
Data Collection
“systematic process of gathering, measuring, and analyzing information
from various sources to get a complete and accurate picture of an area of
interest”

Popular Data Collection Techniques:

❑ Survey

❑ Web Scraping

❑ APIs

❑ Databases
Survey

Or join the survey at: https://www.menti.com/al3uiyk4u12b


Survey(Cont.)

❑ What type of data can we collect from surveys?

Mostly structured data.

Strengths                                       | Weaknesses
Easy to develop and administer. Cost-effective. | Respondents may not feel encouraged to provide accurate, honest answers.
Can be developed in less time.                  | Data errors due to question non-responses may exist.
A broad range of data can be collected.         | Respondents may not be fully aware of their reasons for any given answer because of boredom.
Web Scraping

“the process of extracting data from websites and converting it into
a structured format like CSV, JSON or XML”
Web Scraping(cont.)

Python Notebook for Web Scraping Example


Web Scraping(cont.)
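
The notebook itself is not reproduced in this text, so the following is only a minimal stand-in for the idea in the definition: pull data out of HTML and convert it to CSV. It uses Python's standard library only, and the HTML snippet, airline names, and ratings are invented. A real scraper would first download the page and should respect the site's terms of use.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this HTML first.
SAMPLE_HTML = """
<table>
  <tr><th>Airline</th><th>Rating</th></tr>
  <tr><td>Alpha Air</td><td>4.5</td></tr>
  <tr><td>Beta Wings</td><td>3.8</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows
        self._row = None       # row currently being built
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def html_table_to_csv(html):
    """Extract table cells from HTML and return them as CSV text."""
    parser = TableParser()
    parser.feed(html)
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


csv_text = html_table_to_csv(SAMPLE_HTML)
print(csv_text)
```

In practice, third-party libraries are often used for the download and parsing steps; the standard-library parser is chosen here only so the sketch runs without extra installs.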
API

Application Programming
Interface
API(cont.)- Case Study

Passenger Feedback Analysis at Shahjalal Airport
Shahjalal Airport, keen on enhancing its services
and facilities, recognized the need to
understand passenger opinions
comprehensively. In the age of digital feedback and
online reviews, the airport found itself at a
disadvantage, lacking a systematic approach to
gather and analyze passenger feedback.
API- Case Study(cont.)

• The airport tapped into popular platforms like TripAdvisor and
Google Maps, utilizing their APIs to gather reviews.

• By making GET requests to these APIs, the airport could efficiently
collect large volumes of data, including ratings, comments, and
specific feedback on different services.

• This data was methodically stored in a database for further
analysis, enabling the airport to track trends and common themes in
passenger feedback.
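
The workflow in the case study (GET request, then storage, then analysis) can be sketched roughly as follows. Everything specific here is an assumption: the endpoint URL, the `place_id` and `page` parameters, and the JSON layout are hypothetical and do not describe the real TripAdvisor or Google Maps APIs, which have their own URLs, keys, and terms. A fake opener replaces the network call so the sketch runs offline.

```python
import io
import json
import sqlite3
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint, used only to illustrate the shape of a GET request.
BASE_URL = "https://api.example.com/v1/reviews"


def build_url(place_id, page=1):
    """Build the GET request URL with query parameters (names are assumed)."""
    return f"{BASE_URL}?{urlencode({'place_id': place_id, 'page': page})}"


def fetch_reviews(url, opener=urlopen):
    """Send a GET request and decode the JSON body (assumed layout)."""
    with opener(url) as response:
        return json.loads(response.read().decode("utf-8"))["reviews"]


def store_reviews(conn, reviews):
    """Store ratings and comments in a database table for later analysis."""
    conn.execute("CREATE TABLE IF NOT EXISTS reviews (rating REAL, comment TEXT)")
    conn.executemany(
        "INSERT INTO reviews VALUES (?, ?)",
        [(r["rating"], r["comment"]) for r in reviews],
    )
    conn.commit()


def fake_opener(url):
    """Stand-in for the network: returns an invented JSON response."""
    payload = {
        "reviews": [
            {"rating": 4, "comment": "Fast check-in"},
            {"rating": 2, "comment": "Long queue at immigration"},
        ]
    }
    return io.BytesIO(json.dumps(payload).encode("utf-8"))


conn = sqlite3.connect(":memory:")
reviews = fetch_reviews(build_url("shahjalal-airport"), opener=fake_opener)
store_reviews(conn, reviews)
average = conn.execute("SELECT AVG(rating) FROM reviews").fetchone()[0]
print(average)
```

Passing the opener as a parameter keeps the request logic testable; dropping the `opener=fake_opener` argument would make `fetch_reviews` issue a real HTTP GET through `urlopen`.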
Databases
Databases(cont.)

Relational database systems, such as SQL Server, Oracle, MySQL, and
IBM DB2, store data in an organized manner. Data from databases and
data warehouses can be used as a source for analysis. Data from a
retail transaction system, for example, can be used to analyze sales
in different regions, while data from a customer relationship
management system can be used to forecast sales. Publicly and
privately available datasets from outside the organization can also
be used.
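
As a small illustration of the retail example, the sketch below totals sales by region with a SQL query. It assumes an in-memory SQLite database in place of the larger systems named above, and the table name, columns, and figures are invented.

```python
import sqlite3

# In-memory SQLite database standing in for a retail transaction system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (region TEXT, amount REAL)")

# Hypothetical transactions, invented for illustration.
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [
        ("Dhaka", 1200.0),
        ("Chattogram", 800.0),
        ("Dhaka", 300.0),
        ("Sylhet", 450.0),
    ],
)

# Aggregate sales per region, the kind of query an analyst might run.
sales_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM transactions "
    "GROUP BY region ORDER BY region"
).fetchall()

print(sales_by_region)
```

The same `GROUP BY` query would look much the same against a production relational database; only the connection step would differ.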
The End
