Lecture 5: Data Acquisition and Sources
Md Ashiqur Rahman, Lecturer
Department of Computer Science and Engineering
United International University
“It is a capital mistake to theorize before one has data”
- Sherlock Holmes
Data Acquisition
Definition
Data acquisition (also called data collection) is the process of gathering
data. Ideally, we have a question in mind before we collect the data.
❑ Questions that we consider when acquiring data:
○ What type of data is needed to achieve the goal?
○ How much data is needed?
○ Where and how can this data be found?
○ What legal and privacy concerns should be considered?
Type of Data
❑ Structured Data
❑ Semi-Structured Data
❑ Unstructured Data
Now let’s identify the data type of each of the following illustrations.
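Type of Data (cont.)
| Properties         | Structured                            | Semi-Structured                            | Unstructured                                   |
|--------------------|---------------------------------------|--------------------------------------------|------------------------------------------------|
| Flexibility        | Schema dependent, tabular             | Not fully tabular, but marked up           | There are no specific rules for its flexibility |
| Technology         | Relational database                   | XML, RDF (Resource Description Framework)  | Character or binary data                       |
| Version Management | Versioning over tuples, rows, tables  | Versioning over tuples or graph            | Versioned as a whole                           |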
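To make the three categories concrete, here is a minimal Python sketch that handles one small example of each. The sample values (names, ratings, the review sentence) are invented for illustration and are not taken from the lecture.

```python
import csv
import io
import json

# Structured: a fixed schema, so every row has the same columns.
csv_text = "id,name,rating\n1,Alice,5\n2,Bob,3\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: marked up and self-describing, but fields may vary per record.
json_text = '[{"id": 1, "name": "Alice", "tags": ["vip"]}, {"id": 2, "name": "Bob"}]'
records = json.loads(json_text)

# Unstructured: free text with no schema; any structure has to be inferred.
review = "The lounge was clean, but the queue at security took forever."
word_count = len(review.split())

print(rows[0]["name"])         # column access works for every row
print(records[1].get("tags"))  # this record has no "tags" field
print(word_count)
```

Note how the structured rows can be read by column name without checks, while the semi-structured records need a guard (`.get`) because a field may be missing.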
Data Collection
“systematic process of gathering, measuring, and analyzing information
from various sources to get a complete and accurate picture of an area of
interest”
Popular Data Collection Techniques:
❑ Survey
❑ Web Scraping
❑ APIs
❑ Databases
Survey
Join the survey at: https://www.menti.com/al3uiyk4u12b
Survey (cont.)
❑ What type of data can we collect from surveys?
Mostly structured data.
| Strengths                                      | Weaknesses                                                                                    |
|------------------------------------------------|-----------------------------------------------------------------------------------------------|
| Easy to develop and administer. Cost-effective | Respondents may not feel encouraged to provide accurate, honest answers                       |
| Can be developed in less time                  | Data errors due to question non-responses may exist                                           |
| A broad range of data can be collected         | Respondents may not be fully aware of their reasons for any given answer because of boredom   |
Web Scraping
“the process of extracting data
from websites and converting it into
a structured format like CSV, JSON
or XML”
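As a rough illustration of this definition, the sketch below turns an HTML table into CSV using only the Python standard library. The HTML snippet, the `TableParser` class, and the `html_table_to_csv` helper are all hypothetical; a real scraper would first download the page and would usually need to handle messier markup.

```python
import csv
import io
from html.parser import HTMLParser

# Invented sample page content; a real scraper would download this HTML first.
SAMPLE_HTML = """
<table>
  <tr><th>Title</th><th>Price</th></tr>
  <tr><td>Book A</td><td>12.50</td></tr>
  <tr><td>Book B</td><td>8.99</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collects the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._in_cell:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Extract table rows from HTML and return them as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


csv_text = html_table_to_csv(SAMPLE_HTML)
print(csv_text)
```

The same extracted rows could just as easily be written out as JSON or XML; CSV is used here only because the source data is already tabular.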
Web Scraping (cont.)
Python Notebook for Web Scraping Example
API
Application Programming Interface
API (cont.) - Case Study
Passenger Feedback Analysis at Shahjalal Airport
Shahjalal Airport, keen on enhancing its services
and facilities, recognized the need to
understand passenger opinions
comprehensively. In the age of digital feedback and
online reviews, the airport found itself at a
disadvantage, lacking a systematic approach to
gather and analyze passenger feedback.
API - Case Study (cont.)
• The airport tapped into popular platforms like
TripAdvisor and Google Maps, utilizing their
APIs to gather reviews.
• By making GET requests to these APIs, the
airport could efficiently collect large volumes of
data, including ratings, comments, and specific
feedback on different services.
• This data was methodically stored in a database
for further analysis, enabling the airport to track
trends and common themes in passenger
feedback.
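The workflow in these bullets (GET request, collect reviews, store in a database) might be sketched as follows. Everything specific here is an assumption for illustration: the endpoint `https://api.example.com/v1/reviews`, the `place_id` and `page` parameters, and the shape of the JSON response are hypothetical and do not describe the actual TripAdvisor or Google Maps APIs. To keep the sketch runnable offline, the HTTP call itself is replaced by a sample response string.

```python
import json
import sqlite3
from urllib.parse import urlencode

BASE_URL = "https://api.example.com/v1/reviews"  # hypothetical endpoint


def build_request_url(place_id, page=1):
    """Build the URL for a GET request; the parameter names are assumptions."""
    return BASE_URL + "?" + urlencode({"place_id": place_id, "page": page})


# Stand-in for the JSON body a GET request might return (invented sample).
SAMPLE_RESPONSE = (
    '{"reviews": ['
    '{"rating": 4, "comment": "Fast check-in"}, '
    '{"rating": 2, "comment": "Long baggage wait"}]}'
)


def store_reviews(conn, response_text):
    """Parse a JSON response and insert each review into the database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reviews (rating INTEGER, comment TEXT)"
    )
    reviews = json.loads(response_text)["reviews"]
    conn.executemany(
        "INSERT INTO reviews (rating, comment) VALUES (?, ?)",
        [(r["rating"], r["comment"]) for r in reviews],
    )
    conn.commit()
    return len(reviews)


url = build_request_url("airport-123")
conn = sqlite3.connect(":memory:")  # in-memory database for the sketch
stored = store_reviews(conn, SAMPLE_RESPONSE)
avg_rating = conn.execute("SELECT AVG(rating) FROM reviews").fetchone()[0]
print(url, stored, avg_rating)
```

In a real pipeline, the sample string would be replaced by the body of an actual HTTP response (for example via `urllib.request.urlopen(url)`), typically with an API key and pagination handled according to the provider's documentation.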
Databases
Databases (cont.)
Relational database systems, such as SQL Server, Oracle,
MySQL, and IBM DB2, store data in an organized, tabular
form. Data from databases and data warehouses can be used
as a source for analysis. For example, data from a retail
transaction system can be used to analyze sales in different
regions, while data from a customer relationship management
system can be used to forecast sales. Beyond the organization's
own systems, publicly and privately available datasets can
also be used.
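As a small illustration of using a database as an analysis source, the sketch below totals sales per region with a SQL query. It uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for a production system; the `transactions` table, its columns, and the sample rows are invented for this example.

```python
import sqlite3

# In-memory SQLite database standing in for a retail transaction system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions (region, amount) VALUES (?, ?)",
    [
        ("North", 120.0),
        ("South", 80.0),
        ("North", 30.0),
        ("South", 20.0),
        ("East", 50.0),
    ],
)

# Aggregate sales per region, the kind of query an analyst might run.
sales_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM transactions "
    "GROUP BY region ORDER BY region"
).fetchall()

for region, total in sales_by_region:
    print(region, total)
```

The same `GROUP BY` query would work largely unchanged on the relational systems named above, although connection setup differs for each.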
The End