Data acquisition
The next stage of the AI Project Cycle, after Problem Scoping, is Data
Acquisition, where the data required for the project is acquired in specific
forms and formats. To work and produce outcomes, an AI project must be fed
correct data in the right form.
SIGNIFICANCE OF DATA
An AI project is an (artificially) intelligent project that is capable of
making decisions or performing some intelligent tasks.
Data plays a crucial role in making an AI project behave intelligently,
because the AI project is trained using data to behave in a specific way.
To build an AI system, you need to source large amounts of data and create
data sets for training, testing and evaluation before the AI project is
deployed. This process is repeated through several rounds of training,
testing and evaluation.
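The creation of training, testing and evaluation data sets can be sketched in Python as below. This is only a minimal illustration: the function name split_dataset, the 70/15/15 proportions and the fixed seed are assumptions made for the example, not values prescribed by the AI Project Cycle.

```python
import random

def split_dataset(records, train_frac=0.7, test_frac=0.15, seed=42):
    """Shuffle the records and split them into training, testing and evaluation sets.

    The fractions and the seed are illustrative assumptions.
    """
    shuffled = list(records)               # copy, so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_train = round(len(shuffled) * train_frac)
    n_test = round(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    evaluation = shuffled[n_train + n_test:]  # whatever remains
    return train, test, evaluation

records = list(range(100))  # stand-in for 100 collected data items
train, test, evaluation = split_dataset(records)
print(len(train), len(test), len(evaluation))
```

Every record ends up in exactly one of the three sets, so data used for training is never reused for testing or evaluation.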
Quality Data Characteristics
As data is crucial for the success of any AI project, it is important to
ensure that it is quality data. Quality data has these characteristics:
1) Accuracy (i.e., Is the data correct when compared with current, real
data?)
2) Relevance (i.e., Do you really need this information?)
3) Completeness (i.e., How full (comprehensive) is the information?)
4) Timeliness (i.e., How up-to-date is the information?)
5) Reliability (i.e., Does the information contradict other trusted
resources?)
6) Validity (i.e., Is the information compliant with requirements?)
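Some of these characteristics can be checked with simple code. The sketch below tests completeness, timeliness and validity on a tiny data set. The sample records, the one-year age limit and the temperature range of -60 to 60 degrees Celsius are all assumptions made for illustration; a real project would take its rules from its own requirements.

```python
from datetime import date

# Hypothetical sample records; None marks a missing value.
records = [
    {"city": "Pune", "temp_c": 31.5, "recorded_on": date(2024, 5, 1)},
    {"city": "Delhi", "temp_c": None, "recorded_on": date(2024, 5, 2)},
    {"city": "Leh", "temp_c": 250.0, "recorded_on": date(2019, 1, 10)},
]

def completeness(rows):
    """Completeness: fraction of rows in which no field is missing."""
    full = [r for r in rows if all(v is not None for v in r.values())]
    return len(full) / len(rows)

def is_timely(row, today, max_age_days=365):
    """Timeliness: the record is at most max_age_days old (assumed limit)."""
    return (today - row["recorded_on"]).days <= max_age_days

def is_valid(row, low=-60.0, high=60.0):
    """Validity: the temperature is present and within an assumed plausible range."""
    t = row["temp_c"]
    return t is not None and low <= t <= high

today = date(2024, 6, 1)  # fixed date so the example is repeatable
timely = [is_timely(r, today) for r in records]
valid = [is_valid(r) for r in records]
print(completeness(records), timely, valid)
```

Accuracy, relevance and reliability usually need human judgement or a second trusted source, so they are not covered by this sketch.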
Data is broadly of two types:
Structured Data
Structured data is data that has a purposely designed, pre-defined
structure as per some existing data model, such as simple 2D spreadsheet
arrays, complex relational databases or knowledge graphs. Structured data
has well-defined relationships among its elements.
Unstructured Data
Unstructured data is data that is not organised according to any
pre-existing data model. Unstructured data is unprocessed and is often
generated by machine-led systems, for example, social media posts,
surveillance camera footage or satellite imagery. Unstructured data can
have its own internal structure, which may not fit into any well-defined
format. For example, in an AI system for analysing the most popular social
media posts, the data (a social media post) does not have a predefined
structure; it can be text, a video, a link, an image or even some other
undefined structure.
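The difference between the two types can be seen in a short Python sketch. The student table, the sample posts and the helper kind_of are hypothetical and exist only for this illustration.

```python
# Structured data: every row follows the same pre-defined columns.
students = [
    {"roll_no": 1, "name": "Asha", "marks": 88},
    {"roll_no": 2, "name": "Ravi", "marks": 75},
]
# Because the structure is known in advance, a field can be used directly.
average_marks = sum(s["marks"] for s in students) / len(students)

# Unstructured data: each post may take a different form.
posts = [
    "Great match today!",                                # plain text
    {"video": "goal.mp4"},                               # a video
    {"link": "https://example.com/news"},                # a link
    {"image": "sunset.png", "caption": "Evening walk"},  # an image
    b"\x00\x01",                                         # some other undefined form
]

def kind_of(post):
    """Inspect one post to work out what kind of content it holds."""
    if isinstance(post, str):
        return "text"
    if isinstance(post, dict):
        for key in ("video", "link", "image"):
            if key in post:
                return key
    return "unknown"

kinds = [kind_of(p) for p in posts]
print(average_marks, kinds)
```

The structured table can be processed in one line, while each unstructured post has to be examined before anything can be done with it.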
Finding reliable data sources
1. Interview
An interview is one of the most effective methods of data gathering. In
this method, an analyst talks to the users and clients who know about the
system, its functions and flaws.
An interview refers to a one-on-one conversation between an analyst and
the users and clients to find out about the system, its functions,
shortcomings and flaws.
2. Survey
In a survey, the goal of the survey is ascertained first and the
questionnaire is then formed accordingly.
A survey refers to a study of the opinions, responses, etc. of a group of
stakeholders.
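Once survey responses have been collected, they are usually counted before being used. The sketch below tallies the answers to one question; the sample answers and the helper tally are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical answers to a single survey question, typed in by respondents.
responses = ["Yes", "no", " yes ", "Maybe", "NO", "yes"]

def tally(answers):
    """Count the answers after removing extra spaces and differences in case."""
    cleaned = [a.strip().lower() for a in answers]
    return Counter(cleaned)

counts = tally(responses)
print(counts.most_common())
```

Cleaning the answers first ensures that "Yes", " yes " and "yes" are counted as the same response.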
3. Observation
Under the observation method, the responsible person observes the team
in a real working environment, gets ideas about the required data and
its form, and subsequently documents the observations.
The observation method refers to human or mechanical watching, noticing
or perceiving of what people actually do or what events take place in a
specific working environment.
4. Application Programming Interface (API)
An API is a specialized technique in which a specific type of data is
collected through the use of a programming interface. For example, using a
social media program's interface, data such as people's most preferred
game, most liked post or most used time may be gathered.
An API refers to an Application Programming Interface that works behind a
popular software program or game to collect a specific type of data
pertaining to the users' way of using that program.
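Collecting data through an API usually involves two steps: building a request and reading the structured reply. The sketch below shows both steps without contacting any server. The address api.example.com, the parameters sort and limit, and the shape of the sample reply are all hypothetical; a real service documents its own address, parameters and reply format.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.example.com/v1/posts"  # hypothetical address, not a real service

def build_request_url(base_url, **params):
    """Attach the query parameters to the API address."""
    return f"{base_url}?{urlencode(params)}"

def most_liked(response_text):
    """Parse a JSON reply and return the post with the most likes."""
    posts = json.loads(response_text)["posts"]
    return max(posts, key=lambda p: p["likes"])

# A made-up reply, standing in for what such an API might send back.
sample_response = '{"posts": [{"id": 1, "likes": 40}, {"id": 2, "likes": 95}]}'

url = build_request_url(BASE_URL, sort="likes", limit=2)
top_post = most_liked(sample_response)
print(url)
print(top_post)
```

In a real project, the reply would come from sending a request to the generated address instead of from a fixed string.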
5. Web Scraping
Web scraping, web harvesting or web data extraction is data scraping
used for extracting data from websites. A web scraper is a specialized tool
designed to carry out web scraping.
Web scraping refers to a data collection technique using a tool called a web
scraper that extracts data from websites.
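A very small web scraper can be sketched with Python's built-in html.parser module. The page text below is a made-up sample, and the class name HeadlineScraper is an assumption for this example. A real scraper would first download the page, and should respect the website's terms of use.

```python
from html.parser import HTMLParser

class HeadlineScraper(HTMLParser):
    """Collect the text found inside every <h2> tag of a page."""

    def __init__(self):
        super().__init__()
        self.in_headline = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_headline = True
            self.headlines.append("")  # start a new, empty headline

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_headline = False

    def handle_data(self, data):
        if self.in_headline:
            self.headlines[-1] += data  # add text to the current headline

# Made-up page content, standing in for a downloaded web page.
page = (
    "<html><body>"
    "<h2>Rain expected</h2><p>Details...</p>"
    "<h2>Markets rise</h2>"
    "</body></html>"
)

scraper = HeadlineScraper()
scraper.feed(page)
headlines = [h.strip() for h in scraper.headlines]
print(headlines)
```

The scraper ignores the paragraph text and keeps only the headings, which shows how scraping extracts specific data from a page.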
6. Sensors
Sensors or electronic sensors can measure various parameters such as
weather, humidity, body temperature, blood pressure, heart beat, weight
and many more. For instance, modern medical diagnosis and wearables like
Fitbit and Apple Watch make good use of sensors. The Internet of Things
(IoT) cannot function without sensors.
Sensors are small devices that can collect data about an environment, a
body or a specific task.
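Data from a sensor typically arrives as a series of readings that are collected and then summarised. The sketch below uses a simulated heart-rate sensor, because real sensors need their own hardware and driver software. The function read_heart_rate and the range of 60 to 100 beats per minute are assumptions made for illustration.

```python
import random
import statistics

def read_heart_rate(rng):
    """Stand-in for a real sensor driver: returns a simulated reading in beats per minute."""
    return rng.randint(60, 100)

def collect(n, seed=0):
    """Take n readings from the simulated sensor."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [read_heart_rate(rng) for _ in range(n)]

readings = collect(10)
summary = {
    "count": len(readings),
    "min": min(readings),
    "max": max(readings),
    "mean": statistics.mean(readings),
}
print(readings)
print(summary)
```

With a real device, only read_heart_rate would change; the collection and summary steps would stay the same.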
7. Cameras
Cameras, because of their video recording and image capturing features,
have proven to be good data collection tools in various situations, such as
detecting traffic rule violations and automatically detecting flaws in the
design and appearance of products, places, buildings etc.