DELHI PUBLIC SCHOOL-BOPAL, AHMEDABAD
EXTRA NOTES ON PART-B UNIT-2 AI PROJECT CYCLE
CLASS: X SUBJECT: ARTIFICIAL INTELLIGENCE SESSION: 2024-2025
TOPIC: PROJECT CYCLE
BIG DATA & DATA ANALYTICS
Big Data refers to very large and complex collections of structured and unstructured data.
Traditional data management tools cannot handle data at this scale, which is why dedicated Big
Data tools were developed to manage it.
Data analytics is the process of extracting useful information from raw data to help businesses
make decisions. Big Data and Data Analytics are different concepts, and we will look at each of
them in detail.
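As a small illustration of data analytics, the Python sketch below turns a few raw sales records into one piece of useful information: which product earns the most. The records, the field names (`product`, `quantity`, `price`) and the helper `revenue_by_product` are hypothetical and made up only for this example. Real analytics would work on far larger data, but the idea of going from raw records to a decision-ready answer is the same.

```python
from collections import defaultdict

# Hypothetical raw sales records (made-up data for illustration only)
raw_sales = [
    {"product": "pen", "quantity": 10, "price": 5},
    {"product": "notebook", "quantity": 4, "price": 40},
    {"product": "pen", "quantity": 6, "price": 5},
    {"product": "eraser", "quantity": 12, "price": 3},
]

def revenue_by_product(records):
    """Add up revenue (quantity x price) for each product."""
    totals = defaultdict(int)
    for row in records:
        totals[row["product"]] += row["quantity"] * row["price"]
    return dict(totals)

revenue = revenue_by_product(raw_sales)
# The product with the largest total revenue
best_seller = max(revenue, key=revenue.get)

print(revenue)                          # {'pen': 80, 'notebook': 160, 'eraser': 36}
print("Highest revenue:", best_seller)  # Highest revenue: notebook
```

A shop owner could use this kind of summary to decide which item to keep more stock of.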
Big Data
Big data consists of a large volume of data that can be structured, unstructured, or
semi-structured. Many big data management tools are available to store and process this data.
Some of the characteristics of big data are volume, velocity, and variety. Typical sources of big
data include stock exchanges, jet engines, and social media.
Uses of Big Data
The uses of big data are as follows −
• Financial Services
• Communications
• Media and Entertainment
• Retail
• Banking and Securities
Features of Big Data
Big data is commonly described using five features, known as the 5 V’s:
• Volume − Big data involves very large volumes of data, which must be stored and then processed
using suitable methods. The amount of data is one of the main factors used to decide whether a
dataset counts as big data.
• Variety − Big datasets contain many different types of data, such as tabular data, images,
video, audio, and more.
• Velocity − Velocity refers to the speed at which data is generated. Data is generated
continuously and keeps being added to the datasets.
• Veracity − Veracity refers to the accuracy and trustworthiness of the data. The generated data
can be complex and may contain many inconsistencies, so its quality must be checked before the
data is processed and managed.
• Value – A successful big data analytics strategy must generate value. The insights derived
from the analysis should provide meaningful guidance for improving operations, enhancing
customer service, or creating other forms of value. An integral part of developing a big data
analytics strategy is distinguishing between data that can contribute value and data that
cannot.
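To make veracity concrete, here is a minimal Python sketch, assuming a hypothetical stream of temperature readings in which missing values appear as `None` and sensor glitches appear as impossible numbers. The plausible range (-50 to 60 °C) and the function name `clean_readings` are illustrative assumptions, not a standard method. Untrustworthy readings are removed before the average is computed, so the result is more reliable and therefore more valuable.

```python
# Hypothetical temperature readings (in °C) from a sensor.
# None marks a missing value; 999 is an impossible glitch value.
readings = [31.5, 32.0, None, 30.8, 999, 31.2]

def clean_readings(values, low=-50, high=60):
    """Keep only readings that are present and inside a plausible range.

    The range -50 to 60 is an assumed limit chosen for illustration only.
    """
    return [v for v in values if v is not None and low <= v <= high]

clean = clean_readings(readings)
average = sum(clean) / len(clean)

print(clean)                                 # [31.5, 32.0, 30.8, 31.2]
print(len(readings) - len(clean), "readings rejected")
print(f"Average of trusted readings: {average:.2f}")
```

Without the cleaning step, the glitch value 999 would push the average far above any real temperature.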
Types of Big Data
Big data can be classified into three types, discussed below.
• Structured Data − Structured data follows a fixed format, so it can be easily processed, and
users can go through it and understand it easily.
Structured data is data whose elements are addressable, which makes analysis effective. It is
organized into a formatted repository, typically a database, and can be stored in an SQL database
as a table with rows and columns. Structured data has relational keys and can easily be mapped
into pre-designed fields. Today, it is the most commonly processed type of data and the simplest
to manage. Example: Relational data.
• Semi-Structured Data − This kind of data does not follow a fixed structure but still has some
organization, such as hierarchies or groupings.
Semi-structured data is information that does not reside in a relational database but has some
organizational properties that make it easier to analyze. With some processing, it can be stored
in a relational database (although this can be very hard for some kinds of semi-structured data),
but semi-structured formats exist to make such data easier to store. Example: XML data.
• Unstructured Data − This kind of data does not follow any structure. It includes pictures,
text, video, audio, and more.
Unstructured data is not organized in a predefined manner and has no predefined data model, so it
is not a good fit for a mainstream relational database. Alternative platforms are therefore used
to store and manage it. Unstructured data is increasingly common in IT systems and is used by
organizations in a variety of business intelligence and analytics applications. Example: Word
documents, PDFs, text files, media logs.
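The three types of data can be compared side by side in a short Python sketch that uses only the standard library: `sqlite3` for a structured table, `xml.etree.ElementTree` for semi-structured XML, and plain string handling for unstructured text. The table name, columns, XML tags and sample sentence are hypothetical examples chosen only for this illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET
from collections import Counter

# 1. Structured data: rows and columns in a relational (SQL) table.
#    The table "students" and its columns are hypothetical examples.
conn = sqlite3.connect(":memory:")  # temporary in-memory database
conn.execute(
    "CREATE TABLE students (roll_no INTEGER PRIMARY KEY, name TEXT, marks INTEGER)"
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, "Asha", 88), (2, "Ravi", 74), (3, "Meera", 91)],
)
# Every row has the same columns, so a simple query can filter the data.
rows = conn.execute(
    "SELECT name FROM students WHERE marks > 80 ORDER BY roll_no"
).fetchall()
conn.close()
top_names = [row[0] for row in rows]
print(top_names)  # ['Asha', 'Meera']

# 2. Semi-structured data: XML has tags and a hierarchy but no fixed table.
#    The second student has an extra <club> element that the first lacks.
xml_text = """
<class>
  <student roll_no="1"><name>Asha</name></student>
  <student roll_no="2"><name>Ravi</name><club>Robotics</club></student>
</class>
"""
root = ET.fromstring(xml_text.strip())
students = root.findall("student")
xml_names = [s.find("name").text for s in students]
clubs = [s.find("club").text for s in students if s.find("club") is not None]
print(xml_names, clubs)  # ['Asha', 'Ravi'] ['Robotics']

# 3. Unstructured data: free text with no predefined data model.
#    There are no fields to query, so the raw words must be processed first.
review = "The school fest was fun. The robotics stall was the best part of the fest."
words = [w.strip(".,").lower() for w in review.split()]
word_counts = Counter(words)
print(word_counts.most_common(3))
```

The SQL query can filter rows directly because every row has the same columns, the XML can still be navigated by its tags even though the two records differ, and the free text has to be broken into words before any counting is possible.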