Data Analytics

This document provides an overview of the unit on data analysis from the MCA Semester - IV course on Data Analytics with R. It defines data analytics as the process of analyzing data to discover useful information and support decision-making. It describes the types of data like structured, semi-structured, and unstructured data. It also discusses the importance of big data analytics and characteristics of good data like accuracy, completeness, consistency, uniqueness, and timeliness that are important for efficient data analytics.

Uploaded by

pratyusha

MCA SEMESTER – IV

Subject Name: Data Analytics with R


Subject Code: 3640005

UNIT – I
Introduction to Data Analysis
Overview of Data Analytics (DA)
Analysis of data, also known as data analytics, is a process of inspecting,
cleansing, transforming, and modeling data with the goal of discovering useful
information, suggesting conclusions, and supporting decision-making.

Data analytics technologies and techniques are widely used in commercial
industries to enable organizations to make more-informed business decisions
and by scientists and researchers to verify or disprove scientific models,
theories and hypotheses.

Data analytics is the science of extracting patterns, trends, and actionable
information from large sets of data. As a term, data analytics predominantly
refers to an assortment of applications, from basic business intelligence (BI),
reporting and online analytical processing (OLAP) to various forms of advanced
analytics.

Business Intelligence (BI) is a broad category of computer software solutions
that enables a company or organization to gain insight into its critical
operations through reporting applications and analysis tools.

OLAP is an acronym for Online Analytical Processing. OLAP performs
multidimensional analysis of business data and provides the capability for
complex calculations, trend analysis, and sophisticated data modeling.
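
As a rough illustration of multidimensional analysis, the sketch below rolls up
a handful of made-up sales records along two dimensions (region and quarter)
using only the Python standard library. The records, the field names and the
`roll_up` helper are hypothetical; a real OLAP engine typically precomputes and
indexes such aggregates rather than scanning a list.

```python
from collections import defaultdict

# Hypothetical sales facts: two dimensions (region, quarter) and one measure.
SALES = [
    {"region": "North", "quarter": "Q1", "revenue": 120},
    {"region": "North", "quarter": "Q2", "revenue": 150},
    {"region": "South", "quarter": "Q1", "revenue": 90},
    {"region": "South", "quarter": "Q2", "revenue": 60},
    {"region": "North", "quarter": "Q1", "revenue": 30},
]


def roll_up(records, dimensions, measure="revenue"):
    """Sum `measure` for every combination of the given dimensions."""
    totals = defaultdict(int)
    for rec in records:
        key = tuple(rec[d] for d in dimensions)
        totals[key] += rec[measure]
    return dict(totals)


by_region_quarter = roll_up(SALES, ["region", "quarter"])  # finest level
by_region = roll_up(SALES, ["region"])                     # rolled up over quarter
grand_total = roll_up(SALES, [])                           # rolled up over everything
```

Dropping a dimension from the list corresponds to "rolling up"; adding one back
corresponds to "drilling down".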

Advanced Analytics is the autonomous or semi-autonomous examination of
data or content using sophisticated techniques and tools, typically beyond
those of traditional business intelligence (BI), to discover deeper insights, make
predictions, or generate recommendations.

Data analytics initiatives can help businesses increase revenues, improve
operational efficiency, optimize marketing campaigns and customer service
efforts, respond more quickly to emerging market trends and gain a
competitive edge over rivals, all with the ultimate goal of boosting business
performance. Depending on the particular application, the data that is analyzed
can consist of either historical records or new information that has been
processed for real-time analytics uses. In addition, it can come from a mix of
internal systems and external data sources.

Why is big data analytics important? (Need of Data Analytics)

There are four types of big data BI that really aid business:

1. Prescriptive – This type of analysis reveals what actions should be taken.
This is the most valuable kind of analysis and usually results in rules and
recommendations for next steps.
2. Predictive – An analysis of likely scenarios of what might happen. The
deliverables are usually a predictive forecast.
3. Diagnostic – A look at past performance to determine what happened
and why. The result of the analysis is often an analytic dashboard.
4. Descriptive – What is happening now based on incoming data. To mine
the analytics, you typically use a real-time dashboard and/or email
reports.

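
To make the four types concrete, here is a minimal sketch that applies each one
to the same tiny series of monthly sales. All numbers, the `STOCK_ON_HAND`
threshold and the reorder rule are invented for illustration, and the
"diagnostic" step only computes month-over-month changes; a real diagnosis
would relate those changes to possible causes.

```python
from statistics import mean

# Hypothetical monthly unit sales (made-up numbers).
months = [1, 2, 3, 4, 5, 6]
sales = [100, 110, 120, 130, 140, 150]
STOCK_ON_HAND = 155  # assumed inventory level, for illustration only

# Descriptive: what is happening now?
latest_sales = sales[-1]
average_sales = mean(sales)

# Diagnostic: what happened? Here, just the month-over-month changes.
changes = [later - earlier for earlier, later in zip(sales, sales[1:])]


# Predictive: fit a least-squares line and extrapolate one month ahead.
def fit_line(xs, ys):
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    return slope, y_bar - slope * x_bar


slope, intercept = fit_line(months, sales)
forecast = slope * 7 + intercept  # expected sales in month 7

# Prescriptive: turn the forecast into a recommended action.
action = "reorder stock" if forecast > STOCK_ON_HAND else "hold"
```
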
Big data analytics helps organizations harness their data and use it to identify
new opportunities. That, in turn, leads to smarter business moves, more
efficient operations, higher profits and happier customers.

1. Cost reduction. Big data technologies such as Hadoop and cloud-based
analytics bring significant cost advantages when it comes to storing large
amounts of data – plus they can identify more efficient ways of doing
business.

2. Faster, better decision making. With the speed of Hadoop and
in-memory analytics, combined with the ability to analyze new sources of
data, businesses are able to analyze information immediately – and
make decisions based on what they’ve learned.

3. New products and services. With the ability to gauge customer needs
and satisfaction through analytics comes the power to give customers
what they want. Davenport points out that with big data analytics, more
companies are creating new products to meet customers’ needs.
Classification of Data
Structured Data

Structured data is any data that can be stored in an SQL database, in tables
with rows and columns. It has relational keys and can be easily mapped
into pre-designed fields. Today it is the most heavily processed kind of data
in software development and the simplest to manage.

However, structured data represents only 5 to 10% of all data.
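
As a small sketch of what "tables with rows and columns" and "relational keys"
look like in practice, the example below builds two related tables in an
in-memory SQLite database using Python's built-in `sqlite3` module. The table
names, columns and rows are hypothetical.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),  -- relational key
        amount      REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);
    """
)

# Because every row fits the predefined fields, a join plus an aggregate
# answers "how much has each customer spent?" in a single query.
totals = conn.execute(
    """
    SELECT c.name, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.name
    """
).fetchall()
conn.close()
```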

Semi-structured Data

Semi-structured data is information that does not reside in a relational
database but that does have some organizational properties that make it easier
to analyze. With some processing it can be stored in a relational database
(which can be very hard for some kinds of semi-structured data), but the
semi-structured form exists to save space, improve clarity or simplify
computation.

Examples of semi-structured data: CSV, XML and JSON documents are
semi-structured, and NoSQL databases are also considered semi-structured.

Like structured data, semi-structured data represents only a small share of
all data (5 to 10%).
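
The "some processing" needed to fit semi-structured data into a table can be
sketched as follows: nested JSON records whose fields differ from one record to
the next are flattened into rows with a common set of columns. The sample
document and the `flatten` helper are hypothetical, and the sketch leaves lists
(such as `tags`) as single cell values instead of normalizing them.

```python
import json

# Hypothetical JSON document: records share some keys but not all of them.
RAW = """
[
  {"id": 1, "name": "Asha", "contact": {"email": "asha@example.com"}},
  {"id": 2, "name": "Ravi", "contact": {"phone": "555-0100"}, "tags": ["vip"]}
]
"""


def flatten(obj, prefix=""):
    """Flatten nested dicts into one level, joining keys with dots."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat


records = [flatten(rec) for rec in json.loads(RAW)]

# The union of all keys becomes the column list; fields a record lacks
# become None, much like NULLs in a relational table.
columns = sorted({key for rec in records for key in rec})
table = [[rec.get(col) for col in columns] for rec in records]
```
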
Unstructured Data

Unstructured data represents around 80% of data. It often includes text and
multimedia content. Examples include e-mail messages, word processing
documents, videos, photos, audio files, presentations, web pages and many
other kinds of business documents. Note that while these sorts of files may
have an internal structure, they are still considered "unstructured" because
the data they contain doesn’t fit neatly in a database.

Unstructured data is everywhere. In fact, most individuals and organizations
conduct their lives around unstructured data. Just as with structured data,
unstructured data is either machine generated or human generated.

Here are some examples of machine-generated unstructured data:

• Satellite images: This includes weather data or the data that the
  government captures in its satellite surveillance imagery. Just think
  about Google Earth, and you get the picture.
• Scientific data: This includes seismic imagery, atmospheric data, and
  high energy physics.
• Photographs and video: This includes security, surveillance, and traffic
  video.
• Radar or sonar data: This includes vehicular, meteorological, and
  oceanographic seismic profiles.

The following list shows a few examples of human-generated unstructured
data:

• Text internal to your company: Think of all the text within documents,
  logs, survey results, and e-mails. Enterprise information actually
  represents a large percent of the text information in the world today.
• Social media data: This data is generated from the social media
  platforms such as YouTube, Facebook, Twitter, LinkedIn, and Flickr.
• Mobile data: This includes data such as text messages and location
  information.
• Website content: This comes from any site delivering unstructured
  content, like YouTube, Flickr, or Instagram.

And the list goes on.

Unstructured data is growing faster than the other kinds, and exploiting it
could help in business decisions.

A group called the Organization for the Advancement of Structured
Information Standards (OASIS) has published the Unstructured Information
Management Architecture (UIMA) standard. UIMA "defines
platform-independent data representations and interfaces for software
components or services called analytics, which analyze unstructured
information and assign semantics to regions of that unstructured information."

Many industry watchers say that Hadoop has become the de facto industry
standard for managing Big Data.

Characteristics of Data

There is a lot of buzz around data these days. Businesses, big and small, have
started relying on data analytics for critical business decisions. However,
not all businesses are able to leverage the benefits of data analytics to the
same degree. Let us try to understand the reason behind this.

There are five data characteristics that are the building blocks of an efficient
data analytics solution: accuracy, completeness, consistency, uniqueness, and
timeliness. Understanding each of these helps explain why different businesses
do not benefit from data analytics to the same degree.

Accuracy
When insights are extracted from a well-developed and well-tested data
analytics solution, we assume that the data is reliable and accurate.
However, flaws in data collection, data storage, or data retrieval will result
in unreliable data, and this will reduce the accuracy of the insights extracted
by a data analytics solution.

Completeness
The insights or information extracted by a data analytics solution depend a
great deal on the completeness of the data. Partial data, or a dataset with a
lot of missing values, represents an incomplete picture. Thus, the degree of
completeness of the data determines the accuracy of a data analytics solution.

Consistency
The consistency within a dataset is another important factor that determines
the degree of accuracy of a data analytics solution. A consistent dataset is less
prone to errors and results in better accuracy of a data analytics solution.

Uniqueness
One of the essential components of any business is high quality data. This data,
if used properly, can make a company competitive or can keep a company
competitive. Thus, the degree of uniqueness of data explains the efficiency of a
data analytics solution. In order to add value to any business, the data should
be unique and distinctive.

Timeliness
A data analytics solution that uses outdated data can keep a company from
achieving its goals or from surviving in a competitive arena. New and current
data is more valuable to a business than old, outdated data. Old data
should not be completely overlooked by a data analytics solution, but
emphasis should be placed on current data.
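
Several of these characteristics can be turned into simple measurable ratios.
The sketch below is one possible way to do so on a few made-up customer
records; the field names, the `AS_OF` date, the 365-day freshness threshold and
the reading of "uniqueness" as "no duplicate identifiers" are all assumptions
made for illustration. Accuracy is left out because it cannot be checked
without a trusted reference to compare against.

```python
from datetime import date

# Hypothetical customer records; None marks a missing value.
RECORDS = [
    {"id": 1, "email": "a@example.com", "country": "IN", "updated": date(2024, 5, 1)},
    {"id": 2, "email": None, "country": "in", "updated": date(2021, 1, 15)},
    {"id": 2, "email": "b@example.com", "country": "IN", "updated": date(2024, 4, 20)},
    {"id": 4, "email": "c@example.com", "country": "US", "updated": date(2024, 5, 3)},
]
AS_OF = date(2024, 6, 1)  # assumed "today", fixed so the result is reproducible
MAX_AGE_DAYS = 365        # assumed freshness threshold


def completeness(records, field):
    """Share of records in which `field` is present."""
    return sum(r[field] is not None for r in records) / len(records)


def uniqueness(records, field):
    """Distinct values of `field` divided by the number of records."""
    return len({r[field] for r in records}) / len(records)


def consistency(records, field, is_valid):
    """Share of records whose `field` follows the agreed format."""
    return sum(is_valid(r[field]) for r in records) / len(records)


def timeliness(records, field, as_of, max_age_days):
    """Share of records updated within `max_age_days` of `as_of`."""
    return sum((as_of - r[field]).days <= max_age_days for r in records) / len(records)


report = {
    "completeness(email)": completeness(RECORDS, "email"),
    "uniqueness(id)": uniqueness(RECORDS, "id"),
    "consistency(country)": consistency(RECORDS, "country", str.isupper),
    "timeliness(updated)": timeliness(RECORDS, "updated", AS_OF, MAX_AGE_DAYS),
}
```
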
Applications of Data Analytics/ Uses of Data Science

Using data science, companies have become intelligent enough to push and sell
products according to customers’ purchasing power and interests. Here is how
they are ruling our hearts and minds:

Internet Search

When we speak of search, we think ‘Google’. Right? But there are many other
search engines, such as Yahoo, Bing, Ask, AOL and DuckDuckGo. All these search
engines (including Google) use data science algorithms to deliver the
best results for our search query in a fraction of a second. Consider that
Google processes more than 20 petabytes of data every day. Had there
been no data science, Google wouldn’t have been the ‘Google’ we know today.

Digital Advertisements (Targeted Advertising and Re-targeting)

If you thought search was the biggest application of data science
and machine learning, here is a challenger: the entire digital marketing
spectrum. From the display banners on various websites to the digital
billboards at airports, almost all of them are decided using data
science algorithms.

This is why digital ads achieve a much higher click-through rate (CTR) than
traditional advertisements: they can be targeted based on a user’s past
behaviour. It is also why I see ads for analytics training while my
friend sees ads for apparel in the same place at the same time.

Recommender Systems

Who can forget the suggestions about similar products on Amazon? They not
only help you find relevant products among the billions available, but also
add a lot to the user experience.

Many companies have eagerly used such an engine to promote their
products and suggestions in line with the user’s interests and the relevance
of information. Internet giants like Amazon, Twitter, Google Play, Netflix,
LinkedIn, IMDb and many more use this system to improve user experience.
The recommendations are made based on a user’s previous search results.
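
One simple way to produce such suggestions, shown here only as a sketch, is to
count how often items appear together in past user histories and suggest the
items that most often accompany what the user has already looked at. The
histories, item names and helper functions below are hypothetical, and
production recommender systems use considerably more elaborate models.

```python
from collections import Counter
from itertools import combinations

# Hypothetical browsing/search histories, one list of items per user.
HISTORIES = {
    "u1": ["laptop", "mouse", "keyboard"],
    "u2": ["laptop", "mouse"],
    "u3": ["laptop", "monitor"],
    "u4": ["phone", "charger"],
}


def co_occurrence(histories):
    """Count, for every ordered pair of items, how many users saw both."""
    counts = Counter()
    for items in histories.values():
        for a, b in combinations(sorted(set(items)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def recommend(seen, counts, top_n=2):
    """Suggest unseen items that most often co-occur with the seen ones."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a in seen and b not in seen:
            scores[b] += n
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


COUNTS = co_occurrence(HISTORIES)
suggestions = recommend({"laptop"}, COUNTS)
```
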
Image Recognition

You upload a photo with friends on Facebook and you start getting
suggestions to tag your friends. This automatic tag suggestion feature uses a
face recognition algorithm. Similarly, when using WhatsApp Web, you scan a
QR code in your web browser using your mobile phone. In addition, Google gives
you the option to search for images by uploading them. It uses image
recognition and provides related search results. To know more about image
recognition, check out this video (1:31 min):

https://www.analyticsvidhya.com/blog/2015/09/applications-data-science/

Speech Recognition

Some of the best examples of speech recognition products are Google Voice,
Siri and Cortana. With speech recognition, even if you are not in a
position to type a message, your life does not stop: simply speak the
message and it will be converted to text. At times, however, you will find
that speech recognition does not perform accurately. Just for a laugh, check
out this video (1:30 min) of the conversation between Cortana and Satya
Nadella (CEO, Microsoft).

https://www.analyticsvidhya.com/blog/2015/09/applications-data-science/

Gaming

EA Sports, Zynga, Sony, Nintendo and Activision-Blizzard have taken the gaming
experience to the next level using data science. Games are now designed using
machine learning algorithms that improve or upgrade themselves as the
player moves up to a higher level. In motion gaming too, your opponent
(the computer) analyzes your previous moves and shapes its game accordingly.

Price Comparison Websites

At a basic level, these websites are driven by lots and lots of data, which
is fetched using APIs and RSS feeds. If you have ever used these websites, you
know the convenience of comparing the price of a product from
multiple vendors in one place. PriceGrabber, PriceRunner, Junglee, Shopzilla
and DealTime are some examples of price comparison websites. Nowadays, price
comparison websites can be found in almost every domain, such as technology,
hospitality, automobiles, durables and apparel.

Airline Route Planning

The airline industry across the world is known to bear heavy losses. Except for
a few airline service providers, companies are struggling to maintain their
occupancy ratio and operating profits. The steep rise in air fuel prices and
the need to offer heavy discounts to customers have made the situation worse.
It wasn’t long before airline companies started using data science to identify
strategic areas for improvement. Now, using data science, airline
companies can:

1. Predict flight delays

2. Decide which class of airplanes to buy

3. Decide whether to fly directly to the destination or to halt in between
(for example, a flight from New Delhi to New York can take a direct route
or halt in another country)

4. Effectively drive customer loyalty programs

Southwest Airlines and Alaska Airlines are among the top companies that have
embraced data science to change their way of working.

Fraud and Risk Detection


One of the first applications of data science originated in finance.
Companies were fed up with bad debts and losses every year. However, they had
a lot of data that was collected during the initial paperwork when
sanctioning loans. They decided to bring in data science practices to
rescue themselves from these losses. Over the years, banking companies learned
to divide and conquer data via customer profiling, past expenditures and other
essential variables to analyze the probabilities of risk and default. Moreover,
it also helped them push their banking products based on customers’ purchasing
power.
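
As a deliberately simplified sketch of estimating default probabilities from
customer profiles, the example below groups made-up past loans by a single
profile variable and uses the observed default rate in each group as the
estimate. The `income_band` variable, the loan outcomes and the 0.3 review
threshold are assumptions for illustration; real credit-risk models combine
many variables.

```python
from collections import defaultdict

# Hypothetical past loans: (income_band, defaulted).
LOANS = [
    ("low", True), ("low", False), ("low", True), ("low", False),
    ("mid", False), ("mid", False), ("mid", True), ("mid", False),
    ("high", False), ("high", False),
]


def default_rates(loans):
    """Observed share of defaulted loans in each income band."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for band, defaulted in loans:
        totals[band] += 1
        defaults[band] += int(defaulted)
    return {band: defaults[band] / totals[band] for band in totals}


def risk_label(band, rates, threshold=0.3):
    """Flag applications from bands whose default rate reaches the threshold."""
    return "review manually" if rates[band] >= threshold else "standard processing"


RATES = default_rates(LOANS)
```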

Delivery logistics

Who says data science has limited applications? Logistics companies like DHL,
FedEx, UPS and Kuehne+Nagel have used data science to improve their
operational efficiency. Using data science, these companies have discovered
the best routes to ship, the best time to deliver and the best mode of
transport to choose, leading to cost efficiency, among many other benefits.
Furthermore, the data that these companies generate from installed GPS devices
gives them many possibilities to explore using data science.

Miscellaneous

Apart from the applications mentioned above, data science is also used in
marketing, finance, human resources, health care, government policy and
every other industry where data is generated. Using data science, the
marketing departments of companies decide which products are best for
up-selling and cross-selling, based on behavioral data from customers. In
addition, predicting a customer’s wallet share, which customers are likely to
churn, which customers should be pitched a high-value product and many
other questions can be answered by data science. In finance (credit risk,
fraud), human resources (which employees are most likely to leave,
employee performance, employee bonuses) and other disciplines, many tasks are
likewise accomplished using data science.
