What are the Characteristics (5Vs) of Big Data?
➢ Volume: The amount of data collected in various forms, including
files, records, tables, etc. Quantities of data reach almost
incomprehensible proportions.
➢ Velocity: The speed at which data is generated and processed,
which can be extremely high. In many cases, we deal with real-time data.
➢ Variety: The number of types/formats of data. The data could be
structured (e.g., SQL tables or CSV files), semi-structured
(e.g., HTML code), or unstructured (e.g., video messages).
➢ Veracity: The reliability of the data and whether the data is
verifiable.
➢ Value: The usefulness of the data, measured in monetary terms
(dollars and cents).
SLIDESMANIA.COM
Which diagram do you think shows structured, unstructured or
semi-structured data? Discuss each type of data with example(s).
[Diagrams 1, 2 and 3]
Source: https://www.selecthub.com
1- Structured:
refers to information with a high degree of organisation. Items can be organised
in tables and are commonly stored in a database where each field represents the
same type of information.
2- Unstructured:
refers to information with a low degree of organisation. Items are unorganised and
cannot be presented in tabular form, such as text messages, tweets, and emails.
3- Semi-structured:
may have the qualities of both structured and unstructured data, such as HTML code.
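The three types above can be illustrated with a minimal Python sketch. All sample data here (the CSV rows, the HTML snippet and the text message) is made up for illustration and is not taken from the diagrams; the `TagCollector` class is a hypothetical helper built on the standard library.

```python
import csv
import io
from html.parser import HTMLParser

# Structured: every row has the same fields, so it loads straight into a table.
# (Illustrative values only.)
csv_text = "customer,balance\nAli,1200\nMei,850\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
total_balance = sum(int(row["balance"]) for row in rows)

# Semi-structured: tags give some organisation, but the content is free-form.
html_text = "<html><body><h1>Rates</h1><p>Savings rates rose today.</p></body></html>"


class TagCollector(HTMLParser):
    """Hypothetical helper: collects tag names and the text between tags."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


collector = TagCollector()
collector.feed(html_text)
collector.close()

# Unstructured: no fields and no tags; without further processing we can
# only treat the message as a sequence of words.
message = "Just transferred money to my sister, the app was really slow today"
word_count = len(message.split())

print(total_balance)   # sum over a named column
print(collector.tags)  # structure recovered from the markup
print(word_count)      # only a crude measure is available
```

The structured data supports a direct calculation on a named column, the semi-structured data needs a parser to recover its organisation, and the unstructured text offers no fields to query at all.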
Simple Fintech Application for market sentiment
Which stock would you invest in? Why?
You might check their profiles using data analytics
software, Google Trends and Social Blade
a) CIMB
b) Maybank
c) Public Bank
Google Trends
Social Blade
❖ Source: https://www.selecthub.com
❖ CIMB needs to improve its grade
Answer: Either Maybank or Public Bank.
- Based on Google Trends, interest over time for Maybank is higher
than for CIMB and Public Bank.
- Maybank has a better Social Blade grade (B) than
Public Bank (B-) and CIMB (C).
- However, based on traditional financial statement
analysis (i.e., the P/E ratio), Public Bank generally has
better fundamentals than Maybank for investing.
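The grade comparison can be sketched in a few lines of Python. The three grades are the ones quoted on the slide; the `GRADE_ORDER` scale and the `rank_by_grade` function are assumptions made for illustration and are not taken from Social Blade. The sketch covers only the social-media signal, so the Google Trends and P/E evidence would still need to be weighed separately.

```python
# Grades as reported on the slide.
grades = {"Maybank": "B", "Public Bank": "B-", "CIMB": "C"}

# Assumed grade scale, best to worst (illustrative only).
GRADE_ORDER = ["A+", "A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D+", "D", "D-"]


def rank_by_grade(grades):
    """Return the bank names sorted from best grade to worst."""
    return sorted(grades, key=lambda bank: GRADE_ORDER.index(grades[bank]))


ranking = rank_by_grade(grades)
print(ranking)
```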
Reliability: Which one is likely to be unreliable and not factual, A or B?
Why?
[Articles A and B]
A: More reliable, as there is disclosure about the author and the author's
intention (not advising/not recommending to buy any cryptocurrencies). Thus,
the article is less biased, more neutral and reliable.
VS
B: Less reliable, as the disclosure indicates that the author is involved in
a lot of projects that are related to cryptocurrency. Thus, his opinion may be
biased and the information in the article may not be as reliable.
What are the Issues and Challenges of Big Data?
o Does the dataset have selection bias, missing data or data outliers?
o Is the volume of collected data sufficient?
o Is the dataset well suited for the type of analysis?
o In most instances, the data must be sourced, cleansed and organised before
analysis can occur.
o This process can be extremely difficult with alternative data owing to the
unstructured characteristics of the data involved, which are more often qualitative
(e.g., texts, photos, and videos) than quantitative in nature.
o Data science: extracting information from Big Data
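Two of the checks above, missing data and outliers, can be sketched with the Python standard library. The `quality_report` function and the sample readings are hypothetical, and the 1.5 x IQR rule used for outliers is one common convention rather than a method named in these slides. Selection bias cannot be detected this way, because it depends on how the data was collected.

```python
import statistics


def quality_report(values):
    """Count missing entries (None) and flag outliers with the 1.5 * IQR rule."""
    present = [v for v in values if v is not None]
    missing_count = len(values) - len(present)

    # Quartiles of the observed values (default 'exclusive' method).
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    outliers = [v for v in present if v < low or v > high]
    return {
        "rows": len(values),
        "missing": missing_count,
        "missing_share": missing_count / len(values),
        "outliers": outliers,
    }


# Made-up readings: two missing entries and one suspicious value.
readings = [10, 12, 11, None, 13, 12, 95, 11, None, 12]
report = quality_report(readings)
print(report)
```

A report like this only supports the cleansing step; whether the remaining volume is sufficient and suited to the analysis is still a judgment call.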