Foundations of Data Science (IT3101) (Section B)
TYPE OF DATA
▪ Type on the basis of Technology
▪ Type as per varieties
TYPE OF DATA
▪ Type based on Technology
▪ Structured: Transaction data, OLAP data, and tabular files such as XLS and CSV
▪ Unstructured: Text documents, PDFs, images, and video
▪ Semi-structured: XML data files that are self-describing and defined by an XML schema
▪ Quasi-structured: Web clickstream data that may contain some inconsistencies in data values
and formats
▪ Quasi-unstructured: Chemical structure, genome data, RNA, DNA
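To make the difference between the first and third categories concrete, here is a minimal sketch using only the Python standard library. The sample orders, the field names, and the helper functions `read_structured` and `read_semi_structured` are hypothetical and chosen purely for illustration. It shows that CSV rows all share one fixed set of columns, while XML records describe themselves through tags and may carry different fields per record.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: every row follows the same fixed columns (hypothetical sample data).
CSV_TEXT = "order_id,amount\n1,250.0\n2,99.5\n"

# Semi-structured: tags describe the data, and fields may vary per record.
XML_TEXT = """
<orders>
  <order id="1"><amount>250.0</amount></order>
  <order id="2"><amount>99.5</amount><note>gift wrap</note></order>
</orders>
"""


def read_structured(text):
    """Return one dict per CSV row, keyed by the header line."""
    return list(csv.DictReader(io.StringIO(text)))


def read_semi_structured(text):
    """Return one dict per <order>, built from whatever child tags it has."""
    root = ET.fromstring(text.strip())
    records = []
    for order in root.findall("order"):
        record = {"id": order.get("id")}
        for child in order:
            record[child.tag] = child.text
        records.append(record)
    return records


rows = read_structured(CSV_TEXT)
orders = read_semi_structured(XML_TEXT)

print(rows)    # every row has the same keys
print(orders)  # the second order has an extra "note" field
```

Unstructured data such as images or free text has no equivalent one-step parser, which is why it usually needs specialised processing before analysis.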
TYPE OF DATA
▪ Type as per varieties: NORI
▪ N: Nominal - categorical, unordered, and without any quantitative value; gender (male/female)
▪ O: Ordinal - categorical and ordered; education level ("high school", "BS", "MS", "PhD"),
socioeconomic status ("low income", "middle income", "high income")
▪ R: Ratio - numerical (quantitative) with a true zero, so ratios are meaningful; income, height, weight, annual sales, market share
▪ I: Interval - numerical with equal spacing between values but no true zero; temperature (in Celsius or Fahrenheit), mark grading, IQ test, and
CGPA.
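The practical consequence of the NORI scales is which operations make sense on each. The sketch below is one way to illustrate this, using the standard library only. All sample values, the `EDUCATION_ORDER` list, and the `celsius_to_fahrenheit` helper are assumptions made for the example. It shows the mode for nominal data, ranking for ordinal data, and why a ratio of interval values changes with the unit while a ratio of ratio-scale values does not.

```python
from statistics import mode

# Nominal: labels only, so counting and the mode are meaningful (hypothetical data).
blood_groups = ["A", "O", "B", "O", "AB"]
most_common_group = mode(blood_groups)

# Ordinal: order matters, so values can be ranked and a middle value picked.
EDUCATION_ORDER = ["high school", "BS", "MS", "PhD"]
education = ["MS", "high school", "PhD", "BS", "MS"]
ranked = sorted(education, key=EDUCATION_ORDER.index)
middle_level = ranked[len(ranked) // 2]


def celsius_to_fahrenheit(c):
    """Convert a temperature from Celsius to Fahrenheit."""
    return c * 9 / 5 + 32


# Interval: differences are meaningful, but ratios depend on the unit
# because zero is arbitrary.
warm_c, cool_c = 20.0, 10.0
ratio_celsius = warm_c / cool_c
ratio_fahrenheit = celsius_to_fahrenheit(warm_c) / celsius_to_fahrenheit(cool_c)

# Ratio: a true zero exists, so "twice as heavy" holds in any unit.
KG_TO_LB = 2.20462
heavy_kg, light_kg = 80.0, 40.0
ratio_kg = heavy_kg / light_kg
ratio_lb = (heavy_kg * KG_TO_LB) / (light_kg * KG_TO_LB)

print(most_common_group, middle_level)
print(ratio_celsius, ratio_fahrenheit)  # the two ratios differ
print(ratio_kg, ratio_lb)               # the two ratios agree
```

In other words, saying 20 °C is "twice as hot" as 10 °C is not meaningful, whereas saying 80 kg is twice 40 kg is.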
TYPE OF DATA: Guess?
▪ Information that is highly organized, factual, and to the point. It usually comes in the
form of letters and numbers that fit nicely into the rows and columns of tables. This data
commonly lives in tables similar to Excel files and Google Sheets spreadsheets.
▪ Data that has no pre-defined structure and comes in a wide diversity of forms. It ranges from
imagery and text files such as PDF documents to video and audio files, to name a few.
▪ Pretty much everyone has dealt with booking a ticket via one of the airline reservation
systems or withdrawing cash using an ATM. During these operations, we don’t normally
think of what kind of applications we deal with and what types of data they process.
▪ Consider the social media posts of a travel agency, or any posts for that matter. Each post
carries metrics such as shares, hashtags, likes, and comments.
▪ If an agency posts new travel tours and wants to know the audience’s reactions
(comments), it will need to examine the post in its native format (view the post via a
social media app) or use advanced techniques like sentiment analysis.
▪ CRM software runs data through analytical tools to create datasets that reveal customer
behaviour patterns and trends.
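The travel agency post described above can be sketched as a record that mixes fixed metric fields with free-form text. The JSON layout, the field names, and the word lists below are hypothetical. The `naive_sentiment` function is only a crude keyword count standing in for real sentiment analysis, not an actual sentiment model.

```python
import json

# Hypothetical post record: numeric metrics plus free-text comments.
POST_JSON = """
{
  "post_id": "tour-001",
  "likes": 120,
  "shares": 15,
  "hashtags": ["#travel", "#summer"],
  "comments": ["Love this tour!", "Too expensive for me", "Great deal, love it"]
}
"""

# Assumed toy vocabularies; a real system would use a trained model.
POSITIVE_WORDS = {"love", "great"}
NEGATIVE_WORDS = {"expensive", "bad"}


def engagement(post):
    """Sum the countable metrics; this part is easy because the fields are fixed."""
    return post["likes"] + post["shares"] + len(post["comments"])


def naive_sentiment(comment):
    """Label a comment by counting positive and negative keywords."""
    words = {w.strip("!.,?").lower() for w in comment.split()}
    score = len(words & POSITIVE_WORDS) - len(words & NEGATIVE_WORDS)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


post = json.loads(POST_JSON)
total = engagement(post)
labels = [naive_sentiment(c) for c in post["comments"]]

print(total)
print(labels)
```

The metrics can be queried directly, while the comment text needs extra interpretation, which is the distinction the bullets above are hinting at.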