ACADEMIC PLANNER
ON
“Data Analytics Using R”
Department of
Computer Science & Engineering (AI & ML)
CMR ENGINEERING COLLEGE
(Approved by AICTE, New Delhi; Affiliated to JNTU, Hyderabad)
Kandlakoya (V), Medchal Road, Hyderabad - 501401, Telangana State, India. Website: www.cmrec.ac.in
(2024-25)
ACADEMIC PLANNER
Subject: Data Analytics Using R (III B.Tech I Sem)
S.NO CONTENTS
(1) - Preamble/Introduction
(2) - Prerequisites
(3) - Objectives and Outcomes
(4) - Syllabus
1. R22-CMREC
2. GATE
3. IES (Not Applicable)
(5) - List of Expert Details (Local/National/International with contact details/profile link/blogs/their research contribution towards the subject)
(6) - Journals with min. 5 reference papers for literature study
(7) - Subject Lesson Plan
(8) - Suggested Books (Prescribed and References)
(9) - Websites for Self-Learning Resources like www.geeksforgeeks.org, www.schools.com, Coursera, edX, Udemy, Khan Academy, NPTEL etc. (along with Registration procedures)
(10) - Question Banks: 1. JNTUH/Model Papers 2. GATE
(11) - Two case study presentations with Project/Product/Model/Prototypes/Industrial applications
(12) - Assignment Questions/Innovative Assignment sets
(13) - List of topics for students' Seminars with Guidelines
(14) - STEP/Course material in soft copy
(15) - Expert Lectures with topics & Schedules (if any)
(1) Preamble/Introduction
Data analytics is the process of examining raw data to find patterns, draw conclusions, and make predictions. It helps businesses make better decisions, improve efficiency, and gain a deeper understanding of their customers and operations.
(2) PREREQUISITES
A course on "Database Management Systems".
A course on "Probability and Statistics".
A course on "Machine Learning".
(3) Objectives and Outcomes
COURSE OBJECTIVES
To explore the fundamental concepts of data analytics.
To learn the principles and methods of statistical analysis.
To discover interesting patterns, analyze supervised and unsupervised models, and estimate the accuracy of the algorithms.
To understand the various search methods and visualization techniques.
COURSE OUTCOMES
CO1: Describe the concepts of data management.
CO2: Use different techniques to purify (clean) the raw data.
CO3: Analyze the various machine learning models.
CO4: Categorize the different learning techniques.
CO5: Apply various visualization techniques to generate graphs.
(4) SCOPE
The scope of this subject is to provide an understanding of the fundamental concepts of data analytics.
(4.1) SYLLABUS - CMREC
UNIT I
Data Management: Design Data Architecture and manage the data for analysis, understand various sources of Data like Sensors/Signals/GPS etc. Data Management, Data Quality (noise, outliers, missing values, duplicate data) and Data Pre-processing & Processing.
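As a study aid for the Unit I data-quality topics, here is a minimal Python sketch (the course itself uses R) that flags duplicate records, missing values and outliers in a list of hypothetical (timestamp, reading) sensor records. It assumes Tukey's 1.5 × IQR rule for outliers and median imputation for missing readings; these are common choices, not ones prescribed by the syllabus. In R, similar checks roughly correspond to duplicated(), is.na() and boxplot.stats().

```python
import statistics

def profile_quality(records):
    """Report duplicates, missing values and IQR outliers in (timestamp, reading) records."""
    # Duplicate data: the exact same record appearing more than once is dropped.
    seen, duplicates, unique = set(), [], []
    for rec in records:
        if rec in seen:
            duplicates.append(rec)
        else:
            seen.add(rec)
            unique.append(rec)
    # Missing values: readings recorded as None.
    present = [r for _, r in unique if r is not None]
    missing = sum(1 for _, r in unique if r is None)
    # Outliers (Tukey's rule): readings more than 1.5 * IQR beyond the quartiles.
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    outliers = [r for r in present if r < q1 - 1.5 * iqr or r > q3 + 1.5 * iqr]
    # Median imputation: the median is barely affected by the outlier.
    fill = statistics.median(present)
    cleaned = [(t, fill if r is None else r) for t, r in unique]
    return {"missing": missing, "duplicates": duplicates,
            "outliers": outliers, "cleaned": cleaned}

# Hypothetical temperature readings with one duplicate, one gap and one spike.
readings = [("09:00", 21.0), ("09:05", 22.5), ("09:10", None), ("09:05", 22.5),
            ("09:15", 23.0), ("09:20", 95.0), ("09:25", 21.5), ("09:30", 22.0),
            ("09:35", 22.8), ("09:40", 21.8), ("09:45", 22.2)]
report = profile_quality(readings)
print(report["missing"], report["duplicates"], report["outliers"])
```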
UNIT II
Data Analytics: Introduction to Analytics, Introduction to Tools and Environment, Application of Modeling in Business, Databases & Types of Data and variables, Data Modeling Techniques, Missing Imputations etc. Need for Business Modeling.
UNIT III
Regression: Concepts, BLUE property assumptions, Least Square Estimation, Variable Rationalization, and Model Building etc.
Logistic Regression: Model Theory, Model fit Statistics, Model Construction, Analytics applications to various Business Domains etc.
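For the Unit III topic of least square estimation, the closed-form estimates for simple linear regression y = b0 + b1·x are b1 = Sxy / Sxx and b0 = ȳ - b1·x̄, where Sxy = Σ(x - x̄)(y - ȳ) and Sxx = Σ(x - x̄)². The Python sketch below computes them, plus R², on a small hypothetical dataset; in R, lm(y ~ x) returns the same least-squares coefficients.

```python
def least_squares(xs, ys):
    """Closed-form ordinary least squares estimates for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx              # slope
    b0 = y_bar - b1 * x_bar     # intercept: the line passes through (x_bar, y_bar)
    return b0, b1

def r_squared(xs, ys, b0, b1):
    """Fraction of the variance in y explained by the fitted line."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data: advertising spend (x) against sales (y).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
b0, b1 = least_squares(xs, ys)
print(f"y = {b0:.2f} + {b1:.2f}x, R^2 = {r_squared(xs, ys, b0, b1):.4f}")
```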
UNIT IV
Object Segmentation: Regression Vs Segmentation, Supervised and Unsupervised Learning, Tree Building: Regression, Classification, Overfitting, Pruning and Complexity, Multiple Decision Trees etc. Time Series Methods: ARIMA, Measures of Forecast Accuracy, STL approach, Extract features from generated model as Height, Average Energy etc. and Analyze for prediction.
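The syllabus lists "Measures of Forecast Accuracy" without naming them; mean absolute error (MAE), root mean squared error (RMSE) and mean absolute percentage error (MAPE) are common choices, so this Python sketch assumes those three. The actual and forecast values are hypothetical.

```python
import math

def forecast_accuracy(actual, forecast):
    """MAE, RMSE and MAPE (in percent) of a forecast against observed values."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)   # penalizes large errors more
    # MAPE is undefined if any actual value is zero.
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape}

actual = [100, 110, 120, 130]    # hypothetical observed values
forecast = [98, 113, 120, 126]   # hypothetical model forecasts
metrics = forecast_accuracy(actual, forecast)
print(metrics)
```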
UNIT V
Data Visualization: Pixel-Oriented Visualization Techniques, Geometric Projection Visualization Techniques, Icon-Based Visualization Techniques, Hierarchical Visualization Techniques, Visualizing Complex Data and Relations.
(4.2) SYLLABUS - GATE
Not applicable
(4.3) SYLLABUS - IES
Not applicable
(5) LIST OF EXPERT DETAILS
INTERNATIONAL
1. Hadley Alexander Wickham (born 14 October 1979) is a New Zealand statistician known for his work on open-source software for the R statistical programming environment. E-mail: [email protected].
2. Gregory Piatetsky-Shapiro; Michael Brodie, leading database researcher, industry leader, thinker. SIGKDD, KDnuggets, Brookline, MA, USA.
NATIONAL
1. Dr. Geeta Patil, Associate Professor, BMS Institute of Technology and Management.
Email: [email protected]
Profile: https://bmsit.irins.org/profile/206713
2. Prof. D. Lakshmi, Professor, VIT Bhopal.
Email: [email protected] Mobile No: 9945379089
REGIONAL
1. Mr. Rajesh Prabhakar Kaila, Times Center for Learning Limited, Times of India Group; Visiting Faculty Member (Business Analytics & Finance) at Symbiosis International University, Hyderabad.
2. Dr. A. V. Krishna Prasad, Faculty, CSE Dept., MVSR Engineering College, Hyderabad; MC member for CSI Hyderabad Chapter.
(6) JOURNALS
1. Big Data, an open-access peer-reviewed journal that provides a forum for world-class research exploring the challenges and opportunities in collecting, analyzing, and disseminating vast amounts of data. Liebert Publishers.
2. Case Studies in Business, Industry and Government Statistics, electronic journal, Bentley University.
3. Data Science Journal, published by the Committee on Data for Science and Technology (CODATA) of the International Council for Science (ICSU).
4. EPJ Data Science, SpringerOpen.
5. IEEE Transactions on Knowledge and Data Engineering.
6. Information Visualization, a central forum for all aspects of information visualization and its application (Palgrave Macmillan journals).
7. Intelligent Data Analysis journal (IOS Press).
8. Journal of Big Data, a SpringerOpen journal.
9. Journal of Data Mining and Knowledge Discovery, tri-monthly, ISSN: 2229-6662, 2229-6670, Bioinfo Publications, India.
10. Journal of Data Science, an international journal devoted to applications of statistical methods at large.
11. Journal of Intelligent Information Systems.
12. Journal of Machine Learning Research.
13. KAIS: Knowledge and Information Systems: An International Journal (Springer-Verlag).
(7) SUBJECT (LESSON) PLAN
S.No | Sub-Topic | No. of Lectures Required | Suggested Books | Teaching Methods
UNIT - I: Data Management (11)
1 | Introduction to Data Management | L1 | T1 | M1
2 | Design Data Architecture and manage the data for analysis | L2-L3 | T1 | M1
3 | Understand various sources of Data like Sensors/Signals/GPS Data etc. | L4-L5 | T1 | M2 (PPT)
4 | Data Management, Data Quality (noise) | L6-L7 | T1 | M1
5 | Data Quality (outliers, missing values, duplicate data) | L8-L9 | T1, R1 | M2 (PPT)
6 | Data Processing | L10 | T1 | M1
7 | Data Pre-Processing | L11 | T1 | M2 (PPT)
UNIT - II: Data Analytics (11)
8 | Introduction to Analytics | L12 | T1 | M1
9 | Introduction to Tools and Environment | L13-L14 | T1 | M1
10 | Application of Modelling in Business | L15-L16 | T1, R2 | M2 (PPT)
11 | Databases & Types of Data and variables | L17-L18 | T1, R3 | M1
12 | Data Modelling Techniques | L19-L20 | T1 | M2 (PPT)
13 | Missing Imputations etc. | L21 | T1 | M2 (PPT)
14 | Need for Business Modelling | L22 | T1 | M2 (PPT)
UNIT - III: Regression and Logistic Regression (14)
15 | Regression Concepts | L23-L24 | T1 | M1
16 | BLUE property assumptions | L25-L26 | T1, R3 | M1
17 | Least Square Estimation | L27-L28 | T1 | M1
18 | Variable Rationalization, Model Building etc. | L29-L30 | T1, R2 | M1 (PPT)
19 | Logistic Regression: Model Theory | L31-L32 | T1 | M1 (PPT)
20 | Model fit Statistics, Model Construction | L33-L34 | T1, R2 | M2 (PPT)
21 | Analytics applications to various Business Domains etc. | L35-L36 | T1, R2 | M2 (PPT)
UNIT - IV: Object Segmentation and Time Series Methods (13)
22 | Regression Vs Segmentation, Supervised and Unsupervised Learning | L37-L38 | T1 | M1
23 | Tree Building: Regression, Classification | L39-L40 | T1 | M1
24 | Overfitting, Pruning and Complexity | L41-L42 | T1 | M1
25 | Multiple Decision Trees etc. | L43 | T1 | M1
26 | Time Series Methods: ARIMA, Measures of Forecast Accuracy | L44-L45 | T1, R2 | M1
27 | STL approach, Extract features from generated model as Height, | L46-L47 | T1, R2 | M1
28 | Average Energy etc. and Analyze for prediction | L48-L49 | T1, R1 | M1
UNIT - V: Data Visualization (07)
29 | Pixel-Oriented Visualization Techniques | L50-L51 | T1 | M1
30 | Geometric Projection Visualization Techniques | L52-L53 | T1 | M1
31 | Icon-Based Visualization Techniques | L54 | T1 | M1
32 | Hierarchical Visualization Techniques | L55 | T1, R1 | M1
33 | Visualizing Complex Data and Relations | L56 | T1, R3 | M1
METHODS OF TEACHING:
M1: Lecture Method     M4: Presentation/PPT     M7: Assignment
M2: Demo Method        M5: Lab/Practical        M8: Industry Visit
M3: Guest Lecture      M6: Tutorial             M9: Project Based
NOTE:
1. Any subject in a semester is supposed to be completed in 51 to 58 periods.
2. Each period is of 50 minutes.
3. Each unit's duration and completion should be mentioned in the Remarks column.
4. The list of suggested books can be marked with codes like T1, T2, R1, R2, etc.
(8) SUGGESTED BOOKS
TEXTBOOKS
T1. Student's Handbook for Associate Analytics - II, III.
T2. Data Mining: Concepts and Techniques, 3rd ed., Jiawei Han and Micheline Kamber, Morgan Kaufmann Publishers.
REFERENCE BOOKS
R1. Introduction to Data Mining, Tan, Steinbach and Kumar, Addison Wesley, 2006.
R2. Data Mining Analysis and Concepts, M. Zaki and W. Meira.
R3. Mining of Massive Datasets, Jure Leskovec (Stanford Univ.), Anand Rajaraman (Milliway Labs), Jeffrey D. Ullman (Stanford Univ.).
(9) WEBSITES
https://www.coursera.org/specializations/jhu-data-science
https://onlinecourses.nptel.ac.in/noc24_mg113/preview
https://www.coursera.org/professional-certificates/google-data-analytics
https://www.udemy.com/topic/data-analysis/
https://www.geeksforgeeks.org/r-tutorial/?ref=dhm
https://www.javatpoint.com/
www.analyticsvidhya.com/
https://www-users.cs.umn.edu/~kumar001/dmbook/index.php
https://www.edx.org/course/mining-massive-datasets
(10) QUESTION BANK
PREVIOUS QUESTION PAPERS
CMR ENGINEERING COLLEGE :: HYDERABAD
UGC AUTONOMOUS
III B.TECH I Semester End Examinations (Regular) - December 2022
DATA ANALYTICS USING R
(CSE)
[Time: 3 Hours] [Max. Marks: 70]
Note: This question paper contains two parts, A and B.
Part A is compulsory and carries 20 marks. Answer all questions in Part A.
Part B consists of 5 units. Answer any one full question from each unit. Each question carries 10 marks.
PART-A (20 Marks)
1. a) How can the default path of the package library be changed in R? [2M]
b) List out two IDEs for R. [2M]
c) What are mean and median? Explain with a neat example. [2M]
d) What are the advantages of using data visualization? [2M]
e) Give the general equation for linear regression. [2M]
f) What is the syntax of the lm() function? [2M]
g) What is a three-way contingency table? [2M]
h) What are the major diagnostic functions of the ‘LogisticDx’ package? [2M]
i) Name the packages used to build decision trees in R. [2M]
j) List out the names of learning algorithms that create a decision tree. [2M]
PART-B (50 Marks)
2. a) Explain the RSQLite package. [5M]
b) Explain the following commands in R: summary(), str(), head(), tail(), View(), edit(). [5M]
OR
3. Create a dataset, ‘Watch’, and store the information about watches of four different companies. Explain all the steps of simple analytical data processing, from input to output, on this dataset. [10M]
4. a) What are data frames? Write their significance in the R language. [5M]
b) Explain the graphical techniques used in Exploratory Data Analysis using R. [5M]
OR
5. What is a bar chart? Discuss the various types of bar charts using R. [10M]
6. Compare and contrast Multiple R-squared and Adjusted R-squared. [10M]
OR
7. What is model fitting? Explain various models and their commands in R. [10M]
8. Create a table with a ‘pizza’ column that stores the information necessary to implement multinomial logistic regression. After placing the information, implement multinomial logistic regression on this table. [10M]
OR
9. a) Explain binary logistic regression with a single categorical variable. [5M]
b) Explain the likelihood function. [5M]
10. Create a dataset that contains the features of apples. Now find the “entropy” and “information gain” for this dataset. Also, find the best feature of the apple dataset. [10M]
OR
11. Write and explain the ID3 decision tree construction algorithm. [10M]
************
UNIT-1
Long Questions
1. Explain the sources of primary data.
2. Explain data architecture in detail.
3. Write about data preprocessing needs.
4. Explain in detail the methods for generating primary data.
5. Explain data architecture in detail.
6. Explain the elements of data architecture.
7. Explain survey methods and the experimental method.
8. Explain the sources of secondary data.
9. Explain GPS and signal data.
10. What is data quality? Explain.
Short Questions
1. What is Data Management?
2. What is Big Data?
3. List out Enterprise Requirements.
4. What is workplace safety?
5. What do you understand about Big Data tools?
UNIT-2
Long Questions
1. Explain Hypothesis Testing and Determining
2. Explain the multiple analytical methodologies.
3. Explain supervised and unsupervised learning.
4. List out machine learning tasks and explain them.
6. Explain support vector machines.
7. Distinguish between machine learning and data mining.
8. Explain the KDD task in detail.
9. Explain how to train a model using machine learning algorithms.
10. What are the steps followed in an ML algorithm? Explain.
UNIT-3
Short Questions
1. What is a machine learning algorithm?
2. What is hypothesis testing?
3. What is supervised learning?
4. What is unsupervised learning?
5. What is reinforcement learning?
Long Questions
1. What is Data Presentation Architecture (DPA)? Explain.
2. What is data visualization? Explain.
3. Explain data visualization in Tableau.
4. Explain different data visualization tools.
5. Explain the features of Tableau.
6. How will you visualize information using Tableau? Explain.
UNIT-4
Short Questions
1. What is DPA?
2. Why do we use data visualization?
3. What is the role of Tableau in data visualization?
4. Write down the steps involved in data visualization in Tableau.
Long Questions
1. Explain the optim() function with syntax and an example.
2. Explain binary logistic regression with a covariate variable.
3. Which function implements the GLM model in R? Explain with syntax and an example.
4. Explain binary logistic regression with a single categorical variable.
5. Create a table with a ‘person’ column that stores information like name, age, gender, annual income and others. Implement binary logistic regression with a single categorical variable and a three-way contingency table after placing the required information in the table.
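For the Unit-4 questions on binary logistic regression with a covariate (and the likelihood function), the Python sketch below fits P(y = 1) = 1 / (1 + e^-(b0 + b1·x)) by maximizing the log-likelihood Σ[y·log p + (1 - y)·log(1 - p)]. In R, glm(y ~ x, family = binomial) fits this model by maximum likelihood; this sketch swaps in plain gradient ascent for illustration, and the learning rate, step count and data are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, xs, ys):
    """Bernoulli log-likelihood of the data under the logistic model."""
    ll = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        ll += y * math.log(p) + (1 - y) * math.log(1 - p)
    return ll

def fit_logistic(xs, ys, lr=0.05, steps=5000):
    """Maximize the log-likelihood by gradient ascent (hypothetical settings)."""
    b0 = b1 = 0.0
    for _ in range(steps):
        # Gradient of the log-likelihood: sum of (y - p) and (y - p) * x.
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)
            g0 += err
            g1 += err * x
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1

xs = [1, 2, 3, 4, 5, 6]   # hypothetical covariate, e.g. hours studied
ys = [0, 0, 1, 0, 1, 1]   # hypothetical binary outcome, e.g. pass (1) / fail (0)
b0, b1 = fit_logistic(xs, ys)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, logLik = {log_likelihood(b0, b1, xs, ys):.3f}")
```

Because the outcomes overlap (x = 3 passes while x = 4 fails), the maximum-likelihood estimates are finite; with perfectly separated data they would grow without bound.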
UNIT-5
Short Questions
1. What is a decision tree?
2. What is an undirected graph?
3. Which packages build decision trees in R?
4. What is the use of the rpart() function?
5. What do you mean by a discrete value?
Long Questions
1. Write the names of two metrics that find the best attributes of a decision tree.
2. What is the best classifier of the ID3 algorithm? Explain.
3. Explain the prune() function with syntax and an example.
4. Create a dataset that contains discrete values. Generate the decision tree for it using the ctree() function.
5. Explain the data.tree package, entropy and information gain with examples.
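The entropy and information-gain questions (here and in Part B, Q10 and Q11) can be checked with a short calculation: entropy H = -Σ p·log2(p) over the class proportions, and the gain of a feature is H(target) minus the size-weighted entropy of the subsets the feature creates. ID3 splits on the feature with the highest gain. The Python sketch below uses a hypothetical six-apple dataset; the feature names and labels are illustrative only.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, feature, target):
    """Target entropy minus the weighted entropy after splitting on `feature`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[feature] for r in rows}:
        subset = [r[target] for r in rows if r[feature] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical apple dataset: two discrete features and a yes/no "good" label.
apples = [
    {"color": "red", "size": "big", "good": "yes"},
    {"color": "red", "size": "small", "good": "yes"},
    {"color": "green", "size": "big", "good": "no"},
    {"color": "green", "size": "small", "good": "no"},
    {"color": "red", "size": "big", "good": "yes"},
    {"color": "green", "size": "big", "good": "yes"},
]

gains = {f: information_gain(apples, f, "good") for f in ("color", "size")}
best = max(gains, key=gains.get)   # ID3 picks the feature with the highest gain
print(gains, best)
```

Here every red apple is good, so splitting on color removes more uncertainty (gain of about 0.459 bits) than splitting on size (about 0.044 bits).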
(11) CASE STUDIES
1. Case Study: How Does a Bike-Share Navigate Speedy Success?
Introduction:
Welcome to the Cyclistic bike-share analysis case study! In this case study, you will perform many real-world tasks of a junior data analyst. You will work for a fictional company, Cyclistic, and meet different characters and team members. In order to answer the key business questions, you will follow the steps of the data analysis process: ask, prepare, process, analyze, share, and act. Along the way, the Case Study Roadmap tables (including guiding questions and key tasks) will help you stay on the right path. By the end of this lesson, you will have a portfolio-ready case study. Download the packet and reference the details of this case study anytime. Then, when you begin your job hunt, your case study will be a tangible way to demonstrate your knowledge and skills to potential employers.
Scenario:
You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in
Chicago. The director of marketing believes the company’s future success depends on maximizing the
number of annual memberships. Therefore, your team wants to understand how casual riders and annual
members use Cyclistic bikes differently. From these insights, your team will design a new marketing
strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your
recommendations, so they must be backed up with compelling data insights and professional
data visualizations.
2. Case Study: Telecom Churn Case Study
In the telecom industry, customers are able to choose from multiple service providers and actively switch from one operator to another. In this highly competitive market, the telecommunications industry experiences an average annual churn rate of 15-25%. Given that it costs 5-10 times more to acquire a new customer than to retain an existing one, customer retention has now become even more important than customer acquisition. To reduce customer churn, telecom companies need to predict which customers are at high risk of churn.
Business Goal
In this project, you will analyse customer-level data of a leading telecom firm, build predictive models to
identify customers at high risk of churn and identify the main indicators of churn.
(12) ASSIGNMENT QUESTIONS
Assignment-I Questions
1) Explain in detail how to protect health & safety as you work.
2) Explain workplace safety.
3) Write a short note on Randomized Block Design.
4) Explain various sources of data like sensors and GPS.
5) Write a short note on Accidents & Emergencies.
Assignment-II Questions
1. Explain Data Presentation Architecture in detail.
2. Explain data visualization using Tableau.
3. Explain data visualization using visualization tools.
4. Explain different data visualization tools.
5. Explain the characteristics of Tableau.
INNOVATIVE ASSIGNMENT QUESTIONS
1. Analyze any dataset of your interest available on Kaggle.
2. Build a logistic regression model for prediction.
(13) TOPICS FOR STUDENTS' SEMINARS:
1. Hypothesis Testing
2. Data Visualization
3. Prediction
4. Analysis of a real-time dataset
5. R for Data Analysis
6. Case Study on a Data Analytics Application
(14) STEP/Course material in soft copy
(15) Expert Lectures with topics & Schedules (if any)
A topic on “Explorative Data Analytics” by Prof. D. Lakshmi, Professor, VIT Bhopal, by 20th August 2024.