Tutorial For Course Work


Task A explanation

Hai Huang
Task A

Task A (25 marks): Hive Data Warehouse Design

Please design a data warehouse in Hive with your own data (a collection with at least 50 records in
at least 3 tables). Please implement 10 different queries on that data. Make sure the data and queries
show adequate variety and complexity. Please provide appropriate explanation/discussion and
adequate screenshots to prove your implementation of data, queries, and results of queries.
• You need access to the Hadoop Virtual Machine.
• Follow the week 2 lab document to access your own Hadoop Virtual Machine.
• If you work at home on your laptop, first open the student virtual desktop at
https://rdweb.wvd.microsoft.com/arm/webclient/index.html , then access the Hadoop
Virtual Machine from within the student virtual desktop.
• Step 1: Build CSV files for the Hive tables (you need to build your own data)
• Note that you cannot transfer a file from University desktops (or the student virtual desktop) to your
Hadoop Virtual Machine.
• Instead, use copy/paste to copy the data into a file on the Hadoop VM (for example, copy the content of a CSV file on a university desktop into a file on the Hadoop VM).
• Or you can edit the files directly on the Hadoop VM.
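If Python 3 happens to be available on the Hadoop VM (an assumption; the slides do not say), a CSV file can also be generated by a short script instead of being typed by hand. The sketch below writes 50 placeholder rows whose four columns line up with a table of the form (STRING, STRING, STRING, INT). The function names, the city list and all values are hypothetical and should be replaced with your own data.

```python
import csv
import random


def build_rows(n, seed=0):
    """Return n placeholder rows: three string columns and one integer column."""
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    cities = ["London", "Leeds", "Bristol"]  # hypothetical values
    rows = []
    for i in range(1, n + 1):
        rows.append([f"id{i:03d}", f"name{i}", rng.choice(cities), rng.randint(1, 100)])
    return rows


def write_csv(path, rows):
    """Write rows as plain comma-separated lines.

    No header row is written: a comma-delimited text table would read a
    header line as an ordinary data row. Lines end with "\n" only, so no
    stray carriage return ends up in the last (integer) column.
    Keep commas out of the values, since csv quoting is not understood by
    a simple 'FIELDS TERMINATED BY' text table.
    """
    with open(path, "w", newline="") as f:
        csv.writer(f, lineterminator="\n").writerows(rows)


rows = build_rows(50)
write_csv("file123.csv", rows)
print(len(rows), "rows written to file123.csv")
```

Generated data still has to show variety across at least 3 tables, so a script like this is only a starting point for one of them.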

• Step 2: Start Hadoop (if it is not already running)

• Use the command: start-all.sh
• If Hadoop fails to work properly, restart it with two commands:
• stop-all.sh (to stop the Hadoop services)
• start-all.sh (to start the Hadoop services)

• Step 3: Upload your CSV files to Hadoop

• Use the command hdfs dfs -put. For example: hdfs dfs -put file123.csv (make sure the working directory
contains file123.csv)
• Step 4: Start Hive by entering the command: hive

• Step 5: Create tables. For example:

create table testTable(c1 STRING, c2 STRING, c3 STRING, v4 INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS
TEXTFILE;

• Step 6: Load the data into the tables you have built in Hive. For example:
LOAD DATA INPATH 'file123.csv' OVERWRITE INTO TABLE testTable; (Make sure that file123.csv has been put on Hadoop in step 3)

• Step 7: Design your queries and execute them in Hive

• In your report, include a screenshot of each query and its results in Hive.
• Also include screenshots of the records in each table.
Task B explanation
Hai Huang
Task B

Please design a MapReduce algorithm (using pseudocode or Java code) for the task assigned to you. The
algorithm is expected to be as efficient as possible.

• Task b.1: Output the number of papers by each author for each year.
• Task b.2: Output the average number of papers per conference for each year.
• Task b.3: Output the number of authors by each conference for each year.
• Task b.4: Output the average number of authors per paper for each year.
• Task b.5: Output the number of papers by each conference for each year.
Select your Task B
For example, suppose a student ID is 001374720.
The last digit is “0”, so the student needs to select task b.1.
Please ignore any number after “-” in your student ID.

For example, if your student ID shows “011340894 – 1”, ignore the 1. The digit 4 is then the last digit, so you need to
select task b.3.
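The rule for finding the deciding digit can be sketched as a small helper. The function name is hypothetical. The sketch only extracts the digit, because the slides give just two examples (0 selects b.1, 4 selects b.3) and do not spell out the full digit-to-task table, so that mapping is deliberately not encoded here.

```python
import re


def deciding_digit(student_id: str) -> int:
    """Return the last digit of a student ID, ignoring anything after a dash."""
    # Split on a hyphen, en dash or em dash and keep only the part before it.
    main_part = re.split(r"[-\u2013\u2014]", student_id, maxsplit=1)[0]
    # Drop spaces or any other non-digit characters.
    digits = re.sub(r"\D", "", main_part)
    if not digits:
        raise ValueError(f"no digits found in student ID: {student_id!r}")
    return int(digits[-1])


print(deciding_digit("001374720"))      # 0
print(deciding_digit("011340894 – 1"))  # 4
```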
• Please review the slides of Lecture 6 (Advanced MapReduce
programming, 19 Feb)
• Pseudocode is highly recommended
• Please check the Lecture 6 slides for pseudocode styles
• Design at least two classes: Mapper and Reducer
• Design a Map function for the Mapper and a Reduce function for the Reducer
• Clearly show the input and output key-value pairs for the Map function and the Reduce
function
• To make your algorithm more efficient, you can consider designing:
• Combiner
• In-Mapper combiner
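As an illustration of the Mapper/Reducer structure and of stating input and output key-value pairs, here is a minimal sketch for a generic word count. It is not one of the assigned tasks and is not the Lecture 6 pseudocode style. It is plain Python that imitates the map, shuffle and reduce stages in memory, and the run_job driver is a hypothetical stand-in for the framework.

```python
from collections import defaultdict


class Mapper:
    def map(self, key, value):
        """Input pair: (line index, line text). Output pairs: (word, 1)."""
        for word in value.split():
            yield (word.lower(), 1)


class Reducer:
    def reduce(self, key, values):
        """Input pair: (word, list of counts). Output pair: (word, total)."""
        yield (key, sum(values))


def run_job(lines, mapper, reducer):
    """Imitate the framework: map every line, group by key, then reduce."""
    groups = defaultdict(list)
    for index, line in enumerate(lines):
        for k, v in mapper.map(index, line):
            groups[k].append(v)  # shuffle: collect all values of one key
    results = {}
    for k in sorted(groups):  # keys reach the reducer in sorted order
        for out_k, out_v in reducer.reduce(k, groups[k]):
            results[out_k] = out_v
    return results


lines = ["big data big ideas", "data moves"]
print(run_job(lines, Mapper(), Reducer()))
# {'big': 2, 'data': 2, 'ideas': 1, 'moves': 1}
```

The docstrings play the role the slides ask for: each function states what its input pair and output pairs are.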
• You should also explain how the input is mapped into (key, value) pairs by the
map stage, i.e., specify what is the key and what is the associated value in each
pair, and, how the key(s) and value(s) are computed.

• Then you should explain how the output (key, value) pairs of the map stage are
processed by the reduce stage to get the final answer(s).

• You need to discuss the efficiency of your algorithm (How does your design make
your algorithm efficient?).
Combiner
In-Mapper combiner
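The difference between the two optimisations can be sketched with a toy count over one mapper's input split. This is an in-memory imitation under stated assumptions, not Hadoop code, and every name in it is hypothetical. A combiner aggregates pairs after the mapper has emitted them, while in-mapper combining keeps running totals inside the mapper and emits only once at the end.

```python
from collections import Counter, defaultdict


def plain_map(line):
    """Plain mapper: emit one (word, 1) pair per word."""
    return [(w, 1) for w in line.split()]


def combine(pairs):
    """Combiner: sum the values per key over one mapper's emitted pairs."""
    totals = Counter()
    for k, v in pairs:
        totals[k] += v
    return list(totals.items())


class InMapperCombiningMapper:
    """Mapper that aggregates in memory and emits only when it is closed."""

    def __init__(self):
        self.totals = Counter()

    def map(self, line):
        for w in line.split():
            self.totals[w] += 1  # update state, emit nothing yet

    def close(self):
        return list(self.totals.items())


def reduce_all(pairs):
    """Reducer stand-in: sum all values that share a key."""
    out = defaultdict(int)
    for k, v in pairs:
        out[k] += v
    return dict(out)


split = ["a b a", "b a c"]  # the lines handled by one mapper
plain = [p for line in split for p in plain_map(line)]
combined = combine(plain)
imc = InMapperCombiningMapper()
for line in split:
    imc.map(line)
in_mapper = imc.close()

# Pairs sent to the reduce stage under each design.
print(len(plain), len(combined), len(in_mapper))  # 6 3 3
```

All three variants reduce to the same totals; what changes is how many pairs cross the network. The in-mapper version never creates the intermediate (word, 1) pairs at all, at the cost of holding its totals in memory.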
Task C explanation
Hai Huang
Task C.1
Task C.2
Task C.3

Cloud hosting strategy
