ASSIGNMENT-02 (BIG DATA)
Name: Shaik Mohammad Yunus
Reg No: 12206551
Section: K22GT
Download files to local:
wget https://github.com/logpai/loghub/raw/master/Hadoop/Hadoop_2k.log
wget https://github.com/logpai/loghub/raw/master/Hadoop/Hadoop_2k.log_structured.csv
Create the raw_logs directory
hdfs dfs -mkdir -p /user/ubuntu/raw_logs/
Upload to HDFS:
hdfs dfs -put Hadoop_2k.log /user/ubuntu/raw_logs/
hdfs dfs -put Hadoop_2k.log_structured.csv /user/ubuntu/raw_logs/
Verify
hdfs dfs -ls /user/ubuntu/raw_logs/
DATA SETUP
hdfs dfs -mkdir -p /user/ubuntu/raw_logs
hdfs dfs -mkdir -p /user/ubuntu/processed_logs
hdfs dfs -mkdir -p /user/ubuntu/logs_archive
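To confirm that all three directories exist, the tree can be listed recursively (assuming the same /user/ubuntu home directory):
hdfs dfs -ls -R /user/ubuntu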
File Upload Verification
hdfs dfs -ls /user/ubuntu/raw_logs/
List Files with Details
hdfs dfs -ls -h /user/ubuntu/raw_logs/
Both files show a replication factor of 1; the -h flag prints human-readable sizes.
Sizes: Hadoop_2k.log is 375.9 KB and Hadoop_2k.log_structured.csv is 522.3 KB.
OPERATIONS:
File Copy:
hdfs dfs -cp /user/ubuntu/raw_logs/Hadoop_2k.log /user/ubuntu/processed_logs/
File Rename
hdfs dfs -mv /user/ubuntu/raw_logs/Hadoop_2k.log_structured.csv /user/ubuntu/raw_logs/structured_hadoop_logs.csv
Move File
hdfs dfs -mv /user/ubuntu/raw_logs/structured_hadoop_logs.csv /user/ubuntu/processed_logs/
Delete a File
hdfs dfs -rm /user/ubuntu/raw_logs/Hadoop_2k.log
Preview File Content
hdfs dfs -head /user/ubuntu/processed_logs/Hadoop_2k.log | head -n 20
or
hadoop fs -cat /user/ubuntu/processed_logs/Hadoop_2k.log | head -20
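To inspect the end of the file instead, -tail prints its last kilobyte:
hdfs dfs -tail /user/ubuntu/processed_logs/Hadoop_2k.log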
View File Metadata
hdfs dfs -stat %s,%r,%b /user/ubuntu/processed_logs/structured_hadoop_logs.csv
or
hadoop fsck /user/ubuntu/processed_logs/structured_hadoop_logs.csv -files -blocks -locations
Find Number of Lines in File
hdfs dfs -cat /user/ubuntu/processed_logs/Hadoop_2k.log | wc -l
Search for a String in Logs
hdfs dfs -cat /user/ubuntu/processed_logs/Hadoop_2k.log | grep "ERROR"
Count String Occurrences
hdfs dfs -cat /user/ubuntu/processed_logs/Hadoop_2k.log | grep -o "WARN" | wc -l
Set Replication Factor
hdfs dfs -setrep 2 /user/ubuntu/processed_logs/structured_hadoop_logs.csv
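To make the command block until re-replication actually completes, -setrep also accepts a -w flag (this may take a while on large files):
hdfs dfs -setrep -w 2 /user/ubuntu/processed_logs/structured_hadoop_logs.csv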
Verify Replication Factor
hdfs dfs -ls /user/ubuntu/processed_logs/structured_hadoop_logs.csv
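The replication factor appears in the second column of the -ls output; it can also be printed directly with the %r format specifier:
hdfs dfs -stat %r /user/ubuntu/processed_logs/structured_hadoop_logs.csv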
Check File Blocks
hdfs fsck /user/ubuntu/processed_logs/Hadoop_2k.log -files -blocks
Directory Size
hdfs dfs -du -h /user/ubuntu/processed_logs/
Disk Space Usage
hdfs dfs -du -h /user/ubuntu
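For free and used space across the whole filesystem rather than one directory tree, -df can be used as well:
hdfs dfs -df -h /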
Clean Up Empty Directories
hadoop fs -ls -R /user/ubuntu | awk '$1 ~ /^d/ {print $8}' | while read dir; do
  if [ -z "$(hadoop fs -ls "$dir" 2>/dev/null | tail -n +2)" ]; then
    hadoop fs -rm -r "$dir"
    echo "Deleted: $dir"
  fi
done
Filter Large Files
hdfs dfs -ls -R /user/ubuntu | awk '$5 > 1048576 {print $NF}'
Log Filtering
hadoop fs -cat /user/ubuntu/processed_logs/Hadoop_2k.log | grep "INFO" | hadoop fs -put -f - /user/ubuntu/processed_logs/info_logs.txt
Error Logs Count
hdfs dfs -cat /user/ubuntu/processed_logs/Hadoop_2k.log | grep ERROR | wc -l
Generate Checksums
hadoop fs -checksum /user/ubuntu/processed_logs/structured_hadoop_logs.csv
Set Permissions
hdfs dfs -chmod 755 /user/ubuntu/processed_logs
Set ACLs
Check ACLs
hadoop fs -chmod 770 /user/ubuntu/raw_logs
(Note: chmod only adjusts the basic permission bits; the ACL-specific set/check commands are sketched below.)
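A minimal sketch of the ACL commands themselves, assuming ACLs are enabled on the cluster (dfs.namenode.acls.enabled=true) and using a hypothetical user name hadoopuser:
hdfs dfs -setfacl -m user:hadoopuser:r-x /user/ubuntu/raw_logs
hdfs dfs -getfacl /user/ubuntu/raw_logs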
Append to File
hdfs dfs -cat /user/ubuntu/processed_logs/Hadoop_2k.log | head -50 | tee temp_50_lines.txt
(This step only writes the first 50 lines to a local file; the append itself is sketched below.)
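A minimal sketch of the append, assuming the cluster permits appends to existing files:
hdfs dfs -appendToFile temp_50_lines.txt /user/ubuntu/processed_logs/Hadoop_2k.log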
Merge Logs
hdfs dfs -cat /user/ubuntu/processed_logs/Hadoop_2k.log /user/ubuntu/processed_logs/structured_hadoop_logs.csv | hdfs dfs -put -f - /user/ubuntu/processed_logs/merged_logs.txt
Verify the merged file
hdfs dfs -ls /user/ubuntu/processed_logs/merged_logs.txt
hdfs dfs -cat /user/ubuntu/processed_logs/merged_logs.txt | head -10
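An alternative sketch using -getmerge, which first concatenates the directory's files to local disk (merged.txt is an illustrative local file name; run it before merged_logs.txt exists, or that file will be included as well):
hdfs dfs -getmerge /user/ubuntu/processed_logs/ merged.txt
hdfs dfs -put -f merged.txt /user/ubuntu/processed_logs/merged_logs.txt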
Archive Old Logs
hdfs dfs -ls /user/ubuntu/raw_logs | awk -v date="$(date -d '7 days ago' '+%Y-%m-%d')" '$6 < date {print $8}' | while read file; do
  hdfs dfs -mv "$file" /user/ubuntu/logs_archive/
done
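For long-term storage, the archived logs could additionally be packed into a Hadoop archive (HAR). A minimal sketch, assuming YARN is available to run the archiving MapReduce job and using an illustrative archive name logs.har:
hadoop archive -archiveName logs.har -p /user/ubuntu logs_archive /user/ubuntu
hdfs dfs -ls har:///user/ubuntu/logs.har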