
ASSIGNMENT-02(BIG DATA)

Name: Shaik Mohammad Yunus

RegNo:12206551

Section: K22GT

Download the files to the local machine:

wget https://github.com/logpai/loghub/raw/master/Hadoop/Hadoop_2k.log

wget https://github.com/logpai/loghub/raw/master/Hadoop/Hadoop_2k.log_structured.csv
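
Before uploading, it can help to confirm that both downloads actually landed locally (plain local shell, not HDFS):

ls -lh Hadoop_2k.log Hadoop_2k.log_structured.csv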

Creating raw_logs directory

hdfs dfs -mkdir -p /user/ubuntu/raw_logs/

Upload to HDFS:

hdfs dfs -put Hadoop_2k.log /user/ubuntu/raw_logs/

hdfs dfs -put Hadoop_2k.log_structured.csv /user/ubuntu/raw_logs/

Verify

hdfs dfs -ls /user/ubuntu/raw_logs/


DATA SETUP

hdfs dfs -mkdir -p /user/ubuntu/raw_logs

hdfs dfs -mkdir -p /user/ubuntu/processed_logs

hdfs dfs -mkdir -p /user/ubuntu/logs_archive

File Upload Verification

hdfs dfs -ls /user/ubuntu/raw_logs/

List Files with Details

hdfs dfs -ls -h /user/ubuntu/raw_logs/

(Both files show a replication factor of 1; the -h flag prints sizes in human-readable form.)

SIZE: 375.9 K for Hadoop_2k.log and 522.3 K for Hadoop_2k.log_structured.csv
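
The same size and replication details can also be pulled per file with -stat (a minimal sketch; the format string prints replication, size in bytes, and file name):

hdfs dfs -stat "%r %b %n" /user/ubuntu/raw_logs/Hadoop_2k.log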

OPERATIONS:

File Copy

hdfs dfs -cp /user/ubuntu/raw_logs/Hadoop_2k.log /user/ubuntu/processed_logs/

File Rename

hdfs dfs -mv /user/ubuntu/raw_logs/Hadoop_2k.log_structured.csv /user/ubuntu/raw_logs/structured_hadoop_logs.csv

Move File

hdfs dfs -mv /user/ubuntu/raw_logs/structured_hadoop_logs.csv /user/ubuntu/processed_logs/

Delete a File

hdfs dfs -rm /user/ubuntu/raw_logs/Hadoop_2k.log

Preview File Content

hdfs dfs -head /user/ubuntu/processed_logs/Hadoop_2k.log

or

hadoop fs -cat /user/ubuntu/processed_logs/Hadoop_2k.log | head -20

View File Metadata

hdfs dfs -stat "%b,%r,%o" /user/ubuntu/processed_logs/structured_hadoop_logs.csv

or

hadoop fsck /user/ubuntu/processed_logs/structured_hadoop_logs.csv -files -blocks -locations

Find Number of Lines in File

hdfs dfs -cat /user/ubuntu/processed_logs/Hadoop_2k.log | wc -l

Search for a String in Logs

hdfs dfs -cat /user/ubuntu/processed_logs/Hadoop_2k.log | grep "ERROR"


Count String Occurrences

hdfs dfs -cat /user/ubuntu/processed_logs/Hadoop_2k.log | grep -o "WARN" | wc -l

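To count every level in one pass instead of running one grep per level, a sort/uniq pipeline works (a sketch, assuming the levels of interest are INFO, WARN, ERROR, and FATAL):

hdfs dfs -cat /user/ubuntu/processed_logs/Hadoop_2k.log | grep -oE "INFO|WARN|ERROR|FATAL" | sort | uniq -c
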
Set Replication Factor

hdfs dfs -setrep 2 /user/ubuntu/processed_logs/structured_hadoop_logs.csv

Verify Replication Factor

hdfs dfs -ls /user/ubuntu/processed_logs/structured_hadoop_logs.csv

Check File Blocks

hdfs fsck /user/ubuntu/processed_logs/Hadoop_2k.log -files -blocks

Directory Size

hdfs dfs -du -h /user/ubuntu/processed_logs/

Disk Space Usage

hdfs dfs -du -h /user/ubuntu

Clean Up Empty Directories

hadoop fs -ls -R /user/ubuntu | awk '$1 ~ /^d/ {print $8}' | while read dir; do if [ -z "$(hadoop fs -ls "$dir" 2>/dev/null | tail -n +2)" ]; then hadoop fs -rm -r "$dir"; echo "Deleted: $dir"; fi; done
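
Since this loop deletes recursively, a dry run that only prints the candidate directories can be a safer first pass (same loop, with the rm replaced by an echo):

hadoop fs -ls -R /user/ubuntu | awk '$1 ~ /^d/ {print $8}' | while read dir; do if [ -z "$(hadoop fs -ls "$dir" 2>/dev/null | tail -n +2)" ]; then echo "Would delete: $dir"; fi; done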

Filter Large Files

hdfs dfs -ls -R /user/ubuntu | awk '$5 > 1048576 {print $NF}'
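
Here $5 is the size column (in bytes) of the -ls -R output and 1048576 is 1 MB. A small sketch that makes the threshold adjustable (the THRESHOLD variable name is just illustrative):

THRESHOLD=1048576   # bytes (1 MB); change as needed
hdfs dfs -ls -R /user/ubuntu | awk -v t="$THRESHOLD" '$5 > t {print $5, $NF}'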

Log Filtering

hadoop fs -cat /user/ubuntu/processed_logs/Hadoop_2k.log | grep "INFO" | hadoop fs -put -f - /user/ubuntu/processed_logs/info_logs.txt

Error Logs Count

hdfs dfs -cat /user/ubuntu/processed_logs/Hadoop_2k.log | grep ERROR | wc -l
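
grep -c gives the same count of matching lines without the extra wc step (assuming each ERROR entry sits on its own line):

hdfs dfs -cat /user/ubuntu/processed_logs/Hadoop_2k.log | grep -c ERROR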

Generate Checksums

hadoop fs -checksum /user/ubuntu/processed_logs/structured_hadoop_logs.csv


Set Permissions

hdfs dfs -chmod 755 /user/ubuntu/processed_logs
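
To confirm the new permission string on the directory itself (rather than listing its contents), the -d flag can be used:

hdfs dfs -ls -d /user/ubuntu/processed_logs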

Set ACLs and Check ACLs

hadoop fs -chmod 770 /user/ubuntu/raw_logs
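
The chmod above only adjusts the classic owner/group/other permissions. The dedicated HDFS ACL commands are setfacl and getfacl; a minimal sketch (the user name "hadoopuser" is just a placeholder):

hdfs dfs -setfacl -m user:hadoopuser:r-x /user/ubuntu/raw_logs
hdfs dfs -getfacl /user/ubuntu/raw_logs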

Append to File

hdfs dfs -cat /user/ubuntu/processed_logs/Hadoop_2k.log | head -50 | tee temp_50_lines.txt
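
The tee step only writes a local temp file; the actual HDFS append would use appendToFile (the target path here is just an example and is created if it does not already exist):

hdfs dfs -appendToFile temp_50_lines.txt /user/ubuntu/processed_logs/sample_50_lines.txt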

Merge Logs

hdfs dfs -cat /user/ubuntu/processed_logs/Hadoop_2k.log /user/ubuntu/processed_logs/structured_hadoop_logs.csv | hdfs dfs -put -f - /user/ubuntu/processed_logs/merged_logs.txt

hdfs dfs -ls /user/ubuntu/processed_logs/merged_logs.txt

hdfs dfs -cat /user/ubuntu/processed_logs/merged_logs.txt | head -10

Archive Old Logs

hdfs dfs -ls /user/ubuntu/raw_logs | awk -v date="$(date -d '7 days ago' '+%Y-%m-%d')" '$6 < date {print $8}' | while read file; do hdfs dfs -mv "$file" /user/ubuntu/logs_archive/; done
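
A quick check that the files actually moved (tail -n +2 skips the "Found N items" header line of -ls):

hdfs dfs -ls /user/ubuntu/logs_archive/ | tail -n +2 | wc -l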
