Certificate
THIS IS TO CERTIFY THAT THE WORK EMBODIED IN THE “DATA
SCIENCE & BIG DATA ANALYTICS” PRACTICAL IS THAT OF
BONAFIDE STUDENTS OF THIS INSTITUTE AND THAT IT
HAS BEEN CARRIED OUT BY THEM UNDER THE GUIDANCE OF
“Prof. Gage P.K.” IT IS APPROVED IN PARTIAL
FULFILLMENT OF THE REQUIREMENTS OF SAVITRIBAI PHULE PUNE
UNIVERSITY FOR THE DEGREE OF BACHELOR OF ENGINEERING,
THIRD YEAR OF COMPUTER ENGINEERING.
DATE:- PLACE:-Belhe
NAME: NAVALE RUPESH PANDHARINATH [Link] : 36
Batch :B
Prof. Gage P.K. Prof. Shegar S.R.
(Dept. of Computer Engineering) (Head of Dept. of Computer Engineering)
Dr. Narawade N.S.
(Principal of SGOI COE)
EXPERIMENT NO:01
Data Wrangling I: Perform the following operations using Python on any
open-source dataset ([Link]).
EXPERIMENT NO:02
Data Wrangling II: Create an “Academic Performance” dataset of students
and perform the following operations using Python.
EXPERIMENT NO:03
Provide summary statistics for a dataset with numeric variables
grouped by one of the qualitative variables.
EXPERIMENT NO:04
Create a linear regression model using Python to predict home
prices using the Boston Housing dataset.
EXPERIMENT NO:05
Data Analytics-2
EXPERIMENT NO:06
Data Analytics-3
EXPERIMENT NO:07
Text Analytics
EXPERIMENT NO:08
Data Visualization-1
EXPERIMENT NO:09
Data Visualization-2
EXPERIMENT NO:10
Download the Iris flower dataset or any other dataset into a
DataFrame.
EXPERIMENT NO:11
Write Java code for a simple WordCount application that
counts the number of occurrences of each word in a given input set
using the Hadoop MapReduce framework on a local standalone setup.
Program:-
public class wordcount {
    public static void main(String[] args) {
        // Sample input string
        String text = "this is a wordcount example.";
        // Count words in the input string
        int wordCount = countWords(text);
        // Output the result
        System.out.println("Word count: " + wordCount);
    }

    public static int countWords(String text) {
        // Trim any leading or trailing whitespace
        text = text.trim();
        // If the string is empty, return 0
        if (text.isEmpty()) {
            return 0;
        }
        // Split the text on runs of one or more whitespace characters
        String[] words = text.split("\\s+");
        // Return the number of words in the array
        return words.length;
    }
}
Output:-
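The experiment statement asks for the number of occurrences of each word, whereas the program above reports only the total number of words. A minimal plain-Java sketch of per-word counting is given here. It is an assumption-laden illustration: it does not use the Hadoop API, and the class name WordOccurrences and the method countOccurrences are hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration class; plain Java, not Hadoop MapReduce code.
class WordOccurrences {
    // Lower-cases the text, splits it on non-word characters,
    // and counts how many times each word occurs.
    static Map<String, Long> countOccurrences(String text) {
        return Arrays.stream(text.toLowerCase().split("\\W+"))
                .filter(word -> !word.isEmpty())   // drop empty tokens produced by split
                .collect(Collectors.groupingBy(
                        word -> word,              // key: the word itself
                        TreeMap::new,              // sorted keys for stable output
                        Collectors.counting()));   // value: occurrences per word
    }

    public static void main(String[] args) {
        Map<String, Long> counts =
                countOccurrences("this is a wordcount example, this is");
        counts.forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

Grouping by the word and counting per group plays the role that the shuffle and reduce steps play in a MapReduce job; a Hadoop version would split the same logic between a mapper class and a reducer class.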
EXPERIMENT NO:12
Design a distributed application using MapReduce which
processes a log file of a system.
Program:-
import java.io.*;
import java.util.*;
import java.util.concurrent.*;

public class kalyani {
    // Mapper: splits the input into words and builds a partial word-count map
    public static class Mapper {
        public List<Map<String, Integer>> map(String input) {
            List<Map<String, Integer>> wordCountList = new ArrayList<>();
            Map<String, Integer> wordCountMap = new HashMap<>();
            // Split the input into words (based on non-word characters)
            String[] words = input.split("\\W+");
            for (String word : words) {
                if (!word.isEmpty()) {
                    word = word.toLowerCase();
                    wordCountMap.put(word, wordCountMap.getOrDefault(word, 0) + 1);
                }
            }
            // Add the word counts from this input to the result
            wordCountList.add(wordCountMap);
            return wordCountList;
        }
    }

    // Reducer: aggregates word counts from all mappers
    public static class Reducer {
        public Map<String, Integer> reduce(List<Map<String, Integer>> mappedResults) {
            Map<String, Integer> finalCountMap = new HashMap<>();
            // For each map in the list of results
            for (Map<String, Integer> map : mappedResults) {
                // For each word and its count in the map
                for (Map.Entry<String, Integer> entry : map.entrySet()) {
                    finalCountMap.put(entry.getKey(),
                            finalCountMap.getOrDefault(entry.getKey(), 0) + entry.getValue());
                }
            }
            return finalCountMap;
        }
    }

    // Main method
    public static void main(String[] args) throws InterruptedException,
            ExecutionException, IOException {
        String inputText = "Hello world hello mapreduce hello Java world";
        // Step 1: Map phase (split input into words and create word counts)
        Mapper mapper = new Mapper();
        List<Map<String, Integer>> mappedResults = mapper.map(inputText);
        // Step 2: Reduce phase (aggregate word counts)
        Reducer reducer = new Reducer();
        Map<String, Integer> finalWordCount = reducer.reduce(mappedResults);
        // Step 3: Output the result
        System.out.println("Word Count Results:");
        for (Map.Entry<String, Integer> entry : finalWordCount.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
Output:-
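The experiment statement concerns processing a system log file. A minimal sketch of how the map and reduce steps could count log entries per severity level is given here. It rests on assumptions: each line is taken to have the layout "date time LEVEL message", the class name LogLevelCount is hypothetical, the sample lines in main are made up for illustration, and plain Java stands in for the Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; plain Java, not the Hadoop API.
class LogLevelCount {
    // Assumed line layout: "yyyy-MM-dd HH:mm:ss LEVEL message"
    private static final Pattern LINE =
            Pattern.compile("^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}) (\\w+) (.*)$");

    // Map phase: emit one (level, 1) pair for every line that matches the layout.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line.trim());
            if (m.matches()) {
                pairs.add(Map.entry(m.group(2), 1));
            }
        }
        return pairs;
    }

    // Reduce phase: sum the emitted values for each level.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up sample lines for illustration only.
        List<String> lines = List.of(
                "2025-04-01 10:00:00 INFO Starting system...",
                "2025-04-01 10:05:00 ERROR Disk space low!",
                "2025-04-01 10:10:00 WARNING CPU usage high");
        reduce(map(lines)).forEach((level, n) -> System.out.println(level + ": " + n));
    }
}
```

Lines that do not match the assumed layout are skipped in the map phase, so malformed entries do not affect the totals.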
EXPERIMENT NO:13
Locate a dataset (e.g., Sample_weather.txt) for working on
weather data; read the text input file and find the average
temperature, dew point, and wind speed using Java.
Program:-
# Parse raw log messages into a structured form (timestamp, level, message)
# using Python's regular expressions.
import re
logs = """
2025-04-01 [Link] INFO Starting system...
2025-04-01 [Link] ERROR Disk space low!
2025-04-01 [Link] INFO System running normally
2025-04-01 [Link] ERROR Disk space low!
2025-04-01 [Link] WARNING CPU usage high
"""
pattern = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (.*)"
for line in logs.splitlines():
    if line:
        match = re.match(pattern, line)
        if match:
            timestamp, level, message = match.groups()
            print(f"Timestamp: {timestamp}, Level: {level}, Message: {message}")
Output:-
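The experiment statement asks for the average temperature, dew point, and wind speed computed in Java. A minimal sketch is given here under stated assumptions: the layout of Sample_weather.txt is not shown in this journal, so each data line is assumed to hold four whitespace-separated fields (date, temperature, dew point, wind speed). The class name WeatherAverages and the method averages are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical illustration class; the column layout below is an assumption.
class WeatherAverages {
    // Assumed layout per data line: <date> <temperature> <dewPoint> <windSpeed>
    // Returns {avgTemperature, avgDewPoint, avgWindSpeed}; all NaN if no data line parses.
    static double[] averages(List<String> lines) {
        double[] sums = new double[3];
        int rows = 0;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comment lines
            }
            String[] fields = trimmed.split("\\s+");
            if (fields.length < 4) {
                continue; // skip lines without all assumed columns
            }
            try {
                // Parse all three values first so a bad row adds nothing to the sums.
                double temperature = Double.parseDouble(fields[1]);
                double dewPoint = Double.parseDouble(fields[2]);
                double windSpeed = Double.parseDouble(fields[3]);
                sums[0] += temperature;
                sums[1] += dewPoint;
                sums[2] += windSpeed;
                rows++;
            } catch (NumberFormatException e) {
                // skip header rows or rows with non-numeric values
            }
        }
        double[] result = new double[3];
        for (int i = 0; i < 3; i++) {
            result[i] = rows == 0 ? Double.NaN : sums[i] / rows;
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // File name from the experiment statement unless a path is passed as an argument.
        Path path = Paths.get(args.length > 0 ? args[0] : "Sample_weather.txt");
        if (!Files.exists(path)) {
            System.out.println("Input file not found: " + path);
            return;
        }
        double[] avg = averages(Files.readAllLines(path));
        System.out.printf("Average temperature: %.2f%n", avg[0]);
        System.out.printf("Average dew point: %.2f%n", avg[1]);
        System.out.printf("Average wind speed: %.2f%n", avg[2]);
    }
}
```

If the actual file uses different columns or fixed-width fields, only the field indices in averages need to change.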