Data Science Lab Manual (EDA)

The document outlines the installation and setup of data analysis and visualization tools including Python, R, Tableau Public, and Power BI. It details procedures for installing these tools, performing exploratory data analysis (EDA) on an email dataset, and using libraries like NumPy, Pandas, and Matplotlib for data manipulation and visualization. Additionally, it covers data cleaning and visualization techniques in R, as well as time series analysis methods.

EX.NO:1
DATE:
Install the Data Analysis and Visualization Tool: R / Python / Tableau Public / Power BI

AIM:
To install and set up a data analysis and visualization environment using
Python (with Jupyter Notebook), or other tools like R, Tableau Public, or Power
BI, enabling the user to perform data analysis, visualization, and basic data
science operations.

ALGORITHM / PROCEDURE:
A. Installation of Python (Anaconda Distribution)
1. Download Anaconda:

Visit https://www.anaconda.com/products/distribution

Choose the installer for your operating system (Windows / macOS / Linux).

2. Install Anaconda:

Run the installer and follow on-screen instructions.

Select the option to add Anaconda to the system PATH.

3. Launch Jupyter Notebook:

Open Anaconda Navigator → Click on Jupyter Notebook.


Create a new Python notebook (.ipynb).

4. Verify Installation:

Import key data science libraries like NumPy, Pandas, and Matplotlib.
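
A minimal verification sketch (assuming a fresh Anaconda install; run it in a new notebook cell):

# Verify that the core data science stack imports, and report versions
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

print("NumPy:", np.__version__)
print("Pandas:", pd.__version__)

# Quick end-to-end plotting check
df = pd.DataFrame({"x": np.arange(10), "y": np.random.rand(10)})
sns.lineplot(data=df, x="x", y="y")
plt.title("Installation check")
plt.show()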

B. Installation of R (optional alternative)


1. Download and install R from https://cran.r-project.org/.

2. Download and install RStudio (IDE) from https://posit.co/download/rstudio-desktop/.

3. Launch RStudio and test with a simple script.

C. Installation of Tableau Public / Power BI


Tableau Public: Download from https://public.tableau.com/en-us/s/ and install.

Power BI: Download from the Microsoft Store or https://powerbi.microsoft.com/.

Load a sample dataset and verify visualization functionality.

RESULT:
The data analysis and visualization environment was successfully
installed and configured using Python (Anaconda Distribution).
Basic data science libraries such as NumPy, Pandas, Matplotlib, and
Seaborn were verified and tested with a sample visualization.
EX.NO:2
DATE:
Perform Exploratory Data Analysis (EDA) with an Email Dataset — Import, Visualize, and Derive Insights

AIM:
To perform Exploratory Data Analysis (EDA) on an email dataset by
importing exported email data into a Pandas DataFrame, cleaning and
exploring it, visualizing patterns, and deriving meaningful insights about the
data.

ALGORITHM / PROCEDURE:

Data Collection:
Export email data in .csv or .xlsx format (e.g., from Outlook, or from Gmail via Google Takeout; note that Takeout exports mail as an MBOX file, which needs conversion to CSV, as sketched below this step).

The dataset may include columns like Sender, Receiver, Subject, Date,
Time, Message Length, Folder (Inbox/Sent/Spam), etc.
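
A hedged conversion sketch using Python's standard mailbox module (the Takeout file name below is an assumption; adjust it to your export):

# Convert a Gmail Takeout MBOX export into the CSV layout used by the program below
import mailbox
import csv

mbox = mailbox.mbox('All mail Including Spam and Trash.mbox')  # assumed Takeout file name
with open('emails.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['Date', 'From', 'To', 'Subject'])
    for msg in mbox:
        writer.writerow([msg['Date'], msg['From'], msg['To'], msg['Subject']])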

Import the Dataset:


Load the dataset into a Pandas DataFrame using
pd.read_csv() or pd.read_excel().

Data Cleaning:
Check for missing values, duplicates, and invalid entries.
Convert date/time columns to datetime format.

Exploratory Data Analysis (EDA):


Display dataset information (.info(), .describe()).

Check distributions and relationships between variables.

Analyze patterns such as:

Number of emails sent/received per day.

Most frequent senders/receivers.

Common keywords in subjects.

Visualization:

Use Matplotlib and Seaborn for data visualization.

Create plots such as:

Bar charts for most frequent senders.

Line charts for emails per day.

Pie chart for email categories.

Word cloud for subject keywords (optional).

Derive Insights:
Summarize findings from the EDA and visualizations.

PROGRAM:
# Importing required libraries
import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

from wordcloud import WordCloud

# Step 1: Load the dataset

data = pd.read_csv('emails.csv')

# Step 2: Display basic info

print("Dataset Info:")

print(data.info())

print("\nFirst 5 Records:")

print(data.head())

# Step 3: Data Cleaning

data.drop_duplicates(inplace=True)

data['Date'] = pd.to_datetime(data['Date'], errors='coerce')

# Step 4: Basic EDA

print("\nSummary Statistics:")

print(data.describe())
# Number of emails per day

emails_per_day = data['Date'].dt.date.value_counts().sort_index()
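
# (Assumed addition) Report the single busiest day derived above
print("Busiest day:", emails_per_day.idxmax(), "with", emails_per_day.max(), "emails")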

# Step 5: Visualization

# 1. Emails per day

plt.figure(figsize=(10,5))

emails_per_day.plot(kind='line', color='blue')

plt.title('Number of Emails per Day')

plt.xlabel('Date')

plt.ylabel('Count')

plt.grid(True)

plt.show()

# 2. Top 10 Senders

plt.figure(figsize=(10,5))

top_senders = data['From'].value_counts().head(10)

sns.barplot(x=top_senders.index, y=top_senders.values, palette='viridis')

plt.title('Top 10 Senders')

plt.xlabel('Sender')

plt.ylabel('Email Count')

plt.xticks(rotation=45)

plt.show()
# 3. Word Cloud for Subject Keywords

text = " ".join(str(subj) for subj in data['Subject'].dropna())

wordcloud = WordCloud(width=800, height=400, background_color='white').generate(text)

plt.figure(figsize=(10,5))

plt.imshow(wordcloud, interpolation='bilinear')

plt.axis('off')

plt.title('Most Common Words in Email Subjects')

plt.show()
OUTPUT:

1. Dataset Info
Dataset Info:

<class 'pandas.core.frame.DataFrame'>

RangeIndex: 1000 entries, 0 to 999

Data columns (total 5 columns):

# Column Non-Null Count Dtype

--- ------ -------------- -----

0 Date 1000 non-null datetime64[ns]

1 From 1000 non-null object

2 To 1000 non-null object

3 Subject 980 non-null object

4 Folder 1000 non-null object

dtypes: datetime64[ns](1), object(4)

2. First 5 Records
Date From To Subject Folder

0 2023-05-02 [email protected] [email protected] Project Update - Phase 2 Inbox

1 2023-05-03 [email protected] [email protected] Interview Schedule Inbox

2 2023-05-04 [email protected] [email protected] Meeting Notes - Review Sent

3 2023-05-05 [email protected] [email protected] Order Confirmation #12345 Inbox

4 2023-05-06 [email protected] [email protected] Win a Free iPhone! Spam


3. Summary Statistics

Summary Statistics:

Date

count 1000

unique 300

top 2023-07-15

freq 15

Name: Date, dtype: object

RESULT:

Exploratory Data Analysis (EDA) was successfully performed on the email dataset.
Different visualizations helped identify:

 The most frequent senders and active dates.
 Distribution of emails over time.
 Common keywords in email subjects.
This experiment demonstrates how EDA techniques reveal trends and
patterns in textual and time-based data.
EX.NO:3
DATE:
Working with NumPy Arrays, Pandas DataFrames, and Basic Plots using Matplotlib

AIM:
To understand and demonstrate the creation, manipulation, and analysis
of NumPy arrays and Pandas DataFrames, and to visualize data using basic
Matplotlib plots such as line plots, bar charts, histograms, and scatter plots.

ALGORITHM / PROCEDURE:

Import Required Libraries

Import the numpy, pandas, and matplotlib.pyplot libraries.

Create and Manipulate NumPy Arrays

Create 1D and 2D arrays using np.array().

Perform basic operations: addition, multiplication, slicing, and reshaping.

Create and Explore Pandas DataFrames

Create a DataFrame using a dictionary or CSV file.

Display rows and columns using .head(), .tail(), .info(), .describe().

Perform column operations and basic statistical analysis.


Visualize Data Using Matplotlib

Create different types of plots:

 Line plot
 Bar chart
 Histogram
 Scatter plot

Add titles, labels, legends, and grid for better readability.

Interpret the Output

Analyze the visualizations and draw conclusions.


PROGRAM :

# Import required libraries

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

# Part 1: Working with NumPy

# Creating arrays

arr1 = np.array([10, 20, 30, 40, 50])

arr2 = np.array([[1, 2, 3], [4, 5, 6]])

print("1D Array:", arr1)

print("2D Array:\n", arr2)

# Array operations

print("Array Sum:", arr1.sum())

print("Array Mean:", arr1.mean())

print("Array Slicing:", arr1[1:4])

print("Reshaped 2D Array:\n", arr2.reshape(3, 2))


# Part 2: Working with Pandas

# Creating a DataFrame

data = {

'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],

'Age': [24, 27, 22, 32, 29],

'Marks': [88, 92, 95, 70, 85]
}

df = pd.DataFrame(data)

print("\nDataFrame:\n", df)

# Display basic info

print("\nDataFrame Info:")

print(df.info())

print("\nStatistical Summary:\n", df.describe())

# Part 3: Data Visualization using Matplotlib

# Line Plot

plt.figure(figsize=(6,4))

plt.plot(df['Name'], df['Marks'], marker='o', color='green')


plt.title('Line Plot - Student Marks')

plt.xlabel('Name')

plt.ylabel('Marks')

plt.grid(True)

plt.show()

# Bar Chart

plt.bar(df['Name'], df['Age'], color='orange')

plt.title('Bar Chart - Student Ages')

plt.xlabel('Name')

plt.ylabel('Age')

plt.show()

# Histogram

plt.hist(df['Marks'], bins=5, color='skyblue', edgecolor='black')

plt.title('Histogram - Marks Distribution')

plt.xlabel('Marks')

plt.ylabel('Frequency')

plt.show()

# Scatter Plot

plt.scatter(df['Age'], df['Marks'], color='red')

plt.title('Scatter Plot - Age vs Marks')

plt.xlabel('Age')
plt.ylabel('Marks')

plt.grid(True)

plt.show()
OUTPUT:
1. NumPy Array Output

1D Array: [10 20 30 40 50]

2D Array:

[[1 2 3]

[4 5 6]]

Array Sum: 150

Array Mean: 30.0

Array Slicing: [20 30 40]

Reshaped 2D Array:

[[1 2]

[3 4]

[5 6]]

2. Pandas DataFrame Output

Name Age Marks

0 Alice 24 88

1 Bob 27 92

2 Charlie 22 95

3 David 32 70

4 Eva 29 85
RESULT:
Successfully demonstrated the creation and manipulation of NumPy arrays and
Pandas DataFrames, along with visualization of data using Matplotlib.
The experiment highlights how Python’s core data science libraries enable
numerical computation, tabular data handling, and effective graphical
representation of information.
EX.NO:4
DATE:
Data Cleaning and Visualization: Exploring Filters and Plot Features in R

AIM:
To understand and demonstrate how to filter variables (columns) and rows in R
for data cleaning purposes, and to apply various plotting features on sample
datasets for visual analysis.

ALGORITHM / PROCEDURE:

Load Required Libraries


Use the tidyverse package for data manipulation (dplyr) and visualization
(ggplot2).

Load Sample Dataset


Use built-in datasets such as mtcars, iris, or diamonds.

Explore and Inspect Data


View the structure, summary, and first few rows of the dataset using functions like
head(), str(), summary().

Apply Variable (Column) Filters


Select specific columns using the select() function from dplyr.
Remove unnecessary columns for cleaning.

Apply Row Filters


Use the filter() function to include or exclude rows based on specific conditions.

Example: select only cars with mpg > 20 from the mtcars dataset.

Handle Missing or Invalid Data (if any)

Use na.omit() or is.na() to remove or identify missing values.

Visualization using ggplot2


Create various plots using ggplot():

o Bar Plot
o Histogram
o Scatter Plot
o Box Plot

Add titles, axis labels, and color themes for clarity.

Analyze and Interpret the Output


PROGRAM (R):
# Load required libraries

library(dplyr)

library(ggplot2)

# Step 1: Load sample dataset

data("mtcars")

# Step 2: Explore dataset

head(mtcars)

str(mtcars)

summary(mtcars)

# Step 3: Variable (Column) Filtering

selected_data <- select(mtcars, mpg, cyl, hp, wt)

print("Selected Columns:")

print(head(selected_data))

# Step 4: Row Filtering

filtered_data <- filter(selected_data, mpg > 20)

print("Filtered Rows where mpg > 20:")

print(filtered_data)
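
# (Assumed extra example) filter() also accepts combined conditions,
# e.g. high-mileage four-cylinder cars
print(filter(selected_data, mpg > 25 & cyl == 4))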

# Step 5: Handling Missing Values (if any)

cleaned_data <- na.omit(filtered_data)


# Step 6: Visualization

# (a) Scatter Plot - Horsepower vs. MPG

ggplot(cleaned_data, aes(x = hp, y = mpg)) +

geom_point(color = "blue", size = 3) +

ggtitle("Scatter Plot: Horsepower vs. Mileage (mpg)") +

xlab("Horsepower (hp)") +

ylab("Miles per Gallon (mpg)") +

theme_minimal()

# (b) Bar Plot - Count of Cylinders

ggplot(cleaned_data, aes(x = factor(cyl))) +

geom_bar(fill = "orange") +

ggtitle("Bar Plot: Number of Cars by Cylinders") +

xlab("Cylinders") +

ylab("Count") +

theme_bw()

# (c) Box Plot - Weight vs. MPG

ggplot(cleaned_data, aes(x = factor(cyl), y = wt, fill = factor(cyl))) +

geom_boxplot() +

ggtitle("Box Plot: Weight Distribution by Cylinders") +

xlab("Cylinders") +

ylab("Weight (wt)") +

theme_classic()
OUTPUT:
1. Data Inspection

'data.frame': 32 obs. of 11 variables:

$ mpg: Miles per gallon (numeric)

$ cyl: Number of cylinders (numeric)

$ disp: Displacement (numeric)

$ hp : Horsepower (numeric)

$ drat: Rear axle ratio (numeric)

$ wt : Weight (numeric)

2. Filtered Data Example

Selected Columns:

mpg cyl hp wt

1 21.0 6 110 2.620

2 21.0 6 110 2.875

3 22.8 4 93 2.320

4 24.4 4 62 2.200

5 22.8 4 95 2.320

Filtered Rows (mpg > 20):

mpg cyl hp wt

1 21.0 6 110 2.620

2 21.0 6 110 2.875

3 22.8 4 93 2.320

4 24.4 4 62 2.200
RESULT:
Successfully explored row and variable filtering techniques in R for data
cleaning using dplyr functions such as select() and filter().
Additionally, different visualization techniques were applied using ggplot2,
providing insights into relationships among key variables.
The experiment demonstrates the use of R as a powerful tool for data wrangling
and visualization.
EX.NO:5
DATE:
Perform Time Series Analysis and Apply Various Visualization Techniques

AIM:
To perform time series analysis on a dataset and apply various visualization
techniques to understand trends, seasonality, and patterns using R.

ALGORITHM :

Start the R environment (RStudio or R GUI).

Load the necessary libraries for time series analysis and visualization:

 ggplot2 for plotting.
 forecast for time series analysis and forecasting.
 tseries for additional time series functions.

Import or load a time series dataset:

 Use a built-in dataset like AirPassengers, or import your own dataset using read.csv().

Convert the dataset into a time series object using the ts() function if it is
not already in time series format.
Display the dataset information:

 Use functions such as head(), summary(), and str() to understand the structure and statistics of the data.

Plot the original time series using the plot() function to visualize the overall
trend and fluctuations over time.

Decompose the time series into its components:

 Use the decompose() or stl() functions to separate the data into trend, seasonal, and random (residual) components.
 Plot the decomposed components to study their behavior.

Apply visualization techniques using ggplot2 and forecast package functions:

 autoplot() for enhanced plots.
 ggseasonplot() to visualize seasonal patterns across years.
 ggsubseriesplot() to identify monthly or periodic variations.

(Optional) Perform forecasting:

 Use auto.arima() or ets() to fit forecasting models.
 Predict future values using the forecast() function.
 Visualize the forecast with autoplot().

Interpret the visualizations to identify:

 Trends (increasing or decreasing behavior over time).
 Seasonality (repeating patterns at regular intervals).
 Residuals (random fluctuations).

Stop the program and record the observations and results.


PROGRAM (in R):
# Load required libraries

library(ggplot2)

library(forecast)

library(tseries)

# Load a sample time series dataset

data("AirPassengers")

# Display dataset information

print(head(AirPassengers))

summary(AirPassengers)

# Basic time series plot

plot(AirPassengers,

main = "Monthly Air Passengers Data",

ylab = "Number of Passengers",

xlab = "Year",

col = "blue")

# Decompose the time series

decomposed <- decompose(AirPassengers)

plot(decomposed)
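
# (Assumed alternative, per the algorithm) stl() gives a loess-based
# seasonal-trend decomposition of the same series
stl_fit <- stl(AirPassengers, s.window = "periodic")
plot(stl_fit)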
# Seasonal plot using ggplot2 (forecast package)

autoplot(AirPassengers) +

ggtitle("Time Series Plot of Air Passengers") +

xlab("Year") + ylab("Passengers")

# Seasonal pattern visualization

ggseasonplot(AirPassengers, year.labels = TRUE, year.labels.left = TRUE) +

ggtitle("Seasonal Plot: Air Passengers")

# Subseries plot

ggsubseriesplot(AirPassengers) +

ggtitle("Subseries Plot: Air Passengers")

# Optional: Forecasting

fit <- auto.arima(AirPassengers)

forecast_values <- forecast(fit, h = 12)

autoplot(forecast_values)
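
# (Assumed addition) Residual diagnostics for the fitted model, via
# checkresiduals() from the forecast package
checkresiduals(fit)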
OUTPUT:
1. Display of Dataset (First Few Records):

> head(AirPassengers)

[1] 112 118 132 129 121 135

2. Summary of the Dataset:

> summary(AirPassengers)
Min. 1st Qu. Median Mean 3rd Qu. Max.
104.0 180.0 265.5 280.3 360.5 622.0

RESULT:
Time series analysis was successfully performed on the AirPassengers dataset.
The analysis revealed a strong upward trend and clear seasonal patterns in the
data.
Various visualization techniques such as decomposition, seasonal, and
forecasting plots effectively demonstrated these time-based behaviors.
EX.NO:6
DATE:
Perform Data Analysis and Representation on a Map Using Various Map Datasets with Mouse Rollover Effect and User Interaction

AIM:
To analyze spatial data and represent it visually on an interactive map using R.
The experiment demonstrates mouse rollover effects, user interactions (zoom,
pan, popups), and data visualization using different map datasets.

ALGORITHM:
Start the R environment (RStudio or R GUI).

Install and load the required libraries:

 leaflet – for interactive map visualization.
 dplyr – for data manipulation.
 sf – for handling spatial data.
 maps or rnaturalearth – for world or country-level geographic data.

Load a map dataset:

 Use built-in world/country shapefiles or import custom data using st_read().
 Example: use rnaturalearth to get a world map, or maps for U.S. states.

Perform data analysis:


 Analyze attributes such as population, density, or GDP associated with
geographic regions.
 Use dplyr to summarize or group the data as needed.

Merge the geographic data with analytical results to create a data frame
containing spatial and statistical data.

Create an interactive map using leaflet:

 Initialize the map with leaflet() and addTiles().
 Add polygons or markers representing each region.
 Use color palettes (colorNumeric or colorBin) to visualize data intensity.

Add interactivity:

 Use addPopups() or addLabelOnlyMarkers() for tooltips and rollovers.
 Add legends, layers, and zoom controls for a better user experience.

Display the map:

 Render the interactive map in the RStudio Viewer or a web browser.

PROGRAM (in R):

# Load necessary libraries

library(leaflet)

library(dplyr)

library(rnaturalearth)

library(rnaturalearthdata)

library(sf)

# Load world map data

world <- ne_countries(scale = "medium", returnclass = "sf")

# Create a sample dataset (Population data)

data <- world %>%
  mutate(area_km2 = as.numeric(st_area(geometry)) / 1e6,  # area in km²; rnaturalearth ships no ready-made area column
         pop_density = pop_est / area_km2)

# Define color palette based on population density

pal <- colorNumeric(palette = "YlOrRd", domain = data$pop_density)

# Create interactive map

leaflet(data = data) %>%

addTiles() %>%

addPolygons(

fillColor = ~pal(pop_density),

weight = 1,
opacity = 1,

color = "white",

dashArray = "3",

fillOpacity = 0.7,

highlight = highlightOptions(

weight = 3,

color = "#666",

dashArray = "",

fillOpacity = 0.7,

bringToFront = TRUE

),

label = ~paste0(name, ": ", round(pop_density, 2), " people/km²"),

labelOptions = labelOptions(

style = list("font-weight" = "normal", padding = "3px 8px"),

textsize = "13px",

direction = "auto"

  )
) %>%

addLegend(

pal = pal,

values = ~pop_density,

opacity = 0.7,

title = "Population Density",

position = "bottomright"

)
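
# (Assumed addition) To keep or share the map, assign the pipeline above to a
# variable, e.g. m <- leaflet(data = data) %>% ..., then save it with
# htmlwidgets::saveWidget(m, "population_density_map.html")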
OUTPUT:
After executing the R program, the following outputs are observed:

1. Map Visualization Output:
o A world map appears in the RStudio Viewer or default web browser.
o Each country is filled with a color gradient representing population density (calculated as population per square kilometer).
2. Color Coding:
o Countries with higher population density appear in dark red, while those with lower density appear in light yellow.
3. Mouse Rollover Effect:
o When the mouse pointer is moved over a country:
 The country region highlights with a bold border.
 A tooltip label appears showing:

Country Name: Population Density

Example:

India: 420.52 people/km²

China: 380.11 people/km²

Australia: 3.24 people/km²

User Interaction Features:

The user can:

o Zoom in and out of the map using the zoom control buttons.
o Pan or drag the map to view different regions.
o Hover over countries to view their details dynamically.

Legend Output:

 A color legend is displayed at the bottom right corner of the map.
 It indicates how color intensity corresponds to population density:

Yellow → Low Density

Orange → Moderate Density

Red → High Density


Sample Console Output (if any):

> library(leaflet)

> library(rnaturalearth)

> library(dplyr)

> leaflet(data = data)

Rendering map... done!

RESULT:
An interactive world map was successfully generated using the Leaflet library in R.
The map displayed population density for each country with color-coded
visualization, along with mouse rollover effects, zooming, and user interaction
controls, enabling effective and interactive spatial data analysis.
EX.NO:7
DATE:
Perform Exploratory Data Analysis (EDA) on the Wine Quality Dataset

AIM:
To perform Exploratory Data Analysis (EDA) on the Wine Quality dataset to
understand its structure, identify patterns, detect missing values, and analyze
relationships between various chemical properties and wine quality using R.

ALGORITHM:

 Start RStudio or R environment.

 Load required libraries:

 ggplot2 for visualization.
 dplyr for data manipulation.
 corrplot for correlation analysis.
 readr for reading CSV data.

 Import the dataset:

 Load the Wine Quality Dataset (e.g., winequality-red.csv or winequality-white.csv) using read.csv() or readr::read_csv().

 Inspect the dataset:

 Use functions such as head(), str(), summary(), and dim() to understand data structure and summary statistics.

 Check for missing values:

 Use sum(is.na(data)) to find missing or null values.

 Perform univariate analysis:

 Plot histograms, boxplots, or density plots for numerical variables to study their distributions.

 Perform bivariate analysis:

 Use scatter plots and correlation matrices to analyze relationships between predictors and wine quality.

 Calculate correlation matrix:

 Compute correlations using cor() and visualize using corrplot().

 Perform feature analysis:

 Identify features most strongly correlated with quality.

 Draw inferences and observations based on the visualizations and statistical results.

 End.
PROGRAM (in R):

# Load required libraries

library(ggplot2)

library(dplyr)

library(corrplot)

library(readr)

# Import the dataset
# Note: the original UCI file is semicolon-delimited; for that version use
# read_delim("winequality-red.csv", delim = ";") or read.csv(..., sep = ";")
wine <- read_csv("winequality-red.csv")

# View the structure and summary of the dataset

str(wine)

summary(wine)

dim(wine)

# Check for missing values

sum(is.na(wine))

# Univariate Analysis

ggplot(wine, aes(x = quality)) +

geom_bar(fill = "skyblue") +

ggtitle("Distribution of Wine Quality") +

xlab("Quality") + ylab("Count")
# Boxplot for alcohol vs quality

ggplot(wine, aes(x = as.factor(quality), y = alcohol, fill = as.factor(quality))) +

geom_boxplot() +

ggtitle("Alcohol Content vs Wine Quality") +

xlab("Wine Quality") + ylab("Alcohol (%)")

# Correlation matrix

corr_matrix <- cor(wine %>% select(-quality))

corrplot(corr_matrix, method = "color", type = "upper", tl.col = "black", tl.srt = 45)
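
# (Assumed addition, for the feature-analysis step) Correlation of every
# column with quality, sorted from strongest positive to strongest negative
print(sort(cor(wine)[, "quality"], decreasing = TRUE))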

# Scatter plot: alcohol vs density

ggplot(wine, aes(x = alcohol, y = density, color = as.factor(quality))) +

geom_point(alpha = 0.6) +

ggtitle("Alcohol”)
OUTPUT:
1. Dataset Summary:

> dim(wine)

[1] 1599 12

> head(wine)

fixed.acidity volatile.acidity citric.acid residual.sugar chlorides ...

7.4 0.70 0.00 1.9 0.076

...

Missing Values Check:

> sum(is.na(wine))

[1] 0

RESULT:
Exploratory Data Analysis (EDA) was successfully performed on the Wine
Quality Dataset using R.
The analysis revealed that alcohol, sulphates, and citric acid are positively correlated with wine quality, while volatile acidity negatively affects quality.
The dataset is clean, with no missing values, and visualizations effectively
represent important data patterns.
