
Visualisations

The document presents an analysis of 18 CSV data files focusing on 'Time frames' and 'Frequencies' to identify patterns and correlations. It includes visualizations such as heatmaps, scatter plots, and correlation matrices, revealing insights about frequency distributions, relationships between frequency columns, and autocorrelation behaviors. Key findings indicate that freq_p9_1 consistently shows higher values, while freq_p5_1 has lower values, and certain frequency columns exhibit periodic behavior and long-term influences.

Uploaded by

Jansi Goswami
Copyright
© All Rights Reserved

Tech Atriocare - Jansi Goswami

Data Analysis
Brief analysis of 18 data files:

Introduction

We have 18 CSV files of data, each with two columns: ‘Time frames’ and ‘Frequencies’.
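The visualizations in this document appear to treat the 18 files as one table with a column per file (freq_p3_1, freq_p4_1, and so on). A minimal pandas sketch of that combining step follows. It assumes each CSV has the headers "Time frame" and "Frequency", that time frames are unique within a file, and that the file stem becomes the column name; the function name load_frequency_files is hypothetical.

```python
from pathlib import Path

import pandas as pd


def load_frequency_files(paths):
    """Combine two-column CSV files into one wide DataFrame.

    Hypothetical assumptions: every file has the headers "Time frame" and
    "Frequency", time frames are unique within a file, and the file stem
    (e.g. "freq_p3_1") becomes the column name.
    """
    series = {}
    for path in map(Path, paths):
        frame = pd.read_csv(path)
        # Index by time frame so that the files line up when combined
        series[path.stem] = frame.set_index("Time frame")["Frequency"]
    # Columns are aligned on the union of all time frames; gaps become NaN
    return pd.DataFrame(series).sort_index()
```

For example, load_frequency_files(sorted(Path("data").glob("*.csv"))) would combine every CSV in a hypothetical data folder.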

What are we doing?


We have to find patterns within these files, and the correlation between them.

To understand the story our data tells, we have used data analysis, presented through the
charts and graphs below.
Heatmap representation.

What This Code Does:

It creates a heatmap to show the distribution of frequency values across different time
frames and frequency columns.

The color intensity indicates the magnitude of the frequency values at specific time frames.
With the color map used here, darker colors (purple) represent lower values and brighter
colors (yellow) represent higher values.
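The heatmap's code is not reproduced in this document. As a hedged sketch, assuming a wide DataFrame indexed by a numeric time frame with one column per frequency series, the time axis can be averaged into equal-width bins so that each row of the result is one frequency column and each cell is a mean value. The function name heatmap_matrix and the default of 50 bins are assumptions.

```python
import pandas as pd


def heatmap_matrix(wide, n_bins=50):
    """Average each frequency column over equal-width time-frame bins.

    Assumes `wide` is indexed by a numeric time frame with one column per
    frequency series. Rows of the result are the frequency columns (y-axis)
    and columns are the time bins (x-axis).
    """
    bins = pd.cut(wide.index, bins=n_bins)
    # Mean value of every column inside each time bin; empty bins give NaN
    binned = wide.groupby(bins, observed=False).mean()
    return binned.T
```

The returned matrix could then be drawn with, for example, seaborn.heatmap(matrix, cmap="viridis"). The viridis color map runs from dark purple (low) to yellow (high), which appears consistent with the color bar described for this heatmap, though the original color map is not stated.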

Interpretation of the Heatmap:

1. Y-Axis (Frequency Columns): The y-axis lists the frequency columns being visualized.
In this heatmap, the columns are:
○ freq_p3_1
○ freq_p4_1
○ freq_p5_1
○ freq_p9_1

2. X-Axis (Time Frame): The x-axis represents different time frames. The labels indicate
sequential time frames, possibly aggregated or resampled to fit into the heatmap.

3. Color Coding: The color bar on the right side of the graph shows the range of frequency
values, from around 1950 (dark purple) to 2250 (yellow). The colors in the
heatmap represent the actual frequency values for each time frame:

○ Yellow areas represent higher frequency values (close to 2250).

○ Purple areas represent lower frequency values (close to 1950).

○ Green and blue shades represent mid-range frequency values.

4. Insights from Color Patterns:

○ The freq_p9_1 column shows consistently high frequency values across most
time frames, indicated by the prominent yellow color. This suggests that
freq_p9_1 generally has higher frequency values than the other columns.

○ The freq_p5_1 column has a lower frequency range, as indicated by the
purple and blue colors.

○ The freq_p3_1 and freq_p4_1 columns show variations but generally have
mid to high-frequency values, indicated by green and blue colors.

Summary:

● The heatmap helps visualize the distribution and variation of frequency values
across different time frames for the specified columns.

● The freq_p9_1 column stands out due to its consistently higher frequency values,
while freq_p5_1 tends to have lower values.

● This visualization effectively shows the temporal behavior of these frequency
columns, helping identify trends or anomalies in specific columns over time.

Frequency over Time frame.

Insights:

● The scatter plot shows a predominant frequency range (between 2080 and 2140),
suggesting stable conditions.
● Occasional drops in frequency values could indicate anomalies or disturbances in
the system.
● The clustering of data points suggests a high sampling rate or precision, capturing
fine details of the frequency changes.
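One simple way to pick out the occasional drops is a robust threshold. This sketch flags points lying more than k median absolute deviations (MAD) below the median; the method, the function name flag_drops and the default k = 5 are illustrative assumptions, not the approach used for the scatter plot.

```python
import numpy as np


def flag_drops(values, k=5.0):
    """Boolean mask of points lying more than k MADs below the median.

    The median and median absolute deviation (MAD) are used rather than
    the mean and standard deviation so that the drops themselves barely
    shift the threshold. The default k is an illustrative choice.
    """
    x = np.asarray(values, dtype=float)
    median = np.median(x)
    mad = np.median(np.abs(x - median))
    return x < median - k * mad
```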

Distribution of Frequency.


The histogram with the density curve shows that Frequency is roughly normally distributed
but skewed slightly to the left (a longer tail toward lower values), indicating that most
values are concentrated toward the higher end, closer to the maximum frequency.
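The direction of skew can be checked numerically. This sketch computes the Fisher-Pearson coefficient of skewness, where a negative result corresponds to a left skew. The function name is hypothetical; scipy.stats.skew with its default bias=True computes the same quantity, while pandas Series.skew applies a small-sample adjustment.

```python
import numpy as np


def sample_skewness(values):
    """Fisher-Pearson skewness: third central moment over variance ** 1.5.

    Negative values mean a longer left tail, i.e. most observations sit
    toward the higher end of the range.
    """
    x = np.asarray(values, dtype=float)
    deviations = x - x.mean()
    m2 = np.mean(deviations ** 2)  # second central moment (variance)
    m3 = np.mean(deviations ** 3)  # third central moment
    return m3 / m2 ** 1.5
```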

Correlation Matrix of frequency columns.

Understanding the Correlation Matrix

The correlation matrix shows the relationships between different frequency columns in
your dataset. In simple terms, it measures how much one column moves in the same or
opposite direction as another column. Each cell in the matrix represents the correlation
between two columns, ranging from -1 (perfect negative correlation) to +1 (perfect positive
correlation).
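Assuming the matrix uses the Pearson coefficient (the default of pandas DataFrame.corr; the method is not stated in the original), the cell for two columns x and y observed at N time frames is:

```latex
r_{xy} = \frac{\sum_{t=1}^{N} (x_t - \bar{x})(y_t - \bar{y})}
              {\sqrt{\sum_{t=1}^{N} (x_t - \bar{x})^2}\,\sqrt{\sum_{t=1}^{N} (y_t - \bar{y})^2}}
```

The numerator measures co-movement around the means; dividing by both spreads is what confines r to the range -1 to +1.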

1. Positive Correlation (Red Cells):

○ When two columns have a positive correlation (close to +1), it means that as
one column increases, the other tends to increase as well.

○ For example, freq_p8_1 and freq_p8_2 have a moderate positive correlation of 0.48,
indicating that these two columns tend to increase together. This suggests they might be
related or respond similarly to the same factors.

2. Negative Correlation (Blue Cells):

○ A negative correlation (close to -1) means that when one column increases,
the other decreases.

○ freq_p9_1 and freq_p6_1 have a correlation of -0.32, indicating a modest inverse
relationship. When the values in freq_p9_1 go up, those in freq_p6_1 tend to
go down.

3. No or Weak Correlation (Light Colors, Close to 0):

○ Many cells have values close to 0, which means there is no strong linear
relationship between those pairs of columns. Their movements appear largely
unrelated to each other.

○ For example, freq_1 and freq_p3_1 have a correlation of 0.015, which is very
close to zero, indicating no significant relationship.

Key Patterns and Insights:

● Clusters of Positive Correlation: You can spot a cluster of moderately positive correlations
between freq_p8_1, freq_p8_2, and freq_p18_1. This suggests that these frequencies might be
responding similarly to some underlying factor.

● Negative Relationships: The strongest negative relationship is between freq_p9_1
and freq_p6_1 (-0.32). This could imply that when one of these frequencies is
prominent, the other tends to be less so.

● Mostly Independent Columns: The majority of the correlations are weak (close to
zero), suggesting that most frequencies don't have strong relationships with each
other. This indicates that they might be capturing different aspects of the data.
● Isolated Moderate Positive Correlation: freq_p15_1 shows a positive correlation of 0.42
with freq_p4_1, hinting at a possible relationship where these two
frequencies tend to rise together.
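The pairs highlighted in these insights can also be found programmatically. This sketch, assuming a wide DataFrame with one column per frequency series, computes the Pearson correlation matrix and ranks the off-diagonal pairs by absolute value while keeping the sign; the function name strongest_pairs is hypothetical.

```python
import numpy as np
import pandas as pd


def strongest_pairs(wide, top=3):
    """Column pairs ordered by |Pearson r|, each pair once, sign preserved."""
    corr = wide.corr()  # Pearson correlation by default
    names = corr.columns
    # Upper triangle without the diagonal: each pair once, no self-pairs
    rows, cols = np.triu_indices(len(names), k=1)
    pairs = pd.Series(
        corr.to_numpy()[rows, cols],
        index=[f"{names[i]} ~ {names[j]}" for i, j in zip(rows, cols)],
    )
    # Sort by strength, but report the signed value for interpretation
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(top)
```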

Autocorrelation of all the freq columns.

Autocorrelation is a statistical tool used to analyze the relationship between a time series'
current values and its past values over different time lags. It helps in identifying patterns,
such as trends or periodicity, in the data.
1. Y-axis (Autocorrelation): This shows how much the time series data is
correlated with its own previous values at different time lags. The values
range from -1 to 1.

A value close to 1 means strong positive correlation (the data points are similar). A value
close to -1 means strong negative correlation (the data points are opposite). A value close
to 0 means no correlation (no linear relationship between values at that lag).

2. X-axis (Lag): This represents the time lag.

A lag of 1 means the correlation between each data point and the one immediately before
it, a lag of 2 means the correlation with the data point two steps before, and so on.
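As a minimal sketch of the quantity on the y-axis, the sample autocorrelation at lag k can be computed as below. The estimator (normalizing by the overall variance, so lag 0 is exactly 1) is an assumption; it should match the one used by pandas.plotting.autocorrelation_plot, whose axis labels "Lag" and "Autocorrelation" match those described here, but the original plotting code is not shown. The function name is hypothetical.

```python
import numpy as np


def autocorrelation(values, max_lag):
    """Sample autocorrelation r_k for lags 0..max_lag.

    Lag-k covariance divided by the variance, both normalized by N
    (not N - k), so r_0 == 1.
    """
    x = np.asarray(values, dtype=float)
    n = len(x)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must be between 0 and len(values) - 1")
    d = x - x.mean()
    denom = np.sum(d ** 2)
    # Pair each point with the point k steps before it
    return np.array([np.sum(d[k:] * d[: n - k]) / denom for k in range(max_lag + 1)])
```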

What is Autocorrelation?

Autocorrelation is a way to measure the similarity between observations of a time series as a
function of the time lag between them. In simple terms, it tells us how strongly a point in a
series is related to its previous points. If a series is highly autocorrelated, values further
down the timeline are strongly related to earlier values.

The Graph Elements:

● X-axis (Lag): This represents the lag, which is the number of time steps separating
the data points you're comparing. For example, a lag of 100 compares the
relationship between a point at time t and the point 100 steps before it.
● Y-axis (Autocorrelation): This measures how correlated the time series is with itself
at different lags. A positive value means there's a positive correlation (the data
points follow a similar trend), while a negative value indicates an inverse
relationship.
● Horizontal Dotted Lines: These are the confidence intervals. If the autocorrelation
line falls outside these boundaries, it suggests a significant correlation at that lag.
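The dotted lines are commonly placed at ±z/√N, the large-sample band for a series with no autocorrelation (pandas.plotting.autocorrelation_plot draws them for roughly 95% and 99% confidence). A small sketch that lists the lags falling outside such a band follows; the function name and the default z = 1.96 (about 95%) are assumptions.

```python
import numpy as np


def significant_lags(acf, n_obs, z=1.96):
    """Lags k >= 1 whose autocorrelation lies outside +/- z / sqrt(n_obs)."""
    bound = z / np.sqrt(n_obs)
    acf = np.asarray(acf, dtype=float)
    lags = np.arange(len(acf))
    # Lag 0 is always 1, so it is excluded from the check
    return lags[(lags >= 1) & (np.abs(acf) > bound)]
```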

Observations from the Plots:

● Autocorrelation Patterns:

○ Some plots (like those for freq_p3_1 and freq_p5_1) have oscillations above
and below zero, which might suggest a repeating or periodic pattern in the
data.

○ Other plots (like freq_p9_1 and freq_p10_1) show a slow decrease in
correlation, indicating that earlier data points have a lasting influence on
later points in the series.

○ Some plots have autocorrelation values that quickly drop and remain near
zero (e.g., freq_p1_1, freq_p2_2). This suggests that after a certain lag, earlier
values stop having any meaningful influence.

Insights:

1. Periodic Behavior:

○ Some series (like freq_p3_1 and freq_p7_1) show oscillations in
autocorrelation, suggesting periodic patterns. This could be a sign of cyclical
behavior in the data.

2. Stationarity:

○ In a few series (e.g., freq_p8_1) the autocorrelation quickly drops to zero, indicating
the data may be stationary, meaning it doesn’t depend heavily on time and its statistical
properties don’t change over time.

3. Long-Term Influence:

○ In some series (like freq_p14_1), autocorrelation slowly diminishes, indicating
that past values have a long-lasting impact on future values. This can often
happen in systems where there is some inertia or memory.

4. Random or Uncorrelated Data:

○ Some plots (like freq_p2_2 and freq_p12_1) show very little autocorrelation
from the start, meaning that these series behave more like white
noise—there’s no clear relationship between values at different lags.

Actionable Insights:

● If you're trying to forecast these time series or extract patterns, focus on those with
high autocorrelation values (like freq_p3_1) as they are more predictable.

● For series where autocorrelation drops quickly to zero (like freq_p8_1), more
sophisticated techniques like moving averages or statistical smoothing may be
necessary to predict future values.
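A trailing simple moving average is one of the smoothing techniques mentioned. This sketch assumes evenly spaced time frames; the function name is hypothetical, and pandas offers the same result via Series.rolling(window).mean().

```python
import numpy as np


def moving_average(values, window):
    """Trailing simple moving average; the first window - 1 outputs are NaN."""
    x = np.asarray(values, dtype=float)
    if not 1 <= window <= len(x):
        raise ValueError("window must be between 1 and len(values)")
    out = np.full(x.shape, np.nan)
    # Prefix sums let each window total be computed with one subtraction
    c = np.cumsum(np.insert(x, 0, 0.0))
    out[window - 1:] = (c[window:] - c[:-window]) / window
    return out
```

Using the last smoothed value as the forecast for the next time frame is the most naive option; it should be checked against held-out data before being relied on.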

Cross-correlation Matrix

Cross-Correlation Matrix: The cross_corr_matrix DataFrame stores the maximum
cross-correlation values between each pair of frequency columns.

● Heatmap: The heatmap is a visual representation where each cell's color intensity
reflects the strength of the cross-correlation between two columns.

● Diagonal Values: The diagonal of the matrix represents the correlation of each
frequency with itself, which is always 1.
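The code that built cross_corr_matrix is not shown. A hedged sketch of one way to fill such a DataFrame: standardize both series, compute the normalized cross-correlation for every lag from -max_lag to +max_lag with numpy.correlate, and keep the maximum. It assumes equal-length, non-constant series without missing values; the function names and the default max_lag = 50 are assumptions.

```python
import numpy as np
import pandas as pd


def max_cross_correlation(x, y, max_lag):
    """Largest normalized cross-correlation of x and y over lags -max_lag..+max_lag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Standardize both series and divide by n so that lag 0 of x with itself is 1
    zx = (x - x.mean()) / (x.std() * n)
    zy = (y - y.mean()) / y.std()
    full = np.correlate(zx, zy, mode="full")  # 2n - 1 lags; zero lag at index n - 1
    mid = n - 1
    lo, hi = max(mid - max_lag, 0), mid + max_lag + 1
    return float(full[lo:hi].max())


def build_cross_corr_matrix(wide, max_lag=50):
    """DataFrame of maximum cross-correlations between every pair of columns."""
    names = list(wide.columns)
    cross_corr_matrix = pd.DataFrame(index=names, columns=names, dtype=float)
    for a in names:
        for b in names:
            cross_corr_matrix.loc[a, b] = max_cross_correlation(wide[a], wide[b], max_lag)
    return cross_corr_matrix
```

Taking the maximum (rather than the maximum absolute value) follows the description above, but it means a series that is inversely related to another at every lag would show a low value. With this normalization the diagonal is 1, as stated.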

Insights:

● High Cross-Correlation: Darker or more intense colors indicate a higher
cross-correlation between two frequency columns, suggesting that these two signals
have a strong relationship.

● Low Cross-Correlation: Lighter colors indicate weaker or no significant relationship
between the two signals.

Customization:

● Lag Analysis: If you're interested in specific lags, you can modify the code to store
the cross-correlation values for those particular lags.

● Thresholding: You can filter the matrix to show only cross-correlations above a
certain threshold if you are interested in significant correlations only.
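Both customizations can be sketched briefly. The first function computes the normalized cross-correlation at one chosen lag (pairing x[t + lag] with y[t]); the second keeps only off-diagonal cells at or above a threshold and blanks the rest with NaN. The function names and the default threshold of 0.5 are hypothetical.

```python
import numpy as np
import pandas as pd


def cross_correlation_at_lag(x, y, lag):
    """Normalized cross-correlation pairing x[t + lag] with y[t] for one lag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()
    if lag >= 0:
        return float(np.sum(zx[lag:] * zy[: n - lag]) / n)
    return float(np.sum(zx[: n + lag] * zy[-lag:]) / n)


def threshold_matrix(matrix, threshold=0.5):
    """Keep off-diagonal cells >= threshold; replace everything else with NaN."""
    off_diagonal = ~np.eye(len(matrix), dtype=bool)
    keep = (matrix.to_numpy() >= threshold) & off_diagonal
    return matrix.where(keep)
```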

Based on the autocorrelation results, we can take further steps toward forecasting our data.

This will help us predict rises and declines in the frequencies.

We can use:
● the Fourier method.

● Explanation of the Fourier Transform Results: The frequency spectrum above shows
how much each frequency contributes to the original time series. Here's a
breakdown of what we're seeing:
● Dominant Low Frequencies: The largest peak is at the lowest frequency range,
indicating that the time series has a strong low-frequency (longer-term) component.
This suggests there is some slow, gradual oscillation in the data.
● No Significant High Frequencies: The rest of the frequencies have much lower
magnitudes, indicating that there aren't strong high-frequency (short-term)
oscillations.
● Interpretation: This means the oscillations in the autocorrelation of freq_p3_1 are
largely driven by slow, recurring trends. There might be cycles in the data with a
period related to this dominant low frequency.
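A minimal sketch of locating the dominant peak with numpy's real FFT, assuming evenly spaced time frames. The input could be the raw series or, as in the interpretation above, its autocorrelation; the function name is hypothetical. The mean is subtracted first so that the zero-frequency bin does not swamp the spectrum.

```python
import numpy as np


def dominant_frequency(values, sample_spacing=1.0):
    """Return (frequency, magnitude) of the largest non-zero-frequency FFT peak."""
    x = np.asarray(values, dtype=float)
    x = x - x.mean()  # remove the DC component
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=sample_spacing)
    peak = int(np.argmax(spectrum[1:])) + 1  # skip the zero-frequency bin
    return float(freqs[peak]), float(spectrum[peak])
```

The corresponding period, in time-frame units, is 1 / frequency; for example, a peak at frequency 0.03125 would correspond to a cycle of 32 time frames.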
