Tech Atriocare - Jansi Goswami
Data Analysis
Brief analysis of 18 data points:
Introduction
We have 18 CSVs of data with ‘Time frames’ and ‘Frequencies’ as our two columns.
What are we doing?
We need to find patterns within these files and the correlations between them.
To understand the story our data tells, we analyze it through the charts and graphs
described below.
Heatmap representation.
What This Code Does:
It creates a heatmap to show the distribution of frequency values across different time
frames and frequency columns.
The color intensity indicates the magnitude of the frequency values at specific time frames.
In this color scale, brighter colors (yellow) represent higher values, and darker colors (purple) represent lower values.
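The original plotting code is not reproduced in this report, so the following is only a minimal sketch of how such a heatmap could be built. It assumes synthetic stand-in data, a bin count of 30, the viridis color map, and the output file name heatmap.png; none of these come from the original analysis.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Synthetic stand-in data (assumption): the real CSV values are not shown here.
rng = np.random.default_rng(0)
n_frames = 600
df = pd.DataFrame({
    "freq_p3_1": rng.normal(2120, 20, n_frames),
    "freq_p4_1": rng.normal(2110, 20, n_frames),
    "freq_p5_1": rng.normal(2000, 20, n_frames),
    "freq_p9_1": rng.normal(2220, 20, n_frames),
})

# Average consecutive time frames into 30 bins so the heatmap stays readable.
n_bins = 30
bin_ids = np.arange(n_frames) * n_bins // n_frames
binned = df.groupby(bin_ids).mean()

# Rows are frequency columns, columns are binned time frames.
heat = binned.T

fig, ax = plt.subplots(figsize=(10, 3))
im = ax.imshow(heat.to_numpy(), aspect="auto", cmap="viridis", vmin=1950, vmax=2250)
ax.set_yticks(range(len(heat.index)))
ax.set_yticklabels(heat.index)
ax.set_xlabel("Time frame (binned)")
ax.set_ylabel("Frequency column")
fig.colorbar(im, ax=ax, label="Frequency")
fig.savefig("heatmap.png", dpi=100)
```

With the viridis color map, low values render as dark purple and high values as yellow, which matches the color bar described in this section.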
Interpretation of the Heatmap:
1. Y-Axis (Frequency Columns): The y-axis lists the frequency columns being visualized.
In this heatmap, the columns are:
○ freq_p3_1
○ freq_p4_1
○ freq_p5_1
○ freq_p9_1
2. X-Axis (Time Frame): The x-axis represents different time frames. The labels indicate
sequential time frames, possibly aggregated or resampled to fit into the heatmap.
3. Color Coding: The color bar on the right side of the graph shows the frequency
values' range, from around 1950 (dark purple) to 2250 (yellow). The colors in the
heatmap represent the actual frequency values for each time frame:
○ Yellow areas represent higher frequency values (close to 2250).
○ Purple areas represent lower frequency values (close to 1950).
○ Green and blue shades represent mid-range frequency values.
4. Insights from Color Patterns:
○ The freq_p9_1 column shows consistent high-frequency values across most
time frames, indicated by the prominent yellow color. This suggests that the
freq_p9_1 has a generally higher frequency compared to others.
○ The freq_p5_1 column has a lower frequency range, as indicated by the
purple and blue colors.
○ The freq_p3_1 and freq_p4_1 columns show variations but generally have
mid-range frequency values, indicated by green and blue colors.
Summary:
● The heatmap helps visualize the distribution and variation of frequency values
across different time frames for the specified columns.
● The freq_p9_1 column stands out due to its consistently higher frequency values,
while freq_p5_1 tends to have lower values.
● This visualization effectively shows the temporal behavior of these frequency
columns, helping identify trends or anomalies in specific columns over time.
Frequency over Time frame.
Insights:
● The scatter plot shows a predominant frequency range (between 2080 and 2140),
suggesting stable conditions.
● Occasional drops in frequency values could indicate anomalies or disturbances in
the system.
● The clustering of data points suggests a high sampling rate or precision, capturing
fine details of the frequency changes.
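The band and the occasional drops described above can be checked numerically. The sketch below uses synthetic stand-in values with two injected drops (an assumption for illustration) and the 2080 to 2140 band quoted in the text.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in series (assumption): values centered in the quoted band.
rng = np.random.default_rng(1)
freq = pd.Series(rng.normal(2110, 8, 1000), name="Frequency")
freq.iloc[[200, 650]] = [1990.0, 2005.0]  # two injected drops for illustration

low, high = 2080, 2140  # predominant band quoted in the text
in_band = freq.between(low, high)
share_in_band = in_band.mean()

# Points below the band are candidate anomalies worth inspecting.
drops = freq[freq < low]

print(f"{share_in_band:.1%} of points lie in [{low}, {high}]")
print(drops)
```

The index of each flagged point gives the time frame at which the drop occurred, which is the natural starting point for investigating a disturbance.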
Distribution of Frequency.
The histogram with the density curve shows that Frequency is roughly normally distributed
but slightly left-skewed: most values are concentrated toward the higher end, closer to the
maximum frequency, with a longer tail toward lower values.
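Skewness can also be measured rather than read off the histogram. The sketch below builds a synthetic left-skewed sample (an assumption, not the report's data) and computes its sample skewness with pandas. A negative value corresponds to a longer tail on the low side.

```python
import numpy as np
import pandas as pd

# Synthetic left-skewed sample (assumption): an upper level minus a
# right-skewed gamma amount, which flips the long tail to the low side.
rng = np.random.default_rng(2)
freq = pd.Series(2270 - rng.gamma(shape=16.0, scale=10.0, size=5000))

skewness = freq.skew()  # sample skewness; negative means left-skewed

print(f"mean={freq.mean():.1f}  median={freq.median():.1f}  skew={skewness:.2f}")
```

In a left-skewed distribution the mean typically sits below the median, because the low tail pulls the mean down.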
Correlation Matrix of frequency columns.
Understanding the Correlation Matrix
The correlation matrix shows the relationships between different frequency columns in
your dataset. In simple terms, it measures how much one column moves in the same or
opposite direction as another column. Each cell in the matrix represents the correlation
between two columns, ranging from -1 (perfect negative correlation) to +1 (perfect positive
correlation).
1. Positive Correlation (Red Cells):
○ When two columns have a positive correlation (close to +1), it means that as
one column increases, the other tends to increase as well.
○ For example, freq_p8_1 and freq_p8_2 have a correlation of 0.48, a moderate
positive correlation indicating that these two columns tend to increase together.
This suggests they might be related or respond similarly to the same factors.
2. Negative Correlation (Blue Cells):
○ A negative correlation (close to -1) means that when one column increases,
the other decreases.
○ freq_p9_1 and freq_p6_1 have a correlation of -0.32, indicating an inverse
relationship. When the values in freq_p9_1 go up, those in freq_p6_1 tend to
go down.
3. No or Weak Correlation (Light Colors, Close to 0):
○ Many cells have values close to 0, which means there is no strong linear
relationship between those pairs of columns. Their movements are largely
unrelated in a linear sense.
○ For example, freq_1 and freq_p3_1 have a correlation of 0.015, which is very
close to zero, indicating no significant relationship.
Key Patterns and Insights:
● Clusters of Positive Correlation: You can spot a cluster of moderately positive
correlations between freq_p8_1, freq_p8_2, and freq_p18_1. This suggests that these
frequencies might be responding similarly to some underlying factor.
● Negative Relationships: The strongest negative relationship is between freq_p9_1
and freq_p6_1 (-0.32). This could imply that when one of these frequencies is
prominent, the other tends to be less so.
● Mostly Independent Columns: The majority of the correlations are weak (close to
zero), suggesting that most frequencies don't have strong relationships with each
other. This indicates that they might be capturing different aspects of the data.
● Isolated Moderate Positive Correlation: freq_p15_1 shows a positive correlation of 0.42
with freq_p4_1, hinting at a potential direct relationship where these two
frequencies tend to rise together.
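A correlation matrix like the one discussed above can be computed with pandas, and the strongest pairs can be ranked instead of read off the colors. The sketch below is illustrative only: the data are synthetic, and the shared driver and the coefficient -0.35 are assumptions chosen to give correlations of roughly the sizes quoted in the text.

```python
from itertools import combinations

import numpy as np
import pandas as pd

# Synthetic stand-in data (assumption): two columns share a common driver,
# and one column is built to move against another.
rng = np.random.default_rng(3)
n = 2000
shared = rng.normal(size=n)
df = pd.DataFrame({
    "freq_p8_1": shared + rng.normal(size=n),
    "freq_p8_2": shared + rng.normal(size=n),
    "freq_p6_1": rng.normal(size=n),
})
df["freq_p9_1"] = -0.35 * df["freq_p6_1"] + rng.normal(size=n)

corr = df.corr()  # Pearson correlation by default

# List each pair once and rank by absolute correlation, strongest first.
pairs = sorted(
    ((a, b, corr.loc[a, b]) for a, b in combinations(corr.columns, 2)),
    key=lambda item: abs(item[2]),
    reverse=True,
)

for a, b, r in pairs:
    print(f"{a:10s} {b:10s} {r:+.2f}")
```

Ranking by absolute value keeps strong negative relationships near the top of the list alongside strong positive ones.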
Autocorrelation of all the freq columns.
Autocorrelation is a statistical tool used to analyze the relationship between a time
series' current values and its past values over different time lags. It helps in identifying
patterns, such as trends or periodicity, in the data.
1. Y-axis (Autocorrelation): This shows how much the time series data is
correlated with its own previous values at different time lags. The values
range from -1 to 1.
A value close to 1 means strong positive correlation (the data points are similar). A value
close to -1 means strong negative correlation (the data points are opposite). A value close
to 0 means no correlation (random relationship).
2. X-axis (Lag): This represents the time lag.
A lag of 1 means the correlation between each data point and the one immediately before
it, a lag of 2 means the correlation with the data point two steps before, and so on.
What is Autocorrelation?
Autocorrelation is a way to measure the similarity between observations of a time series as a
function of the time lag between them. In simple terms, it tells us how much a point in a
series is influenced by its previous points. If a series is highly autocorrelated, values further
down the timeline are highly influenced by earlier values.
The Graph Elements:
● X-axis (Lag): This represents the lag, which is the number of time steps separating
the data points you're comparing. For example, a lag of 100 compares a point at
time t with the point 100 steps before it.
● Y-axis (Autocorrelation): This measures how correlated the time series is with itself
at different lags. A positive value means there's a positive correlation (the data
points follow a similar trend), while a negative value indicates an inverse
relationship.
● Horizontal Dotted Lines: These mark the confidence bounds expected for a series
with no autocorrelation. If the autocorrelation line falls outside these bounds, it
suggests a statistically significant correlation at that lag.
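The graph elements above can be reproduced with a few lines of NumPy. The sketch below is a minimal illustration under several assumptions: the series is a synthetic one in which each value carries over 0.8 of the previous value, the helper name autocorr is hypothetical, and the band uses the common approximation of 1.96 divided by the square root of the series length for a 95% level.

```python
import numpy as np

def autocorr(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array(
        [np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)]
    )

# Synthetic series (assumption): each value keeps 0.8 of the previous one plus noise.
rng = np.random.default_rng(4)
n = 1000
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.8 * x[t - 1] + rng.normal()

acf = autocorr(x, max_lag=20)

# Approximate 95% band for a series with no autocorrelation.
bound = 1.96 / np.sqrt(n)
significant_lags = [k for k in range(1, 21) if abs(acf[k]) > bound]

print(f"lag-1 autocorrelation: {acf[1]:.2f}, band: +/-{bound:.3f}")
print("lags outside the band:", significant_lags)
```

Lag 0 always gives exactly 1, since it compares the series with itself; the informative part of the plot starts at lag 1.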
Observations from the Plots:
● Autocorrelation Patterns:
○ Some plots (like those for freq_p3_1 and freq_p5_1) have oscillations above
and below zero, which might suggest a repeating or periodic pattern in the
data.
○ Other plots (like freq_p9_1 and freq_p10_1) show a slow decrease in
correlation, indicating that earlier data points have a lasting influence on
later points in the series.
○ Some plots have autocorrelation values that quickly drop and remain near
zero (e.g., freq_p1_1, freq_p2_2). This suggests that after a certain lag, earlier
values stop having any meaningful influence.
Insights:
1. Periodic Behavior:
○ Some series (like freq_p3_1 and freq_p7_1) show oscillations in
autocorrelation, suggesting periodic patterns. This could be a sign of cyclical
behavior in the data.
2. Stationarity:
○ A few series quickly drop to zero (e.g., freq_p8_1), which is consistent with the
data being stationary, meaning its statistical properties (such as mean and
variance) don’t change over time.
3. Long-Term Influence:
○ In some series (like freq_p14_1), autocorrelation slowly diminishes, indicating
that past values have a long-lasting impact on future values. This can often
happen in systems where there is some inertia or memory.
4. Random or Uncorrelated Data:
○ Some plots (like freq_p2_2 and freq_p12_1) show very little autocorrelation
from the start, meaning that these series behave more like white
noise—there’s no clear relationship between values at different lags.
Actionable Insights:
● If you're trying to forecast these time series or extract patterns, focus on those with
high autocorrelation values (like freq_p3_1), as their past values carry more
information about future values.
● For series where autocorrelation drops quickly to zero (like freq_p8_1), past values
say little about future ones, so forecasts built only on the series' own history will be
weak. Simple baselines such as moving averages or statistical smoothing, or
additional explanatory inputs, may be needed to predict future values.
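The link between autocorrelation and predictability can be illustrated with a small experiment. The sketch below is not the method used in this analysis. It assumes two synthetic series (one persistent, one pure noise) and a hypothetical helper, one_step_skill, that measures how much variance a simple lag-1 prediction explains.

```python
import numpy as np

def one_step_skill(x):
    """Fraction of variance explained when x[t] is predicted from x[t-1]."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    prev, curr = x[:-1], x[1:]
    phi = np.dot(prev, curr) / np.dot(prev, prev)  # least-squares lag-1 coefficient
    residual = curr - phi * prev
    return 1.0 - residual.var() / curr.var()

rng = np.random.default_rng(7)
n = 2000

# Persistent series (assumption): each value keeps 0.9 of the previous one.
noise = rng.normal(size=n)
persistent = np.zeros(n)
for t in range(1, n):
    persistent[t] = 0.9 * persistent[t - 1] + noise[t]

# Uncorrelated series: no value depends on earlier ones.
white = rng.normal(size=n)

skill_persistent = one_step_skill(persistent)
skill_white = one_step_skill(white)

print(f"persistent series: {skill_persistent:.2f} of variance explained")
print(f"white-noise series: {skill_white:.2f} of variance explained")
```

The persistent series is largely predictable from its previous value, while the noise series is not, which is why highly autocorrelated columns are the better forecasting candidates.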
Cross correlation Matrix
● Cross-Correlation Matrix: The cross_corr_matrix DataFrame stores the maximum
cross-correlation values between each pair of frequency columns.
● Heatmap: The heatmap is a visual representation where each cell's color intensity
reflects the strength of the cross-correlation between two columns.
● Diagonal Values: The diagonal of the matrix represents the correlation of each
frequency with itself, which is always 1 when the values are normalized.
Insights:
● High Cross-Correlation: Darker or more intense colors indicate a higher
cross-correlation between two frequency columns, suggesting that these two signals
have a strong relationship.
● Low Cross-Correlation: Lighter colors indicate weaker or no significant relationship
between the two signals.
Customization:
● Lag Analysis: If you're interested in specific lags, you can modify the code to store
the cross-correlation values for those particular lags.
● Thresholding: You can filter the matrix to show only cross-correlations above a
certain threshold if you are interested in significant correlations only.
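The code that built cross_corr_matrix is not included in this report, so the following is only a sketch of one way to do it, including the lag and threshold customizations mentioned above. The helper name max_cross_corr, the lag window of 50, the threshold of 0.5, the synthetic data, and the choice to keep the value with the largest absolute size are all assumptions.

```python
import numpy as np
import pandas as pd

def max_cross_corr(a, b, max_lag=None):
    """Return (peak normalized cross-correlation, lag at the peak).

    A positive lag means `a` trails `b`; a negative lag means `a` leads `b`.
    """
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    n = len(a)
    cc = np.correlate(a, b, mode="full") / n  # position n - 1 is lag 0
    lags = np.arange(-(n - 1), n)
    if max_lag is not None:
        keep = np.abs(lags) <= max_lag
        cc, lags = cc[keep], lags[keep]
    i = np.argmax(np.abs(cc))
    return cc[i], lags[i]

# Synthetic stand-in data (assumption): one column is a delayed, noisy copy of another.
rng = np.random.default_rng(5)
n = 500
base = rng.normal(size=n)
df = pd.DataFrame({
    "freq_p8_1": base,
    "freq_p8_2": np.roll(base, 5) + 0.5 * rng.normal(size=n),
    "freq_p6_1": rng.normal(size=n),
})

cols = list(df.columns)
cross_corr_matrix = pd.DataFrame(index=cols, columns=cols, dtype=float)
for i, ci in enumerate(cols):
    for cj in cols[i:]:
        peak, _ = max_cross_corr(df[ci].to_numpy(), df[cj].to_numpy(), max_lag=50)
        cross_corr_matrix.loc[ci, cj] = cross_corr_matrix.loc[cj, ci] = peak

# Thresholding: blank out everything weaker than 0.5 in absolute value.
strong = cross_corr_matrix.where(cross_corr_matrix.abs() >= 0.5)

print(cross_corr_matrix.round(2))
print(strong.round(2))
```

Returning the lag alongside the peak value shows not only that two signals are related but also which one moves first and by how many time frames.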
Based on the autocorrelation results, we can take further steps toward forecasting our data,
which will help us predict rises and declines in the frequencies.
One method we can use is the Fourier method.
Explanation of the Fourier Transform Results: The frequency spectrum above shows
how much each frequency contributes to the original time series. Here's a
breakdown of what we're seeing:
● Dominant Low Frequencies: The largest peak is at the lowest frequency range,
indicating that the time series has a strong low-frequency (longer-term) component.
This suggests there is some slow, gradual oscillation in the data.
● No Significant High Frequencies: The rest of the frequencies have much lower
magnitudes, indicating that there aren't strong high-frequency (short-term)
oscillations.
● Interpretation: This means the oscillations in the autocorrelation of freq_p3_1 are
largely driven by slow, recurring trends. There might be cycles in the data with a
period related to this dominant low frequency.
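The dominant frequency and its period can be read from the spectrum numerically. The sketch below uses NumPy's FFT on a synthetic signal with one slow cycle every 256 steps (an assumption for illustration, not the report's data).

```python
import numpy as np

# Synthetic stand-in signal (assumption): one slow cycle every 256 steps plus noise.
rng = np.random.default_rng(6)
n = 1024
t = np.arange(n)
signal = np.sin(2 * np.pi * t / 256) + 0.3 * rng.normal(size=n)

# Magnitude spectrum of the mean-removed signal.
spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
freqs = np.fft.rfftfreq(n, d=1.0)  # in cycles per time step

# Skip the zero-frequency bin, then locate the largest peak.
peak = np.argmax(spectrum[1:]) + 1
dominant_period = 1.0 / freqs[peak]

print(f"dominant frequency: {freqs[peak]:.5f} cycles/step")
print(f"dominant period: {dominant_period:.0f} steps")
```

A peak at a low frequency corresponds to a long period, which is how a strong low-frequency component translates into a slow, recurring cycle in the data.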