warm-up
1. What comes to mind when you hear the term "big data"?
2. Why do you think companies and organisations are increasingly interested in
collecting large amounts of data?
3. How do you think big data could impact our daily lives, both positively and
negatively?
vocabulary
Data Mining – the process of extracting and identifying patterns and relationships within
massive datasets to generate valuable insights. 5C
Machine Learning – a subset of artificial intelligence where algorithms learn from data
autonomously, improving decision-making and predictions without explicit programming for
each task. 2E
Datafication – the transformation of various real-world phenomena into quantified data
formats, such as tracking GPS location data or online user behavior. 8J
Predictive Analytics – advanced data analysis aimed at forecasting future outcomes or
trends based on historical data and statistical modeling. 9H
Data Warehouse – a central place where large amounts of organized data from different
sources are stored, making it easy to search and generate reports. 3D
Structured Data – data organized according to a predefined schema, such as rows and
columns in a relational database, making it easy to search and analyze. 6B
Unstructured Data – data lacking a fixed structure, including texts, images, and audio files,
which require specialized processing methods. 12F
Data Visualization – the practice of using graphical representations like charts, graphs, or
dashboards to make complex data more comprehensible and actionable. 11I
NoSQL Databases – non-relational databases designed for high-performance data storage,
often used for handling unstructured or semi-structured big data. 4G
Real-time Processing – the ability to process and analyze data instantly as it’s generated,
supporting timely decision-making and dynamic responses. 10A
Data Lake – a vast storage repository that holds raw, unprocessed data in various formats,
enabling flexible and large-scale data analysis. 1L
Hadoop – an open-source framework that enables the distributed storage and processing of
extensive datasets across clusters of computers, widely used in big data environments. 7K
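To make the difference between structured and unstructured data concrete, here is a minimal Python sketch. The order records, the column names (customer, product, quantity), the review text, and both helper functions are invented for illustration; they are assumptions, not taken from any real system.

```python
import re
from collections import Counter

# Structured data: every record follows the same predefined schema
# (hypothetical columns: customer, product, quantity).
orders = [
    {"customer": "Ana", "product": "apple pie", "quantity": 2},
    {"customer": "Ben", "product": "cherry pie", "quantity": 1},
    {"customer": "Ana", "product": "cherry pie", "quantity": 3},
]


def total_quantity(rows, product):
    """Sum the quantity column for one product; simple because the schema is fixed."""
    return sum(row["quantity"] for row in rows if row["product"] == product)


# Unstructured data: free text with no fixed fields (an invented review).
review = "Loved the cherry pie! The cherry filling was fresh, the crust less so."


def word_counts(text):
    """Extra processing step: split free text into lowercase words and count them."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


print(total_quantity(orders, "cherry pie"))  # 4
print(word_counts(review)["cherry"])         # 2
```

The structured records can be queried directly by column name, while the review first has to be broken into words before anything can be counted. That extra step is the "specialized processing" the definition of unstructured data refers to.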
fill the gaps
1. A ________ is a centralized storage system that organizes large amounts of structured
data from various sources.
(Answer: Data Warehouse)
2. By using ________, computers can improve their decision-making skills without being
explicitly programmed for each task.
(Answer: Machine Learning)
3. ________ allows organizations to predict future trends and outcomes based on historical
data patterns.
(Answer: Predictive Analytics)
4. ________ includes data organized into predefined structures, making it easier to search
and analyze.
(Answer: Structured Data)
5. Photos, videos, and social media posts are examples of ________, which lacks a specific
format and requires more complex processing.
(Answer: Unstructured Data)
6. ________ helps present complex data through charts and graphs, making it easier for
users to understand insights visually.
(Answer: Data Visualization)
7. The concept of ________ refers to turning real-world events and behaviors into data,
which can then be analyzed and utilized.
(Answer: Datafication)
8. A ________ can store raw data in its native format, enabling large-scale analysis without
the need for pre-processing.
(Answer: Data Lake)
9. ________ is essential for discovering hidden patterns in large datasets, helping
companies make informed decisions.
(Answer: Data Mining)
10. ________ allows information to be processed as it arrives, which is crucial for
applications that need immediate response, such as traffic monitoring.
(Answer: Real-time Processing)
11. ________ are highly scalable and handle large volumes of unstructured or
semi-structured data, and are often used in big data applications.
(Answer: NoSQL Databases)
12. ________ is an open-source platform that allows for the distributed processing of big
data across clusters of computers.
(Answer: Hadoop)
comprehension
How does the increase in data volume enable us to understand consumer preferences
more accurately?
● Example: Pie preferences shifted when the size changed from family-size to
individual-size portions, revealing tastes that the larger format had hidden.
In what ways has big data transformed traditional data storage and processing, and
what are the implications of this change?
● Example: The transition from static storage, like clay discs, to dynamic, easily
accessible digital data.
“The disc that was discovered off of Crete that's 4,000 years old, is heavy, it doesn't store a
lot of information, and that information is unchangeable. By contrast, all of the files that
Edward Snowden took from the National Security Agency in the United States fits on a
memory stick the size of a fingernail, and it can be shared at the speed of light.”
What innovative use of data is being explored in Tokyo regarding car theft
prevention?
● In Tokyo, researchers are exploring the use of posture sensors in car seats as a
security feature, where the car can recognize an unauthorized driver based on their
unique sitting posture, potentially preventing theft.
What are some of the ethical challenges and potential dangers associated with big
data, especially in areas like predictive policing? (Predictive policing is the use of data
analysis and algorithms to predict where crimes are likely to occur or which individuals are
more likely to commit crimes in the future.)
● Example: The use of big data in law enforcement could lead to individuals being
punished based on predictions rather than actions. (With enough data, for example on
where past crimes have occurred, police know where to send patrols.)
How is big data impacting jobs in professional fields, and what parallels can be drawn
to past technological revolutions?
● Example: The automation of tasks like cancer biopsy analysis may lead to job loss in
fields previously thought secure.