Data Mining
Prepared by: Ian Magistrado Naz
Submitted to: Mrs. Anna Rhodora L. Priolo
Task 1: Identify different sources of Data, how each source generates
Data, and give an example of what Data they produce.
Table of Contents
Genetics
How Data is being collected and Produced in Genetics
High-throughput sequencing
Bioinformatics
DNA and RNA
Business analytics and Marketing
Social Media Data
Cyber Security
Block Chain Technology
Web Data
Database Data
Genetics – a well-known field of study in which data is collected
from DNA (deoxyribonucleic acid) and RNA (ribonucleic acid). With
the help of technology, researchers generate, gather, and produce
data through procedures such as high-throughput sequencing and
bioinformatics.
How Data is being collected and Produced in Genetics?
In genetics, data is collected by studying the instructions stored in
DNA, the molecule that contains our genetic information.
Scientists use different methods to collect this data. One is DNA
sequencing, which reads the order of the four building blocks of DNA
(A, C, G, and T). Another is genotyping, which identifies specific
genetic differences between individuals. A third is gene expression
profiling, which measures how genes are turned on or off in cells.
Scientists also collect genetic data by studying how genes work and
interact with each other, which helps us understand how traits and
diseases are inherited.
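
As a rough illustration of the methods above, the sketch below treats a
DNA sequence as a string of A, C, G, and T, counts each base, and compares
two sequences position by position to find where they differ. This is only
a very simplified stand-in for sequencing and genotyping. The sequences and
the function names (base_counts, find_differences) are made up for this
example and do not come from a real dataset or tool.

```python
from collections import Counter

def base_counts(sequence):
    """Count how many times each base (A, C, G, T) appears in a sequence."""
    counts = Counter(sequence.upper())
    return {base: counts.get(base, 0) for base in "ACGT"}

def find_differences(seq_a, seq_b):
    """Return (position, base_a, base_b) for every position where two
    equal-length sequences differ. Positions are 0-based."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]

# Hypothetical sequences for two individuals (not real data).
person_1 = "ATGCGTAC"
person_2 = "ATGAGTAC"

print(base_counts(person_1))
print(find_differences(person_1, person_2))
```

Here the two made-up sequences differ at a single position, which is the
kind of small difference that genotyping looks for on a much larger scale.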
High-throughput sequencing, also referred to as massively parallel
sequencing or next-generation sequencing, has transformed genetic
research by facilitating the rapid and efficient sequencing of vast
quantities of DNA or RNA molecules. This technological advancement
enables the sequencing of entire genomes, transcriptomes, and other
genomic regions in a high-throughput manner, revolutionizing our
understanding of genetic information.
Bioinformatics is an interdisciplinary field that merges biology,
computer science, and information technology to analyze and interpret
biological data, particularly genetic data. In the realm of genetics,
bioinformatics tools and methodologies are pivotal for processing,
analyzing, and visualizing large-scale genetic datasets. They extract
meaningful insights from these datasets and facilitate predictions about
gene functionality, evolutionary relationships, and associations with
diseases. Through bioinformatics, researchers can uncover intricate
patterns within genetic data, aiding in the advancement of genetic
research and its applications.
DNA and RNA
DNA (Deoxyribonucleic Acid) and RNA (Ribonucleic Acid) are two
types of nucleic acids that serve as essential molecules in living
organisms. DNA is a double-stranded molecule composed of
nucleotides containing deoxyribose sugar, phosphate groups, and
four nitrogenous bases (adenine, cytosine, guanine, and thymine).
In eukaryotic cells, it is located primarily in the cell nucleus,
and it carries the genetic information responsible for the
development, functioning, growth, and reproduction of organisms.
RNA, on the other hand, is a typically single-stranded molecule
composed of nucleotides containing
ribose sugar, phosphate groups, and four nitrogenous bases
(adenine, cytosine, guanine, and uracil). RNA is involved in various
biological processes, including protein synthesis (as messenger
RNA, transfer RNA, and ribosomal RNA), gene regulation (as
microRNA and small interfering RNA), and other functions such as
RNA splicing, RNA editing, and telomerase activity. Together, DNA
and RNA play crucial roles in the transmission, expression, and
regulation of genetic information within cells.
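
The base differences described above can be shown with a small sketch. It
builds the complementary DNA strand using the pairing rules (A with T,
C with G) and writes the RNA version of a DNA sequence by replacing
thymine (T) with uracil (U). This is a simplification, since in a real cell
RNA is built from the template strand by enzymes. The sequence and the
function names are hypothetical.

```python
# Base-pairing rules for DNA: A pairs with T, C pairs with G.
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement_strand(dna):
    """Return the complementary DNA strand, read in the same direction."""
    return "".join(DNA_COMPLEMENT[base] for base in dna.upper())

def transcribe(coding_strand):
    """Return the RNA version of a DNA coding strand (T becomes U)."""
    return coding_strand.upper().replace("T", "U")

dna = "ATGCTA"  # hypothetical sequence, not real data
print(complement_strand(dna))  # TACGAT
print(transcribe(dna))         # AUGCUA
```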
Business analytics and Marketing – Data in business is collected through
various channels and methods. These include KYC (Know Your Customer)
checks, transactional data from sales and purchases, and customer
interactions at touchpoints such as customer service calls and feedback
forms. They also include website and social media data capturing user
behavior and engagement, market research involving surveys and industry
reports, operational data from supply chain and inventory management
systems, sensors and IoT devices providing real-time data on equipment
performance, and external sources such as third-party data providers and
government agencies supplying demographic and market trend data. In the
sales process, especially for appliance loans, businesses in the
Philippines often conduct a background check known as a "credit
investigation" (C.I.). This involves looking into a customer's financial
history to assess their ability to repay the loan. The purpose is to
ensure that the customer has a good track record of making payments on
time and has the financial capacity to fulfill the loan obligations.
These different data sources and technologies enable businesses to
gather information relevant to their operations, customers, and market
environment, facilitating data-driven decision-making and strategic
planning. Business analytics and business intelligence gather different
kinds of important information so that businesses can make smart
decisions. This includes details about how the business is running every
day (such as sales and inventory), how much money it is making and
spending, what customers are buying and saying, what is happening in the
market, how the supply chain is working, who is working for the company,
and what big plans the company has for the future.
Social Media Data – Social media platforms such as Facebook, TikTok,
Instagram, and YouTube collect data from users in several ways. When
people create accounts, they often share personal information
like their name, age, location, and interests. As they use the platform,
they generate more data through their activities such as posting
updates, liking and sharing posts, commenting on others' posts, and
following pages or accounts. Social media platforms also track users'
interactions with ads, including clicks and views. This data is then
processed and used to create a variety of information, including
demographic details about users, their interests and preferences, the
type of content they engage with, how they interact with brands, and
trends in popular topics or hashtags. Additionally, sentiment analysis
tools are used to understand the general feelings expressed in posts
and comments, whether they're positive, negative, or neutral.
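
To give an idea of how sentiment analysis can label a post, here is a
minimal sketch that counts positive and negative words. The word lists and
the sample posts are invented for this example. Real sentiment tools rely
on much larger lexicons or trained models, so this is only a toy version of
the idea.

```python
import re

# Tiny hand-made word lists (assumption); real tools use far larger
# lexicons or trained models.
POSITIVE_WORDS = {"love", "great", "good", "happy", "amazing"}
NEGATIVE_WORDS = {"hate", "bad", "terrible", "sad", "awful"}

def classify_sentiment(post):
    """Label a post 'positive', 'negative', or 'neutral' by counting words."""
    words = re.findall(r"[a-z']+", post.lower())
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return "neutral"

# Hypothetical posts, not taken from any real platform.
posts = [
    "I love this phone, the camera is amazing!",
    "Terrible service, I hate waiting.",
    "The store opens at 9 am.",
]
for post in posts:
    print(classify_sentiment(post), "-", post)
```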
Cyber Security – In cybersecurity, data is collected through various
methods to keep digital information safe and secure. This includes
monitoring network traffic for any unusual activities or potential
threats, analyzing system logs to detect unauthorized access attempts,
and using specialized software to scan for malware and viruses.
Additionally, cybersecurity professionals collect data from security
incidents, such as breaches or attacks, to understand how they occurred
and prevent similar incidents in the future. The data collected in
cybersecurity includes information about network traffic patterns,
system vulnerabilities, attempted breaches, malware signatures, and
security incident details.
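
As an example of analyzing system logs for unauthorized access attempts,
the sketch below counts failed logins per IP address and flags any address
that reaches a chosen limit. The log format, the sample lines, and the
threshold of 3 are all assumptions made for this example, since every real
system has its own log format and rules.

```python
import re
from collections import Counter

# Hypothetical log lines; real systems each have their own log format.
LOG_LINES = [
    "2024-01-05 10:01:12 LOGIN_FAILED user=admin ip=203.0.113.7",
    "2024-01-05 10:01:15 LOGIN_FAILED user=admin ip=203.0.113.7",
    "2024-01-05 10:01:19 LOGIN_FAILED user=root ip=203.0.113.7",
    "2024-01-05 10:02:40 LOGIN_OK user=maria ip=198.51.100.4",
    "2024-01-05 10:03:02 LOGIN_FAILED user=maria ip=198.51.100.4",
]

# Captures the user name and the IP address of a failed login.
FAILED_LOGIN = re.compile(r"LOGIN_FAILED user=(\S+) ip=(\S+)")

def suspicious_ips(lines, threshold=3):
    """Return {ip: failed_count} for IPs with at least `threshold` failures."""
    failures = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            failures[match.group(2)] += 1
    return {ip: count for ip, count in failures.items() if count >= threshold}

print(suspicious_ips(LOG_LINES))
```

In this made-up log, only the address with three failed attempts is
flagged, while the user who mistyped a password once is not.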
Block Chain Technology – In blockchain technology, data is collected
and managed through a decentralized and distributed ledger system.
Each block in the blockchain contains a set of transactions, and these
blocks are linked together in a chain using cryptographic hashes. Data is
collected in blockchain through transactions initiated by users, such as
sending or receiving cryptocurrency, recording ownership of digital
assets, or executing smart contracts. These transactions are verified by
network participants (nodes) through a process called consensus, and
once verified, they are added to the blockchain as a new block. The data
produced in blockchain includes transaction details such as sender,
receiver, timestamp, and transaction amount, as well as cryptographic
signatures for authentication and verification. Blockchain data also
includes information about the overall state of the network, such as
block height, network hash rate, and difficulty level.
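
The way blocks are linked with cryptographic hashes can be sketched in a
few lines. Each block below stores the SHA-256 hash of the block before it,
so changing an old transaction breaks the link. This sketch leaves out
consensus, digital signatures, and the network itself, and the block
fields, names, and transactions are hypothetical.

```python
import hashlib
import json

def hash_block(block):
    """Return the SHA-256 hash of a block's contents as a hex string."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def add_block(chain, transactions, timestamp):
    """Append a block that stores the hash of the previous block."""
    previous_hash = hash_block(chain[-1]) if chain else "0" * 64
    chain.append({
        "height": len(chain),
        "timestamp": timestamp,
        "transactions": transactions,
        "previous_hash": previous_hash,
    })

def is_valid(chain):
    """Check that every block still points at the hash of the one before it."""
    return all(
        chain[i]["previous_hash"] == hash_block(chain[i - 1])
        for i in range(1, len(chain))
    )

# Hypothetical transactions (sender, receiver, amount).
chain = []
add_block(chain, [{"sender": "alice", "receiver": "bob", "amount": 5}], timestamp=1)
add_block(chain, [{"sender": "bob", "receiver": "carol", "amount": 2}], timestamp=2)
print(is_valid(chain))  # True

chain[0]["transactions"][0]["amount"] = 500  # tamper with an old transaction
print(is_valid(chain))  # False
```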
Web Data – In web data collection, information is gathered from
various online sources such as websites, web applications, and web
services. This data can include text, images, videos, user interactions,
and other content available on the web. Web data is collected through
techniques like web scraping, which involves extracting data from web
pages using automated scripts or tools. Additionally, web analytics tools
are used to collect data about website visitors, their behavior, and
interactions with the site, including page views, click-through rates,
bounce rates, and conversion metrics. This data is then processed and
analyzed to gain insights into user preferences, trends, and patterns,
which can inform website optimization, content creation, marketing
strategies, and other business decisions. Web data also includes
metadata such as URLs, timestamps, and descriptive tags, which provide
additional context about the web content.
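
Web scraping, as described above, can be sketched with Python's built-in
html.parser module. The example pulls the page title and all links out of
a small piece of HTML. The HTML is a made-up sample stored in a string, so
nothing is downloaded. A real scraper would first fetch the page and should
follow the website's terms of use.

```python
from html.parser import HTMLParser

class LinkAndTitleParser(HTMLParser):
    """Collect the page title and every link (href) in an HTML document."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only keep text that appears between <title> and </title>.
        if self._in_title:
            self.title += data

# Hypothetical page content; a real scraper would download this HTML first.
SAMPLE_HTML = """
<html>
  <head><title>Sample Store</title></head>
  <body>
    <a href="/products">Products</a>
    <a href="/contact">Contact</a>
  </body>
</html>
"""

parser = LinkAndTitleParser()
parser.feed(SAMPLE_HTML)
print(parser.title)
print(parser.links)
```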
Database Data – In database data collection, information is gathered
from different sources like online forms, transactions, and manual
entries. For instance, when you buy something online and enter your
name and address, that information gets collected into a database.
Similarly, when a store sells a product, that sale is recorded in a
database too. This data can include details about customers, products,
transactions, and more. For example, a database might have
information about who bought what, when they bought it, how much
they paid, and where it was shipped. This data is then organized and
stored in tables within the database. So, whenever you need to know
something like how many products were sold last month or who your
top customers are, you can ask the database and get the answers you
need.
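
The idea of asking the database a question can be sketched with Python's
built-in sqlite3 module. The table name, columns, customers, and amounts
below are invented for this example. The two queries answer the questions
mentioned above: how many products were sold in a given month, and who the
top customers are.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary in-memory database
conn.execute(
    "CREATE TABLE sales (customer TEXT, product TEXT, amount REAL, sold_on TEXT)"
)
# Hypothetical sales records (not real data).
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Ana", "Electric fan", 1500.0, "2024-03-02"),
        ("Ben", "Rice cooker", 2000.0, "2024-03-10"),
        ("Ana", "Television", 12000.0, "2024-03-15"),
        ("Cara", "Electric fan", 1500.0, "2024-04-01"),
    ],
)

# How many products were sold in March 2024?
(march_sales,) = conn.execute(
    "SELECT COUNT(*) FROM sales "
    "WHERE sold_on BETWEEN '2024-03-01' AND '2024-03-31'"
).fetchone()

# Who are the top customers by total amount spent?
top_customers = conn.execute(
    "SELECT customer, SUM(amount) AS total FROM sales "
    "GROUP BY customer ORDER BY total DESC"
).fetchall()

print(march_sales)
print(top_customers)
conn.close()
```

Storing dates as text in year-month-day form is a choice made for this
sketch so that they can be compared in order. A real system might use a
dedicated date type instead.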
End…