UNIT-II: Data Collection and Management - Introduction, Sources of data, Data collection and
APIs, Exploring and fixing data, Data storage and management, Using multiple data sources
INTRODUCTION
Data collection and management are two critical, interconnected processes for any organization
or research project. Data collection is the systematic process of gathering information from
various sources to answer research questions, make informed decisions, and evaluate
outcomes. Data management is the subsequent practice of organizing, protecting, and storing
that data so it can be easily accessed, used, and analyzed.
SOURCES OF DATA
Data collection is the process of acquiring, extracting, and storing a voluminous amount of
data, which may be in structured or unstructured form such as text, video, audio, XML files,
records, or image files, and which is used in the later stages of data analysis. In big data
analysis, data collection is the initial step, carried out before analyzing the data for patterns
or useful information. The data to be analyzed must be collected from valid sources.
The data that is collected is known as raw data. Raw data is not useful on its own; after
cleaning out the impurities and analyzing it, it becomes information, and the insight derived
from that information is known as knowledge. Knowledge can take many forms, such as business
knowledge about the sales of enterprise products, knowledge about disease treatment, etc. The
main goal of data collection is to collect information-rich data. Data collection starts with
asking questions such as what type of data is to be collected and what the source of collection
is. Most collected data falls into two types: qualitative data, which is non-numerical data
such as words and sentences and mostly focuses on the behavior and actions of a group, and
quantitative data, which is in numerical form and can be analyzed using scientific tools and
sampling methods.
The actual data is then further divided mainly into two types known as:
Primary data
Secondary data
Primary data
Data that is raw, original, and extracted directly from official sources is known as primary
data. This type of data is collected directly through techniques such as questionnaires,
interviews, and surveys. The data collected must match the demands and requirements of the
target audience on which the analysis is performed; otherwise it becomes a burden in data
processing. A few methods of collecting primary data:
1. Interview method:
In this method, data is collected by interviewing the target audience: the person who conducts
the interview is called the interviewer and the person who answers is the interviewee. Basic
business or product related questions are asked and the answers are noted down in the form of
notes, audio, or video, and this data is stored for processing. Interviews can be structured or
unstructured, and may be personal or formal interviews conducted by telephone, face to face,
email, etc.
2. Survey method:
The survey method is a research process in which a list of relevant questions is asked and the
answers are noted down in the form of text, audio, or video. Surveys can be conducted in both
online and offline modes, for example through website forms and email, and the answers are then
stored for data analysis. Examples are online surveys and surveys through social media polls.
3. Observation method:
The observation method is a data collection method in which the researcher keenly observes the
behavior and practices of the target audience using a data collection tool and stores the
observed data in the form of text, audio, video, or other raw formats. In this method, the data
is collected through direct observation rather than by posing questions to the participants.
For example, observing a group of customers and their behavior towards a product. The data
obtained is then sent for processing.
4. Experimental method:
The experimental method is the process of collecting data by performing experiments, research,
and investigation. The most frequently used experimental designs are CRD, RBD, LSD, and FD.
CRD - Completely Randomized Design is a simple experimental design used in data
analytics which is based on randomization and replication. It is mostly used for
comparing experiments.
RBD - Randomized Block Design is an experimental design in which the experiment
is divided into small units called blocks. Random experiments are performed on each
of the blocks and results are drawn using a technique known as analysis of variance
(ANOVA). RBD originated in the agriculture sector.
LSD - Latin Square Design is an experimental design that is similar to CRD and RBD
but is arranged in rows and columns. It is an NxN arrangement with an equal number of
rows and columns in which each letter occurs exactly once in each row and each column,
so differences can be found easily and with fewer errors in the experiment. A Sudoku
puzzle is an example of a Latin square design (see the short sketch after this list).
FD - Factorial Design is an experimental design in which each experiment involves two
or more factors, each with several possible values (levels), and trials are performed
on the combinations of these factor levels.
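As an illustration of the Latin square idea above, the short Python sketch below builds a simple
NxN Latin square by cyclically shifting a row of symbols and checks the defining property that
every symbol occurs exactly once in each row and each column. This is a generic illustration,
not a specific statistical package; the symbols used are arbitrary.

```python
# Build a simple N x N Latin square by cyclic rotation and verify its defining
# property: every symbol appears exactly once in each row and each column.

def latin_square(symbols):
    n = len(symbols)
    # Row i is the symbol list rotated left by i positions.
    return [[symbols[(i + j) % n] for j in range(n)] for i in range(n)]

def is_latin_square(square):
    n = len(square)
    expected = set(square[0])
    rows_ok = all(set(row) == expected for row in square)
    cols_ok = all({square[i][j] for i in range(n)} == expected for j in range(n))
    return rows_ok and cols_ok

if __name__ == "__main__":
    square = latin_square(["A", "B", "C", "D"])
    for row in square:
        print(" ".join(row))
    print("Valid Latin square:", is_latin_square(square))
```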
Secondary data
Secondary data is data which has already been collected and is reused for some valid purpose.
This type of data is derived from previously recorded primary data, and it comes from two types
of sources: internal sources and external sources.
1. Internal source:
These types of data can easily be found within the organization, such as market records, sales
records, transactions, customer data, accounting resources, etc. The cost and time required to
obtain data from internal sources are comparatively low.
2. External source:
Data which cannot be found within the organization and is obtained through external third-party
resources is external source data. The cost and time required are higher because external
sources contain a huge amount of data. Examples of external sources are government
publications, news publications, the Registrar General of India, the Planning Commission, the
International Labour Bureau, syndicate services, and other non-governmental publications.
3. Other sources:
Sensor data: With the advancement of IoT devices, the sensors in these devices
collect data which can be used for sensor data analytics to track the performance and
usage of products.
Satellite data: Satellites collect terabytes of images and data on a daily basis
through surveillance cameras, which can be used to extract useful information.
Web traffic: Due to fast and cheap internet facilities, data in many formats uploaded
by users on different platforms can be collected, with their permission, for data
analysis. Search engines also provide data about the keywords and queries that are
searched most often.
INTRODUCTION TO DATA COLLECTION AND APIS
Data collection is the process of gathering and measuring information from various sources to
gain insights and make informed decisions. In the modern digital world, a significant amount
of this data is dynamic and constantly updated, making manual collection inefficient. This is
where APIs (Application Programming Interfaces) become a powerful tool.
An API is a set of rules and protocols that allows different software applications to
communicate with each other. In the context of data collection, an API acts as a "contract" or a
"menu" that a data provider (the server) offers to a data consumer (the client). The client sends
a request according to the API's rules, and the server responds with the requested data.
How APIs are Used for Data Collection
Using an API for data collection is a standardized and efficient way to access structured data.
Instead of "web scraping," which involves extracting data directly from a website's HTML,
APIs provide a clean and reliable data stream. This is often the preferred method because it is
less prone to breaking and is a more respectful way to access a company's data.
The general process of using an API for data collection involves these key steps:
1. *Find the Right API:* Identify a public or partner API that offers the data you need. Many
organizations, from social media platforms to government agencies and weather bureaus,
provide APIs to access their data.
2. *Read the Documentation:* API documentation is a manual that explains how to use the
API. It specifies the "endpoints" (URLs for accessing specific resources), the required request
methods (like GET for retrieving data), and the parameters you can use to filter or customize
your request.
3. *Authentication:* Most APIs require an API key or other authentication methods (like
OAuth 2.0) to identify the user and control access. You need to obtain and securely store this
key.
4. *Make the Request:* Using a programming language (like Python or R) or an API client
tool (like Postman), you send a request to the API's endpoint, including your API key and any
necessary parameters.
5. *Process the Response:* The API will send a response, usually in a structured format like
JSON or XML. You then need to parse this data to extract the information you want.
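Putting these steps together, the following Python sketch uses the widely available requests
library to collect data from a hypothetical REST endpoint. The URL, the API key placeholder,
and the city parameter are assumptions for illustration; a real API's documentation defines the
actual endpoint, parameters, and authentication scheme.

```python
import requests

# Hypothetical endpoint and API key -- replace with values from the
# documentation of the API you are actually using.
API_URL = "https://api.example.com/v1/weather"
API_KEY = "YOUR_API_KEY"

def fetch_weather(city):
    # Step 4: make the request, passing parameters and an authentication header.
    response = requests.get(
        API_URL,
        params={"city": city},
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=10,
    )
    response.raise_for_status()  # fail loudly on HTTP errors (4xx/5xx)
    # Step 5: process the response -- most APIs return JSON.
    return response.json()

if __name__ == "__main__":
    data = fetch_weather("Hyderabad")
    print(data)
```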
An API, or *Application Programming Interface*, is a set of rules and protocols that allows
different software applications to communicate and exchange information. APIs act as an
intermediary, enabling two separate systems to talk to each other without needing to understand
the internal workings of the other.
Think of an API as a waiter in a restaurant. You, the client, give your order (a request) to the
waiter (the API). The waiter takes your order to the kitchen (the server), which processes it and
prepares your food (the response). The waiter then brings the food back to you. You don't need
to know how the kitchen works; you just need to know how to interact with the waiter to get
what you want.
Types of APIs
APIs are categorized in several ways, most commonly by their architectural style or by their
scope and purpose.
* *REST (Representational State Transfer)*: The most popular and widely used architectural
style for web APIs. RESTful APIs are stateless, meaning each request from a client to a server
contains all the information needed to understand the request, and the server doesn't store any
client context between requests. They use standard HTTP methods like GET, POST, PUT, and
DELETE to perform operations on resources.
* *Pros*: Simple, lightweight, and scalable.
* *Cons*: Can lead to "over-fetching" (getting more data than you need) or "under-fetching"
(needing multiple requests to get all the data).
* *Example*: The Twitter API, which lets you retrieve user tweets or post new ones using
simple URLs and HTTP methods.
* *SOAP (Simple Object Access Protocol)*: A protocol-based API that is more rigid and has
stricter rules than REST. SOAP APIs use XML for their message format and often rely on a
protocol called WSDL (Web Services Description Language) to describe the API's functions.
* *Pros*: Has built-in security and reliability features, making it ideal for enterprise-level
applications.
* *Cons*: Heavier, more complex, and requires more bandwidth due to the verbose XML
format.
* *Example*: Used in legacy systems, financial institutions, and telecommunications for
highly secure and structured transactions.
* *GraphQL*: A modern query language for APIs that was developed to solve the data-fetching
issues of REST. With GraphQL, the client specifies exactly what data it needs in a single
request, eliminating over-fetching and under-fetching.
* *Pros*: Efficient and flexible, allowing clients to get only the data they need with a single
API call.
* *Cons*: Can be more complex to set up and manage than REST, and it trades off some of
the native caching benefits of HTTP.
* *Example*: Used by Facebook, which developed it, and other modern applications that
require precise and efficient data retrieval, particularly in mobile apps.
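To make the GraphQL idea concrete, the sketch below sends a single POST request whose query
names exactly the fields the client wants. The endpoint URL and the field names (user, name,
posts, title) are hypothetical; a real GraphQL API publishes its own schema.

```python
import requests

# Hypothetical GraphQL endpoint and schema; a real API defines its own.
GRAPHQL_URL = "https://api.example.com/graphql"

# The client asks for exactly the fields it needs -- no over- or under-fetching.
query = """
query {
  user(id: "42") {
    name
    posts(limit: 3) {
      title
    }
  }
}
"""

response = requests.post(GRAPHQL_URL, json={"query": query}, timeout=10)
response.raise_for_status()
print(response.json())
```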
* *RPC (Remote Procedure Call)*: One of the oldest API types, which allows a client to
execute a function or procedure on a remote server as if it were a local function.
* *Pros*: Simple and straightforward for performing specific actions.
* *Cons*: Can be tightly coupled and less flexible than other styles.
* *Example*: Can be used in microservices architectures where one service calls a function
in another service.
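One common concrete form of RPC over HTTP is JSON-RPC, in which the client names a remote
procedure and its parameters in a small JSON payload. In the sketch below, the server URL and
the add method are assumed for illustration; the point is that the call reads like invoking a
local function.

```python
import requests

# Hypothetical JSON-RPC endpoint; the "add" method is assumed to exist there.
RPC_URL = "https://rpc.example.com/api"

payload = {
    "jsonrpc": "2.0",   # protocol version
    "method": "add",    # remote procedure to execute
    "params": [2, 3],   # arguments passed to the procedure
    "id": 1,            # request id used to match the response
}

response = requests.post(RPC_URL, json=payload, timeout=10)
response.raise_for_status()
print(response.json())  # e.g. {"jsonrpc": "2.0", "result": 5, "id": 1}
```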
By Scope and Audience
This classification categorizes APIs based on who is allowed to use them.
* *Public (Open) APIs*: These APIs are publicly available and can be used by anyone. They
are often used to enable third-party developers to build applications on top of a company's
services.
* *Example*: The Google Maps API, which lets developers embed maps and location data
into their websites and apps.
* *Partner APIs*: Shared externally but only with specific business partners. Access is often
restricted and requires a special key or token.
* *Example*: An e-commerce site might provide a partner API to a shipping company to
automatically share order and tracking information.
* *Private (Internal) APIs*: Used exclusively within a single organization to connect internal
systems and services. They are not exposed to external developers.
* *Example*: An internal API might be used by a company's sales application to retrieve
customer data from its internal database.
* *Composite APIs*: These APIs combine multiple API calls into a single, streamlined request.
They are useful for complex tasks that would otherwise require multiple separate calls.
* *Example*: A composite API could get user profile information, recent posts, and
comments in one single request, which would have taken three separate calls with a standard
REST API.
EXPLORING AND FIXING DATA
When exploring and fixing data, you're essentially performing a two-part process: data
exploration and data cleaning (or data wrangling). This is a critical step in any data analysis,
machine learning, or business intelligence project to ensure the data is accurate, reliable, and
ready for use.
Data exploration is a crucial first step in the data science process. It involves using statistical
analysis and visualizations to understand the characteristics of a dataset, identify patterns, and
uncover insights. The main goal is to gain a deep understanding of the data before building
predictive models or performing other advanced analyses.
What is Data Exploration?
Data exploration, also known as Exploratory Data Analysis (EDA), is the initial process of
investigating a dataset. It is not about finding the final answers, but rather about asking
questions of the data. This phase is detective work; it helps data scientists understand the
dataset's structure, identify potential problems like missing values or outliers, and discover
relationships between variables. EDA helps inform subsequent steps in the data science
pipeline, such as feature engineering and model selection.
Key Techniques
Data exploration uses a variety of techniques, which can be broadly categorized into two types:
Statistical Techniques
Descriptive Statistics: This involves calculating measures like mean, median, mode,
standard deviation, and variance for numerical data. These statistics summarize the
central tendency and dispersion of the data. For categorical data, you might look at
frequency counts.
Correlation Analysis: This technique helps to understand the relationship between two
or more variables. A correlation coefficient (e.g., Pearson's r) measures the strength and
direction of a linear relationship.
Hypothesis Testing: Although more formal, simple hypothesis tests can be used to
compare groups or check for significant differences.
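A minimal pandas sketch of the descriptive statistics and correlation analysis described above,
using a small made-up dataset (the column names and values are purely illustrative).

```python
import pandas as pd

# Small illustrative dataset -- the columns and values are made up.
df = pd.DataFrame({
    "age":    [23, 31, 45, 29, 38, 52, 41],
    "income": [32000, 45000, 61000, 40000, 55000, 72000, 58000],
    "city":   ["Delhi", "Mumbai", "Delhi", "Pune", "Mumbai", "Delhi", "Pune"],
})

# Descriptive statistics: central tendency and dispersion of numeric columns.
print(df[["age", "income"]].describe())

# Frequency counts for a categorical column.
print(df["city"].value_counts())

# Correlation analysis: Pearson's r between the numeric variables.
print(df[["age", "income"]].corr(method="pearson"))
```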
Visualization Techniques
Visualizations are a powerful way to explore data, as they can reveal patterns and relationships
that are hard to see from raw numbers. Some of the most common visualizations include:
Histograms: Used to visualize the distribution of a single numerical variable. They
show how often values fall into different ranges.
Box Plots: Excellent for summarizing the distribution of a numerical variable and
identifying outliers. They show the median, quartiles, and range of the data.
Scatter Plots: Used to visualize the relationship between two numerical variables. Each
point represents a data instance, and the pattern of the points can reveal a correlation.
Bar Charts: Ideal for comparing categorical data. They show the frequency or value
for different categories.
Heatmaps: Useful for visualizing correlations between many variables at once. A grid
of colors represents the strength of the relationships.
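The matplotlib sketch below produces a few of these plots for a toy dataset; the column names
and values are illustrative only.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy dataset for illustration only.
df = pd.DataFrame({
    "age":    [23, 31, 45, 29, 38, 52, 41, 36, 27, 49],
    "income": [32, 45, 61, 40, 55, 72, 58, 50, 35, 66],  # in thousands
})

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

# Histogram: distribution of a single numerical variable.
axes[0].hist(df["age"], bins=5)
axes[0].set_title("Histogram of age")

# Box plot: median, quartiles, and potential outliers.
axes[1].boxplot(df["income"])
axes[1].set_title("Box plot of income")

# Scatter plot: relationship between two numerical variables.
axes[2].scatter(df["age"], df["income"])
axes[2].set_title("Age vs income")

plt.tight_layout()
plt.show()
```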
Why is it Important?
Data exploration is critical for several reasons:
Data Cleaning: It helps identify and address data quality issues, such as missing values,
incorrect data types, or duplicates.
Feature Understanding: It provides insights into the characteristics of each variable,
which can guide the process of feature selection and creation.
Outlier Detection: EDA helps spot unusual data points or outliers that could skew the
results of a model.
Hypothesis Generation: By exploring the data, data scientists can form new hypotheses
about the relationships between variables that can be tested later.
Informing Modeling: The insights gained from EDA can help choose the right machine
learning algorithms and guide model building, leading to more accurate and reliable
results.
Data Cleaning
Data cleaning is the process of detecting and correcting (or removing) corrupt, inaccurate, or
irrelevant records from a dataset. The goal is to improve the quality of the data so that it can be
used for analysis.
Common Problems and Their Solutions:
Missing Values: Missing data can be handled in several ways:
Removal: Delete rows or columns with a high percentage of missing values. This is suitable
for small datasets or when the number of missing values is minimal.
Imputation: Fill in missing values using a replacement strategy. Common methods include
using the mean, median, or mode for numerical data, or a most frequent value for categorical
data.
Inconsistent Data: This includes variations in data entry, such as "USA," "U.S.A.," and
"United States."
Solution: Standardize the data by creating a consistent format. Use techniques like string
manipulation to convert all variations to a single, unified value (e.g., "USA").
Duplicate Records: When the same data point appears multiple times.
Solution: Remove the duplicate rows, keeping only one instance. This is crucial for accurate
analysis.
Incorrect Data Types: For example, a column representing numerical data is stored as a string.
Solution: Convert the data type to the correct format (e.g., converting a string to an integer or
float).
Outliers: Data points that are significantly different from other observations. They can be valid
but may also be a result of data entry errors.
Solution: Depending on the context, you might remove them or transform them. For example,
you can cap the values at a certain percentile to reduce their impact.
Structural Errors: Incorrectly formatted data, such as a single column containing multiple
data points that should be in separate columns.
Solution: Reshape the data using techniques like splitting columns or pivoting tables to create
a logical and structured format.
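A short pandas sketch that applies several of these fixes to a made-up dataset. The column
names, values, and chosen strategies (median imputation, percentile capping) are assumptions
for illustration; real cleaning decisions depend on the data and the analysis goal.

```python
import pandas as pd

# Made-up raw data with the typical problems described above.
raw = pd.DataFrame({
    "country": ["USA", "U.S.A.", "United States", "India", "India"],
    "sales":   ["100", "250", None, "175", "175"],
})

# Inconsistent data: map the variants to one standard value.
raw["country"] = raw["country"].replace({"U.S.A.": "USA", "United States": "USA"})

# Incorrect data types: convert the sales column from string to numeric.
raw["sales"] = pd.to_numeric(raw["sales"], errors="coerce")

# Missing values: impute with the median of the column.
raw["sales"] = raw["sales"].fillna(raw["sales"].median())

# Duplicate records: keep only the first occurrence.
clean = raw.drop_duplicates().copy()

# Outliers: cap values at the 5th and 95th percentiles.
low, high = clean["sales"].quantile([0.05, 0.95])
clean["sales"] = clean["sales"].clip(lower=low, upper=high)

print(clean)
```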
DATA STORAGE AND MANAGEMENT, USING MULTIPLE DATA SOURCES
Data storage and management using multiple sources is a complex but crucial process that
involves integrating disparate datasets into a unified, accessible, and high-quality system. It is
a fundamental step for businesses aiming to make data-driven decisions and derive meaningful
insights.
Key Challenges
When working with multiple data sources, organizations face several challenges:
Data Silos: Data is often isolated in different departments or systems (e.g., CRM, ERP,
marketing platforms), making it difficult to get a complete view.
Data Variety: Data comes in many forms, including structured data from relational
databases, semi-structured data like JSON or XML, and unstructured data such as text
documents and images.
Data Inconsistency: Disparate sources may have different formats, naming
conventions, or data types, leading to inconsistencies that must be resolved.
Data Quality: Issues like missing values, duplicates, and inaccurate information are
common and can compromise the reliability of analysis.
The Integration Process
To overcome these challenges, a structured approach is essential. The most common method
for data integration is a process known as Extract, Transform, Load (ETL) or its modern
variant, Extract, Load, Transform (ELT).
1. Extract: This is the first step, where data is pulled from all identified sources. These
sources can be anything from relational databases and cloud applications to flat files
and APIs.
2. Transform: Data from different sources is cleaned, standardized, and aggregated to
ensure it's consistent and ready for analysis. This step includes:
o Data Cleansing: Removing duplicates, correcting errors, and handling missing
values.
o Standardization: Ensuring consistent data formats, such as dates or currency.
o Aggregation: Summarizing detailed records at a coarser level of granularity (for
example, rolling daily transactions up into monthly totals).
3. Load: The transformed data is loaded into a central repository, often a data warehouse
or a data lake.
While ETL is a traditional approach, ELT has become popular with the rise of cloud-based
data warehouses. In ELT, raw data is loaded directly into the data warehouse first, and the
transformation happens within the warehouse itself. This approach is beneficial because
modern data warehouses are powerful enough to handle large-scale transformations, and it
allows for greater flexibility.
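A compact sketch of the ETL flow described above: data is extracted from two hypothetical
sources (a CSV file and an API endpoint), transformed with pandas, and loaded into a local
SQLite table standing in for a warehouse. The file name, URL, and column names are assumptions.

```python
import sqlite3
import pandas as pd
import requests

# Extract: pull data from two hypothetical sources.
sales = pd.read_csv("sales.csv")  # assumed columns: order_id, amount, date
api_rows = requests.get("https://api.example.com/v1/orders", timeout=10).json()
orders = pd.DataFrame(api_rows)   # assumed to share the same columns

# Transform: clean, standardize, and aggregate.
combined = pd.concat([sales, orders], ignore_index=True)
combined = combined.drop_duplicates(subset="order_id")            # data cleansing
combined["date"] = pd.to_datetime(combined["date"])               # standardization
daily = (combined.groupby(combined["date"].dt.date)["amount"]
         .sum()
         .reset_index())                                           # aggregation

# Load: write the transformed data into a central repository (here, SQLite).
with sqlite3.connect("warehouse.db") as conn:
    daily.to_sql("daily_sales", conn, if_exists="replace", index=False)
```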
Storage and Management Solutions
After integration, the data needs to be stored and managed in a way that supports the
organization's goals. The choice of storage solution depends on the type of data and the
intended use case.
Data Warehouse: This is a centralized repository that stores structured, historical data
from multiple sources. It's optimized for fast querying and analysis, making it ideal for
business intelligence and reporting.
Data Lake: A data lake is a vast storage repository that can hold large amounts of raw
data in its native format. It's more flexible than a data warehouse and is well-suited for
advanced analytics, machine learning, and data exploration.
Hybrid Solutions: Many companies use a combination of on-premise and cloud-based
solutions to create a flexible and scalable data management infrastructure.