Topic 1 & 2 Data Architecture

The document outlines the importance of data architecture in managing data for analysis, detailing its components, models, and key roles such as data architects and data engineers. It also discusses various sources of data, including primary and secondary sources, and describes different experimental designs used in data collection. Additionally, it highlights the influence of enterprise requirements, technology drivers, and business policies on data architecture design.

Uploaded by

sahithi.n64
Copyright
© All Rights Reserved

1. Design Data Architecture and Manage the Data for Analysis


1.1 Data Architecture Design:
 Data is one of the essential pillars of enterprise architecture, through which an
enterprise succeeds in executing its business strategy.
 Data architecture design is like a detailed plan for how to handle data in a
company, showing the steps for gathering, storing, accessing and using data.
 Data architecture is composed of models, policies, rules and standards that govern
which data is collected and how it is arranged, sorted, used and secured in
systems and data warehouses for further analysis.
 Data architecture design is important for creating a vision of the interactions
occurring between data systems.
 Data architecture also describes the types of data structures used to manage data,
which makes data processing easier.
 Data architecture is formed by dividing it into three essential models, which are
then combined:
i) Conceptual Model
 This is a high-level business model that uses the Entity Relationship (ER) model
to depict the relationships between entities and their attributes.
 It serves as a blueprint that outlines the major entities and their connections but
does not delve into the details of the data structure.
 This model helps stakeholders understand the overall structure of the data and its
interactions without needing to know technical details.
ii) Logical Model
 This model represents data organization at a more detailed level than the
conceptual model, focusing on how data is structured logically.
 It is expressed in formats such as tables (rows and columns), classes in
object-oriented programming, XML tags and other data management techniques.
 The logical model provides a blueprint for how the database should
operate, translating complex system designs into a readable form for
technical developers.
iii) Physical Model
 This model is about the actual implementation of the database, detailing
how the logical design will be executed using specific database
technologies.
 It includes specifications such as file structures, databases, indexes, data
partitioning and hardware requirements.
 This model is crucial for database administrators, who are responsible
for the technical deployment and maintenance of database systems.
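
As a rough illustration of how the three models relate, the sketch below walks one small example through all three levels. The Customer and Order entities, their attributes, and the choice of SQLite are assumptions made for this example only; they do not come from these notes.

```python
import sqlite3
from dataclasses import dataclass

# Conceptual model: entities and their relationships only (hypothetical example).
CONCEPTUAL = {
    "entities": ["Customer", "Order"],
    "relationships": [("Customer", "places", "Order")],
}

# Logical model: attributes and keys, still independent of any database product.
@dataclass
class Customer:
    customer_id: int
    name: str

@dataclass
class Order:
    order_id: int
    customer_id: int  # refers to Customer.customer_id
    amount: float

# Physical model: concrete tables and an index for one specific technology
# (SQLite is assumed here purely for illustration).
PHYSICAL_DDL = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount      REAL NOT NULL
);
CREATE INDEX idx_orders_customer ON orders(customer_id);
"""

def build_database() -> sqlite3.Connection:
    """Create an in-memory database from the physical model."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(PHYSICAL_DDL)
    return conn

def total_by_customer(conn: sqlite3.Connection):
    """Return (customer name, total order amount) pairs."""
    rows = conn.execute(
        "SELECT c.name, SUM(o.amount) FROM customer c "
        "JOIN orders o ON o.customer_id = c.customer_id "
        "GROUP BY c.name ORDER BY c.name"
    )
    return rows.fetchall()

conn = build_database()
customer = Customer(1, "Asha")
orders = [Order(10, 1, 250.0), Order(11, 1, 100.0)]
conn.execute("INSERT INTO customer VALUES (?, ?)", (customer.customer_id, customer.name))
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(o.order_id, o.customer_id, o.amount) for o in orders],
)
print(total_by_customer(conn))
```

The conceptual dictionary is what a business stakeholder would review, the dataclasses are what a developer would read, and the DDL with its index is what a database administrator would deploy.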

A data architect is responsible for the design, creation, management and deployment of
the data architecture, and defines how data is to be stored and retrieved; other decisions
are made by internal bodies.

Key components of data architecture:


Key components of data architecture include data models, data flow diagrams, metadata
and data governance policies. These elements work together to ensure data is accurate,
accessible and secure.

Key roles in data architecture design:


The people who play vital role in shaping and maintaining modern data architecture are:

Data architect: A data architect is an IT professional who designs, creates, and manages
an organization's data infrastructure. They ensure that data is accessible, secure, and
reliable by developing and maintaining data frameworks, models, and policies. They play
a crucial role in translating business requirements into technical solutions for data
management and utilization.

Project manager: Data project managers oversee data-related initiatives, ensuring that
projects align with business goals and are completed on time and within budget. They
bridge the gap between data, technology, and business needs, coordinating teams and
managing resources to deliver actionable insights.

Cloud architect or data center engineer: Cloud architects play a crucial role in data
analytics by designing, implementing, and managing the cloud infrastructure that
supports all data processing. Their responsibilities include ensuring the scalability,
performance, security, and cost-effectiveness of the cloud environment for big data
solutions.

Data engineer: A data engineer builds and manages the infrastructure that
supports data collection, storage, processing, and accessibility. Typical tasks include
designing data models, managing databases and data warehouses, and ensuring data
quality and reliability.

Data analyst: They are responsible for collecting, cleaning, analyzing, and interpreting
large datasets to identify trends, patterns, and anomalies.

Data scientist: Data scientists use their expertise in statistics, machine
learning, and programming to extract valuable insights from large, complex
datasets, transforming raw data into actionable information that
can drive business decisions and strategies.

 Various constraints and influences affect data architecture design.
These include enterprise requirements, technology drivers, economics, business
policies and data processing needs.
Enterprise requirements: These involve defining clear objectives, establishing
data governance, and leveraging scalable, secure, and flexible solutions. Key
aspects include understanding the business context, defining the data scope,
designing the architecture, validating it, implementing it, and continuously
evolving it to meet changing needs.
Technology drivers: Several key technology drivers are shaping modern data
architecture. These include cloud computing, real-time data processing, the
growing importance of artificial intelligence (AI) and machine learning (ML), and
the emphasis on data security and governance. These drivers necessitate a shift
towards more flexible, scalable, and intelligent data architectures.
Economics: Economics dictates how data is collected, stored, processed, and used to
maximize value and minimize costs. A well-designed data architecture, informed by
economic principles, ensures that the data infrastructure supports business objectives
effectively and efficiently.
Business policies: These policies and rules describe the manner in which the
enterprise wishes to process its data. They cover aspects such as data
quality, access control, usage guidelines, and integration strategies. Effective data
architecture design translates these policies into technical specifications, enabling
organizations to manage data effectively and derive valuable insights.
Data processing needs: These dictate how data is handled from ingestion to
analysis. A well-designed data architecture ensures efficient data processing,
enabling organizations to extract meaningful insights from their data assets. This
involves considerations for storage, transformation, and access, all while
maintaining data quality, security, and scalability.

2. UNDERSTAND VARIOUS SOURCES OF DATA LIKE SENSORS/SIGNALS/GPS
 Data can be generated from two types of sources, namely primary and secondary
sources.
 The sources for generating primary data are:
i) Observation Method ii) Survey Method iii) Experimental Method
There are a number of experimental designs that are used in carrying out an
experiment.
 Market researchers have used four experimental designs most frequently:
i) Completely Randomized Design (CRD)
ii) Randomized Block Design (RBD)
iii) Latin Square Design (LSD)
iv) Factorial Designs (FD)
Completely Randomized Design (CRD)
 Simplest design to use.
 There is only one primary factor under consideration in the experiment. The test
subjects are assigned at random to the treatment levels of the primary factor.
 The CRD is best suited for experiments with a small number of treatments.

Advantage
 CRDs are relatively simple to design and analyze, making them a good
starting point for many experiments.

Disadvantage

 Not suited for a large number of treatments.
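
The random assignment at the heart of a CRD can be sketched in a few lines. The function name, the twelve numbered subjects and the three treatment labels below are hypothetical, and the equal group sizes are a choice made for this sketch rather than a requirement stated in these notes.

```python
import random

def completely_randomized_design(subjects, treatments, seed=None):
    """Randomly assign subjects to treatment levels in (near-)equal groups."""
    rng = random.Random(seed)          # seeded so the layout can be reproduced
    shuffled = list(subjects)
    rng.shuffle(shuffled)              # every ordering of subjects is equally likely
    assignment = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        # Deal the shuffled subjects out to the treatments in turn.
        assignment[treatments[i % len(treatments)]].append(subject)
    return assignment

# Hypothetical example: 12 test subjects, 3 levels of one primary factor.
design = completely_randomized_design(range(1, 13), ["A", "B", "C"], seed=42)
for treatment, group in design.items():
    print(treatment, sorted(group))
```

Because there is no blocking, any systematic difference between subjects ends up in the error term, which is one reason the design is usually kept to a small number of treatments.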

Randomized Block Design (RBD)

 The term Randomized Block Design originated in agricultural
research. In this design, several treatments are applied to
different blocks of land to ascertain their effect on the yield of the crop.
 Blocks are formed in such a manner that each block contains as many plots
as there are treatments, so that within each block one plot is selected at
random for each treatment.
 The production of each plot is measured after the treatment is given. These
data are then interpreted, and inferences are drawn using the analysis of
variance (ANOVA) technique, so as to know the effect of the various treatments, such as
different doses of fertilizer, different types of irrigation etc.
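
The layout step described above, where every treatment appears once in every block in a random plot order, can be sketched as follows. The block names and fertilizer labels are hypothetical, and the analysis of variance itself is not shown.

```python
import random

def randomized_block_design(blocks, treatments, seed=None):
    """Return, for each block, a random order of treatments over its plots."""
    rng = random.Random(seed)
    layout = {}
    for block in blocks:
        order = list(treatments)   # one plot per treatment in every block
        rng.shuffle(order)         # randomize independently within each block
        layout[block] = order      # position i in the list is plot i of the block
    return layout

# Hypothetical example: 4 blocks of land, 3 fertilizer doses.
layout = randomized_block_design(
    ["Block 1", "Block 2", "Block 3", "Block 4"],
    ["F1", "F2", "F3"],
    seed=7,
)
for block, plots in layout.items():
    print(block, plots)
```

Randomizing separately inside each block is what distinguishes this from a CRD: differences between blocks can be separated from the treatment effect when the yields are analysed.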

Latin Square Design (LSD)

 A Latin square is an experimental design with a balanced two-way
classification scheme, for example a 4 x 4 arrangement. In this scheme each
letter from A to D occurs only once in each row and only once in each
column. Note that the balanced arrangement is not disturbed if
any row is interchanged with another.

ABCD
BCDA
CDAB
DABC

 The balanced arrangement achieved in a Latin square is its main strength. In this
design, the comparisons among treatments are free from differences between
rows and between columns. Thus the magnitude of error will typically be smaller
than in designs that do not control for both sources of variation.
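
The 4 x 4 arrangement shown above is a cyclic Latin square, in which each row is the previous row shifted one place to the left. The sketch below generates it and checks the "once per row, once per column" property, including after two rows are interchanged. The function names are illustrative only.

```python
def cyclic_latin_square(symbols):
    """Build an n x n Latin square by shifting each row one place to the left."""
    n = len(symbols)
    return [[symbols[(r + c) % n] for c in range(n)] for r in range(n)]

def is_latin_square(square):
    """Check that every symbol occurs exactly once in each row and each column."""
    n = len(square)
    expected = set(square[0])
    if len(expected) != n:
        return False
    rows_ok = all(len(row) == n and set(row) == expected for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == expected for c in range(n))
    return rows_ok and cols_ok

square = cyclic_latin_square("ABCD")
for row in square:
    print("".join(row))
# prints:
# ABCD
# BCDA
# CDAB
# DABC

# Interchanging two rows keeps the arrangement balanced.
swapped = [square[2], square[1], square[0], square[3]]
print(is_latin_square(swapped))  # prints True
```

In practice the rows and columns of such a square are usually shuffled at random before treatments are allocated, which the row-swap check shows is safe.
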
Factorial Designs (FD)
 This design allows the experimenter to test two or more variables simultaneously.
It also measures interaction effects of the variables and analyses the impacts of
each of the variables. In a true experiment, randomization is essential so that the
experimenter can infer cause and effect without any bias.
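
A full factorial design tests every combination of the levels of each variable. The sketch below enumerates those combinations and then randomizes the order in which they are run. The three marketing factors and their levels are hypothetical examples, not taken from these notes.

```python
import random
from itertools import product

def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [
        dict(zip(names, combo))
        for combo in product(*(factors[name] for name in names))
    ]

# Hypothetical factors and levels for a market experiment.
factors = {
    "price": ["low", "high"],
    "packaging": ["plain", "premium"],
    "advert": ["none", "tv", "online"],
}
runs = full_factorial(factors)
print(len(runs))  # 2 * 2 * 3 = 12 treatment combinations

# Randomize the run order so that no combination is systematically favoured.
rng = random.Random(0)
run_order = runs[:]
rng.shuffle(run_order)
for run in run_order[:3]:
    print(run)
```

Because every level of one variable is paired with every level of the others, the same set of runs supports estimating both the individual effects and the interaction effects.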

Sources of Secondary Data:


 The secondary data can be obtained through
i) Internal Sources - These are within the organization
ii) External Sources - These are outside the organization

The Internal Sources Include

 Accounting resources: These provide a great deal of information that the
marketing researcher can use. They give information about internal factors.
 Sales force reports: These give information about the sale of a product. The
information provided relates to conditions outside the organization.
 Internal experts: These are the people who head the various departments.
They can give an idea of how a particular thing is working.
 Miscellaneous reports: This is the information obtained from
operational reports.

If the data available within the organization are unsuitable or
inadequate, the marketer should extend the search to external secondary data
sources.

External Sources of Data

 External sources are sources outside the company, in the larger
environment. Collection of external data is more difficult
because the data have much greater variety and the sources are much more
numerous.
 External data can be divided into the following classes.

Government Publications: Government sources provide an extremely rich pool of data
for researchers. In addition, many of these data are available free of cost on internet
websites. A number of government agencies generate data, including the following.

Registrar General of India: This office generates demographic data, including
details of gender, age, occupation etc.

Central Statistical Organization: This organization publishes the national accounts
statistics, which contain estimates of national income for several years, growth rates,
and rates of major economic activities.

Commission: It provides the basic statistics of the Indian economy.

Reserve Bank of India: The RBI provides information on banking, savings and
investment. It also prepares currency and finance reports.

National Sample Survey


 This is done by the Ministry of Planning and it provides social, economic,
demographic, industrial and agricultural statistics.

Syndicate Services

 These services are provided by certain organizations which collect and tabulate
marketing information on a regular basis for a number of clients who subscribe
to these services.
 In collecting data from households, they use three approaches:
i) Survey: They conduct surveys on lifestyle, sociographic and general topics.
ii) Mail Diary Panel: It may relate to two fields, purchase and media.
iii) Electronic Scanner Services: These are used to generate data on volume.
 They collect data for institutions from wholesalers, retailers, and industrial
firms.

 International Organizations: These include

 The International Labor Organization (ILO): It publishes data on the
total and active population, employment, unemployment, wages and
consumer prices.
 The Organization for Economic Co-operation and Development
(OECD): It publishes data on foreign trade, industry, food, transport, and
science and technology.
 The International Monetary Fund (IMF): It publishes reports on national
and international foreign exchange regulations.
