Chapter 2 - Data and Knowledge Management-Notes


Subject: Management Information system Semester: VII

Chapter 2 - Data and Knowledge Management

This chapter emphasizes the need for and importance of data and knowledge management, along with
business intelligence, in the domain of management information systems.

Data and Knowledge Management


• Database Approach
• Big Data
• Data warehouse and Data Marts
• Knowledge Management

2.1 Database Approach


❖ The database approach is an improvement on the shared file solution: the use of a
database management system (DBMS) provides facilities for querying, data security
and integrity, and allows simultaneous access to data by several different users.

❖ Database: A database is a collection of related data.

❖ A database is a shared collection of logically related data, designed to meet the
information needs of an organization.
❖ A database is a computer-based record-keeping system whose overall purpose is to
record and maintain information.
❖ The database is a single, large repository of data that can be used simultaneously
by many departments and users. Instead of disconnected files with redundant data,
all data items are integrated with a minimum amount of duplication.

Prof. Rushikesh R. Nikam Department Computer Engineering



Figure 2: Database Approach

2.1.1 Building blocks of a Database


The following three components form the building blocks of a database. They store the data
that we want to save in our database.

i. Columns. Columns are like fields, that is, individual items of data that we wish to
store. A student's roll number, name, address, etc. are all examples of columns.
They are also like the columns found in spreadsheets (the A, B, C, etc. along the
top).

ii. Rows. Rows are like records, as they contain data for multiple columns (like the 1,
2, 3, etc. in a spreadsheet). A row can be made up of as many or as few columns as
you want. This makes reading data much more efficient, because you fetch only what you want.

iii. Tables. A table is a logical group of columns. For example, you may have a table
that stores details of customers' names and addresses. Another table would be
used to store details of parts, and yet another would be used for suppliers' names
and addresses.
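
The three building blocks can be seen together in a small sketch. It uses Python's built-in sqlite3 module with an in-memory database; the table name, column names and sample values are assumptions made up for illustration and do not come from these notes.

```python
import sqlite3

# In-memory database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")

# A table groups related columns (fields).
conn.execute(
    "CREATE TABLE student (roll_number INTEGER PRIMARY KEY, name TEXT, address TEXT)"
)

# Each inserted tuple becomes one row (record).
conn.executemany(
    "INSERT INTO student (roll_number, name, address) VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ravi", "Nashik")],
)

# Fetch only the column we need from the rows we need.
names = [row[0] for row in conn.execute("SELECT name FROM student ORDER BY roll_number")]
print(names)
```

The query names a single column, so only that part of each row is returned.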


2.1.2 Characteristics of database


The data in a database should have the following features:

i. Organized/Related: It should be well organized and related.

ii. Shared: Data in a database are shared among different users and
applications.

iii. Permanent or Persistent: Data in a database exist permanently, in the sense that
the data can live beyond the scope of the process that created it.

iv. Validity/Integrity/Correctness: Data should be correct with respect to the
real-world entity that they represent.

v. Security: Data should be protected from unauthorized access.

vi. Consistency: Whenever more than one data element in a database represents
related real-world values, the values should be consistent with respect to the
relationship.

vii. Non-redundancy: No two data items in a database should represent the same
real-world entity.

viii. Independence: Data at different levels should be independent of each other, so
that changes at one level do not affect the other levels.

ix. Easily Accessible: It should be available when and where it is needed, i.e. it should
be easily accessible.

x. Recoverable: It should be recoverable in case of damage.

xi. Flexible to change: It should be flexible to change.


Figure 4: Database Approach


2.1.3 Traditional File Processing System and Its Characteristics

i. Traditional file processing system: This was a fully computer-based system
in which all the information was stored in separate computer files.
ii. A traditional file system stores data in such a way that each department of an
organization has its own set of files, which creates data redundancy. For
example:
Consider a college where a student's examination record is stored in one file and his
library record is stored in a different file. This creates many duplicate values, such as Roll
Number, Name and Father Name.

A typical traditional file processing system is shown in the diagram, which illustrates
program and data dependence.

[Diagram: the Library, Examinations and Registrations departments each run their own
applications against their own copy of the registration data.]

Figure 5: Transaction Processing System
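
The redundancy problem in the college example can be sketched in a few lines. Each department's file is modelled here as a plain Python dictionary; the roll number, names and field names are hypothetical values chosen only for illustration.

```python
# Each department keeps its own "file" (modelled as a dict keyed by roll number).
examination_file = {101: {"name": "Asha Patil", "father_name": "R. Patil", "marks": 78}}
library_file = {101: {"name": "Asha Patil", "father_name": "R. Patil", "books_issued": 2}}

# The examination department corrects the name in its own file only.
examination_file[101]["name"] = "Asha R. Patil"


def inconsistent_fields(roll_number):
    """Return the duplicated fields whose values now differ between the two files."""
    exam, lib = examination_file[roll_number], library_file[roll_number]
    shared = exam.keys() & lib.keys()
    return sorted(field for field in shared if exam[field] != lib[field])


print(inconsistent_fields(101))
```

Because the name is stored twice, an update made in one file silently leaves the other file out of date. A shared database stores the value once and avoids this.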

2.2 Big Data


i. Big Data is becoming one of the most talked-about technology trends nowadays.
ii. The real challenge for big organizations is to get the maximum out of the data
already available and to predict what kind of data to collect in the future.
iii. How to take the existing data and make it meaningful, so that it provides accurate
insight into past data, is one of the key discussion points in many executive
meetings in organizations.
iv. With the explosion of data, the challenge has gone to the next level, and Big
Data is now becoming a reality in many organizations.
v. The goal of every organization and expert is the same, to get the maximum out of the data,
but the route and the starting point are different for each organization and expert.
vi. As organizations evaluate and architect Big Data solutions, they are also
learning about the ways and opportunities related to Big Data.
vii. There is no single solution to Big Data, and no single vendor can
claim to know all about Big Data.
viii. Big Data is too big a concept, and there are many players: different architectures,
different vendors and different technologies.

The three Vs of Big data are Velocity, Volume and Variety.

Figure 6: Big Data Sphere

Figure 7: Big Data – Transactions, Interactions, Observations

Big data Characteristics

The three Vs of Big data are Velocity, Volume and Variety

Figure 8:Characteristics of Big Data

Volume

i. Data storage has grown exponentially, as data is now much more than text data.
ii. Data can be found in the form of videos, music and large images on
our social media channels.
iii. It is very common for enterprises to have storage systems of terabytes and petabytes.
iv. As the database grows, the applications and architecture built to support the data
need to be re-evaluated quite often.
v. Sometimes the same data is re-evaluated from multiple angles, and even though the
original data is the same, the newly found intelligence creates an explosion of data.
vi. This big volume indeed represents Big Data.

Velocity
i. Data growth and the social media explosion have changed how we look at data.
There was a time when we believed that yesterday's data was recent.
ii. As a matter of fact, newspapers still follow that logic.
iii. However, news channels and radio have changed how fast we receive the news.
iv. Today, people rely on social media to update them on the latest happenings. On
social media, a message that is only a few seconds old (a tweet, a status update, etc.)
may no longer interest users.
v. They often discard old messages and pay attention to recent updates.
vi. Data movement is now almost real time, and the update window has shrunk to
fractions of a second.
vii. This high-velocity data represents Big Data.

Variety
i. Data can be stored in multiple formats, for example a database, Excel, CSV, Access or,
for that matter, a simple text file.
ii. Sometimes the data is not even in a traditional format as we assume; it may be in
the form of video, SMS, PDF or something we might not have thought about. It is the
organization's task to arrange it and make it meaningful.
iii. This would be easy if all the data were in the same format; however, that is not the case
most of the time.
iv. The real world has data in many different formats, and that is the challenge we need
to overcome with Big Data.
v. This variety of data represents Big Data.
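
The variety problem can be illustrated with a small sketch that brings the same kind of record, held in three different formats, into one common structure. The three sources, the field names and the pipe-separated text layout are assumptions invented for this example; it uses only Python's standard csv, io and json modules.

```python
import csv
import io
import json

# Three hypothetical sources holding the same kind of record in different formats.
csv_source = "roll_number,name\n1,Asha\n"
json_source = '[{"roll_number": 2, "name": "Ravi"}]'
text_source = "3|Meena\n"


def from_csv(text):
    """Parse CSV text into the common record structure."""
    return [{"roll_number": int(r["roll_number"]), "name": r["name"]}
            for r in csv.DictReader(io.StringIO(text))]


def from_json(text):
    """Parse a JSON array into the common record structure."""
    return [{"roll_number": int(r["roll_number"]), "name": r["name"]}
            for r in json.loads(text)]


def from_text(text):
    """Parse pipe-separated lines into the common record structure."""
    records = []
    for line in text.splitlines():
        roll, name = line.split("|")
        records.append({"roll_number": int(roll), "name": name})
    return records


records = from_csv(csv_source) + from_json(json_source) + from_text(text_source)
print(records)
```

Each format needs its own reader, which is the extra effort that variety adds before the data can be analyzed together.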


Figure 9:Volume, Velocity, Variety

2.3 Data warehouse and Data Marts


2.3.1 Introduction to Data Warehousing

A data warehouse is a store of convenient, consistent, complete and consolidated data,
which is collected so that end users who take part in Decision Support Systems (DSS)
can perform quick analysis.

Data warehouses have no standard definition, and the people who work on the data
warehouse subject have defined it in many ways, as follows:

[1] “The basic data warehouse architecture interposes between end-user desktops and
production data sources a warehouse that we usually think of as a single, large system
maintaining an approximation of an enterprise data model.”
[2] “A data warehouse is a copy of transaction data specifically structured for querying
and reporting.”
[3] “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile
collection of data in support of management’s decision-making process.”

These data are obtained from different operational sources and kept in a separate physical
store. A data warehouse is not only a relational database that contains historical data
derived from transactional data; it is also an environment that includes all the
operations and applications needed to manage the process of gathering data and delivering it
to business users, such as an extraction, transportation, transformation, and loading (ETL)
solution, an online analytical processing (OLAP) engine, and client analysis tools.

Figure 12:Data Warehouse System Model

i. Subject-Oriented: Data warehouses are designed to aid decision making for a specific
subject. For example, sales data for applications contains specific sales of specific products
to specific customers. In contrast, sales data for decision support contains a historical
record of sales over specific time intervals. If designed well, subject-oriented data
provides a stable image of business processes, independent of legacy systems. In other
words, it captures the basic nature of the business environment.
ii. Integrated: A data warehouse consists of different kinds of data collected from
separate legacy systems, which can create conflicts and inconsistencies, for example among
units of measure. Because of this, the data has to be put into a consistent format, and in
this way it becomes integrated.
iii. Nonvolatile: Nonvolatile means that, once entered into the warehouse, data should not
change. This is logical because the purpose of a warehouse is to enable a user to analyze
what has occurred. New data is always appended to the database rather than replacing existing
data. The database continually absorbs new data, integrating it with the previous data.
iv. Time-Variant: Operational data and informational data differ with respect to
time variance. Operational data is valid only at the moment of access, capturing a
moment in time. When performance has to be assessed over time, historical data is
needed. Because data warehouse data represents data over a long time horizon,
historical analysis can be easily performed.
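
The difference between volatile operational data and nonvolatile, time-variant warehouse data can be sketched as follows. The product, prices and dates are made-up values, and the two containers are simple stand-ins for an operational system and a warehouse.

```python
from datetime import date

operational_price = {}   # operational view: only the current value is kept
warehouse_prices = []    # warehouse view: every value is appended with its date


def record_price(product, price, as_of):
    """Store a price change in both views."""
    operational_price[product] = price                 # overwrite in place
    warehouse_prices.append((product, price, as_of))   # append, never replace


record_price("pen", 10, date(2023, 1, 1))
record_price("pen", 12, date(2024, 1, 1))

# Historical analysis is only possible on the appended warehouse rows.
history = [price for product, price, _ in warehouse_prices if product == "pen"]
print(operational_price["pen"], history)
```

The operational view can answer only "what is the price now", while the warehouse view keeps each dated value for later analysis.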

2.3.2 The Goals of a Data Warehouse


The fundamental goals of the data warehouse are:

1- “Makes an organization’s information accessible.” The contents of the data warehouse are
correctly labeled and obvious. The data is easy to reach because it is one click away,
with no waiting. These properties are called, in the same order,
understandable, navigable and fast-performing.
2- “Makes the organization’s information consistent.” Consistent information is of key
importance for data warehouses, since they get data from different parts of an
organization, and this data must be matched properly. If two measures of the organization have the
same name, then they must mean the same thing. Conversely, if two measures don’t mean
the same thing, they are labeled differently.
3- “To be an adaptive and resilient source of information.” New data can be added and
new questions asked without changing existing data and technologies, because the warehouse
is designed for continuous change.
4- “To be a secure bastion that protects owner’s information asset.” The data warehouse not
only controls access to the data effectively, but also gives its owners great visibility into the uses
and abuses of that data, even after it has left the data warehouse.
5- “To be the foundation for decision-making.” The data warehouse provides the right data for
the decision makers. The decisions are the output of the data warehouse.


2.3.3 Basic Elements of the Data Warehouse


i. Source System
A source system, often called a legacy system, captures business data and
transactions. A source system has to maintain uptime and availability, and it offers a chance
to share basic dimensions, such as product and customer, with other legacy systems in the
organization. It is the largest source of data for analysis systems, but it is a
burden to create queries and management reports directly from these systems.

ii. Data Staging Area

A data staging area is an initial storage area where a set of processes (clean,
transform, combine, de-duplicate, household, archive) is performed on the data
to prepare it for use in the data warehouse. The data staging area acts as a bridge
between the source system and the presentation server. The data staging area can be
spread over a number of machines and does not need to be based on relational
technology. Unlike the presentation server, which is described below, the main
restriction on the data staging area is that it never provides query and presentation
services.

iii. Presentation Server

A presentation server is a physical machine that stores the processed data for the
end user’s querying and reporting requirements. It is fed from the data staging area. If
the queryable presentation resource for an enterprise’s data is organized around an
entity-relationship model, understandability and performance will be lost. If the
presentation server presents and stores data in a dimensional framework, the
tables are organized as a star schema.

iv. Dimensional Model

The dimensional model, which is designed to provide higher query performance and
resilience to change and to be more understandable, is an alternative to the
entity-relationship model. The dimensional model consists of a fact table and dimension
tables.
A fact table contains measurements of the business, which are preferably numeric
and additive. It has a set of two or more foreign keys that join the
dimension tables to the fact table.

A dimension table is complementary to the fact table. Most dimension tables have many
textual attributes. Each also has a primary key that relates it to the fact
table.
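
A minimal star schema along these lines might look like the sketch below. It uses Python's sqlite3 module; the table names (fact_sales, dim_product, dim_date), the columns and the sample rows are all hypothetical and chosen only to show one fact table joined to its dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: a primary key plus descriptive (mostly textual) attributes.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

-- Fact table: foreign keys to the dimensions plus numeric, additive measures.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    quantity INTEGER,
    amount REAL
);

INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Electrical');
INSERT INTO dim_date VALUES (20240101, 2024, 1), (20240201, 2024, 2);
INSERT INTO fact_sales VALUES
    (1, 20240101, 10, 100.0), (1, 20240201, 5, 50.0), (2, 20240201, 1, 300.0);
""")

# Join the fact table to a dimension and add up the additive measure.
rows = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(rows)
```

The measures live only in the fact table, and the descriptive text lives only in the dimension tables, which is what gives the schema its star shape.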

v. Data Mart
A data mart is a logical subset of the complete data warehouse, prepared for a
single business process in an organization. When data marts come together, an
integrated enterprise data warehouse is formed. Data marts must be built from
shared dimensions and facts. In this way they can be combined and used together.

vi. OLAP (On-Line Analytic Processing)


OLAP enables end users to query and present text and numeric data from data
warehouses. OLAP technology is based on a multidimensional cube of
data, and OLAP databases have a multidimensional structure.
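
The idea of a multidimensional cube can be sketched without any OLAP product. In the example below, each fact carries three dimensions and one measure, and a roll-up sums the measure over the dimensions that are left out. The dimension names, the facts and the roll_up helper are assumptions for illustration, not part of any real OLAP engine.

```python
from collections import defaultdict

# Hypothetical sales facts: (region, product, quarter, amount).
facts = [
    ("West", "Pen", "Q1", 100),
    ("West", "Lamp", "Q1", 300),
    ("East", "Pen", "Q1", 50),
    ("East", "Pen", "Q2", 70),
]
DIMENSIONS = ("region", "product", "quarter")


def roll_up(facts, keep):
    """Sum the amount over every dimension not listed in `keep`."""
    positions = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[i] for i in positions)
        totals[key] += fact[-1]
    return dict(totals)


by_region = roll_up(facts, ["region"])
by_product_quarter = roll_up(facts, ["product", "quarter"])
print(by_region)
print(by_product_quarter)
```

The same facts answer different questions depending on which dimensions are kept, which is the kind of view an OLAP tool offers interactively.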

vii. End User Application


These applications help end users prepare queries, perform analysis and carry out
other activities that support business needs. Examples are the end user data
access tool and the ad hoc query tool.
An end user data access tool works with a SQL session and gives the user a report,
a screen of data or another form of analysis.
An ad hoc query tool makes it easier to prepare queries by giving the user
pre-built query templates.

viii. Modeling Application


Modeling applications transform or summarize the data in the
warehouse using forecasting models, behavior scoring models, allocation models and
data mining tools.

ix. Metadata
Metadata contains information and definitions about the data that is stored.

The basic elements of the data warehouse are shown in Figure 13.

Figure 13: The Basic Elements of the Data Warehouse

2.4 Differences between Operational Database Systems and Data Warehouses

The major task of online operational database systems is to perform online transaction
and query processing. These systems are called online transaction processing (OLTP)
systems. They cover most of the day-to-day operations of an organization.

Data warehouse systems, on the other hand, serve users or knowledge workers in the role
of data analysis and decision making. Such systems can organize and present data in
various formats in order to accommodate the diverse needs of different users. These
systems are known as online analytical processing (OLAP) systems.


Figure 14: Data Warehouse Architecture

Feature                     OLTP                                  OLAP

Characteristic              Operational processing                Informational processing
Orientation                 Transaction                           Analysis
User                        Clerk, DBA, database professional     Knowledge worker (manager, analyst, executive)
Function                    Day-to-day operations                 Long-term informational requirements, decision support
DB design                   ER-based, application-oriented        Star/snowflake, subject-oriented
Data                        Current, guaranteed up to date        Historic, accuracy maintained over time
Summarization               Primitive, highly detailed            Summarized, consolidated
View                        Detailed, flat relational             Summarized, multidimensional
Unit of work                Short, simple transaction             Complex query
Access                      Read/write                            Mostly read
Focus                       Data in                               Information out
Operations                  Index/hash on primary key             Lots of scans
Number of records accessed  Tens                                  Millions
Number of users             Thousands                             Hundreds
DB size                     GB to high-order GB                   >= TB
Priority                    High performance, high availability   High flexibility, end-user autonomy

(1) Users and System Orientation: An OLTP system is used for transaction and query
processing by clerks, clients and information technology professionals. An OLAP system is
used for data analysis by knowledge workers, including analysts, managers and executives.
(2) Data Contents: An OLTP system manages current data that typically are too detailed to
be easily used for decision making. An OLAP system manages large amounts of historic
data, provides facilities for summarization and aggregation, and stores and manages
information at different levels of granularity. These features make the data easier to use
for informed decision making.
(3) Database Design: An OLTP system uses the entity-relationship (ER) data model and an
application-oriented database design. An OLAP system uses a star or snowflake model and a
subject-oriented database design.
(4) View: An OLTP system focuses mainly on the current data within an enterprise or
department, without referring to historic data or data in different organizations. In
contrast, an OLAP system often spans multiple versions of a database schema, due to the
evolutionary process of an organization. OLAP systems also deal with information that
originates from different organizations, integrating information from many data stores.
Because of their huge volume, OLAP data are stored on multiple storage media.
(5) Access patterns: The access patterns of an OLTP system consist mainly of short, atomic
transactions. Such a system requires concurrency control and recovery mechanisms.
However, accesses to OLAP systems are mostly read-only operations, although many could
be complex queries.
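
The two access patterns can be contrasted in a short sketch. Both statements run against the same small sqlite3 table purely for illustration; in practice OLTP and OLAP workloads run on separate systems. The orders table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "A", 2023, 100.0), (2, "B", 2023, 40.0), (3, "A", 2024, 60.0)],
)

# OLTP-style unit of work: a short transaction that touches one row by primary key.
with conn:  # commits on success, rolls back on error
    conn.execute("UPDATE orders SET amount = amount + 10 WHERE order_id = ?", (2,))

# OLAP-style unit of work: a read-only query that scans and aggregates many rows.
totals_by_year = conn.execute(
    "SELECT year, SUM(amount) FROM orders GROUP BY year ORDER BY year"
).fetchall()
print(totals_by_year)
```

The update reads and writes a single record through its key, while the aggregate reads every row and writes nothing.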

2.5 Data Warehouse Architectures


Data warehouses and their architectures vary depending upon the specifics of an
organization's situation. Three common architectures are:
i. Data Warehouse Architecture (Basic)
ii. Data Warehouse Architecture (with a Staging Area)
iii. Data Warehouse Architecture (with a Staging Area and Data Marts)

2.6.1 Data Warehouse Architecture (Basic)

In the simple data warehouse architecture shown in Figure 15, end users
directly access data derived from several source systems through the data
warehouse.

[Diagram: data sources (including flat files) feed the warehouse, which users access directly.]

Figure 15: Architecture of a Data Warehouse (Basic)

An additional type of data, summary data, is very valuable in data warehouses because it
pre-computes long operations in advance. For example, a query about last year's sales
can be answered from a summary in which the sales data has already been added up.
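
The value of summary data can be sketched as follows. The detail rows and the yearly_summary structure are invented for illustration; a real warehouse would typically keep such summaries in summary tables or materialized views.

```python
# Detailed sales rows: (year, month, amount). Values are made up for illustration.
sales = [(2023, 1, 100), (2023, 2, 150), (2024, 1, 200)]

# Build the summary once, for example during a scheduled load.
yearly_summary = {}
for year, _month, amount in sales:
    yearly_summary[year] = yearly_summary.get(year, 0) + amount


def sales_for_year(year):
    """Answer from the precomputed summary instead of rescanning the detail rows."""
    return yearly_summary.get(year, 0)


print(sales_for_year(2023))
```

The addition is done once when the summary is built, so each later query is a simple lookup.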

Data Warehouse Architecture (with a Staging Area)

Most data warehouses use a staging area to clean and process the operational
data before putting it into the warehouse. A staging area simplifies building summaries and
general warehouse management. This quite common architecture is shown in Figure 16.

Figure 16: Architecture of a Data Warehouse with a Staging Area

Data Warehouse Architecture (with a Staging Area and Data Marts)

A warehouse’s architecture can be customized for different groups within the organization
by adding data marts, which are systems designed for specific parts of the business.

Figure 17 shows an example with three data marts,
designed separately for purchasing, sales, and inventories. This architecture makes it
possible to analyze historical data for purchases and sales.

Figure 17: Architecture of a Data Warehouse with a Staging Area and Data Marts


3.8 Define Extraction, Transformation and Loading


Data warehouse systems use back-end tools and utilities to populate and refresh their data. These
tools and utilities include the following functions.

• Data Extraction, which typically gathers data from multiple, heterogeneous and external
sources.
• Data Cleaning, which detects errors in the data and rectifies them when possible.
• Data Transformation, which converts data from legacy or host format to warehouse format.
• Load, which sorts, summarizes, consolidates, computes views, checks integrity and builds
indexes and partitions.
• Refresh, which propagates the updates from the data sources to the data warehouse.
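
The functions listed above can be sketched as a very small pipeline. The source rows, the field names, the pounds-to-kilograms conversion and the person table are all assumptions made for this example; real ETL tools do far more, and the refresh step is not shown.

```python
import sqlite3

# Extraction: rows as they might arrive from hypothetical source systems.
source_rows = [
    {"id": "1", "name": " asha ", "weight_lb": "176.4"},
    {"id": "2", "name": "Ravi", "weight_lb": ""},         # missing value
    {"id": "1", "name": " asha ", "weight_lb": "176.4"},  # duplicate
]


def clean(rows):
    """Cleaning: drop rows with missing weights and remove duplicate ids."""
    seen, cleaned = set(), []
    for row in rows:
        if row["weight_lb"] and row["id"] not in seen:
            seen.add(row["id"])
            cleaned.append(row)
    return cleaned


def transform(rows):
    """Transformation: convert types, tidy names, and convert pounds to kilograms."""
    return [
        (int(r["id"]), r["name"].strip().title(), round(float(r["weight_lb"]) * 0.45359237, 1))
        for r in rows
    ]


def load(conn, rows):
    """Load: write the rows into the warehouse table and build an index."""
    conn.execute("CREATE TABLE IF NOT EXISTS person (id INTEGER, name TEXT, weight_kg REAL)")
    conn.executemany("INSERT INTO person VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_person_id ON person (id)")
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(clean(source_rows)))
loaded = warehouse.execute("SELECT id, name, weight_kg FROM person").fetchall()
print(loaded)
```

Keeping each function separate mirrors the list: a problem in cleaning or transformation can be traced without touching the load step.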

3.9 Data warehouse Metadata


Given the complexity of information in an operational data store (ODS) and a data warehouse,
it is essential that there be a mechanism for users to easily find out what data is there and
how it can be used to meet their needs. Providing metadata about the ODS or the data warehouse
achieves this. Metadata is data about data, or documentation about the data that is needed by
the users. It is not the actual data warehouse, but answers “who, what, where, when, why and how”
questions about the data warehouse.

Metadata is also structured data that describes the
characteristics of a resource. Metadata is stored in the system itself and can be queried using tools
that are available on the system.

Examples:

(1) The table of contents and index in a book may be considered metadata for the book.

(2) A library catalogue may be considered metadata. The catalogue metadata consists of several
predefined elements representing specific attributes of a resource, and each element can
have one or more values. These elements could be the name of the author, the name of the
document, the publisher’s name, the publication date and the category to which it belongs.
They could even include an abstract of the data.
(3) Suppose we say that a data element about a person is 80. This must be described by noting that
it is the person’s weight and that the unit is kilograms. Therefore (weight, kilogram) is the metadata
about the data element 80.
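
The third example can be written out as a tiny sketch. The metadata dictionary and the describe helper are hypothetical names used only to show how a raw value becomes meaningful once its metadata is attached.

```python
# The raw value alone is ambiguous.
value = 80

# Hypothetical metadata record describing what the value means.
metadata = {"attribute": "weight", "unit": "kilogram", "entity": "person"}


def describe(value, metadata):
    """Combine a raw value with its metadata into a readable statement."""
    return f"{metadata['entity']} {metadata['attribute']} = {value} {metadata['unit']}"


description = describe(value, metadata)
print(description)
```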


A metadata repository is a database of data about data (metadata). The purpose of the
metadata repository is to provide consistent and reliable access to data. The metadata
repository itself may be stored in a physical location in which metadata is drawn from separate
sources. Metadata may include information about how to access specific data or more details
about the data.

3.9.1 Role of Metadata

Metadata has a very important role in a data warehouse.

The various roles of metadata are explained below.

i. Metadata acts as a directory.
ii. This directory helps the decision support system to locate the contents of the data
warehouse.
iii. Metadata helps the decision support system map data when data is
transformed from the operational environment to the data warehouse environment.
iv. Metadata guides summarization between current detailed data and highly
summarized data.
v. Metadata is used by query tools.
vi. Metadata is used by extraction and cleansing tools.
vii. Metadata is used by reporting tools.
viii. Metadata is used by transformation tools.
ix. Metadata plays an important role in loading functions.

Metadata plays a very different role from other data warehouse data and is important for many
reasons. For example, metadata is used as a directory to help the decision support system analyst
locate the contents of the data warehouse, and as a guide to the data mapping when data is
transformed from the operational environment to the data warehouse environment. Metadata
also serves as a guide to the algorithms used for summarization between the current detailed data
and the lightly summarized data, and between the lightly summarized data and the highly
summarized data. Metadata should be stored and managed persistently.

The following diagrams show the role of Metadata.


Figure 18:Role of Metadata Chart

3.9.2 Metadata Repository:


Metadata repository is an integral part of a data warehouse system.

A Metadata repository should contain the following:

(1) Definition of data warehouse:

It includes the description of the structure of data warehouse. The description is defined by
schema, view, hierarchies, derived data definitions, and data mart location and contents.

(2) Business Metadata:

It includes the business terms and definitions, data ownership information and changing
policies.

(3) Operational Metadata:

It includes currency of data and data lineage. Currency of data means whether the data is
active, archived or purged. Lineage of data means the history of the data migrated and the
transformations applied to it.

(4) Data for mapping from operational environment to data warehouse:

It includes source databases and their contents, data partitions, data extraction, cleaning,
transformation rules, data refresh and purging rules and security (user authorization and
access control).
(5) The algorithms used for summarization:

It includes measure and dimension definition algorithms, data on granularity, partitions,
subject areas, aggregation, summarization, and predefined queries and reports.

(6) Data related to system performance:

It includes indices and profiles that improve data access and retrieval performance, in
addition to rules for the timing and scheduling of refresh, update and replication cycles.

3.9.3 Types of Metadata in Data Warehouse Architecture:

The two most common approaches to building a metadata repository architecture are:

(1) Centralized
(2) Decentralized

Generally, for small to medium-sized organizations, a single metadata repository (the centralized
approach) is enough for handling all of the metadata required by the various groups in the
corporation. This architecture offers a single and centralized approach to administering and
sharing metadata.

On the other hand, most large enterprises that have multiple and disparate divisions will require
several metadata repositories for handling all of the corporation’s various types of metadata
content and applications.

3.9.3.1 Centralized Metadata Repository Architecture:

This approach is the most common one that corporations have implemented.

A centralized metadata architecture is built on a consistent metamodel that mandates that the
schema for defining and organizing the various metadata be stored in a global metadata
repository.

The strength of this approach is that it integrates all of the metadata and stores it in the
metamodel schema, where it can be easily accessed.

Figure 3.9.3.1 Centralized Metadata Repository Architecture

3.9.3.2 Decentralized Metadata Repository Architecture:

A decentralized metadata architecture creates a uniform and consistent metamodel that mandates
the schema for defining and organizing the various metadata to be stored in a global metadata
repository, and for the shared metadata elements that appear in the local metadata repositories.

All metadata that is shared and reused among the various repositories must first go
through the central global repository, but sharing of and access to the local metadata is
independent of the central repository.

Figure 19: Decentralized Metadata Repository Architecture (the figure shows several metadata sources)
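
The rule described above (shared metadata passes through the global repository, while local metadata stays independent of it) can be illustrated with a small Python sketch. The class names, method names and sample entries are assumptions made for illustration only and do not come from any real metadata tool.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRepository:
    """A toy metadata store: element name -> description."""
    name: str
    entries: dict = field(default_factory=dict)


@dataclass
class DecentralizedMetadata:
    """Hypothetical model: one global repository plus several local ones."""
    global_repo: MetadataRepository
    local_repos: dict = field(default_factory=dict)

    def add_local(self, repo_name, element, description):
        # Local-only metadata never touches the global repository.
        repo = self.local_repos.setdefault(repo_name, MetadataRepository(repo_name))
        repo.entries[element] = description

    def share(self, element, description):
        # Shared metadata is registered in the global repository first ...
        self.global_repo.entries[element] = description
        # ... and then appears in every local repository.
        for repo in self.local_repos.values():
            repo.entries[element] = description

    def lookup(self, repo_name, element):
        # Access to a local repository does not go through the global one.
        return self.local_repos[repo_name].entries.get(element)


meta = DecentralizedMetadata(MetadataRepository("global"))
meta.add_local("sales", "discount_code", "Sales-only promotion code")
meta.add_local("finance", "ledger_id", "Finance-only ledger key")
meta.share("customer_id", "Unique customer identifier")
print(meta.lookup("finance", "customer_id"))
```

In this sketch a centralized architecture would simply be the special case with no local repositories, where every lookup goes to the single global store.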

3.10 Mapping

A basic part of the data warehouse environment is the mapping from the operational
environment into the data warehouse.

The mapping includes a wide variety of features, some of which are listed here:

• Mapping from one attribute to another
• Conversions
• Changes in mapping conventions
• Changes in physical characteristics of data
• Filtering of data, etc.

Example:

Consider a vice president of marketing who has just asked for a new report on product sales
and purchases. The manager turns to the data warehouse for the report data. Upon inspection,
the vice president proclaims the report to be fiction. The manager must then prove that the data
in the report is valid. The manager first checks the validity of the data in the warehouse. If the
warehouse data has not been reported properly, the reports are adjusted.

However, if the reports have been produced properly from the data warehouse, the manager has
to go back to the operational sources. At this point, if the mapping data has been carefully stored,
the manager can quickly and easily trace the data to the operational source. If the mapping
has not been stored properly, the manager has a difficult time defending the conclusions to the
vice president.

The metadata store for the data warehouse is therefore the natural place for storing mapping
information.
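
As a minimal sketch of what stored mapping information makes possible, the Python example below keeps one mapping record per warehouse attribute and uses it both to load a row and to trace a warehouse attribute back to its operational source. All attribute names and conversions here are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AttributeMapping:
    source: str        # attribute in the operational system (hypothetical name)
    target: str        # attribute in the data warehouse (hypothetical name)
    convert: Callable  # conversion applied while loading
    note: str          # human-readable description of the conversion


MAPPINGS = [
    AttributeMapping("orders.amt_cents", "sales.amount_usd", lambda c: c / 100, "cents to dollars"),
    AttributeMapping("orders.cust_no", "sales.customer_id", str, "integer key to text key"),
]


def to_warehouse(operational_row):
    """Apply every stored mapping to one operational row."""
    return {m.target: m.convert(operational_row[m.source]) for m in MAPPINGS}


def trace(target):
    """Go from a warehouse attribute back to its operational source."""
    for m in MAPPINGS:
        if m.target == target:
            return m.source, m.note
    raise KeyError(target)


row = to_warehouse({"orders.amt_cents": 1250, "orders.cust_no": 42})
print(row)
print(trace("sales.amount_usd"))
```

With records like these, the manager in the example can answer "where did this number come from?" by calling the trace step instead of searching the operational systems by hand.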

Figure 20: Functionality chart of Mapping

3.11. Data Mart


The data mart is a model that represents the same data structure as the data
warehouse. Data marts are prepared for the specific requirements of the whole organization or a
part of it. Because a data mart contains less data, it gives users some advantages. First, it
enables faster queries. Another advantage is mobility: since it requires less hard disk space, the
user can carry the data mart on a laptop. During the design of data marts, two different methods
can be followed to collect the data. One option is to collect the granular data from the
enterprise data warehouse and then process it according to the needs around which the data mart
was prepared. The second option is to collect shaped data directly into the data mart. The data,
which is designed according to the requirements of the data mart, is then kept in the central
repository of all enterprise data. Figure 21 shows both options.

Figure 21: Data Mart
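
The first option (take granular data from the warehouse and process it for the needs of the data mart) can be sketched with Python's built-in sqlite3 module. The table names, columns and sample rows are assumptions made for illustration; a real warehouse would be far larger and would normally live in a dedicated database system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Granular data as it might sit in the enterprise data warehouse.
    CREATE TABLE warehouse_sales (sale_date TEXT, region TEXT, product TEXT, amount REAL);
    INSERT INTO warehouse_sales VALUES
        ('2024-01-05', 'North', 'Laptop', 1200.0),
        ('2024-01-09', 'North', 'Phone',   800.0),
        ('2024-01-11', 'South', 'Laptop', 1150.0),
        ('2024-02-02', 'North', 'Laptop', 1250.0);

    -- The data mart keeps only what the sales group needs: totals per region and month.
    CREATE TABLE mart_sales_by_region AS
        SELECT region, substr(sale_date, 1, 7) AS month, SUM(amount) AS total_amount
        FROM warehouse_sales
        GROUP BY region, month;
""")

rows = conn.execute(
    "SELECT region, month, total_amount FROM mart_sales_by_region ORDER BY region, month"
).fetchall()
for row in rows:
    print(row)
```

The mart table has fewer rows and columns than the warehouse table, which is the reason queries against it can be faster.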

Data marts can have a dependent or an independent structure. If the characteristics of the
data marts' dimensions are defined at the beginning so that they are compliant with each
other, then these data marts are dependent.

In some situations, it is better to have independent data marts. In this case the characteristics
of the other data marts are not taken into consideration during the preparation of the
data mart. However, this can prevent future integration and add development cost if there
is later an interest in sharing information across departments.

3.11.1 Reasons for creating a Data Mart

i. To give users more flexible access to the data they need to analyze most often.
ii. To provide data in a form that matches the collective view of a group of users.
iii. To improve end-user response time.
iv. Potential users of a data mart are clearly defined and can be targeted for support in
retrieving the data.
v. To provide appropriately structured data as dictated by the requirements of the end-user
access tools.
vi. Building a data mart is simpler than establishing a corporate data warehouse.
vii. The cost of implementing data marts is far less than that required to establish a data
warehouse.
viii. The data mart is the access layer of the data warehouse environment; it is created to
deliver data to users faster.
ix. The data mart is a subset of the warehouse, so all the data available in the data
mart is also available in the warehouse database. A data mart is created for a
specific business purpose.
x. Frequently needed data is easy to access when required by the client.
xi. A group of users can be given access to view the data mart when required, with
good performance.
xii. A data mart is easy to create and maintain, because it relates to a specific business area.
xiii. Creating a data mart costs less than creating a data warehouse, which needs a huge
amount of space.

Figure 22: Functionality chart of Data Mart

3.11.2 Data Marts Development Approaches

There are three main approaches for building data marts: the top-down approach, the bottom-up
approach and the federated approach.

3.11.2.1. Top-Down Approach


As shown in Figure 23, the data first comes from the operational sources to the data staging
area, where some processing is performed on it. After this, it is transferred to the data
warehouse, which then feeds it to the dependent data marts.

Figure 23: Top-Down Approach to Data Mart Development (components shown: ODS, Enterprise Data Warehouse (EDW), Data Marts)

3.11.2.2. Bottom-Up Approach


In this approach, the data that comes from legacy systems to the staging area flows
directly into the independent data marts, and these data marts then feed the enterprise data
warehouse, as illustrated in Figure 24.

Figure 24: Bottom-Up Approach to Data Mart Development (components shown: ODS, Enterprise Data Warehouse (EDW))
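
The difference between the two approaches is the order in which data flows, which the following Python sketch tries to make explicit. The function names, the staging rule (dropping rows with no amount) and the sample rows are hypothetical simplifications, not a real ETL process.

```python
def stage(operational_rows):
    """Staging area: drop incomplete rows (a stand-in for real cleansing)."""
    return [r for r in operational_rows if r.get("amount") is not None]


def top_down(operational_rows):
    # Staging area -> enterprise data warehouse.
    warehouse = stage(operational_rows)
    # Data warehouse -> dependent data marts (one per department).
    marts = {}
    for row in warehouse:
        marts.setdefault(row["dept"], []).append(row)
    return warehouse, marts


def bottom_up(operational_rows):
    # Staging area -> independent data marts.
    marts = {}
    for row in stage(operational_rows):
        marts.setdefault(row["dept"], []).append(row)
    # Data marts -> enterprise data warehouse.
    warehouse = [row for rows in marts.values() for row in rows]
    return warehouse, marts


ROWS = [
    {"dept": "sales", "amount": 100},
    {"dept": "finance", "amount": 250},
    {"dept": "sales", "amount": None},
    {"dept": "sales", "amount": 75},
]
td_warehouse, td_marts = top_down(ROWS)
bu_warehouse, bu_marts = bottom_up(ROWS)
print(sorted(td_marts), len(td_warehouse), len(bu_warehouse))
```

In this toy case both routes end with the same content. In practice the bottom-up route only integrates cleanly if the independent marts were designed with compatible dimensions.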

3.11.3 The Differences between Data Mart and Data Warehouse

When the data mart is compared with the data warehouse, two fundamental distinctions
can easily be noticed. One of them is that the data mart is a subset of the data warehouse
and is requirement oriented. In contrast, the data warehouse holds the enterprise data
without regard to any specific requirements. Of course, during the design of a
data mart the structure of the whole warehouse has to be considered; otherwise it will be very
hard to integrate the data marts later.

Figure 25: Data Warehouse and Data Mart

The implementation of a data mart is much faster and cheaper, since a data mart
contains only a specific part of the data warehouse, whose implementation is more time
consuming and costs much more.

There are some data mart solutions developed by the many decision support
system (DSS) vendors. However, using them to design a data mart for specific
requirements takes considerable effort to customize them, because these solutions
are produced for general purposes.

The other main difference between the data mart and the data warehouse is granularity: the
data in the data mart can be less granular (more aggregated) than the data in the data
warehouse. Since the requirements of the data mart are more clearly defined than those of
the data warehouse, the data can be pre-aggregated according to those requirements. As a
result, data extraction can be done faster and more efficiently.

| Parameter | Data Warehouse | Data Mart |
|---|---|---|
| Definition | A data warehouse is a large repository of data collected from different organizations or departments within a corporation. | A data mart is a subtype of a data warehouse. It is designed to meet the needs of a certain user group. |
| Usage | It helps to take strategic decisions. | It helps to take tactical decisions for the business. |
| Objective | The main objective of a data warehouse is to provide an integrated environment and a coherent picture of the business at a point in time. | A data mart is mostly used in a business division at the department level. |
| Designing | The designing process of a data warehouse is quite difficult. | The designing process of a data mart is easy. |
| Model | May or may not use a dimensional model. However, it can feed dimensional models. | It is built around a dimensional model using a star schema. |
| Data handling | Data warehousing covers a large area of the corporation, which is why it takes a long time to process. | Data marts are easy to use, design and implement, as they handle only small amounts of data. |
| Focus | Data warehousing is broadly focused on all the departments. It can even represent the entire company. | A data mart is subject-oriented and is used at a department level. |
| Data type | The data stored inside the data warehouse is always detailed when compared with a data mart. | Data marts are built for particular user groups. Therefore, the data is short and limited. |
| Subject area | The main objective of a data warehouse is to provide an integrated environment and a coherent picture of the business at a point in time. | Mostly holds only one subject area, for example, sales figures. |
| Data storing | Designed to store enterprise-wide decision data, not just marketing data. | Dimensional modeling and star schema design are employed to optimize the performance of the access layer. |
| Data type | Time variance and non-volatile design are strictly enforced. | Mostly includes consolidated data structures to meet the subject area's query and reporting needs. |
| Data value | Read-only from the end user's standpoint. | Transaction data, regardless of grain, fed directly from the data warehouse. |
| Scope | Data warehousing is more helpful, as it can bring in information from any department. | A data mart contains the data of a specific department of a company. There may be separate data marts for sales, finance, marketing, etc. It has limited usage. |
| Source | In a data warehouse, data comes from many sources. | In a data mart, data comes from very few sources. |
| Size | The size of a data warehouse may range from 100 GB to 1 TB+. | The size of a data mart is less than 100 GB. |
| Implementation time | The implementation process of a data warehouse can extend from months to years. | The implementation process of a data mart is restricted to a few months. |

2.6 Knowledge Management

Knowledge is very important for the survival of an organization. Historically,
employees have gathered knowledge through the trial-and-error method
or by working as an apprentice under a tenured, knowledgeable
employee. Management guru Peter Drucker put forward the concept that
knowledge is as valuable as a company's other assets, such as plant,
machinery, etc. A knowledge management system comprises a range of
practices used in an organization to identify, create, represent,
distribute, and enable adoption of insights and experience. Such insights
and experience comprise knowledge, either embodied in individuals or
embedded in organizational processes and practices.

Importance of Knowledge Management

Knowledge provides a competitive advantage to an employee as well as
the organization. The data and information which come with
knowledge help the organization make informed decisions. For example,
knowledge about a competitor's pricing model or business strategy can
help the organization work towards bettering the competitor. Historical
data, e.g. sales data, pricing data, etc., can help the organization improve
existing or proposed business initiatives.

Knowledge management is a highly iterative process which consists of
six major tasks: create, capture, refine, store, tag and circulate. The
first step is to create or capture data and store it at an appropriate location.
The second step is to refine the data into meaningful information. The
third step is to transmit the information to relevant stakeholders.

There are two types of knowledge which need to be captured as part of
knowledge management. The first type is hard data in terms of
numbers and figures. The second type of knowledge is the
interpretation of the captured data based on experience. The real need of
the knowledge management system is to provide access to the
knowledge base whenever required.

Knowledge Management System

These systems are developed to capture, create, refine, tag and circulate
information used to improve the business productivity
of the organization. There are three broad ways of managing the
knowledge system.

❖ Utilization of information technology and systems to improve
business efficiency.
❖ Utilization of organizational methods to improve business efficiency.
❖ Creating a healthy workplace to facilitate improvement of business efficiency.

Structure

The structure of the knowledge management system is dependent on the business
strategy of the organization. The final structure needs to align
technology, organizational structure and work culture.

Figure 26: Knowledge Management

Types of Knowledge Management Systems


Based on the structure and requirements of the organization, there are several types of
knowledge management systems. Some of them are as follows:

i. Expert Systems
These are knowledge management systems developed to assist a Subject Matter
Expert. This module provides knowledge of different subjects.

ii. Groupware
In the current global scenario, team members are spread across regions. However, it is
important for them to collaborate on various projects. Groupware is a knowledge
management system which helps in sharing calendars, project activities and instant
messaging.

iii. SharePoint
It is important for a team to store various documents at a single location. SharePoint
enables a user to store multiple versions of the same document, helps a user search
through folders for a document, etc.

iv. Decision Support System
A decision support system helps floor managers, sales managers, the CEO, etc. take decisions
to finalize business or operational strategy. A decision support system comprises
primary data as well as secondary data. It enables editing of data
and converts it into information in the desired format.

v. Database Management System
Knowledge management systems which support active storage and retrieval of data are
known as database management systems.

All the systems discussed here come under the knowledge management category.
A knowledge management system is not radically different from these information
systems; it extends the already existing systems by assimilating more
information.

As we have seen, data is raw facts, information is processed and/or interpreted data,
and knowledge is personalized information.

What is Knowledge?
• Personalized information
• State of knowing and understanding
• An object to be stored and manipulated
• A process of applying expertise
• A condition of access to information
• Potential to influence action
Sources of Knowledge of an Organization

• Intranet
• Data warehouses and knowledge repositories
• Decision support tools
• Groupware for supporting collaboration
• Networks of knowledge workers
• Internal expertise

Purpose of KMS

• Improved performance
• Competitive advantage

• Innovation
• Sharing of knowledge
• Integration
• Continuous improvement by:
o Driving strategy
o Starting new lines of business
o Solving problems faster
o Developing professional skills
o Recruiting and retaining talent

Activities in Knowledge Management
• Start with the business problem and the business value to be delivered.
• Identify what kind of strategy to pursue to deliver this value and address the KM
problem.
• Think about the system required from a people and process point of view.
• Finally, think about what kind of technical infrastructure is required to support the
people and processes.
• Implement the system and processes with appropriate change management and iterative
staged release.
Level of Knowledge Management

Figure 27: Level of Knowledge Management

Business Intelligence (BI)

• Managers and Decision Making

Decision Making in Management

Decision making is the mental process of selecting a course of action from a set of alternatives.
Every decision-making process produces an outcome that might be an action, a recommendation,
or an opinion. Since doing nothing or remaining neutral is usually among the set of options one
chooses from, selecting that course is also a decision.

Difference Between Problem Analysis and Decision Making


While they are related, problem analysis and decision making are distinct activities. Decisions are
commonly focused on a problem or challenge. Decision makers must gather and consider data
before making a choice. Problem analysis involves framing the issue by defining its boundaries,
establishing criteria with which to select from alternatives, and developing conclusions based on
available information. Analyzing a problem may not result in a decision, although the results are
an important ingredient in all decision making.

Steps in Decision Making


Decision making comprises a series of sequential activities that together structure the process
and facilitate its conclusion. These steps are:

• Establishing objectives
• Classifying and prioritizing objectives
• Developing selection criteria
• Identifying alternatives
• Evaluating alternatives against the selection criteria
• Choosing the alternative that best satisfies the selection criteria

• Implementing the decision


A major part of decision making involves the analysis of a defined set of alternatives against
selection criteria. These criteria usually include costs and benefits, advantages and disadvantages,
and alignment with preferences. For example, when choosing a place to establish a new business,
the criteria might include rental costs, availability of skilled labor, access to transportation and
means of distribution, and proximity to customers. Based on the relative importance of these
factors, a business owner decides which location best meets the criteria.
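
One common way to make "relative importance" concrete is a weighted scoring table. The Python sketch below is only an illustration of that idea: the three sites, the weights and the 1 to 5 scores are invented numbers, not data from these notes.

```python
# Relative importance of each criterion (higher = more important). Assumed values.
WEIGHTS = {"rent": 4, "labor": 3, "transport": 2, "customers": 1}

# How well each candidate site does on each criterion, on a 1-5 scale. Assumed values.
SCORES = {
    "Site A": {"rent": 3, "labor": 4, "transport": 5, "customers": 2},
    "Site B": {"rent": 5, "labor": 3, "transport": 3, "customers": 4},
    "Site C": {"rent": 2, "labor": 5, "transport": 4, "customers": 5},
}


def weighted_score(scores, weights):
    """Sum of score x weight over all criteria."""
    return sum(weights[c] * scores[c] for c in weights)


def best_alternative(all_scores, weights):
    """Return the alternative with the highest weighted score."""
    return max(all_scores, key=lambda name: weighted_score(all_scores[name], weights))


for site, scores in SCORES.items():
    print(site, weighted_score(scores, WEIGHTS))
print("Best:", best_alternative(SCORES, WEIGHTS))
```

Changing the weights changes the winner, which is why agreeing on the selection criteria and their importance comes before evaluating the alternatives.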

The decision maker may face a problem when trying to evaluate alternatives in terms of their
strengths and weaknesses. This can be especially challenging when there are many factors to
consider. Time limits and personal emotions also play a role in the process of choosing between
alternatives. Greater deliberation and information gathering often take additional time, and
decision makers often must choose before they feel fully prepared. In addition, the more that is
at stake, the more emotions are likely to come into play, and this can distort one's judgment.

Types of Decisions
Three approaches to decision making are avoiding, problem solving and problem
seeking.

• Problem seeking: The process of clarifying, understanding, and restating the
problem.
• Problem solving: Using generic or ad hoc methods, in an
orderly manner, to find solutions to specific problems.

Every decision-making process reaches a conclusion, which can be a choice to act or not to act,
a decision on what course of action to take and how, or even an opinion or recommendation.
Sometimes decision making leads to redefining the issue or challenge. Accordingly, three
decision-making processes are known as avoiding, problem solving, and problem seeking.

One decision-making option is to make no choice at all (avoiding). There are several reasons
why the decision maker might do this:

1. There is insufficient information to make a reasoned choice between alternatives.
2. The potential negative consequences of selecting any alternative outweigh the
benefits of selecting one.
3. No pressing need for a choice exists and the status quo can continue without harm.
4. The person considering the alternatives does not have the authority to decide.

BI for Data analysis and Presenting Results

Business Intelligence (BI) is a technology-driven process for analyzing data and presenting
actionable information to help executives, managers and other corporate end users make
informed business decisions. BI encompasses a wide variety of tools, applications and
methodologies that enable organizations to collect data from internal systems and
external sources, prepare it for analysis, develop and run queries against that data, and
create reports, dashboards and data visualizations to make the analytical results available
to corporate decision-makers, as well as operational workers.

Business intelligence is sometimes used interchangeably with business analytics. In other
cases, business analytics is used either more narrowly to refer to advanced data analytics
or more broadly to include both BI and advanced analytics.

Importance of Business Intelligence


The potential benefits of business intelligence tools include accelerating and improving
decision-making, optimizing internal business processes, increasing operational
efficiency, driving new revenues and gaining competitive advantage over business rivals.
BI systems can also help companies identify market trends and spot business problems
that need to be addressed. BI data can include historical information stored in a data
warehouse, as well as new data gathered from source systems as it is generated, enabling
BI tools to support both strategic and tactical decision-making processes.

Initially, BI tools were primarily used by data analysts and other IT professionals who ran
analyses and produced reports with query results for business users. Increasingly,
however, business executives and workers are using BI platforms themselves, thanks
partly to the development of self-service BI and data discovery tools and dashboards.

Types of BI tools
Business intelligence combines a broad set of data analysis applications, including ad hoc
analytics and querying, enterprise reporting, online analytical processing (OLAP), mobile
BI, real-time BI, operational BI, software-as-a-service BI, open source BI, collaborative BI
and location intelligence.

BI technology also includes data visualization software for designing charts and other
infographics, as well as tools for building BI dashboards and performance scorecards that
display visualized data on business metrics and key performance indicators in an
easy-to-grasp way. Data visualization tools have become the standard of modern BI in recent
years. A couple of leading vendors defined the technology early on, but more traditional BI
vendors have followed in their path. Now, virtually every major BI tool incorporates
features of visual data discovery.

BI programs may also incorporate forms of advanced analytics, such as data mining,
predictive analytics, text mining, statistical analysis and big data analytics. In many cases,
though, advanced analytics projects are conducted and managed by separate teams of
data scientists, statisticians, predictive modelers and other skilled analytics professionals,
while BI teams oversee more straightforward querying and analysis of business data.

Business intelligence data is typically stored in a data warehouse or in smaller data marts
that hold subsets of a company's information. In addition, Hadoop systems are
increasingly being used within BI architectures as repositories or landing pads for BI and
analytics data, especially for unstructured data, log files, sensor data and other types of
big data. Before it is used in BI applications, raw data from different source systems must
be integrated, consolidated and cleansed using data integration and data quality tools to
ensure that users are analyzing accurate and consistent information.
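
The "integrate, consolidate and cleanse" step can be pictured with a very small Python sketch. The two source lists, the field names and the de-duplication rule (same customer name and amount means same record) are assumptions chosen to keep the example short; real data quality tools use far more careful matching.

```python
def cleanse(record):
    """Bring one raw record into a consistent shape."""
    return {
        "customer": record["customer"].strip().title(),  # unify spacing and case
        "amount": round(float(record["amount"]), 2),     # unify text and numeric amounts
    }


def integrate(*sources):
    """Merge several source systems, dropping records that repeat after cleansing."""
    seen = set()
    combined = []
    for source in sources:
        for raw in source:
            rec = cleanse(raw)
            key = (rec["customer"], rec["amount"])  # crude duplicate rule (assumption)
            if key not in seen:
                seen.add(key)
                combined.append(rec)
    return combined


# Two hypothetical source systems holding overlapping, inconsistently formatted data.
crm = [{"customer": " alice smith ", "amount": "120.50"}, {"customer": "Bob Ray", "amount": "80"}]
billing = [{"customer": "ALICE SMITH", "amount": 120.5}, {"customer": "carol diaz", "amount": "42.10"}]

clean = integrate(crm, billing)
for rec in clean:
    print(rec)
```

Without the cleansing step the two "Alice Smith" records would be counted twice in any report built on this data.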

Questions

2 Marks Questions
1. Define Database Approach.
2. Define Big Data with an example.
3. Define Data Warehouse and Data Mart.
4. Define Knowledge Management with a neat diagram.
5. What are the 3 V's of big data analytics?
6. What are the roles of BI?
7. Differentiate between traditional computing and stream computing.
8. Define data management.

5 Marks Questions
1. Describe the importance of Business Intelligence and DSS in developing MIS.
2. Explain the MIS pyramid.
3. What is an information system? What are the functions of an information system and
its impact on society in the domain of health care?
4. Explain the ethical issues and threats of information security.
5. Differentiate between Data Warehouse and Data Mart.
6. Differentiate between OLAP and OLTP.
7. Explain with a neat diagram the Value Chain of Big Data.
8. Explain the Knowledge Management framework with the KM Ladder.

10 Marks Questions
1. What is the role of knowledge management and knowledge
management programs in business?
2. What are the business benefits of using intelligent techniques for
knowledge management?
3. How do different decision-making constituencies in an organization use
business intelligence?
4. Explain the importance of the transition from traditional database systems to
big data analytics systems with respect to the online processing system.
