Unit 3 Notes

UNIT – III

Recent Advances in Access Control:

Information plays an important role in any organization, and its protection against unauthorized disclosure
(secrecy) and unauthorized or improper modification (integrity), while ensuring its availability to legitimate
users (no denial of service), is of paramount importance. An important service in guaranteeing
information protection is the access control service. Access control is the process of mediating every request
to resources and data maintained by a system and determining whether the request should be granted or
denied. An access control system can be considered at three different levels of abstraction: access control
policy, access control model, and access control mechanism. A policy defines the high-level rules used to
verify whether an access request is to be granted or denied. A policy is then formalized through a security
model and is enforced by an access control mechanism.

An example of access matrix:

         Document1     Document2     Program1                Program2
Ann      read, write   read          execute
Bob      read          read          read, execute
Carol                  read, write                           execute
David                                read, write, execute    read, write, execute

The variety and complexity of the protection requirements that may need to be imposed in today's systems
make the definition of access control policies a far from trivial process. An access control system should be
simple and expressive: simple, to ease the task of specifying and maintaining the security specifications;
expressive, to make it possible to specify, in a flexible way, the different protection requirements that may
need to be imposed on different resources and data. Moreover, an access control system should include
support for the following features.

• Policy combination. Since information may not be under the control of a single authority, access
control policies over the information may need to take into consideration not only the protection
requirements of the owner, but also the requirements of the collector and of other parties. This
multiple-authority scenario should be supported from the administrative point of view, providing
solutions for modular, large-scale, scalable policy composition and interaction.
• Anonymity. Many services do not need to know the real identity of a user. It is then necessary to
make access control decisions dependent on the requester's attributes, which are usually proved by
digital certificates.
• Data outsourcing. A recent trend in the information technology area is data outsourcing, whereby
companies shift from fully local management to outsourcing the administration of their data to
external service providers. Here, an interesting research challenge consists in developing an
efficient mechanism for implementing selective access to the remote data.

Classical Access Control Models:

Classical access control models can be grouped into three main classes: discretionary access control
(DAC), which bases access decisions on users' identity; mandatory access control (MAC), which
bases access decisions on mandated regulations defined by a central authority; and role-based access
control (RBAC), which bases access decisions on the roles that users play in the system. We now
briefly present the main characteristics of these classical access control models.

Discretionary Access Control:

Discretionary access control is based on the identity of the user requesting access and on a set of
rules, called authorizations, explicitly stating which user can perform which action on which
resource. In the most basic form, an authorization is a triple (s, o, a), stating that user s can execute
action a on object o. The first discretionary access control model proposed in the literature is the
access matrix model [4, 5, 6]. Let S, O, and A be a set of subjects, objects, and actions, respectively.
The access matrix model represents the set of authorizations through a |S|×|O| matrix A. Each entry
A[s, o] contains the list of actions that subject s can execute over object o.

The access matrix model can be implemented through different mechanisms. The straightforward
solution exploiting a two-dimensional array is not viable, since A is usually sparse. The mechanisms
typically adopted are:
• Authorization table. The non-empty entries of A are stored in a table with three attributes: user,
action, and object.
• Access control list (ACL). The access matrix is stored by column, that is, each object is associated
with a list of subjects together with a set of actions they can perform on the object.
• Capability. The access matrix is stored by row, that is, each subject is associated with a list
indicating, for each object, the set of actions the subject can perform on it.
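As a concrete illustration, the following Python sketch derives all three mechanisms from the example matrix above. Only the Ann and Bob rows are included to keep it short; this is an illustrative sketch, not a production mechanism.

```python
# Illustrative sketch: a sparse access matrix (a subset of the example
# above) and the three storage mechanisms derived from it.

matrix = {
    ("Ann", "Document1"): {"read", "write"},
    ("Ann", "Document2"): {"read"},
    ("Ann", "Program1"): {"execute"},
    ("Bob", "Document1"): {"read"},
    ("Bob", "Document2"): {"read"},
    ("Bob", "Program1"): {"read", "execute"},
}

# Authorization table: one (user, action, object) triple per non-empty entry.
auth_table = [(s, a, o) for (s, o), acts in matrix.items() for a in sorted(acts)]

# ACLs: the matrix stored by column (object -> subject -> actions).
acl = {}
for (s, o), acts in matrix.items():
    acl.setdefault(o, {})[s] = acts

# Capabilities: the matrix stored by row (subject -> object -> actions).
cap = {}
for (s, o), acts in matrix.items():
    cap.setdefault(s, {})[o] = acts

def check(subject, action, obj):
    """Mediate a request: grant iff the matrix entry lists the action."""
    return action in matrix.get((subject, obj), set())

print(check("Bob", "execute", "Program1"))   # True
print(check("Ann", "execute", "Document1"))  # False
```

All three representations answer the same question; they differ only in which lookups are cheap (per-object revocation for ACLs, per-subject review for capabilities).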

User    Action    Object
Ann     read      Document1
Ann     write     Document1
Ann     read      Document2
Ann     execute   Program1
Bob     read      Document1
Bob     read      Document2
Bob     read      Program1
Bob     execute   Program1
Carol   read      Document2
Carol   write     Document2
Carol   execute   Program2
David   read      Program1
David   write     Program1
David   execute   Program1
David   read      Program2
David   write     Program2
David   execute   Program2

(a) Authorization table

From the access matrix model, discretionary access control systems have evolved to include
support for the following features.
• Conditions. To make authorization validity depend on the satisfaction of some specific constraints,
today's access control systems typically support conditions associated with authorizations. For
instance, conditions may impose restrictions on the basis of: object content (content-dependent
conditions), system predicates (system-dependent conditions), or accesses previously executed
(history-dependent conditions).
• Abstractions. To simplify the authorization definition process, discretionary access control also
supports user groups and classes of objects, which may in turn be hierarchically organized.
Typically, authorizations specified on an abstraction propagate to all its members according to
different propagation policies. For example, an authorization specified for a Nurse group would
also apply to its members, say Bob and Carol.
• Exceptions. The definition of abstractions naturally leads to the need to support exceptions in
authorization definition. Suppose, for example, that all users belonging to a group but u can access
resource r. If exceptions were not supported, it would be necessary to associate an authorization with
each user in the group but u, thereby forgoing the possibility of specifying a single authorization for
the group. This situation can easily be solved by supporting both positive and negative
authorizations: the system would have a positive authorization for the group and a negative
authorization for u.

The introduction of both positive and negative authorizations raises two problems: inconsistency,
when conflicting authorizations are associated with the same element in a hierarchy; and
incompleteness, when some accesses are neither authorized nor denied.

Incompleteness is usually easily solved by assuming a default policy, open or closed (the latter being
more common), to be applied where no authorization holds: an open policy allows the access, while a
closed policy denies it.

To solve the inconsistency problem, different conflict resolution policies have been proposed such
as:
– No conflict. The presence of a conflict is considered an error.
– Denials take precedence. Negative authorizations take precedence.
– Permissions take precedence. Positive authorizations take precedence.
– Nothing takes precedence. Conflicts remain unsolved.
– Most specific takes precedence. An authorization associated with an element n overrides a
contradicting authorization (i.e., an authorization with the same subject, object, and action but with a
different sign) associated with an ancestor of n, for all the descendants of n. For instance, in a
user-group hierarchy, an authorization specified for a user overrides a contradicting authorization
specified for a group to which the user belongs.
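The group/exception scenario above can be sketched in Python. The Nurse group, user u, and resource r are taken from the running example; the tuple encoding of signed authorizations is an assumption of this sketch.

```python
# Hypothetical signed authorizations: (subject, object, action, sign).
# Illustrates "denials take precedence" combined with the closed default.

authorizations = [
    ("Nurse", "r", "read", "+"),   # positive authorization for the group
    ("u",     "r", "read", "-"),   # negative authorization for one member
]

group_members = {"Nurse": {"u", "v", "w"}}

def applicable(user, obj, action):
    """Authorizations for the user itself or for any group it belongs to."""
    subjects = {user} | {g for g, ms in group_members.items() if user in ms}
    return [a for a in authorizations
            if a[0] in subjects and a[1] == obj and a[2] == action]

def decide(user, obj, action):
    signs = {a[3] for a in applicable(user, obj, action)}
    if "-" in signs:          # denials take precedence
        return False
    if "+" in signs:
        return True
    return False              # closed default policy: no authorization -> deny

print(decide("u", "r", "read"))  # False: the denial overrides the group grant
print(decide("v", "r", "read"))  # True: inherited from the Nurse group
print(decide("x", "r", "read"))  # False: closed policy, no authorization
```

Swapping the order of the two tests in decide() would turn the sketch into "permissions take precedence" instead.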

Mandatory Access Control:

Mandatory security policies enforce access control on the basis of regulations mandated by a central
authority. The most common form of mandatory policy is the multilevel security policy, based on the
classifications of subjects and objects in the system. Each subject and object in the system is
associated with an access class, usually composed of a security level and a set of categories. Security
levels in the system are characterized by a total order relation, while categories form an unordered
set. As a consequence, the set of access classes is characterized by a partial order relation, denoted ≥
and called dominance.

Given two access classes c1 and c2, c1 dominates c2, denoted c1 ≥ c2, iff the security level of c1 is
greater than or equal to the security level of c2 and the set of categories of c1 includes the set of
categories of c2. Access classes together with their partial order dominance relationship form a
lattice. Mandatory policies can be classified as secrecy-based and integrity-based, operating in a dual
manner.

[Figure: (a) an example of a secrecy lattice and (b) an example of an integrity lattice of access classes, built over security levels combined with the categories Admin and Medical; the figure is not reproduced here.]

Secrecy-Based Mandatory Policy. The main goal of secrecy-based mandatory policies is to protect data
confidentiality. As a consequence, the security level of the access class associated with an object
reflects the sensitivity of its content, while the security level of the access class associated with a
subject, called clearance, reflects the degree of trust placed in the subject not to reveal sensitive
information. The set of categories associated with both subjects and objects defines the area of
competence of users and data. A user can connect to the system using her clearance or any access class
dominated by her clearance. A process generated by a user connected with a specific access class has
the same access class as the user.

The access requests submitted by a subject are evaluated by applying the following two principles.
No-Read-Up: A subject s can read an object o if and only if the access class of the subject dominates
the access class of the object.
No-Write-Down: A subject s can write an object o if and only if the access class of the object
dominates the access class of the subject.
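These two principles are straightforward to encode. The sketch below assumes a total order of levels U < C < S < TS and represents categories as sets; level names and categories are illustrative.

```python
from dataclasses import dataclass

LEVELS = {"U": 0, "C": 1, "S": 2, "TS": 3}  # assumed total order of levels

@dataclass(frozen=True)
class AccessClass:
    level: str
    categories: frozenset

def dominates(c1, c2):
    """c1 >= c2 iff level(c1) >= level(c2) and c1's categories include c2's."""
    return LEVELS[c1.level] >= LEVELS[c2.level] and c1.categories >= c2.categories

def can_read(subject, obj):   # no-read-up
    return dominates(subject, obj)

def can_write(subject, obj):  # no-write-down
    return dominates(obj, subject)

s = AccessClass("S", frozenset({"Admin", "Medical"}))
o = AccessClass("C", frozenset({"Medical"}))
print(can_read(s, o))   # True: (S, {Admin, Medical}) dominates (C, {Medical})
print(can_write(s, o))  # False: writing down could leak secret data
```

Note that dominance is only a partial order: two classes such as (S, {Admin}) and (C, {Medical}) are incomparable, so neither subject could both read and write objects of the other class.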

Role-Based Access Control:

A third approach to access control is represented by Role-Based Access Control (RBAC) models. A
role is defined as a set of privileges that any user playing that role is associated with. When accessing
the system, each user has to specify the role she wishes to play and, if she is allowed to play that role,
she can exploit the corresponding privileges. The access control policy is then defined through two
different steps: first, the administrator defines roles and the privileges related to each of them; second,
each user is assigned the set of roles she can play. Roles can be hierarchically organized to exploit
the propagation of access control privileges along the hierarchy.

A user may be allowed to simultaneously play more than one role, and multiple users may simultaneously
play the same role, even if restrictions on their number may be imposed by the security administrator.
It is important to note that roles and groups of users are two different concepts: a group is a named
collection of users (and possibly other groups), while a role is a named collection of privileges (and
possibly other roles). Furthermore, while roles can be activated and deactivated directly by users at their
discretion, membership in a group cannot be deactivated.

The main advantage of RBAC, with respect to DAC and MAC, is that it better suits commercial
environments: in a company, what matters for access to the system is not the identity of a person, but
her responsibilities. Moreover, role-based policies organize privileges by mapping the organization's
structure onto the role hierarchy used for access control.
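The role assignment and activation steps described above can be sketched as follows; the role names and privileges are invented for illustration.

```python
# Minimal RBAC sketch: roles are named collections of privileges,
# users are assigned roles and must activate one to use its privileges.

role_privileges = {
    "Doctor": {("record", "read"), ("record", "write")},
    "Nurse":  {("record", "read")},
}
user_roles = {"alice": {"Doctor", "Nurse"}, "bob": {"Nurse"}}
active = {}   # user -> currently activated role

def activate(user, role):
    """A user may only activate a role she has been assigned."""
    if role not in user_roles.get(user, set()):
        raise PermissionError(f"{user} may not play {role}")
    active[user] = role

def check(user, obj, action):
    """Grant iff the user's active role carries the privilege."""
    role = active.get(user)
    return role is not None and (obj, action) in role_privileges[role]

activate("alice", "Doctor")
print(check("alice", "record", "write"))  # True
activate("alice", "Nurse")                # roles can be switched at will
print(check("alice", "record", "write"))  # False: Nurse cannot write
```

The sketch also shows the contrast with groups noted above: alice can deactivate Doctor simply by activating another role, whereas group membership in a DAC system is not under the user's control.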

Credential-Based Access Control:

In an open and dynamic scenario, parties may be unknown to each other, and the traditional
separation between authentication and access control cannot be applied anymore. Such parties can
also play the role of both client, when requesting access to a resource, and server, for the resources
they make available to other users in the system. Advanced access control solutions should then allow
one to decide, on the one hand, which requester (client) is to be granted access to a resource and, on the
other hand, which server is qualified to provide that resource. Trust management has been
developed as a solution for supporting access control in open environments [19]. The first
approaches proposing a trust management solution for access control were PolicyMaker and KeyNote.

The key idea of these proposals is to bind public keys to authorizations and to use credentials to
describe specific delegations of trust among keys. The great disadvantage of these early solutions is
that they assign authorizations directly to users' keys. The authorization specification is then difficult
to manage and, moreover, the public key of a user may act as a pseudonym for her, thus reducing the
advantages of trust management, where the identity of the users should not be considered. The
problem of assigning authorizations directly to keys has been solved by the introduction of digital
certificates.

A digital certificate is the on-line counterpart of paper credentials (e.g., a driver license). A digital
certificate is a statement, certified by a trusted entity (the certificate authority), declaring a set of
properties of the certificate's holder (e.g., identity, accreditation, or authorizations). Access control
models, by exploiting digital certificates for granting or denying access to resources, make access
decisions on the basis of a set of properties that the requester should have. The final user can prove to
have such properties by providing one or more digital certificates.
The development and effective use of credential-based access control models require however
tackling several problems related to credential management and disclosure strategies, delegation and
revocation of credentials, and establishment of credential chains. In particular, when developing an
access control system based on credentials, the following issues need to be carefully considered.

• Ontologies. Since there is a variety of security attributes and requirements that may need to
be considered, it is important to guarantee that different parties will be able to understand
each other, by defining a set of common languages, dictionaries, and ontologies.
• Client-side and server-side restrictions. Since parties may act as either a client or a server,
access control rules need to be defined both client-side and server-side.
• Credential-based access control rules. New access control languages supporting credentials
need to be developed. These languages should be both expressive (to define different kinds of
policies) and simple (to facilitate policy definition).
• Access control evaluation outcome. The resource requester may not be aware of the attributes
she needs to gain access to the requested resource. As a consequence, access control
mechanisms should not simply return a permit or deny answer, but should be able to ask the
final user for the needed credentials to access the resource.
• Trust negotiation strategies. Due to the large number of possible alternative credentials that
would enable an access request, a server cannot formulate a request for all of them,
since the client may not be willing to release the whole set of her credentials. On the other
hand, the server should not disclose too much of the underlying security policy, since it may
contain sensitive information. In the following, we briefly describe some proposals that have
been developed for trust negotiation and for regulating service access in open environments.

Policy Composition:

In many real-world scenarios, access control enforcement needs to take into consideration different
policies independently stated by different administrative subjects, which must be enforced as if they
were a single policy. As an example of policy composition, consider a hospital, where the global
policy may be obtained by combining the policies of its different wards and the externally
imposed constraints (e.g., privacy regulations). Policy composition is becoming of paramount
importance in all those contexts in which administrative tasks are managed by different,
non-collaborating entities.

Policy composition is an orthogonal aspect with respect to policy models, mechanisms, and
languages. As a matter of fact, the entities expressing the policies to be composed may not even be
aware of the access control system adopted by the other entities specifying access control rules. The
main desiderata for a policy composition framework can be summarized as follows.

• Heterogeneous policy support. The framework should support policies expressed in different
languages and enforced by different mechanisms.
• Support of unknown policies. The framework should support policies that are not fully
defined or are not fully known when the composition strategy is defined. Consequently,
policies are to be treated as black-boxes and are supposed to return a correct and complete
response when queried at access control time.
• Controlled interference. The framework cannot simply merge the sets of rules defined by the
different administrative entities, since this behavior may cause side effects. For instance, the
accesses granted/denied might not correctly reflect the specifications anymore.
• Expressiveness. The framework should support a number of different ways for combining the
input policies, without changing the input set of rules or introducing ad-hoc extensions to
authorizations.
• Support of different abstraction levels. The composition should highlight the different
components and their interplay at different levels of abstraction.
• Formal semantics. The language for policy composition adopted by the framework should be
declarative, implementation-independent, and based on a formal semantics to avoid ambiguity.

Access Control Models for XML:

Introduction:

The amount of information that is made available and exchanged on the Web is continuously
increasing. A large portion of this information (e.g., data exchanged during e-commerce transactions) is
sensitive and needs to be protected. However, granting security requirements through HTML-based
information processing turns out to be rather awkward, due to HTML's inherent limitations. HTML
provides no clean separation between the structure and the layout of a document, and some of its content
is only used to specify the document layout. Moreover, site designers often prepare HTML pages according
to the needs of a particular browser. Therefore, HTML markup generally has little to do with data
semantics.

With the aim of separating the data to be represented from how they are displayed, the World Wide
Web Consortium (W3C) has standardized a new markup language: the eXtensible Markup Language
(XML) [1]. XML is a markup meta-language providing semantics-aware markup without losing the
formatting and rendering capabilities of HTML. The self-describing capability of XML tags is shifting
the focus of Web communication from conventional hypertext to data interchange. Although HTML
was defined using only a small and basic part of SGML (Standard Generalized Markup Language: ISO
8879), XML is a sophisticated subset of SGML, designed to describe data using arbitrary tags. As its
name implies, extensibility is a key feature of XML; users and applications are free to declare and use
their own tags and attributes.

Therefore, XML ensures that both the logical structure and the content of semantically rich information
are retained. XML focuses on the description of information structure and content, as opposed to its
presentation. Presentation issues are addressed by a separate language, XSL (XML Stylesheet
Language) [2], which is also a W3C standard for expressing how XML-based data should be rendered.

Since XML documents can be used instead of traditional relational databases for data storage and
organization, it is necessary to devise a security system for the protection of XML documents. In this
chapter, we focus on access control enforcement. Specifically, in the literature, different access
control models have been proposed for protecting data stored in XML documents, exploiting the
flexibility offered by the markup language. Even if traditional access control models can be applied
to XML documents, by simply treating them as files, a finer-grained access control system is frequently
necessary. As a matter of fact, an XML document may contain both sensitive and publicly available
information, and it is necessary to distinguish between them when specifying the access control policy.
The remainder of the chapter is organized as follows. Section 2 discusses the basic XML concepts,
introducing DTD, XML Schema, XPath, and XQuery syntax and semantics. Section 3 introduces the
problem of access control for XML documents and points out the characteristics that an access control
model for XML documents should have. Section 4 illustrates in detail two
of the first access control models proposed for XML documents, and briefly describes other
proposals. Finally, Sect. 5 concludes the chapter.

Preliminary Concepts:

XML (eXtensible Markup Language) is a markup language developed by the World Wide Web
Consortium (W3C) and used for describing semi-structured information. We introduce some of the
most important concepts related to XML, which are useful to define an access control system for
protecting XML documents.

Well-Formed and Valid XML Documents: An XML document is composed of a sequence of (possibly
nested) elements and attributes associated with them. Basically, elements are delimited by a pair of
start and end tags (e.g., <request> and </request>) or, if they have no content, are composed of an
empty tag (e.g., <request/>). Attributes represent properties of elements and are included in the start
tag of the element with which they are associated (e.g., <request number="10">). An XML
document is said to be well-formed if its syntax complies with the rules defined by the W3C
consortium [1], which can be summarized as follows:

• the document must start with the prologue <?xml version="1.0"?>;


• the document must have a root element, containing all other elements in the document;
• every start tag must have a corresponding end tag, unless the element is represented by an empty tag;
• elements must be properly nested;
• tags are case-sensitive;
• attribute values must be quoted.
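These rules can be checked mechanically. The sketch below uses Python's standard library parser, which accepts a document exactly when it is well-formed; the sample documents are invented for illustration.

```python
# Checking well-formedness with the standard library parser: a document
# parses successfully iff it satisfies the rules listed above.
import xml.etree.ElementTree as ET

good = '<?xml version="1.0"?><request number="10"><date>2024-01-01</date></request>'
bad = '<?xml version="1.0"?><request><date></request>'  # improperly nested tags

def well_formed(doc):
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

print(well_formed(good))  # True
print(well_formed(bad))   # False
```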

An XML language is a set of XML documents that are characterized by a syntax, which describes the
markup tags that the language uses and how they can be combined, together with its semantics. A schema is
a formal definition of the syntax of an XML language, and is usually expressed through a schema language.
The most common schema languages, and on which we focus our attention, are DTD and XML Schema, both
originating from W3C.
Document Type Definition.
A DTD document may be either internal or external to an XML document and it is not itself written in the
XML notation.
A DTD schema consists of definitions of elements, attributes, and other constructs. An element declaration
is of the form <!ELEMENT element_name content>, where element_name is an element name and content
is the description of the content of an element and can assume one of the following alternatives:
• the element contains parsable character data (#PCDATA);
• the element has no content (EMPTY);
• the element may have any content (ANY);
• the element contains a group of one or more subelements, which in turn may be composed of other
subelements;
• the element contains parsable character data, interleaved with subelements (mixed content).
When an element contains other elements (i.e., subelements or mixed content), it is necessary to declare
the subelements composing it and their organization. Specifically, sequences of elements are separated by a
comma "," and alternative elements are separated by a vertical bar "|". Declarations of sequences and
choices of subelements need to describe the subelements' cardinality. With a notation inspired by extended BNF
grammars, "*" indicates zero or more occurrences, "+" indicates one or more occurrences, "?" indicates zero
or one occurrence, and no label indicates exactly one occurrence.
An attribute declaration is of the form <!ATTLIST element_name attribute_def>, where element_name is
the name of an element, and attribute_def is a list of attribute definitions that, for each attribute, specify the
attribute name, type, and possibly a default value. Attributes can be marked as #REQUIRED, meaning that they
must have an explicit value for each occurrence of the elements with which they are associated; #IMPLIED,
meaning that they are optional; or #FIXED, meaning that they have a fixed value, indicated in the declaration
itself.
An XML document is said to be valid with respect to a DTD if it is syntactically correct according to the DTD.
Note that, since elements and attributes defined in a DTD may appear in an XML document zero (optional
elements), one, or multiple times, depending on their cardinality constraints, the structure specified by the
DTD is not rigid; two distinct XML documents of the same schema may differ in the number and structure of
elements.

XML Schema:

An XML Schema is an XML document that, compared with a DTD, has a number of advantages. First, since an XML
Schema is itself an XML document, it can be easily extended for future needs. Furthermore,
XML Schemas are richer and more powerful than DTDs, since they provide support for data types and
namespaces, two of the most significant shortcomings of DTDs.

An element declaration specifies an element name together with a simple or complex type. A simple type is
a set of Unicode strings (e.g., decimal, string, float, and so on) and a complex type is a collection of
requirements for attributes, subelements, and character data that apply to the elements assigned to that type.
Such requirements specify, for example, the order in which subelements must appear, and the cardinality of
each subelement (in terms of maxOccurs and minOccurs, with 1 as default value). Attribute declarations
specify the attributes associated with each element and indicate attribute name and type. Attribute declarations
may also specify either a default value or a fixed value. Attributes can be marked as: required, meaning that
they must have an explicit value for each occurrence of the elements with which they are associated; optional,
meaning that they are not necessary.

Example 1. Suppose that we need to define an XML-based language for describing bank account operations.
Figure 1(a) illustrates a DTD stating that each account operation contains a request element and one or more
operation elements. Each account operation is also characterized by two mandatory attributes: bankAccN,
indicating the number of the bank account of the requester, and id, identifying the single update. Each request
element is composed of date, means, and notes elements, where only date is required. Element operation is
instead composed of type, amount, recipient, and possibly one of notes and value.
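Figure 1(a) is not reproduced here; the following DTD is a sketch consistent with the description above. Element and attribute spellings are assumed (in particular, an underscore replaces the space in "account operation", since XML names cannot contain spaces).

```dtd
<!ELEMENT account_operation (request, operation+)>
<!ATTLIST account_operation
          bankAccN CDATA #REQUIRED
          id       CDATA #REQUIRED>
<!ELEMENT request (date, means?, notes?)>
<!ELEMENT operation (type, amount, recipient, (notes | value)?)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT means (#PCDATA)>
<!ELEMENT notes (#PCDATA)>
<!ELEMENT type (#PCDATA)>
<!ELEMENT amount (#PCDATA)>
<!ELEMENT recipient (#PCDATA)>
<!ELEMENT value (#PCDATA)>
```

Note how the cardinality notation introduced above appears: operation+ requires one or more operations, means? and notes? make those subelements optional, and (notes | value)? expresses an optional choice.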

DTDs and XML documents can be graphically represented as trees. A DTD is represented as a labeled tree
containing a node for each element, attribute, and value associated with fixed attributes. To distinguish
elements and attributes in the graphical representation, elements are represented as ovals, while attributes as
rectangles. There is an arc in the tree connecting each element with all the elements/attributes belonging to it,
and between each #FIXED attribute and its value. Arcs connecting an element with its subelements are labeled
with the cardinality of the relationship. Arcs labeled or and with multiple branching are used to represent a
choice in an element declaration (|). An arc with multiple branching is also used to represent a sequence with
a cardinality constraint associated with the whole sequence (?, +, *). To preserve the order between elements
in a sequence, for any two elements ei and ej, if ej follows ei in the element declaration, node ej appears below
node ei in the tree. Each XML document is represented by a tree with a node for each element, attribute, and
value in the document. There is an arc between each element and each of its subelements/attributes/values and
between each attribute and each of its value(s). Each arc in the DTD tree may correspond to zero, one, or
multiple arcs in the XML document, depending on the cardinality of the corresponding containment
relationship. Note that arcs in XML documents are not labeled, as there is no further information that needs
representation.

Elements and Attributes Identification:

The majority of the access control models for XML documents identify the objects under protection (i.e.,
elements and attributes) through the XPath language [3]. XPath is an expression language, where the basic
building block is the path expression.
A path expression on a document tree is a sequence of element names or predefined functions separated by
the character / (slash): l1/l2/.../ln. Path expressions may terminate with an attribute name as the last term of
the sequence. Attribute names are syntactically distinguished by preceding them with the special character @.

XPath allows the association of conditions with nodes in a path; in this case, the path expression identifies
the set of nodes that satisfy all the conditions. Conditional expressions in XPath may operate on the "text" of
elements (i.e., character data in elements) or on names and values of attributes. A condition is represented by
enclosing it within square brackets, following a label li in a path expression l1/l2/.../ln. The condition is
composed of one or more predicates, which may be combined via the and and or boolean operators. Each predicate
compares the result of the evaluation of a relative path expression (evaluated at li) with a constant or with
another expression. Multiple conditional expressions appearing in the same path expression are considered to
be anded (i.e., all the conditions must be satisfied). In addition, conditional expressions may include the
functions last() and position(), which permit the extraction of the children of a node that are in given positions.
Function last() evaluates to true on the last child of the current node. Function position() evaluates to true on
the node in the evaluation context whose position is equal to the context position.

Path expressions are also the building blocks of other languages, such as XQuery [4], which allows one to
query XML documents through FLWOR expressions. A FLWOR expression is composed of the
following clauses:

• FOR declares variables that are iteratively associated with elements in the XML documents, which are
identified via path expressions;
• LET declares variables associated with the result of a path expression;
• WHERE imposes conditions on tuples;
• ORDER BY orders the result obtained by the FOR and LET clauses;
• RETURN generates the final result returned to the requester.
Example 2. Consider the DTD and the XML document in Example 1. Some examples of path expressions are the following.
• /account operation/operation: returns the content of the operation element, child of account operation;
• /account operation/@bankAccN: returns attribute bankAccN of element account operation;
• /account operation//notes: returns the content of the notes elements anywhere in the subtree rooted at account operation; in this case, it returns both /account operation/request/notes and /account operation/operation/notes;
• /account operation/operation[./type="bank transfer"]: returns the content of the operation element, child of account operation, only if the type element, child of operation, has value equal to "bank transfer".

The following XQuery extracts from the XML document in Fig. 1(b) all the account operation elements with operation type equal to "bank transfer". For the selected elements, the amount and the recipient of the operation are returned, along with all notes appearing in the selected account operation element.

<BankTransf>
{ FOR $r in document("update account")/account operation
  WHERE $r/operation/type="bank transfer"
  RETURN $r/operation/amount, $r/operation/recipient, $r//notes
}
</BankTransf>
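The FOR/WHERE/RETURN structure of the FLWOR expression above maps naturally onto an iteration with a filter. The sketch below reproduces the query's logic in Python over a hypothetical stand-in for the "update account" document (underscored element names and values are illustrative assumptions):

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the "update account" document.
doc = """<bank>
  <account_operation>
    <operation><type>bank transfer</type><amount>500</amount>
      <recipient>Alice</recipient></operation>
    <notes>rent</notes>
  </account_operation>
  <account_operation>
    <operation><type>deposit</type><amount>900</amount></operation>
  </account_operation>
</bank>"""
root = ET.fromstring(doc)

result = []
for r in root.findall("account_operation"):              # FOR clause
    if r.findtext("operation/type") == "bank transfer":  # WHERE clause
        result.append({                                  # RETURN clause
            "amount": r.findtext("operation/amount"),
            "recipient": r.findtext("operation/recipient"),
            "notes": [n.text for n in r.findall(".//notes")],
        })
print(result)
```

Only the first account_operation element satisfies the WHERE condition, so a single record with its amount, recipient, and notes is returned.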

XML Access Control Requirements:

Due to the peculiar characteristics of XML documents, they cannot be protected by simply adopting traditional access control models, and specific models need to be defined. By analyzing the existing proposals, it is easy to see that they are all based on the definition of a set of authorizations that, at a minimum, specify the subjects to which they apply, the objects to be protected, and the actions to be executed. The existing XML-based access control models differ on the basis of the subjects, objects, and actions they can support for access control specification and enforcement.

Subject: Subjects are usually referred to on the basis of their identities or of the network location from which requests originate. Locations can be expressed with reference to either the numeric IP address (e.g., [Link]) or the symbolic name (e.g., [Link]) from which the request comes. It often happens that the same privilege should be granted to a set of subjects that share common characteristics, such as the department where they work or the role played in the company where they work. With the aim of simplifying authorization definition, some access control models allow the specification of authorizations having as subject:

• a group of users, which is a statically defined set of users; groups can be nested and overlapping;
• a location pattern, which is an expression identifying a set of physical locations, obtained by using the wildcard character * in physical or symbolic addresses;
• a role, which is a set of privileges that can be exploited by any user while playing the specific role; users
can dynamically decide which role to play, among the ones they are authorized to play. Also, subjects are
often organized in hierarchies, where an authorization defined for a general subject propagates to its
descendants.

An example of user-group hierarchy (Figure 3):

Public
 ├─ BankEmployee
 │   ├─ StatisticalAnalyst
 │   └─ CashOperator
 └─ Client

with users Alice, Bob, Carol, David, Eric, Fiona, Gregory, Hilary, and Ivan as members of the leaf groups.

A hierarchy can be pictured as a directed acyclic graph containing a node for each element in the hierarchy
and an arc from element x to element y, if x directly dominates y. Dominance relationships holding in the
hierarchy correspond to paths in the graph. Figure 3 shows an example of user-group hierarchy.
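The DAG reading of the hierarchy can be sketched directly: dominance is reachability, so an authorization granted to a group propagates to everything the group transitively dominates. The group memberships below are illustrative assumptions, since the flattened figure does not show which users belong to which groups.

```python
# User-group hierarchy as a DAG: each node maps to the nodes it
# directly dominates. Memberships below are illustrative.
hierarchy = {
    "Public": ["BankEmployee", "Client"],
    "BankEmployee": ["StatisticalAnalyst", "CashOperator"],
    "Client": ["Hilary", "Ivan"],
    "StatisticalAnalyst": ["Alice", "Bob"],
    "CashOperator": ["Carol", "David"],
}

def dominates(x, y):
    """True if x dominates y, i.e., a path x -> ... -> y exists."""
    if x == y:
        return True
    return any(dominates(child, y) for child in hierarchy.get(x, []))

# An authorization for BankEmployee propagates to its descendants:
print(dominates("BankEmployee", "Alice"))
print(dominates("Client", "Alice"))
```

Because the graph is acyclic, the recursive reachability check always terminates; in a real system one would memoize or precompute the dominance relation.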

Object Granularity: The identification of the object involved in a specific authorization can exploit the
possibility given by XML of identifying elements and attributes within a document through path expressions
as defined by the XPath language.

Action: Most of the proposed XML access control models support only read operations, since there is not a
standard language for XML update. Furthermore, the management of write privileges is a difficult task,
which needs to take into account both the access control policy and the DTD (or XML Schema) defined for
the document. In fact, the DTD may be partially hidden to the user accessing the document, as some
elements/attributes may be denied by the access control policy. For instance, when adding an element to the document, the user may not even be aware of the existence of a required attribute associated with it, as she is not entitled to access the attribute itself.

Support for Fine and Coarse Authorizations. The different protection requirements that different documents may have call for the support of access restrictions at the level of each specific document. However, requiring the specification of authorizations for each single document would make the authorization specification task too heavy. The system may then support, besides authorizations on single documents (or portions of documents), authorizations on collections of documents [9]. The concept of DTD can be naturally exploited to this end, by allowing protection requirements to refer to DTDs or XML documents, where requirements specified at the level of a DTD apply to all documents that are instances of the considered DTD. Authorizations specified at the DTD level are called schema level authorizations, while those specified at the XML document level are called instance level authorizations.

Propagation Policy:
The structure of an XML document can be exploited by possibly applying different propagation strategies
that allow the derivation of authorizations from a given set of authorizations explicitly defined over
elements of DTD and/or XML documents. Some proposals therefore distinguish between two kinds of
authorizations: local, and recursive [9]. Local authorizations defined on an element apply to all its
attributes only. A recursive authorization defined on an element applies to its whole content (both attributes
and sub elements). Recursive authorizations represent an easy way for specifying authorizations holding
for the whole structured content of an element (for the whole document if the element is the root node of
the document).

The proposed models assume that negative authorizations are always recursive, while positive authorizations may be either local or recursive. Besides downward propagation, upward propagation methods have been introduced, in which the authorizations associated with a node in the XML tree propagate to its ancestors.

Some of the most common propagation policies (which also include resolution policies for possible conflicts) are described in the following.
• No propagation. Authorizations are not propagated. This is the case of local authorizations.
• No overriding. Authorizations of a node are propagated to its descendants, but they are all kept.
• Most specific overrides. Authorizations of a node are propagated to its descendants, if not overridden. An authorization associated with a node n overrides a contradicting authorization associated with any ancestor of n for all the descendants of n.
• Path overrides. Authorizations of a node are propagated to its descendants, if not overridden. An authorization associated with a node n overrides a contradicting authorization associated with an ancestor n′ for all the descendants of n, but only for the paths passing through n. The overriding has no effect on other paths.
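The "most specific overrides" policy can be sketched as a walk up the tree: the nearest explicitly labelled ancestor wins, and under a closed policy the absence of any authorization means denial. The tree, node names, and signs below are illustrative assumptions, not taken from the text.

```python
# Illustrative element tree: parent -> list of children.
tree = {
    "account_operation": ["request", "operation"],
    "request": ["notes"],
    "operation": ["type", "amount"],
}
# Explicit recursive authorizations: node -> sign ('+' allow, '-' deny).
explicit = {"account_operation": "+", "operation": "-"}

# Invert the tree once so each node knows its parent.
parents = {c: p for p, cs in tree.items() for c in cs}

def effective(node):
    """Most specific overrides: nearest explicitly labelled ancestor wins."""
    while node is not None:
        if node in explicit:
            return explicit[node]
        node = parents.get(node)
    return None  # closed policy: no applicable authorization means deny

print(effective("amount"))  # operation's '-' overrides the root's '+'
print(effective("notes"))   # inherits '+' from account_operation
```

Under "no overriding", by contrast, both the '+' from the root and the '-' from operation would be kept for amount, and a separate conflict-resolution policy would decide between them.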

XML Access Control Models: Several access control models have been proposed in the literature for regulating access to XML documents. We start our overview of these models by presenting the first access control model for XML [9], which has since inspired several subsequent proposals.

Note: two authorizations (s, o, a) and (s′, o′, a′) are contradictory if s = s′, o = o′, and a = a′, but one of them grants access while the other denies it.

Fine Grained XML Access Control System:


Damiani et al. [9] propose a fine-grained XML access control system, which extends the proposals in [14, 15, 16], exploiting XML's own capabilities to define and implement an authorization model for regulating access to XML documents. We now present the authorizations supported by the access control model and illustrate the authorization enforcement process.

Authorizations Specification

Access authorizations determine the accesses that the system should allow or deny. In this model, access authorizations are defined as follows.

Definition 1 (Access Authorization). An access authorization a ∈ Auth is a five-tuple of the form ⟨subject, object, action, sign, type⟩, where:
• subject ∈ AS is the subject for which the authorization is intended;
• object is either a URI ∈ Obj or is of the form URI:PE, where URI ∈ Obj and PE is a path expression on the tree of document URI;
• action = read is the action being authorized or forbidden;
• sign ∈ {+, −} is the sign of the authorization, which can be positive (allow access) or negative (forbid access);
• type ∈ {LDH, RDH, L, R, LD, RD, LS, RS} is the type of the authorization, which regulates whether the authorization propagates to other objects and how it interplays with other authorizations (exception policy).
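Definition 1 transcribes directly into a record type. The sketch below is a plain restatement of the five-tuple; the example instance (subject, URI, path expression) is hypothetical.

```python
from dataclasses import dataclass

# Direct transcription of Definition 1; the value sets in the comments
# are those listed in the definition.
@dataclass(frozen=True)
class Authorization:
    subject: str  # an element of AS
    object: str   # a URI, or a string of the form "URI:PE"
    action: str   # only "read" is supported by the model
    sign: str     # "+" (allow) or "-" (forbid)
    type: str     # one of LDH, RDH, L, R, LD, RD, LS, RS

# Hypothetical instance: a recursive (R) positive read authorization.
a = Authorization("Public", "account.xml:/account_operation", "read", "+", "R")
print(a.sign, a.type)
```

A real enforcement engine would validate sign and type against the allowed sets at construction time; the frozen dataclass merely fixes the tuple shape.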

Database Issues in Trust Management and Trust Negotiation: Trust management is the process of managing authorization decisions in a decentralized environment where many of the participants do not have pre-established trust relationships, such as logins and passwords, with one another. Trust management is important for enterprise-level and cross-organizational database applications such as supply chain management, enterprise resource planning, and customer relationship management. Trust management research may also interest the database research community because of the former's affinity for a Datalog-based world, in which a query (authorization request) launches a multi-site search for a proof of authorization. To complicate the process, sites have autonomy and may not always cooperate in proof construction; it is not always obvious where to find the facts and rules needed to construct a proof; and attempts to access particular facts and rules may spawn new authorization requests.

Introduction to Trust Management: Authorization is one of the most important problems in computer security and privacy. It lies at the heart of meeting the objectives of confidentiality, integrity, and availability. Within a single organization, pre-established trust relationships are used to assign authorizations, and prearranged information such as login names and passwords can serve as the basis for making authorization decisions at run time. For instance, an enterprise has pre-established trust relationships with its employees, so it is necessary only to authenticate that a certain resource request is being made by a certain employee for the request to be given appropriate authorization. On the other hand, when resource provider and resource requester belong to different organizations or have no prior relationship whatsoever, there are no pre-existing trust relationships. This problem can be mitigated slightly by using manual procedures for cross-domain authentication and authorization, such as maintaining local logins and passwords (or lists of X.509 identities) for all employees in a partner company.

Over the last 10–15 years, researchers have proposed new techniques that enable on-line parties to establish
trust on the fly, as the need arises. Bina et al. proposed using characteristics other than identity, attested to by
known authorities in digital certificates, as a basis for authorization on the Internet. Blaze et al. introduced a
complementary approach to authorization based on delegation of privileges and coined the term trust
management to describe it. Ellison et al. introduced a similar scheme called SPKI. Rivest et al. introduced a
scheme called SDSI that provides an ingenious way to introduce names and bind them to public keys controlled
by individuals and groups, which greatly facilitates identifying authorized principals electronically. Following
these seminal works, a great deal of work has been done, much of which we will survey in this chapter. Trust
management systems typically use cryptographic credentials to convey information relevant to authorization
decisions. The authorization decision determines whether a given set of credentials demonstrates that a given
request to access a resource, such as a web or peer-to-peer service, is authorized, which is to say that the access
request complies with current policy governing that resource. This raises two additional problems that we also
survey here. First, the credentials are issued in a decentralized manner, and somehow the relevant credentials
need to be collected or otherwise made available to the authorization evaluation process. Second, some
credentials carry sensitive, confidential information, and may need to be subject to access control themselves
when dealing with an unfamiliar resource provider or requester. The same may also be true of policy: an access
control policy may give clues about the nature of the resource it protects. For example, if a patient's
prescription can be viewed only by their pharmacist or by their parent, then one can guess that the prescription
is for a child. To preserve the privacy of the resources that they protect, policies themselves may need
protection just like any other resource. In other words, access to the contents of a policy may need to be
governed by another access control policy. These additional authorization decisions can also be based on
credentials. Thus, there is a need for a process of credential exchange in which both parties seek to enable a
positive authorization decision for the main resource request, while also supporting the additional
authorization decisions that may be necessary to
achieve this. This process is trust negotiation [64, 65], an automated approach to establishing bilateral trust
between two parties at run time.

What is Trust Management?:

Traditional access control models base authorization decisions on the identity of the principal who requests a
resource. In an operating system, this identity may be a user name that must appear on an access control list
(ACL) associated with a file, if access to the file is authorized. In a large enterprise, an identity may be a
distinguished name mentioned in a Public Key Infrastructure (PKI) certificate. In these and similarly closed
environments, the identities of all authorized resource requesters are presumed to be known to the resource
provider, or at least to some local administrator who aggregates identities into groups or roles to which the
resource provider can grant access. However, the number of autonomous services that are administered in a
decentralized manner (i.e., within different security domains) has increased enormously on the Internet. As a
result, services are often provided to clients whose identities are not previously known to the service provider.
Similarly, the participants in a peer-to-peer system need to establish mutual trust in one another. In such a
decentralized environment, the traditional access control mechanisms such as ACLs cannot be used to secure
the system without excluding vast numbers of valuable and well-intentioned clients and peers.

The trust management (TM) approach, first developed by Blaze et al. [11], aims to provide a basis for
authorization in highly decentralized environments by enabling resource owners to delegate authority to
other entities who will help them identify the appropriate requesters to authorize. In this manner, resource
owners and other policy authors can enlist the assistance of appropriate authorities in determining the
suitability of individual requesters for authorization.

Trust management relies on digital credentials, which are unforgeable statements signed by their issuer. Typically, a digital credential contains an assertion about the properties of one or more principals mentioned in the credential. The best-known standard for digital credentials is X.509v3 [31], though many alternatives exist. Most of these schemes rely on public key cryptography: the credential issuer signs the credential using its private key, and anyone can verify the contents of the credential by obtaining the corresponding public key and checking the signature. In the US, recent legislation such as the Sarbanes-Oxley Act has forced the widespread adoption of the public key infrastructures needed to support digital credentials. Today's digital credentials are typically identity certificates, i.e., they simply say what public key is associated with a particular principal. However, current credential standards already support the inclusion of additional information describing a principal's properties, such as one would need for a digital employee ID, driver's license, or birth certificate. In TM systems, security policy is made by local administrators to specify access control rules on local resources. Blaze et al. [10] observed that trust management systems combine the notion of specifying security policy with the mechanism for specifying security credentials.

The authorization semantics of most TM systems is monotonic in the sense that if any given action is approved
when given a set of evidence E (i.e., policies and credentials), then it will also be approved when given any
superset of E. This means that no negative evidence is allowed in the system. Monotonicity ensures fail-safe
behavior in which no potentially dangerous action is allowed by default, simply because of an inability to find
negative credentials. This is especially important in decentralized environments due to the many factors that
can prevent one from obtaining complete information about all the credentials in the system (network
interruption, uncooperative credential repositories, lost information, etc.).

Besides the basic notion of delegation in which one entity gives some of its access rights to another entity,
there are two additional delegation idioms that are most often discussed in the designs of trust management
systems: appointment and threshold delegation. In the case of appointment, the appointer has the
(appointment) right to confer on another (the appointee) an attribute or right that the appointer may not
herself have. (In general, the conferred right can itself be an appointment right.) Threshold delegation is also
called k-out-of-n (n ≥ 1 and n ≥ k ≥ 1) delegation, meaning the authority is delegated to n parties, each of
which only gets a fragment of it. It is effective only if at least k of these parties issue their requests or
exercise their authorities in concert.
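The k-out-of-n idiom reduces to a set-intersection check: the delegated authority becomes effective only when at least k of the n delegatees act together. The names and threshold below are illustrative.

```python
# k-out-of-n threshold delegation (names and threshold are illustrative):
# authority delegated to n = 3 parties, effective only if k = 2 concur.
delegatees = {"Alice", "Bob", "Carol"}
k = 2

def threshold_met(signers):
    """True if at least k of the delegatees are among the signers."""
    return len(set(signers) & delegatees) >= k

print(threshold_met({"Alice", "Bob"}))      # two delegatees concur
print(threshold_met({"Alice", "Mallory"}))  # only one delegatee present
```

Note the intersection with delegatees: signatures from parties outside the delegation set (such as Mallory above) contribute nothing toward the threshold.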

Compliance checking (also called policy evaluation or query evaluation) answers the question: Does a set
of credentials prove that a request complies with the security policy associated with a given resource? The
process of evaluating a query involves finding a chain of credentials that delegate authority from the
resource owner to the requester. This process is also called credential chain discovery [47]. As we shall see,
it can be helpful to imagine credential chains in graphical terms. To a first approximation, a credential chain
can be thought of as a path from the resource provider to the requester in which nodes are principals and
edges are credentials. However, the details of such a credential graph depend on the TM system and, in
general, a chain may correspond to a richer subgraph structure.
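In the simple path-shaped case, credential chain discovery is a graph reachability search from the resource owner to the requester. The sketch below uses breadth-first search over a toy set of delegation credentials; all principal names are hypothetical.

```python
from collections import deque

# Credentials as delegation edges (issuer, subject): the issuer delegates
# authority to the subject. All names are illustrative.
credentials = {
    ("HospitalA", "MedBoard"),   # HospitalA trusts MedBoard to certify
    ("MedBoard", "DrSmith"),     # MedBoard certifies DrSmith
    ("HospitalA", "Pharmacy"),
}

def chain_exists(owner, requester):
    """BFS for a credential chain owner -> ... -> requester."""
    graph = {}
    for issuer, subject in credentials:
        graph.setdefault(issuer, []).append(subject)
    seen, queue = {owner}, deque([owner])
    while queue:
        node = queue.popleft()
        if node == requester:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

print(chain_exists("HospitalA", "DrSmith"))
print(chain_exists("Pharmacy", "DrSmith"))
```

This toy model also makes the monotonicity property visible: adding a credential can only add edges, so it can never destroy an existing chain, and a request once approved remains approved under any superset of the evidence.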

Example trust negotiation:

As mentioned earlier, trust negotiation is the process of establishing bilateral trust at run time. Trust negotiation uses verifiable, unforgeable digital credentials that describe principals' properties, together with
explicit policies that describe the properties that a principal must possess in order to gain access to a
particular resource. When Alice wishes to access a resource owned by Bob, she and Bob follow one of many
proposed trust negotiation protocols to determine whether she has the right properties, i.e., whether her
credentials satisfy the policy that Bob has specified for access to that resource.
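A minimal sketch of such an exchange, assuming a toy policy format of my own devising (every policy, credential, and principal name below is hypothetical): each party discloses a credential only once the other party's disclosures satisfy the policy protecting it, and the negotiation succeeds when the target resource itself becomes disclosable.

```python
# policies: (holder, credential-or-resource) -> credentials the peer
# must have disclosed first. An empty set means freely disclosable.
policies = {
    ("Bob", "resource"): {"student_id"},
    ("Alice", "student_id"): {"accredited_university"},
    ("Bob", "accredited_university"): set(),
}
owned = {"Alice": {"student_id"}, "Bob": {"resource", "accredited_university"}}

def negotiate(requester, owner, target):
    """Alternate disclosures until the target is released or no progress."""
    shown = {requester: set(), owner: set()}
    progress = True
    while progress and target not in shown[owner]:
        progress = False
        for party, other in ((owner, requester), (requester, owner)):
            for cred in owned[party] - shown[party]:
                need = policies.get((party, cred), set())
                if need <= shown[other]:   # peer's disclosures satisfy policy
                    shown[party].add(cred)
                    progress = True
    return target in shown[owner]

print(negotiate("Alice", "Bob", "resource"))
```

Here Bob first reveals his freely disclosable accreditation, which unlocks Alice's student ID, which in turn satisfies the policy on Bob's resource: exactly the stepwise, bilateral trust build-up that negotiation protocols formalize.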
Security in Data Warehouses and OLAP Systems:
Unlike in operational databases, aggregation and derivation play a major role in on-line analytical processing
(OLAP) systems and data warehouses. Unfortunately, the process of aggregation and derivation can also
pose challenging security problems. Aggregated and derived data usually look innocent to traditional
security mechanisms, such as access control, and yet such data may carry enough sensitive information to
cause security breaches.

Introduction:
With rapid advancements in computer and network technology, it has become common practice for organizations to collect, store, and analyze vast amounts of data quickly and efficiently. On-line analytical
processing (OLAP) systems and data warehouses of today are used to store and analyze everything – vital or
not – to an organization. The security of data warehouses and OLAP systems is crucial to the interest of both
organizations and individuals. Stolen organizational secrets may cause serious and immediate damages to an
organization. Indiscriminate collection and retention of data represents an extraordinary intrusion on privacy
of individuals. Security breaches in governmental data warehouses may lead to losses of information that
translate into financial losses or losses whose values are obviously high but difficult to quantify (for example,
national security).

Unlike in traditional databases, information stored in data warehouses is typically accessed through decision
support systems, such as OLAP systems. OLAP systems help analysts gain insight into different perspectives
of large amounts of data stored in a data warehouse. Due to the sheer volume of data, OLAP systems heavily
depend on aggregates of data in order to hide insignificant details and hence to accentuate global patterns and
trends. As the underlying data model, a data cube [15] can nicely organize multi- dimensional aggregates
formulated by dimension hierarchies. Although security breaches in a data warehouse are possible in many
ways, the most challenging threat is from insiders who have legitimate accesses to data through OLAP queries.
Unfortunately, most of today's OLAP systems lack effective security measures to safeguard the data accessed
through them. Existing security mechanisms can at best alleviate security breaches but cannot completely
remove the threat. Data sanitization has long been recognized as insufficient for protecting sensitive data by
itself due to potential linking attacks [24].

Access control techniques, although mature in traditional data management systems, are usually not directly
applicable to OLAP systems and data warehouses due to the difference in data models. Moreover, OLAP
systems and underlying data warehouses are especially vulnerable to indirect inferences of protected data. The
aggregation process used by OLAP systems does not completely destroy sensitive information. The remaining
vestiges of sensitive information, together with knowledge obtained through out-of-band channels, can cause
disclosures of such information. Although studied since the 1970s in statistical databases, inference control for on-line systems is largely regarded as impractical due to its negative-in-tone complexity results [7]. Most restriction-based inference control methods adopt a detect-then-remove approach. The detection of inferences must take into account all combinations of answered queries, which implies complicated on-line computations and constant bookkeeping of queries. Even at such a high cost, each method usually applies to only a few unrealistically simplified cases, such as with only one aggregation type. These facts partially explain why inference control is absent in most commercial OLAP systems. On the other hand, off-line inference control methods have long been used in releasing census tables, which demonstrates that the threat of inferences is real.

Data Warehouses and OLAP Systems:


A centralized data warehouse is usually used to store enterprise data. The data are organized based on a star
schema, which usually has a fact table with part of the attributes called dimensions and the rest called measures.
Each dimension is associated with a dimension table indicating a dimension hierarchy. The dimension tables
may contain redundancy, which can be removed by splitting each dimension table into multiple tables, one
per attribute in the dimension table. The result is called a snowflake schema. A data warehouse usually stores
data collected from multiple data sources, such as transactional databases throughout an organization. The
data are cleaned and transformed to a common consistent format before they are stored in the data warehouse.
Subsets of the data in a data warehouse can be extracted as data marts to meet the specific requirements of an
organizational division. Unlike in transactional databases where data are constantly updated, typically the data
stored in a data warehouse are refreshed from data sources only periodically.

Coined by Codd et al. in 1993 [9], OLAP stands for On-Line Analytical Processing. The concept has its roots
in earlier products such as the IRI Express, the Comshare system, and the Essbase system [29]. Unlike
statistical databases which usually store census data and economic data, OLAP is mainly used for analyzing
business data collected from daily transactions, such as sales data and health care data [27]. The main purpose
of an OLAP system is to enable analysts to construct a mental image about the underlying data by exploring
it from different perspectives, at different levels of generalization, and in an interactive manner. Popular
architectures of OLAP systems include ROLAP (relational OLAP) and MOLAP (multidimensional OLAP).
ROLAP provides a front-end tool that translates multidimensional queries into corresponding SQL queries to
be processed by the relational backend. MOLAP does not rely on the relational model but instead materializes
the multidimensional views. Using MOLAP for dense parts of the data and ROLAP for the others leads to a
hybrid architecture, namely, the HOLAP or hybrid OLAP.

The requirements on OLAP systems have been defined differently, such as the FASMI (Fast Analysis of
Shared Multidimensional Information) test [23] and the Codd rules [9]. Some of the requirements are
especially relevant to this chapter. First, to make OLAP analysis an interactive process, the OLAP system must
be highly efficient in answering queries. OLAP systems usually rely on extensive pre-computations, indexing,
and specialized storage to improve the performance. Second, to allow analysts to explore the data from
different perspectives and at different levels of generalization, OLAP organizes and generalizes data along
multiple dimensions and dimension hierarchies. The data cube model we shall address shortly is one of the
most popular abstract models for this purpose.

Data cube was proposed as a SQL operator to support common OLAP tasks like histograms and sub-totals
[15]. Even though such tasks are usually possible with standard SQL queries, the queries may become very
complex. The number of needed unions is exponential in the number of dimensions of the base table. Such a
complex query may result in many scans of the base table, leading to poor performance. Because sub-totals
are very common in OLAP queries, it is desired to define a new operator for the collection of such sub- totals,
namely, data cube.

Figure 1 depicts a fictitious data cube. It has two dimensions, time and organization, with three and four attributes, respectively. We regard all as a special attribute having one attribute value ALL, which depends on all other attribute values. The attributes of each dimension are partially ordered by the dependency relation ⪯ into a dependency lattice [18], that is, quarter ⪯ year ⪯ all and employee ⪯ department ⪯ branch ⪯ all. The product of the two lattices gives the dependency lattice of cuboids. Each element of the dependency lattice is a tuple ⟨T,O⟩, where T is an attribute of the time dimension and O is an attribute of the organization dimension. Attached to each such tuple ⟨T,O⟩ is an empty two-dimensional array, namely, a cuboid. Each cell of the cuboid ⟨T,O⟩ is also a tuple ⟨t,o⟩, where t and o are attribute values of the attributes T and O, respectively. The dependency relation extends to cells. For example, a cell ⟨Y1,Bob⟩ depends on the cells ⟨Q1,Bob⟩, ⟨Q2,Bob⟩, ⟨Q3,Bob⟩, and ⟨Q4,Bob⟩. Hence, all cells also form a dependency lattice.

A base table with the schema (quarter, employee, commission) is used to populate the data cube with values of a measure attribute commission. Each record in the base table, a triple (q, e, m), is used to populate a cell ⟨q,e⟩ of the core cuboid ⟨quarter, employee⟩, where q, e, and m are values of the attributes quarter, employee, and commission, respectively. Some cells of ⟨quarter, employee⟩ remain empty (or hold the NULL value) if corresponding records are absent in the base table. If multiple records correspond to the same cell, since the two attributes quarter and employee are not necessarily a key of the base relation, they are aggregated using the aggregation function SUM. All cuboids are then populated using the same aggregation function. For example, in the cuboid ⟨year, employee⟩, the cell ⟨Y1,Bob⟩ takes the value 8500, which is the total amount of the four cells it depends on: ⟨Q1,Bob⟩, ⟨Q2,Bob⟩, ⟨Q3,Bob⟩, and ⟨Q4,Bob⟩.
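The roll-up from the core cuboid to coarser cuboids can be sketched with a grouped SUM. Only Bob's yearly total of 8500 comes from the text; the individual quarterly figures below are illustrative values chosen to sum to it.

```python
from collections import defaultdict

# Base table (quarter, employee, commission). Quarterly figures are
# illustrative; they are chosen so Bob's Y1 total matches the text's 8500.
base = [("Q1", "Bob", 1000), ("Q2", "Bob", 2000),
        ("Q3", "Bob", 2500), ("Q4", "Bob", 3000),
        ("Q1", "Alice", 1500)]

def cuboid(group_by):
    """Populate a cuboid by SUM-aggregating the base table."""
    cells = defaultdict(int)
    for q, e, m in base:
        cells[group_by(q, e)] += m
    return dict(cells)

quarter_employee = cuboid(lambda q, e: (q, e))     # core cuboid
year_employee    = cuboid(lambda q, e: ("Y1", e))  # quarters roll up to Y1
print(year_employee[("Y1", "Bob")])
```

The same group_by mechanism generalizes to the other cuboids in the lattice, e.g. rolling the employee attribute up to department or to ALL.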

Secure multi-party data mining allows multiple mutually distrusting parties to cooperatively compute aggregations over each other's data [31, 14]. Cryptographic protocols enable each party to obtain the final result with minimal disclosure of their own data. This problem is different from inference control, because the threat of inferences comes from what users know, not from the way they know it. The k-anonymity model enables sensitive values to be released without threatening privacy [24, 36]. Each record is indistinguishable from at least k − 1 others because they all have exactly the same identifying attribute values. An adversary can link an individual in the physical world only to at least (the sensitive values of) k records, which is considered a tolerable privacy threat. Inference control and the k-anonymity model can be considered dual approaches. The information-theoretic approach in [22] formally characterizes insecure queries as those that give a user more confidence in guessing possible records. However, such a perfect-secrecy metric will not tolerate any partial disclosure, such as those caused by aggregations.
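The k-anonymity condition is mechanical to check: every combination of quasi-identifying attribute values must occur at least k times in the released table. The records below (zip code, age) are illustrative.

```python
from collections import Counter

# Released records reduced to their quasi-identifier tuples (zip, age).
# Values are illustrative.
records = [("47677", "29"), ("47677", "29"), ("47602", "22"),
           ("47602", "22"), ("47602", "22")]

def is_k_anonymous(rows, k):
    """True if every quasi-identifier combination occurs at least k times."""
    return min(Counter(rows).values()) >= k

print(is_k_anonymous(records, 2))  # smallest group has 2 records
print(is_k_anonymous(records, 3))  # the ("47677", "29") group is too small
```

A table failing the check is usually repaired by generalizing attribute values (e.g., truncating zip codes or bucketing ages) until every group reaches size k.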

Security Requirements:

The Threat of Inferences:

Unlike in traditional databases, where unauthorized accesses are the main security concern, an adversary using an OLAP system can more easily infer prohibited data from answers to legitimate queries. Example 1 illustrates a one-dimensional (or 1-d for short) inference, where the sensitive cell is inferred using exactly one of its descendants.
A multi-dimensional (or m-d) inference is the complementary case of a 1-d inference. That is, a cell is inferred using two or more of its descendants, and none of those descendants causes a 1-d inference. Example 2 illustrates an m-d inference in a two-dimensional SUM-only data cube. Examples 3 and 4 illustrate m-d inferences with MAX-only queries, and with SUM, MAX, and MIN together.
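An m-d inference in a SUM-only cube is just linear algebra over the answered queries. The sketch below uses an illustrative 2x2 block of protected cells: knowing the released row and column sub-totals plus a single cell pins down every other cell exactly.

```python
# Protected 2x2 block of cells x11, x12, x21, x22 (values illustrative).
# Released SUM queries: row and column sub-totals.
row1, row2 = 30, 50   # x11 + x12, x21 + x22
col1, col2 = 35, 45   # x11 + x21, x12 + x22
x11 = 10              # one cell known out of band

# Linear combinations of the answered queries recover the rest:
x12 = row1 - x11
x21 = col1 - x11
x22 = row2 - x21
print(x12, x21, x22)
```

Each subtraction combines only answered (legitimate) queries with the known cell, which is why such inferences are invisible to access control alone: no individual query touched a protected cell directly.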

The Requirements:
As illustrated in above examples, a security solution for OLAP systems must combine access control and
inference control to remove security threats. At the same time, providing security should not adversely reduce
the usefulness of data warehouses and OLAP systems. A practical solution must achieve a balance among
following objectives.
• Security: Sensitive data stored in underlying data warehouses should be guarded from both
unauthorized accesses and malicious inferences. Such a definition of security considers not
only the information a user can directly obtain from an OLAP system, but also those that he/she
can derive using answers to seemingly irrelevant queries.
• Applicability: The security provided by a solution should not rely on any unrealistic
assumptions about OLAP systems. In particular, assumptions made in statistical databases are
usually unacceptable in OLAP applications. A desired solution should cover a wide range
of scenarios without the need for significant modifications.
• Efficiency: The name of OLAP itself indicates the interactive nature of such systems. Most
queries should be answered in a matter of seconds or minutes. A significant portion of the
OLAP literature has been devoted to meeting such stringent performance requirements. A
desired security solution must be computationally efficient, especially with respect to on-line overhead.
• Availability: Data should be readily available to legitimate users who have sufficient privileges.
A solution must place security upon justifiable restrictions of accesses in the sense that
removing the restrictions will either lead to security breaches or render the method
computationally infeasible.
• Practicality: A practical security solution should not demand significant modifications to the
existing infrastructure of an OLAP system. A solution should take advantage of any query-
processing and security mechanisms that are already in place.
The main challenge, however, lies in the inherent tradeoff among the above objectives. Achieving provable
security and justifiable availability in varying settings of OLAP systems usually implies complicated on-line
computations, which are expensive and hard to implement. The methods we shall describe
in this chapter represent efforts towards a balance among these objectives.

A Three-Tier Security Architecture:


Security in statistical databases usually has two tiers, that is, sensitive data and aggregation queries.
Inference control mechanisms check each aggregation query to decide whether answering the query, in
addition to previously answered queries, may disclose any protected data through inferences. However,
applying such a two-tier architecture to OLAP systems has some inherent drawbacks. First, checking queries
for inferences at run time may bring unacceptable delay to query processing. The complexity of such
checking is usually high due to the fact that m-d inferences must be checked against sets of queries instead
of each individual query. Second, inference control methods cannot take advantage of the special
characteristics of an OLAP application under the two-tier architecture.

[Fig. 2: A Three-Tier Inference Control Architecture — user queries are answered from a tier of pre-defined
aggregations (A), which is in turn derived from the data set (D); access control mediates the relation RAQ
between the aggregation tier and the queries, while inference control governs the relation RDA between the
data set and the aggregation tier.]

The methods we shall review are based on a three-tier security architecture. As illustrated in Figure 2, this
architecture introduces an intermediate aggregation tier between the data tier and the query tier. More
specifically, the architecture has three tiers and three relations, and the aggregation tier must satisfy three
properties. First, inference control is enforced between the aggregation tier and the data tier, such that the
former is secure with respect to the latter. Access control then helps to enforce that only safe aggregations will
be used to compute results to queries. Second, the size of the aggregation tier must be comparable to that of the data
tier. Third, the problem of inference control can be partitioned into blocks in the data tier and the aggregation
tier, such that security only needs to be ensured between each corresponding pair of blocks in the two tiers.
The three-tier architecture helps to reduce the performance overhead of inference control in several respects.
The first property implies that the aggregation tier can be pre-computed, so the computation-intensive
part of inference control can be shifted to off-line processing. The on-line part then simply enforces access
control based on whether a query can be rewritten using the aggregation tier (that is, security through views).
Second, the last two properties both reduce the size of inputs to inference control algorithms and consequently
reduce their complexity. Note that an aggregation tier can be designed to meet the second property, whereas the size
of the query tier is inherently exponential in the size of the data tier. The third property also localizes inference
control tasks to each block of the data tier, so a failure in one block will not affect other blocks.
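A minimal sketch of the on-line step may help: assuming the safe aggregation tier has already been certified off-line, the run-time check only needs to decide whether a query can be rewritten as an exact union of pre-computed safe aggregations. The block names and cells below are illustrative, not from the chapter.

```python
# Hypothetical pre-computed aggregation tier: each entry is a set of core-cuboid
# cells whose SUM has been certified (off-line) as safe to release.
safe_aggregations = {
    "Q1_total": frozenset({("Bob", "Q1"), ("Jim", "Q1"), ("Mallory", "Q1")}),
    "Q2_total": frozenset({("Bob", "Q2"), ("Alice", "Q2")}),
}

def answerable(query_cells, safe):
    """Greedy rewriting check: can the queried cell set be covered exactly
    by a disjoint union of safe aggregation blocks?"""
    remaining = set(query_cells)
    for block in safe.values():
        if block <= remaining:  # block fits entirely inside the query
            remaining -= block
    return not remaining        # answerable iff nothing is left uncovered

q1_and_q2 = {("Bob", "Q1"), ("Jim", "Q1"), ("Mallory", "Q1"),
             ("Bob", "Q2"), ("Alice", "Q2")}
print(answerable(q1_and_q2, safe_aggregations))        # True: Q1_total + Q2_total
print(answerable({("Bob", "Q1")}, safe_aggregations))  # False: single cell denied
```

The expensive inference analysis never runs at query time in this design; the on-line cost is just set containment against the pre-computed tier, which is what makes the architecture compatible with OLAP's interactive performance requirements.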

Securing OLAP Data Cubes:


The data cube is a natural data model for OLAP systems and underlying data warehouses. This section
reviews several methods in safeguarding OLAP data cubes against both unauthorized accesses and indirect
inferences.

SUM-only Data Cubes:

Cardinality-Based Method:

The cardinality-based method by Dobkin et al. [13] determines the existence of inferences based on the number
of answered queries. In a data cube, aggregations are pre-defined based on the dimension hierarchy, and what
may vary is the number of empty cells, that is, previously known values. Recall that in Section 3.1, Example 1
illustrated a straightforward connection between 1-d inferences and the number of empty cells in a data cube.
That is, a 1-d inference is present when an adversary can access any cell that has exactly one ancestor in the
core cuboid. A similar but less straightforward connection also exists between m-d inferences and the number
of empty cells, as we shall show here.

The model for inferences in this case is similar to that given by Chin et al. [8], but the queries are limited to
data cube cells. Here we only consider a one-level dimension hierarchy, where each dimension can have only
two attributes, that is, the attribute in the core cuboid and the special value all. For each attribute of the core cuboid, we assume
an arbitrary but fixed order on its domain. Although an attribute may have infinitely many values, we shall
only consider the values that appear in at least one non-empty cell in the given data cube instance. The number
of such values is thus fixed. From the point of view of an adversary, the value in any non-empty cell is
unknown, and hence the cell is denoted by an unknown variable. The central tabulation in Table 1 rephrases
part of the core cuboid in Figure 1.

Table 1 also includes cells in descendants of the core cuboid, namely, the aggregation cuboids. These are
⟨all, employee⟩, ⟨quarter, all⟩, and ⟨all, all⟩, as we only consider a one-level dimension hierarchy. For SUM-
only data cubes, the dependency relation can be modeled as linear equations. On the left-hand side of those equations
are the unknown variables in the core cuboid, and on the right-hand side the values in the aggregation cuboids. Table
2 shows a system of nine equations corresponding to the nine cells in the aggregation cuboids.
Table 1. Modeling A Core Cuboid

          Q1    Q2    Q3    Q4    ALL
Bob       x1    x2    x3          8500
Alice           x4    x5          10000
Jim       x6                x7    6100
Mallory   x8                x9    12400

Table 2. Modeling the Aggregation Cuboids

The nine equations (rows of the coefficient matrix applied to the unknowns x1, ..., x9):

x1 + x2 + x3          =  8500    (Bob, all quarters)
x4 + x5               = 10000    (Alice, all quarters)
x6 + x7               =  6100    (Jim, all quarters)
x8 + x9               = 12400    (Mallory, all quarters)
x1 + x6 + x8          = 10000    (Q1, all employees)
x2 + x4               =  6000    (Q2, all employees)
x3 + x5               = 11000    (Q3, all employees)
x7 + x9               =  9000    (Q4, all employees)
x1 + x2 + ... + x9    = 36000    (grand total)

Table 3. The Reduced Row Echelon Form Mrref

1 0 0 0  0 0 0 0  0
0 1 0 0 −1 0 0 0  0
0 0 1 0  1 0 0 0  0
0 0 0 1  1 0 0 0  0
0 0 0 0  0 1 0 0 −1
0 0 0 0  0 0 1 0  1
0 0 0 0  0 0 0 1  1
0 0 0 0  0 0 0 0  0
0 0 0 0  0 0 0 0  0

Next we obtain the reduced row echelon form (RREF) Mrref of the coefficient matrix
through a sequence of elementary row operations [19], as shown in Table 3. From Mrref
it can be observed that the system of linear equations in Table 2 has infinitely many
solutions. This means that an adversary cannot infer the entire core cuboid from the
given aggregation cuboids. However, the first row vector of Mrref being a unit vector
(that is, having a single 1) indicates that the value of x1 must remain the same among all
the solutions to the system of equations. Consequently, the adversary can infer Bob's
commission in Q1.

The existence of at least one unit row vector in Mrref is indeed the necessary and sufficient condition
for any unknown variable to have the same value among all the solutions [8]. We shall adopt this
notion to model inferences in SUM-only data cubes. Notice that for the special case of 1-d inferences,
as shown in Example 1, the coefficient matrix itself would have a unit row vector (which will
certainly also appear in Mrref). It is well known that the RREF of an m×n matrix can be obtained by
Gauss-Jordan elimination with complexity O(m²n) [19]. The number of empty cells can only
determine the existence of 1-d inferences in two extreme cases. First, if the core cuboid has no empty
cell, then trivially it is free of 1-d inferences as long as all the attributes have more than one value.
The second straightforward result says that any data cube whose core cuboid has fewer non-empty
cells than the given upper bound 2k − 1 must contain 1-d inferences.
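The unit-row test can be reproduced directly. The following sketch performs Gauss-Jordan elimination over exact fractions on the coefficient matrix of Table 2 and reports which unknowns are fixed across all solutions; it is an illustrative re-implementation, not the authors' code.

```python
from fractions import Fraction

def rref(matrix):
    """Gauss-Jordan elimination to reduced row echelon form (O(m^2 n))."""
    m = [[Fraction(v) for v in row] for row in matrix]
    rows, cols = len(m), len(m[0])
    pivot_row = 0
    for col in range(cols):
        pivot = next((r for r in range(pivot_row, rows) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: free variable
        m[pivot_row], m[pivot] = m[pivot], m[pivot_row]
        m[pivot_row] = [v / m[pivot_row][col] for v in m[pivot_row]]
        for r in range(rows):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
    return m

# Coefficient matrix of the nine equations in Table 2 (columns are x1..x9).
M = [
    [1, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 1],
    [1, 0, 0, 0, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
]

# A variable is compromised iff its column appears as a unit row in the RREF.
inferable = [i + 1
             for row in rref(M)
             if row.count(1) == 1 and row.count(0) == len(row) - 1
             for i, v in enumerate(row) if v == 1]
print(inferable)  # [1]  -> only x1 (Bob's Q1 commission) is determined
```

Exact fractions avoid the floating-point round-off that would make the "is this entry zero?" tests unreliable; in practice, inference control implementations would perform this reduction off-line, consistent with the three-tier architecture described earlier.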
