Felix Cuadrado Latasa
DOCTORAL THESIS
2009
Committee appointed by His Magnificence the Rector of the
Universidad Politécnica de Madrid on 4 December 2009.
Chair: _____________________________________________
Member: _________________________________________________
Member: _________________________________________________
Member: _________________________________________________
Secretary: _____________________________________________
Substitute: ______________________________________________
Substitute: ______________________________________________
THE SECRETARY
To my parents and
to Nacho
Acknowledgements
First of all, I must express my sincere and complete gratitude to the person who made this
possible: my thesis advisor, Juan Carlos Dueñas. Thank you very much for your trust and your
support, and for being available to advise and guide me through this process, always finding
a solution to the problems that came up.
I also want to thank all my colleagues in C-215 (and Siberia), who have helped make this a
motivating and welcoming workplace. Thank you Jose, for helping me from the very first day,
for your advice and for your friendship. To Rodrigo, for being a great colleague and friend who
has supported me all this time. To Don Hugo A., always sensible and approachable,
Álvaro, Boni, Marta, Freakant, Samuel, Chema, Rubén, Bea, Antonio, and Sandra, and the
members of the ITECBAN deployment and testing team. To July, who makes all of this work.
Finally, thanks to the department and the university for the support provided over these years.
I would also like to mention the European partners of this journey. Thanks to Hans, Patricia,
Rahul, Qing, Maryan, Viktor, Adam, Rik, Remco, Daniela and Elly for three great months at Vrije
Universiteit. Also, thanks to the external reviewers of this thesis, Didier and Patricia, for their
valuable advice.
Thanks to my parents, to Nacho, and to the rest of my family, for giving me their love and
supporting me at all times, even though I chose such a complicated path.
And last, I do not forget all my friends, who have been understanding and encouraging
in the less good moments. Thank you Perico, Marisol, Bea, Carlos, Irene, Ignacio, Miguel,
Evaristo, and many others who deserve to be on this list.
PhD Dissertation Félix Cuadrado
Universidad Politécnica de Madrid Departamento de Ingeniería de Sistemas Telemáticos
Abstract
In a globalized world enterprises face greatly increased competition, which demands agility
to release new products and adapt to customer demands. From a technological perspective,
these factors have led to the adoption of the service oriented paradigm, which must be
supported by a robust IT infrastructure. One of the main competitive factors is the quality of
service provided, ensuring that the elements of the services portfolio have high availability and
unnoticeable response times. These non-functional requirements are partially supported by
the execution infrastructure, composed of multiple heterogeneous servers with specialized
roles, distributed over a network. However, the combination of these factors greatly
complicates technical management processes of the infrastructure, such as diagnosing the
environment status, planning the required changes, or applying corrections to improve its
performance. Frequently those tasks are executed manually by an IT administrator, but this
approach is very costly and hampers the desired agility. An increased degree of automation in
service change management operations is a must for obtaining the potential advantages of the
service oriented approach.
This dissertation proposes an enterprise service management architecture with automated
operation capabilities. One of the cornerstones of this proposal is an information model of all
the relevant management information. The proposed model builds upon the common ground
of the main information model standards to characterize both the logical artifacts, originated
from the service development process, and the managed runtime elements, ranging from
hardware nodes to the provisioned services. The model not only represents different
environment configurations but also provides well-defined expressions for validating the
correctness of any system state and for automatically obtaining the required configuration
values for some of the managed resources. In addition to the information model, the business
objectives, desired functionality, and changes to the domain have been defined using the same
concepts. This way, the effect of external changes on the environment configuration, as well as
their impact on the stability and functionality of the environment, can be automatically analyzed.
After defining all the relevant management information through a cohesive model that covers
both technical and business aspects, this dissertation proposes an algorithm for automating
the execution of service configuration change management activities, based on
pseudo-boolean SAT techniques. The proposed algorithm analyzes the current state of a managed
domain and, in case the situation is not stable or desirable, obtains the set of required changes
to restore the system to its intended functionality. Instead of defining separate processes for
installation, reconfiguration, or removal of selected elements, the same reasoning steps
produce a change plan with the necessary operations.
Finally, after taking into account the requirements of enterprise applications, an architecture
for a service change management system has been proposed, based on the described models
and reasoning techniques. A prototype of the proposed architecture and algorithms has been
developed and validated through a set of case studies taken from the context of a real banking
organization. The results of the validation show how different situations such as initial
provisioning or reaction to hardware malfunctions are correctly addressed by the architecture,
as well as how the proposal scales with increasingly larger environments and defined services.
Summary
Globalization has increased the level of competition among companies, forcing them to adapt
better to customer needs and to shorten the development cycles of new products. At a
technical level, these factors can be supported by a service oriented infrastructure that is
robust enough to back the needs of the business. In this context, improving quality of service
is one possible way to stand out from the competition, by offering services with high
availability and an imperceptible response time. To support these non-functional
requirements, the base execution infrastructure is made up of a set of heterogeneous servers,
distributed over the company network. The combination of these factors greatly complicates
service management activities, such as diagnosing the state of the environment, or identifying
the changes needed to correct an incident or improve the performance of the running
services. These activities are frequently carried out manually by a systems administrator,
although the effort involved in this kind of change makes it impossible to apply them with the
necessary agility. To take advantage of the benefits of service orientation, the level of
automation of these processes must be increased.
This thesis proposes a set of models and techniques for automating configuration change
operations on enterprise services. The basis of the proposal is a generic model that captures
all the information about the environment that is relevant to its management, with the aim of
being automatically interpretable by change control systems. The model builds on the main
abstractions defined in management standards, and on top of them it models both the logical
elements, which come directly from the development process, and the elements of the
execution environment, characterizing everything from hardware nodes to the services in
operation. The model not only represents the configuration of the environment, but also
defines how to validate its stability and how to obtain the correct configuration value of some
elements. The definition of the business objectives that the system must fulfill, and of the
changes it may undergo, has also been formalized on these same concepts. This allows the
effect of an external change on the current configuration to be analyzed automatically, and
the impact of the change on the stability or functionality of the system to be estimated.
After capturing all the relevant management information with the proposed models, this thesis
proposes an algorithm for handling change management activities, based on a pseudo-boolean
SAT solver. The algorithm analyzes the current state of the managed domain and, if the
current situation is not stable or desirable, obtains a set of changes that will restore the
desired functionality of the system. Instead of defining independent processes for installing,
reconfiguring, or removing system components, the proposed solution is able to generate a
change plan with the necessary operations through the same procedure.
Finally, taking into account the specific requirements of enterprise applications, an
architecture for an enterprise service change management system has been proposed, based on
the models and reasoning techniques described above. A prototype of this architecture has
also been developed and validated through a set of case studies drawn from the context of a
banking organization. The results of this validation work show how the proposed architecture
is able to correctly handle different situations, from the initial provisioning of a new service
to the diagnosis and repair of a fault in one of the hardware devices of the environment.
Finally, the scalability of the proposal has been evaluated through a series of experiments
with progressively larger models of the managed environment and lists of available services.
Table of contents
1. Motivation
1.1. Research Methodology
1.2. Structure of the document
3. Objectives
List of Figures
Figure 1 Functional view of an Enterprise SOA Infrastructure
Figure 2 The TMN management pyramid
Figure 3 Dimensions of management
Figure 4 The ITIL Service Support Process
Figure 5 eTOM operations level 2 process matrix
Figure 6 eTOM Service configuration & Activation Level 3 decomposition
Figure 7 Representation of services as an aggregation of MIB described resources
Figure 8 Main Elements of the CIM Core Model
Figure 9 CIM J2EE Application Model
Figure 10 SDD Deployment Descriptor Schema
Figure 11 Model transformation deployment architecture
Figure 12 CHAMPS architecture
Figure 13 Transactional change management architecture
Figure 14 Policy framework architecture
Figure 15 Planit Architecture
Figure 16 Semantic operations network management
Figure 17 Architecture of an autonomic system
Figure 18 FOCALE Architecture
Figure 19 Architectural Framework for Dynamic Automatic Configuration
Figure 20 PhD Thesis Scope Definition
Figure 21 Resource model
Figure 22 Logical and Runtime Resources
Figure 23 Composite Resources model
Figure 24 Containment Relationship model
Figure 25 Sample model of a managed node with Hosted and Composite Resources
Figure 26 Binding Model
Figure 27 Stability check types and search scopes
Figure 28 Updated resource model with types
Figure 29 Set Representation of the Type Containment
Figure 30 Constraint definition model
Figure 31 Supported types by host definition
Figure 32 Resource with visibility information
Figure 33 Resource Visibility Scopes
Figure 34 Bound Property and Binding
Figure 35 Dependent Resource Model
Figure 36 Versioned Resources model
Figure 37 Deployment Unit Model
Figure 38 Bound Property Configuration Example
Figure 39 Context Aware Property Configuration Example
Figure 40 OSGi component lifecycle and proposed shared lifecycle
Figure 41 Runtime model
Figure 42 Tree view of the runtime model
Figure 43 Container Resource Configuration Model
Glossary
ACEL: Autonomic Computing Expression Language.
ADL: Architecture Description Language.
AM: Autonomic Manager.
API: Application Programming Interface.
B2B: Business-to-Business.
B2C: Business-to-Consumer.
BPEL: Business Process Execution Language.
BPMN: Business Process Modeling Notation.
BRM: Business Rules Management.
CDN: Content Delivery Network.
CENIT: Consorcios Estratégicos Nacionales en Investigación Técnica.
CI: Configuration Item.
CMDB: Configuration Management DataBase.
CORBA: Common Object Request Broker Architecture.
CPU: Central Processing Unit.
CRM: Customer Relationship Management.
D&C: Deployment and Configuration
DBMS: Database Management System.
DIT: Departamento de Ingeniería de Sistemas Telemáticos.
DMTF: Distributed Management Task Force.
DMT: Device Management Tree.
DPWS: Device Profile for Web Services.
DS: Discovery Service.
DSL: Domain-Specific Language.
DU: Deployment Unit.
EAI: Enterprise Application Integration.
EAR: Enterprise Archive file.
ECA: Event Condition Action.
EJB: Enterprise JavaBeans.
EMF: Eclipse Modeling Framework.
EPL: Eclipse Public License.
ERP: Enterprise Resource Planning.
ESB: Enterprise Service Bus.
eTOM: enhanced Telecom Operations Map.
ETSI: European Telecommunications Standards Institute.
1. Motivation
The IT infrastructure has increasingly taken a central role in modern enterprises, which can no
longer be conceived of as working efficiently and competitively without it. The use of IT
enables faster and more efficient business processes, and has become a key factor in
competitive advantage in recent years. Currently we can even talk about Service-Oriented
Organizations [78], whose business model is providing IT computing or business-service
functionalities to other organizations and companies.
The increased importance of IT infrastructure has led to significant investments in it,
which must be amortized over long periods of time. However, IT systems
evolve rapidly, turning purchased units into legacy technology before their lifetime has been
completed. On top of that, it is necessary to upgrade applications and enterprise services in
order to continuously improve process efficiency and gain a competitive advantage. This pressure
leads to the adoption of new technologies for Business to Business (B2B) and presentation services.
Thus, systems are composed not only of legacy systems, mainframes, and databases, but also of Java
Enterprise Edition (JEE) application servers or Business Rules Management (BRM) systems. The
resulting enterprise infrastructure is a highly complex distributed system, composed of dozens
of different servers and application containers, deployed over hardware machines
interconnected through complex network topologies containing firewalls, virtual private
networks and other access restriction and security mechanisms. To further complicate the
situation, server vendors try to differentiate themselves by providing additional features
on top of the common standards, which effectively breaks interoperability between artifacts
from different providers.
Interoperability between all the components of the IT infrastructure is usually achieved by
adopting a higher-level integration layer, which is based on Service Oriented Architecture and
Business Process Management (SOA/BPM). This way, each artifact of the system is presented
as a service, hiding its implementation details and providing a uniform high-level view. Services
are published in directories and connected through an Enterprise Service Bus (ESB), where
additional non-functional capabilities can be added to the communications, such as logging, or
data transformation. On top of that, BPM technologies, such as BPEL (Business Process
Execution Language), orchestrate the activities, bridging the gap between the IT infrastructure
and the business processes.
The SOA / BPM approach has contributed to maximizing the use of the existing IT
infrastructure, but managing the system is still a pending issue. Although the system elements
are now treated uniformly through the service abstraction, managers still need to cope with the
enormous heterogeneity and complexity of the infrastructure.
Traditional management processes are identified with human operation over a management
administration console. Monitoring information and events are collected and aggregated into
the console, and the human administrator invokes specific operations on the environment
based on the identified objectives and the collected information.
Operations are executed in scripts, containing the exact set of machine-specific instructions for
achieving a concrete task. Because of that, scripts lack reusability and suffer tremendously
from the increased complexity, distribution and heterogeneity of current IT systems.
This approach has severe limitations, which become more evident as the complexity and
heterogeneity of the managed systems keep growing. Changes to the environment impact the
complete management process, as the configuration and workflows must be manually
adapted to the specifics of the environment. Management operations are manually created
and composed, which requires extensive knowledge from the administration experts, and can
hardly be reused. A runtime system configuration has component and configuration dependencies
between heterogeneous artifacts, propagating the impact of any change or error throughout
the whole system.
The importance of addressing these limitations is widely acknowledged not only by
academia but also by the main industrial companies. As a reference, the third edition of the
ITEA Roadmap for Software Intensive Systems and Services [51], defined by the ITEA2
Consortium industrial members (such as Nokia, Philips, Alcatel, Siemens or Telefónica), selects
it as one of the four main challenge areas which should be addressed by the European
research community. The roadmap highlights the criticality of improving the deployment and
reconfiguration processes of large, heterogeneous distributed systems, while at the same time
supporting non-functional capabilities such as efficiency or reliability.
There is clearly a need to find models, paradigms and mechanisms that address all these
factors, lessen human intervention and automate parts of the enterprise service
management processes.
the changes which could impact the environment were identified and classified. With these
concepts properly defined, the base function of the management system was defined as
obtaining and applying the required changes to the managed domain in order to restore its
correctness.
Once the problem was completely established and modeled, alternatives were evaluated for
defining an algorithm which can reason over all the information from the domain and
automatically suggest a set of changes that can return the system to a desirable and stable
state, regardless of the type of change. After evaluating the available approaches, a
pseudo-boolean satisfiability (PBSAT) solver was selected as the base technique. The proposed
solution converts the domain information into boolean variables, which represent the states
potentially reachable by applying a set of operations, and functions, which restrict the valid
combinations of these variables in order to ensure that the obtained result represents a
correct and stable configuration of the environment. Once the resolution engine has been
completely configured and invoked, the variable assignments are interpreted, obtaining from them
the set of changes which need to be applied to the domain in order to achieve the
desired configuration.
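The encoding described above can be illustrated with a minimal sketch. It is not the algorithm of this dissertation: the toy domain, all names and the capacity figures are assumptions made for the example, and an exhaustive search stands in for a real PBSAT solver. Each boolean variable means "service s runs on node n", the constraints are linear sums over those variables, and the resulting assignment is read back as a change plan.

```python
from itertools import product

# Hypothetical toy domain (all names and numbers are illustrative assumptions).
NODES = {"node1": 2, "node2": 1}      # node -> capacity in resource units
SERVICES = {"web": 1, "db": 2}        # service -> resource units it needs
DEPENDS_ON = {"web": ["db"]}          # web is only stable if db is deployed
CURRENT = {("web", "node1")}          # current state: web on node1, db missing
DESIRED = {"web", "db"}               # services that must be running

# One boolean variable per (service, node) pair: "service runs on node".
VARIABLES = [(s, n) for s in SERVICES for n in NODES]


def is_valid(assignment):
    """Check the pseudo-boolean constraints (linear sums over 0/1 variables)."""
    for s in SERVICES:
        count = sum(assignment[(s, n)] for n in NODES)
        # At most one instance per service; desired services need exactly one.
        if count > 1 or (s in DESIRED and count != 1):
            return False
    for n, capacity in NODES.items():
        # Weighted sum of hosted services must fit in the node capacity.
        if sum(SERVICES[s] * assignment[(s, n)] for s in SERVICES) > capacity:
            return False
    for s, deps in DEPENDS_ON.items():
        if any(assignment[(s, n)] for n in NODES):
            for d in deps:
                if not any(assignment[(d, n)] for n in NODES):
                    return False
    return True


def cost(assignment):
    """Number of variables that differ from the current state."""
    return sum(assignment[v] != (v in CURRENT) for v in VARIABLES)


def solve():
    """Exhaustive search standing in for a real PBSAT solver."""
    best = None
    for values in product([0, 1], repeat=len(VARIABLES)):
        assignment = dict(zip(VARIABLES, values))
        if is_valid(assignment) and (best is None or cost(assignment) < cost(best)):
            best = assignment
    return best


def change_plan(assignment):
    """Interpret the variable assignment as install/remove operations."""
    plan = []
    for v in VARIABLES:
        if assignment[v] and v not in CURRENT:
            plan.append(("install",) + v)
        elif not assignment[v] and v in CURRENT:
            plan.append(("remove",) + v)
    return sorted(plan)


print(change_plan(solve()))
```

In this toy domain the database only fits on node1, so the single valid configuration moves the web service to node2. The same search yields installations and removals without separate procedures for each kind of operation.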
On top of that, in order to enable the adoption of the proposed solution in an enterprise
context, a reference architecture was defined which would support the complete functionality
of a service change management system. The architecture components leveraged the
proposed algorithm as the core reasoning module of the enterprise management architecture,
and described how to integrate it with the rest of the enterprise infrastructure.
Finally, a prototype implementation was developed, and a number of validation tests were
defined and executed. The results from those experiments verified that the solution described
in this dissertation supports the identified management functions, and can successfully
operate over a set of different scenarios with the described abstractions and algorithms.
After this chapter, which provides the motivation for the work and the research approach
followed, the next chapter details the results of the analysis of the state of the art, comparing
the main standardization initiatives in information modeling and the main approaches to
systems management.
Chapter 3 defines the objectives of the work under this thesis, after determining that the
current solutions do not address all the concerns about the automation of service
configuration activities. The next two chapters detail the main contributions of this work.
Chapter 4 describes the proposed model abstractions for representing heterogeneous,
distributed management, as well as a specific model for service-based applications
management. Chapter 5 defines the objectives of a service management system and
characterizes the set of changes that can occur in an environment. For the specific case, the
internal operations allowed by the management system are defined. After those concepts
have been clearly established, an algorithm based on pseudo-boolean SAT solvers is described
for solving the management functions. This contribution takes the previously defined models
as its only input, and automatically obtains a stable and desirable configuration for the system.
Chapter 6 describes a reference architecture for the service management system that uses the
proposed models and algorithm to automate the main tasks. The results from a set of
validation experiments are discussed in chapter 8.
Finally, the last chapter details the main conclusions of this work, as well as a description of
the most interesting future research activities which have been identified during the
development of this work.
First of all, I will try to define what we understand by management, using Hegering’s [42]
definition of networked system management: “the management of networked systems
comprises all the measures necessary to ensure the effective and efficient operation of a
system and its resources pursuant to an organization’s goals”. As this is a very broad topic,
contributions usually focus on managing specific parts of the system. If the fundamental
elements are the network elements we speak of network management. If the main
entities are the end systems we use the term system management. As the relevance of the
software layer keeps growing, the management of existing applications has gained
importance, to the point of opening the field of applications management. The SOA
(Service-Oriented Architecture) paradigm has spread over distributed architectures, shifting the
focus from the software components to the runtime functionality exposed to the complete system.
Service management works at that abstraction level. Finally, business processes, strategies,
and policies are the focus of business management. This classification of management levels
was established by TMN (Telecommunications Management Network) [102], a standard
defined by the telecommunications operators and derived from OSI. The following figure shows
the TMN management pyramid.
[Figure omitted. Labels recoverable from the original: functional areas (security, performance, accounting, configuration, fault); network types (Internet, corporate network, WAN, MAN, LAN); stages (planning, installation, operation, change); information types (video, multimedia); management level (network management, system management, enterprise management).]
Figure 3 Dimensions of management
The most widespread version of ITIL is v2, which focuses on two main process areas: Service
Delivery and Service Support. In May 2007 the ITIL Refresh Project released an updated
version, frequently referred to as ITIL v3, with a new focus on the business value provided by the
different IT processes [54].
ITIL is a very large specification, from which I will focus on the concepts related to service
management, defined in the Service Support process. This part of the specification contains
the best practices gathered by professionals on the service maintenance process, including
activities ranging from incident management to the resolution of problems. The next picture
shows the main elements of the model. Its central part shows one of the key
pieces of ITIL: the Service Desk, which is a unified point of interaction with all service users
(user level). It handles all incidents, queries and requests, acting as the only interface for all
the service support processes.
The other central element of ITIL is the Configuration Management DataBase (CMDB), seen in
the lower part of the picture. This component centralizes all the management-relevant
information of the service lifecycle, including the Configuration Items and their relationships,
as well as incidents, errors, or releases.
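As an illustration of the kind of information a CMDB holds, the following minimal sketch stores Configuration Items and "depends on" relationships and answers which items are affected when one of them fails. The class, its methods and the sample items are assumptions made for the example; ITIL does not prescribe any particular data structure.

```python
from dataclasses import dataclass, field


@dataclass
class CMDB:
    """Toy configuration database: CIs plus 'depends on' relationships."""
    items: dict = field(default_factory=dict)       # CI id -> CI type
    depends_on: dict = field(default_factory=dict)  # CI id -> set of CI ids it needs

    def add_item(self, ci_id, ci_type):
        self.items[ci_id] = ci_type
        self.depends_on.setdefault(ci_id, set())

    def relate(self, dependent, dependency):
        """Record that `dependent` relies on `dependency` (both must exist)."""
        if dependent not in self.items or dependency not in self.items:
            raise KeyError("both CIs must be registered before relating them")
        self.depends_on[dependent].add(dependency)

    def impacted_by(self, ci_id):
        """Return every CI that directly or transitively depends on `ci_id`."""
        impacted, pending = set(), [ci_id]
        while pending:
            current = pending.pop()
            for other, deps in self.depends_on.items():
                if current in deps and other not in impacted:
                    impacted.add(other)
                    pending.append(other)
        return impacted


# Hypothetical sample data.
cmdb = CMDB()
cmdb.add_item("server1", "hardware node")
cmdb.add_item("appserver", "application server")
cmdb.add_item("payments", "service")
cmdb.add_item("mailer", "service")
cmdb.relate("appserver", "server1")
cmdb.relate("payments", "appserver")
print(sorted(cmdb.impacted_by("server1")))  # ['appserver', 'payments']
```

Keeping the relationships next to the items is what makes this kind of impact query possible from a single source of information.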
The service support process starts when service users report an incident to the service desk.
After being analyzed, the incident is handed to the incident management process. In this step
the incident is recorded and compared against the existing database of known system incidents.
If there are no previous records of the reported issue, it is escalated to the problem management
activity, where a root cause analysis is performed in order to diagnose the technical cause of
the problem. The solution found can either be applied directly, notifying the user and closing
the process, or may require changes to the software services or the base
infrastructure, which will be applied in the change management process through a Request for
Change (RFC) notification.
The change management process interprets the request and implements the necessary
changes to the services and/or the infrastructure to ensure the initial SLA levels can be
restored. This process is executed in a controlled environment, with approval of the proposed
changes to the infrastructure. These changes are executed through the release management
activity, where the impact of the modifications is evaluated and the system configuration is
adjusted to them. Finally, the configuration management process keeps the CMDB
synchronized with the updates to the Configuration Items (CIs) and their relationships, once
the system configuration has been updated to deal with the raised incident.
[Figure: ITIL Service Support processes. Business, customers and users interact with the
Service Desk (incidents, queries, inquiries), which feeds the Incident, Problem, Change, Release
and Configuration Management processes, supported by management tools and reports; all of
them share the CMDB (incidents, problems, known errors, changes, releases, CIs, relationships).]
ITIL processes are modeled at an organizational level, with both the company departments and
external clients and providers as the process actors. Because of that, ITIL is more focused on
high-level processes and does not provide direct links to specific technologies, processes and
architectures for realizing the best practices described in the specification.
However, ITSM frameworks should be the basic frame for management systems, as they
provide the organizational context where the business concepts are integrated with the
technical processes. In [97] an example process is described for the definition of a Service Level
Management architecture. It starts from the relevant Service Support processes and, through a
set of scenarios, describes a management architecture supporting them.
[Link]. eTOM
The enhanced Telecom Operations Map (eTOM) [35] is a business process framework for
Internet and Communications Service Providers. It is published and maintained by the
TeleManagement Forum (TMF). Its predecessor, the Telecom Operations Map (TOM) was first
published by the TMF in 1998 and was superseded by the eTOM in 2001. Since 2004, eTOM has
also been an ITU-T Recommendation (M.3050). The standard is one of the main components of the
New Generation Operations Systems and Software (NGOSS) project, whose objective is to define
a vendor-independent architecture for management systems.
eTOM defines five levels of process views, from level 0 to level 4, each providing an
increasingly detailed view of the processes. Level 0 defines only three fundamental
processes [7] (also known as ‘process areas’): SIP (Strategy, Infrastructure and Product),
Operations and Enterprise Management. Service configuration processes fall under the
Operations area, which is also the most developed part of eTOM, as it was the scope of the
original TOM. The standard refines Operations processes up to level 3.
Figure 5 shows the process matrix of the level 2 view of the Operations processes. The
processes are aligned to four business-level functional areas, resembling organization
departments: Customer Relationship Management (CRM), Service Management and
Operations (SMO), Resource Management and Operations (RMO) and Supplier / Partner
Relationship Management (SPRM). The matrix is completed by four vertical business goal
areas: OSR (Operations Support and Readiness), Fulfillment, Assurance and Billing. The
technical processes related to service configuration clearly belong to the level 2 SCA (Service
Configuration & Activation) process, although in a broader context a service configuration
workflow will possibly involve additional processes of the Fulfillment category. Figure 6 shows
the level 3 process decomposition of SCA defined by eTOM.
[Figure 6: level 3 decomposition of the Service Configuration & Activation process into Design
Solution, Allocate Specific Resources to Services, Track & Manage Work Orders, Implement &
Configure Service, Test Service End-to-End, and Activate Service.]
eTOM provides a finely detailed process framework for service provisioning and configuration.
However, it does not propose implementations of the activities, or even guidelines, as
they are out of the scope of the specification. Nonetheless, eTOM is a fundamental reference
for the organizational aspects of service management.
be hard to separate the information model from the management operations (which are tied
to it), this study will cover both aspects.
2.2.1. SNMP/MIB
The Internet Engineering Task Force (IETF) defines a set of standards which aim to provide a
complete management solution for networked systems. The result of this work is the Simple
Network Management Protocol (SNMP) [9], which, in combination with the Structure of
Management Information (SMI) v2 [74] and the Management Information Base (MIB) [75],
defines a simple management model, a way of describing the management objects
and a mechanism for packaging and storing the information.
The model is oriented to the management of variables. Information is represented by collections of
variables, either as simple primitive types or organized as conceptual tables. As this set can
grow very large, SNMP provides a four-level naming system which ensures a unique
identifier for each managed variable. In this model, the two upper levels define the
context engine and the context name where the variable is stored. The next two levels identify
the variable type and instance. This structure also eases the management functions, as the
information can be explicitly aggregated in contexts, which are frequently self-discovered, and
context variables can be easily iterated.
The SMI data model is based on conceptual tables, which have a primitive data type for each
column. Columns can be marked as index, similarly to a relational database, although the
model is much more limited (for example, only one index can be used for fast lookups). As a
conceptual table is a property, it has a unique identifier. Managed objects defined by SMI are
usually called MIB objects. A set of interrelated MIB objects is a MIB module.
Thanks to the simplicity of the data model, SNMP protocol operations are very simple. There
are read operations (get, getnext, getbulk), write operations (set) and notify operations (trap,
inform).
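As an illustration of how little the read and write operations require, the following minimal sketch emulates get, getnext and set over an in-memory collection of variables. It is not an SNMP implementation: the ToyAgent class and its walk helper are hypothetical names introduced here, no network protocol is involved, and only the object identifiers (sysName and ifDescr from the standard MIB-II) are real.

```python
from bisect import bisect_right


class ToyAgent:
    """In-memory stand-in for an SNMP agent (hypothetical, for illustration only)."""

    def __init__(self, variables):
        # OIDs are stored as integer tuples so that they sort lexicographically.
        self._vars = {self._parse(oid): value for oid, value in variables.items()}

    @staticmethod
    def _parse(oid):
        return tuple(int(part) for part in oid.split("."))

    @staticmethod
    def _format(oid):
        return ".".join(str(part) for part in oid)

    def get(self, oid):
        """Read one variable by its exact identifier."""
        return self._vars[self._parse(oid)]

    def getnext(self, oid):
        """Return the (oid, value) pair that follows `oid` in lexicographic order."""
        ordered = sorted(self._vars)
        index = bisect_right(ordered, self._parse(oid))
        if index == len(ordered):
            return None  # no variable after this one
        following = ordered[index]
        return self._format(following), self._vars[following]

    def set(self, oid, value):
        """Write one existing variable."""
        key = self._parse(oid)
        if key not in self._vars:
            raise KeyError(oid)
        self._vars[key] = value

    def walk(self, prefix):
        """Iterate a subtree by repeated getnext, as a manager would do."""
        root = self._parse(prefix)
        current = prefix
        while True:
            item = self.getnext(current)
            if item is None or self._parse(item[0])[:len(root)] != root:
                return
            yield item
            current = item[0]


agent = ToyAgent({
    "1.3.6.1.2.1.1.5.0": "router-1",    # sysName.0
    "1.3.6.1.2.1.2.2.1.2.1": "eth0",    # ifDescr.1
    "1.3.6.1.2.1.2.2.1.2.2": "eth1",    # ifDescr.2
})
agent.set("1.3.6.1.2.1.1.5.0", "edge-router")
interfaces = list(agent.walk("1.3.6.1.2.1.2.2.1.2"))
```

The walk helper shows why conceptual tables can be iterated with getnext alone: the manager only needs the column prefix, not the row indexes.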
SNMP/SMI define low-level management operations and a restricted data model. These
characteristics make it difficult to build advanced functionality or to automate management
processes on top of these standards. SNMP is a very successful standard, but its scope is
centered on event notification and on monitoring large numbers of simple devices. In an enterprise
environment, services are the base granularity level, and complex dependencies must be
managed. Because of that, SNMP is not well suited to automating high-level enterprise
management operations, although it remains the reference standard for network management
operations.
However, there are some initiatives, such as the work by Danciu et al. [17], which propose an
approach to aggregate the low-level properties usually managed by the defined MIBs into
coarser-grained Service MIBs, with all the relevant information for the adequate management
of the instrumented service. Their work also proposes a methodology for identifying the
relevant management properties of each service; a language, SISL (Service Information
Specification Language), for expressing the relationships between the low-level resource data
and the service attributes; and an architecture for supporting those concepts. However, this
approach is focused on abstracting services from lower-level elements, and still does not
consider the environment-wide dependencies of the elements running the services, which
somewhat limits its expressivity.
[Figure: excerpt of the CIM Core Model. ManagedElement (Caption, Description and ElementName,
all strings) is related through the associations Component, ConcreteComponent, OrderedComponent,
Dependency, ConcreteDependency, HostedDependency and LogicalIdentity; PhysicalElement and
LogicalElement are detailed in other parts of the Core Model.]
Every CIM element is an extension of the base type ManagedElement, defined in the core.
From this common base the core model defines an initial set of subclasses, such as
PhysicalElement, Configuration, or Product, which serve as abstract definitions for the CIM
extensions. Those concepts are refined in the extensions provided by the profiles, which
provide a management characterization of elements ranging from J2EE application servers to
file system directories and files.
The CIM Application Model extends the base elements with the concepts of the installed
software artifacts. It provides a model similar to the concepts handled by software vendors. A
Software Product is a unit of software acquisition, with all the provider and product
information. Products aggregate a set of Software Features, which are considered the base
units perceived by the users of the software. Each feature is provided by a collection of
Software Elements, which are the base elements actually installed in the system. The model
allows specifying the constraints these artifacts need to work properly in a target environment.
This extension process is applied not only to the core model; profiles can also be further refined
by additional profile extensions. As an example, the application model is further
extended by the Application_J2EE model, which provides a more concrete model of servers and
applications.
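The product, feature and element hierarchy can be pictured with a small data structure. The sketch below is a strong simplification and not the CIM schema itself: the Python class and attribute names are my own, the target_os field stands in for the environment constraints of a software element, and the product data is invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SoftwareElement:
    """Base unit actually installed in a system."""
    name: str
    target_os: str  # simplified stand-in for an environment constraint


@dataclass
class SoftwareFeature:
    """Unit of functionality perceived by the users of the software."""
    name: str
    elements: list = field(default_factory=list)


@dataclass
class SoftwareProduct:
    """Unit of software acquisition, with provider and product information."""
    name: str
    vendor: str
    version: str
    features: list = field(default_factory=list)

    def installable_elements(self, os_name):
        """Names of the elements whose constraint matches the target environment."""
        return [element.name
                for feature in self.features
                for element in feature.elements
                if element.target_os == os_name]


server_product = SoftwareProduct(
    "ExampleServer", "ExampleVendor", "1.0",
    features=[
        SoftwareFeature("web container", [
            SoftwareElement("container-linux", "Linux"),
            SoftwareElement("container-win", "Windows"),
        ]),
        SoftwareFeature("admin console", [
            SoftwareElement("console-linux", "Linux"),
        ]),
    ],
)
linux_elements = server_product.installable_elements("Linux")
```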
[Figure: excerpt of the CIM Application_J2EE model. J2eeManagedObjectCapabilities
(StateManageable, StatisticsProvider and EventProvider, all boolean) extends Capabilities;
J2eeDomain extends AdminDomain; J2eeServer, J2eeApplication, J2eeModule and J2eeResource
extend the Core, System and Application Model classes (ManagedSystemElement, LogicalElement,
EnabledLogicalElement, ApplicationSystem) and are related through the associations
J2eeServerInDomain, J2eeServerUsesJVM, J2eeResourceOnServer, J2eeApplicationModule and
J2eeApplicationHostedOnServer.]
In addition to the environment model defined by the schemas, CIM provides a management
interface for the defined elements. CIM Management operations can either be explicitly
defined, as methods in the modeled objects, or be implicit operations, covering model
instance creation, modification and reading, in a similar way to SNMP operations.
The main strength of CIM is the depth and scope of the specification, as it covers an enormous
range of system management elements, at different degrees of detail. However, because of
the enormous breadth of its scope, it should not be treated as a monolithic standard which
has to be completely supported. Instead, depending on the specific requirements of the
supporting management systems, selected parts of CIM (including the core and base
dependencies) will be implemented, and additional extensions to the model may be defined
over those common concepts.
However, the main strength of CIM can also be seen as its main limitation in some fields, as
the modeling effort for CIM-based models carries a tremendous overhead, and the models must be
continuously updated with the newly appearing requirements of an enterprise runtime
environment.
[Link]. WS-MANAGEMENT
WS-Management is another DMTF standard [24], defined in April 2006. It specifies a
Web Services based protocol between managed resources and the management
infrastructure. Each managed resource has an associated resource class, which dictates its
information model. Although it could theoretically be decoupled, the information model for
WS-Management is the CIM object model. The standard also defines the operations allowed on
the resources: get, put, create and delete resource instances,
iterate over managed collections, subscribe to notifications from the resources, and execute
specific management methods with strongly typed parameters. These operations are similar in
nature and degree of abstraction to their SNMP equivalents. In [90] the performance of
fine-grained WS protocols is compared against the traditional SNMP protocol; the results, as
expected, clearly favor the latter in fine-grained scenarios. Web Services are
better suited to a SOA model, which implies a higher-level, coarser-grained style of management,
but they are not the best option for supporting every operation of a management architecture.
It is always associated with a parent package descriptor. Figure 10 shows the main attributes of
this descriptor.
The Deployment Descriptor contains a set of Atomic Content Elements, which can be of three
different kinds: InstallableUnit, ConfigurationUnit, and LocalizationUnit. Each one of them
defines a set of artifacts. An artifact defines inputs, outputs, variables and types associated
with the artifact files (installation scripts, .sql files, .jar files, zip files…). Each one is processed
by a targetResource, and will generate a set of ResultingResource and ResultingChange
elements. A descriptor can either contain multiple atomic elements or one
CompositeInstallable, which aggregates a set of atomic elements into one logical entity.
Deployment operations are executed over an environment model, which is based on the
concept of resources (ResourceType). Resources have a name, a set of properties and a type
(without an attached topology). Additionally, resources can be nested through a host
relationship. Some of these elements are also targetResources, which process the deployment
artifacts. Those elements are frequently containers which host the processed resources. The
topology includes descriptions of every resource involved in the deployment. A resource
participates if it is required for, created by or modified by the deployment activities. It must be
noted that these resources are “logical”, as they are not actual resources existing in the
deployment domain. The mapping from logical to real resources is out of the scope of this
specification.
The base specification defines a very generic model, without modeling the specific types of
resources which can appear in the environment. However, profiles can extend the base
standard to provide those specific domain models. As an example, the SDD Starter Profile,
although not mandatory, strongly suggests combining SDD with CIM for the domain modeling.
SDD complements CIM by providing a logical model of software components and a set of defined
operations which allow executing deployment operations, whereas the CIM application model
provides a vendor-centric view of the managed software artifacts.
can be validated against the intended state described in the model. The actual
service/system and its model together enable a self-healing service/system ― the ultimate
objective. Models of a service/system must necessarily stay decoupled from the live
service/system to create the control loop.
3. Models are units of communication and collaboration between designers, implementers,
operators, and users; and can easily be shared, tracked, and revision controlled. This is
important because complex services are often built and maintained by a variety of people
playing different roles.
4. Models drive modularity, re-use, and standardization. Most real-world complex services
and systems are composed of sufficiently complex parts. Re-use and standardization of
services/systems and their parts is a key factor in reducing overall production and
operation cost and in increasing reliability.
5. Models enable increased automation of management tasks. Automation facilities exposed
by the majority of services/systems today could be driven by software ― not people ―
both for reliable initial realization of a service/system as well as for ongoing lifecycle
management.
The last characteristic is especially interesting, as it points to a key requirement of service-
centric information models: they must contain sufficient relevant information about the
domain to enable a more automated approach to the management functions. Thus, the model
must characterize not only the elements but also their relationships and
constraints, in order to enable automatic validation.
With these objectives in mind, SML contributes two main extensions to XML Schema. First,
it allows defining references between model instances which span multiple documents,
extending the single-document support of the base schema. This way, the structure of the
complete domain can be validated across multiple, complementary model sources. Second, in
order to further enrich the expressivity of the models, additional constraints can be defined as
rules, which are Boolean expressions that constrain the structure and content of
documents in a model. Rules are defined in Schematron [52] and XPath [10].
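The two ideas, references that cross document boundaries and rules that act as Boolean constraints, can be sketched as follows. This is not SML syntax: the element and attribute names (hostRef, target, memoryMB, minMemoryMB) are invented for the example, and the rule is written as plain Python instead of Schematron, assuming only the XPath subset supported by the standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Two complementary model documents; the second one references the first.
DOCUMENTS = {
    "servers.xml": "<servers><server id='s1' memoryMB='2048'/></servers>",
    "apps.xml": "<apps><app name='billing' minMemoryMB='1024'>"
                "<hostRef target='servers.xml#s1'/></app></apps>",
}


def resolve(model, target):
    """Follow a 'document#id' reference across the documents of the model."""
    doc_name, _, element_id = target.partition("#")
    root = model.get(doc_name)
    if root is None:
        return None
    return root.find(f".//*[@id='{element_id}']")


def validate(model):
    """Check reference integrity and one rule: a host must have enough memory."""
    errors = []
    for doc_name, root in model.items():
        for app in root.iter("app"):
            for ref in app.iter("hostRef"):
                host = resolve(model, ref.get("target"))
                if host is None:
                    errors.append(f"{doc_name}: dangling reference {ref.get('target')}")
                elif int(host.get("memoryMB")) < int(app.get("minMemoryMB")):
                    errors.append(f"{doc_name}: {app.get('name')} does not fit on {host.get('id')}")
    return errors


model = {name: ET.fromstring(text) for name, text in DOCUMENTS.items()}
errors = validate(model)
```

Neither document can be validated in isolation: the memory rule only makes sense once the reference from apps.xml has been resolved against servers.xml.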
Because of its recent public release, the impact of this specification is still low, so it has not
been taken into account for the context of this dissertation. However, given
the relevance of the standards organization and the companies involved in its definition (such
as BEA, Microsoft, HP or IBM), it will probably become a very important solution in the future.
Heterogeneous distributed service management brings a new set of complex factors which
must be addressed by new initiatives, and it has given rise to numerous contributions from the
scientific community. In [112] a classification of the different approaches to software and
services deployment is provided, which can be used as a reference. This taxonomy distinguishes
between manual, script-based, language-based, and model-based paradigms. I will structure
this section following that classification, extending it with behavior-based approaches:
policy and ontology-based management and the autonomic computing paradigm. However,
these categories are not perfect partitions of a set; the selected categories overlap to a
certain extent.
from the target environment, stored in the data center model. The result of this search is a
physical deployment topology, where the logical elements have been assigned to the physical
resources existing in the environment. The final product is an executable deployment and
configuration plan, which is handed to the provisioning system.
The proposed deployment topology generator is the centerpiece of the architecture. It
can be considered an inference engine, similar to the ones presented in the following sections
about Policy-Based Management and Ontology-Based Management.
The most interesting contribution of this approach is the management of complexity not only
through modeling abstraction but also through problem partitioning into different model artifacts.
This approach also has the added benefit of converting human knowledge into models which
can be reused in different contexts, improving process automation. However, it is not clear
how scalable this approach is, nor how the different models are seamlessly converted into
one another. Also, the central element automatically applies multiple model
transformations potentially coming from different sources, which might not be consistent
with one another. Finally, the proposed architecture does not build on any well-established
information model, which greatly complicates its adaptation to different scenarios.
[Link]. Smartfrog
The SmartFrog language [37] is a prototype-based language for distributed systems
configuration. Components are the base language elements, and are described by a set of
properties. Components can have two types of relationships: containment and inheritance.
Inheritance provides a way to address the lack of typing in the components and properties.
The most powerful feature of the model is the capacity to define values by reference, which
may be resolved at definition time or lazily, in the runtime environment, enabling dynamic
configuration of some attributes. Finally, the model also defines some basic operations (e.g.
concat for creating URLs).
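The interplay between prototype inheritance and references can be sketched in a few lines. The code below is not SmartFrog syntax nor its API; Component and Ref are hypothetical Python classes which only mimic the behavior described above: attributes are looked up along the inheritance chain, and a reference is either copied at definition time or resolved lazily on each access.

```python
class Ref:
    """A value defined by reference to another attribute (hypothetical helper)."""

    def __init__(self, name, lazy=False):
        self.name = name
        self.lazy = lazy


class Component:
    """Prototype-based component: a set of attributes plus an optional parent."""

    def __init__(self, extends=None, **attributes):
        self.extends = extends
        self.attributes = {}
        for key, value in attributes.items():
            self.define(key, value)

    def define(self, key, value):
        # Eager references are resolved once, at definition time.
        if isinstance(value, Ref) and not value.lazy:
            value = self.lookup(value.name)
        self.attributes[key] = value

    def lookup(self, key):
        component = self
        while component is not None:
            if key in component.attributes:
                value = component.attributes[key]
                if isinstance(value, Ref):
                    # Lazy references are resolved on access, starting
                    # from the component on which the lookup was made.
                    return self.lookup(value.name)
                return value
            component = component.extends
        raise KeyError(key)


web_server = Component(port=8080, host="localhost")
production = Component(
    extends=web_server,
    host="www.example.org",
    admin_port=Ref("port"),              # copied now: 8080, inherited
    public_port=Ref("port", lazy=True),  # resolved on each lookup
)
production.define("port", 80)            # overrides the inherited value later
```

After the last line, admin_port still holds the value seen at definition time, whereas public_port follows the overridden port, which is the kind of dynamic configuration that lazy resolution enables.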
A system is defined as a collection of applications running over a distributed collection of
computing resources. An application is a collection of components, defined statically with a
descriptor or generated dynamically at run-time. Finally, components are Java objects that
implement a specific API, which binds them to the SmartFrog component model lifecycle.
Components may create and manage other objects, including processes and programs written
in other languages. This characteristic couples the framework tightly to
the managed solutions, as it is necessary to define a description model, as well as a SmartFrog
Java component for each managed element. Smartfrog defines a distributed deployment
infrastructure, where the infrastructure elements read component descriptions, instantiate
them, and manage the lifecycle of the created applications and components.
In [81] two extensions to SmartFrog are described, which extend the runtime management
functionality of the deployment infrastructure. Anubis extends the automatic discovery
mechanisms of the deployment infrastructure (keeping the runtime information accurate),
and continuously monitors the system for failures in the agents. Woodfrog is a runtime
configuration snapshot extension: it records several states of the system and allows
reverting to stable configurations. The combination of these extensions with the base
functionality makes it possible to add autonomic behavior to SmartFrog-managed systems.
SmartFrog presents some very interesting ideas, especially regarding the expressiveness of
configuration properties and dependencies, which are also built from a simple base. However,
its high level of intrusiveness, as well as the considerable effort required to apply it to a very
heterogeneous environment, hampers its applicability.
[Link]. CHAMPS
The CHAMPS System [56] is a prototype under development at IBM Research for CHAnge
Management with Planning and Scheduling, providing an execution platform for IT Change
Management activities (one of the disciplines of the ITIL best practices). CHAMPS automatically
creates executable plan workflows that apply the modifications specified in a Request For
Change (RFC) document.
CHAMPS consists of two main elements: The Task Graph Builder (TGB) and the Planner &
Scheduler (P&S). TGB evaluates temporal and location dependencies between the elements
involved in the change operation, using as inputs the specified RFC document, the deployment
descriptors, expressed as SDD elements, and a runtime dependency model of the system
which contains remote dependencies among elements. With all that information, the system
produces a Task Graph, which is an abstract workflow defining tasks, their dependencies, and
time estimations for each one of them. This model does not contain specific information about
the runtime.
The P&S converts the Task Graph into a Change Plan, taking into account the defined policies and
SLAs. The Change Plan is another workflow, defined in WS-BPEL, with tasks assigned to specific
runtime elements, and deadline estimations. This executable plan tries to maximize the parallel
execution of activities [57], thanks to the use of Constraint Satisfaction Problem techniques in
the logic of the P&S. As a consequence, workflows have enhanced expressivity for temporal
constraints, adopting the temporal constraints available in Gantt diagrams. Tasks
can thus depend on each other in four ways: Finish-to-Start (FS), task A must finish before
task B begins; Start-to-Start (SS), task B cannot start until task A does; Finish-to-Finish (FF),
task B cannot finish before task A does; and Start-to-Finish (SF), task B cannot finish until
task A starts.
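The four dependency types reduce to simple inequalities between the start and finish times of two tasks. The following minimal sketch checks a candidate schedule against a list of such constraints; it only validates a schedule and is in no way the constraint solver used by CHAMPS. The task names, the integer time units and the helper names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    start: int
    finish: int


# Each check receives task A and task B of a constraint "A -> B".
CHECKS = {
    "FS": lambda a, b: b.start >= a.finish,   # A must finish before B begins
    "SS": lambda a, b: b.start >= a.start,    # B cannot start until A does
    "FF": lambda a, b: b.finish >= a.finish,  # B cannot finish before A does
    "SF": lambda a, b: b.finish >= a.start,   # B cannot finish until A starts
}


def violations(schedule, constraints):
    """Return the constraints (kind, A, B) that the schedule does not respect."""
    return [(kind, a, b)
            for kind, a, b in constraints
            if not CHECKS[kind](schedule[a], schedule[b])]


constraints = [
    ("FS", "stop_server", "install_app"),
    ("SS", "install_app", "update_db"),
    ("FF", "update_db", "install_app"),
    ("FS", "install_app", "start_server"),
]
schedule = {
    "stop_server": Task(0, 2),
    "install_app": Task(2, 6),
    "update_db": Task(2, 5),    # runs in parallel with the installation
    "start_server": Task(6, 7),
}
problems = violations(schedule, constraints)
```

Note how the SS and FF constraints allow install_app and update_db to overlap, which a plan restricted to FS dependencies could not express.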
The researchers have compared the benefits of the automation provided by CHAMPS with a
manual approach [55], taking as reference the installation and configuration of the
SPECjAppServer, a multi-tier J2EE application which serves as benchmark for the different
application server vendors. The metrics obtained show a significant reduction in human
intervention, and a decrease of about 33 percent in the required installation time.
[Link]. RollbackITCM
One of the most frequent assumptions for configuration plans is the assured correct execution
of each activity. However, this assumption makes failures in the configuration operations leave
the system in an unstable state. The paper [71] proposes a general architecture for dealing
with failures during the execution of the ITIL change management process.
Conceptually, rollback execution is obtained by applying the concepts of atomicity and
transactions to the execution of configuration change plans. In order to do so, it builds on the
BPEL process definition engine base mechanisms, marking initially some activities as atomic,
and at a later stage, processing the initial plan in order to generate a rollback one.
This proposal clearly describes how to cope with failures in the execution of activities,
although it still imposes a very strong requirement on the actual operations: each atomic
operation must have a reverse one.
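The requirement of a reverse operation per atomic activity becomes evident in a minimal sketch of the idea. The code below does not use BPEL and is not the architecture of [71]; it is a plain Python illustration in which every step of a plan carries its own undo action, and a failure triggers the undo actions of the completed steps in reverse order. All names and the example plan are hypothetical.

```python
def execute_with_rollback(plan, state):
    """Run (name, do, undo) steps; on failure undo the completed ones in reverse.

    Returns (True, None) on success or (False, name_of_failed_step).
    """
    done = []
    for name, do, undo in plan:
        try:
            do(state)
        except Exception:
            for undo_step in reversed(done):
                undo_step(state)
            return False, name
        done.append(undo)
    return True, None


def set_key(key, value):
    """Build an operation that sets one configuration value."""
    def step(state):
        state[key] = value
    return step


def failing_restart(state):
    raise RuntimeError("restart failed")


plan = [
    ("stop service", set_key("running", False), set_key("running", True)),
    ("upgrade", set_key("version", 2), set_key("version", 1)),
    ("restart", failing_restart, set_key("running", False)),
]
state = {"running": True, "version": 1}
succeeded, failed_step = execute_with_rollback(plan, state)
```

In this run the third step fails, so the upgrade and the stop are undone and the state returns to its initial value. An operation without a meaningful undo action (for instance, one that deletes data) cannot be handled this way.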
the defined action. There is a generalization of this model, called Event Trigger Response [5]
which separates the three base elements, allowing rule definition by composition of the
separate parts.
The IETF and DMTF have jointly defined a policy model and architecture [80]. This model
defines policies as simple condition rules, in the form of if <condition(s)> then <action(s)>.
The trigger part of the ECA paradigm is left open for the policy managers. The model is object-
oriented, extending CIM core classes, and can be serialized to both XML format and LDAP-like
tabular format.
We can see in the next figure a typical architecture of a policy-based system. A policy
management tool allows creating, modifying and deploying policies, storing them in a policy
repository (PR). Policy Decision Points (PDP) communicate with the repository, interpret the
policies and transmit the results to the Policy Enforcement Points (PEP). Finally, PEPs apply the
policies to the system. The roles of PEP and PDP are often taken by the same device. An event
monitoring system sends monitored events from the environment to the PDPs to trigger the
execution of relevant policies.
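The division of responsibilities between the repository, the PDP and the PEP can be sketched as follows. This is a toy illustration under my own assumptions: policies are if-condition-then-action rules held in a Python list acting as the repository, events are dictionaries, and actions are simple (setting, value) pairs applied to a dictionary that stands for the managed system. None of these names come from the IETF/DMTF model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Policy:
    """if <condition> then <actions>, in the spirit of condition-action rules."""
    name: str
    condition: Callable[[dict], bool]
    actions: list  # list of (setting, value) pairs


class PolicyDecisionPoint:
    """Interprets the policies of the repository for each monitored event."""

    def __init__(self, repository):
        self.repository = repository

    def decide(self, event):
        decisions = []
        for policy in self.repository:
            if policy.condition(event):
                decisions.extend(policy.actions)
        return decisions


class PolicyEnforcementPoint:
    """Applies the decisions to the managed system."""

    def __init__(self, system):
        self.system = system

    def enforce(self, actions):
        for setting, value in actions:
            self.system[setting] = value


repository = [
    Policy("scale out under load",
           lambda event: event.get("cpu_load", 0) > 0.8,
           [("replicas", 3)]),
    Policy("throttle on slow responses",
           lambda event: event.get("response_ms", 0) > 500,
           [("max_sessions", 100)]),
]
system = {"replicas": 1, "max_sessions": 500}
pdp = PolicyDecisionPoint(repository)
pep = PolicyEnforcementPoint(system)

# An event coming from the monitoring system triggers the evaluation.
pep.enforce(pdp.decide({"cpu_load": 0.93, "response_ms": 120}))
```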
PBM can be applied to the configuration of distributed systems to bring high-level
requirements, such as SLAs or load balancing policies, into the management loop. The behavior of
a system can be partially determined by the specific policies, hiding the complexity of the
system from the managers.
[Link]. PlanIT
The PlanIT system [4] is an automatic configuration change planner for distributed systems. It
supports both initial provisioning planning as well as dynamic reconfiguration planning. PlanIT
uses an environment model derived from ADL. The model defines components, connectors
and machines (places where components are deployed). Every model component has a state.
PlanIT uses PDDL for defining these environments as well as the plan model.
In order to construct a configuration plan the following input is required: the domain
elements, the initial state, the goal state, and basic utility functions. These elements contain
not only the environment topology information but also additional component properties such
as operation metrics or SLA requirements. When the collected information is sufficient to
generate a plan, PlanIT calls LPG (a PDDL planner) and iteratively obtains valid deployment
plans. When the timeout expires, PlanIT chooses the best plan among those obtained. The
generated plan is a sequence of steps, scheduled in specific time intervals, with an
estimated duration for each step. There is a different sequence of operations for each
machine in the environment.
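The "obtain plans until the timeout, then keep the best one" strategy can be sketched independently of the planner. The code below does not call LPG nor handle PDDL; the planner is stood in for by any Python iterator that yields candidate plans, the utility function is supplied by the caller, and the clock parameter is an assumption added so that the example runs deterministically.

```python
import time


def best_plan_within(planner, utility, timeout_s, clock=time.monotonic):
    """Consume candidate plans until the timeout expires; return the best one seen."""
    deadline = clock() + timeout_s
    best = None
    for plan in planner:
        if best is None or utility(plan) > utility(best):
            best = plan
        if clock() >= deadline:
            break
    return best


# Stand-in for an iterative planner: each candidate is a list of steps.
candidates = [
    ["stop A", "move A", "start A", "restart B"],
    ["stop A", "move A", "start A"],
    ["move A"],
]


def fewer_steps(plan):
    """Example utility function: shorter plans are preferred."""
    return -len(plan)


# A fake clock that advances one unit per call keeps the example deterministic:
# with a timeout of 2 units only the first two candidates are examined.
fake_clock = iter(range(100)).__next__
chosen = best_plan_within(iter(candidates), fewer_steps, timeout_s=2, clock=fake_clock)
```

Here the third and shortest candidate is never examined because the timeout expires first, which illustrates that the quality of the selected plan depends on the time budget granted to the planner.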
The combination of PDDL and a planner builds on the same foundation as policy-based
management, applying well-known techniques from the intelligent systems domain to the
automatic inference of operations from the stated problem. The main limitation of this approach
is the use of a proprietary environment model described in PDDL. The effort required to
describe a heterogeneous enterprise system in sufficient detail severely limits its
applicability, as do the already mentioned difficulties of establishing exact timing for the
execution of the operations. Additionally, the authors provide an analysis of PlanIT-LPG
performance when generating plans for increasingly large environments, which shows the
scalability limitations of an automatic planner solution.
knowledge about the contents of one or more related subject domains throughout the life cycle
of its existence. These entities and relationships are used to represent knowledge in the set of
related subject domains. Formal refers to the fact that the ontology should be representable in
a formal grammar. Explicit means that the entities and relationships used, and the constraints
on their use, are precisely and unambiguously defined in a declarative language suitable for
knowledge representation. Shared means that all users of an ontology will represent a concept
using the same or equivalent set of entities and relationships. Subject domain refers to the
content of the universe of discourse being represented by the ontology.
An ontology commitment represents a selection of the best mapping between the terms in an
ontology and their meanings. Hence, ontologies can be combined and/or related to each other
by defining a set of mappings that define precisely and unambiguously how one node in one
ontology is related to another node in another ontology.
As ontologies are bound to a subject domain, if the definition is narrowed to the domain of
network and systems management it becomes the following:
An ontology for network and system administration is a particular type of ontology whose
subject domain is constrained to the administration of networks and systems. Administration is
defined as the set of management functions required to create, set up, monitor, adjust, tear
down, and keep the network or system operational. One or more ontologies must be defined
for each device in the network or system that has a different programming model.
The definition clearly shows how ontologies can help in the management of enterprise
systems. Ontologies can in principle provide a better modeling of the managed environment,
as well as the management operations. Ontologies can also enable interoperability and a
reference frame for management devices from different vendors and protocols.
Many of the advantages ontologies provide have already been provided by environment and
management operations models. However, the expressivity of ontologies goes beyond standard
object-oriented models, resulting in these additional capabilities:
- Ontologies allow greater expressiveness in object relationships compared to standard
  modeling such as UML (dependencies, associations, aggregations, and inheritance).
  With ontologies it is possible to express relationships such as time (past, future),
  synonyms and antonyms.
- As is the case with expert systems, in an ontology-based system the initial
  knowledge can be automatically extended through reasoning with the available facts.
- Ontologies provide the necessary means to identify the same concepts expressed in
  different formats (e.g., two equivalent management interfaces of servers from
  different vendors).
There are several specific languages for defining ontologies, such as Semantic Web’s OWL [19].
However, models and ontologies are not completely separated domains. It is possible to enrich
some well-established models so that they comply with the requirements to become
ontologies. In [67] the CIM metamodel is extended using the Object Constraint Language (OCL)
[117] to contribute the strictness required for an ontology definition.
However, the adoption of ontologies in heterogeneous domains has been slower than
expected, as they demand a considerably greater modeling effort, requiring behavioral
In order to alleviate this interoperability problem, this work proposes the adoption of
ontologies. Each management protocol is defined as an ontology. On top
of that, mappings between the different ontologies are defined, so that all of them form part of
the same knowledge base. This way, an ontology-based manager can work with the
aggregated knowledge from all the systems managed by heterogeneous protocols. As
can be seen in the picture, the proposed ontologies allow a mapping between the main
network management information models.
[Link]. PMAC
Researchers from the IBM Watson Research Center have developed PMAC (Policy
Management for Autonomic Computing) [2], an autonomic manager (AM) whose behavior is
controlled by policies. The PMAC architecture consists of a policy definition tool (PDT) for
creating policies; the policy editor storage (PES), which provides policy deployment and
persistence; the AM; and the managed resource (MR) sensors and actuators (MR libraries) for
policy enforcement. The AM manages one or more MRs through the deployed sensors and actuators.
PMAC policies are defined in ACPL (Autonomic Computing Policy Language), an XML-based
rule definition language. ACPL rules are composed of four components: condition, action,
priority, and role. The addition of a role allows scoping the policy to a subset of the resources
controlled by the autonomic manager, improving its scalability.
ACPL expressivity is enhanced by ACEL (Autonomic Computing Expression Language) [1], an
XML-based language for expressing conditions. PMAC also accepts externally defined policies
in CIM-SPL [3].
The AM provided by PMAC builds upon a policy ratification engine. This component uses
specialized strategies to make policy resolution more efficient and to improve its
automation capabilities. These techniques are: dominance check, which detects whether
a new policy has no effect at all on the behavior of the system; conflict check, which detects
conflicting conditions in policies; coverage check, which measures the variety of events
addressed by the defined policies; and conflict resolution, which assigns priorities to
different policies.
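To make the four rule components (condition, action, priority, role) more concrete, the following Python sketch shows role-based scoping followed by priority-based conflict resolution. It is only an illustration under stated assumptions: ACPL itself is XML-based and is not reproduced here, the `Policy` class and `select_action` function are hypothetical names, and treating a larger number as a higher priority is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Mapping, Optional, Set


@dataclass(frozen=True)
class Policy:
    name: str
    role: str                                        # scopes the policy to some resources
    condition: Callable[[Mapping[str, float]], bool]
    action: str                                      # hypothetical action identifier
    priority: int                                    # assumption: larger value wins


def select_action(
    policies: Iterable[Policy],
    resource_roles: Set[str],
    state: Mapping[str, float],
) -> Optional[str]:
    """Return the action of the highest-priority policy that applies."""
    # Role scoping: only policies whose role matches the resource are evaluated.
    scoped = [p for p in policies if p.role in resource_roles]
    # Condition evaluation against the observed resource state.
    triggered = [p for p in scoped if p.condition(state)]
    if not triggered:
        return None
    # Conflict resolution: several policies may fire; the priority decides.
    return max(triggered, key=lambda p: p.priority).action
```

Filtering by role before evaluating any condition is what keeps the number of evaluated policies small for each resource, which is the scalability argument made in the text.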
PMAC is a prime example of the synergies between policy-based management and autonomic
computing. It also proposes some valuable approaches to improve the performance of
automatic policy triggering and resolution, whose application to complex systems is
frequently limited by scalability problems.
[Link]. FOCALE
A similar approach is adopted by FOCALE (Foundation Observation Comparison Action Learn
rEason) [53], an autonomic network manager implemented with ontology-based policies.
FOCALE is designed to deal with the main current challenges in network management:
managing heterogeneous functionality, adapting to changes in user requirements and the
environment, and integrating learning and reasoning techniques into network management.
FOCALE reasons over an information model based on DEN-ng (Directory Enhanced Networks -
Next Generation) [108], an object-oriented information and policy model for
telecommunications. This model is complemented with Finite State Machines for modeling
behavior and ontologies for embodying semantic information.
The FOCALE architecture adds a Model-Based Translation Layer (MBTL) between the AM and the
MRs. This layer is in charge of translating vendor-neutral commands to resource-specific
operations as well as converting resource-specific data to the internal model handled by
FOCALE, isolating the AM from all the vendor-specific details. The implemented prototype
defines a configuration DSL using XText from the OpenArchitectureWare Eclipse project [28].
Models are later converted to OWL with EODM from EMF.
The AM also implements two control loops: a maintenance control loop, used when the
system works normally, and an adjustment control loop, used when there are changes in the
defined policies. Control loops are driven by ECA policies, which are executed with the Drools
inference engine.
The FOCALE architecture is a good example of how to implement an autonomic manager over a
potentially heterogeneous environment, as well as another example of the use of policies to
implement an autonomic manager.
actuations on the MR such as deploying a new instance of the application in a node nearby to
absorb an increase in the number of requests.
This research line highlights the convenience of evaluating policy-based autonomic
management for the domain of enterprise systems and service management.
[Link]. DACAR
Another interesting initiative in the field of autonomic management is DACAR (Distributed
Autonomous Component-based Architecture) [25]. The main objective of this autonomic
manager is dealing with the heterogeneity and dynamics of managed environments.
DACAR models managed resources with the OMG D&C model. On top of that, policies defined
in ECA (Event-Condition-Action) style are deployed. DACAR implements two control loops: one
for endogenous events (coming from the knowledge base, such as a new policy defined) and
another one for exogenous events (coming from the environment, such as a new node
appearing in the system). Because of the selected domain model, DACAR has been successfully
applied to CORBA systems, which are the de facto PSM defined for D&C. However, that fact
complicates the adoption of this approach in environments with a more heterogeneous
infrastructure.
intent is the complete automation of the configuration operations, by taking into account the
characteristics of the managed components and the available runtime resources.
In order to support these objectives, the work proposes an architecture framework built over
CORBA services, which is formed by three main functions: a mechanism for dependency
management and representation, a hardware resource management service, and an automatic
configuration service that dynamically instantiates components using the other two. A general
view of the main architecture components is depicted in Figure 19.
The architecture reasons over components, ensuring that the logical and runtime
dependencies are satisfied whenever they are instantiated. These requirements are defined in
component descriptors with the XML-based Simple Prerequisite Description Format (SPDF). In
order for the components to be supported by this architecture, developers have to contribute
these descriptors. However, it is unclear how hardware requirements can be properly defined
without a common ontology for characterizing the required hardware elements, or at least a
shared taxonomy for referring to them with the same names.
The resource management service instruments the monitored elements, providing an updated
model of the available resources, and reasons about the available resources in order to
allocate the desired component instantiations. It has a hierarchical structure, with local
agents (Local Resource Managers) and a control point, the Global Resource Manager. The Automatic
Configuration Service involves both elements in order to deploy new components to the
system or to perform dynamic reconfiguration of the components in case some changes occur in
the environment.
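The hierarchical allocation described above can be pictured with a small Python sketch. It does not reproduce the CORBA services or the SPDF descriptors of the analyzed work: the class names reuse the roles named in the text, while the requirement keys (such as `ramMB`) and the first-fit placement strategy are assumptions made only for illustration.

```python
from typing import Dict, List, Optional


class LocalResourceManager:
    """Tracks the resources still available on one node."""

    def __init__(self, node: str, available: Dict[str, int]) -> None:
        self.node = node
        self.available = dict(available)

    def can_host(self, requirements: Dict[str, int]) -> bool:
        # A missing resource kind counts as zero available units.
        return all(self.available.get(k, 0) >= v for k, v in requirements.items())

    def reserve(self, requirements: Dict[str, int]) -> None:
        for kind, amount in requirements.items():
            self.available[kind] -= amount


class GlobalResourceManager:
    """Control point that consults the local managers."""

    def __init__(self, local_managers: List[LocalResourceManager]) -> None:
        self.local_managers = local_managers

    def allocate(self, requirements: Dict[str, int]) -> Optional[str]:
        # First-fit placement (an assumption of this sketch, not of the source).
        for manager in self.local_managers:
            if manager.can_host(requirements):
                manager.reserve(requirements)
                return manager.node
        return None
```

The sketch also makes the criticism in the text visible: both sides must agree on the names of the requirement keys, which is exactly what a shared taxonomy or ontology of hardware elements would provide.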
Unfortunately, the scope of the work makes it infeasible to translate it directly to
heterogeneous enterprise environments, as it lacks a supporting modeling base, and the
prototype tests are built over a CORBA / 2K operating system infrastructure. However, the
reasoning approach of this architecture can serve as a reference for addressing some of the
requirements of the domain of work.
2.4. Conclusion
In this section I have analyzed the most relevant initiatives in the field of network and services
management. The proposals have been evaluated for their suitability to the domain of
enterprise services. In order to provide the initial context, some basic service management
concepts have been defined, and complemented with an overview of the most relevant
business-level service management processes, ITIL and eTOM. Those specifications define a
conceptual framework of the organizational context for the technical problem discussed in the
rest of the analysis.
The analyzed standards aggregate the experience gathered from different sectors over the
years (mainly network management and IT administration). However, none of them provides a
complete model for enterprise service management, including the characterization of the
environment resources and the possible operations. Standards are created and evolve for
different domains, but they have not yet caught up with the requirements of a service-based
environment. In order to efficiently manage complex heterogeneous systems, one mandatory
requirement is the availability of a uniform representation of every service-relevant
configuration element. This model should allow describing distributed systems, runtime
services, their configuration and the available operations over the environment.
This chapter also presents an analysis of numerous research initiatives for distributed service
management. By looking at the general picture the first identified commonality is that every
proposal is built over a representation model, which can be either the already mentioned
standards or self-developed models. Usually these models are designed to cover the new
requirements of services. Also, they usually have greater expressivity and can solve the
identified problems. However, without the modeling work provided by the existing standards,
many of these proposals can’t be efficiently applied to heterogeneous systems. Moreover,
complex models imply a considerable modeling effort which impacts the advantages provided
by these solutions. The most promising initiatives provide transformation mechanisms to
combine the knowledge provided by existing standards with additional expressivity and
adapted scope to solve their needs.
By looking at the functional proposals, it is clear that the main concern is the automation of the
management operations. Multiple examples have been presented which show how
approaches such as autonomic computing or policy-based management can partially automate
some of the service management processes. However, no analyzed proposal provides a
complete automated solution to the problem. Some of them address only part of the service
lifecycle (e.g., software deployment [34][93], or network and device configuration management
[2][53]). On the other hand, several contributions can support the complete problem, but use
ad-hoc information models, rather than following a standards-based solution [4][29][37][69].
Because of that, their applicability is restricted to their specific domains, and they cannot be
efficiently translated to support complex, heterogeneous enterprise domains. Therefore, none
of the analyzed proposals can handle the diversity of changes that can happen to an enterprise
infrastructure and automatically react with the required changes.
3. Objectives
The topic of this PhD thesis is the automated management of distributed enterprise services.
For this to be achieved, the complete life cycle of the managed resources (including initial
provisioning, initial configuration, reconfiguration and removal of no longer needed
components) must be supported. After the analysis of the state of the art in these topics I
concluded that no existing solution addressed all those challenges at once
while at the same time supporting different heterogeneous environments.
The objective of this work is to propose a set of information model abstractions and
techniques which allow automating the deployment and configuration activities of enterprise
services distributed over heterogeneous environments. In addition to the metamodel
definitions for abstracting all the relevant management information and the set of functions
for supporting the management operations, the proposal must also provide guidelines on how
to develop management systems which can make use of the base contributions.
In order to better establish the context where this work will be carried out, I will show the
graphical representation proposed by Hegering for reflecting the dimensions of technical
management systems. The shaded area delimits the scope on which this work is focused.
Figure 20 PhD Thesis Scope Definition
[Diagram of the management dimensions. Functional areas: Security, Performance, Accounting,
Configuration, Fault. Network types: Internet, Corporate Network, WAN, MAN, LAN. Stages: Change,
Operation, Installation, Planning. Management level: Enterprise Management, System Management,
Network Management. Information types: including Video and Multimedia.]
By looking at these dimensions of management, the first categories which can easily be
determined by analyzing the nature of the targeted domain are the type of networks involved
and the kind of information exchanged. Enterprise environments are managed at LAN level,
because security restrictions greatly limit the external communications of the
environment. The type of exchanged information is purely data. Regarding the level of
management, the work on this PhD dissertation is focused on applications and service
management, whereas in a functional classification the main functional area is
configuration. Fault management must also be covered, as the architecture must be able to
diagnose the correct functioning of the system and react accordingly if an incident is
detected. Finally, regarding the management stages, depending on the type of situation the
proposed solution should support initial installation of applications and services, as well as
further operation and changes to the environment.
By breaking down the general objective into specific tasks the following specific objectives of
the PhD work have been identified:
To propose a reference architecture and validate the proposed models and algorithms
The described models and algorithms must be supported by a proposed reference
architecture, which can support the management activities over multiple, different
heterogeneous distributed environments. It must interact with the rest of the enterprise
infrastructure, instrumenting the physical elements of the environment and also
communicating with the rest of information management systems from the enterprise.
Also, a working prototype of these elements will be developed and validated against a set of
representative cases to verify that it addresses the objectives presented in this chapter.
The main requirement for the selected language is that it be expressive enough to model the
required information as well as the relationships between the elements, which has led me to
discard the less structured, most basic options.
If the existing standards are taken as a reference, it is clear that current trends point to
object-oriented models, such as the ones defined by CIM and OMG D&C. The constructs provided by
UML class diagrams constitute powerful modeling concepts not only for the information but
also for characterizing the relationships through the well-known abstractions of composition,
association or inheritance.
These factors have motivated the selection of an object-oriented metamodel, as it combines
powerful expressivity and easy access to already available modeling knowledge (e.g., CIM
modeling abstractions). From the available object-oriented languages I have selected Ecore
[106]. This language is a simplified subset of MOF (Meta Object Facility), the main
metametamodeling language promoted by the OMG as part of the MDA model [59]. Ecore is
designed to define metamodels (in this case, belonging to the structure of the information
models). Ecore contains the same constructs and expressions seen in class diagrams (such
as object, attribute, method, composition, or inheritance), while at the same time it does not
bring the additional, often overlooked complexity of complete UML models (mainly derived
from the dynamic aspects of the specification, unsuited for defining an information model).
Ecore is a mature language, considered the de facto standard of MDA-based tools, such as the
numerous projects available at the Eclipse Modeling Project.
Ontologies were the other main alternative as the base modeling language, but were
discarded for several reasons. The effort required to model the characteristics
of heterogeneous, complex systems with ontologies was a deterrent factor. This was aggravated by
the lack of available information model standards defined with ontologies, so an additional
effort would be required if those concepts were to be incorporated into the proposed model.
Nonetheless, there are several research works which show how object-oriented models can be
transformed from and to ontologies with additional formalization of the base models [68], which
does not rule out this possibility. Regardless of the format, clearly the additional knowledge
contained in the management-specific ontologies described in the previous chapter must be present
in one way or another in the complete management architecture.
Before providing a detailed information model I will define the management modeling
abstractions that will enable the specific definitions. These abstractions will be applicable to
any management information model, not being specific to the modeling of enterprise concepts. The
definitions include a characterization of the basic managed elements, the resources, and their
most important characteristics and relationships.
4.3.1. Resource
As every analyzed management information model builds on the concept of resource, it will be
the first definition of the proposed model: resources are the base unit of abstraction for a
management view. This way, a managed environment is composed of a set of resources,
which comprise all the relevant information for an effective operation of the environment. In a
service management architecture, the most representative elements modeled as resources are
services. However, every other relevant software and hardware artifact, such as operating
systems, containers, DLL or JAR libraries, TCP ports, processor speed, RAM, or
peripherals, will also be modeled as resources in the management view of the system.
An efficient operation of resources requires the availability of sufficient information about
the state and configuration of each resource. In order to model this knowledge I will add a set
of properties, composed of name-value pairs, to characterize the configuration of every
resource. This solution is the most widespread in current management specifications. If, for a
specific subtype of resources, one property turned out to be mandatory for all
instances, it could be promoted to a named attribute of the class. Some examples of candidate
properties for promotion are the status information of runtime resources with a defined
lifecycle, or the version of software resources. Basically, if every similar resource should have
this characteristic, the state information should be moved to the class definition. However, I
will try to keep the number of defined types to a minimum in order to keep the model
simple and flexible.
A management environment is composed of a very large number of resources, drawn from a
potentially infinite set of them. However, many of those possibilities actually represent
different states of the same resource. For instance, at two different points in time, a resource
representing the RAM memory is completely identical, except for a variation in the property
‘freeMemory’. These two resources represent the same manageable element, a piece of
hardware, in spite of having a different value for a property. Clearly, a mechanism is required
for establishing resource identity and differentiating unique entities within the infinitely
large set of possible modeled resources. The concept of resource identity is managed in the
proposed model with a meta-attribute of every resource, called UID (Universal Identifier),
which will always be different for two distinct resources. This allows determining whether two
resources are the same just by comparing their UIDs. I have called it a meta-attribute, as its
value is not independently defined, but is derived from the evaluation of several resource
properties. This way, a resource is completely characterized by the collection of properties it
contains, and a specific subset of them also defines the resource identity. The set of
properties which are relevant for the identity will differ from one resource to another. In the
following sections I will provide additional restrictions to the identity concept. Figure 21
provides a graphical representation of the abstractions of the base resource model.
[Figure: base resource model. A Resource (name: string, id: Uid) holds 0..* Property elements
(name: string, value: string) through the 'properties' association.]
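The base resource abstraction can be sketched in Python as follows. This is only an illustration of the idea of a UID derived from a subset of the properties; it is not the Ecore model of the thesis. The `identity_keys` field and the use of a SHA-256 digest to build the UID are assumptions introduced for this sketch.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Resource:
    name: str
    properties: Dict[str, str] = field(default_factory=dict)
    # Hypothetical: names of the properties that take part in the identity.
    identity_keys: Tuple[str, ...] = ()

    @property
    def uid(self) -> str:
        """Derive the UID from the identifying properties only."""
        parts = [f"{key}={self.properties[key]}" for key in sorted(self.identity_keys)]
        return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


# Two observations of the same RAM module at different points in time:
ram_before = Resource(
    "ram",
    {"slot": "DIMM0", "sizeMB": "4096", "freeMemory": "1024"},
    identity_keys=("slot", "sizeMB"),
)
ram_after = Resource(
    "ram",
    {"slot": "DIMM0", "sizeMB": "4096", "freeMemory": "512"},
    identity_keys=("slot", "sizeMB"),
)
same_element = ram_before.uid == ram_after.uid  # freeMemory does not affect identity
```

Because the UID is computed rather than stored, it cannot drift from the property values it depends on, which matches the description of the UID as a meta-attribute.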
service definition and a running instance of the service over a container). This relationship is
analogous to the object-oriented distinction between logical classes and runtime instances.
Logical and runtime resources establish a partition of the complete resource set. However,
there is no intrinsic information which is common to all logical resources, nor to all runtime
resources, although there will be subsets of each of them and relationships between resources
which can only be applied to logical or runtime elements. Because of that, from now on, I will
specify whether each definition applies to one or both groups.
In addition to that, a base relationship exists between a runtime element and its logical
definition. A runtime element is an instance of its logical definition, and these definitions are
instantiated at concrete places in the runtime structure. However, this does not necessarily
imply that runtime elements contain all the information defined at the logical level. As
resources are abstractions defined for effective management, there might be invariant
information that, although relevant for the logical definition, would be redundant as part of the
runtime view of these instances (for instance, information about the licenses and vendors of an
installable library). However, there is a minimum set of common information between a logical
resource and its instances, composed of all the identifying attributes of the logical
definition. These shared attributes make it possible to univocally identify the corresponding
logical definition of an instance. On the other hand, the UID of a runtime resource will include
additional properties, which allow distinguishing different instances.
The logical-runtime duality will not occur for every managed resource. A dual definition (logical
and runtime) of resources only makes sense for elements whose complete lifecycle (including
instantiation to and removal from the environment) is controlled by the management
architecture. There will be runtime resources without a logical definition, as
non-instantiable resources (which can’t be added to or removed from the runtime
environment) are only relevant to the management architecture as runtime instances.
However, the opposite is not true. It makes no sense for a management model to include
logical resources without a corresponding runtime realization.
[Figure: resource model diagram. Resource (name: string, id: Uid) with 0..* 'properties' of type
Property (name: string, value: string).]
Although it is not reflected explicitly in the model, there is another key difference between
these two kinds of managed resources: logical resources can exist by themselves, whereas
runtime resources must be contained by another element of the runtime environment. For one
logical resource defined (e.g., a characterization of an XML parsing library), multiple runtime
instances can exist in the environment. This placement is the last factor for defining the
identity of runtime resources: Two runtime resources are actually the same resource if all their
identifying fields share the same values, and their placement in the environment is the same.
Instantiation is an operation that takes as parameters a logical resource, and a host element
from the environment, and creates a runtime resource at that point which is an instance of the
logical definition. I will provide a more formal definition and description of resource hosts in
the following section.
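The instantiation operation and the resulting runtime identity can be sketched in Python as shown below. The class and function names, the representation of a host by a plain string identifier, and the tuple-based UIDs are assumptions of this sketch, chosen only to keep the example short; the resource host is defined more formally later in the text.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

LogicalUid = Tuple[Tuple[str, str], ...]


@dataclass
class LogicalResource:
    name: str
    properties: Dict[str, str]
    identity_keys: Tuple[str, ...]   # hypothetical: identifying property names

    def uid(self) -> LogicalUid:
        return tuple((k, self.properties[k]) for k in sorted(self.identity_keys))


@dataclass
class RuntimeResource:
    name: str
    properties: Dict[str, str]
    logical_uid: LogicalUid          # links the instance to its logical definition
    host_id: str                     # hypothetical identifier of the hosting element

    def uid(self) -> Tuple[LogicalUid, str]:
        # Runtime identity: identifying fields plus placement in the environment.
        return (self.logical_uid, self.host_id)


def instantiate(logical: LogicalResource, host_id: str) -> RuntimeResource:
    """Create a runtime instance of a logical definition at a given host."""
    # Only the identifying properties are copied; invariant data such as
    # license or vendor stays at the logical level.
    identifying = {k: logical.properties[k] for k in logical.identity_keys}
    return RuntimeResource(logical.name, identifying, logical.uid(), host_id)
```

With these definitions, instantiating the same logical resource at two different hosts yields two runtime resources with different UIDs, while the shared `logical_uid` still points back to the single logical definition.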
4.3.3. Composition
I have already defined the basic characteristics of the resources, and how the management
information can be adapted to a homogeneous model. However, in many cases, the
expressivity allowed by a set of properties is not enough to adequately model all the
information related to a managed element. Because of that, I will introduce another extended
type of resource, with enhanced expressiveness.
A CompositeResource is a resource which presents its internal management information in a
more structured view, through the use of additional composed resources. The individual
resources are indivisible from the main one. CompositeResources can be either logical or
runtime. As regards instantiation, sub-resources cannot be instantiated individually, but all of
them will be present whenever the main resource is instantiated. It would be theoretically
possible to define CompositeResources whose internal resources were also composite (and so
on), but those cases don’t have a clear application, so they will be substituted by additional
resource relationships. The next figure shows the CompositeResource model.
[Figure: CompositeResource model. A CompositeResource is a Resource composed of 1..* 'resources';
a Resource (name: string, id: Uid) holds 0..* 'properties' of type Property (name: string,
value: string).]
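A minimal Python sketch of the CompositeResource abstraction follows. It assumes the 1..* multiplicity for the composed resources and enforces the decision not to nest composites; the `instantiate` method, which copies the whole structure at once, is a hypothetical helper introduced to illustrate that sub-resources are never instantiated individually.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Resource:
    name: str
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class CompositeResource(Resource):
    resources: List[Resource] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.resources:
            raise ValueError("a composite needs at least one sub-resource")
        if any(isinstance(r, CompositeResource) for r in self.resources):
            # Nested composites are replaced by resource relationships in the model.
            raise ValueError("nested composites are not allowed in this sketch")

    def instantiate(self) -> "CompositeResource":
        """Copy the composite together with all of its sub-resources."""
        return copy.deepcopy(self)


datasource = CompositeResource(
    "datasource",
    {"jndiName": "jdbc/orders"},
    [
        Resource("connection-pool", {"maxSize": "20"}),
        Resource("driver", {"className": "org.example.Driver"}),
    ],
)
instance = datasource.instantiate()
```

The example names (`datasource`, `connection-pool`, `driver`) are invented for illustration and do not come from the thesis.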
Composition is the most basic structural relationship, as the resulting element is a single
resource. However, in a runtime environment there are additional relationships which must be
traced between distinct resources from the environment. The next two sections describe those
two relationships: the previously mentioned containment relationship, and the more general
dependency relationship.
application server, and even virtual node instances are managed by a virtualization manager or
hypervisor. In the remainder of this work I will use the terms ‘container’ and ‘host’ to refer to
the containing elements.
At runtime, a resource host provides an execution context, providing several capabilities and
configuration to the existing units. From a management perspective, the containment
relationship also indirectly enables the operations consisting of adding resources to or
removing them from the runtime environment (or, to be more precise, a host of the runtime
environment). After that introduction I will provide a simple definition of the hosting
relationship: “A (runtime) resource A is hosted by another (runtime) resource B if A exists only
inside the execution context of B”. This way, if B is removed A must also be removed from the
environment. The hosting relationship cannot be symmetrical. The containment relationship
can only occur between two runtime resources, as it makes no sense in the logical domain.
After describing those concepts I will translate them to the model. First, it is necessary to
define a resource subtype for the runtime instances, the RuntimeResource. As already
mentioned in the previous sections, runtime identity is obtained from the combination of
the logical UID and the containing resource in the environment. A HostResource, in turn,
is a RuntimeResource which contains (aggregates, in OO terminology) a collection of hosted
runtime resources. This general definition will be specialized in the specific models, where
specific kinds of hosts will only aggregate specific subclasses of resources, as I will detail in
later sections.
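[Figure: hosting model. A Resource (name: string, id: Uid) holds 0..* properties, each a Property (name: string, value: string). RuntimeResource is a Resource subtype, and a HostResource is a RuntimeResource that aggregates RuntimeResources through the resources association (multiplicity 1..* in the diagram).]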
The containment relationship defines exactly the identity of runtime resources. I have already
mentioned that two different runtime resources can have an identical logical image and still
be unique. Two runtime resources contained by different hosts are necessarily
different, as each runtime resource can only have one host. Within the same host, two
resources are distinct only if their logical identities differ.
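The hosting and identity rules above can be illustrated with a minimal Python sketch. The class layout, the method names (`add`, `remove`, `runtime_identity`) and the choice of raising `ValueError` are assumptions made for illustration only; they are not part of the proposed model.

```python
class RuntimeResource:
    """Runtime instance; identity = logical UID + containing host (illustrative)."""

    def __init__(self, uid):
        self.uid = uid      # logical UID, shared with the logical definition
        self.host = None    # the single host containing this resource

    def runtime_identity(self):
        # The host is compared by object identity, not by value.
        return (self.uid, id(self.host))


class HostResource(RuntimeResource):
    """A runtime resource that aggregates hosted runtime resources."""

    def __init__(self, uid):
        super().__init__(uid)
        self.hosted = []    # a host may exist with no hosted elements

    def add(self, res):
        if res.host is not None:
            raise ValueError("a runtime resource can only have one host")
        if any(h.uid == res.uid for h in self.hosted):
            raise ValueError("logical identity must be unique inside a host")
        res.host = self
        self.hosted.append(res)

    def remove(self, res):
        # Removing a host also removes everything that exists only inside it.
        if isinstance(res, HostResource):
            for child in list(res.hosted):
                res.remove(child)
        self.hosted.remove(res)
        res.host = None
```

In this sketch two resources with the same logical UID can coexist as long as they live in different hosts, while a second instance with the same UID inside one host is rejected.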
It must be noted that, although very similar, there is no possible confusion between
HostResources and CompositeResources. Composite resources are indivisible, and they are
treated as a single entity. HostResources can exist at the runtime environment without any
hosted element, and contain a variable number of hosted elements. On the other hand,
CompositeResources always appear with the same set of internal resources.
At this point, I will define an additional runtime element, which has already been informally
mentioned. The Environment is a special kind of RuntimeResource. There is exactly one
instance of this element, and it is not contained by any other resource. The Environment has no
logical definition, as it represents the runtime management domain. Every other information
element will be part of a hierarchy rooted at it. As the hosting relationship is mandatory
for every runtime resource, it changes the topology of the managed environment from a flat
set to a directed tree (a single-origin directed acyclic graph). The root of the tree is the
managed environment, and every runtime resource is part of it under a hierarchical structure.
This concept has already been applied by standards such as SDD (without any specification) or
OMG D&C (although it is actually implemented by the Target, Node and Resource classes, it is
not defined as a general relationship).
Management environments will have runtime resources which are simultaneously hosts of
other resources, and are hosted by another element from the environment. Figure 25 shows a
sample management view of an enterprise node, modeled as a set of interrelated resources.
Every element is a resource. Darker tone rectangles represent host resources, and the
elements on top of them are the contained elements. We can see in this example several cases
of host-hosted resources, building the intermediate layers of the management tree. Partially
overlapping resources represent parts of the CompositeResources existing at the moment the
snapshot was taken.
Figure 25 Sample model of a managed node with Hosted and Composite Resources
After all those concepts have been defined I can finally provide a definition for Configuration.
Let us define $R$ as the complete set of RuntimeResources. If $\mathcal{P}(R)$ is defined as all the possible
combinations of RuntimeResources, then $\mathcal{P}(R)$ is the complete set of possible
configurations of the environment.
An environment configuration is a representation of all the management-relevant
information from an administration domain. It is composed of the collection of
RuntimeResources, organized in a containment hierarchy tree, whose root is the Environment.
[Figure: a BoundResource holds 1..* bindings to RuntimeResources.]
Dependencies add an extra layer of complexity to the processes of change management and
impact assessment. However, they are very frequent in runtime environments, so the
management architecture must be able to model and handle them properly.
can be absent from the runtime instances, this way being evaluated against its logical
definition. For runtime-only resources, that cannot be the case.
The stability condition is composed of a set of validations against the environment
configuration. It can check for existence / non-existence of another RuntimeResource from the
environment, or check that the value of a specific property of a RuntimeResource is within a
defined range of accepted values. The stability condition can be decomposed into a set of
individual checks, each one defining a valid set of values for a property or attribute of any
resource. This way, the stability of a single RuntimeResource from the environment can be
expressed as a single logical formula, checking that each stability condition is true for the current
environment configuration.
Each individual stability check is a function which will be evaluated against several variables
from the complete environment configuration. Depending on the scope of the analyzed
environment, the following categories can be identified:
Local conditions only evaluate the properties of the RuntimeResource where the
condition is defined. They are the simplest ones, but they allow expressing concepts such as
checking that the initial settings for an application have been defined.
Hosting conditions are evaluated against any property or attribute of the contained
RuntimeResources. They propagate to additional elements other than the one defining
the check, but they are constrained to the hierarchical runtime resource structure.
Constraint conditions can refer to any characteristic of the containing execution
context, which is composed of the host resource, as well as its own hosts, ending at
the environment. The context provides the contained resources with a set of properties
and resources, where specific elements will be required to appear in order to ensure
the stability of the hosted elements. The sub-resources aggregated by composite
resources are also considered in the search scope of a constraint condition, as they are
part of the host resource. However, the same is not the case for other contained
elements of the hosts, which are outside the scope of this type of condition.
Dependencies are the remaining type of conditions. They can be evaluated against a
characteristic of any resource from the environment configuration, regardless of its
place in the containment hierarchy. It is clear from the dependency and binding
definitions that any resource can be bound to any other one, which in most cases would
not produce stable configurations, hence the need to define constraints on these
relationships. These restrictions constitute the most important subset, because they
greatly impact configuration management processes.
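The four search scopes can be sketched as simple tree traversals. The following Python fragment is only an illustration: the `Node` class and the function names are hypothetical, and the sub-resources of composite hosts (which also belong to the constraint scope) are left out for brevity.

```python
class Node:
    """Hypothetical runtime resource placed in the containment tree."""

    def __init__(self, name, parent=None):
        self.name, self.parent, self.children = name, parent, []
        if parent is not None:
            parent.children.append(self)


def execution_context(r):
    """Hosts of r, from its direct host up to the Environment."""
    context, node = [], r.parent
    while node is not None:
        context.append(node)
        node = node.parent
    return context


def descendants(h):
    """Resources directly or indirectly contained by h (depth first)."""
    found = []
    for child in h.children:
        found.append(child)
        found.extend(descendants(child))
    return found


def search_scope(r, kind):
    """Resources a condition of the given kind, defined by r, may inspect."""
    if kind == "local":
        return [r]
    if kind == "hosting":
        return descendants(r)
    if kind == "constraint":
        return execution_context(r)
    if kind == "dependency":
        root = r
        while root.parent is not None:
            root = root.parent
        return [root] + descendants(root)   # the complete configuration
    raise ValueError(f"unknown condition kind: {kind}")
```

Only the dependency scope grows with the size of the whole environment; the other three are bounded by the position of the resource in the hierarchy.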
The next picture shows a generic runtime configuration where several examples of the
different types of conditions are shown. Each shape represents a stability condition, and the
stars indicate the configuration element it is being validated against. Different shapes (square,
triangle, diamond and circle) identify the different types of relationships. The shaded
background boxes represent the environment configuration subset where each specific
condition can look for values. Dependency checks have a shared search space, which
comprises the complete environment configuration.
[Figure: generic runtime configuration showing examples of Local, Hosted, Constraint and Dependency conditions and their checked elements, over RuntimeResources and HostResources within the Environment.]
In the case of CompositeResources and HostResources, the stability condition must be true for
every participating element, including the main resources and all the contained / composed
resources. If we apply that operation recursively to an environment configuration we obtain
the definition of the stability of the complete environment. An Environment is stable if and
only if all the runtime resources which are part of its current configuration are stable. This way,
a single formula can be defined for validating the stability of an environment configuration,
composed of the validation of each condition defined in its resources.
It must also be mentioned that RuntimeResources are stable by default. Therefore, resources
without stability conditions will be stable. This can be expressed by assigning them a
stability function that is always true. Because of that, these resources can be ignored in the
complete stability function of the environment, although they may still satisfy the conditions
defined by the rest of the runtime resources.
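As a hedged illustration of how the environment stability formula composes, the following Python sketch treats a configuration as a flat list of dictionaries and each condition as a predicate. This data layout and the helper `requires` are assumptions made for the example, not the representation used by the model.

```python
def resource_stable(resource, configuration):
    """A resource is stable when every one of its conditions holds.

    With no conditions, all() over an empty sequence is True, which
    mirrors the 'stable by default' rule.
    """
    return all(check(resource, configuration)
               for check in resource.get("conditions", []))


def environment_stable(configuration):
    """The environment is stable iff all of its runtime resources are."""
    return all(resource_stable(r, configuration) for r in configuration)


def requires(name):
    """Hypothetical condition: some resource with this name must exist."""
    def check(resource, configuration):
        return any(r["name"] == name for r in configuration)
    return check
```

A resource without conditions never makes the environment unstable, yet it can still be the element that satisfies a condition declared elsewhere, as the `requires` helper shows.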
This way, for management purposes, the only relevant runtime resources are the elements
from the current configuration.
For the space of logical resource definitions it is necessary to define the finite subset of
relevant elements. An LRB (Logical Resource Base) is a subset of the potentially infinite space of
logical resources, which contains only the available resource definitions of a management
context. As is the case with the environment configuration, management processes can only
reason with the logical elements defined in the LRB. Similarly to configurations, the contents
of the LRB can also change over time, due to factors external to the management system.
The definition of configuration and LRB as the only relevant resources constitutes the first
heuristic for addressing the inherent complexity of the problem. The outcome of every
management operation will be based only on the LRB and the current configuration. These
concepts are assumed in every management system, but have also been explicitly defined
along the proposed information model.
same type, they will share a set of properties, although it is not enforced in the way
subclassing would do.
[Figure: resource model with types. A Resource (name: string, type: rType, id: Uid) holds 0..* properties, each a Property (name: string, value: string).]
The type definition is applicable to both logical and runtime resources. Moreover, as resource
type is an identity property, the same type must be shared by logical resources and runtime
instances. This is a key concept that enables simultaneous reasoning over logical and runtime
resources.
After providing the base definition of the concept of types, I will detail how that information
will be modeled. Resource types are of the kind rType, which supports the concept of
hierarchical resource classifications. A resource can only have one value for its type. However,
depending on the degree of detail applied in its identification, a resource could have many
different types (e.g., a Windows service and a database are clearly both software resources,
but their actual characteristics are not quite the same, implying they belong to different types).
In order to enable greater expressivity, types aggregate hierarchically, some of them being
further restrictions of already existing ones, which allows resources to belong simultaneously
to multiple subsets. This way, types achieve a complete hierarchical clustering of the
complete space of resources. For the remainder of this document, I will represent types as
Strings, with a format similar to Java packages. E.g., the type [Link] is a
specialization of the type [Link].
[Figure: sample type hierarchy rooted at the type sw, with four resources A, B, C and D placed in nested type subsets according to their types.]
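To make the idea of nested type subsets concrete, here is a small Python sketch. The dotted type names used in the example (`sw.server.database` and similar) are hypothetical, chosen only because the text describes a format similar to Java packages; the function names are likewise illustrative.

```python
def enclosing_types(rtype):
    """All types a resource of type rtype belongs to, most general first."""
    parts = rtype.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def members(resources, rtype):
    """Names of the resources that belong to the subset defined by rtype.

    resources maps a resource name to its single, most specific type.
    """
    return sorted(name for name, t in resources.items()
                  if rtype in enclosing_types(t))
```

Each resource keeps a single type value, but that value places it in every subset along its path to the root of the hierarchy.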
Types also allow restricting the set of potential configurations by defining a type taxonomy
which declares the types supported by the architecture. That taxonomy cannot be
extracted automatically by any algorithm; it is defined instead through
an analysis of the domain by experts, performed at the adequate abstraction level of the
MIN(p, val)
Description: Checks that the value of a resource property is over a minimum
accepted value
Checked element:
Additional arguments:
Formula:
MAX(p, val)
Description: Checks that the value of a resource property is below a maximum
accepted value
Checked element:
Additional arguments:
Formula:
ATT(p, val)
Description: Checks that the value of a property is equal to an expected value
Checked element:
Additional arguments:
Formula:
SEL(p, range)
Description: Checks that the value of a property belongs to a set of accepted
values
Checked element:
Additional arguments:
Formula:
In order to collectively refer to the four defined base checks I will define the local check LOC(p),
which can be any of them.
The local checks allow restricting the values of the resource’s properties. In order for a
resource r to be locally stable, every local check $LOC(p_i)$ must be met at the same
time. This way, the local stability function has the following shape:
$$S_{local}(r) = \bigwedge_{i} LOC(p_i)$$
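The four base checks and their combination can be sketched in Python as follows. This is an illustration under stated assumptions: properties are held as a name-to-string dictionary (matching the string-valued Property of the model), numeric values are parsed with `float`, and the MIN and MAX bounds are taken as inclusive, which the descriptions do not settle.

```python
def MIN(props, p, val):
    """Property p exists and its numeric value is at least val (assumed inclusive)."""
    return p in props and float(props[p]) >= val


def MAX(props, p, val):
    """Property p exists and its numeric value is at most val (assumed inclusive)."""
    return p in props and float(props[p]) <= val


def ATT(props, p, val):
    """Property p exists and equals the expected value."""
    return p in props and props[p] == val


def SEL(props, p, accepted):
    """Property p exists and belongs to the set of accepted values."""
    return p in props and props[p] in accepted


def locally_stable(props, checks):
    """Conjunction of all local checks, each given as (function, property, argument)."""
    return all(check(props, p, arg) for check, p, arg in checks)
```

For instance, a resource with `{"heap_mb": "512", "mode": "prod"}` would pass a MIN of 256 on `heap_mb` together with a SEL of `{"prod", "test"}` on `mode`, and would fail as soon as any single check is not met.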
I will also define the complementary set, which is the set of runtime resources directly or
indirectly contained by a HostResource. Starting from the designated host, it can be seen as a
sub tree of the complete configuration hierarchy. That set of resources will be known as
descendants from this point onwards, taking the name from tree graph theory.
Constraint restrictions defined by a resource can be satisfied by any resource from its
execution context, without the need to explicitly reflect in the model which one will be
satisfying the constraints. However, before providing additional restrictions on the identified
resources, it is necessary to provide the means for defining these resource identification
functions.
In order to do so, I will introduce base checks for expressing constraints over a resource
identity. I will first define the general check, which potentially analyses every identifying
property of the resource. More specific functions can be defined at a later stage which only
use some concrete identifying fields to make the matching.
RESID(r, idCheck)
Description: Checks that the resource identity belongs to the identified subset
Checked element:
Additional arguments:
Formula:
The local checks previously defined can also be used to further restrict a resource filter defined
through a RESID function, by imposing additional requirements over the resource properties.
However, there are many unstable situations that cannot be expressed with the current
functions. The local base functions define checks that are evaluated independently. They don’t
allow representing constraints over the access to shared resources, where there is competition
for consuming a limited resource (possibly partitioning it, as with RAM, or
completely locking it, as with a TCP port of the machine). Those situations match the
Quantity (of value 1) and Capacity D&C satisfier kinds, which were previously mentioned.
Both identified cases will be supported by additional base checks. First, I will define a new
element for allowing the expression of D&C Capacity restrictions. The special characteristics of
this function imply that it can’t be individually evaluated at the resource constraint analysis,
but must be collectively evaluated at the hosting level, as each new definition can alter the
verdict on the previously processed restrictions on the same property.
CAP(p)
Description: Checks that the value of a property is large enough to be used by all
consuming resources, each of them requiring a quantity cons(p)
Checked element:
Additional arguments:
Formula:
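The collective nature of the capacity check can be illustrated with a short Python sketch. The function signature and the example quantities are assumptions; the only point taken from the text is that all consumers of a property are evaluated together at the hosting level.

```python
from typing import Dict


def CAP(capacity: float, consumption: Dict[str, float]) -> bool:
    """True when the provided capacity covers the sum of every cons(p).

    consumption maps each consuming resource to the quantity it requires.
    The check cannot be decided per consumer: each new consumer can
    change the verdict for all the previous ones.
    """
    return sum(consumption.values()) <= capacity
```

With a capacity of 4096 and consumers requiring 2048 and 1024 the check holds; adding a third consumer that requires 2048 makes it fail for the whole group, even though each request fits on its own.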
As the second competitive check, instead of adopting the D&C Quantity function I will provide
a check for demanding exclusive resource reservation (e.g. TCP port process bindings). This
alternative check has several advantages over the D&C concept. The restriction is expressed
over the complete resource, instead of just on the value of a specific property of the
resource. This simplifies the modeling of these real concerns as it is no longer necessary to
define a virtual property named quantity with an integer value of 1, which in the proposed
approach is simply substituted by the existence of the RuntimeResource.
EXCL(r,rid)
Description: Checks that the identified resource is demanded exactly once
Checked element:
Additional arguments:
Formula:
The available functions allow detailing what must be present at the execution context for a
runtime resource to be stable. However, those constraints cannot express incompatibilities
between two resources, in other words, what must NOT appear at the context if the
demanding resource is instantiated. This concept is present in some of the analyzed standards
– they are named ex-requisites in the SDD standard. Because of its relevance I will define it as
another base function, defined on top of RESID.
NOT(r, rid)
Description: Checks that the identified resource is not present at the execution
context of the mentioned resource
Checked element:
Additional arguments:
Formula:
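A possible reading of the exclusive and incompatibility checks is sketched below in Python. The argument shapes are assumptions: exclusive demands are represented as (demanding resource, demanded resource id) pairs collected for one execution context, and the context is represented as a set of resource ids.

```python
def EXCL(rid, exclusive_demands):
    """True when the identified resource is demanded exactly once.

    exclusive_demands: iterable of (demanding_resource, demanded_rid)
    pairs gathered from every resource of the execution context.
    """
    return sum(1 for _, demanded in exclusive_demands if demanded == rid) == 1


def NOT(rid, context_ids):
    """True when the identified resource is absent from the execution context."""
    return rid not in context_ids
```

Two processes demanding the same TCP port would make EXCL false for that port, without any need for a virtual quantity property set to 1.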
After the base concepts have been established, it is now possible to define the constraint
stability function of a runtime resource. However, before that it is necessary to define how
that information can be added in the model, as the conditions cannot be implicitly derived
from the current information. I will use the term ConstrainedResource for referring to the
RuntimeResources which either define constraints or have a corresponding logical element
which defines them. Both logical and runtime resources can define constraints, but only
runtime resources can be analyzed for stability. Resources can declare any number of
constraints.
[Figure: constraint model. Constraint (id: resid, kind: ResourceConstraintKind) and PropertyConstraint (name: string, kind: PropertyConstraintKind, value: string), linked through the constraints and propCons associations (multiplicities 1..* and 0..* in the diagram). PropertyConstraintKind: Min = 0, Max = 1, Atr = 2, Sel = 3, Cons = 4. ResourceConstraintKind: None = 0, Not = 1, Exclusive = 2.]
A constraint definition is composed of two parts. One mandatory part identifies the resource
to be evaluated, which must belong to the execution context. Optionally, one or more
constraint checks can be defined over the identified resource, which allows expressing property
restrictions pCons (including both local checks and CAP checks), exclusive consumption of the
resource or incompatibility with it. Taking that information into account, we can define the
stability function corresponding to each check with the following function:
It has been described how the resource type completely defines the group identity information
of a resource. Therefore, checking the type information will be enough to determine
compatibility of hosted resources. The need to restrict the potential guest resources can be
clearly seen: with no limitations, any RuntimeResource can appear over any HostResource.
However, it would make no sense for an operating system (a HostResource) to host a physical
node. With this in mind, the hosting stability condition will be based on checking the types of
the resources to be contained.
Figure 31 shows how that restriction has been captured in the model. HostResources will
specify an additional property, defining the supportedTypes of the contained resources. The
property includes a list of compatible resource types. This way, the contained resources do not
define the condition; it is their types which will be validated. As the proposed definition of
inheritance for specific models builds over the base resource type hierarchy, the same
restriction can be applied to the specific subclasses, which will share the same base type.
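[Figure 31: RuntimeResource (name: string, type: rType, id: Uid, properties: Property[*]) and HostResource (supportedTypes: rType[*]), which aggregates RuntimeResources through the resources association (multiplicity 1..* in the diagram).]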
In order to support automatic validation of the hosting constraints, I will first define a specific
resource identity check for verifying that a resource belongs to a set of accepted types. As
types define a simple hierarchy, the check will be true whenever the type of the evaluated
resource is contained in the one provided as a parameter:
RESTYPE(r, typeCheck)
Description: Checks that the resource belongs to a type / subtype
Checked element:
Additional arguments:
Formula:
The provided definition of the supported types host stability check can be easily validated, and
can be applied at different management levels and heterogeneous resources, thus helping to
reduce the management complexity while capturing at the same time the underlying
conditions on the configuration stability. It has been defined for every host resource with the
property supportedTypes, thus avoiding the need to define it explicitly for each element.
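The hosting stability check based on supportedTypes might be sketched as follows. The dotted type names in the usage note are hypothetical, and treating the type check as a prefix match on dot-separated segments is an assumption derived from the Java-package-like format of types.

```python
def RESTYPE(rtype, type_check):
    """True when rtype is type_check itself or one of its specializations."""
    return rtype == type_check or rtype.startswith(type_check + ".")


def hosting_stable(supported_types, hosted_types):
    """Every hosted resource must match at least one supported type.

    A host with no hosted elements is trivially stable.
    """
    return all(any(RESTYPE(t, s) for s in supported_types)
               for t in hosted_types)
```

Under these assumptions, a host declaring `["sw"]` accepts a resource of type `sw.server.database` and rejects one of type `hw.node`, which captures the operating system and physical node example without defining the condition on the hosted resources themselves.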
Locally visible resources can be accessed by other resources which share the same host
as the providing one.
Host visibility resources can be accessed only by the directly contained resources of
the host, but not by their own descendants.
Context visible resources are accessible to any resource belonging to the execution
context of the providing resource.
Regarding composite resources, the visibility of compound resources is evaluated starting from
the main resource. The following picture shows examples of these four visibility types in a
sample environment. The small figures depict runtime resources, with the type of shape
showing the visibility of the resource. The subset of the environment where that resource can
be accessed is shown by a square frame of the corresponding tone. Environment-wide
resources have a shared scope equal to the complete configuration.
[Figure: sample environment illustrating the Local, Host, Context and Environment visibility types over RuntimeResources hosted in the Environment.]
After introducing the concept of visibility to the resource model, it is very simple to define the
visibility base check:
VISIBLE(r, s)
Description: Checks that one runtime resource can access another resource
Checked element:
Additional arguments:
Formula:
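One way to read the visibility check is sketched below in Python. Several points are interpretations rather than statements from the text and should be treated as assumptions: host visibility is read as "the provider is the direct host of the accessing resource", and context visibility as "the provider appears anywhere in the chain of hosts of the accessing resource". The `Res` class is hypothetical.

```python
class Res:
    """Hypothetical runtime resource with a host and a visibility kind."""

    def __init__(self, name, host=None, visibility="environment"):
        self.name, self.host, self.visibility = name, host, visibility


def hosts_of(r):
    """Chain of hosts of r, from the direct host up to the root."""
    chain, node = [], r.host
    while node is not None:
        chain.append(node)
        node = node.host
    return chain


def VISIBLE(r, s):
    """True when resource r can access the providing resource s."""
    if s.visibility == "environment":
        return True                    # shared by the whole configuration
    if s.visibility == "local":
        return r.host is s.host        # r and s share the same host
    if s.visibility == "host":
        return r.host is s             # assumption: only direct children of s
    if s.visibility == "context":
        return s in hosts_of(r)        # assumption: any descendant of s
    raise ValueError(f"unknown visibility: {s.visibility}")
```

The difference between the host and context kinds only shows up for resources nested more than one level below the provider.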
[Figure: Resource R, with property p, bound through the binding b to property q of Resource S.]
The concept of bound configuration can be defined by using the existing base checks. If the
value of property p of the dependent runtime resource r is bound to the configuration value of
the property q from the bound runtime resource s through the binding b, this check can be
expressed as ATT(r.p, s.q).
In order to explicitly represent those restrictions in the model I will extend the concept of
DependentResources. These elements define a variable number of dependencies, each of them
composed by a binding to another RuntimeResource, and a resource identification which must
be met by the bound runtime resource. In addition to that, these elements can define
BoundProperties, whose value will match the value of a property of the bound resource, whose
name is specified by the BoundProperty. These concepts are illustrated in the following
picture.
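[Figure: dependency model. A DependentResource holds 1..* dependencies and 0..* boundProps. Each Dependency (id: resid) has a binding to a RuntimeResource; each BoundProperty (ref: string) has a binding to a Property (name: string, value: string) among the 0..* properties of that RuntimeResource.]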
After defining the required base functions to completely evaluate dependency stability, I will
provide the general function. As was the case for constraints, first each dependency
definition must validate the identity of the bound resources. Visibility will also be evaluated.
Finally, the configuration of the bound properties will also be verified. Taking into account
those factors, the dependency stability function of a resource can be obtained as follows:
Let r be a RuntimeResource with defined dependencies $d_i$, each with a binding $b_i$ and
defined BoundProperties $p_j$, which are bound to the properties $q_j$ of the bound
resources.
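The three factors named above (identity of the bound resource, visibility, and bound property values) can be combined as in the following Python sketch. The dictionary layout of a dependency and the way identity and visibility checks are passed in as callables are assumptions made to keep the example self-contained.

```python
def dependency_stable(r_props, dependencies, visible):
    """Dependency stability of a resource r with properties r_props.

    Each dependency is a dict with (hypothetical) keys:
      'bound'       the bound resource s, as {'name': ..., 'properties': {...}}
      'id_check'    predicate over s, standing in for the identity check
      'bound_props' mapping of property p of r to property q of s
    visible(s) stands in for the visibility check between r and s.
    """
    for dep in dependencies:
        s = dep["bound"]
        if not dep["id_check"](s):
            return False
        if not visible(s):
            return False
        for p, q in dep["bound_props"].items():
            # Equivalent to ATT(r.p, s.q): both values must exist and match.
            if p not in r_props or q not in s["properties"]:
                return False
            if r_props[p] != s["properties"][q]:
                return False
    return True
```

A resource with no dependencies is stable with respect to this function, consistent with resources being stable by default.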
integrated into dependency expressions in order to further restrict the resources verifying
resource identity.
Table 1 Participating Primitives of Stability Functions
different (at least the version will change), but they have a strong shared identity, further than
the likeness reflected by sharing a type. This way, starting from the set defined by all the
resources of a common type, additional subsets can be identified by grouping resources with
the same name. Each resource from that subset will be at least different from the others by its
version. Unfortunately, beyond type and name, it is not possible to establish exactly what
set of properties is shared between resources of two different versions. Moreover, the
commonality will possibly change between two different pairs of resources taken from the same
subset. It must be noted that, as an identity property, version is a common aspect of logical
and runtime resources, and will be shared between logical definitions and runtime instances.
The combination of name, version, and type provides a framework strong enough for
establishing a managed resource identity, built upon the abstractions from the main standards.
No additional properties have been detected which influence the identity, so these three will
be the only ones evaluated. However, it must also be considered that the concept of version,
although prevalent, is not defined in every resource from the management domain. In those
cases the identity definition of those resources will be determined just by its type and name;
the latter one enclosing the individual information of the resource.
Figure 36 shows the updated resource model and the extended type for representing
resources with version information. VersionedResources have a version attribute, of type
rVersion, for identifying members of a family of resources with similar characteristics. For
VersionedResources, version must be evaluated as part of its identity, as two
VersionedResources with the same name and type, but different versions, are guaranteed to be
separate entities.
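[Figure 36: updated resource model. A Resource (name: string, type: rType) holds 0..* properties, each a Property (name: string, value: string). VersionedResource is a Resource subtype with the attribute version: rVersion.]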
The exact format of the rVersion type can’t be comprehensively defined at this abstraction
level, because the actual format of versions differs depending on the specific domain of the
work, the characteristics of the resources (their type), or the company policy. However, each concrete
format shares some key characteristics. A thorough analysis of the main versioning strategies
and their most important differences is presented in [12]. From that comparison it can be
extracted that any version space can be represented as an acyclic graph. As an additional
restriction, I will consider the domain of possible rVersion values to be a partially ordered set.
This way, version ranges can be defined, which are intervals representing subsets of the
complete rVersion domain, defined by lower and upper limits. When referring to intervals, I will use
traditional mathematical notation to represent version ranges, with the form
[(lower_bound,upper_bound)]. In the following definitions, I will use the following notation to
specify that a version v is included in a VersionRange range:
In order to close this heuristic I will provide an updated definition of the RESID base check,
detailing all the possible checks on the three definitions.
RESID(r, idCheck)
Description: Checks that the resource identity belongs to the identified subset
Checked element:
Additional arguments:
Formula:
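An identity check over name, type and version range could look like the Python sketch below. It makes simplifying assumptions that go beyond the text: versions are dot-separated integers compared as tuples (a total order, whereas the model only requires a partial order), ranges are written as strings such as `[1.0,2.0)`, and the keys of `id_check` are hypothetical names.

```python
def parse_version(text):
    """'1.4.2' -> (1, 4, 2); assumes purely numeric, dot-separated versions."""
    return tuple(int(part) for part in text.split("."))


def in_range(version, version_range):
    """Interval membership; '[' and ']' are inclusive, '(' and ')' exclusive."""
    lower_inclusive = version_range[0] == "["
    upper_inclusive = version_range[-1] == "]"
    low, high = (parse_version(b.strip())
                 for b in version_range[1:-1].split(","))
    v = parse_version(version)
    above = v >= low if lower_inclusive else v > low
    below = v <= high if upper_inclusive else v < high
    return above and below


def RESID(resource, id_check):
    """Every identity field present in id_check must match the resource."""
    if "name" in id_check and resource["name"] != id_check["name"]:
        return False
    if "type" in id_check:
        t = id_check["type"]
        if not (resource["type"] == t or resource["type"].startswith(t + ".")):
            return False
    if "version_range" in id_check:
        # Resources without version information cannot satisfy a version check.
        if "version" not in resource:
            return False
        if not in_range(resource["version"], id_check["version_range"]):
            return False
    return True
```

Fields left out of `id_check` are simply not evaluated, so the same function covers checks by type only, by type and name, or by the full triple.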
The previous sections have defined a solid foundation for defining information models,
complemented with several heuristics which address the potential complexity of the
management activities. The proposed abstractions allow characterizing the runtime
information and its structure, as well as the logical assets related to the system. On top of the
elements definition, a series of checks have been defined which can be evaluated to determine
the stability of a configuration. The concepts, abstractions and heuristics proposed are not
specific to enterprise environments, being applicable to any management information model.
Over the remainder of this chapter, I will propose a specific information model that builds
upon these established concepts and applies them to the domain of enterprise services
configuration and deployment. As is the case with the base abstractions, the specific model
will also be aligned to the base structural aspects of the D&C, CIM, SDD and WSDM standards
which were described in the State of the Art analysis.
Building on those base concepts, I will detail the fundamental elements of the management
information model. Starting from the concept of resources previously described, I will identify
special classes of resources which constitute the main domain concepts for the management
of distributed enterprise services. The model will cover both runtime elements, describing the
state of the management environment, and logical elements, representing the configuration
resources and characteristics which can be instantiated by the management architecture. Both
definitions will build upon the final iteration of the resource concept as it was depicted in
Figure 32.
I will start by defining how to characterize the central elements of the management
architecture: the services. The logical information model presented in the next section defines
how they are represented according to the base modeling abstractions, so that both their main
characteristics and their requirements for working correctly in the runtime environment are
sufficiently represented. Once these concepts have been properly established, I will propose
the specific information model for the elements of the runtime environment. The runtime
model must support the definition of distributed systems containing several types of servers
(e.g. application servers, business rule managers, database management systems and
enterprise orchestration services), as well as the structure of the environment and the
characteristics it provides as the execution environment for the services. The specific
model will build upon concepts such as resource identity and types, so that it can
characterize the wide variety of environments present in this domain without the need to
define a specific subclass for each new supported resource.
Version has been defined at DeploymentUnit level instead of at the base Resource level
because, as was mentioned during the definition of the version identity, it cannot be applied
to every resource (e.g. a TCP port, or a graphics card). However, it must be supported for the
base Resources which require version information. For those elements, instead of defining a
specific subclass, which would complicate the model, it is enough to define a resource
property named version. This way, restrictions over the Resource version are evaluated
against this property.
[Figure: UML class diagram of the DeploymentUnit logical model. A Resource (name: string,
type: string, visib: visibKind) has properties (Property: name: string, value: string).
ContextAwareProperty (expression: string) extends Property. A DeploymentUnit (version: String)
has exportedResources, dependencies and constraints. A Dependency (name, type,
versionRange: String, id: int) has boundProperties (BoundProperty: providerName: string,
dependantName: string). A Constraint (name, type, kind: ConstraintKind) has propConstraints
(PropertyConstraint: name: string, kind: PropertyConstraintKind, expression: string).
Enumerations: visibKind {local = 0, host = 1, context = 2, environment = 3};
propertyConstraintKind {MINIMUM = 0, MAXIMUM = 1, ATTRIBUTE = 2, SELECTION = 3, CONSUMPTION = 4};
constraintKind {none = 0, not = 1, exclusive = 2}.]
Constraints are declared in the DeploymentUnit definition with a similar mechanism. A unit can
define one or more Constraints, each of them requiring a specific resource to be present
(or absent) in its runtime execution environment. Each Constraint identifies the valid resources
by defining a filter consisting of a name and type comparison. Version restrictions are not
included because, in the general case, environment resources do not contain version information.
The main difference between Dependency and Constraint identification lies in the search space:
Constraints can only be satisfied by the resources from the execution context of the
instantiated DeploymentUnit (that is, the container, the node, or the environment).
The Constraint model allows a further refinement of the resource identification by also
expressing restrictions over the properties of the identified resource. These additional
requirements are represented by PropertyConstraint elements, which follow the base checks
defined in the generic model, based on the restrictions defined by the OMG D&C model [83].
A PropertyConstraint simply declares the name of the property, the kind of evaluation to be
performed (minimum, maximum, attribute, selection, consumption), and finally the expression value
to compare against (e.g. a typical Constraint would identify a resource of type
“[Link]” with an additional PropertyConstraint over the property “speed” of kind
“minimum” and expression value “2000”). If necessary, this mechanism
can also be used to add version identification to a Constraint specification.
A Constraint belongs to one of three different kinds, determined by the ConstraintKind
attribute, which define the relationship between the requiring unit and the environment resource.
DEFAULT Constraints require that the identified resource be present in the execution
environment of the unit. EXCLUSIVE Constraints further restrict this relationship by
demanding that no other unit can access the resource. Finally, NOT Constraints define
incompatibilities, meaning that the identified resource must not be present in the execution
environment for the DeploymentUnit to be stable.
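The evaluation of a Constraint against an execution context can be illustrated with a minimal Python sketch. It is not the implementation of the management architecture: the class layout, the `users` bookkeeping for EXCLUSIVE Constraints, the numeric reading of minimum and maximum, and the resource type `hw.cpu` are assumptions made for illustration, and the selection and consumption kinds are left out.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class ConstraintKind(Enum):
    DEFAULT = "default"      # the resource must be present
    NOT = "not"              # the resource must be absent
    EXCLUSIVE = "exclusive"  # present, and accessed by no other unit


@dataclass
class Resource:
    name: str
    type: str
    properties: dict = field(default_factory=dict)
    users: set = field(default_factory=set)  # assumption: units already using it


@dataclass
class PropertyConstraint:
    name: str
    kind: str        # this sketch covers "minimum", "maximum" and "attribute"
    expression: str


@dataclass
class Constraint:
    type: str
    kind: ConstraintKind = ConstraintKind.DEFAULT
    name: Optional[str] = None  # None means any name is accepted
    prop_constraints: list = field(default_factory=list)


def _property_ok(resource, pc):
    if pc.name not in resource.properties:
        return False
    value = resource.properties[pc.name]
    if pc.kind == "minimum":
        return float(value) >= float(pc.expression)
    if pc.kind == "maximum":
        return float(value) <= float(pc.expression)
    if pc.kind == "attribute":
        return value == pc.expression
    raise ValueError(f"kind not covered by this sketch: {pc.kind}")


def _matches(resource, constraint):
    # Filter by type, optionally by name, then by every PropertyConstraint.
    if resource.type != constraint.type:
        return False
    if constraint.name is not None and resource.name != constraint.name:
        return False
    return all(_property_ok(resource, pc) for pc in constraint.prop_constraints)


def is_satisfied(constraint, unit_name, context):
    """context: resources of the container, node and environment of the unit."""
    matching = [r for r in context if _matches(r, constraint)]
    if constraint.kind is ConstraintKind.NOT:
        return not matching
    if constraint.kind is ConstraintKind.EXCLUSIVE:
        return any(r.users <= {unit_name} for r in matching)
    return bool(matching)
```

With a context holding a `hw.cpu` resource whose speed is 2400, a DEFAULT Constraint with a minimum speed of 2000 is satisfied, while the equivalent NOT Constraint is not.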
marks the name of the local Property whose value will be automatically configured and the
provider field indicates the name of the source property for configuration. The next picture
shows an example of this automated configuration. If unit A contains a dependency D that is
satisfied by the unit B, and D contains a BoundProperty with dependant “p” and source “q”,
the resulting configuration operation would be: [Link] = [Link]. As dependencies can be
transitive among several units, it is possible that several BoundProperties are linked, with the
value being propagated over the dependency chain. In those cases, it is important to apply
them in the correct order, so that the original value is transmitted to all the dependant elements.
[Figure: BoundProperty example. DeploymentUnit A (Property p) has a Dependency satisfied by
DeploymentUnit B (Property q: valueOfQ). The Dependency holds a BoundProperty with
providerName: q and dependantName: p, so the resulting configuration is A.p = valueOfQ.]
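The ordering requirement for chained BoundProperties can be sketched as follows. This is only an illustration under assumptions: units are plain dictionaries of properties, each resolved BoundProperty is flattened into a hypothetical 4-tuple, and the standard `graphlib.TopologicalSorter` is used so that providers are always configured before their dependants.

```python
from graphlib import TopologicalSorter


def apply_bound_properties(units, bound):
    """Propagate property values along satisfied dependencies.

    units: {unit_name: {property_name: value}}
    bound: list of (dependant_unit, dependant_prop, provider_unit, provider_prop)
    """
    # Map every unit to the set of units it depends on (its providers).
    graph = {name: set() for name in units}
    for dependant, _, provider, _ in bound:
        graph[dependant].add(provider)

    # static_order() yields providers before dependants, so a value copied
    # into a unit is already final when that unit acts as a provider.
    for unit in TopologicalSorter(graph).static_order():
        for dependant, dep_prop, provider, prov_prop in bound:
            if dependant == unit:
                units[dependant][dep_prop] = units[provider][prov_prop]
    return units
```

For a chain where A.p is bound to B.q and B.q is bound to C.r, the value of C.r reaches A.p regardless of the order in which the bindings are listed. A cyclic chain makes `TopologicalSorter` raise `graphlib.CycleError`, which is a reasonable outcome, as no correct order exists in that case.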
In addition to BoundProperties, there are also several configuration aspects which can be
automatically extracted from the execution context where the unit has been instantiated.
These configuration aspects are supported in the model by the ContextAwareProperty
concept. Although the specific elements of the runtime environment model will be defined in
the next section, it is clear that the environment model will contain runtime resources which
represent the nodes and containers where DeploymentUnits will be instantiated. Those
elements contain in their description the relevant context information that will influence many
aspects of the unit and the provided services. As an example, any remote service running in
the environment will have a connection URL which depends on aspects such as the IP of the
physical node, or the service port which is reserved by its container. By providing a mechanism
to identify and obtain those context-dependent configuration parameters, the part of the unit
configuration which depends on the context can also be automated. Before explaining
how this kind of automated configuration has been integrated in the model, I will briefly
discuss the most relevant concerns which must be taken into account.
Up to this point, resources have been considered first on an individual basis, and later on
linked through several relationships (composition, hosting, and dependency). However, in the
case of enterprise environments, there is a certain type of configuration which should be
shared by every resource of the environment, or at least by a hierarchical slice of it. I will refer to
this type of configuration as Global configuration. This concept is supported in the model
by means of value inheritance from the properties defined in the execution context resources.
This way, Property values from parent resources of the hierarchy can be accessed by runtime
unit instances and become part of their configuration. This concept allows representing both
global configuration parameters, which are applied over the whole environment, and specific
context information, such as the base network address of all the node resources.
The inheritance of property values over the containment hierarchy can potentially lead to
conflicts, in cases where several resources from the hierarchy define a Property with the same
name (although in real scenarios this situation would rarely occur). To avoid ambiguity in the
model, in those situations the most specific value of the Property always takes
precedence over the ones appearing farther up the resource hierarchy.
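This precedence rule can be sketched in Python with `collections.ChainMap`, which looks a key up in a sequence of mappings and returns the first match. The sketch assumes that each level of the execution context is represented as a plain dictionary, ordered from the most specific level to the most general one; the property names and values are hypothetical.

```python
from collections import ChainMap

# Hypothetical property sets for each level of the execution context.
container_props = {"servicePort": "8081", "logLevel": "debug"}
node_props = {"ip": "192.0.2.10"}
environment_props = {"logLevel": "info", "commonLogChannel": "main"}

# Lookup order: container first, then node, then environment, so the most
# specific definition of a property hides the more general ones.
context = ChainMap(container_props, node_props, environment_props)

log_level = context["logLevel"]            # defined twice: container wins
log_channel = context["commonLogChannel"]  # only defined at environment level
```

Here `logLevel` is defined both by the container and by the environment, and the container value is the one returned, while `commonLogChannel` is inherited unchanged from the environment.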
In addition to that, simply substituting the value of a local Property with another one from the
context can prove too limited for the needs of context-based automatic
configuration. As an example, in order to construct the serviceURL of a deployed Web Service,
it is necessary to retrieve both the IP and the listening port from the execution context of the
deployed service. Both values must be combined in order to obtain the required configuration
value. The model supports these requirements by defining a specific element
which can be automatically configured based on its execution context. This approach was not
extended to BoundProperties, as no scenarios were identified where the property combination
aspect would be necessary.
A ContextAwareProperty is an extension of the base Property class which can only appear in
DeploymentUnit definitions (either in the base unit resources or in the properties of the
composed resources). On top of the inherited name and value fields, these special properties
contain an expression that determines what the Property value should be depending on the
context information. For the syntax of these expressions and the way they are computed,
the string variable substitution [31] model supported by Ant [46] has been selected.
Ant defines a mechanism for automatic property derivation, including inheritance from
multiple files, and the same conflict resolution mechanism described earlier. The similarities
between the two concepts indicate the feasibility of adopting this approach for context-aware
expressions: Ant properties are syntactically identical to the properties of our model. On
top of that, in many cases they reflect the established global configuration and the current context
information for a specific operation, which is similar to the execution context, but without the
multiple nested scopes present in the model. Finally, the domain of application of Ant
scripts covers deployment and configuration operations, so these abstractions have already
been successfully applied to enterprise service management.
An expression is a string template which represents the final value of the property. Inside the
expression one or more variables are defined, identified by the character sequence
${variable}. There is a minimum of one variable for each expression (otherwise it should just be a
regular property, as there would be no context dependence), and no established maximum.
The name of each variable identifies a Property which will be present in the
execution context of the declaring unit. As an example, the context-based configuration of the
“serviceUrl” previously described could be automatically supported by defining the following
expression: ”[Link] If instead of a composite configuration
a unit simply requires a global configuration attribute, it can be retrieved by defining the same
property with the value: commonLogChannel = ${commonLogChannel}.
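The resolution of such expressions can be sketched in Python. The sketch does not use Ant: it relies on `string.Template` from the Python standard library, which happens to accept the same ${variable} notation. The URL pattern, the IP address and the helper name `resolve_expression` are hypothetical, and the execution context is assumed to be already flattened into a single mapping.

```python
import re
from string import Template

# A variable is written as ${name}; at least one must appear per expression.
_VARIABLE = re.compile(r"\$\{[_a-zA-Z][_a-zA-Z0-9]*\}")


def resolve_expression(expression, context):
    """Return the property value obtained by substituting context values."""
    if not _VARIABLE.search(expression):
        raise ValueError("a context-aware expression needs at least one ${variable}")
    # substitute() raises KeyError if a variable is missing from the context.
    return Template(expression).substitute(context)


# Hypothetical flattened execution context of a unit.
context = {"ip": "192.0.2.10", "servicePort": "8081", "commonLogChannel": "main"}

service_url = resolve_expression("http://${ip}:${servicePort}/orders", context)
log_channel = resolve_expression("${commonLogChannel}", context)
```

Raising an error for a missing variable, instead of leaving the placeholder untouched, is a design choice of this sketch: an unresolved expression would otherwise silently become part of the unit configuration.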
The following picture shows an example of the complete process for context-based automatic
configuration. The topmost resource represents an instance of a DeploymentUnit, which
contains in its logical definition the shown expression for automatically configuring the correct
value of its serviceUrl property. In order to resolve the two variables (ip and servicePort), the
Property names of the unit execution context (in this case, composed of the remaining two
resources) are examined. As there are no name conflicts in this simple scenario, the value of
servicePort is obtained from the container resource, and the value of ip is obtained from the
node resource. After the process has been completed, the value for the automatic
configuration is obtained, as shown in the picture.
[Figure: context-based configuration example. A runtime unit Resource declares serviceUrl through
the expression serviceUrl = [Link]. Its execution context is composed of a container Resource
(servicePort: 8081) and a node Resource (Ip: [Link]). After resolution, serviceUrl = [Link].]
The defined DeploymentUnit model allows a rich characterization of both the deployment-relevant
elements (a deployment unit, its configuration, its requirements to run correctly) and
the design-time artifacts provided (services, libraries, components and so on). Because
of that, it acts as a bridge between the service-oriented view of the software architect and the
runtime stability constraints which must be manually managed by IT administrators.
The following section complements this logical view with the runtime environment model,
which defines the runtime resources constituting the environment, ranging all the way from
the general environment to the instantiated deployment units.
components is one of the most complete ones, detailing the possible transitions and states of
their management elements. The left-hand side of the next picture shows the complete
lifecycle adopted by this specification:
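[Figure 40: unit lifecycles. Left: the OSGi lifecycle, with states including Installed, Resolved,
Active and Uninstalled, and the install, start, stop and uninstall operations. Right: the simplified
RuntimeUnit lifecycle, with the states Stopped, Active and Uninstalled and the same operations.]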
In an OSGi container each unit begins its existence with the installation operation and can
potentially go through six different states. Transitions drawn with solid lines and labelled with
actions are initiated by the manager, whereas transitions with dashed line arrows are automatically
processed by the framework when certain internal triggers occur (e.g., once all internal
dependencies of a newly installed unit have been satisfied, the framework changes its state
from installed to resolved).
From a management perspective, transitory states such as starting or stopping are only
relevant over a short period of time, after which a stable state will be reached. Moreover, the
manager does not control these transitions, only being able to operate on the final states.
These fine-grained details are not present in the lifecycles of other platforms. However, the
fundamental notions of installation and activation, and of active and stopped states, are a
well-accepted base for the great majority of enterprise containers. Because of that, the general
lifecycle for the RuntimeUnits of the defined model will be the one shown at the right-hand side
of Figure 40. This state diagram is in fact identical to the OSGi model after the removal of the
transitory states. RuntimeUnits are stopped when initially installed, and must be activated for them to
perform their functions in the environment. In addition to that, every management operation
which modifies the unit configuration must be applied with the unit in the stopped state, as
‘hot configuration’ is not supported by every type of service container. This way, a
reconfiguration activity over an active unit will be preceded by a stop command, and followed
by a start command.
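The simplified lifecycle and the stop, configure, start sequence can be sketched as a small state machine. This is a sketch under assumptions: the class and method names are hypothetical, and the uninstall operation is assumed to be valid only from the stopped state, which the text does not state explicitly.

```python
from enum import Enum


class State(Enum):
    UNINSTALLED = "uninstalled"
    STOPPED = "stopped"
    ACTIVE = "active"


# (operation, current state) -> next state
_TRANSITIONS = {
    ("install", State.UNINSTALLED): State.STOPPED,
    ("start", State.STOPPED): State.ACTIVE,
    ("stop", State.ACTIVE): State.STOPPED,
    ("uninstall", State.STOPPED): State.UNINSTALLED,  # assumption of this sketch
}


class RuntimeUnit:
    def __init__(self, name):
        self.name = name
        self.state = State.UNINSTALLED
        self.properties = {}

    def apply(self, operation):
        key = (operation, self.state)
        if key not in _TRANSITIONS:
            raise ValueError(f"cannot {operation} a unit in state {self.state.value}")
        self.state = _TRANSITIONS[key]

    def reconfigure(self, changes):
        """Change properties only while stopped; restart the unit if it was active."""
        was_active = self.state is State.ACTIVE
        if was_active:
            self.apply("stop")
        if self.state is not State.STOPPED:
            raise ValueError("only installed units can be reconfigured")
        self.properties.update(changes)
        if was_active:
            self.apply("start")
```

A unit that is installed, started and then reconfigured ends in the active state with the new property values, having passed through the stopped state in between.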
[Figure: UML class diagram of the runtime environment model. The Environment has globalResources,
properties and 1..* nodes. Resource (name: string, type: string, visib: visibKind) is specialized
by Node, Container and RuntimeUnit. A Node has nodeResources and containers. A Container
(supportedTypes: String, version: String) has containerResources and units. A RuntimeUnit
(version: string, state: State) has exportedResources and bindings.]
The root element of the runtime model is the Environment. An environment must have a
unique identifier as its base name, as one management system can manage several
environments which are completely independent of each other. The environment is composed
of at least one resource with computing power: a Node. In addition to that, it can also provide
global resources which are available to every application in the domain (such as an LDAP
authentication provider). These resources are not directly controlled by the management
architecture, but they appear in the management view as they can satisfy unit Constraints.
In addition to those global resources, the Environment object defines the current global
configuration through a series of properties. These elements, which were briefly mentioned
when detailing the context-based configuration capabilities of the logical model, are
mandatory for the internal management of enterprise infrastructures, as they simplify the
configuration of several aspects by making them common to the complete environment.
Deployment units which require those values to work correctly simply declare them as
ContextAwareProperties.
Nodes are the basic elements of the environment model, representing resources with
computing capabilities. They are specializations of resources that, in the general hierarchy, are
directly hosted by the environment. Nodes are directly or indirectly connected to the rest of
the network, having access in principle to any other resource in it (unless the visibility does
not allow it). A Node resource comprises all the hardware, firmware and low-level software
layers of the device (such as the operating system), abstracting the specific components,
libraries, communication channels and devices as nodeResources. On top of that substrate, a
node hosts any number of Containers. The name of the node must be unique over the
environment, allowing an unambiguous identification of the contained elements. For the purposes
of service management, there is no difference between real physical nodes and virtualized
nodes.
A Container is the base execution platform for DeploymentUnits: the specialized resource
where units are instantiated. Containers have a name, which must be unique over the
environment, and a container-specific type (examples include “[Link]”
or “[Link]”). In addition to that, containers must be versioned, as they are
software elements; the same reasons for versioning DeploymentUnits also apply to them.
Containers provide a set of platform services to the hosted units, which are expressed
as a set of container resources (in a typical application server these resources would be
Datasource connections, JMS queues, or external system connectors). Additionally, container
properties hold further configuration details, such as the service port of an HTTP server.
The main function of Containers is to host the runtime instances of the DeploymentUnits.
These instances are explicitly represented in the model as RuntimeUnits. However, clearly not
every unit can be instantiated on any Container of the environment. It is necessary to define
a mechanism for matching compatible units and containers. This is managed at Container level
through the supportedTypes attribute, which specifies the types of the RuntimeUnits that can
be deployed on the container. This information is equivalent to the hosting stability conditions
which were previously mentioned in the general model. In addition to that, more specific
requirements for Container compatibility can be declared by the DeploymentUnits, through the
definition of Constraints for the container resources.
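The matching based on supportedTypes can be sketched as a simple filter over the containers of the environment. The class layout, the container names and the unit types (`war`, `osgi.bundle`) are hypothetical, and the additional Constraints over container resources are not evaluated here.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    name: str
    type: str
    version: str
    supported_types: set = field(default_factory=set)


def compatible_containers(unit_type, containers):
    """Names of the containers whose supportedTypes include the unit type."""
    return [c.name for c in containers if unit_type in c.supported_types]


# Hypothetical environment with two containers.
containers = [
    Container("as1", "appserver", "2.1", {"war", "ear"}),
    Container("osgi1", "osgi", "4.2", {"osgi.bundle"}),
]
```

Under these assumptions, a unit of type `war` can only be instantiated on `as1`, and a unit of an unsupported type has no candidate container at all.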
The defined runtime model restricts the possible resource host relationships by specifying
resource subtypes as the only valid nested elements. This way, Nodes can only be hosted by
the Environment, Containers are always hosted by Nodes and RuntimeUnits are provisioned
over the Containers. However, there are some enterprise environments where containers host
additional containers (e.g. a domain where the system manages both operating system-level
packages, and web applications deployed over an application server, or another one where an
OSGi container is deployed as an application on top of the regular application server).
Although initially it may not seem the case, those scenarios can be supported by this model by
defining both Containers as hosted by the Node that contains the first one. Although this
management view does not exactly represent the physical topology, for service-based
management it is equivalent. Avoiding support for recursive hosting of the same
type greatly reduces the space of potential configurations, consequently simplifying
management operations.
As has been mentioned, the instantiation of a DeploymentUnit in a Container of the
environment creates a RuntimeUnit. The identity information of the logical definition (name,
version and type) is shared with all its RuntimeUnit instances. As is the case with logical
definitions, the RuntimeUnit internal information is described through properties and exported
resources. RuntimeUnits also have a state, which follows the previously defined lifecycle.
In addition to this shared information, the DeploymentUnit definition also includes the
attributes that specify the stability requirements (Constraints and Dependencies) of the
instantiated RuntimeUnits. That information is not replicated in the RuntimeUnits, as it is
already present in the logical repository, and thus it can be retrieved without unnecessarily
increasing the footprint of the complete environment information. Constraints must be supported
by the execution context of the RuntimeUnit, but no additional information needs to be included
for them.
On the other hand, as regards Dependencies, it has been already mentioned that they can
potentially be satisfied by multiple units. As each logical definition can appear at the
environment multiple times, it is clear that the number of possible satisfiers for each
dependency increases. However, at the runtime level, only one RuntimeUnit can satisfy each
Dependency. This way, if no additional information is defined, it is not possible to know exactly
how the Dependency is satisfied at the environment. As this knowledge is necessary in order to
correctly evaluate the impact of changes to the environment, it will be explicitly reflected in
the model.
The RuntimeUnit model encapsulates how dependencies are satisfied as a set of Bindings to
other RuntimeUnits, which are the realization of the logical requirements. Each logical
Dependency must result in a runtime Binding to another unit, both of them sharing the same id
so that each Binding can be traced to its corresponding Dependency definition. Each Binding
contains a reference to the satisfying unit. Multiple Bindings can be defined to the same unit,
each one because of a different Dependency. Finally, it must be clarified that Binding
relationships can only exist while both the source and the destination units are in the
active state. Bindings enrich the information obtained from the environment, defining an
overlay structure over the runtime containment hierarchy. This is vital to correctly apply any
changes over already existing RuntimeUnits.
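One use of the Binding overlay is estimating which units are affected by a change. The following Python sketch assumes that the Bindings of each RuntimeUnit are available as a dictionary from dependency id to the name of the satisfying unit; the function name and the unit names are hypothetical, and the containment hierarchy is not considered.

```python
def impacted_units(changed, bindings):
    """Units that depend, directly or transitively, on the changed unit.

    bindings: {unit_name: {dependency_id: satisfying_unit_name}}
    """
    # Invert the overlay: for every unit, who is bound to it?
    dependants = {}
    for unit, by_id in bindings.items():
        for satisfier in by_id.values():
            dependants.setdefault(satisfier, set()).add(unit)

    impacted, pending = set(), [changed]
    while pending:
        current = pending.pop()
        for unit in dependants.get(current, ()):
            if unit not in impacted:
                impacted.add(unit)
                pending.append(unit)
    return impacted


# Hypothetical overlay: web has two Dependencies bound to logic; logic uses db.
bindings = {
    "web": {1: "logic", 2: "logic"},
    "logic": {1: "db"},
    "db": {},
}
```

With this overlay, a change to `db` affects both `logic` and `web`, while a change to `web` affects no other unit.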
As it was the case in the general resource model, the structure of a runtime environment
configuration is a hierarchical resource structure. This specific model streamlines the hierarchy
through the use of resource subclasses. This presents a clear management view of the
environment, with specific elements representing the key concepts of the system (nodes,
containers, units). The structure does not allow the recursive hosting of any of the elements,
which further simplifies the model. The next figure depicts a tree representation of a
configuration, where the different resource subclasses are allocated in the respective layers.
This view of the managed elements is very simple, although it must be mentioned again that
there is an additional dependency overlay defined by the RuntimeUnit Bindings. The
combination of both types of runtime resource relationships makes it possible to estimate the real
impact of changes occurring to an element of a distributed service architecture.
[Figure 43: UML class diagram of the CResourceConfig (CRC) element. A CResourceConfig (name: string,
type: string) has 0..* properties (Property: name: string, value: string) and 1..*
supportedContainers (ContainerIdentifier: Type: String, versionRange: String).]
Figure 43 shows the main characteristics of this new element. A CRC is a logical template
that can be instantiated to add a resource to an existing runtime container. The CRC contains
several attributes which will be shared by the resources created from it: the type of the
created resource, the initial set of properties, and optionally the resource name. The name is
optional because there are numerous cases where more than one resource of the same type can be
present in the same container (such as Datasources). In those cases the name differentiates
one resource from another, and is provided as an additional argument of the operation.
Finally, the CRC must identify the set of compatible containers where the specific instances
can be created. This is declared by elements of the ContainerIdentifier type, which, similarly to
Dependency definitions, filter compatible containers by matching their identifying properties,
in this case the name and the version (with the possibility of defining a range of compatible
instances).
After the models of both the logical definitions and the runtime instances of deployment units have
been defined, it is important to clarify exactly the relationships between them. These
connections have been mentioned over the previous paragraphs, but they are made explicit at
this point because of their importance for management activities. The next picture shows a
side-by-side comparison of the two models, detailing the structure of each element and the
correspondence between the fields. In order to explain these relationships I will describe them
from both perspectives (first from the logical side and then from the runtime side).
From the perspective of an existing RuntimeUnit, it is also fundamental to be able to trace back
to the original definition, in order to better diagnose existing runtime instances. To do so,
the trio of identifying properties allows retrieving the complete information about the unit
from the space of logical definitions. The Constraints and Dependencies from the logical
definition will be taken into account in order to ensure the stability of current and future
configurations.
[Figure: correspondence between the LOGICAL model (Deployment Unit) and the PHYSICAL model
(Runtime Unit). Both share the identity fields Name, Version and Type, as well as Properties and
Resources. A Dependency (id, name, type, versionRange) restricts a Binding (id, resourceRef), and
the Constraints restrict the Container.]
the logical and runtime models. The definition of two models also enables a separation of
concerns with the defined information: the stability concerns are expressed as logical
definitions, and the actual configuration values of the environment are contained at the
runtime model.
Finally, the defined model also enables a partial automation of the configuration of the
managed units and services. On one hand, global configuration values can be established and
retrieved from the environment information, reducing the repetitive task of applying them
specifically to each element of the runtime configuration. On the other hand, the concepts of
context-aware properties and bound properties allow further automation of the detailed
configuration of the runtime units, deriving the actual configuration values from the
dependant and required elements which are part of the environment.
the desired objectives. This way, the functions which must be provided by the system
can be declared by requiring that certain Resources appear in the runtime environment.
The formal definition of these additional checks is provided in the following paragraphs.
Because of the duality between logical and runtime resources, separate checks have been
defined to allow referencing either type.
EXISTS(rr), EXISTS(lr)
Description: Checks that the identified resource is present at the runtime
configuration
Checked element:
Additional arguments:
Formula:
NOTEXISTS(rr), NOTEXISTS(lr)
Description: Checks that the identified resource does not appear at the runtime
configuration
Checked element:
Additional arguments:
Formula:
Similarly to the search space reduction applied to the logical resources set, after defining the
general set I will try to characterize a more manageable subset of O. Although O contains all
the possible objectives, clearly only a fraction of them will be relevant for the managed
environment. I will call this relevant subset Defined Objectives, DO. DO contains all the
high-level policies and objectives which must be satisfied by the environment configuration.
The composition of DO can change over time, but as the mechanisms and strategies to modify
the contents of this set are the subject of the higher-level management domains, I will consider
this set not modifiable by the management architecture, analogously to the LRB.
However, unlike the LRB, not every subset of objectives constitutes a valid DO. It is possible that
some objectives from DO are inconsistent among themselves, or with respect to the
environment, invalidating the applicability of the correctness function. At the very least, the
following two conditions must be met by the DO members:
For each objective in DO, the identified resource must either belong to the runtime
configuration or be defined in the LRB. Otherwise the objective can never be evaluated against the
current configuration (because the resource is not even defined), rendering it
meaningless.
The DO cannot contain two objectives with contradictory conditions, which cannot possibly be
true simultaneously. This can occur when two conflicting conditions are defined as objectives
against the same resource, e.g. EXISTS(x) and NOTEXISTS(x). Clearly, if that were the case, the
correctness function would be wrongly defined.
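These two validity conditions can be checked mechanically. The following Python sketch assumes that each objective is represented as a hypothetical (check, resource) pair, with the check being either EXISTS or NOTEXISTS, and that the runtime configuration and the LRB are given as collections of resource identifiers.

```python
def validate_objectives(objectives, runtime_resources, lrb):
    """Return a list of problems found in a candidate DO (empty if valid).

    objectives: iterable of (check, resource) pairs,
                where check is "EXISTS" or "NOTEXISTS".
    """
    problems = []
    known = set(runtime_resources) | set(lrb)

    # Condition 1: every referenced resource must be known somewhere.
    for check, resource in objectives:
        if resource not in known:
            problems.append(f"{check}({resource}): resource is not defined")

    # Condition 2: no resource may be both required and forbidden.
    required = {r for check, r in objectives if check == "EXISTS"}
    forbidden = {r for check, r in objectives if check == "NOTEXISTS"}
    for resource in sorted(required & forbidden):
        problems.append(f"EXISTS({resource}) contradicts NOTEXISTS({resource})")

    return problems
```

Returning the complete list of problems, instead of stopping at the first one, is a choice of this sketch that makes it easier to report every inconsistent objective at once.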
Once the base checks and functions have been defined, it is possible to define the Desirability
formula of a managed environment, which will simply consist of evaluating every objective
defined at the DO:
[Figure: diagram relating the configuration sets C, SC, DC and CC.]
Because of the relevance of the set of correct configurations, I will provide a way to directly
identify its members. In order to do so I will define the correctness function which, when
applied to an environment configuration and an associated DO, returns true if the
configuration is at the same time stable and desirable. As those conditions have already been
defined as functions, the correctness formula is simply the conjunction of the two.
The correctness function can be easily evaluated, as it is composed of the stability and
desirability functions, which can in turn be decomposed into atomic base checks that are
evaluated individually. This way, the complete correctness formula evaluates to true if and
only if each and every base check evaluates to true.
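As an illustration of this decomposition, the following sketch models a configuration as a plain set of resource names and every base check as a predicate over that set. Both simplifications, and the function names, are assumptions made for the example and not part of the formal model.

```python
from typing import Callable, Iterable, Set

Configuration = Set[str]                     # simplification: a set of resource names
BaseCheck = Callable[[Configuration], bool]  # atomic check evaluated on a configuration


def is_stable(config: Configuration, stability_checks: Iterable[BaseCheck]) -> bool:
    """Stability: every stability base check holds."""
    return all(check(config) for check in stability_checks)


def is_desirable(config: Configuration, objectives: Iterable[BaseCheck]) -> bool:
    """Desirability: every objective of the DO holds."""
    return all(objective(config) for objective in objectives)


def is_correct(config, stability_checks, objectives) -> bool:
    """Correctness: the conjunction of stability and desirability."""
    return is_stable(config, stability_checks) and is_desirable(config, objectives)


# Example base checks: "app" depends on "db" (stability), and "app" must exist (objective).
app_needs_db = lambda config: "app" not in config or "db" in config
app_exists = lambda config: "app" in config
```

With these checks, `{"app", "db"}` is correct, `{"app"}` is desirable but not stable, and `{"db"}` is stable but not desirable.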
From this point on, the management Domain will be the element of analysis for the
management system. A Domain can be evaluated for correctness, as DO defines the desirable
objectives, the LRB contains the stability requirements of the participating resources and the
elements of C0 will be the subject of the evaluation.
Once the role of the managed environment has been defined and expressed as an evaluable
function, it is possible to define the role of the management system. The objective of the
management system is to keep the Domain configuration in a correct state (meaning both
desirable and stable).
This definition brings a key concept about the managed Domain: it is a continuously evolving
element. As time passes, the runtime configuration can experience changes, leading to new
current configurations, and the two logical sets (LRB and DO) can also experience variations in
their composition. In this section I will develop the concept of Domain changes, as it is
fundamental to understanding the functions of the management system. Although the
initially observed configuration may be correct, changes originating outside the
control of the management system can alter that. In those cases, the role of the
management system is to react to these events in order to restore the domain to a
correct state, which is achieved by applying in turn a set of changes to the domain.
From those concepts, an initial classification of the changes can be established, depending on
the originating element. Changes can be either external changes (initiated by
external entities) or internal changes (applied by the management system). In addition to the
initiating agent, the scope of both types of changes is different: external
changes can modify any of the three elements of the domain, whereas internal changes can
only modify the current configuration. Those concepts are also coherent with the autonomic
control loop concept: an external change occurs in the domain, and the system diagnoses the
change, evaluating whether it breaks domain correctness. If that is the case, a set of internal
changes is applied until the domain is restored to a correct state.
The following sections will further detail each category of changes, providing a better
characterization of the management operations. External changes will be evaluated on the
potential impact over the overall correctness, while the possible range of internal changes will
be defined in order to determine the possible scope of actuation of the management system.
runtime resources from the configuration. They will generate a new current configuration,
C’. Clearly, there is no guarantee that the new instance is either stable or desirable, thus it
will possibly force the management system to react to them.
LRB changes: They are endogenous changes consisting of the addition, removal or
modification of a logical resource in the LRB. As those elements are not evaluated for
stability or desirability, they won't affect domain correctness. However, they will be
relevant for future change operations, as the logical definitions greatly restrict the scope of
action of the management system.
DO changes: The other kind of endogenous changes affects the objectives defined for
the domain. They consist of objectives being added to, removed from or modified in the DO
set. As objectives determine the conditions for domain correctness, the desirability of the
current configuration will have to be reevaluated, potentially triggering internal
management actions in case the updated objectives are no longer satisfied by the
current configuration.
The following picture provides a summary on the possible domain changes.
[Figure: summary of the possible changes to the Domain (LRB, Configuration and Objectives):
INTERNAL changes applied by the Management System, and ENDOGENOUS and EXOGENOUS external
changes.]
Because of that, in order to identify the different types of internal changes, I will analyze the
kinds of modifications which can be applied to the set of RuntimeResources. Three
main categories of changes can be identified: adding a new RuntimeResource to C0 (additive
change), removing an element from C0 (subtractive change), or altering in some way one of
the existing elements (substitutive change). I will analyze each type of change with the
objective of defining the primitive operations. I won't define more complex changes, as they
can always be expressed as a combination of the base modifications.
Additive changes consist of adding a new RuntimeResource to the configuration. As every
operation is defined only from the Domain information, in order to add a RuntimeResource its
logical definition must exist in the LRB. Following the terminology provided earlier, I
will name this type of change instantiation. As the configuration is a hierarchical structure, the
operation will be executed over a runtime host from the current configuration, which will
become the root of the execution context of the new RuntimeResource. The corresponding
operation is defined as follows:
INSTANTIATE(lr, rhost)
Operation arguments
Applied Change
The opposite type of change consists of removing RuntimeResources from C0. As I intend to
define only the base primitives, I will only allow removing resources without any hosted
element in the current configuration. The remaining elements cannot be individually removed
without breaking the configuration structure (as their hosted elements would be left without a
containing element). Those cases will be supported with a combination of remove changes,
targeting initially the set of hosted elements, and finally the hosting resource. The primitive
remove operation can be defined as follows:
REMOVE(rr)
Operation arguments
Applied Change:
Finally, I will detail the modifications which can be applied to an existing RuntimeResource. The
most common types of changes alter the resource configuration while respecting
the resource identity. They consist of a set of modifications to the resource
properties, defining, eliminating or altering their values.
MODIFY(rr,conf)
Operation arguments
Applied Change:
instantiate and remove operations. However, if the target resource does not have a matching
logical definition, this operation needs to be independently defined.
MOVE(rr, rhost)
Operation arguments
Applied Change:
Without using a specific information model, the four defined types of basic changes represent
all the actions which can be initiated by the management architecture. Specific management
systems will define their own set of internal changes by applying two refinements to these
generic definitions. First, in case the information model defines specific resource subclasses,
the changes will have to specify the affected types, potentially increasing the number of
operations. After that, it is also possible to define compound changes, such as a service update
(consisting internally of a resource removal followed by an instantiation of a different version
of the same resource over the same host resource).
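The four generic primitives can be sketched over a toy hierarchical configuration. This is only an illustration under stated assumptions: the `Configuration` class, its dictionaries and the `name@host` naming scheme are hypothetical, and the LRB is reduced to a mapping from logical resource names to default properties.

```python
class Configuration:
    """Toy hierarchical configuration: every RuntimeResource has one host (None for a root)."""

    def __init__(self, roots):
        self.host = {root: None for root in roots}  # resource name -> host name
        self.props = {root: {} for root in roots}   # resource name -> property dict

    def hosted(self, name):
        """Resources directly hosted by the given resource."""
        return [r for r, h in self.host.items() if h == name]

    def instantiate(self, lr, rhost, lrb):
        """INSTANTIATE(lr, rhost): the logical definition must exist in the LRB."""
        if lr not in lrb:
            raise ValueError(f"{lr} has no logical definition in the LRB")
        if rhost not in self.host:
            raise ValueError(f"{rhost} is not part of the configuration")
        name = f"{lr}@{rhost}"  # hypothetical naming scheme
        if name in self.host:
            raise ValueError(f"{name} already exists")
        self.host[name] = rhost
        self.props[name] = dict(lrb[lr])
        return name

    def remove(self, rr):
        """REMOVE(rr): only resources without hosted elements can be removed."""
        if self.hosted(rr):
            raise ValueError(f"{rr} still hosts other resources")
        del self.host[rr]
        del self.props[rr]

    def modify(self, rr, conf):
        """MODIFY(rr, conf): define or alter properties; a None value eliminates one."""
        for key, value in conf.items():
            if value is None:
                self.props[rr].pop(key, None)
            else:
                self.props[rr][key] = value

    def move(self, rr, rhost):
        """MOVE(rr, rhost): relocate a resource, keeping its identity and properties."""
        if rhost not in self.host:
            raise ValueError(f"{rhost} is not part of the configuration")
        self.host[rr] = rhost
```

A compound change such as a service update would then be a `remove` followed by an `instantiate` of another version on the same host.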
[Figure: change tree of the Domain, with one branch per potential internal change, such as
REMOVE(RRi).]
Each of the potential changes described in the change tree will modify the targeted domain.
Starting from the initial state, a total of |Change(D)| different states can be reached
by applying one of the potential internal changes to the environment. The application of the
change modifies the domain configuration, yielding a new set of potential changes for the
resulting domain. As long as there is no external change to the domain, it is possible to
iteratively explore the possible actions from each additional end state, until obtaining the
complete set of potentially obtainable configurations. In the following paragraphs I will
explain how to build this set. Starting from a domain D0 and exploring the potentially
executable internal changes, the following set can be obtained:
In order to obtain the second order set of potentially obtainable Domains, it will be necessary
to repeat the same process, starting from each element of the set D1, and aggregate the
results. When the complete results have been gathered, the aggregated set will
contain some repeated Domain states (when a pair of commutative operations has been applied in
different order), as well as the initial state D0 (obtained by applying a pair of inverse
operations). These repeated elements should be removed from D2, as all the potentially
obtainable branches from them would also be repeated elements. The second order of
reachable domains can be defined as follows:
The same process defined up to this point, applied recursively to every unique Domain state
obtained, will produce the complete set of reachable environments. This way, the nth iteration
can be defined as:
By expanding over the change tree concept introduced earlier, it is possible to visually
represent how the different Domains are reached. The proposed method for searching the
potentially obtainable configurations generates a spanning tree, which can be seen in Figure
48 after applying z iterations.
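The iterative construction described above corresponds to a breadth-first exploration in which already visited states are discarded. The sketch below is a naive illustration: it assumes that configurations are hashable values and that a caller-supplied `changes` function (a hypothetical name) yields the configurations obtainable with one internal change. It terminates only when the reachable set is finite.

```python
from collections import deque


def reachable_configurations(c0, changes):
    """Map every reachable configuration to the iteration (order) in which it first appears."""
    order = {c0: 0}
    frontier = deque([c0])
    while frontier:
        current = frontier.popleft()
        for successor in changes(current):
            # Repeated states (commutative or inverse operations) are skipped.
            if successor not in order:
                order[successor] = order[current] + 1
                frontier.append(successor)
    return order


# Toy domain: the LRB defines two units, which can be instantiated or removed.
LRB = ("a", "b")


def toy_changes(config):
    for unit in LRB:
        if unit in config:
            yield config - {unit}  # remove an existing unit
        else:
            yield config | {unit}  # instantiate a missing unit
```

For this toy domain, `reachable_configurations(frozenset(), toy_changes)` contains four configurations: the empty one (order 0), the two single-unit ones (order 1) and the one with both units (order 2).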
[Figure 48: spanning tree of reachable Domains. The root D0 = {C0, LRB, DO} branches through
the changes CHG1(D0) ... CHGn0(D0) into the first order states D1.1 = {C11, LRB, DO} ...
D1.n0 = {C1n, LRB, DO}, which branch in turn into the second order states D2.1.1 ... D2.n0.n1;
repeated states (e.g. D2.1.i = D2.n0.1) appear only once.]
This process has introduced the concept of Reachable Configurations (RC): all
the potential configurations which can be obtained by applying a set of internal changes to
the initial configuration state. RC can be easily defined by aggregating the contents of
the nth order reachable Domain states:
The definition of the reachable configurations set supersedes the previous superset of possible
configurations as the root of the configuration space. For management purposes, only the
configurations contained in RC need to be considered, as they include the current state C0 and
every configuration reachable through the intervention of the management system. This way,
the size of the set of Configurations which must be considered by management operations is
greatly reduced, further decreasing the complexity of the base problem. As can be seen in the
next picture, from this point onwards RC will be the superset for all the management
operations. The same definitions of the sets of stable configurations and desirable
configurations still apply, but they will now be evaluated against this base superset.
The RC is only valid as long as there are no external changes to the Domain. Any external
modification to the elements of the domain will require RC to be reevaluated. However, that
does not invalidate this set for management purposes, as it will be relevant as long as the
control of the modifications is held by the management system.
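[Figure: sets of configurations restricted to the reachable configurations: RC inside C, with
SC, DC and CC evaluated within RC.]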
The definition of the RC set also allows a better definition of the role of the management
system: if, after an external change, the managed Domain ends in an incorrect state,
the management system must explore the RC from that initial state in order to find a correct
configuration in this set and apply the set of internal changes required to reach it.
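This role can be illustrated with a naive exhaustive search over RC that returns the shortest sequence of internal changes leading to a correct configuration. It is a sketch under assumptions: `changes` yields hypothetical (label, configuration) pairs, `is_correct` is a caller-supplied predicate, and the toy domain below is invented for the example. It is not the change identification algorithm proposed later in this chapter.

```python
from collections import deque


def plan_changes(c0, changes, is_correct):
    """Breadth-first search over RC; returns a list of change labels, or None if none exists."""
    parents = {c0: None}  # configuration -> (label, previous configuration)
    frontier = deque([c0])
    while frontier:
        current = frontier.popleft()
        if is_correct(current):
            plan = []
            while parents[current] is not None:
                label, previous = parents[current]
                plan.append(label)
                current = previous
            return plan[::-1]
        for label, successor in changes(current):
            if successor not in parents:
                parents[successor] = (label, current)
                frontier.append(successor)
    return None


LRB = ("app", "db")


def toy_changes(config):
    for unit in LRB:
        if unit in config:
            yield f"REMOVE({unit})", config - {unit}
        else:
            yield f"INSTANTIATE({unit})", config | {unit}


def toy_is_correct(config):
    # Desirable: "app" exists. Stable: "app" requires "db".
    return "app" in config and "db" in config
```

Exhaustive exploration of this kind grows quickly with the size of the domain, which motivates a more efficient formulation of the problem.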
represented by the current models (e.g. in the previous case, the response time could be
improved by migrating the service to a more powerful node, or by creating a new instance of
the service and balancing the traffic between the two of them). They can be expressed with
the current abstractions but cannot be automatically addressed, so they have not been
considered.
This way, in order to support an automated behavior of the management architecture, the
previous scenarios must be included in the set of defined use cases for the system. With
these requirements in mind, I will first identify the most relevant use cases which must be
fulfilled by the proposed management architecture, and finally define the set of internal
changes which can be ordered by it to support them.
Node-level operations: The possible configuration operations on Node elements are Container
instantiation and elimination, and configuration of the nodeResources and nodeProperties. The
same reasoning applied to environment-level operations can be applied to this category, so
no defined internal changes will target the runtime Nodes.
Container-level operations: Most of the required internal changes will be defined at this level.
The potential list of container-level changes includes RuntimeUnit instantiation / elimination,
and Container Properties and Resources configuration. RuntimeUnit operations must clearly be
supported, covering the complete lifecycle of hosted units, from instantiation to
removal or update. containerResources configuration must also be supported because of its
potential role in the satisfaction of RuntimeUnit Constraints. On the other hand,
containerProperties modification will be left outside the management scope, as these properties
only reflect internal details of the Container, and modifying them could not lead from an
incorrect state to a correct one.
Runtime Unit-level operations: Finally, I will analyze the potential configuration changes to
existing RuntimeUnits. As the unit lifecycle must be controllable by the management system, unit
state modification (STARTED<->STOPPED) must be included among the internal changes.
Regarding additional unit configuration, exportedResources configuration will not be
supported. These elements are considered an integral part of the RuntimeUnit and there are
no identified use cases where those operations would be necessary. Properties configuration,
in contrast, will be supported, as it is clearly a required functionality: both
ContextAwareProperties and BoundProperties need to be controllable by the management
operations. For analogous reasons, Binding configuration must also be supported.
After delimiting the scope of the allowed operations I will completely define the set of
potential internal changes, which is composed of 10 elements. Each configuration primitive
will be completely defined by the following information: the target (RuntimeResource where it
will be applied), the arguments (required additional information for the operation), the
pre-conditions (mandatory initial state for the change to be applied) and the post-conditions
(mandatory runtime state after applying the change). In case some of these elements include
optional sections, they will be marked with a question mark, (optional_term)?, similarly to the
EBNF (Extended Backus–Naur Form) notation.
INSTALLDU(cont,du)
Name             Install Deployment Unit
Target
Arguments
Pre-conditions
Post-conditions
UNINST(cont,ru)
Name             Uninstall Deployment Unit
Target
Arguments
Pre-conditions
Post-conditions
UPDATEDU(cont,ruo,dun)
Name             Update Deployment Unit
Target
Arguments
Pre-conditions
Post-conditions
ADDCRES(cont,crc,name)
Name             Add Container Resource
Target
Arguments
Pre-conditions
Post-conditions
RMVCRES(cont,rc)
Name             Remove Container Resource
Target
Arguments
Pre-conditions
Post-conditions
CNFCRES(cont,rc,props)
Name             Configure Container Resource
Target
Arguments
Pre-conditions
Post-conditions
STARTDU(ru)
Name             Start Deployment Unit
Target
Arguments        -
Pre-conditions
Post-conditions
STOPDU(ru)
Name             Stop Deployment Unit
Target
Arguments        -
Pre-conditions
Post-conditions
CFDUPROP(ru,props)
Name             Configure Unit Properties
Target
Arguments
Pre-conditions
Post-conditions
CNFBIND(ru,bindId,rb)
Name             Configure Unit Binding
Target
Arguments
Pre-conditions
Post-conditions
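The structure shared by these primitives (target, pre-conditions, change, post-conditions) can be sketched as follows. The concrete conditions used here for STARTDU and STOPDU are assumptions based only on the STARTED<->STOPPED state change mentioned earlier, not the formal definitions of the tables; `Primitive`, `set_state` and the dictionary-based runtime state are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict

State = Dict[str, str]  # RuntimeUnit name -> "STARTED" or "STOPPED"


@dataclass
class Primitive:
    name: str
    pre: Callable[[State, str], bool]     # mandatory initial state
    apply: Callable[[State, str], None]   # the change itself
    post: Callable[[State, str], bool]    # mandatory state after the change

    def run(self, state: State, target: str) -> None:
        if not self.pre(state, target):
            raise ValueError(f"{self.name}: pre-condition failed for {target}")
        self.apply(state, target)
        assert self.post(state, target), f"{self.name}: post-condition violated"


def set_state(value: str) -> Callable[[State, str], None]:
    def apply(state: State, ru: str) -> None:
        state[ru] = value
    return apply


# Assumed conditions: a unit can only be started when stopped, and vice versa.
STARTDU = Primitive("STARTDU",
                    pre=lambda s, ru: s.get(ru) == "STOPPED",
                    apply=set_state("STARTED"),
                    post=lambda s, ru: s[ru] == "STARTED")
STOPDU = Primitive("STOPDU",
                   pre=lambda s, ru: s.get(ru) == "STARTED",
                   apply=set_state("STOPPED"),
                   post=lambda s, ru: s[ru] == "STOPPED")
```

Checking the pre-condition before applying the change keeps an invalid operation from touching the runtime state at all.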
most well-known NP-complete problems: the boolean satisfiability problem (SAT). The SAT
problem consists of finding an assignment that satisfies a formula composed only of boolean
variables and logical operators [103]. In addition to the base SAT problem, there are several
variants, such as MaxSAT, which consists of finding an assignment that satisfies the maximum
number of clauses of the formula, and Pseudo Boolean SAT, which extends the allowed constraints
to include linear expressions over boolean variables, as well as an optimization function.
In the early sixties the DPLL (Davis-Putnam-Logemann-Loveland) algorithm was presented [18],
providing a sound and complete way of obtaining the solution to the satisfiability problem.
The algorithm explores the variable space by tentatively assigning values to the variables and
deriving the consequences of each assignment through a technique called Boolean Constraint
Propagation (BCP). If at one step of this process a conflict is found (a variable would have to
be both true and false), the algorithm backtracks to the point where that decision was made
and tries the other possibility. This base algorithm has been heavily optimized in recent years
by adopting strategies such as conflict analysis (which tries to obtain the reason for an
assignment conflict), conflict-driven learning (adding clauses to avoid the same conflict in the
future), and heuristics that detect conflicts before they occur or optimize the way the search
space is explored [91].
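The base procedure can be illustrated with a compact, unoptimized sketch of DPLL. It assumes a formula in conjunctive normal form, written as a list of clauses whose literals are non-zero integers (a positive integer for a variable, a negative one for its negation); it includes none of the optimizations mentioned above, and `dpll` and `satisfies` are names chosen for this example.

```python
def dpll(clauses, assignment=None):
    """Return a satisfying assignment (dict variable -> bool) or None if unsatisfiable."""
    assignment = dict(assignment or {})
    # Boolean Constraint Propagation: simplify clauses and assign forced (unit) literals.
    while True:
        simplified, unit = [], None
        for clause in clauses:
            pending, satisfied = [], False
            for literal in clause:
                variable, wanted = abs(literal), literal > 0
                if variable in assignment:
                    if assignment[variable] == wanted:
                        satisfied = True
                        break
                else:
                    pending.append(literal)
            if satisfied:
                continue
            if not pending:
                return None  # conflict: every literal of the clause is false
            if len(pending) == 1 and unit is None:
                unit = pending[0]
            simplified.append(pending)
        clauses = simplified
        if unit is None:
            break
        assignment[abs(unit)] = unit > 0
    if not clauses:
        return assignment  # every clause is satisfied
    # Decision: try both values of an unassigned variable, backtracking on failure.
    variable = abs(clauses[0][0])
    for value in (True, False):
        result = dpll(clauses, {**assignment, variable: value})
        if result is not None:
            return result
    return None


def satisfies(clauses, model):
    """Check that every clause has at least one literal made true by the model."""
    return all(any(model.get(abs(l)) == (l > 0) for l in clause) for clause in clauses)
```

Variables that do not appear in the returned dictionary are unconstrained and can take either value.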
The relevance of SAT goes beyond its mathematical interest as the first problem proven to be
NP-complete. Thanks to the described advances in SAT resolution algorithms, current
solvers can obtain a solution for SAT problems of a considerable size in a short time, which has
led to their adoption as the base algorithm for solving numerous problems in the electronics
and computer science fields [65]. The problems expressed in SAT include automatic test
pattern generation, redundancy identification and elimination, FPGA (Field Programmable
Gate Array) routing, and model correctness checking [73].
There is one specific field of application of SAT which has led to its consideration for the
proposed algorithm: the support for dependency resolution in installation processes. This
application was initially proposed by the OPIUM prototype [113], which describes how to
automatically obtain dependency closures of Linux packages, respecting both the expressed
dependencies and incompatibility constraints. The practical applicability of this approach has
been demonstrated since 2008, when a SAT-based algorithm was incorporated into the ZYpp
dependency manager of the openSUSE 11.0 Linux distribution. In addition to this example,
there is another successful use of SAT for dependency management among logical component
and service definitions: the Eclipse p2 provisioning engine which, since version 3.4 of
Eclipse, released in June 2008, completely manages the dependencies of the platform plug-ins
for install, update and uninstall operations. The p2 use of SAT [66] refines the notions
presented in OPIUM in order to support a more expressive dependency model, with version
information and decoupling between the packages and the providing/dependent elements.
Both of those refinements are also considered by the model proposed in this dissertation,
which further reinforces the notion that SAT may enable a declarative resolution of the change
identification problem. In addition to the increased expressivity, the performance results
obtained by the p2 implementation are very positive, as it has been successfully tested with
problems of more than 10000 literals, and a solution involving more than 3000. Finally, p2 uses
a PBSAT solver instead of a regular one, which not only provides a simpler definition of the SAT
clauses, thanks to the use of linear functions, but also makes it possible to specify an optimization
function, so that the solver can find not only a correct solution but also a better one
according to optional restrictions.
Although there are similarities between these proposals and the problem of change
identification, there are also several differences. The main difference is the distributed nature
of the problem of finding a domain configuration, which adds a layer of complexity
to the problem and forces a partial change in the strategy for defining literals. On top of that,
several additional requirements need to be expressed as SAT clauses, such as Constraints over
the execution context, or visibility restrictions. Finally, the solution must not only decide which
RuntimeUnits will be available, but must also provide a correct configuration for aspects such
as Bindings or containerResources. Nonetheless, after an analysis it has been determined that
these factors can also be expressed as a SAT problem.
With these factors in mind, I have opted for a declarative approach, consisting of the
characterization of the change identification problem as a Pseudo Boolean SAT problem. In this
approach, first a set of literals must be defined, representing the Reachable Configurations. On
top of them, constraint functions will be identified, ensuring that the solution obtained by the
solver respects the structure of the information model, and is stable and desirable. Once that
information has been expressed in the format required by the SAT solver, a final step will
process the results of the engine invocation in order to identify the required
internal changes to the domain.
In addition to the literals, it is necessary to define a set of clauses which will ensure that the
proposed solution is at the same time stable and desirable. To this end, the stability and
objective functions described in the previous chapters will be translated to boolean functions
referencing the defined literals. In order to improve the quality of the proposed solutions, an
optimization function can be defined to maximize some desired characteristics of the final
environment state.
Finally, after invoking the resolution engine the proposed values for the defined literals will be
interpreted and compared with the current configuration, in order to obtain the list of
required changes.
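This last step can be sketched as a comparison between the solver output and the current configuration. The sketch is an assumption-laden illustration: it reduces the configuration to (container, unit) pairs, covers only unit installation and removal, and uses the operation names INSTALLDU and UNINST as plain labels; `required_changes` is a hypothetical function.

```python
def required_changes(current, solution):
    """Derive change labels from a solver result.

    current:  set of (container, unit) pairs present in the current configuration.
    solution: dict mapping (container, unit) literals to the boolean value chosen by the solver.
    Pairs without a literal are outside the solver's decision and are left untouched.
    """
    wanted = {pair for pair, value in solution.items() if value}
    # Units that have a literal, are currently present and were decided false.
    to_remove = sorted(pair for pair in current - wanted if pair in solution)
    # Units decided true that are not present yet.
    to_install = sorted(wanted - current)
    changes = [f"UNINST({cont},{unit})" for cont, unit in to_remove]
    changes += [f"INSTALLDU({cont},{unit})" for cont, unit in to_install]
    return changes
```

For example, with `current = {("c1", "a"), ("c1", "legacy")}` and a solution that sets `("c1", "a")` to false and `("c1", "b")` to true, the result contains one removal and one installation, and the unit without a literal is not affected.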
The following picture gives a high-level view of the complete change
identification process. In the following sections I will describe the algorithms required to
generate, starting from the Domain information, the input for the SAT Solver (Literals,
Clauses and Optimization Function), invoke the engine, and interpret the results as the set of
required changes.
[Figure: change identification process. The Domain elements (LRB, Configuration and Objectives)
are translated into literals (RULit, CRLit, BLit) and functions (StF, SbF, ObF), which are the
input of the PB SAT solver.]
The proposed method considers the RuntimeUnits obtainable from the logical definitions,
without having to evaluate separately the resources already present in the environment.
Most of the existing RuntimeUnits from the current configuration will have a matching
DeploymentUnit in the LRB, meaning they are already included in the literals definition. The
only missing elements are the existing RuntimeUnits without a corresponding logical definition
in the LRB. Although they can potentially be removed by the operation REMOVE(RU), it has
been decided to leave them out of the scope of this algorithm, since if that change were
applied, there would be no way to reverse the operation. Nonetheless, these units
still play a role in the process, as they might be required by other RuntimeUnits for satisfying
their dependencies.
In order to completely capture the possible structure of the final configuration, the proposed
RU literals are not enough. The decision on how to configure the Bindings of dependent
RuntimeResources must also be provided by the change management process. In order to
support that, additional literals will be defined to represent the possible Binding values of the
existing RuntimeResources. For each RuntimeUnit Binding, there will be one literal for each
other RuntimeResource which can potentially satisfy the relationship. The set of Binding literals
will be referred to as BLit. In the worst case, every Binding among the potential
RuntimeResources would contribute one literal for each of the other potential
RuntimeResources.
Finally, in addition to the decisions about the RuntimeUnits, the change management system
can also modify the configuration of the existing Containers, modifying their Resources through
the use of the Configurable Container Resource (CRC) templates. By analogy with the
RuntimeUnit literals, I will define the Container Resource Literals,
CLit, which represent the decision to create a RuntimeResource in a Container,
based on a CRC template. As potentially any number of RuntimeResources can be created on
the same Container from only one CRC template, it is not possible to provide a worst case
estimation of the cardinality of the set.
The three sets of literals are enough to capture all the decisions about the future configuration
of the domain. There are additional aspects of the configuration which haven't been defined as
literals, such as context aware or bound properties. Those elements are not modelled as literals
because their value is a result of the decisions about the RuntimeUnit structure, which is
already supported by the existing sets.
The following table summarizes the defined literals and the worst case estimation of the
number of literals required for analyzing a base domain. It is evident that the number of
literals grows exponentially as the size of the LRB and the number of logical elements
increase. Fortunately, the actual complexity of the problem is much lower than that, as the
already defined stability constraints greatly reduce the number of potential configurations.
Although some of them will be processed as SAT clauses later on, there are also at this point
some simplifications which can directly reduce the cardinality of the three sets of literals. Over
the following paragraphs I will propose an algorithm for defining the set of literals which only
contributes the uncertain elements to the SAT problem, filtering out variables whose value
would always be false or always be true.
The CLit set elements are stable by themselves, but their only role is to support the stability of
RuntimeUnits with Constraint definitions. Because of that, instead of an undetermined number
of crl literals, only the ones required for the stability of the constrained RLit elements will be
defined. This way, CLit literals will be defined on the fly during the processing of Resource
Constraints.
I will start the simplifications by proposing an algorithm for the definition of the RLit elements.
While iterating over the DeploymentUnit elements of the LRB against the configuration
Containers, some stability checks will be applied to verify whether the rul literal could
potentially be stable, discarding it when that is not possible. The following checks will be
applied:
- If the type of the DeploymentUnit is not among the supportedTypes of the host
Container, the literal will not be defined.
- For DeploymentUnits with Constraint definitions, the algorithm will check whether the
Execution Environment of the RuntimeUnit contains or can potentially contain (by means
of container resource creation from a CRC template) the identified resource. For
each Constraint definition there will be three possible results: the resource is
NEVER available at the environment, the resource is ALWAYS available at the environment,
or the resource is not available but could be configured. RLit literals won't be defined
whenever the resource is NEVER available at the environment, or whenever the
Constraint definition is of NOT kind (incompatibility) and the resource is ALWAYS available at
the environment. In the remaining cases the RLit element will be defined, and a CLit will
also be added for the undetermined cases, representing the decision of whether to create the
container resource from the CRC template.
By applying these restrictions, the following algorithm is proposed to only define the RLit and
CLit which can potentially be evaluated as true or false before defining the rest of structural
and management constraints.
filterUnits {
define RLit as list of RuntimeUnit literals
define CLit as list of ContainerResource literals
if(du.type ∈ cont.supportedTypes){
if(ru is constrainedResource){
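The listing above is incomplete in this copy. The checks described before it can be approximated with the following sketch, which is not the original algorithm. It assumes dictionary-based containers and deployment units, reduces the Execution Environment to the host Container, and interprets the NEVER rule as applying to regular (non-NOT) Constraints; every name in it is hypothetical.

```python
from enum import Enum


class Availability(Enum):
    NEVER = 0
    ALWAYS = 1
    CONFIGURABLE = 2  # not available, but can be created from a CRC template


def availability(resource, container):
    if resource in container["resources"]:
        return Availability.ALWAYS
    if resource in container["crc_templates"]:
        return Availability.CONFIGURABLE
    return Availability.NEVER


def filter_units(deployment_units, containers):
    """Return (RLit, CLit) as lists of (container, unit) and (container, resource) pairs."""
    rlit, clit = [], set()
    for cont_name, cont in containers.items():
        for du in deployment_units:
            if du["type"] not in cont["supported_types"]:
                continue  # unsupported type: no literal
            needed, feasible = [], True
            for resource, negated in du["constraints"]:
                avail = availability(resource, cont)
                if not negated and avail is Availability.NEVER:
                    feasible = False  # required resource can never be present
                elif negated and avail is Availability.ALWAYS:
                    feasible = False  # incompatible resource is always present
                elif avail is Availability.CONFIGURABLE:
                    needed.append((cont_name, resource))  # undetermined: add a CLit
                if not feasible:
                    break
            if feasible:
                rlit.append((cont_name, du["name"]))
                clit.update(needed)
    return rlit, sorted(clit)
```

Only the literals whose value is still undetermined after these checks are handed to the solver.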
After reducing the size of RLit, BLit is also simplified, as the total number of potential
Binding configurations depends on the cardinality of RLit. In addition to that, as Bindings
originate from a logical Dependency definition, I will evaluate some of its stability checks in
order to avoid defining literals for impossible bindings. The following simplifications will be
applied:
- In order for each Binding to be valid, the dependent resource must be able to access the
bound resource. Resource accessibility is evaluated through the combination of the visibility of
the bound resource and the placement of both resources in the overall hierarchy.
Unreachable Bindings won't be defined as literals.
- The Dependency corresponding to each Binding includes a resource identification function
which must be met by the identity of the bound Resource. In case this check is not true the
potential literal will not be defined.
- As a consequence of both simplifications, it is also possible that some RuntimeUnit literals
can never be evaluated to true, as some of their Bindings could never be satisfied. If that
was the case, it would be detected at this step of the algorithm, so the invalid RULit literals
will be assigned the value false at this point of the process.
The following algorithm obtains the reduced set of BLit literals after evaluating those
Constraints:
filterBindings {
  define BLit as list of Binding literals
  for each DependantResource ru ∈ RLit {
    for each Binding b ∈ ru {
      for each br ∈ RLit – {ru} {
        if (visible(ru, br) AND [Link](br)) {
          [Link]([Link])
        }
      }
      if (no possible binding was found)
        set ru to false
    }
  }
  return BLit
}
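The filtering step can be illustrated with a minimal Python sketch. It is not the implementation of the thesis: the `Binding` class, the `identifies` and `visible` predicates, the triple representation of a binding literal and the unit names are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Binding:
    name: str
    identifies: Callable[[str], bool]  # hypothetical resource identification check

def filter_bindings(rlit, bindings_of, visible):
    """Return the candidate binding literals and the units that must be false."""
    blit, falsified = [], set()
    for ru in rlit:
        for b in bindings_of.get(ru, []):
            # Candidates are the other units that are reachable and match the Dependency.
            candidates = [br for br in rlit
                          if br != ru and visible(ru, br) and b.identifies(br)]
            if not candidates:
                falsified.add(ru)  # this Binding can never be satisfied
            blit.extend((ru, b.name, br) for br in candidates)
    return blit, falsified

# Hypothetical example: three candidate units, all mutually visible.
rlit = ["web@c1", "db@c2", "cache@c3"]
bindings_of = {
    "web@c1": [Binding("data", lambda r: r.startswith("db"))],
    "cache@c3": [Binding("queue", lambda r: r.startswith("mq"))],
}
blit, falsified = filter_bindings(rlit, bindings_of, lambda ru, br: True)
```

In this example only one binding literal is defined, and the unit whose Binding has no valid candidate is fixed to false before the solver is invoked.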
Once these procedures have been executed, the complete set of literals which will be
evaluated by the SAT solver has been defined. The values decided for these boolean variables
will be enough to determine the final state of the managed system.
These requirements will be defined the following way. Consider a RuntimeUnit with n defined
Bindings, and the group of binding literals corresponding to its Binding i. For each group of
literals the following two conditions must be enforced: if the RuntimeUnit is present at a
Container, exactly one literal from the group must be true; if the RuntimeUnit is not present,
all the group literals must be false. Those concepts can be expressed as two boolean
functions, one per condition.
The first function cannot be directly inserted in the pseudo-boolean SAT engine, which only
supports logic functions and linear expressions over the boolean variables. However, that
function can easily be decomposed into two factors: a disjunction among all the binding literals
and a linear function restricting the number of true literals of the group to at most one.
Moreover, the disjunction is the reverse implication of the other identified restriction, so both
can be combined as a double implication. This way, for each group of binding literals, two
boolean functions will be added to the SAT engine: the double implication between the
RuntimeUnit literal and the disjunction of the group literals, and the linear restriction
limiting the number of true literals of the group to one.
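As a hedged illustration, the structure functions can be written as follows. The symbols are assumptions rather than the original notation: $ru$ is the literal of the RuntimeUnit and $b_{i,1}, \dots, b_{i,m_i}$ is the group of literals of its Binding $i$. The first line states the two initial conditions, and the second line the decomposed form that a pseudo-boolean engine accepts.

```latex
\begin{align}
ru &\rightarrow \Big(\sum_{j=1}^{m_i} b_{i,j} = 1\Big), &
\neg ru &\rightarrow \bigwedge_{j=1}^{m_i} \neg b_{i,j} \\
ru &\leftrightarrow \bigvee_{j=1}^{m_i} b_{i,j}, &
\sum_{j=1}^{m_i} b_{i,j} &\le 1
\end{align}
```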
The definition of structure functions can be easily added to the previous algorithm where the
set of BLit elements is defined. After completely processing the potential candidates for a
Binding of a RuntimeUnit, the group of literals has just been defined, so it can be processed
immediately in order to generate the required structure functions.
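The equivalence between the two initial conditions and their decomposed form can be checked exhaustively for small groups. The following sketch is only a verification aid; the function names are hypothetical.

```python
from itertools import product

def original(ru, group):
    # Present unit: exactly one literal true. Absent unit: every literal false.
    return (sum(group) == 1) if ru else not any(group)

def decomposed(ru, group):
    # Double implication with the disjunction, plus the at-most-one linear restriction.
    return (ru == any(group)) and sum(group) <= 1

def equivalent(m):
    """Compare both formulations over every assignment of a group of m literals."""
    return all(original(ru, g) == decomposed(ru, g)
               for ru in (False, True)
               for g in product((False, True), repeat=m))

checked = all(equivalent(m) for m in range(1, 6))
```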
applied at the filtering stage for reducing the number of defined RuntimeUnit literals. This way,
no additional functions need to be defined because of those conditions.
Dependency functions. Dependency functions ensure that the Bindings between RuntimeUnits
are correctly defined and that the internal unit configuration is correct with respect to the
bound resource (the values of the BoundProperties). The solution proposed by the SAT solver
must respect these requirements. Parts of them are already enforced by the algorithm for
identifying the binding literals, which applied the visibility and resource identification
restrictions in order to filter out unstable possibilities to the Bindings. Thanks to that, the
Binding literals already represent stable bindings. Moreover, the previously defined structure
functions ensure that the solution includes exactly one binding decision whenever the
RuntimeUnit is present at the proposed configuration.
Nonetheless, the previous operations are not sufficient to ensure a correct solution with
respect to Dependencies. It is also necessary to add functions to the SAT engine which ensure
that, whenever a RuntimeUnit with defined Bindings is present at the configuration solution, it
will have a correct configuration for each one of its Bindings. Structure functions ensure that
one of the binding options will appear at the solution. However, it still must be enforced that
the RuntimeUnit referenced by the Binding also appears in the proposed solution. This concept
can be expressed very similarly to the way OPIUM defined the component dependencies as
SAT functions. The main difference is that, as this algorithm reasons about distributed
instances instead of logical elements, the dependency restriction must be defined for each
binding literal, instead of only once between the provider and consumer elements. This way, for
every defined binding literal, a function is added to the SAT engine stating that the binding
literal implies the literal of the RuntimeUnit it points to.
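A possible way of writing this dependency function, using assumed symbols, is shown next: $b_{i,j}$ is a binding literal and $ru_{t(i,j)}$ is the literal of the RuntimeUnit it points to, with $t(i,j)$ a hypothetical index function. The right-hand side is the same restriction in clause form.

```latex
b_{i,j} \rightarrow ru_{t(i,j)}
\qquad\Longleftrightarrow\qquad
\neg b_{i,j} \lor ru_{t(i,j)}
```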
Finally, the only aspect of dependency expressions that has not been addressed is the handling
of BoundProperties. Analogous to local restrictions, property values cannot be efficiently
evaluated with a SAT solver, as it would be necessary to use integer variables. However, in this case
those values do not need to be expressed in the SAT, as they are obtained after the decision
on the Binding, which has been already defined through literals and functions. The SAT solver
decides to which unit the Binding must point, and the required value of the BoundProperty is
automatically obtained from that decision, without needing to apply any complex reasoning
process. This way, the configuration of BoundProperties will be directly obtained from
processing the results from the SAT solver.
Constraint Functions. Finally, the constraint functions will be translated and added to the
solver. Similarly to Dependencies, several aspects of the Constraints have already been
taken into account at the filtering stage. First, the resource identification of the resources
required by the constraint was already evaluated, combined with the incompatibility check, to
filter out RuntimeResource literals which could never be true.
However, additional restrictions must be defined to ensure that the CLit elements appear in
the solution when they are required to satisfy the Constraints of RuntimeResource literals. This
relationship is similar to the one expressed in the evaluation of dependencies: if the constrained
RuntimeUnit appears in the solution, the satisfying CRC must also be present. Although
incompatibilities were already considered at the filtering stage, they must also be evaluated at
this point, as they alter the defined formula.
This way, for each CLit literal, there will be at least one defined function (one for each RLit
whose stability depends on the existence of the Configurable Resource). The following
algorithm iterates over these elements and defines the required functions:
for each ContainerResource cr CLit{
The definition of these additional functions ensures that, for any of the three possible
ways of satisfying the constraint resource identification, the solution will be stable in that
respect. However, there are additional kinds of Constraint definitions which require
further restrictions to ensure they are correctly respected by the proposed solution.
First, additional checks must be defined for EXCLUSIVE constraint declarations. For those
cases, it must be enforced that at most one of the demanding resources can be instantiated at
the Container for the configuration to be stable. This was not possible to evaluate during the
filtering stage, as it must be evaluated collectively for all the conflicting units which require the
same resource.
This restriction can be converted to SAT terms with one additional function for each group of
RLit literals holding a Constraint of EXCLUSIVE access to the same resource. For each such
group, the SAT function restricts the number of true literals to at most one, which is a linear
expression over the boolean variables.
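With assumed notation, where $r_1, \dots, r_k$ are the literals of one group sharing the same exclusive constraint, this restriction could be written as:

```latex
\sum_{i=1}^{k} r_i \le 1
```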
After defining the functions for exclusive relationships, there is only one kind of constraint
base check which has not been enforced by the SAT literals and functions: consumption
constraints. These constraints can be evaluated similarly to exclusive declarations, of which
they are in practice a more general case. Although it initially seems that integer variables
would be required to evaluate this condition, this is not the case. The variables only
have to represent the decision on the existence or not of the consuming RuntimeUnits, with
each one of them consuming an already established weight. As the PBSAT solver accepts linear
Boolean functions, this condition can be expressed directly in the solver.
This way, one additional function will be added for each group of RuntimeUnits consuming the
same property of the same resource. The resource required by the units is called r, and the
initial capacity of its consumed property is rcap. Each RuntimeUnit of the group consumes a
fixed amount of that property. The consumption function of the group states that the sum of the
amounts consumed by the units whose literals are true must not exceed rcap.
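A sketch of the consumption function follows, with assumed symbols: $ru_1, \dots, ru_k$ are the literals of the consuming units and $c_i$ is the amount consumed by unit $i$.

```latex
\sum_{i=1}^{k} c_i \cdot ru_i \le r_{cap}
```

For instance, with $r_{cap} = 10$ and consumptions $c = (4, 5, 3)$, any two of the three units can coexist (the largest pair adds up to 9), while selecting all three is rejected because $4 + 5 + 3 = 12 > 10$. The exclusive case corresponds to $c_i = 1$ and $r_{cap} = 1$.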
A similar process applies to the NOTEXISTS objectives, which mandate the non-existence of
the selected resource in the runtime configuration. In order to express this objective the same
set of literals will be involved. However, in this case the interpretation of the objective will
not result in another formula; it directly yields a false assignment for the affected literals.
The RuntimeUnit versions of the EXISTS and NOTEXISTS objectives are much simpler to represent
as SAT clauses. The RLit element representing the target of the objective will be evaluated
beforehand as true, in the case of a mandatory existence, or false, in case it should not appear
in the environment.
The first restriction which will be modeled is the control of the maximum number of allowed
instances of each DeploymentUnit over the environment. Depending on the characteristics of
each unit, and the environment defined policies, the specific requirements might change. For
instance, a policy might mandate that each relevant resource should be replicated if possible,
enabling a simple configuration for reacting to individual failures. On the other hand, in many
scenarios the preferred state is to avoid redundancy and allow only one instance of each
element. Those restrictions can be easily enforced by defining additional clauses for each
group of RLit literals. As an example, I will define the case where only one instance of each unit
is allowed in the solution. Consider a DeploymentUnit and the set of defined literals
representing its possible physical instantiations. In order to ensure that at most one of them is
selected, a boolean linear function limiting the number of true literals of the set to one must be
added to the SAT engine.
The second example optimization addresses an important aspect: the problem of minimizing
the amount of changes. Whenever possible, the proposed solution should respect the current
state of the configuration while addressing at the same time the desirability and stability
concerns. However, the set of previously defined functions does not provide any indication
that this would be preferred. The general objective of minimizing the set of changes, although
apparently clear, is actually difficult to represent correctly as part of the satisfiability problem,
because the actual impact of applying each type of change usually differs. For instance, the
impact of removing a RuntimeUnit from the environment is usually much greater than that
of reconfiguring a binding. As the specific details will depend on each domain, I have selected a
simplification of that general objective: to give preference to valid solutions that include the
RuntimeUnits existing in the current environment configuration. This constraint will partially
reduce the number of required changes: whenever a resource provided by a DeploymentUnit is
required for the solution and can be accessed from an existing RuntimeUnit, the existing one
will be used.
This objective will be translated to the PBSAT engine as the optimization function, as it
represents a general goal instead of a strict requirement. The complete optimization formula
will be built by combining a set of terms, each derived from the group of literals that
represent the potential instantiations of one DeploymentUnit. The evaluation of each group
provides one partial function, and the optimization function is defined as the combination of
the partial functions of all the groups.
based on the specific domain heuristics. This way, when minimizing the function, the existing
units will be selected whenever possible, as their literals have a lower weight.
The specific value for the weights on the existing and not existing literals cannot be established
at this point, because it depends on whether there are additional factors that must also be
expressed through the optimization function, in which case a relative ordering among them
should be applied, restricting the values of the weights in the optimization formula.
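The effect of weighting the literals can be illustrated with a small brute-force sketch. The weights 1 and 2, the `required` threshold used as a stand-in for the hard restrictions, and the unit names are assumptions chosen only for the example; as stated above, the real weights depend on the other factors of the optimization function.

```python
from itertools import product

def best_assignment(literals, existing, w_existing=1, w_new=2, required=1):
    """Brute-force the assignment minimising the weighted sum of true literals."""
    best, best_cost = None, None
    for values in product((0, 1), repeat=len(literals)):
        if sum(values) < required:  # hypothetical stand-in for the hard restrictions
            continue
        cost = sum(v * (w_existing if lit in existing else w_new)
                   for lit, v in zip(literals, values))
        if best_cost is None or cost < best_cost:
            best, best_cost = dict(zip(literals, values)), cost
    return best, best_cost

# Three potential instantiations of one unit; only the second one currently exists.
literals = ["du@c1", "du@c2", "du@c3"]
solution, cost = best_assignment(literals, existing={"du@c2"})
```

Since the existing instantiation carries the lowest weight, the minimisation keeps it and leaves the two new candidates at false.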
As initially none of the resources identified by the literals exist in the environment, only the
literals evaluated to true need to be processed. False literals represent potential configurations
which have not been selected, and consequently will not require additional operations. As
regards positive literals, for each positive RLit a new RuntimeResource will be created over an
environment Container, through an ADDCRES(cont, crc) operation.
BLit literals, false ones don’t need to be analyzed for derived operations; they either belong to
a RuntimeUnit which will not exist in the final state, or represent an alternative option for a
Binding which was not selected.
A BLit literal evaluated to true implies that, in the final configuration, the real binding will be
configured to the designated RuntimeUnit. If the dependant RuntimeUnit did not exist in the
initial configuration, or it did exist but its Binding was pointing to a different unit, it will
need to be configured through a CNFBIND(ru,bindId,rub) change. As
regards the execution order of the changes, Binding-related operations will be applied after all the
INSTALLDU changes have been executed, in order to avoid configuring Bindings to
nonexistent Resources.
In addition to the base Binding configuration, additional changes will be required if the
DeploymentUnit definition contains one or more BoundProperties dependent on the Binding. In
those cases, the required value for the properties will be obtained and configured through a
CFDUPROP(cont,ru,props) change. Those changes will be applied after the automated context-
based Property configuration changes have been applied. An additional factor must be taken
into account when executing those changes: the possibility that the value of a BoundProperty
could be transitively dependant, so that the base configuration value does not originate from
the bound Resource but from a third one instead. This is the case with resources acting either
as proxy (apparently exact copy) or façade (adapted image) of other resources. These
intermediation elements are bound to the original Resource, so that other Resources bound to
the proxy or façade will have their configuration equal to the one of the original resource.
Because of that, it is necessary to apply the configuration changes for BoundProperties
transitively, ensuring that the original value is transferred over the whole Binding chain.
If Binding-related changes need to be applied to an initially existing RuntimeUnit, a
STOPDU(ru) operation will be executed before them, in order to avoid hot configuration of
the unit. Finally, after the configuration is complete, the RuntimeUnit will be activated again
through a STARTDU(ru) change.
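The derivation of Binding-related changes described above can be sketched as follows. The tuple representation of the operations and the function name are assumptions, and BoundProperty changes are left out for brevity. Only initially existing units are wrapped with STOPDU and STARTDU, as the activation of newly installed units is not modelled here.

```python
def binding_changes(true_blits, initial_bindings, initial_units):
    """Derive ordered change operations from the binding literals evaluated to true.

    true_blits: (ru, bind_id, target) triples selected by the solver.
    initial_bindings: {(ru, bind_id): target} for the initial configuration.
    """
    by_unit = {}
    for ru, bind_id, target in true_blits:
        # A change is needed when the unit is new or its Binding pointed elsewhere.
        if initial_bindings.get((ru, bind_id)) != target:
            by_unit.setdefault(ru, []).append(("CNFBIND", ru, bind_id, target))
    changes = []
    for ru, ops in by_unit.items():
        if ru in initial_units:
            changes.append(("STOPDU", ru))   # avoid hot configuration of the unit
        changes.extend(ops)
        if ru in initial_units:
            changes.append(("STARTDU", ru))  # reactivate once configured
    return changes

# Hypothetical result: "web" existed and is rebound, "app" is a new unit.
changes = binding_changes(
    true_blits=[("web", "data", "db2"), ("app", "log", "logger")],
    initial_bindings={("web", "data"): "db1"},
    initial_units={"web"},
)
```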
The final part of the algorithm consists of obtaining the internal changes required to reach the
configuration proposed by the SAT result. In this process, the translation of boolean values into
the intended state of the Configuration has been provided, as well as a simple strategy to
obtain the set of required changes from the values assigned to each type of literal.
[Figure: example environment with Node I (Containers I.1 and I.2; units A, B and C), Containers II.1, III.1 and III.2, and units D and F]
The structural impact of changes affecting the host resources will extend to those in the
hierarchical structure. For instance, if Node I disappears, this would imply that Containers I.1
and I.2, as well as units A, B, and C, will be affected.
As regards Binding-derived impact, there are three possible sets that can be identified starting from a
unit. The most important set is the one composed by the transitively dependant units (those
bound directly or indirectly to the affected unit). Clearly, the stability of all those elements can
potentially be affected by any change to the initial unit. In the provided example, if the unit C
was affected, the complete set of dependant units (including the source one) would be:
The second set is composed by the satisfying units, which are transitively bound to the unit
Bindings. Although initially less critical, this set can also be relevant to analyze as those
elements were providing stability to a RuntimeUnit which might not be present after the
change. This may imply that these elements are no longer fulfilling a function in the system
and thus could possibly be removed. In the previous example, the satisfiers of C would be the
following set:
Finally, the complete graph of Binding-related units (obtained by transitively applying both
relationships) allows partitioning the RuntimeUnits into closed sets that collaborate to
provide their functionality, thus giving a useful global view to the manager. Starting from C,
the related units would be:
This example shows that the set of related elements cannot be obtained just by the union of
the previous two ones, as the transitive nature of the relationships makes its calculation more
complex. Only units F and G are in no way related to C with the information handled by the
management architecture.
For any of the three categories, the aggregated impact of changes to several RuntimeUnits
equals the union of their individual impact sets. This way, in the most general case, the total
impact of a change to a Node or Container can be calculated by first obtaining the set of
affected hierarchical resources, then obtaining the Binding-related impact set of each affected
runtime unit, and finally applying a union to all those sets.
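The three impact sets can be computed with a plain graph traversal, as in the following sketch. The binding graph used here is hypothetical and does not reproduce the one of the example figure; each pair is written as (dependant unit, bound unit).

```python
def closure(start, neighbours):
    """Transitive closure of `neighbours` from `start`, including the start unit."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbours(stack.pop()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# Hypothetical Bindings: A is bound to C, B to A, C to D and E to D.
bindings = [("A", "C"), ("B", "A"), ("C", "D"), ("E", "D")]

def bound_to(u):      # units directly depending on u
    return [s for s, d in bindings if d == u]

def bound_from(u):    # units that u directly depends on
    return [d for s, d in bindings if s == u]

def dependants(u):
    return closure(u, bound_to)

def satisfiers(u):
    return closure(u, bound_from)

def related(u):
    return closure(u, lambda x: bound_to(x) + bound_from(x))

# Aggregated impact of several units is the union of their sets.
aggregated = dependants("C") | dependants("D")
```

In this graph unit E belongs to the related set of C without being either a dependant or a satisfier of C, which shows why the related set cannot be obtained as the union of the other two.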
Once the characteristics of the problem have been described, I will provide the definitions
required to handle it correctly with a PBSAT solver. I will only cover the reasoning over unit
Bindings, as the hierarchical impact is immediately obtained. The following terminology will
be used: the runtime environment contains the set of runtime units Units = {RUi}. The units
are related through the set of Bindings defined among them.
Only those sets need to be analyzed to define the SAT problem.
In order to convert the problem to a SAT function I will define one literal representing each
element from Units. The logical evaluation of these variables will dictate whether that
RuntimeUnit has been impacted by the source one (true value) or not (false value). Clearly, the
same set of variables will be used for obtaining any of the three impact graphs described
previously.
After defining the literals and the interpretation of their values, I will define the objective
function that will be optimized by the pseudo-Boolean solver. As only the affected units must
be evaluated to true, the defined function will try to minimize the number of positive runtime
units. This is also the case for the three different graphs to be calculated. This way, the
objective function to be minimized is the sum of the literals of all the units.
After those initial definitions have been established, different clauses will be defined
depending on the type of impact graph to be obtained. In the case of the dependant graph, for
each defined Binding, an implication will be added from the destination to the source of the
Binding.
For the satisfier units it is necessary to traverse the Bindings the reverse way, so the opposite
implication will be defined for each Binding, from its source to its destination.
Finally, for obtaining the complete closure of binding-related elements, both relationships
must be transitively applied. This way, both implications must be defined for each Binding,
which amounts to a double implication between its source and its destination.
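The objective function and the three families of clauses can be summarised as follows. The notation is assumed for illustration: $u_i$ is the literal of $RU_i$, and each Binding is a pair $(s, d)$ whose source $s$ is the dependant unit and whose destination $d$ is the bound unit.

```latex
\begin{align}
\text{objective:}\quad & \min \sum_{i} u_i \\
\text{dependant graph:}\quad & u_d \rightarrow u_s \\
\text{satisfier graph:}\quad & u_s \rightarrow u_d \\
\text{related graph:}\quad & u_s \leftrightarrow u_d
\end{align}
```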
Up to this point, all the necessary restrictions have been established. However, one last
statement must be provided to the solver in order to obtain a meaningful solution, as the current
set of definitions will always evaluate every literal as false. This is the case because none of the
aforementioned concepts specify which unit or units are the source of the impact analysis. This
can be expressed simply by evaluating to true the literals representing the directly affected
elements. One of the main advantages of adopting a SAT engine for addressing these
calculations is that, because of its declarative nature, the impact from multiple originating
elements can be evaluated simultaneously just by setting each corresponding literal to true.
This way, in order to select the units to be evaluated, the literal of every source unit is
assigned the value true.
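The complete impact formulation can be tried out with a brute-force search that stands in for the PBSAT solver. This is only a sketch for very small environments: the `impact` function, the `mode` names and the binding graph are assumptions, and a real solver would replace the exhaustive enumeration.

```python
from itertools import product

def impact(units, bindings, sources, mode):
    """Minimise the number of true literals subject to the impact clauses."""
    clauses = []  # each pair (a, b) encodes the implication a -> b
    for s, d in bindings:  # s is the dependant unit, d the bound unit
        if mode in ("dependants", "related"):
            clauses.append((d, s))
        if mode in ("satisfiers", "related"):
            clauses.append((s, d))
    best = None
    for values in product((False, True), repeat=len(units)):
        val = dict(zip(units, values))
        if not all(val[u] for u in sources):  # source literals are fixed to true
            continue
        if any(val[a] and not val[b] for a, b in clauses):
            continue
        chosen = {u for u in units if val[u]}
        if best is None or len(chosen) < len(best):
            best = chosen
    return best

# Hypothetical environment: F takes part in no Binding.
units = ["A", "B", "C", "D", "E", "F"]
bindings = [("A", "C"), ("B", "A"), ("C", "D"), ("E", "D")]
dep = impact(units, bindings, {"C"}, "dependants")
sat = impact(units, bindings, {"C"}, "satisfiers")
rel = impact(units, bindings, {"C"}, "related")
multi = impact(units, bindings, {"C", "F"}, "dependants")
```

The last call evaluates two originating units at once simply by fixing both literals to true.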
5.5. Conclusions
Over this chapter I have provided a complete characterization for service management
purposes of the two main affected entities. First, the management Domain has been defined,
consisting of the combination of a set of logical resource definitions, contained in the LRB, the
currently available runtime resources, which constitute the runtime Configuration, and the
established functional objectives for the Domain. All the relevant information is defined over
the base concepts introduced at the previous chapter, which enables an automatic reasoning
over the managed environment. This way, a function has been defined which determines
whether the state of the domain is correct or not, taking into account both the functional
objectives and the stability conditions.
The role of the management system has also been characterized as a corrective agent that can
invoke internal changes to restore the correctness of the domain state, after it has
experienced external modifications. In this model the set of allowed operations for the service
management system has been defined, and a complete algorithm has been proposed which
automatically analyzes the domain state, finds a reachable correct configuration and proposes
the set of required changes to alter the domain and obtain the selected state. The proposed
solution is based on a problem resolution technique of proven maturity, which is adopted in
areas such as logic circuit design or dependency management of complex software
distributions.
6. Reference Architecture
The previous chapters have detailed an analysis of the complexity of automating the change
and configuration operations of enterprise services. In order to cope with those problems,
initially the characteristics of general management systems and the enterprise services were
analyzed, in order to obtain a model abstraction of the relevant information. This way, the
terms of the problem are clearly described, using a common set of definitions for the managed
assets and the business objectives. On top of them, the function of the management system
was described, and an algorithm was proposed to automatically solve its main activity, the
identification of service changes. However, no guidelines have been provided to implement those
concepts in a specific scenario.
The objective of this section is to propose a reference architecture for an enterprise service
change management system. The architecture will be based on the models and reasoning
processes detailed in the previous sections. The description will only focus on supporting the
functional characteristics of the system, abstracting over aspects such as security, reliability or
auditability, which are common non-functional requirements of enterprise systems. This has
been decided in order to provide a more focused view of the architecture.
Before starting the architecture description, I will select a view model from the literature
which enables a clear description of the architecture of the service change management
system. This way, I will first analyze the best known model in the literature: the 4+1 view
model established by Kruchten [64].
This model consists of four base views which focus on different aspects of the architecture.
The Logical view describes the functionality provided by the system, through the use of UML
class and/or sequence diagrams. The Development view provides a logical representation of
the architecture, detailing its internal structure in development packages. The Process view
focuses on the runtime behavior of the system, reflecting how the different elements interact
and communicate. Finally, the Physical view covers the physical deployment aspects of the
proposed architecture. These four aspects of the system are complemented by the Scenario
view, which illustrates the architecture description through a set of use cases or scenarios,
putting the other four views into context. The model allows a certain degree of flexibility,
conceding that, in specific cases, some of the views might not be necessary as the remaining
set comprises all the important details.
The 4+1 view model is the most widely accepted among the proposals in the literature.
However, it is intentionally a generic model for describing the architecture of any system,
abstracting the specific characteristics of each domain. Because of that, there is room for
additional view models that provide a more specialized focus. In the management domain, the
most accepted model for describing management systems is the OSI management framework
for integrated management of networked systems [60]. Although the OSI management
architecture has not become the dominant solution, the high level concepts created for its
description have proven to be flexible enough to be adopted to describe any management
architecture [42]. This description model documents a management architecture by detailing
the following four basic sub models:
Functional model: Defines the management function areas supported by the architecture.
This includes the detail of the functions supported by the architecture, as well as how they
are achieved through the collaboration of the different components.
As described in the introduction to the state of the art, management can be understood at
different levels of abstraction, ranging from business processes to technical configuration
activities. Because of that, in addition to the OSI partition, the abstraction level of the views
should also be specified. In [97] three reference perspectives are proposed, process view,
system specification and technology model, going from the most abstract to the most concrete
one. The process view focuses on the high-level processes (similar in abstraction to the ITIL
Service Management and Service Delivery process views). The system specification perspective
details any required information to design a management system, except for the aspects which
are platform specific, covered by the technology perspective. This presents some similarities
with the PIM (Platform-Independent Model) - PSM (Platform-Specific Model) characteristic of
MDA processes. Clearly, in the context of this dissertation, the selected perspective will focus
on the system specification. The combination of OSI sub-models and description perspectives
is presented at Figure 53.
Although 4+1 and OSI might initially seem quite different, an analysis of the information
provided by each view/sub-model reveals clear mappings which can be established
between both models, considering that one is generic and the other focused on the specifics of
management. The logical view completely characterizes the functionality of the system. This
information must be contained in any architecture description, as is the case of the OSI model.
Two sub-models of OSI are roughly equivalent to the logical description: the information
model describes the data which is being handled by the architecture, and the functional model
details the possible operations. The process view is represented by the communication model,
which in this case is specialized to focus on the main communication mechanisms between the
main components of the architecture and the agent infrastructure. Likewise, the physical view
corresponds to the organization model information, which details the multiplicity and distribution
of the instrumentation infrastructure and the main management functions. The development
view is not present at OSI model, although that information can be covered in the functional
model. Finally, the scenarios are missing from the OSI model, which does not define any
mechanism for linking the different information views.
After analyzing and comparing both architecture description models I have selected a mixed
approach for detailing the management architecture. As a base reference I will adopt the OSI
model, with the addition of the +1 scenario view to better link those concepts. OSI has been
selected as the base model because it adapts better to the management concerns, and at the
same time reflects the key aspects covered in the 4 views. Development view information will
not be provided, as it is sufficiently covered by the information provided in the other
models. The scenario view will also be adopted in this description to clearly reflect
how those factors come into play.
Once the description model has been selected, the following sections will describe each
sub-model, with the scenarios providing the final link.
[Figure: the Information Model, composed of the Software metamodel, the Runtime metamodel and the Objectives]
The software model provides information about the logical elements which can appear
instantiated at the managed environment. Model details are provided in Section 4.5.1, Service
Deployment and Configuration Model. The basic elements of the metamodel are resources,
which represent software services, libraries and additional assets. Deployment units aggregate
those logical resources and represent the actual elements which are provisioned to the
environment, thus becoming the focus of the deployment operations. The model allows
expressing both the provided resources for each Deployment Unit and the restrictions on the
rest of the environment for them to be instantiated correctly. The elements of the software
model are represented at Figure 37.
The environment model allows characterizing the runtime execution environment where the
services are deployed. It is based on the common abstractions defined by the most relevant
standards in enterprise modeling (such as WSDM, CIM and D&C), while at the same time
extending the resource semantics so that there can be links between the runtime elements
and the logical definitions. This way, it is flexible enough to support the modeling of
environments with very different topologies and characteristics. It must be noted that,
although there is no explicit dependency between the two metamodels, the elements from the
logical model appear instantiated at runtime, and at the same time the stability of the
structure of the environment model is restricted by the constraints and dependencies defined
in the software model. The elements of the runtime model are described in detail in
Section 4.5.3, and its main elements are depicted in Figure 41.
The Objectives model captures the established definitions of what the environment as a whole
must do in order to fulfill its intended purpose. As was the case with the other
models, objectives are also expressed over the base concept of resources. The different types of
objectives are described in Section 5.1.1.
Those three models enclose the main management concepts that must be taken into account
for managing the service configuration changes over a distributed environment. Although
additional, specialized models might be used by some of the architecture subsystems, they will
be derived from the information contained in those three.
described. However, there is a gap between the core management systems, which process
those generic elements, and the low-level, specific information of the environment. This
mismatch will be addressed by a third element which acts as a bridge between them: the
agents constituting the instrumentation infrastructure. Their role is twofold. On the one hand, they must
capture all the relevant information from the managed runtime elements and convert it to the
runtime model, enabling its processing by the core management systems. On the other hand,
they must be able to receive the changes identified by the main systems and apply them by
invoking the specific management interfaces of the concrete runtime elements.
The described relationships link the whole environment with the management functions.
However, in principle that need not necessarily be the case, as an environment is not a
monolithic element but a set of nodes with computing power. Each node provides multiple
resources and containers, which in turn contain runtime units and services. In
principle, it seems there could be specific managers for each node of the environment, each of
them reasoning only over a limited subset of the information. However, this is not possible, as
distributed dependencies span the whole environment, ignoring the containment
hierarchy, and objectives are defined and must be maintained over the complete set of
resources that constitute the environment. Fortunately, the existence of multiple, specialized
environments instead of a single one per organization allows this centralized model to be
adopted without scalability concerns about the size of the organization. The following figure
shows how the management architecture will be organized with respect to the managed
environment.
[Figure: Organizational Model. One Core Management System manages many (1 to *) Runtime Environments. The Environment Instrumentation Agents monitor and operate the environment; they provide runtime information to the core, which invokes change operations on them.]
The organizational model has identified three main elements (core, agents, environment), and
consequently three potential communication channels, as shown in Figure 56. Thus, the
first step to describe the communication model will consist of analyzing the nature of these
three connections. First, from the previous description it becomes clear that the core does not
directly communicate with the environment, as every operation is executed through the
instrumentation agents, which act as an adaptation layer between the model-based core and
the real runtime elements. On the other hand, the core-to-agents communication channel
plays a fundamental role, and must be thoroughly covered in this view. Finally, although
there is undeniably a great deal of communication between the agents and the environment, it
cannot be detailed in this model, as the communication details will be different depending on
the technology of the specific managed elements. Moreover, as the architecture must be able
to adapt to multiple heterogeneous systems, it is infeasible to know and describe
them beforehand as part of the architecture description.
In addition to the inter-subsystem communications just discussed, these entities also present
complex internal communications requiring specific mention and detail in this view. This is the
case with the instrumentation agents, required to work under a high degree of uncertainty
about the actual managed environment. In order to cope with the fact that the topology and
characteristics of the environment are unknown at design time, the agent instrumentation
infrastructure has been broken down into a hierarchy of specialized agents that collaborate
and communicate to mediate between the abstract concepts and the low-level information
and operations. Therefore, the internal organization and communication of these agents will
also be covered as part of the communications model description.
The different facets of the communication model will be described in the following order.
First, the communication between the core and the instrumentation will be covered,
detailing how both monitoring and operation are supported. Once these aspects have been
sufficiently covered, the internal details of the inter-agent communication will be explained.
protocol will not be selected in this architecture description. However, this view will address
the characterization of the exchanged information and the communication patterns between
these entities.
an active service, it will be composed of at least an install activity followed by a start activity.
All those factors have been captured in the definition of the change plan model. This model
allows defining these complex changes and provides a simple mechanism for expressing the
set of required activities, with all the inter-dependency information that ensures they are
executed correctly. The next figure shows the elements of the change plan model.
[Figure: Change plan model. A ChangePlan (name: string, env: Environment) contains 1..* activities. Each Activity has a target Resource and 0..* dependencies on other activities. Activity types: DeploymentActivityType (INSTALL_DEPLOYMENT_UNIT, UPDATE_DEPLOYMENT_UNIT, UNINSTALL_DEPLOYMENT_UNIT, START_DEPLOYMENT_UNIT, STOP_DEPLOYMENT_UNIT); ResourceActivityType (ADD_CONTAINER_RESOURCE, REMOVE_CONTAINER_RESOURCE); ConfigurationActivityType (CONFIG_CONTAINER_RESOURCE, CONFIG_UNIT_PROPERTIES, CONFIG_UNIT_BINDING).]
case with the previous set of changes, the target of those activities is the Container where the
resource will be created or removed.
Finally, ConfigurationActivities modify the configuration of the existing resources in the
environment. Their input parameter is a set of Properties that will be applied to the selected
resource. Depending on the type of activity, the targeted resource will either be a Container
Resource (modifying its properties), or a RuntimeUnit, configuring either its properties or one
of its Bindings.
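The change plan elements named in this section can be pictured with a small data structure. The following Python sketch is only illustrative: the type and enumeration names follow the change plan model, whereas the `id` and `properties` fields, the use of plain strings for resources and environments, and the example values are assumptions made for the sake of a self-contained example.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Dict, List, Union


class DeploymentActivityType(Enum):
    INSTALL_DEPLOYMENT_UNIT = auto()
    UPDATE_DEPLOYMENT_UNIT = auto()
    UNINSTALL_DEPLOYMENT_UNIT = auto()
    START_DEPLOYMENT_UNIT = auto()
    STOP_DEPLOYMENT_UNIT = auto()


class ResourceActivityType(Enum):
    ADD_CONTAINER_RESOURCE = auto()
    REMOVE_CONTAINER_RESOURCE = auto()


class ConfigurationActivityType(Enum):
    CONFIG_CONTAINER_RESOURCE = auto()
    CONFIG_UNIT_PROPERTIES = auto()
    CONFIG_UNIT_BINDING = auto()


ActivityType = Union[
    DeploymentActivityType, ResourceActivityType, ConfigurationActivityType
]


@dataclass
class Activity:
    id: str                    # hypothetical identifier, not part of the model
    type: ActivityType
    target: str                # target Resource, reduced to a name here
    properties: Dict[str, str] = field(default_factory=dict)  # assumed field
    dependencies: List["Activity"] = field(default_factory=list)  # 0..*


@dataclass
class ChangePlan:
    name: str
    env: str                   # target Environment, reduced to a name here
    activities: List[Activity]

    def __post_init__(self) -> None:
        # The model requires at least one activity (multiplicity 1..*).
        if not self.activities:
            raise ValueError("a change plan needs at least one activity")


# Making a service active: an install activity followed by a start activity.
install = Activity("a1", DeploymentActivityType.INSTALL_DEPLOYMENT_UNIT,
                   "transfer-service")
start = Activity("a2", DeploymentActivityType.START_DEPLOYMENT_UNIT,
                 "transfer-service", dependencies=[install])
plan = ChangePlan("activate-transfer-service", "pre-production",
                  [install, start])
```

In this sketch the dependency from `start` to `install` is the only ordering information carried by the plan; nothing else fixes the execution order.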
Up to this point the contents of the plan have been described, but the mechanism to express
the restrictions on their execution must still be provided. Before detailing the selected
approach I will provide a brief overview of the alternatives found in the literature, which were
previously discussed in the state of the art section.
The most classical approach for restricting the execution of multiple activities is
temporal planning [4], in which each activity is scheduled for execution at a specific time
interval. Time ordering techniques require knowing with a high degree of confidence the
estimated time to complete each activity. Several techniques have been proposed for
addressing this problem, including the use of SAT solvers [15]. However, enterprise
environments have a large degree of uncertainty that complicates this process: the diversity in
types of containers (and vendor implementations of the same standards), the variable load of
the systems and the differences in the actual hardware resources render this approach
infeasible. That type of solution is adequate for homogeneous, predictable environments, but
cannot be adopted in this case.
Other approaches model the graph as an executable process, composed of a set of plan
activities with several dependencies. In order to express them, a possible alternative is
adopting Gantt-like semantics for defining the types of dependencies between primitives: FS
(Finish-to-Start), SF (Start-to-Finish), FF (Finish-to-Finish) and SS (Start-to-Start), as described in
CHAMPS [56]. Although with less expressivity, these concepts are already present in the
deployment and configuration of enterprise production systems. Manually defined changes
are applied through configuration scripts that support simple dependency expression
mechanisms, such as Ant target dependencies [70]. In this model, each target (representing a
single operation) declares which other targets from the script must be successfully executed
before it is invoked. Ant dependencies are semantically equivalent to FS Gantt dependencies:
an activity cannot start until the other one has finished. It must be noted that neither mechanism defines
the exact execution order of the activities, as non-dependent ones can be correctly executed in
any order, or possibly in parallel to improve performance.
The difference between these two approaches lies in the required expressivity for the
dependency definition, as Gantt-style semantics allow expressing small optimizations in the
simultaneous execution of two activities. Depending on the nature of the operations, the
simple script dependency model can be enough, or the more expressive Gantt semantics can
be used instead. In this case the simpler approach has been selected, as those additional
semantics were not applicable to the constraints that must be reflected between the activities.
This way, in the model, each activity can express any number of dependencies on other
activities. At execution time, an activity cannot be executed until all the activities it depends on
have finished executing correctly. This mechanism ensures a correct execution order of the
activities, while at the same time supporting different interpretations. Activities can be
sequentially ordered and executed, or could be split into several independent subprocesses
and executed in parallel for optimized performance.
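Both interpretations of the dependency model can be illustrated with a topological sort. The sketch below uses the `graphlib` module from the Python standard library (Python 3.9 or later); the activity names and the shape of the `dependencies` mapping are assumptions chosen for the example, and the code is not the ordering mechanism of the prototype.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def sequential_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return one valid sequential order (every activity after its dependencies).

    Raises CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


def parallel_batches(dependencies: Dict[str, Set[str]]) -> List[List[str]]:
    """Group activities into batches whose members could run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # also raises CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all dependencies already finished
        batches.append(ready)
        sorter.done(*ready)                 # mark the whole batch as finished
    return batches


# Each activity maps to the set of activities it depends on (FS semantics).
dependencies = {
    "install_A": set(),
    "install_B": set(),
    "start_B": {"install_B"},
    "start_A": {"install_A", "start_B"},
}
```

With this input, `parallel_batches(dependencies)` groups the two install activities in a first batch, followed by `start_B` and then `start_A`, while `sequential_order` returns one of the several valid linearizations.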
Once the change plan model has been completely characterized, the remaining details about
internal change communications can be easily described. The core management system
defines a set of changes in the form of a plan, which is delivered to the agent infrastructure for
it to be executed on the specific environment. Once its execution finishes, the agents must
also provide a report on the execution, informing about the successful execution of the activities
or the detected problems.
agent. This mechanism simplifies the provision of the agents to the managed environments and
reduces operation and maintenance costs, as it automatically adapts to runtime
changes. However, in some restricted environments a completely automated discovery is not
possible, because of the use of internal firewalls that filter the broadcast messages. In order
to support those scenarios, the Environment Manager also supports manual configuration. The
Environment Manager monitors the discovered Node Managers. In case the
communication with one of them is lost and is not restored after a number of retries,
the node information is removed from the general model and the core management system
is notified.
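The monitoring behavior just described can be sketched as follows. This is a hypothetical illustration: the class and method names, the `ping` and `notify_core` callables, and the default of three attempts are assumptions, since the text does not fix the number of retries or the interfaces involved.

```python
from typing import Callable, Dict, List


class EnvironmentManager:
    """Keeps the node information and drops nodes that stop responding."""

    def __init__(self, ping: Callable[[str], bool],
                 notify_core: Callable[[str], None],
                 max_attempts: int = 3) -> None:
        self.ping = ping                  # assumed: True if the Node Manager answers
        self.notify_core = notify_core    # assumed: callback to the core system
        self.max_attempts = max_attempts  # attempts before a node is declared lost
        self.nodes: Dict[str, dict] = {}  # node id -> node information

    def register_node(self, node_id: str, info: dict) -> None:
        # Called after automatic discovery or manual configuration.
        self.nodes[node_id] = info

    def synchronize(self) -> List[str]:
        """Contact every known node; remove and report the unreachable ones."""
        lost = []
        for node_id in list(self.nodes):
            # any() stops at the first successful attempt.
            reachable = any(self.ping(node_id)
                            for _ in range(self.max_attempts))
            if not reachable:
                del self.nodes[node_id]
                self.notify_core(node_id)
                lost.append(node_id)
        return lost
```

A real agent would also wait between attempts; the delay is omitted here to keep the sketch short.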
[Figure: Communication Model. Exchanged information: Change Plan, Change Notification, Environment. Instrumentation Agents shown: Environment Manager, Discovery Service, Node Manager, Resource and Container Gatherers, Plan Executor and Actuators, all operating over the Runtime Environment.]
Regarding the change execution agents, a simpler structure can be adopted, as the
environment topology is already known thanks to the instrumentation agents.
Every activity is executed either over a Container or over a RuntimeUnit
running over a Container, which makes the targets homogeneous for change execution purposes.
With these factors in mind, the following agents have been defined. The Plan Executor is the
main element. It receives change plans, and dispatches the plan activities to different Actuator
agents, which translate the generic operation into protocol-specific operations on the targeted
Container. The Plan Executor performs the matching between activities and specific Actuators
through a simple mechanism: the Actuators register with the Plan Executor, declaring which
container types and operations they support. This way, the Executor can obtain compatible
Actuators just by looking at the registry information. It must also be mentioned that the
change execution agents do not necessarily have to be bound to one specific environment.
First, change execution agents do not need to store state information between one execution
and the next. Secondly, the technology-specific management interfaces allow for remote
invocation of the operations, decoupling the Actuators from the Environment topology. The
combination of these two factors allows reusing a single instance of both the Plan Executor
and the Actuators for specific technologies among all the managed environments.
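The registration mechanism between Actuators and the Plan Executor can be illustrated with a small registry keyed by container type and operation. The sketch is a simplification under stated assumptions: the class name, the representation of an Actuator as a plain callable, and the container type labels (`"osgi"`, `"javaee"`) are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Assumed shape of an Actuator: it receives the operation name and the target
# and returns a textual result.
Actuator = Callable[[str, str], str]


class ActuatorRegistry:
    """Registry kept by the Plan Executor to match activities with Actuators."""

    def __init__(self) -> None:
        self._registry: Dict[Tuple[str, str], List[Actuator]] = {}

    def register(self, actuator: Actuator, container_types: Iterable[str],
                 operations: Iterable[str]) -> None:
        # An Actuator declares every (container type, operation) it supports.
        operations = list(operations)
        for container_type in container_types:
            for operation in operations:
                key = (container_type, operation)
                self._registry.setdefault(key, []).append(actuator)

    def find_compatible(self, container_type: str,
                        operation: str) -> List[Actuator]:
        # Matching only needs the registry information, not the Actuators.
        return list(self._registry.get((container_type, operation), []))


def osgi_actuator(operation: str, target: str) -> str:
    # Placeholder for a protocol-specific invocation on the container.
    return f"{operation} applied to {target} through the OSGi actuator"


registry = ActuatorRegistry()
registry.register(osgi_actuator, ["osgi"],
                  ["INSTALL_DEPLOYMENT_UNIT", "START_DEPLOYMENT_UNIT"])
```

Because the registry holds no environment state, the same instance could serve several managed environments, in line with the reuse argument made above.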
The general view of the communications model is shown in Figure 58, which describes the
communications between the three main elements, as well as the internal detail of the
exchanged information between the different members of the instrumentation agents. Both
monitoring and change execution communications are described.
- 135 -
Optimization Objectives Environment
Policy apply to
PhD Dissertation
creates
Universidad Politécnica de Madrid
- 136 -
orders
Managed
accesses CONTROL LOOP
Environment
accesses
notifies
invokes
stores
executes
creates
Departamento de Ingeniería de Sistemas Telemáticos
Félix Cuadrado
Configurable
Container Deployment Unit Change Plan
Resource
PhD Dissertation Félix Cuadrado
Universidad Politécnica de Madrid Departamento de Ingeniería de Sistemas Telemáticos
The Unit Repository is the component which centralizes the logical information known as the
LRB in the previous chapters. Its main function consists of storing the definitions of all the
DeploymentUnits, logical Resources, and ContainerResourceConfigurations. This information is
made available to the rest of the core components through access interfaces. The repository
allows contributing, modifying or removing logical definitions through a
CRUD (Create-Read-Update-Delete) interface. This access interface is used by the
organization’s development infrastructure, so that newly developed DeploymentUnits can be
automatically registered at the repository as the final step of the development cycle. This way,
both systems of the internal organization infrastructure are connected. External repositories
from service providers can also be connected to the repository, as long as they provide logical
descriptions for the repository elements. Logical descriptions are persistently stored by the
repository, while physical elements are managed by traditional artifact repositories.
The repository provides one additional function which is invoked during the different
repository update operations previously described: internal repository stability checking. This
operation analyzes the set of units defined at the repository, and checks that every logical
dependency of the internal elements can be satisfied by at least one element from the remaining set.
Broken dependencies would cause the affected units to be unstable, thus rendering them
unusable for any change operation. Because of that, newly uploaded elements are
checked beforehand, so that the overall stability of the repository is preserved after the change.
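A minimal sketch of such a stability check is shown below. It assumes a strongly simplified unit description in which dependencies are satisfied by matching resource names; the actual model also involves richer constraints that are not reproduced here. The class, function and unit names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Set


@dataclass(frozen=True)
class UnitDefinition:
    name: str
    provides: FrozenSet[str] = frozenset()  # logical resources offered
    requires: FrozenSet[str] = frozenset()  # logical resources needed


def broken_dependencies(units: Iterable[UnitDefinition]) -> Dict[str, Set[str]]:
    """Map each unstable unit to the requirements no other unit satisfies."""
    units = list(units)
    broken = {}
    for unit in units:
        # Resources offered by the remaining units of the repository.
        available = set()
        for other in units:
            if other is not unit:
                available |= other.provides
        missing = set(unit.requires) - available
        if missing:
            broken[unit.name] = missing
    return broken


def can_add(repository: Iterable[UnitDefinition],
            candidate: UnitDefinition) -> bool:
    """Check beforehand whether an upload keeps the repository stable."""
    return not broken_dependencies(list(repository) + [candidate])


account = UnitDefinition("account-service",
                         provides=frozenset({"AccountService"}))
transfer = UnitDefinition("transfer-service",
                          provides=frozenset({"TransferService"}),
                          requires=frozenset({"AccountService"}))
frontend = UnitDefinition("web-frontend",
                          requires=frozenset({"TransferService",
                                              "AuditService"}))
```

In this example `can_add([account, transfer], frontend)` is false, because no unit in the repository provides the assumed `AuditService` resource. The same function can be applied to the set that would remain after a removal.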
The Environment Configuration Repository is the component where the environment model
information is stored. This block retrieves the runtime information by communicating with the
Environment Manager agents (multiple ones, as the architecture manages multiple
environments, and each one is monitored by its own manager). As mentioned in the
previous section, the information can either be requested by the Repository or
provided to it through a publish-subscribe mechanism. Information for each environment is
persistently stored: not only the latest snapshot, but the complete evolution of the
environment over time. Finally, it provides an access interface for the rest of the core services to browse,
query and retrieve the runtime information.
The Objectives and Policy Repository (OPR) captures all the high-level knowledge governing the
service configuration change processes, in a way that it can be processed by the remaining
functional blocks. The stored information specifies the role of each runtime environment (the
management objectives) and how it should fulfill that role (optimization policies). As the objectives are
specific to each environment, they will be not only defined but also matched in the OPR with
the environments where they are operational. On the other hand, optimization policies, which
influence the change identification process, can either be specific to one environment or
global. The component persistently stores both types of information, provides manipulation
interfaces for their modification, and access interfaces for consulting them from the rest of the
architecture modules.
The Change Impact Analysis component controls the actions of the management architecture.
Its main function consists of evaluating the severity of the external changes occurring at the
managed domain. In order to do so, this component receives notifications from the three
repository components, each time a change in the defined deployment units, the runtime
information or the defined objectives occurs. These changes are analyzed, estimating their
criticality. If, as a result of the change, the managed environment is no longer in a correct state,
the CIA will order the Change Definition component to prepare a change plan that restores the
domain to its intended functionality. This component enables an autonomic management of
the domain. However, it must also be designed to support enterprise domains with restrictive
policies disallowing the completely automated execution of internal changes. Those cases are
supported by the definition of decision points where human validation can be requested after
the impact analysis has been completed.
In addition to the automated analysis of external changes, this component also provides runtime
impact estimation tools, which are used to aid the decision of whether to apply changes that
affect the environment. This function, starting from a RuntimeUnit which will be modified,
obtains the Binding graph of units potentially affected by the change. An algorithm for
automatically obtaining these graphs was described in Section 5.4.6.
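As a rough illustration of this kind of impact estimation, the sketch below collects every unit that is directly or transitively bound to a modified unit. It is a plain breadth-first traversal over reversed bindings, offered as an assumption-laden stand-in and not as the algorithm of Section 5.4.6; the unit names and the shape of the `bindings` mapping are hypothetical.

```python
from collections import deque
from typing import Dict, Set


def affected_units(bindings: Dict[str, Set[str]], modified: str) -> Set[str]:
    """Return the units that directly or transitively depend on `modified`.

    `bindings` maps each unit to the units it is bound to (its providers).
    """
    # Reverse the bindings: provider -> units that consume it.
    consumers: Dict[str, Set[str]] = {}
    for consumer, providers in bindings.items():
        for provider in providers:
            consumers.setdefault(provider, set()).add(consumer)

    affected: Set[str] = set()
    queue = deque([modified])
    while queue:
        current = queue.popleft()
        for consumer in consumers.get(current, ()):
            # Skip already visited units so that binding cycles terminate.
            if consumer != modified and consumer not in affected:
                affected.add(consumer)
                queue.append(consumer)
    return affected


bindings = {
    "web-frontend": {"transfer-service"},
    "transfer-service": {"account-service", "audit-service"},
    "reporting": {"audit-service"},
}
```

Here a change to `account-service` would potentially affect `transfer-service` and, through it, `web-frontend`, whereas a change to `web-frontend` affects no other unit.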
The Change Definition component analyzes the current state and creates a change plan which
can restore the system to a correct state. In order to obtain the required changes it
communicates with the repositories to retrieve the domain information (defined deployment
units, configurable container resources, objectives, optimization policies and environment
information). Internally, it configures a pseudo-Boolean SAT solver, converts the domain
information into variables and clauses, and obtains a solution for the resulting problem. These results are finally
analyzed and interpreted as an internal change plan. The Change Definition functionality is the
most complex of the management subsystems. Its foundations have been completely
detailed in Section 5.4.
The Change Manager governs the execution of internal change operations to the managed
environments. The component provides a plan execution service, which allows ordering the
execution of a defined change plan to the target environment. The execution can either be
instant or scheduled to a later time, depending on the characteristics of the environment and
the internal policies (e.g. in an integration testing environment changes should be applied as
soon as possible whereas changes to production environments are usually scheduled at
concrete times). After a plan has been completely executed the change manager collects and
stores all the information about the execution outcome, enabling internal traceability on the
changes initiated by the management system.
In order to prevent plan expiration problems (whenever the interval of time between plan
creation and execution increases), and protect the runtime environment stability, every plan
dispatched to the manager will be validated to verify that it is consistent with the actual state
of the environment. In case there are some warnings or errors during the validation, the
execution will be either aborted or suspended for manual review.
The Plan Executor receives a change plan and executes it over the designated environment. Its
internal details were described in the communications model. However, in addition to
environment adaptation, there are additional requirements that further complicate the
internal function of plan execution. The change plan model only defines a set of partial orders,
leaving some flexibility about the exact way plan activities are executed. In the case of
this architecture there will be no parallel execution of activities; instead, the executor will focus
on preserving the stability of the environment as much as possible. In order to do so, plan
execution will be transactional, automatically rolling the environment back to its initial state in
case there is a problem during the execution of the changes. On top of that, for auditing
purposes, the result of every change operation is recorded and provided back to the
Change Manager in the form of an execution report.
The executor will be able to ensure transactional execution as long as only one plan is
executed at a time and the following conditions are also met: the execution of
each single activity must be atomic, and it must be possible to validate its correct execution. If
these conditions are met, after the execution of each activity the state will either be the initial
one (if execution failed) or a new state with the operation successfully applied. The
transactional executor will keep a stack with the successfully applied activities. In case a
failure is detected, the executor will automatically obtain a compensation activity for each one
of the successfully executed activities, and will execute them in reverse order to cancel the
changes applied by the partial execution of the plan.
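The stack-based compensation scheme can be sketched as follows. The example is a simplified illustration under several assumptions: each activity carries its own compensation callable (instead of the executor deriving it from the activity type), activities are already in a valid sequential order, and failures are signalled by exceptions. All names are hypothetical.

```python
from typing import Callable, List, NamedTuple, Tuple


class Activity(NamedTuple):
    name: str
    apply: Callable[[], None]       # assumed atomic; raises on failure
    compensate: Callable[[], None]  # undoes a successfully applied activity


def execute_transactionally(
        activities: List[Activity]) -> Tuple[bool, List[Tuple[str, str]]]:
    """Run activities in order; on failure, compensate in reverse order."""
    applied: List[Activity] = []          # stack of successful activities
    report: List[Tuple[str, str]] = []    # execution report for auditing
    for activity in activities:
        try:
            activity.apply()
        except Exception as error:
            report.append((activity.name, f"FAILED: {error}"))
            while applied:                # unwind the stack
                done = applied.pop()
                done.compensate()
                report.append((done.name, "COMPENSATED"))
            return False, report
        applied.append(activity)
        report.append((activity.name, "OK"))
    return True, report


# Toy environment state used only for the example.
state = {"installed": set(), "running": set()}


def failing_configuration() -> None:
    raise RuntimeError("binding target not found")


plan = [
    Activity("install", lambda: state["installed"].add("unit"),
             lambda: state["installed"].discard("unit")),
    Activity("start", lambda: state["running"].add("unit"),
             lambda: state["running"].discard("unit")),
    Activity("configure binding", failing_configuration, lambda: None),
]
ok, report = execute_transactionally(plan)
```

After this run the toy state is back to its initial, empty configuration, and the report lists the two successful activities, the failure, and the two compensations in reverse order.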
with a positive result, the change must be accepted by the production environment
administrator, who will schedule its execution at the preset time in order to
minimize the impact of the change on the runtime operation. After this maintenance process
is completed, the updated service is available for the users, and the incident can be
closed.
The described scenario involves several service configuration change processes applied to the
different tiered environments. The technical details on how the architecture supports each
one of them would be very similar, only varying in the control and change policy settings. I
have selected as a representative of the three the update process at the pre-production
environment, because the physical environment is virtually identical to the final one, and at
the same time, the process can be executed with a greater degree of automation thanks to the
lesser criticality.
Regarding its organizational aspects, this scenario presents a standard structure. The core
functional blocks of the management system are centralized, managing all the described
environments. The unit repository is integrated with the company software asset repositories,
being automatically updated whenever a new or updated deployment unit has been released
by the development staff. The pre-production environment is instrumented by the agent
infrastructure. Each physical node of the environment is monitored by a Node Manager and
specific Gatherers, while a single Environment Manager controls these agents and
communicates with the main system.
In this scenario, the appearance of the updated version of the service is reflected at the
domain as an endogenous external change, originated on the logical entities (the repository
and the defined objectives for the pre-production environment). In the scenario description
I will detail the complete update process, from the instant the change is detected until
the environment has been modified. The focus will be on describing how the change is
handled, analyzed and applied to the environment. The following sequence diagram shows the
main participant blocks and the exchange of calls and information that allows detecting the
change, analyzing it, evaluating the required corrections and applying them to the
environment.
[Figure: Sequence diagram of the update scenario. Participants: Repository Manager, Objectives Manager, Change Impact Analysis, Envir. Config. Manager, Change Definition, Change Manager, Plan Executor, Managed System. Messages shown: getObjectives(), notify(change), createPlan(envId), getEnvInfo(), getUnits(), getConfResources(), getObjectives(), getOptimPolicies(), configureSAT(), executeSAT(), buildPlan(), validatePlan(), executePlan(), scheduleExecution(), launchPlan(), change1() to change4(), report, saveExecutionReport().]
The objectives and policies defined for the pre-production environment mandate that the bank
transfer service must always be available in the environment at its latest version. Whenever
the unit repository is updated with the latest version of the unit, the event is notified to the
Change Impact Analysis component. After receiving the notification, this component retrieves
the established objectives by accessing the Objectives and Policies repository. Because of the
previously described objective, this component determines that the environment must be
updated to supply the latest version of the service, and in order to do so invokes the Change
Definition module to obtain a change plan which satisfies the desired management objectives.
When the Change Definition component is invoked, it first retrieves the complete information
about the management domain, by communicating with the three repositories. This way, the
list of available deployment units, the potentially configurable resources for the runtime
containers, the management objectives that must be supported by the running system, and
the optimization functions for selecting the final configuration, are obtained. Once all the
domain knowledge has been retrieved, it is processed to obtain a set of PBSAT variables and
functions, with the algorithm described in the previous sections. Once the PBSAT problem has
been defined and the engine is completely configured, it is invoked to find the desired solution
for the system state. The provided restrictions ensure that the obtained solution is both stable
and desirable. The results from the solver are Boolean assignments to the variables, which are
later interpreted in order to calculate the changes that must be applied to the current
environment configuration, as well as to identify potential dependencies between the
different activities of the change plan.
After the change plan has been created, the Change Impact Analysis component hands it to the
Change Manager for its execution. In this environment, there is currently no other plan being
executed, and no constraint on the execution times, so it is scheduled to be immediately
applied to the environment. However, before executing it, the Change Manager validates the
plan to verify that there are no inconsistencies in the plan definition with regard to the current
environment state. Once this has been completed successfully, the plan is handed to the Plan
Executor to be applied to the environment. This element finds a set of compatible Actuators,
and invokes them to apply the required internal changes in the correct order. Once this
process is complete, the updated service is operating correctly in the environment.
The following diagram describes how the management architecture detects the change of the
environment and reacts to it in order to restore the system stability.
[Figure: Sequence diagram of the node failure scenario. Messages shown: updateNodeInfo, updateEnvInfo, updateNodeInfo, notifyEnvChange, getEnvInfo, getObjectives, getUnits, evaluateChangeImpact, createChangePlan.]
The described scenario starts when one of the synchronization messages sent by the
Environment Manager to a Node Manager does not obtain a response. The governing agent
tries several times to contact the missing Node agent, and after those attempts are also
unsuccessful it executes network diagnosis operations and determines that the physical node
managed by the agent is no longer reachable. After that, it requests from the remaining nodes
an updated snapshot of the actual environment configuration, which is aggregated and
immediately sent to the Environment Configuration Manager.
The Environment Configuration Manager receives the updated information, and after
detecting the severity of the change, sends a notification to the Change Impact Analysis
component. This diagnosis element receives the updated environment snapshot, and obtains
the latest known environment information in order to detect the affected elements. During this
process it internally obtains runtime binding graphs of the affected services, in order to
estimate the potentially affected elements from the complete environment, not only the
vanished node. It also retrieves the environment objectives and available unit definitions, in
order to verify whether those components could potentially be replaced. If this is the case, it
finally invokes the Change Identification Component, with the objective of obtaining a change
plan that will try to restore the damaged system functionality and its stability.
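As an illustration of how the affected elements could be estimated from a runtime binding graph, the following Python sketch propagates the loss of a set of units to every unit that depends on them, directly or transitively. The graph representation (consumer mapped to its providers) and the unit names are assumptions made for the example, not the data model of the prototype.

```python
from collections import deque
from typing import Dict, List, Set


def affected_units(bindings: Dict[str, List[str]], lost: Set[str]) -> Set[str]:
    """bindings maps each consumer unit to the providers it is bound to.
    Returns the lost units plus every unit that transitively depends on them."""
    # Invert the graph so that it can be walked from provider to consumers.
    consumers: Dict[str, List[str]] = {}
    for consumer, providers in bindings.items():
        for provider in providers:
            consumers.setdefault(provider, []).append(consumer)

    affected = set(lost)
    queue = deque(lost)
    while queue:
        unit = queue.popleft()
        for consumer in consumers.get(unit, []):
            if consumer not in affected:
                affected.add(consumer)
                queue.append(consumer)
    return affected


# Hypothetical binding graph: the node hosting "dao" has vanished.
example_bindings = {
    "portal": ["manager"],
    "manager": ["dao"],
    "other": ["other_dao"],
}
impacted = affected_units(example_bindings, {"dao"})
```

Units outside the returned set ("other" and "other_dao" in the example) are not touched by the failure, which matches the idea of estimating the impact over the complete environment and not only the vanished node.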
The technical activities for the change identification and the execution of the plan in this
scenario would be analogous to the ones previously described, so they will not be repeated here.
However, because of the severity of the incident described here, there are multiple
additional actions that will have to be addressed, which should be briefly mentioned even though
they are not directly managed by the proposed architecture. As this is a serious incident,
once a human administrator is notified, he or she will have to analyze the restored environment,
and potentially apply additional configuration changes to elements outside the control of
the management architecture (e.g. it might be necessary to reconfigure a DNS server so that
requests are transparently routed to the newly hosted service). It must be taken into account
that the hardware fault might have caused the corruption of application data, as well as the
loss of process state, neither of which can be restored by the service management system. In
addition to those quick response actions, the damaged element will have to be diagnosed to find
the originating error, and restored in order to bring the system back to its original capacity.
7. Validation
The objective of this chapter is to present the experiments that have been carried out to
validate the feasibility of the proposed modeling abstractions and algorithms for addressing
enterprise service change management activities. In order to execute the tests, a prototype of
the described architecture, based on these concepts, has been built and put to the test in
a series of experiments based on the requirements obtained in the ITECBAN project for a SOA
core banking system. The description of the validation results explains by example the internal
functioning of the proposed solution, and tests its limitations and degree of fulfillment over
a selection of scenarios.
In order to appropriately introduce the set of experiments, I will start with a description of the
general context, covering both the reference domain for the validation and the characteristics
of the initial data for all the tested scenarios. On top of that, in order to provide a complete
overview of the context of the experiments, some details about the prototype
implementation and the platform where the validation was executed will be provided.
The discussion of the validation results will start with an initial, representative case, whose
execution will be thoroughly explained, in order to provide a clear insight on how the process
works. After that, a study on the scalability and sensitivity of the proposed solution will be
presented. Although change management processes do not carry real-time constraints, it is
interesting to measure the performance of the proposed solution both in space (memory
occupied) and time, with increasingly larger sets of data, in order to estimate its applicability to
large scale situations.
Following those experiments, the results of a set of varied experiments will
also be presented. The intent of that set of scenarios is to simulate a representative sample of
the possible change situations that can appear during a service management process, and to
evaluate the correctness of the changes proposed by the change identification module. Finally,
some general conclusions about the results of the complete set of validation tests will be
described.
As the basic reference scenario for the set of validation experiments I have selected a slightly
modified version of the target platform which is being used as reference for the work of the
ITECBAN project. The objective of this project is to propose a complete core banking solution
entirely based on the SOA / BPM paradigm. This way, the complete service portfolio of the
organization will be provided as SOA services. This includes client services (internet banking,
cashiers), internal services (for company workers at the bank offices) and B2B services for
interbank transactions. As those services capture the company knowledge and constitute the core
of the company business, they are internally developed and provided, with no third party
dependencies, and consequently must be internally controlled. In spite of their internal
nature, in order to cope with the complexity, they are architected following an SOA / BPM approach.
These elements run on a service execution platform composed of multiple application
servers, BRM (Business Rule Manager) servers, Business Process servers, mediation servers,
correct value for the Property will be obtained according to that expression, and will be
properly configured. On top of that, the depending unit will define a BoundProperty attached
to the Dependency, mandating that the correct value for the connectionURL Property must be
automatically provided from the value of the serviceURL property from the bound Resource.
This way, it can be seen how both automatic configuration mechanisms are combined to
correctly establish the configuration derived from the dynamic binding between the two units
communicating through web services, as shown in the next figure.
[Figure: combined resolution of automatic properties. The Provider RuntimeUnit exports the Resource WS-A with a serviceUrl property defined by an expression, resolved from its context (Node A IP address and Container A1 servicePort 8190). The Consumer RuntimeUnit has a Binding to WS-A, and its connectionUrl property is bound to the serviceUrl of the bound Resource.]
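The combination of both mechanisms can be sketched in Python as follows. This is a simplified illustration under assumptions: the expression syntax (a Python format string), the URL layout, the IP address 192.0.2.10 and the helper names are hypothetical; only the property names and the port 8190 come from the example above.

```python
from typing import Dict, Optional


def resolve_context_aware(expression: str, context: Dict[str, object]) -> str:
    """Evaluate a context-aware property expression against the values
    offered by the hosting Container and Node."""
    return expression.format(**context)


def resolve_bound(
    consumer_props: Dict[str, Optional[str]],
    provider_props: Dict[str, str],
    mapping: Dict[str, str],
) -> Dict[str, Optional[str]]:
    """Copy provider property values into the consumer properties that
    are declared as bound (consumer property -> provider property)."""
    resolved = dict(consumer_props)
    for target, source in mapping.items():
        resolved[target] = provider_props[source]
    return resolved


# Step 1: the provider resolves serviceUrl from its deployment context.
context = {"ip": "192.0.2.10", "servicePort": 8190}  # hypothetical IP
provider = {
    "serviceUrl": resolve_context_aware("http://{ip}:{servicePort}/ws-a", context)
}

# Step 2: the consumer obtains connectionUrl through the Binding.
consumer = resolve_bound(
    {"connectionUrl": None}, provider, {"connectionUrl": "serviceUrl"}
)
```

The order matters in this sketch: the context-aware value has to be computed first so that the bound property copies an already resolved URL.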
With those considerations in mind, the logical elements that participate in the validation have
been modeled as a set of DeploymentUnits. The logical knowledge base defined in the
repository is composed of 22 units. The units belong to seven different types ([Link], war,
ear, bpel, [Link], [Link], [Link]), which must be supported by the different types of
Containers present at the runtime environment.
As regards the actual structure between these units, they can be divided into two groups. By
applying transitive dependency analysis, 19 of them must be present in the environment in
order to fulfill the complete requirements of the high level service, while the remaining three
represent another service which is completely independent from the main one subject to the
validation tests. The reason for including non-participating units in the knowledge base is
to validate that the solutions proposed by the change identification service do not involve
unnecessary changes to irrelevant resources of the environment. The next figure shows the
dependency graphs present in the logical knowledge base. Each DeploymentUnit is
represented by a blue rectangle, with the exported resources depicted inside. Dashed arrows
signal satisfied logical dependencies. For clarity’s sake, the rest of the information about those
units has been omitted.
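A minimal Python sketch of the transitive dependency analysis mentioned above is given next. It assumes a simplified unit description (exported and required resource names) and a single provider per resource. The excerpt of the repository is hypothetical: in particular, the dependency of Client Portal on the client manager service is assumed only for the example.

```python
from typing import Dict, List, Set


def required_units(units: Dict[str, Dict[str, List[str]]], objective: str) -> Set[str]:
    """Return every unit needed, directly or transitively, to provide
    the resource named by the objective."""
    # Assumption: each resource is exported by exactly one unit.
    provider_of = {
        resource: name
        for name, unit in units.items()
        for resource in unit["exports"]
    }
    needed: Set[str] = set()
    pending = [objective]
    while pending:
        unit = provider_of[pending.pop()]
        if unit not in needed:
            needed.add(unit)
            pending.extend(units[unit]["requires"])
    return needed


# Hypothetical excerpt of the logical repository.
repository = {
    "Client Portal": {
        "exports": ["client portal service"],
        "requires": ["client manager service"],
    },
    "Client Manager": {
        "exports": ["client manager service"],
        "requires": ["client personal data dao"],
    },
    "Client Data Access": {
        "exports": ["client personal data dao", "client profile data dao"],
        "requires": [],
    },
    "Other Business Logic": {"exports": ["other bsn logic"], "requires": []},
}
participating = required_units(repository, "client portal service")
```

Units that are not reached from the objective, such as Other Business Logic in this excerpt, play the role of the non-participating units of the knowledge base.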
[Figure: dependency graphs of the logical knowledge base. Units shown with their exported resources include Open Account (bpel), Product Catalog (ear), Product Recomm Rules (drl), Account Data Access (ear), Client Data Access (ear), Credit Prod Data Access (ear), and the independent service formed by Other Business Pres (war), Other Business Logic (ear) and Other Business Data.]
In total, the 22 units define 25 Dependencies, which are satisfied by the exported resources of
the enclosed units. Those resources belong to eight different types, and are visible
environment-wide, except the “[Link]” resources, which can only be accessed inside the
same Container. In addition to Dependencies, the units providing the data access services
define additional Constraints on the execution platform, demanding the existence of a
resource of type “datasource” with a specific name at the selected Container, in order to be
able to access the information from the DBMS without additional configuration. In total, there
are three Constraints of this type, one per data access unit.
[Figure: runtime environment used for the validation, showing the Containers (e.g. bsn_rule2) with the unit types they support (drl, bpel, jbi, ear) and the environment resources load_balancer, company_AA_server and environment_firewall.]
Once the context for the validation experiments has been described, I will first detail the
execution of one of the cases, providing a step by step explanation of the internal
functioning of the algorithm. The scope of the validation covers only the Change Identification
Service, as the intent is to validate the proposed models and algorithm, which are
fundamentally implemented by this element of the service management architecture.
The description of the execution is structured as follows. First, the specific characteristics
of the scenario will be analyzed. After the input data has been described, an interpretation of
the current scenario will be provided, in order to diagnose the current state and predict what
the outcome of the change identification service should be. After that, the results from the
service execution will be described: the Change Identification Service will be invoked, and the
set of identified changes will be analyzed, together with an evaluation of its internal
workings.
The logical repository contains the definitions of the 22 DeploymentUnits that were described
in the previous section, with a total of 25 Dependencies and 3 Constraints defined by the units.
The environment is also the one previously described, with no RuntimeUnits in the
initial state. The domain contains one defined management objective: the existence of the
resource “client portal service” in the environment. This resource represents the end user
service, which is only provided by the unit named “Client Portal” of type war.
The described scenario is a freshly installed environment, ready to start functioning after the
hardware elements have been provisioned with the required servers and services. In order for
it to start providing the desired functionality, a management objective is defined so that the
environment provides the client portal service. In this context, the change identification
process should install the unit providing the required service and all the required units for a
stable configuration in compatible runtime Containers. On top of that, the Bindings among
those components must be correctly configured, including the BoundProperties, and unit
Constraints must also be respected. None of those changes is explicitly expressed by the
initial input, but they must appear in the set of changes obtained by the execution algorithm.
In order to clearly reflect the details of the algorithm I will focus each step on the same unit,
Client Data Access. This unit has been selected because it is one of the most complex in terms
of Dependency and Constraint relationships. Figure 65 shows an expanded view of the selected
unit and its first level relationships (relationships with the remaining elements are reflected in
the previous general picture). The unit exports two Resources (client personal data dao, and
client profile data dao), which are required by two other units (Client Manager and Profile
Manager). Those resources are only visible inside the Container where Client Data Access is
deployed. In order to provide the functionality enclosed in those two resources, the unit
requires access to two database tables, which will be provided by the ddl definitions contained
in two other units (Client Profile Structure and Client Data Structure). The detailed analysis will
only focus on the relationships directly involving the selected unit, as the rest of the units will be
processed in exactly the same way. Finally, the unit also declares a Constraint on the runtime
environment, requiring a resource named “DSClient” of type “[Link]” to exist in
order to work properly.
[Diagram: Client Data Access and its first level relationships. Client Manager (RES client manager service) declares a DEP on client personal data dao; Profile Manager (RES profile manager service) declares a DEP on client profile data dao.]
Figure 65 Resource, Dependency and Constraint Details of the Client Data Access Deployment Unit
The description of the case will be divided into two main blocks: first, the operations executed
before the SAT resolution engine is invoked will be described, and then the interpretation of
the results will be explained.
After the former process has finished, the set of RU Literals originated from units with
Dependencies will be iterated again, in order to identify the possible Binding configurations for
each one, represented as Binding literals. For each potential binding, the rest of the RU literals
will be analyzed to check whether each one satisfies the dependency and is accessible to the
dependent one (by checking visibility and relative topology). If both conditions are true, a
Binding Literal will be generated representing the potential configuration. If, after going over
all the remaining RU literals, there is no valid configuration for one Binding, the dependent RU
literal will be set to false. For the variables in focus, first the bindings from CDA to CPS and CDS will be
established. In the case of the CM and PM bindings to CDA, the local visibility of the exported
resources will discard every binding configuration where the requiring unit is not in the same
container as the CDA, greatly reducing the possibilities. This shows the large impact that
environment-visible resources have on the complexity of the complete process. In total, 96
Binding literals have been defined. B Literals will be referenced in the text with the following
notation: <rudep>to<ruprovider>. In the analyzed segment, four literals have been set to false,
and eight Binding Literals have been defined:
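The binding generation step can be illustrated with a minimal Python sketch. It is a simplification under assumptions: visibility is reduced to a single flag (local to the Container or environment-wide), the topology check is omitted, and the container name bsn_logic1 and the "@" separator inside literal names are hypothetical. Only the <rudep>to<ruprovider> notation follows the text.

```python
from itertools import product
from typing import List, NamedTuple


class RULiteral(NamedTuple):
    """A candidate placement of a unit in a Container."""
    unit: str
    container: str


def binding_literals(
    dependents: List[RULiteral],
    providers: List[RULiteral],
    local_only: bool,
) -> List[str]:
    """Generate one Binding literal per valid (dependent, provider) pair.
    With local visibility, both RU literals must share the Container."""
    literals = []
    for dep, prov in product(dependents, providers):
        if local_only and dep.container != prov.container:
            continue  # provider resource not visible from the dependent RU
        literals.append(
            f"{dep.unit}@{dep.container}to{prov.unit}@{prov.container}"
        )
    return literals


# Hypothetical candidate placements for CM (dependent) and CDA (provider).
cm = [RULiteral("CM", "bsn_logic1"), RULiteral("CM", "bsn_logic2")]
cda = [RULiteral("CDA", "bsn_logic1"), RULiteral("CDA", "bsn_logic2")]

local = binding_literals(cm, cda, local_only=True)
env_wide = binding_literals(cm, cda, local_only=False)
```

With two candidate Containers for each unit, local visibility yields two Binding literals while environment-wide visibility yields four, which shows why environment-visible resources dominate the size of the problem.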
At this point, all the literals which will be provided to the SAT engine have been defined. The
next step is to define the functions which will restrict the potential solutions, ensuring that the
results are stable and desirable. The first aspect to be addressed is desirability, by
translating the management objectives into SAT constraints. In this case, the only objective
was the existence of the client portal service in the environment. As it is provided by the Client
Portal unit, the constraint will mandate that at least one of its RU literals appears in the
solution, producing the following constraint:
As initially the environment does not contain RuntimeUnits, no optimization function will be
defined. However, the restriction about a maximum of one instance of each DeploymentUnit
must be enforced through a set of clauses. In the analyzed subset, three of the units (CM, PM,
CDA) have multiple possible locations so one clause must be defined for each one of them.
This way, the three following clauses have been defined:
The rest of the functions ensure that the proposed solution is stable and structurally coherent.
As regards stability, there are three types of relationships which must be properly restricted
through clause definition: dependent RU to Bindings, Binding to bound RU, and Configurable
Resource to Constraining Resource. The first type ensures that the dependencies expressed by
the units are satisfied and correctly configured with bindings in the proposed solution. There
will be one clause for each B Literal, stating that the existence of the binding implies the
presence of the bound resource. This way, in the analyzed subset, the following clauses will be defined:
The second type of clauses are called structural functions, as they enforce that proposed
solutions are coherent with the indivisible relationship between one RU and its bindings. In
total, three clauses will be defined for each group of bindings (a group is composed of the set
of Binding literals that express alternatives for one dependency of the same RU). These
restrictions mandate that no bindings are present if the dependent RU is not part of the
solution, and that exactly one binding literal must be true if the dependent RU is evaluated to
true. In the selected subset all binding groups are composed of one element, simplifying the
resulting clauses, as the at-most-one clause is unnecessary and the disjunctions become single
literals:
In order to show the complete set of clauses which can be generated at this step, I will also
focus on the CP (Client Portal) and OAP (Open Account Presentation) literals. In the previous
steps two RU Literals were defined for each of them, as well as four B Literals, from each CP
literal to each one of the OAP Literals. In this case, the following functions with respect to
structure and binding stability have been defined:
Finally, Constraint-related clauses are defined to ensure that if the constrained unit is part of
the solution, the Container Resource will also be configured. From the analyzed elements,
there will be two Constraint functions, one for each CDA RU Literal:
To sum up, after all those processes 280 clauses have been defined, representing the
desirability and stability conditions that must be met by the solution proposed by the SAT engine.
With no further operations required, the set of literals and clauses is provided to the SAT engine
and a solution is obtained, which will consist of a true or false value for each one of the 145
variables defined.
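To make the kinds of clauses described in this section more tangible, the following Python sketch encodes a very small, hypothetical instance in conjunctive normal form and enumerates its solutions by brute force. The literal numbering, the container names c1 and c2, the use of CM as the objective and the exhaustive search are all assumptions made for illustration; the prototype relies on a real SAT engine and on the 145 literals and 280 clauses of the actual case.

```python
from itertools import product
from typing import Dict, List

# Hypothetical literals, numbered as positive integers (negative = negated).
NAMES = {
    1: "CM@c1", 2: "CM@c2",                   # RU literals, dependent unit
    3: "CDA@c1", 4: "CDA@c2",                 # RU literals, provider unit
    5: "CM@c1toCDA@c1", 6: "CM@c2toCDA@c2",   # B literals (local visibility)
    7: "DSClient@c1", 8: "DSClient@c2",       # CR literals
}

CLAUSES: List[List[int]] = [
    [1, 2],               # objective: CM must be deployed somewhere
    [-1, -2], [-3, -4],   # at most one instance of each unit
    [-5, 3], [-6, 4],     # a binding implies its bound (provider) RU
    [-5, 1], [-6, 2],     # no binding without its dependent RU
    [-1, 5], [-2, 6],     # a dependent RU needs its (single) binding
    [-3, 7], [-4, 8],     # a constrained RU implies the Container Resource
]


def satisfies(assignment: Dict[int, bool], clauses: List[List[int]]) -> bool:
    """A clause holds when at least one of its literals is true."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


def all_models(num_vars: int, clauses: List[List[int]]) -> List[Dict[int, bool]]:
    """Exhaustive search; only viable for toy problems such as this one."""
    models = []
    for values in product([False, True], repeat=num_vars):
        assignment = dict(enumerate(values, start=1))
        if satisfies(assignment, clauses):
            models.append(assignment)
    return models


models = all_models(len(NAMES), CLAUSES)
```

Every model of this toy instance places CM and CDA in the same container and creates the DSClient resource there. Without an optimization function, the resource literal of the unused container remains unconstrained, so several equivalent models exist.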
remaining three components defined at the repository, as they are neither part of the business
objectives nor relevant for the stability of the involved units.
Once those values have been collected, they will be interpreted as follows. First, the set
of positive values is processed. As in this case the environment is initially empty, each one of
them implies either the instantiation and activation of a RuntimeUnit at the selected
Container, the configuration of a Binding between two RuntimeUnits or the creation of a
Container Resource. In the case of the analyzed subset, the three ear units will be deployed at
the same container, bsn_logic2. This is coherent with the local visibility of the resources from
the CDA unit, demanding the consumers to be deployed at the same container. The two [Link]
units providing the required resources for CDA will be deployed on the only feasible container,
oraDB. In addition to the five positive RU literals, another five positive B literals contain the
Binding configuration among those elements. Finally, the CR literal for creating the DSClient
[Link] resource at the bsn_logic2 container is also set to true, so this resource will be
configured to satisfy the constraint of the CDA Runtime units. Overall, of the 22 literals
related to this subset of the complete problem, a total of 11 have been evaluated as true. This is a
high percentage (50%) compared to the overall figure (30%). The reason is the
restricted set of options present in this specific subset, caused by the Constraints, the existence
of only one Container for the database units, and the local visibility of some of the resources.
Those factors greatly reduce the number of alternatives and, as the number of positive literals
depends on the valid solution and not on the number of alternatives, they consequently increase
the ratio of true variables.
While evaluating those positive literals, there are some special cases that will be
handled with additional processing. First, every positive RU literal belonging to a RU with
defined ContextAwareProperties will be identified and processed in order to obtain the correct
value for the property. Once it has been found, an additional configuration activity will be
defined to set up the correct value. After all those values have been identified, a similar
process will be carried out for the BoundProperties, retrieving the adequate configuration
values according to the Binding configuration. This is processed after the context aware
values have been obtained so that those updated properties can be reflected on
the bound ones. In this case there were three ContextAwareProperties and three
BoundProperties participating in the final solution, contributing six additional activities.
After the positive values have been analyzed, the negative values of RU literals belonging to
RuntimeUnits already existing in the environment should be evaluated. However, as in this
case the environment is initially empty, no additional result is obtained from this step.
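The interpretation of the solution can be sketched in Python as follows. The sketch is illustrative only: the literal names, the activity labels and the decision to translate a negative RU literal of an existing RuntimeUnit into an uninstall activity are assumptions, and the property-related activities are left out.

```python
from typing import Dict, List, Set


def interpret(
    model: Dict[str, bool],
    kinds: Dict[str, str],
    deployed: Set[str],
) -> List[str]:
    """Translate the boolean value of each literal into change activities.
    kinds maps each literal to "RU", "B" or "CR"; deployed holds the RU
    literals that already exist in the environment."""
    changes: List[str] = []
    for literal, value in model.items():
        kind = kinds[literal]
        if value:
            if kind == "RU" and literal not in deployed:
                # A new RuntimeUnit is both installed and activated.
                changes += [f"install {literal}", f"activate {literal}"]
            elif kind == "B":
                changes.append(f"bind {literal}")
            elif kind == "CR":
                changes.append(f"create resource {literal}")
        elif kind == "RU" and literal in deployed:
            # Assumption: an existing unit left out of the solution is removed.
            changes.append(f"uninstall {literal}")
    return changes


# Hypothetical fragment of a solution.
kinds = {
    "CDA@bsn_logic2": "RU",
    "CM@bsn_logic2toCDA@bsn_logic2": "B",
    "DSClient@bsn_logic2": "CR",
    "CM@bsn_logic1": "RU",
}
model = {
    "CDA@bsn_logic2": True,
    "CM@bsn_logic2toCDA@bsn_logic2": True,
    "DSClient@bsn_logic2": True,
    "CM@bsn_logic1": False,
}
fresh = interpret(model, kinds, deployed=set())
with_existing = interpret(model, kinds, deployed={"CM@bsn_logic1"})
```

On an empty environment only the positive literals produce activities, two for each RU literal and one for each B or CR literal. The negative literal only matters when the corresponding RuntimeUnit already exists.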
The execution of these checks has identified 70 changes in total: 38 are derived from the
positive RU literals (installation and activation), 23 from B literals, 3 from the CR literals and
the remaining 6 are related to configurable properties. After applying all those changes the
environment ends in the state shown in the next figure. In order to improve its clarity, the
figure only shows Containers hosting at least one RuntimeUnit. The 19 participating units have
been deployed over 8 of the 12 containers belonging to the environment,
distributed over seven nodes. The distribution is not even, as one
Container hosts seven units whereas several others host only one. This is a correct solution as
there has been no explicit preference about an even or unbalanced spread of the
RuntimeUnits among compatible containers. If that were mandatory, those preferences should
have been modeled as factors of the optimization formula. The figure also shows the
bindings which have been configured for the RuntimeUnits. This clearly reflects the actual
complexity of the distributed applications, which for a single end service span seven
nodes of the environment. As regards the time and memory consumption of the described
process, the Change Identification Service was executed in 84 ms, with a memory footprint of
6.2 Mbytes.
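[Figure: final state of the environment, showing the deployed RuntimeUnits (AcMg, AcDA, PrMg, ClMg, ClDA, PrCg, PrRc, CoRs, OpAc, PrRR, OAcP, PrRP, CPDA, CrPD, ClPo, AcDS, ClDS, ClPS) over their hosting Containers and nodes (e.g. Node N2, Node N4), together with the configured bindings.]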
This section has explained the process followed by the Change Identification Service to find a
correct solution, which clearly matches the interpretation of the required changes
initially provided. In the following sections I will present the results of the additional validation
cases which have been executed, analyzing both its scalability when the domain data is larger
than the one managed here, and the correctness of the solution when presented with some
changes of the domain. Nonetheless, in all of them the same internal processing is applied,
consisting of literals definition, constraints definition, SAT invocation, boolean results retrieval,
positive values interpretation and negative runtime units check.
After the base case has been successfully tested, the aim of this set of experiments is to test
the scalability of the proposed algorithm. The tests applied in this section will be based on the
previous one, starting with an empty environment and the same defined objective, mandating
the instantiation of the client portal service. Only two aspects of the input data will
be altered: the number of units defined in the logical repository and the size of the runtime
environment.
The set of experiments will be divided into three categories, depending on what part of the
domain is modified (altering only the size of the environment, altering only the size of the
logical repository, and altering both of them). The definition of the modified sets is not only a
matter of increasing the cardinality of those sets. As discussed throughout the
dissertation, and shown in the detailed execution of a validation case, there are multiple,
interrelated aspects that determine the complexity of each example. Because of that, the
rationale behind the definition of the test cases, the predicted impact on the results of
modifying the base elements, and the conclusions on the obtained results will be detailed for
each set of experiments.
As regards the numeric data obtained for each experiment, three categories of data will be
collected. First, initial statistics on the size of the logical deployment units and the
environment resources will be provided. The second category will focus on the amount of SAT
literals and clauses which have been defined from that input. This information is usually more
representative of the actual complexity of the case when compared to the numbers of input
data. Finally, each experiment will report the average time and memory consumption over a
set of five executions of the case.
with 19 units of one type, and 1 unit of a different one. In an environment with five Containers
the amount of possible options will greatly increase if the large majority of Containers support
the dominating type, whereas more Containers supporting the second type will have a much
smaller impact.
With those considerations in mind, two additional environments have been defined in order to
test the scalability of the Change Identification Service with respect to environment
information. Both of them replicate the distribution of the original environment a number of
times (three in the first case, nine in the second). This way, the ratios of number of
instances for each type are preserved, allowing for a better comparison with the base results.
Before executing the experiments, I will estimate the predicted impact on the size of the SAT
problem. If N is the number of times the original environment has been replicated, the
increase in the problem size would be close to N², as that is the factor which will affect the B
literals, which constituted the majority of literals in the original case. This way, SAT input
statistics should be roughly nine times larger in the second case (N = 3), and roughly eighty
times larger in the third one (N = 9), whereas the impact on time and memory cannot be
estimated beforehand.
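The quadratic estimate can be checked with a short Python sketch. It assumes a single dependency with environment-visible resources, where both the dependent and the provider unit have the same number of compatible Containers in the base environment. The function name and the base figure of two Containers are hypothetical.

```python
def binding_literal_count(
    dep_containers: int, prov_containers: int, replication: int = 1
) -> int:
    """Number of Binding literals for one dependency when every candidate
    placement of the dependent unit can reach every candidate placement
    of the provider (environment-visible resource)."""
    return (dep_containers * replication) * (prov_containers * replication)


base = binding_literal_count(2, 2)
# Growth factor with respect to the base environment for N = 1, 3, 9.
growth = {n: binding_literal_count(2, 2, n) // base for n in (1, 3, 9)}
```

The resulting factors are 1, 9 and 81. For resources that are only visible inside their Container the growth would instead be linear in N, as each dependent placement can only be bound to providers in its own Container.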
The following table shows the obtained metrics over the executions of the base experiment
(Case I), and the additional experiments with larger environments (Case II and Case III).
Before discussing the obtained measurements from the experiments, it must be mentioned
that the execution of both cases II and III did successfully provide a correct set of changes for
the base input, each of them composed by 45 positive values which were interpreted as 70
activities, analogous to the changes identified in the initial case (although with a different
distribution of the units).
Looking at the SAT input statistics, the first conclusion is that, as predicted, the increase
in both the number of clauses and literals was roughly quadratic in the replication factor,
because of the dominating factor of the Bindings. The statistics also show the
impact of the increased size on both the memory consumption and the time required for
completion. Both show an exponential growth, although at this stage the numbers in the most
complex case are very manageable (6 seconds of total execution time is perfectly reasonable
for this kind of calculation, and the memory footprint was also kept at very manageable levels).
Also, the experiments have been executed on a machine considerably less powerful (a two
year old laptop) than the infrastructure available to an enterprise management system.
However, it must not be ignored that the size of the environment impacts heavily on the
complexity of the process, so the applicability of this approach should be tested for other
domains, where the size of the environment can be an order of magnitude larger than in those
experiments. That is not the case for enterprise domains, where an environment with
over a hundred containers would already be a very large execution platform.
Nonetheless, I will select a set of additional units which considerably increase the problem
complexity. In order to achieve that, all the additional units will be of the same type. The
selected type is ear, as there are four possible containers which can host those
elements. The additional units will define Dependencies forming a ‘dependency chain’, with
each one of them except the first depending on a resource provided by the
previous one. All resources will be environment-visible in order not to limit the potential
complexity. With those characteristics in mind, the number of RU literals should increase by
the number of additional units multiplied by the number of compatible containers (4), while
the increase in binding literals will be the number of new bindings times the square of
the number of containers (4² = 16).
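The expected increase can be computed with a small Python sketch. It follows the reasoning above, under the assumption that a chain of k additional units contributes k - 1 new bindings (every unit except the first depends on the previous one). The function name is hypothetical.

```python
from typing import Tuple


def added_literals(extra_units: int, containers: int = 4) -> Tuple[int, int]:
    """Return the expected increase in (RU literals, Binding literals) for a
    dependency chain of extra_units units, all deployable in the same
    number of compatible containers and exporting environment-visible
    resources."""
    ru_increase = extra_units * containers
    new_bindings = max(extra_units - 1, 0)  # the first unit has no dependency
    binding_increase = new_bindings * containers ** 2
    return ru_increase, binding_increase


# Example: a chain of 19 additional ear units over four compatible containers.
example = added_literals(19)
```

For a chain of 19 units this gives 76 additional RU literals and 288 additional Binding literals, so the Binding literals again dominate the growth of the problem.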
With those considerations, three additional tests have been executed, contributing to the
initial repository 19, 99 and 1000 DeploymentUnit definitions, respectively. No additional unit
is related to the original elements, so they should not appear at the proposed solution. The
following table shows a comparison with the collected execution statistics of the initial case
and the expanded ones.
Table 4 Validation Results with an increasing number of non participating units
In the three additional experiments the SAT engine obtained correct sets of changes,
analogous to the ones proposed in the first one. As for the execution statistics, it can be seen
that the growth in SAT variables follows the expected progression. Memory consumption and
execution time increase accordingly, although the exact numbers are still at reasonable levels
taking into account the frequency of change processes in enterprise infrastructures, with the
largest case taking more than 40 seconds to obtain a solution. It must be noted that the largest
scenario represents a managed repository of more than 1000 artifacts, which is already a
considerable size, and is applied to a non-trivial environment.
However, in case the characteristics of other domains make the execution time of this process
unacceptable, there is one fundamental optimization which could be applied to
domains with very large logical repositories. In these examples none of the
additional units is part of the desired solution, so they merely pollute the problem
with useless variables. If that became a problem, it would be possible to apply first a
logical dependency resolution process, in which every unit related to the business objectives, or
to an existing runtime unit, would be identified. The change identification process can then be
safely executed against this subset only, greatly improving its efficiency (the evolution in time
and memory consumption reflects the large impact of reducing the number of involved units).
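The pruning step described above amounts to a reachability computation over the dependency graph. The sketch below is an illustration under assumptions, not the dissertation's implementation: the `dependencies` mapping, the unit names and the choice of following only "requires" edges from the root units are all hypothetical.

```python
from collections import deque


def relevant_units(dependencies, roots):
    """Return every unit reachable from the roots by following dependencies.

    dependencies: maps a unit name to the units it requires (hypothetical model).
    roots: units named by business objectives or already present at runtime.
    """
    seen = set(roots)
    queue = deque(roots)
    while queue:
        unit = queue.popleft()
        for required in dependencies.get(unit, ()):
            if required not in seen:
                seen.add(required)
                queue.append(required)
    return seen


if __name__ == "__main__":
    repo = {
        "web_app": ["business_logic"],
        "business_logic": ["datasource"],
        "datasource": [],
        # Filler units chained to each other but unrelated to the objectives.
        "extra_1": [],
        "extra_2": ["extra_1"],
    }
    print(sorted(relevant_units(repo, ["web_app"])))
```

Only the units in the returned set would need RU and binding literals, so the size of the SAT problem depends on the relevant subset rather than on the whole repository.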
Table 5 Sensitivity Analysis Results with More Deployment Units and Larger Environment
The execution of these two experiments also obtained correct results, identifying the same
set of changes as in the other tests. The execution statistics show the multiplicative effect of
the two factors when they are increased simultaneously. The last experiment provides a rough
estimation of the limits of the algorithm implementation, with more than 200 MB of
problem size, and a total execution time of 43 minutes. Nonetheless, the change identification
service has been shown to provide a solution for problems with more than
a hundred thousand variables and twice as many functions.
In case problems of that complexity needed to be supported by the management system, the
previous section already detailed how discarding the non-relevant DeploymentUnits can
address those bottlenecks. It must also be taken into account that neither the preparation nor
the processing algorithms were fully optimized, and that the execution platform was not
state-of-the-art equipment; improving either would have improved the overall performance. However, the
specific measurements described here allow an estimation of the relative increase in
memory consumption and execution time.
[Figure: initial runtime environment diagram. Labels recovered: containers prc1, dyn_web2, bsn_logic1 and oraDB; node N4; units CPDA, PrMg, ClMg, ClDA, PrCg, PrRc, CoRs, PrRR, OpAc, PrRP, OAcP, AcMg, AcDA, CrPD, AcDS, ClPo, ClDS, ClPS.]
For both experiments the obtained results were almost identical. In both cases 142 variables
were defined, which equals the 145 from the empty environments minus three CR Literals,
as three ‘datasource’ resources had already been created at the business logic containers. Of
that set, a total of 42 were evaluated to true (again, the 45 from before minus
the three CR Literals). In each case the set of changes reflected exactly the configuration
which was already present in the environment, so no changes were generated for either experiment.
These experiments verify that false positives in the Change Analysis Service do not give rise to
unnecessary changes to the system. In addition, they show how the use of
the optimization function respects the current state of the environment, instead of proposing
unnecessary changes.
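The "no changes" outcome can be illustrated by deriving the change set as the difference between the current placements and the placements in the SAT solution. This is a simplified sketch under assumptions: placements are modeled as hypothetical (unit, container) pairs, and bindings and properties are left out.

```python
def derive_changes(current, solution):
    """Compare two sets of (unit, container) placements and list the changes.

    Placements present only in the solution become deployments, and placements
    present only in the current environment become undeployments.
    """
    deployments = [("deploy", unit, container)
                   for unit, container in sorted(solution - current)]
    removals = [("undeploy", unit, container)
                for unit, container in sorted(current - solution)]
    return deployments + removals


if __name__ == "__main__":
    # Hypothetical placements, chosen only to illustrate the comparison.
    environment = {("unit_a", "bsn_logic1"), ("unit_b", "dyn_web2")}
    print(derive_changes(environment, set(environment)))           # no changes
    print(derive_changes(environment, {("unit_a", "bsn_logic1")}))  # one removal
```

When the optimization function steers the solver towards the existing configuration, both sets coincide and the resulting change plan is empty.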
[Figure: runtime environment diagram after the removal of two containers. Labels recovered: containers prc1, dyn_web2 and oraDB; nodes N4, N6 and N10; units CPDA, AcMg, AcDA, PrMg, ClMg, ClDA, PrCg, PrRc, CoRs, CrPD, AcDS, ClDS, ClPS.]
The first noticeable difference in the execution of this test is that the number of SAT literals
has been reduced to 113. The decrease in the total number is logical, as the disappearance of
two Containers has reduced the distribution options for two types of DeploymentUnits.
However, the same number of positive variables has been obtained, as all of them are required
for a desirable and stable configuration. After the variables have been analyzed, a total of 28
activities have been defined. The set of changes consists of deploying the three missing
components to the only possible options (the war units to dyn_web2, and the drl unit to
bsn_rule1), reconfiguring both broken bindings to point to those units (first stopping the affected
units so that they can be configured properly), and configuring the Bindings and BoundProperties
of these three newly created units.
This case shows how the proposed algorithm can react to unexpected changes to the runtime
environment and restore the intended system functionality. The proposed solution also reuses
the already available RuntimeUnits, in order to minimize the set of required changes to the
environment.
8. Conclusions
The challenges of automating change operations on enterprise services include managing
the complexity and heterogeneity of the execution environments, reasoning about all
elements running over the distributed platform, taking care of the existing dependencies,
relationships and constraints between them, and ensuring that the managed environment
complies with the business objectives for which it has been designed. This dissertation has
tried to address these challenges by combining a modeling approach with a selection of
reasoning techniques over the available information. Over the previous chapters the base
concepts, relationships and restrictions which must be addressed by the management system
have been identified and modeled. In addition, an enterprise service management
architecture, based on a set of algorithms which can reason automatically over the defined
models, has been proposed and validated with a set of industrial case studies extracted from
the ITECBAN project. The case studies address management problems which are not exclusive
to the banking domain, but are common to every enterprise infrastructure. This way, the
contributions of this thesis are aligned with the ITEA Roadmap objectives, described in [51].
I will close this dissertation with a reflection on the main contributions which have been
described, assessing whether the objectives identified at the beginning of this work
have been sufficiently achieved. After these conclusions have been discussed I will also
mention several future research lines which were identified during this work
and could lead to interesting future work.
desirability conditions). Based on this, an algorithm has been defined which takes that
initial data as input and, in case the current state is not correct, formulates it as a pseudo-Boolean
SAT problem, where all the base information is encoded into a set of variables and functions
which represent the reachable states of the configuration and the restrictions it must satisfy to
be a correct solution. With all that information, the SAT solver obtains a correct solution which
represents a reachable state, and the set of changes required to achieve it is obtained (from
the allowed operations of the management system) and aggregated into a change plan.
This process is completely automated, requiring only the initial domain state for a diagnosis
and a solution to be proposed. However, this could also imply that the algorithm obtains a
correct solution, but not the preferred one. In order to address that, the followed approach
allows expressing additional constraints on the desired solution, in the form of an optimization
function or additional clauses. This way, additional policies can be supported. The dissertation
provides some examples of this additional customization, such as restricting the maximum
number of times a deployment unit appears, or minimizing the number of changes.
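To make the combination of clauses and an optimization function concrete, the following toy sketch enumerates all assignments by brute force instead of calling a real pseudo-Boolean solver. It is an illustration under assumptions, not the algorithm validated in this dissertation: the variable names, the clause representation and the cost function (number of variables that differ from the current state) are hypothetical.

```python
from itertools import product


def solve_min_changes(variables, clauses, current):
    """Return (assignment, cost) for a satisfying assignment with minimal cost.

    clauses: list of clauses; each clause is a list of (variable, polarity)
             literals and is satisfied when at least one literal holds.
    current: current truth value of each variable (missing means False).
    cost:    number of variables whose value differs from the current state.
    """
    best, best_cost = None, None
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        satisfied = all(
            any(assignment[var] == polarity for var, polarity in clause)
            for clause in clauses
        )
        if not satisfied:
            continue
        cost = sum(assignment[v] != current.get(v, False) for v in variables)
        if best_cost is None or cost < best_cost:
            best, best_cost = assignment, cost
    return best, best_cost


if __name__ == "__main__":
    variables = ["U@c1", "U@c2"]          # unit U placed at container c1 / c2
    clauses = [
        [("U@c1", True), ("U@c2", True)],    # U must be deployed somewhere
        [("U@c1", False), ("U@c2", False)],  # U appears at most once
    ]
    print(solve_min_changes(variables, clauses, {"U@c2": True}))
```

In this example both placements are correct, but the cost function selects the one that already exists in the environment, so the solution requires zero changes. The second clause plays the role of a policy limiting how many times a unit appears.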
To propose a reference architecture and validate the proposed models and algorithms
A reference architecture for an enterprise service change management system, built on the
previous contributions and on the OSI reference model, has been described, and its role
illustrated through some selected scenarios. Throughout this process the valuable input obtained
from the work at the ITECBAN project has made it possible to propose an architecture which can be
integrated with the rest of the elements of the enterprise infrastructure and support all the
identified management functions.
Finally, a set of validation cases has been defined and executed in order to assess the
feasibility of the proposed modeling and reasoning solution, obtaining a positive outcome
from all the experiments. First, the models have been shown to be expressive enough to
represent the information from an enterprise system, including its specific constraints and
relationships. On top of that, the reasoning algorithm has correctly completed all the defined
tests, which represented a range of different use cases. Finally, additional experiments were
executed to test the solution’s scalability, showing the feasibility of the proposed solution for
the targeted enterprise domains.
During the development of this work a set of new research lines has been identified, which
could not be fully pursued because of time constraints. However, because of their potential
interest, they will be briefly mentioned in this section:
Although the focus of the dissertation was supporting enterprise service management, a set of
generic management abstractions was initially defined. Those generic modeling definitions
were not tied to the constraints of modeling enterprise services, and could also be applied to
define other specific management models. The approach and defined abstractions were the
consequence of the scope of the management system established in the Objectives chapter
(represented graphically in Figure 20), but it would be possible in the future to expand the
management space supported by the solution by applying some modifications to the models
and techniques.
The abstractions described in this dissertation aim at representing the complete management
information about a service-based organization. Because of that, the presented validation
experiments reason only about a set of internally developed services. However, in order to
better support the challenges of an open services ecosystem it is necessary to also reason
about the services provided by third parties (under the control of neither the organization nor
the management system), and adapt automatically to their changes [89] [41]. In principle, the
existing abstractions could be extended to support those elements: external service registries
could be integrated in the runtime model, containing the external runtime services which can
be consumed by the internal elements. On top of that, in order to restrict which bindings can be
established, Service-Level Agreements and conditions of use could be represented as defined
objectives and constraints, which would later be expressed as SAT clauses and terms of the
optimization function.
Another interesting case is the management of virtualized environments [27], which shares
many similarities with the abstractions adopted for service management (hardware hosts being
comparable to containers, and virtual appliances being the equivalent of deployment units).
Because of that, this proposal could be adapted to that environment. Moreover, if that were
the case, it would be possible to greatly extend the scope of the management system with
network and system management capabilities, by combining both service and virtualization
management systems. This way, it would control not only what is or is not deployed to the
environment but also the actual topology of the managed system, being able to instantiate
new nodes or containers as part of the solution. This would be a very interesting development
of this work given the recent interest in the cloud computing paradigm. Nonetheless, in
order to support the runtime management of virtual instances, several new challenges which
were not covered by this work should be addressed. Runtime changes at node level require
migrating both the persistent data and the volatile process data. The low-level support of
these aspects is being integrated into the main virtualization technologies [43], but the way to
manage those concerns in a generic management system should be evaluated.
Another open topic of research is the specification of additional resource subtypes in the
enterprise information metamodel, with an attached inherent behavior. Enterprise
environments present some resource interaction patterns that appear in most scenarios, such
as clustering and load balancing, intended to improve the reliability and efficiency of the
services, or resource proxies and adaptors, such as the integration and transformation
components deployed at an Enterprise Service Bus. If those special cases were identified and
their special behavior was strictly defined it would be possible for the management system to
automatically configure them correctly, further reducing the required manual operation of the
domain.
Another interesting aspect, which was identified during the execution of this work but had to be
discarded, was support for monitoring Service-Level Agreements over the managed
environment. This would provide advanced reconfiguration capabilities to the management
system, implementing the autonomic computing self-optimization principle. However, it is not
possible to adequately support those behaviors with the defined management abstractions.
Although it would be possible to define SLAs as LOCAL checks over runtime properties (e.g. the
average response time of a service), the values of those elements clearly cannot be directly
altered by the management system, as they are a consequence of external factors, including
the incoming traffic from the service clients, the hardware capabilities of the underlying
execution platform or the parameters of the network load balancer configuration. For
those reasons, this is a very complex problem to solve in a generic way, although it should
be explored because of its relevance for a complete automation of the operations.
The presented algorithm has shown the applicability of SAT solvers to reason about the
problem of finding correct domain environment configurations. Although the results have
been satisfactory, there is still room for improvement in the proposed algorithm. It has
already been mentioned how some optimization clauses were introduced to the solver to
further refine and find the preferred solution. This work should be extended, exploring how
multiple preferences can be expressed, and how relative weights should be defined to
prioritize competing or conflicting policies. Another possible refinement of the algorithm
would be to opt for a different strategy: instead of taking the first valid solution
received, a timeout would be specified and multiple SAT invocations executed in order to find
increasingly better solutions.
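The timeout-based refinement could look roughly like the sketch below. It is only an illustration of the strategy under assumptions: `find_solution` is a brute-force stand-in for one SAT invocation, the deadline is checked only between invocations, and the weighted cost function with its weights is a hypothetical way of prioritizing two competing policies.

```python
import time
from itertools import product


def find_solution(variables, clauses, cost, bound=None):
    """Stand-in for one SAT invocation: first model whose cost is below bound."""
    for values in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, values))
        if all(any(model[v] == pol for v, pol in clause) for clause in clauses):
            if bound is None or cost(model) < bound:
                return model
    return None


def anytime_optimize(variables, clauses, cost, timeout_s=1.0):
    """Keep asking for a strictly cheaper solution until none exists or time is up."""
    deadline = time.monotonic() + timeout_s
    best = find_solution(variables, clauses, cost)
    while best is not None and time.monotonic() < deadline:
        better = find_solution(variables, clauses, cost, bound=cost(best))
        if better is None:
            break  # the current best solution is optimal for this cost function
        best = better
    return best


if __name__ == "__main__":
    weights = {"a": 1, "b": 5}            # hypothetical policy weights

    def weighted_cost(model):
        return sum(weights[v] for v in model if model[v])

    # Single clause: at least one of a, b must hold.
    print(anytime_optimize(["a", "b"], [[("a", True), ("b", True)]], weighted_cost))
```

The first invocation returns a valid but expensive model (only b true, cost 5), and the second, constrained to a lower cost, finds the preferred one (only a true, cost 1). If the timeout expired in between, the first valid solution would still be available.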
9. References
[1] Agrawal D., Giles J., Lee K., Lobo J., “Autonomic Computing Expression Language”. IBM
developerWorks 2005, Apr 2005.
[2] Agrawal D., Kang-Won L., Lobo J. “Policy-based management of networked computing
systems”, Communications Magazine, IEEE 2005, Vol. 43 Issue 10, pp:69-75.
[3] Agrawal D., Calo S., Lee K., Lobo J., “Issues in Designing a Policy Language for
Distributed Management of IT Infrastructures”. Integrated Network Management, 2007.
IM '07. 10th IFIP/IEEE International Symposium on, 2007, pp:30-39.
[4] Arshad N., Heimbigner D., Wolf A., “Deployment and dynamic reconfiguration planning
for distributed software systems”. Software Quality Journal 2007, Vol. 15, Issue 3,
pp:265-281.
[5] Bandara A., Lupu E.C., Sloman M., “Policy-Based Management” Handbook of Network
and System Administration 2007, pp.507-564.
[6] Baude F., Contes V.L., Lestideau V., “Large-Scale Service Deployment - Application to
OSGi. Autonomic and Autonomous Systems”, Third International Conference on
Autonomous Systems, ICAS07.
[7] Brenner M., Dreo Rodosek G., Hanemann A., Hegering H., Koenig R., “Service
provisioning: challenges, process alignment and tool support”. Handbook of Network and
System Administration 2007, pp:855-904.
[8] Burgess M., Kristiansen L., “On the Complexity of Change and Configuration
Management”, Handbook of Network and System Administration 2007, pp:567-622.
[9] Case J.D., Fedor M., Schoffstall M.L., Davin J. “Simple Network Management Protocol”
(SNMP). 1990.
[10] Clark J., DeRose S., XML Path Language (XPath) Version 1.0, World Wide Web
Consortium (W3C), 1999.
[11] Computing IBMA. “An architectural blueprint for autonomic computing”. Fourth Edition,
June 2006.
[12] Conradi R., Westfechtel B., “Version models for software configuration management”,
ACM Computing Surveys (CSUR), Volume 30, Issue 2, June 1998, pp. 232-282,
ISSN: 0360-0300.
[13] Couch A., “System Configuration Management”, Handbook of Network and System
Administration 2007, pp.75-134.
[14] Couch A., Chiarini M., “A Theory of Closure Operators”, International Conference of
Autonomous Infrastructure, Management and Security, AIMS 2008, Lecture Notes in
Computer Science 5127, pp.162-174, 2008.
[15] Crawford J.M., Baker A.M., “Experimental Results on the Application of Satisfiability
Algorithms to Scheduling Problems”, Proceedings of the Twelfth National Conference on
Artificial Intelligence, 1994.
[16] Cuadrado F., Dueñas J.C, Garcia R., Ruiz J.L., “A model for enabling context-adapted
deployment and configuration operations for the banking environment”, Fifth
International Conference on Networking and Services (ICNS), Valencia, Spain. April
2009.
[17] Danciu V.A., Felde N., Sailer M., “Declarative Specification of Service Management
Attributes”, IM '07. 10th IFIP/IEEE International Symposium on Integrated Network
Management, 2007.
[18] Davis M., Logemann G., Loveland D., “A machine program for theorem-proving”
Communications of the ACM, Vol.5, No 7 pp:394–397, 1962.
[19] Dean M., Connolly D., van Harmelen F., Hendler J., Horrocks I., McGuinness D., Patel-
Schneider P.F., Stein L. A., Web Ontology Language (OWL) W3C Reference version 1.0,
Feb 2004.
[20] Desertot M., Escoffier C., Donsez D, “Autonomic management of J2EE edge servers”
MGC '05: Proceedings of the 3rd international workshop on Middleware for grid computing
New York, NY, USA: ACM; 2005.
[22] Di Nitto E., Ghezzi C., Metzger A., Papazoglou M., Pohl K, “A journey to highly
dynamic, self-adaptive, service-based applications”, Automated Software Engineering,
Ed. Springer, Vol. 15, pp. 313-341, 2008, DOI 10.1007/s10515-008-0032-x.
[23] DMTF (Distributed Management Task Force) Common Information Model (CIM)
specification v2.19. 2008.
[25] Dubus J, Merle P., “Applying OMG D&C Specification and ECA Rules for Autonomous
Distributed Component-Based Systems”. Models in Software Engineering, Lecture Notes
in Computer Science 2007, pp: 242-251.
[26] Dueñas J.C., Ruiz J.L., Santillan M. “An end-to-end service provisioning scenario for
the residential environment”. Communications Magazine, IEEE 2005, Vol.43, Issue 9, pp:
94-100.
[27] Dueñas J.C. Ruiz, J.L., Cuadrado F., García B., Parada, H.A., “System Virtualization
Tools for Software Development”. Internet Computing Magazine, IEEE, Sep-Oct [Link]
13. Issue 5.
[28] Efftinge, S., Voelter, M., “oAW xText: A framework for textual DSLs”, Eclipse Summit
Europe, Esslingen, Germany, October 2006.
[29] Eilam T., Kalantar M.H., Konstantinou A.V., Pacifici G., Pershing J., Agrawal A.
“Managing the configuration complexity of distributed applications in Internet data
centers”. Communications Magazine, IEEE 2006, Vol.44, Issue 3, pp: 166-177.
[30] Fallside D., Walmsley P., XML Schema Part 0: Primer Second Edition, World Wide Web
Consortium, 2004.
[31] Fiore M., Plotkin G., Turi D., “Abstract Syntax and Variable Binding”, Proceedings of
14th Annual IEEE Symposium on Logic in Computer Science, LICS'99, Trento, Italy, 2–5
July 1999, IEEE CS Press, Los Alamitos, CA, 1999, pp. 193–202.
[32] Fleury M., Lindfors J., JBoss Group, “JMX: Managing J2EE with Java Management
Extensions”, Ed. SAMS Publishing, 2002, ISBN: 0-672-32288-9.
[33] Fowler M. “Language Workbenches: The Killer-App for Domain Specific Languages.”
Accessed online from: [Link]
2005.
[34] Frenken T., Spiess P., Anke J, “A Flexible and Extensible Architecture for Device-Level
Service Deployment”, ServiceWave 2008, Towards a Service-Based Internet, pp. 230-
241.
[36] Ghallab M., Howe A., Knoblock C., McDermott D., Ram A., Veloso M., Weld D., Wilkins
D., “PDDL - The Planning Domain Definition language”. Tech. rep., Yale Center for
Computational Vision and Control, October 1998.
[37] Goldsack P., Guijarro J., Lain A., Mecheneau G., Murray P, Toft P. “SmartFrog:
Configuration and Automatic Ignition of Distributed Applications”. HP OVUA 2003.
[38] González J.M., Lozano J.A., López de Vergara J., Villagrá V., “Context aware services
offering for residential environments”. Proceedings of the First IEEE Workshop on
Autonomic Communication and Network Management (ACNM), Munich, Germany 2007
May 25 2007, pp: 48-55.
[39] Graham S., Karmarkar A., Mischkinsky J., Robinson I., Sedukhin I. Web Services
Resource Framework (WS-RF) 1.2. OASIS Standard 2006.
[40] Graham S., Treadwell J., Web Services Resource Properties Specification (WS-
ResourceProperties) 1.2. OASIS Standard 2006.
[42] Hegering H.G., Abeck S., Neumair B. “Integrated management of networked systems:
concepts, architectures, and their operational application”. San Francisco, CA, USA:
Morgan Kaufmann Publishers Inc; 1998.
[43] Hines, M.R., Deshpande, U., Gopalan, K. “Post-Copy Live Migration of Virtual
Machines” SIGOPS Operating Systems Review, Volume 43 Issue 3, July 2009.
[44] Hochstatter I., Dreo G., Serrano M, Serrat J, Nowak K, Trocha S. “An architecture for
context-driven self-management of services”. Computer Communications Workshops,
2008. INFOCOM. IEEE Conference on 2008.
[46] Holzner S., Nehren D., Galbraith B. “Ant: the definitive guide” Ed. O'Reilly &
Associates, Inc., 2005, Sebastopol, CA, USA, ISBN: 0596006098.
[47] Horrocks I., Patel-Schneider P.F., Boley H., Tabet S., Grosof B., Dean M. “SWRL: A
semantic web rule language combining OWL and RuleML”, May 2004.
[48] Hutter F., Hoos H.H., Stutzle T. “Automatic algorithm configuration based on local
search,” in AAAI '07: Proc. of the Twenty-Second Conference on Artificial Intelligence,
2007, pp. 1152–1157.
[51] ITEA2 Core Roadmap Team, “ITEA technology roadmap for software-intensive systems
and services”, 3rd Edition, Released on February 2009, available at [Link]
[52] Jelliffe, R., The Schematron Assertion Language 1.5 Specification Version, 2002.
[53] Jennings B., van der Meer S., Balasubramaniam S., Botvich D., Foghlu M.O., Donnelly
W. “Towards autonomic management of communications networks” Communications
Magazine, IEEE 2007, Vol.45, Issue 10, pp:112-121.
[54] Johnson M., Hately A., Miller B., Orr R. “Evolving standards for IT service
management”. IBM Systems Journal 2007, Vol.46, Issue 3.
[55] Keller A., Brown A.B., Hellerstein J.L. “A Configuration Complexity Model and Its
Application to a Change Management System”. Network and Service Management, IEEE
Transactions on 2007, Vol.4, Issue 1, pp:13-27.
[56] Keller A., Hellerstein J.L., Wolf JL, Wu K-, Krishnan V. “The CHAMPS system: change
management with planning and scheduling”. Network Operations and Management
Symposium, 2004. NOMS 2004. Vol.1. pp: 395-408
[57] Keller A., Badonnel R. “Automating the Provisioning of Application Services with the
BPEL4WS Workflow Language”. Utility Computing 2004, pp: 15-27.
[58] Kephart J.O., Chess D.M. “The vision of autonomic computing” Computer 2003, Vol.36,
Issue 1, pp: 41-50.
[59] Kleppe A., Warmer, J., Bast, W., “MDA Explained. The Model Driven Architecture™:
Practice and Promise”, Ed. Addison Wesley, 2003, ISBN: 0-321-19442-X.
[60] Klerer S.M., “The OSI Network Management Architecture. An overview”. IEEE Network,
March 1988, Vol.2 No. 2.
[61] Kolari P., Finin T., Yesha Y., Lyons K., Hawkins J., Perelgut S. “Policy management of
enterprise systems: a requirements study”. Policies for Distributed Systems and
Networks, 2006. Seventh IEEE International Workshop on 2006
[62] Kon, F., Marques, J.R., Yamane, T., Campbell, R.H., Mickunas, M.D., “Design,
implementation and performance of an automatic configuration service for distributed
component systems”, Software Practice and Experience, issue 35, pp. 667-703, 2005.
[64] Kruchten P., “The 4+1 View Model of Architecture,” IEEE Software, vol. 12, no. 6, pp.
42-50, Nov. 1995.
[65] Le Berre D., Parrain A, “On SAT Technologies for dependency management and
beyond” Limerick, First Workshop on Analyses of Software Product Lines, ASPL
September 2008.
[66] Le Berre D., Rapicault P. “Dependency Management for the Eclipse Ecosystem. Eclipse
p2, metadata and resolution”. Proceedings of the International Workshop on Open
Component Ecosystems 2009, IWOCE, August 2009, Amsterdam, the Netherlands.
[68] López de Vergara J.E., Villagrá V.A., Berrocal J., “On the Formalization of the Common
Information Model Metaschema”, Proceedings of the 16th IFIP/IEEE International
Workshop on Distributed Systems: Operations and Management, (DSOM 2005),
Barcelona, Spain, 2005.
[69] López de Vergara J.E., Villagrá V.A., Fadón C., González J.M., Lozano J.A., Álvarez-
Campana M. “An autonomic approach to offer services in OSGi-based home gateways”
Computer Communications, 2008, pp: 3049-3058.
[70] Loughran S., Hatcher E., “Ant in Action”. Ed. Manning Press, 2007. ISBN:
193239480X.
[71] Machado G.S., Daitx F.F., Cordeiro Weverton L., Both C.B., Gaspary L.P., Granville L.Z.
“Enabling rollback support in IT change management systems”. Network Operations
and Management Symposium, 2008. NOMS 2008. IEEE 2008, pp: 347-354.
[72] Maghraoui K., Meghranjani A., Eilam T., Kalantar M., Konstantinou A. “Model Driven
Provisioning: Bridging the Gap Between Declarative Object Models and Procedural
Provisioning Tools”. Middleware 2006, pp: 404-423.
[74] McCloghrie M., Perkins D., Schönwalder J. Conformance Statements for SMIv2. RFC
2580. 1999.
[77] Miller J., Mukerji J., “MDA Guide, version 1.0.1.” Document number: ormsc/06-09-03
2006.
[78] Mendoza A. Utility Computing. Technologies, Standards and Strategies. 1st ed.: Artech
House; 2007.
[79] Miller B, McCarthy J., Dickau R., Jensen M. OASIS Solution Deployment Descriptor
(SDD) 1.0. OASIS Standard Sep 2008.
[80] Moore B., Ellesson E., Strassner J., Westerinen A. Policy Core Information Model --
Version 1 Specification. 2001.
[81] Murray P., Goldsack A., “Fully distributed service configuration management”
HotDep'07: Proceedings of the 3rd workshop on Hot Topics in System Dependability
Berkeley, CA, USA: USENIX Association; 2007.
[86] Open Services Gateway Initiative (OSGi) Service Platform, Core Specification, Release
4.2, available online at [Link] 2009.
[87] Pandit B., Popescu V., Smith V., Service Modeling Language, Version 1.1 (SML). World
Wide Web Consortium, 2009.
[88] Pandit B., Popescu V., Smith, V., Service Modeling Language Interchange Format,
Version 1.1 (SML). World Wide Web Consortium, 2009.
[89] Papazoglou M.P., Traverso P., Dustdar S., Leymann F., Krämer B.J. “Service-Oriented
Computing: A Research Roadmap” International Journal of Cooperative Information
Systems, Vol. 17, No. 2 pp.223–255, 2008.
[90] Pras A., Martin-Flatin J.P. “What can Web Services bring to integrated management?”
Handbook of Network and System Administration 2007, pp:241-294.
[91] Prasad M., Biere, A., Gupta, A, “A survey of recent advances in SAT-based formal
verification”, International Journal on Software Tools for Technology Transfer, Vol.7 No 2,
April 2005 Ed. Springer-Verlag.
[92] Rudin W, “Principles of Mathematical Analysis”, International series in pure and applied
mathematics. Ed. McGraw-Hill, 3rd Edition, 1976. ISBN 0070856133.
[93] Ruiz J.L. “A policy-driven, model-based, software and services deployment architecture
for heterogeneous environments”. Tesis doctoral, Universidad Politécnica de Madrid;
2007.
[95] Russell S.J., Norvig P. “Artificial Intelligence: A modern Approach”, Ed. Prentice Hall,
3rd Edition., ISBN: 0136042597.
[96] Sahai A., Pu C., Jung G., Wu Q., Yan W., Swint G.S. “Towards Automated Deployment
of Built-to-Order Systems” Ambient Networks 2005, pp: 109-120.
[97] Schaaf T., Brenner M., “On tool support for Service Level Management: From
requirements to system specifications,” Business-driven IT Management, 2008. BDIM
2008. 3rd IEEE/IFIP International Workshop on, vol. 7, no.7, pp.71-80, April 2008.
[98] Scheffer P., Strassner J, “IT Service Management”. Handbook of Network and System
Administration, Ed. Elsevier, 2007. pp: 905-928.
[101] Schwartzberg, S., Couch, A., “Experience in Implementing an HTTP Service Closure”,
Proceedings of the Eighteenth Systems Administration Conference, LISA 2004. USENIX
Association, Atlanta, USA, November, 2004.
[102] Sidor D.J.: “TMN Standards; Satisfying Today's needs while preparing for tomorrow.”
IEEE Communications Magazine, March 1998.
[104] Sloman M. “Policy driven management for distributed systems.” Journal of Network
and Systems Management 1994 12/30, pp:333-360.
[106] Steinberg D., Budinsky F., Paternostro F., Merks E., “EMF: Eclipse Modeling
Framework”, 2nd edition, Ed. Addison-Wesley, 2008. ISBN: 978-0321331885.
[108] Strassner J., Samudrala S., Cox G., Liu Y., Jiang M., Zhang J., et al. “The Design of a
New Context-Aware Policy Model for Autonomic Networking” Autonomic Computing,
2008. ICAC '08. International Conference on 2008, pp: 119-128.
[111] Swint G.S., Gueyoung J., Pu C., Sahai A. Automated Staging for Built-to-Order
Application Systems. Network Operations and Management Symposium, 2006. NOMS
2006. 10th IEEE/IFIP 2006, pp: 361-372.
[112] Talwar V, Milojicic D, Qinyi Wu, Pu C., Yan W., Jung G. “Approaches for service
deployment”. Internet Computing, IEEE 2005, Vol.9, Issue 2, pp: 70-80.
[113] Tucker C., Shuffleton D., Jhala R., Lerner S., “OPIUM: Optimal Package Install
Uninstall Manager”, 29th International Software Engineering Conference, ICSE07,
Minneapolis, USA.
[115] Vambenepe W., Bullard W. Web Services Distributed Management: Management using
Web Services (MUWS 1.1) Part 1. OASIS Standard Sep. 2006
[117] Warmer J., Kleppe A., “The Object Constraint Language: Getting Your Models Ready
for MDA”. Ed. Addison-Wesley, 2nd Edition, Sept. 2003, ISBN: 0321179366.
[118] Wilson K., Sedukhin I. Web Services Distributed Management: Management Of Web
Services (MOWS 1.1). OASIS Standard 2006.