A Distributed Service Oriented Architecture For Business Process Execution
A Distributed Service Oriented Architecture For Business Process Execution
The Business Process Execution Language (BPEL) standardizes the development of composite
enterprise applications that make use of software components exposed as Web services. BPEL
processes are currently executed by a centralized orchestration engine, in which issues such as
scalability, platform heterogeneity, and division across administrative domains can be difficult to
manage. We propose a distributed agent-based orchestration engine in which several light-weight
agents execute a portion of the original business process and collaborate in order to execute the
complete process. The complete set of standard BPEL activities are supported, and the trans-
formations of several BPEL activities to the agent-based architecture are described. Evaluations
of an implementation of this architecture demonstrate that agent-based execution scales better
than a non-distributed approach, with at least 70% and 120% improvements in process execution
time, and throughput, respectively, even with a large number of concurrent process instances. In
addition, the distributed architecture successfully executes large processes that are shown to be
infeasible to execute with a non-distributed engine.
Categories and Subject Descriptors: H.4 [Information Systems Applications]: Miscellaneous
General Terms: Design, Experimentation, Performance
Additional Key Words and Phrases: business process, BPEL, workflow management, service-
oriented architecture, distributed orchestration, publish/subscribe
1. INTRODUCTION
Enterprise applications are increasingly being architected in a service-oriented ar-
chitecture (SOA) style, in which modular components are composed to implement
the business logic. The properties of such applications, such as the loose coupling
among the modules, is promoted as a way for an agile business to quickly adapt its
processes to an ever changing landscape of opportunities, priorities, partners, and
competitors. The proliferation of Web services standards in this area reflects the
industry interest and demand for distributed enterprise applications that commu-
nicate with software services provided by vendors, clients, and partners.
For example, an online retailer may utilize the services of a partner shipping com-
pany to allow their customers to track the delivery status of products. The shipping
company here would expose a component that allows its partners to retrieve de-
livery status information. Other external services the retailer may use include a
Author’s address: Hans-Arno Jacobsen, Dept. of ECE and Dept. of CS, University of Toronto,
10 King’s College Road, Toronto, Ontario, Canada, M5S 3G4; email: [email protected]
Permission to make digital/hard copy of all or part of this material without fee for personal
or classroom use provided that the copies are not made or distributed for profit or commercial
advantage, the ACM copyright/server notice, the title of the publication, and its date appear, and
notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish,
to post on servers, or to redistribute to lists requires prior specific permission and/or a fee.
c 2009 ACM 1529-3785/2009/0700-0001 $5.00
frastructure [Fidler et al. 2005; Li and Jacobsen 2005; Li et al. 2008; Hu et al.
2009; Kazemzadeh and Jacobsen 2009]. All communication in the system occurs
as pub/sub interactions, including process coordination among the agents, control
and monitoring. This decouples the agents, which now only need to be aware of
one another’s content-based addresses, thereby simplifying agent reconfiguration
and movement, and seamlessly allowing multiple processes and process instances
to coexist. In addition, in NIÑOS, processes are transformed such that certain
computations are carried out in the pub/sub layer, exploiting advanced features
available in PADRES. This further simplifies the orchestration agents, and allows
these computations to be optimized by the PADRES layer by, for example, perform-
ing in-network event correlation. Yet another advantage afforded by the pub/sub
layer is ease of administration. Agents can be configured and controlled individu-
ally, or as some subset, using their location-independent content-based addresses.
Similarly, since all communication occurs over the pub/sub layer, the system can be
fully monitored without additional instrumentation logic. The declarative pub/sub
interface supports expressive queries for precisely the information of interest.
The contributions of this paper include, (1) the design of the NIÑOS distributed
business process execution architecture based on the flexible PADRES pub/sub
layer; (2) a procedure to map standard Business Process Execution Language
(BPEL) processes, including the complete set of BPEL activities, to a set of dis-
tributed NIÑOS agents, with control flow realized using decoupled pub/sub seman-
tics; and (3) an evaluation of the NIÑOS orchestration engine that demonstrates
its improved scalability over a centralized engine.
We present background and related work in Section 2, followed by a description
of the BPEL mapping process to the NIÑOS system architecture in Section 3, an
evaluation of NIÑOS in Section 4, and some concluding remarks in Section 5.
Basic Activities
Activity Description
receive Blocking wait for a message to arrive.
reply Respond to a synchronous operation.
assign Manipulate state variables.
invoke Synchronous or asynch. Web service call.
wait Delay execution for a duration or deadline.
throw Indicate a fault or exception.
compensate Handle a fault or exception.
terminate Terminate a process instance.
Structured Activities
Activity Description
sequence Sequential execution of a set of activities.
while Looping constructs.
switch Conditional exec. based on instance state.
pick Conditional exec. based on events.
flow Concurrent execution.
distributed among the available computing resources. The latter design also al-
lows placing computational activities near the data they operate on, which is not
possible in the cluster architecture. Furthermore, NIÑOS is applicable to the real-
ization of cross-enterprise business process management, where no one single entity
runs and controls the entire business process, but rather the process emerges as a
choreographed concert of activities and sub-processes run by each organization.
Publish/Subscribe: In pub/sub communication, the interaction between the
information producer (publisher ) and consumer (subscriber ) is mediated by a set
of brokers [Fabret et al. 2001; Carzaniga et al. 2001]. Publishers publish events to
the broker network, and subscribers subscribe to interesting events by submitting
subscriptions to the broker network. It is the responsibility of the brokers to route
each event to interested subscribers. In content-based pub/sub, subscribers can
specify constraints on the content of the events, and the broker network is said to
perform content-based routing of events. The terms event and publication are often
used synonymously in the pub/sub literature and in this paper.
The routing in our PADRES distributed content-based pub/sub system works
as follows [Fidler et al. 2005]. Publishers and subscribers connect to one of the
brokers in a broker overlay network. Publishers specify a template of their event
space by submitting an advertisement message that is flooded through the broker
network and creates a spanning tree rooted at the publisher. Similarly, subscribers
specify their interest by sending a subscription message that is forwarded along the
reverse paths of intersecting advertisements, i.e., those with potentially interesting
events. Now publications from publishers are forwarded along the reverse paths of
matching subscriptions to interested subscribers.
PADRES extends traditional pub/sub semantics with composite subscriptions
that allow event correlations to be specified [Jacobsen ; Li and Jacobsen 2005]. For
example, a subscriber may only be interested in being notified of business processes
with at least two failed instances within an hour. The correlation computations are
ACM Transactions on Web, Vol. V, No. N, October 2009.
Distributed Architecture for Business Process Execution · 5
<=>?@=AAB @
K@=LB @ <=@B -01- CD.E. &FF!-
V
0G40 &FF!-
-01- '43
!"#$
2H-FE IJ0-1 &FF!-
"%&'( +*( ,- ( ./$
K@=LB @
M=>Q?=@
!"#$
"%&'( )&*( +*(
56789:; ,- ( ./$
-01- 2343
!#
"%&'( +*(
,- ( ./$
WS
Client
d_[beX d_[beX
WXYXZ[X WXYXZ[X Agent
matching and decides the next-hop destinations of the messages. This novel rule-
based routing approach allows for powerful subscription semantics and naturally
enables composites subscriptions, which are more complex rules in the rule engine.
Mapping the subscription language to a rule language is relatively straightforward,
and extending this subscription language does not require significant changes in
the engine. Furthermore, rule engines are well-studied, allowing PADRES to take
advantage of existing research. Our experience with the system indicates that rule-
based matching is quite efficient, especially for composite subscriptions.
Content-based Routing
rs~vl~vy pwl suvlm r
}l~vw tu
}w lm wuy wm}ylm
y
rsqtuvlmw rsqtuvlmw
z ptv stw xpvpy pwl
klmnlm
rsqtuvlmw k}v k}v
{sm| wvpv}s~
klmnlm opmq x pvpy pwl
klmnlm
z ptv stw
Computing, Storage, and Networking Resources
used to assign and retrieve variables. For example, the while activity may subscribe
to update publications for any variables used in the while condition. The handling
of BPEL variables is discussed further in Section 3.3.7.
3.3.2 Pick activity. The BPEL pick activity waits for one or more events to
occur and conditionally executes a sequence of activities based on the event that
occurred. The events a pick activity can wait on include messages, such as Web
service invocations or asynchronous replies, and alarms, which are triggered after
some time duration or deadline.
A generic use of the pick activity is shown in Figure 5. Note that many details,
such as the onMessage parameters, are omitted. The pick activity is mapped to a
pick agent that blocks and listens for one of the events specified in the pick activity
to occur, and then triggers the appropriate subsequent activity. The execution of
the pick activity is triggered when the preceding activity complete, which the pick
agent listens for with subscription Sub1 in Figure 5. Also, the pick agent issues
a subscription for each onMessage it listens for (Sub2 in Figure 5), and when a
matching event occurs, it issues a publication to trigger the appropriate activity
(Pub1 in Figure 5).
Note that no subscriptions are issued for onAlarm events since alarm deadlines or
durations are evaluated internally by the pick agent. As with the previous activity,
not all the subscription and publications messages are shown here.
3.3.3 Compensate activity. Compensation handlers are an application specific
rollback mechanism in BPEL. The activities in a BPEL process are grouped into
arbitrarily nested scopes, and each scope may define a fault handler and a com-
pensation handler. When a fault, or exception, occurs, the scope’s fault handler is
called. A compensate activity within the fault handler can call the compensation
handlers for any nested scopes that have successfully executed. A compensation
handler attempts to “undo” the logic within the scope. For example, the compen-
sation for a scope whose activities ship a product to a customer may be to cancel
the order if it hasn’t been delivered yet, or otherwise notify the customer that the
order cannot be canceled.
A generic use of the compensate activity is shown in Figure 6. Here, ScopeA’s
fault handler invokes the compensation handler in ScopeB. The scope agent for
ACM Transactions on Web, Vol. V, No. N, October 2009.
12 · G. Li, V. Muthusamy, H.-A. Jacobsen
ScopeB subscribes to compensation events for its scope with Sub1 in Figure 6, and
triggers the first activity in its compensation handler using Pub1 in Figure 6.
BPEL semantics require the compensation handler to be called with a snapshot
of the variables when the scope completed. This can be achieved by retrieving these
values using the PADRES historic access capability [Li et al. 2007], or by having
each scope handler cache these values upon scope completion. These cached values
would be flushed when the process instance completes. In Figure 6, this would be
done by ScopeB’s scope agent.
3.3.4 Switch activity. The BPEL switch activity allows for conditional execu-
tion, whereby one of several case branches is executed based on a Boolean condition
associated with each case. The cases are ordered and the first branch whose condi-
tion evaluates to true is taken. If all the cases fail, an optional otherwise branch is
taken.
Figure 7 gives an example of a process with a switch activity. Not illustrated
in the figure is the possibility for execution to transfer directly from the switch1
activity to activity4 if neither case condition is true. In NIÑOS, a switch agent is
used to evaluate the case conditions in each branch of a switch activity.
A switch agent subscribes to updates from the system for any variables necessary
to evaluate the case conditions, and determines which (if any) branch should be
taken. By using a composite subscription (Sub1 in Figure 7), the switch agent
receives a single notification of its predecessor activity’s completion, along with all
the required variable updates in the associated process instance. After evaluating
the case conditions, the switch agent triggers the appropriate branch with a pub-
lication such as Pub1 in Figure 7. The first activity in each branch subscribes to
these trigger publications. For example, in Figure 7, activity2 subscribes to Sub2.
Note that the case where none of the cases in a switch activity are taken is not
shown.
An alternative implementation could eliminate the need for a switch agent en-
tirely, by transferring the responsibility of determining the appropriate branch to
follow to the first activities within each case branch. For example, in Figure 7,
the agents associated with activity2 and activity3 could independently determine
if they should execute. The tradeoff, however, is that these agents will have to
perform redundant computations of the case conditions. Recall that the case state-
ments are ordered and only the first true case condition is executed. Therefore, in
Figure 7, activity3 must evaluate the condition that case2 is true and that case1
is false. These redundant computations are unnecessary if the conditions are eval-
uated by a single switch agent. Furthermore, distributing the computation of the
case conditions requires sending the variable updates necessary to compute these
conditions to several agents.
3.3.5 Flow activity. The BPEL flow activity supports the execution of parallel
branches. Branches in a flow typically execute concurrently, but may be synchro-
nized by a link. A link between a source and target specifies that the target activity
executes only after the source activity has completed. An activity may be the source
or target of multiple links.
In addition, a source activity may set a Boolean valued transition condition on
ACM Transactions on Web, Vol. V, No. N, October 2009.
Distributed Architecture for Business Process Execution · 13
its outgoing links based on an expression of its process instance’s state. Likewise,
a target activity may specify a Boolean valued join condition based on instance
state including the state of its incoming links. A target activity executes only if
at least one of its incoming links evaluates to true and its join condition is true.
A join condition failure, by default, generates a fault, and control is passed to
the appropriate fault handler. This fault, however, may be suppressed by setting
the suppressJoinFailure attribute to true. In the latter case, the target activity is
skipped, and all its outgoing links (if any) are set to false.
A generic use of the flow activity, including the use of a link, is shown in Figure 8.
For brevity, not all messages are shown, and notably, transition and join conditions
are omitted, and assumed to evaluate to true. The flow activity maps to a flow
agent which waits for the preceding activity to finish (Sub1 in Figure 8), triggers the
execution of each flow branch (Pub1 in Figure 8), and then waits for each branch
to complete before triggering the subsequent activity.
Activities within a flow are first mapped to NIÑOS agents based on their asso-
ciated transformation rules. For example, a flow activity agent will subscribe to
and publish messages as outlined earlier. Then, each activity agent within a flow
is augmented with the behavior described in the following paragraphs.
The first activity in each flow branch subscribes to the initiation of the flow
(Sub2 in Figure 8), and publishes its completion as usual (Pub2 in Figure 8). Both
activity2 and activity5 belong to this case in Figure 8.
Each link source activity publishes the transition condition of each outgoing link.
In Figure 8, Pub3 indicates a true transition condition on activity2’s outgoing link.
On the other hand, link targets subscribe to the status of their incoming links and
the source activities associated with those links. For example, in Figure 8, activity6
subscribes to Sub4, and publishes Pub4 when it has completed successfully. A target
activity that does not execute, due to a false join condition, publishes that it has
skipped the execution of the activity. A successor activity to a link target must,
therefore, subscribe to both the execution or suppression of its predecessor. In
ACM Transactions on Web, Vol. V, No. N, October 2009.
14 · G. Li, V. Muthusamy, H.-A. Jacobsen
Figure 8, activity7, for example, would subscribe to Sub5 and publish Pub5 upon
completion. Notice that the use of the composite subscriptions feature in Sub4 and
Sub5 offloads the detection of event correlation patterns to the PADRES pub/sub
layer, simplifying the work of the activity agents.
All other activities publish and subscribe as usual, and do not change their be-
havior as a consequence of belonging within a flow.
Note that the cases above are not mutually exclusive, and an activity may be
required to behave according to multiple descriptions. For example, an activity
may be both the first activity in a flow branch and the target of a link, or may be
both a source and target of (different) links.
3.3.6 Other activities. The mappings for the basic BPEL activities from Ta-
ble I are relatively straightforward. For example, the reply activity subscribes to
the successful completion of its predecessor activity, and publishes the reply mes-
sage along with any variable updates. The fault activity, likewise, subscribes to the
completion of its predecessor activity and publishes a fault message. The mapping
of the sequence structured activity is also routine compared to the other activities
described above. Each activity within a sequence simply subscribes to its predeces-
sor’s completion, and publishes its own completion status.
3.3.7 BPEL variables. Activities within a BPEL process instance share data by
means of variables, which are global within a process instance. NIÑOS supports
two mechanisms to support BPEL variables.
The first mechanism maintains variables in a distributed manner. Every activity
that modifies a variable publishes a VARIABLE UPDATE message with the new
value. Any activity that needs to read a variable issues a subscription for these
update messages and caches this information locally. In this scheme, each activity
agent independently maintains the variable value, and in the case of a sequential
process, the value will be consistent across all activities.
A second mechanism addresses the issue of concurrent accesses to variables as
is possible with activities executing in parallel flows in a process. In this case,
a variable agent is used to maintain consistent variable values, and synchronize
accesses to variables. Adopting standard distributed locking techniques, activities
that read or write to variables must first acquire a read or write lock, respectively,
from the variable agent and then retrieve the current variable value from the variable
agent. The variable agent supports concurrent reads but exclusive writes. We plan
in future work to explore the use of distributed locking algorithms that support
greater concurrency and efficiency.
The variable agent mechanism can always be used, while distributed VARI-
ABLE UPDATE s are guaranteed to operate correctly only when variables are not
accessed concurrently. Since it is straightforward to distinguish the potentially con-
current and sequential portions of a BPEL process, the process transformation is
able to use the distributed VARIABLE UPDATE mechanism in sequential parts
of the process, but revert to variable agents in concurrent portions.
The visibility of variables by activities in different scopes is well-defined in the
BPEL specification, and can be determined and resolved during process transforma-
tion. For example, activities would only issue subscriptions for updates to variables
ACM Transactions on Web, Vol. V, No. N, October 2009.
Distributed Architecture for Business Process Execution · 15
BPEL Process
BP Manager Assign
Agent
... 3
Receive
Agent WS Web
5 4 Agent
Service
(1)
2
Adv/Sub/Info
Publication PADRES Broker Network 8
(Content-based Routing)
WS
Agent
Client
Adv/Sub/Info
1
Publication (2) 6
Pub1: [class,AGENT_CTL], Adv/Sub 7
Pub2: [class,Trigger],
[agentID,`WhileAgent1`],
[command,`SUBSCRIBE`],
While
Agent
(3) ... Reply
Agent
[process,PAYROLL],
[instanceID,p001],…
[content, `[class,eq,ACTIVITY_STATUS]...[status,eq,SUCCESS]`] [detail,’Name:Raymond,PersonalID:UT001’]
declared within their own or ancestor scopes. Other activities, for whom these
variables are not supposed to be visible would not subscribe to and hence would
not receive these variable updates.
Organization A Organization B
BPEL Process
Receive0
PADRES Network PADRES Network
Flow
Web
Client Service
Organization C Organization D
activity, such as the Boolean looping condition for a while activity. At this point,
the business process is deployed and each agent is ready for execution.
We emphasize that after a BPEL process has been transformed into advertise-
ments, subscriptions and activityinfo messages, there is much flexibility in the ac-
tivity agents where these messages are installed. Furthermore, the provisioning
of the quantity and types of activity agents can itself be arbitrary and accom-
modate to system requirements. For example, Figure 10 shows a scenario where
organizations A and B decide to collaborate in hosting a business process. Each
organization administers its own PADRES federation, and decides on which set of
activity agents to provision. Notice that there may be multiple agents of the same
type. Such replication of activity agents allows greater flexibility during process
deployment, provides more resources with which to balance and support greater
loads, and supports redundancy in case of failures. The BPEL process in Figure 10
may be deployed to the activity agents as annotated in the figure. Notice that
regardless of the complexity of the network architecture, activity agents are simply
identified by their location-independent address in the PADRES network, and the
deployment of a BPEL process to activity agents proceeds exactly as above. As
elaborated in Section 3.6, the ability of the PADRES content-based pub/sub layer
to address components in the system by their network- and location-independent
name is key to managing the complexity of arbitrarily elaborate deployments.
While organizations that wish to participate in the execution of a BPEL process
must administer a PADRES/NIÑOS deployment, it remains possible to invoke
processes hosted by other organizations (see Organization D in Figure 10) that
expose their processes as Web services. Furthermore, BPEL processes executed by
the distributed NIÑOS system can be invoked by outside clients (see Organization C
in Figure 10). The scenario in Figure 10 illustrates the flexible deployment options
ACM Transactions on Web, Vol. V, No. N, October 2009.
Distributed Architecture for Business Process Execution · 17
Monitor an activity:
[class,eq,ACTIVITY_STATUS],
[process,eq,Process1],
[activityName,eq,"while"], [IID,isPresent],
[status,isPresent]
events [Li et al. 2007]. Along with PADRES’s composite subscriptions feature [Li
and Jacobsen 2005], both executing and previously executed process instances can
be correlated and queried. For example, it is possible to monitor the status of new
process invocations by users who invoked the process at least ten times yesterday.
Another management scenario is to trace the execution of process instances that
exhibit some behavior. For example, the second set of ACTIVITY STATUS sub-
scriptions1 trace the invocations of every activity for those process instances whose
activity2 failed. Examining the execution of these instance can help diagnose the
failure or understand its consequences.
Advanced process control functions include suspending, resuming or stopping
running process instances. The target instances can be specified by instance id,
process id, or any constraints expressible by the pub/sub semantics. For exam-
ple, the AGENT CTL publication in Figure 11 instructs all agents executing the
PAYROLL process to suspend the execution of instance p001.
These functions are useful especially when processes need to be updated on-
line. For example, a manager may suspend running process instances, dynamically
update certain activities in the process (by sending modified subscription, adver-
tisement, and activityinfo envelopes to activity agents), and resume the instances.
The agent-based execution in NIÑOS simplifies this task since only the agents cor-
responding to the modified activities need to participate in the process redefinition.
The other activities can continue executing the process.
As mentioned earlier, the provisioning of multiple instances of the same activity
agent type provides more process deployment choices, greater scalability potential,
and the ability to redeploy the activities assigned to a failed activity agent. For
example, in Figure 10, if the receive agent provisioned by Organization A fails, the
Receive0 and Receive1 activities can be redeployed to the receive agent provisioned
by Organization B using the management features described above. While the
mechanisms required to respond to failures are supported by NIÑOS, the automatic
receive1
claim = INPUT
assign1
approval = UNKNOWN
flow1
invoke1 invoke2
wsA.process(claim) wsB.process(claim)
receive2 receive3
reportA = INPUT reportB = INPUT
assign2 assign3
scoreA = reportA.score scoreB = reportB.score
switch1
if scoreA < 8 → case2
if scoreB < 8 → case2
otherwise → case1
assign4 assign5
approval = YES approval = NO
reply1
RESPONSE = approval
detection and correction of failures is left for future work. Towards this end, we
have investigated failure resilience in the PADRES network layer [Kazemzadeh and
Jacobsen 2009], and formalized well-defined semantics for the mobility of activity
agents [Hu et al. 2009].
3.7 Example
Consider a loan approval BPEL process in Figure 12. The process is triggered when
a loan application is received. In order to avoid approving risky loans, the process
invokes two external Web services that independently generate a credit report for
the loan applicant. Only if both credit rating services deem the applicant to be
credit worthy does the process approve the loan application.
Each activity in the BPEL process in Figure 12 is mapped to a NIÑOS agent,
and table II details the advertisements and subscriptions issued by each agent, as
well the publications for a sample run of the process. Some of the agents in the
parallel branches of the process, such as the invoke2 and receive3 activities, are
omitted from Table II. Their messages would correspond to the messages issued by
the corresponding activities in the other branch.
Although there is a flow activity in the BPEL process in Figure 12, there is
no corresponding agent. Instead, the first activity in each branch of the flow are
triggered when the final activity before the flow activity completes. Similarly, it is
possible to eliminate the switch activity by having the first activities in each branch
of the switch subscribe to their respective conditions directly. The switch activity
ACM Transactions on Web, Vol. V, No. N, October 2009.
20 · G. Li, V. Muthusamy, H.-A. Jacobsen
is assigned to an agent in Table II to illustrate what its message would look like.
Activity Subscription Advertisement Sample Publication
(trigger) - [class,eq,TRIGGER], [class,TRIGGER],
[process,eq,"loan"], [process,"loan"],
[IID,isPresent] [IID,"g001"], <<CLAIM>>
receive1 [class,eq,TRIGGER], [class,eq,ACTIVITY STATUS], [class,ACTIVITY STATUS],
[process,eq,"loan"], [process,eq,"loan"], [process,"loan"],
[IID,isPresent] [activityName,eq,"receive1"], [activityName,"receive1"],
[IID,isPresent], [IID,"g001"],
[status,isPresent] [status,"SUCCESS"]
issue the necessary subscriptions and advertisements, and have the original agent
unsubscribe. It is even possible to modify the control logic of a portion of a pro-
cess using the same techniques to insert a new process fragment into an existing
process. Since the process is distributed, this process modification technique allows
the remainder of the process to continue executing while one portion of it is being
altered.
The NIÑOS execution engine exploits the PADRES complex event processing
capabilities to offload certain process execution tasks to the PADRES broker net-
work. For example, activities that are triggered by multiple publications issue
a composite subscription for these publications. The publications that contribute
to matching the composite subscription are collected and correlated in the broker
network itself. This benefits the agents who can avoid processing cost of the corre-
lation, and reduces network traffic by letting the broker network decide the optimal
point to collect and correlate the publications.
The decomposition of a process into fine-grained components affords more precise
control over load balancing or replication needs. For example, a single activity in
a process may be the bottleneck that limits the processing time of the process.
Instead of replicating the entire process, only the bottleneck activity needs to be
duplicated. The details of realizing this are out of the scope of this article, but are
made possible by the distributed NIÑOS execution architecture.
The distributed execution of activities in a process is also potentially more scal-
able by taking advantage of available distributed resources. Furthermore, due to
the fine granularity of the individual execution agents, the system is able to utilize
even relatively small resources. For example, certain activities in a process may be
very lightweight and the associated agent could be deployed on a relatively under-
powered machine; it is not necessary to find a machine that can execute the entire
process.
One potential critique of the NIÑOS architecture is that it requires each organi-
zation to deploy a federation of PADRES brokers. However, this is conceptually no
different from a process choreography where multiple organizations collaborate to
execute a business process. In such choreography scenarios, the process spans ad-
ministrative domains and there is no centralized coordinator, perhaps because the
organizations cannot agree on one trusted central entity. Instead, each organization
administers its own process execution engine, with standards such as BPMN [White
2004] and the family of Web service specifications facilitating the interoperability
among the participants. In a similar manner, the brokers in the NIÑOS architec-
ture can use messaging and event processing standards such as the Java Messaging
Service (JMS), Advanced Message Queuing Protocol (AMQP), or WS-Notification
allowing each organization to deploy their choice of technology. It should also be
reiterated that it is perfectly sensible to deploy the NIÑOS architecture on a single
machine and only add additional resources as required.
Many of the benefits of the NIÑOS architecture stem from the distributed nature
of the execution engine, where a large process is broken down into fine-grained
activities which are each executed by an autonomous agent.
ACM Transactions on Web, Vol. V, No. N, October 2009.
24 · G. Li, V. Muthusamy, H.-A. Jacobsen
4. EVALUATION
This section quantitatively evaluates the distributed NIÑOS process execution ar-
chitecture. In particular, it is compared to centralized and clustered architectures.
A variety of parameters are varied to attempt to understand the conditions for
which each architecture is well suited.
1800
1 Centralized 300
Clustered with 2 replicas Centralized
1600 0.8 Distributed Clustered with 2 replicas
250 Distributed
1400 0.6
Throughtput (/min)
0.4
Avg. Exec. Time (s)
1200 200
0.2
1000
0 150
0 50 100 150 200 250 300
800
600 100
400
50
200
0
0 0 1000 2000 3000 4000 5000 6000
0 1000 2000 3000 4000 5000 6000
Request rate (/min) Request rate (/min)
250
1 Centralized 50
Clustered with 2 replicas Centralized
0.8 Distributed Clustered with 2 replicas
200 Distributed
0.6 40
Throughtput (/min)
Avg. Exec. Time (s)
0.4
150
30
0.2
0
100 0 50 100 150 200 20 Request rate = 50/min
0 0
0 200 400 600 800 1000 1200 1400 1600 1800 2000 0 200 400 600 800 1000 1200 1400 1600 1800 2000
External serivice time (ms) External serivice time (ms)
6000 1000
100 Centralized
Clustered with 2 replicas Centralized
80 Distributed Clustered with 2 replicas
5000 Distributed
800
60
Throughtput (/min)
Avg. Exec. Time (s)
4000 40
600
20
3000
0
20 25 30 35 40 45 50 400 Request rate = 1000/min
2000
200
1000
0.4 55
Centralized Centralized
Clustered with 2 replicas Clustered with 2 replicas
0.38 Distributed 50 Distributed
Avg. Exec. Time (s)
Throughtput (/min)
45
0.36
Request rate = 50/min 40
0.34
35 Request rate = 50/min
0.32
30
0.3 25
0.28 20
0 50000 100000 150000 200000 250000 0 50000 100000 150000 200000 250000
Message size (bytes) Message size (bytes)
100 290
Centralized Centralized
90 Clustered with 2 replicas 280 Clustered with 2 replicas
Distributed Distributed
270
Avg. Exec. Time (s)
80
Throughtput (/min)
260
70 250
60 Request rate = 500/min 240
three deployment scenarios. The results in Figures 16 and 17 indicate that varying
the message size, even with different request rates, has little effect on either the
process execution time or the throughput in any of the three deployment scenarios.
However, the distributed case performs the worst with a request rate of 50/min
because of the communication overhead and performs the best with a request rate of
500/min because the request queueing times dominate the communication overhead.
When the request rate is low (Figure 16), all three deployment scenarios main-
tain a throughput that roughly equals the request rate. However, the centralized
deployment becomes overloaded with higher requests rates (Figure 17), with the
clustered and distributed approaches achieving about 48% better throughput fig-
ures. As with latency, the results show that the communication and processing
overheads of traversing a larger broker network is not significant with message sizes
up to 256 kbytes.
Larger messages influence performance in two key ways: an increase in the net-
work overhead when transmitting messages, and an increase in the computation
overhead when processing messages. Now, it is not clear to what extent the sta-
ble results in Figures 16 and 17 are generalizable to WAN deployments with slower
network links, but the compute resources of the machines in the experiment are not
unreasonable in commodity hardware, let alone enterprise servers. Therefore, the
observation that the processing overhead is largely independent of the message sizes
evaluated is likely a more universal phenomenon. This is understood by noticing
that a significant portion of the processing overhead is attributable to the pub/sub
ACM Transactions on Web, Vol. V, No. N, October 2009.
28 · G. Li, V. Muthusamy, H.-A. Jacobsen
Receive
BPEL Process Assign
Flow
Invoke Invoke
Receive Receive
Flow
matching of the messages, and the PADRES matching engine we use only performs
matching on the attributes and predicates in the message header; the payload is
not processed. To exploit this, the BPEL process transformation in NIÑOS only
encodes the process control flow details as pub/sub attributes and predicates, and
leaves the process data in the payload. In particular, VARIABLE UPDATE publi-
cations include the variable’s name as a pub/sub attribute, but store the variable’s
value in the message payload. In this way, variations in message size caused by
variable data values do not significantly influence the pub/sub matching time.
4.5 Parallelism
As processes may exhibit different degrees of parallelism, this experiment compares
two processes: one containing many activities with ten parallel branches, as shown
in Figure 18, and another with the same number activities but with only two parallel
branches. To isolate the effects of process parallelism, no external Web services are
invoked by either process.
With the highly parallel process, the distributed deployment offers significantly
better execution time performance as shown in Figure 19(a). When the request
rate is less than 100/min, we observe a trend similar to Figure 13, where the
distributed approach performs worse because of the additional network overhead.
This is understandable since higher request rates result in more activities executing
in parallel, and more opportunities for the distributed deployment to take advantage
of the additional resources available to it. At the highest request rates evaluated, the
distributed scenario executed the parallel process 79% faster than the centralized
approach, and the clustered deployment improved over the centralized one by about
71%.
With the more sequential process, on the other hand, the benefits of additional
resources diminishes. It turns out that there is not enough parallelism in this
particular process for the distributed deployment to benefit from, and the clustered
ACM Transactions on Web, Vol. V, No. N, October 2009.
Distributed Architecture for Business Process Execution · 29
Centralized Centralized
5000 10 Clustered with 2 replicas 5000 10 Clustered with 2 replicas
Distributed Distributed
8 8
4000 4000 6
6
Avg. Exec. Time (s)
0 0
2000 0 50 100 150 200 2000 0 50 100 150 200
1000 1000
0 0
0 200 400 600 800 1000 1200 1400 1600 1800 2000 0 200 400 600 800 1000 1200 1400 1600 1800 2000
Request rate (/min) Request rate (/min)
approach actually achieves the best execution time. The results in Figure 19(b)
indicate that at the request rate of 2000/min, the clustered approach is 64% faster
than the centralized one, whereas the distributed deployment is only 42% better.
Since there are no Web service requests in the processes, which might limit the
maximum throughput, the clustered approach has the best throughput with 25%
and 41% improvement for the highly parallel and the less parallel processes, com-
pared to the distributed approach.
According to the results, the distributed approach is better able to achieve low
process execution times with processes characterized by many parallel flows and in
situations where the request rates are high. Otherwise, there is not enough paral-
lelism to exploit the distributed resources available and the distribution overhead
may actually impair the performance. In such situations the centralized or clustered
architectures may perform better.
5. CONCLUSIONS
In this paper, we first we propose a distributed business process execution architec-
ture, based on a pub/sub infrastructure, using light-weight activity agents to carry
out business process execution in a distributed environment. The pub/sub layer
simplifies the interaction among agents, and reduces the cost of maintaining execu-
tion state for running process instances. Second, we describe how BPEL activities
can be mapped to pub/sub semantics that realize the process control flow among
activity agents. These agents are loosely coupled in the pub/sub layer, which makes
our agent-based BPEL engine more flexible and customizable. Third, we present
how to deploy processes into the agent network, initiate a process instance, and
manage the process execution. The process deployment, execution and manage-
ment are performed through the pub/sub layer taking advantage of the even-driven
and the loosely coupled nature of the pub/sub infrastructure. Finally, we carry out
a set of experiments comparing our distributed agent-based engine with a central-
ized orchestration scenario and a clustered scenario. The evaluation indicates that
the benefit of the distributed approach is more apparent under a higher process
request workload, say over 300 requests per minute. In addition, the distributed
ACM Transactions on Web, Vol. V, No. N, October 2009.
30 · G. Li, V. Muthusamy, H.-A. Jacobsen
approach is well suited to execute highly parallel processes that are not feasible in
a centralized deployment.
For future work, we would like to explore more experiments with larger business
processes and broker topologies, and study the movement of activity agents in order
to satisfy certain goals or constraints. For example, the average execution time
may be minimized by moving tightly coupled activity agents close to one another.
Moreover, it is interesting to study more advanced techniques for the validation of
BPEL process specifications, such as model checking and simulations for process
execution and debugging in the distributed environment.
Acknowledgments
This research was supported by IBM’s Center for Advanced Studies and Bell
Canada’s Bell University Laboratories R&D program, and builds on the PADRES
research project sponsored by CA, Sun Microsystems, the Ontario Centers of Ex-
cellence, the Canada Foundation for Innovation, the Ontario Innovation Trust, and
the Natural Sciences and Engineering Research Council of Canada. We would also
like to thank Serge Mankovskii from CA and our colleagues including Balasubra-
maneyam Maniymaran, Pengcheng Wan, and Chunyang Ye for providing valuable
feedback on earlier versions of this manuscript.
REFERENCES
Abadi, D. J., Ahmad, Y., Balazinska, M., Cetintemel, U., Cherniack, M., Hwang, J.-H.,
Lindner, W., Maskey, A. S., Rasin, A., Ryvkina, E., Tatbul, N., Xing, Y., and Zdonik, S.
2005. The Design of the Borealis Stream Processing Engine. In Proceedings of the Conference
on Innovative Data Systems Research (CIDR 2005). Asilomar, CA.
Alonso, G., Agrawal, D., Abbadi, A. E., Mohan, C., Gunthor, R., and Kamath, M. 1995.
Exotica/FMQM: A persistent message-based architecture for distributed workflow manage-
ment. In Proceedings of the IFIP Working Conference on Information Systems Development
for Decentralized Organizations (IFIP 1995). Trondheim, Norway.
Carzaniga, A., Rosenblum, D. S., and Wolf, A. L. 2001. Design and evaluation of a wide-area
event notification service. ACM Transactions on Computer Systems 19, 3 (Aug.), 332–383.
Casati, F. and Discenza, A. 2001. Modeling and managing interactions among business pro-
cesses. Journal of Systems Integration 10, 2 (Apr.), 145–168. Special Issue: Coordination as a
Paradigm for Systems Integration.
Chandrasekaran, S., Cooper, O., Deshpande, A., Franklin, M. J., Hellerstein, J. M.,
Hong, W., Krishnamurthy, S., Madden, S. R., Reiss, F., and Shah, M. A. 2003. Tele-
graphCQ: continuous dataflow processing. In Proceedings of the 2003 ACM SIGMOD Inter-
national Conference on Management of Data (SIGMOD 2003). San Diego, California.
Chau, T., Muthusamy, V., Jacobsen, H.-A., Litani, E., Chan, A., and Coulthard, P. 2008.
Automating SLA modeling. In Proceedings of the 2008 Conference of the Center for Advanced
Studies on Collaborative Research (CASCON 2008). Toronto, Canada.
Fabret, F., Jacobsen, H. A., Llirbat, F., Pereira, J., Ross, K. A., and Shasha, D. 2001.
Filtering algorithms and implementation for very fast publish/subscribe systems. In Proceedings
of the 2001 ACM SIGMOD International Conference on Management of Data (SIGMOD
2001). Santa Barbara, California, United States.
Fidler, E., Jacobsen, H.-A., Li, G., and Mankovski, S. 2005. The PADRES distributed pub-
lish/subscribe system. In Feature Interactions in Telecommunications and Software Systems
VIII (ICFI 2005). Leicester, UK.
Hu, S., Muthusamy, V., Li, G., and Jacobsen, H.-A. 2009. Transactional mobility in distributed
content-based publish/subscribe systems. In Proceedings of the 2009 IEEE International Con-
ference on Distributed Computing Systems (ICDCS 2009). Montreal, Canada.
ACM Transactions on Web, Vol. V, No. N, October 2009.
Distributed Architecture for Business Process Execution · 31