
Dynamic Programming Models for Maximizing Customer Lifetime Value: An Overview

Eman AboElHamd1(B), Hamed M. Shamma2, and Mohamed Saleh1

1 Department of Operations Research and Decision Support, Cairo University, Giza, Egypt
  [email protected], [email protected]
2 Department of Management, School of Business, The American University in Cairo, New Cairo, Egypt
  [email protected]

Abstract. Customer lifetime value (CLV) is the most reliable indicator in direct marketing for measuring the profitability of customers. This has motivated researchers to compete in building models to maximize CLV and, consequently, to enhance the relationship between the firm and its customers. This review paper analyzes the contributions of applying dynamic programming models in the area of direct marketing to maximize CLV. It starts by reviewing the basic models that focused on calculating CLV, measuring it, simulating it, optimizing it or, rarely, maximizing its value. It then highlights the dynamic programming models, including the Markov Decision Process (MDP), Approximate Dynamic Programming (ADP), also called Reinforcement Learning (RL), Deep RL and Double Deep RL. Although MDP contributed significantly to the area of maximizing CLV, it has many limitations that encouraged researchers to utilize ADP (i.e. RL) and, recently, deep reinforcement learning (i.e. deep Q-networks). These algorithms overcame the limitations of MDP and were able to solve complex problems without suffering from the curse of dimensionality; however, they still have some limitations, including overestimating the action values. This was the main motivation behind proposing double deep Q-networks (DDQN). Meanwhile, neither DDQN nor the algorithms that outperformed it and overcame its limitations have been applied in the area of direct marketing, and this leaves a space for future research directions.

Keywords: Customer Lifetime Value (CLV) · Markov Decision Process (MDP) · Approximate Dynamic Programming (ADP) · Reinforcement Learning (RL) · Deep Reinforcement Learning (DRL) · Deep Q Network (DQN)

© Springer Nature Switzerland AG 2020
Y. Bi et al. (Eds.): IntelliSys 2019, AISC 1037, pp. 419–445, 2020.
https://doi.org/10.1007/978-3-030-29516-5_34

1 Introduction
The term "Customer Relationship Management" (CRM) emerged in the mid-1990s. It is a broadly recognized strategy for the acquisition and retention of customers. In [33], the authors noted that CRM is much more than installing software; it requires the organization to recognize that customers are assets. Hence, CRM rests on the fact that determining how much to spend on a customer is a significant part of deciding whom to retain and when. In [9], the authors stated that firms can be more profitable, achieve steady market growth and increase their market share if they identify the most profitable customers and invest marketing resources in them. Thus, it is not surprising for a firm to treat customers differently: some are offered customized promotions while others might be left to go. Basically, the main goal of CRM is to help the firm determine the long-term profitability of its customers in order to deliver them the optimal promotion plans accordingly. Thus, CRM depends heavily on the lifetime profitability of the customers, especially for organizations that seek long-term relationships with their customers. This is why Customer Lifetime Value (CLV) plays a significant role in CRM as a direct marketing strategy measurement ([3,40,42,43,54]).

Fig. 1. Base and target periods for CLV prediction

CLV is the most reliable indicator in direct marketing for measuring the profitability of the customers. Meanwhile, direct marketing is not about offering the best possible service to every single customer, but about treating customers differently based on their level of profitability. In [68], the authors found that CLV-based resource reallocation led to an increase in revenue of about 20 million (a 10-fold increase) without any change in the level of marketing investment, as mentioned in [7]. This is one of the reasons why CLV is considered a significant metric in CRM as a direct marketing indicator. In general, CLV is defined as the present value of all future profits obtained from the customers over their life of relationship with a firm; a common textbook formulation is given below. The main idea of CLV is demonstrated in Fig. 1: the customer (or segment of customers) is imagined to stand at time 0, their related historical data has already been acquired by the organization, and the goal is to predict their future profitability. Meanwhile, the complexity of CLV lies in its dependency on many indicators, including the customers' retention rate, acquisition rate, churn probability and more, as shown in Fig. 2, which lists some of these indicators; a broader list is given in the work of [19].
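For concreteness, a commonly used discounted-cash-flow form of this definition (a standard textbook formulation, not one taken from a specific paper reviewed here) is

CLV = \sum_{t=1}^{T} \frac{r^{t}\,(p_t - c_t)}{(1 + d)^{t}} - AC

where p_t and c_t are the revenue and cost associated with the customer in period t, r is the retention rate, d is the discount rate, T is the length of the projection horizon and AC is the acquisition cost. The models reviewed in the following sections differ mainly in how these quantities are estimated and in whether the marketing action applied in each period is treated as a decision variable.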

Fig. 2. CLV indicators

The rest of the paper is organized as follows: Sect. 2 introduces the main idea and the algorithm of each of the approaches that aim to maximize CLV. Section 3 reviews the basic and traditional models of CLV, which focus on analyzing the term, calculating its value, simulating the interaction between CLV and its most effective indicators, or predicting its future value. Section 4 lists the limitations of the traditional models and highlights the most important dynamic models applied to CLV, starting from MDP, passing by approximate dynamic programming and finishing with the contribution of reinforcement learning techniques to the field of maximizing CLV. A summary and discussion of all these methods are given in Sect. 5, showing the strengths and limitations of each of them. The last section concludes the paper and highlights a set of future research directions.

2 Background

This section focuses on the dynamic programming algorithms that are widely used in the area of maximizing CLV. It briefly introduces the main idea of each of these algorithms and states its steps. It is organized as follows: it starts with MDP as a traditional model and states its algorithms, then presents the ADP (RL) model that outperforms MDP in complex problems, and finally highlights the main idea and the algorithm of deep reinforcement learning.

2.1 Markov Decision Process

Markov Decision Process, or MDP for short, is a reinforcement learning approach designed to allow an agent to take actions in an environment. As demonstrated in Fig. 3, an MDP depends on a set of key elements: the current state (s), the next state (s'), an action (a) that invokes the transition from the current to the next state, and the discounted accumulated reward value (r). Thus, an MDP is described by the tuple (s, a, s', r, γ), where γ is a discount factor (0 < γ ≤ 1). The transition probability function and the reward function are described in Eqs. (1) and (2), respectively. The solution to an MDP is called a policy, which is a mapping between states and actions; the goal of the MDP is to find the optimal policy that maximizes the long-term rewards [19]. There are two solution approaches to achieve this goal: the value-iteration algorithm and the policy-iteration algorithm [84]. Their steps are given in Algorithms 1 and 2, respectively [85].

P^{a}_{ss'} = P[S_{t+1} = s' | S_t = s, A_t = a]    (1)

R^{a}_{s} = E[R_{t+1} | S_t = s, A_t = a]    (2)

Fig. 3. MDP key elements

Algorithm 1. Value Iteration

1: initialize V arbitrarily (e.g. V(s) = 0, ∀s ∈ S)
2: repeat
       Δ = 0
       for all s ∈ S do
           v = V(s)
           for all a ∈ A(s) do
               Q(s, a) = Σ_{s'} T(s, a, s')(R(s, a, s') + γ V(s'))
           V(s) = max_a Q(s, a)
           Δ = max(Δ, |v − V(s)|)
3: until Δ < σ
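To make Algorithm 1 concrete, the following minimal Python sketch runs tabular value iteration on a toy two-state, two-action CLV-style MDP; the transition and reward tables are illustrative assumptions, not data or a model from any of the reviewed papers.

```python
import numpy as np

# Minimal sketch of tabular value iteration (Algorithm 1) on a toy CLV-style MDP.
n_states, n_actions = 2, 2        # states: 0 = "lapsed customer", 1 = "active customer"
gamma, sigma = 0.9, 1e-6          # actions: 0 = "do nothing",     1 = "send promotion"

T = np.array([[[0.9, 0.1], [0.6, 0.4]],     # T[s, a, s'] = transition probability
              [[0.3, 0.7], [0.1, 0.9]]])
R = np.array([[[0.0, 5.0], [-1.0, 4.0]],    # R[s, a, s'] = immediate profit
              [[0.0, 10.0], [-1.0, 9.0]]])  # (promotions cost one unit)

V = np.zeros(n_states)
while True:
    delta = 0.0
    for s in range(n_states):
        v = V[s]
        # Q(s, a) = sum_s' T(s, a, s') * (R(s, a, s') + gamma * V(s'))
        q = [np.sum(T[s, a] * (R[s, a] + gamma * V)) for a in range(n_actions)]
        V[s] = max(q)
        delta = max(delta, abs(v - V[s]))
    if delta < sigma:
        break

# Greedy policy extracted from the converged value function.
policy = [int(np.argmax([np.sum(T[s, a] * (R[s, a] + gamma * V))
                         for a in range(n_actions)])) for s in range(n_states)]
print("V* =", np.round(V, 2), "policy =", policy)
```

The greedy policy extracted at the end is exactly the state-to-action mapping that the text above calls the optimal policy.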

2.2 Approximate Dynamic Programming


Approximate dynamic programming (ADP) is a modeling approach that solves large and complex problems. It has proved to outperform MDP in many cases, especially in complex problems, as it overcomes the three curses of dimensionality, whether in the state space, the action space or the outcome space; it is usually, but not necessarily, stochastic. In [65], the researchers showed how to apply a generic ADP algorithm using a lookup-table representation for a maximization problem (Algorithm 3).

Algorithm 2. Policy Iteration

1: initialize V(s) ∈ ℝ and π(s) ∈ A(s) arbitrarily, ∀s ∈ S
[POLICY EVALUATION]
2: repeat
       Δ = 0
       for all s ∈ S do
           v = V(s)
           V(s) = Σ_{s'} T(s, π(s), s')(R(s, π(s), s') + γ V(s'))
           Δ = max(Δ, |v − V(s)|)
3: until Δ < σ
[POLICY IMPROVEMENT]
4: policy-stable = True
5: for all s ∈ S do
       b = π(s)
       π(s) = argmax_a Σ_{s'} T(s, a, s')(R(s, a, s') + γ V(s'))
       if b ≠ π(s) then policy-stable = False
6: if policy-stable then stop; else go to step 2
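Algorithm 2 can be sketched in the same way. The snippet below is a minimal, self-contained policy iteration on a randomly generated MDP; the random transition and reward tables and the convergence threshold are illustrative assumptions only.

```python
import numpy as np

# Minimal sketch of tabular policy iteration (Algorithm 2) on a random MDP.
rng = np.random.default_rng(0)
n_states, n_actions, gamma, sigma = 4, 2, 0.9, 1e-8

T = rng.random((n_states, n_actions, n_states))
T /= T.sum(axis=2, keepdims=True)          # make T[s, a, :] proper distributions
R = rng.normal(size=(n_states, n_actions, n_states))

V = np.zeros(n_states)
pi = np.zeros(n_states, dtype=int)         # arbitrary initial policy

while True:
    # POLICY EVALUATION: iterate V under the current policy pi until convergence.
    while True:
        delta = 0.0
        for s in range(n_states):
            v = V[s]
            V[s] = np.sum(T[s, pi[s]] * (R[s, pi[s]] + gamma * V))
            delta = max(delta, abs(v - V[s]))
        if delta < sigma:
            break
    # POLICY IMPROVEMENT: act greedily with respect to the evaluated V.
    policy_stable = True
    for s in range(n_states):
        b = pi[s]
        q = [np.sum(T[s, a] * (R[s, a] + gamma * V)) for a in range(n_actions)]
        pi[s] = int(np.argmax(q))
        if b != pi[s]:
            policy_stable = False
    if policy_stable:
        break

print("optimal policy:", pi, "V:", np.round(V, 3))
```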

Algorithm 3. Approximate Dynamic Programming

1: initialization:
       initialize V_t^0(S_t) for all states S_t
       choose an initial state S_0^1
       set n = 1
2: choose a sample path ω^n
3: for t = 0, 1, ..., T − 1 do
       assuming a maximization problem, solve
           v̂_t^n = max_{x_t} ( C_t(S_t^n, x_t) + γ E[ V_{t+1}^{n−1}(S_{t+1}) | S_t^n ] )
       update V_t^{n−1}(S_t) using
           V_t^n(S_t) = (1 − α_{n−1}) V_t^{n−1}(S_t^n) + α_{n−1} v̂_t^n,   if S_t = S_t^n
           V_t^n(S_t) = V_t^{n−1}(S_t),                                     otherwise
       compute S_{t+1}^n = S^M(S_t^n, x_t^n, W_{t+1}(ω^n))
4: let n = n + 1. If n ≤ N, go to step 2.
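As a rough illustration of the lookup-table procedure in Algorithm 3, the sketch below runs a forward ADP pass on a toy finite-horizon problem. It replaces the expectation over the next state with the sampled next state along the path (a common simplification), and the state space, contribution function, transition function and harmonic stepsize are all illustrative assumptions rather than details taken from [65].

```python
import numpy as np

# Minimal forward ADP sketch in the spirit of Algorithm 3 (lookup-table version).
rng = np.random.default_rng(1)
T_horizon, N_iters, gamma = 5, 200, 0.95
states = range(3)                          # toy discrete state space
decisions = range(2)                       # toy decision space

def contribution(s, x):
    # C_t(S_t, x_t): illustrative immediate contribution.
    return float(s + 2 * x)

def transition(s, x, w):
    # S^M(S_t, x_t, W_{t+1}): illustrative stochastic transition function.
    return (s + x + w) % len(states)

# V_bar[t, s]: lookup-table approximation of the value of state s at time t.
V_bar = np.zeros((T_horizon + 1, len(states)))

for n in range(1, N_iters + 1):
    alpha = 1.0 / n                              # harmonic stepsize alpha_{n-1}
    s = 0                                        # initial state
    omega = rng.integers(0, 2, size=T_horizon)   # sample path of exogenous noise
    for t in range(T_horizon):
        # Approximate Bellman step using the previous iteration's V_bar.
        values = [contribution(s, x) + gamma * V_bar[t + 1, transition(s, x, omega[t])]
                  for x in decisions]
        x_best = int(np.argmax(values))
        v_hat = values[x_best]
        # Smooth the new observation into the lookup-table estimate.
        V_bar[t, s] = (1 - alpha) * V_bar[t, s] + alpha * v_hat
        s = transition(s, x_best, omega[t])

print(np.round(V_bar, 2))
```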

Approximate dynamic programming is sometimes called reinforcement learning, especially in the context of computer science, while the former term is more common in operations research. RL depends mainly on Q-learning as a model-free algorithm. In short, Q-learning learns the action-value function Q(s, a): it evaluates the action a at a particular state s, and the best action is the one that maximizes the overall long-term reward. Originally, Q-learning was performed using lookup tables, where Q-values were stored for all possible combinations of states and actions; eventually, machine learning algorithms (i.e. neural networks) were built to learn the Q-values. Algorithm 4 illustrates the main steps of the basic Q-learning algorithm.

Algorithm 4. Q-Learning
1: start with Q_0(s, a), ∀s, a
2: get initial state s
3: for k = 1, 2, ..., while not converged do
       sample action a, get next state s'
4:     if s' is terminal then
           target = R(s, a, s')
           sample a new initial state s'
       else
           target = R(s, a, s') + γ max_{a'} Q_k(s', a')
5:     Q_{k+1}(s, a) ← (1 − α) Q_k(s, a) + α · target
6:     s ← s'
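A minimal tabular version of Algorithm 4 looks as follows in Python; the toy environment dynamics in `step` are purely illustrative stand-ins for a real customer-interaction environment.

```python
import numpy as np

# Minimal tabular Q-learning sketch (Algorithm 4) on a toy stochastic environment.
rng = np.random.default_rng(2)
n_states, n_actions = 3, 2
gamma, alpha, epsilon, n_steps = 0.9, 0.1, 0.1, 20000

def step(s, a):
    # Illustrative environment: returns (next_state, reward, terminal_flag).
    s_next = int(rng.integers(0, n_states))
    reward = 1.0 if (a == 1 and s_next > s) else -0.1
    done = (s_next == n_states - 1 and a == 1)
    return s_next, reward, done

Q = np.zeros((n_states, n_actions))
s = 0
for _ in range(n_steps):
    # epsilon-greedy sampling of the action a in state s.
    a = int(rng.integers(0, n_actions)) if rng.random() < epsilon else int(np.argmax(Q[s]))
    s_next, r, done = step(s, a)
    # target = r if s' is terminal, else r + gamma * max_a' Q_k(s', a').
    target = r if done else r + gamma * np.max(Q[s_next])
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target
    # On termination, resample a new initial state; otherwise continue from s'.
    s = int(rng.integers(0, n_states)) if done else s_next

print(np.round(Q, 2))
```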

2.3 Deep Reinforcement Learning

In complex problems, finding an exact solution might be impossible, which is why searching for an approximate solution becomes a must. This was the motivation behind ADP at first and, eventually, behind deep reinforcement learning (or Deep Q-Networks (DQN)), Algorithm 5. The basic idea of DQN is that the action-value function Q is approximated using a neural network (or, more precisely, a deep learning) algorithm instead of a lookup table (as mentioned in the previous section). Hence, DQN combines the Q-learning and neural network algorithms.

Algorithm 5. DQN with experience replay

1: initialize replay memory D to capacity N
2: initialize the action-value function Q with random weights θ
3: initialize the target action-value function Q̂ with weights θ⁻ = θ
4: for episode = 1 to M do
       initialize sequence s_1 = [x_1] and pre-processed sequence φ_1 = φ(s_1)
5:     for t = 1 to T do
           with probability ε select a random action a_t;
           otherwise, select a_t = argmax_a Q(φ(s_t), a; θ)
           execute action a_t and observe reward r_t and x_{t+1}
           set s_{t+1} = (s_t, a_t, x_{t+1}) and pre-process φ_{t+1} = φ(s_{t+1})
           store transition (φ_t, a_t, r_t, φ_{t+1}) in D
           sample a random mini-batch of transitions (φ_j, a_j, r_j, φ_{j+1}) from D
           set y_j = r_j                                        if the episode terminates at step j + 1
               y_j = r_j + γ max_{a'} Q̂(φ_{j+1}, a'; θ⁻)         otherwise
           optimize θ by gradient descent on (y_j − Q(φ_j, a_j; θ))²
           every C steps set Q̂ = Q
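The following is a compact PyTorch-style sketch of the core of Algorithm 5 (replay memory, ε-greedy action selection, a target network with weights θ⁻ and the squared-error update). PyTorch availability, the 4-dimensional state, the two actions and the randomly generated transitions are all assumptions made for illustration; the paper does not prescribe a specific implementation.

```python
import random
from collections import deque
import torch
import torch.nn as nn

state_dim, n_actions, gamma, batch_size = 4, 2, 0.99, 32

def make_q_net():
    # Small fully connected Q-network Q(s, .; theta).
    return nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

q_net = make_q_net()                       # online network, weights theta
target_net = make_q_net()                  # target network, weights theta^-
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)              # replay memory D

def select_action(state, epsilon=0.1):
    # epsilon-greedy policy over Q(s, a; theta).
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return int(q_net(state).argmax().item())

def dqn_update():
    # Sample a mini-batch from D and take one gradient step on (y - Q(s, a; theta))^2.
    batch = random.sample(replay, batch_size)
    s = torch.stack([b[0] for b in batch])
    a = torch.tensor([b[1] for b in batch])
    r = torch.tensor([b[2] for b in batch])
    s_next = torch.stack([b[3] for b in batch])
    done = torch.tensor([b[4] for b in batch])
    q_sa = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # y = r for terminal transitions, r + gamma * max_a' Q_hat(s', a'; theta^-) otherwise.
        y = r.float() + gamma * (1.0 - done.float()) * target_net(s_next).max(dim=1).values
    loss = nn.functional.mse_loss(q_sa, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Fill the replay memory with random (illustrative) transitions, then train.
for _ in range(1000):
    s, s_next = torch.randn(state_dim), torch.randn(state_dim)
    a = select_action(s)
    replay.append((s, a, random.random(), s_next, float(random.random() < 0.05)))
for step_count in range(200):
    dqn_update()
    if step_count % 50 == 0:               # every C steps, copy theta into theta^-
        target_net.load_state_dict(q_net.state_dict())
```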

3 Basic Models of CLV


Due to the significant contribution of CLV to CRM, researchers are competing in developing different models for CLV. As demonstrated in Fig. 4, CLV contributions may be classified into basic (i.e. traditional) models, which focus mainly on analyzing the term itself, simulating or calculating its value, optimizing it or, more interestingly, maximizing it; and dynamic models for dealing with CLV. The dynamic model umbrella contains many algorithms, starting from MDP and ADP, then reinforcement learning and finally deep reinforcement learning. This section and the following one are devoted to listing the contributions under each umbrella. Both are organized so as to start with the survey papers that listed the contributions, strengths and limitations of the literature, and then state the different points of view of the researchers for each of the basic models. Meanwhile, for a sample of 83 papers, Table 2 lists the nature of the contributions and the number of publications for each, while Table 1 counts the number of publications classified by industry.
In [31], the authors compared customer-centric and product-centric approaches. They also presented various research articles that dealt with CLV, summarized their findings and presented directions for future research. In [44], the researchers reviewed the CLV metric from a different perspective, from its definition to the approaches used to compute and measure it; they also touched on the concept of customer equity. One of the strengths of this review paper is that it mentioned the implementation challenges faced by organizations. In [8], the authors reviewed quantitative models for direct marketing from a systems perspective. They focused on two types of models, statistical and machine learning based, mentioned the advantages and disadvantages of both, and evaluated the models in terms of accuracy and profitability. In [88], the authors provided an overview of customer relationship management with a specific focus on the retail industry. They studied how retailers could extract insights from collected customer data, focusing on prediction models that measured customer behavior over time. In [29], the authors classified the contributions on calculating CLV in the literature and compared them from a segmentation perspective. In [1], the authors analyzed the customer lifetime value literature critically and highlighted its mathematical models, techniques and applications as well as its limitations.
On top of the basic models lies CLV prediction, for the sake of adjusting marketing investments accordingly. In [69], Rust et al. developed a model that predicted customer profitability and performed significantly better than naive models; their model was applied to a dataset from a high-tech company in a B2B context. Other researchers went for predicting one of the CLV indicators that directly affect it, such as customer churn. Sabbeh et al. [71] analyzed the performance of ten machine learning algorithms that have been utilized for churn prediction problems. Another prediction model was proposed in [76]; it consisted of three steps applied to a real manufacturing company in Iran: grouping the customers according to their behavioral similarities, calculating their current value and, finally, estimating their future value with the help of a stochastic Markov decision process model. Also, in [72], the authors highlighted the 20/80 rule, i.e. that the largest portion of the profit is generated by only a small number of users. Hence, they tried to build a prediction model with the help of neural networks to predict CLV. They also highlighted that the data was imbalanced, which is why they combined the synthetic minority oversampling technique (SMOTE) with a neural network to perform the prediction; SMOTE improved the data augmentation and thus the overall prediction results.

Fig. 4. CLV models
On another hand, and with reference to Fig. 2, CLV depends on a set of indicators. This dependency attracted many researchers and motivated them to analyze it. To list some: in [67], the authors introduced a modeling framework to balance the resources devoted to customer acquisition and customer retention, for the sake of maximizing profitability. In [74], the authors analyzed profitability differently: they tried to find a relationship between cross-buying and the firm's profitability. They analyzed customer databases for 5 firms and found that 10–35% of the firms' customers who perform cross-buying are unprofitable. They also proposed a framework to help managers determine profitable and unprofitable cross-buying. In [60], the authors analyzed the time of the customer's previous purchase (i.e. recency) and developed modeling approaches that target the firm's marketing efforts taking the recency of the customers into consideration. In [66], the researchers investigated the long-term impact of different channels, including coupon promotions, TV and internet ads, across customer segments. They applied their model to a digital music provider with about 500K customers and concluded with actionable insights. On top of these indicators was customer loyalty. This indicator attracted the researchers in [81], who divided loyalty into behavioral loyalty and attitudinal loyalty. They conducted a longitudinal survey showing that as customers' behavioral loyalty increased, customers placed more importance on price and less on rewards and convenience, whereas with attitudinal loyalty customers gave more attention to rewards and convenience. Hence, attitudinal loyalty allowed firms to reduce price sensitivity and increase revenue.

Table 1. Number of publications per application

Application Number of publications


Not mentioned 47
Mailing catalog dataset 10
Banking 9
High-tech company 5
E-commerce 2
Telecom industry 2
Sports 2
Digital music provider 1
Retail 1
Catalog firm 1
Medical 1
Finnair case study 1
Automotive 1
Total 83

As illustrated in Table 2, many researchers have contributed to CLV. The majority focused on analyzing the term and its relationship with other indicators (i.e. retention rate, churn rate, expected revenue, etc.). However, over the years, more interest was given to searching for models that maximize its value. In [44], the authors presented two approaches for computing CLV, some of its related applications as well as a set of the challenges associated with it. In [61], the authors emphasized CRM; they proposed a framework for a systems perspective of CRM and also built a client/server architecture for CRM for the sake of facilitating CRM's understanding, development and implementation. One of the contributions (i.e. [58]) proposed a customer experience framework that focused mainly on the customer's journey while experiencing the service. Some contributions emphasized CRM and tried to analyze it and link it with CLV.

Table 2. Number of publications for each CLV related task (Sample of 83 papers)

Methodology
CLV task Basic models MDP ADP DQN
Analysis 18 6 - -
Simulation 3 - - -
Optimization - 4 - -
Prediction 7 7 - -
Maximization 4 4 10 20
Total 32 21 10 20

With reference to Table 1, the majority of the publications did not mention the application industry they worked on; however, nine of the 83 reviewed publications were interested in applying CLV in the banking sector. For example, in [10], the authors tackled CLV from a unique perspective. They analyzed the effect of forward-looking metrics on employee decision making. Precisely, focusing on the banking sector, they tested an environment where explicit incentives and decision rights remained unchanged and tried to gauge the effects of enriching the information set of the employees in this environment. Their results challenged the literature's finding that CLV negatively impacted pricing, and they showed a significant shift in attention toward more profitable client segments and some improvements in cross-selling. Proper calculation of CLV makes it possible to quantify a company's long-term profitability and to support decisions related to the level of expenditure on customer acquisition, retention, promotion planning and distribution channel choice, as well as increasing the company's market share [18].

Fig. 5. Number of CLV related publications per year, sample size is 83

In [86], the authors developed a dynamic framework that enabled managers to maintain the relationships between their companies and their customers and to maximize CLV. They also observed a relationship between the customers' CLV and their associated marketing channels, and concluded with a suggestion that managers might increase CLV by designing resource allocation rules. In [39], the authors designed longitudinal models to increase the customers' values over their lifetime; they also discussed a set of variables that contributed to the rise in the use of CRM in the marketplace. In [24], the authors reviewed a number of applicable CLV models related to customer segmentation and the allocation of marketing resources for acquisition, retention and cross-selling strategies. Gupta also joined another research team in [25] and built a model paper that related the customer's unobservable metrics (i.e. customer satisfaction) to observable ones (i.e. CLV). The researchers in [37] were also interested in relating segmentation to CLV. Precisely, they were interested in segmenting customers based on their CLV, an approach they called benefit segmentation, focusing on the banking sector. In [80], the authors summarized the literature focusing on the contributions related to simulating direct marketing policies. Another contribution was a decomposition of the problem into distinct tasks; in light of this, they built an open-source simulator that was trained and validated on two direct marketing datasets. On another hand, the authors of [2] have had many contributions, including simulating the highest profits in a proposed model that integrated over-the-top providers and internet service providers for the sake of maximizing profit. They classified the customers according to their CLV and focused on the churn of the most profitable customers. In [59], the authors contributed a framework to allocate promotional marketing strategies to the appropriate customer segments with the help of CLV. Their two-phase framework started with an information-gathering step and then a segmentation step based on CLV scores using the Fuzzy C-means algorithm; Fuzzy TOPSIS is utilized in their framework to help prioritize the segments.
Another approach is to link the value of the customers with their social media and social network activities. This perspective attracted many researchers. In [55], the authors organized a discussion around what they called the "social CRM house" and discussed how social media engagement affected the house's core areas, including acquisition, retention and termination. They addressed a set of challenges related to managing the relationship in the social media era, including having big and unstructured data, privacy and security issues and many others. In [89], the authors examined the effect of social networking sites and how they influenced the customer's purchase decision. They performed a regression analysis to assess the dependency relationship between communication over the social network sites (independent variable) and the customer's purchasing decision (dependent variable). In [17], the authors also touched on social networks, however from another perspective. They built a customer valuation model called "customer lifetime network value". Their model reallocated the value of the customers based on their social influence, and it was applied to a real-world sports dataset.
Bayesian decision theory was the early attempt to maximize the value of the customers instead of just analyzing it or even predicting its value. In [87], the authors utilized Bayesian decision theory to develop a framework that accommodated the uncertainty in predicting customer behavior. Their model maximized CLV depending on the customer's purchasing time and quantity. They mentioned implementation guidelines and illustrated how their model can help managers increase marketing productivity. However, implementations of the Bayesian decision theory approach suffered from many limitations, including being computationally expensive, especially for large datasets that include many variables (e.g. those related to the high-technology industry, as mentioned in their work), besides the sensitivity of the approach to the choice of prior and posterior variables [22].
In fact, due to the limitations of the previously mentioned algorithms and approaches in maximizing CLV, and because they tackled the CLV-related problem from other perspectives far from maximizing its value (except for the trials of Bayesian decision theory), researchers started thinking about other, more effective approaches (i.e. Markov Decision Process models, Approximate Dynamic Programming and Reinforcement Learning). The rest of the paper addresses these trials in the following sections, starting by listing the literature review papers that summarized the work that has been done for each.

4 Dynamic Models of CLV


The dynamic models umbrella starts with basic Markov models, with a set of states and a transition matrix representing the probability of moving from one state to another. In [63], the authors built a stochastic Markov model assuming only two states for the customer (the basic model), then expanded it to more than two states and compared the results of both. Eventually, actions that enforce the movement between the states are added, as well as a cumulative reward. This new model with the added actions and reward is called a Markov decision process, or MDP for short. A Markov Decision Process is an environment in which all states are Markov, which means the future is independent of the past or, more precisely, the past is encapsulated in the current state. Hence, the future depends only on the present, as illustrated in Eq. (3):

P[S_{t+1} | S_t] = P[S_{t+1} | S_1, ..., S_t]    (3)

As mentioned in the previous section, states and actions are two of the main elements of an MDP. Hence, their characteristics categorize the MDP problem as either discrete or continuous. Figure 6 shows how states and actions are categorized as discrete or continuous and how the MDP problem is categorized accordingly.

4.1 Markov Decision Process Models in CLV

Traditionally, MDP was on top of the models for maximizing CLV, and over many years it outperformed many other approaches due to its mechanism, which suits the purpose. This subsection lists the literature's contributions according to the publishing year. The listing is guided by what has been mentioned in Tables 1 and 2 and Fig. 5. A set of the most effective contributions mentioned in Fig. 5 is selected and listed in the rest of this subsection.

Fig. 6. MDP classification
The list starts with 2007, for which three contributions on three different datasets are selected. The researchers in [26] developed a customer valuation model that combined a first-order Markov decision model with classification and regression trees (CART). The strength of their model lay in its ability to deal with both discrete and continuous transitions; it was tested on a dataset of 6.2 million customers from a leading German bank. In [45], the authors proposed a comprehensive framework that combines customer equity with lifetime management. Their framework maximized ROI as it helped in the optimal planning and budgeting of targeted marketing campaigns. Their proposed model combined advanced Markov decision process models, Monte Carlo simulation and portfolio optimization, and they tested it on the Finnair case study. The contribution in [56] was a bit different: they proposed a model that provided closed-form approximations for the bias and variance, considering a finite-state, finite-action, infinite-horizon, discounted-reward Markov decision process. They tested their model on a large-scale mailing catalog dataset.

In 2008, the authors of [92] discussed the interest of academics and practitioners in the development and implementation of stochastic customer base analysis models. They also presented the arguments for and against the usage of these models in practice. As a contribution, the authors compared the quality of these models in practice with a simple heuristic that firms typically use. They confirmed that the heuristics perform at least as well as the stochastic models except for future purchase predictions; hence, the authors recommended the usage of stochastic customer base analysis models in practice. In [52], the authors captured relationship marketing in a Markov decision framework as well as a life distribution of the customer relationship focusing on CLV, and illustrated a way to find the optimal marketing policy numerically.
Moving to 2011, the authors of [11] added a comprehensive decision support system to the literature's contributions. They predicted customer purchasing behavior given a set of factors including product, customer and marketing influencing factors, utilized the predicted purchasing behavior to estimate the customer's net CLV for a specific product, and verified their system through a case study. In [14], the authors developed a framework consisting of three groups of techniques to compute CLV based on its predicted value, identified its critical variables and, finally, predicted the profit of the customers under different purchasing behaviors using an ANN. Their model was tested on a dataset related to a company in Taiwan.

Ekinci has many contributions in the area of maximizing CLV. In [21], they designed a methodology to help managers maximize the CLV of their customers by determining the optimal promotion campaigns; their methodology was built with the help of classification and regression trees (CART) and stochastic dynamic programming. Ekinci also joined a research group (2014) and designed a two-step methodology that aimed to maximize CLV via determining optimal promotion campaigns, based on stochastic dynamic programming and regression trees. Their model was applied in the banking sector, and they also tried to determine the states of the customers according to their values using the CART technique. In [19], the authors tackled the problem from another perspective: they developed a simple, industry-specific, easily measurable model with objective indicators to predict CLV, and injected the predicted CLV as states in a Markov decision process model. Their proposed model was tested in the banking sector, and one of its strengths is that they conducted a set of in-depth interviews to collect the most effective indicators for CLV. In the same year, in [90], the authors estimated the customer's lifetime duration and discounted expected transactions using simple and standard discrete-time transaction data. They also identified the relational and demographic factors that might cause the variance in customers' CLV, and concluded with a set of insights for marketing managers and decision makers. In [93], the authors built a Markov decision process model with multivariate, interrelated, state-dependent behavior. The uniqueness of their model lay in its ability to capture the effect of firm pricing decisions on customer purchasing behavior and, consequently, on customer lifetime profitability in a Business-to-Business (B2B) market. In [16], the authors presented a method for finding a locally optimal policy for CLV, developed for a class of ergodic controllable finite Markov chains with a non-converging state value function; they validated their method with simulated credit-card marketing experiments. In [15], the authors developed a stochastic dynamic programming model over both finite and infinite time horizons to optimize CLV; their model was tested using practical data from a computer service company.
In [82], the authors utilized a stochastic dynamic programming model that was optimized using two deterministic linear programming algorithms. They concluded with a set of insights, including that loyalty might decrease the short-term net revenue, and they also compared different loyalty programs. The most exciting finding is that their analytical long-term evaluation of loyalty programs was capable of determining the most appropriate loyalty factors. In the same year, the researchers in [41] made two contributions: initially, they studied how to balance the tradeoff between short-term attainable revenues and long-term customer relationships using a Markov decision process model; then they investigated the impact of limited capacity on CLV by introducing an opportunity cost-based approach that understood customer profitability as a customer's contribution to customer equity. In [30], the authors proposed a method to calculate CLV dynamically for the sake of adopting personalized CRM activities. They also applied data mining techniques to predict CLV, and their model was tested on a wireless telecom provider in Korea.

In the last two years, researchers have continued to add to what has been done. Take [23] as an example: they proposed an offline algorithm that computed an optimal policy for the quantile criterion, applicable to both finite and infinite horizons. As future work, they mentioned the aim of upgrading their algorithm to a reinforcement learning setting where the dynamics of the problem are unknown and have to be learned. In [94], the authors designed a probabilistic prediction model to measure the lifetime profitability of customers, focusing on customers whose purchasing behavior follows purchasing cycles. Their model used the inter-purchase time of the customers, which was assumed to follow a Poisson distribution. They also measured customer lifetime profitability based on a proposed customer probability scoring model. Their model was applied to a dataset of 529 customers from a catalog firm and showed outperforming results.

4.2 Approximate Dynamic Programming in CLV


Approximate Dynamic Programming (ADP), also called Reinforcement Learning in the computer science context, is part of sub-optimal control theory, and it is also called adaptive dynamic programming. In fact, ADP has many other synonyms, including neural dynamic programming, neuro-dynamic programming and reinforcement learning [51]. ADP might be considered as (a complex) Markov decision process. It is a powerful technique that is able to solve large-scale problems, overcoming the curse of dimensionality, whether in the state space, the action space or the outcome space, by searching for an approximate solution to the complex problem instead of the true/exact solution [64].

On top of the researchers who contributed to the area of ADP is Powell. For example, in [65], the authors showed that MDP lacks the ability to completely capture the dynamism in relationship management as well as to handle the curse of dimensionality that arises from increasing the number of customer segments and the organization's actions related to the marketing strategy. They also noticed that MDP depends on the existence of a transition probability matrix (TPM) that might not always be possible to construct. Furthermore, it strongly requires a model to be formulated for the problem, and formulating that model is impossible in some situations. Last but not least, MDP needs a lot of feature engineering prior to the modeling phase. All these gaps, and even more, were noticed and encouraged them and other researchers to search for other approaches to maximize CLV without suffering from these limitations. On top of these approaches lies approximate dynamic programming, and this is what is illustrated in the rest of this subsection.
In [75], the researchers wrote a book chapter that reviewed the history of field experiments in marketing over 20 years; they grouped a set of papers into topics and reviewed them per topic. In 2006, researchers addressed the mailing catalog problem by proposing a model that allowed firms to optimize mailing decisions. The major strengths of their model are not only its ability to address the dynamic implications of the firms' decisions but also its being simple and straightforward to implement. They tested their model on a large sample of historical data and evaluated its performance in a large-scale field test. In [6], the authors extended the multi-armed bandit problem by presenting a Bayesian formulation of the exploration–exploitation tradeoff in marketing, where decisions were made for batches of customers and could vary within a batch. Their solution combined Lagrangian decomposition-based ADP and a heuristic based on a known asymptotic approximation to the multi-armed bandit. Their proposed model outperformed the methods that ignored the effect of information gain.

The banking sector attracted many researchers to enhance the bank–customer relationship using different approximate dynamic programming approaches. For example, in [12] the authors utilized dynamic programming to solve the problem of maximizing a bank's profit. The optimization problem is formulated as a discrete, multi-stage decision process, and the obtained solution is globally optimal and numerically stable. In [20], the authors contributed to the area of marketing budget allocation for the sake of optimizing CLV by introducing a decomposition algorithm that overcame the curse of dimensionality in stochastic dynamic programming problems. In [91], the authors utilized iterative adaptive dynamic programming to establish a data-based iterative optimal learning control scheme for discrete-time nonlinear systems. Their model was used to solve a coal gasification optimal tracking control problem: neural networks were used to represent the dynamical process of coal gasification, coal quality and reference control, and iterative ADP was mainly used to obtain the optimal control laws for the transformed system.

Jiang et al. have many contributions in this context [34,35]. In 2015, they proposed Monotone-ADP, a provably convergent algorithm that exploits value function monotonicity to increase the convergence rate. Their algorithm was applied to a finite-horizon problem, and numerical results were shown for three application domains: optimal stopping, energy storage (allocation) and glycemic control for diabetes patients. The same researchers, within the same year (i.e. 2015), published a paper that formulated the real-time bidding problem of battery storage operators while simultaneously accounting for the value of leftover energy. Their algorithm exploited value function monotonicity to be able to find a revenue-generating bidding policy, and they also proposed a distribution-free variant of the ADP algorithm. Their algorithm was tested on the New York Independent System, and they concluded that a policy trained on historical real-time price data using their proposed algorithm was indeed effective.
Moving to 2016, in [32] the authors' point of view was a bit different. They developed a single-server queuing model that determined a service policy to maximize the long-term reward from serving customers. Their model excluded the holding costs and penalties due to customers waiting and leaving before receiving service. They also developed a model that utilized ADP to estimate the bias function that was used in a dynamic programming recursion. In [62], the authors proposed an approximate dynamic programming algorithm called simulation-based modified policy iteration for large-scale undiscounted Markov decision processes; their algorithm overcame the curse of dimensionality when tested on numerical examples. Back to the banking sector's applications, in [57] the authors explored a restricted family of dynamic auctions to check the possibility of implementing them in an online fashion, and without too much commitment from the seller, in a space of single-shot auctions that they called "bank account" auctions.

Although approximate dynamic programming and reinforcement learning are used interchangeably and denote the same concept, ADP is more popular in the context of Operations Research (OR) while RL is widely used in the context of computer science. RL is considered a branch of machine learning focused on how an agent should act in an environment: being in a certain state, it has to take an action in order to move to another state while maximizing a cumulative reward function. On another hand, reinforcement learning is very close to the Markov Decision Process (MDP). However, the former does not assume knowledge of the mathematical model and is mainly used when having a model for the problem is infeasible. This is why machine learning algorithms (i.e. neural networks) are used to approximate the solution function of reinforcement learning [73].
Computer science researchers tackled ADP from this perspective, calling it reinforcement learning. As an overview of the survey papers that mentioned the contributions in reinforcement learning, let us start with the researchers in [77], who explored marketing recommendations through reinforcement learning and Markov decision processes and evaluated their performance through an offline evaluation method using a well-crafted simulator. They tested their proposal on several real-world datasets from the automotive and banking industries. In [74], the authors proposed a framework for concurrent reinforcement learning using temporal difference learning that captured the parallel interaction between the company and its customers; they tested their proposal on a large-scale dataset of online and email interactions. Back to [78], they made another contribution in a manuscript that aimed to build a comprehensive framework for utilizing reinforcement learning with off-policy techniques to optimize CLV for personalized ads recommendation systems. They also compared the performance of the lifetime value metric with the click-through rate for evaluating the performance of personalized ads recommendation systems. Their framework was tested on real data and proved its outperformance.

On another hand, in [5] the authors stated five recent applications for reinforcement learning, including personalized web services; this included recommending the best content for each particular user based on profile interests inferred from the user's history of online activity. The survey paper in [49] summarized the achievements of deep reinforcement learning. The authors discussed its six core elements and six mechanisms and listed twelve applications, including its applicability in business management (i.e. ads, recommendation and marketing), finance and health care. They also mentioned topics not reviewed up to the time they wrote their manuscript and listed a set of resources and tutorials for deep reinforcement learning. In [47], the authors focused on illustrating the components of the deep reinforcement learning environment and the properties of intelligent agents. They listed eight RL-related problems, including safe interruptibility, avoiding side effects, absent supervisor, reward gaming and safe exploration, as well as robustness to self-modification, distributional shift and adversaries. These problems were categorized into two categories (i.e. robustness and specification problems) according to whether the performance function corresponded to the observed reward function, and they built two models to deal with these problems (i.e. A2C and Rainbow). In [4], the authors wrote a survey that covered central algorithms in deep reinforcement learning, including deep Q-networks, trust region policy optimization and asynchronous advantage actor-critic. They also highlighted the unique advantages of neural networks and tried to focus on a visual understanding of RL. In [50], the application was medical; the presentation and demonstrative charts are particularly clear. In [23], the authors wrote a survey paper presenting the concept of safe reinforcement learning. They classified the papers, in light of this concept, into two approaches: optimization criterion and exploration process. They concluded with the effect of preventing risky situations from the early steps of the learning process.

4.3 Deep Reinforcement Learning in CLV


As mentioned earlier, the idea of reinforcement learning is to utilize a model-free algorithm (i.e. Q-learning) for the sake of maximizing CLV, especially if the problem does not have an exact solution. Q-learning is trained using a machine learning algorithm (i.e. a neural network), and an approximate solution is generated. In deep reinforcement learning, the Q-value is approximated using a deep learning model instead of a multi-layer perceptron neural network. Meanwhile, there are many advantages to using deep reinforcement learning: it not only overcomes the curse of dimensionality related to the huge number of states, actions or outcomes, minimizes the complexity of the problem and saves a lot of feature engineering steps, but also, and more importantly, approximates the Q-value in the absence of a transition probability matrix with the help of Q-learning as a model-free reinforcement learning technique [36].
Table 1 classifies the publications listed in this review by their corresponding industries. These industries range from banking and retail to direct mailing campaigns and more. However, one of the most significant ways of interaction between the firm and its customers in direct marketing is through mailing campaigns. This attracted many researchers and motivated them to analyze the dynamic implications of mailing decisions. Starting with [48], the authors focused on deep learning and built a deep reinforcement learning model (i.e. RNN and LSTM) on a partially observable state space with discrete actions. Their model made use of the strengths of both supervised and reinforcement learning, and they ran it on the KDD Cup 1998 mailing donation dataset. In [79], the authors proposed a framework that captured the autonomous control of a customer relationship management system. They utilized Q-learning to train a deep neural network, assuming that the customer's states are represented by recency, frequency and monetary values and that the actions are both discrete and continuous. Their model assumed that the estimated value function represented the CLV of each customer, and it was run on the KDD Cup 1998 mailing donation dataset. In [53], the authors built an approach that aimed to optimize mailing decisions by maximizing the customer's CLV. Theirs was a two-step approach that started with a non-homogeneous MDP capturing the dynamics of customer–mailing interactions, and then determined the optimal mailing decisions using a partially observable MDP.
Back to [46], the authors wrote a comprehensive overview paper presenting the recent achievements of deep reinforcement learning, including personalized web services, healthcare and finance. In [48], the authors focused on e-commerce and built a model that understood customer behavior in e-commerce using recurrent neural networks. The power of their model lay in its ability to capture the customers' action sequences; hence, it outperformed the traditional vector-based methods (i.e. logistic regression). In [95], the authors focused on e-commerce transactions as well; however, they combined a Markov decision process with an unbounded action space with deep learning to build a deep reinforcement learning platform that overcame the limitations of other fraud detection mechanisms. Their model maximized the e-commerce profit and reduced fraudulent behavior, and it was applied to a real-world e-commerce dataset. Finally, the researchers in [13] tackled CLV from a totally different perspective by applying it to the video games industry. These researchers suggested that the Convolutional Neural Network (CNN) structure is the most efficient among all neural network structures in predicting the economic value of individual players and is best suited to the nature of large video game datasets. They also highlighted one of the main benefits of CNNs, namely that no feature engineering is needed.
Although deep reinforcement learning has made a significant contribution to the area of direct marketing for the sake of maximizing CLV, it has many limitations, including the fact that it might overestimate the action values and hence generate unrealistic actions, as it always selects the action that maximizes the Q-values, as shown in Eq. (4) and Algorithm 5, where Q represents the CLV, r is the reward, γ is the discount factor, s' is the next state and a' is the next action that maximizes the Q-value. These limitations motivated a few researchers to develop double deep reinforcement learning, or DDQN for short [27]. Although DDQN has some limitations as well, it overcame the disadvantages of DQN by generating reliable and robust action values. DDQN has been designed to have two decoupled (i.e. separate) networks, one for the selection of the optimal action that maximizes Q and one for the evaluation of this selected action. This decoupling helps DDQN to generate reliable actions. Equation (5) demonstrates the DDQN model, where θ represents the weights of the first network and θ' represents the weights of the second network.

Q(s, a) = r + γ max_{a'} Q(s', a'; θ)    (4)

Q(s, a) = r + γ Q(s', argmax_{a'} Q(s', a'; θ); θ')    (5)
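The difference between Eqs. (4) and (5) can be made concrete with a short sketch that computes both targets for the same batch of transitions. The two small networks and the random batch below are illustrative stand-ins (PyTorch is assumed, as in the DQN sketch of Sect. 2.3): `q_net` plays the role of the selection weights θ and `target_net` the evaluation weights θ'.

```python
import torch
import torch.nn as nn

# Contrast of the DQN target (Eq. 4) and the DDQN target (Eq. 5) on a random batch.
state_dim, n_actions, gamma = 4, 2, 0.99
q_net = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(), nn.Linear(32, n_actions))
target_net = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(), nn.Linear(32, n_actions))

s_next = torch.randn(8, state_dim)         # batch of next states s'
r = torch.rand(8)                          # batch of rewards

with torch.no_grad():
    # DQN (Eq. 4): the same network both selects and evaluates the next action,
    # so the max operator tends to overestimate the action values.
    y_dqn = r + gamma * target_net(s_next).max(dim=1).values

    # DDQN (Eq. 5): the network with weights theta selects argmax_a' Q(s', a'; theta),
    # while the second network with weights theta' evaluates that selected action.
    a_star = q_net(s_next).argmax(dim=1, keepdim=True)
    y_ddqn = r + gamma * target_net(s_next).gather(1, a_star).squeeze(1)

print(y_dqn)
print(y_ddqn)
```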
Hasselt et al. published a few papers about the mechanism of DDQN (i.e. [27,83]). However, these papers lack applicability, as they only illustrated the theory of the model and did not apply it to a real case study. In [38], the authors proposed a scalable reinforcement learning approach that was applied to robotic manipulation; however, this is still far from the direct marketing area. Hence, applying double deep reinforcement learning in the context of direct marketing to maximize CLV remains unexplored. Moreover, empirical studies have shown some limitations of DDQN, as mentioned in the work of [28], who proposed a model called "Rainbow" that combined six extensions of DQN. Their model was applied to the Atari 2600 benchmark and is not related to the marketing area either. Also, in [70], the authors contributed to the literature with deep quality-value (DQV) learning, which uses a value neural network to estimate the temporal-difference errors that are then used by a second quality network for directly learning the state-action values. Their model proved its effectiveness on four games of the Atari Arcade Learning environment, yet it was not applied in the context of direct marketing.

5 Summary and Discussion

This section discusses the literature's contributions to maximizing customer lifetime value as a direct marketing approach. Basically, as mentioned in Sect. 3, many researchers have focused on calculating CLV using a simple formula or on analyzing its behavior with respect to other factors related to it (i.e. churn probability, cross-selling and upselling policies, etc.).
Table 3. Advantages and disadvantages of CLV models

CLV formula
  Advantages: simple; interpretable; reliable; efficient for measuring the profit of the customer
  Disadvantages: doesn't measure the changing customer revenue and costs over time; assumes a stable retention rate; doesn't apply a discount rate to future customer revenue, which results in an overstated customer value

Simulation
  Advantages: studies the behavior of the system without building it; usually gives more accurate results compared to analytical models; facilitates performing "what if" analysis
  Disadvantages: sometimes it might be difficult to interpret the simulation results

Prediction
  Advantages: measures the future value of the customer instead of assuming that the best customers will continue to be the best; predicting CLV provides a chance to plan for scalability
  Disadvantages: the future is difficult to predict; it's not factual

Maximization
  Advantages: maximizes the potential value from the customer; maximizes the overall firm value and market share
  Disadvantages: might overestimate or underestimate the customer's value

Table 4. Strengths and limitations of CLV dynamic models

MDP
  Strengths: simple, easy to implement, and requires less computational time for solving the problem than the traditional techniques; best suits the CLV problem as it can solve sequential problems under uncertainty
  Limitations: a TPM is mandatory and having a model is a must; suffers from the curse of dimensionality

ADP
  Strengths: no model is required, hence it outperforms MDP in complex problems; overcomes the three curses of dimensionality; finds approximate solutions for complex problems where an exact solution is impossible
  Limitations: shallow knowledge; can't look ahead, which restricts the ability to learn

DRL
  Strengths: deep knowledge of the problem at hand, which allows higher convergence rates than the traditional Q-learning algorithm; deals better with complex problems that have both discrete and continuous state and action spaces
  Limitations: utilizes a single network for both action selection and evaluation; overestimates the action values and hence might generate an inaccurate customer value

DDRL
  Strengths: utilizes two separate networks for action selection and evaluation; generates accurate and reliable action values
  Limitations: might still overestimate the action values; sometimes produces an inaccurate customer value

Rainbow
  Strengths: combines a set of traditional techniques (i.e. DDQN, dueling networks, ...) and outperforms each of them separately; generates stable action values and an accurate evaluation of the customer lifetime value
  Limitations: wasn't applied to CLV yet

DQV
  Strengths: utilizes two separate networks, one for estimating the temporal-difference error and the second for learning the state-action values; outperforms DDQN
  Limitations: wasn't applied to CLV yet; still unstable and there is room for enhancements (i.e. multi-threaded DQV)

Although this basic approach has many advantages, including being simple and easy for top management to interpret, it lacks the ability to capture the dynamism in revenue and costs over time. This limitation, among others, was addressed by researchers who applied simulation models to capture CLV. As mentioned in Table 3, this type of model also has its limitations, including high costs and, in some cases, difficulty in interpreting its results. One of the breakthroughs in the area of CLV came from researchers who focused on predicting its value. This type of model has many advantages and attracted both researchers and practitioners. Meanwhile, among the most effective contributions in the literature is CLV maximization. This research direction not only predicted CLV but also maximized the customer's potential value and loyalty, as well as the firm's overall profitability and market share. However, only a few researchers pursued this purpose with basic models (i.e. Bayesian models); the majority achieved it by utilizing dynamic models instead.
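
To make the static assumptions of the basic CLV formula in Table 3 concrete, the following minimal sketch computes a discounted CLV with a constant margin, a constant retention rate, and a fixed discount rate. It is an illustrative example written for this overview, not a formula taken from any specific reviewed paper, and all parameter values are hypothetical; relaxing exactly these assumptions is what motivates the dynamic models discussed next.

```python
# Minimal sketch of the basic discounted CLV formula under its static
# assumptions: a constant per-period margin, a constant retention rate and a
# fixed discount rate. All parameter values are hypothetical.

def simple_clv(margin: float, retention: float, discount: float, periods: int) -> float:
    """Sum of expected discounted margins over a finite planning horizon."""
    return sum(
        margin * retention ** t / (1.0 + discount) ** t
        for t in range(1, periods + 1)
    )

# Hypothetical customer: 100 margin per period, 80% retention, 10% discount rate.
print(round(simple_clv(margin=100.0, retention=0.8, discount=0.1, periods=10), 2))
```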
In general, the basic models are more widely applied in real-life settings than the dynamic models, due to their implementation simplicity and the interpretability of their results. However, the dynamic models attracted researchers because of their effectiveness and superior performance. Hence, the vast majority of researchers utilized dynamic models for maximizing CLV, given their suitability for the problem. Table 4 lists the strengths and limitations of each algorithm, and it can easily be observed that each algorithm's main strength lies in overcoming the limitations of its predecessor. The most effective algorithm under this umbrella is Rainbow, proposed by the researchers in [28], as it combines several deep reinforcement learning techniques and benefits from the advantages of each; yet it has not been applied to CLV.

6 Conclusion and Future Work

This paper presented an overview of the contributions of dynamic programming algorithms in the area of direct marketing to maximize CLV. It started by reviewing the basic models that contributed to CLV, focusing on those that maximized its value, and then assessed the dynamic models that contributed significantly to maximizing CLV. The most basic algorithm under this umbrella is MDP, which has many limitations, including the dependency on having a model formulated for the problem, the necessity of a transition probability matrix, and the possibility of suffering from the curse of dimensionality. All these reasons encouraged researchers to utilize approximate dynamic programming (i.e. reinforcement learning), whose model-free variant (i.e. the Q-learning algorithm) showed superior results in this context. It was able to find an approximate solution for problems that might not have an exact one. Q-learning originally achieved this with the help of look-up tables; eventually, neural networks and deep learning algorithms were trained to approximate the Q values. Although deep reinforcement learning outperformed many other algorithms, it might overestimate the action values and generate unrealistic actions. Meanwhile, double deep reinforcement learning overcame these problems and proved robust in the generated actions, although it suffers from a set of limitations as well.
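
To make this distinction concrete, the following tabular sketch illustrates the double Q-learning update on which DDQN builds [27]: one value table selects the greedy next action while the other evaluates it, which is what counteracts the overestimation described above. The states, marketing actions, reward, and hyperparameters are illustrative assumptions rather than elements of any reviewed model.

```python
import random
from collections import defaultdict

# Tabular double Q-learning update: two value tables, one selecting the greedy
# next action and the other evaluating it. DDQN applies the same split using
# an online and a target network instead of two tables.
ALPHA, GAMMA = 0.1, 0.95
ACTIONS = ["send_offer", "no_contact"]  # hypothetical direct-marketing actions
q_a = defaultdict(float)                # first value table
q_b = defaultdict(float)                # second value table

def double_q_update(state, action, reward, next_state):
    # Randomly choose which table to update; the other one evaluates.
    update, evaluate = (q_a, q_b) if random.random() < 0.5 else (q_b, q_a)
    best_next = max(ACTIONS, key=lambda a: update[(next_state, a)])  # selection
    target = reward + GAMMA * evaluate[(next_state, best_next)]      # evaluation
    update[(state, action)] += ALPHA * (target - update[(state, action)])

# Example: a hypothetical customer moves from an "occasional" to a "loyal"
# state after an offer that generated a margin of 20.
double_q_update("occasional", "send_offer", 20.0, "loyal")
```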


Eventually, several algorithms proved their effectiveness and outperformed DDQN (i.e. the Rainbow and DQV models); however, these have not yet been applied in the area of direct marketing. As a future research direction, the double deep Q network and the algorithms that outperformed it (i.e. Rainbow and DQV) might be validated by applying them in the area of direct marketing for the sake of maximizing CLV.

References
1. Abdolvand, N., Albadvi, A., Koosha, H.: Customer lifetime value: literature scop-
ing map, and an agenda for future research. Int. J. Manag. Perspect. 1(3), 41–59
(2014)
2. Ahmad, A., Floris, A., Atzori, L.: OTT-ISP Joint service management: a customer
lifetime value based approach. In: 2017 IFIP/IEEE Symposium on Integrated Net-
work and Service Management (IM). IEEE (2017)
3. Amin, H.J., Aminu, A., Isa, R.: Adoption and impact of marketing strategies in
Adama beverages Adamawa state, Northern Nigeria. Manag. Adm. Sci. Rev. 5(1),
38–47 (2016)
4. Arulkumaran, K., et al.: A brief survey of deep reinforcement learning. arXiv
preprint arXiv:1708.05866 (2017)
5. Barto, A.G., Thomas, P.S., Sutton, R.S.: Some recent applications of reinforce-
ment learning. In: Proceedings of the Eighteenth Yale Workshop on Adaptive and
Learning Systems (2017)
6. Bertsimas, D., Mersereau, A.J.: A learning approach for interactive marketing to
a customer segment. Oper. Res. 55(6), 1120–1135 (2007)
7. Bijmolt, T.H., Leeflang, P.S., Block, F., Eisenbeiss, M., Hardie, B.G., Lemmens,
A., Saffert, P.: Analytics for customer engagement. J. Serv. Res. 13(3), 341–356
(2010)
8. Bose, I., Chen, X.: Quantitative models for direct marketing: a review from systems
perspective. Eur. J. Oper. Res. 195(1), 1–16 (2009)
9. Cannon, J.N., Cannon, H.M.: Modeling strategic opportunities in product-mix
strategy: a customer-versus product-oriented perspective. In: Developments in
Business Simulation and Experiential Learning, vol. 35 (2014)
10. Casas-Arce, P., Martı́nez-Jerez, F.A., Narayanan, V.G.: The impact of forward-
looking metrics on employee decision-making: the case of customer lifetime value.
Account. Rev. 92(3), 31–56 (2016)
11. Chan, S.L., Ip, W.H.: A dynamic decision support system to predict the value
of customer for new product development. Decis. Support Syst. 52(1), 178–188
(2011)
12. Chen, J., Patton, R.J.: Robust Model-Based Fault Diagnosis for Dynamic Systems,
vol. 3. Springer, New York (2012)
13. Chen, P.P., et al.: Customer Lifetime Value in Video Games Using Deep Learning
and Parametric Models. arXiv preprint arXiv:1811.12799 (2018)
14. Cheng, C.-J., et al.: Customer lifetime value prediction by a Markov chain based
data mining model: application to an auto repair and maintenance company in
Taiwan. Scientia Iranica 19(3), 849–855 (2012)
15. Ching, W., et al.: Customer lifetime value: stochastic optimization approach. J.
Oper. Res. Soc. 55(8), 860–868 (2004)
16. Clempner, J.B., Poznyak, A.S.: Simple computing of the customer lifetime value: a
fixed local-optimal policy approach. J. Syst. Sci. Syst. Eng. 23(4), 439–459 (2014)
17. Däs, M., et al.: Customer lifetime network value: customer valuation in the context
of network effects. Electron. Mark. 27(4), 307–328 (2017)
18. Ekinci, Y., et al.: Analysis of customer lifetime value and marketing expenditure
decisions through a Markovian-based model. Eur. J. Oper. Res. 237(1), 278–288
(2014)
19. Ekinci, Y., Ulengin, F., Uray, N.: Using customer lifetime value to plan optimal
promotions. Serv. Ind. J. 34(2), 103–122 (2014)
20. Esteban-Bravo, M., Vidal-Sanz, J.M., Yildirim, G.: Valuing customer portfo-
lios with endogenous mass and direct marketing interventions using a stochastic
dynamic programming decomposition. Mark. Sci. 33(5), 621–640 (2014)
21. García, J., Fernández, F.: A comprehensive survey on safe reinforcement learning.
J. Mach. Learn. Res. 16(1), 1437–1480 (2015)
22. Gelman, A.: Objections to Bayesian statistics. Bayesian Anal. 3(3), 445–449 (2008)
23. Gilbert, H., Weng, P., Xu, Y.: Optimizing quantiles in preference-based Markov
decision processes. AAAI (2017)
24. Gupta, S., Zeithaml, V.: Customer metrics and their impact on financial perfor-
mance. Mark. Sci. 25(6), 718–739 (2006)
25. Gupta, S., et al.: Modeling customer lifetime value. J. Serv. Res. 9(2), 139–155
(2006)
26. Haenlein, M., Kaplan, A.M., Beeser, A.J.: A model to determine customer lifetime
value in a retail banking context. Eur. Manag. J. 25(3), 221–234 (2007)
27. Hasselt, H.V.: Double Q-learning. In: Advances in Neural Information Processing
Systems, pp. 2613–2621 (2010)
28. Hessel, M., et al.: Rainbow: combining improvements in deep reinforcement learn-
ing. arXiv preprint arXiv:1710.02298 (2017)
29. Hiziroglu, A., Sengul, S.: Investigating two customer lifetime value models from
segmentation perspective. Procedia Soc. Behav. Sci. 62, 766–774 (2012)
30. Hwang, H.: A stochastic approach for valuing customers: a case study. Int. J. Softw.
Eng. Appl. 10(3), 67–82 (2016)
31. Jain, D., Singh, S.S.: Customer lifetime value research in marketing: a review and
future directions. J. Interact. Mark. 16(2), 34–46 (2002)
32. James, T., Glazebrook, K., Lin, K.: Developing effective service policies for mul-
ticlass queues with abandonment: asymptotic optimality and approximate policy
improvement. INFORMS J. Comput. 28(2), 251–264 (2016)
33. Jerath, K., Fader, P.S., Hardie, B.G.S.: Customer-base analysis using repeated
cross-sectional summary (RCSS) data. Eur. J. Oper. Res. 249(1), 340–350 (2016)
34. Jiang, D.R., Powell, W.B.: An approximate dynamic programming algorithm for
monotone value functions. Oper. Res. 63(6), 1489–1511 (2015)
35. Jiang, D.R., Powell, W.B.: Optimal hour-ahead bidding in the real-time electricity
market with battery storage using approximate dynamic programming. INFORMS
J. Comput. 27(3), 525–543 (2015)
36. Kaelbling, L.P., Littman, M.L., Moore, A.W.: Reinforcement learning: a survey. J.
Artif. Intell. Res. 4, 237–285 (1996)
37. Kahreh, M.S., et al.: Analyzing the applications of customer lifetime value (CLV)
based on benefit segmentation for the banking sector. Procedia Soc. Behav. Sci.
109, 590–594 (2014)
38. Kalashnikov, D., et al.: QT-Opt: scalable deep reinforcement learning for vision-
based robotic manipulation. arXiv preprint arXiv:1806.10293 (2018)
39. Kamakura, W., et al.: Choice models and customer relationship management.
Mark. Lett. 16(3–4), 279–291 (2005)
40. Khajvand, M., et al.: Estimating customer lifetime value based on RFM analysis
of customer purchase behavior: case study. Procedia Comput. Sci. 3, 57–63 (2011)
41. Klein, R., Kolb, J.: Maximizing customer equity subject to capacity constraints.
Omega 55, 111–125 (2015)
42. Kumar, V., Ramani, G., Bohling, T.: Customer lifetime value approaches and best
practice applications. J. Interact. Mark. 18(3), 60–72 (2004)
43. Kumar, V., Petersen, J.A., Leone, R.P.: Driving profitability by encouraging cus-
tomer referrals: who, when, and how. J. Mark. 74(5), 1–17 (2010)
44. Kumar, V.: Customer lifetime value–the path to profitability. Found. Trends Mark.
2(1), 1–96 (2008)
45. Labbi, A., et al.: Customer Equity and Lifetime Management (CELM). Marketing
Science (2007)
46. Lang, T., Rettenmeier, M.: Understanding consumer behavior with recurrent neu-
ral networks. In: International Workshop on Machine Learning Methods for Rec-
ommender Systems (2017)
47. Leike, J., et al.: AI safety gridworlds. arXiv preprint arXiv:1711.09883 (2017)
48. Li, X., et al.: Recurrent reinforcement learning: a hybrid approach. arXiv preprint
arXiv:1509.03044 (2015)
49. Li, Y.: Deep reinforcement learning: an overview. arXiv preprint arXiv:1701.07274
(2017)
50. Litjens, G., et al.: A survey on deep learning in medical image analysis. Med. Image
Anal. 42, 60–88 (2017)
51. Liu, D., Wang, D., Ichibushi, H.: Adaptive dynamic programming and reinforce-
ment learning. In: UNESCO Encyclopedia of Life Support Systems (2012)
52. Ma, M., Li, Z., Chen, J.: Phase-type distribution of customer relationship with
Markovian response and marketing expenditure decision on the customer lifetime
value. Eur. J. Oper. Res. 187(1), 313–326 (2008)
53. Ma, S., et al.: A nonhomogeneous hidden Markov model of response dynamics
and mailing optimization in direct marketing. Eur. J. Oper. Res. 253(2), 514–523
(2016)
54. Malthouse, E.C., Blattberg, R.C.: Can we predict customer lifetime value? J. Inter-
act. Mark. 19(1), 2–16 (2005)
55. Malthouse, E.C., et al.: Managing customer relationships in the social media era:
Introducing the social CRM house. J. Interact. Mark. 27(4), 270–280 (2013)
56. Mannor, S., et al.: Bias and variance approximation in value function estimates.
Manag. Sci. 53(2), 308–322 (2007)
57. Mirrokni, V.S., et al.: Dynamic auctions with bank accounts. In: IJCAI (2016)
58. Nasution, R.A., et al.: The customer experience framework as baseline for strategy
and implementation in services marketing. Procedia Soc. Behav. Sci. 148, 254–261
(2014)
59. Nemati, Y., et al.: A CLV-based framework to prioritize promotion marketing
strategies: a case study of telecom industry. Iran. J. Manag. Stud. 11(3), 437–462
(2018)
60. Neslin, S.A., et al.: Overcoming the “recency trap” in customer relationship man-
agement. J. Acad. Mark. Sci. 41(3), 320–337 (2013)
61. Nour, M.A.: An integrative framework for customer relationship management:
towards a systems view. Int. J. Bus. Inf. Syst. 9(1), 26–50 (2012)
62. Ohno, K., et al.: New approximate dynamic programming algorithms for large-
scale undiscounted Markov decision processes and their application to optimize a
production and distribution system. Eur. J. Oper. Res. 249(1), 22–31 (2016)
63. Permana, D., Pasaribu, U.S., Indratno, S.W.: Classification of customer lifetime
value models using Markov chain. J. Phys. Conf. Ser. 893(1), 012026 (2017)
64. Powell, W.B.: Approximate dynamic programming: lessons from the field. In: 2008
Winter Simulation Conference. IEEE (2008)
65. Powell, W.B.: What you should know about approximate dynamic programming.
Nav. Res. Logist. (NRL) 56(3), 239–249 (2009)
66. Reimer, K., Rutz, O.J., Pauwels, K.: How online consumer segments differ in long-
term marketing effectiveness. J. Interact. Mark. 28(4), 271–284 (2014)
67. Reinartz, W., Thomas, J.S., Kumar, V.: Balancing acquisition and retention
resources to maximize customer profitability. J. Mark. 69(1), 63–79 (2005)
68. Rust, R.T., Kumar, V., Venkatesan, R.: Will the frog change into a prince? Pre-
dicting future customer profitability. Int. J. Res. Mark. 28(4), 281–294 (2011)
69. Sabatelli, M., et al.: Deep Quality-Value (DQV) Learning. arXiv preprint
arXiv:1810.00368 (2018)
70. Sabbeh, S.F.: Machine-learning techniques for customer retention: a comparative
study. Int. J. Adv. Comput. Sci. Appl. 9(2), 273–281 (2018)
71. Shah, D., et al.: Unprofitable cross-buying: evidence from consumer and business
markets. J. Mark. 76(3), 78–95 (2012)
72. Sifa, R., et al.: Customer lifetime value prediction in non-contractual freemium
settings: chasing high-value users using deep neural networks and SMOTE. In:
Proceedings of the 51st Hawaii International Conference on System Sciences (2018)
73. Silver, D., et al.: Concurrent reinforcement learning from customer interactions.
In: International Conference on Machine Learning (2013)
74. Simester, D.I., Sun, P., Tsitsiklis, J.N.: Dynamic catalog mailing policies. Manag.
Sci. 52(5), 683–696 (2006)
75. Simester, D.: Field experiments in marketing. In: Handbook of Economic Field
Experiments, vol. 1, pp. 465–497. North-Holland (2017)
76. Tarokh, M.J., EsmaeiliGookeh, M.: A new model to speculate CLV based on
Markov chain model. J. Ind. Eng. Manag. Stud. 4(2), 85–102 (2017)
77. Theocharous, G., Hallak, A.: Lifetime value marketing using reinforcement learn-
ing. In: RLDM 2013, p. 19 (2013)
78. Theocharous, G., Thomas, P.S., Ghavamzadeh, M.: Personalized ad recommenda-
tion systems for life-time value optimization with guarantees. In: IJCAI (2015)
79. Tkachenko, Y., Kochenderfer, M.J., Kluza, K.: Customer simulation for direct
marketing experiments. In: 2016 IEEE International Conference on Data Science
and Advanced Analytics (DSAA). IEEE (2016)
80. Tkachenko, Y.: Autonomous CRM control via CLV approximation with deep
reinforcement learning in discrete and continuous action space. arXiv preprint
arXiv:1504.01840 (2015)
81. Umashankar, N., Bhagwat, Y., Kumar, V.: Do loyal customers really pay more for
services? J. Acad. Mark. Sci. 45(6), 807–826 (2017)
82. Vaeztehrani, A., Modarres, M., Aref, S.: Developing an integrated revenue man-
agement and customer relationship management approach in the hotel industry. J.
Revenue Pricing Manag. 14(2), 97–119 (2015)
83. Van Hasselt, H., Guez, A., Silver, D.: Deep reinforcement learning with double
Q-learning. In: AAAI, vol. 2 (2016)
84. Van Otterlo, M.: Markov decision processes: concepts and algorithms. Course on
‘Learning and Reasoning’ (2009)
85. Venkatesan, R., Kumar, V.: A customer lifetime value framework for customer
selection and resource allocation strategy. J. Mark. 68(4), 106–125 (2004)
86. Venkatesan, R., Kumar, V., Bohling, T.: Optimal customer relationship manage-
ment using Bayesian decision theory: an application for customer selection. J.
Mark. Res. 44(4), 579–594 (2007)
87. Verhoef, P.C., et al.: CRM in data-rich multichannel retailing environments: a
review and future research directions. J. Interact. Mark. 24(2), 121–137 (2010)
88. Verma, S.: Effectiveness of social network sites for influencing consumer purchase
decisions. Int. J. Bus. Excel. 6(5), 624–634 (2013)
89. Wang, C., Pozza, I.D.: The antecedents of customer lifetime duration and dis-
counted expected transactions: discrete-time based transaction data analysis. No.
2014-203 (2014)
90. Wei, Q., Liu, D.: Adaptive dynamic programming for optimal tracking control
of unknown nonlinear systems with application to coal gasification. IEEE Trans.
Autom. Sci. Eng. 11(4), 1020–1036 (2014)
91. Wübben, M., Wangenheim, F.V.: Instant customer base analysis: managerial
heuristics often “get it right”. J. Mark. 72(3), 82–93 (2008)
92. Zhang, J.Z., Netzer, O., Ansari, A.: Dynamic targeted pricing in B2B relationships.
Mark. Sci. 33(3), 317–337 (2014)
93. Zhang, Q., Seetharaman, P.B.: Assessing lifetime profitability of customers with
purchasing cycles. Mark. Intell. Plan. 36(2), 276–289 (2018)
94. Zhao, M., et al.: Impression allocation for combating fraud in E-commerce via deep
reinforcement learning with action norm penalty. In: IJCAI (2018)
95. Tirenni, G., et al.: The 2005 ISMS Practice Prize winner: customer equity and
lifetime management (CELM) Finnair case study. Mark. Sci. 26(4), 553–565 (2007)
