Dynamic Programming Models
1 Introduction
The term “Customer Relationship Management”, or CRM, emerged in the mid-1990s. It is a broadly recognized strategy for the acquisition and retention of customers. In [33], the authors noted that CRM is much more than installing software; it requires the organization to recognize that customers are assets. Hence, CRM involves a significant assessment of how much to spend on each customer, and of whom to retain and when. In [9], the authors stated that firms can be more profitable, gain steady market growth and increase their market share if they identify the most profitable customers and invest marketing resources in them. Thus, it is not surprising that a firm treats customers differently: some of them are offered customized promotions, while the rest might be left to go. Basically, the main goal of CRM is to help the firm determine the long-term profitability of its customers and to deliver the optimal promotion plans to them accordingly. Thus, CRM depends heavily on the lifetime profitability of the customers, especially for organizations that seek long-term relationships with their customers. This is why Customer Lifetime Value (CLV) plays a significant role in CRM as a direct marketing strategy measurement ([3,40,42,43,54]).
CLV is the most reliable indicator in direct marketing for measuring the profitability of customers. Meanwhile, direct marketing is not about offering the best possible service to every single customer, but about treating customers differently based on their level of profitability. In [68], the authors found that CLV-based resource reallocation led to an increase in revenue of about 20 million (a 10-fold increase) without any change in the level of marketing investment, as mentioned in [7]. This is one of the reasons why CLV is considered a significant metric in CRM as a direct marketing indicator. In general, CLV is defined as the present value of all future profits obtained from the customers over their relationship with a firm. The main idea of CLV is demonstrated in Fig. 1: the customer (or segment of customers) is imagined to stand at time 0, with their related historical data already acquired by the organization, and the goal is to predict their future profitability. Meanwhile, the complexity of CLV lies in its dependency on many indicators, including the customers’ retention rate, acquisition rate, churn probability and more; Fig. 3 lists some of these indicators, and a broader list is given in the work of [19].
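For reference, this definition is commonly written in the standard textbook form shown below; the notation here is generic rather than taken from any single model cited in this survey:

CLV = \sum_{t=1}^{T} \frac{p_t \,(R_t - C_t)}{(1 + d)^{t}}

where R_t and C_t are the expected revenue from and cost of serving the customer in period t, p_t is the probability that the customer is still active in period t, d is the discount rate, and T is the length of the projection horizon.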
The rest of the paper is organized as follows: Sect. 2 introduces the main idea and the algorithm of each of the approaches that aim to maximize CLV. Section 3 reviews the basic and traditional models of CLV, which focus on analyzing the term, calculating its value, simulating the interaction between CLV and its most effective indicators, or predicting its future value. Section 4 lists the limitations of the traditional models and highlights the most important dynamic models applied to CLV, starting from MDP, moving to approximate dynamic programming, and ending with the contribution of reinforcement learning techniques to the field of maximizing CLV. A summary and discussion of all these methods is given in Sect. 5, showing the strengths and limitations of each of them. The last section concludes the paper and highlights a set of future research directions.
2 Background
This section focuses on the dynamic programming algorithms that are widely used in the area of maximizing CLV. It briefly introduces the main idea of these algorithms and states the steps of each. The section is structured as follows: it starts with MDP as a traditional model and states its algorithms’ steps, then presents the ADP (RL) model, which outperforms MDP in complex problems, and finally highlights the main idea and the algorithm of deep reinforcement learning.
An MDP is described by a set of states (s), a set of actions (a), a transition probability function from the current to the next state, and finally, the discounted accumulated reward value (r). Thus, an MDP is represented by the tuple (s, a, s′, r, γ), where γ is a discount factor (0 < γ ≤ 1). The transition probability function and the reward function are described in Eqs. (1) and (2), respectively. The solution to an MDP is called a policy, which is a mapping between states and actions, and the goal of the MDP is to find the optimal policy that maximizes the long-term reward [19]. There are two solution approaches to achieve this goal: the value-iteration algorithm and the policy-iteration algorithm [84]. The steps of the value-iteration and policy-iteration MDP algorithms are given in Algorithms 1 and 2, respectively, in [85].
P^a_{ss′} = P[S_{t+1} = s′ | S_t = s, A_t = a]   (1)
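To make the value-iteration approach referenced above concrete, the following is a minimal sketch in Python, assuming a finite MDP whose transition probabilities (Eq. (1)) and expected rewards are supplied as hypothetical arrays P and R; it illustrates the generic algorithm and is not an implementation from any cited paper.

import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-6):
    """Illustrative value iteration for a finite MDP.

    P[a, s, s2]: transition probability P^a_{ss'} from Eq. (1).
    R[s, a]:     expected immediate reward for taking action a in state s.
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Bellman optimality backup: Q(s, a) = R(s, a) + gamma * sum_s' P^a_{ss'} V(s')
        Q = R + gamma * np.einsum('asn,n->sa', P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    policy = Q.argmax(axis=1)  # optimal policy: a mapping from states to actions
    return V, policy

# Example usage on a tiny 2-state, 2-action MDP with made-up numbers
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])
R = np.array([[1.0, 0.0], [0.0, 2.0]])
V, policy = value_iteration(P, R)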
Algorithm 4. Q-Learning
1: start with Q0(s, a) ∀s, a
2: get initial state s
3: for k = 1, k++, while not converged do
      sample action a, get next state s′
4:    if s′ is terminal then
         target = R(s, a, s′)
         sample new initial state s′
      else
         target = R(s, a, s′) + γ max_{a′} Qk(s′, a′)
5:    Qk+1(s, a) ← (1 − α) Qk(s, a) + α · target
6:    s ← s′
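As a concrete companion to Algorithm 4, the tabular update can be written in a few lines of Python. This is an illustrative sketch only: the environment interface (reset() returning a state index, step(a) returning the next state, the reward and a terminal flag), the epsilon-greedy exploration rule and the hyperparameter values are assumptions, not part of the algorithm as cited.

import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning in the spirit of Algorithm 4 (illustrative sketch)."""
    Q = np.zeros((n_states, n_actions))      # start with Q0(s, a) = 0 for all s, a
    for _ in range(episodes):
        s = env.reset()                      # get initial state s
        done = False
        while not done:
            # epsilon-greedy exploration (one way to "sample action a")
            if np.random.rand() < epsilon:
                a = np.random.randint(n_actions)
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)    # observe next state s' and reward
            if done:                         # s' is terminal
                target = r
            else:
                target = r + gamma * np.max(Q[s_next])
            # Q_{k+1}(s, a) <- (1 - alpha) * Q_k(s, a) + alpha * target
            Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target
            s = s_next                       # s <- s'
    return Q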
In complex problems, finding an exact solution might be impossible, which is why searching for an approximate solution becomes a must. This was the motivation behind ADP at first and, eventually, behind deep reinforcement learning (or Deep Q-Networks, DQN), Algorithm 5. The basic idea of DQN is that the action-value function Q is approximated using a neural network (or, more precisely, a deep learning algorithm) instead of a lookup table (as mentioned in the previous section). Hence, DQN combines the Q-learning and neural network algorithms.
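The following is a minimal sketch of this combination, written here in PyTorch as an assumption about tooling (the cited works do not prescribe a specific library); a small fully connected network stands in for the Q-table, and replay buffers, target-network synchronization schedules and exploration are omitted for brevity.

import torch
import torch.nn as nn

def build_q_network(state_dim, n_actions):
    """Small feed-forward network approximating Q(s, .)."""
    return nn.Sequential(
        nn.Linear(state_dim, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, n_actions),
    )

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    """One gradient step on a batch of (s, a, r, s_next, done) transition tensors."""
    s, a, r, s_next, done = batch
    # Q(s, a) for the actions that were actually taken
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Q-learning target: r + gamma * max_a' Q_target(s', a') for non-terminal s'
        max_q_next = target_net(s_next).max(dim=1).values
        target = r + gamma * (1.0 - done) * max_q_next
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()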
of stochastic Markov decision process model. Also, in [72], the authors highlighted the 20/80 rule, i.e., that the largest portion of the profit is generated by only a small number of users. Hence, they built a prediction model with the help of neural networks to predict CLV. They also highlighted that the data was imbalanced, which is why they combined the synthetic minority oversampling technique (SMOTE) with a neural network to perform the prediction. SMOTE improved the data augmentation and thus the overall prediction results.
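As an illustrative sketch of that kind of pipeline (not the authors’ actual implementation), SMOTE oversampling from the imbalanced-learn package can be combined with a scikit-learn neural network classifier; the synthetic data, class ratio and network sizes below are placeholders.

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report

# Imbalanced toy data standing in for customer features / high-value labels
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.9, 0.1],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Oversample the minority (high-value) class in the training split only
X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)

# Neural network classifier trained on the rebalanced data
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
clf.fit(X_res, y_res)
print(classification_report(y_test, clf.predict(X_test)))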
On the other hand, Fig. 2 lists a set of indicators that CLV depends on. This dependency attracted many researchers and motivated them to analyze it. To list some: in [67], the authors introduced a modeling framework to balance the resources devoted to customer acquisition and customer retention, for the sake of maximizing profitability. However, in [74], the authors analyzed profitability differently: they tried to find a relationship between cross-buying and the firm’s profitability. They analyzed the customer databases of five firms and found that 10–35% of the customers who perform cross-buying are unprofitable. They also proposed a framework to help managers determine profitable and
unprofitable cross-buying. In [60], the authors analyzed the time of the customer’s previous purchase (i.e., recency) and developed modeling approaches that target the firm’s marketing efforts while taking the recency of the customers into consideration. In [66], the researchers investigated the long-term impact of different channels, including coupon promotions, TV and internet ads, across customer segments. They applied their model to a digital music provider with about 500K customers and concluded with actionable insights. On top of these indicators was customer loyalty. This indicator attracted the researchers in [81], who divided loyalty into behavioral loyalty and attitudinal loyalty. They conducted a longitudinal survey showing that as customers’ behavioral loyalty increased, they placed more importance on price and less on rewards and convenience. With attitudinal loyalty, however, customers gave more attention to rewards and convenience. Hence, attitudinal loyalty helped firms reduce price sensitivity and increase revenue.
Table 2. Number of publications for each CLV-related task, by methodology (sample of 83 papers)

CLV task        Basic models   MDP   ADP   DQN
Analysis                  18     6     -     -
Simulation                 3     -     -     -
Optimization               -     4     -     -
Prediction                 7     7     -     -
Maximization               4     4    10    20
Total                     32    21    10    20
As mentioned in the previous section, states and actions are two of the main elements of an MDP. Hence, their characteristics categorize the MDP problem as either discrete or continuous. Figure 6 shows the categorization of states and actions into discrete or continuous and categorizes the MDP problem accordingly.
Traditionally, MDP was at the top of the models for maximizing CLV, and over many years it outperformed many other approaches, due to a mechanism that suits the purpose. This subsection lists the literature’s contributions according to the publishing year. The listing is guided by what has been mentioned in Tables 1 and 2 and Fig. 5. A set of the most effective contributions mentioned in Fig. 5 are selected and listed in the rest of this subsection.
The list starts with 2007, from which three contributions on three different datasets are selected. Starting with the researchers in [26], who developed a customer valuation model that combined a first-order Markov decision model with classification and regression trees (CART). The strength of their model lay in its ability to deal with both discrete and continuous transitions. Their model was tested on a leading German bank with a dataset of about 6.2 million customers. In [45], the authors proposed a comprehensive framework that combines customer equity with lifetime management. Their framework maximized ROI, as it helped in the optimal planning and budgeting of targeted marketing campaigns. The proposed model combined advanced Markov decision process models, Monte Carlo simulation, and portfolio optimization, and it was tested on the Finnair case study. The contribution in [56] was a bit different: the authors proposed a model that provided closed-form approximations for the bias and variance, considering a finite-state, finite-action, infinite-horizon, discounted-reward Markov decision process. They tested their model on a large-scale mailing catalog dataset.
In 2008, in [92], the authors discussed the interest of academics and practitioners in the development and implementation of stochastic customer base analysis models. They also presented the arguments in favor of and against the usage of these models in practice. As a contribution, the authors compared the quality of these models with a simple heuristic that firms typically use. They confirmed that the heuristics perform at least as well as the stochastic models, except for future purchase predictions. Hence, the authors recommended the usage of stochastic customer base analysis models in practice. In [52], the authors captured relationship marketing in a Markov decision framework, as well as a life distribution of the customer relationship focusing on CLV, and illustrated a way to find the optimal marketing policy numerically.
Moving to 2011, the authors of [11] added a comprehensive decision support system to the contributions in the literature. They predicted customer purchasing behavior given a set of factors, including product, customer, and marketing influencing factors, and utilized the predicted purchasing behavior in estimating the customer’s net CLV for a specific product. They verified their system through a case study. In [14], the authors developed a framework consisting of three groups of techniques to compute CLV based on its predicted value, identified its critical variables, and finally predicted the profit of the customers under different purchasing behaviors using an ANN. Their model was tested on a dataset related to a company in Taiwan.
Ekinci has made many contributions in the area of maximizing CLV. In [21], a methodology was designed to help managers maximize the CLV of their customers by determining the optimal promotion campaigns. The methodology was built with the help of classification and regression trees (CART) and stochastic dynamic programming. Ekinci also joined a research group (2014) and designed a two-step methodology that aimed to maximize CLV via determining optimal promotion campaigns, based on stochastic dynamic programming and regression trees. Their model was applied in the banking sector, and they also tried to determine the states of the customers according to their values using the CART technique. In [19], the problem was tackled from another perspective: the authors developed a simple, industry-specific, easily measurable model with objective indicators to predict CLV. They injected the predicted CLV as states into a Markov decision process model. Their proposed model was tested in the banking sector.
One of the strengths of their model is that they conducted a set of in-depth interviews to collect the most effective indicators for CLV. In the same year, in [90], the authors estimated the customer’s lifetime duration and discounted expected transactions using simple and standard discrete-time transaction data. They also identified the relational and demographic factors that might cause the variance in customers’ CLV, and concluded with a set of insights for marketing managers and decision makers. In [93], the authors built a Markov decision process model with multivariate, interrelated, state-dependent behavior. The uniqueness of their model lay in its ability to capture the effect of the firm’s pricing decisions on customer purchasing behavior and, consequently, on customer lifetime profitability in a business-to-business (B2B) market. In [16], the authors presented a method for finding a locally optimal policy for CLV, developed for a class of ergodic controllable finite Markov chains with a non-converging state value function. They validated their method with simulated credit-card marketing experiments. In [15], the authors developed a stochastic dynamic programming model in both finite and infinite time horizons to optimize CLV. Their model was tested using practical data from a computer service company.
In [82], the authors utilized a stochastic dynamic programming model that was optimized using two deterministic linear programming algorithms. They concluded with a set of insights, including that loyalty programs might decrease short-term net revenue, and they also compared different loyalty programs. The most exciting finding is that their analytical long-term evaluation of loyalty programs was capable of determining the most appropriate loyalty factors. In the same year, the researchers in [41] made two contributions: first, they studied how to balance the tradeoff between short-term attainable revenues and the long-term customer relationship using a Markov decision process model; then they investigated the impact of limited capacity on CLV by introducing an opportunity-cost-based approach that understood customer profitability as a customer’s contribution to customer equity. In [30], the authors proposed a method to calculate CLV dynamically for the sake of adopting personalized CRM activities. They also applied data mining techniques to predict CLV, and their model was tested on a wireless telecom company in Korea.
In the last two years, researchers have tried to add to what had been done as well. Take [23] as an example: the authors proposed an offline algorithm that computed an optimal policy for the quantile criterion. Their algorithm can be applied to both finite and infinite horizons. As future work, they mentioned the aim of upgrading their algorithm to a reinforcement learning setting where the dynamics of the problem are unknown and have to be learned. In [94], the authors designed a probabilistic prediction model to measure the lifetime profitability of customers. They were interested in customers whose purchasing behavior follows purchasing cycles. Their model was estimated using the inter-purchase time of the customers, which was assumed to follow a Poisson distribution. They also measured customer lifetime profitability based on a proposed customer probability scoring model. Their model was applied to a dataset of 529 customers from a catalog firm and showed outperforming results.
The banking sector attracted many researchers to enhance the bank-customer relationship using different approximate dynamic programming approaches. For example, in [12], the authors utilized dynamic programming to solve the problem of maximizing the bank’s profit. The optimization problem was formulated as a discrete, multi-stage decision process, and the obtained solution is globally optimal and numerically stable. In [20], the authors contributed to the area of marketing budget allocation for the sake of optimizing CLV by introducing a decomposition algorithm that overcame the curse of dimensionality in stochastic dynamic programming problems. In [91], the authors utilized iterative adaptive dynamic programming to establish a data-based iterative optimal learning control scheme for discrete-time nonlinear systems. Their model was used to solve a coal gasification optimal tracking control problem: neural networks were used to represent the dynamical process of coal gasification, coal quality and reference control, and iterative ADP was mainly used to obtain the optimal control laws for the transformed system.
Jiang et al. have made many contributions in this context [34,35]. In 2015, they proposed Monotone-ADP, a provably convergent algorithm that exploited the monotonicity of the value function to increase the convergence rate. Their algorithm was applied to a finite-horizon problem and showed numerical results for three application domains: optimal stopping, energy storage (allocation), and glycemic control for diabetes patients. The same researchers, within the same year (i.e., 2015), published a paper that formulated the problem of real-time bidding by battery storage operators while simultaneously accounting for the value of leftover energy. Their algorithm exploited value function monotonicity in order to find a revenue-generating bidding policy, and they also proposed a distribution-free variant of the ADP algorithm. The algorithm was tested on data from the New York Independent System Operator, and the results indicated that a policy trained on historical real-time price data using their proposed algorithm was indeed effective.
Moving to 2016, in [32], the point of view was a bit different. The authors developed a single-server queuing model that determined a service policy to maximize the long-term reward from serving customers. Their model excluded holding costs and penalties due to customers waiting and leaving before receiving service. They also developed a model that utilized ADP to estimate the bias function used in a dynamic programming recursion. In [62], the authors proposed an approximate dynamic programming algorithm, called simulation-based modified policy iteration, for large-scale undiscounted Markov decision processes. Their algorithm overcame the curse of dimensionality when tested on numerical examples. Back to the banking sector’s applications: in [57], the authors explored a restricted family of dynamic auctions, which they called “bank account” auctions, to check the possibility of implementing them in an online fashion, without too much commitment from the seller, in a space of single-shot auctions.
Although approximate dynamic programming and reinforcement learning are used interchangeably and represent the same concept, ADP is more popular in the context of Operations Research (OR), while RL is widely used in the context of computer science. RL is considered a branch of machine learning focused on how an agent should perform in an environment: being in a certain state, the agent has to take an action in order to move to another state while maximizing a cumulative reward function. On the other hand, reinforcement learning is very close to the Markov Decision Process (MDP). However, the former does not assume knowledge of the mathematical model and is mainly used when having a model of the problem is infeasible. This is why machine learning algorithms (e.g., neural networks) are used to approximate the solution function of reinforcement learning [73].
Computer science researchers tackled ADP from this perspective, calling it reinforcement learning. As an overview of the papers that mention contributions in reinforcement learning, let us start with the researchers in [77], who explored marketing recommendations through reinforcement learning and Markov decision processes, and evaluated their performance through an offline evaluation method using a well-crafted simulator. They tested their proposal on several real-world datasets from the automotive and banking industries. In [74], the authors proposed a framework for concurrent reinforcement learning using temporal-difference learning, which captured the parallel interactions between the company and its customers. They tested their proposal on a large-scale dataset of online and email interactions. Back to [78]: the authors made another contribution in a manuscript that aimed to build a comprehensive framework for utilizing reinforcement learning with off-policy techniques to optimize CLV for personalized ads recommendation systems. They also compared the performance of the lifetime value metric with the click-through rate for evaluating the performance of personalized ads recommendation systems. Their framework was tested on real data and demonstrated superior performance.
On the other hand, in [5], the authors stated five recent applications of reinforcement learning, including personalized web services. This included recommending the best content for each particular user based on profile interests inferred from the user’s history of online activity. The survey paper in [49] summarized the achievements of deep reinforcement learning. Its authors discussed six core elements and six mechanisms, and listed twelve applications, including its applicability in business management (i.e., ads, recommendation, and marketing), finance and health care. They also mentioned topics not reviewed up to the time they wrote their manuscript and listed a set of resources and tutorials for deep reinforcement learning. In [47], the authors focused on illustrating the components of the deep reinforcement learning environment and the properties of intelligent agents. They listed eight RL-related problems, including safe interruptibility, avoiding side effects, absent supervisor, reward gaming and safe exploration, as well as robustness to self-modification, distributional shift and adversaries. These problems were categorized into two categories (i.e., robustness and specification problems) according to whether the performance function corresponds to the observed reward function. They built two models to deal with these problems (i.e., A2C and Rainbow). In [4], the authors wrote a survey that covered central algorithms in deep reinforcement learning, including deep Q-networks, trust region policy optimization, and asynchronous advantage actor-critic. They also highlighted the unique advantages
focused on e-commerce and built a model that understood customer behavior in e-commerce using recurrent neural networks. The power of their model lay in its ability to capture the customers’ action sequences; hence, it overcame the traditional vector-based methods (e.g., logistic regression). In [95], the authors focused on e-commerce transactions as well; however, they combined a Markov decision process with an unbounded action space with deep learning to build a deep reinforcement learning platform that overcame the limitations of other fraud detection mechanisms. Their model maximized the e-commerce profit and reduced fraudulent behavior, and it was applied to a real-world e-commerce dataset. Finally, the researchers in [13] tackled CLV from a totally different perspective, by applying it to the video games industry. These researchers suggested that the convolutional neural network (CNN) structure is the most efficient among all neural network structures in predicting the economic value of individual players, and that it best suits the nature of large video game datasets. They also highlighted one of the main benefits of CNNs, namely that no feature engineering is needed.
Although deep reinforcement learning has made a significant contribution to the area of direct marketing for the sake of maximizing CLV, it has many limitations, including the fact that it might overestimate the action values and hence generate unrealistic actions, as it always takes the action that maximizes the Q values, as shown in Eq. (4) and Algorithm 5, where Q represents the CLV, r is the reward, γ is the discount factor, s′ is the next state and a′ is the next action that maximizes the Q value. These limitations motivated a few researchers to develop double deep reinforcement learning, or DDQN for short [27]. Although DDQN has some limitations as well, it overcame the disadvantages of DQN by generating reliable and robust action values. DDQN is designed to have two decoupled (i.e., separate) networks: one for the selection of the optimal action that maximizes Q, and one for the evaluation of this selected action. This decoupling process helps DDQN generate reliable actions. Equation (5) demonstrates the DDQN model, where θ represents the weights of the first network and θ′ represents the weights of the second network.
Q(s, a) = r + γ max_{a′} Q(s′, a′; θ)   (4)
Q(s, a) = r + γ Q(s′, argmax_{a′} Q(s′, a′; θ); θ′)   (5)
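The difference between Eqs. (4) and (5) lies only in how the bootstrap target is formed. The following PyTorch-style sketch of the two target computations is illustrative and not tied to any of the cited implementations; the network objects and tensor shapes are assumptions.

import torch

def dqn_target(r, s_next, done, q_net, gamma=0.99):
    """Eq. (4): the same network both selects and evaluates the next action."""
    with torch.no_grad():
        max_q = q_net(s_next).max(dim=1).values
    return r + gamma * (1.0 - done) * max_q

def ddqn_target(r, s_next, done, q_net, target_net, gamma=0.99):
    """Eq. (5): the first network (theta) picks argmax_a' Q(s', a'; theta),
    and the second network (theta') evaluates that chosen action."""
    with torch.no_grad():
        best_a = q_net(s_next).argmax(dim=1, keepdim=True)
        q_eval = target_net(s_next).gather(1, best_a).squeeze(1)
    return r + gamma * (1.0 - done) * q_eval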
Hasselt et al. published a few papers about the mechanism of DDQN (i.e., [27,83]). However, these papers lack applicability, as they only illustrated the theory of the model and did not apply it to a real case study. In [38], the authors proposed a scalable reinforcement learning approach that was applied to robotic manipulation; however, this is still far from the direct marketing area. Hence, applying double deep reinforcement learning in the context of direct marketing to maximize CLV is still an unreached island. Moreover, empirical studies showed some limitations of DDQN, as mentioned in the work of [28], who proposed a model called “Rainbow” that combined six extensions of DQN. Their model was applied to the Atari 2600 benchmark and is not related to the marketing area either. Also, in [70], the authors contributed to the literature with deep quality-value
cross-selling and upselling policies, ...). Although this approach has many advantages, including being simple and easy to interpret for top management, it lacks the ability to capture the dynamism in revenue and costs over time. This limitation, and more, were addressed by other researchers who applied simulation models to capture CLV. As mentioned in Table 3, this type of model also has its limitations, including high costs and, in some cases, difficulty in interpreting its results. One of the breakthroughs in the area of CLV was made by those researchers who went for predicting its value; this type of model has many advantages and attracted both researchers and practitioners. Meanwhile, on top of the effective literature contributions is CLV maximization. This research direction not only predicted CLV but also maximized the potential of the customer, their loyalty, and the firm’s overall profitability and market share. However, only a few researchers utilized basic models for this purpose (i.e., Bayesian models); the majority achieved it by utilizing dynamic models instead.
In general, the applicability of the basic models in real-life applications is higher than that of the dynamic models, due to their simpler implementation and more interpretable results. However, the dynamic models attracted researchers due to their effectiveness and outperformance. Hence, the vast majority of researchers utilized dynamic models for the sake of maximizing CLV, due to their suitability for the problem and their effectiveness. Table 4 lists a set of strengths and limitations of each algorithm, and it can easily be observed that each algorithm’s basic strengths lie in avoiding the limitations of its predecessor. The most effective algorithm among these is “Rainbow”, proposed by the researchers in [28], as it combined many deep reinforcement learning techniques and made use of each of their advantages; yet it has not been utilized in the area of CLV.
References
1. Abdolvand, N., Albadvi, A., Koosha, H.: Customer lifetime value: literature scop-
ing map, and an agenda for future research. Int. J. Manag. Perspect. 1(3), 41–59
(2014)
2. Ahmad, A., Floris, A., Atzori, L.: OTT-ISP Joint service management: a customer
lifetime value based approach. In: 2017 IFIP/IEEE Symposium on Integrated Net-
work and Service Management (IM). IEEE (2017)
3. Amin, H.J., Aminu, A., Isa, R.: Adoption and impact of marketing strategies in
Adama beverages Adamawa state, Northern Nigeria. Manag. Adm. Sci. Rev. 5(1),
38–47 (2016)
4. Arulkumaran, K., et al.: A brief survey of deep reinforcement learning. arXiv
preprint arXiv:1708.05866 (2017)
5. Barto, A.G., Thomas, P.S., Sutton, R.S.: Some recent applications of reinforce-
ment learning. In: Proceedings of the Eighteenth Yale Workshop on Adaptive and
Learning Systems (2017)
6. Bertsimas, D., Mersereau, A.J.: A learning approach for interactive marketing to
a customer segment. Oper. Res. 55(6), 1120–1135 (2007)
7. Bijmolt, T.H., Leeflang, P.S., Block, F., Eisenbeiss, M., Hardie, B.G., Lemmens,
A., Saffert, P.: Analytics for customer engagement. J. Serv. Res. 13(3), 341–356
(2010)
8. Bose, I., Chen, X.: Quantitative models for direct marketing: a review from systems
perspective. Eur. J. Oper. Res. 195(1), 1–16 (2009)
9. Cannon, J.N., Cannon, H.M.: Modeling strategic opportunities in product-mix
strategy: a customer-versus product-oriented perspective. In: Developments in
Business Simulation and Experiential Learning, vol. 35 (2014)
10. Casas-Arce, P., Martínez-Jerez, F.A., Narayanan, V.G.: The impact of forward-
looking metrics on employee decision-making: the case of customer lifetime value.
Account. Rev. 92(3), 31–56 (2016)
11. Chan, S.L., Ip, W.H.: A dynamic decision support system to predict the value
of customer for new product development. Decis. Support Syst. 52(1), 178–188
(2011)
12. Chen, J., Patton, R.J.: Robust Model-Based Fault Diagnosis for Dynamic Systems,
vol. 3. Springer, New York (2012)
13. Chen, P.P., et al.: Customer Lifetime Value in Video Games Using Deep Learning
and Parametric Models. arXiv preprint arXiv:1811.12799 (2018)
14. Cheng, C.-J., et al.: Customer lifetime value prediction by a Markov chain based
data mining model: application to an auto repair and maintenance company in
Taiwan. Scientia Iranica 19(3), 849–855 (2012)
15. Ching, W., et al.: Customer lifetime value: stochastic optimization approach. J.
Oper. Res. Soc. 55(8), 860–868 (2004)
16. Clempner, J.B., Poznyak, A.S.: Simple computing of the customer lifetime value: a
fixed local-optimal policy approach. J. Syst. Sci. Syst. Eng. 23(4), 439–459 (2014)
17. Däs, M., et al.: Customer lifetime network value: customer valuation in the context
of network effects. Electron. Mark. 27(4), 307–328 (2017)
18. Ekinci, Y., et al.: Analysis of customer lifetime value and marketing expenditure
decisions through a Markovian-based model. Eur. J. Oper. Res. 237(1), 278–288
(2014)
19. Ekinci, Y., Ulengin, F., Uray, N.: Using customer lifetime value to plan optimal
promotions. Serv. Ind. J. 34(2), 103–122 (2014)
20. Esteban-Bravo, M., Vidal-Sanz, J.M., Yildirim, G.: Valuing customer portfo-
lios with endogenous mass and direct marketing interventions using a stochastic
dynamic programming decomposition. Mark. Sci. 33(5), 621–640 (2014)
21. García, J., Fernández, F.: A comprehensive survey on safe reinforcement learning.
J. Mach. Learn. Res. 16(1), 1437–1480 (2015)
22. Gelman, A.: Objections to Bayesian statistics. Bayesian Anal. 3(3), 445–449 (2008)
23. Gilbert, H., Weng, P., Xu, Y.: Optimizing quantiles in preference-based Markov
decision processes. AAAI (2017)
24. Gupta, S., Zeithaml, V.: Customer metrics and their impact on financial perfor-
mance. Mark. Sci. 25(6), 718–739 (2006)
25. Gupta, S., et al.: Modeling customer lifetime value. J. Serv. Res. 9(2), 139–155
(2006)
26. Haenlein, M., Kaplan, A.M., Beeser, A.J.: A model to determine customer lifetime
value in a retail banking context. Eur. Manag. J. 25(3), 221–234 (2007)
27. Hasselt, H.V.: Double Q-learning. In: Advances in Neural Information Processing
Systems, pp. 2613–2621 (2010)
28. Hessel, M., et al.: Rainbow: combining improvements in deep reinforcement learn-
ing. arXiv preprint arXiv:1710.02298 (2017)
29. Hiziroglu, A., Sengul, S.: Investigating two customer lifetime value models from
segmentation perspective. Procedia Soc. Behav. Sci. 62, 766–774 (2012)
30. Hwang, H.: A stochastic approach for valuing customers: a case study. Int. J. Softw.
Eng. Appl 10(3), 67–82 (2016)
31. Jain, D., Singh, S.S.: Customer lifetime value research in marketing: a review and
future directions. J. Interact. Mark. 16(2), 34–46 (2002)
32. James, T., Glazebrook, K., Lin, K.: Developing effective service policies for mul-
ticlass queues with abandonment: asymptotic optimality and approximate policy
improvement. INFORMS J. Comput. 28(2), 251–264 (2016)
33. Jerath, K., Fader, P.S., Hardie, B.G.S.: Customer-base analysis using repeated
cross-sectional summary (RCSS) data. Eur. J. Oper. Res. 249(1), 340–350 (2016)
34. Jiang, D.R., Powell, W.B.: An approximate dynamic programming algorithm for
monotone value functions. Oper. Res. 63(6), 1489–1511 (2015)
35. Jiang, D.R., Powell, W.B.: Optimal hour-ahead bidding in the real-time electricity
market with battery storage using approximate dynamic programming. INFORMS
J. Comput. 27(3), 525–543 (2015)
36. Kaelbling, L.P., Littman, M.L., Moore, A.W.: Reinforcement learning: a survey. J.
Artif. Intell. Res. 4, 237–285 (1996)
37. Kahreh, M.S., et al.: Analyzing the applications of customer lifetime value (CLV)
based on benefit segmentation for the banking sector. Procedia Soc. Behav. Sci.
109, 590–594 (2014)
38. Kalashnikov, D., et al.: QT-Opt: scalable deep reinforcement learning for vision-
based robotic manipulation. arXiv preprint arXiv:1806.10293 (2018)
39. Kamakura, W., et al.: Choice models and customer relationship management.
Mark. Lett. 16(3–4), 279–291 (2005)
40. Khajvand, M., et al.: Estimating customer lifetime value based on RFM analysis
of customer purchase behavior: case study. Procedia Comput. Sci. 3, 57–63 (2011)
41. Klein, R., Kolb, J.: Maximizing customer equity subject to capacity constraints.
Omega 55, 111–125 (2015)
42. Kumar, V., Ramani, G., Bohling, T.: Customer lifetime value approaches and best
practice applications. J. Interact. Mark. 18(3), 60–72 (2004)
43. Kumar, V., Petersen, J.A., Leone, R.P.: Driving profitability by encouraging cus-
tomer referrals: who, when, and how. J. Mark. 74(5), 1–17 (2010)
44. Kumar, V.: Customer lifetime value–the path to profitability. Found. Trends Mark.
2(1), 1–96 (2008)
45. Labbi, A., et al.: Customer Equity and Lifetime Management (CELM). Marketing
Science (2007)
46. Lang, T., Rettenmeier, M.: Understanding consumer behavior with recurrent neu-
ral networks. In: International Workshop on Machine Learning Methods for Rec-
ommender Systems (2017)
47. Leike, J., et al.: AI safety gridworlds. arXiv preprint arXiv:1711.09883 (2017)
48. Li, X., et al.: Recurrent reinforcement learning: a hybrid approach. arXiv preprint
arXiv:1509.03044 (2015)
49. Li, Y.: Deep reinforcement learning: an overview. arXiv preprint arXiv:1701.07274
(2017)
50. Litjens, G., et al.: A survey on deep learning in medical image analysis. Med. Image
Anal. 42, 60–88 (2017)
51. Liu, D., Wang, D., Ichibushi, H.: Adaptive dynamic programming and reinforce-
ment learning. In: UNESCO Encyclopedia of Life Support Systems (2012)
52. Ma, M., Li, Z., Chen, J.: Phase-type distribution of customer relationship with
Markovian response and marketing expenditure decision on the customer lifetime
value. Eur. J. Oper. Res. 187(1), 313–326 (2008)
53. Ma, S., et al.: A nonhomogeneous hidden Markov model of response dynamics
and mailing optimization in direct marketing. Eur. J. Oper. Res. 253(2), 514–523
(2016)
54. Malthouse, E.C., Blattberg, R.C.: Can we predict customer lifetime value? J. Inter-
act. Mark. 19(1), 2–16 (2005)
55. Malthouse, E.C., et al.: Managing customer relationships in the social media era:
Introducing the social CRM house. J. Interact. Mark. 27(4), 270–280 (2013)
56. Mannor, S., et al.: Bias and variance approximation in value function estimates.
Manag. Sci. 53(2), 308–322 (2007)
57. Mirrokni, V.S., et al.: Dynamic auctions with bank accounts. In: IJCAI (2016)
58. Nasution, R.A., et al.: The customer experience framework as baseline for strategy
and implementation in services marketing. Procedia Soc. Behav. Sci. 148, 254–261
(2014)
59. Nemati, Y., et al.: A CLV-based framework to prioritize promotion marketing
strategies: a case study of telecom industry. Iran. J. Manag. Stud. 11(3), 437–462
(2018)
60. Neslin, S.A., et al.: Overcoming the “recency trap” in customer relationship man-
agement. J. Acad. Mark. Sci. 41(3), 320–337 (2013)
61. Nour, M.A.: An integrative framework for customer relationship management:
towards a systems view. Int. J. Bus. Inf. Syst. 9(1), 26–50 (2012)
62. Ohno, K., et al.: New approximate dynamic programming algorithms for large-
scale undiscounted Markov decision processes and their application to optimize a
production and distribution system. Eur. J. Oper. Res. 249(1), 22–31 (2016)
63. Permana, D., Pasaribu, U.S., Indratno, S.W.: Classification of customer lifetime
value models using Markov chain. J. Phys. Conf. Ser. 893(1), 012026 (2017)
64. Powell, W.B.: Approximate dynamic programming: lessons from the field. In: 2008
Winter Simulation Conference. IEEE (2008)
65. Powell, W.B.: What you should know about approximate dynamic programming.
Nav. Res. Logist. (NRL) 56(3), 239–249 (2009)
66. Reimer, K., Rutz, O.J., Pauwels, K.: How online consumer segments differ in long-
term marketing effectiveness. J. Interact. Mark. 28(4), 271–284 (2014)
67. Reinartz, W., Thomas, J.S., Kumar, V.: Balancing acquisition and retention resources to maximize customer profitability. J. Mark. 69(1), 63–79 (2005)
68. Rust, R.T., Kumar, V., Venkatesan, R.: Will the frog change into a prince? Pre-
dicting future customer profitability. Int. J. Res. Mark. 28(4), 281–294 (2011)
69. Sabatelli, M., et al.: Deep Quality-Value (DQV) Learning. arXiv preprint
arXiv:1810.00368 (2018)
70. Sabbeh, S.F.: Machine-learning techniques for customer retention: a comparative
study. Int. J. Adv. Comput. Sci. Appl. 9(2), 273–281 (2018)
71. Shah, D., et al.: Unprofitable cross-buying: evidence from consumer and business
markets. J. Mark. 76(3), 78–95 (2012)
72. Sifa, R., et al.: Customer lifetime value prediction in non-contractual freemium
settings: chasing high-value users using deep neural networks and SMOTE. In:
Proceedings of the 51st Hawaii International Conference on System Sciences (2018)
73. Silver, D., et al.: Concurrent reinforcement learning from customer interactions.
In: International Conference on Machine Learning (2013)
74. Simester, D.I., Sun, P., Tsitsiklis, J.N.: Dynamic catalog mailing policies. Manag.
Sci. 52(5), 683–696 (2006)
75. Simester, D.: Field experiments in marketing. In: Handbook of Economic Field
Experiments, vol. 1, pp. 465–497. North-Holland (2017)
76. Tarokh, M.J., EsmaeiliGookeh, M.: A new model to speculate CLV based on
Markov chain model. J. Ind. Eng. Manag. Stud. 4(2), 85–102 (2017)
77. Theocharous, G., Hallak, A.: Lifetime value marketing using reinforcement learn-
ing. In: RLDM 2013, p. 19 (2013)
78. Theocharous, G., Thomas, P.S., Ghavamzadeh, M.: Personalized ad recommenda-
tion systems for life-time value optimization with guarantees. In: IJCAI (2015)
79. Tkachenko, Y., Kochenderfer, M.J., Kluza, K.: Customer simulation for direct
marketing experiments. In: 2016 IEEE International Conference on Data Science
and Advanced Analytics (DSAA). IEEE (2016)
80. Tkachenko, Y.: Autonomous CRM control via CLV approximation with deep
reinforcement learning in discrete and continuous action space. arXiv preprint
arXiv:1504.01840 (2015)
81. Umashankar, N., Bhagwat, Y., Kumar, V.: Do loyal customers really pay more for
services? J. Acad. Mark. Sci. 45(6), 807–826 (2017)
82. Vaeztehrani, A., Modarres, M., Aref, S.: Developing an integrated revenue man-
agement and customer relationship management approach in the hotel industry. J.
Revenue Pricing Manag. 14(2), 97–119 (2015)
83. Van Hasselt, H., Guez, A., Silver, D.: Deep reinforcement learning with double
Q-learning. In: AAAI, vol. 2 (2016)
84. Van Otterlo, M.: Markov decision processes: concepts and algorithms. Course on
‘Learning and Reasoning’ (2009)
85. Venkatesan, R., Kumar, V.: A customer lifetime value framework for customer
selection and resource allocation strategy. J. Mark. 68(4), 106–125 (2004)
86. Venkatesan, R., Kumar, V., Bohling, T.: Optimal customer relationship manage-
ment using Bayesian decision theory: an application for customer selection. J.
Mark. Res. 44(4), 579–594 (2007)
87. Verhoef, P.C., et al.: CRM in data-rich multichannel retailing environments: a
review and future research directions. J. Interact. Mark. 24(2), 121–137 (2010)
88. Verma, S.: Effectiveness of social network sites for influencing consumer purchase
decisions. Int. J. Bus. Excel. 6(5), 624–634 (2013)
89. Wang, C., Pozza, I.D.: The antecedents of customer lifetime duration and dis-
counted expected transactions: discrete-time based transaction data analysis. No.
2014-203 (2014)
90. Wei, Q., Liu, D.: Adaptive dynamic programming for optimal tracking control
of unknown nonlinear systems with application to coal gasification. IEEE Trans.
Autom. Sci. Eng. 11(4), 1020–1036 (2014)
91. Wübben, M., Wangenheim, F.V.: Instant customer base analysis: managerial
heuristics often “get it right”. J. Mark. 72(3), 82–93 (2008)
92. Zhang, J.Z., Netzer, O., Ansari, A.: Dynamic targeted pricing in B2B relationships.
Market. Sci. 33(3), 317–337 (2014)
93. Zhang, Q., Seetharaman, P.B.: Assessing lifetime profitability of customers with
purchasing cycles. Mark. Intell. Plan. 36(2), 276–289 (2018)
94. Zhao, M., et al.: Impression allocation for combating fraud in E-commerce via deep
reinforcement learning with action norm penalty. In: IJCAI (2018)
95. Tirenni, G., et al.: The 2005 ISMS Practice Prize winner: customer equity and lifetime management (CELM) Finnair case study. Mark. Sci. 26(4), 553–565 (2007)