Syntactic Pattern Recognition
[Figure 1 (diagram): a DRTA with states 1-5, labels including constant, slowdown, and speedup, and transition guards [0, 300], [0, 50], and [0, 20].]
Figure 1: The harmonica driving behavior modeled as a DRTA. The intervals on the state transitions are bounds on the amount of time (in tenths of a second) the behavior can remain in the current state directly before firing the transition.

to learn a classifier for driving behavior. Our results using this approach on truck acceleration are discussed in Section 4. We end this paper with a general discussion and some ideas for future research in Section 5.

2 Motivation

We are interested in methods that learn models for the behavior of a system (process). A well-known model for characterizing systems is the deterministic finite state automaton (DFA; see, e.g., [Sipser, 1997]). An advantage of an automaton model is that it is easy to interpret. In contrast, many other well-known models that are commonly used to learn the behavior of a system are quite difficult to interpret, for example neural networks or support vector machines (see, e.g., [Bishop, 2006]). Since a DFA is a language model, its learning (or identification) problem has been well studied in the grammatical inference field [de la Higuera, 2005]. Hence, we would like to apply an established method in order to learn a DFA from observations of the system. However, when observing a system there is often more information than just the sequence of symbols: the time at which these symbols occur is also available. A straightforward way to make use of this timed information is to sample the timed data. For instance, a symbol that occurs 3 seconds after the previous event can be modeled as 3 consecutive event symbols. This technique is commonly used when applying syntactic pattern recognition [Fu, 1974] or hidden Markov models [Rabiner, 1989] to signal processing. However, since this technique models time implicitly, i.e., using additional states, it can lead to an exponential blowup (requiring n states instead of log(n) bits) of both the input data and the resulting model.

We use a different method that uses the time information directly in order to produce a timed model. A common model for real-time systems is the timed automaton (TA) [Alur and Dill, 1994]. Unfortunately, there exists no learning algorithm for TAs in general. We therefore use a restricted type of TA known as a deterministic real-time automaton (DRTA) [Dima, 2001], for which an efficient learning algorithm exists [Verwer et al., 2010]. A DRTA contains constraints on the time between two consecutive events. Using these constraints, the language of a DRTA depends not only on the types of events occurring, but also on their time values relative to previous event occurrences. We clarify this using an example from our application domain of constructing models for truck driver behavior:

Example 1 Figure 1 shows an example DRTA that models a specific driving behavior known as 'harmonica driving'. This often occurs when a truck is driving at a somewhat higher speed than the vehicle directly in front of it (no time constraint). The driver slows down a bit (no time constraint), waits until there is enough distance between him and the vehicle in front (takes at most 50 seconds), and then speeds up again and closes in on the vehicle (for at most 20 seconds). This whole process often repeats itself a couple of times (within 5 minutes) before the driver finally adjusts the speed of the truck to match the vehicle in front of him. The result of this whole process is unnecessary fuel consumption.

The example clearly shows the importance of time information for modeling driving behavior. Time constraints model the dependence on this time information explicitly, i.e., using numbers. Because this uses a binary representation of time, it can result in exponentially more compact models than an implicit representation. Consequently, the time, space, and data required to learn DRTAs can also be exponentially smaller than the time, space, and data required to learn DFAs [Verwer et al., 2009]. This efficiency argument is our main reason for using DRTA models. In the next section, we briefly describe the algorithm we use to learn these DRTA models, and show how to use it in the practical setting of learning driving behavior.

3 Learning DRTAs

The RTI+ algorithm for learning DRTAs [Verwer et al., 2010] is a kind of model selection technique. It requires as input a set S+ of timed strings (time-stamped event sequences) that have been generated by a process (system), and returns a DRTA model M that is consistent with S+. Intuitively, a model M is consistent if all timed strings from S+ that reach the same state in M do not have significantly different futures in S+. In other words, M is a DRTA that satisfies a Markov property. This property is tested using likelihood ratio tests on S+. The goal of RTI+ is to find the smallest consistent model M*. Unfortunately, finding M* is very difficult (NP-hard). RTI+ is a greedy method that finds good solutions efficiently.

We apply RTI+ to the problem of automatically constructing models that can detect different driving behaviors from sensor measurements. As can be seen in Example 1, DRTAs provide succinct and insightful models for such behavior. We now describe the challenges that need to be overcome in order to learn such models based on data from onboard sensors.

3.1 Data discretization

We first need to discretize the sensor measurements into basic events (symbols). The data we obtained from the trucks were 15 complete round trips. One round trip typically takes two full days. These trips were driven on different dates, with different weather conditions, and with different drivers. We focus our attention on three sensor values that were recorded during these trips: speed, engine rotations per minute (RPM), and fuel usage. These measurements were recorded at a rate of one per second. Figure 2 shows one hour of these sensor measurements.

Ideally, we would like to transform these measurements into driving actions/events such as: speed-ups, slow-downs,
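The sampling technique discussed in Section 2 (modeling a delay of n time units as n consecutive event symbols) can be sketched as follows. This is our own minimal illustration; the function name and the filler tick symbol are assumptions, not the paper's notation:

```python
# Implicit modeling of time: a delay of n time units before a symbol is
# encoded as n - 1 filler "tick" symbols followed by the symbol itself.
# Writing n in unary instead of binary is what causes the exponential
# blowup of the input data and the resulting model.
def sample_timed_string(timed_string, tick="t"):
    sampled = []
    for symbol, delay in timed_string:
        sampled.extend([tick] * (delay - 1))  # one tick per elapsed unit
        sampled.append(symbol)
    return sampled

# A symbol occurring 3 time units after the previous event becomes
# 3 consecutive symbols, as in the example from Section 2.
print(sample_timed_string([("a", 1), ("b", 3)]))  # ['a', 't', 't', 'b']

# A 5-minute pause (300 seconds) costs 300 symbols in this encoding,
# while an explicit guard such as [0, 300] needs only ~9 bits.
assert len(sample_timed_string([("a", 300)])) == 300
```

A DRTA avoids this blowup by keeping the delay as a number and testing it against a clock guard on the transition.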
[Figure (pipeline diagram): a few unlabeled examples (S+, S-) and many unlabeled examples; computing the runs of S+ and S- through the learned PDRTA predictor yields the PDRTA classifier.]

          size     AIC        AIC (single-state model)
  speed   2,305    871,550    1,472,840
  fuel      960    842,536    1,102,270
  engine  1,009    697,190    1,076,750
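For reference, the AIC values in the table can be read via Akaike's criterion [Akaike, 1974], AIC = 2k - 2 ln L, where k is the number of model parameters and L the likelihood of the data. A minimal sketch of such a comparison; the parameter counts and log-likelihoods below are invented for illustration and are not the paper's numbers:

```python
def aic(num_parameters, log_likelihood):
    # Akaike Information Criterion: AIC = 2k - 2 ln(L); lower is better.
    return 2 * num_parameters - 2 * log_likelihood

# Hypothetical comparison in the spirit of the table above: a learned
# PDRTA with many parameters versus a single-state baseline model.
learned_aic = aic(2305, -430_000)   # 864,610
baseline_aic = aic(5, -736_000)     # 1,472,010
assert learned_aic < baseline_aic   # the better fit outweighs model size
```

The point of the comparison is that the learned models, despite their many states, still score far below the single-state baseline.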
accelerating too fast:                 normal acceleration:
(a, 42)(b, 80)(c, 12)(d, 10)(c, 7)     (a, 10)(b, 14)(c, 21)(b, 42)
(a, 13)(b, 65)(c, 6)(d, 8)             (a, 11)(b, 34)(c, 13)(b, 18)
(a, 6)(b, 39)(c, 10)(d, 10)            (a, 8)(b, 10)(c, 14)(b, 45)
(a, 7)(b, 15)(c, 8)(d, 7)              (a, 25)(b, 5)(c, 18)(b, 27)
(a, 7)(b, 2)(c, 7)(d, 7)               (a, 11)(b, 6)(c, 20)(b, 14)
(c, 22)(b, 4)(c, 12)(d, 7)(c, 8)       (a, 12)(b, 18)(c, 15)(b, 20)
(a, 7)(b, 11)(c, 6)(d, 7)(c, 16)       (a, 8)(b, 35)(c, 24)(b, 20)
(a, 8)(b, 29)(c, 7)(d, 8)(c, 8)        (a, 12)(b, 32)(c, 18)(d, 13)
(a, 11)(b, 28)(c, 7)(b, 12)            (a, 7)(b, 62)(c, 21)(b, 28)
(a, 10)(b, 320)(c, 12)(d, 7)(c, 34)    (a, 12)(b, 8)(c, 17)
(a, 10)(b, 8)(c, 8)(d, 8)(c, 31)       (a, 10)(b, 27)(c, 19)(d, 16)
(a, 8)(b, 24)(c, 9)(d, 8)(c, 12)

Figure 7: Events obtained from the speed sensor during a test lap with both too quick and normal accelerations.

[Figure 8 (diagram): two learned PDRTA fragments, panels "normal" and "too fast", with transitions annotated in the form symbol [guard] count, e.g., a [0,62] 11, b [0,62] 11, c [8,62] 11, d [10,31] 2.]
Figure 8: Parts of the learned PDRTA for the speed sensor that are reached by the timed strings from Figure 7. A label, time constraint, and the number of reaching strings are shown for each transition.

4.1 The labeled examples
For the detection of truck driver behavior, we tested our approach on a simple type of behavior that results in unnecessary fuel usage: accelerating too quickly. This behavior serves as an example of how to construct a classifier out of the learned PDRTAs using only a few labeled examples. Using the same techniques, the learned PDRTAs can be used to classify many other interesting driving behaviors.

In order to classify accelerations, we require some labeled examples. The labeled examples were obtained by driving a short test lap with many traffic lights. After a full stop, the truck driver accelerated too fast about half of the time, and the other half he accelerated normally. We labeled the timed events that occurred during 1 1/2 minutes from the start of the acceleration; Figure 7 shows these events for the speed sensor. We used the same time span in order to obtain examples for the fuel and engine sensors.

All but one of the timed strings in Figure 7 start with an a symbol. This means that the truck came to a full stop (or at least a speed of less than 5 km/hour). In one timed string the initial symbol is a c symbol. In this case, the driver actually did not slow down completely, but accelerated too fast from driving around 20 km/hour (the next event is a b event). In every timed string, the second symbol is a b symbol, which indicates a speed-up of the truck. The time until this symbol occurs is the number of seconds the truck remained stationary at the traffic light. The third symbol is a c event, indicating a further speed-up. It should be possible to use the time value of this event in order to classify the two different acceleration behaviors: a small time value indicates a fast acceleration, a large value indicates a slow acceleration. This can be clearly seen in the examples. The next event of the examples is either a b or a d event, depending on whether the speed of the truck exceeded 50 km/hour or not. A small number combined with a d event indicates that the driver is accelerating too quickly. A b event indicates that the driver is slowing down again, for instance in order to stop for the next traffic light.

Having collected these labeled examples, all we need to do is to run the learned PDRTAs on these examples, and discover which states are reached when accelerating too quickly, and which when accelerating normally. Figure 8 shows these states and the traversed transitions for the speed sensor examples. We used these PDRTA parts to identify some states that are only reached when the driver accelerates too quickly and some when he accelerates normally. In Figure 8, state 5 is reached by 8 of the 11 normal examples, and none of the 12 fast examples. An example of a state that is only reached by the fast examples is state 27; 6 of the fast examples reach this state. Note that the transition to this state has c as symbol and [0, 7] as clock guard. Thus, it is only reached by timed strings that contain a c event within 7 seconds after the first b. A c event with more than 7 seconds goes to state 2, which is also the state that is reached by all the normal acceleration examples. Thus, the learned PDRTA model for the speed sensor does distinguish between the two behaviors of accelerating too fast and accelerating normally, exactly in the way we expected based on the timed strings of Figure 7.

This shows that the models learned by RTI+ can be used to distinguish different types of behavior, even if we do not know beforehand what behavior to distinguish (the PDRTAs are learned from unlabeled data). In this way, timed syntactic pattern recognition can be used to give more insight into the different kinds of behavior of a system/process.

For all PDRTAs, we identified and labeled all states that were reached by at least 4 example strings in one case, and by none in the other case. Thus, for the speed sensor, states 27 and 16 are labeled in the too-fast speed PDRTA, and state 5 is labeled in the normal speed PDRTA. We did the same for the fuel sensor and engine sensor PDRTAs.

4.2 Classifier performance
We tested the obtained classifiers on the historical data that we have available. This is the unlabeled data that we initially used to learn the PDRTAs. From this data, we extracted all
acceleration instances and used this data to first determine the correlation of our classifiers with the truck speedup. The truck speedup is the speed increase between two consecutive measurements of the speed sensor. In our tests, the correlation between these speedups and our classifiers turns out to be high only for the fuel sensor. This is surprising since only 4 of the approximately 200 states in this classifier are labeled.

We would like to determine the quality of the fuel classifier. Unfortunately, we do not know exactly which of the unlabeled accelerations are too fast, and which are not. Instead of determining the exact values, we therefore implemented a simple method that is currently used by domain experts to determine a driver's profile to approximate these values:

• Over 30 seconds, count the number of times that the acceleration was normal, n (less than 0.8 m/s^2), and the number of times it was too quick, q (greater than 1.2 m/s^2).
• If n is less than or equal to 25, and q is greater than or equal to 1, then label the acceleration as too quick; label it as normal otherwise.

The two numbers n and q give a good overview of a driver's profile. The fuel classifier labels an acceleration as too quick if its score is greater than or equal to 1. We compare the labels assigned by the fuel classifier and this simple decision rule:

                        decision rule
                      quick    normal
  classifier quick      920        61
             normal     262       289

The labels given by these two rules match in 79% of all cases. This is rather high considering we only used 23 example strings to label only 4 states of the fuel PDRTA.

5 Discussion

We applied timed syntactic pattern recognition techniques to the problem of detecting driver behavior. Although our techniques are still preliminary, the result that labeling only 4 states in a learned PDRTA is sufficient to obtain an adequate classifier clearly shows the high potential of our techniques. It would be interesting to investigate whether other driving patterns can also be distinguished using these PDRTAs.

Unfortunately, since the learned PDRTA models are quite large, it is difficult to discover different driving behaviors by examining the models. We aim to use tools from process mining [van der Aalst and Weijters, 2005] for this purpose. Process mining tools contain many routines and visualizations that help to make large process models insightful.

Interesting directions for future work are the construction of more sophisticated discretization routines, better decision rules, and more specialized automaton models. In addition, our techniques are also useful for learning other models. It would for instance be interesting to see whether our semi-supervised method also achieves good results with learning algorithms for non-timed automata such as the state-merging algorithms for DFAs (see, e.g., [de la Higuera, 2005]). Furthermore, it would be interesting to see whether our techniques can also be applied to methods for learning types of hidden semi-Markov models (see, e.g., [Yu, 2010]) where the transition probabilities depend on the state durations by means of time constraints.

References

[Akaike, 1974] Hirotsugu Akaike. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6):716-723, 1974.
[Alur and Dill, 1994] Rajeev Alur and David L. Dill. A theory of timed automata. Theoretical Computer Science, 126:183-235, 1994.
[Bishop, 2006] Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
[de la Higuera, 2005] Colin de la Higuera. A bibliographical study of grammatical inference. Pattern Recognition, 38(9):1332-1348, 2005.
[Dima, 2001] Catalin Dima. Real-time automata. Journal of Automata, Languages and Combinatorics, 6(1):2-23, 2001.
[Fu, 1974] King Sun Fu. Syntactic Methods in Pattern Recognition. Academic Press, 1974.
[Rabiner, 1989] Lawrence R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. In Proceedings of the IEEE, volume 77, pages 257-286, 1989.
[Roddick and Spiliopoulou, 2002] John F. Roddick and Myra Spiliopoulou. A survey of temporal knowledge discovery paradigms and methods. IEEE Transactions on Knowledge and Data Engineering, 14(4):750-767, 2002.
[Shalizi and Crutchfield, 2001] Cosma Rohilla Shalizi and James P. Crutchfield. Computational mechanics: pattern and prediction, structure and simplicity. Journal of Statistical Physics, 104(3-4):817-879, 2001.
[Sipser, 1997] Michael Sipser. Introduction to the Theory of Computation. PWS Publishing, 1997.
[van der Aalst and Weijters, 2005] Wil M.P. van der Aalst and A.J.M.M. (Ton) Weijters. Process mining. In Process-Aware Information Systems: Bridging People and Software through Process Technology, pages 235-255. Wiley & Sons, 2005.
[Verwer et al., 2009] Sicco Verwer, Mathijs de Weerdt, and Cees Witteveen. One-clock deterministic timed automata are efficiently identifiable in the limit. In LATA, volume 5457 of LNCS, pages 740-751. Springer, 2009.
[Verwer et al., 2010] Sicco Verwer, Mathijs de Weerdt, and Cees Witteveen. A likelihood-ratio test for identifying probabilistic deterministic real-time automata from positive data. In ICGI, volume 6339 of LNCS, pages 203-216. Springer, 2010.
[Yu, 2010] Shun-Zheng Yu. Hidden semi-Markov models. Artificial Intelligence, 174(2):215-243, 2010.
[Zhu and Goldberg, 2009] Xiaojin Zhu and Andrew B. Goldberg. Introduction to semi-supervised learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 3(1):1-130, 2009.