
Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence

Learning Driving Behavior by Timed Syntactic Pattern Recognition
Sicco Verwer (Katholieke Universiteit Leuven, Celestijnenlaan 200A, B-3001 Leuven, Belgium; [email protected])
Mathijs de Weerdt and Cees Witteveen (Delft University of Technology, Mekelweg 4, 2628 CD Delft, the Netherlands; [email protected], [email protected])

Abstract

We advocate the use of an explicit time representation in syntactic pattern recognition because it can result in more succinct models and easier learning problems. We apply this approach to the real-world problem of learning models for the driving behavior of truck drivers. We discretize the values of onboard sensors into simple events. Instead of the common syntactic pattern recognition approach of sampling the signal values at a fixed rate, we model the time constraints using timed models. We learn these models using the RTI+ algorithm from grammatical inference, and show how to use computational mechanics and a form of semi-supervised classification to construct a real-time automaton classifier for driving behavior. Promising results are shown using this new approach.

1 Introduction

Processes often contain constraints on the timing of events (such as deadlines). When learning a model for such a process using, for instance, syntactic pattern recognition [Fu, 1974] or hidden Markov models (HMMs) [Rabiner, 1989], these time constraints are modeled implicitly by sampling the signal values at a fixed rate, resulting in a blowup of both models and events. Using explicit state duration models such as hidden semi-Markov models (HSMMs) [Yu, 2010] avoids this blowup but does not model the time constraints directly. Recently, the idea of learning real-time automata has been proposed [Verwer et al., 2010]. Since probabilistic automata can model only a strict subset of the distributions of HMMs, these real-time automata are less powerful than traditional HSMMs. However, the advantage of such models is that they do allow their structure, including time constraints, to be learned efficiently [Verwer et al., 2009]. To the best of our knowledge, learning real-time automata has never actually been applied to real data. By learning these models for driving behavior, we show in this paper that this results in useful models for practical applications. In addition, we develop the necessary preprocessing and postprocessing methods. Our approach can be considered a first attempt at timed syntactic pattern recognition.

The data at our disposal consists of onboard sensor measurements that have been collected from truck round-trips. By applying a simple discretization method, we obtain sequences of timed events. The behavior that is displayed in these sequences is unknown. From this data, we want to learn a model that we can use to monitor the driving behavior in new data, i.e., to use it as a classifier. Our approach is to first learn a timed model from the unlabeled sequences using the RTI+ algorithm [Verwer et al., 2010]. Subsequently, we create a small number of labeled examples by displaying some typical behaviors during a test lap. We use these to label some states of the learned model. The labels of these states are used to classify new data. This is a novel method to perform classification when many unlabeled and few labeled examples are available: the setting of semi-supervised classification [Zhu and Goldberg, 2009]. This setting occurs in many practical applications because it is often too time-consuming to assign labels to the observed examples.

In addition to there being mostly unlabeled data, there is an additional restrictive property in our application domain: the process under observation is continuous, i.e., it never stops producing events. Consequently, we need to be able to learn a model from a single but very long unlabeled event sequence. We use the theory of computational mechanics [Shalizi and Crutchfield, 2001] to show why sampling random subsequences from this single example provides sufficient information to learn a useful model, the so-called causal structure of the process. This model can be learned by RTI+ by providing the random subsequences as input.

We apply our approach to a test case of detecting bad driving behavior when accelerating after a complete stop. After learning a timed model and labeling some of the states using 23 labeled accelerations, we constructed a simple decision rule based on the number of times the labeled states are visited. We also implemented a simple decision rule based on expert knowledge. These two decision rules agree in 79% of all acceleration cases in the original unlabeled data. This surprisingly accurate result serves as a proof-of-concept for timed syntactic pattern recognition.

This paper is structured as follows. We first describe the background of our application, its goals, and why we choose timed syntactic pattern recognition to achieve these goals in Section 2. Afterwards, in Section 3, we briefly introduce the RTI+ algorithm and show how to use it in order to learn a classifier for driving behavior. Our results using this approach on truck acceleration are discussed in Section 4. We end this paper with a general discussion and some ideas for future research in Section 5.
2 Motivation

We are interested in methods that learn models for the behavior of a system (process). A well-known model for characterizing systems is the deterministic finite state automaton (DFA, see, e.g., [Sipser, 1997]). An advantage of an automaton model is that it is easy to interpret. In contrast, many other well-known models that are commonly used to learn the behavior of a system are quite difficult to interpret, for example neural networks or support vector machines (see, e.g., [Bishop, 2006]). Since a DFA is a language model, its learning (or identification) problem has been well studied in the grammatical inference field [de la Higuera, 2005]. Hence, we would like to apply an established method in order to learn a DFA from observations of the system. However, when observing a system there often is more information than just the sequence of symbols: the time at which these symbols occur is also available. A straightforward way to make use of this timed information is to sample the timed data. For instance, a symbol that occurs 3 seconds after the previous event can be modeled as 3 consecutive event symbols. This technique is commonly used when applying syntactic pattern recognition [Fu, 1974] or hidden Markov models [Rabiner, 1989] to signal processing. However, since this technique models time implicitly, i.e., using additional states, it can lead to an exponential blowup (requiring n states instead of log(n) bits) of both the input data and the resulting model.

We use a different method that uses the time information directly in order to produce a timed model. A common model for real-time systems is the timed automaton (TA) [Alur and Dill, 1994]. Unfortunately, there exists no learning algorithm for TAs in general. We therefore use a restricted type of TA known as a deterministic real-time automaton (DRTA) [Dima, 2001] for which an efficient learning algorithm exists [Verwer et al., 2010]. A DRTA contains constraints on the time between two consecutive events. Using these constraints, the language of a DRTA depends not only on the types of events occurring, but also on their time values relative to previous event occurrences. We clarify this using an example from our application domain of constructing models for truck driver behavior:

Figure 1: The harmonica driving behavior modeled as a DRTA. The intervals on the state transitions are bounds on the amount of time (in tenths of a second) the behavior can remain in the current state directly before firing the transition.

Example 1 Figure 1 shows an example DRTA that models a specific driving behavior known as 'harmonica driving'. This often occurs when a truck is driving at a somewhat higher speed than the vehicle directly in front of it (no time constraint). The driver slows down a bit (no time constraint), waits until there is enough distance between him and the vehicle in front (takes at most 50 seconds), and then speeds up again and closes in on the vehicle (for at most 20 seconds). This whole process often repeats itself a couple of times (within 5 minutes) before the driver finally adjusts the speed of the truck to match the vehicle in front of him. The result of this whole process is unnecessary fuel consumption.

The example clearly shows the importance of time information for modeling driving behavior. Time constraints model the dependence on this time information explicitly, i.e., using numbers. Because this uses a binary representation of time, it can result in exponentially more compact models than an implicit representation. Consequently, the time, space, and data required to learn DRTAs can also be exponentially smaller than the time, space, and data required to learn DFAs [Verwer et al., 2009]. This efficiency argument is our main reason for using DRTA models. In the next section, we briefly describe the algorithm we use to learn these DRTA models, and show how to use it in the practical setting of learning driving behavior.
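To make the role of these time constraints concrete, the following minimal sketch encodes a DRTA in Python and runs it on a timed string. It is our own illustration under assumptions: the class name, the guard encoding, and the transition structure loosely based on Figure 1 are not taken from the paper or from RTI+.

```python
# A minimal sketch of a DRTA in the spirit of Example 1 (our own encoding, not
# the authors' implementation). A transition fires on an event symbol only if
# the time since the previous event falls inside its guard interval.
from typing import Dict, List, Optional, Tuple

Guard = Tuple[int, int]                      # inclusive [lower, upper] time bound
Transition = Tuple[Guard, int]               # (guard, target state)

class DRTA:
    def __init__(self, transitions: Dict[Tuple[int, str], List[Transition]], start: int):
        self.transitions = transitions       # (state, symbol) -> guarded targets
        self.start = start

    def step(self, state: int, symbol: str, delay: int) -> Optional[int]:
        for (low, high), target in self.transitions.get((state, symbol), []):
            if low <= delay <= high:         # determinism: at most one guard matches
                return target
        return None                          # no matching transition: string rejected

    def run(self, timed_string: List[Tuple[str, int]]) -> Optional[int]:
        state = self.start
        for symbol, delay in timed_string:
            state = self.step(state, symbol, delay)
            if state is None:
                return None
        return state

# Hypothetical guards loosely following Figure 1 (values are illustrative only).
harmonica = DRTA({
    (1, "slowdown"): [((0, 300), 2)],
    (2, "constant"): [((0, 300), 3)],
    (3, "slowdown"): [((0, 50), 4)],
    (4, "speedup"):  [((0, 20), 5)],
    (5, "constant"): [((0, 300), 2)],        # the pattern may repeat
}, start=1)

print(harmonica.run([("slowdown", 12), ("constant", 3), ("slowdown", 40), ("speedup", 15)]))
```

Sampling the same behavior at a fixed rate would instead require one symbol per time unit, which is exactly the blowup that the explicit time representation avoids.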
3 Learning DRTAs

The RTI+ algorithm for learning DRTAs [Verwer et al., 2010] is a kind of model selection technique. It requires as input a set S+ of timed strings (time-stamped event sequences) that have been generated by a process (system), and returns a DRTA model M that is consistent with S+. Intuitively, a model M is consistent if all timed strings that reach the same state in M do not have significantly different futures in S+. In other words, M is a DRTA that satisfies a Markov property. This property is tested using likelihood ratio tests on S+. The goal of RTI+ is to find the smallest consistent model M*. Unfortunately, finding M* is very difficult (NP-hard). RTI+ is a greedy method that finds good solutions efficiently.

We apply RTI+ to the problem of automatically constructing a model that can detect different driving behaviors from sensor measurements. As can be seen in Example 1, DRTAs provide succinct and insightful models for such behavior. We now describe the challenges that need to be overcome in order to learn such models based on data from onboard sensors.

3.1 Data discretization

We first need to discretize the sensor measurements into basic events (symbols). The data we obtained from the trucks consists of 15 complete round trips. One round trip typically takes two full days. These trips were driven on different dates, with different weather conditions, and with different drivers. We focus our attention on three sensor values that were recorded during these trips: speed, engine rotations per minute (RPM), and fuel usage. These measurements were recorded at a rate of one per second. Figure 2 shows one hour of these sensor measurements.
Figure 2: The values of three sensors installed on trucks. The first graph shows the fuel rate in liters per hour. The second shows the vehicle speed in kilometers per hour. The third shows the engine speed in rotations per minute.

Figure 3: The discretization routine used by our algorithm. Whenever the sensor value exceeds a predetermined bound, plus or minus a small additional region (in this case an additional 2 km/h), an event is generated corresponding to the region the value has entered. The time value of the event is the time that has elapsed since the previous event. (In the depicted example, the speed range is divided into regions a-f and the resulting timed string is (e,0)(d,80)(c,15)(b,12)(c,150)(d,20)(e,24).)

Ideally, we would like to transform these measurements into driving actions/events such as speed-ups, slow-downs, a turn left/right, a bump in the road, etc. However, since finding patterns in the sensor values that are indicative of these kinds of events is far from trivial (see, e.g., [Roddick and Spiliopoulou, 2002]), we discretize the data in a much simpler way. We divide the range of a sensor value into different regions, and whenever the value enters a region, we treat it as the occurrence of an event associated with that region. The time value of the occurrence is the time that has elapsed since the previous event occurrence. We use expert knowledge to create an intuitive division of the sensor values into such regions. The discretization process for the speed sensor is depicted in Figure 3. The additional region of 2 km/h ensures that the event occurrence does not change constantly when the sensor value is near a region boundary.
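A minimal sketch of this style of region-based discretization with a small hysteresis band is given below. It is our own illustration: the region boundaries, the 2 km/h band, and the function names are assumptions in the spirit of Figure 3, not the authors' exact implementation.

```python
# A minimal sketch (our own illustration) of region-based discretization with a
# hysteresis band: an event fires only when the value leaves the current region
# by more than the band, and it carries the time elapsed since the last event.
import bisect

def discretize(values, boundaries, band=2.0):
    """Turn a 1 Hz sensor signal into a timed string of (region, delta) events."""
    labels = "abcdefghijklmnopqrstuvwxyz"
    region = bisect.bisect_right(boundaries, values[0])
    events, last_time = [(labels[region], 0)], 0
    for t, v in enumerate(values[1:], start=1):
        new_region = bisect.bisect_right(boundaries, v)
        if new_region == region:
            continue
        # Only switch regions once the value clears the boundary by the band.
        boundary = boundaries[min(region, new_region)]
        if abs(v - boundary) <= band:
            continue
        events.append((labels[new_region], t - last_time))
        region, last_time = new_region, t
    return events

# Hypothetical speed regions (km/h); the paper's expert-chosen regions differ.
speed_boundaries = [5, 20, 50, 80]
speed_signal = [0, 0, 3, 8, 15, 26, 30, 49, 55, 60, 62, 49, 45]
print(discretize(speed_signal, speed_boundaries))
```

In this sketch the dip from 55 km/h back to 49 km/h does not trigger an event, because the value stays within the 2 km/h band around the 50 km/h boundary; this is the effect of the additional region described above.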
Using the discretization procedure, we obtain from every round-trip of a truck, and for every sensor, a very long sequence of timed symbols. RTI+ requires a data set S+ of preferably many not too long timed strings. We transform the discretized data to such a set S+ using theory from computational mechanics.

3.2 Selecting subsequences

Computational mechanics [Shalizi and Crutchfield, 2001] models a process as a bi-infinite sequence of random variables (observations). The goal of computational mechanics is to predict the (infinite) future of a process based only on a finite observation sequence, i.e., a single long sequence of events. Learning a predictive model requires a set of positive example strings S+. This set is created by selecting (random or all) subsequences of the observed sequence. In order to generalize (learn) over such arbitrary subsequences, we have to make the assumption that the process is stationary. This assumption seems restrictive, but it is necessary since the subsequences in S+ are allowed to start and stop at arbitrary points in time. Consequently, important information for a non-stationary process (regarding event occurrences before the starting event) is unavailable. Although driving behavior is not a typical example of a stationary process, this assumption does allow us to learn useful models.

The subsequences in S+ can be used to learn what are called the causal states in computational mechanics. Essentially, the causal states form a DRTA that represents behavior (a probability distribution over timed strings) identical to a non-deterministic version of the original DRTA. This non-deterministic version has the same structure as the original DRTA, but has the complete set of states as possible start states. We illustrate this concept for the problem of learning a DRTA using an example.

Figure 4: A deterministic real-time automaton (left) and its causal states (right). States 1 and 2 of the original model correspond respectively to states 3 and 2 of the causal model. (The transitions are labeled with a and b events and the clock guards [0, 5] and [5, 10].)

Example 2 In Figure 4, a DRTA is depicted along with its causal states. The causal states also form a DRTA. The causal start state corresponds to all (or any) of the states of the original DRTA. In other words, we do not know whether the original DRTA is in state 1 or in state 2. The occurrences of events can be used to determine the state of the original DRTA. For example, the occurrence of a b event ensures that the next state is state 2, since all transitions with b as label have state 2 as their target. Similarly, the occurrence of an a event with a time value greater than 5 ensures that the next state is state 1. When an a event occurs with a time value less than 4, we do not know whether 1 or 2 is the next state. Thus, in this case, the target state in the causal DRTA is the state that corresponds to both state 1 and 2, i.e., the start state.

Since the causal states are the deterministic equivalent of a non-deterministic automaton, it is possible that the blow-up in the number of states is exponential. This is the price we have to pay for learning the causal states instead of the original automaton. We have no choice, however, because the original DRTA can only be learned from timed strings that are guaranteed to start in the start state.
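Since the causal states are described above as the deterministic equivalent of the non-deterministic version of the DRTA, they can be computed by a subset construction over the guarded transitions. The sketch below is our own illustration; the transition relation loosely mimics Figure 4 and Example 2, and the guard values and function name are assumptions.

```python
# A minimal sketch (our own illustration) of computing the causal states of a
# DRTA by subset construction: start from the set of all states and follow
# guarded transitions on (symbol, guard) pairs until no new state sets appear.
from itertools import product

# Non-deterministic relation: (state, symbol, (low, high)) -> target state.
# The targets below follow the behavior described in Example 2; the guard
# values are illustrative only.
delta = {
    (1, "a", (0, 5)): 1,  (2, "a", (0, 5)): 2,   # short a: ambiguous target
    (1, "a", (5, 10)): 1, (2, "a", (5, 10)): 1,  # long a: always state 1
    (1, "b", (0, 5)): 2,  (2, "b", (0, 5)): 2,   # b: always state 2
    (1, "b", (5, 10)): 2, (2, "b", (5, 10)): 2,
}
symbols = {"a", "b"}
guards = {(0, 5), (5, 10)}

def causal_states(delta, states):
    start = frozenset(states)                 # every state is a possible start
    result, frontier, moves = {start}, [start], {}
    while frontier:
        current = frontier.pop()
        for symbol, guard in product(symbols, guards):
            targets = frozenset(delta[(q, symbol, guard)]
                                for q in current if (q, symbol, guard) in delta)
            if not targets:
                continue
            moves[(current, symbol, guard)] = targets
            if targets not in result:
                result.add(targets)
                frontier.append(targets)
    return result, moves

states, moves = causal_states(delta, {1, 2})
print(len(states), states)   # 3 causal states: {1, 2}, {1}, and {2}
```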
Combining all the data we have available, we obtain for the speed sensor about 50,000 symbol occurrences in total. For the engine and fuel sensor we obtain about twice this amount. From this data, we took 10,000 random subsequences of length 20. We now explain how we use this data in order to construct a classifier for driving behavior.
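A minimal sketch of this sampling step is shown below. It is our own illustration; the function and variable names are assumptions, and RTI+ itself is treated as an external tool that consumes the resulting set S+.

```python
# A minimal sketch (our own illustration) of turning one very long discretized
# trip into a set S+ of short timed strings by sampling random subsequences,
# as described above. A timed string is a list of (symbol, time_delta) pairs.
import random

def sample_subsequences(timed_sequence, num_samples=10_000, length=20, seed=0):
    """Draw random fixed-length windows from a single long timed sequence."""
    rng = random.Random(seed)
    last_start = len(timed_sequence) - length
    samples = []
    for _ in range(num_samples):
        start = rng.randint(0, last_start)
        samples.append(timed_sequence[start:start + length])
    return samples

# Hypothetical usage: `speed_events` stands in for the full discretized
# round-trip sequence of one sensor (about 50,000 events in the paper's data).
speed_events = [("e", 0), ("d", 80), ("c", 15), ("b", 12), ("c", 150)] * 10_000
S_plus = sample_subsequences(speed_events, num_samples=10_000, length=20)
print(len(S_plus), len(S_plus[0]))   # 10000 20
```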
3.3 Learning a DRTA classifier

The result of running RTI+ on an unlabeled data set is a probabilistic DRTA (PDRTA). This automaton does not contain any final states and hence cannot initially be used to classify new data. However, the PDRTA model does describe the behavior that is displayed in the unlabeled data set. If learned correctly, two timed strings only reach the same states if their possible futures are identically distributed. Intuitively, these two timed strings display the same behavior, and hence their labels should also be the same. Consequently, we can use the label of a single labeled string to label a state of the PDRTA. This process is depicted in Figure 5.

Figure 5: Constructing a classifier based on a PDRTA. We use the runs of a few labeled examples to assign labels to the states of the PDRTA. The result is a classifier.

Since RTI+ learns the causal states instead of the actual states of the PDRTA, the states that are visited by good behavior b and bad behavior b' can overlap. After some events, however, different behaviors will display different future distributions and hence the visited states will be different. Let Qb and Qb' denote the sets of states that are reached by the timed strings that display behavior b and b' respectively. We use these sets of states to build our classifier as follows:

• assign the label b to the states in Qb \ Qb', i.e., the states reached by b but not by b';

• assign the label b' to the states in Qb' \ Qb, i.e., the states reached by b' but not by b.

When a new timed string reaches a state labeled with b, we conclude that it is displaying behavior b, and not b'. In addition to classifying new timed strings, the labels assigned to states can also be used to analyze the old unlabeled data set, for instance, in order to determine how often good or bad driving occurred in the past.
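As an illustration of this labeling step, the sketch below (our own code, with hypothetical helper names; the PDRTA run itself is assumed to behave like the DRTA sketch in Section 2) collects the visited state sets Qb and Qb' and labels the states in their differences.

```python
# A minimal sketch (our own illustration) of labeling PDRTA states from a few
# labeled timed strings: states reached only by behavior b get label "good",
# states reached only by b' get label "bad", overlapping states stay unlabeled.
def visited_states(pdrta, timed_string):
    """Hypothetical helper: return the set of states visited by the run."""
    visited, state = set(), pdrta.start
    for symbol, delay in timed_string:
        state = pdrta.step(state, symbol, delay)
        if state is None:
            break
        visited.add(state)
    return visited

def label_states(pdrta, examples_b, examples_b_prime):
    Q_b = set().union(*(visited_states(pdrta, s) for s in examples_b))
    Q_bp = set().union(*(visited_states(pdrta, s) for s in examples_b_prime))
    labels = {q: "good" for q in Q_b - Q_bp}
    labels.update({q: "bad" for q in Q_bp - Q_b})
    return labels   # states reached by both behaviors receive no label
```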
In this way, RTI+ learns one classifier for every sensor. However, since we model the causal states of the process, it is unclear what part of a new timed string to use in order to determine the classification. We could use the whole timed string, i.e., simply compute the complete run and determine the label at every index. But this only works if the learned model is completely correct. When we make some small error during learning, the run of the new timed string can diverge from its actual run and from that point on many classifications will be incorrect. Thus, the run of a shorter sequence will be more correct since it is based on more data (larger future distributions) during learning, and the run of a longer sequence will be less correct but better at distinguishing the different types of behavior. We decided to use all sequences of length 1 to 10 and combine their classifications using a simple decision rule based on the number of reached good and bad states:

value(i, j) = +1 if label(s, i, j) = good; −1 if label(s, i, j) = bad; 0 otherwise

score(i) = score(i − 1) + Σ_{0 ≤ j ≤ 10} value(i, j)

where label(s, i, j) is a function that returns the label of the subsequence from index i − j to index i of a new sequence s, and score(0) = 0. By summing these score values over all sensors we obtain a classifier that can be used to determine the type of driving behavior at every index i of a new sequence of driving events.
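A minimal sketch of this decision rule follows (our own code; it assumes a labels map and a run_label helper in the spirit of the labeling sketch above, both of which are illustrative names rather than the authors' implementation).

```python
# A minimal sketch (our own illustration) of the per-index scoring rule above:
# for every index i, run the subsequences ending at i (j = 0..10) through the
# labeled PDRTA and add +1 for a "good" label and -1 for a "bad" label.
def run_label(pdrta, labels, timed_string):
    """Hypothetical helper: label of the state reached by a (sub)string, if any."""
    state = pdrta.run(timed_string)          # see the DRTA sketch in Section 2
    return labels.get(state)                 # None when the state is unlabeled

def scores(pdrta, labels, s):
    """Return score(i) for every index i of a new timed sequence s."""
    result, score = [], 0
    for i in range(len(s)):
        for j in range(0, 11):               # subsequences from index i - j to i
            if j > i:
                break
            lab = run_label(pdrta, labels, s[i - j:i + 1])
            if lab == "good":
                score += 1
            elif lab == "bad":
                score -= 1
        result.append(score)
    return result
```

Summing these per-sensor score sequences element-wise then gives the combined classifier described above.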
4 Results

We applied the RTI+ algorithm to the discretized data and learned one PDRTA model for each of the three sensors. The performance of each of these models is measured using their sizes and their Akaike information criterion (AIC) [Akaike, 1974] values. These measure a trade-off between model size and likelihood (lower is better). The sizes and AIC scores of each of these models and a trivial single state model are displayed in Figure 6. From the AIC scores, we can conclude that the models perform a lot better than the trivial single state model. Other than that, the AIC scores by themselves do not say much about the quality of the learned PDRTA models. However, the sizes of the learned models allow us to draw some conclusions regarding this quality.

Figure 6: The sizes and AIC scores of the PDRTAs (lower is better) learned by our algorithm for every sensor.

         size     AIC        AIC (single state model)
speed    2,305    871,550    1,472,840
fuel       960    842,536    1,102,270
engine   1,009    697,190    1,076,750

Unfortunately, these sizes are quite large. From results using RTI+ described in [Verwer et al., 2010], we know that it is capable of correctly inferring a PDRTA of about size 40 from a data set of size 2,000. From the truck data, we obtained PDRTAs that are much bigger. Hence, it is very unlikely that the learned PDRTAs are completely correct. Although we did learn these PDRTAs using 10,000 instead of 2,000 examples, this increase in data does not explain the large size of the learned models. The first 50 states of the PDRTAs (in distance from the start state), however, are learned using a lot of data. Because of cycles in the transitions of the PDRTAs, most of these states are reached by around 5,000 timed strings. Since 2,000 timed strings is sufficient to correctly learn the start state of a size 40 PDRTA, these first states are very likely to be correct. We therefore use these first 50 states of the PDRTAs as classifiers.

4.1 The labeled examples

For the detection of truck driver behavior, we tested our approach on a simple type of behavior that results in unnecessary fuel usage: accelerating too quickly. This behavior serves as an example of how to construct a classifier out of the learned PDRTAs using only a few labeled examples. Using the same techniques, the learned PDRTAs can be used to classify many other interesting driving behaviors.

In order to classify accelerations, we require some labeled examples. The labeled examples were obtained by driving a short test lap with many traffic lights. After a full stop, the truck driver accelerated too fast about half of the time, and the other half he accelerated normally. We labeled the timed events that occurred during one and a half minutes from the start of the acceleration; Figure 7 shows these events for the speed sensor. We used the same time span in order to obtain examples for the fuel and engine sensors.

Figure 7: Events obtained from the speed sensor during a test-lap with both too quick and normal accelerations.

accelerating too fast:
(a, 42)(b, 80)(c, 12)(d, 10)(c, 7)
(a, 13)(b, 65)(c, 6)(d, 8)
(a, 6)(b, 39)(c, 10)(d, 10)
(a, 7)(b, 15)(c, 8)(d, 7)
(a, 7)(b, 2)(c, 7)(d, 7)
(c, 22)(b, 4)(c, 12)(d, 7)(c, 8)
(a, 7)(b, 11)(c, 6)(d, 7)(c, 16)
(a, 8)(b, 29)(c, 7)(d, 8)(c, 8)
(a, 11)(b, 28)(c, 7)(b, 12)
(a, 10)(b, 320)(c, 12)(d, 7)(c, 34)
(a, 10)(b, 8)(c, 8)(d, 8)(c, 31)
(a, 8)(b, 24)(c, 9)(d, 8)(c, 12)

normal acceleration:
(a, 10)(b, 14)(c, 21)(b, 42)
(a, 11)(b, 34)(c, 13)(b, 18)
(a, 8)(b, 10)(c, 14)(b, 45)
(a, 25)(b, 5)(c, 18)(b, 27)
(a, 11)(b, 6)(c, 20)(b, 14)
(a, 12)(b, 18)(c, 15)(b, 20)
(a, 8)(b, 35)(c, 24)(b, 20)
(a, 12)(b, 32)(c, 18)(d, 13)
(a, 7)(b, 62)(c, 21)(b, 28)
(a, 12)(b, 8)(c, 17)
(a, 10)(b, 27)(c, 19)(d, 16)

All but one of the timed strings in Figure 7 start with an a symbol. This means that the truck came to a full stop (or at least a speed less than 5 km/hour). In one timed string the initial symbol is a c symbol. In this case, the driver actually did not slow down completely, but accelerated too fast from driving around 20 km/hour (the next event is a b event). In every timed string, the second symbol is a b symbol, which indicates a speed-up of the truck. The time until this symbol occurs is the number of seconds the truck remained stationary at the traffic light. The third symbol is a c event, indicating a further speed-up. It should be possible to use the time value of this event in order to classify the two different acceleration behaviors: a small time value indicates a fast acceleration, a large value indicates a slow acceleration. This can be clearly seen in the examples. The next event of the examples is either a b or a d event, depending on whether the speed of the truck exceeded 50 km/hour or not. A small number combined with a d event indicates that the driver is accelerating too quickly. A b event indicates that the driver is slowing down again, for instance in order to stop for the next traffic light.

Having collected these labeled examples, all we need to do is to run the learned PDRTAs on these examples, and discover which states are reached when accelerating too quickly, and which when accelerating normally. Figure 8 shows these states and the traversed transitions for the speed sensor examples. We used these PDRTA parts to identify some states that are only reached when the driver accelerates too quickly and some when he accelerates normally. In Figure 8, state 5 is reached by 8 of the 11 normal examples, and none of the 12 fast examples. An example of a state that is only reached by the fast examples is state 27; 6 of the fast examples reach this state. Note that the transition to this state has c as symbol and [0, 7] as clock guard. Thus, it is only reached by timed strings that contain a c event within 7 seconds after the first b. A c event with more than 7 seconds goes to state 2, which is also the state that is reached by all the normal acceleration examples. Thus, the learned PDRTA model for the speed sensor does distinguish between the two behaviors of accelerating too fast and accelerating normally, exactly in the way we expected based on the timed strings of Figure 7.

Figure 8: Parts of the learned PDRTA for the speed sensor that are reached by the timed strings from Figure 7. The label, time constraint, and number of reaching strings are shown for each transition. (The figure shows the part reached by the normal examples and the part reached by the too-fast examples.)

This shows that the models learned by RTI+ can be used to distinguish different types of behavior, even if we do not know beforehand what behavior to distinguish (the PDRTAs are learned from unlabeled data). In this way, timed syntactic pattern recognition can be used to give more insight into the different kinds of behavior of a system/process.

For all PDRTAs, we identified and labeled all states that were reached by at least 4 example strings in one case, and by none in the other case. Thus, for the speed sensor, states 27 and 16 are labeled in the too-fast speed PDRTA, and state 5 is labeled in the normal speed PDRTA. We did the same for the fuel sensor and engine sensor PDRTAs.
4.2 Classifier performance

We tested the obtained classifiers on the historical data that we have available. This is the unlabeled data that we initially used to learn the PDRTAs. From this data, we extracted all acceleration instances and used this data to first determine the correlation of our classifiers with the truck speedup. The truck speedup is the speed increase between two consecutive measurements of the speed sensor. In our tests, the correlation between these speedups and our classifiers turns out to be high only for the fuel sensor. This is surprising since only 4 of the approximately 200 states in this classifier are labeled.

We would like to determine the quality of the fuel classifier. Unfortunately, we do not know exactly which of the unlabeled accelerations are too fast, and which are not. Instead of determining the exact values, we therefore implemented a simple method that is currently used by domain experts to determine a driver's profile, and used it to approximate these values:

• Over 30 seconds, count the number of times n that the acceleration was normal (less than 0.8 m/s²) and the number of times q that it was too quick (greater than 1.2 m/s²).

• If n is less than or equal to 25 and q is greater than or equal to 1, then label the acceleration as too quick; label it as normal otherwise.

The two numbers n and q give a good overview of a driver's profile. The fuel classifier labels an acceleration as too quick if its score is greater than or equal to 1. We compare the labels assigned by the fuel classifier and this simple decision rule:

                         decision rule
                         quick    normal
  classifier   quick       920        61
               normal      262       289

The labels given by these two rules match in 79% of all cases. This is rather high considering we only used 23 example strings to label only 4 states of the fuel PDRTA.
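For concreteness, a minimal sketch of this expert rule and of the agreement computation follows (our own code; the thresholds are taken from the rule above, while the function names and the per-second speedup input are illustrative assumptions).

```python
# A minimal sketch (our own illustration) of the experts' profile rule above and
# of comparing its labels with the fuel classifier's labels on the same windows.
def expert_label(speedups_30s):
    """Label one 30-second window of per-second speedups (m/s^2)."""
    n = sum(1 for a in speedups_30s if a < 0.8)     # normal accelerations
    q = sum(1 for a in speedups_30s if a > 1.2)     # too-quick accelerations
    return "quick" if n <= 25 and q >= 1 else "normal"

def classifier_label(score):
    """The fuel classifier labels a window as too quick if its score is >= 1."""
    return "quick" if score >= 1 else "normal"

def agreement(windows, scores):
    matches = sum(expert_label(w) == classifier_label(s)
                  for w, s in zip(windows, scores))
    return matches / len(windows)

# Hypothetical usage with made-up data (not the paper's measurements):
windows = [[0.5] * 25 + [1.5] * 5, [0.3] * 30]
print([expert_label(w) for w in windows])   # ['quick', 'normal']
print(agreement(windows, [2, 0]))           # 1.0
```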
5 Discussion

We applied timed syntactic pattern recognition techniques to the problem of detecting driver behavior. Although our techniques are still preliminary, the result that labeling only 4 states in a learned PDRTA is sufficient to obtain an adequate classifier clearly shows the high potential of our techniques. It would be interesting to investigate whether other driving patterns can also be distinguished using these PDRTAs.

Unfortunately, since the learned PDRTA models are quite large, it is difficult to discover different driving behaviors by examining the models. We aim to use tools from process mining [van der Aalst and Weijters, 2005] for this purpose. Process mining tools contain many routines and visualizations that help to make large process models insightful.

Interesting directions for future work are the construction of more sophisticated discretization routines, better decision rules, and more specialized automaton models. In addition, our techniques are also useful for learning other models. It would for instance be interesting to see whether our semi-supervised method also achieves good results with learning algorithms for non-timed automata such as the state-merging algorithms for DFAs (see, e.g., [de la Higuera, 2005]). Furthermore, it would be interesting to see whether our techniques can also be applied to methods for learning types of hidden semi-Markov models (see, e.g., [Yu, 2010]) where the transition probabilities depend on the state durations by means of time constraints.

References

[Akaike, 1974] Hirotsugu Akaike. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6):716–723, 1974.

[Alur and Dill, 1994] Rajeev Alur and David L. Dill. A theory of timed automata. Theoretical Computer Science, 126:183–235, 1994.

[Bishop, 2006] Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

[de la Higuera, 2005] Colin de la Higuera. A bibliographical study of grammatical inference. Pattern Recognition, 38(9):1332–1348, 2005.

[Dima, 2001] Catalin Dima. Real-time automata. Journal of Automata, Languages and Combinatorics, 6(1):2–23, 2001.

[Fu, 1974] King Sun Fu. Syntactic Methods in Pattern Recognition. Academic Press, 1974.

[Rabiner, 1989] Lawrence R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. In Proceedings of the IEEE, volume 77, pages 257–286, 1989.

[Roddick and Spiliopoulou, 2002] John F. Roddick and Myra Spiliopoulou. A survey of temporal knowledge discovery paradigms and methods. IEEE Transactions on Knowledge and Data Engineering, 14(4):750–767, 2002.

[Shalizi and Crutchfield, 2001] Cosma Rohilla Shalizi and James P. Crutchfield. Computational mechanics: pattern and prediction, structure and simplicity. Journal of Statistical Physics, 104(3-4):817–879, 2001.

[Sipser, 1997] Michael Sipser. Introduction to the Theory of Computation. PWS Publishing, 1997.

[van der Aalst and Weijters, 2005] Wil M.P. van der Aalst and A.J.M.M. (Ton) Weijters. Process mining. In Process-Aware Information Systems: Bridging People and Software through Process Technology, pages 235–255. Wiley & Sons, 2005.

[Verwer et al., 2009] Sicco Verwer, Mathijs de Weerdt, and Cees Witteveen. One-clock deterministic timed automata are efficiently identifiable in the limit. In LATA, volume 5457 of LNCS, pages 740–751. Springer, 2009.

[Verwer et al., 2010] Sicco Verwer, Mathijs de Weerdt, and Cees Witteveen. A likelihood-ratio test for identifying probabilistic deterministic real-time automata from positive data. In ICGI, volume 6339 of LNCS, pages 203–216. Springer, 2010.

[Yu, 2010] Shun-Zheng Yu. Hidden semi-Markov models. Artificial Intelligence, 174(2):215–243, 2010.

[Zhu and Goldberg, 2009] Xiaojin Zhu and Andrew B. Goldberg. Introduction to semi-supervised learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 3(1):1–130, 2009.
