
Transportation Research Part A 181 (2024) 103995

Contents lists available at ScienceDirect

Transportation Research Part A


journal homepage: www.elsevier.com/locate/tra

Predicting passenger satisfaction in public transportation using machine learning models
Elkin Ruiz b , Wilfredo F. Yushimito a ,∗, Luis Aburto b,c , Rolando de la Cruz b,c
a Faculty of Engineering and Sciences, Universidad Adolfo Ibáñez, 750 Padre Hurtado Ave., Viña del Mar, 252000, Valparaíso, Chile
b Faculty of Engineering and Sciences, Universidad Adolfo Ibáñez, Diagonal Las Torres 2640., Santiago, 7941169, Santiago de Chile, Chile
c Data Observatory Foundation, ANID Technology Center, Eliodoro Yáñez 2990, Oficina A5, Providencia, 7510277, Santiago de Chile, Chile

ARTICLE INFO

Keywords: Bus public transportation; Machine learning; Passenger satisfaction; Prediction

ABSTRACT

Enhancing the understanding of passenger satisfaction in public transportation is crucial for operators to refine transit services
and to establish and elevate quality standards. While many researchers have tackled this issue using diverse tools and methods,
the prevalent approach involves surveys with discrete choice models or structural equations. However, a common limitation of
these models lies in their inherent assumptions and predefined relationships between dependent and independent variables.
To address these limitations, we introduce a novel perspective by harnessing machine learning (ML) models to gauge and predict
passenger satisfaction. ML models are advantageous when dealing with complex, non-linear relationships and massive datasets,
and do not rely on predefined assumptions. Thus, in this paper, we evaluate four ML models for the prediction of ratings of the
quality of transit service. These models were calibrated using data from the Transantiago bus system in Chile.
Among the ML models, the Random Forest model emerges as the most effective, showcasing its ability to analyze and predict
passengers’ satisfaction levels. We delve deeper into its capabilities by examining the impact of three pivotal variables on
passengers’ score ratings: waiting time, bus occupation, and bus speed. The Random Forest model is able to capture threshold
values for these variables that significantly influence or have no effect on passenger preferences.

1. Introduction

Bus and metro systems contribute to a significant reduction in greenhouse gas emissions, with a 33% and 76% decrease compared
to private cars respectively, as noted by Hodges (2010). This underscores the pivotal role of promoting public transportation (PT) in
mitigating carbon emissions, curbing air pollution, and addressing other externalities like congestion. However, for PT to effectively
compete with private cars, it must excel in various aspects such as availability, schedule, frequency, and trip time, among others.
These elements collectively form the quality of service, a crucial metric tied to consumer satisfaction and the perception of the
quality and efficiency of the transit service and its demand (Transportation Research Board, 1999; dell’Olio et al., 2018a).
One established method for gauging service quality involves employing customer satisfaction surveys, where customers articulate
their opinions and perceptions. They assess various aspects of the service using evaluation scales or satisfaction ratings (dell’Olio
et al., 2018b). In some cases, these individual ratings are then assigned weights and consolidated into an index or indicator.

∗ Corresponding author.
E-mail address: [email protected] (W.F. Yushimito).

https://doi.org/10.1016/j.tra.2024.103995
Received 11 September 2023; Received in revised form 5 January 2024; Accepted 3 February 2024
Available online 15 February 2024
0965-8564/© 2024 Elsevier Ltd. All rights reserved.
E. Ruiz et al. Transportation Research Part A 181 (2024) 103995

This index (or rating score) aims to encapsulate user preferences across different dimensions such as route characteristics, service
reliability, comfort, cleanliness, fare, information, safety, personnel, customer services, and environment. By synthesizing these
diverse aspects into a singular value or index, the overall level and quality of the service or characteristic under scrutiny can be
effectively measured. Furthermore, this index is usually designed to account for variations in judgments among different service
attributes, as demonstrated by studies such as those conducted by Eboli and Mazzulla (2009, 2011), Burlando et al. (2016) and de
Oña et al. (2016). Additionally, the index may encompass the element of involvement, reflecting individuals’ sentiments regarding
the relevance or importance of an object based on its inherent needs, values, and interests, as outlined by Lai and Chen (2011).
Another utilized technique for analyzing user-perceived quality and attitudes towards the service in public transport is Structural
Equation Modeling or SEM (dell’Olio et al., 2018d). SEM is a statistical modeling technique used to examine relationships between
observed and latent (unobservable) variables (Jöreskog, 1973). For the case of the quality of public transportation, it allows for
simultaneous examination of the relationships between observed variables (e.g., on-time performance, interior cleanliness, frequency
of service) and latent variables (i.e., reliability, comfort, accessibility) to gain insights into how different factors contribute to the
perceived quality of public transportation services (see, for instance, Eboli and Mazzulla (2007), Weng et al. (2023), Chou et al.
(2014), or the review by de Oña and de Oña (2014)). In all cases, the use of SEM is restricted to estimating effects and
relationships among multiple variables rather than predicting the quality of service.
Another method for assessing passengers’ perceptions involves employing stated preference surveys that focus on individual
travel experiences (de Oña and de Oña, 2014; dell’Olio et al., 2018c). A stated preference survey collects
data from individuals on what they would do in a hypothetical but realistic situation. These surveys offer a valuable avenue for
evaluating both current and potential users, allowing for insights into their preferences for existing or hypothetical services. In such
studies, the opinions of potential users are captured, shedding light on the significance of various service aspects that influence
their decisions regarding the use of public transit services. When the responses instead report actual or observed behavior,
the survey is referred to as a revealed preference survey. In recent years, the proliferation of passively collected data (sensors,
GPS, mobile phone data, trip diary apps, etc.) has helped to provide a fuller picture of individual mobility behavior (Tsoleridis et al.,
2022). Exploring these innovative and more robust data collection methods is recommended, either in place of or as a complement
to stated preferences (Daly et al., 2014).
To analyze the data gathered from these surveys, choice-based models such as multinomial logit, mixed logit, ordered logit,
or ordered probit models are commonly employed (see for instance, Hensher et al. (2003), Eboli and Mazzulla (2010), Bellizzi
et al. (2020), dell’Olio et al. (2011), Quddus et al. (2019), Eboli et al. (2020), Allen et al. (2020), Lois et al. (2018), Donoso et al.
(2013), Echaniz et al. (2018) or the comprehensive reviews by de Oña and de Oña (2014) and dell’Olio et al. (2018c)). In certain
scenarios, the abundance of attribute-specific satisfaction measures obtained from satisfaction surveys is streamlined into a concise
set of variables. This reduction, often necessary to handle a potentially correlated array of variables, is achieved through factor
analysis (Allen et al., 2018; Stradling et al., 2007; Deb et al., 2022). This approach can be further complemented by employing
additional statistical techniques such as ordered logit, as demonstrated by Tyrinopoulos and Antoniou (2008), or regression models,
as exemplified in the works of Agarwal (2008), Le et al. (2020) and Lunke (2020).
Nevertheless, these prevailing statistical models (namely SEM and discrete choice models) are laden with pre-established
relationships between dependent and independent variables, accompanied by assumptions such as expectations of normal data
distribution, linear associations between variables, and minimal multicollinearity, among others (Garrido et al., 2014; Garver, 2003).
It is noteworthy that these assumptions are frequently unmet in the context of customer satisfaction research (Taylor, 1997; Garver,
2003; dell’Olio et al., 2018e).
In light of these challenges, machine learning emerges as a more fitting approach for addressing such complexities and deviations.
As mentioned by van Cranenburgh et al. (2022), unlike traditional models, machine learning models exhibit a remarkable degree
of flexibility: they possess numerous parameters and make minimal a priori assumptions about the data-generating process. This
inherent flexibility empowers machine learning models to discern the intricate structure within the data without being confined by
rigid assumptions, offering a more robust and flexible framework for customer satisfaction research.
In this respect, previous studies have explored the use of machine learning models in public transportation quality satisfaction
studies. dell’Olio et al. (2018e) review three machine learning and data mining techniques employed in the evaluation and modeling
of public transport quality: neural networks used in Garrido et al. (2014), Bayesian networks (Perucca and Salini, 2014) (also used
in Wu et al. (2016)), and decision trees (de Oña et al., 2012). In all these cases, the models are used to determine the importance of
factors affecting the overall satisfaction or quality of the transportation system (Garrido et al., 2014; Perucca and Salini, 2014; Wu
et al., 2016; de Oña et al., 2012) or to discern latent aspects within a series of attributes delineating service quality, and to unveil
underlying dimensions that may not be overtly evident (de Oña et al., 2012). However, none of them explores the predictive
capabilities of machine learning models.
The objective of this paper is to contribute to the methodological advancements in the field of public transportation quality
satisfaction modeling by proposing and validating innovative approaches within the machine learning framework. For this, we seek
to

• Assess and compare the performance of four machine learning models in modeling and predicting quality satisfaction in the
context of public transportation.
• Systematically select key variables that significantly contribute to the quality satisfaction of public transportation services.
• Explore the ability of machine learning models to effectively capture and model nonlinear relationships among selected key
variables, acknowledging the dynamic and complex nature of public transportation service satisfaction.

• Identify threshold values for selected variables that indicate the presence or absence of a significant impact on the overall
service quality, providing practical insights for service improvement strategies.

To this end, we leverage a substantial dataset collected from the bus transit service of the Transantiago Bus System
in Santiago de Chile, the country’s primary bus system. Our primary objective is to rigorously evaluate a diverse set of machine
learning (ML) models, aiming to analyze and forecast passenger satisfaction. The dataset originates directly from the Transantiago
mobile application, providing a comprehensive snapshot of the entire trip experience. To make the analysis more robust and the
predicted preference ratings more accurate, we supplement this mobile app data with information obtained directly from the bus
service (e.g., GPS, smart card data).
The study systematically compares the outcomes derived from our ML models with those obtained from discrete choice models,
specifically ordinal logit and multinomial logit models (used previously in the same transportation system by Donoso et al. (2013)).
The overarching goal is to assess the efficacy of ML models in providing more accurate predictions. Additionally, our research delves
into the application of ML models to scrutinize the impact of three pivotal operational variables: waiting time, bus occupancy, and
bus speed, on the quality of service rating scores. This multifaceted approach not only contributes to advancing our understanding
of passenger satisfaction prediction, but also aligns with the broader objective of testing ML models, predicting outcomes, and
identifying crucial threshold values in the realm of public transportation quality of service.
The organization of this paper is as follows. The next section briefly summarizes the theoretical aspects of the machine learning
models tested. Section 3 presents a description of the data, attributes, and data sources, as well as the study area. Section 4 presents the
results for each model and a comparison with benchmark models (multinomial and ordinal logit models), including a prediction
evaluation of the ratings of the quality of the transit service. Section 5 performs a sensitivity analysis of three of the most relevant
variables affecting passenger perception of the quality of the service, finding threshold values that indicate the presence or absence
of a significant impact on the quality of service. Section 6 presents a discussion of the findings as well as policy implications.
Finally, Section 7 presents the conclusions.

2. Machine learning predictive models

Predictive modeling is a concept in which models based on statistics seek to predict outcomes from certain data input. Typically,
such models include Machine Learning algorithms where computer programs can learn and train from historical data. Predictive
modeling can be subdivided into regression models and classification models. In the regression models, inputs are used in order to
predict a continuous output. In classification models, the objective is to assign observations to particular labels (or discrete classes).
Classification models can be further subdivided into supervised and unsupervised learning. In unsupervised learning, also called
descriptive, the main objective is the search for structures that are able to explain any phenomena over the data. In contrast, in
supervised learning, in most cases, there is a single variable called label (𝑌 ) that has to be explained using the rest of the variables
(𝐗). Thus, supervised learning learns a mapping function from 𝐗 to 𝑌 . In this case, computational algorithms based on statistics
(called models) predict the outcome 𝑌 from certain data input 𝐗. Typically, such models include Machine Learning and Data Mining
algorithms that are learned and trained from historical data. Since one of the purposes of this research is to predict the discrete
values of the rating score of the quality of the bus transit service, the problem corresponds to a supervised classification problem.
The most relevant classification models and the variations that will be evaluated later are detailed in the following subsections.

2.1. Supervised classification models

A classification model is an algorithm capable of predicting a result based on a series of inputs. These models use a supervised
type of learning, where one of the variables in the data set becomes the variable to be classified, or class label, and the other variables
are used to learn the class label (Aggarwal, 2014). The label corresponds to a discrete attribute or variable whose value is predicted
based on the values of other known attributes. Thus, the model could be seen as a ‘machine’, with a certain type of structure and
parameters, which, given input data, assigns it with certain probability to a specific class. That is, the algorithm receives as input a
new data point with valued variables and returns the probability of belonging to a class. In some models, instead of calculating a
probability, the model assigns the data point directly to a class; this is equivalent to assigning a probability of 1.0 to one of the classes.
Most supervised models consider two stages: training and application. In the training stage, part of the data set, called the training
set, is used to develop the structure and parameters of the model, which are later tested on the test set. The second stage consists
of predicting the class of an input data point 𝐗. Here, the trained model is applied to 𝐗 and the output corresponds to the probability
of belonging to a class. There are several classification models in the literature; four of the most common are the decision tree, random
forest, support vector machines, and neural networks.
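
The two stages described above can be illustrated with a minimal Python sketch. It is not the procedure used in this paper: the ‘model’ is a deliberately trivial one that ignores the inputs 𝐗 and only estimates class probabilities from label frequencies, and the toy labels, the 20% test fraction, and all function names are assumptions made for illustration.

```python
import random
from collections import Counter

def train_test_split(records, test_fraction, seed):
    """Shuffle the records and hold out a fraction of them as the test set."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def train(labels):
    """Training stage: estimate one probability per class from label frequencies."""
    counts = Counter(labels)
    return {label: n / len(labels) for label, n in counts.items()}

def apply_model(model):
    """Application stage: return the most probable class and its probability."""
    label = max(model, key=model.get)
    return label, model[label]

# Hypothetical toy labels, not taken from the Transantiago data.
labels = ["Good"] * 4 + ["Average"] * 5 + ["Bad"]
train_labels, test_labels = train_test_split(labels, test_fraction=0.2, seed=0)

model = train(train_labels)
predicted, probability = apply_model(model)

# Share of held-out records whose label matches the prediction.
hit_rate = sum(y == predicted for y in test_labels) / len(test_labels)
```

Any of the four models discussed below would replace `train` and `apply_model` with a mapping that actually depends on 𝐗; the split between a training set and a held-out test set stays the same.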

2.1.1. Decision tree


A decision tree is a structure composed of nodes, leaves, and branches, where each node corresponds to a decision (or a test
applied on some attribute), and each branch represents a possible path of this decision or test. When the response is discrete, the
decision tree is referred to as a classification tree. In a decision tree, when a data point 𝐗 is inserted in the model, it travels
down the tree until it reaches a leaf. Each ‘leaf’ determines the probability that the data point belongs to one of the possible existing
classes (Quinlan, 1986). To learn the structure of a decision tree, the training algorithm calculates the information gain (decrease
of the entropy over the data) produced by the inclusion of each attribute. Then, the attribute with higher information gain, i.e. the
best variable to split the data into two different classes, is chosen as a node. This is repeated recursively until all the attributes are
added to the tree (Archana and Elangovan, 2014).
Decision trees have the advantage that they are simple and fast (once the model has been learned),
in addition to producing results with high precision (Jadhav and Channe, 2013). Also, they are easy to understand and
support incremental learning, so new training data can be continuously added to improve the prediction without having to start
from scratch (Archana and Elangovan, 2014).
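
The information gain criterion can be made concrete with a short Python sketch. It computes the Shannon entropy of the class labels and the decrease in entropy obtained by splitting on each attribute, then picks the attribute with the highest gain. The four toy records and the attribute names (`dirty_bus`, `near_metro`) are hypothetical and only loosely inspired by the variables used later in this paper.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(rows, labels, attribute):
    """Decrease in entropy obtained by splitting the data on one attribute."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Entropy after the split, weighted by the size of each group.
    weighted = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - weighted

# Hypothetical toy records, not taken from the paper's dataset.
rows = [
    {"dirty_bus": 1, "near_metro": 0},
    {"dirty_bus": 1, "near_metro": 1},
    {"dirty_bus": 0, "near_metro": 0},
    {"dirty_bus": 0, "near_metro": 1},
]
labels = ["Bad", "Bad", "Good", "Good"]

gains = {a: information_gain(rows, labels, a) for a in ("dirty_bus", "near_metro")}
best_attribute = max(gains, key=gains.get)  # attribute chosen as the next node
```

In this toy case, splitting on `dirty_bus` separates the two classes perfectly (gain of 1 bit), whereas `near_metro` leaves the class mix unchanged (gain of 0), so `dirty_bus` would be chosen as the node.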

2.1.2. Random forest


Random Forest (Breiman, 2001) is a machine learning classification (and prediction) algorithm that consists of an ensemble of
decision trees. It creates multiple decision tree models during training and combines their predictions to make a final prediction,
typically yielding more accurate and robust results compared to a single decision tree. To do so, it generates a collection of decision
trees, each trained on a random subset of the training data, and potentially using a random subset of features for splitting at each
node. For each tree, the algorithm draws a random sample of data points from the training dataset with replacement. This process is known
as bootstrap aggregating or ‘‘bagging’’. It creates a different training dataset for each decision tree, introducing diversity and
reducing overfitting. At each node of a decision tree, only a random subset of features is considered for the best split. This further
enhances the diversity among the trees and reduces the correlation between them. Finally, for classification, the final prediction
is determined by majority voting among the predictions of individual trees (whereas for regression, the predictions are averaged).
An advantage of Random Forest is that the combination of multiple models mitigates overfitting. It also tends to provide higher
accuracy compared to individual decision trees, and it is less sensitive to noise and outliers due to the averaging or voting process.
However, one disadvantage of this algorithm is that it is slow due to the long process involved in generating and combining the
trees.
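
The three ingredients described above (bootstrap sampling, random feature subsets, and majority voting) can be sketched in a few lines of Python. This is only an illustration of the mechanics, not a full Random Forest: no trees are actually grown, and the feature names, the sample size, and the list of tree predictions are assumptions.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) records from data with replacement (one sample per tree)."""
    return [rng.choice(data) for _ in data]

def random_feature_subset(features, k, rng):
    """Pick k distinct candidate features to consider at one split."""
    return rng.sample(features, k)

def majority_vote(predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)
data = list(range(10))  # stand-in for 10 training records
features = ["wait_time", "occupancy", "speed", "bus_age"]  # illustrative names

sample = bootstrap_sample(data, rng)          # may contain repeated records
candidates = random_feature_subset(features, 2, rng)

# Hypothetical predictions of five trees for one trip.
vote = majority_vote(["Good", "Average", "Good", "Bad", "Good"])
```

Because each tree sees a different bootstrap sample and a different set of candidate features at each node, the individual trees make partly independent errors, which is what the voting step exploits.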

2.1.3. Support vector machines (SVMs)


The support vector machine (Christmann and Steinwart, 2008) is a machine learning algorithm that seeks to find a hyperplane
or set of hyperplanes in a high or infinite-dimensional space. It can be used for classification, regression, or other tasks such as outlier
detection. The idea behind the algorithm is that a good separation is achieved by the hyperplane that has the largest distance to
the nearest training data point of any class (so-called functional margin), since in general the larger the margin, the lower the
generalization error of the classifier. For instance, in a two-class classification problem, an SVM aims to find the hyperplane that
maximizes the margin between the two classes. The margin is the distance between the hyperplane and the nearest data points from
each class. The larger the margin, the more confident the SVM is in its predictions. The data points that lie closest to the hyperplane
are called support vectors and they are critical in determining the position and orientation of the hyperplane. These points influence
the margin and the final decision boundary. For nonlinear relationships between features and classes, SVMs use a technique called
the ‘‘kernel trick’’. Kernels transform the input data into a higher-dimensional space, allowing the SVM to find a hyperplane that
can separate the data effectively even when it is not linearly separable in the original feature space. For the cases in which it is
not possible to find a hyperplane that perfectly separates the classes, the SVM allows for a certain degree of misclassification by
introducing a soft margin, which allows some data points to fall on the wrong side of the margin or hyperplane. This trade-off helps
improve the generalization ability of the model. Once the optimal hyperplane is found, SVM classifies new data points based on
which side of the hyperplane they fall on.
SVM performs well even when the number of features is much larger than the number of samples. SVM also helps to prevent overfitting
by including a form of regularization, and it can capture complex relationships in the data. However, SVM has limitations when
dealing with very large datasets due to its computational complexity. Additionally, selecting appropriate kernel functions and
tuning hyperparameters can be challenging.
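
For the linear two-class case, the quantities discussed above can be written down directly. The Python sketch below assumes a hyperplane that has already been found (the weights `w` and bias `b` are hypothetical, and no training is performed). It shows how a point is classified by the side of the hyperplane it falls on, how its distance to the hyperplane is measured, and how the hinge loss, a common way of expressing the soft-margin penalty, charges points that are misclassified or fall inside the margin.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Assign class +1 or -1 according to the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def distance_to_hyperplane(w, b, x):
    """Geometric distance from x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, b, x)) / norm

def hinge_loss(w, b, x, y):
    """Zero if x is on the correct side and outside the margin, positive otherwise."""
    return max(0.0, 1.0 - y * decision_value(w, b, x))

# Hypothetical hyperplane x1 - x2 = 0 in a two-dimensional feature space.
w, b = [1.0, -1.0], 0.0

label = classify(w, b, [2.0, 0.0])                  # +1 side
dist = distance_to_hyperplane(w, b, [2.0, 0.0])     # 2 / sqrt(2)
loss_ok = hinge_loss(w, b, [2.0, 0.0], 1)           # outside the margin
loss_inside = hinge_loss(w, b, [0.5, 0.0], 1)       # inside the margin
loss_wrong = hinge_loss(w, b, [2.0, 0.0], -1)       # wrong side
```

Training a soft-margin SVM amounts to choosing `w` and `b` so that the sum of such penalties stays small while the margin stays wide; a kernel replaces the dot product when the classes are not linearly separable.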

2.1.4. Neural networks


Neural networks (neural nets) (Haykin, 2008) are a type of model loosely based on the human brain. A neural network is
composed of layers of interconnected nodes called neurons. Neurons in one layer are connected to neurons in the adjacent layers.
The three main types of layers are the input layer (receives the initial data), hidden layers (process information), and the output
layer (produces the final prediction or result). Each connection between neurons has an associated weight, which determines the
strength of the connection. Neurons also apply an activation function (to learn complex relationships in data) to the weighted sum
of their inputs to produce an output. Then data is passed (feedforward process) through the network layer by layer, transforming it
at each layer. The network’s output is then compared to the desired output, and an error is calculated. Backpropagation is used to
adjust the weights of connections in reverse order, propagating the error back through the network. This process helps the network
learn by minimizing the error and updating weights to improve predictions. Training a neural network involves feeding it a labeled
dataset and iteratively updating the weights with an optimization algorithm to reduce the prediction error.
This allows neural networks to model highly complex relationships and patterns in data. However,
training neural networks requires substantial computational resources and large datasets. Hyperparameter tuning and architecture
selection can also be challenging.
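
The feedforward step can be illustrated with a small Python sketch of a network with one hidden layer. It covers only the forward pass: backpropagation and weight updates are omitted, and the layer sizes, the weights, the sigmoid activation, and the softmax output are assumptions chosen for illustration rather than the architecture calibrated in this paper.

```python
import math

def sigmoid(z):
    """Activation function applied to each hidden neuron's weighted sum."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    """Turn output scores into class probabilities that sum to 1."""
    m = max(zs)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def layer(inputs, weights, biases):
    """Weighted sum of the inputs plus bias, one value per neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> hidden layer (sigmoid) -> output layer (softmax)."""
    hidden = [sigmoid(z) for z in layer(x, hidden_w, hidden_b)]
    return softmax(layer(hidden, out_w, out_b))

# Hypothetical weights: 3 inputs, 2 hidden neurons, 3 output classes.
hidden_w = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
hidden_b = [0.0, 0.1]
out_w = [[1.0, -1.0], [0.0, 0.5], [-1.0, 1.0]]
out_b = [0.0, 0.0, 0.0]

probs = forward([0.2, 0.3, 0.5], hidden_w, hidden_b, out_w, out_b)
```

During training, the difference between `probs` and the observed class would be propagated backwards to adjust `hidden_w`, `hidden_b`, `out_w`, and `out_b`.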


Fig. 1. Study area.

3. Methodology and data

3.1. Context

The data used in this study was obtained from Transantiago, the public transportation system of Santiago de Chile, the capital
city of Chile. This system has been operating since 2007 and it is based on a trunk-feeder structure, where the Metro is the principal
component. This Metro rail has 7 lines, 1,346 trains, and 136 stations covering a total of 140 km of network, and serving 2.4
million average transactions on a business day. The system also includes six bus operators and a suburban train line. According to
the management report published by the DTPM (2019), the bus system is made up of 382 bus routes, 7,279 buses, and 11,400 stops.
The bus system handles about 2.6 million average transactions in a working day. The suburban train is the most recent addition to
the system and consists of a single line of 20.3 km with 10 stations, and 16 trains, serving 73,451 average transactions in a working
day. The data used for this study correspond to the information of the operator (U6) that serves the northeastern area of the city,
providing 102 bus routes as shown in Fig. 1. The operator has a demand of 160,000 transactions on an average working day, and
its services use 1,478 stops. The data used to calibrate the models were collected over 4 months between March and June 2018.
A total of 4,850 records with complete information were obtained. Of the data used for the calibration of the models, 90% of the
stops and 97% of the bus routes have information at some time of the day.

3.2. Data sources

Transantiago uses an integrated smartcard-based payment system. All the information on the smartcard is stored in a database.
The bus system also collects GPS pulses with the position of the buses. In addition, the bus system has mobile applications that
provide information to users about the arrival times of the buses at the bus stops. The mobile applications also
collect information on the travel experience, the state of the stops, and the rating (score) of the travel experience. The data
used in this work come from three different sources (see Fig. 2). The first source corresponds to the data of the mobile application,
which collects the travel experience. This is mainly an evaluation the user makes of the characteristics of the trip, the condition of
the bus, and the condition of the bus stop. The user also provides a general evaluation of the experience by means of a
score, and the mobile application also collects the position. The second source contains data on bus stop characteristics and
location, and bus characteristics. The third data source consists of both the smartcard database and the bus GPS database. A
procedure is developed to cross-match these two databases to obtain the time and location of the boarding and alighting stops, bus
occupancy, trip time, waiting time, and distance traveled by a user (for details on the procedure see Munizaga and
Palma (2012)).
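
The general idea of joining a smartcard transaction with the bus GPS record can be sketched as a nearest-timestamp lookup. This is not the procedure of Munizaga and Palma (2012), which the text cites for the details; it is a simplified Python illustration in which the record layout (timestamps in seconds, one sorted list of pulses per bus, stop labels as positions) is entirely hypothetical.

```python
from bisect import bisect_left

def nearest_gps_pulse(tap_time, pulses):
    """Return the GPS pulse closest in time to a smartcard transaction.

    pulses: non-empty list of (timestamp, position) tuples for ONE bus,
    sorted by timestamp. Only the neighbours around the insertion point
    need to be compared.
    """
    times = [t for t, _ in pulses]
    i = bisect_left(times, tap_time)
    candidates = pulses[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda p: abs(p[0] - tap_time))

# Hypothetical pulses for one bus: (seconds since start of service, stop label).
pulses = [(0, "stop A"), (60, "stop B"), (120, "stop C")]

boarding = nearest_gps_pulse(70, pulses)    # transaction shortly after stop B
late_tap = nearest_gps_pulse(200, pulses)   # after the last pulse
```

A real implementation would also have to handle buses without pulses near the transaction time and infer the alighting stop, which a single lookup of this kind does not address.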
As a result, a database is constructed using these three sources for the study area. For the estimation of the models, the score
given by the users at the end of their trip was used as the dependent variable, while 4 groups of variables were used as independent
variables: the conditions of the trip experienced by the user (Travel Dimension), physical attributes of the bus stops (Bus Stop
Dimension), bus operation (Bus Dimension), and quality of driving (Driver Dimension).
The final database consists of 4,090 records and 17 variables summarized in Table 1.


Fig. 2. Sketch of data sources and data used in the models.

3.3. Attribute information

The variables associated with the trips are represented by 4 variables that correspond to the duration of the trip, the percentage
of occupancy that the bus had at the moment a user boards the bus, the average time that a user waited at the bus stop, and
passenger reported status about capacity saturation of the bus.
In terms of the variables associated with the bus stop, these are represented by two groups: location variables that indicate
whether the bus stops are close to special infrastructure such as the metro or an exclusive bus way, and physical variables associated
with the service infrastructure that the bus stop offers, such as information, seats, and illumination. For the bus operation, the
variables correspond to the average speed of the bus during the trip, and the physical conditions of the bus such as age, condition,
and cleanliness. Finally, a variable associated with the quality of the driving is included, which is obtained from the evaluations
made by the users in a 1–7 rating system (where 1 is poor and 7 is excellent). For the continuous variables such as waiting time,
bus occupancy, and bus speed, Fig. 3 shows their distribution according to the valuation score given by the user. As expected,
users give lower scores to services with longer waiting times (min) and higher bus occupancy (%). In the
case of the bus speed, the values differ little. This is because the bus service under study runs in mixed traffic, where average speeds
in Chile range between 5 and 20 km/h depending on the route, day, and time of day (Muñoz et al., 2013, 2014; Schmidt
et al., 2016; Soza-Parra et al., 2022).
Similar to Donoso et al. (2013), the quality scores (ratings) were aggregated into 3 categories (Bad, Average, Good). This
aggregation was carried out to have a more balanced data set because some of the categories had little data. A balanced data
set helps to prevent the model from becoming biased towards one class. As a result, the ‘‘Bad’’ category corresponds to scores 1–3
of the original base and represents 10% of the records, the ‘‘Average’’ category corresponds to scores 4–5 of the original base and
represents 49% of the records, and the ‘‘Good’’ category corresponds to scores 6–7 and represents the remaining 41%. The spatial
distribution of the scores for the unit of study is presented in Fig. 4.
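
The aggregation of the 1–7 scores into three categories, and the resulting class shares, can be reproduced with a short Python sketch. The mapping follows the score ranges stated above and the counts are the class frequencies reported in Table 1; the function name and the rounding to whole percentages are assumptions made for illustration.

```python
def aggregate_score(score):
    """Map an original 1-7 rating to one of the three aggregated categories."""
    if not 1 <= score <= 7:
        raise ValueError("score must be between 1 and 7")
    if score <= 3:
        return "Bad"
    if score <= 5:
        return "Average"
    return "Good"

# Class frequencies as reported in Table 1.
counts = {"Bad": 418, "Average": 2011, "Good": 1660}
total = sum(counts.values())

# Share of each category, rounded to the nearest whole percentage.
shares = {label: round(100 * n / total) for label, n in counts.items()}
```

With these counts the shares come out at 10%, 49%, and 41% for ‘‘Bad’’, ‘‘Average’’, and ‘‘Good’’ respectively, matching the percentages quoted in the text.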
The following section explains the calibration of the models as well as the evaluation of their performance in predicting the
quality of the bus service.

4. Results and evaluation of models

In this section, we present the results of the models tested with the database described in Section 3. For comparison, we used
choice-based models as a benchmark, as these models were used in Donoso et al. (2013) with a similar database of Transantiago.
The choice-based models include an ordinal logit and a multinomial logit. We then calibrated the four ML models (Random Forest,
Decision Tree, Support Vector Machine, and Neural Networks) described in Section 2 following the Cross-Industry Standard Process
for Data Mining (CRISP-DM; Chapman et al., 2000). The section concludes with a comparison between
the best ordinal and multinomial models and the machine learning models, focusing mainly on the quality of the predictions.

Table 1
Variables considered.

| Dimension | Attribute | Attribute values definition | Frequency | Average/std. dev. |
|---|---|---|---|---|
| | Score | “Bad”: score [1–3] | 418 | |
| | | “Average”: score [4–5] | 2011 | |
| | | “Good”: score [6–7] | 1660 | |
| Travel | Trip duration (min) | [0–60] | | 13.3/12.5 |
| | Bus Occupancy (%) | [0–1] | | 0.3/0.2 |
| | Average wait time (min) | [0–30] | | 6.9/3.7 |
| Bus stop | Full Bus stop | (yes:1 or no:0) | 1:353–0:3736 | |
| | Bus stop with pay zone | (yes:1 or no:0) | 1:411–0:3678 | |
| | Bus stop near the metro station | (yes:1 or no:0) | 1:574–0:3515 | |
| | Bus stop near the exclusive bus way | (yes:1 or no:0) | 1:52–0:4037 | |
| | Bus stop without seats | (yes:1 or no:0) | 1:1694–0:2395 | |
| | Bus stop without information services | (yes:1 or no:0) | 1:894–0:3195 | |
| | Bus stop without lighting | (yes:1 or no:0) | 1:1370–0:2719 | |
| | Bus stop without roof | (yes:1 or no:0) | 1:959–0:3130 | |
| Bus | Bus age (years) | [0–13] | | 8.9/2.5 |
| | Bus speed (km/h) | [0–75] | | 12.7/6.2 |
| | Bus in good condition | (yes:1 or no:0) | 1:2675–0:1414 | |
| | Dirty Bus | (yes:1 or no:0) | 1:1071–0:3018 | |
| Driver | Driver drives well | (yes:1 or no:0) | 1:2302–0:1787 | |

Score = Dependent Variable.

Fig. 3. Distribution of continuous variables by user valuation score.

4.1. Statistical models

We calibrated an Ordinal Logit model and a Multinomial Logit model using the Score Value as the dependent variable. The
models were estimated using 70% of the data while the rest was saved for prediction testing for comparison (see next subsection).

Fig. 4. Spatial distribution of the scores reported by the users of the mobile app.

Table 2 presents the model including all original variables (significant and not significant) and the final model with only the significant
variables for both types of model. As can be observed, the final significant variables were the same in both types of models. For
the travel dimension, the significant variables were bus occupancy, average waiting time, and whether the bus stop was full or
not. The trip duration was not significant in either type of model. Regarding the bus stop dimension (characteristics of the bus stop),
only whether the bus stop has seats and lighting was relevant. For the bus dimension, the
condition of the bus (good or bad) and whether the bus was clean or not appear significant. The variables found to be significant
(and their effects) are similar to those reported in the literature; in particular, they are similar to the ones found in Donoso et al.
(2013). Also, as in Donoso et al. (2013), the AIC and Log Likelihood values indicate that, for these data,
the most appropriate statistical model is the Ordinal Logit with only the 8 significant variables.
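To make the coefficients in Table 2 more concrete, the sketch below turns the final Multinomial Logit estimates into predicted probabilities for a single trip. It rests on assumptions that are not stated in the source: “Bad” is taken as the base category (its utility fixed at zero, since constants are reported only for Average and Good), the dictionary keys are hypothetical variable names, and the example trip uses the Table 1 means for occupancy and speed with all binary attributes set to zero. Note that the paper's own probability plots use the Ordered Logit model, whose cut-points are not reported, so this is an illustration of the logit mechanics rather than a reproduction of those plots.

```python
import math

# Final Multinomial Logit coefficients as read from Table 2.
# Assumption: "Bad" is the base category (utility fixed at 0).
COEFS = {
    "Average": {
        "const": 7.178, "occupancy": -4.841, "wait_min": -0.449,
        "no_seats": 0.620, "no_lighting": 0.125, "speed_kmh": 0.011,
        "good_condition": 0.255, "dirty": -0.609, "drives_well": 0.153,
        "full_stop": 0.226,
    },
    "Good": {
        "const": 8.383, "occupancy": -6.978, "wait_min": -0.521,
        "no_seats": -0.144, "no_lighting": -0.465, "speed_kmh": 0.028,
        "good_condition": 0.731, "dirty": -1.293, "drives_well": -0.126,
        "full_stop": -0.816,
    },
}


def mnl_probabilities(trip: dict) -> dict:
    """Softmax over category utilities; attributes missing from `trip` count as 0."""
    utilities = {"Bad": 0.0}
    for label, beta in COEFS.items():
        utilities[label] = beta["const"] + sum(
            coef * trip.get(name, 0.0)
            for name, coef in beta.items()
            if name != "const"
        )
    denom = sum(math.exp(u) for u in utilities.values())
    return {label: math.exp(u) / denom for label, u in utilities.items()}


# Hypothetical trip: Table 1 means for occupancy and speed, binaries at 0.
base_trip = {"occupancy": 0.3, "speed_kmh": 12.7}
for wait in (0, 5, 10, 20):
    probs = mnl_probabilities({**base_trip, "wait_min": wait})
    print(wait, {k: round(v, 3) for k, v in probs.items()})
```

Because the utilities are linear in waiting time, the predicted probabilities change smoothly as the wait grows, which is the behavior contrasted with the Random Forest later in the paper.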

4.2. Machine learning models

For all the machine learning models, we calibrated the models with 70% of the data. The remaining 30% was used for testing
following a 𝑘-fold cross-validation with 𝑘 = 10. Then, we computed the following indicators: accuracy, recall, F1 score, sensitivity,
specificity, and AUC for each machine learning model, namely Random Forest, Decision Tree, SVM, and Neural Networks. We also
computed the same measures for the Ordinal Logit and the Multinomial Logit models with the 30% of the data saved for testing
(see previous subsection). As shown in Table 3, all machine learning models outperform both the best Ordinal Logit and the
Multinomial Logit models. Among the machine learning models, the best performance was achieved by Random Forest.
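The data partition can be illustrated with a short standard-library sketch. The source does not spell out how the hold-out split and the 10-fold cross-validation were combined, so the code assumes one common reading: 30% of the records are held out for testing and the 10 folds are drawn from the remaining 70%. The function names and the random seed are hypothetical; the record count of 4089 is the total in Table 1.

```python
import random


def train_test_split_indices(n, train_fraction=0.7, seed=0):
    """Shuffle record indices and split them into training and test sets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = int(train_fraction * n)
    return indices[:n_train], indices[n_train:]


def k_fold_indices(indices, k=10):
    """Yield (training, validation) index lists for each of the k folds."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


train_idx, test_idx = train_test_split_indices(4089)
print(len(train_idx), len(test_idx))

folds = list(k_fold_indices(train_idx, k=10))
print(len(folds), [len(validation) for _, validation in folds])
```

With 4089 records, this split yields the 2862 training and 1227 testing observations listed under Table 2.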

4.3. Comparison between choice-based and machine learning models

We also included a confusion matrix, or error matrix (Table 4), which allows us to visualize the performance of the models. Each
column of Table 4 represents the instances in an actual category (bad, average, or good), while each row represents the score predicted
by the models. As the diagonal values of each model show, Random Forest performs slightly better than the other
models. Thus, we select Random Forest for further analysis. The next section identifies the key variables found
by the Random Forest and presents the effects of these variables on the prediction of the rating score.
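As an illustration of how a confusion matrix translates into the indicators discussed above, the sketch below computes accuracy and per-class recall and specificity using the row/column convention stated in the text (rows are predictions, columns are actual categories). The two matrices are copied from Table 4; the function names are hypothetical.

```python
LABELS = ["Bad", "Average", "Good"]

# Confusion matrices copied from Table 4 (rows: predicted, columns: actual).
ORDINAL_LOGIT = [[79, 6, 1], [29, 446, 192], [1, 189, 284]]
SVM = [[86, 8, 1], [18, 446, 136], [5, 187, 340]]


def accuracy(matrix):
    """Share of records on the diagonal (predicted category equals actual)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def recall(matrix, k):
    """True positives over all records whose actual category is k (column sum)."""
    actual_total = sum(row[k] for row in matrix)
    return matrix[k][k] / actual_total


def specificity(matrix, k):
    """True negatives over all records whose actual category is not k."""
    total = sum(sum(row) for row in matrix)
    true_pos = matrix[k][k]
    false_pos = sum(matrix[k]) - true_pos
    false_neg = sum(row[k] for row in matrix) - true_pos
    true_neg = total - true_pos - false_pos - false_neg
    return true_neg / (true_neg + false_pos)


for name, matrix in (("Ordinal logit", ORDINAL_LOGIT), ("SVM", SVM)):
    print(name, f"accuracy={accuracy(matrix):.2f}")
    for k, label in enumerate(LABELS):
        print(f"  {label}: recall={recall(matrix, k):.2f}, "
              f"specificity={specificity(matrix, k):.2f}")
```

The accuracies obtained this way round to the values reported for these two models in Table 3. Per-class figures derived from a single matrix may differ from those in Table 3, since the source does not detail how the per-class indicators were computed or averaged.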

5. Analysis

For the analysis, we first identify the most relevant variables in the Random Forest model (the best ML model found)
that influence the probability of rating the bus service as bad, average, or good. Once the key variables are identified, we analyze
their effect on the predicted probability of each score and discuss their implications. For comparison, and to show the benefits of the
Random Forest model, we also computed the score probabilities predicted by the Ordered Logit model (the best choice-based model)
from Table 2.

5.1. Importance of variables

To identify the most relevant variables in the Random Forest model, we ranked the variables by their importance in the model.
The ranking is based on two metrics. The first metric is the mean decrease accuracy which is the loss in the mean accuracy of the

Table 2
Ordinal and multinomial models.

| Attribute | Ordinal Logit-All | Ordinal Logit | Multinomial Logit-All: Average | Multinomial Logit-All: Good | Multinomial Logit: Average | Multinomial Logit: Good |
|---|---|---|---|---|---|---|
| Trip duration (min) | −0.004* (−0.004) | | 0.004 (0.010) | −0.0003 (0.011) | | |
| Bus stop near the metro station | −0.111 (−0.15) | | −0.152 (0.493) | −0.248 (0.502) | | |
| Bus stop near the exclusive bus way | −0.255 (−0.392) | | −0.716 (0.878) | −1.035 (0.919) | | |
| Bus stop with pay zone | 0.083 (−0.177) | | 1.217** (0.571) | 0.982* (0.585) | | |
| Bus age (years) | −0.006 (−0.017) | | −0.026 (0.053) | −0.044 (0.054) | | |
| Bus Occupancy (Load Factor) | −3.513*** (−0.211) | −3.475*** (0.201) | −5.046*** (0.461) | −7.133*** (0.493) | −4.841*** (0.445) | −6.978** (0.477) |
| Average wait time (min) | −0.308*** (−0.015) | −0.291*** (0.014) | −0.467*** (0.031) | −0.554*** (0.035) | −0.449*** (0.029) | −0.521** (0.032) |
| Bus stop without seats | −0.893*** (−0.089) | −0.560*** (0.082) | 0.769*** (0.247) | −0.267 (0.255) | 0.620*** (0.225) | −0.144 (0.233) |
| Bus stop without information services | −0.182** (−0.120) | | −0.344 (0.324) | −0.370 (0.333) | | |
| Bus stop without lighting | −1.042*** (−0.119) | −0.449*** (0.086) | 0.654** (0.334) | −0.668* (0.347) | 0.125 (0.236) | −0.465* (0.244) |
| Bus stop without roof | 1.195*** (−0.152) | | −0.563 (0.408) | 0.677 (0.418) | | |
| Bus speed (km/h) | 0.016** (−0.007) | 0.019** (0.006) | 0.018 (0.018) | 0.031* (0.019) | 0.011 (0.017) | 0.028 (0.017) |
| Bus in good condition | 0.461*** (−0.093) | 0.644*** (0.085) | 0.275 (0.250) | 0.754*** (0.257) | 0.255 (0.250) | 0.731*** (0.257) |
| Dirty Bus | −0.777*** (−0.097) | −0.905*** (0.095) | −0.583** (0.240) | −1.242*** (0.251) | −0.609** (0.237) | −1.293** (0.248) |
| Driver drives well | −0.171** (−0.088) | −0.203* (−0.141) | 0.118 (0.244) | −0.102 (0.249) | 0.153 (0.240) | −0.126 (0.246) |
| Full Bus stop | −0.723*** (−0.143) | −0.737*** (0.143) | 0.168 (0.389) | −0.843** (0.414) | 0.226 (0.379) | −0.816** (0.404) |
| Constant | | | 8.252*** (0.791) | 9.780*** (0.819) | 7.178*** (0.475) | 8.383*** (0.496) |

Log Likelihood: Ordinal Logit-All −2065.8; Ordinal Logit −2078.2; Multinomial Logit-All −1888.3; Multinomial Logit −1939.1.
Akaike Inf. Crit. (AIC): Ordinal Logit-All 4110.8; Ordinal Logit 4178.4; Multinomial Logit-All 3848.5; Multinomial Logit 3918.2.

Dependent Variable = Score, Observations (Training) = 2862, Observations (Testing) = 1227.


* 𝑝 < 0.1.
** 𝑝 < 0.05.
*** 𝑝 < 0.01.

random forest model in the absence of a specific variable. That is, it measures how much the model accuracy decreases if a variable
is left out. The second metric is the mean decrease in the Gini coefficient. This metric measures how each variable splits leaves and
contributes to the homogeneity or purity of the nodes. In other words, it measures the importance of the variable based on the Gini
impurity index used for calculating the splits in the trees. In both cases, the higher the value of mean decrease accuracy or mean
decrease Gini score, the higher the importance of the variable in the model.
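The Gini-based metric can be illustrated with a small calculation. The sketch below computes the Gini impurity of a node and the impurity decrease produced by one split; in a random forest, decreases of this kind are accumulated over the splits that use a given variable and averaged across trees. The root-node counts are the category frequencies from Table 1, while the child-node counts are hypothetical values chosen only to make the example run. The function names are also hypothetical.

```python
def gini_impurity(counts):
    """Gini impurity 1 - sum(p_k^2) for a node with the given class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def gini_decrease(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of its children."""
    n = sum(parent)
    weighted_children = (
        sum(left) / n * gini_impurity(left)
        + sum(right) / n * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Class counts in the order (Bad, Average, Good).
root = [418, 2011, 1660]      # frequencies reported in Table 1
# Hypothetical split (made-up counts, e.g. short versus long waiting time).
left = [50, 900, 1300]
right = [368, 1111, 360]

print(f"root impurity: {gini_impurity(root):.3f}")
print(f"decrease from split: {gini_decrease(root, left, right):.3f}")
```

A variable whose splits yield purer child nodes produces larger decreases and therefore ranks higher on this metric.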
Fig. 5 shows the ranking of the importance of the variables. We can observe that waiting time and bus occupancy are the most
relevant variables according to both measures. This is not surprising as it is widely known that these two factors affect the perceived
quality of service. For instance, passengers perceive waiting time as about two times as long as the in-vehicle travel time (Wardman,
1998), and waiting time is usually related to bus occupancy (Chien, 2005).
To better visualize the advantage of Random Forest, we assess the effects of waiting time and bus occupancy on the
predicted probabilities of the score, that is, of rating a trip as good, average, or bad, by varying the waiting time and bus occupancy.
In addition to these two variables, we included bus speed in the analysis because of its value in the mean decrease Gini index,
and because bus speed is usually related to in-vehicle travel time, another key driver of customer satisfaction according to Chien
(2005), Garrido et al. (2014), Saleem et al. (2023), and Weng et al. (2023). We also evaluated the
partial correlation under simultaneous variation of the most relevant variables identified by the Random Forest model through
partial dependence plots (Hastie et al., 2009). This analysis is intended to give intuition about interaction effects, offering insights

Table 3
Goodness-of-fit indicators for choice-based and machine learning models.

| Indicator | Ordinal logit | Multinomial logit | Random forest | Decision tree | SVM | Neural networks |
|---|---|---|---|---|---|---|
| Accuracy | 0.66 | 0.66 | 0.74 | 0.68 | 0.71 | 0.72 |
| Accuracy (95% CI) | 0.63–0.69 | 0.63–0.69 | 0.71–0.77 | 0.66–0.71 | 0.68–0.74 | 0.69–0.74 |
| Recall: Bad | 0.72 | 0.78 | 0.80 | 0.82 | 0.78 | 0.79 |
| Recall: Average | 0.69 | 0.70 | 0.68 | 0.68 | 0.69 | 0.73 |
| Recall: Good | 0.59 | 0.59 | 0.66 | 0.64 | 0.71 | 0.59 |
| Weighted Recall | 0.66 | 0.67 | 0.68 | 0.68 | 0.71 | 0.69 |
| F1 Score: Bad | 0.81 | 0.83 | 0.85 | 0.85 | 0.84 | 0.82 |
| F1 Score: Average | 0.68 | 0.69 | 0.68 | 0.69 | 0.72 | 0.69 |
| F1 Score: Good | 0.60 | 0.59 | 0.65 | 0.62 | 0.67 | 0.71 |
| Weighted F1 | 0.66 | 0.67 | 0.68 | 0.68 | 0.71 | 0.71 |
| Sensitivity: Bad | 0.72 | 0.78 | 0.87 | 0.82 | 0.78 | 0.75 |
| Sensitivity: Average | 0.70 | 0.70 | 0.67 | 0.68 | 0.70 | 0.67 |
| Sensitivity: Good | 0.59 | 0.59 | 0.74 | 0.64 | 0.71 | 0.66 |
| Weighted Sensitivity | 0.66 | 0.67 | 0.71 | 0.68 | 0.71 | 0.67 |
| Specificity: Bad | 0.99 | 0.99 | 0.98 | 0.98 | 0.99 | 0.99 |
| Specificity: Average | 0.62 | 0.64 | 0.78 | 0.68 | 0.73 | 0.69 |
| Specificity: Good | 0.74 | 0.74 | 0.72 | 0.73 | 0.74 | 0.73 |
| Weighted Specificity | 0.70 | 0.71 | 0.78 | 0.73 | 0.76 | 0.74 |
| AUC: Bad | 0.96 | 0.96 | 0.99 | 0.93 | 0.93 | 0.93 |
| AUC: Average | 0.71 | 0.73 | 0.81 | 0.75 | 0.76 | 0.75 |
| AUC: Good | 0.75 | 0.76 | 0.82 | 0.78 | 0.78 | 0.76 |
| Weighted AUC | 0.75 | 0.76 | 0.83 | 0.78 | 0.79 | 0.77 |
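The “Weighted” rows in Table 3 can be read as class-weighted averages of the per-class values. The sketch below assumes, without confirmation from the source, that the weights are the number of test records in each actual category; those counts are taken as the column totals of the Ordinal Logit matrix in Table 4 (109, 641, and 477). The function name `weighted_average` is hypothetical.

```python
def weighted_average(per_class: dict, support: dict) -> float:
    """Average of per-class values, weighted by the number of records per class."""
    total = sum(support.values())
    return sum(per_class[label] * support[label] for label in per_class) / total


# Per-class recall of the Random Forest, as reported in Table 3.
rf_recall = {"Bad": 0.80, "Average": 0.68, "Good": 0.66}
# Assumed weights: actual-category totals from the Ordinal Logit matrix in Table 4.
support = {"Bad": 109, "Average": 641, "Good": 477}

print(f"weighted recall: {weighted_average(rf_recall, support):.2f}")
```

Under this assumption, the result rounds to the weighted recall reported for the Random Forest in Table 3.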

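Table 4
Confusion matrices for choice-based and random forest, decision trees, SVM, and neural networks.

| Prediction | Ordinal logit: Bad | Ordinal logit: Average | Ordinal logit: Good | Multinomial logit: Bad | Multinomial logit: Average | Multinomial logit: Good | Random forest: Bad | Random forest: Average | Random forest: Good |
|---|---|---|---|---|---|---|---|---|---|
| Bad | 79 | 6 | 1 | 86 | 8 | 3 | 81 | 6 | 1 |
| Average | 29 | 446 | 192 | 18 | 450 | 192 | 27 | 496 | 162 |
| Good | 1 | 189 | 284 | 5 | 183 | 282 | 1 | 119 | 334 |

| Prediction | Decision trees: Bad | Decision trees: Average | Decision trees: Good | SVM: Bad | SVM: Average | SVM: Good | Neural net: Bad | Neural net: Average | Neural net: Good |
|---|---|---|---|---|---|---|---|---|---|
| Bad | 90 | 12 | 0 | 86 | 8 | 1 | 89 | 9 | 1 |
| Average | 13 | 436 | 171 | 18 | 446 | 136 | 27 | 439 | 210 |
| Good | 6 | 193 | 306 | 5 | 187 | 340 | 5 | 147 | 300 |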

into their collective influence on the perceived quality of the trip. However, when evaluating the correlation with joint changes
in the predictor variables, we found no clear interaction effects with the quality score. Consequently, recognizing the absence of
pronounced interactions, our focus is on a detailed exploration of the individual variables.
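For readers unfamiliar with partial dependence, the sketch below shows the basic computation in the sense of Hastie et al. (2009): one variable is forced to each value of a grid for every record, and the model's predicted probabilities are averaged over the records. Everything model-specific here is hypothetical: `toy_predict_proba` is a made-up stand-in for a fitted classifier, and the three records are invented solely so the code runs.

```python
def partial_dependence(predict_proba, rows, feature, grid):
    """Average predicted class probabilities with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        totals = {}
        for row in rows:
            probs = predict_proba({**row, feature: value})
            for label, p in probs.items():
                totals[label] = totals.get(label, 0.0) + p
        curve.append((value, {label: t / len(rows) for label, t in totals.items()}))
    return curve


def toy_predict_proba(row):
    """Hypothetical step-like classifier (NOT the paper's fitted model)."""
    p_good = 0.7
    if row["wait_min"] > 5:
        p_good -= 0.2
    if row["wait_min"] > 10:
        p_good -= 0.3
    if row["occupancy"] > 0.85:
        p_good -= 0.1
    p_bad = 0.3 * (1.0 - p_good)
    return {"Bad": p_bad, "Average": 1.0 - p_good - p_bad, "Good": p_good}


# Invented records, used only to exercise the function.
toy_rows = [
    {"wait_min": 3.0, "occupancy": 0.2},
    {"wait_min": 8.0, "occupancy": 0.5},
    {"wait_min": 12.0, "occupancy": 0.9},
]

curve = partial_dependence(toy_predict_proba, toy_rows, "wait_min", [2, 6, 12])
for value, probs in curve:
    print(value, {label: round(p, 3) for label, p in probs.items()})
```

Varying two variables jointly works the same way, with a grid of value pairs instead of single values.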

5.2. Effect of waiting time

Fig. 6 shows the waiting time effect on the predicted probability of the quality score. It can be observed that the probability
of each rating depends on the waiting time. A good rating is associated with low waiting times, while high waiting times are
related to a poor quality rating. This effect is captured in both the Ordered Logit model and the Random Forest model. However,
while the effect is smooth in the Ordered Logit model, in the Random Forest model the effect is uneven and non-linear with clear
breakpoints. For instance, the first drop in the probability of rating a trip as good occurs at 5 min of waiting time, and the
second drop at around 10 min. The same breakpoints can be identified in the probability of passengers starting to evaluate

Fig. 5. Variable importance, Random Forest final model (Left: Mean decrease accuracy, Right: Mean decrease Gini).

Fig. 6. Waiting time effect on predicted probability of the score by model (Left: Ordered logit, Right: Random forest).

their trip as bad. This effect might be related to the fact that passengers tend to perceive the waiting times worse than they actually
are (Fan et al., 2016).
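One simple way to read breakpoints off a step-like probability curve is to look for the largest drops between consecutive grid points. The sketch below does this on a hypothetical curve: the probability values are invented to mimic the shape described in the text (drops after 5 and 10 min) and are not taken from Fig. 6. The function name `largest_drops` is also hypothetical.

```python
def largest_drops(curve, n=2):
    """Return the n grid intervals (x0, x1) with the largest fall in probability."""
    drops = [
        (p0 - p1, x0, x1)
        for (x0, p0), (x1, p1) in zip(curve, curve[1:])
    ]
    drops.sort(reverse=True)
    return sorted((x0, x1) for _, x0, x1 in drops[:n])


# Hypothetical P(good) by waiting time in minutes (made-up values).
toy_curve = [
    (0, 0.62), (1, 0.62), (2, 0.61), (3, 0.61), (4, 0.60), (5, 0.60),
    (6, 0.45), (7, 0.45), (8, 0.44), (9, 0.44), (10, 0.43),
    (11, 0.25), (12, 0.25), (13, 0.24),
]

print(largest_drops(toy_curve, n=2))
```

A smooth curve such as the one produced by a logit model has no comparably dominant drops, which is why thresholds of this kind are easier to identify in the Random Forest output.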

5.3. Effect of bus occupancy

The second analysis focuses on bus occupancy. According to the Ordered Logit model, the effect of bus occupancy is similar to
the effect of the waiting time. However, the Random Forest model detects other behaviors in the probability of rating the quality
of the trip. For instance, passengers seem to tolerate up to 80%–85% occupancy before rating a trip as bad. Fig. 7 shows the bus
occupancy effect on the predicted probability of the score by each model. As can be observed, the Ordered Logit model predicts two
critical occupancy rates: one at approximately 20%, where the service starts degrading, and another at 75%, where the service becomes
poor. The Random Forest model, in contrast, predicts that bus occupancy only becomes critical at 85%; below that level,
the probability of a bad rating is almost constant, while the probability of a good rating starts to fall at 55% occupancy. These results
are more consistent with Bazaki et al. (2022), who mention that bus occupancy negatively affects the quality of the services when

Fig. 7. Bus occupancy effect on predicted probability of the score by model (Left: Ordered logit, Right: Random forest).

Fig. 8. Bus speed effect on predicted probability of the score by model (Left: Ordered logit, Right: Random forest).

it is above an acceptable threshold, which in this case would be 85%. It is worth remarking that this effect is not captured in other
models such as that of Weng et al. (2023), which finds an almost linear relationship between passenger satisfaction and load factor.

5.4. Effect of bus speed

According to the Ordered Logit model, bus speed has an almost linear effect (similar to Weng et al.
(2023)), while the Random Forest detects a non-smooth behavior. For instance, the model detects that the probability of rating the
quality of the trip as bad decreases when the speed increases from very low values to between 10 and 15 km/h (similar to Weng
et al. (2023)). However, this probability increases once the bus exceeds 15 km/h (see Fig. 8), which can be related to the perception
of loss of safety or risky driver behavior (Khoo and Ahmed, 2018).

6. Discussion

In this section, we discuss policy implications derived from the analysis in Section 5. The findings on waiting time
hold significance for public transportation strategies:

• For instance, the observed breakpoints at 5 and 10 min in the probability of rating a trip as good or bad emphasize the
critical importance of minimizing waiting times. This implies that policymakers should consider measures to optimize service

schedules, reduce waiting intervals, and enhance operational efficiency to positively influence passenger satisfaction. As Muñoz
et al. (2020) and Soza-Parra et al. (2019) point out, regularizing headways can be an effective tool for improving waiting times.
• Also, acknowledging the breakpoints as indicators of changing passenger perceptions implies that policymakers should adopt
a dynamic approach to service adjustments. Real-time monitoring and adaptive scheduling can help address fluctuations in
passenger satisfaction associated with specific waiting time thresholds. However, this requires the involvement of drivers, as
some drivers (typically the more experienced ones) tend to be more reluctant to use the technology, for example, onboard headway
control tools (Martínez-Estupiñan et al., 2023).
• Following Fan et al. (2016), to mitigate the impact of waiting times on passenger satisfaction, policymakers could invest in
amenities at transportation stops. Comfortable seating, shelter from inclement weather, and access to informational displays
may contribute to a more positive waiting experience, reducing the perceived inconvenience.

The second analysis, focusing on bus occupancy, has the following policy implications:

• The findings from both the Ordered Logit and Random Forest models reveal critical occupancy thresholds associated with
service degradation. Thus, policymakers should consider establishing operational guidelines that address these thresholds,
emphasizing strategies to manage and regulate bus occupancy levels to prevent service quality deterioration.
• The disparities in predicted critical occupancy rates between models require that policymakers adopt a dynamic approach to
occupancy management. Again, real-time monitoring systems such as the one proposed by Delgado et al. (2009) can be adapted
to facilitate proactive adjustments to avoid exceeding critical thresholds, ensuring a more responsive and passenger-centric
service.
• Building on the findings that passengers tolerate up to 80%–85% occupancy, policymakers can establish benchmark levels for
acceptable occupancy. This benchmark can guide operators in maintaining service quality within tolerable limits and serve as
a reference point for continuous improvement initiatives.

Finally, the policy implications of the observed findings on bus speed suggest that:

• The non-smooth behavior detected by the Random Forest model, particularly the increase in the probability of negative ratings
beyond a certain speed, indicates a potential perception of safety risks or risky driver behavior. Given that drivers might be
reluctant to use technology (Martínez-Estupiñan et al., 2023), policymakers should prioritize safety measures, such as
driver training programs, monitoring systems, and awareness campaigns, to address passenger concerns and enhance overall
satisfaction. Moreover, investment could be made in training programs for bus drivers focused on maintaining a balance
between efficiency and passenger comfort. Training initiatives that emphasize safe and customer-oriented driving practices
may contribute to a positive perception of the service. Ideally, these training initiatives should also consider some factors that
usually affect drivers such as driver demographics, driving experience, sleep state, psychological stress, and levels of anger
driving, among others (Wang et al., 2021).
• The observed non-linear relationship highlights the dynamic nature of passenger preferences regarding bus speed. Policymakers
should implement continuous monitoring systems to assess and adapt bus speed policies in response to changing passenger
perceptions and evolving safety considerations.

7. Conclusions

This paper has undertaken a comprehensive assessment of four distinct machine learning models – Decision Trees, SVM, Neural
Networks, and Random Forest – in the context of predicting rating scores for a bus public transportation service in Santiago de Chile.
The calibration of these models utilized a diverse dataset from three distinct sources: operational data derived from GPS and smart
card records, bus, and bus stop data extracted from the bus system database, and passenger evaluation scores along with associated
information gathered from the transportation system mobile app. The evaluation scores were categorized into bad service, average
service, and good service.
For comparison, we estimated an Ordinal Logit and a Multinomial Logit model, the former having been
previously employed by Donoso et al. (2013) for similar objectives within the same transportation system. Our comparative analysis
revealed that the machine learning models outperformed the choice-based statistical models in predicting the service level scores,
with Random Forest emerging as the most proficient among the machine learning models. Even when using the same variables
found significant by the Ordinal Logit model, the Random Forest model demonstrated superior predictive capabilities, as Random
Forest is more flexible in capturing non-linear relationships in the data.
Furthermore, we assessed the impact on score prediction of the three most influential variables identified by the
Random Forest model: waiting time, bus occupancy, and bus speed. Because these variables behave non-smoothly, the Random Forest
model, unlike the Ordinal Logit model, was able to pinpoint value thresholds. This capability is useful
in determining when the performance of the bus system is likely to be rated unfavorably. For instance, the model found
that passengers tolerate up to 80%–85% bus occupancy and that ratings deteriorate at waiting times of about 5 and 10 min. Another interesting
finding is that speed is valued only up to a certain value, implying some safety concerns among passengers. These findings shed light
on the improvements needed in the system.
However, some improvements can be included in future work. For instance, temporal dynamics can be included in the analysis
by exploring variations in service quality ratings across different periods and identifying potential patterns or trends. In addition,

the robustness and generalizability of the machine learning models need to be tested by conducting cross-validation studies across
diverse datasets and different geographical or operational contexts. Finally, we are currently enhancing the predictive capabilities
of the models by developing strategies for dynamic model updating that can adapt to evolving transportation system dynamics
and changing passenger preferences. As part of this work, we are developing an interactive tool that implements such dynamic model
updating. This system
would help identify routes, bus stops, and areas of the city that are providing low-quality service. This information would also
enable decision-makers to monitor and use their resources effectively.

CRediT authorship contribution statement

Elkin Ruiz: Data curation, Formal analysis, Software, Writing – original draft. Wilfredo F. Yushimito: Conceptualization, Formal
analysis, Methodology, Supervision, Writing – original draft, Writing – review & editing. Luis Aburto: Conceptualization, Formal
analysis, Methodology, Supervision, Writing – original draft, Writing – review & editing. Rolando de la Cruz: Conceptualization,
Formal analysis, Methodology, Writing – original draft.

Acknowledgments

Luis Aburto acknowledges partial support from the National Fund for Scientific and Technological Research of Chile (FONDECYT)
through grant No. 11220944. Rolando de la Cruz acknowledges partial support from ANID/PIA/Anillo ACT 210096.

References

Agarwal, R., 2008. Public transportation and customer satisfaction: The case of Indian railways. Global Bus. Rev. 9, 257–272. http://dx.doi.org/10.1177/
097215090800900206.
Aggarwal, C., 2014. Data Classification: Algorithms and Applications, first ed. Chapman & Hall/CRC.
Allen, J., Eboli, L., Mazzulla, G., Ortúzar, J., 2020. Effect of critical incidents on public transport satisfaction and loyalty: An ordinal probit SEM-MIMIC approach.
Transportation 47, 827–863. http://dx.doi.org/10.1007/s11116-018-9921-4.
Allen, J., Muñoz, J., Ortúzar, J., 2018. Modelling service-specific and global transit satisfaction under travel and user heterogeneity. Transp. Res. A 113, 509–528.
http://dx.doi.org/10.1016/j.tra.2018.05.009.
Archana, S., Elangovan, D.K., 2014. Survey of classification techniques in data mining. Int. J. Comput. Sci. Mob. Appl. 2 (2).
Bazaki, I., Gioldasis, C., Giannoulaki, M., Christoforou, Z., 2022. Transit quality of service assessment using smart data. Future Transp. 2 (2), 414–424.
http://dx.doi.org/10.3390/futuretransp2020023, URL: https://www.mdpi.com/2673-7590/2/2/23.
Bellizzi, M., dell’Olio, L., Eboli, L., Mazzulla, G., 2020. Heterogeneity in desired bus service quality from users’ and potential users’ perspective. Transp. Res. A.
132, 365–377. http://dx.doi.org/10.1016/j.tra.2019.11.029.
Breiman, L., 2001. Random forests. Mach. Learn. 45, 5–32. http://dx.doi.org/10.1023/A:1010933404324.
Burlando, C., Ivaldi, E., Musso, E., 2016. An indicator for measuring the perceived quality of local public transport: Relationship with use and satisfaction with
the ticket price. Int. J. Transp. Econ. 43, 451–473. http://dx.doi.org/10.19272/201606704003.
Chapman, P., Clinton, J., Kerber, R., Khabaza, T., Reinartz, T.P., Shearer, C., Wirth, R., 2000. CRISP-DM 1.0: Step-by-step data mining guide. URL: https:
//api.semanticscholar.org/CorpusID:59777418.
Chien, S.I.-J., 2005. Optimization of headway, vehicle size and route choice for minimum cost feeder service. Transp. Plan. Technol. 28 (5), 359–380.
http://dx.doi.org/10.1080/03081060500322565.
Chou, P.-F., Lu, C.-S., Chang, Y.-H., 2014. Effects of service quality and customer satisfaction on customer loyalty in high-speed rail services in Taiwan.
Transportmetrica A 10 (10), 917–945. http://dx.doi.org/10.1080/23249935.2014.915247.
Christmann, A., Steinwart, I., 2008. Support Vector Machines, first ed. Springer-Verlag, New York, US.
Daly, A., Tsang, F., Rohr, C., 2014. The value of small time savings for non-business travel. J. Transp. Econ. Policy 48 (2), 205–218, http://www.jstor.org/
stable/24396326.
de Oña, J., de Oña, R., Eboli, L., Mazzulla, G., 2016. Index numbers for monitoring transit service quality. Transport. Res. A 84, 18–30. http://dx.doi.org/10.
1016/j.tra.2015.05.018.
Deb, S., Ahmed, M.A., Das, D., 2022. Service quality estimation and improvement plan of bus service: A perception and expectation based analysis. Case Stud.
Transp. Policy 10, 1775–1789. http://dx.doi.org/10.1016/j.cstp.2022.07.008, URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85137138798.
Delgado, F., Muñoz, J.C., Giesen, R., Cipriano, A., 2009. Real-time control of buses in a transit corridor based on vehicle holding and boarding limits. Transp.
Res. Rec. 2090 (1), 59–67. http://dx.doi.org/10.3141/2090-07.
dell’Olio, L., Ibeas, A., Cecin, P., 2011. The quality of service desired by public transport users. Transp. Policy 18, 217–227. http://dx.doi.org/10.1016/j.tranpol.
2010.08.005.
dell’Olio, L., Ibeas, A., de Oña, J., de Oña, R., 2018a. Chapter 1 - Introduction. In: dell’Olio, L., Ibeas, A., de Oña, J., de Oña, R. (Eds.), Public Transportation
Quality of Service. Elsevier, pp. 1–6. http://dx.doi.org/10.1016/B978-0-08-102080-7.00001-X.
dell’Olio, L., Ibeas, A., de Oña, J., de Oña, R., 2018b. Chapter 6 - Most basic methods. In: dell’Olio, L., Ibeas, A., de Oña, J., de Oña, R. (Eds.), Public Transportation
Quality of Service. Elsevier, pp. 85–100. http://dx.doi.org/10.1016/B978-0-08-102080-7.00006-9, URL: https://www.sciencedirect.com/science/article/pii/
B9780081020807000069.
dell’Olio, L., Ibeas, A., de Oña, J., de Oña, R., 2018c. Chapter 7 - Methods based on random utility theory. In: dell’Olio, L., Ibeas, A., de Oña, J.,
de Oña, R. (Eds.), Public Transportation Quality of Service. Elsevier, pp. 101–139. http://dx.doi.org/10.1016/B978-0-08-102080-7.00007-0, URL: https:
//www.sciencedirect.com/science/article/pii/B9780081020807000070.
dell’Olio, L., Ibeas, A., de Oña, J., de Oña, R., 2018d. Chapter 8 - structural equation models. In: dell’Olio, L., Ibeas, A., de Oña, J., de Oña, R. (Eds.),
Public Transportation Quality of Service. Elsevier, pp. 141–154. http://dx.doi.org/10.1016/B978-0-08-102080-7.00008-2, URL: https://www.sciencedirect.
com/science/article/pii/B9780081020807000082.
dell’Olio, L., Ibeas, A., de Oña, J., de Oña, R., 2018e. Chapter 9 - Data mining approaches. In: dell’Olio, L., Ibeas, A., de Oña, J., de Oña, R. (Eds.), Public
Transportation Quality of Service. Elsevier, pp. 155–179. http://dx.doi.org/10.1016/B978-0-08-102080-7.00009-4, URL: https://www.sciencedirect.com/
science/article/pii/B9780081020807000094.

Donoso, P., Munizaga, M., Rivera, J., 2013. Measuring user satisfaction in transport services: Methodology and application. In: Zmud, J., Lee-Gosselin, M.,
Munizaga, M., Carrasco, J. (Eds.), Transport Survey Methods: Best Practice for Decision Making, first ed. Emerald Publishing, pp. 603–624.
Eboli, L., Forciniti, C., Mazzulla, G., 2020. Capturing the differences in perceiving service quality of metro passengers of Madrid. Eur. Transp. - Trasporti Europei.
Eboli, L., Mazzulla, G., 2007. Service quality attributes affecting customer satisfaction for bus transit. J. Public Transport. 10 (3), 21–34. http://dx.doi.org/10.
5038/2375-0901.10.3.2.
Eboli, L., Mazzulla, G., 2009. A new customer satisfaction index for evaluating transit service quality. J. Public Transport. 12, 21–37. http://dx.doi.org/10.5038/
2375-0901.12.3.2.
Eboli, L., Mazzulla, G., 2010. How to capture the passengers’ point of view on a transit service through rating and choice options. Transp. Rev. 30, 435–450.
http://dx.doi.org/10.1080/01441640903068441.
Eboli, L., Mazzulla, G., 2011. A methodology for evaluating transit service quality based on subjective and objective measures from the passenger’s point of
view. Transp. Policy 18, 172–181. http://dx.doi.org/10.1016/j.tranpol.2010.07.007.
Echaniz, E., dell’Olio, L., Ibeas, A., 2018. Modelling perceived quality for urban public transport systems using weighted variables and random parameters. Transp.
Policy 67, 31–39. http://dx.doi.org/10.1016/j.tranpol.2017.05.006, URL: https://www.sciencedirect.com/science/article/pii/S0967070X16304632.
Fan, Y., Guthrie, A., Levinson, D., 2016. Waiting time perceptions at transit stops and stations: Effects of basic amenities, gender, and security. Transp. Res. A
88, 251–264. http://dx.doi.org/10.1016/j.tra.2016.04.012, URL: https://www.sciencedirect.com/science/article/pii/S0965856416303494.
Garrido, C., de Oña, R., de Oña, J., 2014. Neural networks for analyzing service quality in public transportation. Expert Syst. Appl. 41, 6830–6838.
http://dx.doi.org/10.1016/j.eswa.2014.04.045.
Garver, M.S., 2003. Best practices in identifying customer-driven improvement opportunities. Ind. Market. Manag. 32 (6), 455–466. http://dx.doi.org/10.1016/
S0019-8501(02)00238-9.
Hastie, T., Tibshirani, R., Friedman, J., 2009. The Elements of Statistical Learning. In: Springer Series in Statistics, Springer, New York, NY, http://dx.doi.org/
10.1007/978-0-387-84858-7.
Haykin, S., 2008. Neural Networks and Learning Machines, third ed. Pearson, New York, US.
Hensher, D., Stopher, P., Bullock, P., 2003. Service quality - developing a service quality index in the provision of commercial bus contracts. Transp. Res. A 37,
499–517. http://dx.doi.org/10.1016/S0965-8564(02)00075-7.
Hodges, T., 2010. Public Transportation’s Role in Responding to Climate Change. Technical Report, U.S. Department of Transportation Federal Transit
Administration.
Jadhav, S.D., Channe, H.P., 2013. Comparative study of K-NN, naive Bayes and decision tree classification techniques. Int. J. Sci. Res.
Jöreskog, K.G., 1973. Analysis of covariance structures. In: Krishnaiah, P. (Ed.), Multivariate Analysis, III. Academic Press, New York, pp. 263–285.
Khoo, H.L., Ahmed, M., 2018. Modeling of passengers’ safety perception for buses on mountainous roads. Accid. Anal. Prev. 113, 106–116.
http://dx.doi.org/10.1016/j.aap.2018.01.025.
Lai, W.-T., Chen, C.-F., 2011. Behavioral intentions of public transit passengers-the roles of service quality, perceived value, satisfaction and involvement. Transp.
Policy 18, 318–325. http://dx.doi.org/10.1016/j.tranpol.2010.09.003.
Le, H., Carrel, A., Li, M., 2020. How much dissatisfaction is too much for transit? Linking transit user satisfaction and loyalty using panel data. Travel Behav.
Soc. 20, 144–154. http://dx.doi.org/10.1016/j.tbs.2020.03.007.
Lois, D., Monzón, A., Hernández, S., 2018. Analysis of satisfaction factors at urban transport interchanges: Measuring travellers’ attitudes to information, security
and waiting. Transp. Policy 67, 49–56. http://dx.doi.org/10.1016/j.tranpol.2017.04.004.
Lunke, E.B., 2020. Commuters’ satisfaction with public transport. J. Transp. Health 16, http://dx.doi.org/10.1016/j.jth.2020.100842.
Martínez-Estupiñan, Y., Delgado, F., Muñoz, J.C., Watkins, K.E., 2023. Understanding what elements influence a bus driver to use headway regularity tools: Case
study of Santiago public transit system. Transportmetrica A 19 (2), 2025950. http://dx.doi.org/10.1080/23249935.2022.2025950.
Munizaga, M., Palma, C., 2012. Estimation of a disaggregate multimodal public transport origin-destination matrix from passive smartcard data from Santiago,
Chile. Transp. Res. C 24, 9–18. http://dx.doi.org/10.1016/j.trc.2012.01.007.
Muñoz, J.C., Batarce, M., Hidalgo, D., 2014. Transantiago, five years after its launch. Res. Transport. Econ. 48, 184–193.
http://dx.doi.org/10.1016/j.retrec.2014.09.041, Competition and Ownership in Land Passenger Transport (selected papers from the Thredbo 13 conference).
Muñoz, J.C., Batarce, M., Torres, I., 2013. Comparación del nivel de servicio del transporte público de seis ciudades latinoamericanas. Congreso Chileno de
Ingeniería de Transporte (16), URL: https://revistas.uchile.cl/index.php/CIT/article/view/28452.
Muñoz, J.C., Soza-Parra, J., Raveau, S., 2020. A comprehensive perspective of unreliable public transport services’ costs. Transportmetrica A 16 (3), 734–748.
http://dx.doi.org/10.1080/23249935.2020.1720861.
de Oña, J., de Oña, R., 2014. Quality of service in public transport based on customer satisfaction surveys: A review and assessment of methodological approaches.
Transp. Sci. 49 (3), 605–622. http://dx.doi.org/10.1287/trsc.2014.0544.
de Oña, J., de Oña, R., Calvo, F., 2012. A classification tree approach to identify key factors of transit service quality. Expert Syst. Appl. 39, 11164–11171.
http://dx.doi.org/10.1016/j.eswa.2012.03.037.
Perucca, G., Salini, S., 2014. Travellers’ satisfaction with railway transport: A Bayesian network approach. Qual. Technol. Quant. Manag. 11 (1), 71–84.
http://dx.doi.org/10.1080/16843703.2014.11673326.
Quddus, M., Rahman, F., Monsuur, F., de Oña, J., Enoch, M., 2019. Analyzing bus passengers’ satisfaction in Dhaka using discrete choice models. Transp. Res. Rec. 2673,
758–768. http://dx.doi.org/10.1177/0361198119825846.
Quinlan, J.R., 1986. Induction of decision trees. Mach. Learn. 1 (1), 81–106. http://dx.doi.org/10.1007/BF00116251.
Saleem, M.A., Afzal, H., Ahmad, F., Ismail, H., Nguyen, N., 2023. An exploration and importance-performance analysis of bus rapid transit systems’ service
quality attributes: Evidence from an emerging economy. Transp. Policy 141, 1–13. http://dx.doi.org/10.1016/j.tranpol.2023.07.010.
Schmidt, A., Muñoz, J.C., Bucknell, C., Navarro, M., Simonetti, C., 2016. Increasing the speed: Case study from Santiago, Chile. Transp. Res. Rec. 2539 (1), 65–71.
http://dx.doi.org/10.3141/2539-08.
Soza-Parra, J., Cats, O., Carney, Y., Vanderwaart, C., 2019. Lessons and evaluation of a headway control experiment in Washington, D.C. Transp. Res. Rec. 2673
(8), 430–438. http://dx.doi.org/10.1177/0361198119845369.
Soza-Parra, J., Raveau, S., Muñoz, J., 2022. Public transport reliability across preferences, modes, and space. Transportation 49, 621–640.
http://dx.doi.org/10.1007/s11116-021-10188-2.
Stradling, S., Carreno, M., Rye, T., Noble, A., 2007. Passenger perceptions and the ideal urban bus journey experience. Transp. Policy 14, 283–292.
http://dx.doi.org/10.1016/j.tranpol.2007.02.003.
Taylor, S.A., 1997. Assessing regression-based importance weights for quality perceptions and satisfaction judgements in the presence of higher order
and/or interaction effects. J. Retailing 73 (1), 135–159. http://dx.doi.org/10.1016/S0022-4359(97)90018-X.
Transportation Research Board, 1999. A Handbook for Measuring Customer Satisfaction and Service Quality. Transit Cooperative Research Program (TCRP) Report
47, Transportation Research Board, Washington, DC.
Tsoleridis, P., Choudhury, C.F., Hess, S., 2022. Deriving transport appraisal values from emerging revealed preference data. Transp. Res. A 165, 225–245.
http://dx.doi.org/10.1016/j.tra.2022.08.016.
Tyrinopoulos, Y., Antoniou, C., 2008. Public transit user satisfaction: Variability and policy implications. Transp. Policy 15, 260–272.
http://dx.doi.org/10.1016/j.tranpol.2008.06.002.

van Cranenburgh, S., Wang, S., Vij, A., Pereira, F., Walker, J., 2022. Choice modelling in the age of machine learning - Discussion paper. J. Choice Model. 42,
100340. http://dx.doi.org/10.1016/j.jocm.2021.100340.
Wang, X., Jiao, Y., Huo, J., Li, R., Zhou, C., Pan, H., Chai, C., 2021. Analysis of safety climate and individual factors affecting bus drivers’ crash involvement
using a two-level logit model. Accid. Anal. Prev. 154, 106087. http://dx.doi.org/10.1016/j.aap.2021.106087.
Wardman, M., 1998. The value of travel time: A review of British evidence. J. Transport Econ. Policy 32 (3), 285–316, URL: http://www.jstor.org/stable/20053775.
Weng, J., Yu, J., Di, X., Lin, P., Wang, J.-J., Mao, L.-Z., 2023. How does the state of bus operations influence passengers’ service satisfaction? A method
considering the differences in passenger preferences. Transp. Res. A 174, 103734. http://dx.doi.org/10.1016/j.tra.2023.103734.
Wu, J., Yang, M., Rasouli, S., Xu, C., 2016. Exploring passenger assessments of bus service quality using Bayesian networks. J. Public Transport. 19 (3), 36–54.
http://dx.doi.org/10.5038/2375-0901.19.3.3.
