
Estimating Market Liquidity from Daily Data: Marrying Microstructure Models and Machine Learning

Yuehao Dai∗, Chao Shi†, and Ruixun Zhang‡

September 15, 2024

Abstract

We apply machine learning to estimate daily measures of market liquidity by combining classical microstructure models with widely available low-frequency (daily) data, in the US and Chinese stock markets. Boosting trees and neural networks significantly improve the performance across different liquidity measures, including those that extend beyond the initial targets of microstructure models, particularly in terms of cross-sectional correlations. Our machine learning models are interpretable, and the improvements are due to (a) more information from raw data that microstructure models do not capture; and (b) better utilization of information from learned nonlinear and non-monotonic relationships, allowing microstructure models to contribute only when relevant. We make our software and learned models publicly available to disseminate improved estimates of liquidity measures using daily data.

Keywords: Liquidity; Bid-ask spread; Microstructure; Machine learning; Interpretability

JEL Classification: C45, G12, G14, G15


∗ Peking University School of Mathematical Sciences and Laboratory for Mathematical Economics and Quantitative Finance.
† Shanghai University of Finance and Economics School of Information Management and Engineering and Key Laboratory of Interdisciplinary Research of Computation and Economics (Ministry of Education).
‡ Peking University School of Mathematical Sciences, Laboratory for Mathematical Economics and Quantitative Finance, Center for Statistical Science, and National Engineering Laboratory for Big Data Analysis and Applications; MIT Laboratory for Financial Engineering. Please direct all correspondence to [email protected] (email), +86-13615710256 (phone), or 5 Yiheyuan Road, Peking University Zhihua Building 472, Beijing, China 100871 (postal).
Contents

1 Introduction
2 Related Literature
3 Methodology
  3.1 Target Variables
  3.2 Input Features
  3.3 Machine Learning Models
  3.4 Performance Evaluation
4 Performance of Liquidity Estimation
  4.1 Data
  4.2 Influence of Price and Grouping Stocks
  4.3 Feature Correlation
  4.4 Performance of Microstructure Models
  4.5 Performance of Machine Learning Models
5 Model Explanation
  5.1 Coefficients for Linear Regression
  5.2 Feature Importance for GR-Net
  5.3 Partial Dependence for GR-Net
  5.4 Shapley Additive Explanations
6 Conclusion
Online Appendix
A Microstructure Models
B Additional Results for the Performance of Microstructure Models
C Additional Results for the Performance of Machine Learning Models
D Shapley Values
1 Introduction
The role of market liquidity in finance has grown rapidly over the past decade, influencing studies in
microstructure, asset pricing, market efficiency, and corporate finance. There is a growing literature
on the liquidity premium in asset pricing (Amihud, 2002; Asness, Moskowitz, and Pedersen, 2013;
Bali et al., 2014; Bekaert et al., 2014; Barardehi et al., 2022) and the interaction between liquidity
and market structure (Foucault, Kadan, and Kandel, 2005; Henkel, 2008; De Nicolò and Ivaschenko,
2009; Clark, 2011; Hollifield, Neklyudov, and Spatt, 2017; Baron et al., 2019; Li, Wang, and Ye,
2021).
Numerous measures have been proposed to summarize various aspects of market liquidity. For
example, the bid-ask spread measures the cost of a market order for immediate execution, while
the effective bid-ask spread measures the average distance between the transaction price and the
mid-price, both of which are crucial elements of the execution cost of investment strategies (O’Hara,
Saar, and Zhong, 2019; Hagströmer, 2021). The realized spread measures the profit or loss for a
liquidity provider that is assumed to close her position sometime after the trade (Fong, Holden,
and Trzcinka, 2017). Kyle’s lambda reflects the level of adverse selection and private information
in the market (Kyle, 1985; Glosten and Milgrom, 1985; Easley et al., 1996).
However, accurate measurements of many liquidity measures rely on high-frequency intraday
data, such as snapshots of the limit order book or even tick-level trade and quote data, leading
to at least two challenges. First, high-frequency data have only become available in recent years; for earlier periods, and for markets that are less electronic, they are very difficult, if not impossible, to obtain. Second, intraday high-frequency data are typically enormous in size, requiring significant computational, storage, and financial costs.
To address these challenges, there is a growing literature on models of market microstructure
that estimate certain liquidity measures, such as the average daily bid-ask spread, using widely
available low-frequency data (Roll, 1984; Lesmond, Ogden, and Trzcinka, 1999; Hasbrouck, 2004;
Goyenko, Holden, and Trzcinka, 2009; Corwin and Schultz, 2012; Abdi and Ranaldo, 2017). While
these models provide valuable insights into market microstructure, they share a common underlying structure: they take daily or monthly data, such as close prices, as input and estimate average daily or monthly liquidity measures, which allows for easy implementation by all researchers and investors.
In this paper, motivated by this common structure, we apply data-driven machine learning to
improve the estimation of various market liquidity measures, including the bid-ask spread, effective
spread, realized spread, mid-price impact, and Kyle’s lambda, by combining existing microstructure
models in the literature with raw stock-level features, both of which are easily accessible using widely
available daily data. We evaluate model performance using both cross-sectional and time-series
correlations between model estimates and the ground truth of liquidity measures. Our study covers
a large panel of 992 stocks in the US and 2,081 stocks in the Chinese markets, over approximately
750 trading days from 2019 to 2021. We compare results between the US and Chinese markets, the
two biggest stock markets in the world.

We first test microstructure models in the literature, including Roll’s (1984) method, the LOT
method of Lesmond, Ogden, and Trzcinka (1999) and Goyenko, Holden, and Trzcinka (2009),
the Gibbs method of Hasbrouck (2004), the High-Low method of Corwin and Schultz (2012),
the Close-High-Low method of Abdi and Ranaldo (2017), the Amihud measure of Amihud (2002), and the Amivest measure of Goyenko, Holden, and Trzcinka (2009). We find
that their performances vary across different target variables, markets, and types of correlations.
In general, microstructure models perform better when estimating the (relative) spread in the US
market. Their performances are particularly concerning when estimating other liquidity measures
that extend beyond the initial targets of these microstructure models, sometimes delivering negative
correlations between the estimates and the ground truth. These findings suggest that microstructure
models may be relevant only in certain scenarios, and the dynamic relationship may be better
captured by data-driven machine learning models.
We then apply four statistical and machine learning models to improve the performance of
liquidity estimation, including linear regression, gradient boosting trees, fully-connected neural
networks, and a more advanced neural network with a gate and residual structure (GR-Net). We
combine the output of microstructure models (as features) with raw stock-level features, including
daily open, high, low, and close prices, daily trading volumes, daily market capitalization, daily
close-to-close return, overnight return, and rolling volatility.
Our main findings are three-fold. First, machine learning models that combine only microstruc-
ture features outperform individual microstructure models. Second, combining raw and microstruc-
ture features further improves the performance. Third, more complex models outperform simpler
ones, driven by nonlinear and non-monotonic relationships between the target variable and input
features. The conclusions hold when the target variable is evaluated at both daily and monthly
resolutions.
Finally, to understand what drives the performance improvement of machine learning models
over stand-alone microstructure models, we use partial dependence plots (Friedman, 2001) and
Shapley additive explanations (Shapley, 1997; Lundberg and Lee, 2017) to analyze feature impor-
tance as well as the learned patterns of machine learning models. First, machine learning models
utilize information from raw features that individual microstructure models do not capture, which
enlarges the information set for our problem. Second, machine learning models learn nonlinear and
non-monotonic relationships between features and the target variable, which allows microstructure
models to contribute even if they are relevant only in certain regions. This improves our ability to
better utilize information for the problem.
These results demonstrate the potential of machine learning in estimating market liquidity using
widely available low-frequency data in an interpretable way.

2 Related Literature
As mentioned above, our study contributes to the literature on using low-frequency data to estimate liquidity measures that require high-frequency intraday data to compute accurately. Classical
microstructure models include Roll’s (1984) method, the LOT method of Lesmond, Ogden, and
Trzcinka (1999) and Goyenko, Holden, and Trzcinka (2009), the Gibbs method of Hasbrouck (2004),
the High-Low method of Corwin and Schultz (2012), and the Close-High-Low method of Abdi and
Ranaldo (2017). Ardia, Guidotti, and Kroencke (2024) propose an estimator of the effective spread
from open, high, low, and close prices. Goyenko, Holden, and Trzcinka (2009), Schestag, Schuster,
and Uhrig-Homburg (2016), Fong, Holden, and Trzcinka (2017), and Jahan-Parvar and Zikes (2023)
analyze the effectiveness of these methods extensively in the equity, fixed income, and foreign
exchange markets. To the best of our knowledge, we are the first to apply machine learning to this
problem, and we demonstrate that machine learning models can significantly improve performance
compared to microstructure models.
There is an emerging literature on the interface between machine learning and microstructure.
Using data from the futures market, Easley et al. (2021) apply random forests to predict the sign of
change in the bid-ask spread computed using the HL estimator of Corwin and Schultz (2012), among other
quantities, using a range of microstructure measures. Our study differs from Easley et al. (2021) fundamentally: instead of predicting the change in the spread, we estimate the level of the spread using only low-frequency data. The predictive accuracy is around 0.5 in Easley et al. (2021), whereas the correlation in our study can be as high as 0.92 when estimating the spread. Other differences between our study and Easley et al. (2021) include that we adopt neural networks in addition to tree-based models, and that we use partial dependence plots and Shapley additive explanations to explain our black-box models.
Other studies apply machine learning and deep learning to extract universal patterns in limit
order books (Sirignano, 2019; Sirignano and Cont, 2019), model behaviors of automated market
makers and algorithmic traders (Colliard, Foucault, and Lovo, 2022; Dou, Goldstein, and Ji, 2023),
and predict asset returns (Huck, 2019; Aït-Sahalia et al., 2022; Cont, Cucuringu, and Zhang, 2023;
Kolm, Turiel, and Westray, 2023).
More generally, there is a rapidly growing literature on using machine learning for asset pric-
ing. Examples include Chinco, Clark-Joseph, and Ye (2019), Gu, Kelly, and Xiu (2020), Bianchi,
Büchner, and Tamoni (2021), Avramov, Cheng, and Metzker (2023), Bali et al. (2023), Brogaard
and Zareei (2023), Chen, Pelger, and Zhu (2023), Huddleston, Liu, and Stentoft (2023), and Jiang,
Kelly, and Xiu (2023). Leippold, Wang, and Zhou (2022) also focus on the Chinese market. Another
strand of the literature focuses on forecasting loan defaults using machine learning. Examples in-
clude Sadhwani, Giesecke, and Sirignano (2021), Barbaglia, Manzan, and Tosetti (2023), and Davis
et al. (2023). Fuster et al. (2022) study the fairness of applying machine learning to credit markets.
Despite this growing literature, we are the first to systematically study the performance of ma-
chine learning models for liquidity estimation using only low-frequency data. Our results demon-
strate the potential for significant performance improvements, making it practically relevant and

easily accessible to researchers and investors.

3 Methodology
Consider N stocks indexed by n = 1, 2, · · · , N on D trading days indexed by d = 1, 2, · · · , D. In
the spirit of the recent literature on machine learning applications in asset pricing (Gu, Kelly, and
Xiu, 2020), we formulate the problem of liquidity estimation for the n-th stock on day d as an
additive error model:
$$s_{n,d} = \mathbb{E}[s_{n,d}] + \varepsilon_{n,d} = g^*(x^*_{n,d}) + \varepsilon_{n,d}, \tag{1}$$

where $s_{n,d}$ is the liquidity measure, such as the average spread, of the n-th stock on day d, which is a random variable whose expectation is given by a function $g^*(\cdot)$ of a set of features, $x^*_{n,d}$. Here $\varepsilon_{n,d}$ is the residual error. We require that $x^*_{n,d}$ only contains information up to day d so that the liquidity does not depend on future information. For example, $x_{n,d}$ should not contain the close price on day d + 1. Here, $g^*$ is the unknown ground truth that determines the average liquidity as a function of all features $x^*_{n,d}$, which may be a highly complex, nonlinear, and high-dimensional function.
The objective of our paper is to estimate the expectation of the liquidity measure, $\mathbb{E}[s_{n,d}]$, or the unknown function, $g^*(\cdot)$, by a model $g(\cdot)$ that only uses features relying on low-frequency data, so that the estimation is efficient and widely accessible to anyone with only low-frequency data. In other words, the features we use, $x_{n,d} \subseteq x^*_{n,d}$, form a subset of all features relevant for liquidity; the latter may contain information only available at high frequency that is not easily accessible:

$$g^*(x^*_{n,d}) \longleftarrow g(x_{n,d}). \tag{2}$$

We make a few remarks about the function g. First, the functional form of g reflects our specific model, including linear models and nonlinear models such as boosting trees or neural networks. Second, g does not depend on n or d because we want to learn a universal model that reflects the common relationship for all stocks on all trading days; stock-specific characteristics are reflected in the features $x_{n,d}$. Third, when estimating the liquidity $\mathbb{E}[s_{n,d}]$ for a stock, g only uses the input $x_{n,d}$ that relies on that stock's own information and does not use information from other stocks, making the model more contained and computationally realistic in practice.
In the following, Section 3.1 defines the target variables. Section 3.2 describes input features
to our models, which contain both raw stock-level features and microstructure models. Section 3.3
describes the machine learning models we employ. Section 3.4 defines the performance evaluation
criteria.

3.1 Target Variables


In this study, we analyze ten liquidity measures as the target variable in the model (1), including the daily averages of (relative) quoted spreads, (relative) effective spreads, (relative) realized spreads, and (relative) mid-price impacts, along with Kyle's lambda on both the mid-price and the trade price. These measures are based on tick-level intraday data, which is not readily accessible to researchers and investors.
To describe the computation of these liquidity measures, we establish a set of notational conventions. First, we use the subscript k = 1, 2, . . . , K to denote the sequence of trade event timestamps within a trading day, which index each instance that a trade is executed. Although K reflects the total number of trade events for a given stock on a specific trading day, which varies across stocks and days, we consistently use the symbol K throughout our analysis for notational simplicity. Then, for the n-th stock on the d-th trading day, we define $\mathrm{Ask}_{n,d,k}$ and $\mathrm{Bid}_{n,d,k}$ as the best ask and bid prices at the k-th timestamp, respectively. The mid-price is then calculated as $\mathrm{Mid}_{n,d,k} = (\mathrm{Ask}_{n,d,k} + \mathrm{Bid}_{n,d,k})/2$. We further denote $\mathrm{Trade}_{n,d,k}$ and $\mathrm{Volume}_{n,d,k}$ as the trade price and volume associated with the k-th trade event, respectively. The trade direction is captured by $\mathrm{sign}_{n,d,k}$, which assigns a value of +1 for trades triggered by buy orders and −1 for those triggered by sell orders. Consequently, we define the liquidity measures of the n-th stock on the d-th trading day as follows.

Average Quoted Spread. The average quoted spread is defined as

$$\mathrm{AQS}_{n,d} := \left(\sum_{k=1}^{K} \mathrm{Volume}_{n,d,k}\right)^{-1} \sum_{k=1}^{K} \left(\mathrm{Ask}_{n,d,k} - \mathrm{Bid}_{n,d,k}\right) \cdot \mathrm{Volume}_{n,d,k}. \tag{3}$$

Average Relative Quoted Spread. The average relative quoted spread is defined as

$$\mathrm{ARQS}_{n,d} := \left(\sum_{k=1}^{K} \mathrm{Volume}_{n,d,k}\right)^{-1} \sum_{k=1}^{K} \frac{\mathrm{Ask}_{n,d,k} - \mathrm{Bid}_{n,d,k}}{\mathrm{Mid}_{n,d,k}} \cdot \mathrm{Volume}_{n,d,k}. \tag{4}$$

Effective Spread. The effective spread is defined as

$$\mathrm{ES}_{n,d} := \left(\sum_{k=1}^{K} \mathrm{Volume}_{n,d,k}\right)^{-1} \sum_{k=1}^{K} 2\,\left|\mathrm{Trade}_{n,d,k} - \mathrm{Mid}_{n,d,k}\right| \cdot \mathrm{Volume}_{n,d,k}. \tag{5}$$

Relative Effective Spread. The relative effective spread is defined as

$$\mathrm{RES}_{n,d} := \left(\sum_{k=1}^{K} \mathrm{Volume}_{n,d,k}\right)^{-1} \sum_{k=1}^{K} \frac{2\,\left|\mathrm{Trade}_{n,d,k} - \mathrm{Mid}_{n,d,k}\right|}{\mathrm{Mid}_{n,d,k}} \cdot \mathrm{Volume}_{n,d,k}. \tag{6}$$

Realized Spread. The realized spread is defined as

$$\mathrm{RS}_{n,d} := \frac{1}{K}\sum_{k=1}^{K} \mathrm{sign}_{n,d,k} \cdot \left(\mathrm{Trade}_{n,d,k} - \mathrm{Trade}^{(5)}_{n,d,k}\right), \tag{7}$$

where $\mathrm{Trade}^{(5)}_{n,d,k}$ is the trade price five minutes after the k-th trade.

Relative Realized Spread. The relative realized spread is defined as

$$\mathrm{RRS}_{n,d} := \frac{1}{K}\sum_{k=1}^{K} \mathrm{sign}_{n,d,k} \cdot \frac{\mathrm{Trade}_{n,d,k} - \mathrm{Trade}^{(5)}_{n,d,k}}{\mathrm{Mid}_{n,d,k}}. \tag{8}$$

Mid-price Impact. The mid-price impact is defined as

$$\mathrm{MPI}_{n,d} := \frac{1}{K}\sum_{k=1}^{K} \mathrm{sign}_{n,d,k} \cdot \left(\mathrm{Mid}^{(5)}_{n,d,k} - \mathrm{Mid}_{n,d,k}\right), \tag{9}$$

where $\mathrm{Mid}^{(5)}_{n,d,k}$ is the mid-price five minutes after the k-th trade.

Relative Mid-price Impact. The relative mid-price impact is defined as

$$\mathrm{RMPI}_{n,d} := \frac{1}{K}\sum_{k=1}^{K} \mathrm{sign}_{n,d,k} \cdot \frac{\mathrm{Mid}^{(5)}_{n,d,k} - \mathrm{Mid}_{n,d,k}}{\mathrm{Mid}_{n,d,k}}. \tag{10}$$

Kyle's lambda on Trade Price. Kyle's lambda on trade price is defined as the slope coefficient λ of the regression model

$$\mathrm{Trade}^{(5)}_{n,d,k} - \mathrm{Trade}_{n,d,k} = \lambda \cdot \mathrm{Vol}_{n,d,k} + \varepsilon_{n,d,k}, \tag{11}$$

where $\mathrm{Vol}_{n,d,k}$ is the signed square root of trading volume during the five minutes after the k-th trade. That is,

$$\mathrm{Vol}_{n,d,k} = \sum_{k'} \mathrm{sign}_{n,d,k'} \sqrt{\mathrm{Volume}_{n,d,k'}},$$

where k' ranges over the timestamps within the five minutes after timestamp k.

Kyle's lambda on Mid-Price. Kyle's lambda on mid-price is the slope coefficient λ of the regression model

$$\mathrm{Mid}^{(5)}_{n,d,k} - \mathrm{Mid}_{n,d,k} = \lambda \cdot \mathrm{Vol}_{n,d,k} + \varepsilon_{n,d,k}. \tag{12}$$

The target variables defined in (3)–(12) are computed at a daily resolution.
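To make these definitions concrete, here is a minimal Python sketch of the volume-weighted measures (3)–(6) for one stock on one day. It is a sketch only: the DataFrame layout and column names (ask, bid, trade, volume, one row per trade event) are illustrative assumptions, and the measures (7)–(12) additionally require five-minute-ahead prices and signed volumes from the tick data.

```python
import pandas as pd

def spread_measures(ticks: pd.DataFrame) -> dict:
    """Volume-weighted spread measures, eqs. (3)-(6), for one stock-day.
    `ticks` is assumed to hold one row per trade event with columns
    ask, bid, trade, and volume."""
    mid = (ticks["ask"] + ticks["bid"]) / 2
    w = ticks["volume"] / ticks["volume"].sum()   # volume weights
    qs = ticks["ask"] - ticks["bid"]              # quoted spread per event
    es = 2 * (ticks["trade"] - mid).abs()         # effective spread per event
    return {
        "AQS": (qs * w).sum(),          # average quoted spread, eq. (3)
        "ARQS": (qs / mid * w).sum(),   # average relative quoted spread, eq. (4)
        "ES": (es * w).sum(),           # effective spread, eq. (5)
        "RES": (es / mid * w).sum(),    # relative effective spread, eq. (6)
    }
```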

3.2 Input Features


We use two categories of input features to our model, including nine raw features widely available
for each stock on each day, and the output of eight classical microstructure models that provide
estimates of liquidity based on daily data. Table 1 summarizes the 17 features we use in our
empirical study, and we provide details of their definitions below.

Table 1: Definitions of daily features.

Feature              Definition
Roll                 The Roll method of Roll (1984), calculated in (13)
LOT X-split          The LOT X-split method of Lesmond, Ogden, and Trzcinka (1999), calculated in (30) and (31)
LOT Y-split          The LOT Y-split method of Goyenko, Holden, and Trzcinka (2009), calculated in (30) and (32)
Gibbs                The Gibbs method of Hasbrouck (2004)
HL                   The High-Low method of Corwin and Schultz (2012), calculated in (37)
CHL                  The Close-High-Low method of Abdi and Ranaldo (2017), calculated in (14)
Amihud               The Amihud (2002) measure, calculated in (15)
Amivest              The Amivest measure proposed by Goyenko, Holden, and Trzcinka (2009), calculated in (16)
log open             Log of the daily open price
log close            Log of the daily close price
log high             Log of the daily high price
log low              Log of the daily low price
log volume           Log of the daily trading volume
log capitalization   Log of the market capitalization calculated with the daily close price
daily return         Daily return calculated from consecutive close prices
overnight return     Overnight return calculated from the open price and the previous day's close price
volatility           Volatility calculated from the previous 20 days' daily returns

3.2.1 Raw Features

We use nine raw features as input for machine learning models, including the open price, close
price, high price, low price, volume, market capitalization, daily return, overnight return, and 20
days’ volatility for each stock on each day. These nine features are easily accessible in most stock
markets around the world.

3.2.2 Microstructure Models

Microstructure models derive estimators for some specific market liquidity measures by assum-
ing specific dynamics of stock prices or returns. We summarize several classical microstructure
models below, and the mathematical definitions of these microstructure models are documented in
Appendix A.
A common price formation mechanism in microstructure models assumes that the observed
stock price equals the true price plus or minus a half spread, depending on the trade direction. By
modeling the dynamics of the true price, one can derive various estimators for the spread. Roll
(1984) assumes that the directions of two consecutive trades are independent, and hence, the true
price is a martingale, leading to a negative autocovariance of consecutive observed price changes.
Consequently, Roll (1984) estimates the spread using:
$$\mathrm{Roll} = 2\sqrt{-\mathrm{Cov}[\Delta C_t, \Delta C_{t-1}]}, \tag{13}$$

where $\Delta C_t = C_t - C_{t-1}$ is the change in daily close prices and $\mathrm{Cov}[\Delta C_t, \Delta C_{t-1}]$ is the autocovariance of $\Delta C_t$. Hasbrouck (2004) follows Roll (1984)'s model and implements a Bayesian method to estimate the spread. Corwin and Schultz (2012) estimate the relative spread using daily high and low prices, based on the intuition that the high price results from a buy order and the low price from a sell order, such that the observed price range equals the true price range plus the spread. Corwin and Schultz (2012) thereby assume that the true stock price follows a geometric Brownian motion and estimate the relative spread by solving a set of equations. Abdi and Ranaldo (2017) further incorporate high, low, and close prices to propose an estimator for the spread:

$$\mathrm{CHL} = \sqrt{\mathbb{E}\left[(2C_t - H_t - L_t)(2C_t - H_{t+1} - L_{t+1})\right]}. \tag{14}$$
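In code, the Roll estimator (13) and the CHL estimator (14) are one-liners over daily price series. The sketch below uses pandas; the handling of invalid cases (a positive autocovariance, or a negative expectation under the square root) is an illustrative choice, and the paper's exact implementations are documented in Appendix A.

```python
import numpy as np
import pandas as pd

def roll_estimator(close: pd.Series) -> float:
    """Roll (1984) spread estimator, eq. (13), from daily close prices."""
    dc = close.diff().dropna()              # price changes ΔC_t
    autocov = dc.cov(dc.shift(1))           # Cov[ΔC_t, ΔC_{t-1}]
    return 2 * np.sqrt(-autocov) if autocov < 0 else np.nan

def chl_estimator(close: pd.Series, high: pd.Series, low: pd.Series) -> float:
    """Abdi and Ranaldo (2017) CHL estimator, eq. (14)."""
    x = 2 * close - high - low                      # 2C_t - H_t - L_t
    y = 2 * close - high.shift(-1) - low.shift(-1)  # 2C_t - H_{t+1} - L_{t+1}
    m = (x * y).mean()
    return np.sqrt(m) if m > 0 else np.nan
```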

Instead of modeling the stock price, Lesmond, Ogden, and Trzcinka (1999) model the stock
return. They assume that investors buy when the expected return exceeds the spread and sell
when the expected return is below the negative spread, implying that the spread information is
embedded in the difference between observed and true returns. Consequently, Lesmond, Ogden,
and Trzcinka (1999) propose an estimator for the relative spread based on this assumption, known
as the LOT-X estimator. Goyenko, Holden, and Trzcinka (2009) offer a variant of this model, the
LOT-Y estimator.
Beyond spread and relative spread estimation, Amihud (2002) introduces a proxy for market liquidity:

$$\mathrm{Amihud} = \mathbb{E}\left[\frac{|R_t|}{\mathrm{Volume}_t}\right], \tag{15}$$

where $R_t$ is the return on day t and $\mathrm{Volume}_t$ is the trading volume on day t. Goyenko, Holden, and Trzcinka (2009) also propose the Amivest measure for market impact:

$$\mathrm{Amivest} = \mathbb{E}\left[\frac{\mathrm{Volume}_t}{|R_t|}\right]. \tag{16}$$
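Both proxies are simple sample means over daily observations, as in the following sketch; dropping zero-return days in the Amivest denominator is an assumption we make here to avoid division by zero.

```python
import pandas as pd

def amihud_measure(ret: pd.Series, volume: pd.Series) -> float:
    """Amihud (2002) illiquidity proxy, eq. (15): E[|R_t| / Volume_t]."""
    return (ret.abs() / volume).mean()

def amivest_measure(ret: pd.Series, volume: pd.Series) -> float:
    """Amivest liquidity measure, eq. (16): E[Volume_t / |R_t|].
    Zero-return days are dropped (an illustrative assumption)."""
    nonzero = ret.abs() > 0
    return (volume[nonzero] / ret.abs()[nonzero]).mean()
```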

3.3 Machine Learning Models


We consider four models to estimate the liquidity measures defined in Section 3.1: linear regression, boosting trees, a fully connected neural network, and a more complex neural network with gate and residual structures, which we refer to as GR-Net. Linear regression is one of the simplest statistical learning methods, which we include as a benchmark for all models. The other three machine learning models can capture nonlinearities and interactions in the input features. We use well-known software packages to implement our models, including PyTorch for linear models and neural networks, and LightGBM for boosting trees.

3.3.1 Linear Regression

A linear regression implies that g(·) in (2) has the form

$$g(x_{d,n}; \beta) = \beta^\top x_{d,n},$$

where β is the vector of regression parameters, and we add a unit variable to the feature set $x_{d,n}$ so that β contains the intercept term.

3.3.2 Boosting Tree

Boosting trees are a category of additive ensemble models that use regression or decision trees as weak learners. They have demonstrated remarkable effectiveness in various learning tasks. A boosting tree can be expressed as

$$g(x_{d,n}) = \sum_{m=1}^{M} f_m(x_{d,n}),$$

where M is the number of trees and each $f_m$ is a simple tree. In our implementation, we set M = 100, and each $f_m$ is a binary regression tree with a maximum depth of 5.
Compared to linear regression, trees exhibit a high degree of nonlinearity. They partition the sample space into several segments and approximate the unknown function with a simple function, such as a constant or a linear function, within each segment.
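For reference, this boosting-tree configuration can be set up in a few lines with LightGBM, the package we use. The snippet below is a sketch under stated assumptions: X_train, y_train, and X_test are placeholders for the input features and the target liquidity measure, and all other hyperparameters are left at the library defaults.

```python
import lightgbm as lgb

# Boosting tree with M = 100 trees, each a regression tree of maximum
# depth 5, trained on the squared-error (MSE) loss.
model = lgb.LGBMRegressor(
    n_estimators=100,        # M = 100 weak learners
    max_depth=5,             # maximum depth of each tree
    objective="regression",  # squared-error loss
)
model.fit(X_train, y_train)    # X_train, y_train: placeholder data
s_hat = model.predict(X_test)  # estimated liquidity measure
```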

3.3.3 Fully Connected Neural Network

Fully connected neural networks are the simplest class of neural network-based models, and they can approximate very general functions (Cybenko, 1989; Hornik, 1991). Mathematically, a fully connected neural network can be written recursively as

$$x^{(k)}_{d,n} = f_k\left(x^{(k-1)}_{d,n}\right), \quad k = 1, 2, \cdots, K, \qquad x^{(0)}_{d,n} = x_{d,n}, \tag{17}$$

where K is the total number of hidden layers, and the function $f_k$ has the form

$$f_k\left(x^{(k-1)}_{d,n}\right) = \mathrm{act}\left(W_k x^{(k-1)}_{d,n} + b_k\right), \tag{18}$$

where act(·) is a nonlinear activation function, examples of which include ReLU, sigmoid, and tanh, and $W_k$ and $b_k$ are network parameters. Finally, the last layer of the neural network gives the prediction using a linear function:

$$f_{K+1}\left(x^{(K)}_{d,n}\right) = W x^{(K)}_{d,n}, \tag{19}$$

where W is also a model parameter. Therefore, the final prediction $g(x_{d,n})$ in (2) is given by (17)–(19). In our implementation, we set K = 2 and use 4 neurons and the ReLU activation function in each hidden layer.
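A minimal PyTorch sketch of the network in (17)–(19) follows, with K = 2 hidden layers of 4 ReLU neurons each and a bias-free linear output layer to match (19); n_features, a placeholder for the width of the input $x_{d,n}$, is the only assumption.

```python
import torch
import torch.nn as nn

class FCN(nn.Module):
    """Fully connected network in (17)-(19): K = 2 hidden layers,
    4 ReLU neurons each, and a bias-free linear output layer."""
    def __init__(self, n_features: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 4), nn.ReLU(),  # f_1, eq. (18)
            nn.Linear(4, 4), nn.ReLU(),           # f_2, eq. (18)
            nn.Linear(4, 1, bias=False),          # f_{K+1}, eq. (19)
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)
```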

3.3.4 GR-Net

Over the last decades, several new model architectures have proven very effective in improving the performance of neural networks, such as the gate structure (Hochreiter and Schmidhuber, 1997) and the residual structure (He et al., 2016). We combine both to develop the gate-residual network, or GR-Net for short.
Mathematically, a gate structure can be written as

$$\mathrm{gate}(x) = x \odot \mathrm{Sigmoid}(W x + b),$$

where "⊙" denotes the element-wise (Hadamard) product. A gate structure multiplies a scalar to each input and can therefore be viewed as a feature selector: the more important the feature is, the closer the scalar is to 1. In comparison to traditional methods of feature selection such as the LASSO (Tibshirani, 1996), which shrinks the regression coefficient of a feature based only on the feature itself, the gate structure is more flexible because it determines the coefficient of each feature based on all input features, thereby capturing interactions between different features when assessing the importance of a feature.
A residual structure can be written as

$$\mathrm{res}(x) = x + \mathrm{act}(W x + b).$$

The residual structure allows the model to learn an identity mapping, which allows gradients to be effectively propagated from deep layers to shallow layers where minimal nonlinear relationships exist. We therefore employ a residual structure in the last hidden layer, which allows the model to learn an identity mapping if nonlinear relationships are well learned in previous hidden layers.
The GR-Net is, therefore, written recursively as

$$x^{(k)}_{d,n} = h_k\left(x^{(k-1)}_{d,n}\right), \quad k = 1, 2, \qquad x^{(0)}_{d,n} = x_{d,n}, \tag{20}$$

where

$$h_k\left(x^{(k-1)}_{d,n}\right) = \mathrm{act}\left(W_k\, \mathrm{gate}_k\left(x^{(k-1)}_{d,n}\right) + b_k\right). \tag{21}$$

Here, the subscript k indicates the layer number, so $\mathrm{gate}_1$ and $\mathrm{gate}_2$ have different parameters. The output, $x^{(2)}_{d,n}$, is then passed through a residual structure:

$$x^{(3)}_{d,n} = \mathrm{res}_3\left(x^{(2)}_{d,n}\right), \tag{22}$$

which is combined with a weight matrix W to generate the final prediction:

$$f_4(x_{d,n}) = W x^{(3)}_{d,n}. \tag{23}$$

Therefore, the final prediction $g(x_{d,n})$ in (2) is given by (20)–(23). In our implementation, the widths of the three hidden layers of the GR-Net are 16, 8, and 4, respectively.
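The following PyTorch sketch follows (20)–(23) literally: two gated hidden layers of widths 16 and 8, a dimension-preserving residual block, and a bias-free linear output. Note that because res(x) = x + act(Wx + b) preserves dimension, the third hidden layer in this sketch has width 8 rather than the width 4 reported above; this is a sketch of one reading of the equations, not the authors' exact implementation, which they make publicly available.

```python
import torch
import torch.nn as nn

class Gate(nn.Module):
    """gate(x) = x ⊙ Sigmoid(Wx + b): an input-dependent feature selector."""
    def __init__(self, dim: int):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, x):
        return x * torch.sigmoid(self.lin(x))

class GRNet(nn.Module):
    """A sketch of the GR-Net in (20)-(23); n_features is a placeholder."""
    def __init__(self, n_features: int):
        super().__init__()
        self.gate1, self.lin1 = Gate(n_features), nn.Linear(n_features, 16)  # h_1
        self.gate2, self.lin2 = Gate(16), nn.Linear(16, 8)                   # h_2
        self.res_lin = nn.Linear(8, 8)                                       # res_3
        self.out = nn.Linear(8, 1, bias=False)                               # f_4

    def forward(self, x):
        x = torch.relu(self.lin1(self.gate1(x)))  # eq. (21), k = 1
        x = torch.relu(self.lin2(self.gate2(x)))  # eq. (21), k = 2
        x = x + torch.relu(self.res_lin(x))       # eq. (22), residual block
        return self.out(x).squeeze(-1)            # eq. (23)
```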

3.4 Performance Evaluation


We first describe the metrics we use to evaluate the performance of any estimator of liquidity
measures, including both the microstructure models in Section 3.2.2 and machine learning models
in Section 3.3. We then discuss how we split samples in model training.

3.4.1 Evaluation Metrics

Cross-sectional Correlation. The cross-sectional correlation between the average daily spread $s_{n,d}$ and its estimator $\hat{s}_{n,d}$ on a particular trading day d is:

$$\mathrm{cscorr}_d := \frac{\sum_{n=1}^{N} (s_{n,d} - s_{:,d})\,(\hat{s}_{n,d} - \hat{s}_{:,d})}{\sqrt{\sum_{n=1}^{N} (s_{n,d} - s_{:,d})^2}\,\sqrt{\sum_{n=1}^{N} (\hat{s}_{n,d} - \hat{s}_{:,d})^2}},$$

where

$$s_{:,d} = \frac{1}{N}\sum_{n=1}^{N} s_{n,d}, \qquad \hat{s}_{:,d} = \frac{1}{N}\sum_{n=1}^{N} \hat{s}_{n,d}$$

are the cross-sectional average spread and average estimated spread on day d, respectively. This is the Pearson correlation coefficient between the ground truth and its estimator across all stocks on that day. These correlations are then averaged over all trading days as a performance evaluation criterion, which can also be used as a negative loss for machine learning models. We are interested in this evaluation metric because investors care about the relative magnitude of liquidity in applications involving cross-sectional comparisons.

Time-series Correlation. The time-series correlation between the average daily spread $s_{n,d}$ and its estimator $\hat{s}_{n,d}$ for a particular stock n is:

$$\mathrm{tscorr}_n := \frac{\sum_{d=1}^{D} (s_{n,d} - s_{n,:})\,(\hat{s}_{n,d} - \hat{s}_{n,:})}{\sqrt{\sum_{d=1}^{D} (s_{n,d} - s_{n,:})^2}\,\sqrt{\sum_{d=1}^{D} (\hat{s}_{n,d} - \hat{s}_{n,:})^2}},$$

where

$$s_{n,:} = \frac{1}{D}\sum_{d=1}^{D} s_{n,d}, \qquad \hat{s}_{n,:} = \frac{1}{D}\sum_{d=1}^{D} \hat{s}_{n,d}$$

are the time-series average spread and average estimated spread for stock n, respectively. This is the Pearson correlation coefficient between the ground truth and its estimator over all trading days for one stock. The time-series correlations are then averaged across all stocks as a performance evaluation criterion. This metric evaluates whether an estimator can distinguish different levels of liquidity for the same stock.

Mean Square Error. The mean square error (MSE) is a classical metric in machine learning for target variables with continuous values:

$$\mathrm{MSE} := \frac{1}{DN}\sum_{n=1}^{N}\sum_{d=1}^{D} (s_{n,d} - \hat{s}_{n,d})^2,$$

which we use as a loss function to train our machine learning models.
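With the ground truth and the estimates arranged as (days × stocks) pandas DataFrames, these three metrics reduce to a few lines; this is a sketch, and the frame names s and s_hat are illustrative.

```python
import pandas as pd

def avg_cross_sectional_corr(s: pd.DataFrame, s_hat: pd.DataFrame) -> float:
    """Average of cscorr_d: Pearson correlation across stocks, day by day.
    Both frames are assumed to be (days x stocks) panels."""
    return s.corrwith(s_hat, axis=1).mean()

def avg_time_series_corr(s: pd.DataFrame, s_hat: pd.DataFrame) -> float:
    """Average of tscorr_n: Pearson correlation over days, stock by stock."""
    return s.corrwith(s_hat, axis=0).mean()

def mse(s: pd.DataFrame, s_hat: pd.DataFrame) -> float:
    """Mean square error over all stock-days, used as the training loss."""
    return ((s - s_hat) ** 2).mean().mean()
```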


Finally, we make a remark on the resolution at which we evaluate our models. As mentioned in Section 3.1, the target variables in (3)–(12) are computed at a daily resolution, which we use to train and evaluate models. In addition, we also follow the literature to evaluate models based on the average liquidity over the last month (20 trading days). In particular, we compute the liquidity measure $s_{n,d}$ over trading days d − 19 to d for stock n to derive the average monthly liquidity measure:

$$\bar{s}_{n,d} = \frac{1}{20}\sum_{i=0}^{19} s_{n,d-i}, \tag{24}$$

and average the daily estimate $\hat{s}_{n,d}$ over the same period to derive the average monthly estimate:

$$\bar{\hat{s}}_{n,d} = \frac{1}{20}\sum_{i=0}^{19} \hat{s}_{n,d-i}. \tag{25}$$

In our empirical analysis in Section 4, we provide model performance metrics at the monthly resolution ($\bar{\hat{s}}_{n,d}$ as an estimate for $\bar{s}_{n,d}$), and the results at the daily resolution ($\hat{s}_{n,d}$ as an estimate for $s_{n,d}$) are documented in Appendix C.
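With the same (days × stocks) layout as above, the monthly averages in (24) and (25) are trailing 20-day rolling means, as in this sketch (the first 19 days of each series are undefined and would be dropped):

```python
# Trailing 20-day averages of the daily panels, eqs. (24)-(25);
# `s` and `s_hat` are assumed to be (days x stocks) DataFrames.
s_bar = s.rolling(window=20).mean()
s_hat_bar = s_hat.rolling(window=20).mean()
```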

3.4.2 Strategy to Split Samples

Machine learning models are strong at fitting nonlinear functions. At the same time, they are prone to overfitting if not properly trained and validated. We use different strategies to split our sample into training, validation, and test sets when training models for cross-sectional and time-series correlations. We always report metrics in the test set.
For cross-sectional correlations, we divide our sample into three disjoint periods that maintain the temporal ordering of the data. In our dataset, each stock has approximately 750 trading days of samples. We set the first 150 trading days as the training set, the next 50 trading days as the validation set, and the remaining roughly 550 days as the test set.
For time-series correlations, we randomly divide all stocks into three disjoint groups to form
the training, validation, and test sets. In particular, the US market has 340 stocks in the training
set, 160 stocks in the validation set, and 492 stocks in the test set. The Chinese market has 700
stocks in the training set, 700 stocks in the validation set, and 681 stocks in the test set. We fix
the random seed so the split remains the same for different models.
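A sketch of the two splitting schemes follows; days and stocks are assumed to be the ordered list of trading days and the array of tickers, the stock counts follow the US setup described above, and the seed value is illustrative.

```python
import numpy as np

# Temporal split for cross-sectional correlations: first 150 days for
# training, next 50 for validation, the remaining days for testing.
train_days, valid_days, test_days = days[:150], days[150:200], days[200:]

# Random split of stocks for time-series correlations, with a fixed
# seed so the split is identical across models (counts follow the US setup).
rng = np.random.default_rng(seed=0)
shuffled = rng.permutation(stocks)
train_stocks = shuffled[:340]
valid_stocks = shuffled[340:500]
test_stocks = shuffled[500:]
```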

4 Performance of Liquidity Estimation
In this section, we present empirical results of liquidity estimation based on low-frequency daily
data. Section 4.1 describes our data. Section 4.2 describes how we group stocks based on their
prices in the Chinese market. Section 4.3 reports correlations between input features. Section 4.4
reports empirical results for microstructure models, and Section 4.5 reports empirical results for
machine learning models.

4.1 Data
For our empirical study, we utilize data from both the US and Chinese markets.
In the US market, we obtain the trade and quote data from the NYSE Trade and Quote
(TAQ) database and the daily stock-level data from The Center for Research in Security Prices
(CRSP) database. Our dataset includes a representative subset of 992 stocks from the US market.
Specifically, we select stocks that were constituents of the S&P 500 index as of January 1, 2021,
excluding those not available in the TAQ database, resulting in 492 stocks. Additionally, we
randomly select 500 stocks with average market capitalization between 20 million and 400 million
dollars during 2021. Note that the market capitalization of stocks in the S&P 500 typically exceeds
several billion dollars. Therefore, these 500 stocks have much smaller capitalization than those in
the S&P. This approach ensures our sample covers a broad range of market capitalizations and
liquidity profiles.
In the Chinese market, we obtain the raw trade and quote data from the Shenzhen Stock
Exchange (SZSE) Historical Tick Data, which contains all stocks traded in the Shenzhen Stock
Exchange. We also exclude stocks with more than 60 abnormal trading days, including being
suspended or hitting the price limit, which leaves us with 2,081 stocks in total from 2019 to 2021.
Table 2 shows summary statistics of the close price and different market liquidity measures in
our sample. The daily close prices range from under one dollar to over 500 dollars in the US market
(Panel A), and from about one Chinese yuan to around 100 Chinese yuan in the Chinese market
(Panel B). The average close price is over 80 dollars in the US market and approximately 15 yuan
in the Chinese market. In addition to the higher average close price, the US market has a higher
average spread and relative spread than the Chinese market.

4.2 Influence of Price and Grouping Stocks


We find distinct effects of stock price levels on relative spreads in the US and Chinese markets.
Figure 1 illustrates the average relative spread as a function of the average close price in both
markets. In the Chinese market, low stock prices result in the spread almost always being one tick
size, making the relative spread nearly an inverse function of the close price. This simplifies the
problem of estimating the spread and relative spread at low price levels. However, as stock prices
rise, the relative spread becomes more heterogeneous across different stocks, making the estimation
problem more complex.

Table 2: Summary statistics of close prices and the ten liquidity measures in our data. Note that Kyle's lambda is in basis points per square root of traded volume. For example, a Kyle's lambda of 0.24 means that a 10,000-dollar buy order would move the log mid-price by approximately √10000 × 0.24 × 10^−4 = 0.0024, or 24 basis points.

Panel A: US

                                        mean    1% q   25% q   50% q   75% q    99% q
close price (dollar)                   86.91    0.44   12.80   37.88   96.90   654.57
average spread (dollar)                 0.55    0.01    0.10    0.26    0.60     4.88
average relative spread (%)             1.40    0.07    0.33    0.64    1.76     8.16
average effective spread (dollar)       0.47    0.01    0.06    0.16    0.41     5.79
average relative effective spread (%)   1.18    0.05    0.22    0.45    1.32     8.54
average realized spread (cent)          1.18   -8.47   -0.13    0.30    1.23    20.71
average relative realized spread (%)    0.05   -0.25   -0.00    0.01    0.04     0.85
average mid-price impact (dollar)       0.16   -0.05    0.02    0.06    0.15     1.78
average relative mid-price impact (%)   0.58   -0.18    0.08    0.17    0.47     6.92
Kyle's lambda mid-price                 2.54   -1.01    0.11    0.24    1.34    38.16
Kyle's lambda trade price               2.23   -0.52    0.06    0.16    1.25    38.42

Panel B: Chinese

                                        mean    1% q   25% q   50% q   75% q    99% q
close price (yuan)                     14.61    1.77    5.35    8.80   15.88    99.99
average spread (cent)                   1.91    1.00    1.07    1.22    1.77    11.40
average relative spread (%)             0.18    0.05    0.11    0.16    0.22     0.60
average effective spread (cent)         2.27    1.00    1.19    1.52    2.30    12.94
average relative effective spread (%)   0.22    0.06    0.13    0.19    0.26     0.70
average realized spread (cent)          0.17   -1.42   -0.18   -0.03    0.25     3.99
average relative realized spread (%)    0.05  -14.29   -2.67   -0.35    2.00    19.15
average mid-price impact (cent)         1.18    0.07    0.43    0.69    1.27     7.97
average relative mid-price impact (%)   0.09    0.01    0.06    0.08    0.11     0.30
Kyle's lambda mid-price                 0.06    0.01    0.03    0.05    0.07     0.18
Kyle's lambda trade price               0.04   -0.02    0.01    0.02    0.03     0.08

In contrast, the US market shows a different pattern. One tick size divided by the close price
does not provide much information for the relative spread. Moreover, the relative spread in the US
market initially decreases and then slightly increases as stock prices rise, indicating a more intricate
relationship compared to the Chinese market.

(a) US (b) Chinese

Figure 1: Scatter plot for the relationship between close price and relative spread on 31 December
2020. The relationship is similar on other days.

Therefore, we divide the stocks in the Chinese market into two groups based on the average
stock price. The high-price group consists of stocks with a close price above five yuan on more than
80% of all trading days from 2019 to 2021, while the remaining stocks form the low-price group,
and we mainly focus on the high-price group because the estimation problem for the other group
is trivial. For the US market, we do not divide stocks into different groups. This grouping ensures
that our empirical study focuses on a non-trivial problem in both markets.

4.3 Feature Correlation


We report correlations between the features listed in Table 1, computed at the stock-day resolution. Figure 2 shows the correlation matrix of microstructure features and six raw daily features. Of the four prices, we only include the close price because the prices are highly correlated with each other and one of them is representative enough. The correlations between different features are not too high, and some are even negative. To avoid collinearity, we only use the log close price but not the open, high, and low prices in linear regression.
In addition, Table 3 reports the variance inflation factors (VIF) for all fourteen features. Consistent with observations from the correlation matrix, though some features can be partially explained by other features, the collinearity is not very high.

(a) US; Spread as target (b) US; Relative spread as target

(c) Chinese; Spread as target (d) Chinese; Relative spread as target

Figure 2: Correlation matrix between all microstructure models and six raw daily features. Panels (a) and (c) show the version where Roll's method, HL, and CHL are calculated using the spread as the target; panels (b) and (d) show the version where they are calculated using the relative spread as the target.

Table 3: Variance inflation factors for all microstructure features and six raw features.

Panel A: US

feature VIF for spread VIF for relative spread


Roll 4.17 4.25
HL 2.96 6.62
LOT X-split 1.21 1.18
LOT Y-split 4.89 6.00
Gibbs 5.68 3.90
CHL 4.13 7.01
Amihud 1.07 1.09
Amivest 1.02 1.00
close 1.82 2.82
volume 1.29 2.35
capitalization 1.47 2.34
volatility 1.19 4.15
daily return 1.52 1.50
overnight return 1.48 1.49

Panel B: Chinese

feature VIF for spread VIF for relative spread


Roll 2.30 2.55
HL 5.48 5.09
LOT X-split 1.08 1.09
LOT Y-split 6.88 9.24
Gibbs 9.03 4.78
CHL 9.43 11.74
Amihud 1.07 1.00
Amivest 1.02 1.01
close 8.51 9.15
volume 1.84 1.86
capitalization 1.72 1.95
volatility 8.09 11.05
daily return 1.20 1.20
overnight return 1.20 1.20

4.4 Performance of Microstructure Models
As discussed in Section 3.4.1, when reporting the performance of microstructure models, we follow
the literature to use the monthly liquidity measures defined in (24). Tables 4 and 5 report results
of four representative liquidity measures for the US and the Chinese markets, respectively. In each
table, we include performance metrics when estimating the spread (Panel A), relative spread (Panel
B), relative effective spread (Panel C), and Kyle’s lambda on trade price (Panel D). We also report
the bootstrap standard errors and statistical significance. The first row in each panel represents the
average cross-sectional correlation, and the second represents the average time-series correlation.

Table 4: Performance of microstructure models in the US market. Bootstrap standard errors are given in parentheses, and we report statistical significance at the 1% (***), 5% (**), and 10% (*) levels.

Panel A: spread

          Roll     HL       LOT X-split  LOT Y-split  Gibbs    CHL      Amihud   Amivest
cs corr   23%***   24%***   -3%          -2%***       0%       23%***   0%       0%
          (1%)     (1%)     (1%)         (1%)         (<1%)    (1%)     (<1%)    (<1%)
ts corr   19%***   34%***   0%***        1%***        -9%***   34%***   -2%**    0%
          (1%)     (1%)     (1%)         (1%)         (<1%)    (1%)     (<1%)    (<1%)

Panel B: relative spread

          Roll     HL       LOT X-split  LOT Y-split  Gibbs    CHL      Amihud   Amivest
cs corr   12%***   44%***   9%***        8%***        0%       40%***   -5%***   0%
          (1%)     (1%)     (<1%)        (1%)         (<1%)    (1%)     (1%)     (<1%)
ts corr   26%***   39%***   1%***        11%***       -1%      38%***   -5%***   -1%
          (5%)     (1%)     (1%)         (1%)         (<1%)    (1%)     (1%)     (1%)

Panel C: relative effective spread

          Roll     HL       LOT X-split  LOT Y-split  Gibbs    CHL      Amihud   Amivest
cs corr   2%***    6%***    3%***        3%***        6%***    -1%      5%***    -2%**
          (2%)     (1%)     (1%)         (1%)         (1%)     (2%)     (1%)     (1%)
ts corr   10%***   15%***   2%***        9%***        6%***    16%***   -4%***   -1%
          (1%)     (1%)     (1%)         (1%)         (1%)     (1%)     (1%)     (1%)

Panel D: Kyle's lambda on trade price

          Roll     HL       LOT X-split  LOT Y-split  Gibbs    CHL      Amihud   Amivest
cs corr   8%***    20%***   2%***        3%***        0%       20%***   -18%***  0%
          (2%)     (1%)     (1%)         (1%)         (<1%)    (2%)     (1%)     (1%)
ts corr   21%***   32%***   -3%***       8%***        3%***    30%***   -2%***   -2%***
          (1%)     (1%)     (1%)         (1%)         (1%)     (1%)     (1%)     (1%)

First, we find that microstructure models perform differently in the US and the Chinese market.

Table 5: Performance of microstructure models in the Chinese market. Bootstrap standard errors are given in parentheses, and we report statistical significance at the 1% (***), 5% (**), and 10% (*) levels.

Panel A: spread

          Roll     HL       LOT X-split  LOT Y-split  Gibbs    CHL      Amihud   Amivest
cs corr   47%***   69%***   2%***        3%***        0%       74%***   -1%      -1%
          (1%)     (1%)     (1%)         (1%)         (<1%)    (1%)     (<1%)    (<1%)
ts corr   12%***   42%***   10%          22%***       3%***    43%***   10%***   1%
          (1%)     (1%)     (1%)         (1%)         (<1%)    (1%)     (1%)     (1%)

Panel B: relative spread

          Roll     HL       LOT X-split  LOT Y-split  Gibbs    CHL      Amihud   Amivest
cs corr   -9%***   -14%***  0%           -2%***       0%       -18%***  -6%***   -4%***
          (5%)     (1%)     (1%)         (1%)         (<1%)    (1%)     (1%)     (1%)
ts corr   -7%***   -9%***   3%***        3%***        5%***    -9%***   3%***    -4%***
          (1%)     (1%)     (<1%)        (1%)         (<1%)    (1%)     (1%)     (1%)

Panel C: relative effective spread

          Roll     HL       LOT X-split  LOT Y-split  Gibbs    CHL      Amihud   Amivest
cs corr   -5%***   -1%      4%***        4%***        0%       -3%***   0%       2%***
          (1%)     (1%)     (1%)         (1%)         (1%)     (1%)     (1%)     (1%)
ts corr   -1%      13%***   8%***        13%***       6%***    15%***   17%***   0%
          (2%)     (1%)     (1%)         (1%)         (1%)     (2%)     (1%)     (1%)

Panel D: Kyle's lambda on trade price

          Roll     HL       LOT X-split  LOT Y-split  Gibbs    CHL      Amihud   Amivest
cs corr   -7%***   -18%***  4%***        0%           0%       -16%***  7%***    1%
          (2%)     (1%)     (1%)         (1%)         (1%)     (2%)     (1%)     (1%)
ts corr   -8%***   -15%***  3%***        3%***        2%***    -15%***  10%***   -2%***
          (2%)     (1%)     (1%)         (1%)         (1%)     (2%)     (1%)     (1%)

Roll, HL, and CHL provide positive cross-sectional and time-series correlations for relative spread
and Kyle’s lambda on trade price in the US market, whereas they show negative correlations in
the Chinese market. This result highlights important differences in market microstructure between
the two markets. For example, as shown in Figure 1, stock prices influence the relative spread
differently in the two markets. Additionally, differences in other microstructure factors may also
contribute to the poorer performance in the Chinese market.
Second, microstructure models vary in their performance when estimating different liquidity
measures across both markets and different metrics. For example, HL achieves a cross-sectional
correlation above 40% when estimating the relative spread in the US market, whereas correlations
for the relative effective spread are below 10% in both markets. Roll shows a cross-sectional
correlation of 47% when estimating the spread in the Chinese market but with a lower time-series
correlation of 12%. Conversely, the LOT Y-split model shows a cross-sectional correlation of 3%
for estimating the spread in the Chinese market but a much higher time-series correlation of 22%.
In summary, these observations suggest that microstructure models may perform well only
for specific target variables, specific markets, and specific types of correlations. In other words,
constructing a universal microstructure model that is useful across all scenarios is very challenging,
if not impossible. Different microstructure models may be more suitable for different markets and
targets to reflect different market structures, a flexibility that data-driven machine learning models
enjoy.

4.5 Performance of Machine Learning Models


This section reports the performance of machine learning models. We train three parallel sets
of models outlined in Section 3.3 to estimate the ten liquidity measures defined in Section 3.1.
For each liquidity measure, we train models using three different sets of input features, including
microstructure features only, raw features only, and features that combine the two. We report both
cross-sectional and time-series correlations when liquidity is estimated at both daily and monthly
resolutions. In total, we build 4 (types of models) × 10 (measures of liquidity) × 3 (sets of input
features) × 2 (versions of correlation) = 240 models.
To compare with the performance of microstructure models in Section 4.4, we first focus on
the performance of machine learning models when liquidity is estimated at a monthly resolution,
summarized in Table 6. We report the results of all 240 models and the results at a daily resolution
in Appendix C. Panel A shows results for the US market, and Panel B for the Chinese market. The
eight columns correspond to four of the ten liquidity measures: the spread, relative spread, relative
effective spread, and Kyle’s lambda on trade price, evaluated using cross-sectional and time-series
correlations. We discuss the main implications of these results below.
First of all, we observe that linearly combining different microstructure features performs rea-
sonably well, outperforming individual microstructure models in the US (Table 4) and Chinese
market (Table 5) in terms of both cross-sectional and time-series correlations. For example, the
linear model for spread in the US market achieves a cross-sectional correlation of 40%, while the

Table 6: Performance of machine learning models when liquidity is estimated at a monthly resolution.

Panel A: US

                     spread            relative spread    relative effective spread   Kyle's lambda on trade price
Feature              cs corr  ts corr  cs corr  ts corr   cs corr  ts corr            cs corr  ts corr

Linear regression
microstructure       40%      39%      46%      42%       12%      19%                24%      41%
raw                  73%      24%      33%      4%        8%       10%                26%      11%
both                 75%      40%      42%      44%       14%      21%                30%      45%

LightGBM
microstructure       67%      40%      55%      47%       19%      22%                32%      49%
raw                  84%      30%      72%      30%       13%      12%                38%      13%
both                 85%      44%      73%      51%       21%      28%                42%      53%

Fully-connected neural network
microstructure       64%      42%      50%      49%       18%      21%                31%      49%
raw                  89%      31%      72%      37%       19%      18%                42%      21%
both                 92%      46%      70%      53%       19%      26%                44%      52%

GR-Net
microstructure       83%      44%      58%      50%       20%      24%                33%      52%
raw                  89%      36%      74%      45%       20%      17%                43%      24%
both                 92%      47%      73%      57%       23%      29%                45%      55%

Panel B: Chinese

                     spread            relative spread    relative effective spread   Kyle's lambda on trade price
Feature              cs corr  ts corr  cs corr  ts corr   cs corr  ts corr            cs corr  ts corr

Linear regression
microstructure       75%      44%      39%      29%       38%      23%                26%      16%
raw                  76%      27%      43%      31%       43%      22%                28%      12%
both                 77%      45%      44%      35%       41%      24%                30%      20%

LightGBM
microstructure       75%      45%      62%      35%       60%      26%                53%      22%
raw                  84%      33%      75%      40%       66%      24%                67%      22%
both                 84%      49%      75%      50%       6%       38%                67%      36%

Fully-connected neural network
microstructure       75%      46%      60%      32%       56%      25%                54%      21%
raw                  85%      29%      74%      40%       65%      24%                67%      16%
both                 85%      47%      75%      48%       63%      38%                68%      33%

GR-Net
microstructure       76%      46%      63%      40%       58%      30%                58%      28%
raw                  86%      34%      76%      44%       66%      24%                67%      23%
both                 87%      49%      76%      50%       69%      39%                69%      36%

best microstructure model, HL’s method, has a cross-sectional correlation of 23%. The correlation
of linear models for Kyle’s lambda on trade price is 24%, whereas HL and CHL’s methods have the
highest correlations of 20% among all microstructure models. In terms of time-series correlations,
the linear model for spread in the US market achieves a correlation of 39% compared to 34% for
CHL alone. Similar improvements also exist in the Chinese market. These observations suggest
that significant and robust improvements can be achieved by simply linearly combining individual
microstructure models.
Second, combining raw and microstructure features further improves the performance of ma-
chine learning models. For example, in the US market, GR-Net using both sets of features achieves
cross-sectional correlations of 92%, 73%, 23%, and 45% for spread, relative spread, relative effective
spread, and Kyle’s lambda on trade price, compared to 83%, 58%, 20%, and 33% using only mi-
crostructure features. This improvement shows that machine learning models can utilize additional
information from raw features that microstructure features alone do not capture.
Third, more complex machine learning models outperform simpler ones and linear models,
which is likely driven by nonlinear relationships between the target variable and input features. In
general, GR-Net outperforms the simple fully-connected neural network, and both neural network-
based models outperform the linear regression. For example, on cross-section, when estimating
spread, linear regressions with only microstructure models underperform nonlinear models with
only raw features or microstructure models. This suggests that significant nonlinear relationships
exist between spread and raw features, which microstructure models can only partly capture.
In summary, machine learning models can significantly improve the performance of estimating
liquidity measures using daily data, with the most complex model performing the best.

5 Model Explanation
We have shown that machine learning models can significantly improve the performance of liquidity
estimation by combining microstructure and raw daily features, and nonlinear models outperform
linear regression. This section first reports regression coefficients and feature importance to un-
derstand the relative contribution of different microstructure and raw features. Furthermore, we
use partial dependence plots and the Shapley additive explanations to understand what nonlinear
relationships machine learning models have learned. In this section, we show results for models
that learn cross-sectional correlations as an example, and results for models that learn time-series
correlations are similar.

5.1 Coefficients for Linear Regression


Figure 3 shows the regression coefficients of the linear regression in the US and Chinese markets, respectively. These coefficients reveal the dependence of estimated liquidity on individual features. In our analysis, they can also be interpreted as feature importance scores, as both features and targets are standardized with zero mean and unit cross-sectional standard deviation.

(a) US.

(b) Chinese.

Figure 3: Regression coefficients for the linear model.

First, we observe that the coefficients for close price, trading volume, and market capitalization
are significant for the four target liquidity measures in both markets, although the direction of
influence may differ. For example, close price is negatively correlated with relative spread in both
markets. Conversely, when estimating spread, the coefficient of trading volume is negative in the
Chinese market and positive in the US market. Additionally, volatility has a more significant
regression coefficient in the Chinese market than in the US market.
Second, we observe that different microstructure models may show different coefficients when targeting different liquidity measures in different markets. The HL method shows positive regression coefficients when estimating the relative spread in both the US and Chinese markets. In contrast, it shows a coefficient close to zero when estimating the relative effective spread in the US market but exhibits a positive coefficient in the Chinese market.
Overall, these observations suggest that raw features, such as close price, trading volume, and market capitalization, play important roles in both markets. While microstructure models add additional information, they have very different impacts on different target liquidity measures in the two markets.

5.2 Feature Importance for GR-Net


Figure 4 shows the normalized feature importance for GR-Net in both markets when spread and
relative spread are used as target variables. The feature importance score here is defined by the
aggregate absolute Shapley value of a feature across all samples, and we defer the details of the
definition to Section 5.4 when we present the Shapley additive explanations. We include the four
log prices as inputs in neural networks.
In general, the feature importance of GR-Net shows a pattern similar to that in linear regression.
Among the microstructure models, Roll’s method, HL, and CHL are more important than the other
microstructure models in the US market.
The raw features are much more important than microstructure features, especially in the
Chinese market. As a concrete example, the trading volume contributes more than half of the
overall feature importance when estimating the relative spread and the relative effective spread in
the Chinese market. This observation is not surprising because most microstructure models were
developed on data from developed markets in the first place. It is also consistent with our results
in Section 4.4 that the microstructure models alone underperform in the Chinese market compared
to the US market.

5.3 Partial Dependence for GR-Net


We use partial dependence plots (Friedman, 2001) to show the average marginal effect of GR-Net
with respect to each input feature. The partial dependence function of the i-th feature for g,

Figure 4: Feature importance for GR-Net. Panel (a): US; panel (b): Chinese.
denoted by $\hat{g}_i(\cdot)$, is defined as
$$\hat{g}_i(x_i) = \mathbb{E}_{X_{-i}}[g(x_i, X_{-i})] = \int g(x_i, X_{-i})\, d\mathbb{P}(X_{-i}),$$

where $x_i$ is the value of the $i$-th feature and $X_{-i}$ represents all other features used in a machine
learning model $g$. Here $X_{-i}$ is treated as a random vector. In practice, the expectation is
calculated by averaging over all samples in the training set. Specifically, for each stock $n$ on day $d$ in
the training set, we set its $i$-th feature $x_{n,d;i}$ to the same value $x_i$ and fix the values of all other
features $x_{n,d;-i}$. This leads to:

$$\hat{g}_i(x_i) = \frac{1}{DN} \sum_{n=1}^{N} \sum_{d=1}^{D} g(x_i, x_{n,d;-i}).$$

A partial dependence plot can show whether the relationship between a model’s output and a
feature is linear, monotonic, or follows more complex patterns. For example, partial dependence
plots always show linear relationships when applied to linear regressions.
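As an illustration, the empirical partial dependence above can be computed with a short routine like the following. The function and variable names are our own, and g can be any fitted model's vectorized prediction function.

import numpy as np

def partial_dependence(g, X_train, i, grid):
    # Empirical partial dependence of model g on feature i: for each grid
    # value, overwrite feature i in every training row and average outputs.
    values = []
    for xi in grid:
        X_mod = X_train.copy()
        X_mod[:, i] = xi                    # set feature i to the grid value
        values.append(g(X_mod).mean())      # average over all (n, d) samples
    return np.array(values)

# Hypothetical usage with any fitted model exposing a predict function:
# grid = np.linspace(X_train[:, i].min(), X_train[:, i].max(), 50)
# curve = partial_dependence(model.predict, X_train, i=0, grid=grid)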
Figure 5 shows the partial dependence plots of two representative input features, Roll and capi-
talization, for GR-Net in both markets, when estimating spread, relative spread, relative effective
spread, and Kyle's lambda on trade price. These plots demonstrate the average marginal effects
learned by the GR-Net. We summarize our main observations below.
First, GR-Net utilizes information from raw features that microstructure features do not cap-
ture. For instance, the model learns a negative relationship between market capitalization and
(relative) spread in the US and Chinese markets, which indicates that larger stocks have better
liquidity. However, none of the microstructure models utilizes information from the daily market
capitalization. This result partially explains the improved performance of our machine learning
model compared to microstructure models.
Second, nonlinear relationships between features and liquidity measures are pervasive. For ex-
ample, although the Roll method, as a microstructure model, assumes a linear dependence with the
relative spread, nonlinearities do exist in real data. Figure 5a shows that the partial dependence
plot of the Roll method for relative spread in the US market first increases and then decreases as
the Roll value increases, indicating opposite relationships in different regions of its value. More-
over, nonlinear relationships also exist for the raw features. For example, the partial dependence
plots of capitalization for relative spread in both markets resemble an L shape, indicating that
the estimated relative spread decreases as capitalization increases, particularly when capitalization
is low. These nonlinear relationships learned by the GR-Net explain its superior performance
compared to the linear model.

5.4 Shapley Additive Explanations


In this section, we adopt the Shapley value to provide additional explanations of our machine
learning model. The Shapley value accounts for interactions between features and measures feature

Figure 5: Partial dependence plots of Roll and capitalization for GR-Net in the US and Chinese markets. Panels: (a) US, Roll; (b) Chinese, Roll; (c) US, capitalization; (d) Chinese, capitalization.
contributions for the prediction of each sample. It was first developed in coalitional game theory
to distribute the payout among multiple players fairly (Shapley, 1997).
The Shapley value of the $i$-th feature for the $n$-th stock on day $d$, $\phi_{n,d;i}$, is defined as:
$$\phi_{n,d;i} = \sum_{X \subseteq \{1,\cdots,p\} \setminus \{i\}} \frac{|X|!\,(p - |X| - 1)!}{p!} \left[ \mathrm{val}_{n,d}(X \cup \{i\}) - \mathrm{val}_{n,d}(X) \right], \qquad (26)$$

where $X$ is a subset of all $p$ input features, and $\mathrm{val}_{n,d}(X)$ is the average marginal prediction when
fixing the feature values in set $X$ of the $n$-th stock on day $d$:
$$\mathrm{val}_{n,d}(X) = \int g(x_1, \cdots, x_p)\, d\mathbb{P}_{x_i = x_{n,d;i},\ \forall i \in X} - \mathbb{E}_X[g(X)]. \qquad (27)$$

All possible sets of feature values are evaluated with and without the $i$-th feature to calculate the
exact Shapley value, leading to an exponential complexity with respect to $p$. In practice, we apply
Monte-Carlo sampling to approximate the Shapley value for the $n$-th stock on day $d$:
$$\hat{\phi}_{n,d;i} = \frac{1}{M} \sum_{m=1}^{M} \left[ g(x^m_{n,d;+i}) - g(x^m_{n,d;-i}) \right], \qquad (28)$$

where $x^m_{n,d;+i}$ represents the $m$-th Monte-Carlo sample, which assumes random feature values (from
another random sample $z$) except for the $i$-th feature. $x^m_{n,d;-i}$ is generated in the same way as $x^m_{n,d;+i}$
except that the $i$-th feature is also replaced by the feature value of $z$, which refers to another stock
$n'$ on day $d'$ such that $(n,d) \neq (n',d')$. For a given sample $n$ and feature $i$, each time we sample
a subset $X \subseteq \{1, 2, \cdots, p\} \setminus \{i\}$ and another sample $z$. We then have $(x^m_{n,d;-i})_j = x_{n,d;j}$ for $j \in X$,
$(x^m_{n,d;-i})_j = z_j$ for $j \notin X$, $(x^m_{n,d;+i})_j = x_{n,d;j}$ for $j \in X \cup \{i\}$, and $(x^m_{n,d;+i})_j = z_j$ for $j \notin X \cup \{i\}$.
We take $M = 10{,}000$ for each sample and each feature in our calculation.
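A minimal sketch of this Monte-Carlo approximation is given below. Drawing the subset $X$ as the predecessors of feature $i$ in a uniformly random permutation reproduces the combinatorial weights in (26); the function names are illustrative, and g is assumed to be a vectorized prediction function over float arrays.

import numpy as np

def mc_shapley(g, X_train, x, i, M=10_000, seed=0):
    # Monte-Carlo Shapley value of feature i for one sample x, as in (28).
    rng = np.random.default_rng(seed)
    p, phi = len(x), 0.0
    for _ in range(M):
        z = X_train[rng.integers(len(X_train))]  # another random sample z
        perm = rng.permutation(p)                # random feature ordering
        keep = perm[:np.where(perm == i)[0][0]]  # subset X preceding feature i
        x_plus, x_minus = z.copy(), z.copy()
        x_plus[keep] = x[keep]                   # features in X come from x
        x_minus[keep] = x[keep]
        x_plus[i] = x[i]                         # feature i from x ...
        x_minus[i] = z[i]                        # ... versus from z
        phi += g(x_plus[None, :])[0] - g(x_minus[None, :])[0]
    return phi / M

# SHAP feature importance then aggregates over samples, e.g.,
# importance_i = np.mean(np.abs(phi_matrix[:, i])) for a matrix of Shapley values.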
The Shapley value offers valuable information about the contribution of individual features to
the overall output of a model on each sample. Lundberg and Lee (2017) introduce the Shapley
additive explanations (SHAP) to provide a comprehensive summary of the Shapley value across
all samples. In particular, we average the absolute Shapley values over all samples to get the
SHAP feature importance. In addition, we utilize SHAP summary plots to visualize the relation-
ship between the Shapley value and the corresponding feature value, which provides an intuitive
understanding of the effect of each feature on the model output.
Figure 6 shows the SHAP summary plot for GR-Net when estimating spread and relative spread
in the US and Chinese markets. The SHAP summary plot for other target liquidity measures can
be found in Appendix D. We derive more intuition about our model by combining the Shapley
value and the partial dependence plots.
First, raw features play an essential role in both markets. The top three most important
features in terms of the Shapley value are raw features in all four subfigures. Among the
microstructure features, CHL is the most important in the US market and HL in the Chinese
market when estimating spread, while Amivest is the most important when estimating relative
spread.

Figure 6: SHAP summary plots for GR-Net. Panels: (a) US, spread; (b) Chinese, spread; (c) US, relative spread; (d) Chinese, relative spread. In each subfigure, each row represents the distribution of SHAP values for a feature across all samples. The horizontal axis shows the SHAP value, and different colors represent different values of the feature.
Second, most features have monotonic relationships with their Shapley values. For example,
volume is negatively correlated with its Shapley value for relative spread in both markets, consistent
with the observation that stocks with higher trading volumes have smaller relative spreads. Stock
prices are negatively correlated with their Shapley values in both markets when estimating relative
spread, aligning with the observations in Figure 1.
Third, we observe non-monotonic relationships between some microstructure features and their
Shapley values, which again suggests that they are informative for estimating liquidity only in
certain regions of their value. For example, when estimating spread in the Chinese market, the
Shapley value of HL is close to zero if the value of HL is low. As HL increases, its Shapley value also
increases in absolute value but can be either negative or positive. Similar patterns also exist for
Roll when estimating relative spread in the US market. These non-monotonic relationships reveal
that more information is contained in HL and Roll when their values are large.

Summary. Taking results from partial dependence plots and Shapley additive explanations to-
gether, we provide several intuitions behind the performance improvement of machine learning
models over stand-alone microstructure models. First, machine learning models utilize informa-
tion from raw features that individual microstructure models do not capture, which enlarges the
information set for our problem. Second, machine learning models learn nonlinear relationships
between features and the target variable. Finally, machine learning models capture non-monotonic
relationships between features and the target variable, which allow microstructure models to con-
tribute even when they are relevant only in certain regions. The last two observations demonstrate
that machine learning models improve our ability to utilize the available information for the problem.

6 Conclusion
The problem of estimating market liquidity using only low-frequency daily data has received much
attention from academics and practitioners. We test a range of classical microstructure models to
estimate ten different market liquidity measures in the US and Chinese stock markets. We then
construct four statistical and machine learning models to estimate the same targets by combining
microstructure and raw stock-level features. We demonstrate that machine learning models com-
bining microstructure features outperform each individual microstructure model. The performance
can be further improved by combining raw and microstructure features.
We use partial dependence plots and Shapley additive explanations to understand how machine
learning models utilize the raw and microstructure features. This provides intuitions for what
drives the performance improvement of machine learning models. First, machine learning models
can utilize information from raw features that the microstructure models do not use. Second,
machine learning models can capture nonlinear dependence between features and target variables.
Third, machine learning models learn non-monotonic relationships that better utilize microstructure
features in regions where they are most relevant.

Overall, machine learning that combines human knowledge from microstructure models with raw
low-frequency data provides interpretable improvements in the performance of liquidity estimation,
making accurate liquidity estimates more accessible from widely available daily data.

Acknowledgement
We thank Jean-Edouard Colliard, Thierry Foucault, Paul Glasserman, Andrew W. Lo, and seminar
and conference participants at the 2024 Asian Meeting of Econometrics Society, 2024 CIRF & CFRI
Conference, 2023 INFORMS Annual Meetings, 2023 China FinTech Research Conference, and the
Seventh PKU-NUS Annual International Conference on Quantitative Finance and Economics for
very helpful comments and discussion. Research support from the National Key R&D Program
of China (2022YFA1007900), the National Natural Science Foundation of China (12271013), and
the Fundamental Research Funds for the Central Universities (Peking University) is gratefully
acknowledged.

References
Abdi, F., and A. Ranaldo, 2017, A simple estimation of bid-ask spreads from daily close, high, and
low prices, The Review of Financial Studies 30, 4437–4480.

Aït-Sahalia, Y., J. Fan, L. Xue, and Y. Zhou, 2022, How and when are high-frequency stock returns
predictable?, Technical report, National Bureau of Economic Research.

Amihud, Y., 2002, Illiquidity and stock returns: cross-section and time-series effects, Journal of
Financial Markets 5, 31–56.

Amihud, Y., and H. Mendelson, 1986, Asset pricing and the bid-ask spread, Journal of Financial
Economics 17, 223–249.

Ardia, D., E. Guidotti, and T. A. Kroencke, 2024, Efficient estimation of bid–ask spreads from
open, high, low, and close prices, Journal of Financial Economics 161, 103916.

Asness, C. S., T. J. Moskowitz, and L. H. Pedersen, 2013, Value and momentum everywhere, The
Journal of Finance 68, 929–985.

Avramov, D., S. Cheng, and L. Metzker, 2023, Machine learning vs. economic restrictions: Evidence
from stock return predictability, Management Science 69, 2587–2619.

Bali, T. G., H. Beckmeyer, M. Moerke, and F. Weigert, 2023, Option return predictability with
machine learning and big data, The Review of Financial Studies 36, 3548–3602.

Bali, T. G., L. Peng, Y. Shen, and Y. Tang, 2014, Liquidity shocks and stock market reactions,
The Review of Financial Studies 27, 1434–1485.

Barardehi, Y. H., D. Bernhardt, Z. Da, and M. Warachka, 2022, Uncovering the liquidity premium
in stock returns using retail liquidity provision, Available at SSRN 4057713.

Barbaglia, L., S. Manzan, and E. Tosetti, 2023, Forecasting loan default in Europe with machine
learning, Journal of Financial Econometrics 21, 569–596.

Baron, M., J. Brogaard, B. Hagströmer, and A. Kirilenko, 2019, Risk and return in high-frequency
trading, Journal of Financial and Quantitative Analysis 54, 993–1024.

Bekaert, G., C. R. Harvey, C. T. Lundblad, and S. Siegel, 2014, Political risk spreads, Journal of
International Business Studies 45, 471–493.

Bianchi, D., M. Büchner, and A. Tamoni, 2021, Bond risk premiums with machine learning, The
Review of Financial Studies 34, 1046–1089.

Brogaard, J., and A. Zareei, 2023, Machine learning and the stock market, Journal of Financial
and Quantitative Analysis 58, 1431–1472.

Chen, L., M. Pelger, and J. Zhu, 2023, Deep learning in asset pricing, Management Science.

Chinco, A., A. D. Clark-Joseph, and M. Ye, 2019, Sparse signals in the cross-section of returns,
The Journal of Finance 74, 449–492.

Clark, A., 2011, Revamping liquidity measures: Improving investability in emerging and frontier
market indices and their related ETFs, The Journal of Index Investing 2, 37–43.

Colliard, J.-E., T. Foucault, and S. Lovo, 2022, Algorithmic pricing and liquidity in securities
markets, HEC Paris Research Paper.

Cont, R., M. Cucuringu, and C. Zhang, 2023, Cross-impact of order flow imbalance in equity
markets, Quantitative Finance 23, 1373–1393.

Corwin, S. A., and P. Schultz, 2012, A simple way to estimate bid-ask spreads from daily high and
low prices, The Journal of Finance 67, 719–760.

Cybenko, G., 1989, Approximation by superpositions of a sigmoidal function, Mathematics of
Control, Signals and Systems 2, 303–314.

Davis, R., A. W. Lo, S. Mishra, A. Nourian, M. Singh, N. Wu, and R. Zhang, 2023,
Explainable machine learning models of consumer credit risk, The Journal of Financial Data
Science 5, 9–39.

De Nicolò, G., and I. V. Ivaschenko, 2009, Global liquidity, risk premiums and growth opportunities,
CESifo Working Paper Series.

Dou, W. W., I. Goldstein, and Y. Ji, 2023, AI-powered trading, algorithmic collusion, and price
efficiency, Available at SSRN 4452704.

Easley, D., N. M. Kiefer, M. O'Hara, and J. B. Paperman, 1996, Liquidity, information, and
infrequently traded stocks, The Journal of Finance 51, 1405–1436.

Easley, D., M. López de Prado, M. O’Hara, and Z. Zhang, 2021, Microstructure in the machine
age, The Review of Financial Studies 34, 3316–3363.

Fong, K. Y., C. W. Holden, and C. A. Trzcinka, 2017, What are the best liquidity proxies for global
research?, Review of Finance 21, 1355–1401.

Foucault, T., O. Kadan, and E. Kandel, 2005, Limit order book as a market for liquidity, The
Review of Financial Studies 18, 1171–1217.

Friedman, J. H., 2001, Greedy function approximation: A gradient boosting machine, Annals of
Statistics 29, 1189–1232.

Fuster, A., P. Goldsmith-Pinkham, T. Ramadorai, and A. Walther, 2022, Predictably unequal?
The effects of machine learning on credit markets, The Journal of Finance 77, 5–47.

Glosten, L. R., and P. R. Milgrom, 1985, Bid, ask and transaction prices in a specialist market
with heterogeneously informed traders, Journal of Financial Economics 14, 71–100.

Goyenko, R. Y., C. W. Holden, and C. A. Trzcinka, 2009, Do liquidity measures measure liquidity?,
Journal of Financial Economics 92, 153–181.

Gu, S., B. Kelly, and D. Xiu, 2020, Empirical asset pricing via machine learning, The Review of
Financial Studies 33, 2223–2273.

Hagströmer, B., 2021, Bias in the effective bid-ask spread, Journal of Financial Economics 142,
314–337.

Hasbrouck, J., 2004, Liquidity in the futures pits: Inferring market dynamics from incomplete data,
Journal of Financial and Quantitative Analysis 39, 305–326.

He, K., X. Zhang, S. Ren, and J. Sun, 2016, Deep residual learning for image recognition, in
Proceedings of the IEEE conference on computer vision and pattern recognition, 770–778.

Henkel, S., 2008, Is global illiquidity contagious? Contagion and cross-market commonality in
liquidity, Unpublished working paper, Indiana University.

Hochreiter, S., and J. Schmidhuber, 1997, Long short-term memory, Neural Computation 9, 1735–
1780.

Hollifield, B., A. Neklyudov, and C. Spatt, 2017, Bid-ask spreads, trading networks, and the pricing
of securitizations, The Review of Financial Studies 30, 3048–3085.

Hornik, K., 1991, Approximation capabilities of multilayer feedforward networks, Neural Networks
4, 251–257.

Huck, N., 2019, Large data sets and machine learning: Applications to statistical arbitrage, Euro-
pean Journal of Operational Research 278, 330–342.

Huddleston, D., F. Liu, and L. Stentoft, 2023, Intraday market predictability: A machine learning
approach, Journal of Financial Econometrics 21, 485–527.

Jahan-Parvar, M. R., and F. Zikes, 2023, When do low-frequency measures really measure effective
spreads? Evidence from equity and foreign exchange markets, The Review of Financial Studies
36, 4190–4232.

Jiang, J., B. T. Kelly, and D. Xiu, 2023, (Re-)Imag(in)ing price trends, The Journal of Finance
78, 3193–3249.

Kolm, P. N., J. Turiel, and N. Westray, 2023, Deep order flow imbalance: Extracting alpha at
multiple horizons from the limit order book, Mathematical Finance 33, 1044–1081.

Kyle, A. S., 1985, Continuous auctions and insider trading, Econometrica 53, 1315–1335.

Leippold, M., Q. Wang, and W. Zhou, 2022, Machine learning in the Chinese stock market, Journal
of Financial Economics 145, 64–82.

Lesmond, D. A., J. P. Ogden, and C. A. Trzcinka, 1999, A new estimate of transaction costs, The
Review of Financial Studies 12, 1113–1141.

Li, S., X. Wang, and M. Ye, 2021, Who provides liquidity, and when?, Journal of Financial Eco-
nomics 141, 968–980.

Lundberg, S. M., and S.-I. Lee, 2017, A unified approach to interpreting model predictions, Ad-
vances in neural information processing systems 30.

O’Hara, M., G. Saar, and Z. Zhong, 2019, Relative tick size and the trading environment, The
Review of Asset Pricing Studies 9, 47–90.

Roll, R., 1984, A simple implicit measure of the effective bid-ask spread in an efficient market, The
Journal of Finance 39, 1127–1139.

Sadhwani, A., K. Giesecke, and J. Sirignano, 2021, Deep learning for mortgage risk, Journal of
Financial Econometrics 19, 313–368.

Schestag, R., P. Schuster, and M. Uhrig-Homburg, 2016, Measuring liquidity in bond markets, The
Review of Financial Studies 29, 1170–1219.

Shapley, L. S., 1997, A value for n-person games, Classics in Game Theory 69.

Sirignano, J., and R. Cont, 2019, Universal features of price formation in financial markets: per-
spectives from deep learning, Quantitative Finance 19, 1449–1459.

Sirignano, J. A., 2019, Deep learning for limit order books, Quantitative Finance 19, 549–570.

Tibshirani, R., 1996, Regression shrinkage and selection via the lasso, Journal of the Royal Statis-
tical Society Series B: Statistical Methodology 58, 267–288.

Online Appendix

A Microstructure Models
We provide more detailed descriptions for some of the microstructure models we use in this paper.
We use $H_t$, $L_t$, and $C_t$ to denote the high, low, and close price of a stock on day $t$. They can
refer to either raw prices or log prices depending on the design of the specific microstructure model.
For example, the Roll method of Roll (1984) is initially designed to estimate the spread, so the
$C_t$ used in that method refers to the raw close price. $S$ represents the spread. As a convention,
we always use the superscript $*$ to denote the unobserved (true) value of a quantity.
The targets of these microstructure models are either spread or relative spread. We extend
these models to estimate all three versions of the spread in (3)–(4) by replacing the raw price with
the log price or vice versa. For example, if we replace the close price with the log close price, we get
a version for relative spread. However, the Gibbs and LOT methods cannot be directly extended
to estimating the spread because they use returns as the input.

Roll's Method. Roll (1984) proposes to estimate the spread by:
$$\text{Roll} = 2\sqrt{-\mathrm{Cov}[\Delta C_t, \Delta C_{t-1}]}, \qquad (29)$$
where $\Delta C_t = C_t - C_{t-1}$ and $\mathrm{Cov}[\Delta C_t, \Delta C_{t-1}]$ is the first-order autocovariance of the change
in daily close prices. Note that $\mathrm{Cov}[\Delta C_t, \Delta C_{t-1}]$ may be positive in practice, which would require
taking the square root of a negative number. Following the literature, we floor $-\mathrm{Cov}[\Delta C_t, \Delta C_{t-1}]$
at zero when using real data.
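A minimal sketch of this estimator, including the zero floor, might look as follows; passing log close prices instead of raw prices yields the relative-spread version.

import numpy as np

def roll_spread(close):
    # Roll (1984) estimator over a window of daily close prices, eq. (29).
    dC = np.diff(np.asarray(close, dtype=float))
    cov = np.cov(dC[1:], dC[:-1])[0, 1]   # autocovariance of price changes
    return 2.0 * np.sqrt(max(-cov, 0.0))  # floor -Cov at zero, as above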

Lesmond, Ogden, and Trzcinka (LOT). Lesmond, Ogden, and Trzcinka (1999) propose a
model to estimate relative spread using daily returns, which is often referred to as the LOT method.
LOT assumes the following market model for the unobserved true returns:
$$R_t^* = \beta R_{mt} + \varepsilon_t,$$
where $R_{mt}$ is the market return on day $t$, $R_t^*$ is the unobserved true return of the stock on day $t$,
$\beta$ is the regression coefficient, and $\varepsilon_t$ is the residual error with variance $\sigma^2$.
Investors buy when the expected return exceeds the spread and sell when the expected return
is lower than the negative spread. As a result, the existence of trading costs leads to trading days
with zero returns. Let $\alpha_1 \le 0$ represent the proportional loss due to the spread when selling a stock,
and $\alpha_2 \ge 0$ the proportional loss due to the spread when buying a stock. The observed

returns, $R_t$, are therefore given by:
$$R_t = \begin{cases} R_t^* - \alpha_1, & R_t^* \le \alpha_1, \\ R_t^*, & \alpha_1 < R_t^* < \alpha_2, \\ R_t^* - \alpha_2, & \alpha_2 \le R_t^*. \end{cases}$$

Lesmond, Ogden, and Trzcinka (1999) use the following LOT estimator for the spread:
$$\text{LOT} = \alpha_2 - \alpha_1, \qquad (30)$$

where $\alpha_1$ and $\alpha_2$ are estimated by maximizing the following log-likelihood function on day $t$:
$$\begin{aligned} L(\alpha_1, \alpha_2, \beta, \sigma \mid R_t, R_{mt}) ={}& \mathbf{1}\{R_t = 0\} \cdot \log\left[ \Phi\!\left(\frac{\alpha_2 - \beta R_{mt}}{\sigma}\right) - \Phi\!\left(\frac{\alpha_1 - \beta R_{mt}}{\sigma}\right) \right] \\ &+ \mathbf{1}\{R_{mt} < 0,\, R_t \neq 0\} \cdot \log\left[ \frac{1}{\sigma}\, \phi\!\left(\frac{R_t + \alpha_1 - \beta R_{mt}}{\sigma}\right) \right] \qquad (31) \\ &+ \mathbf{1}\{R_{mt} > 0,\, R_t \neq 0\} \cdot \log\left[ \frac{1}{\sigma}\, \phi\!\left(\frac{R_t + \alpha_2 - \beta R_{mt}}{\sigma}\right) \right], \end{aligned}$$

where $\Phi(\cdot)$ and $\phi(\cdot)$ are the cumulative distribution function and density function of the standard
normal distribution, respectively. The log-likelihood function consists of three parts that correspond
to (a) $R_t = 0$, (b) $R_t \neq 0$ and $R_{mt} < 0$, and (c) $R_t \neq 0$ and $R_{mt} > 0$.
Goyenko, Holden, and Trzcinka (2009) develop a different version of LOT, commonly referred
to as the LOT Y-split, which maximizes the likelihood function:
$$\begin{aligned} L'(\alpha_1, \alpha_2, \beta, \sigma \mid R_t, R_{mt}) ={}& \mathbf{1}\{R_t = 0\} \cdot \log\left[ \Phi\!\left(\frac{\alpha_2 - \beta R_{mt}}{\sigma}\right) - \Phi\!\left(\frac{\alpha_1 - \beta R_{mt}}{\sigma}\right) \right] \\ &+ \mathbf{1}\{R_t < 0\} \cdot \log\left[ \frac{1}{\sigma}\, \phi\!\left(\frac{R_t + \alpha_1 - \beta R_{mt}}{\sigma}\right) \right] \qquad (32) \\ &+ \mathbf{1}\{R_t > 0\} \cdot \log\left[ \frac{1}{\sigma}\, \phi\!\left(\frac{R_t + \alpha_2 - \beta R_{mt}}{\sigma}\right) \right]. \end{aligned}$$

This likelihood function is similar to that in (31) except that the three parts in L′ are defined by
stock returns rather than market returns: (a) Rt = 0, (b) Rt < 0, and (c) Rt > 0. To distinguish
between the two methods, we refer to the original LOT estimator as LOT X-split henceforth.
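For concreteness, a sketch of the X-split estimation via numerical maximum likelihood is shown below; the starting values, bounds, and use of scipy are illustrative assumptions rather than our exact implementation. The Y-split version would simply replace the indicator conditions on the market return with the sign of the stock return.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def lot_xsplit(r, rm):
    # LOT X-split sketch: maximize the likelihood in (31), return alpha2 - alpha1.
    r, rm = np.asarray(r, dtype=float), np.asarray(rm, dtype=float)
    zero = r == 0
    neg = (rm < 0) & ~zero
    pos = (rm > 0) & ~zero
    def negloglik(theta):
        a1, a2, beta, sigma = theta
        p0 = (norm.cdf((a2 - beta * rm[zero]) / sigma)
              - norm.cdf((a1 - beta * rm[zero]) / sigma))
        ll = np.sum(np.log(np.clip(p0, 1e-300, None)))           # zero-return days
        ll += np.sum(norm.logpdf((r[neg] + a1 - beta * rm[neg]) / sigma)
                     - np.log(sigma))                             # R_mt < 0 days
        ll += np.sum(norm.logpdf((r[pos] + a2 - beta * rm[pos]) / sigma)
                     - np.log(sigma))                             # R_mt > 0 days
        return -ll
    res = minimize(negloglik, x0=(-0.01, 0.01, 1.0, 0.02),
                   bounds=[(None, 0.0), (0.0, None), (None, None), (1e-6, None)])
    a1, a2 = res.x[:2]
    return a2 - a1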

Gibbs Method. Hasbrouck (2004) proposes a method to estimate the relative spread based on
Gibbs sampling. The method assumes the following model for the true close price:
$$C_t^* = C_{t-1}^* + \varepsilon_t, \qquad \varepsilon_t \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2),$$
where $\varepsilon_t$ is the noise term interpreted as innovations in public information. The observed close

price is therefore given by
$$C_t = \begin{cases} C_t^* - S/2, & q_t = -1, \\ C_t^* + S/2, & q_t = 1, \end{cases}$$

where $q_t \in \{-1, 1\}$ denotes the direction of trade. Based on this model, we have $\Delta C_t = S \Delta q_t / 2 + \varepsilon_t$,
from which it follows that
$$\mathrm{Var}[\Delta C_t] = \sigma^2 + S^2/2, \qquad \mathrm{Cov}[\Delta C_t, \Delta C_{t-1}] = -S^2/4.$$

To apply Gibbs sampling, one needs to characterize the posterior $F(S, \sigma, q \mid C)$. Here, $C =
\{C_1, \cdots, C_T\}$ is a sequence of observed close prices, $q = \{q_1, \cdots, q_T\}$ is a sequence of trade
directions to be estimated, and $S$ and $\sigma$ are two parameters to be estimated. Hasbrouck (2004)
applies the Gibbs sampler
$$S^{(j+1)} \sim f(S \mid \sigma^{(j)}, q^{(j)}, C),$$
$$\sigma^{(j+1)} \sim f(\sigma \mid S^{(j+1)}, q^{(j)}, C),$$
$$q^{(j+1)} \sim f(q \mid S^{(j+1)}, \sigma^{(j+1)}, C),$$
to iteratively update the estimates of $S$, $\sigma$, and $q$, where the superscript $j$ denotes the $j$-th iteration.
The prior of $q$ is a two-point distribution, the prior of the spread $S$ is a truncated normal distribution,
and the prior of $\sigma^2$ is an inverse Gamma distribution. For example, Hasbrouck (2004) chooses

$$S \sim \text{truncN}(0, \infty, 10^6), \qquad \sigma^2 \sim \text{InvGamma}(10^{-12}, 10^{-12}),$$

where $\text{truncN}(0, \infty, 10^6)$ stands for the normal distribution truncated from $0$ to $\infty$, and InvGamma
stands for the inverse Gamma distribution.

High-Low (HL). Corwin and Schultz (2012) propose a method to estimate the relative spread
using daily high and low prices. The intuition behind their method is that the high price is typically
initiated by a market buy order and the low price by a market sell order; hence, the high-low range
of a trading day equals the true range plus the spread. Corwin and Schultz (2012) assume that
the true stock price follows a Geometric Brownian motion and calculate the spread by solving a
set of equations.
Using our notation, it follows that
$$[\ln(H_t / L_t)]^2 = \left[ \ln\!\left( \frac{H_t^*\,(1 + S/2)}{L_t^*\,(1 - S/2)} \right) \right]^2, \qquad (33)$$

and
$$\mathbb{E}\left\{ \frac{1}{T} \sum_{t=1}^{T} \left[\ln(H_t^*/L_t^*)\right]^2 \right\} = k_1 \sigma^2, \qquad \mathbb{E}\left\{ \frac{1}{T} \sum_{t=1}^{T} \ln(H_t^*/L_t^*) \right\} = k_2 \sigma, \qquad (34)$$
where $\sigma$ is the diffusion coefficient of the continuous price process, $k_1 = 4 \ln 2$, and $k_2 = \sqrt{8/\pi}$.

Denote
$$\alpha = \ln\!\left(\frac{2+S}{2-S}\right), \quad \beta = \mathbb{E}\left\{ \frac{1}{T} \sum_{j=0}^{1} \left[\ln(H_{t+j}/L_{t+j})\right]^2 \right\}, \quad \gamma = \mathbb{E}\left\{ \frac{1}{T} \left[\ln(H_{t,t+1}/L_{t,t+1})\right]^2 \right\}, \qquad (35)$$
where $H_{t,t+1}$ and $L_{t,t+1}$ represent the high and low prices over the two consecutive days $t$ and $t+1$,
respectively. Corwin and Schultz (2012) derive an analytical solution for $\alpha$ by solving (33), (34),
and (35):
$$\alpha = \frac{\sqrt{2\beta} - \sqrt{\beta}}{3 - 2\sqrt{2}} - \sqrt{\frac{\gamma}{3 - 2\sqrt{2}}}. \qquad (36)$$
As a result, the spread can be estimated by:
$$S = \frac{2(e^\alpha - 1)}{e^\alpha + 1}. \qquad (37)$$
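A sketch of this computation over a window of daily high and low prices follows. It averages $\beta$ and $\gamma$ across overlapping two-day periods and omits the overnight-adjustment refinements that Corwin and Schultz (2012) also discuss, so it should be read as illustrative.

import numpy as np

def hl_spread(high, low):
    # Corwin-Schultz (2012) estimator following (35)-(37).
    h = np.log(np.asarray(high, dtype=float))
    l = np.log(np.asarray(low, dtype=float))
    beta = np.mean((h[:-1] - l[:-1]) ** 2 + (h[1:] - l[1:]) ** 2)
    h2 = np.maximum(h[:-1], h[1:])        # two-day high
    l2 = np.minimum(l[:-1], l[1:])        # two-day low
    gamma = np.mean((h2 - l2) ** 2)
    denom = 3.0 - 2.0 * np.sqrt(2.0)
    alpha = (np.sqrt(2.0 * beta) - np.sqrt(beta)) / denom - np.sqrt(gamma / denom)
    return 2.0 * (np.exp(alpha) - 1.0) / (np.exp(alpha) + 1.0)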

Close-High-Low (CHL). Abdi and Ranaldo (2017) propose a method to estimate the relative
spread using daily high, low, and close prices. Similar to the HL method, CHL assumes that
the true log price follows a Geometric Brownian motion with diffusion coefficient $\sigma$, and
$$H_t = H_t^* + \frac{S}{2}, \qquad L_t = L_t^* - \frac{S}{2}. \qquad (38)$$

Let $\eta_t = (H_t + L_t)/2$. Abdi and Ranaldo (2017) show that
$$\mathbb{E}\left[ C_t - \frac{\eta_t + \eta_{t+1}}{2} \right] = 0, \qquad \mathbb{E}\left[ \left( C_t - \frac{\eta_t + \eta_{t+1}}{2} \right)^2 \right] = \frac{S^2}{4} + \left( \frac{1}{2} - \frac{k_1}{8} \right) \sigma^2, \qquad (39)$$
and
$$\mathbb{E}\left[ (\eta_{t+1} - \eta_t)^2 \right] = \left( 2 - \frac{k_1}{2} \right) \sigma^2, \qquad (40)$$
with $k_1 = 4 \ln 2$. Combining (39) and (40) yields $S^2 = 4\,\mathbb{E}[(C_t - \eta_t)(C_t - \eta_{t+1})]$. Using our
notation, it follows that
$$\text{CHL} = \sqrt{\mathbb{E}\left[ (2C_t - H_t - L_t)(2C_t - H_{t+1} - L_{t+1}) \right]}. \qquad (41)$$
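A sketch of the CHL estimator over a window of daily prices is given below; flooring the sample moment at zero before taking the square root is a common implementation choice rather than part of the original derivation.

import numpy as np

def chl_spread(close, high, low):
    # Abdi-Ranaldo (2017) CHL estimator, eq. (41), using log prices.
    c = np.log(np.asarray(close, dtype=float))
    eta = (np.log(np.asarray(high, dtype=float))
           + np.log(np.asarray(low, dtype=float))) / 2.0
    # Sample analogue of E[(c_t - eta_t)(c_t - eta_{t+1})]; note that
    # 2 * sqrt of this moment equals the square root in (41).
    m = np.mean((c[:-1] - eta[:-1]) * (c[:-1] - eta[1:]))
    return 2.0 * np.sqrt(max(m, 0.0))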

B Additional Results for the Performance of Microstructure Models
We present the performance of the eight microstructure models in Figure 7 for the US market, and
Figure 8 for the Chinese market. In general, Roll, HL, and CHL perform better than the other
microstructure models.

Figure 7: Performance of microstructure models in the US market. Each subfigure corresponds to
a specific measure of liquidity, with the values represented in percentage correlations. For example,
the final subfigure illustrates that the Roll method estimates Kyle’s lambda on trade price with a
time-series correlation of 21% in the US market.

Figure 8: Performance of microstructure models in the Chinese market. Each subfigure corresponds
to a specific measure of liquidity, with the values represented in percentage correlations. For
example, the final subfigure illustrates that the Roll method estimates Kyle’s lambda on trade
price with a time-series correlation of 8% in the Chinese market.

C Additional Results for the Performance of Machine Learning Models
Monthly resolution. The cross-sectional and time series performance of machine learning mod-
els in the US market, when liquidity is estimated at a monthly resolution, are shown in Figures 9
and 10, respectively. Overall, machine learning models that combine microstructure models and
raw features significantly outperform linear models with only microstructure models.
The cross-sectional and time series performance of machine learning models in the Chinese mar-
ket when liquidity is estimated at a monthly resolution are shown in Figures 11 and 12, respectively.

Daily resolution. While results in Figures 9 and 10 demonstrate the potential performance
improvements compared to microstructure models, machine learning models have the additional
benefit of allowing liquidity estimation at a much higher (daily) resolution. As mentioned in
Section 3.4.1, microstructure models require data from several consecutive trading days (e.g., 20
days) and are typically used to estimate monthly liquidity measures in the literature; see, for
example, Goyenko, Holden, and Trzcinka (2009) and Fong, Holden, and Trzcinka (2017). However,
our machine learning approach can be easily applied to estimate daily liquidity measures because
our target variables are available daily.
Figures 13–16 report the parallel set of performance results for all machine learning models in
the US and Chinese markets when the target variable is estimated at a daily resolution. Overall,
although correlations are slightly lower than the monthly results, the performance remains robust
and can be very useful in practice.

D Shapley Values
We present the SHAP summary plots of GR-Net for the ten liquidity measures in Figures 17–26.

Figure 9: Cross-sectional performance of machine learning models in the US market when liquidity
is estimated at a monthly resolution. Each subfigure corresponds to a machine learning model
applied to different liquidity measures (in columns) and three sets of input features (in rows), with
values represented in percentage correlations. For example, the final subfigure shows that GR-Net
estimates the spread with a cross-sectional correlation of 92% when using both microstructure and
raw features.

Figure 10: Time series performance of machine learning models in the US market when liquidity
is estimated at a monthly resolution. Each subfigure corresponds to a machine learning model
applied to different liquidity measures (in columns) and three sets of input features (in rows), with
values represented in percentage correlations. For example, the final subfigure shows that GR-Net
estimates the spread with a time series correlation of 47% when using both microstructure and raw
features.

Figure 11: Cross-sectional performance of machine learning models in the Chinese market when
liquidity is estimated at a monthly resolution. Each subfigure corresponds to a machine learning
model applied to different liquidity measures (in columns) and three sets of input features (in rows),
with values represented in percentage correlations. For example, the final subfigure shows that GR-
Net estimates the spread with a cross-sectional correlation of 87% when using both microstructure
and raw features.

Figure 12: Time series performance of machine learning models in the Chinese market when liquid-
ity is estimated at a monthly resolution. Each subfigure corresponds to a machine learning model
applied to different liquidity measures (in columns) and three sets of input features (in rows), with
values represented in percentage correlations. For example, the final subfigure shows that GR-Net
estimates the spread with a time series correlation of 49% when using both microstructure and raw
features.

Figure 13: Cross-sectional performance of machine learning models in the US market when liq-
uidity is estimated at a daily resolution. Each subfigure corresponds to a machine learning model
applied to different liquidity measures (in columns) and three sets of input features (in rows), with
values represented in percentage correlations. For example, the final subfigure shows that GR-Net
estimates the spread with a cross-sectional correlation of 86% when using both microstructure and
raw features.

Figure 14: Time series performance of machine learning models in the US market when liquidity is
estimated at a daily resolution. Each subfigure corresponds to a machine learning model applied
to different liquidity measures (in columns) and three sets of input features (in rows), with values
represented in percentage correlations. For example, the final subfigure shows that GR-Net esti-
mates the spread with a time series correlation of 31% when using both microstructure and raw
features.

Figure 15: Cross-sectional performance of machine learning models in the Chinese market when
liquidity is estimated at a daily resolution. Each subfigure corresponds to a machine learning model
applied to different liquidity measures (in columns) and three sets of input features (in rows), with
values represented in percentage correlations. For example, the final subfigure shows that GR-Net
estimates the spread with a cross-sectional correlation of 83% when using both microstructure and
raw features.

Figure 16: Time series performance of machine learning models in the Chinese market when liq-
uidity is estimated at a daily resolution. Each subfigure corresponds to a machine learning model
applied to different liquidity measures (in columns) and three sets of input features (in rows), with
values represented in percentage correlations. For example, the final subfigure shows that GR-Net
estimates the spread with a time series correlation of 27% when using both microstructure and raw
features.

(a) US; spread (b) Chinese; spread

Figure 17: SHAP summary plots for GR-Net on estimating spread. In each subfigure, each row
represents the distribution of SHAP values for a feature across all samples. The horizontal axis
shows the SHAP value, and different colors represent different values of the feature.

(a) US; relative spread (b) Chinese; relative spread

Figure 18: SHAP summary plots for GR-Net on estimating relative spread. In each subfigure, each
row represents the distribution of SHAP values for a feature across all samples. The horizontal axis
shows the SHAP value, and different colors represent different values of the feature.

(a) US; effective spread (b) Chinese; effective spread

Figure 19: SHAP summary plots for GR-Net on estimating effective spread. In each subfigure, each
row represents the distribution of SHAP values for a feature across all samples. The horizontal axis
shows the SHAP value, and different colors represent different values of the feature.

(a) US; relative effective spread (b) Chinese; relative effective spread

Figure 20: SHAP summary plots for GR-Net on estimating relative effective spread. In each
subfigure, each row represents the distribution of SHAP values for a feature across all samples.
The horizontal axis shows the SHAP value, and different colors represent different values of the
feature.

(a) US; realized spread (b) Chinese; realized spread

Figure 21: SHAP summary plots for GR-Net on estimating realized spread. In each subfigure, each
row represents the distribution of SHAP values for a feature across all samples. The horizontal axis
shows the SHAP value, and different colors represent different values of the feature.

(a) US; relative realized spread (b) Chinese; relative realized spread

Figure 22: SHAP summary plots for GR-Net on estimating relative realized spread. In each
subfigure, each row represents the distribution of SHAP values for a feature across all samples.
The horizontal axis shows the SHAP value, and different colors represent different values of the
feature.

(a) US; mid-price impact (b) Chinese; mid-price impact

Figure 23: SHAP summary plots for GR-Net on estimating mid-price impact. In each subfigure,
each row represents the distribution of SHAP values for a feature across all samples. The horizontal
axis shows the SHAP value, and different colors represent different values of the feature.

(a) US; relative mid-price impact (b) Chinese; relative mid-price impact

Figure 24: SHAP summary plots for GR-Net on estimating relative mid-price impact. In each
subfigure, each row represents the distribution of SHAP values for a feature across all samples.
The horizontal axis shows the SHAP value, and different colors represent different values of the
feature.

(a) US; Kyle’s lambda on mid-price (b) Chinese; Kyle’s lambda on mid-price

Figure 25: SHAP summary plots for GR-Net on estimating Kyle’s lambda on mid-price. In each
subfigure, each row represents the distribution of SHAP values for a feature across all samples. The
horizontal axis shows the SHAP value, and different colors represent different values of the feature.

(a) US; Kyle’s lambda on trade price (b) Chinese; Kyle’s lambda on trade price

Figure 26: SHAP summary plots for GR-Net on estimating Kyle’s lambda on trade price. In each
subfigure, each row represents the distribution of SHAP values for a feature across all samples. The
horizontal axis shows the SHAP value, and different colors represent different values of the feature.

