
Finance-Grounded Optimization For Algorithmic Trading

Kasymkhan Khubiyev∗
Sirius University of Science and Technology, Sirius, Russia
[email protected]

Mikhail Semenov
Sirius University of Science and Technology, Sirius, Russia
[email protected]

Irina Vyacheslavovna Podlipnova
Sirius University of Science and Technology, Sirius, Russia
Moscow Institute of Physics and Technology, Moscow, Russia
[email protected], [email protected]

arXiv:2509.04541v1 [cs.LG] 4 Sep 2025

September 8, 2025

Preprint for the ICOMP 2025: International Conference on Computational Optimization

Abstract
Deep learning is evolving fast and is being integrated into various domains. Finance is a challenging field for deep learning, especially when interpretable artificial intelligence (AI) is required. Although classical approaches perform very well in natural language processing, computer vision, and forecasting, they are not a perfect fit for the financial world, where specialists use different metrics to evaluate model performance.
We first introduce financially grounded loss functions derived from key quantitative finance metrics, including the Sharpe ratio, Profit-and-Loss (PnL), and Maximum Drawdown. Additionally, we propose turnover regularization, a method that inherently constrains the turnover of generated positions within predefined limits.
Our findings demonstrate that the proposed loss functions, in conjunction with turnover regularization, outperform the traditional mean squared error loss for return prediction tasks when evaluated with algorithmic trading metrics. The study shows that financially grounded metrics enhance predictive performance in trading strategies and portfolio optimization.

1 Introduction
Deep learning (DL) is evolving fast and is being integrated into various domains, affecting both complex problems and the routine of daily life. Finance is an ongoing challenge for deep learning due to its domain specificity. Many tasks in finance can be approached with deep learning, and algorithmic trading is one of them. The main goal of algorithmic trading is to discover new signals in various data flows in order to build strategies that increase profits.
Large language models (LLMs) have succeeded in various tasks, including solving mathematical problems. Chain-of-Thought (CoT) prompting [1] and reasoning coupled with agentic architectures [2] help achieve the best results. There have been many attempts to fit LLMs for stock price prediction [6, 7, 8], ranging from zero-shot and few-shot learning to LLM supervised finetuning (SFT). For example, Zhang et al. [6] proposed FinGPT, a GPT-like model fine-tuned on financial data that outperforms general-purpose LLMs in tasks where domain-specific numerical understanding is crucial. Yu et al. [7] proposed a framework for building explainable multimodal forecasting models based on LLMs. They highlighted that LLMs struggle with numerical data and addressed the issue with a discrete-bins embedding technique. Lopez-Lira and Tang [8] used ChatGPT to forecast price movements and built a simple trading strategy on top of the forecast to demonstrate the LLM's capabilities. In our previous study [9] we focused on multimodal approaches to stock price prediction: we used an LLM as an embedding model to vectorize the news flow and concatenated it with time series. We showed that news flow embedded directly into candlestick time series improves forecasting performance, reduces the average prediction error, and in most tasks outperforms the backbone model, a long short-term memory recurrent network (LSTM). Another approach is reinforcement learning (RL), which is widely used in robotics and recently demonstrated its power with the highly efficient LLM DeepSeek-R1 [3]. The key to efficient training with RL is a robust and effective reward model.
In all the studies mentioned above, the authors used standard DL optimization tools and frameworks for regression and classification tasks. Although the standard methods have shown their robustness in various scenarios, financial evaluation and quality metrics differ from those used in classical problems. For example, Mean Squared Error (MSE) is the top-choice loss function for regression tasks, but from a financial perspective the MSE is not informative, because finance experts rely on other metrics for evaluation and decision making. That is why implementing finance-grounded metrics might benefit forecasting performance and improve the interpretation of model decisions, a step toward trustworthy AI in finance. For example, the authors of [4], applying RL to design algorithmic trading strategies, used the Profit-and-Loss (PnL) metric as a key component of the reward policy and the Sharpe ratio to select top-performing strategies on a historical interval. The authors of the DianJin-R1 model [5], an LLM specialized for finance, focused on how the model argues and interprets its responses. To train the model to reason and respond with CoT, the authors used the following datasets: the CFLUE dataset with 38 thousand finance exam questions in Chinese, the FinQA dataset with 8 thousand financial report questions with numeric answers in English, and the CCC (Chinese Compliance Check) dataset to ensure model safety. They used GPT-4o to filter questions by difficulty and ambiguity and to compare the model output with the ground truth value to compute the reward value for RL. The final language model outperforms multi-agent systems, which tend to spend more tokens to solve the same problem. The model scrutinizes, analyzes, and assesses market data and events, having some situational knowledge. Although the model has strong financial and economic knowledge, it was not trained for algorithmic trading or to use quantitative tools.
The current paper aims to propose loss functions based on finance fundamentals for algorithmic trading strategies and portfolio management. The proposed functions can be used directly to generate positions, as we show in this paper, or as part of reward policies.
The key contributions of this paper are as follows.
1. We use finance-grounded loss functions such as SharpeLoss, MaxDrawDownLoss, and PnLLoss that are fundamentally better suited to financial time-series forecasting.

2. We propose turnover regularization, which implicitly controls a strategy's turnover during training.

3. We adapt the proposed functions to portfolio management tasks.

The paper is organized as follows. In Section 2, we briefly describe the original dataset and explore it. In Section 3, we present the research methodology, covering evaluation metrics and custom loss functions. In Section 4, we describe the experiment setup and pipelines and introduce the model architectures and algorithmic trading strategies. In Section 5, we report the results of the computational experiments. Finally, Section 6 concludes the paper and discusses future work.

2 Data
In this section we discuss the data and sources needed to perform the experiments, describing the key features of the data and justifying the choice of sources. Because of the data quality requirements and the specificity of the experiments, we used time series and statistics data from Binance, one of the most popular centralized exchanges (CEX). Binance provides an API for downloading high-frequency market data.
To perform the experiments, we first chose the backtest time interval: from January 1, 2022 until July 1, 2025. To select coins, we looked for coins that were listed no later than 2021 and had not been delisted at the time of the experiment in 2025. We skipped the data for 2021 because the market had a dramatic price change from 2021 to 2022, with a median change of 432.42% (Table 1). There were 61 coins that satisfied these requirements. We downloaded market data at three frequencies: daily, hourly, and fifteen minutes. The API provides the following data: close, open, high, and low prices, base and quote asset volumes, taker buy base and quote asset volumes, and the number of trades.
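As an illustration of how such data can be collected, the following sketch queries the public Binance klines endpoint with the requests library. It is a minimal example, not the exact pipeline used in this study; the symbol, interval, and column handling are assumptions for demonstration.

```python
# Minimal sketch (not the study's exact pipeline): fetch daily candlesticks
# for one symbol from the public Binance klines endpoint.
import requests
import pandas as pd

KLINES_URL = "https://api.binance.com/api/v3/klines"  # public market-data endpoint

def fetch_klines(symbol="BTCUSDT", interval="1d", limit=500):
    """Download up to `limit` candles and keep the fields mentioned above."""
    params = {"symbol": symbol, "interval": interval, "limit": limit}
    rows = requests.get(KLINES_URL, params=params, timeout=10).json()
    cols = ["open_time", "open", "high", "low", "close", "base_volume",
            "close_time", "quote_volume", "trades",
            "taker_buy_base_volume", "taker_buy_quote_volume", "ignore"]
    df = pd.DataFrame(rows, columns=cols)
    df["open_time"] = pd.to_datetime(df["open_time"], unit="ms")
    num_cols = ["open", "high", "low", "close", "base_volume", "quote_volume"]
    df[num_cols] = df[num_cols].astype(float)
    return df.drop(columns=["ignore"])

daily = fetch_klines("BTCUSDT", "1d", 100)
print(daily[["open_time", "close", "trades"]].tail())
```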

Years        Absolute price change, %

2021–2022    432.41
2022–2023    45.20
2023–2024    1.06
2024–2025    90.00

Table 1: Median annual percentage price change of the selected coins.

3 Methodology
We examined different financial data modalities from the perspective of algorithmic trading strategies (alphas) [10]. For candlestick data we build alphas using heuristics, machine learning, and deep learning models, comparing the influence of custom finance-grounded loss functions on the alphas' execution results.
3.1 Evaluation Metrics
Although the collected dataset contains 15-minute data points, for the algorithmic trading strategies (alphas) we use a conservative setup and rebalance the portfolio once a day. The choice of trading frequency is not arbitrary: in the current research we focus on medium-frequency trading strategies with a rigid constraint that any portfolio rebalancing orders must be executed within the time interval between two consecutive time points determined by the data frequency.
We use classical trading strategies built only on heuristics as baseline strategies, namely reversion, momentum, mean reversion [10], and a conservative buy-and-hold (Buy&Hold) alpha:

Reversion(d) = −r(d − 1),
Momentum = MA(r(d), w),        (1)
Mean Reversion = −MA(r(d), w),

where d indicates the day index, MA is a moving average with window size w, and r(d) is the return, i.e. the change in the price of an asset relative to its previous value:

r(d) = p(d) / p(d − 1) − 1.        (2)

The return is a scaling transformation that makes it possible to compare assets in relative terms. Let us call the set of stock tickers observed by a specific alpha its universe. Each alpha consists of vectors of positions whose length equals the number of assets included in the current universe. Behind the reversion alpha stands the idea that if the current trend is upward, the price will later decrease. Momentum stands for the idea that if the market is in a growth stage, it will keep growing for a while. Mean reversion trades against the mean: if the price deviates from its mean, it will later return to it. The Buy&Hold strategy is simple: we buy assets into the portfolio in fixed proportions and hold them unchanged.
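As a minimal illustration, the heuristic alphas of Eq. (1) and the returns of Eq. (2) can be written in a few lines of NumPy; the toy price paths and the window size w = 5 below are assumptions for demonstration only.

```python
# Sketch of the heuristic alphas in Eqs. (1)-(2); prices is a (days x assets) array.
import numpy as np

def returns(prices):
    """Eq. (2): r(d) = p(d) / p(d - 1) - 1, computed per asset."""
    return prices[1:] / prices[:-1] - 1.0

def moving_average(r, w):
    """Trailing moving average of returns over the last w days, per asset."""
    kernel = np.ones(w) / w
    return np.apply_along_axis(lambda x: np.convolve(x, kernel, mode="valid"), 0, r)

rng = np.random.default_rng(0)
prices = np.cumprod(1 + 0.01 * rng.standard_normal((100, 5)), axis=0)  # toy price paths
r = returns(prices)

reversion = -r                               # Reversion(d) = -r(d - 1), shifted at execution
momentum = moving_average(r, w=5)            # Momentum = MA(r(d), w)
mean_reversion = -momentum                   # Mean Reversion = -MA(r(d), w)
buy_and_hold = np.ones(prices.shape[1]) / prices.shape[1]   # equal, static positions
```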
For ML-based strategies we used linear regression. For DL models we used a multilayer perceptron (MLP) as a baseline and long short-term memory recurrent networks (LSTM). For all models we performed both singular and ensemble forecasting. In the singular setting, each model predicts a vector whose length equals the number of included asset tickers; in the ensemble setting, we obtain a prediction for each asset and then aggregate the models' outputs into a single vector as the final result.
To create training samples we used data at all three frequencies. Firstly, we transformed close prices into returns via equation (2), with the daily return as the target value. Secondly, with a sliding window of 20 days we subsampled data points: the first 14 days contain daily returns, the next 3 days contain hourly returns, and the last 3 days contain 15-minute returns. We assumed that the closer to the execution date, the more frequent the data points must be, because the model can capture local short-term trends from more frequent data and long-term trends from less frequent, earlier data points. Because the order of magnitude of price changes in daily, hourly, and 15-minute candles might differ dramatically, we normalize the aggregated return vectors via min-max scaling so that the data points lie within the interval [0, 1].
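A minimal sketch of this sample construction is given below. The window boundaries follow the description above (14 daily, 3 hourly, and 3 fifteen-minute days), but the array layouts and helper names are illustrative assumptions rather than the exact preprocessing code of the study.

```python
# Sketch: build one training sample that mixes 14 days of daily returns,
# 3 days of hourly returns and 3 days of 15-minute returns, min-max scaled
# to [0, 1]; the next day's daily return is the target.
import numpy as np

def minmax(x, eps=1e-12):
    return (x - x.min()) / (x.max() - x.min() + eps)

def build_sample(daily_r, hourly_r, m15_r, end_day):
    """daily_r: (days,), hourly_r: (days, 24), m15_r: (days, 96) returns of one asset."""
    feats = np.concatenate([
        daily_r[end_day - 20 : end_day - 6],           # first 14 days: daily returns
        hourly_r[end_day - 6 : end_day - 3].ravel(),   # next 3 days: hourly returns
        m15_r[end_day - 3 : end_day].ravel(),          # last 3 days: 15-minute returns
    ])
    return minmax(feats), daily_r[end_day]             # features, next-day target

rng = np.random.default_rng(0)
daily = 0.01 * rng.standard_normal(30)
hourly = 0.003 * rng.standard_normal((30, 24))
m15 = 0.001 * rng.standard_normal((30, 96))
x, y = build_sample(daily, hourly, m15, end_day=25)
print(x.shape, y)    # (374,) feature vector and the target return
```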
To compare the execution results of different alphas we use the following metrics: Sharpe ratio, PnL, maximum drawdown, and turnover:

Sharpe ratio = √N · E(pnl) / σ(pnl),        (3)

where E(x), σ(x) are the expected value and standard deviation of a random variable x respectively, pnl = (α_1 r_1, α_2 r_2, ..., α_N r_N) is the profit-and-loss vector, α = (α_1, ..., α_M) and r = (r_1, ..., r_M) are the vectors of predicted and historical returns respectively, N is the forecasting horizon length, and M is the number of stocks.

PnL = αr = Σ_{i=1..N} pnl_i,        (4)

Maximum drawdown = min(cumsum(pnl) − cummax(pnl)),        (5)

Turnover = Σ_{i=1..N} |α_i(d) − α_i(d − 1)|,        (6)

where cummax() and cumsum() are the cumulative maximum and cumulative sum of a given profit-and-loss vector, respectively.
The Sharpe ratio represents how consistently a given alpha earns: the greater the Sharpe ratio, the smoother the cumulative PnL curve and the more consistently the strategy earns. A high Sharpe ratio does not imply a huge profit, but it does imply fewer periods in which the alpha loses money. The maximum drawdown indicates the largest loss of the strategy, while PnL shows the final profit relative to the initial bank account. These metrics are correlated; for example, large drawdowns lead to a lower Sharpe ratio.
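A minimal sketch of these evaluation metrics, assuming pnl is the daily profit-and-loss series and positions is a (days × assets) matrix of α values, is given below; averaging the daily turnover of Eq. (6) over days is an assumption.

```python
# Sketch of the backtest metrics (3)-(6); pnl is the daily profit-and-loss
# series, positions is a (days x assets) matrix of generated alpha positions.
import numpy as np

def sharpe_ratio(pnl):
    # Eq. (3): sqrt(N) * E(pnl) / sigma(pnl)
    return np.sqrt(len(pnl)) * pnl.mean() / (pnl.std() + 1e-12)

def total_pnl(pnl):
    # Eq. (4): sum of the daily profit-and-loss values
    return pnl.sum()

def max_drawdown(pnl):
    # Eq. (5): deepest drop of cumulative PnL below its running maximum
    cum = np.cumsum(pnl)
    return float(np.min(cum - np.maximum.accumulate(cum)))

def avg_turnover(positions):
    # Eq. (6) averaged over days: mean absolute day-over-day position change
    return float(np.abs(np.diff(positions, axis=0)).sum(axis=1).mean())

rng = np.random.default_rng(0)
pnl = 0.001 * rng.standard_normal(250)
positions = rng.standard_normal((250, 61))
print(sharpe_ratio(pnl), total_pnl(pnl), max_drawdown(pnl), avg_turnover(positions))
```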
3.2 Custom Loss Functions
For the regression task we use the mean squared error loss function (MSELoss), which is the common default choice. However, we are aware that standard ML losses do not exactly match financial time-series forecasting. To address this issue we propose custom losses that are strongly associated with trading features and results: the Sharpe ratio (SharpeLoss), maximum drawdown (MDDLoss), and Profit-and-Loss (PnLLoss) losses.
We define the Sharpe ratio as stated in Eq. (3) and propose the following modifications: firstly, we remove the √N factor, which does not affect optimization but only reflects the batch size; secondly, we add an extra factor to the loss that penalizes deviation from the ground truth value:

SharpeLoss = E(pnl) / (σ(pnl) + ϵ).        (7)
We also implemented PnL (PnLLoss) (8), risk-adjusted (RiskAdjLoss) (9), and maximum drawdown (MDDLoss) (10) losses, and used the PyTorch mean squared error loss (MSELoss) as a baseline. PnLLoss carries a negative sign because the task is to maximize the profit value, and MDDLoss carries a positive sign so that it can be minimized.

PnLLoss = −αr,        (8)

RiskAdjLoss = −E(pnl) + λ × DrawDown + γ × (α − r)²,        (9)

where λ and γ are the drawdown and position regularization factors, respectively.

MDDLoss = −min(cumsum(pnl) − cummax(pnl)).        (10)
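In PyTorch these losses are a few lines each. The sketch below is a minimal differentiable version of Eqs. (8)-(10); computing pnl per time step as the sum over assets of position times return, and the reductions used, are assumptions about shapes rather than the authors' exact implementation.

```python
# Sketch of PnLLoss (8), RiskAdjLoss (9) and MDDLoss (10) as differentiable
# PyTorch functions; positions and returns are (time x assets) tensors.
import torch

def pnl_loss(positions, returns):
    # Eq. (8): negative profit, so minimizing the loss maximizes PnL.
    return -(positions * returns).sum()

def mdd_loss(positions, returns):
    # Eq. (10): depth of the worst drawdown of the cumulative PnL curve.
    pnl = (positions * returns).sum(dim=-1)          # PnL per time step
    cum = torch.cumsum(pnl, dim=0)
    run_max = torch.cummax(cum, dim=0).values
    return -(cum - run_max).min()

def risk_adj_loss(positions, returns, lam=0.3, gamma=0.01):
    # Eq. (9): -E(pnl) + lambda * drawdown + gamma * position penalty.
    pnl = (positions * returns).sum(dim=-1)
    cum = torch.cumsum(pnl, dim=0)
    drawdown = (torch.cummax(cum, dim=0).values - cum).max()
    return -pnl.mean() + lam * drawdown + gamma * ((positions - returns) ** 2).mean()
```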


Let us take a closer look at the SharpeLoss and at the values that might be generated by the model. In the current task we want the model to predict positions based on a sequence of historical returns. Returns are usually of the order of |10⁻²|; with α_i ≈ 10⁻² we get pnl_i ≈ 10⁻⁴, so E(pnl) ≈ 10⁻⁴, σ(pnl) ≈ 10⁻⁴, and SharpeLoss ≈ 1. On the other hand, with α_i ≈ 10² we get the same order, SharpeLoss ≈ 1. This simple calculation reveals that the SharpeLoss is insensitive to the order of magnitude of the predicted positions. We therefore construct a loss function that better fits the task conditions and call it the modified Sharpe loss, ModSharpeLoss:

ModSharpeLoss = E(pnl) / (σ(pnl) + ϵ) − ln (α − r)².        (11)

With this definition, the mean squared error (MSE) factor penalizes the model when the predicted position values deviate too much from the returns, while the E(pnl) factor penalizes the model for predicting the wrong trend and low positions. To avoid division by zero we add a small constant ϵ. As with the SharpeLoss, the goal is to maximize the ModSharpeLoss value, so we include the loss value with a negative sign when backpropagating gradients.
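A PyTorch sketch of the two Sharpe-based losses follows. It mirrors Eqs. (7) and (11) as written above and returns the negated quantities so that gradient descent maximizes them; the mean reduction over the squared position error is an assumption.

```python
# Sketch of SharpeLoss (7) and ModSharpeLoss (11); both are negated so that
# minimizing the loss maximizes the underlying objective.
import torch

def sharpe_loss(positions, returns, eps=1e-8):
    pnl = (positions * returns).sum(dim=-1)            # PnL per time step
    return -pnl.mean() / (pnl.std() + eps)             # -E(pnl) / (sigma(pnl) + eps)

def mod_sharpe_loss(positions, returns, eps=1e-8):
    pnl = (positions * returns).sum(dim=-1)
    sharpe = pnl.mean() / (pnl.std() + eps)
    log_mse = torch.log(((positions - returns) ** 2).mean() + eps)
    return -(sharpe - log_mse)                         # negate Eq. (11) for minimization

# usage: loss = mod_sharpe_loss(model(batch), realized_returns); loss.backward()
```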
The plot in Figure 1 shows the loss values as a function of the magnitude of the generated positions on a logarithmic scale. The straightforward SharpeLoss keeps the same order of loss value no matter what the position values are, while the ModSharpeLoss shows a linear dependence. This confirms our concerns about the SharpeLoss's lack of sensitivity to position magnitude and highlights the advantage of the proposed modification.
We also address the issue of low portfolio turnover by introducing a custom regularization penalty. Models sometimes tend to generate constant position values throughout the trading period, which leads to near-zero turnover and is equivalent to a "lazy" strategy of buying and holding assets. To penalize a model for static positions we add a turnover regularization:

TvrReg = λ · (max(1, tvr − tb) + max(1, bb − tvr)),        (12)

where tvr is the turnover calculated via equation (6), tb and bb are the top and bottom boundaries respectively, and λ is a regularization strength factor. All of tb, bb, and λ are hyperparameters.


Figure 1: The dependence of the SharpeLoss and ModSharpeLoss values on the magnitude of
the generated positions.

TvrReg is added to the main loss so that it is accounted for during optimization. To examine the influence of the turnover regularization, we used it in combination with the MSELoss, SharpeLoss, and ModSharpeLoss loss functions to train LSTM models. For the experiments we used λ = 1.0, tb = 1.0, and bb = 0.3 for the turnover regularization, and λ = 0.3 and γ = 0.01 for the RiskAdjLoss.
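A sketch of the turnover regularization (12) is shown below. It keeps the max(1, ·) floor exactly as Eq. (12) is written and uses the reported hyperparameters λ = 1.0, tb = 1.0, bb = 0.3; averaging the daily turnover over the trading period is an assumption.

```python
# Sketch of the turnover regularization (12) added on top of a base loss.
import torch

def turnover_reg(positions, lam=1.0, tb=1.0, bb=0.3):
    """positions: (days x assets) tensor of generated alpha positions."""
    tvr = positions.diff(dim=0).abs().sum(dim=1).mean()   # average daily turnover, Eq. (6)
    one = torch.ones((), device=positions.device)
    # Eq. (12) as written, with a floor of 1 on each hinge term.
    return lam * (torch.maximum(one, tvr - tb) + torch.maximum(one, bb - tvr))

# usage: total_loss = mse_loss + turnover_reg(generated_positions)
```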
4 Experiment
To scrutinize alpha design with DL models we used historical market data from Binance. Firstly, we created three classical alphas based on heuristics: reversion, momentum, and mean reversion. Then we used linear regression (LinReg) to generate positions as a classical ML baseline. For the DL alphas we used MLP and LSTM as base models, MSELoss as the baseline optimization function, and the custom losses PnLLoss, MDDLoss and logarithmic MDDLoss, SharpeLoss, RiskAdjLoss, and ModSharpeLoss with turnover regularization. Secondly, we examined DL portfolio optimization techniques.
To run the experiment we created 20 low-correlated alphas using financially multimodal data: candlesticks, order and trade statistics, and order book data. The aim is to study the applicability of DL models to portfolio optimization. A simple technique with equally weighted alphas served as a baseline. We trained an LSTM to generate the weights with which the alphas are combined. We consider two approaches: single and point-wise weight generation. In the single-weighted case, given L alphas, the model generates L weights for each time step, and each weight is broadcast over the corresponding alpha's positions. In the point-wise weighted case, given L alphas and M assets, the model generates an L × M weight matrix for each time step, and the portfolio positions are the sum of the element-wise product of alpha positions and the corresponding weights.
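The two weighting schemes amount to a couple of tensor products, as in the sketch below; the shapes, the softmax normalization of the weights, and the toy inputs are illustrative assumptions.

```python
# Sketch of the single and point-wise portfolio weighting schemes
# for L alphas over M assets at one time step.
import torch

L, M = 20, 61                                   # number of alphas and assets
alpha_positions = torch.randn(L, M)             # positions generated by each alpha

# Single-weighted: the model emits L weights, one per alpha; each weight is
# broadcast over that alpha's positions before summing.
w_single = torch.softmax(torch.randn(L), dim=0)
portfolio_single = (w_single[:, None] * alpha_positions).sum(dim=0)      # (M,)

# Point-wise: the model emits an L x M weight matrix; the portfolio is the
# sum over alphas of the element-wise product of weights and positions.
w_pointwise = torch.softmax(torch.randn(L, M), dim=0)
portfolio_pointwise = (w_pointwise * alpha_positions).sum(dim=0)         # (M,)
```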
We used the basic LSTM implementation provided by the PyTorch framework, adding a linear layer to adjust the prediction vector to the target size. We used an Nvidia V100 GPU to run the experiments.
5 Results and Discussion
5.1 Loss Functions
We used LSTMs as the base DL model to generate algorithmic trading strategies (alphas). First, we examined different loss functions, both with and without the custom turnover regularization (12).
If a loss function includes the turnover regularization, the model name carries the TvrReg suffix. Figure 2 and Figure 3 show the cumulative profit and loss of the alphas over the total and test time intervals, respectively. Table 2 and Table 3 contain the evaluation metrics for the total and test time intervals, respectively. LSTM models with custom loss functions outperform the classical alphas and those constructed with linear regression, and are ahead of the conservative Buy&Hold strategy. The logarithmic MDDLoss (LogMDDLoss) turned out to be the most robust optimization function, outperforming the classical Momentum and Mean Reversion alphas on the test interval. The model trained with LogMDDLoss has the lowest maximum drawdown on inference and one of the highest profit values and Sharpe ratios. The turnover regularization helped to boost model performance dramatically and to keep the alpha turnover within a predefined interval. The designed alphas are low-correlated (Figure 4) and might be used in portfolio optimization.

Figure 2: Alphas performance results. Red dotted line indicates the start of test time interval
– April 25, 2024.

Alpha Turnover Max Drawdown Profit, % Sharpe


LSTM SharpeLoss 0.24 -0.101082 94.86 2.975480
LSTM ModSharpe Loss 0.20 -0.101960 114.66 2.955721
LSTM ModSharpeLoss TvrReg 0.59 -0.108168 126.31 2.914830
LSTM PnLLoss 0.16 -0.216381 96.58 2.250893
MLP ModSharpe Loss 0.19 -0.190578 97.21 2.099271
LSTM LogMDDLoss 0.65 -0.055888 67.26 1.919644
LSTM Risk Adjusted 0.75 -0.175058 70.14 1.775109
LSTM MSELoss 0.22 -0.200447 71.52 1.635229
LSTM SharpeLoss TvrReg 0.53 -0.070589 39.70 1.324113
LSTM MDDLoss 0.63 -0.096374 48.01 1.289905
MLP Sharpe Loss 0.14 -0.224630 84.61 1.245231
LSTM MSELoss TvrReg 0.08 -0.101111 30.33 1.018838
MLP MSELoss 0.06 -0.121723 34.19 0.891930
Reversion 0.12 -0.341410 59.30 0.749263
Mean Reversion 0.17 -0.444715 37.32 0.458115
LinReg 0.57 -0.338092 34.44 0.325222
Buy&Hold 0.03 -1.105845 33.19 0.126119
Momentum 0.19 -0.618691 -41.24 -0.513156

Table 2: Alphas performance sorted by Sharpe ratio over total historical interval


Figure 3: Alphas performance results on test time interval.

Alpha Turnover Max Drawdown Profit, % Sharpe


LSTM LogMDDLoss 0.64 -0.054650 21.02 2.038615
LSTM ModSharpe Loss 0.20 -0.057322 22.29 1.985342
LinReg 0.60 -0.068624 25.86 1.920134
LSTM ModSharpeLoss TvrReg 0.57 -0.108168 21.68 1.666563
LSTM MSELoss TvrReg 0.09 -0.043611 15.00 1.622607
Reversion 0.12 -0.108269 24.45 1.241843
LSTM Sharpe Loss 0.24 -0.101082 12.73 1.201756
Mean Reversion 0.18 -0.090327 23.12 1.064067
MLP Sharpe Loss 0.13 -0.224630 21.80 0.881357
LSTM MDDLoss 0.62 -0.067982 9.13 0.796684
LSTM Risk Adjusted 0.72 -0.175058 8.48 0.666076
LSTM SharpeLoss TvrReg 0.54 -0.047443 5.55 0.640856
MLP ModSharpe Loss 0.19 -0.190578 5.43 0.395615
Buy&Hold 0.04 -0.922599 24.90 0.307227
LSTM MSELoss 0.22 -0.200447 -6.03 -0.456444
MLP MSELoss 0.06 -0.121723 -5.81 -0.613343
Momentum 0.19 -0.228660 14.00 -0.622315
LSTM PnLLoss 0.16 -0.216381 -10.76 -0.748599

Table 3: Alphas performance sorted by Sharpe ratio over test interval

5.2 Portfolio Optimization


To address portfolio optimization, we first built 20 low-correlated alphas (Figure 6) using financially multimodal data. Figure 5 contains the performance graphs of the designed alphas, and Table 4 reports their evaluation metrics. Figure 7 and Table 5 present the performance graphs and metrics of the assembled portfolios over the whole historical time interval. The ModSharpe loss function was the most effective and robust optimization function, outperforming the other portfolios in terms of the Sharpe ratio and profit.

Figure 4: Correlation heatmaps for the alphas: (a) total historical time interval, (b) test time interval.


Figure 5: Alphas performance.



6 Conclusion and Further Work
We showed that the proposed loss functions ModSharpeLoss, SharpeLoss, MDDLoss, and PnLLoss outperform classical optimization functions for generating alpha positions. Importantly, the turnover regularization not only natively keeps turnover within predefined bounds but also improves prediction quality. Examining the influence of the regularization parameters on alpha performance remains an open research question, and the convergence of the proposed loss functions remains unaddressed. There is also freedom to test other DL models for alpha generation and portfolio optimization, such as xLSTM and Transformers. The next step is to combine the proposed loss functions with limit order book (LOB) data processing into a unified strategy: knowing the limit order book state and forecasting its evolution might help to build more robust execution strategies.
We also see good potential for the introduced loss functions in building reward policies for reinforcement learning that are financially grounded and intuitive for traders. It would be beneficial to incorporate such policies into language agents to support their decision making, giving them a powerful tool to evaluate trading experience.


Figure 6: Alphas correlation heatmap.


Figure 7: Portfolio performance results over historical time interval.


Alpha Turnover Max Drawdown Profit, % Sharpe
OB imbalance vol 2.199817 -0.022153 0.968738 8.889217
tr val bs ratio 1.304981 -0.020539 0.744540 7.986993
op val bs ratio 2.424682 -0.027409 0.739854 6.439757
LOB val b val s ratio 0.470399 -0.059132 0.814909 5.607485
vwap / close 5.056355 -0.114793 1.539917 5.094586
order put imbalance 5.587706 -0.036191 0.711062 4.872396
reversion 4.287674 -0.201390 1.985513 4.752785
reverse val b / imbalance 1.034873 -0.083021 0.623762 4.177369
high low time 5.270254 -0.035321 0.503467 4.012199
tpr vwap ts 2.946431 -0.028602 0.433965 3.988069
put vs cancel ratio 0.489127 -0.055238 0.441970 3.560326
cancel vs put 1.462331 -0.063046 0.650970 3.338945
lob 10lvl vs bbo spread ratio 3.860393 -0.066188 0.462961 3.072702
spread bbo over vwap / close ratio 2.895454 -0.086034 0.704435 3.066959
trade disb 4.984747 -0.040891 0.303026 3.059141
vwaps ratio 5.746206 -0.066124 0.370534 2.659815
vwap 1mio or ratio 2.443877 -0.081554 0.534091 2.532313
cancel val / trade val 0.852424 -0.043650 0.288598 2.149364
high low vwap diff 3.124243 -0.064560 0.320390 2.103093
vol imbalance 5.461548 -0.077652 0.296717 2.048025

Table 4: Performance of the 20 designed alphas, sorted by Sharpe ratio.

Alpha Turnover Max Drawdown Profit, % Sharpe


ModSharpe 0.637673 -0.020634 13.0466 7.402504
LogMaxDrawDown 0.664499 -0.028539 10.9697 5.591971
PnLLoss 0.736992 -0.052282 11.5989 5.270845
Risk Adjusted 0.589193 -0.029046 12.0306 5.175631
MaxDrawDown 0.321364 -0.032647 3.9891 4.419619
MSETvrReg 0.743339 -0.036836 9.9495 3.835984
Sharpe Loss 0.743339 -0.036836 0.099495 3.835984
Equal Weighted 0.129249 -0.079705 0.078826 2.594670

Table 5: Portfolio management performance, sorted by Sharpe ratio.

Acknowledgments The authors thank.....


Funding This work was supported by the grant of the state program of the “Sirius” Federal Territory “Scientific and technological development of the “Sirius” Federal Territory” (Agreement No. 18-03 dated 10.09.2024).
Data availability The dataset used is published and accessible on the Kaggle platform: https://www.kaggle.com/datasets/kkhubiev/cryptotrading, and the loss function implementations are available in a Jupyter notebook: https://www.kaggle.com/code/kkhubiev/finance-grounded-loss-functions/edit
Ethical Conduct Not applicable.
Conflicts of interest The authors declare that there is no conflict of interest.
References

[1] J. Wei et al., “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.”

[2] T. Masterman, S. Besen, M. Sawtell, and A. Chao, “The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey.”

[3] DeepSeek-AI, D. Guo et al., “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning.”

[4] H. Yang, X.-Y. Liu, S. Zhong, and A. Walid, “Deep Reinforcement Learning for Automated Stock Trading: An Ensemble Strategy.”

[5] J. Zhu, Q. Chen, H. Dou, J. Li, L. Guo, F. Chen, and C. Zhang, “DianJin-R1: Evaluating and Enhancing Financial Reasoning in Large Language Models.”

[6] B. Zhang, H. Yang, and X.-Y. Liu, “Instruct-FinGPT: Financial Sentiment Analysis by Instruction Tuning of General-Purpose Large Language Models,” FinLLM at IJCAI (2023). URL: https://ssrn.com/abstract=4489831 or http://dx.doi.org/10.2139/ssrn.4489831

[7] X. Yu et al., “Harnessing LLMs for Temporal Data - A Study on Explainable Financial Time Series Forecasting,” Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track, 739–753 (2023). URL: https://aclanthology.org/2023.emnlp-industry.69/

[8] A. Lopez-Lira and Y. Tang, “Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models,” SSRN (2023).

[9] K. U. Khubiyev and M. E. Semenov, “Multimodal Stock Price Prediction: A Case Study of the Russian Securities Market,” Program Systems: Theory and Applications 16, No. 1, 83–130 (2025). URL: https://psta.psiras.ru/2025/1 83-130.

[10] Z. Kakushadze, “101 Formulaic Alphas,” Wilmott Magazine, 84, 72–80 (2016). URL: http://dx.doi.org/10.2139/ssrn.2701346
