Finance-Grounded Optimization For Algorithmic Trading
Kasymkhan Khubiyev∗
Sirius University of Science and Technology, Sirius, Russia
Mikhail Semenov
Sirius University of Science and Technology, Sirius, Russia
Irina Vyacheslavovna Podlipnova
arXiv:2509.04541v1 [cs.LG] 4 Sep 2025
September 8, 2025
Abstract
Deep learning is evolving rapidly and is being integrated into many domains. Finance is a challenging field for deep learning, especially where interpretable artificial intelligence (AI) is required. Classical approaches perform very well in natural language processing, computer vision, and forecasting. However, they are poorly suited to finance, where specialists evaluate model performance with different metrics.
We first introduce financially grounded loss functions derived from key quantitative finance metrics, including the Sharpe ratio, Profit-and-Loss (PnL), and Maximum Drawdown. We also propose turnover regularization, a method that inherently keeps the turnover of the generated positions within predefined limits.
Our findings demonstrate that the proposed loss functions, combined with turnover regularization, outperform the traditional mean squared error loss for return prediction tasks when evaluated with algorithmic trading metrics. The study shows that financially grounded metrics enhance predictive performance in trading strategies and portfolio optimization.
1 Introduction
Deep learning (DL) is evolving rapidly and is being integrated into many domains, from complex problems to the routines of daily life. Finance remains a challenge for deep learning because of its domain specificity. Deep learning can be applied to many financial tasks, and algorithmic trading is one of them. The main goal of algorithmic trading is to discover new signals in various data flows and to build profitable strategies from them.
Large language models (LLMs) have succeeded in many tasks, including solving mathematical problems. Chain-of-Thought (CoT) reasoning [1] combined with agentic architectures [2] helps achieve the best results. There have been many attempts to adapt LLMs to stock price prediction [6, 7, 8], ranging from zero-shot and few-shot learning to supervised fine-tuning (SFT). For example, Zhang et al. [6] proposed FinGPT, a GPT-like model fine-tuned on financial data. It outperforms general-purpose LLMs in tasks where domain-specific numerical understanding is crucial. Yu et al. [7] proposed a framework for building explainable multimodal forecasting models based on LLMs. They highlighted that LLMs struggle with numerical data and addressed this issue with a discrete-bin embedding technique. Lopez-Lira and Tang [8] used ChatGPT to forecast price movements. They then built a simple trading strategy on top of the forecasts to demonstrate the capabilities of LLMs. In our previous study [9], we focused on multimodal approaches to stock price prediction. We used an LLM as an embedding model to vectorize the news flow and concatenated the resulting embeddings with the time series. We showed that embedding the news flow directly into candlestick time series improves forecasting performance and reduces the average prediction error. In most tasks, it also outperforms the backbone model, a long short-term memory (LSTM) recurrent network. Another approach is reinforcement learning (RL), which is widely used in robotics. RL recently demonstrated its power in training the highly efficient LLM DeepSeek-R1 [3]. The key to efficient RL training is a robust and effective reward model.
Preprint for the ICOMP 2025: International Conference on Computational Optimization
In all the studies mentioned above, the authors used standard DL optimization tools and frameworks for regression and classification tasks. The standard methods have proven robust in various scenarios. However, financial evaluation and quality metrics differ from those used in classical problems. For example, mean squared error (MSE) is the default loss function for regression tasks. From a financial perspective, however, MSE is not informative, because finance experts rely on other metrics for evaluation and decision making. Implementing finance-grounded metrics might therefore improve forecasting performance and the interpretability of model decisions, which is a step toward trustworthy AI in finance. For example, the authors of [4] applied RL to design algorithmic trading strategies. They used the Profit-and-Loss (PnL) metric as a key component of the reward policy and the Sharpe ratio to select top-performing strategies on a historical interval. The authors of DianJin-R1 [5], an LLM specialized for finance, focused on how the model reasons about and interprets its responses. To train the model to reason and respond with CoT, they used the following datasets:
- CFLUE, which contains about 38 thousand finance exam questions in Chinese;
- FinQA, which contains 8 thousand financial report questions with numeric answers in English;
- CCC (Chinese Compliance Check), which ensures model safety.
They used GPT-4o for two purposes: to filter questions by difficulty and ambiguity, and to compare the model output with the ground-truth value to compute the reward for RL. The final language model outperforms multi-agent systems, which tend to spend more tokens to solve the same problem. The model scrutinizes market data and events and assesses them using situational knowledge. The model has strong financial and economic knowledge. However, it was not trained for algorithmic trading or for using quantitative tools.
The current paper proposes loss functions based on finance fundamentals for algorithmic trading strategies and portfolio management. The proposed functions can be used directly to generate positions, as we show in the paper. They can also serve as part of reward policies.
The key contributions of this paper are as follows.
1. We use finance-grounded loss functions, such as SharpeLoss, MaxDrawDownLoss, and PnLLoss, that are fundamentally better suited to financial time-series forecasting.
The paper is organized as follows. In Section 2, we briefly describe and explore the original dataset. In Section 3, we present the research methodology, which covers the evaluation metrics and custom loss functions. In Section 4, we describe the experimental setup and pipelines and introduce the model architectures and algorithmic trading strategies. In Section 5, we describe the results of the computational experiments. Finally, Section 6 concludes the paper and discusses future work.
2 Data
In this section, we describe the data required for the experiments, its key features, and the reasons for our choice of source. Our experiments impose specific data quality requirements. We therefore used time series and statistics data from Binance, one of the most popular centralized exchanges (CEX). Binance provides an API for retrieving high-frequency market data.
To perform the experiments, we first chose the backtest time interval: from January 1, 2022 to July 1, 2025. To select coins, we kept those that satisfied two conditions: they were listed no later than 2021, and they had not been delisted at the time of the experiment in 2025. We skipped the 2021 data because the market experienced a dramatic price change from 2021 to 2022, with a median change of 432.42%. Sixty-one coins satisfied these requirements. We retrieved market data at three frequencies: daily, hourly, and 15-minute. The API provides the following data:
- close, open, high, and low prices;
- base and quote asset volumes;
- taker buy base and quote asset volumes;
- the number of trades.
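Binance's public REST API serves candlesticks through the `GET /api/v3/klines` endpoint. Each candle comes back as a JSON array whose fields correspond to the list above. The sketch below is illustrative only and is not the authors' pipeline. It builds a request URL for one symbol and interval (`1d`, `1h` or `15m`) and maps one returned row onto named fields. The helper names `klines_url`, `Candle` and `parse_kline`, and the choice of the spot base URL, are our assumptions.

```python
from dataclasses import dataclass
from urllib.parse import urlencode

# Public spot endpoint (assumption: the paper does not say which endpoint was used).
BASE_URL = "https://api.binance.com/api/v3/klines"


def klines_url(symbol, interval, start_ms, end_ms, limit=1000):
    """Hypothetical helper: build a klines request URL (no request is sent here)."""
    query = urlencode({"symbol": symbol, "interval": interval,
                       "startTime": start_ms, "endTime": end_ms, "limit": limit})
    return f"{BASE_URL}?{query}"


@dataclass
class Candle:
    open_time: int
    open: float
    high: float
    low: float
    close: float
    base_volume: float
    close_time: int
    quote_volume: float
    n_trades: int
    taker_buy_base_volume: float
    taker_buy_quote_volume: float


def parse_kline(row):
    """Map one kline array onto named fields; prices and volumes arrive as strings."""
    return Candle(
        open_time=int(row[0]), open=float(row[1]), high=float(row[2]),
        low=float(row[3]), close=float(row[4]), base_volume=float(row[5]),
        close_time=int(row[6]), quote_volume=float(row[7]), n_trades=int(row[8]),
        taker_buy_base_volume=float(row[9]), taker_buy_quote_volume=float(row[10]),
    )  # row[11] is an unused field and is ignored
```

For the backtest start, `klines_url("BTCUSDT", "1d", 1640995200000, 1641081599999)` requests the first day of 2022. The timestamps are milliseconds since the Unix epoch, in UTC.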
3 Methodology
We examined different financial data modalities from the perspective of algorithmic trading strategies (alphas) [10]. For candlestick data, we build alphas using heuristics, machine learning, and deep learning models. We then compare how custom finance-grounded loss functions influence the alphas' execution results.
3.1 Evaluation Metrics
Although the collected dataset contains 15-minute data points, our algorithmic trading strategies (alphas) follow a conservative schedule: we rebalance the portfolio once a day. This choice of trading frequency is deliberate. In the current research, we focus on mid-frequency trading strategies with a rigid constraint. Every portfolio rebalancing order must be executed within the interval between two consecutive time points, as determined by the data frequency.
As baselines, we use classical trading strategies built only on heuristics. These are the reversion, momentum, mean reversion [10], and conservative buy-and-hold (Buy&Hold) alphas. The return of an asset with price $p(d)$ on day $d$ is
$$r(d) = \frac{p(d)}{p(d-1)} - 1. \tag{2}$$
Returns are a scaling transformation that makes it possible to compare assets in relative terms.
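Here is a small worked example of Eq. (2). The plain-Python helper `simple_returns` is our own and hypothetical. It converts a close-price series into returns, and it shows why returns make assets comparable. Two assets at very different price levels that move by the same proportion receive the same returns.

```python
def simple_returns(prices):
    """Eq. (2): r(d) = p(d) / p(d - 1) - 1 for consecutive close prices."""
    return [cur / prev - 1.0 for prev, cur in zip(prices, prices[1:])]


# Two made-up assets at very different price levels with the same relative moves.
cheap_coin = [2.0, 2.2, 1.98]
expensive_coin = [40_000.0, 44_000.0, 39_600.0]

cheap_r = simple_returns(cheap_coin)          # approximately [0.1, -0.1]
expensive_r = simple_returns(expensive_coin)  # approximately [0.1, -0.1]
```

The first element is dropped in effect, because day 0 has no predecessor. A series of $n$ prices therefore yields $n - 1$ returns.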
We call the set of tickers traded by a specific alpha its universe. Each alpha is a sequence of position vectors whose length equals the number of assets in the current universe. The baseline alphas rest on the following ideas:
- Reversion: if an asset is currently growing, it will later decline.
- Momentum: if the market is in a growth stage, it will keep growing for a while.
- Mean reversion: trade against the mean, since a price that deviates from its mean will later return to it.
- Buy&Hold: buy the assets proportionally into the portfolio and hold the positions unchanged.
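The paper does not reproduce the baseline formulas here, so the following is only a hypothetical sketch of how the four heuristic ideas can be turned into position vectors. The specific signals (yesterday's return, a look-back sum, deviation from the window mean) are our assumptions. So is the convention of demeaning across assets and scaling to unit gross exposure. None of this is taken from the paper.

```python
import statistics


def normalize(raw):
    """Assumed convention: demean across assets, then scale to unit gross exposure."""
    mu = statistics.mean(raw)
    centred = [x - mu for x in raw]
    gross = sum(abs(x) for x in centred)
    return [x / gross for x in centred] if gross > 0 else [0.0] * len(raw)


def reversion(last_returns):
    """Bet against yesterday's move: assets that rose get negative weight."""
    return normalize([-r for r in last_returns])


def momentum(window_returns):
    """Bet on continuation: weight by each asset's summed return over the window.

    window_returns is a list of days, each a list of per-asset returns.
    """
    return normalize([sum(col) for col in zip(*window_returns)])


def mean_reversion(window_prices):
    """Bet on a return to the window mean: weight by -(last price / mean price - 1)."""
    return normalize([-(col[-1] / statistics.mean(col) - 1.0) for col in zip(*window_prices)])


def buy_and_hold(n_assets):
    """Long-only equal weights, held unchanged (no demeaning)."""
    return [1.0 / n_assets] * n_assets
```

Demeaning makes the three heuristic alphas market-neutral, with long and short sides of equal size. Buy&Hold, by contrast, stays fully long by design.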
For the ML-based strategies, we used linear regression. For the DL models, we used a multilayer perceptron (MLP) as a baseline and long short-term memory (LSTM) recurrent models. For all models, we performed both single-model and ensemble forecasting. In the single-model setting, each model predicts a vector whose length equals the number of assets. In the ensemble setting, we obtain a prediction for each asset separately and then aggregate the models' outputs into a single vector as the final result.
To create training samples, we used data at all three frequencies. First, we transformed close prices into returns via Eq. (2), with the daily return as the target value. Second, we subsampled data points with a sliding window of 20 days:
- the first 14 days contain daily returns;
- the next 3 days contain hourly returns;
- the last 3 days contain 15-minute returns.
We assumed that data points closer to the execution date should be more frequent. The model can then capture local short-term trends from the frequent recent data and long-term trends from the less frequent earlier data. The magnitudes of price changes in daily, hourly, and 15-minute candles may differ dramatically. We therefore normalize the aggregated return vectors via min-max scaling so that the data points lie within the interval [0, 1].
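The 20-day multi-resolution window can be sketched as follows. This rests on several assumptions of ours:
- the three return series are aligned to the same calendar, with 24 hourly and 96 fifteen-minute returns per day;
- the target is the daily return on the day right after the window;
- min-max scaling is applied to each aggregated sample vector.

The function names are hypothetical.

```python
HOURS_PER_DAY = 24
QUARTERS_PER_DAY = 96  # 15-minute bars per day


def min_max_scale(values):
    """Rescale a vector to [0, 1]; a constant vector maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span > 0 else 0.0 for v in values]


def make_sample(daily, hourly, quarter, start, n_daily=14, n_hourly=3, n_quarter=3):
    """Build one 20-day sample whose first day has index `start`.

    Assumed alignment: daily[d] is the return of day d, while hourly[24 * d + h]
    and quarter[96 * d + q] are intraday returns of the same day d.
    """
    d_end = start + n_daily   # days [start, d_end) use daily returns
    h_end = d_end + n_hourly  # days [d_end, h_end) use hourly returns
    q_end = h_end + n_quarter  # days [h_end, q_end) use 15-minute returns
    features = (daily[start:d_end]
                + hourly[HOURS_PER_DAY * d_end:HOURS_PER_DAY * h_end]
                + quarter[QUARTERS_PER_DAY * h_end:QUARTERS_PER_DAY * q_end])
    target = daily[q_end]  # next-day return, left unscaled
    return min_max_scale(features), target
```

With the default sizes, each sample has 14 + 3 · 24 + 3 · 96 = 374 features. The feature vector grows mainly through the 15-minute part, which is the price of giving the most recent days the finest resolution.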
To compare the execution results of different alphas, we use four metrics: the Sharpe ratio, PnL, maximum drawdown, and turnover. The Sharpe ratio is
$$\text{Sharpe ratio} = \sqrt{N}\,\frac{E(pnl)}{\sigma(pnl)}, \tag{3}$$
where:
- $E(x)$ and $\sigma(x)$ are the expected value and standard deviation of a random variable $x$, respectively;
- $pnl = (\alpha_1 r_1, \alpha_2 r_2, \ldots, \alpha_N r_N)$ is the profit-and-loss vector;
- $\alpha = (\alpha_1, \ldots, \alpha_M)$ and $r = (r_1, \ldots, r_M)$ are the vectors of predicted and historical returns, respectively;
- $N$ is the length of the forecasting horizon;
- $M$ is the number of stocks.
$$\text{PnL} = \alpha r = \sum_{i=1}^{N} pnl_i, \tag{4}$$
$$\text{Turnover} = \sum_{i=1}^{N} \left| \alpha_i(d) - \alpha_i(d-1) \right|, \tag{6}$$
where cummax() and cumsum() denote the cumulative maximum and the cumulative sum of a given profit-and-loss vector, respectively.
The Sharpe ratio measures how consistently a given alpha earns. The greater its value, the smoother the cumulative PnL curve and the more consistently the strategy earns. A high Sharpe ratio does not imply a large profit. Rather, it implies fewer periods in which the alpha loses money. Maximum drawdown indicates the largest peak-to-trough loss of the strategy, while PnL shows the final profit relative to the initial capital. These metrics are correlated. For example, large drawdowns lower the Sharpe ratio.
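The metrics above can be sketched in plain Python as follows. This illustrates the definitions under our own assumptions and is not the authors' code. Positions and returns are lists of per-day vectors over assets, and the daily PnL is their dot product. σ is taken as the population standard deviation. For maximum drawdown we assume the common form min(cumsum(pnl) − cummax(cumsum(pnl))), with the cumulative PnL starting from zero. This yields a non-positive number, matching the sign of the Max Drawdown column in the results table.

```python
import math
import statistics
from itertools import accumulate


def pnl_series(positions, returns):
    """Daily PnL: dot product of day-d positions with day-d realized returns."""
    return [sum(a * r for a, r in zip(pos, ret)) for pos, ret in zip(positions, returns)]


def sharpe_ratio(pnl):
    """Eq. (3): sqrt(N) * E(pnl) / sigma(pnl); population std is our assumption."""
    sigma = statistics.pstdev(pnl)
    return math.sqrt(len(pnl)) * statistics.mean(pnl) / sigma if sigma > 0 else 0.0


def total_pnl(pnl):
    """Eq. (4): sum of the daily PnL values."""
    return sum(pnl)


def max_drawdown(pnl):
    """Assumed form: min(cumsum(pnl) - cummax(cumsum(pnl))), starting from zero PnL.

    Returns a non-positive number; 0.0 means the cumulative PnL never fell below a past peak.
    """
    cum = list(accumulate(pnl, initial=0.0))  # cumsum, with the initial bank at 0
    peak = accumulate(cum, max)               # running maximum (cummax)
    return min(c - p for c, p in zip(cum, peak))


def turnover(positions):
    """Eq. (6) for each day d >= 1: sum over assets of |alpha_i(d) - alpha_i(d - 1)|."""
    return [sum(abs(a - b) for a, b in zip(cur, prev))
            for prev, cur in zip(positions, positions[1:])]
```

Take made-up positions [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]] and returns [[0.02, 0.0], [-0.01, 0.03], [0.01, -0.02]]. The daily PnL is then [0.01, -0.01, -0.02] and the total PnL is -0.02. The maximum drawdown is -0.03, measured from the peak of 0.01 after day one, and the daily turnover is [1.0, 2.0].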
3.2 Custom Loss Functions
For the regression task, we use the mean squared error loss (MSELoss), which is the common default choice. However, standard ML losses do not fully match the needs of financial time-series forecasting. To address this issue, we propose custom losses that are closely tied to trading features and results: the Sharpe ratio loss (SharpeLoss), the maximum drawdown loss (MDDLoss), and the Profit-and-Loss loss (PnLLoss).
We define the Sharpe ratio as in Eq. (3) and propose two modifications:
- We remove the $\sqrt{N}$ factor. It does not affect optimization and only reflects the batch size.
- We add an extra factor to the loss that penalizes deviation from the ground-truth value.

The resulting loss is
$$\text{SharpeLoss} = \frac{E(pnl)}{\sigma(pnl) + \epsilon}, \tag{7}$$
where $\epsilon$ is a small positive constant that keeps the denominator nonzero.
We also implemented the PnL loss (PnLLoss) (8), the risk-adjusted loss (RiskAdjLoss), and the maximum drawdown loss (MDDLoss) (10). The PyTorch mean squared error loss (MSELoss) serves as a baseline. PnLLoss enters with a negative sign because the goal is to maximize profit. MDDLoss enters with a positive sign because drawdown is to be minimized.
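A plain-Python sketch of the loss values follows. Since Eqs. (8) to (10) and the exact turnover regularizer are not reproduced in this text, the sketch rests on our own assumptions:
- `sharpe_loss` returns the negative of Eq. (7), so that minimizing it maximizes the Sharpe ratio;
- `pnl_loss` is the negative total PnL;
- `mdd_loss` is the drawdown magnitude taken with a positive sign;
- `turnover_penalty` is a hypothetical regularizer that penalizes mean daily turnover outside a band [low, high].

For gradient-based training, the same expressions would be written with differentiable tensor operations (for example in PyTorch) instead of Python lists.

```python
import statistics
from itertools import accumulate


def sharpe_loss(pnl, eps=1e-8):
    """Negative of Eq. (7): minimizing it maximizes E(pnl) / (sigma(pnl) + eps)."""
    return -statistics.mean(pnl) / (statistics.pstdev(pnl) + eps)


def pnl_loss(pnl):
    """Negative total PnL, so that minimizing the loss maximizes profit."""
    return -sum(pnl)


def mdd_loss(pnl):
    """Maximum drawdown magnitude (positive sign) of the cumulative PnL, starting from 0."""
    cum = list(accumulate(pnl, initial=0.0))
    peak = accumulate(cum, max)
    return max(p - c for c, p in zip(cum, peak))


def turnover_penalty(positions, low, high, weight=1.0):
    """Hypothetical regularizer: penalize mean daily turnover outside [low, high]."""
    tvr = [sum(abs(a - b) for a, b in zip(cur, prev))
           for prev, cur in zip(positions, positions[1:])]
    mean_tvr = statistics.mean(tvr)
    return weight * (max(0.0, mean_tvr - high) + max(0.0, low - mean_tvr))
```

A two-sided band matches the stated goal of keeping turnover inside a predefined interval rather than simply pushing it down. In training, the penalty would be added to one of the losses above with a tunable weight.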
Figure 1: The dependence of the SharpeLoss and ModSharpeLoss values on the magnitude of the generated positions (log scale on both axes: Log(Loss value) versus Log(position value)).
… total and test time intervals, respectively. Tables 2 and 3 contain the evaluation metrics for the total and test time intervals, respectively. LSTM models with custom loss functions outperform the classical alphas, the alphas constructed with linear regression, and the conservative Buy&Hold strategy. The logarithmic MDDLoss (LogMDDLoss) turned out to be the most robust optimization function, outperforming the classical Momentum and Mean Reversion alphas on the test interval. On inference, the model trained with LogMDDLoss has the lowest maximum drawdown. It also has one of the highest profit values and Sharpe ratios. Turnover regularization boosted model performance dramatically and kept the alpha turnover within a predefined interval. The designed alphas have low mutual correlation (Figure 4) and might be used in portfolio optimization.
Figure 2: Alphas performance results: cumulative P&L of the following alphas:
- baselines: Buy&Hold, Reversion, Momentum, Mean Reversion, and LinReg;
- MLP with MSELoss, SharpeLoss, and ModSharpeLoss;
- LSTM with MSELoss, SharpeLoss, PnLLoss, ModSharpeLoss, MDDLoss, LogMDDLoss, and Risk Adjusted;
- LSTM with turnover regularization (ModSharpeLoss, MSELoss, and SharpeLoss TvrReg).
The red dotted line indicates the start of the test time interval, April 25, 2024.
Table 2: Alphas performance sorted by Sharpe ratio over total historical interval
Alphas performance: P&L by date (axis from 2024-07 to 2025-07) for the same set of alphas as in Figure 2.
Figure 4: Correlation heatmap for 20 alphas: (a) Total historical time interval, (b) Test time interval. The rows and columns cover:
- the heuristic alphas (Reversion, Momentum, Mean Reversion);
- LinReg, Random Forest, and XGBoost;
- LSTM variants with different loss functions, turnover regularization, and ensembles.
Alphas performance: P&L by date (axis from 2022-07 to 2024-07) for the alphas vol_imbalance, reversion, vwap / close, cancel_val / trade_val, vwaps_ratio, reverse val_b / imbalance, spread bbo over vwap / close ratio, lob 10lvl vs bbo spread ratio, high_low vwap diff, and OB_imbalance_vol.
Correlation Heatmap for 20 alphas, from vol_imbalance to vwap_1mio_or_ratio (the same alphas as in the performance table).
Alphas performance (P&L): Equal Weighted, PnLLoss, MSETvrReg, Sharpe, ModSharpe, MaxDrawDown, LogMaxDrawDown, and Risk Adjusted.
… for reinforcement learning, with financially grounded policies that are intuitive for traders. It would be beneficial to incorporate such policies into language agents to justify their decision making, giving them a powerful tool for evaluating trading experience.
Alpha | Turnover | Max Drawdown | Profit, % | Sharpe
OB imbalance vol | 2.199817 | -0.022153 | 0.968738 | 8.889217
tr val bs ratio | 1.304981 | -0.020539 | 0.744540 | 7.986993
op val bs ratio | 2.424682 | -0.027409 | 0.739854 | 6.439757
LOB val b val s ratio | 0.470399 | -0.059132 | 0.814909 | 5.607485
vwap / close | 5.056355 | -0.114793 | 1.539917 | 5.094586
order put imbalance | 5.587706 | -0.036191 | 0.711062 | 4.872396
reversion | 4.287674 | -0.201390 | 1.985513 | 4.752785
reverse val b / imbalance | 1.034873 | -0.083021 | 0.623762 | 4.177369
high low time | 5.270254 | -0.035321 | 0.503467 | 4.012199
tpr vwap ts | 2.946431 | -0.028602 | 0.433965 | 3.988069
put vs cancel ratio | 0.489127 | -0.055238 | 0.441970 | 3.560326
cancel vs put | 1.462331 | -0.063046 | 0.650970 | 3.338945
lob 10lvl vs bbo spread ratio | 3.860393 | -0.066188 | 0.462961 | 3.072702
spread bbo over vwap / close ratio | 2.895454 | -0.086034 | 0.704435 | 3.066959
trade disb | 4.984747 | -0.040891 | 0.303026 | 3.059141
vwaps ratio | 5.746206 | -0.066124 | 0.370534 | 2.659815
vwap 1mio or ratio | 2.443877 | -0.081554 | 0.534091 | 2.532313
cancel val / trade val | 0.852424 | -0.043650 | 0.288598 | 2.149364
high low vwap diff | 3.124243 | -0.064560 | 0.320390 | 2.103093
vol imbalance | 5.461548 | -0.077652 | 0.296717 | 2.048025
[2] T. Masterman, S. Besen, M. Sawtell, and A. Chao, “The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey.”
[3] DeepSeek-AI, D. Guo et al., “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning.”
[4] H. Yang, X.-Y. Liu, S. Zhong, and A. Walid, “Deep Reinforcement Learning for Automated Stock Trading: An Ensemble Strategy.”
[5] J. Zhu, Q. Chen, H. Dou, J. Li, L. Guo, F. Chen, and C. Zhang, “DianJin-R1: Evaluating and Enhancing Financial Reasoning in Large Language Models.”
[6] B. Zhang, H. Yang, and X.-Y. Liu, “Instruct-FinGPT: Financial Sentiment Analysis by Instruction Tuning of General-Purpose Large Language Models,” FinLLM at IJCAI (2023). URL: https://ssrn.com/abstract=4489831 or http://dx.doi.org/10.2139/ssrn.4489831
[7] X. Yu et al., “Harnessing LLMs for Temporal Data - A Study on Explainable Financial Time Series Forecasting,” Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track, 739–753 (2023). URL: https://aclanthology.org/2023.emnlp-industry.69/
[8] A. Lopez-Lira and Y. Tang, “Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models,” SSRN (2023).
[9] K.U. Khubiyev and M.E. Semenov, “Multimodal Stock Price Prediction: A Case Study of the Russian Securities Market,” Program Systems: Theory and Applications 16, No. 1, 83–130 (2025). URL: https://psta.psiras.ru/2025/1 83-130.
[10] Z. Kakushadze, “101 Formulaic Alphas,” Wilmott Magazine, 84, 2006, 72–80 (2016). URL: http://dx.doi.org/10.2139/ssrn.2701346