Comparative Analysis of Portfolio
First cycle, 15 hp
Adrian Eriksson
Erik Peterson
The authors would also like to express their sincere appreciation to Dr. Jan Kronqvist,
Associate Professor at the Numerics, Optimization & Systems department at KTH Royal
Institute of Technology, for his constructive feedback and continuous encouragement.
Their collective expertise, guidance, and encouragement have been indispensable in the
completion of this report. We are deeply grateful for their support and mentorship.
2 Theoretical background
2.1 Efficient frontier
2.2 Clique theory
2.3 Mixed-integer linear programming
3 Numerical analysis
3.1 Approximations and data collection
3.2 Models to risk approach
3.2.1 Approach 1
3.2.2 Approach 2
4 Results
4.1 Numerical validation
4.2 Approach 1
4.2.1 Efficient frontier approach 1
4.3 Approach 2
4.3.1 Efficient frontier approach 2
5 Discussion
5.1 Approach 1
5.2 Approach 2
5.3 Comparison
6 Conclusion
I Appendix A
II Appendix B
The progress of portfolio optimization methods has been intricately linked with the
development of financial theory, computational approaches, and data analytics. The
concept of portfolio optimization is based on Modern Portfolio Theory (MPT), which was
developed in 1952 by Harry Markowitz. MPT emphasizes the importance of diversification
and of balancing risk and return in investment portfolios. In short, Markowitz
demonstrated the core result of mean-variance portfolio theory: one can either maximize
the expected return for a given level of risk or minimize the risk for a given expected
return [1]. An extension of Markowitz's work, the Capital Asset Pricing Model (CAPM),
was developed by William Sharpe, John Lintner, and Jan Mossin in the 1960s. CAPM
provides a framework for determining the appropriate expected return for an asset based
on its systematic risk (beta). It is a cornerstone of asset pricing theory and is used
extensively in finance for tasks such as calculating the cost of equity capital, estimating
the required rate of return for investments, and assessing the performance of investment
portfolios [2]. Another frequently used method in portfolio optimization is the
Black-Litterman model, developed by Fischer Black and Robert Litterman in the early
1990s. This model improves the estimation of expected returns and asset allocation
decisions by combining the mean-variance framework with investor views and market
equilibrium [3]. Furthermore, recent advancements in
machine learning and artificial intelligence have led to the development of sophisticated
algorithms for portfolio optimization. Techniques such as genetic algorithms, neural
networks, and reinforcement learning are being applied to optimize portfolios in dynamic
and complex environments [4].
1.2 Outline
⋄ Chapter 2, "Theoretical Background", establishes the concepts essential to the analysis.
This chapter provides a comprehensive overview of the theoretical framework guiding
our research.
⋄ Chapter 3, "Numerical Analysis", examines two distinct approaches for managing
portfolio risk and maximizing returns. We offer a detailed examination of these
methodologies and their practical implications.
⋄ Chapter 4, "Results", presents empirical data and comparisons between the two
approaches discussed in Chapter 3.
1.3 Parameters
The key parameters used in this thesis are listed below.
Symbol   Description
K        Number of stocks in the portfolio
L        Number of stocks in sector Sj
η        Correlation bound between stocks
ε        Minimum investment per stock
ξ        Maximum investment per stock
µ²       Variance & risk
Optimization model
This plot illustrates the concept of the efficient frontier for a set of stocks. The two
outlying blue dots denote the portfolios with the maximum and the minimum return.
They are connected by the green curve, which represents the efficient frontier. These
outlying dots represent the best and the worst theoretical portfolios, but they can only be
obtained if all the capital is invested in the best or the worst stock, respectively. The
green dot to the left in the graph denotes the theoretical minimum-risk portfolio. The
green curve bounds the feasible region: all possible portfolio combinations lie on or inside
the green curve between the blue dots. The portfolios along the efficient frontier are
optimal in the sense that no other feasible portfolio offers a higher return for the same
risk. It is then up to the investor to choose the balance between risk and return
according to their preferences. The parameter α stated above is the weight factor
balancing risk and return. If α = 0, the minimum-risk portfolio is obtained, and if α = 1,
the upper outlying blue dot is selected.
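The role of α can be illustrated with a small sketch. It assumes a weighted objective of the form max α·rᵀx − (1 − α)·xᵀΣx over fully invested portfolios without short selling; this objective form, the two hypothetical assets and all numbers are assumptions for illustration, not the thesis's model. A brute-force grid over the weights replaces a solver.

```python
def portfolio_stats(x, r, cov):
    """Expected return r^T x and variance x^T Sigma x of the weight vector x."""
    n = len(x)
    ret = sum(x[i] * r[i] for i in range(n))
    var = sum(x[i] * cov[i][j] * x[j] for i in range(n) for j in range(n))
    return ret, var


def best_weighted_portfolio(alpha, r, cov, steps=100):
    """Grid search over two-asset portfolios x = (w, 1 - w) with 0 <= w <= 1.

    Maximizes the ASSUMED objective alpha * return - (1 - alpha) * variance.
    """
    best = None
    for k in range(steps + 1):
        w = k / steps
        x = (w, 1.0 - w)
        ret, var = portfolio_stats(x, r, cov)
        score = alpha * ret - (1.0 - alpha) * var
        if best is None or score > best[0]:
            best = (score, x, ret, var)
    return best[1], best[2], best[3]


# Hypothetical data: asset 0 has the higher return and the higher variance.
r = [0.10, 0.04]
cov = [[0.04, 0.0], [0.0, 0.01]]

for alpha in (0.0, 0.5, 1.0):
    x, ret, var = best_weighted_portfolio(alpha, r, cov)
    print(f"alpha={alpha:.1f} weights={x} return={ret:.4f} variance={var:.4f}")
```

With these invented numbers, α = 0 returns the minimum-variance mix (20 % in asset 0), while α = 1 places all capital in the higher-return asset, matching the two ends of the frontier described above.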
Figure 2: Graph G
In Figure 2, the nodes are denoted {A, B, ..., F} and constitute the graph G. The set of
nodes Ω = {A, B, C} forms a complete subgraph (clique) of G, since every pair of its nodes
is connected. In the graph G, for instance, C and D are connected, as are D and E,
whereas F is not connected to any other node in the graph.
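Cliques can also be found programmatically. The sketch below uses the classic Bron-Kerbosch algorithm (without pivoting) to list all maximal cliques. The edge list is an assumption reconstructed from the relationships named in the text (A, B and C pairwise connected, C-D, D-E, F isolated); the actual figure may contain further edges.

```python
def bron_kerbosch(clique, candidates, excluded, adj, out):
    """Append every maximal clique that extends `clique` to `out`."""
    if not candidates and not excluded:
        out.append(clique)  # nothing can be added: the clique is maximal
        return
    for v in list(candidates):
        bron_kerbosch(clique | {v}, candidates & adj[v], excluded & adj[v], adj, out)
        candidates = candidates - {v}
        excluded = excluded | {v}


def maximal_cliques(nodes, edges):
    adj = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    out = []
    bron_kerbosch(set(), set(nodes), set(), adj, out)
    return out


def is_clique(subset, edges):
    """True if every pair of nodes in subset is joined by an (undirected) edge."""
    edge_set = {frozenset(e) for e in edges}
    items = list(subset)
    return all(frozenset((items[i], items[j])) in edge_set
               for i in range(len(items)) for j in range(i + 1, len(items)))


# Hypothetical edge list based on the relationships described for Figure 2.
nodes = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]

print(sorted(sorted(c) for c in maximal_cliques(nodes, edges)))
print(is_clique({"A", "B", "C"}, edges), is_clique({"C", "D", "E"}, edges))
```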
In financial markets, cliques can indicate sectors, industries, or thematic groupings where
stocks tend to move in tandem due to common underlying factors or market dynamics.
Analyzing cliques can help investors identify diversification opportunities, assess systemic
risks, and tailor portfolio strategies to mitigate concentration risk.
By leveraging clique theory in correlation analysis between stocks, investors can gain
insights into the structure and dynamics of the market, leading to more informed investment
decisions and improved portfolio management practices [7].
The stock data was downloaded through the Yahoo Finance Python API, yfinance. The
adjusted closing price C is used because it accounts for corporate actions such as stock
splits and dividend distributions. Such actions cause sudden price changes that are not
related to the stock's performance; if they were not corrected for, the correlation matrix
and the return vector would be distorted, and the outcome would be hard to analyze. The
adjusted price compensates for this and should consequently give a more accurate
measurement of a stock's value [9].
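The effect of an unadjusted corporate action on the returns can be shown with a small sketch. It uses a hypothetical 2:1 split and invented prices; prices before the split are divided by the split ratio, which is the basic idea of split adjustment. Dividend adjustment, which the adjusted closing price also includes, is omitted here.

```python
def simple_returns(prices):
    """Period-to-period simple returns (C_t - C_{t-1}) / C_{t-1}."""
    return [(c1 - c0) / c0 for c0, c1 in zip(prices, prices[1:])]


def split_adjust(prices, split_index, ratio):
    """Divide all prices before split_index by the split ratio (2 for a 2:1 split)."""
    return [p / ratio if t < split_index else p for t, p in enumerate(prices)]


# Hypothetical closing prices; a 2:1 split takes effect at index 2.
raw = [100.0, 102.0, 51.0, 52.0]
adjusted = split_adjust(raw, split_index=2, ratio=2)

print("raw returns:     ", [round(x, 4) for x in simple_returns(raw)])
print("adjusted returns:", [round(x, 4) for x in simple_returns(adjusted)])
```

The raw series shows a spurious −50 % return on the split day, whereas the adjusted series shows no change, which is the distortion the adjusted closing price removes.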
A filter was also constructed so that, if a stock is not traded for n days, its latest
adjusted closing price is carried forward unchanged for those n days. In practice, any
missing value (NaN) in a column of the data matrix of adjusted closing prices is replaced
with the value from the same column at the previous time step. This maintains continuity
in the data and avoids gaps in the subsequent calculations.
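A minimal sketch of such a forward-fill filter in plain Python is shown below. The price matrix (rows are time steps, columns are stocks) is hypothetical. A NaN at the very start of a column has no earlier value to copy and is left as NaN.

```python
import math


def forward_fill(rows):
    """Replace each NaN with the last observed value in the same column."""
    filled = []
    last = [math.nan] * (len(rows[0]) if rows else 0)
    for row in rows:
        new_row = []
        for j, value in enumerate(row):
            if not math.isnan(value):
                last[j] = value      # remember the latest observed price
            new_row.append(last[j])  # NaN is replaced by the carried-forward price
        filled.append(new_row)
    return filled


# Hypothetical adjusted closes for two stocks over four days.
prices = [
    [100.0, math.nan],
    [math.nan, 50.0],
    [math.nan, 51.0],
    [103.0, math.nan],
]
print(forward_fill(prices))
```

If the data is held in a pandas DataFrame, `DataFrame.ffill()` performs the same column-wise operation.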
The return r_i of stock i at time step t and the return vector r for all stocks can be
calculated using the following formulas:

$$ r_i = \frac{C_t - C_{t-1}}{C_{t-1}}, \tag{1} $$

$$ r = [r_1, r_2, \ldots, r_n]^T, \tag{2} $$

where i ∈ {1, 2, ..., n} (n = number of stocks) and t ∈ T.
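Equations (1) and (2) translate directly into code. In this sketch the adjusted closing prices of n = 3 stocks on days t − 1 and t are hypothetical.

```python
def return_vector(prices_prev, prices_now):
    """Equations (1)-(2): r = [r_1, ..., r_n]^T for a single time step t.

    prices_prev[i] and prices_now[i] are the adjusted closes C_{t-1} and C_t of stock i.
    """
    return [(c_now - c_prev) / c_prev for c_prev, c_now in zip(prices_prev, prices_now)]


# Hypothetical adjusted closing prices for three stocks.
c_prev = [100.0, 250.0, 40.0]
c_now = [101.0, 245.0, 40.0]
r = return_vector(c_prev, c_now)
print(r)
```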
Within a time period T (for example, one year with m trading days), the main parameters
can be calculated using NumPy functions. For each stock i, the vector θi collects the
returns of that stock over the period; for daily returns,

$$ \theta_i = [r_1, r_2, \ldots, r_m]^T, $$

where r_t here denotes the discrete daily return of stock i on day t of the period T.
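The parameters derived from θi can be sketched in plain Python as below. The text states that NumPy functions were used; `np.mean`, `np.cov` (which divides by m − 1 by default) and `np.corrcoef` compute the same quantities. The three return vectors are hypothetical and chosen so that the correlations are exactly 1 and −1.

```python
import math


def mean(theta):
    return sum(theta) / len(theta)


def covariance(a, b):
    """Sample covariance with m - 1 in the denominator."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def correlation_matrix(thetas):
    """Pearson correlation Corr_{i,j} between the return vectors theta_i."""
    n = len(thetas)
    std = [math.sqrt(covariance(t, t)) for t in thetas]
    return [[covariance(thetas[i], thetas[j]) / (std[i] * std[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical daily returns theta_i over m = 4 days for three stocks.
thetas = [
    [0.01, 0.02, -0.01, 0.03],
    [0.02, 0.04, -0.02, 0.06],    # exactly 2 x stock 0, so Corr = 1
    [-0.01, -0.02, 0.01, -0.03],  # exactly -1 x stock 0, so Corr = -1
]
corr = correlation_matrix(thetas)
print("mean returns:", [round(mean(t), 5) for t in thetas])
print("variances:   ", [covariance(t, t) for t in thetas])
print("correlation: ", [[round(c, 3) for c in row] for row in corr])
```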
3.2.1 Approach 1
The models used in this thesis do not include the accepted risk µ² as an explicit term.
Instead, risk is addressed indirectly by diversifying the portfolio. The risk constraint
(see equation (4b)) is modeled by the following two constraints:
$$ \sum_{i=1}^{n} y_i \ge K, \qquad \sum_{i \in S_j} y_i \ge L, \tag{5} $$
where K denotes the minimum number of stocks in the portfolio and L denotes the
minimum number of stocks from each sector Sj in the portfolio. This treatment of risk
relies on a balanced representation of the sectors, obtained by requiring a certain number
of assets from each sector to be included in the portfolio. The variable yi is binary and
linked to the investment share xi: yi = 1 whenever ε ≤ xi ≤ ξ, and yi = 0 whenever
xi = 0, where ε and ξ are a fixed minimum and maximum investment per stock. Together,
these constraints give the following model for approach 1:
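$$
\begin{aligned}
\max \quad & r^T x \\
\text{s.t.} \quad & \sum_{i=1}^{n} y_i \ge K, \\
& \sum_{i \in S_j} y_i \ge L, \\
& \mathbf{1}^T x = 1, \\
& x_i = 0 \Rightarrow y_i = 0, \\
& \varepsilon \le x_i \le \xi \Rightarrow y_i = 1.
\end{aligned}
\tag{6}
$$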
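Model (6) can be checked on a toy instance without a MILP solver. The sketch below is a hypothetical brute-force enumeration, not the Gurobi implementation used in the thesis: it tries every set of stocks that satisfies the cardinality and sector constraints and, for each set, fills the weights greedily, which is optimal for a linear objective with box bounds and a budget constraint. In a MILP solver the two implications are commonly linearized as ε·y_i ≤ x_i ≤ ξ·y_i with binary y_i. The returns, sectors and bounds are invented.

```python
from itertools import combinations


def best_weights(selected, r, eps, xi):
    """Maximize sum r_i x_i over the chosen stocks, eps <= x_i <= xi, sum x_i = 1.

    Mirrors the MILP linearization eps * y_i <= x_i <= xi * y_i with y_i = 1 here.
    """
    k = len(selected)
    if k * eps > 1 + 1e-12 or k * xi < 1 - 1e-12:
        return None  # no feasible weights for this set of stocks
    x = {i: eps for i in selected}  # every chosen stock gets the minimum share
    budget = 1 - k * eps
    for i in sorted(selected, key=lambda s: r[s], reverse=True):
        extra = max(0.0, min(xi - eps, budget))  # top up the best stocks first
        x[i] += extra
        budget -= extra
    return x


def solve_approach1(r, sectors, K, L, eps, xi):
    """Exhaustive search over model (6): at least K stocks, at least L per sector."""
    best = None
    for k in range(K, len(r) + 1):
        for sel in combinations(range(len(r)), k):
            if any(sum(1 for i in sel if i in S) < L for S in sectors):
                continue  # sector constraint violated
            x = best_weights(sel, r, eps, xi)
            if x is None:
                continue
            value = sum(r[i] * w for i, w in x.items())
            if best is None or value > best[0]:
                best = (value, x)
    return best


# Hypothetical instance: 4 stocks in 2 sectors.
r = [0.10, 0.08, 0.05, 0.02]
sectors = [{0, 1}, {2, 3}]
value, x = solve_approach1(r, sectors, K=2, L=1, eps=0.1, xi=1.0)
print(round(value, 4), {i: round(w, 2) for i, w in x.items()})
```

Enumeration grows combinatorially with the number of stocks, so this only serves to sanity-check small cases; for 60 stocks a MILP solver is needed.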
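$$
\begin{aligned}
\max \quad & r^T x \\
\text{s.t.} \quad & \sum_{i=1}^{n} y_i \ge K, \\
& \mathrm{Corr}_{i,j} \le \eta, \\
& \mathbf{1}^T x = 1, \\
& x_i = 0 \Rightarrow y_i = 0, \\
& \varepsilon \le x_i \le \xi \Rightarrow y_i = 1.
\end{aligned}
\tag{9}
$$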
For the purpose of this approach, a graph G = (Ω, E) is considered, where Ω is the set
of nodes and E is the set of edges between the nodes (see chapter 2.2). Each node in the
graph represents a stock. For a pair of nodes i, j ∈ Ω, the edge (i, j) exists if and only if
the correlation between the two stocks does not exceed η (see equation (8)). A group of
stocks in which every pair fulfills (8) forms a clique, and the optimized portfolio is such
a clique.
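The graph formulation can be sketched as follows. This hypothetical brute-force version (not the thesis's Gurobi model) builds the edge set from a correlation matrix and threshold η, interprets the correlation constraint in (9) as applying to every pair of selected stocks, keeps only stock sets that form a clique with at least K members, and fills the weights greedily as in approach 1. All returns and correlations are invented.

```python
from itertools import combinations


def correlation_graph(corr, eta):
    """Edges (i, j), i < j, for every pair whose correlation is at most eta."""
    n = len(corr)
    return {(i, j) for i in range(n) for j in range(i + 1, n) if corr[i][j] <= eta}


def is_clique(sel, edges):
    """A single stock is trivially a clique, so the constraint is inactive for K = 1."""
    return all((i, j) in edges for i, j in combinations(sorted(sel), 2))


def best_weights(selected, r, eps, xi):
    """Greedy optimal weights: eps each, remaining budget to the best returns (cap xi)."""
    k = len(selected)
    if k * eps > 1 + 1e-12 or k * xi < 1 - 1e-12:
        return None
    x = {i: eps for i in selected}
    budget = 1 - k * eps
    for i in sorted(selected, key=lambda s: r[s], reverse=True):
        extra = max(0.0, min(xi - eps, budget))
        x[i] += extra
        budget -= extra
    return x


def solve_approach2(r, corr, K, eta, eps, xi):
    """Exhaustive search over model (9): at least K stocks forming a clique in G."""
    edges = correlation_graph(corr, eta)
    best = None
    for k in range(K, len(r) + 1):
        for sel in combinations(range(len(r)), k):
            if not is_clique(sel, edges):
                continue
            x = best_weights(sel, r, eps, xi)
            if x is None:
                continue
            value = sum(r[i] * w for i, w in x.items())
            if best is None or value > best[0]:
                best = (value, x)
    return best


# Hypothetical instance: stocks 0 and 1 are highly correlated (0.8).
r = [0.12, 0.10, 0.06, 0.03]
corr = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.1, 0.0],
    [0.1, 0.1, 1.0, 0.3],
    [0.2, 0.0, 0.3, 1.0],
]
value, x = solve_approach2(r, corr, K=2, eta=0.25, eps=0.1, xi=1.0)
print(round(value, 4), {i: round(w, 2) for i, w in x.items()})
```

With η = 0.25 the two best stocks cannot be held together, so the search pairs stock 0 with the weakly correlated stock 2; no three stocks form a clique, so K = 3 is infeasible in this toy instance.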
An example for K = 4 stocks (the selected stocks are shown as green nodes) with correlation η ≤ 0:
This plot shows 5 · 10^5 randomly weighted portfolios of 4 stocks, each portfolio plotted
as a dot. The stocks are selected by Gurobi when optimizing for return only, without the
constraints on η and L, to make sure that the optimal portfolio is obtained in this trivial
case. The only active constraint is ε → 1/∞, in other words, the minimum investment
per stock goes to zero. In the plot, the efficient frontier (the set of optimal portfolios)
can be distinguished as the smooth left edge of the cloud of points.
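The validation plot can be approximated with a short Monte Carlo sketch. It assumes hypothetical mean returns and a hypothetical covariance matrix for 4 stocks, draws weights uniformly on the simplex (normalized exponential draws, a standard way to sample a flat Dirichlet distribution), and records each portfolio's (risk, return) point, with risk taken as the standard deviation; that choice of risk measure is also an assumption. The thesis used 5 · 10^5 portfolios and real data; the count here is kept small.

```python
import math
import random


def random_weights(n, rng):
    """Weights uniformly distributed on the simplex (normalized Exp(1) draws)."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def portfolio_point(x, r, cov):
    """(risk, return) of a portfolio; risk ASSUMED to be sqrt(x^T Sigma x)."""
    n = len(x)
    ret = sum(x[i] * r[i] for i in range(n))
    var = sum(x[i] * cov[i][j] * x[j] for i in range(n) for j in range(n))
    return math.sqrt(var), ret


def simulate(r, cov, num_portfolios, seed=0):
    rng = random.Random(seed)
    return [portfolio_point(random_weights(len(r), rng), r, cov)
            for _ in range(num_portfolios)]


# Hypothetical daily mean returns and covariance matrix for 4 stocks.
r = [0.0010, 0.0008, 0.0005, 0.0003]
cov = [
    [0.0004, 0.0001, 0.0, 0.0],
    [0.0001, 0.0003, 0.0, 0.0],
    [0.0, 0.0, 0.0002, 0.0],
    [0.0, 0.0, 0.0, 0.0001],
]
points = simulate(r, cov, num_portfolios=1000)
lowest_risk = min(points)
print(f"{len(points)} portfolios, lowest risk {lowest_risk[0]:.5f} at return {lowest_risk[1]:.6f}")
```

Plotting `points` as a scatter (risk on the horizontal axis) reproduces the kind of cloud described above, whose upper-left boundary approximates the efficient frontier.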
The Sharpe ratio quantifies how much excess return an investment or strategy generates
for each unit of risk taken. A higher Sharpe ratio indicates a better risk-adjusted return,
as it suggests that the investment is delivering more return for the amount of risk being
assumed. Conversely, a lower Sharpe ratio suggests that the investment is not generating
sufficient return relative to its risk. According to the plot, the Python program appears
to work as expected.
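The text does not state the formula used for the Sharpe ratio. The sketch below assumes the standard definition, the mean excess return over a risk-free rate divided by the standard deviation of the excess returns, computed from a series of periodic returns; the input values are hypothetical.

```python
import statistics


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return per unit of (sample) standard deviation."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)


# Hypothetical periodic returns.
print(sharpe_ratio([0.01, 0.03]))
print(sharpe_ratio([0.02, 0.04], risk_free=0.01))
```

For daily data the ratio is often annualized by multiplying by √252; whether such a scaling was applied in the plot is not stated in the text.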
4.2 Approach 1
Table 1: Return for approach 1 for different K- and L-values.
L \ K   6       9       12      15      18      21      24      27      30
1       101.2   94.4    86.8    78.9    70.7    62.5    54.1    45.2    36.3
2       -       -       86.4    78.8    70.7    62.5    54.1    45.2    36.3
3       -       -       -       -       70.2    62.2    53.9    45.1    36.2
Table 2: Risk for approach 1 for different K- and L-values.
L \ K   6        9        12       15       18       21       24       27       30
1       0.01932  0.01785  0.01612  0.01469  0.01365  0.01276  0.01182  0.01173  0.01226
2       -        -        0.01633  0.01465  0.01359  0.01276  0.01182  0.01173  0.01226
3       -        -        -        -        0.01394  0.01293  0.01200  0.01188  0.01226
Table 2 above shows the risk for different K- and L-values. The step size is the same as
in Table 1. A dash ("-") indicates that the model is infeasible for the corresponding
combination of K- and L-values.
The figure above plots the return for varying K-values. The blue graph shows the returns
for K-values from 6 to 30 with a fixed L = 1. The red graph shows the returns for
K-values from 12 to 30 with L = 2. Lastly, the yellow graph shows the returns for
K-values from 18 to 30 with L = 3. The step size for K is 1 in all three cases. The
investment parameters were set to ε = 1/30 and ξ = 1.
Figure 6: L = 1
Figure 7: L = 2
Figure 8: L = 3
Table 3: Return for approach 2 for different K- and η-values.
η \ K   1      2      3      4     5     6     7     8     9     10
0.2     111.0  105.1  98.6   91.9  83.5  75.2  65.8  56.0  45.0  32.3
0.3     111.0  107.5  101.6  95.3  88.8  82.0  74.8  66.7  58.3  49.7
0.4     111.0  107.5  102.3  95.9  88.8  82.0  75.3  68.4  61.2  53.0
The table above displays the return for 10 different K-values and 3 distinct η-values,
with K ranging from 1 to 10 and η from 0.2 to 0.4.
Table 4: Risk for approach 2 for different K- and η-values.
η \ K   1       2       3       4       5       6       7       8       9       10
0.2     0.0225  0.0206  0.0183  0.0169  0.0150  0.0134  0.0118  0.0104  0.0093  0.0087
0.3     0.0225  0.0211  0.0192  0.0176  0.0155  0.0144  0.0130  0.0119  0.0109  0.0105
0.4     0.0225  0.0211  0.0193  0.0177  0.0155  0.0144  0.0135  0.0133  0.0126  0.0122
The preceding table shows the risk for 10 different K-values and 3 distinct η-values,
with K ranging from 1 to 10 and η from 0.2 to 0.4.
In the left graph above (Connection Lines), the returns for the 3 different η-values at
the same K-value are connected with a black dashed line. The graph to the right
(Datasets) shows the returns for the 3 η-values and the 10 K-values. Both graphs cover
K-values from 1 to 10 with a step size of 1. In both graphs, K = 1 is in the top right
corner and K = 10 is in the bottom left corner. The minimum investment was set to
ε = 1/10 and the maximum investment to ξ = 1.
This figure shows the same "Datasets" graph as Figure (9), but includes more η-values
to show how the risk converges. The minimum investment was set to ε = 1/10 and the
maximum investment to ξ = 1.
(a) Node graphs for η = 0.4 (b) Node graphs for η = 0.3 (c) Node graphs for η = 0.2
The figure above shows the node graphs for three different η-values, all with K = 10.
The green nodes represent the stocks chosen by the optimization, and the red nodes
represent the stocks that were not chosen. In all three cases, the subgraph formed by the
chosen stocks is complete and is therefore a clique.
Figure 12: η = 1
5.1 Approach 1
As seen in Table (1), the highest return for approach 1 was obtained for K = 6 and
L = 1, in which case the optimizer invests in the best stock from each sector. This is an
expected result, since a large share of the investment budget can be placed in the most
lucrative stocks. For all values of L, the return decreases as K increases. This is because
a greater K forces the investment budget to be distributed over a larger number of
stocks. However, for the same K-value but different L-values, the return remains nearly
unchanged (for example, 86.8 and 86.4 for K = 12). This implies that the optimizer
chooses essentially the same portfolio for the different L-values when K is fixed.
As illustrated in Table (2), the risk generally declines as K increases (up to K = 27),
in line with the principle of diversification. For varying L and a fixed K-value, the risk
is fairly unchanged. Furthermore, the risk is higher for L = 3 than for L = 1 and L = 2
for 4 out of 5 K-values (the values coincide at K = 30). This result contradicts the
hypothesis of approach 1, namely that the risk would be lowered by increasing the number
of stocks chosen per sector. In other words, the model for approach 1 does not achieve its
purpose for the selection of 60 stocks from OMX Stockholm Large Cap. This outcome is
confirmed and visualized in Figure (5), where the graphs for different L-values coincide,
meaning that they give approximately the same return for the same risk.
5.2 Approach 2
As shown in Table (3), the highest return for approach 2 is attained for K = 1 and is the
same for all η-values. The correlation constraint (see equation (7)) is inactive for K = 1,
since a single stock has no pairwise correlations to restrict. The result is therefore
reasonable, because all capital can be invested in the best stock. For all η-values, the
return decreases as K increases. As in approach 1, this is an expected result, since the
investment budget must be shared among more stocks for greater K-values. For fixed
K-values, the return increases, or stays the same, as η rises. This is because a lower
η-value restricts the selection of stocks that can be included in the portfolio, which in
turn should result in a lower return.
Regarding the risk, Table (4) shows that for all η-values the risk decreases as K
increases. In addition, the risk decreases, or stays the same, as the correlation bound η
is lowered. This outcome agrees with the hypothesis of approach 2, which stated that the
risk would decrease with lower η-values. Therefore, the model for approach 2 is considered
valid. Pairing the corresponding values from Table (3) and Table (4) as points
(x, y) = (risk, return) gives the graphs in Figure (9), where the results described above
are depicted.
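The trends described above can be checked directly against the tabulated values. The sketch below transcribes Tables (3) and (4), pairs them into (risk, return) points, and tests the three monotonicity statements; it only re-reads the published numbers and does not rerun the optimization.

```python
# Values transcribed from Table 3 (return) and Table 4 (risk); keys are eta, K = 1..10.
RETURN = {
    0.2: [111.0, 105.1, 98.6, 91.9, 83.5, 75.2, 65.8, 56.0, 45.0, 32.3],
    0.3: [111.0, 107.5, 101.6, 95.3, 88.8, 82.0, 74.8, 66.7, 58.3, 49.7],
    0.4: [111.0, 107.5, 102.3, 95.9, 88.8, 82.0, 75.3, 68.4, 61.2, 53.0],
}
RISK = {
    0.2: [0.0225, 0.0206, 0.0183, 0.0169, 0.0150, 0.0134, 0.0118, 0.0104, 0.0093, 0.0087],
    0.3: [0.0225, 0.0211, 0.0192, 0.0176, 0.0155, 0.0144, 0.0130, 0.0119, 0.0109, 0.0105],
    0.4: [0.0225, 0.0211, 0.0193, 0.0177, 0.0155, 0.0144, 0.0135, 0.0133, 0.0126, 0.0122],
}


def points(eta):
    """(risk, return) pairs for one eta, ordered by K = 1..10."""
    return list(zip(RISK[eta], RETURN[eta]))


def strictly_decreasing(seq):
    return all(a > b for a, b in zip(seq, seq[1:]))


def non_decreasing(seq):
    return all(a <= b for a, b in zip(seq, seq[1:]))


etas = sorted(RETURN)
risk_falls_with_K = all(strictly_decreasing(RISK[e]) for e in etas)
return_rises_with_eta = all(non_decreasing([RETURN[e][k] for e in etas]) for k in range(10))
risk_rises_with_eta = all(non_decreasing([RISK[e][k] for e in etas]) for k in range(10))
print(risk_falls_with_K, return_rises_with_eta, risk_rises_with_eta)
```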
5.3 Comparison
A side-by-side comparison is rather difficult to implement because the two methods are
built on different prerequisites. To make the parallel comparison, a
In the remainder of this comparison, η-values from 0.5 to 0.9 are used, which makes it
easier to compare the tendencies of approach 2 with those of approach 1.
In Figure (15), the two approaches are shown in two columns, each with its return plot
underneath. The lower return plot for approach 1 shows that the relationship between K
Figure (16) compares the two approaches in the same graph. The η- and L-values are
varied for K-values in the range K ∈ {6, 7, ..., 30}. The minimum investment was set to
ε = 1/30, which makes the comparison easier. One drawback of this choice of ε is that
when K = 30, all stocks in the portfolio must receive the same investment weight, since
30 · 1/30 = 1. Because ε only acts as a scaling factor, the same pattern, with smaller
differences, is expected if ε is decreased.
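The effect of the minimum investment on the weights follows from the budget constraint 1ᵀx = 1: if K stocks must each receive at least ε, the best stock can hold at most 1 − (K − 1)ε. The sketch below evaluates this bound with exact fractions, assuming the maximum investment ξ = 1 is not binding.

```python
from fractions import Fraction


def max_weight_on_best_stock(K, eps):
    """Largest share one stock can get when K stocks each get at least eps and
    the weights sum to 1; the other K - 1 stocks take exactly eps each."""
    if K * eps > 1:
        raise ValueError("infeasible: K * eps exceeds the budget")
    return 1 - (K - 1) * eps


eps = Fraction(1, 30)
for K in (6, 12, 18, 24, 30):
    print(K, max_weight_on_best_stock(K, eps))
```

With ε = 1/30 the best stock can hold at most 5/6 of the capital when K = 6, and exactly 1/30 when K = 30, so the portfolio is forced to equal weights. This also illustrates why the return falls as K grows in both approaches.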
The values η ∈ {0.5, 0.6, 0.7} give a lower risk than η = 0.9 and than all L-values.
This indicates that the η-approach appears to work. According to approach 2, lower
η-values should give lower risk, but the graph shows that the curves for η = 0.5 and
η = 0.6 fluctuate, and in this region η = 0.6 gives lower risk than η = 0.5. This
fluctuation may occur because these η-values are too high to be regarded as low
correlation, which corresponds to the range [−1, 0.3] [10]. This is supported by
Figure (10) above, where the η-values are low and lower η consistently gives lower risk.
Another possible explanation is that in this case up to 50 % of all stocks (30 out of 60)
are included, which could affect the risk. The fluctuation occurs only for very high K
(K ∈ {22, 23, ..., 30}), which is probably not a practical portfolio size in reality.
The figure above compares the two models with the efficient frontier. Both models follow
the same shape as the efficient frontier but achieve a less favorable return-to-risk
trade-off. This is expected, since the models use constraints that limit the selection of
stocks and the distribution of investment shares among the chosen stocks.
[2] Fama, Eugene F. and French, Kenneth R. (2004). "The Capital Asset Pricing Model:
Theory and Evidence".
[4] Pinelis, Michael and Ruppert, David. (2021). "Machine learning portfolio allocation".
[5] Markowitz, H.M. (1952). "Portfolio Selection". Journal of Finance, Vol. 7 No. 1: 77-91.
[7] Birch, Jenna; Pantelous, Athanasios A. and Soramäki, Kimmo. (2016). "Analysis of
correlation based networks representing DAX 30 stock price returns".
1. Budget Constraint:
• Ensures that only assets with an investment share above a certain threshold ε
are included in the portfolio to avoid excessively small investments.
4. Binary Constraints:
• Restricts the number of assets that can be chosen by assigning a binary variable
for each asset indicating whether the asset is included in the portfolio or not.
• Limits the total number of assets that can be included in the portfolio to a
predetermined number K.
7. Correlation Constraints: