An Introduction To Machine Learning
Analysis
For institutional investor, qualified investor and investment professional use only. Not for retail public distribution.
Authors
Dr Anthony Ledford
Chief Scientist, Man AHL
[Link]/maninstitute
‘‘Although the terms ‘Artificial Intelligence’ and ‘Machine Learning’ are often used
interchangeably, they mean quite different things.’’

This interview originally appeared as the chapter ‘Humans versus Machines’ in the
2019 Mercer publication Investment Wisdom for the Digital Age. Dr Anthony Ledford
was interviewed by Dr Harry Liem, Director of Strategic Research and Head of Capital
Markets for Mercer in the Pacific Region.

Introduction
Whilst artificial intelligence (‘AI’) has been around since at least the 1950s, interest
in the topic has boomed since 2012. This has been driven largely by applied
breakthroughs with Deep Learning in diverse applications such as image recognition,
natural language processing (‘NLP’) and superhuman performance in the game of Go.
Unlike the AI doom-mongers, however, we remain sanguine about the possibility of AI
‘taking over’ as evil robot overlords, rating this as science fiction rather than impending
science fact.
The experience of Man AHL over the last decade is that AI, and in particular machine
learning (‘ML’), can play beneficial roles within investment management, especially in
applications where there is a relative abundance of data. For example, our research
and development in faster speed (e.g. daily and intra-day) systematic investment
strategies, together with algorithms for trade execution and smart order-routing,
have all made extensive use of ML. More recently, we have developed and deployed
systematic investment strategies that exploit text-based data using NLP.
The investment management and AI industries are both undergoing rapid change. We
expect to see more consolidation of investment managers as both fee erosion and the
costs of doing innovative state-of-the-art research take effect. Some currently ‘cutting-
edge’ alphas (including some ML models) will transition into alternative betas, whilst
a new cohort of data science researchers will seek-out new alphas to replace them.
Internationally, North America and China have been the leading investors in AI and
ML research. The traditional model of methodological AI research being undertaken in
universities has changed significantly over the last decade, with much now originating
in blue-sky company laboratories and being openly published, with a corresponding
drift of research staff from universities to these laboratories. Without a strong source of
people to replace these university researchers, the research landscape could become
fundamentally changed. To mitigate this, joint industry-university collaborations such as
the Oxford-Man Institute (‘OMI’) may become more common.
In contrast, ML is the study of the algorithms and methods that enable computers to
solve specific tasks without being explicitly instructed how to solve these tasks, instead
doing so by identifying persistent relevant patterns within observed data.1
Deep Learning refers to a subset of ML algorithms that make use of large arrays of
Artificial Neural Networks (‘ANNs’).2
1. This general definition of machine learning is very broad. However, within Man AHL the convention is to exclude standard statistical techniques such as linear regression. Others take a different view.
2. Artificial Neural Networks are computing systems inspired by the biological neural networks found in animal (including human) brains. Such systems progressively improve their ability to do tasks by considering examples, generally without task-specific programming.
[Figure: timeline of AI development]
Artificial Intelligence – early artificial intelligence stirs excitement.
Machine Learning – machine learning begins to flourish.
Deep Learning – deep learning breakthroughs drive AI boom.
So why the resurging interest in artificial intelligence, given the field has
been around since the 1950s?4
Many practical problems that humans take for granted – such as driving a car,
translating between languages or recognising faces in photos – have proven to be too
complex to solve with explicitly codified computer programs. Indeed, AI researchers
tried this approach for decades, but empirical research showed it is much easier to
solve such problems by gathering a large number of examples (so called training data)
and letting the relevant statistical regularities emerge from within these. For solving
such problems, this ML approach has beaten – by a wide margin – the best human
engineered solutions.
Deep Learning has become extremely popular since 2012, when a deep learning
system for image recognition beat competing systems based on other technologies by
a significant margin, but the development of ANNs can be traced back to at least the
1940s and 1950s, as you point out.
For example, in the 2012 image recognition competition that kick-started all the
subsequent deep learning interest, supervised learning was used on Imagenet. This is
a large database of digital images where each image has been pre-labelled according
to its contents (e.g. bird, fish, plant, etc.).5 Here, the inputs were the images, the
outcomes or targets were the key words describing each image, and the learning task
was to develop a system to reproduce the labels for each image.
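As a toy illustration of the supervised set-up just described (labelled inputs, and a system learned to reproduce the labels), the sketch below uses a one-nearest-neighbour rule on made-up two-dimensional feature vectors. The data, labels and distance rule are illustrative assumptions, not how Imagenet-scale systems actually work.

```python
import math

# Toy "images": 2-D feature vectors standing in for pixel data,
# each paired with a label, as in a supervised-learning training set.
training_data = [
    ((1.0, 1.0), "bird"),
    ((1.2, 0.9), "bird"),
    ((5.0, 5.2), "fish"),
    ((4.8, 5.1), "fish"),
]

def predict(features):
    """Label a new example with the label of its nearest training example."""
    _, label = min(training_data, key=lambda pair: math.dist(pair[0], features))
    return label

print(predict((1.1, 1.0)))  # a point near the "bird" examples -> bird
```

Real image systems replace the hand-made feature vectors with raw pixels and the nearest-neighbour rule with a deep network, but the input–label structure of the task is the same.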
Unsupervised Learning refers to when the elements of the training data do not have
outcomes, and the focus is then on identifying structure within the training data. One
example is identifying sub-groups or clusters that exhibit similar features or behaviour,
although unsupervised learning includes broader applications than just clustering.
To illustrate, referring back to the Imagenet database, if the labels describing the
images are ignored, then grouping the images into separate clusters containing similar
features is an unsupervised learning problem. You’d like to allocate images of birds
to one cluster and images of fish to another, without ever using the labels.
3. [Link]
4. The field of artificial intelligence research was founded as an academic discipline in 1956. The earliest research into thinking machines was inspired by a confluence of ideas that became prevalent in the late 1930s, 1940s, and early 1950s. Research in neurology had shown that the brain was an electrical network of neurons that fired in all-or-nothing pulses. Alan Turing’s theory of computation showed that any form of computation could be described digitally. The close relationship between these ideas suggested that it might be possible to construct an electronic brain.
5. See [Link]
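The clustering task described above can be sketched with a miniature k-means, a standard unsupervised algorithm: points are grouped by nearest centroid, and each centroid is then recomputed as the mean of its group. The two-dimensional points below are invented stand-ins for image features.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Tiny k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of the points assigned to it."""
    random.seed(seed)
    centroids = random.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute centroids; keep the old one if a cluster is empty.
        centroids = [
            tuple(sum(c) / len(c) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters

points = [(1.0, 1.1), (0.9, 1.0), (1.1, 0.9),   # one natural group
          (8.0, 8.1), (7.9, 8.0), (8.1, 7.9)]   # another
clusters = kmeans(points, k=2)
```

No label is ever consulted: the grouping emerges purely from the geometry of the data, which is the defining feature of unsupervised learning.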
It’s worth noting that ML systems are not necessarily bound to just one of these fields.
In particular, the combination of deep learning and reinforcement learning in Deep
Reinforcement Learning has produced some high-profile successes, a recent example
being the AlphaGo engine which beat the world Go champion Lee Sedol. 6
[Figure: schematic deep learning network. For illustrative purposes; modern Deep Learning systems for computer vision may have 20 or more layers. Source: [Link]]
There is this whole focus on ‘big data’. Why now (apart from the fact that
there is a lot of available data now and storage capacity has increased
exponentially)?
What exactly do we mean when we say big data?
Unfortunately there is no widely accepted definition, but in our view, big data is not
just ‘having a lot of data’. It’s more about having data from multiple sources, of various
types, and arising at different frequencies, e.g. information from financial markets,
national statistics and news, in numerical and text formats, obtained in real-time, daily
and monthly.
6. See [Link]
The lesson is always to fit a simple model first, and then only adopt a more
complicated ML model if the extra predictive accuracy (value) it provides is worth it.
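The ‘simplest model first’ rule can be sketched as an explicit model-selection step. The candidate models, the data, and the 20% improvement threshold below are illustrative assumptions, not a prescription.

```python
def mse(preds, ys):
    """Mean squared error between predictions and outcomes."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

def fit_constant(xs, ys):
    """Simplest possible model: always predict the mean."""
    mean = sum(ys) / len(ys)
    return lambda x: mean

def fit_line(xs, ys):
    """One step up in complexity: an ordinary least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def choose_model(xs, ys, required_improvement=0.2):
    """Keep the simple model unless the complex one cuts error by at
    least `required_improvement` (a fraction of the baseline error)."""
    simple, complex_ = fit_constant(xs, ys), fit_line(xs, ys)
    err_simple = mse([simple(x) for x in xs], ys)
    err_complex = mse([complex_(x) for x in xs], ys)
    return "line" if err_complex < (1 - required_improvement) * err_simple else "constant"

xs = [0, 1, 2, 3, 4]
flat_choice = choose_model(xs, [2.0, 2.1, 1.9, 2.0, 2.05])   # no trend
trend_choice = choose_model(xs, [0.0, 1.0, 2.0, 3.0, 4.0])   # clear trend
```

In practice the error comparison should be made out-of-sample; in-sample, a more flexible model almost always looks better than it is.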
‘‘Give me the simplest model that does the job every time.’’
‘‘Big data is not just ‘having a lot of data’. It’s more about having data from
multiple sources, of various types, and arising at different frequencies.’’

How do researchers deal with the fact that big data may contain a lot of
fake data?
By ‘fake’ data, I assume you mean ‘data created with the deliberate intention of
misinforming’, as opposed to statistically ‘noisy’ datasets which may contain errors,
missing values or other corruptions.
For ‘noisy’ data, a suite of modelling techniques that goes by the name Bayesian
Machine Learning is particularly robust at dealing with the statistical uncertainty
implicit with such noise. Indeed, this is one of the areas where we have enjoyed both
collaborations with academics at the Oxford-Man Institute and applications within
our systematic trading. Systematic fund managers like ourselves have been dealing
with noisy data for decades, so in some sense this can be thought of as
business-as-usual but using the latest cutting-edge tools. Other branches of ML do not
naturally take account of such statistical noise, and in their basic form may fail to give
appropriate results when exposed to noisy data. Such models are described as brittle
rather than robust. This is a criticism often levelled against Deep Learning; however,
recent methodological breakthroughs in Bayesian Deep Learning have led to new ML
techniques which at least partially address such issues.
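A minimal sketch of the Bayesian idea: rather than trusting a noisy observation at face value, blend it with a prior belief, weighting each by its precision, so the estimate carries its uncertainty with it. The conjugate normal update below is textbook material, not Man AHL’s production machinery, and the numbers are invented.

```python
def posterior_mean_and_var(prior_mean, prior_var, observations, noise_var):
    """Conjugate normal update: the posterior blends prior and data,
    weighted by their respective precisions (inverse variances)."""
    n = len(observations)
    sample_mean = sum(observations) / n
    prior_precision = 1.0 / prior_var
    data_precision = n / noise_var
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean
                            + data_precision * sample_mean)
    return post_mean, post_var

# A noisy signal believed a priori to be near zero: one observation of 2.5
# with noise variance 4 is shrunk heavily back toward the prior.
mean, var = posterior_mean_and_var(0.0, 1.0, [2.5], 4.0)
```

A model that instead took the raw observation 2.5 at face value would be brittle in exactly the sense described above; the posterior mean of 0.5 reflects how little a single noisy point should move the estimate.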
Back to ‘fake’ data. This is not so relevant for market quantities such as price or
volume, as there are mechanisms in place to ensure such data accurately reflect
reality. It becomes more of an issue for text based data such as news or commentary,
but again most financial news reporting is of a high standard. It is also the case that
opinions can be wrong without being fake. It’s more of a problem in unregulated data
sources such as social media, but most institutional level investment and trading is not
driven by these anyway.
7. As an illustrative example, the same is true for simpler statistical models such as linear regression. The unknowns within the model are the intercept and slope parameters. The knowns outside the model are the observed data, which are known in the sense that they are given, and do not change.
8. Examples include techniques such as Dropout and Early Stopping, both of which are used to avoid overfitting in Deep Learning applications.
9. For example, Imagenet.
‘‘ ‘Bias’ in data is an issue, and distinct from ‘fake’ or ‘noisy’ data.’’

gender bias. This deeply problematic outcome reflects an obvious, but important, truth:
these algorithms learn whatever pattern is in the data they are trained on regardless of
whether that pattern is what you want them to learn. Indeed, removing bias in training
data and developing techniques for steering ML algorithms to learn some things but
ignore others (e.g. things you already know, or that stocks tend to increase in price
over time) is a key task in the applied research the ML team at our firm undertakes.

How do researchers deal with non-stationarity (big data not being stable
over time)?
This is a pretty universal problem in any quantitative financial modelling so is felt
more widely than just in applications of ML. To calibrate any model with parameters
requires data, and the more data you have, the more precisely you can estimate the
parameters. Precision in estimated parameters is good to have, so this suggests you
should use lots of data. However, using more data typically means using data from
increasingly historical periods, which runs the risk that these data may not reflect the
current world. To avoid that risk you should therefore use only the most recent data;
in other words, use few data. Unfortunately, these considerations pull you in opposite
directions; it’s Catch-22. In practice, we tilt towards using as much data as we can
and apply a penalty that discounts the impact of historical data compared to recent
information.
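The ‘use lots of data but discount the old’ compromise can be sketched with exponentially decaying observation weights. The half-life and the toy series below are illustrative assumptions, not the actual penalty scheme referred to above.

```python
def decayed_weights(n, half_life):
    """Weight n observations so the most recent counts fully and the
    weights halve every `half_life` observations back in time."""
    decay = 0.5 ** (1.0 / half_life)
    return [decay ** (n - 1 - i) for i in range(n)]

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# A series whose level shifted recently: old regime near 1, new regime near 3.
values = [1.0, 1.0, 1.0, 1.0, 3.0, 3.0]
plain = sum(values) / len(values)  # treats all history equally
decayed = weighted_mean(values, decayed_weights(len(values), half_life=2))
```

The decayed estimate still uses every observation, so nothing is thrown away, but it sits much closer to the recent regime than the plain average does.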
Algorithms becoming aware and ‘taking over’ is definitely in the domain of science
fiction rather than science fact! That does not mean algorithms can’t or won’t exhibit
destructive behaviour; however if they do, then it won’t be because they’ve gained
consciousness, but more likely that they’ve stumbled on some corner-case solution of
an ill-specified optimisation criterion.
Where do you see the main risks and opportunities for institutional
investors?
The investment opportunities offered by ML strategies can be summarised in one word:
diversification. As with the rest of quantitative investment, there are many more words
for the risks.
10. Alan Turing (1912 – 1954) was an English mathematician, computer scientist, logician, cryptanalyst, philosopher and theoretical biologist. In 1941, Turing and his fellow
cryptanalysts set up a system for decrypting German Enigma signals. Turing is widely considered to be the father of theoretical computer science and artificial intelligence.
Turing’s paper “Computing Machinery and Intelligence”, published in 1950, is considered the seminal work on the topic of artificial intelligence.
Sure, there would be more competition, but there would also be a lot more research
getting done and a lot more people doing it. The trick would be to remain at the
forefront of that increased research activity, something we’ve been good at so far.
I wanted to draw your attention to a Man AHL paper written in Dec 2016
called ‘Man vs. Machine: Comparing Discretionary and Systematic Hedge
Fund Performance’ which suggests discretionary macro managers
underperform systematic macro managers, even after adjusting for
volatility and factor exposures over the measured time period. How do you
think this debate will shift with the advent of ML?
From my experience, ML strategies have the most to offer towards the faster end of the
spectrum of strategies deployed by most systematic macro managers. Furthermore,
not all systematic macro managers will make use of such strategies.
- An independent risk management team that monitors for the build-up of inadvertent portfolio exposures;
- A wide range of diversifying strategies so that undue reliance is not placed on any one strategy or concentrated group of strategies;
- A stable, international, multi-disciplined and diverse team with low turnover that contains both new recruits and old-timers (like me!);
- Active engagement with the outside world, e.g. through academic collaborations and publications;
- A test-trading program where the latest research ideas can be verified in live trading without risking client capital;
More generally, the amount of trading capital a model obtains in the portfolio will be
driven by its long-term risk-adjusted return and correlation with other models. This
means that a model which consistently underperforms or fails to diversify will naturally
receive a diminishing allocation as time progresses, although obviously this
de-allocation occurs with some lag.
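As a hypothetical sketch of that allocation logic, the rule below scores each model by its Sharpe ratio (floored at zero) discounted by its average correlation with the rest of the book, then normalises the scores into portfolio weights. Both the scoring formula and the inputs are invented for illustration; they are not Man AHL’s actual allocation scheme.

```python
def allocations(sharpes, avg_correlations):
    """Score each model by risk-adjusted return, discounted by how
    correlated it is with the other models, then normalise to weights."""
    scores = [
        max(sharpe, 0.0) / (1.0 + max(corr, 0.0))
        for sharpe, corr in zip(sharpes, avg_correlations)
    ]
    total = sum(scores)  # assumed > 0: at least one model has positive Sharpe
    return [s / total for s in scores]

# Three hypothetical models: a strong diversifier, an equally strong but
# crowded (highly correlated) model, and a consistent underperformer.
weights = allocations(sharpes=[1.0, 1.0, -0.2],
                      avg_correlations=[0.0, 0.8, 0.1])
```

The underperformer receives nothing, and the crowded model is penalised relative to the diversifier, mirroring the behaviour described above (in practice with a lag, since Sharpe and correlation estimates themselves update slowly).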
What do you think is the best investment time horizon to apply ML?
How important is real-time data?
Within our suite of trading models, ML has the highest representation at the faster end,
with holding periods extending from intra-day out to multiple days.
Faster signals than that certainly exhibit greater non-linear structure, however such
effects are hard to capture as alpha in the client-scale funds typical of large systematic
managers like us.
In a nutshell, they are just too fast. However, when applied in trade execution, such
effects may offer significant advantage in reducing transaction costs, e.g. enabling
orders to be front- or back-loaded depending on the short-term predictability of the
limit-order book. Real-time data is essential for that.
Can you comment on some of the other interesting areas apart from ML
that the Oxford-Man Institute is currently working on?12
Within the University of Oxford, the Engineering Science Department’s hub for ML
houses both the OMI and the broader Machine Learning Research Group (‘MLRG’).
Research activities span a diverse range of topics with applications ranging from
astronomy to zoology, with examples including detecting disease-bearing mosquitoes,
identifying exoplanets from data gathered using NASA’s Kepler space telescope,
systems for remote fault detection and monitoring, and making energy networks and
storage more efficient.
Also, if you look across the industry, the half-life of trading strategies tends to be
monotonic with their time horizons. Slower strategies typically last longer, but with
lower Sharpe ratios, than higher frequency strategies.

‘‘All three. But it’s rare to find all these in the same person at the same time,
which is why we believe in teams.’’

As a scientist, do you have any thoughts on the premium for sustainability?
My colleagues in Man Numeric have spent almost two years unpacking Environmental,
Social and Governance (‘ESG’) data, conditioning for statistical biases and removing
exposures to other factors, and thereby have obtained something meaningful and
orthogonal. This is the closest thing I have seen to an ESG or sustainability factor. Of
course, it is quite possible that, as more people focus on ESG and sustainability, a
premium may emerge – it is a changing environment where ESG and non-ESG activities
could become advantaged or disadvantaged by policy.
How do you see the investment industry evolving over the coming decade?
At the industry level, I expect to see more consolidation as both fee erosion and the
costs of doing innovative state-of-the-art research take effect.
Closer to home, some currently ‘cutting-edge’ alphas (including some ML models) will
transition into alternative betas, whilst a new cohort of data science researchers will
seek-out new alphas to replace them. Discretionary managers will make extensive use
of data dashboards that deliver assimilated big data views.
Run-of-the-mill computer hardware (and whatever the smartphone has become in a
decade’s time) will make today’s state-of-the-art systems look just as ridiculous as
those from 2009 do today.
Who will have the edge in leading the AI research of the future? Apart
from yourself, are there certain leading academics, corporations or even
countries that you could identify?
Internationally, North America and China have been the leading investors in AI and ML
research for some time, with Europe, Australasia and the rest of the world now trying to
compete, if somewhat belatedly.
I’d expect how this funding landscape evolves to be the deciding factor in the
shape of AI developments over the next decade. That said, the traditional model of
methodological AI research being undertaken in universities has changed significantly
over the last 10 years, with a lot more now originating in blue-sky company laboratories
and being openly published, with a corresponding drift of staff from universities to
these laboratories.
Without a strong source of people to replace these university researchers, the research
landscape could become fundamentally changed. To mitigate this, joint industry-
university collaborations such as the OMI may become more common, enabling
academics to operate effectively in both camps, rather than exclusively in one or the
other.
Dr Anthony Ledford
Chief Scientist at Man AHL
Dr Anthony Ledford is Man AHL’s Chief Scientist and Academic Liaison. Dr Ledford is
based in the Man Research Laboratory (Oxford) and has overall
responsibility for Man AHL’s strategic research undertaken
there. Prior to joining Man AHL in 2001, he lectured in Statistics
at the University of Surrey. Dr Ledford read Mathematics at
Cambridge University, holds a PhD from Lancaster University in
the development and application of multivariate extreme value methods and is a former
winner of the Royal Statistical Society’s Research Prize.