Why Cluster Analysis and Why now? Machine Learning in Equity Trading

Monika Kucharska, Quod Insights

The equities universe, from company shares to ETFs and warrants, comprises more than 4 million instruments. Some are liquid, some are less so, and some trade only sporadically. The prevalence of automated electronic trading, combined with the use of algos and smart order routing, has led to growing reliance on historical data. That data feeds an array of trading, risk, and execution decisions, for example in VWAP, Percentage of Volume (POV), and Implementation Shortfall (IS) algorithms.

The first issue to address is the sheer volume of data that must be captured, cleansed, stored, and processed for execution systems to use in real time. Often the data behind these decisions is insufficient, whether because of licensing restrictions, inadequate granularity, or because it is simply unavailable.

Machine Learning Cluster Analysis is a fully data-driven approach to identifying groups of instruments that behave similarly over short periods of time. It reduces the instrument universe on which data-driven operations need to be performed.

What is Cluster Analysis?

Cluster analysis is the process of dividing a set of data points into groups so that the points within each group are similar to each other. It is a widespread tool for exploratory data analysis in diverse fields, from science to business. Marketing analytics is a common example: consumers can be grouped into segments based on characteristics or features such as age, postcode, and purchase history.

Cluster analysis is an umbrella term covering many different algorithmic approaches. It is a form of unsupervised machine learning: the clustering algorithm finds natural groups in the data without being given labelled examples.
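As an illustration of one such approach, here is a minimal sketch of k-means, a widely used clustering algorithm. It is not the algorithm developed at Quod Financial; the feature choice (log of daily volume and spread in basis points), the sample values, and the simplistic initialisation are all assumptions made for the example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain k-means on a list of equal-length feature vectors.

    Initialisation takes the first k points, which keeps the sketch
    deterministic; production code would use a smarter seeding.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centroids


# Hypothetical features per instrument: (log10 of daily volume, spread in bps).
features = [
    (7.1, 2.0), (6.9, 2.5), (7.3, 1.8),      # liquid names
    (4.0, 40.0), (4.2, 35.0), (3.8, 45.0),   # illiquid names
]
labels, centroids = kmeans(features, k=2)
print(labels)
```

On this toy data the three liquid names end up in one group and the three illiquid names in the other. In practice the features would normally be scaled first so that no single feature dominates the distance.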

Why Cluster Analysis and Why now?

Financial markets can be regarded as complex systems. A complex system is one in which many components interact with each other, and those interactions shape the outcome. Complex systems are studied in many research areas, such as biology, chemistry, and physics.

The complexity of capital markets arises from the vast number of individual actors that interact in an uncoordinated and sometimes disorderly way. These microscopic interactions often give rise to large macroscopic patterns.

Clustering is an effective way to analyse complex systems in support of data-driven decisions. Clustering algorithms are particularly well suited to financial markets because the large amounts of data available make the system well defined.

There are many reasons for grouping equity instruments together, such as analysing trading strategies (e.g. for Transaction Cost Analysis), pre- and post-trade risk management, execution and trading strategy selection, or providing trading algorithms with more valuable insights to optimise their performance.

Two of the main use cases of the clustering algorithm developed at Quod Financial are:

  • Volume Curve Prediction
  • Optimisation of the slice duration according to the targeted participation rate on a given benchmark (e.g. VWAP)
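To make the first use case concrete, the sketch below shows one simple way a cluster-level volume curve could be built and used: each member's intraday volumes are normalised to fractions of the day, and the fractions are averaged across the cluster. The bucket volumes, the four-bucket day, and the 10,000-share order are hypothetical, and this is not presented as Quod's prediction method.

```python
def normalise(volumes):
    """Turn per-bucket volumes into fractions of the day's total volume."""
    total = sum(volumes)
    return [v / total for v in volumes]


def cluster_curve(member_volumes):
    """Average the normalised curves of all instruments in one cluster."""
    curves = [normalise(v) for v in member_volumes]
    return [sum(bucket) / len(curves) for bucket in zip(*curves)]


# Hypothetical volumes in four intraday buckets for two instruments
# that a clustering step has already placed in the same cluster.
members = [
    [400, 100, 100, 400],
    [300, 200, 200, 300],
]
curve = cluster_curve(members)

# Spread a hypothetical 10,000-share parent order along the curve,
# as a VWAP-style schedule would.
schedule = [round(10_000 * weight) for weight in curve]
print(curve, schedule)
```

Rounding each slice independently happens to preserve the order total here; a real scheduler would need to reconcile any rounding residue explicitly.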


To achieve either of these use cases without cluster analysis, simulations would need to be run on a large number of instruments over long periods to produce sufficient samples for the various combinations of parameters, which means storing vast amounts of data. Clustering can make this process far more efficient, because the simulations can instead be run on representatives of each cluster rather than on every individual instrument.
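One way to choose a representative, sketched here under assumptions, is to take each cluster's medoid: the member with the smallest total squared distance to its peers. The tickers, feature values, and cluster labels are hypothetical, and the source does not say how representatives are actually selected.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def representatives(features, labels):
    """Pick one representative per cluster: the medoid under squared distance."""
    clusters = {}
    for name, label in labels.items():
        clusters.setdefault(label, []).append(name)
    reps = {}
    for label, names in clusters.items():
        # The medoid minimises the summed distance to all cluster members.
        reps[label] = min(
            names,
            key=lambda n: sum(sq_dist(features[n], features[m]) for m in names),
        )
    return reps


# Hypothetical tickers with (log10 of daily volume, spread in bps) features
# and cluster labels produced by an earlier clustering step.
features = {
    "AAA": (7.1, 2.0), "BBB": (6.9, 2.5), "CCC": (7.3, 1.8),
    "WWW": (4.0, 40.0), "XXX": (4.2, 35.0), "YYY": (3.8, 45.0),
}
labels = {"AAA": 0, "BBB": 0, "CCC": 0, "WWW": 1, "XXX": 1, "YYY": 1}

reps = representatives(features, labels)
print(reps)  # simulations would then run on these names only
```

With six instruments in two clusters, the simulation workload drops from six names to two; the same ratio applied to a universe of millions of instruments is where the saving comes from.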

What are the benefits of the Machine Learning Clustering approach?

Reducing the data cost:

Acquiring all the necessary data (historical tick data and trades) is expensive, and so is processing and storing it. With the clustering approach, only data describing the clusters is needed, rather than the full data for every instrument.

Improving data quality and completeness:

The available data is sometimes too poor in quality or too incomplete to base an analysis on. For instance, warrants can trade infrequently, leaving their data incomplete or inconsistent. Clustering allows the gaps to be filled from similar instruments.
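A minimal sketch of that idea, assuming missing observations are marked as None and that the cluster membership is already known: missing buckets of a sparsely traded instrument are replaced by the bucket-wise mean of its cluster. The curves below are hypothetical normalised volume fractions.

```python
def cluster_mean(curves):
    """Bucket-wise mean over cluster members, ignoring missing values (None)."""
    means = []
    for bucket in zip(*curves):
        seen = [v for v in bucket if v is not None]
        means.append(sum(seen) / len(seen) if seen else None)
    return means


def fill_gaps(curve, mean_curve):
    """Replace missing buckets in one curve with the cluster's mean value."""
    return [m if v is None else v for v, m in zip(curve, mean_curve)]


# Hypothetical normalised volume curves over three intraday buckets.
peers = [
    [0.30, 0.20, 0.50],
    [0.40, 0.20, 0.40],
]
sparse_warrant = [0.35, None, None]   # traded in the first bucket only

mean_curve = cluster_mean(peers + [sparse_warrant])
filled = fill_gaps(sparse_warrant, mean_curve)
print(filled)
```

Observed values are kept as they are and only the gaps are borrowed from the cluster. A filled volume curve would generally need re-normalising afterwards so that its fractions sum to one.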

Better suited for real-time processing:

In any real-time trading system, reducing data processing increases overall performance. Clustering allows calculations to be done on drastically smaller data sets.


These benefits are an increasingly important success factor in automated execution and trading strategies, which use this data to decide on slicing, timing, and volume participation.

To find out more, access our Machine Learning Equity Trading Clustering Whitepaper 



About Quod Financial

Quod Financial is a multi-asset OMS/EMS trading technology provider focused on automation and innovation – specialising in software and services such as Algorithmic Trading, Smart Order Routing (SOR), and Internalisation of Liquidity. Quod leverages the use of its data-driven architecture to support the demands of e-trading markets by combining AI/ML-enabled decision-making tools and dynamic market access with a non-disruptive approach to deployment. For more information visit: www.quodfinancial.com

Quod Marketing

+44 20 7997 7020