Building the factory, not the car

Jun 5

A successful trading operation requires much more than picking the right model type, a few inputs, and a trading command to execute. The trading model undoubtedly influences trade execution decisions, but it is by no means the only factor in the decision-making process. The saying "build the factory, not the car" means investing time, money, and resources to make sure the right infrastructure and tools are in place before building a trading model. Many supporting systems must be in place before large sums of money can be trusted to any computer algorithm, and many of them call for specialized knowledge and skill sets.

Below is a quick summary of the main functional groups required to create a profitable trading model, to give you an idea of the diverse roles within any trading "factory." Each component could be the subject of a whole study of its own, and many people have built entire careers on subsets of them. This overview condenses the main building blocks, or functional teams, and is by no means an exhaustive list.

Data Engineering Team:

The data engineering team is the starting point of the model factory, where the core data used throughout the creation process is validated, curated, and formatted. Model training was previously driven by internal, restricted databases, and the competitive advantage came mainly from access to private data. That gap is closing for retail investors thanks to the abundance of third-party data providers available today, which offer huge, sparse data sets that are easily and inexpensively accessible. This availability of data has largely fueled the use of machine learning (ML), which in turn has helped retail investors become more competitive in the markets. As data grows in volume and availability, new tools are constantly required and developed to take advantage of these market opportunities. When working with terabytes of data, it is crucial to leverage distributed databases and to assemble a team capable of optimizing them for advanced data curation. The emergence of MapReduce, along with the continual improvement of data warehouse technologies, is reshaping the industry. Keeping the factory's skills sharp enough to adapt to this changing landscape is a fundamental part of staying competitive and keeping every part of the model factory running smoothly. Furthermore, with the advent of cloud-based data warehousing, specific expertise is required to keep these databases performant and continuously optimized. The data engineering team possesses not only strong database optimization skills but also a solid working knowledge of API development and basic scripting, typically in R, Python, and Excel, to create tools for visualizing and cleaning data gaps.
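As a rough illustration of the kind of data-gap script mentioned above, here is a minimal sketch using only the Python standard library. The function name `missing_weekdays` and the sample prices are hypothetical, and the sketch assumes the series should have one observation per weekday (Monday through Friday); exchange holidays are not modeled and would be reported as gaps.

```python
from datetime import date, timedelta


def missing_weekdays(observed, start, end):
    """Return weekdays in [start, end] that have no observation.

    Assumption: one observation is expected every Monday-Friday.
    Exchange holidays are not modeled and will show up as gaps.
    """
    seen = set(observed)
    gaps = []
    day = start
    while day <= end:
        # weekday() is 0 for Monday through 4 for Friday
        if day.weekday() < 5 and day not in seen:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


# Hypothetical daily closes keyed by date; Wednesday 2024-01-03 is missing.
closes = {
    date(2024, 1, 1): 100.0,
    date(2024, 1, 2): 101.5,
    date(2024, 1, 4): 99.8,
    date(2024, 1, 5): 100.2,
}

gaps = missing_weekdays(closes.keys(), date(2024, 1, 1), date(2024, 1, 5))
print(gaps)
```

In practice a team would likely run a check like this against an exchange calendar and feed the result into a visualization or a backfill job; the point here is only that gap detection is a small, testable piece of the pipeline.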

Software Team:

Since distinct components of the algorithm factory have distinct needs depending on their functional domain, there are often multiple software teams rather than just one. The way these teams are organized will be specific to the firm, but in general, it is considered best practice to keep software developers interchangeable across the organization and to share a well-understood set of development principles. In the ML/AI domain in particular, mastery of Python and libraries such as scikit-learn (sklearn), pandas, numpy, scipy, matplotlib, and multiprocessing is widely regarded as standard. To ensure that these software teams can communicate with one another effectively, a strong understanding of API-driven development is also necessary. Finally, a solid grasp of validation testing is critical to ensure that experimental models are properly productized.

Infrastructure Team:

This is the team that deals with the physical hardware used to drive model training and live execution. The skills required will differ from firm to firm depending on the core strategies; however, it is worth pointing out that maintaining hardware operations is a distinct specialty. While there has been a significant shift in recent years toward cloud-native operations, where third parties handle hardware configuration, high levels of customization and GPU utilization still require careful consideration. It is imperative for any organization to ensure that its core production hardware is operating at peak efficiency and that all security patches have been installed. Stronger models typically require stronger infrastructure, and maintaining the hardware racks is just as crucial as ensuring that the software installed on them is reliable. It takes a very specific skill set to oversee the installation of the infrastructure and determine its appropriate size and configuration.

Modeling and Trading Team:

The modeling and trading team is occasionally regarded as part of the software team; however, these experts need more than programming skills. Although there is a lot of overlap in the skills required, this team is more concerned with developing the models than with turning them into tradable execution. Its members are generally well-versed in Python and also tend to use technologies that are less common in production, such as MATLAB and R. Tasked with grounding the math behind the models in a deeper understanding of economic and monetary factors, they place greater emphasis on research than on code quality. In essence, the modeling and trading team is "allowed to write bad code", but it is crucial to give them a way to verify the output while the software team is putting the model into production, regardless of the technology they use in their research.
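One simple way to give researchers that verification path is to compare the production implementation's output against a reference output saved from the research code. The sketch below is only an illustration: the function name `find_mismatches`, the tolerance, and the sample signal values are all hypothetical.

```python
import math


def find_mismatches(research, production, rel_tol=1e-6, abs_tol=1e-12):
    """Return the indices where production output diverges from research output.

    Values are compared with a tolerance because a port from one language
    or library to another rarely reproduces floats bit for bit.
    """
    if len(research) != len(production):
        raise ValueError("research and production outputs differ in length")
    return [
        i
        for i, (r, p) in enumerate(zip(research, production))
        if not math.isclose(r, p, rel_tol=rel_tol, abs_tol=abs_tol)
    ]


# Hypothetical signals: reference values exported from the research
# environment, and values produced by the production rewrite.
research_signals = [0.12, -0.05, 0.30]
production_signals = [0.12, -0.05, 0.31]

mismatches = find_mismatches(research_signals, production_signals)
print(mismatches)
```

Because the comparison works on exported numbers rather than on code, it does not matter whether the research side was written in MATLAB, R, or Python.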

Trading Validation Team:

When a model is given trading capabilities and access to the market, it may require human assistance to place a trade, or it may be entirely automated and have direct access to brokerage services. In either scenario, a model can only learn from past data, and there are numerous ways for an algorithm to go awry. The trading validation team is in charge of establishing the boundaries for every algorithm and making sure that safeguards against abnormal behavior, such as stop-losses, are in place before anything negative can occur. These guidelines are established in close collaboration with the portfolio team and are continuously reviewed to make sure that a human can intervene if necessary. In a sense, this team acts as production support for the trading models.
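To make the idea of boundaries concrete, here is a minimal sketch of a pre-trade check that sits between a model and the broker. Everything in it is an assumption for illustration: the `Limits` fields, the `check_order` function, the three return values, and the numbers are hypothetical and do not come from any real brokerage API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    max_position: float   # largest absolute position allowed, in units
    stop_loss_pct: float  # tolerated loss from entry price, e.g. 0.05 = 5%


def check_order(limits, current_position, order_qty, entry_price, last_price):
    """Return 'halt', 'reject', or 'allow' for a proposed order.

    'halt'   : the open position has breached its stop-loss, so a human
               should step in before the model trades again.
    'reject' : the order would push the position past the size limit.
    'allow'  : the order is within both boundaries.
    """
    if current_position != 0:
        # Loss is positive when price moves against the position:
        # down for a long (direction +1), up for a short (direction -1).
        direction = 1 if current_position > 0 else -1
        loss_pct = direction * (entry_price - last_price) / entry_price
        if loss_pct >= limits.stop_loss_pct:
            return "halt"
    if abs(current_position + order_qty) > limits.max_position:
        return "reject"
    return "allow"


limits = Limits(max_position=1000, stop_loss_pct=0.05)

# Long 100 units from 100.0; the price has fallen 6%, past the 5% stop.
print(check_order(limits, 100, 10, 100.0, 94.0))
# Price is fine, but the order would take the position to 1050 units.
print(check_order(limits, 100, 950, 100.0, 99.0))
# Small order, small loss: within both boundaries.
print(check_order(limits, 100, 10, 100.0, 99.0))
```

Keeping the limits in a separate, immutable object is a deliberate choice in this sketch: the validation and portfolio teams can review and change the boundaries without touching the model code.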

Portfolio Team:

The portfolio team is responsible for adding the human touch to the firm and for setting the distinctive approach that ultimately drives the success of the portfolio. There is no single model that applies to everything; it is up to the factory to create several models and find a compromise between risk tolerance and projected return. Models are regularly created for specific stock tickers and asset classes. Depending on the fund's risk tolerance, it is usually preferable to have a human oversee the precise allocation ratios and leverage amounts that should be set to hedge risk. Computer models, which can be automated, help with this, but as previous blog posts have pointed out, models can only detect patterns in events that have already happened. Regardless of the computer model, someone needs to ensure the portfolio is in line with the desires and risk tolerance of the trading group.
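As a small example of the arithmetic a human overseer might check, the sketch below computes gross leverage (the sum of absolute position values divided by account equity) and each position's share of gross exposure. The tickers, dollar amounts, and the 1.5x cap are hypothetical, and gross leverage is only one of several ways a fund might define its leverage limit.

```python
def gross_leverage(positions, equity):
    """Gross exposure (sum of absolute position values) divided by equity."""
    if equity <= 0:
        raise ValueError("equity must be positive")
    return sum(abs(value) for value in positions.values()) / equity


def allocation_ratios(positions):
    """Each position's share of total gross exposure, ignoring sign."""
    gross = sum(abs(value) for value in positions.values())
    if gross == 0:
        return {ticker: 0.0 for ticker in positions}
    return {ticker: abs(value) / gross for ticker, value in positions.items()}


# Hypothetical book: dollar value per ticker, negative for short positions.
positions = {"AAA": 60_000.0, "BBB": -30_000.0, "CCC": 45_000.0}
equity = 100_000.0
max_leverage = 1.5  # hypothetical cap set by the portfolio team

leverage = gross_leverage(positions, equity)
ratios = allocation_ratios(positions)

print(f"gross leverage: {leverage:.2f}x, within cap: {leverage <= max_leverage}")
for ticker, share in ratios.items():
    print(f"{ticker}: {share:.1%} of gross exposure")
```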

Market dynamics are constantly changing, and staying nimble and agile is always key to defining a firm's "factory." While the core teams above may have very different objectives and domains of expertise, the human aspect of the factory's implementation is what ultimately drives its success. Allowing people to move and provide expertise across domains, while maintaining a strong culture of technical reusability and development standards, is important to keeping a "clean shop" and having an edge in a faster and more dynamic market. A trading firm's main objective is to maximize alpha while minimizing risk. To do this, it must not only create the required models but also operate as a factory that produces them continuously, allowing it to stay flexible in a constantly changing financial environment.

Steven Gasior
