Data: AI’s Kryptonite in Asset Management?
Prices, percentages, basis points[1], revenue, sales figures, operating costs, each with its corresponding metrics: turn on the financial news on any given day and this is what you will see. One would think this is a natural place for AI to make its mark. Yet many who have tried have not achieved the results one would hope for. So what is the challenge AI faces in financial markets? Surprising to most, data may be its biggest foe.
The data alone is not the issue; the issue is what happens when the data is not sufficient: overfitting[2]. Overfitting is a modeling error in statistics that occurs when a function is too closely aligned to a limited set of data points. You might think of a time when you felt like you knew it all. No one could tell you anything new and, in a certain setting, you were an expert. Then you were thrust into a new situation where all your “knowledge” was useless. This is what happens to an overfit model: it becomes so focused on the data it learned from that it performs poorly when introduced to new data. This is a huge problem that must be constantly addressed by anyone using AI, in any field. The source of overfitting is often the data itself. Let’s dive into some of the key issues with financial data in AI modeling.
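To make the idea concrete, here is a minimal sketch of overfitting in Python. All numbers are made up for illustration and the code is generic, not tied to any Qraft system: a high-degree polynomial fit to a handful of points matches them almost perfectly but does far worse on fresh data from the same process.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small "training set": 8 noisy observations of a simple linear trend.
x_train = np.linspace(0, 1, 8)
y_train = 2 * x_train + rng.normal(0, 0.1, size=8)

# Fresh data drawn from the same underlying process.
x_test = np.linspace(0, 1, 100)
y_test = 2 * x_test

for degree in (1, 7):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")

# The degree-7 polynomial drives training error toward zero by memorizing
# the noise, but it typically does worse than the straight line on new data.
```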
Lack of Data Points
Time Series[3]: Financial data comes in time series form, and it is impossible to go back in time and create different stock prices for each day. This makes it hard to create more data to keep the model from overfitting. Another issue is the difference in how long each company has been listed on the market. Microsoft, for example, went public in 1986, while Tesla went public in 2010.
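As a rough back-of-the-envelope illustration (the IPO years match those above, but the exact IPO days, the “as of” date, and the 252-trading-days-per-year figure are assumptions for the sketch):

```python
from datetime import date

TRADING_DAYS_PER_YEAR = 252  # standard approximation

# IPO years per the article; the exact days are assumptions.
ipo_dates = {"MSFT": date(1986, 3, 13), "TSLA": date(2010, 6, 29)}

as_of = date(2022, 1, 1)  # hypothetical cutoff for the estimate
for ticker, ipo in ipo_dates.items():
    years = (as_of - ipo).days / 365.25
    obs = years * TRADING_DAYS_PER_YEAR
    print(f"{ticker}: ~{years:.0f} years listed, ~{obs:,.0f} daily prices")
```

Even several decades of daily prices yields only a few thousand observations per company, a small sample by deep learning standards, and younger companies offer far less.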
Meaning of Units
Changes to Metrics Used in Valuation: While many valuation metrics[4] have stood the test of time, the metrics we use today may not be the same ones used 10 or 20 years ago. When training an AI to learn patterns, the model needs to account for this so that it makes decisions with the same information a human had at the time of the decision.
Noisy Data
Human Error: Numbers on a financial statement or GDP data are sometimes revised after the quarter ends. While this is easy for a human to spot, a computer doesn’t know the difference. If not dealt with, revisions can cause serious problems for the model, such as look-ahead bias, once it is given live data.
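Here is a tiny, hypothetical example of that trap; the values, dates, and column names are invented for illustration:

```python
import pandas as pd

# One quarterly GDP growth figure, first reported and later revised.
releases = pd.DataFrame({
    "period":     ["2020Q1", "2020Q1"],
    "published":  pd.to_datetime(["2020-04-29", "2020-06-25"]),
    "gdp_growth": [-4.8, -5.0],  # first estimate, then the revision
})

decision_date = pd.Timestamp("2020-05-15")

# Naive approach: grab the latest value in the dataset. The revision
# was not yet public on the decision date, so this leaks the future.
naive = releases["gdp_growth"].iloc[-1]

# Point-in-time approach: keep only what was published by that date.
known = releases[releases["published"] <= decision_date]
point_in_time = known["gdp_growth"].iloc[-1]

print(f"naive: {naive}, point-in-time: {point_in_time}")  # -5.0 vs -4.8
```

A model trained on the naive value has quietly seen a number that did not exist on the decision date, so its backtest will look better than anything live trading can reproduce.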
Kirin API
To help solve these data issues, Qraft created Kirin API, our proprietary data preprocessing system. Kirin uses a variety of techniques to prepare the data used to train and test our deep learning models[5].
As mentioned before, one of the issues with time series data is that it limits the number of data points available for use. One technique Qraft uses to address this is data augmentation[6]. This process involves making slightly modified copies of existing data to increase the overall amount of data.
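The article does not describe Kirin’s specific augmentation methods, so the sketch below shows two generic ones for a return series: jittering (adding small noise to copies) and window slicing (cutting one long series into many overlapping training windows).

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for one daily return series; real inputs would come from a
# data source such as Kirin, which this sketch does not depend on.
returns = rng.normal(0, 0.01, size=500)

def jitter(series, sigma=0.002):
    """Return a slightly modified copy: the series plus small noise."""
    return series + rng.normal(0, sigma, size=series.shape)

def window_slices(series, window=250, stride=50):
    """Cut one long series into many overlapping training windows."""
    return [series[i:i + window]
            for i in range(0, len(series) - window + 1, stride)]

augmented = [jitter(w) for w in window_slices(returns)]
print(f"{len(augmented)} training windows from a single series")  # 6
```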
While many processes sound technical and have fancy names, solving the issue of changing metrics and data corrections is simpler: you just change the data. Kirin API makes sure that when data enters the model, it is the data that was available to investors on that date, not the later corrected version.
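Kirin’s internals are proprietary, but one common way to enforce “only what was known on that date” is an as-of join, sketched here in pandas with invented figures:

```python
import pandas as pd

# Fundamentals keyed by the date they became public, not the fiscal
# period they describe.
fundamentals = pd.DataFrame({
    "published": pd.to_datetime(["2021-02-01", "2021-05-01"]),
    "eps":       [1.10, 1.25],
})

trading_days = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-15", "2021-03-15", "2021-06-15"]),
})

# For each trading day, attach the most recent figure published on or
# before that day; days before any release correctly get no value.
panel = pd.merge_asof(trading_days, fundamentals,
                      left_on="date", right_on="published")
print(panel)
#         date  published   eps
# 0 2021-01-15        NaT   NaN
# 1 2021-03-15 2021-02-01  1.10
# 2 2021-06-15 2021-05-01  1.25
```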
To help cut out the noise in the data, Qraft uses feature extraction[7]. Often, different features in a data set (different types of data, such as stock price, sales, or GDP figures) are highly correlated with one another. These features are condensed into a single feature to reduce the noise in the data and make the computing process more efficient.
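The article does not name the extraction method, but principal component analysis (PCA) is one standard way to condense several correlated features into one; here is a minimal sketch with synthetic data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)

# Three synthetic features that all track one underlying signal plus a
# little noise, e.g. several closely related valuation ratios.
signal = rng.normal(0, 1, size=(1000, 1))
features = signal + rng.normal(0, 0.05, size=(1000, 3))

# Condense the correlated features into a single extracted feature.
pca = PCA(n_components=1)
condensed = pca.fit_transform(features)

print(f"shape: {features.shape} -> {condensed.shape}")
print(f"variance explained: {pca.explained_variance_ratio_[0]:.1%}")
```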
As AI is a new concept to many investors, it is easy to assume that data is not a problem. It is also important to note that the issues above are not a comprehensive list, nor are the techniques described the only ones Qraft uses. However, with more AI-powered investment products potentially coming to market, we hope this arms investors with the knowledge to ask the right questions.
1. Basis Points - Basis points (bps) are a common unit of measure for interest rates and other percentages in finance. One basis point is equal to 1/100th of 1% (0.01%, or 0.0001) and is used to denote the percentage change in a financial instrument.
2. Overfitting - Overfitting is a modeling error in statistics that occurs when a function is too closely aligned to a limited set of data points.
3. Time Series Data - Time series data is a collection of observations obtained through repeated measurements over time. Plot the points on a graph, and one of your axes would always be time.
4. Valuation Metrics - Valuation metrics are comprehensive measures of a company’s performance, financial health, and prospects for future earnings. Examples include EPS (Earnings per Share), P/E (Price to Earnings), etc.
5. Deep Learning Model - Deep learning is a subfield of machine learning concerned with algorithms, called artificial neural networks, that are inspired by the structure and function of the brain.
6. Data Augmentation - Data augmentation is a popular technique used to improve the generalizability of a model that might otherwise overfit. By generating additional training data and exposing the model to varied versions of data within the same class, the training process becomes more robust and the resulting model is more likely to generalize.
7. Feature Extraction - Feature extraction is a process of dimensionality reduction by which an initial set of raw data is reduced to more manageable groups for processing.