Constructing ETFs with AI
To create a high-quality quant fund with robust alpha factors, Qraft Technologies goes through three distinct steps: data pre-processing, factor research, and strategy extraction. Understanding these steps gives you a comprehensive overview of how Qraft AI ETFs are built and executed, and in turn a better sense of why you should invest in our ETFs.
We hope that by explaining our process with simplicity and conviction, we can bring you closer to integrating any one of our NYSE-listed AI ETFs (tickers: QRFT, AMOM, HDIV) into your portfolio.
At Qraft Technologies, our goal is to allow AI systems to automate the traditional process of producing active index ETFs with a high level of alpha at low cost.
Data Pre-Processing with Kirin API
The first step in building an excess-return-focused ETF with AI technology is securing the right data sources. Without proper, clean data, the results will eventually be inaccurate. Qraft’s in-house Kirin API addresses this: it is a comprehensive data platform that integrates multiple vendors to provide both macroeconomic data and company fundamentals as correct point-in-time data. To learn more about Kirin API, please read our article published on Medium here.
What Kirin API essentially does is eliminate inaccurate data that leads to look-ahead and/or survivorship biases. In layman’s terms, Kirin API ensures that a company’s fundamental data becomes visible to a model only from the date it was actually released, not from the official end or start of the quarter it describes. For example, suppose Q4 ended on 12/31/2019 but the official documents weren’t released until 01/15/2020. If you build a model as of any date before the release and it uses the Q4 figures, your data is flawed because you are using future data. The problem with future data is that it introduces biases that lead to overfitting of the model.
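To make the Q4 example concrete, here is a minimal sketch of a point-in-time lookup. It is not Kirin API's actual interface: the record layout, the `latest_known` helper, the ticker "ABC", and the EPS values are all hypothetical and exist only to illustrate the idea of filtering on the release date rather than the period-end date.

```python
from datetime import date

# Hypothetical records: each value carries the fiscal period it describes
# AND the date it actually became public.
records = [
    {"ticker": "ABC", "period_end": date(2019, 9, 30),
     "released": date(2019, 10, 20), "eps": 1.10},
    {"ticker": "ABC", "period_end": date(2019, 12, 31),
     "released": date(2020, 1, 15), "eps": 1.25},
]


def latest_known(records, ticker, as_of):
    """Return the newest record for `ticker` that was already public on `as_of`."""
    visible = [
        r for r in records
        if r["ticker"] == ticker and r["released"] <= as_of
    ]
    if not visible:
        return None  # nothing had been published yet
    return max(visible, key=lambda r: r["period_end"])


# On 01/10/2020 the Q4 report is not out yet, so only Q3 is visible.
before = latest_known(records, "ABC", date(2020, 1, 10))
# On the release date itself, Q4 becomes visible.
after = latest_known(records, "ABC", date(2020, 1, 15))
print(before["period_end"], after["period_end"])
```

Filtering on `period_end` instead of `released` would hand the model the Q4 numbers two weeks early, which is exactly the look-ahead bias described above.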
Even large global data vendors like S&P Global and Refinitiv can overlook the importance of point-in-time data. With Qraft’s data processing system, however, not only are the correct records available, but investors can also avoid making biased decisions that might eventually hurt their portfolios.
Researching Robust Factors
The core function of Qraft’s AI system lies in automatically finding high alpha factors and coming up with new investment strategies to implement. In short, with the right datasets provided by Kirin API, the AI technology navigates the vast investment universe to narrow down the probable candidates for a strategy. To better understand this, let’s consider the game of Go.
If you are unfamiliar with Go, it’s an abstract strategy board game that originated in China, in which each player aims to claim more territory than the opponent. Two players, one with black stones and one with white, take turns placing a stone on an empty intersection of the board. Once a stone is surrounded by the opponent’s stones, it is removed. At the end of the game, the players count one point for each stone they’ve captured and one point for each vacant intersection within their territory. The player with the higher total of captured stones and territory wins.
Go looks simple, but it is a complicated board game that requires in-depth strategic knowledge, creativity, and intuition. The number of legal board configurations is astronomically large (on the order of 10^170), and for decades AI wasn’t able to overcome this hurdle due to the qualitative and mysterious nature of the game. That is, until March 2016, when AlphaGo famously defeated Lee Sedol, a world champion with more than 18 titles to his name, by 4:1 in a historic match held in Seoul, South Korea.
AlphaGo, developed by Google’s DeepMind Technologies, combines an advanced tree search with deep neural networks and improves its play by competing against different versions of itself thousands of times. Eventually, AlphaGo became an expert at learning what works and what doesn’t. AlphaGo is considered the world’s greatest Go player, having defeated champions several times over.
So why bring this up? And how does this connect to Qraft’s AI technology?
Similar to a Go game, the search space for the investment universe is massive. In fact, it may even be bigger. In such a vast search universe, a well-engineered deep learning model can exhibit consistent results in narrowing the probable candidates and automatically back/forward testing the candidates to extract an investment strategy. In other words, just as AlphaGo can strategize and predict which outcomes work best, Qraft’s AI system can formulate effective strategies by finding which factors potentially provide the highest returns.
The initial part of this process is finding the factors and the latter part is automatically extracting strategies. Factor Factory is Qraft’s core research technology that intuitively finds factors that may bring excess returns. With the correct data processed by Kirin API, Factor Factory can explore hundreds of market anomalies in a single day by applying AutoML technology.
Extracting New Investment Strategies
After finding robust factors, Strategy Factory constructs an investment strategy that applies to our ETFs. There are currently four main components that make up the extraction framework: Data Loading, Processing, Splitting, and Backtest Simulation.
Just as there are different types of strategies that Go players can implement for optimal play, there are countless ways that Strategy Factory can make the best use of the factors found by Factor Factory. Let’s briefly discuss the four main components:
*This section uses technical terms; definitions appear at the end of the article.
1. Data Loading: Fetches data from Kirin API and adds another layer of the operations needed to process the data properly. There are two main types of data that we focus on: data with different values for individual stocks (RCA factors) and data with only one value at a specific time (GDP, S&P 500 Index).
2. Processing: Applies a cross-sectional Z-score to the data, then buckets each value by its percentile and converts the bucket into a One Hot vector. The One Hot conversion works as follows:
Convert data to percentile
Convert percentile to classification according to given intervals ((0~0.3) : 0, (0.3 ~ 0.7) : 1, (0.7 ~ 1) : 2)
Convert class number to one hot vector (0: [1,0,0], 1 : [0,1,0], 2: [0,0,1])
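The Processing step above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not Qraft's implementation: the tickers and factor values are made up, the percentile is assumed to be the fraction of stocks with a strictly lower score, and interval boundaries are assumed to belong to the upper bucket (the article does not specify either detail).

```python
from statistics import mean, pstdev


def cross_sectional_zscores(values):
    """Z-score each stock against all stocks on the same date."""
    mu = mean(values.values())
    sigma = pstdev(values.values())
    if sigma == 0:
        return {k: 0.0 for k in values}
    return {k: (v - mu) / sigma for k, v in values.items()}


def to_percentiles(scores):
    """Assumed definition: fraction of stocks with a strictly lower score."""
    n = len(scores)
    return {
        k: sum(1 for other in scores.values() if other < v) / n
        for k, v in scores.items()
    }


def to_class(p, lower=0.3, upper=0.7):
    """Map a percentile to 0, 1, or 2 using the article's intervals."""
    if p < lower:
        return 0
    if p < upper:
        return 1
    return 2


def one_hot(c, n_classes=3):
    return [1 if i == c else 0 for i in range(n_classes)]


# Hypothetical factor values for five stocks on a single date.
factor = {"A": 0.5, "B": 2.0, "C": -1.0, "D": 1.2, "E": 0.1}

z = cross_sectional_zscores(factor)
pct = to_percentiles(z)
encoded = {k: one_hot(to_class(p)) for k, p in pct.items()}
print(encoded)
```

Because the Z-score is a monotonic transform, it does not change each stock's rank on a given date; in this sketch it mainly serves to put different factors on a comparable scale before bucketing.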
3. Splitting: Data is truncated according to validation time points to avoid look-ahead bias. For example, when validating model performance for May 2015, you need to train and infer using only data up to April 2015.
The remaining data is then passed to the model training and inference stage, where multiple models with different goals can be trained. The trained models are stored for each time point, and each prediction is inferred from the models trained with the data available at that time point.
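The splitting rule can be illustrated with a small walk-forward sketch. The `walk_forward_splits` helper and the month labels are hypothetical; the only thing taken from the text is the rule that a model validated on a given month may see data only up to the previous month.

```python
def walk_forward_splits(months, first_validation):
    """Yield (train_months, validation_month) pairs in chronological order.

    `months` are "YYYY-MM" strings, which sort chronologically as text.
    """
    ordered = sorted(months)
    for i, month in enumerate(ordered):
        if month < first_validation:
            continue
        train = ordered[:i]  # strictly earlier months only
        if train:
            yield train, month


months = ["2015-01", "2015-02", "2015-03", "2015-04", "2015-05", "2015-06"]
splits = list(walk_forward_splits(months, first_validation="2015-05"))

for train, validation in splits:
    print(f"validate {validation}: train on {train[0]} .. {train[-1]}")
```

Each validation month gets its own training window, which mirrors the idea of storing a separate set of trained models per time point.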
4. Backtest Simulation: Performs a simulation based on the predictions at each time point to check the portfolio components, weights of each stock, dividend yield, turnover rate, capital gains, etc. The number of portfolio holdings and how they will be weighted within the ETF are also set at this point.
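As a rough illustration of what such a simulation tracks, the sketch below turns per-month predictions into holdings and measures turnover between rebalances. Everything here is an assumption made for the example: the prediction scores, the equal-weight top-N rule, and the one-way turnover convention (half the sum of absolute weight changes) are common choices, not a description of Strategy Factory.

```python
def equal_weight_top_n(predictions, n):
    """Hold the n highest-scored tickers at equal weight (assumed rule)."""
    top = sorted(predictions, key=predictions.get, reverse=True)[:n]
    return {ticker: 1.0 / len(top) for ticker in top}


def one_way_turnover(old, new):
    """Half the sum of absolute weight changes between two portfolios."""
    tickers = set(old) | set(new)
    return 0.5 * sum(abs(new.get(t, 0.0) - old.get(t, 0.0)) for t in tickers)


# Hypothetical model scores at two rebalance dates.
predictions_by_month = {
    "2015-05": {"A": 0.9, "B": 0.4, "C": 0.7, "D": 0.1},
    "2015-06": {"A": 0.8, "B": 0.6, "C": 0.2, "D": 0.3},
}

previous = None
report = {}
for month in sorted(predictions_by_month):
    weights = equal_weight_top_n(predictions_by_month[month], n=2)
    # Turnover is undefined for the very first portfolio.
    turnover = None if previous is None else one_way_turnover(previous, weights)
    report[month] = {"weights": weights, "turnover": turnover}
    previous = weights

print(report)
```

A fuller simulation would also accumulate returns, dividends, and transaction costs at each step; this sketch stops at holdings and turnover to keep the mechanics visible.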
Wrapping Up
Simply put, our ETFs are created by processing clean data, finding high alpha factors through AI, and forming an investment strategy that brings the potential for excess returns. Since inception, Qraft AI ETFs have brought incredible results and outperformed several benchmark indices. You can view our performance here.
Qraft AI ETFs trade on the New York Stock Exchange and are available to purchase through various brokerages, including Fidelity, Charles Schwab, TD Ameritrade, E-Trade, and Robinhood. Please note that you cannot invest directly through Qraft Technologies, Inc.
If you are looking to actively invest in high-growth ETFs, Qraft Technologies, Inc. is devoted to helping you achieve alpha potential by leveraging the latest AI technology.
Artificial intelligence selection models are reliant upon data and information supplied by third parties that are utilized by such models. To the extent the models do not perform as designed or as intended, the strategy may not be successfully implemented. If the model or data are incorrect or incomplete, any decisions made in reliance thereon may lead to the inclusion or exclusion of securities that would have been excluded or included had the model or data been correct and complete. Service providers may experience disruptions that arise from human error, processing and communications error, counterparty or third-party errors, technology or systems failures, any of which may have an adverse impact.
— — — — — — — — — — — — — — —
Alpha is a measure of the active return on an investment, the performance of that investment compared with a suitable market index.
AutoML technology is short for Automated Machine Learning. It’s essentially the automation of the machine learning process to make machine learning jobs simpler, easier, and faster.
RCA Factors, otherwise known as Root Cause Analysis, represent a systematic process to understand and perform a comprehensive, system-wide review of significant problems.
Z Score gives you an idea of how far from the mean a data point is. In statistics, Z scores are measured in standard deviation units.
One Hot is a process by which categorical variables are converted into a form that could be provided to Machine Learning algorithms to do a better job in prediction.