Strategies to focus on one trading pair
Many researchers have attempted to optimize pairs trading as the number of opportunities for arbitrage profit has gradually decreased. Pairs trading is a market-neutral strategy: it profits if a given condition is satisfied within a given trading window, and if not, there is a risk of loss. In this study, we propose an optimized pairs-trading strategy using deep reinforcement learning, particularly the deep Q-network, with various trading and stop-loss boundaries. More specifically, if spreads hit trading thresholds and revert to the mean, the agent receives a positive reward. However, if spreads hit stop-loss thresholds or fail to revert to the mean after hitting the trading thresholds, the agent receives a negative reward. The agent is trained to select the best level of discretized trading and stop-loss boundaries for a given spread in order to maximize the expected sum of discounted future profits. Pairs are selected from stocks in the S&P 500 Index using a cointegration test. We compared our proposed method with traditional pairs-trading strategies that apply constant trading and stop-loss boundaries. We find that our proposed model is trained well and outperforms traditional pairs-trading strategies.
1. Introduction
Pairs trading is a method for obtaining arbitrage profit when there is a statistical discrepancy between two stocks with similar characteristics that are cointegrated or highly correlated. This is possible because of the statistical property that the spread between two such stocks reverts to its mean in the long run [1]. In the early days, pairs-trading methods were popular because of the opportunity to obtain arbitrage profit [1–4]. However, as many investors, including hedge funds, sought these arbitrage opportunities by executing the pairs-trading strategy, its profitability began to deteriorate [5, 6]. To overcome these shortcomings, significant research has been conducted to improve the pairs-trading strategy [7–10].
The mechanism of pairs trading is as follows. First, a pair of stocks with similar trends is identified. Second, regression analysis such as ordinary least squares (OLS), total least squares (TLS), or an error correction model (ECM) is used to calculate the spread of these stocks. Finally, if the spread hits preset boundaries, investors open a portfolio that takes a long position in the undervalued stock and a short position in the overvalued stock. Subsequently, if the spread reverts to the mean, investors close the portfolio by taking positions opposite to the opening ones. In this case, the investor obtains an arbitrage profit by executing this strategy. However, there is a risk that the spread does not revert to the mean. In such a situation, investors are at high risk because they cannot close the portfolio at a profit. By setting a stop-loss boundary, investors can hedge this risk [11–13].
Many researchers have applied various statistical methods to improve the efficiency and performance of pairs trading. In particular, they concentrated on using the spread as a trading signal. The study in [1] selected pairs of stocks by minimizing the sum of squared deviations between the two stocks and then executed the trading strategy when the difference between the pair exceeded two standard deviations of the spread. They used normalized US stock price data from 1962 to 2002 to test the profitability of pairs trading. The study in [14] used the cointegration approach to protect the pairs-trading strategy from severe losses. They applied an OLS method to construct a spread and set various conditions that translated into trading actions. From these models, they achieved a trading strategy with a minimum level of profit protected from the risk of loss. The results showed about an 11% annualized excess return over the entire period. The research in [15] compared the distance and cointegration approaches on both high-frequency and daily datasets to check whether pairs trading is profitable for Norwegian seafood companies. The performance of the two approaches was similar. Reference [16] used a Kalman filter to calculate the spread, which was then used as a high-frequency trading signal, on the shares constituting the KOSPI 100 Index. He found that the pairs-trading strategy's performance was significant on the KOSPI and was better at market opening and closing. Furthermore, [7] optimized a pairs-trading system as a stochastic control problem. They used the Ornstein-Uhlenbeck process to model the spread as a trading signal and tested their model with simulated data; the results showed that their strategy performs well. In addition, [17] proposed an Ornstein-Uhlenbeck process that incorporates market microstructure noise to generate the trading signal in a pairs-trading strategy. This method performs better than traditional estimators such as ARIMA(1,1) and maximum likelihood. Reference [18] applied a cointegration method to Chinese commodity futures from 2006 to 2022 to check whether pairs trading was suitable in that market. They used OLS regression to create spreads from the pairs. Furthermore, [10] applied a cointegration test to various pairs of stocks and used a vector error-correction model to create a trading signal.
Setting a boundary is essential to optimizing the pairs-trading strategy. This boundary is the criterion for deciding whether to execute a pairs trade. If a low boundary is set, many trades will be executed, but profits per trade will be lower; if a high boundary is set, investors will earn high returns when the strategy is executed. However, all of this assumes that mean reversion occurs. If the spread does not revert to the mean within the specified trading window, losses will be incurred. With a low boundary, such losses will be small; with a high boundary, they will be larger. Therefore, the performance of pairs trading depends on how the boundary is set. Reference [14] suggested a minimum-profit condition, which can be effective in reducing losses in a pairs-trading system. They set trading rules with various open conditions, for example, opening when the spread exceeds 0.3, 0.5, 0.75, 1.0, or 1.5 standard deviations. They used daily closing prices from January 2, 2001, to August 30, 2002, of two stocks, the Australia and New Zealand Banking Group and Adelaide Bank. The results showed that, as the open-condition value decreases, the number of trades and the profits increase. Also, [19] recommended optimal preset boundaries calculated from estimated parameters for the average trade duration, intertrade interval, and number of trades, and used them to maximize the minimum total profit. They used daily closing price data from January 2, 2004, to June 30, 2005, of seven pairs of stocks on the Australian Securities Exchange. The results showed that their proposed method was efficient in making profits with the pairs-trading strategy. Reference [18] examined whether the pairs-trading strategy could be applied to the daily returns of Chinese commodity futures from 2006 to 2022 using three methods: classical, closed-loop, and dynamic stop-loss. The closed-loop method uses only a stop-profit barrier to execute the strategy and does not consider the risk that spreads do not revert to the mean. The classical method adds stop-loss boundaries to the closed-loop method. The dynamic stop-loss method uses a variety of stop-profit and stop-loss barriers, applied when the spread exceeds a standard-deviation threshold set using criteria based on the historical average of spreads. The results showed that these methods obtained annualized returns of over 15%, with the closed-loop method yielding the highest profit of 26.94%. In addition, [20] experimented with fixed optimal threshold selection, conditional volatility, percentile, spectral analysis, and neural network thresholds in a pairs-trading strategy. Of these, the neural network threshold outperformed all other strategies.
Following the success of reinforcement learning, demonstrated by its performance on Atari games [21], many researchers have attempted to apply this algorithm to financial trading systems. Reference [22] proposed a deep Q-trading system using reinforcement learning. They applied Q-learning to a trading system to trade automatically. They set a delta price using information from the past 120 days, used three discrete actions (buy, hold, and sell), and used long-term profit as the reward. They used daily data from January 01, 2001, to December 31, 2022, of the Hang Seng Index and the S&P 500 Index. The experimental results showed that their proposed method outperformed buy-and-hold strategies and recurrent reinforcement learning methods. Reference [23] proposed three steps for applying reinforcement learning to financial trading systems. First, they reduced the replay memory size to suit financial trading. Second, they proposed an action-augmentation technique that provides more feedback from actions to the agent. Third, they used long sequences as training data for recurrent neural network training. The experimental data comprised tick-by-tick data for 12 forex currency pairs from January 2012 to December 2022. The results showed that the action-augmentation technique yielded more profit than an epsilon-greedy policy. Reference [10] formulated the optimization of the pairs-trading strategy as an N-armed bandit problem. They obtained the spread using an error-correction model and tuned the parameters using a grid-search algorithm. They compared their proposed model with a constant-parameter model, which was similar to a conventional pairs-trading strategy. They used intraday one-minute data of several stocks in the FactSet database from June 2022 to January 2022. The performance of their proposed model was better than that of the constant-parameter model.
We investigate not only a dynamic boundary based on the spread in each trading window, which can achieve higher profit than the static boundary used in the traditional pairs-trading strategy, but also whether it is possible to train deep reinforcement learning methods to follow this mechanism. To this end, we propose a new method to optimize the pairs-trading strategy using deep reinforcement learning, especially deep Q-networks, since a pairs-trading strategy can be thought of as a game. After a portfolio position is opened, its profit is determined by whether the portfolio is closed or reaches a stop-loss position. Therefore, if we treat this strategy as a game by setting boundaries that are optimized for the spread in each trading window, we can achieve more profit than traditional pairs-trading strategies. In particular, we formulate the pairs-trading system as a kind of game and obtain the best boundaries, that is, trading thresholds and stop-loss thresholds, according to the calculated spread. The reasoning behind this construction is that if a portfolio is opened and then closed within the trading window on the calculated spread, it is profitable. If the portfolio reaches the stop-loss boundary or the spread does not revert to the mean, losses may occur. We therefore train the DQN by rewarding it positively when it reaches a closed position and negatively when it reaches the stop-loss or exit thresholds. We conducted the following experiments to verify that our proposed method is better optimized than the conventional method. First, we used spreads computed with OLS and TLS to see how the results differ depending on the spread used as input. Second, because the spread and hedge ratio vary with the formation window and trading window, we tested a total of six window sizes and selected the one with the best performance. Finally, we compared the proposed method with the traditional pairs-trading strategy on the test data using the optimal window size. In this experiment, we use the daily adjusted closing prices from January 2, 1990, to July 31, 2022, of 50 stocks in the S&P 500 Index. Experimental results show that our proposed method outperforms the traditional pairs-trading strategy across all pairs. Additionally, we confirm that the performance measures vary according to the spread.
The main contributions of this study are as follows. First, we propose a novel method to optimize the pairs-trading strategy using deep reinforcement learning, especially deep Q-networks with trading and stop-loss boundaries. The experimental results show that our method can be applied not only to the pairs-trading system but also to various other fields, including finance and economics, whenever a rule-based strategy needs to be made more efficient. Second, we propose an optimized dynamic boundary based on the spread in each trading window. Our proposed method outperforms the traditional pairs-trading strategy, which sets a fixed boundary. Last, we find that our method outperforms the traditional pairs-trading strategy for all pairs formed from constituent stocks of the S&P 500. Since our method selects optimal boundaries based on spreads, it can be applied to other stock markets such as the KOSPI, Nikkei, and Hang Seng. It should be noted that the present work is part of the thesis [24].
The rest of this paper is organized as follows. Section 2 explains the technical background. Section 3 describes the materials and methods. Section 4 presents the results and a discussion of the experiments. Section 5 provides our conclusions.
2. Technical Background
2.1. The Traditional Pairs-Trading Scheme
Pairs trading is a representative market-neutral trading strategy that simultaneously takes a long position in an undervalued stock and a short position in an overvalued stock. It is a form of statistical arbitrage that assumes the prices of the two assets will continue to move as they have in the past [1]. It follows the assumption that asset prices will revert to their long-term equilibrium. The strategy stems from the idea that arbitrage opportunities exist when the price gap between two assets widens to or beyond a certain level. It is also based on the belief that historical price movements will not change significantly in the future.
In Figure 1, the graph drawn in blue is the spread of two cointegrated stocks, the red lines are the trading boundaries, and the green lines are the stop-loss boundaries. When the spread reaches a trading boundary, the portfolio is opened, and it is closed only when the spread returns to the mean. However, losses are incurred when, after the portfolio is opened, prices hit the stop-loss boundaries without returning to the mean. Furthermore, if the trading signal does not revert to the mean before the end of the trading window, the open portfolio is closed by force; this is called the exit position of the portfolio.
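The open, close, stop-loss, and exit logic described above can be sketched as a single pass over one trading window. This is a minimal sketch, not the authors' implementation: it assumes the spread has already been converted to a Z-score (so its mean is 0), that the trading and stop-loss boundaries are symmetric around the mean, and that an open position is closed as soon as the Z-score crosses zero. The function name `run_trading_window` and its return format are hypothetical.

```python
def run_trading_window(z, trade_bound, stop_bound):
    """Apply the traditional pairs-trading rule to one trading window.

    z           -- Z-scored spread over the trading window (mean assumed 0)
    trade_bound -- |z| level at which a portfolio is opened
    stop_bound  -- |z| level at which an open portfolio is stopped out
    Returns (outcome, open_index, close_index); outcome is one of
    "no_trade", "closed", "stop_loss", or "exit".
    """
    open_idx, side = None, 0
    for t, value in enumerate(z):
        if open_idx is None:
            if abs(value) >= trade_bound:
                # side = +1: spread above the mean (short the rich leg, long the cheap leg)
                open_idx, side = t, (1 if value > 0 else -1)
            continue
        if side * value >= stop_bound:   # spread kept diverging: stop-loss
            return "stop_loss", open_idx, t
        if side * value <= 0.0:          # spread crossed back to the mean: close
            return "closed", open_idx, t
    if open_idx is not None:             # window ended with the position open: forced exit
        return "exit", open_idx, len(z) - 1
    return "no_trade", None, None
```

For example, a spread path of `[0.2, 1.1, 0.8, -0.1]` with a trading boundary of 1.0 opens at index 1 and closes at index 3, when the Z-score turns negative.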
2.1.1. The Cointegration Test
There are many approaches to pair selection, such as the distance approach [11, 25–27], the cointegration approach [10, 16, 27], and the stochastic approach [7, 8]. In this study, we use the cointegration approach to choose pairs that have a long-term equilibrium. In general, a linear combination of nonstationary variables is also nonstationary. Assume that $x_t$ and $y_t$ have unit roots; as just mentioned, a linear combination of these variables is generally nonstationary. However, it can be stationary if the nonstationary variables are cointegrated. A regression between such variables must therefore be checked to determine whether it is spurious or reflects cointegration. Johansen's method is widely used to test for cointegration [28]. In this method, the number of cointegration relations and the parameters of the model are estimated and tested using maximum likelihood estimation (MLE). Since all variables are regarded as endogenous, there is no need to choose a dependent variable, and multiple cointegration relationships can be identified. We use MLE to estimate the cointegration relation within a vector autoregression model and determine the cointegration rank using the likelihood-ratio test. The method therefore has the advantage of allowing various hypothesis tests on the estimated cointegration parameters and on other model specifications when cointegration exists, rather than merely testing for cointegration.
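To make the likelihood-ratio test concrete, the following NumPy sketch computes Johansen trace statistics for a set of price series. It assumes a vector error-correction model with an unrestricted constant and one lagged difference (`k_ar_diff=1`); these settings and the function name are illustrative assumptions, not choices reported in the text. The statistics would then be compared with tabulated critical values, for example those in the `cvt` attribute returned by statsmodels' `coint_johansen`.

```python
import numpy as np

def johansen_trace(levels, k_ar_diff=1):
    """Johansen trace statistics for a (T x n) array of price levels.

    Assumes an unrestricted constant and `k_ar_diff` lagged differences.
    Returns one statistic per null hypothesis rank <= 0, ..., rank <= n - 1.
    """
    y = np.asarray(levels, dtype=float)
    dy = np.diff(y, axis=0)
    z0 = dy[k_ar_diff:]          # Delta y_t
    z1 = y[k_ar_diff:-1]         # y_{t-1}, aligned with z0
    m = z0.shape[0]
    # Short-run regressors: lagged differences Delta y_{t-i} plus a constant.
    lags = [dy[k_ar_diff - i:-i] for i in range(1, k_ar_diff + 1)]
    regressors = np.column_stack(lags + [np.ones((m, 1))])

    def residuals(target):
        coef = np.linalg.lstsq(regressors, target, rcond=None)[0]
        return target - regressors @ coef

    r0, r1 = residuals(z0), residuals(z1)
    s00, s11, s01 = r0.T @ r0 / m, r1.T @ r1 / m, r0.T @ r1 / m
    # Eigenvalues solve |lambda S11 - S10 S00^{-1} S01| = 0 (squared canonical correlations).
    mat = np.linalg.solve(s11, s01.T @ np.linalg.solve(s00, s01))
    eig = np.sort(np.linalg.eigvals(mat).real)[::-1]
    # Trace statistic for H0: rank <= r is -m * sum_{i > r} log(1 - lambda_i).
    return np.array([-m * np.log(1.0 - eig[r:]).sum() for r in range(len(eig))])
```

For two stocks, the first statistic tests the null of no cointegration (rank 0) and the second tests rank at most 1; a pair is kept when the first null is rejected.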
2.2. Spread Calculation
2.2.1. Ordinary Least Squares
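In regression analysis, OLS is widely used to estimate parameters by minimizing the sum of squared errors [29]. Assume that $x_t$, $y_t$, and $\varepsilon_t$ are an independent variable, a dependent variable, and an error term in the model $y_t = \alpha + \beta x_t + \varepsilon_t$. Setting the partial derivatives of $\sum_t \varepsilon_t^2$ with respect to $\alpha$ and $\beta$ to zero gives

$$\hat{\beta} = \frac{\sum_{t}(x_t-\bar{x})(y_t-\bar{y})}{\sum_{t}(x_t-\bar{x})^{2}}, \qquad \hat{\alpha} = \bar{y}-\hat{\beta}\bar{x}. \tag{5}$$

The value $\hat{\beta}$ obtained from equation (5) is used for the number of stock orders. The residual $\hat{\varepsilon}_t$ is also used as a trading signal through its Z-score, computed over a state spanning the formation window.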
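Equation (5) and the Z-score conversion can be sketched as follows. The helper names are hypothetical, and the use of the population standard deviation for the Z-score is an assumption, since the text does not specify it.

```python
import numpy as np

def ols_hedge_ratio(x, y):
    """OLS fit of y_t = alpha + beta * x_t + eps_t; returns (alpha, beta, residuals)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    x_c, y_c = x - x.mean(), y - y.mean()
    beta = np.sum(x_c * y_c) / np.sum(x_c ** 2)   # slope from equation (5)
    alpha = y.mean() - beta * x.mean()            # intercept from equation (5)
    return alpha, beta, y - alpha - beta * x

def z_score(values):
    """Standardize a series by its own mean and population standard deviation."""
    values = np.asarray(values, float)
    return (values - values.mean()) / values.std()
```

Within one formation window, `beta` would set the number of shares traded and `z_score(residuals)` would be the signal compared against the boundaries.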
2.2.2. Total Least Squares
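TLS estimates parameters by minimizing the sum of squared orthogonal (perpendicular) distances between the observations and the regression line, so that errors in both the horizontal and vertical directions are taken into account [30]. Since the orthogonal distance does not change when the X and Y coordinates are swapped, the value of $\beta$ is estimated consistently regardless of which stock is used as the reference. In the TLS method, the observed values of $x_t$ and $y_t$ have the following error terms:

$$x_t = x_t^{*} + \eta_t, \qquad y_t = y_t^{*} + \varepsilon_t,$$

where $x_t^{*}$ and $y_t^{*}$ are the true values and $\eta_t$ and $\varepsilon_t$ are error terms following independent and identical distributions. It is assumed that the true values are linearly related, $y_t^{*} = \alpha + \beta x_t^{*}$. For convenience, we represent the error variance ratio in equation (10):

$$\delta = \frac{\sigma_{\varepsilon}^{2}}{\sigma_{\eta}^{2}}. \tag{10}$$

The orthogonal regression estimator is calculated by minimizing the weighted sum of squared residuals in equation (11), which for $\delta = 1$ is exactly the sum of squared perpendicular distances to the regression line:

$$\min_{\alpha,\beta}\ \sum_{t}\frac{(y_t-\alpha-\beta x_t)^{2}}{\delta+\beta^{2}}. \tag{11}$$

With $s_{xx}$, $s_{yy}$, and $s_{xy}$ denoting the sample variances and covariance, the solution is

$$\hat{\beta} = \frac{s_{yy}-\delta s_{xx}+\sqrt{(s_{yy}-\delta s_{xx})^{2}+4\delta s_{xy}^{2}}}{2 s_{xy}}, \qquad \hat{\alpha} = \bar{y}-\hat{\beta}\bar{x}. \tag{12}$$

The value obtained from equation (12) is used in the same way as that obtained from equation (5), and the residual is likewise used as a trading signal through its Z-score over the formation window.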
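The symmetry property can be checked numerically with a sketch of equation (12) at $\delta = 1$: the orthogonal-regression slope of y on x is exactly the reciprocal of the slope of x on y, whereas the product of the two OLS slopes equals the squared correlation and is therefore below 1. The function names are hypothetical; the sample moments use population normalization, which does not change the ratio in equation (12).

```python
import numpy as np

def deming_hedge_ratio(x, y, delta=1.0):
    """Errors-in-variables fit from equation (12); delta = 1 gives orthogonal (TLS) regression.
    Returns (alpha, beta)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    s_xx, s_yy = np.var(x), np.var(y)
    s_xy = np.mean((x - x.mean()) * (y - y.mean()))
    diff = s_yy - delta * s_xx
    beta = (diff + np.sqrt(diff ** 2 + 4.0 * delta * s_xy ** 2)) / (2.0 * s_xy)
    return y.mean() - beta * x.mean(), beta

def ols_slope(x, y):
    """OLS slope of y on x (error assumed only in y)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.cov(x, y, bias=True)[0, 1] / np.var(x)
```

Swapping the roles of the two stocks therefore leaves the TLS hedge relationship unchanged, while the OLS hedge ratio depends on which stock is treated as the reference.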
2.3. Reinforcement Learning and the Deep Q-Network
The goal of reinforcement learning is to find an optimal policy that maximizes the expected sum of discounted future rewards [31]. These rewards come from selecting, in each state, the action with the optimal value, called the optimal Q-value. Reinforcement learning basically solves problems defined by a Markov decision process (MDP). An MDP consists of a tuple $(S, A, P, R, \gamma)$, where $S$ is a finite set of states, $A$ is a finite set of actions, $P$ is a state transition probability matrix, $R$ is a reward function, and $\gamma$ is a discount factor. In an environment $E$, the agent observes state $s_t$ at time $t$ and selects action $a_t$. The environment then provides feedback to the agent in the form of a reward $r_t$ and the next state $s_{t+1}$. An action is chosen by the action-value function $Q(s,a)$, which represents the expected sum of discounted future rewards. From this action-value function, we seek the optimal action-value function $Q^{*}(s,a)$, obtained by following an optimal policy that maximizes the expected sum of discounted future rewards. This optimal action-value function satisfies the Bellman equation:

$$Q^{*}(s,a) = \mathbb{E}_{s'}\left[r + \gamma \max_{a'} Q^{*}(s',a') \,\middle|\, s,a\right].$$

The DQN uses a nonlinear function approximator $Q(s,a;\theta)$, a neural network with weights $\theta$, to estimate the action-value function. This network is trained by minimizing a sequence of loss functions $L_i(\theta_i)$ that changes at each iteration $i$:

$$L_i(\theta_i) = \mathbb{E}_{s,a}\left[\left(y_i - Q(s,a;\theta_i)\right)^{2}\right], \qquad y_i = \mathbb{E}_{s'}\left[r + \gamma \max_{a'} Q(s',a';\theta_{i-1}) \,\middle|\, s,a\right].$$

The weights $\theta$ are updated as the sequence progresses using the gradient

$$\nabla_{\theta_i} L_i(\theta_i) = \mathbb{E}_{s,a,s'}\left[\left(r + \gamma \max_{a'} Q(s',a';\theta_{i-1}) - Q(s,a;\theta_i)\right)\nabla_{\theta_i} Q(s,a;\theta_i)\right].$$
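The target $y_i$ and the loss $L_i$ can be sketched for a sampled minibatch as below. The sketch assumes the next-state Q-values have already been computed with the previous weights $\theta_{i-1}$, and it adds a terminal flag that removes the bootstrap term; the terminal flag, the default $\gamma = 0.99$, and the function names are assumptions, not settings stated in the text. The gradient step itself would be handled by whatever framework trains the network.

```python
import numpy as np

def dqn_targets(rewards, next_q, done, gamma=0.99):
    """y_j = r_j for terminal transitions, else r_j + gamma * max_a' Q(s_{j+1}, a')."""
    rewards = np.asarray(rewards, float)
    next_q = np.asarray(next_q, float)   # shape (batch, n_actions), from theta_{i-1}
    done = np.asarray(done, bool)
    return rewards + gamma * np.where(done, 0.0, next_q.max(axis=1))

def dqn_loss(q_values, actions, targets):
    """Mean squared TD error between the targets and Q(s_j, a_j; theta_i)."""
    q_values = np.asarray(q_values, float)
    q_taken = q_values[np.arange(len(actions)), actions]   # Q-value of the action actually taken
    return np.mean((np.asarray(targets, float) - q_taken) ** 2)
```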
3. Materials and Methods
3.1. Data
In this study, 50 stocks from the S&P 500 Index were selected based on their trading volume and market capitalization. To carry out the experiment, the data for all stocks must cover the same period. Thus, only stocks with data for the whole period were retained, leaving a total of 25 stocks. Table 1 lists the names, tickers, and sectors of these stocks. We collected adjusted daily closing prices from Thomson Reuters' database. The training dataset covers January 2, 1990, to December 31, 2008, comprising 4792 data points; the test dataset covers January 2, 2009, to July 31, 2022, comprising 2411 data points. From these datasets, pairs of stocks are selected over the training period using the cointegration test.
Table 1: Stocks used in this study.

| No. | Ticker | Name | Sector |
|---|---|---|---|
| 1 | AAPL | Apple Inc. | Technology |
| 2 | MSFT | Microsoft Corporation | Technology |
| 3 | BRKa | Berkshire Hathaway Inc. | Financial Services |
| 4 | JPM | JPMorgan Chase & Co. | Financial Services |
| 5 | JNJ | Johnson & Johnson | Healthcare |
| 6 | XOM | Exxon Mobil Corporation | Energy |
| 7 | BAC | Bank of America Corporation | Financial Services |
| 8 | WFC | Wells Fargo & Company | Financial Services |
| 9 | WMT | Walmart Inc. | Consumer Defensive |
| 10 | UNH | UnitedHealth Group Incorporated | Healthcare |
| 11 | CVX | Chevron Corporation | Energy |
| 12 | T | AT&T Inc. | Communication Services |
| 13 | PFE | Pfizer Inc. | Healthcare |
| 14 | ADBE | Adobe Systems Incorporated | Technology |
| 15 | MCD | McDonald's Corporation | Consumer Cyclical |
| 16 | MDT | Medtronic plc | Healthcare |
| 17 | MMM | 3M Company | Industrials |
| 18 | HON | Honeywell International Inc. | Industrials |
| 19 | GE | General Electric Company | Industrials |
| 20 | ABT | Abbott Laboratories | Healthcare |
| 21 | MO | Altria Group, Inc. | Consumer Defensive |
| 22 | UNP | Union Pacific Corporation | Industrials |
| 23 | TXN | Texas Instruments Incorporated | Technology |
| 24 | UTX | United Technologies Corporation | Industrials |
| 25 | LLY | Eli Lilly and Company | Healthcare |
3.2. Selecting Pairs Using the Cointegration Test
It is necessary to pair stocks that have long-term statistical relationships or similar price movements. The degree to which two stocks have had similar price movements can be measured through their correlation. Furthermore, a long-term equilibrium between a pair of stocks is an essential characteristic for the execution of pairs trading. In this study, we used the cointegration approach to select pairs of stocks. Using Johansen's method, we selected 11 pairs of stocks that have long-run equilibria. Table 2 shows the resulting pairs of stocks identified based on their t-statistics, and Figure 2 shows the price movements of the cointegrated stocks XOM and CVX. Using this dataset, we verify whether our proposed method performs better than the traditional pairs-trading method.
Note: Significance markers denote rejection of the null hypothesis at the 1% and 5% significance levels.
3.3. Trading Signal
After selecting the pairs, we must extract the trading signal. To extract signals, we use the OLS or TLS method. First, because stock prices follow a random walk [32], we use the augmented Dickey-Fuller test to confirm that each price series follows this unit-root process. Afterwards, the series is constructed from the log differences in stock prices and then passed to the OLS and TLS methods:

$$P^{A}_t = \alpha + \beta P^{B}_t + \varepsilon_t. \tag{18}$$

In equation (18), $\alpha$ is a constant, $\beta$ is the hedge ratio (which is used as the trading size), $\varepsilon_t$ is the error term, and $P^{A}_t$ and $P^{B}_t$ are the log differences in the prices of stocks $A$ and $B$ at time $t$. We convert the values of $\varepsilon_t$ into a Z-score, which is used as the trading signal. For instance, if the trading signal reaches the threshold, we short one share of the overvalued stock (represented as $A$) and buy $\beta$ shares of the undervalued stock (represented as $B$). The hedge ratio is estimated over the formation window, so it depends on the window size. We tested a total of six discrete window sizes to find the optimal window size for the experiment. Each trading window is half the length of the formation window. The spread obtained here is used as the state when applying reinforcement learning (i.e., as the input of the DQN).
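The relationship between formation windows, trading windows, and the DQN state can be sketched as follows. The sketch assumes that windows advance by one trading window at a time and that trading-window spreads are standardized with the formation-window mean and standard deviation; neither detail is stated in the text, and the function names are hypothetical.

```python
import numpy as np

def rolling_windows(n_obs, formation):
    """Yield (formation_slice, trading_slice) pairs.

    The trading window is half the formation window (as in the text); the
    step of one trading window between consecutive pairs is an assumption.
    """
    trading = formation // 2
    start = 0
    while start + formation + trading <= n_obs:
        yield (slice(start, start + formation),
               slice(start + formation, start + formation + trading))
        start += trading

def window_state_and_signal(spread, form_slice, trade_slice):
    """Z-score the formation-window spread (the DQN state) and the trading-window
    spread (the trading signal) using formation-window statistics."""
    spread = np.asarray(spread, float)
    mu, sd = spread[form_slice].mean(), spread[form_slice].std()
    return (spread[form_slice] - mu) / sd, (spread[trade_slice] - mu) / sd
```

With a 30-day formation window, each state is a 30-element vector and each trading window covers the following 15 days.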
3.4. Proposed Method: Optimized Pairs-Trading Strategy Using the DQN
In this study, we optimize the pairs-trading strategy by treating it as a type of game played by the DQN. Because performance depends on how the trading and stop-loss boundaries are set in pairs trading [14], we attempt to implement an optimal pairs-trading strategy by choosing the trading and stop-loss boundaries that best match the given spread. Figure 3 shows the mechanism of our proposed pairs-trading strategy. Through the cointegration test, we identify pairs, and, using regression analysis, we obtain a hedge ratio used as the trading volume and a spread used as both the trading signal and the state. The DQN has two hidden layers, and, based on trial and error, the number of neurons in each is set to half of the input size. The action space consists of the six discrete actions listed in Table 3. Each action $a_i$ specifies values for the trading and stop-loss boundaries.
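Table 3: Discrete action space (trading and stop-loss boundaries for each action).

| Action | A0 | A1 | A2 | A3 | A4 | A5 |
|---|---|---|---|---|---|---|
| Trading boundary | | | | | | |
| Stop-loss boundary | | | | | | |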
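The network described above (two hidden layers, each with half as many neurons as the input, and one output per action in Table 3) can be sketched as a NumPy forward pass. The ReLU activations, the weight initialization, and the function names are assumptions made for illustration; the text specifies only the layer count, the hidden-layer width, and the six actions.

```python
import numpy as np

N_ACTIONS = 6  # actions A0 to A5 in Table 3

def init_q_network(n_inputs, n_actions=N_ACTIONS, seed=0):
    """Weights for two hidden layers of n_inputs // 2 units and a linear output layer.
    The random initialization is illustrative only."""
    rng = np.random.default_rng(seed)
    hidden = max(1, n_inputs // 2)
    sizes = [n_inputs, hidden, hidden, n_actions]
    return [(rng.normal(0.0, 1.0 / np.sqrt(a), size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def q_values(params, state):
    """Forward pass: ReLU hidden layers, then one Q-value per action."""
    out = np.asarray(state, float)
    for i, (w, b) in enumerate(params):
        out = out @ w + b
        if i < len(params) - 1:   # no activation on the output layer
            out = np.maximum(out, 0.0)
    return out
```

For a 30-day formation window, the state has 30 inputs, so each hidden layer has 15 neurons, and the index of the largest output selects one column of Table 3.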
A pairs-trading system makes a profit if the spread touches the threshold and then returns to the mean, so that the portfolio is closed within the trading window. If instead the trading boundary is touched and the stop-loss boundary is then reached, the system limits losses by stopping the trade. If the spread touches the trading boundary but fails to return to the mean, the strategy may end with either a profit or a loss. In this study, the pairs-trading strategy is therefore treated as a kind of game: closing a portfolio yields a positive reward, and a portfolio that reaches its stop-loss threshold yields a negative reward. Although an exited portfolio may produce a positive profit, it may also produce a loss, so it is also assigned a negative reward. We set the reward for all other conditions (such as holding the portfolio or not opening one) to zero in order to focus on the close, stop-loss, and exit positions. We set the rewards for portfolio close, stop-loss, and exit to +1000, −1000, and −500, respectively. The reward is a significant component in training the DQN efficiently when the Q-values are updated, so we set the reward values to have a range similar to that of the Q-values. Additionally, we include the proportional profit or loss to reflect the magnitude of the outcome after the trade ends. In equation (19), this profit or loss is computed from the stock orders of stocks A and B and their prices at the time the portfolio is opened and at the time it is closed.
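The fixed part of the reward can be sketched as a lookup table using the values given above. Because equation (19) is not reproduced here, `pair_return` below is only a hypothetical stand-in for the proportional profit or loss (signed position P&L divided by gross exposure), and no particular way of combining it with the fixed rewards is assumed.

```python
# Fixed rewards from the text; every other condition (holding, no trade) receives 0.
OUTCOME_REWARD = {"closed": 1000.0, "stop_loss": -1000.0, "exit": -500.0}

def outcome_reward(outcome):
    """Reward for the terminal outcome of one trading window."""
    return OUTCOME_REWARD.get(outcome, 0.0)

def pair_return(n_a, n_b, open_a, open_b, close_a, close_b):
    """Hypothetical proportional P&L of a pair position.

    n_a, n_b are signed share counts (negative = short); prices are taken at
    the time the portfolio is opened and at the time it is closed.
    """
    pnl = n_a * (close_a - open_a) + n_b * (close_b - open_b)
    exposure = abs(n_a) * open_a + abs(n_b) * open_b
    return pnl / exposure
```

For example, shorting one share of A at 100 and buying two shares of B at 50, then closing at 98 and 51, earns 4 on a gross exposure of 200, a proportional return of 0.02.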
Algorithm 1 shows the process of our proposed method. Before starting, we set the replay memory and batch size and select pairs using the cointegration test. At each epoch, we initialize the total profit to 1.0. During training, we set a state that contains the spreads within the formation window and select an action that determines the trading and stop-loss boundaries. Over the trading window, we execute a strategy similar to the traditional pairs-trading strategy using the selected action. After executing the strategy, we obtain a reward based on the result of the portfolio. Finally, in the Q-learning step, we update the Q-network by performing a gradient descent step.
Algorithm 1: Optimized pairs-trading strategy using the DQN.

Initialize replay memory D and batch size
Initialize deep Q-network Q with weights θ
Select pairs using the cointegration test
(1) For each epoch do
(2)   Profit = 1.0
(3)   For steps t = 1, … until the end of the training dataset do
(4)     Calculate spreads using the OLS or TLS method
(5)     Obtain state s_t by converting the spread to a Z-score based on the formation window
(6)     With probability ε (epsilon-greedy), select a random action a_t
(7)     Otherwise select a_t = argmax_a Q(s_t, a; θ)
(8)     Execute the traditional pairs-trading strategy based on the selected action a_t
(9)     Obtain reward r_t by performing the pairs-trading strategy
(10)    Set next state s_{t+1}
(11)    Store transition (s_t, a_t, r_t, s_{t+1}) in D
(12)    Sample a minibatch of transitions (s_j, a_j, r_j, s_{j+1}) from D
(13)    Set y_j = r_j + γ max_{a'} Q(s_{j+1}, a'; θ)
(14)    Update the Q-network by performing a gradient descent step on (y_j − Q(s_j, a_j; θ))²
(15)  End
(16) End
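Steps (6), (7), (11), and (12) of Algorithm 1 rely on two standard DQN components, an epsilon-greedy action selector and a replay memory, sketched below. The class and function names, the fixed capacity, and uniform sampling are assumptions made for illustration.

```python
import random
from collections import deque

import numpy as np

class ReplayMemory:
    """Fixed-capacity store of (s, a, r, s_next) transitions, steps (11) and (12)."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)   # oldest transitions are dropped first

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Uniformly sample up to batch_size stored transitions without replacement."""
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)

def epsilon_greedy(q_row, epsilon, rng):
    """Steps (6) and (7): a random action with probability epsilon, else argmax_a Q(s, a)."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))
    return int(np.argmax(q_row))
```

Here `rng` is a NumPy `Generator` (for example `np.random.default_rng(0)`), and `q_row` holds the six Q-values for the current state.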
3.5. Performance Measures
We evaluate our experimental results using net profit, maximum drawdown, and the Sharpe ratio. Profit is commonly used as a performance measure for trading strategies. It is calculated as the sum of returns after trading costs. Since frequent trading can inflate total profit, the total profit must account for transaction costs that depend on trading volume. In this study, we set a trading cost of 5 bp; equation (21) is almost the same as equation (19), but it does not include the absolute value, and it includes the trading cost. Maximum drawdown represents the largest cumulative loss from a peak to a subsequent trough of the portfolio value during a given investment period:

$$\mathrm{MDD} = \min_{t}\frac{V_t - \max_{s \le t} V_s}{\max_{s \le t} V_s}, \tag{22}$$

where $V_t$ is the value of the portfolio at time $t$. The Sharpe ratio measures the excess return per unit of risk from investing in risky assets and is used in evaluating portfolios [33]:

$$S = \frac{E[R_p] - R_f}{\sigma_p}. \tag{23}$$

In equation (23), $E[R_p]$ is the expected portfolio return, $R_f$ is the risk-free rate, which we set to 0, and $\sigma_p$ is the standard deviation of portfolio returns.
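Equations (22) and (23) can be sketched as follows. The sketch assumes a positive portfolio value series (for example, starting from the initial profit of 1.0 in Algorithm 1), uses the population standard deviation, and does not annualize the Sharpe ratio; these choices and the function names are assumptions. Transaction costs (equation (21)) are not modeled here.

```python
import numpy as np

def max_drawdown(values):
    """Equation (22): the most negative (V_t - running peak) / running peak."""
    values = np.asarray(values, float)
    peaks = np.maximum.accumulate(values)   # highest value reached up to each time t
    return float(np.min((values - peaks) / peaks))

def sharpe_ratio(returns, risk_free=0.0):
    """Equation (23) per period; the risk-free rate defaults to 0 as in the text."""
    returns = np.asarray(returns, float)
    return float((returns.mean() - risk_free) / returns.std())
```

For a value path of 1.0, 1.2, 0.9, 1.1, 1.3, the peak of 1.2 followed by the trough of 0.9 gives a maximum drawdown of −0.25, matching the negative sign convention of the MDD columns in Tables 4 and 5.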
4. Results and Discussion
We use the stock pair XOM and CVX, which rejects the null hypothesis at the 1% significance level, to verify whether our proposed model is trained well. The lengths of the formation and trading windows are chosen based on performance on the training dataset. From these results, we select an optimized window size and compare our proposed model on the test dataset with traditional pairs trading, which uses a constant set of actions.
4.1. Training Results
To find the optimum window sizing for the optimized pairs-trading system, we experimented with half dozen cases. We performed the experiments based on six window sizes, and the results for each window size are calculated by averaging the top-5 results for a absolute of 11 pairs. From Tables 4 and 5, we can find that the good operation is obtained when the formation and training windows are 30 and 15, respectively, founded on the profit generated by both the OLS and TLS methods. When we trained our networks, we set a positive reward for taking more closed positions and fewer stop-loss and exit positions. We can find the last ratio of portfolio closed in positions supported the issue of admissive positions, which in the formation and trading Windows are for 30 and 15 days (0.68). Unfavourable to this event, the highest ratios of the number of blocked positions in the formation and trading windows are for 120 and 60 days (0.73). However, the highest profit reported in the formation and trading windows are for 30 and 15 days. This can be explained when we check the ratio of the number of stoppag-loss portfolios. The formation and trading window sizes are 30 and 15 days and the ratio of portfolio stop-release posture is 0.13, but the formation and trading window sizes are 0.20. This result indicates that it is important to reduce the stop-exit position patc increasing the closed position. In addition, we john see that the trading signals made with the TLS method are better than those made with the OLS method in altogether cardinal of the discrete window sizes. The reason for this is supported the difference between the circumvent ratios of the two methods. In OLS, when one side is the reference, the relative change of the else side is estimated. Since the assumption is that on that point is no error portion connected the mention side and there is an error single on the other side, the hedge ratio varies depending on the side used atomic number 3 the reference. 
In TLS, however, the hedge ratio is the same regardless of which side is used as the reference. For this reason, the experimental results indicate that the TLS method is better able to determine when to execute the pairs-trading strategy. Based on these results, we use the optimal window size when applying our proposed method to the test dataset. First, however, we need to check that the proposed model is well trained.
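The asymmetry of OLS and the symmetry of TLS can be checked numerically. The sketch below uses synthetic prices (not the paper's dataset) and estimates the hedge ratio of y on x and of x on y with both methods. TLS is computed in the standard way from the first right-singular vector of the centered data matrix; the function names are illustrative, and this section does not say how the authors implemented either estimator (for example, whether prices or log prices are used).

```python
import numpy as np


def ols_hedge_ratio(x: np.ndarray, y: np.ndarray) -> float:
    """Slope of y on x by ordinary least squares (all error assumed to be in y)."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / (xc @ xc))


def tls_hedge_ratio(x: np.ndarray, y: np.ndarray) -> float:
    """Slope of y on x by total least squares (errors allowed in both x and y)."""
    data = np.column_stack([x - x.mean(), y - y.mean()])
    # The first right-singular vector is the direction of the line that
    # minimizes the sum of squared orthogonal distances to the points.
    _, _, vt = np.linalg.svd(data, full_matrices=False)
    dx, dy = vt[0]
    return float(dy / dx)


rng = np.random.default_rng(0)
x = 100 + np.cumsum(rng.normal(0, 2, 500))  # synthetic price path
y = 1.5 * x + rng.normal(0, 3, 500)         # synthetic cointegrated partner

# OLS: the y-on-x ratio is not the reciprocal of the x-on-y ratio.
print(ols_hedge_ratio(x, y), 1 / ols_hedge_ratio(y, x))
# TLS: both directions describe the same line, so the ratios agree.
print(tls_hedge_ratio(x, y), 1 / tls_hedge_ratio(y, x))
```

For OLS, the product of the two slopes equals the squared correlation, so it is below 1 whenever the fit is imperfect; for TLS, swapping the columns only swaps the components of the singular vector, so the product is 1.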
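Table 4: Training results by formation and trading window size using the OLS spread (averages of the top-5 results over 11 pairs).
Formation window (days) | Trading window (days) | MDD | Sharpe ratio | Profit | # of open portfolios | # of closed portfolios | # of stop-loss portfolios | # of exited portfolios |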
30 | 15 | −0.3682 | 0.1197 | 2.7344 | 328 | 225 | 44 | 58 |
60 | 30 | −0.3779 | 0.1327 | 2.5627 | 210 | 147 | 41 | 21 |
90 | 45 | −0.4052 | 0.1409 | 2.4112 | 160 | 114 | 34 | 11 |
120 | 60 | −0.4383 | 0.1165 | 2.0287 | 134 | 98 | 28 | 8 |
150 | 75 | −0.4395 | 0.1244 | 2.0098 | 110 | 80 | 24 | 6 |
180 | 90 | −0.5045 | 0.1180 | 1.9390 | 100 | 73 | 21 | 5 |
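Table 5: Training results by formation and trading window size using the TLS spread (averages of the top-5 results over 11 pairs).
Formation window (days) | Trading window (days) | MDD | Sharpe ratio | Profit | # of open portfolios | # of closed portfolios | # of stop-loss portfolios | # of exited portfolios |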
30 | 15 | −0.4422 | 0.1061 | 2.9436 | 320 | 229 | 46 | 44 |
60 | 30 | −0.5031 | 0.1143 | 2.5806 | 204 | 144 | 42 | 17 |
90 | 45 | −0.5824 | 0.1072 | 2.4588 | 155 | 110 | 36 | 9 |
120 | 60 | −0.5768 | 0.1181 | 2.4378 | 136 | 98 | 31 | 6 |
150 | 75 | −0.5805 | 0.1245 | 2.4127 | 110 | 79 | 26 | 5 |
180 | 90 | −0.5467 | 0.1209 | 2.3570 | 100 | 72 | 23 | 4 |
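The position ratios quoted in the window-size discussion can be recomputed from the counts in Table 4. The sketch below does only this arithmetic; note that the text appears to truncate rather than round (225/328 ≈ 0.686 is reported as 0.68).

```python
# Counts copied from Table 4: (open, closed, stop-loss, exited) for each
# (formation window, trading window) pair in days.
TABLE_4_COUNTS = {
    (30, 15): (328, 225, 44, 58),
    (60, 30): (210, 147, 41, 21),
    (90, 45): (160, 114, 34, 11),
    (120, 60): (134, 98, 28, 8),
    (150, 75): (110, 80, 24, 6),
    (180, 90): (100, 73, 21, 5),
}


def position_ratios(n_open, closed, stop_loss, exited):
    """Return the closed/open, stop-loss/open and exited/open ratios."""
    return closed / n_open, stop_loss / n_open, exited / n_open


for windows, counts in TABLE_4_COUNTS.items():
    closed_r, stop_r, exit_r = position_ratios(*counts)
    print(f"{windows}: closed {closed_r:.3f}, stop-loss {stop_r:.3f}, exited {exit_r:.3f}")
```

The output shows the closed ratio is lowest for (30, 15) and highest for (120, 60), while the stop-loss ratio is also lowest for (30, 15), which is the trade-off the text uses to explain why the shortest windows are the most profitable.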
It is important to check whether our reinforcement learning algorithm is trained successfully. Reference [21] suggested that a steadily increasing average of Q-values is evidence that the DQN is learning well. Figure 4(a) shows the average Q-values of HON and TXN as training progressed. The average Q-values increased steadily, indicating that our proposed model is properly trained. Additionally, we provide a positive reward when the portfolio closes and a negative reward when the portfolio reaches the stop-loss threshold or exits. Figure 4(b) shows the ratios of the numbers of portfolio positions as training progressed. The ratio of closed to open portfolio positions increased, and the ratio of portfolios reaching their stop-loss thresholds to open portfolio positions decreased. We also find that the ratio of portfolio exits to open portfolio positions increased slightly. A possible explanation is that the rewards given for an open portfolio position are relatively small compared with those given for a closed portfolio position. The DQN is therefore trained to prioritize keeping portfolios from reaching their stop-loss thresholds (the more significant target) over keeping them from exiting. This result can also serve as a basis for judging whether the proposed model is being trained properly.
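Figure 4: (a) Average Q-values of HON and TXN as training progressed; (b) ratios of the numbers of portfolio positions as training progressed.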
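The reward scheme and the position ratios tracked during training can be sketched as follows. The outcome labels follow the text (closed, stop-loss, exited); the reward magnitudes are hypothetical placeholders, chosen only so that a stop-loss is penalized more heavily than an exit, in line with stop-loss avoidance being the more significant target. This section does not state the values the authors used.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    CLOSED = "closed"        # spread hit the trading boundary, then reverted to the mean
    STOP_LOSS = "stop_loss"  # spread hit the stop-loss boundary
    EXITED = "exited"        # position still open when the trading window ended


# Hypothetical reward values (an assumption, not taken from the paper).
REWARD = {Outcome.CLOSED: 1.0, Outcome.STOP_LOSS: -1.0, Outcome.EXITED: -0.5}


def episode_reward(outcomes):
    """Sum of rewards over the positions opened in one episode."""
    return sum(REWARD[o] for o in outcomes)


def outcome_ratios(outcomes):
    """Ratio of closed, stop-loss and exited positions to all opened positions."""
    n_open = len(outcomes)
    if n_open == 0:
        return {o: 0.0 for o in Outcome}
    counts = Counter(outcomes)
    return {o: counts[o] / n_open for o in Outcome}


epoch_outcomes = [Outcome.CLOSED, Outcome.CLOSED, Outcome.STOP_LOSS, Outcome.EXITED]
print(episode_reward(epoch_outcomes))
print(outcome_ratios(epoch_outcomes))
```

Logging `outcome_ratios` once per epoch yields curves of the kind described for the closed, stop-loss and exit ratios during training.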
Tables 6 and 7 present the performance results of XOM and CVX in the training dataset. We call our proposed model pairs-trading DQN (PTDQN), and we refer to traditional pairs trading with constant action values as pairs trading with action 0 (PTA0) to pairs trading with action 5 (PTA5). From these results, we can confirm that our proposed method is more profitable than the constant pairs-trading strategies. In addition, the TLS method is more profitable than the OLS method. From PTA0 to PTA5, the trading boundary and the stop-loss boundary grow larger, and the numbers of open portfolios, closed portfolios, and portfolios that reached their stop-loss thresholds decrease. In other words, there is less chance for profit, but the probability of loss is also reduced. It is important not only to achieve many closed positions but also to take the best action to open and close the portfolio. For example, within the same spread, a portfolio opened and closed at the boundary corresponding to action 0 and a portfolio opened and closed at the boundary corresponding to action 1 yield different profits. Assuming that mean reversion is certain to occur, opening a portfolio at the largest boundary yields a larger profit than opening it at a smaller boundary. The PTDQN returns are higher than those of the most profitable traditional pairs-trading strategy with a constant action. Figures 5 to 8 show the changes in the trading and stop-loss boundaries when applying the DQN method, together with those of the most profitable constant action, during the training period using OLS and TLS.
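The boundary trade-off described above can be illustrated with a simplified single-window simulator. This is a sketch under stated assumptions, not the authors' environment: the spread is already normalized (z-score, mean 0), at most one position is opened per trading window, a position is closed when the spread crosses its mean, stopped out when it reaches the stop-loss boundary, and otherwise marked to market at the last observation (an "exit"). The `ACTIONS` table mapping actions 0 to 5 to boundary pairs is hypothetical; this section does not list the actual boundary values.

```python
from typing import Sequence, Tuple

# Hypothetical (trading boundary, stop-loss boundary) pairs for actions 0..5,
# both in units of the spread's standard deviation.
ACTIONS = [(0.5, 2.5), (1.0, 3.0), (1.5, 3.5), (2.0, 4.0), (2.5, 4.5), (3.0, 5.0)]


def simulate_window(z: Sequence[float], trade: float, stop: float) -> Tuple[str, float]:
    """Return (outcome, profit in spread units) for one trading window."""
    entry = 0.0  # spread level at which the position was opened
    side = 0     # -1: short the spread, +1: long the spread, 0: flat
    for value in z:
        if side == 0:
            if abs(value) >= trade:
                # Short a high spread, long a low spread, expecting reversion.
                entry, side = value, (-1 if value > 0 else 1)
            continue
        if side * value <= -stop:  # spread moved further away: stop loss
            return "stop_loss", side * (value - entry)
        if side * value >= 0:      # spread crossed its mean: take profit
            return "closed", side * (value - entry)
    if side == 0:
        return "no_trade", 0.0
    return "exited", side * (z[-1] - entry)


spread = [0.3, 0.7, 1.2, 2.1, 1.6, 0.4, -0.1]
for k, (trade, stop) in enumerate(ACTIONS):
    print(k, simulate_window(spread, trade, stop))
```

On this reverting path, action 0 opens at 0.7 and action 1 at 1.2, so the same spread gives different profits; wider boundaries earn more when triggered, while actions 4 and 5 never trade at all. A spread that keeps diverging instead triggers the stop-loss for the tighter actions and an exit for the wider ones.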
Table 6: Training results for XOM/CVX using the OLS spread.
Model | MDD | Sharpe ratio | Profit | # of open portfolios | # of closed portfolios | # of stop-loss portfolios | # of exited portfolios |
PTDQN | −0.0842 | 0.1835 | 3.4068 | 469 | 336 | 64 | 96 |
PTA0 | −0.2014 | 0.1452 | 2.5934 | 565 | 382 | 132 | 50 |
PTA1 | −0.1431 | 0.1773 | 2.7603 | 409 | 279 | 45 | 84 |
PTA2 | −0.1234 | 0.1955 | 2.6307 | 325 | 191 | 16 | 118 |
PTA3 | −0.2586 | 0.0861 | 1.3850 | 208 | 86 | 2 | 120 |
PTA4 | −0.2591 | 0.0803 | 1.1933 | 124 | 39 | 2 | 83 |
PTA5 | −0.2448 | −0.0638 | 0.8588 | 47 | 11 | 0 | 36 |
Table 7: Training results for XOM/CVX using the TLS spread.
Model | MDD | Sharpe ratio | Profit | # of open portfolios | # of closed portfolios | # of stop-loss portfolios | # of exited portfolios |
PTDQN | −0.0944 | 0.2133 | 4.8760 | 541 | 399 | 104 | 63 |
PTA0 | −0.1210 | 0.1522 | 4.1948 | 579 | 413 | 125 | 41 |
PTA1 | −0.1015 | 0.1650 | 3.8834 | 430 | 310 | 50 | 70 |
PTA2 | −0.1483 | 0.1722 | 3.3425 | 320 | 209 | 13 | 98 |
PTA3 | −0.1386 | 0.1771 | 2.4385 | 217 | 101 | 3 | 113 |
PTA4 | −0.1749 | 0.1602 | 1.6852 | 119 | 38 | 2 | 79 |
PTA5 | −0.2862 | 0.0137 | 1.0362 | 55 | 10 | 0 | 45 |
Figures 5 and 6 compare PTDQN and PTA1 using the TLS method. Figure 5 shows the spread together with the trading and stop-loss boundaries. The trading and stop-loss boundaries take different values in PTDQN, showing that it has learned to find the optimal boundary for each spread. In contrast, PTA1 in Figure 6 has constant trading and stop-loss boundaries. Figures 7 and 8 exhibit the same features as Figures 5 and 6. The difference between these methods lies in the spreads: different results can be obtained depending on the spreads used. Constructing better spreads can therefore improve performance.
Figures 9 and 10 show the profit of the DQN and of the constant actions using TLS and OLS. Reference [34] suggested that an average value over multiple trials should be presented to show the reproducibility of deep reinforcement learning, because high variance across trials and random seeds can produce different results. We thus conducted five trials with different random seeds. The DQN profit graph shows the average profit of these trials, with the shaded region spanning the maximum and minimum profit values. PTDQN achieved a higher profit than the traditional pairs-trading strategies during the training period. This means that, even with the same spread, profit changes as the boundaries change. In other words, finding the optimal boundary for the spread is an important factor in optimizing the profitability of pairs trading.
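The multi-seed summary described above (mean curve plus a band between the minimum and maximum) can be sketched with NumPy. The five curves below are invented numbers for illustration, not results from the paper.

```python
import numpy as np


def summarize_trials(profit_curves):
    """Mean, minimum and maximum cumulative profit at each step across trials.

    profit_curves: array-like of shape (n_trials, n_steps), one row per random seed.
    """
    curves = np.asarray(profit_curves, dtype=float)
    return curves.mean(axis=0), curves.min(axis=0), curves.max(axis=0)


# Hypothetical cumulative-profit curves from five seeds (illustrative only).
curves = [
    [1.00, 1.10, 1.25],
    [1.00, 1.05, 1.30],
    [1.00, 1.12, 1.20],
    [1.00, 1.08, 1.28],
    [1.00, 1.10, 1.27],
]
mean, low, high = summarize_trials(curves)
print(mean, low, high)
```

With matplotlib, `ax.plot(mean)` followed by `ax.fill_between(range(len(mean)), low, high, alpha=0.3)` would draw the average line with the shaded min/max region.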
4.2. Test Results
Tables 8 and 9 show the average performance measures for each pair, obtained by applying the top-5 trained models. The constant action with the highest return differs from pair to pair, and, as shown previously, the proposed model earns a higher profit with the TLS method than with the OLS method for all pairs. We also find that PTDQN achieves a higher profit than every traditional constant-action strategy for every pair. The pair with the highest profit under the proposed method is HON and TXN (3.2755); it also shows the largest difference between the DQN method and the best constant action (0.9377). When the TLS method is used, the proposed method also has a higher Sharpe ratio than the most profitable constant action for all pairs except MSFT/TXN and MO/UTX. If we add the Sharpe ratio to the absolute profit in the objective function, we may be able to build a better-optimized pairs-trading system. These results support the robustness of our proposed method on our dataset, and the method could also be applied to other pairs of stocks in other global markets.
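The performance measures in the tables, and the suggested objective that mixes profit with the Sharpe ratio, can be sketched with textbook definitions. This section does not state the authors' exact conventions (relative or absolute drawdown, return frequency, risk-free rate, annualization), so these functions are not guaranteed to reproduce the tabulated values; `combined_objective` and its `weight` are hypothetical illustrations of the suggestion above.

```python
import numpy as np


def max_drawdown(wealth):
    """Most negative relative decline from a running peak (e.g. -0.25 for 25%)."""
    wealth = np.asarray(wealth, dtype=float)
    running_peak = np.maximum.accumulate(wealth)
    return float(np.min((wealth - running_peak) / running_peak))


def sharpe_ratio(returns):
    """Per-period Sharpe ratio with a zero risk-free rate and no annualization."""
    returns = np.asarray(returns, dtype=float)
    std = returns.std(ddof=1)
    return float(returns.mean() / std) if std > 0 else 0.0


def combined_objective(profit, sharpe, weight=0.5):
    """Hypothetical objective: cumulative profit plus a weighted Sharpe ratio."""
    return profit + weight * sharpe


wealth = np.array([1.0, 1.2, 0.9, 1.1, 1.3])
returns = wealth[1:] / wealth[:-1] - 1
sr = sharpe_ratio(returns)
print(max_drawdown(wealth), sr, combined_objective(wealth[-1], sr))
```

The weight sets how much a smoother return path is worth relative to raw profit; a reward built from such an objective would push the agent toward boundaries with fewer large losses, not only higher totals.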
Table 8: Test results for each pair using the TLS spread (averages over the top-5 trained models).
Pairs | Model | MDD | Sharpe ratio | Profit | # of open portfolios | # of closed portfolios | # of stop-loss portfolios | # of exited portfolios |
MSFT/JPM | PTDQN | −0.1122 | 0.2294 | 3.0446 | 186 | 126 | 38 | 62 |
PTA0 | −0.3411 | 0.0742 | 1.6236 | 211 | 136 | 57 | 18 | |
PTA1 | −0.2907 | 0.0979 | 1.8001 | 162 | 104 | 26 | 32 | |
PTA2 | −0.1507 | 0.1936 | 2.6303 | 131 | 64 | 7 | 60 | |
PTA3 | −0.4032 | 0.1542 | 1.8282 | 97 | 39 | 1 | 57 | |
PTA4 | −0.4340 | 0.0400 | 1.0480 | 55 | 13 | 0 | 42 | |
PTA5 | −0.1836 | 0.3098 | 1.5524 | 30 | 7 | 0 | 23 | |
| ||||||||
MSFT/TXN | PTDQN | −0.3420 | 0.1001 | 1.5423 | 204 | 132 | 47 | 65 |
PTA0 | −1.2094 | −0.0571 | 0.0013 | 244 | 152 | 76 | 16 | |
PTA1 | −0.9225 | −0.0177 | 0.6131 | 178 | 110 | 25 | 43 | |
PTA2 | −0.5574 | 0.0351 | 1.0887 | 134 | 68 | 8 | 58 | |
PTA3 | −0.5375 | −0.0128 | 0.8326 | 97 | 34 | 1 | 62 | |
PTA4 | −0.4485 | 0.0260 | 1.0118 | 66 | 15 | 1 | 50 | |
PTA5 | −0.1048 | 0.1233 | 1.1502 | 32 | 5 | 0 | 27 | |
| ||||||||
BRKa/ABT | PTDQN | −0.0740 | 0.3159 | 2.3655 | 162 | 111 | 30 | 43 |
PTA0 | −0.1392 | 0.1554 | 1.7157 | 182 | 128 | 35 | 18 | |
PTA1 | −0.1048 | 0.2464 | 2.1508 | 138 | 96 | 15 | 27 | |
PTA2 | −0.1133 | 0.2538 | 1.9578 | 108 | 64 | 3 | 40 | |
PTA3 | −0.1040 | 0.2480 | 1.7576 | 76 | 35 | 1 | 40 | |
PTA4 | −0.0829 | 0.2087 | 1.3171 | 44 | 13 | 0 | 31 | |
PTA5 | −0.0704 | 0.4366 | 1.4013 | 19 | 7 | 0 | 12 | |
| ||||||||
BRKa/UTX | PTDQN | −0.5401 | 0.1174 | 1.5744 | 167 | 105 | 35 | 58 |
PTA0 | −1.2143 | −0.0199 | 0.5918 | 192 | 117 | 55 | 19 | |
PTA1 | −0.9340 | 0.0346 | 1.0701 | 147 | 89 | 12 | 45 | |
PTA2 | −0.9099 | −0.0009 | 0.8435 | 122 | 60 | 5 | 57 | |
PTA3 | −0.5673 | 0.0473 | 1.1520 | 89 | 32 | 1 | 56 | |
PTA4 | −0.3641 | 0.0694 | 1.1628 | 53 | 9 | 0 | 44 | |
PTA5 | −0.2309 | 0.0408 | 1.0405 | 18 | 3 | 0 | 15 | |
| ||||||||
JPM/T | PTDQN | −0.1384 | 0.1283 | 1.4653 | 175 | 113 | 42 | 53 |
PTA0 | −0.3630 | 0.0071 | 0.8968 | 205 | 129 | 60 | 15 | |
PTA1 | −0.2801 | 0.0460 | 1.1595 | 144 | 94 | 17 | 32 | |
PTA2 | −0.3750 | 0.0192 | 0.9987 | 119 | 62 | 5 | 51 | |
PTA3 | −0.5241 | −0.0717 | 0.6609 | 92 | 35 | 0 | 56 | |
PTA4 | −0.3607 | −0.0550 | 0.8411 | 56 | 18 | 0 | 38 | |
PTA5 | −0.2235 | 0.0061 | 0.9851 | 22 | 6 | 0 | 16 | |
| ||||||||
JPM/HON | PTDQN | −0.1872 | 0.1523 | 2.2510 | 223 | 155 | 39 | 62 |
PTA0 | −0.6769 | 0.0190 | 1.0077 | 274 | 180 | 70 | 23 | |
PTA1 | −0.4644 | 0.0622 | 1.6331 | 201 | 139 | 24 | 38 | |
PTA2 | −0.4537 | 0.0840 | 1.7165 | 149 | 87 | 2 | 60 | |
PTA3 | −0.2410 | 0.1414 | 1.7648 | 107 | 43 | 0 | 64 | |
PTA4 | −0.3313 | 0.0879 | 1.3150 | 62 | 16 | 0 | 46 | |
PTA5 | −0.1693 | 0.1803 | 1.2777 | 28 | 7 | 0 | 21 | |
| ||||||||
JPM/GE | PTDQN | −0.1098 | 0.2123 | 2.8250 | 193 | 124 | 46 | 65 |
PTA0 | −0.3897 | 0.0507 | 1.5137 | 224 | 142 | 65 | 17 | |
PTA1 | −0.3404 | 0.0640 | 1.6912 | 163 | 109 | 18 | 36 | |
PTA2 | −0.1628 | 0.1284 | 1.9032 | 132 | 73 | 6 | 53 | |
PTA3 | −0.2980 | 0.1142 | 1.7555 | 106 | 38 | 1 | 67 | |
PTA4 | −0.2817 | 0.0790 | 1.2884 | 55 | 13 | 0 | 42 | |
PTA5 | −0.0612 | 0.4776 | 1.7489 | 21 | 6 | 0 | 15 | |
| ||||||||
JNJ/WFC | PTDQN | −0.1576 | 0.2437 | 2.3741 | 143 | 100 | 28 | 38 |
PTA0 | −0.2872 | 0.0892 | 1.4932 | 164 | 115 | 37 | 12 | |
PTA1 | −0.2219 | 0.1948 | 2.1147 | 127 | 90 | 15 | 21 | |
PTA2 | −0.3188 | 0.1322 | 1.6362 | 99 | 55 | 5 | 38 | |
PTA3 | −0.2324 | 0.1084 | 1.3141 | 68 | 27 | 0 | 41 | |
PTA4 | −0.1532 | 0.1043 | 1.1228 | 40 | 14 | 0 | 26 | |
PTA5 | −0.0970 | 0.1203 | 1.0734 | 16 | 6 | 0 | 10 | |
| ||||||||
XOM/CVX | PTDQN | −0.4265 | 0.0605 | 1.1924 | 218 | 135 | 45 | 77 |
PTA0 | −0.6189 | 0.0236 | 0.8812 | 256 | 161 | 67 | 28 | |
PTA1 | −0.5999 | 0.0154 | 0.8809 | 197 | 118 | 25 | 54 | |
PTA2 | −0.6034 | −0.0073 | 0.7792 | 153 | 70 | 8 | 75 | |
PTA3 | −0.5628 | −0.0224 | 0.7734 | 114 | 38 | 2 | 74 | |
PTA4 | −0.5311 | −0.0200 | 0.8643 | 70 | 18 | 1 | 51 | |
PTA5 | −0.2583 | 0.0060 | 0.9692 | 31 | 4 | 0 | 27 | |
| ||||||||
HON/TXN | PTDQN | −0.0874 | 0.2679 | 3.2755 | 233 | 164 | 49 | 63 |
PTA0 | −0.5108 | 0.1080 | 1.9219 | 276 | 186 | 66 | 23 | |
PTA1 | −0.5841 | 0.1625 | 2.3378 | 207 | 140 | 28 | 38 | |
PTA2 | −0.1926 | 0.2086 | 2.3096 | 158 | 92 | 4 | 62 | |
PTA3 | −0.1611 | 0.1557 | 1.7100 | 114 | 49 | 2 | 63 | |
PTA4 | −0.1254 | 0.2289 | 1.6374 | 69 | 23 | 0 | 46 | |
PTA5 | −0.1578 | 0.1924 | 1.1925 | 28 | 9 | 0 | 19 | |
| ||||||||
GE/TXN | PTDQN | −0.1133 | 0.1871 | 2.1398 | 172 | 117 | 30 | 48 |
PTA0 | −0.3348 | 0.0967 | 1.6398 | 201 | 136 | 44 | 21 | |
PTA1 | −0.1656 | 0.1070 | 1.6355 | 153 | 101 | 19 | 33 | |
PTA2 | −0.2043 | 0.1388 | 1.7568 | 117 | 68 | 8 | 41 | |
PTA3 | −0.2335 | 0.1591 | 1.5555 | 89 | 39 | 2 | 48 | |
PTA4 | −0.3847 | −0.1355 | 0.6570 | 45 | 7 | 0 | 38 | |
PTA5 | −0.3489 | −0.2730 | 0.7218 | 21 | 2 | 0 | 19 | |
| ||||||||
MO/UTX | PTDQN | −0.5264 | 0.0840 | 1.2940 | 150 | 88 | 35 | 58 |
PTA0 | −1.0950 | −0.0272 | 0.6231 | 178 | 102 | 56 | 19 | |
PTA1 | −0.7205 | 0.0286 | 1.0362 | 125 | 73 | 12 | 39 | |
PTA2 | −0.8361 | −0.0040 | 0.8658 | 105 | 51 | 3 | 50 | |
PTA3 | −0.4311 | 0.0052 | 0.9323 | 79 | 24 | 0 | 54 | |
PTA4 | −0.3916 | 0.1141 | 1.2129 | 48 | 12 | 0 | 36 | |
PTA5 | −0.1311 | 0.2948 | 1.1276 | 14 | 3 | 0 | 11 | |
Table 9: Test results for each pair using the OLS spread (averages over the top-5 trained models).
Pairs | Model | MDD | Sharpe ratio | Profit | # of open portfolios | # of closed portfolios | # of stop-loss portfolios | # of exited portfolios |
MSFT/JPM | PTDQN | −0.2096 | 0.1228 | 1.9255 | 215 | 137 | 54 | 62 |
PTA0 | −0.3618 | 0.0492 | 1.3365 | 225 | 141 | 61 | 23 | |
PTA1 | −0.5036 | 0.0188 | 1.0185 | 168 | 102 | 28 | 38 | |
PTA2 | −0.4045 | 0.0611 | 1.3591 | 124 | 59 | 8 | 57 | |
PTA3 | −0.5055 | −0.0094 | 0.8636 | 97 | 33 | 3 | 61 | |
PTA4 | −0.4195 | −0.0009 | 0.9459 | 58 | 12 | 1 | 45 | |
PTA5 | −0.2018 | 0.1236 | 1.1593 | 29 | 6 | 0 | 23 | |
| ||||||||
MSFT/TXN | PTDQN | −0.2878 | 0.0698 | 1.3466 | 244 | 153 | 65 | 68 |
PTA0 | −0.5271 | 0.0070 | 0.8489 | 252 | 156 | 72 | 24 | |
PTA1 | −0.4721 | 0.0255 | 1.0286 | 187 | 117 | 26 | 44 | |
PTA2 | −0.3816 | 0.0215 | 0.9912 | 145 | 71 | 10 | 64 | |
PTA3 | −0.6553 | −0.1015 | 0.5053 | 104 | 30 | 2 | 72 | |
PTA4 | −0.2719 | 0.0422 | 1.0532 | 63 | 16 | 1 | 46 | |
PTA5 | −0.1850 | 0.0068 | 0.9785 | 34 | 7 | 0 | 27 | |
| ||||||||
BRKa/ABT | PTDQN | −0.1282 | 0.1644 | 1.5076 | 180 | 109 | 48 | 57 |
PTA0 | −0.5073 | −0.0265 | 0.7070 | 183 | 112 | 48 | 22 | |
PTA1 | −0.2649 | 0.0453 | 1.0786 | 139 | 80 | 13 | 46 | |
PTA2 | −0.2246 | 0.1056 | 1.2942 | 121 | 60 | 4 | 56 | |
PTA3 | −0.1686 | 0.1241 | 1.2718 | 91 | 38 | 1 | 52 | |
PTA4 | −0.1483 | 0.0176 | 0.9778 | 49 | 12 | 0 | 37 | |
PTA5 | −0.1602 | 0.0004 | 0.9830 | 16 | 2 | 0 | 14 | |
| ||||||||
BRKa/UTX | PTDQN | −0.5231 | 0.0816 | 1.2976 | 215 | 132 | 57 | 69 |
PTA0 | −1.1928 | −0.0647 | 0.3332 | 216 | 133 | 57 | 25 | |
PTA1 | −0.8697 | −0.0157 | 0.7445 | 167 | 100 | 15 | 51 | |
PTA2 | −0.7815 | −0.0071 | 0.8391 | 135 | 70 | 5 | 60 | |
PTA3 | −0.3573 | 0.0315 | 1.0292 | 94 | 36 | 0 | 58 | |
PTA4 | −0.2096 | 0.0684 | 1.0857 | 52 | 11 | 0 | 41 | |
PTA5 | −0.1317 | −0.1174 | 0.9312 | 16 | 2 | 0 | 14 | |
| ||||||||
JPM/T | PTDQN | −0.1338 | 0.1391 | 1.4547 | 205 | 127 | 60 | 50 |
PTA0 | −0.3588 | 0.0069 | 0.9054 | 208 | 130 | 61 | 16 | |
PTA1 | −0.2535 | 0.0405 | 1.0902 | 151 | 96 | 19 | 35 | |
PTA2 | −0.1872 | 0.0542 | 1.1198 | 119 | 66 | 5 | 48 | |
PTA3 | −0.2574 | 0.0336 | 1.0502 | 94 | 39 | 0 | 55 | |
PTA4 | −0.2212 | 0.0345 | 1.0312 | 57 | 20 | 0 | 37 | |
PTA5 | −0.2348 | −0.1922 | 0.8299 | 20 | 5 | 0 | 15 | |
| ||||||||
JPM/HON | PTDQN | −0.3869 | 0.1071 | 1.5175 | 250 | 162 | 57 | 68 |
PTA0 | −0.7141 | 0.0181 | 0.9444 | 256 | 166 | 59 | 30 | |
PTA1 | −0.5065 | 0.0702 | 1.3071 | 198 | 127 | 22 | 49 | |
PTA2 | −0.4649 | 0.1071 | 1.4260 | 152 | 84 | 3 | 65 | |
PTA3 | −0.4871 | 0.0763 | 1.2098 | 102 | 44 | 0 | 58 | |
PTA4 | −0.3503 | −0.0694 | 0.8178 | 50 | 13 | 0 | 37 | |
PTA5 | −0.2980 | −0.1721 | 0.8040 | 23 | 6 | 0 | 17 | |
| ||||||||
JPM/GE | PTDQN | −0.1195 | 0.1443 | 1.7682 | 226 | 133 | 64 | 69 |
PTA0 | −0.4379 | 0.0036 | 0.8549 | 232 | 137 | 66 | 29 | |
PTA1 | −0.1523 | 0.0987 | 1.4814 | 165 | 98 | 16 | 51 | |
PTA2 | −0.1738 | 0.1264 | 1.5661 | 134 | 62 | 5 | 67 | |
PTA3 | −0.2680 | 0.0729 | 1.2026 | 93 | 29 | 0 | 64 | |
PTA4 | −0.2104 | 0.1298 | 1.3242 | 51 | 12 | 0 | 39 | |
PTA5 | −0.1461 | −0.0423 | 0.9586 | 18 | 3 | 0 | 15 | |
| ||||||||
JNJ/WFC | PTDQN | −0.1890 | 0.1266 | 1.7194 | 202 | 130 | 47 | 56 |
PTA0 | −0.8705 | −0.0326 | 0.4635 | 207 | 131 | 53 | 22 | |
PTA1 | −0.6189 | −0.0134 | 0.7318 | 150 | 91 | 19 | 39 | |
PTA2 | −0.4763 | 0.0309 | 1.0563 | 124 | 57 | 4 | 62 | |
PTA3 | −0.2318 | 0.1447 | 1.6072 | 97 | 33 | 2 | 62 | |
PTA4 | −0.2415 | 0.0549 | 1.0632 | 50 | 13 | 0 | 37 | |
PTA5 | −0.0880 | 0.2468 | 1.1886 | 20 | 4 | 0 | 16 | |
| ||||||||
XOM/CVX | PTDQN | −0.3316 | 0.0265 | 1.1517 | 141 | 81 | 23 | 43 |
PTA0 | −0.7629 | −0.0547 | 0.4186 | 240 | 149 | 61 | 30 | |
PTA1 | −0.5648 | 0.0132 | 0.8754 | 193 | 114 | 23 | 56 | |
PTA2 | −0.6977 | −0.0387 | 0.6655 | 154 | 70 | 7 | 77 | |
PTA3 | −0.5235 | 0.0277 | 0.9865 | 117 | 38 | 1 | 78 | |
PTA4 | −0.4781 | −0.0577 | 0.8117 | 63 | 12 | 1 | 50 | |
PTA5 | −0.3787 | −0.1492 | 0.8090 | 29 | 3 | 0 | 26 | |
| ||||||||
HON/TXN | PTDQN | −0.1339 | 0.1534 | 1.8852 | 270 | 175 | 64 | 69 |
PTA0 | −0.4135 | 0.0212 | 0.9455 | 276 | 177 | 70 | 28 | |
PTA1 | −0.2758 | 0.0666 | 1.3216 | 207 | 124 | 27 | 55 | |
PTA2 | −0.2614 | 0.1054 | 1.5031 | 159 | 84 | 5 | 69 | |
PTA3 | −0.1759 | 0.1413 | 1.5617 | 117 | 45 | 2 | 70 | |
PTA4 | −0.0834 | 0.2650 | 1.7044 | 66 | 23 | 0 | 43 | |
PTA5 | −0.0664 | 0.4606 | 1.6830 | 30 | 13 | 0 | 17 | |
| ||||||||
GE/TXN | PTDQN | −0.1676 | 0.1263 | 1.6411 | 206 | 140 | 43 | 62 |
PTA0 | −0.6133 | 0.0178 | 0.9742 | 211 | 144 | 44 | 23 | |
PTA1 | −0.3085 | 0.0586 | 1.2743 | 166 | 109 | 19 | 38 | |
PTA2 | −0.2402 | 0.0585 | 1.2216 | 128 | 68 | 5 | 55 | |
PTA3 | −0.3190 | −0.0013 | 0.9193 | 91 | 31 | 2 | 58 | |
PTA4 | −0.2493 | −0.0285 | 0.9117 | 49 | 8 | 0 | 41 | |
PTA5 | −0.0862 | 0.1417 | 1.0936 | 23 | 4 | 0 | 19 | |
| ||||||||
MO/UTX | PTDQN | −0.3181 | 0.0524 | 1.1402 | 188 | 117 | 49 | 59 |
PTA0 | −0.4688 | 0.0041 | 0.8667 | 195 | 121 | 52 | 21 | |
PTA1 | −0.6166 | −0.0230 | 0.7470 | 144 | 84 | 13 | 46 | |
PTA2 | −0.5034 | −0.0076 | 0.8666 | 115 | 51 | 4 | 59 | |
PTA3 | −0.2833 | 0.0457 | 1.0873 | 88 | 32 | 0 | 56 | |
PTA4 | −0.2901 | 0.0356 | 1.0280 | 44 | 12 | 0 | 32 | |
PTA5 | −0.1500 | 0.0992 | 1.0297 | 13 | 2 | 0 | 11 | |
In Figure 11, we can see that our proposed method, PTDQN, outperforms the traditional pairs-trading strategies with constant actions on the test dataset. The crucial aspect of this method is selecting, for each spread, the boundary that yields the highest profit among the constant actions, each of which corresponds to a constant boundary. Thus, the trend is the same as that of the traditional pairs-trading strategies; however, when the optimal boundaries with the highest profits for each spread are combined, PTDQN achieves a higher profit than the traditional strategies. This method can therefore be applied in various fields where there is a need to optimize the efficiency of a rule-based strategy [35, 36]. In this study, we consider the spread and the boundaries to be the key factors of a pairs-trading strategy. We therefore optimized the pairs-trading strategy with various trading and stop-loss boundaries using deep reinforcement learning, and our method outperforms the rule-based strategies. Optimizing the key parameters of rule-based methods can thus improve their performance.
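The argument that combining the best boundary for each spread can beat every constant boundary can be made concrete with a toy profit matrix. The sketch compares the best single constant action with a hindsight oracle that picks the best action separately in every trading window. The numbers are invented for illustration, and the oracle is only an upper bound that a learned policy such as the DQN tries to approach by predicting the action from the spread; it is not the DQN itself.

```python
import numpy as np

# Hypothetical profit of each constant action (columns: actions 0..5)
# in each trading window (rows). Illustrative numbers only.
profits = np.array([
    [0.4, 0.6, 0.9, 0.0, 0.0, 0.0],
    [-0.8, -0.6, 0.5, 0.3, 0.0, 0.0],
    [0.3, 0.5, -0.6, 0.7, 0.2, 0.0],
    [0.2, 0.1, 0.0, 0.0, 0.0, 0.0],
])

# Best strategy that uses the same action in every window.
column_totals = profits.sum(axis=0)
best_constant_action = int(column_totals.argmax())
best_constant_profit = float(column_totals.max())

# Hindsight oracle: choose the best action separately in each window.
per_window_choice = profits.argmax(axis=1)
oracle_profit = float(profits.max(axis=1).sum())

print(best_constant_action, best_constant_profit)
print(per_window_choice, oracle_profit)
```

Because the sum of per-window maxima can never be smaller than the total of any single column, a policy that chooses boundaries per spread has at least as much headroom as the best constant action; here the oracle reaches 2.3 versus 1.0 for action 3.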
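Figure 11: Profit of PTDQN and the constant-action strategies PTA0 to PTA5 on the test dataset: (a) MSFT/JPM, (b) MSFT/TXN, (c) BRKa/ABT, (d) BRKa/UTX, (e) JPM/T, (f) JPM/HON, (g) JPM/GE, (h) JNJ/WFC, (i) XOM/CVX, (j) HON/TXN, (k) GE/TXN.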
Pairs trading uses two stocks that share the same trends. However, this relationship can break down due to various factors such as economic issues and company-specific risk. In this situation, the spread between the two stocks becomes very large. Although this situation cannot be avoided, we hedge this risk by taking a dynamic boundary. In this sense, taking the tightest stop-loss boundary is the best choice because the position is cut with the smallest loss. By taking dynamic boundaries with the deep reinforcement learning method, not only are profits increased but losses are also reduced compared with taking a fixed boundary.
5. Conclusions
We propose a novel approach to optimizing the pairs-trading strategy using a deep reinforcement learning method, specifically deep Q-networks. We pose two key research questions. First, if we set a dynamic boundary based on the spread in each trading window, can it achieve a higher profit than the traditional pairs-trading strategy? Second, can a deep reinforcement learning method be trained to follow this mechanism? To investigate these questions, we collected pairs selected using the cointegration test. We examined how the results varied according to the spread and the method used, and therefore constructed different spreads using the OLS and TLS methods as the input of the DQN and the trading signal. To conduct this experiment, we set up a formation window and a trading window. The hedge ratio, which is an important factor in determining how much of each stock to trade, depends on these window sizes. We thus applied the OLS and TLS methods and searched for the optimal window size by varying the formation window and the trading window.
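The formation/trading window scheme can be sketched as follows, using the 30-day formation and 15-day trading windows found to be best in the training results. Several details are assumptions not stated in this section: windows roll forward by one trading window, the hedge ratio is estimated by OLS with an intercept on the formation window (a TLS estimate could be substituted), and the trading-window spread is normalized with the formation-window mean and standard deviation. The function names and synthetic data are hypothetical.

```python
import numpy as np


def rolling_windows(n_obs, formation=30, trading=15):
    """Yield (formation, trading) index slices, rolling forward by one trading window."""
    start = 0
    while start + formation + trading <= n_obs:
        yield (slice(start, start + formation),
               slice(start + formation, start + formation + trading))
        start += trading


def trading_window_zscore(x, y, form, trade):
    """Normalized spread in the trading window, with parameters fitted on the formation window."""
    # OLS hedge ratio on the formation window; polyfit returns [slope, intercept].
    beta, _intercept = np.polyfit(x[form], y[form], 1)
    spread_form = y[form] - beta * x[form]
    mu, sigma = spread_form.mean(), spread_form.std(ddof=1)
    # Apply the formation-window hedge ratio and moments to the trading window.
    return (y[trade] - beta * x[trade] - mu) / sigma


rng = np.random.default_rng(1)
x = 50 + np.cumsum(rng.normal(0, 1, 100))  # synthetic daily prices
y = 2.0 * x + rng.normal(0, 1, 100)
windows = list(rolling_windows(len(x)))
z = trading_window_zscore(x, y, *windows[0])
print(len(windows), z.shape)
```

Fitting only on the formation window keeps each trading window out of sample, so the spread the agent reacts to uses no future information.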
Tables 4 and 5 show the average performance values for each combination of formation and trading windows in the training dataset. The results show that profits for all six window sizes were higher with TLS spreads than with OLS spreads. In addition, profitability gradually increases as the formation and trading windows are reduced, for both TLS and OLS. The reason is that although the ratio of closed-position portfolios is the lowest for the formation and trading windows we selected, the ratio of stop-loss-position portfolios is also the lowest compared with the other formation and trading windows. This means that reducing stop-loss positions is as important as increasing closed positions to make a profit. Using the optimal window size, we then checked whether our DQN was properly trained. Across epochs, the average Q-value steadily increased, the ratio of closed portfolios increased, and the ratio of portfolios that reached their stop-loss thresholds decreased, confirming that our DQN is well trained. Based on these results, our proposed model, tested with a formation window of 30 days and a trading window of 15 days, produced results superior to those of traditional pairs-trading strategies on the out-of-sample dataset. In Figure 11, the profit path of PTDQN follows a trend similar to those of PTA0 to PTA5 but reaches a higher profit. This shows that changing the boundaries with our method is useful in optimizing the pairs-trading strategy. During periods of economic uncertainty, pairs-trading strategies, including our proposed method, can be risky to manage. However, we designed the reward function to penalize a sudden widening of the spread, and because our network is trained to maximize the expected sum of future rewards, it learns to avoid this situation by taking a tighter stop-loss boundary.
Therefore, compared with the traditional pairs-trading strategy with fixed boundaries, our proposed method can reduce risk when economic shocks occur.
The experimental results show that our method can be applied in a pairs-trading system. It can also be applied in various fields, including finance and economics, when there is a need to optimize the efficiency of a rule-based strategy. Furthermore, our method outperforms the traditional pairs-trading strategy in profit for all pairs built from constituent stocks of the S&P 500. This study focused only on spreads formed by two stocks with long-term mean-reversion patterns; since our method selects optimal boundaries based on spreads, it could be applied to other stock markets such as KOSPI, Nikkei, and Hang Seng, provided that appropriate cointegrated pairs are selected.
In future work, we can develop our proposed model as follows. First, because net profit was set as the objective function in this study, the model performs worse than traditional pairs trading on some other performance measures. It may thus be possible to create a better-optimized pairs-trading strategy by including these other performance indicators in the objective function. Second, we can use other statistical methods, such as the Kalman filter and error-correction models, to construct different spreads. Finally, it is possible to create a more optimized pairs-trading strategy by continuously changing the discrete set of window sizes and boundaries. We will address these issues in future studies.
Data Availability
The data used to support the findings of this study have been deposited in the figshare repository (DOI: 10.6084/m9.figshare.7667645).
Disclosure
The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. This paper represents part of a study conducted as a Master's thesis in Financial Engineering during 2022 and 2022 at Ajou University, Republic of Korea.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea Government (MSIT: Ministry of Science and ICT) (No. NRF-2017R1C1B5018038).
Copyright
Copyright © 2022 Taewook Kim and Ha Young Kim. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Source: https://www.hindawi.com/journals/complexity/2019/3582516/
Posted by: evanshicustant.blogspot.com