strategies to focus on one trading pair

Many researchers have tried to optimize pairs trading as the number of opportunities for arbitrage profit has gradually decreased. Pairs trading is a market-neutral strategy; it profits if the given condition is satisfied within a given trading window, and if not, there is a risk of loss. In this study, we propose an optimized pairs-trading strategy using deep reinforcement learning, particularly the deep Q-network, utilizing various trading and stop-loss boundaries. More specifically, if spreads hit trading thresholds and reverse to the mean, the agent receives a positive reward. However, if spreads hit stop-loss thresholds or fail to reverse to the mean after hitting the trading thresholds, the agent receives a negative reward. The agent is trained to select the optimum level of discretized trading and stop-loss boundaries given a spread to maximize the expected sum of discounted future profits. Pairs are selected from stocks on the S&P 500 Index using a cointegration test. We compared our proposed method with traditional pairs-trading strategies which use constant trading and stop-loss boundaries. We find that our proposed model is trained well and outperforms traditional pairs-trading strategies.

1. Introduction

Pairs trading is a method for obtaining arbitrage profit when there is a statistical difference between two stocks with similar characteristics that are cointegrated or highly correlated. This is possible because the spread formed by two such stocks is mean-reverting in the long run [1]. In the early days, pairs-trading methods were popular because of the opportunity to obtain arbitrage profit [1–4]. However, as many investors, including hedge funds, sought these arbitrage opportunities by executing the pairs-trading strategy, its profitability began to deteriorate [5, 6]. To overcome these shortcomings, significant research has been conducted to improve the pairs-trading strategy [7–10].

The mechanism of pairs trading is as follows. First, a pair of stocks with similar trends is identified. Second, regression analysis such as ordinary least squares (OLS), total least squares (TLS), or error correction models (ECM) is used to calculate the spread of these stocks. Finally, if the spread hits preset boundaries, investors open a portfolio which takes a long position in the undervalued stock and a short position in the overvalued stock. Subsequently, if the spread reverses to the mean, investors close the portfolio by taking the position opposite to the open portfolio. In this case, the investor obtains an arbitrage profit by executing this strategy. However, there is a risk when the spread does not reverse to the mean. In such a situation, investors are at high risk because they cannot close the portfolio. By setting a stop-loss boundary, investors can hedge the risk [11–13].

Many researchers have applied various statistical methods to improve the efficiency and performance of pairs trading. In particular, they concentrated on using the spread as a trading signal. The study in [1] selected pairs of stocks by minimizing the sum of squared deviations between the two stocks and then executed the trading strategy if the difference between the pairs exceeded twice the standard deviation of the spread. They used normalized US stock price data from 1962 to 2002 to test the profitability of pairs trading. The study in [14] used the cointegration approach to protect the pairs-trading strategy from severe losses. They applied an OLS method to create a spread and set various conditions that translated into trading actions. From these models, they achieved a trading strategy with a minimum level of profits protected from risk of loss. The results showed about an 11% annualized excess return over the entire period. The research in [15] compared the distance and cointegration approaches on both high-frequency and daily datasets to check whether pairs trading is profitable for Norwegian seafood companies. The performance was similar between the two approaches. Reference [16] used a Kalman filter to calculate the spread, which was then used as a high-frequency trading signal, on the shares constituting the KOSPI 100 Index. He found that the pairs-trading strategy's performance was significant on the KOSPI and was better during daily market conditions at market opening and closing. Furthermore, [7] optimized a pairs-trading system as a stochastic control problem. They used the Ornstein-Uhlenbeck process to calculate the spread as a trading signal and tested their model with simulated data; the results showed that their strategy performs well. In addition, [17] suggested the Ornstein-Uhlenbeck process to model market microstructure noise used as a trading signal in the pairs-trading strategy. The performance is better under this method than with traditional estimators such as ARIMA(1,1) and maximum likelihood. Reference [18] applied a cointegration method to Chinese commodity futures from 2006 to 2022 to check whether pairs trading was suitable in that market. They used OLS regression to create spreads from the pairs. Furthermore, [10] applied a cointegration test to various pairs of stocks and a vector error-correction model to create a trading signal.

It is important to set a boundary to optimize the pairs-trading strategy. This boundary is a criterion for deciding whether to execute a pairs-trading strategy. If a low boundary is set, many trades will be executed, but profits will be lower; if a high boundary is set, investors will get high returns when the strategy is executed. Nonetheless, all this assumes that mean reversion occurs. If the spread does not return to the average within the specified trading window, losses will be incurred. If a low boundary is set, the loss will be small. However, if the strategy is executed with a high boundary, the loss will increase. Therefore, the performance of pairs trading depends on how the boundary is set. Reference [14] suggested taking a minimum-profit condition, which could be efficient in reducing losses in a pairs-trading system. They set a trading rule with various open conditions: for example, if the spread is above 0.3, 0.5, 0.75, 1.0, and 1.5 standard deviations. They used the daily closing prices from January 2, 2001, to August 30, 2002, of two stocks, the Australia New Zealand Bank and the Adelaide Bank. The results showed that, as the open condition value decreases, the number of trades and profits increase. Also, [19] recommended optimal preset boundaries calculated from estimated parameters for the average trade duration, intertrade interval, and number of trades and used them to maximize the minimum total profit. They used the daily closing price data from January 2, 2004, to June 30, 2005, of seven pairs of stocks on the Australian Stock Exchange. The results showed that their proposed method was efficient in making profits using the pairs-trading strategy. Reference [18] examined whether the pairs-trading strategy could be applied to the daily returns of Chinese commodity futures from 2006 to 2022 using three methods: classical, closed-loop, and dynamic stop-loss. The closed-loop method takes only a stop-profit barrier which executes the strategy and does not consider the risk that spreads fail to revert to the mean. The classical method adds stop-loss boundaries to the closed-loop method. The dynamic stop-loss method uses a variety of stop-profit and stop-loss barriers to fit the spreads if the spread is larger than the standard deviation, which is set using criteria based on the historical average of spreads. The results showed that these methods obtained an annualized return of over 15%, especially the closed-loop method, which yielded the highest profit of 26.94%. In addition, [20] experimented with fixed optimal threshold selection, conditional volatility, percentile, spectral analysis, and neural network thresholds in a pairs-trading strategy. Of these, the neural network threshold outperformed all other strategies.

Following the success of reinforcement learning, demonstrated by its performance at Atari games [21], many researchers have attempted to apply this algorithm to financial trading systems. Reference [22] proposed a deep Q-trading system using reinforcement learning methods. They applied Q-learning to a trading system to trade automatically. They set a delta price using data from the past 120 days, had three discrete action spaces (buy, hold, and sell), and used long-term profit as a reward. They used daily data from January 01, 2001, to December 31, 2022, of the Hang Seng Index and the S&P 500 Index. The experimental results showed that their proposed method outperformed buy-and-hold strategies and recurrent reinforcement learning methods. Reference [23] proposed three steps to apply reinforcement learning to the financial trading system. First, they reduced the replay memory size to fit financial trading. Second, they proposed an action-augmentation technique that provides more feedback from the action to the agent. Third, they used long sequences as reinforcement data to conduct recurrent neural network training. The experimental data comprised tick-by-tick data of 12 forex currency pairs from January 2012 to December 2022. The results showed that the action-augmentation technique yielded more profit than an epsilon-greedy policy. Reference [10] used an N-armed bandit problem to optimize the pairs-trading strategy. They computed the spread using an error-correction model and chose the parameters using a grid-search algorithm. They compared their proposed model with a constant-parameter model, which was similar to a traditional pairs-trading strategy. They used intraday one-minute data of some stocks in the FactSet database from June 2022 to January 2022. The performance of their proposed model was better than that of the constant-parameter model.

We investigate not only whether a dynamic boundary based on the spread in each trading window can achieve higher profit than the fixed boundary used in traditional pairs-trading strategies, but also whether it is possible to train deep reinforcement learning methods to follow this mechanism. To this end, we propose a new method to optimize the pairs-trading strategy using deep reinforcement learning, especially deep Q-networks, since the pairs-trading strategy can be thought of as a game. After opening a portfolio position, the profit is determined by whether the portfolio is closed or reaches the stop-loss position. Therefore, if we set this strategy as a game by setting boundaries which are optimized for the spreads in each trading window, we can achieve more profit than traditional pairs-trading strategies. In particular, we set the pairs-trading system to be a kind of game and obtain the optimal boundaries, that is, trading thresholds and stop-loss thresholds, according to the calculated spread. The reason for this construction is that if the portfolio is opened and then closed within the trading window, it will be unconditionally profitable. If the portfolio reaches the stop-loss boundary or does not converge to the mean, losses may occur. We therefore set the DQN to learn by positively rewarding it if it takes a closed position and negatively rewarding it if it reaches the stop-loss or exit thresholds. We conducted the following experiments to verify that our proposed method is optimized compared to the traditional method. First, we used different spreads calculated using OLS and TLS to see how the results differ depending on the spread used as input. Second, depending on the formation window and trading window, the spread and hedge ratio will differ. We therefore set a total of six window sizes and selected the window size which had the best performance. Finally, we compared the proposed method with the traditional pairs-trading strategy using the test data with the optimal window size. In this experiment, we use the daily adjusted closing prices from January 2, 1990, to July 31, 2022, of 50 stocks in the S&P 500 Index. Experimental results show that our proposed method outperforms the traditional pairs-trading strategy across all the pairs. In addition, we can confirm that the performance measure varies according to the spread.

The main contributions of this study are as follows. First, we propose a novel method to optimize the pairs-trading strategy using deep reinforcement learning, especially deep Q-networks with trading and stop-loss boundaries. The experimental results show that our method can be applied in the pairs-trading system and also to various other fields, including finance and economics, when there is a need to optimize a rule-based strategy to be more efficient. Second, we propose an optimized dynamic boundary based on the spread in each trading window. Our proposed method outperforms the traditional pairs-trading strategy, which sets a fixed boundary. Last, we find that our method outperforms the traditional pairs-trading strategy in all pairs based on constituent stocks in the S&P 500. Since our method selects optimal boundaries based on spreads, it can be applied to other stock markets such as KOSPI, Nikkei, and Hang Seng. It should be noted that the present work is a part of the original thesis [24].

The rest of this paper is organized as follows. Section 2 explains the technical background. Section 3 describes the materials and methods. Section 4 shows the results and provides a discussion of the experiments. Section 5 provides our conclusions.

2. Technical Background

2.1. The Traditional Pairs-Trading Strategy

Pairs trading is a representative market-neutral trading strategy which simultaneously longs an undervalued stock and shorts an overvalued stock. This strategy is a form of statistical arbitrage trading that assumes the movements of the prices of the two assets will be similar to previous trends [1]. It follows the assumption that asset prices will return to the long-term equilibrium. This strategy started from the idea that arbitrage opportunities exist when the price gap between two assets expands to or past a certain level. It is also based on the belief that historical price movements will not change significantly in the future.

In Figure 1, the graph drawn in blue is a spread made of two stocks that are cointegrated, the red lines are the trading boundaries, and the green lines are the stop-loss boundaries. When this spread reaches the trading boundaries, the portfolio is opened and only closed when the spread returns to the average. However, losses are incurred when the spread hits the stop-loss boundaries after the portfolio is opened and does not return to the average. Furthermore, after the portfolio is opened, if the trading signal does not revert to the mean during the trading window, the portfolio is closed by force; this is called the exit of the portfolio.
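The three outcomes described above (closed, stop-loss, exit) can be illustrated with a minimal sketch. The function name `run_trading_window`, the use of a Z-scored spread with mean 0, the rule of not opening a position beyond the stop-loss boundary, and the restriction to one position per window are all assumptions made for illustration, not the exact rules of the paper.

```python
def run_trading_window(z_scores, trade_bound, stop_bound):
    """Classify one trading window as 'closed', 'stop_loss', 'exit', or 'none'.

    z_scores: spread values expressed as Z-scores, so the mean is 0.
    trade_bound, stop_bound: positive thresholds with stop_bound > trade_bound.
    Only the first position opened in the window is tracked (assumption).
    """
    side = 0  # 0 = no position, +1 = opened above the mean, -1 = opened below
    for z in z_scores:
        if side == 0:
            # Open when the spread is between the trading and stop-loss boundaries.
            if trade_bound <= abs(z) < stop_bound:
                side = 1 if z > 0 else -1
        else:
            if z * side >= stop_bound:
                return "stop_loss"  # spread diverged further on the same side
            if z * side <= 0:
                return "closed"     # spread returned to (or crossed) the mean
    # Window ended: a position that is still open is closed by force.
    return "exit" if side != 0 else "none"
```

For example, a window whose signal rises past the trading boundary and then falls back through zero is classified as closed, while one that keeps rising past the stop-loss boundary is classified as a stop-loss.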

2.1.1. The Cointegration Test

There are many approaches for pair selection such as the distance approach [11, 25–27], the cointegration approach [10, 16, 27], and the stochastic approach [7, 8]. In this study, we use the cointegration approach to choose pairs which have long-term equilibrium. Generally, a linear combination of nonstationary variables is also nonstationary. Assume that two variables have unit roots; as previously mentioned, a linear combination of these variables is generally nonstationary. However, it can be stationary if the nonstationary variables are cointegrated. In this case, the regression must be checked to determine whether it is a spurious regression or a cointegrating one. Johansen's method is widely used to test for cointegration [28]. In this method, the number of cointegration relations and the parameters of the model are estimated and tested using maximum likelihood estimation (MLE). Since all variables are regarded as endogenous variables, there is no need to select dependent variables, and multiple cointegration relationships can be identified. In addition, we use MLE to estimate the cointegration relation with the vector autoregression model and to determine the cointegration coefficient based on the likelihood-ratio test. The method therefore has the advantage of supporting various hypothesis tests related to the estimation of cointegration parameters and the setting of other models when there is cointegration, and not only a test for cointegration itself.
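As a compact restatement of the cointegration idea in standard notation (the symbols $x_t$, $y_t$, $\alpha$, $\beta$, and $\varepsilon_t$ are assumptions chosen here for illustration), two unit-root series are cointegrated when some linear combination of them is stationary:

```latex
x_t \sim I(1), \quad y_t \sim I(1), \qquad
\varepsilon_t = y_t - \alpha - \beta x_t \sim I(0)
```

If no such $\beta$ exists, a regression of $y_t$ on $x_t$ is spurious and its residual cannot serve as a mean-reverting spread.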

2.2. Spread Calculation

2.2.1. Ordinary Least Squares

In regression analysis, OLS is widely used to estimate parameters by minimizing the sum of the squared errors [29]. Assume that $x_t$, $y_t$, and $\varepsilon_t$ are an independent variable, a dependent variable, and an error term. We can estimate the slope $\beta$ by setting the partial derivatives of the sum of squared errors to zero. The value of $\beta$ obtained from equation (5) is used for the number of stock orders. The value of $\varepsilon_t$ is also used as a trading signal through Z-scoring, in the state composed of the formation-window size.
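A minimal sketch of this estimate follows, assuming the simple regression $y = \alpha + \beta x + \varepsilon$ with an intercept. The function name `ols_fit` and the inclusion of the intercept are assumptions for illustration, not details confirmed by the text.

```python
def ols_fit(x, y):
    """Return (alpha, beta, residuals) for the model y = alpha + beta * x + eps."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Closed-form slope from the first-order conditions of the squared-error sum.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    beta = s_xy / s_xx                # hedge ratio
    alpha = mean_y - beta * mean_x    # intercept
    residuals = [yi - alpha - beta * xi for xi, yi in zip(x, y)]  # spread
    return alpha, beta, residuals
```

Here `beta` plays the role of the hedge ratio and `residuals` the role of the spread that is later converted to a Z-score.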

2.2.2. Total Least Squares

TLS estimates parameters by minimizing the orthogonal distance between the observations and the regression line, which accounts for both horizontal and vertical deviations [30]. Since the orthogonal distance does not change when the X and Y coordinates are swapped, the value of $\beta$ is calculated consistently. In the TLS method, the observed values of both variables contain error terms: each observed value is the sum of a true value and an error term, and the error terms follow independent and identical distributions. It is assumed that the true values are linearly related. For convenience, the error variance ratio is represented in equation (10). The orthogonal regression estimator is calculated in equation (11) by minimizing the orthogonal distance between the observations and the regression line. The value obtained from equation (12) is used in the same way as that obtained from equation (5), and the value of $\varepsilon_t$ is also used as a trading signal through the Z-score in the state composed of the formation-window size.
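The symmetry property mentioned above can be checked with a small sketch based on the standard Deming-regression slope, which reduces to orthogonal regression when the error variance ratio is 1. The function name `tls_slope` and the convention `delta = var(error in y) / var(error in x)` are assumptions; the paper's own equations (10) to (12) may use a different parameterization.

```python
import math


def tls_slope(x, y, delta=1.0):
    """Deming-regression slope of y on x.

    delta is the assumed ratio var(error in y) / var(error in x);
    delta = 1 gives orthogonal (total least squares) regression.
    Requires a nonzero sample covariance between x and y.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    diff = s_yy - delta * s_xx
    return (diff + math.sqrt(diff ** 2 + 4.0 * delta * s_xy ** 2)) / (2.0 * s_xy)
```

With `delta = 1`, swapping the roles of the two series gives the reciprocal slope, so `tls_slope(x, y) * tls_slope(y, x)` equals 1 up to rounding. The OLS slopes of y on x and of x on y do not have this property unless the fit is perfect.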

2.3. Reinforcement Learning and the Deep Q-Network

The goal of reinforcement learning is to find an optimal policy which maximizes the expected sum of discounted future rewards [31]. These rewards come from selecting the optimal value of each action, called the optimal Q-value. Reinforcement learning basically solves the problem defined by the Markov decision process (MDP). It consists of a tuple $(S, A, P, R, \gamma)$, where $S$ is a finite set of states, $A$ is a finite set of actions, $P$ is a state transition probability matrix, $R$ is a reward function, and $\gamma$ is a discount factor. In environment $\mathcal{E}$, the agent observes state $s_t$ at time $t$ and selects action $a_t$. From the results of these sequences, environmental feedback is provided to the agent in the form of reward $r_t$ and next state $s_{t+1}$. An action is chosen by the action-value function, which represents the expected sum of discounted future rewards. From this action-value function $Q(s, a)$, we find an optimal action-value function $Q^*(s, a)$, following an optimal policy which maximizes the expected sum of discounted future rewards. This optimal action-value function can be written as the Bellman equation. The DQN uses a nonlinear function approximator to estimate the action-value function. This network is trained by minimizing a sequence of loss functions $L_i(\theta_i)$, which changes with each iteration $i$. The weight $\theta_i$ is updated as the sequence progresses.
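For reference, the Bellman equation, the DQN loss, and its gradient take the following standard form in the deep Q-learning literature [21]. The notation is an assumption chosen to match the usual presentation, and the gradient is stated up to a constant factor.

```latex
\begin{aligned}
Q^*(s, a) &= \mathbb{E}_{s'}\!\left[\, r + \gamma \max_{a'} Q^*(s', a') \,\middle|\, s, a \right] \\
L_i(\theta_i) &= \mathbb{E}_{s, a}\!\left[ \bigl( y_i - Q(s, a; \theta_i) \bigr)^2 \right],
\qquad y_i = \mathbb{E}_{s'}\!\left[\, r + \gamma \max_{a'} Q(s', a'; \theta_{i-1}) \,\middle|\, s, a \right] \\
\nabla_{\theta_i} L_i(\theta_i) &= \mathbb{E}_{s, a, s'}\!\left[ \Bigl( r + \gamma \max_{a'} Q(s', a'; \theta_{i-1}) - Q(s, a; \theta_i) \Bigr) \nabla_{\theta_i} Q(s, a; \theta_i) \right]
\end{aligned}
```

The target $y_i$ is computed with the parameters from the previous iteration, $\theta_{i-1}$, which are held fixed while $L_i(\theta_i)$ is minimized.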

3. Materials and Methods

3.1. Data

In this study, 50 stocks from the S&P 500 Index were selected based on their trading volume and market capitalization. To carry out the experiment, the data must cover the same period. Thus, only stocks with data covering the same period were kept, leaving a total of 25 stocks. Table 1 lists the tickers, stock names, and sectors of those stocks. We collected the adjusted daily closing prices using Thomson Reuters' database. The period of the training dataset is from January 2, 1990, to December 31, 2008, comprising 4792 data points; the test dataset covers the period from January 2, 2009, to July 31, 2022, comprising 2411 data points. From these datasets, pairs of stocks are selected during the training dataset period using the cointegration test.


No.	Ticker	Stock	Sector

1	AAPL	Apple Inc.	Technology
2	MSFT	Microsoft Corporation	Technology
3	BRKa	Berkshire Hathaway Inc.	Financial Services
4	JPM	JPMorgan Chase & Co.	Financial Services
5	JNJ	Johnson & Johnson	Healthcare
6	XOM	Exxon Mobil Corporation	Energy
7	BAC	Bank of America Corporation	Financial Services
8	WFC	Wells Fargo & Company	Financial Services
9	WMT	Walmart Inc.	Consumer Defensive
10	UNH	UnitedHealth Group Incorporated	Healthcare
11	CVX	Chevron Corporation	Energy
12	T	AT&T Inc.	Communication Services
13	PFE	Pfizer Inc.	Healthcare
14	ADBE	Adobe Systems Incorporated	Technology
15	MCD	McDonald's Corporation	Consumer Cyclical
16	MDT	Medtronic plc	Healthcare
17	MMM	3M Company	Industrials
18	HON	Honeywell International Inc.	Industrials
19	GE	General Electric Company	Industrials
20	ABT	Abbott Laboratories	Healthcare
21	MO	Altria Group, Inc.	Consumer Defensive
22	UNP	Union Pacific Corporation	Industrials
23	TXN	Texas Instruments Incorporated	Technology
24	UTX	United Technologies Corporation	Industrials
25	LLY	Eli Lilly and Company	Healthcare

3.2. Selecting Pairs Using the Cointegration Test

It is necessary to find stocks which have long-term statistical relationships or similar price movements. The correlation value makes it possible to determine the degree to which two stocks have had similar price movements. Furthermore, the long-term equilibrium of a pair of stocks is an essential characteristic for the execution of pairs trading. In this study, we used the cointegration approach to select pairs of stocks. Through Johansen's method, we selected 11 pairs of stocks that have long-run equilibria. Table 2 shows the resulting pairs of stocks that were identified based on t-statistics, and Figure 2 shows price movements of the cointegrated stocks XOM and CVX. Using this dataset, we will verify whether our proposed method has better performance than the traditional pairs-trading method.


No.	Pairs	t-statistic	Correlation

1	MSFT/JPM	−3.5423	0.9165
2	MSFT/TXN	−3.448	0.8641
3	BRKa/ABT	−3.5148	0.9493
4	BRKa/UTX	−3.3992	0.9609
5	JPM/T	−3.5882	0.8486
6	JPM/HON	−5.8209	0.9250
7	JPM/GE	−3.4494	0.9105
8	JNJ/WFC	−3.5696	0.9693
9	XOM/CVX	−4.05	0.9879
10	HON/TXN	−4.0625	0.7469
11	GE/TXN	−3.467	0.9148

Note: the significance markers denote a rejection of the null hypothesis at the 1% and 5% significance levels, respectively.

3.3. Trading Signal

After selecting the pairs, it is essential to extract the signal for trading. To extract signals, we use the OLS or TLS methods. First, because the stock price follows a random walk [32], we need to confirm that it follows an I(1) process through the augmented Dickey-Fuller test. Afterwards, an I(0) process is created using the log difference in stock prices, which is then applied to the OLS and TLS methods. In equation (18), $\alpha$ is a constant value, $\beta$ is a hedge ratio (which is used as the trading size), $\varepsilon_t$ is the error term, and the remaining two terms are the log differences in the prices of the two stocks at time $t$. We convert values of $\varepsilon_t$ into a Z-score used as a trading signal. For instance, if the trading signal reaches the threshold, we short one share of the overvalued stock and long $\beta$ shares of the undervalued stock. The hedge ratio is determined based on the window size. We set a total of six discrete window sizes to find the optimal window size for the experiment. Trading windows are set to half of the formation-window size. The spread obtained here is used as a state when applying reinforcement learning (i.e., as an input of the DQN).
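The preprocessing steps above (log differences, Z-scoring, and a trading window of half the formation window) can be sketched as follows. The helper names `log_differences` and `z_scores` are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator is used.

```python
import math
import statistics


def log_differences(prices):
    """Log price differences: ln(p_t) - ln(p_{t-1}) for consecutive prices."""
    return [math.log(cur / prev) for prev, cur in zip(prices, prices[1:])]


def z_scores(spread):
    """Standardize a spread series over its window (mean 0, unit deviation)."""
    mean = statistics.fmean(spread)
    std = statistics.pstdev(spread)  # assumption: population standard deviation
    return [(s - mean) / std for s in spread]


# The trading window is half of the formation window, e.g. 30 and 15 days.
formation_window = 30
trading_window = formation_window // 2
```

In this sketch the Z-scored residuals from the formation window would form the state passed to the DQN.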

3.4. Proposed Method: Optimized Pairs-Trading Strategy Using the DQN Method

In this study, we optimize the pairs-trading strategy as a type of game using the DQN. We attempt to implement an optimal pairs-trading strategy by taking optimal trading and stop-loss boundaries that correspond to the given spread, since performance depends on how trading and stop-loss boundaries are set in pairs trading [14]. Figure 3 shows the mechanism of our proposed pairs-trading strategy. Through the cointegration test, we identify pairs and, using regression analysis, obtain a hedge ratio used as the trading volume and a spread used as a trading signal and state. In the case of the DQN, two hidden layers are set up and the number of neurons is set to half of the input size, chosen through trial and error. Action values consist of the six discrete spaces in Table 3. Each action has values for the trading and stop-loss boundaries.


	Action
	A0	A1	A2	A3	A4	A5

Trading boundary
Stop-loss boundary

A pairs-trading system can make a profit if the spread touches the threshold and returns to the average such that the portfolio is closed in each trading window. On the other hand, if the trading boundary is touched and the stop-loss boundary is reached, the system tries to minimize losses by stopping trades. If the spread touches the trading boundary but fails to return to the average, the strategy may end up with a profit or a loss. In this study, the pairs-trading strategy is therefore considered as a kind of game; closing a portfolio yields a positive reward and a portfolio that reaches its stop-loss threshold yields a negative reward. Although an exited portfolio may possibly give a positive profit, there is also a possibility that losses will occur, and it is therefore set to yield a negative reward. We set the rewards for the other conditions (such as maintaining the portfolio or not opening a portfolio) to zero so as to concentrate on the close, stop-loss, and exit positions. We fix the values of portfolio close, stop-loss, and exit to +1000, −1000, and −500, respectively. When we update the Q-values, we must consider the reward as a significant component of efficiently training the DQN. We therefore set the reward value to have a range similar to that of the Q-value. Additionally, we included the proportional profit or loss value to reflect that weight after the trading ended. Equation (19) is defined in terms of the stock orders of the two stocks and the prices of the two stocks at two points in time.
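The fixed part of this reward scheme can be written as a small lookup. This sketch covers only the constants stated above; the proportional profit-or-loss term of equation (19) is deliberately left out because its exact form is not given here. The names `REWARDS` and `base_reward` and the outcome labels are hypothetical.

```python
# Fixed rewards stated in the text for the three terminal outcomes.
REWARDS = {
    "closed": 1000.0,      # spread returned to the mean
    "stop_loss": -1000.0,  # spread hit the stop-loss boundary
    "exit": -500.0,        # window ended with the position still open
}


def base_reward(outcome):
    """Return the fixed reward; any other outcome (hold, no trade) scores zero."""
    return REWARDS.get(outcome, 0.0)
```

Under this scheme an exit is penalized half as heavily as a stop-loss, which is consistent with exits sometimes ending in a small profit.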

Algorithm 1 shows the process of our proposed method. Before we start, we set a replay memory and batch size and select pairs using the cointegration test. At each epoch, we initialize total profit to 1.0. In the training process, we set a state which contains the spreads within the formation window and select actions which are used as trading and stop-loss boundaries. Throughout the trading window, we execute a strategy identical to a traditional pairs-trading strategy using the action selected. After executing the strategy, we obtain a reward based on the results of the portfolio. Finally, for the Q-learning process, we update the Q-network by performing a gradient descent step.

Initialize replay memory $D$ and batch size
Initialize deep Q-network
Select pairs using cointegration test
(1) For each epoch do
(2) Profit = 1.0
(3) For steps t = 1, … until end of training data set do
(4) Calculate spreads using OLS or TLS methods
(5) Obtain initial state $s_t$ by converting spread to Z-score based on formation window
(6) Using epsilon-greedy method, with probability $\epsilon$ select a random action $a_t$
(7) Otherwise select $a_t = \arg\max_a Q(s_t, a; \theta)$
(8) Execute traditional pairs-trading strategy based on the action selected
(9) Obtain reward $r_t$ by performing the pairs-trading strategy
(10) Set next state $s_{t+1}$
(11) Store transition $(s_t, a_t, r_t, s_{t+1})$ in $D$
(12) Sample minibatch of transitions $(s_j, a_j, r_j, s_{j+1})$ from $D$
(13) Set $y_j = r_j + \gamma \max_{a'} Q(s_{j+1}, a'; \theta)$
(14) Update Q-network by performing a gradient descent step on $(y_j - Q(s_j, a_j; \theta))^2$
(15) End
(16) End
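Two steps of Algorithm 1, epsilon-greedy action selection and the target value used in the gradient step, can be sketched in a few lines. The function names `select_action` and `td_target` are hypothetical, the Q-values are passed in as a plain list rather than computed by a network, and the `terminal` flag is an added assumption for episodes that end.

```python
import random


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy choice over the discrete boundary actions (e.g. A0..A5)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: uniform random action
    # Exploit: index of the largest estimated Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_values, gamma, terminal=False):
    """One-step target y = r + gamma * max_a' Q(s', a'), or r if terminal."""
    if terminal:
        return reward
    return reward + gamma * max(next_q_values)
```

With `epsilon = 0` the choice is fully greedy, and with `epsilon = 1` it is uniformly random over the six actions.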

3.5. Performance Measure

We evaluate our experimental results based on profit, maximum drawdown, and the Sharpe ratio. Profit is commonly used as a performance measure for trading strategies. It is calculated as the sum of returns taking into consideration the trading cost. Since many trades can increase total profit, it is necessary to determine the total profit taking into consideration transaction costs, which depend on trading volume. In this study, we set a trading cost of 5 bp; equation (21) is almost the same as equation (19), but it does not include the absolute value and adds a trading cost term. Maximum drawdown represents the maximum cumulative loss from the highest to the lowest value of the portfolio during a given investment period. The Sharpe ratio is an indicator of the degree of excess profit from investing in risky assets and is used in evaluating portfolios [33]. Equation (23) divides the difference between the expected portfolio return and the risk-free rate, which we set to 0, by the standard deviation of portfolio returns.
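The two risk measures can be sketched as follows. The function names are hypothetical, drawdown is expressed as a negative fraction of the running peak (an assumption that matches the sign of the reported MDD values), and the sample standard deviation is assumed for the Sharpe ratio.

```python
import statistics


def max_drawdown(values):
    """Largest peak-to-trough decline of a portfolio value series.

    Returned as a negative fraction of the running peak (0.0 if no decline).
    """
    peak = values[0]
    mdd = 0.0
    for v in values:
        peak = max(peak, v)
        mdd = min(mdd, (v - peak) / peak)
    return mdd


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by the sample standard deviation of returns."""
    excess = [r - risk_free for r in returns]
    return statistics.fmean(excess) / statistics.stdev(excess)
```

For a value path that rises to 1.2 and then falls to 0.9 before recovering, the drawdown is (0.9 − 1.2) / 1.2, that is, a 25% decline from the peak.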

4. Results and Discussion

We use the stock pair XOM and CVX, which rejects the null hypothesis at the 1% significance level, to verify whether our proposed model is trained well. The window sizes, namely the formation window and trading window, are chosen from the performance results with the training dataset. From these results, we select an optimized window size and compare our proposed model with traditional pairs trading, which takes a constant set of actions, on the test dataset.

4.1. Training Results

To find the optimal window size for the optimized pairs-trading system, we experimented with six window sizes. The results for each window size are calculated by averaging the top-5 results for a total of 11 pairs. From Tables 4 and 5, we find that the best performance is obtained when the formation and trading windows are 30 and 15 days, respectively, based on the profit generated by both the OLS and TLS methods. When we trained our networks, we set a positive reward for taking more closed positions and fewer stop-loss and exit positions. The lowest ratio of closed positions to open positions (0.68) occurs for the formation and trading windows of 30 and 15 days, whereas the highest ratio (0.73) occurs for the windows of 120 and 60 days. Nevertheless, the highest profit is reported for the windows of 30 and 15 days. This can be explained by the ratio of stop-loss positions to open positions: it is 0.13 for the windows of 30 and 15 days but 0.20 for the windows of 120 and 60 days. This result indicates that it is important to reduce the number of stop-loss positions while increasing the number of closed positions.

In addition, the trading signals made with the TLS method are better than those made with the OLS method for all six window sizes. The reason lies in the difference between the hedge ratios of the two methods. In OLS, one side is taken as the reference and the relative change of the other side is estimated. Because OLS assumes that there is no error term on the reference side and an error term only on the other side, the hedge ratio varies depending on which side is used as the reference. In TLS, however, the hedge ratios are consistent regardless of which side is used as the reference. For this reason, the experimental results confirm that the TLS method is better able to determine when to execute the pairs-trading strategy. Based on these results, we use the optimal window size when we verify our proposed method on the test dataset. However, we first need to check that the proposed model is well trained.
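The dependence of the OLS hedge ratio on the reference side can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic series, the noise levels, and the function names `ols_slope` and `tls_slope` are assumptions made for illustration. The TLS slope uses the standard closed form for orthogonal regression in two variables.

```python
import math
import random

def _moments(x, y):
    """Return the sample variance of x, the variance of y, and their covariance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return sxx, syy, sxy

def ols_slope(x, y):
    """Hedge ratio from regressing y on x (x is treated as error-free)."""
    sxx, _, sxy = _moments(x, y)
    return sxy / sxx

def tls_slope(x, y):
    """Hedge ratio from orthogonal (total least squares) regression of y on x."""
    sxx, syy, sxy = _moments(x, y)
    return (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)

# Synthetic series (assumption): both sides observe a common factor with noise.
rng = random.Random(0)
factor = [rng.gauss(0, 1) for _ in range(500)]
a = [f + rng.gauss(0, 0.3) for f in factor]
b = [1.5 * f + rng.gauss(0, 0.3) for f in factor]

ols_ab, ols_ba = ols_slope(a, b), ols_slope(b, a)
tls_ab, tls_ba = tls_slope(a, b), tls_slope(b, a)

# OLS: the product of the two slopes is the squared correlation, which is below 1,
# so the implied hedge ratio changes with the choice of reference side.
print("OLS product:", ols_ab * ols_ba)
# TLS: the two slopes are reciprocals, so both choices describe the same line.
print("TLS product:", tls_ab * tls_ba)
```

In this sketch, swapping the reference side changes the OLS line but leaves the TLS line unchanged, which is the property the text relies on.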


Formation window	Trading window	MDD	Sharpe ratio	Profit	# of open portfolios	# of closed portfolios	# of stop-loss portfolios	# of exited portfolios

30	15	−0.3682	0.1197	2.7344	328	225	44	58
60	30	−0.3779	0.1327	2.5627	210	147	41	21
90	45	−0.4052	0.1409	2.4112	160	114	34	11
120	60	−0.4383	0.1165	2.0287	134	98	28	8
150	75	−0.4395	0.1244	2.0098	110	80	24	6
180	90	−0.5045	0.1180	1.9390	100	73	21	5
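The ratios quoted in the text can be recomputed from the counts in the table above. The snippet below is only a worked check; the dictionary layout and variable names are assumptions, while the counts are copied from the table rows.

```python
# Counts copied from the table above: (formation, trading) -> (open, closed, stop-loss).
counts = {
    (30, 15): (328, 225, 44),
    (120, 60): (134, 98, 28),
}

ratios = {}
for window, (n_open, n_closed, n_stop) in counts.items():
    # Ratio of closed positions and of stop-loss positions to open positions.
    ratios[window] = (n_closed / n_open, n_stop / n_open)
    print(window, "closed/open = %.4f, stop-loss/open = %.4f" % ratios[window])
```

Truncated to two decimals, these ratios are 0.68 and 0.13 for the 30/15 windows and 0.73 and 0.20 for the 120/60 windows.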


Formation window	Trading window	MDD	Sharpe ratio	Profit	# of open portfolios	# of closed portfolios	# of stop-loss portfolios	# of exited portfolios

30	15	−0.4422	0.1061	2.9436	320	229	46	44
60	30	−0.5031	0.1143	2.5806	204	144	42	17
90	45	−0.5824	0.1072	2.4588	155	110	36	9
120	60	−0.5768	0.1181	2.4378	136	98	31	6
150	75	−0.5805	0.1245	2.4127	110	79	26	5
180	90	−0.5467	0.1209	2.3570	100	72	23	4

It is important to check whether our reinforcement learning algorithm is trained well. Reference [21] suggested that a steadily increasing average of Q-values is evidence that the DQN is learning well. Figure 4(a) shows the average Q-values of HON and TXN as training progressed. We find that the average Q-values steadily increased, indicating that our proposed model is properly trained. Additionally, we provide a positive reward when the portfolio closes and a negative reward when the portfolio reaches the stop-loss threshold or exits. Figure 4(b) shows the ratio of the number of portfolio positions as training progressed. The ratio of closed to open portfolio positions increased, and the ratio of portfolios reaching their stop-loss thresholds to open portfolio positions decreased. We also find that the ratio of portfolio exits to open portfolio positions slightly increased. It is possible that the rewards given for an open portfolio position are relatively small compared with those given for a closed portfolio position. The DQN is therefore trained to prevent portfolios from reaching their stop-loss thresholds (the more significant target) rather than to prevent exits. This result can also serve as a basis for judging whether the proposed model is being trained properly.
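The training diagnostic described above can be sketched as follows. This is a hedged illustration rather than the authors' code: the Q-value snapshots are hypothetical numbers, and the helper names `average_max_q` and `is_trending_up` are assumptions. Six actions per state are used to mirror actions 0 to 5.

```python
def average_max_q(q_table):
    """Mean over states of the largest action value (one row of Q-values per state)."""
    return sum(max(row) for row in q_table) / len(q_table)

def is_trending_up(series, window=3):
    """Compare the mean of the last `window` points with the mean of the first `window`."""
    head = sum(series[:window]) / window
    tail = sum(series[-window:]) / window
    return tail > head

# Hypothetical snapshots: 10 epochs, 4 held-out states, 6 actions per state.
snapshots = [
    [[0.05 * epoch + 0.01 * action for action in range(6)] for _state in range(4)]
    for epoch in range(10)
]

# One average Q-value per epoch, which is the curve one would plot during training.
curve = [average_max_q(snapshot) for snapshot in snapshots]
print("average Q per epoch:", [round(v, 2) for v in curve])
print("trending up:", is_trending_up(curve))
```

In practice the snapshots would come from evaluating the network on a fixed set of states after each epoch, so that changes in the curve reflect learning rather than a changing evaluation set.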

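Figure 4: (a) Average Q-values of HON and TXN as training progressed; (b) ratios of the number of portfolio positions as training progressed.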

Tables 6 and 7 present the performance results of XOM and CVX in the training dataset. We call our proposed model pairs-trading DQN (PTDQN) and refer to traditional pairs trading with constant action values as pairs trading with action 0 (PTA0) to pairs trading with action 5 (PTA5). From these results, we can confirm that our proposed method is more profitable than the constant pairs-trading strategies. In addition, the TLS method has a higher profitability than the OLS method. From PTA0 to PTA5, the trading boundary and the stop-loss boundary grow larger, and the numbers of open portfolios, closed portfolios, and portfolios that reached their stop-loss thresholds decrease. In other words, there is less chance for profit, but the probability of loss is also reduced. It is important not only to obtain many closed positions but also to take the best action to open and close the portfolio. For example, within the same spread, a portfolio opened and closed by the boundary corresponding to action 0 yields a different profit from a portfolio opened and closed by the boundary corresponding to action 1. Assuming that mean reversion is certain to occur, opening a portfolio at the largest boundary yields a larger profit than opening it at a smaller boundary. The PTDQN returns are higher than those of the best-performing traditional pairs-trading strategy with a constant action. Figures 5–8 show the changes in the trading and stop-loss boundaries when applying the DQN method, together with the constant action with the highest profit, during the training period using OLS and TLS.
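The effect of the boundaries on a single trade can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' trading engine: the spread values, the boundary values, and the function name `trade_window` are hypothetical, the spread is assumed to have zero mean, and transaction costs are ignored.

```python
def trade_window(spread, trade_b, stop_b):
    """Simulate one trading window on a zero-mean spread.

    Returns (outcome, profit), where outcome is 'none', 'closed', 'stop-loss'
    or 'exit' and profit is in spread units per unit position.
    """
    position = 0   # -1: short the spread, +1: long the spread, 0: flat
    entry = 0.0
    for s in spread:
        if position == 0:
            # Open only between the trading and stop-loss boundaries.
            if trade_b <= abs(s) < stop_b:
                position = -1 if s > 0 else 1
                entry = s
        elif abs(s) >= stop_b:
            return "stop-loss", position * (s - entry)
        elif position * s >= 0:
            # The spread has returned to (or crossed) its mean.
            return "closed", position * (s - entry)
    if position != 0:
        # The window ended with the position still open.
        return "exit", position * (spread[-1] - entry)
    return "none", 0.0

spread = [0.25, 0.75, 1.25, 1.75, 2.25, 1.5, 0.5, -0.25, 0.25]

print(trade_window(spread, 1.0, 3.0))  # ('closed', 1.5)
print(trade_window(spread, 2.0, 3.0))  # ('closed', 2.5)
print(trade_window(spread, 1.0, 2.0))  # ('stop-loss', -1.0)
```

On this hypothetical path, the wider trading boundary earns more because the spread does revert, while the tighter stop-loss boundary turns the same path into a loss. A narrower trading boundary would, however, open more often on other paths, which is the trade-off visible across PTA0 to PTA5.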


Model	MDD	Sharpe ratio	Profit	# of open portfolios	# of closed portfolios	# of stop-loss portfolios	# of exited portfolios

PTDQN	−0.0842	0.1835	3.4068	469	336	64	96
PTA0	−0.2014	0.1452	2.5934	565	382	132	50
PTA1	−0.1431	0.1773	2.7603	409	279	45	84
PTA2	−0.1234	0.1955	2.6307	325	191	16	118
PTA3	−0.2586	0.0861	1.3850	208	86	2	120
PTA4	−0.2591	0.0803	1.1933	124	39	2	83
PTA5	−0.2448	−0.0638	0.8588	47	11	0	36


Model	MDD	Sharpe ratio	Profit	# of open portfolios	# of closed portfolios	# of stop-loss portfolios	# of exited portfolios

PTDQN	−0.0944	0.2133	4.8760	541	399	104	63
PTA0	−0.1210	0.1522	4.1948	579	413	125	41
PTA1	−0.1015	0.1650	3.8834	430	310	50	70
PTA2	−0.1483	0.1722	3.3425	320	209	13	98
PTA3	−0.1386	0.1771	2.4385	217	101	3	113
PTA4	−0.1749	0.1602	1.6852	119	38	2	79
PTA5	−0.2862	0.0137	1.0362	55	10	0	45

Figures 5 and 6 compare PTDQN and PTA1 using the TLS method. Figure 5 shows the spread together with the trading and stop-loss boundaries. We find that the trading and stop-loss boundaries take different values in PTDQN, showing that it has learned to find the optimal boundary for each spread. In contrast to PTDQN, PTA1 in Figure 6 has constant trading and stop-loss boundaries. Figures 7 and 8 exhibit the same features as Figures 5 and 6. The difference between these methods lies in the spreads: different results can be obtained depending on the spreads used. Constructing better spreads can therefore improve performance.

Figures 9 and 10 show the profit corresponding to the DQN and to the constant actions using TLS and OLS. Reference [34] suggested that an average value over multiple trials should be presented to show the reproducibility of deep reinforcement learning, because results may differ owing to high variance across trials and random seeds. We thus conducted five trials with different random seeds. The profit graph of the DQN shows the average profit of these trials, with the region between the maximum and minimum profit values filled. We find that PTDQN had a higher profit than the traditional pairs-trading strategies during the training period. This means that, even with the same spread, profit changes as the boundaries are changed. In other words, finding the optimal boundary for the spread is an important factor in optimizing the profitability of pairs trading.
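The aggregation over five seeds can be sketched as below. The profit paths here are produced by a random-walk stand-in, which is purely an assumption for illustration; in the actual experiment each path would come from one trained DQN. The function name `simulate_profit_path` is hypothetical.

```python
import random

def simulate_profit_path(seed, steps=50):
    """Stand-in for one training run: a random-walk profit path (hypothetical)."""
    rng = random.Random(seed)
    path, value = [], 1.0
    for _ in range(steps):
        value += rng.gauss(0.02, 0.05)
        path.append(value)
    return path

# Five trials with different random seeds.
runs = [simulate_profit_path(seed) for seed in range(5)]

# Aggregate across trials at each time step.
columns = list(zip(*runs))
mean_path = [sum(col) / len(col) for col in columns]   # line to plot
lower = [min(col) for col in columns]                  # bottom of the filled region
upper = [max(col) for col in columns]                  # top of the filled region

print("final mean profit: %.3f (min %.3f, max %.3f)"
      % (mean_path[-1], lower[-1], upper[-1]))
```

Plotting `mean_path` as a line and shading between `lower` and `upper` gives the kind of band described in the text.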

4.2. Test Results

Tables 8 and 9 report the average performance measures of each pair tested by applying the top-5 trained models. The constant action with the highest return differs from pair to pair, and the TLS method yields a higher profit than the OLS method for all pairs, as shown above. We also find that PTDQN performs better than the traditional pairs-trading strategies. The pair with the highest profit using the proposed method is HON and TXN (3.2755); it also shows the largest difference between the DQN method and the optimal constant action (0.9377). We find that the proposed method has a higher Sharpe ratio for all pairs except MO and UTX when the TLS method is used. If we add the Sharpe ratio to the absolute profit in the objective function, we can build a more optimized pairs-trading system. Based on these results, we can confirm the robustness of our proposed method for our dataset. The proposed method can be applied to other pairs of stocks found in other global markets.


Pairs	Model	MDD	Sharpe ratio	Profit	# of open portfolios	# of closed portfolios	# of stop-loss portfolios	# of exited portfolios

MSFT/JPM	PTDQN	−0.1122	0.2294	3.0446	186	126	38	62
	PTA0	−0.3411	0.0742	1.6236	211	136	57	18
	PTA1	−0.2907	0.0979	1.8001	162	104	26	32
	PTA2	−0.1507	0.1936	2.6303	131	64	7	60
	PTA3	−0.4032	0.1542	1.8282	97	39	1	57
	PTA4	−0.4340	0.0400	1.0480	55	13	0	42
	PTA5	−0.1836	0.3098	1.5524	30	7	0	23

MSFT/TXN	PTDQN	−0.3420	0.1001	1.5423	204	132	47	65
	PTA0	−1.2094	−0.0571	0.0013	244	152	76	16
	PTA1	−0.9225	−0.0177	0.6131	178	110	25	43
	PTA2	−0.5574	0.0351	1.0887	134	68	8	58
	PTA3	−0.5375	−0.0128	0.8326	97	34	1	62
	PTA4	−0.4485	0.0260	1.0118	66	15	1	50
	PTA5	−0.1048	0.1233	1.1502	32	5	0	27

BRKa/ABT	PTDQN	−0.0740	0.3159	2.3655	162	111	30	43
	PTA0	−0.1392	0.1554	1.7157	182	128	35	18
	PTA1	−0.1048	0.2464	2.1508	138	96	15	27
	PTA2	−0.1133	0.2538	1.9578	108	64	3	40
	PTA3	−0.1040	0.2480	1.7576	76	35	1	40
	PTA4	−0.0829	0.2087	1.3171	44	13	0	31
	PTA5	−0.0704	0.4366	1.4013	19	7	0	12

BRKa/UTX	PTDQN	−0.5401	0.1174	1.5744	167	105	35	58
	PTA0	−1.2143	−0.0199	0.5918	192	117	55	19
	PTA1	−0.9340	0.0346	1.0701	147	89	12	45
	PTA2	−0.9099	−0.0009	0.8435	122	60	5	57
	PTA3	−0.5673	0.0473	1.1520	89	32	1	56
	PTA4	−0.3641	0.0694	1.1628	53	9	0	44
	PTA5	−0.2309	0.0408	1.0405	18	3	0	15

JPM/T	PTDQN	−0.1384	0.1283	1.4653	175	113	42	53
	PTA0	−0.3630	0.0071	0.8968	205	129	60	15
	PTA1	−0.2801	0.0460	1.1595	144	94	17	32
	PTA2	−0.3750	0.0192	0.9987	119	62	5	51
	PTA3	−0.5241	−0.0717	0.6609	92	35	0	56
	PTA4	−0.3607	−0.0550	0.8411	56	18	0	38
	PTA5	−0.2235	0.0061	0.9851	22	6	0	16

JPM/HON	PTDQN	−0.1872	0.1523	2.2510	223	155	39	62
	PTA0	−0.6769	0.0190	1.0077	274	180	70	23
	PTA1	−0.4644	0.0622	1.6331	201	139	24	38
	PTA2	−0.4537	0.0840	1.7165	149	87	2	60
	PTA3	−0.2410	0.1414	1.7648	107	43	0	64
	PTA4	−0.3313	0.0879	1.3150	62	16	0	46
	PTA5	−0.1693	0.1803	1.2777	28	7	0	21

JPM/GE	PTDQN	−0.1098	0.2123	2.8250	193	124	46	65
	PTA0	−0.3897	0.0507	1.5137	224	142	65	17
	PTA1	−0.3404	0.0640	1.6912	163	109	18	36
	PTA2	−0.1628	0.1284	1.9032	132	73	6	53
	PTA3	−0.2980	0.1142	1.7555	106	38	1	67
	PTA4	−0.2817	0.0790	1.2884	55	13	0	42
	PTA5	−0.0612	0.4776	1.7489	21	6	0	15

JNJ/WFC	PTDQN	−0.1576	0.2437	2.3741	143	100	28	38
	PTA0	−0.2872	0.0892	1.4932	164	115	37	12
	PTA1	−0.2219	0.1948	2.1147	127	90	15	21
	PTA2	−0.3188	0.1322	1.6362	99	55	5	38
	PTA3	−0.2324	0.1084	1.3141	68	27	0	41
	PTA4	−0.1532	0.1043	1.1228	40	14	0	26
	PTA5	−0.0970	0.1203	1.0734	16	6	0	10

XOM/CVX	PTDQN	−0.4265	0.0605	1.1924	218	135	45	77
	PTA0	−0.6189	0.0236	0.8812	256	161	67	28
	PTA1	−0.5999	0.0154	0.8809	197	118	25	54
	PTA2	−0.6034	−0.0073	0.7792	153	70	8	75
	PTA3	−0.5628	−0.0224	0.7734	114	38	2	74
	PTA4	−0.5311	−0.0200	0.8643	70	18	1	51
	PTA5	−0.2583	0.0060	0.9692	31	4	0	27

HON/TXN	PTDQN	−0.0874	0.2679	3.2755	233	164	49	63
	PTA0	−0.5108	0.1080	1.9219	276	186	66	23
	PTA1	−0.5841	0.1625	2.3378	207	140	28	38
	PTA2	−0.1926	0.2086	2.3096	158	92	4	62
	PTA3	−0.1611	0.1557	1.7100	114	49	2	63
	PTA4	−0.1254	0.2289	1.6374	69	23	0	46
	PTA5	−0.1578	0.1924	1.1925	28	9	0	19

GE/TXN	PTDQN	−0.1133	0.1871	2.1398	172	117	30	48
	PTA0	−0.3348	0.0967	1.6398	201	136	44	21
	PTA1	−0.1656	0.1070	1.6355	153	101	19	33
	PTA2	−0.2043	0.1388	1.7568	117	68	8	41
	PTA3	−0.2335	0.1591	1.5555	89	39	2	48
	PTA4	−0.3847	−0.1355	0.6570	45	7	0	38
	PTA5	−0.3489	−0.2730	0.7218	21	2	0	19

MO/UTX	PTDQN	−0.5264	0.0840	1.2940	150	88	35	58
	PTA0	−1.0950	−0.0272	0.6231	178	102	56	19
	PTA1	−0.7205	0.0286	1.0362	125	73	12	39
	PTA2	−0.8361	−0.0040	0.8658	105	51	3	50
	PTA3	−0.4311	0.0052	0.9323	79	24	0	54
	PTA4	−0.3916	0.1141	1.2129	48	12	0	36
	PTA5	−0.1311	0.2948	1.1276	14	3	0	11
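The gap between PTDQN and the best constant action quoted in the text can be recomputed from the Profit column of the table above. The snippet is a worked check only; the dictionary layout and variable names are assumptions, and the numbers are copied from the table.

```python
# Profit values copied from the table above: pair -> (PTDQN, [PTA0..PTA5]).
profits = {
    "MSFT/JPM": (3.0446, [1.6236, 1.8001, 2.6303, 1.8282, 1.0480, 1.5524]),
    "MSFT/TXN": (1.5423, [0.0013, 0.6131, 1.0887, 0.8326, 1.0118, 1.1502]),
    "BRKa/ABT": (2.3655, [1.7157, 2.1508, 1.9578, 1.7576, 1.3171, 1.4013]),
    "BRKa/UTX": (1.5744, [0.5918, 1.0701, 0.8435, 1.1520, 1.1628, 1.0405]),
    "JPM/T":    (1.4653, [0.8968, 1.1595, 0.9987, 0.6609, 0.8411, 0.9851]),
    "JPM/HON":  (2.2510, [1.0077, 1.6331, 1.7165, 1.7648, 1.3150, 1.2777]),
    "JPM/GE":   (2.8250, [1.5137, 1.6912, 1.9032, 1.7555, 1.2884, 1.7489]),
    "JNJ/WFC":  (2.3741, [1.4932, 2.1147, 1.6362, 1.3141, 1.1228, 1.0734]),
    "XOM/CVX":  (1.1924, [0.8812, 0.8809, 0.7792, 0.7734, 0.8643, 0.9692]),
    "HON/TXN":  (3.2755, [1.9219, 2.3378, 2.3096, 1.7100, 1.6374, 1.1925]),
    "GE/TXN":   (2.1398, [1.6398, 1.6355, 1.7568, 1.5555, 0.6570, 0.7218]),
    "MO/UTX":   (1.2940, [0.6231, 1.0362, 0.8658, 0.9323, 1.2129, 1.1276]),
}

# Gap between PTDQN and the most profitable constant action for each pair.
gaps = {pair: ptdqn - max(constant) for pair, (ptdqn, constant) in profits.items()}
best_pair = max(gaps, key=gaps.get)

for pair, gap in gaps.items():
    print("%-9s gap = %.4f" % (pair, gap))
print("largest gap:", best_pair, round(gaps[best_pair], 4))
```

The largest gap is obtained for HON/TXN (0.9377), and the gap is positive for every pair in this table.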


Pairs	Model	MDD	Sharpe ratio	Profit	# of open portfolios	# of closed portfolios	# of stop-loss portfolios	# of exited portfolios

MSFT/JPM	PTDQN	−0.2096	0.1228	1.9255	215	137	54	62
	PTA0	−0.3618	0.0492	1.3365	225	141	61	23
	PTA1	−0.5036	0.0188	1.0185	168	102	28	38
	PTA2	−0.4045	0.0611	1.3591	124	59	8	57
	PTA3	−0.5055	−0.0094	0.8636	97	33	3	61
	PTA4	−0.4195	−0.0009	0.9459	58	12	1	45
	PTA5	−0.2018	0.1236	1.1593	29	6	0	23

MSFT/TXN	PTDQN	−0.2878	0.0698	1.3466	244	153	65	68
	PTA0	−0.5271	0.0070	0.8489	252	156	72	24
	PTA1	−0.4721	0.0255	1.0286	187	117	26	44
	PTA2	−0.3816	0.0215	0.9912	145	71	10	64
	PTA3	−0.6553	−0.1015	0.5053	104	30	2	72
	PTA4	−0.2719	0.0422	1.0532	63	16	1	46
	PTA5	−0.1850	0.0068	0.9785	34	7	0	27

BRKa/ABT	PTDQN	−0.1282	0.1644	1.5076	180	109	48	57
	PTA0	−0.5073	−0.0265	0.7070	183	112	48	22
	PTA1	−0.2649	0.0453	1.0786	139	80	13	46
	PTA2	−0.2246	0.1056	1.2942	121	60	4	56
	PTA3	−0.1686	0.1241	1.2718	91	38	1	52
	PTA4	−0.1483	0.0176	0.9778	49	12	0	37
	PTA5	−0.1602	0.0004	0.9830	16	2	0	14

BRKa/UTX	PTDQN	−0.5231	0.0816	1.2976	215	132	57	69
	PTA0	−1.1928	−0.0647	0.3332	216	133	57	25
	PTA1	−0.8697	−0.0157	0.7445	167	100	15	51
	PTA2	−0.7815	−0.0071	0.8391	135	70	5	60
	PTA3	−0.3573	0.0315	1.0292	94	36	0	58
	PTA4	−0.2096	0.0684	1.0857	52	11	0	41
	PTA5	−0.1317	−0.1174	0.9312	16	2	0	14

JPM/T	PTDQN	−0.1338	0.1391	1.4547	205	127	60	50
	PTA0	−0.3588	0.0069	0.9054	208	130	61	16
	PTA1	−0.2535	0.0405	1.0902	151	96	19	35
	PTA2	−0.1872	0.0542	1.1198	119	66	5	48
	PTA3	−0.2574	0.0336	1.0502	94	39	0	55
	PTA4	−0.2212	0.0345	1.0312	57	20	0	37
	PTA5	−0.2348	−0.1922	0.8299	20	5	0	15

JPM/HON	PTDQN	−0.3869	0.1071	1.5175	250	162	57	68
	PTA0	−0.7141	0.0181	0.9444	256	166	59	30
	PTA1	−0.5065	0.0702	1.3071	198	127	22	49
	PTA2	−0.4649	0.1071	1.4260	152	84	3	65
	PTA3	−0.4871	0.0763	1.2098	102	44	0	58
	PTA4	−0.3503	−0.0694	0.8178	50	13	0	37
	PTA5	−0.2980	−0.1721	0.8040	23	6	0	17

JPM/GE	PTDQN	−0.1195	0.1443	1.7682	226	133	64	69
	PTA0	−0.4379	0.0036	0.8549	232	137	66	29
	PTA1	−0.1523	0.0987	1.4814	165	98	16	51
	PTA2	−0.1738	0.1264	1.5661	134	62	5	67
	PTA3	−0.2680	0.0729	1.2026	93	29	0	64
	PTA4	−0.2104	0.1298	1.3242	51	12	0	39
	PTA5	−0.1461	−0.0423	0.9586	18	3	0	15

JNJ/WFC	PTDQN	−0.1890	0.1266	1.7194	202	130	47	56
	PTA0	−0.8705	−0.0326	0.4635	207	131	53	22
	PTA1	−0.6189	−0.0134	0.7318	150	91	19	39
	PTA2	−0.4763	0.0309	1.0563	124	57	4	62
	PTA3	−0.2318	0.1447	1.6072	97	33	2	62
	PTA4	−0.2415	0.0549	1.0632	50	13	0	37
	PTA5	−0.0880	0.2468	1.1886	20	4	0	16

XOM/CVX	PTDQN	−0.3316	0.0265	1.1517	141	81	23	43
	PTA0	−0.7629	−0.0547	0.4186	240	149	61	30
	PTA1	−0.5648	0.0132	0.8754	193	114	23	56
	PTA2	−0.6977	−0.0387	0.6655	154	70	7	77
	PTA3	−0.5235	0.0277	0.9865	117	38	1	78
	PTA4	−0.4781	−0.0577	0.8117	63	12	1	50
	PTA5	−0.3787	−0.1492	0.8090	29	3	0	26

HON/TXN	PTDQN	−0.1339	0.1534	1.8852	270	175	64	69
	PTA0	−0.4135	0.0212	0.9455	276	177	70	28
	PTA1	−0.2758	0.0666	1.3216	207	124	27	55
	PTA2	−0.2614	0.1054	1.5031	159	84	5	69
	PTA3	−0.1759	0.1413	1.5617	117	45	2	70
	PTA4	−0.0834	0.2650	1.7044	66	23	0	43
	PTA5	−0.0664	0.4606	1.6830	30	13	0	17

GE/TXN	PTDQN	−0.1676	0.1263	1.6411	206	140	43	62
	PTA0	−0.6133	0.0178	0.9742	211	144	44	23
	PTA1	−0.3085	0.0586	1.2743	166	109	19	38
	PTA2	−0.2402	0.0585	1.2216	128	68	5	55
	PTA3	−0.3190	−0.0013	0.9193	91	31	2	58
	PTA4	−0.2493	−0.0285	0.9117	49	8	0	41
	PTA5	−0.0862	0.1417	1.0936	23	4	0	19

MO/UTX	PTDQN	−0.3181	0.0524	1.1402	188	117	49	59
	PTA0	−0.4688	0.0041	0.8667	195	121	52	21
	PTA1	−0.6166	−0.0230	0.7470	144	84	13	46
	PTA2	−0.5034	−0.0076	0.8666	115	51	4	59
	PTA3	−0.2833	0.0457	1.0873	88	32	0	56
	PTA4	−0.2901	0.0356	1.0280	44	12	0	32
	PTA5	−0.1500	0.0992	1.0297	13	2	0	11

In Figure 11, we can see that our proposed method, PTDQN, outperforms the traditional pairs-trading strategies with constant actions in the test dataset. The crucial aspect of this method is that, for each spread, it selects the boundary that would yield the highest profit among the constant actions, each of which corresponds to a constant boundary. Thus, the trend is the same as that of the traditional pairs-trading strategies; however, because the optimal boundaries with the highest profits for each spread are combined, PTDQN achieves a higher profit than the traditional pairs-trading strategies. This method can thus be applied in various fields when there is a need to optimize the efficiency of a rule-based strategy [35, 36]. In this study, we consider the spread and the boundaries to be the important factors of a pairs-trading strategy. We therefore optimized the pairs-trading strategy with various trading and stop-loss boundaries using deep reinforcement learning, and our method outperforms rule-based strategies. Optimizing the key parameters of rule-based methods can thus improve their performance.

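Figure 11: Profit of PTDQN and the constant-action strategies (PTA0 to PTA5) in the test dataset, by pair: (a) MSFT/JPM; (b) MSFT/TXN; (c) BRKa/ABT; (d) BRKa/UTX; (e) JPM/T; (f) JPM/HON; (g) JPM/GE; (h) JNJ/WFC; (i) XOM/CVX; (j) HON/TXN; (k) GE/TXN.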

Pairs trading uses two stocks that follow the same trend. However, this relationship can break down owing to various factors such as economic issues and company risk. In this situation, the spread between the two stocks becomes extremely large. Although this situation cannot be avoided, we hedge this risk by taking a dynamic boundary. In this sense, taking the lowest stop-loss boundary is the best choice, since the position can then be closed with the smallest loss. By taking a dynamic boundary using the deep reinforcement learning method, not only are profits increased but losses are also minimized compared with taking a fixed boundary.

5. Conclusions

We propose a novel approach to optimizing the pairs-trading strategy using a deep reinforcement learning method, specifically deep Q-networks. We posed two key research questions. First, if we set a dynamic boundary based on the spread in each trading window, can it achieve a higher profit than the traditional pairs-trading strategy? Second, can a deep reinforcement learning method be trained to follow this mechanism? To investigate these questions, we collected pairs selected using the cointegration test. We examined how the results varied according to the spread and the method used. We therefore constructed different spreads using the OLS and TLS methods as the input of the DQN and as the trading signal. To conduct this experiment, we set up a formation window and a trading window. The hedge ratio, which is an important factor in determining how much stock to take, depends on these windows. We thus applied the OLS and TLS methods and experimented to find the optimal window size by varying the formation window and the trading window.
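The formation and trading windows can be sketched as a rolling split of the price history. This is an illustrative assumption rather than the authors' exact procedure: in particular, the sketch assumes that the windows roll forward by the length of the trading window, and the function name `rolling_windows` is hypothetical.

```python
def rolling_windows(n_obs, formation=30, trading=15):
    """Yield (formation_slice, trading_slice) pairs over n_obs observations.

    The hedge ratio would be estimated on the formation slice and the
    strategy traded on the trading slice that immediately follows it.
    """
    start = 0
    while start + formation + trading <= n_obs:
        yield (slice(start, start + formation),
               slice(start + formation, start + formation + trading))
        start += trading  # assumption: step forward by one trading window

prices = list(range(100))  # placeholder for a daily series of 100 observations
windows = list(rolling_windows(len(prices)))

for form, trade in windows:
    print("formation days %d-%d, trading days %d-%d"
          % (form.start, form.stop - 1, trade.start, trade.stop - 1))
```

With 100 observations and the 30/15 setting, this produces four consecutive formation and trading window pairs.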

Tables 4 and 5 show the average performance values for the formation windows and trading windows in the training dataset. The results show that profits for all six window sizes were higher when TLS spreads were used than when OLS spreads were used. In addition, profitability gradually increases as the formation and trading windows become shorter for both TLS and OLS. The reason is that, although the ratio of closed positions is the lowest for the formation and trading windows we selected, the ratio of stop-loss positions is also the lowest compared with the other window sizes. This means that reducing the number of stop-loss positions is as important as increasing the number of closed positions for making a profit. Using the optimal window size, we then checked whether our DQN was properly trained. Over the training epochs, we find that the average Q-value steadily increased, the ratio of closed portfolios increased, and the ratio of portfolios that reached their stop-loss thresholds decreased, confirming that our DQN is trained well. Based on these results, we find that our proposed model, with a formation window of 30 and a trading window of 15, produced results superior to those of traditional pairs-trading strategies in the out-of-sample test dataset. In Figure 11, the profit path of PTDQN is similar to those of PTA0 to PTA5, but it is better than those of the other methods. This shows that taking dynamic boundaries based on our method is effective in optimizing the pairs-trading strategy. During periods of economic uncertainty, managing pairs-trading strategies, including our proposed method, can be risky. However, our reward function penalizes cases in which the spread suddenly becomes large, and our network is trained to prevent this situation by taking a smaller stop-loss boundary, since it is trained to maximize the expected sum of future rewards. Therefore, our proposed method can reduce the risk when economic risks appear, compared with the traditional pairs-trading strategy with a fixed boundary.

From the experimental results, we show that our method can be applied in a pairs-trading system. It can also be applied in various fields, including finance and economics, when there is a need to optimize the efficiency of a rule-based strategy. Furthermore, we find that our method outperforms the traditional pairs-trading strategy for all pairs based on constituent stocks of the S&P 500. The study focused only on spreads made by two stocks that have long-term equilibrium patterns. Since our method selects optimal boundaries based on spreads, it can be applied to other stock markets such as KOSPI, Nikkei, and Hang Seng, provided that appropriate cointegrated pairs are selected.

In future work, we can develop our proposed model as follows. First, because profit was set as the objective function in this study, the performance of the model can be lower than that of traditional pairs trading when judged by other performance measures. It may thus be possible to create a better-optimized pairs-trading strategy by including these other performance indicators as part of the objective function. Second, we can use other statistical methods, such as the Kalman filter and error-correction models, to construct more diverse spreads. Finally, it may be possible to create a more optimized pairs-trading strategy by making the discrete sets of window sizes and boundaries continuous. We will address these issues in future studies.

Data Availability

The data used to support the findings of this study have been deposited in the figshare repository (DOI: 10.6084/m9.figshare.7667645).

Disclosure

The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. This work represents a part of the study conducted as a Master Thesis in Financial Engineering during 2022 and 2022 at the University of Ajou, Republic of Korea.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea Government (MSIT: Ministry of Science and ICT) (No. NRF-2017R1C1B5018038).

Copyright

Copyright © 2022 Taewook Kim and Ha Young Kim. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.