1 Assignment 4: A Machine Learning Trading Algo
1.0.1 Data preparation tasks (15p)
1. (Data prep) Load iShares TSX 60 (XIU.TO) order book data from https://github.com/mariuszoican/mgt443/blob/main/Data/XIU_TAQ.csv?raw=true. The file contains 378,376 quote updates for XIU.TO from March 11, 2020.
2. (Data prep) Convert the Date-Time column into native pandas dates, and add the GMT Offset to obtain local Toronto time.
Hint: to add the GMT Offset you may apply the inline function lambda x: dt.timedelta(hours=x) to the appropriate column.
3. (Data prep) Select only quote updates between 9:30 AM and 3:30 PM (to avoid overnight and opening/closing auction effects).
4. (Resample) Resample the data such that you only retain the last observation for each second.
Forward-fill any missing observations (i.e., seconds with no quote updates). Hint: you may use .resample('1S').last().
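The data preparation steps above can be sketched on a tiny synthetic table, so the example runs without downloading the file. Only the Date-Time and GMT Offset column names come from the text; the price column names, the timestamp format, and the sample values are assumptions for illustration. The sketch uses pd.to_timedelta as a vectorized alternative to the lambda in the hint, and the lowercase "1s" alias, which recent pandas versions prefer over "1S".

```python
import pandas as pd

# Synthetic stand-in for the quote file (values are made up).
# "Bid Price" / "Ask Price" are assumed column names.
raw = pd.DataFrame({
    "Date-Time": [
        "2020-03-11T13:29:59.500000Z",  # 09:29:59 local, before the window
        "2020-03-11T13:30:00.200000Z",
        "2020-03-11T13:30:00.900000Z",
        "2020-03-11T13:30:03.100000Z",
        "2020-03-11T19:30:01.000000Z",  # 15:30:01 local, after the window
    ],
    "GMT Offset": [-4, -4, -4, -4, -4],
    "Bid Price": [20.00, 20.01, 20.02, 20.01, 20.05],
    "Ask Price": [20.02, 20.03, 20.03, 20.03, 20.07],
})

quotes = raw.copy()

# Step 2: parse timestamps, drop the UTC marker, and shift by the GMT offset
utc = pd.to_datetime(quotes["Date-Time"], utc=True).dt.tz_localize(None)
quotes["Local Time"] = utc + pd.to_timedelta(quotes["GMT Offset"], unit="h")
quotes = quotes.drop(columns="Date-Time").set_index("Local Time").sort_index()

# Step 3: keep quotes between 9:30 AM and 3:30 PM local time
trading = quotes.between_time("09:30", "15:30")

# Step 4: last quote in each second, forward-filled through empty seconds
per_second = trading.resample("1s").last().ffill()
print(per_second)
```

In this toy sample the seconds 09:30:01 and 09:30:02 have no updates, so they inherit the last quote from 09:30:00.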
1.0.2 Variable building (20p)
5. (Midpoint) Compute the midpoint for each row as
Midpoint = (Bid Price + Ask Price) / 2.
6. (Tick change) Generate a dummy Tick that takes the value:
1. 1 if the next-second midpoint is higher than the current midpoint;
2. 0 if the next-second midpoint is lower than or equal to the current midpoint.
7. (Depth) Compute the market depth for each row as
Depth = Ask size + Bid size.
8. (Order imbalance) Compute the order imbalance for each row as
Order Imbalance = Ask size − Bid size.
9. (Bid-ask spread) Compute the quoted bid-ask spread as
Bid-ask spread = Ask price − Bid price.
10. (First differences) Generate columns for the change in depth and order imbalance from the previous second to the current one.
11. (Rolling mean) Generate two columns for the rolling mean of depth and order imbalance, using a rolling window of 10 rows. You can use the Series.rolling method.
Further, generate two dummies that take the value 1 if the current depth (respectively, the absolute order imbalance) is above its rolling mean, and 0 otherwise.
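A minimal sketch of the variable-building steps, run on synthetic per-second quotes. All column names and data values are assumptions. Order imbalance and spread are coded as the raw differences written above; if the course definition scales them (for example by depth or by the midpoint), those two lines would need adjusting. Comparing the absolute order imbalance with the rolling mean of the absolute order imbalance is one possible reading of the last step.

```python
import numpy as np
import pandas as pd

# Synthetic per-second quotes (made-up values, assumed column names)
rng = np.random.default_rng(0)
n = 30
idx = pd.date_range("2020-03-11 09:30:00", periods=n, freq="s")
bid = 20 + 0.01 * rng.integers(-2, 3, n).cumsum()
df = pd.DataFrame({
    "Bid Price": bid,
    "Ask Price": bid + 0.01,
    "Bid Size": rng.integers(1, 50, n),
    "Ask Size": rng.integers(1, 50, n),
}, index=idx)

# Midpoint and next-second tick dummy (the last row has no next second)
df["Midpoint"] = (df["Bid Price"] + df["Ask Price"]) / 2
next_mid = df["Midpoint"].shift(-1)
df["Tick"] = (next_mid > df["Midpoint"]).astype(float).where(next_mid.notna())

# Depth, order imbalance, and quoted spread
df["Depth"] = df["Ask Size"] + df["Bid Size"]
df["OI"] = df["Ask Size"] - df["Bid Size"]
df["Spread"] = df["Ask Price"] - df["Bid Price"]

# First differences relative to the previous second
df["dDepth"] = df["Depth"].diff()
df["dOI"] = df["OI"].diff()

# 10-row rolling means (NaN for the first 9 rows)
df["Depth_MA"] = df["Depth"].rolling(10).mean()
df["OI_MA"] = df["OI"].rolling(10).mean()

# Dummies: 1 if above the rolling mean; comparisons with NaN yield 0
abs_oi = df["OI"].abs()
df["Depth_High"] = (df["Depth"] > df["Depth_MA"]).astype(int)
df["OI_High"] = (abs_oi > abs_oi.rolling(10).mean()).astype(int)  # assumed reading

print(df.tail())
```

The first nine rows have no rolling mean yet, and the last row has no next-second midpoint, so these rows carry NaN values that must be handled before estimation.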
1.0.3 Logistic regression prediction (40p)
10. Split the data into equally-sized train and test samples.
11. Use a logistic regression to predict future midpoint movements using:
1. Market depth and its first difference;
2. Order imbalance and its first difference;
3. Bid-ask spread;
4. Whether the depth is above rolling mean or not;
5. Whether the absolute order imbalance is above rolling mean or not.
In estimating the logistic regression, use solver='lbfgs' rather than solver='newton-cg' (better convergence).
The logistic regression cannot be estimated with NaN values, so drop any rows that contain them first.
12. Plot the ROC curve and assess the predictive power of your model.
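The estimation steps could look roughly like the sketch below, which assumes scikit-learn's LogisticRegression (the solver names in the text match its API). The feature table is synthetic, with hypothetical column names and a made-up link between order imbalance and upticks, so the numbers say nothing about the real data. The chronological split (first half for training, second half for testing) is also an assumption; the text only asks for equally-sized samples.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve

# Synthetic feature table (hypothetical names, made-up data)
rng = np.random.default_rng(1)
n = 2001
data = pd.DataFrame({
    "Depth": rng.integers(10, 200, n).astype(float),
    "OI": rng.normal(0.0, 30.0, n),
    "Spread": rng.choice([0.01, 0.02], n),
})
data["dDepth"] = data["Depth"].diff()
data["dOI"] = data["OI"].diff()
data["Depth_High"] = (data["Depth"] > data["Depth"].rolling(10).mean()).astype(int)
abs_oi = data["OI"].abs()
data["OI_High"] = (abs_oi > abs_oi.rolling(10).mean()).astype(int)

# Illustration only: upticks are made more likely when OI is negative
p_up = 1.0 / (1.0 + np.exp(0.05 * data["OI"]))
data["Tick"] = (rng.random(n) < p_up).astype(int)

# Drop rows with NaN (here: the first row, from the first differences)
data = data.dropna()

# Equally sized halves, split in time order (assumption)
features = ["Depth", "dDepth", "OI", "dOI", "Spread", "Depth_High", "OI_High"]
half = len(data) // 2
train, test = data.iloc[:half], data.iloc[half:]

model = LogisticRegression(solver="lbfgs", max_iter=1000)
model.fit(train[features], train["Tick"])

# Predicted probability of an uptick (class 1) in the test sample
prob_up = model.predict_proba(test[features])[:, 1]
fpr, tpr, _ = roc_curve(test["Tick"], prob_up)
auc = roc_auc_score(test["Tick"], prob_up)
print(f"Test AUC: {auc:.3f}")


def plot_roc(fpr, tpr, auc):
    """Draw the ROC curve against the 45-degree no-skill line."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting
    fig, ax = plt.subplots()
    ax.plot(fpr, tpr, label=f"Logit (AUC = {auc:.2f})")
    ax.plot([0, 1], [0, 1], linestyle="--", label="No skill")
    ax.set_xlabel("False positive rate")
    ax.set_ylabel("True positive rate")
    ax.legend()
    return ax
```

Calling plot_roc(fpr, tpr, auc) draws the curve. An AUC near 0.5 means the model ranks upticks no better than chance, while values closer to 1 indicate stronger predictive power.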
1.0.4 Testing the algorithm (25p)
Set a threshold probability q (e.g., the median predicted probability of an uptick) such that you always buy when the predicted probability is larger than q, and sell in the next second. That way, you don’t accumulate a position.
1. What is your profit if you always buy and sell at the prevailing midpoint?
2. What is your profit if you buy at the ask price and sell at the bid price?
3. How many trades are profitable in either case?
Reflect on the importance of transaction costs and the difference between predictive power and strategy implementation.
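One way to sketch the trading rule is shown below, on synthetic quotes and random "predicted probabilities" that stand in for the model output. Prices are kept in integer cents so the arithmetic is exact, the spread is fixed at one cent, and each trade is for one share; all of these are assumptions for illustration.

```python
import numpy as np

# Synthetic test-sample quotes in cents (made-up random walk, 1-cent spread)
rng = np.random.default_rng(2)
n = 500
bid = 2000 + rng.integers(-1, 2, n).cumsum()
ask = bid + 1
mid = (bid + ask) / 2

# Placeholder for the model's predicted uptick probabilities (pure noise here)
prob_up = rng.random(n)

# Buy at second t when the probability exceeds q, sell at second t + 1.
# The last second is dropped because it has no exit price.
q = np.median(prob_up)
buy = prob_up[:-1] > q

# Per-trade profit: trading at the midpoint vs. crossing the spread
pnl_mid = (mid[1:] - mid[:-1])[buy]
pnl_cross = (bid[1:] - ask[:-1])[buy]  # buy at the ask, sell at the next bid

summary = {
    "trades": int(buy.sum()),
    "profit_mid": float(pnl_mid.sum()),
    "profit_cross": float(pnl_cross.sum()),
    "wins_mid": int((pnl_mid > 0).sum()),
    "wins_cross": int((pnl_cross > 0).sum()),
}
print(summary)
```

With a constant one-cent spread, every round trip that crosses the spread earns exactly one cent less than the same trade at the midpoint. Since the midpoint in this toy series moves by at most one cent per second, no spread-crossing trade can be profitable, whatever the quality of the signal.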