Hello again, Superstonk. A few days back I submitted a post called "Let's Demystify the Swaps -- do not trust me bro!"
I've since continued grinding on the data to see what I could find in it. Unfortunately, the rabbit hole goes deeper than I originally anticipated.
That being said, I have made some progress.
Rapidly Evolving Standards
So, to begin with, part of the confusion surrounding this data comes from frequent changes to its schema. That is, columns have been renamed, dropped, or added over time to help the regulatory bodies make sense of the data that's coming in.
For the data that you can still fetch from the DTCC, there are three formats it might come in:
- Data after late January 2024 -- This is the newest format which they call "Phase 2 Rewrite"
- Data from January 2023 to January 2024 -- The DTCC calls this "Pre-Phase 2 Rewrite"
- Data from 2022 -- This doesn't seem to have a name, so I just call it Pre-2023
Following the Thread of Transactions
Apart from the general shuffling of the columns, one of the biggest changes comes in how swap transaction data is correlated.
In Phase 2, the DTCC started requiring that every transaction point back to the Dissemination Identifier that was assigned when the swap was originally created.
This fixes what feels like an obvious oversight, and I'm glad they corrected it. The data would look something like the following:
Dissemination Identifier | Original Dissemination Identifier | Action |
---|---|---|
A | | New |
B | A | Modification |
C | A | Modification |
D | A | Termination |
You can easily see that all of these transactions are correlated to the swap that was created with the ID of 'A'.
Unfortunately, before the Phase 2 Rewrite, the Original Dissemination Identifier only pointed to the previous transaction's Dissemination Identifier. Because several modifications could point back to the same earlier transaction, you could easily end up with a tree-like structure.
For example:
Dissemination Identifier | Original Dissemination Identifier | Action |
---|---|---|
A | | New |
B | A | Modification |
C | B | Modification |
D | B | Modification |
E | C | Modification |
F | D | Modification |
G | D | Modification |
H | F | Modification |
I | F | Modification |
This makes it very difficult to realize that all of the modifications are actually part of the same swap.
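To make the difference between the two formats concrete, here is a minimal Python sketch that naively groups transactions by their Original Dissemination Identifier, using the IDs from the two example tables above. The function name `group_by_original_id` is purely illustrative. With Phase 2 style links everything lands in one group, while the pre-Phase 2 tree splits into five groups even though it is all one swap.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (Dissemination Identifier, Original Dissemination Identifier) pairs copied
# from the two example tables; None means the row has no original ID.
PHASE_2 = [("A", None), ("B", "A"), ("C", "A"), ("D", "A")]
PRE_PHASE_2 = [
    ("A", None), ("B", "A"), ("C", "B"), ("D", "B"), ("E", "C"),
    ("F", "D"), ("G", "D"), ("H", "F"), ("I", "F"),
]


def group_by_original_id(
    rows: List[Tuple[str, Optional[str]]],
) -> Dict[str, List[str]]:
    """Naively group transactions by their Original Dissemination Identifier.

    A row without an original ID is treated as the start of its own group.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for diss_id, original_id in rows:
        key = original_id if original_id is not None else diss_id
        groups[key].append(diss_id)
    return dict(groups)


print(len(group_by_original_id(PHASE_2)))      # 1
print(len(group_by_original_id(PRE_PHASE_2)))  # 5
```

The naive grouping only works when every row points at the swap's first ID, which is exactly the guarantee Phase 2 added.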
Limited by Design
Another unfortunate caveat is that this swap data was originally meant for regulatory bodies and was only made available to the public to assist with "fair price discovery".
Section 727 of the Dodd-Frank Act amends the Commodity Exchange Act (CEA) by inserting a new Section 2(a)(13), which requires public availability of swap transaction data. CEA Section 2(a)(13) authorizes the Commission to make swap transaction and pricing data available to the public in such form and at such times as the Commission determines appropriate to enhance price discovery.
Furthermore, the data isn't intended to drive trading decisions beyond simply making the swap rates for various instruments known.
CEA Section 2(a)(13) also requires that the publicly reported information not identify the transaction participants, market position or business transactions of any person.
Long or Short?
For better or worse, I believe this requirement is critical to the question of whether a swap is long or short.
There has been speculation surrounding the negative notional values on some of the swaps. Unfortunately, we cannot make any assumptions based on the fact that the notional is marked negative.
In October 2022 the Legal Entity Identifier Regulatory Oversight Committee submitted a number of inquiries to the DTCC regarding the swap data that was being published. One such question was:
Q5: This data element is updated to clarify that negative notional should be allowed in case of commodity derivatives. Do you have any comments on the proposed amendment?
A5: We would like to seek clarity and guidance on using negative notional amount. Negative notional amounts may potentially cause reconciliation and data aggregation issues. Establishing clear and prescriptive guidance, ideally specifying the exact product types and use cases, is critical to ensure the proper and aligned usage of negative notional amount and the validation duty of a trade repository (“TR”) or swap data repository/security-based swap data repository (“SDR”).
https://www.leiroc.org/publications/gls/dtcc-response-09-22.pdf
What they are essentially saying here is that a negative notional value has no standardized meaning in their system and interferes with their ability to accurately aggregate data (for instance, calculating the change in notional value over the lifetime of a swap).
That is to say, if a swap transaction is reported with a negative notional value, the sign is only meaningful to the entity that submitted the transaction, not to the DTCC.
Therefore, we cannot assume that a negative notional value implies short exposure to the underlyings for the receiver.
This also makes sense given the CEA requirement that the reported information not identify market position. I think this is further supported by the fact that, in every swap, the counterparty necessarily holds the opposite exposure to the underlyings from the receiving party.
Furthermore, the convention in the data is that Leg 1 always represents the variable side of the swap and Leg 2 always represents the fixed side. If that's the case, then the notional would always be positive, since it wouldn't matter whether the receiving party is acquiring long or short exposure to the underlyings.
All that being said, I don't think the information is meaningless. I think it simply means that we'll need to find more supporting evidence to corroborate the data contained within the public disseminations.
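To illustrate the aggregation problem raised in the DTCC's response, here is a toy Python sketch that computes the change in notional between consecutive transactions of one swap, once treating the sign as meaningful and once ignoring it. The notional values are made up for illustration and are not taken from the DTCC data; `notional_changes` is a hypothetical helper.

```python
from typing import List


def notional_changes(notionals: List[int], signed: bool = True) -> List[int]:
    """Change in reported notional between consecutive transactions of one swap.

    With signed=False the sign is discarded and only magnitudes are compared.
    """
    values = notionals if signed else [abs(n) for n in notionals]
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Hypothetical notional history for a single swap (not real data).
history = [1_000_000, -1_200_000, 1_100_000]
print(notional_changes(history, signed=True))   # [-2200000, 2300000]
print(notional_changes(history, signed=False))  # [200000, -100000]
```

The same three reports produce completely different "changes" depending on whether the sign is taken at face value, which is why a sign with no agreed meaning is hard to aggregate.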
Progress Made
So, to begin with -- if you're interested, all of my code is available in this GitHub repository: https://github.com/DustinReddit/GME-Swaps
There was so much data to process that my machine was crashing when trying to work with CSV files. I have since switched to storing the data in Parquet format and using PyArrow to process it.
Reconciling the Changing Standards
So, the first thing that I did was translate all of the older data into the Phase 2 Rewrite format. That way, we can look at all the data through the same lens.
Second, there are many basket swaps where GME suddenly appears partway through a swap's history, even though the earlier transactions in that swap (seemingly) didn't contain GME. This means we had limited visibility into the swap until GME was included in the basket.
To clear this up, I marked these transactions as "orphans" and dug through the full dataset to locate their parents. I repeated this until I had found a parent for every orphan in the dataset.
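The orphan search can be sketched as a fixed-point loop. This is a minimal Python sketch, assuming each dataset has been reduced to a dict mapping Dissemination Identifier to Original Dissemination Identifier; the function and parameter names are hypothetical. Each pass pulls any missing parents from the full dataset into the GME subset, and the loop ends when no further parents can be found.

```python
from typing import Dict, Optional


def resolve_orphans(
    subset: Dict[str, Optional[str]],
    full: Dict[str, Optional[str]],
) -> Dict[str, Optional[str]]:
    """Pull missing parent transactions from the full dataset into the subset.

    Both dicts map Dissemination Identifier -> Original Dissemination
    Identifier (None when there is no parent). A transaction is an orphan
    when its parent is not in the subset.
    """
    result = dict(subset)
    while True:
        # Parents referenced by the subset that exist only in the full data.
        missing = {
            parent
            for parent in result.values()
            if parent is not None and parent not in result and parent in full
        }
        if not missing:
            return result  # every orphan resolved, or its parent can't be found
        for parent in missing:
            result[parent] = full[parent]
```

Each pass adds at least one new transaction from a finite dataset, so the loop always terminates, even when some parents simply aren't in the data.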
Finally, I started following the chain of transactions for any swaps that existed before the Phase 2 Rewrite.
I added a new column that I call the "Progenitor Dissemination Identifier". This is either the Dissemination ID that was assigned when a transaction was marked with NEWT in the Action type column or, if no such transaction exists, simply the Dissemination ID of the earliest transaction I could find in my dataset.
I then followed the chain of transactions so that every swap transaction is correlated to its proper progenitor ID.
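Following the chain to a progenitor amounts to finding the root of each transaction's tree. Below is a minimal Python sketch of that idea, assuming the transactions fit in a dict keyed by Dissemination Identifier; `Txn`, `find_progenitor` and `assign_progenitors` are hypothetical names. It walks up the Original Dissemination Identifier links until it reaches a `NEWT` transaction or one whose parent isn't in the dataset, and it caches results so that long chains are only walked once.

```python
from typing import Dict, NamedTuple, Optional


class Txn(NamedTuple):
    original_id: Optional[str]  # Original Dissemination Identifier (None if absent)
    action_type: str            # e.g. "NEWT" for a newly created trade


def find_progenitor(diss_id: str, txns: Dict[str, Txn], cache: Dict[str, str]) -> str:
    """Return the progenitor Dissemination Identifier for diss_id."""
    path = []        # transactions visited on the way up
    on_path = set()  # same IDs, for fast cycle checks
    current = diss_id
    while current not in cache:
        txn = txns[current]
        parent = txn.original_id
        if (
            txn.action_type == "NEWT"  # the swap was created here
            or parent is None          # no link further back
            or parent not in txns      # earliest transaction we have
            or parent in on_path       # defensive: stop on a malformed cycle
        ):
            cache[current] = current
            break
        path.append(current)
        on_path.add(current)
        current = parent
    root = cache[current]
    for node in path:
        cache[node] = root  # remember the answer for every node walked through
    return root


def assign_progenitors(txns: Dict[str, Txn]) -> Dict[str, str]:
    """Map every Dissemination Identifier to its progenitor."""
    cache: Dict[str, str] = {}
    return {diss_id: find_progenitor(diss_id, txns, cache) for diss_id in txns}
```

Run on the pre-Phase 2 tree from earlier, with A marked `NEWT`, every transaction A through I resolves to A.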
Of particular interest has been swap 884224488. It caught many people's eyes because its notional value was marked as negative. It also seemed to come out of nowhere, as GME didn't seem to be included in the basket from the beginning.
Once I pulled in all of the parent transactions and correlated them under the same progenitor ID, 884224488 turned out to belong to swap 626618238.
To date, this particular swap has had just short of 2,000 modifications since it was created on January 31, 2023. It is set to expire on July 31st of this year (10 days from today).
I've published the correlated GME data to Google Sheets which you can see here: https://docs.google.com/spreadsheets/d/13YvIBcXfGh51nwpPvuY4GbLWHwF7Q4rikHOJIzYYrZw/edit?usp=sharing
If you search for 626618238, then you'll find all the transactions I'm talking about including 884224488.
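Once every transaction carries a progenitor ID, looking up a swap's full history is a simple filter and sort. Here is a minimal Python sketch, assuming a progenitor mapping like the one described above and ISO 8601 timestamp strings in one consistent format (so string order matches time order); `swap_history` and the example IDs in it are hypothetical.

```python
from typing import Dict, List, Tuple


def swap_history(
    member_id: str,
    progenitor_of: Dict[str, str],
    timestamps: Dict[str, str],
) -> Tuple[str, List[str]]:
    """Return the progenitor of member_id and every transaction in that swap, oldest first.

    Assumes all timestamps share one ISO 8601 format, so sorting the strings
    also sorts them chronologically.
    """
    root = progenitor_of[member_id]
    members = [diss_id for diss_id, prog in progenitor_of.items() if prog == root]
    members.sort(key=lambda diss_id: timestamps[diss_id])
    return root, members
```

The number of modifications would then be roughly `len(history) - 1`, assuming the first entry is the creation of the swap.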
Conclusions
As with my previous post, I want to avoid drawing any hard conclusions, such as dates.
Instead, I hope that I'm helping to peel back the layers of the swap dissemination data onion.
What I have learned, however, is that:
- This is actually much larger than I originally anticipated and I'm working with an incomplete dataset.
- We cannot make any assumptions about whether a given swap is "short" or "long". (I put long/short in quotes because in reality, it's both: in every swap, someone is short and someone is long. However, it would make a difference to know who is holding the short side of the transaction.)
- With more clarity in the swap data, I think it now makes sense to start thinking about ways that we can correlate event timestamps in this dataset with meaningful events in the stock price.
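As one possible first step toward that kind of correlation, here is a Python sketch that buckets swap transaction timestamps by UTC calendar day, so the daily counts could later be lined up against daily price data. It assumes ISO 8601 timestamps; `events_per_day` is a hypothetical helper, and no price data is included.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Dict, Iterable


def events_per_day(timestamps: Iterable[str]) -> Dict[str, int]:
    """Count transactions per calendar day, returned in date order."""
    counts: Counter = Counter()
    for ts in timestamps:
        # fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
        parsed = datetime.fromisoformat(ts.replace("Z", "+00:00"))
        if parsed.tzinfo is not None:
            parsed = parsed.astimezone(timezone.utc)  # normalize aware times to UTC
        counts[parsed.date().isoformat()] += 1
    return dict(sorted(counts.items()))
```

Timestamps without an offset are counted on the date they are written with, since the sketch has no way to know their time zone.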
One last note is that I haven't even begun to look into swaps on the ETFs yet. I think a good starting point would be to identify which ETFs hold GME (the BRNO paper would probably be a good place to look), then identify the RIC / ISIN codes for those ETFs so that we can locate them in the swap data.
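Once those ETF identifiers are known, locating them could be as simple as filtering rows on the underlier column. Here is a minimal Python sketch. The column name `Underlier ID-Leg 1`, the placeholder identifiers used when testing it, and the substring match are all assumptions about the data layout, not confirmed details of the DTCC files.

```python
from typing import Dict, Iterable, List, Optional, Set

# HYPOTHETICAL column name for the first leg's underlier identifier.
UNDERLIER_COLUMN = "Underlier ID-Leg 1"


def rows_mentioning(
    rows: Iterable[Dict[str, Optional[str]]],
    identifiers: Set[str],
    column: str = UNDERLIER_COLUMN,
) -> List[Dict[str, Optional[str]]]:
    """Keep rows whose underlier field contains any of the given identifiers."""
    matches = []
    for row in rows:
        field = row.get(column) or ""  # treat a missing or None field as empty
        if any(identifier in field for identifier in identifiers):
            matches.append(row)
    return matches
```

A substring match keeps basket swaps that list several underliers in one field, but it can also produce false positives for short codes, so exact matching on split fields may be safer once the format is confirmed.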
I want to thank everyone for all the kind words I received on my first post. I hope this information continues to be useful and I look forward to seeing what insights we can make together!
Posted 3 months ago in r/Superstonk: reddit.com/r/Superstonk/...