Clustering Cryptocurrencies with Affinity Propagation and the RAD 30 Crypto Composite

Written by quinterojs | Published 2018/05/09
Tech Story Tags: blockchain | crypto | data-science | machine-learning | venture-capital


We apply Affinity Propagation clustering to the Rad 30 Crypto Composite and find that some sectors of the crypto economy are systematically different from the others.

Introduction

Radicle’s Crypto research arm recently released the Rad 30 Crypto Composite, which tracks thirty assets across ten decentralized sectors of the crypto economy, much like how the Dow Jones Industrial Average tracks stocks across various sectors of the U.S. economy. It is a carefully designed tool that allows us to evaluate how individual segments of the crypto economy are developing outside of Bitcoin and Ethereum.

Why? Well, you can learn more about that in Harry’s release notes, but in general, this is largely unexplored territory, and that’s exactly our area of expertise at Radicle: we research startups and technologies that are breaking new ground. We surveyed the landscape of index-style metrics in the crypto space and found them all to be focused solely on crypto assets with high market capitalizations, largely for investment optimization in crypto hedge funds. Those approaches are not ideal for understanding how the overall ecosystem is developing outside of Bitcoin and Ethereum, the current market leaders. And if you believe that the potential for crypto and decentralization is much larger than Bitcoin and Ethereum, then the RAD 30 is a good tool for seeing how that belief plays out IRL. More generally, however, the RAD 30 expands our own scope of understanding and gives Radicle’s Crypto and Machine Learning teams a powerful new tool for analysis.

The Rad 30 Crypto Composite tracks leading blockchain protocols, platforms and networks around advertising, currency exchanges, distributed cloud computing, e-commerce, file storage, financial services, venture capital & crowdfunding, healthcare, social & digital media networks, and energy. If you look at the correlation matrix heatmap at the top of this article, you’ll find that the currency exchanges, distributed cloud computing, and financial services sectors stand out with the darkest spots on the map. That tells us there’s some divergence between those sectors and most of the other sectors being tracked. Unfortunately, that’s as far as a correlation matrix will take us: it only shows pairwise relationships, so it can’t tell us which sectors diverge together. We apply Affinity Propagation clustering to these sector-level components of the composite to see if we can identify which sectors, if any, appear to be splitting from the rest in a systematic manner. Like most unsupervised clustering work, this is an exercise in knowledge discovery, with the overall objective of grouping sectors of the crypto economy into clusters by the similarity of their market movements over time.
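For readers who want to build a heatmap like the one above from their own data, the pairwise Pearson correlations behind it take only a few lines. This is a minimal, stdlib-only sketch: the sector names and numbers are hypothetical toy values, not RAD 30 data, and it correlates raw daily levels for simplicity (we are not claiming that is exactly how the heatmap was computed). Pairs with weak or negative correlation, like the toy "exchanges" row here, are the ones that stand out on a heatmap.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(series):
    """Pairwise correlations for a dict mapping sector name -> daily values."""
    names = list(series)
    return names, [[pearson(series[a], series[b]) for b in names] for a in names]


# Hypothetical toy sectors (not RAD 30 data)
sectors = {
    "advertising": [1.0, 1.2, 1.1, 1.4, 1.6],
    "energy": [2.0, 2.3, 2.2, 2.7, 3.1],
    "exchanges": [5.0, 4.6, 4.9, 4.2, 4.0],
}
names, corr = correlation_matrix(sectors)
for name, row in zip(names, corr):
    print(f"{name:12s}", " ".join(f"{v:+.2f}" for v in row))
```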

ML Background

This paper is a follow-up to Clustering Cryptocurrencies with Affinity Propagation, where we applied Affinity Propagation to individual crypto assets and discovered that the vast majority of them do not move according to Bitcoin’s volatile news cycle. More specifically, we found three distinct clusters of crypto assets that move in tandem and can be summarized by the movements of Tether, Ripple, and DigixDAO, the cluster exemplars. Don’t worry, I won’t assume you know what that means.

Clustering Cryptocurrencies with Affinity Propagation (towardsdatascience.com): “To better understand coin correlations we deployed an Affinity Propagation algorithm and found three distinct clusters…”

Affinity Propagation, published in Science by Brendan Frey and Delbert Dueck, takes as input measures of similarity between pairs of data points and exchanges real-valued messages between those data points until a high-quality set of exemplars, and with it a set of clusters, emerges. This “message passing” repeats over multiple iterations until the cluster boundaries stabilize and the algorithm converges.
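For reference, the two kinds of messages in Frey and Dueck’s formulation are the responsibility r(i,k), sent from point i to candidate exemplar k, and the availability a(i,k), sent from k back to i. Their standard update rules are sketched below, where s(i,k) is the input similarity and the diagonal s(k,k) is each point’s “preference” for being an exemplar. The last line is damping: every message m is blended with its previous value using a factor λ to keep the updates from oscillating. A common decision rule (used, for example, by scikit-learn) then marks point k as an exemplar when r(k,k) + a(k,k) > 0.

```latex
\begin{aligned}
r(i,k) &\leftarrow s(i,k) - \max_{k' \neq k} \bigl\{ a(i,k') + s(i,k') \bigr\} \\
a(i,k) &\leftarrow \min\Bigl\{ 0,\; r(k,k) + \sum_{i' \notin \{i,k\}} \max\{0, r(i',k)\} \Bigr\}, \qquad i \neq k \\
a(k,k) &\leftarrow \sum_{i' \neq k} \max\{0, r(i',k)\} \\
m &\leftarrow \lambda\, m_{\text{old}} + (1 - \lambda)\, m_{\text{new}}, \qquad m \in \{r, a\}
\end{aligned}
```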

While exchanging messages, the algorithm identifies exemplars, which are observations that do a good job of describing a cluster. You can think of them as centroids, except that an exemplar is not the arithmetic mean of all the objects in its cluster but a real observed data point that is representative of that cluster.
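To make message passing and exemplars concrete, here is a minimal from-scratch sketch of Affinity Propagation in pure Python. It is not scikit-learn’s implementation (which is what we used), but it follows the same responsibility and availability updates, uses damping 0.9, and stops once the set of exemplars has been unchanged for `convergence_iter` iterations, similar in spirit to scikit-learn’s parameter of the same name. The toy points are hypothetical, and setting each preference to the median similarity mirrors scikit-learn’s default. Note that the returned exemplars are indices of real input points, not averaged centroids.

```python
import statistics


def affinity_propagation(S, damping=0.9, max_iter=200, convergence_iter=15):
    """Minimal Affinity Propagation on a square similarity matrix S (list of lists).

    The diagonal S[k][k] holds each point's preference for being an exemplar.
    Assumes at least two points. Returns (exemplar indices, label per point).
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    exemplars, last, stable = [], None, 0
    for _ in range(max_iter):
        # Responsibility: r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            first = max(range(n), key=vals.__getitem__)
            second = max(vals[k] for k in range(n) if k != first)
            for k in range(n):
                competitor = second if k == first else vals[first]
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - competitor)
        # Availability: a(k,k) = sum of positive r(i',k) over i' != k;
        # a(i,k) = min(0, r(k,k) + the same sum with i also excluded) for i != k
        for k in range(n):
            pos = [max(0.0, R[j][k]) for j in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
        # Point k is currently an exemplar when r(k,k) + a(k,k) > 0
        exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
        if exemplars == last:
            stable += 1
            if exemplars and stable >= convergence_iter:
                break
        else:
            last, stable = exemplars, 0
    if not exemplars:
        return [], [None] * n
    # Each point joins its most similar exemplar; exemplars label themselves
    labels = [exemplars.index(i) if i in exemplars
              else exemplars.index(max(exemplars, key=lambda k: S[i][k]))
              for i in range(n)]
    return exemplars, labels


# Hypothetical 2-D points forming two obvious groups
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
S = [[-((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) for b in points] for a in points]
pref = statistics.median(S[i][k] for i in range(len(S)) for k in range(len(S)) if i != k)
for k in range(len(S)):
    S[k][k] = pref
exemplars, labels = affinity_propagation(S, damping=0.9)
print(exemplars, labels)
```

The number of clusters is never specified; it falls out of the preferences and the similarities.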

As discussed in our first piece in this series, we considered a few other algorithms for the clustering task, but each failed to satisfy one or more of our desiderata. We knew that identifying groups of sectors that are very similar for only a few weeks of the year, but completely dissimilar for the other months measured, would be undesirable. Equally important, we wanted to isolate, as much as possible, the daily impact of the news cycle on market movements, ignoring any delayed effects. Therefore, similarity needed to be evaluated strictly at each date index, but across the entire array. Dynamic Time Warping, a popular approach for measuring the similarity of time series with varying lengths and out-of-phase similarities, would not work precisely because it attempts to find out-of-phase similarities. We have the luxury of a problem where the data are intentionally in phase and of the same length. K-means clustering and its many variants would also not work, because they require us to specify the number of clusters up front, which, in our view, would reduce the objectivity of the analysis.
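A toy illustration of the in-phase requirement: with a pointwise similarity such as negative squared Euclidean distance (squared here, as in scikit-learn’s `affinity='euclidean'`), day t of one series is only ever compared with day t of another, so a one-day lag is penalized rather than aligned away, as DTW would try to do. The series below are made-up numbers.

```python
def neg_sq_euclidean(x, y):
    """Pointwise (in-phase) similarity: x[t] is only ever compared with y[t]."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    return -sum((a - b) ** 2 for a, b in zip(x, y))


base = [0, 1, 0, -1, 0, 1, 0, -1]
lagged = base[-1:] + base[:-1]  # same shape, shifted one day later (circularly)
flat = [0] * len(base)          # a series that never moves

print(neg_sq_euclidean(base, base))    # 0
print(neg_sq_euclidean(base, lagged))  # -8
print(neg_sq_euclidean(base, flat))    # -4
```

The lagged copy scores worse than a series that never moves at all, which is exactly the behavior we want when delayed reactions to the news cycle should not count as similarity.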

RAD 30 😎 + Affinity Propagation 🤖 = 👽

We deployed the scikit-learn implementation of Affinity Propagation for this project. For data processing, each data array was log-transformed and then normalized by subtracting its mean and dividing by its standard deviation. These steps control for the magnitude of volatility in crypto and put all sectors on a comparable scale. As for hyperparameters, we set the damping factor to 0.9 and defined the similarity measure as negative Euclidean distance. The beauty of Affinity Propagation is its power and simplicity.
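The preprocessing steps above can be sketched in stdlib Python. This is an illustration under stated assumptions, not our production code: it assumes strictly positive daily values, uses the population standard deviation (the choice of estimator is an assumption), and the three sectors are hypothetical. Because the log turns a constant scale factor into an additive shift, and subtracting the mean removes that shift, a sector that is simply 5x larger than another ends up identical to it after preprocessing.

```python
import math
import statistics


def preprocess(series):
    """Log-transform a strictly positive series, then z-score it.

    Assumption: population standard deviation (statistics.pstdev).
    """
    logged = [math.log(v) for v in series]
    mu = statistics.fmean(logged)
    sd = statistics.pstdev(logged)
    return [(v - mu) / sd for v in logged]


def similarity_matrix(rows):
    """Negative squared Euclidean distance between every pair of rows."""
    return [[-sum((a - b) ** 2 for a, b in zip(x, y)) for y in rows] for x in rows]


# Hypothetical daily levels: sector_b is sector_a scaled by 5x
sector_a = [100.0, 110.0, 99.0, 120.0, 130.0]
sector_b = [5 * v for v in sector_a]
sector_c = [40.0, 38.0, 45.0, 41.0, 39.0]

rows = [preprocess(s) for s in (sector_a, sector_b, sector_c)]
S = similarity_matrix(rows)
print(abs(S[0][1]) < 1e-9)  # True: the scale difference is gone
print(S[0][2] < S[0][1])    # True: sector_c genuinely moves differently
```

In scikit-learn, the corresponding step would be to stack the preprocessed series into an array `X` with one row per sector and call `AffinityPropagation(damping=0.9).fit(X)`; the fitted model exposes `cluster_centers_indices_`, `labels_`, and `n_iter_`. Its default `affinity='euclidean'` uses negative squared Euclidean distance, and its default preference is the median of the input similarities.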

Before discussing the results, let’s briefly consider the philosophy of the whole thing. Currently, it is widely believed that Bitcoin and Ethereum set the pace for the whole crypto economy. On a granular level, that assumption implies that new dApps are at the mercy of how Bitcoin and Ethereum perform, regardless of (1) the actual market opportunity for any given decentralized application, (2) the network and experience of the startup’s executive team, and even (3) fundamental changes in the underlying technology. We don’t believe that’s actually true, and our results support that view.

The algorithm converged after 67 iterations. It found 5 clusters, and therefore 5 exemplars: (1) RAD 30, (2) Currency Exchanges, (3) Distributed Cloud Computing, (4) Bitcoin, and (5) Ethereum. The side-by-side plots below correspond to each of the five identified clusters. On the left we present the box plot for each cluster. On the right we present the corresponding cluster in time series form, with the exemplar in bright green and the other members of the cluster in faint green.

The first cluster is also the largest. It contains the RAD 30 and the decentralized markets for advertising, e-commerce, venture capital and crowdfunding, healthcare, social & digital media networks, and energy. We more or less expected to find at least one large cluster such as this one, considering the low level of maturity of this entire industry. Indeed, a portion of the assets we’re tracking did not exist just a year ago. That the Rad 30 Crypto Composite is the exemplar for this cluster is unsurprising, since it aggregates all of the crypto sectors. In trial runs where we didn’t include the aggregate composite, the energy sector became the exemplar. We expect some of these sectors to splinter off as they mature, but as of now, they seem to be moving in tandem.
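Why an aggregate tends to win the exemplar role can be checked directly. Under the simplifying assumption that the composite is the plain average of its member series (the real RAD 30 weighting and our log and z-score preprocessing are not modeled here), the average minimizes the total squared distance to the members. Under negative squared Euclidean similarity it therefore has the highest total similarity to everything else, which is roughly what an exemplar is chosen to maximize. The numbers below are hypothetical.

```python
def total_similarity(candidate, others):
    """Sum of negative squared Euclidean distances from candidate to others."""
    return -sum(sum((a - b) ** 2 for a, b in zip(candidate, o)) for o in others)


# Hypothetical standardized sector series
sectors = [
    [0.1, 0.4, 0.2, 0.9],
    [0.3, 0.2, 0.5, 0.7],
    [-0.2, 0.6, 0.1, 1.1],
    [0.0, 0.1, 0.4, 0.5],
]
composite = [sum(col) / len(sectors) for col in zip(*sectors)]  # plain average
pool = sectors + [composite]

scores = {i: total_similarity(s, [o for j, o in enumerate(pool) if j != i])
          for i, s in enumerate(pool)}
best = max(scores, key=scores.get)
print(best == len(sectors))  # True: the composite (last entry) scores highest
```

This follows from the identity that, for any member x_j, the sum of squared distances from x_j to all members equals the sum from the average plus n times the squared distance between x_j and the average.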

The currency exchanges space is interesting in being the only decentralized sector of the crypto economy identified as having unique market movements over time. This result was unexpected but, in hindsight, intuitive. Currency exchanges operate at a higher level of the crypto economy, allowing individuals to buy, sell, and trade crypto assets for other crypto assets or for fiat currencies. They also interface more regularly with centralized institutions and regulatory authorities. For most people they are the portal into the decentralized economy, so it makes sense that their market behavior is unique.

Distributed cloud computing, file storage, and financial services form the cluster plotted above. The heatmap at the top of this article hinted that some of these sectors were different from the rest, but it wasn’t clear that they were different together. As to why, we have to humbly admit that we don’t have a rigorous answer. Anecdotally, we know that the largest ICO of 2017 was Filecoin’s, and that the distributed cloud computing space has some early market entrants. So it’s possible that this cluster is, in general, more mature. If you have another idea, feel free to share it in the comments.

Lastly, Bitcoin and Ethereum, plotted above and below, each form their own cluster. This detail is important to us. The Rad 30 Crypto Composite was designed to track how the crypto economy is developing outside of Bitcoin and Ethereum, so it wouldn’t be ideal if it were systematically associated with either of them. From an empirical point of view, this result indicates that we’re capturing a distinct signal in the crypto economy.

Conclusion

Overall, the results from this study suggest that the crypto economy is already starting to splinter as it matures. We’re interested to see how the RAD 30 helps us decipher the changes in the crypto economy in the coming months and years.


The Inference and Machine Learning Group is Radicle’s data science research arm. We work at the intersection of computational statistics, machine learning, and venture capital. Subscribe to our email list or follow on Twitter if you’d like to be notified when future research is released.

Legal Disclaimer: Radicle is not an investment advisor, and Radicle makes no representation regarding the advisability of investing in any security, asset, token, fund or other investment vehicle. Radicle is not a tax advisor. Inclusion of a security, asset or token within a Radicle composite or any Radicle analysis is not a recommendation by Radicle to buy, sell, or hold such security, nor is it considered to be investment advice. Past performance of a composite is not an indication or guarantee of future results. All Radicle materials have been prepared solely for informational purposes based upon information generally available to the public and from sources believed to be reliable. The composite data and analysis shown do not represent the results of actual trading of investable assets/securities. Radicle maintains the composite(s) and calculates the composite levels and performance shown or discussed, but does not manage actual assets.


Published by HackerNoon on 2018/05/09