Correlation Between the COVID-19 Epidemic, Bitcoin Buy Signal, and Number of Data Breaches? Well, Let’s Figure It Out.

Hesam Haddad
Apr 23, 2020

Introduction

Over the last few weeks, during the rise of the COVID-19 epidemic, we have seen a rise in the number of data breaches in Iran and other countries. I was wondering whether these two trends are related. It is easy to conclude that server and computer security shouldn't be correlated with such an epidemic. That led me to think about other reasons for the increase in leaks, and about the motives of hackers. While keeping in mind that hacking a server and publishing users' personal data is immoral and illegal, understanding the hackers' motives is worthwhile, because if we understand the patterns, their next actions become predictable.

On the other hand, the bitcoin price dropped sharply in the early days of the COVID-19 epidemic, creating a buying opportunity for those who believe that the bitcoin price will reach new highs in the following years. So I formed a hypothesis: maybe, just maybe, hackers hold on to leaked data until there is a bitcoin buy signal, and only then sell it on the internet. Why would a hacker do so? Because they are mainly paid in bitcoin, and if they sell data for $200 when bitcoin is cheap, they benefit from holding those bitcoins once the price increases.

The question may arise: why wouldn't they sell the data as soon as they gain access to a server? They could then exchange the bitcoins for dollars and buy bitcoin back whenever they want. One answer is that every time they move illegally obtained bitcoins through an exchange website, the chance of getting caught for their illegal activity increases.

Now, how can we verify this hypothesis? As a popular quote says, “Talk is cheap, show me the data”. So let’s use data analysis to check it.

Data Breaches Dataset

Next, we need to collect the data required to check how true the hypothesis is. I searched the web for a dataset and couldn’t find a well-structured one. Instead, I found a post titled “All Data Breaches in 2019 & 2020 — An Alarming Timeline”, which lists the data breaches in 2019 and 2020 with their dates.

A sample data breach entry on the selfkey.org website

With a little data cleaning, we can extract the breach dates from the article in a well-formed shape for our analysis.
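As a rough illustration of that cleaning step, here is a minimal sketch. It assumes each breach entry in the article text contains a date written like "January 17, 2019"; the real layout of the article may differ, and the function name and sample text below are made up for the example.

```python
import re
from datetime import datetime

# Assumption: dates appear in the text as "Month D, YYYY".
# The actual article may use a different format.
DATE_PATTERN = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|September|"
    r"October|November|December) \d{1,2}, \d{4}\b"
)


def extract_breach_dates(text):
    """Return every 'Month D, YYYY' date found in text as datetime.date objects."""
    return [
        datetime.strptime(match.group(0), "%B %d, %Y").date()
        for match in DATE_PATTERN.finditer(text)
    ]


# Made-up sample text, for illustration only.
sample = (
    "January 17, 2019 - Example Corp exposes customer records.\n"
    "March 3, 2020 - Another Ltd reports a database leak."
)
breach_dates = extract_breach_dates(sample)
print([d.isoformat() for d in breach_dates])
```

Parsing into `date` objects early makes the later grouping by month straightforward.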

Correlation Analysis Between Bitcoin Buy Signal and Data Breaches

First we need a metric for the bitcoin buy signal. One well-known indicator in stock and cryptocurrency trading is the Relative Strength Index, or RSI. Put simply, it produces a value between 0 and 100 from the prior prices of an asset, indicating how good a time it is to buy or sell. A lower RSI suggests it is a better time to buy, and a higher RSI suggests it is a better time to sell. Of course, trading isn't that simple and requires a lot of knowledge to do well, but RSI is a good enough approximation of a bitcoin buy signal for our analysis.
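For readers who want to see how such a value is produced, here is a minimal sketch of one common RSI variant (Wilder's smoothing). The 14-day period is the conventional default and is an assumption here, as are the function names; the exact variant behind the numbers in this post may differ.

```python
def _to_rsi(avg_gain, avg_loss):
    """Convert average gain and loss into an RSI value between 0 and 100."""
    if avg_loss == 0:
        # No losses in the window: RSI is 100. A completely flat window is
        # mapped to 50 here, which is a convention chosen for this sketch.
        return 100.0 if avg_gain > 0 else 50.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


def rsi(closes, period=14):
    """Return one RSI value per closing price from index `period` onward."""
    if len(closes) <= period:
        raise ValueError("need more than `period` closing prices")
    changes = [b - a for a, b in zip(closes, closes[1:])]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]

    # Seed with a simple average over the first `period` changes.
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    values = [_to_rsi(avg_gain, avg_loss)]

    # Wilder's smoothing for every later change.
    for gain, loss in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + gain) / period
        avg_loss = (avg_loss * (period - 1) + loss) / period
        values.append(_to_rsi(avg_gain, avg_loss))
    return values
```

A steadily falling price series drives the value toward 0 (a buy signal under the reading above), and a steadily rising one drives it toward 100.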

To make the experiment less noisy, we need to summarize the data so that short-term noise can be ignored. One way to do so is to aggregate the data over longer periods. I chose monthly intervals, and for each month calculated the median of the daily RSI and the count of data breaches. The data is shown in the chart below. The two variables appear somewhat related: sometimes, when the RSI drops and gives a buy signal, we see an increase in the number of data breaches.
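The monthly aggregation can be sketched as follows. This is only an illustration using the standard library; the helper names and the tiny sample values are made up, and treating a month with no listed breach as a count of 0 is an assumption of this sketch.

```python
from collections import Counter, defaultdict
from datetime import date
from statistics import median


def monthly_median(daily_values):
    """daily_values: iterable of (date, value). Returns {(year, month): median}."""
    buckets = defaultdict(list)
    for day, value in daily_values:
        buckets[(day.year, day.month)].append(value)
    return {month: median(values) for month, values in sorted(buckets.items())}


def monthly_count(event_dates):
    """Return {(year, month): number of events in that month}."""
    counts = Counter((d.year, d.month) for d in event_dates)
    return dict(sorted(counts.items()))


def align(medians, counts):
    """Pair each month that has a median with its event count (0 if none)."""
    months = sorted(medians)
    return months, [medians[m] for m in months], [counts.get(m, 0) for m in months]


# Made-up values, for illustration only.
daily_rsi = [
    (date(2020, 2, 1), 60.0), (date(2020, 2, 2), 50.0), (date(2020, 2, 3), 55.0),
    (date(2020, 3, 1), 30.0), (date(2020, 3, 2), 20.0),
]
breaches = [date(2020, 2, 10), date(2020, 3, 5), date(2020, 3, 20)]

months, rsi_medians, breach_counts = align(
    monthly_median(daily_rsi), monthly_count(breaches)
)
```

The two aligned lists are the inputs a correlation test needs.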

Median of RSI Index, and Number of Data Breaches per Month

After that, I used the Pearson correlation coefficient from the scipy library in Python to calculate the correlation and its p-value. This led to the following results:

Correlation: -0.12
p-value: 0.66

As we can see, the p-value is high, so we cannot conclude that the result is statistically significant. In statistical tests, a p-value is usually considered good when it is less than 0.05 or 0.01. The good news is that the correlation is negative, which is the direction we expected. But something is missing here: what if the buy signal repeats over consecutive months? The hackers will probably sell in the first month, which introduces error into the above model. In the next section, we will test a simpler model that hopefully does not have this problem.
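For reference, the Pearson test used in this section can be run with `scipy.stats.pearsonr`, which returns the correlation coefficient and a two-sided p-value. The sketch below uses made-up monthly values, not the data from this post, so its output will not match the numbers above.

```python
from scipy.stats import pearsonr

# Made-up monthly values, for illustration only (not the data from this post).
median_rsi = [62.0, 55.0, 48.0, 41.0, 35.0, 58.0]
breach_count = [3, 4, 6, 5, 9, 2]

# pearsonr returns the correlation coefficient and a two-sided p-value.
correlation, p_value = pearsonr(median_rsi, breach_count)
print(f"Correlation: {correlation:.2f}")
print(f"p-value: {p_value:.3f}")
```

The same call works for the bitcoin price series: only the first list changes.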

Correlation Analysis Between Bitcoin Price and Data Breaches

In the previous section, we tested the correlation between the bitcoin buy signal and data breaches, but that may not reflect exactly how hackers behave. They may have a simpler rule in mind: sell the data when the bitcoin price is low, no matter what the RSI or other technical analysis indicators say. If one believes in bitcoin and wants to hold it for a long period of time, that is a simple strategy that just works.

For this analysis, we calculated the monthly median of the bitcoin price and the count of data breaches per month, to check the correlation between them. The data is shown in the chart below. The two variables appear more closely related: when the price drops, we see an increase in the number of data breaches.

Median of Bitcoin Price, and Number of Data Breaches per month

So let’s again calculate Pearson correlation and p-value for this experiment.

Correlation: -0.71
p-value: 0.002

Wow! It seems that there is a strong negative correlation between the bitcoin price and the number of data breaches! With a p-value below 0.01, the result is statistically significant. So we can say with confidence that our hypothesis about the correlation between bitcoin price and data breaches is supported by this data.

Conclusion and More Thoughts

The experiments above were one part of investigating the reason behind the increase in data breaches, but as statisticians say, “Correlation doesn't imply causation.” Maybe another factor is affecting both of these measures at the same time. Keep in mind, though, that we cannot survey hackers to learn their real motives, and given the strong correlation between the two measures in the last experiment, we may assume that when the bitcoin price goes down, hackers publish data breaches more frequently in order to earn more.

Another problem with the experiments is that the Pearson correlation analysis assumes that X and Y come from a Gaussian distribution, but do the RSI or the bitcoin price come from a Gaussian distribution? We didn’t verify that. But a data scientist assuming that an unknown distribution is Gaussian is nothing new in the field. Ha-ha!
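One possible way to probe this concern, which was not part of the analysis above, is to run a normality test on each series and to compare against a rank-based correlation that does not rely on the Gaussian assumption. The sketch below uses `scipy.stats.shapiro` and `scipy.stats.spearmanr` on made-up values, so treat it as a suggestion rather than a result.

```python
from scipy.stats import shapiro, spearmanr

# Made-up monthly values, for illustration only (not the data from this post).
median_price = [9100, 8700, 7300, 6900, 8200, 9600, 10200, 7800]
breach_count = [4, 5, 9, 11, 6, 2, 3, 8]

# Shapiro-Wilk test: a small p-value suggests the sample is not Gaussian.
_, normality_p = shapiro(median_price)

# Spearman correlation works on ranks, so it does not assume normality.
rho, spearman_p = spearmanr(median_price, breach_count)

print(f"Shapiro-Wilk p-value for price: {normality_p:.3f}")
print(f"Spearman rho: {rho:.2f}, p-value: {spearman_p:.3f}")
```

With only a handful of monthly points, a normality test has little power, so agreement between the Pearson and Spearman results is probably the more useful signal.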

If you know of better experiments to test this hypothesis, or larger datasets for this purpose, please let me know in the comments.
