BTC Price vs. Search Interest


In this first Lambda School portfolio project we are practicing the following skills:

Description of Data

I selected the following two datasets:

The Bitcoin Historical Price Data contains 8 columns and covers the period from 2012-01-01 to 2020-12-31:

To create the Google Trends search interest file, I selected the past 5 years in the time window options and downloaded the result as a .csv file. It contains 2 columns:


Statistical Methods

My Hypothesis: Bitcoin Price and Bitcoin Search Interest are correlated.

Null hypothesis: There is no statistically significant relationship between Bitcoin Search Interest and Bitcoin Price.

Alternative hypothesis: There is a statistically significant relationship between Bitcoin Search Interest and Bitcoin Price.

A one-sample t-test and an ordinary least squares regression were performed to test the hypothesis.
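As a rough illustration of the regression step, here is a minimal sketch using statsmodels.formula.api on synthetic data. The column names `interest` and `price`, the made-up numbers, and the choice to read the slope's t-test as the hypothesis test are all assumptions for illustration, not the analysis actually run for this post.

```python
import math

import pandas as pd
import statsmodels.formula.api as smf

# Synthetic weekly data (NOT the article's data): a search interest score
# and a price that rises with interest plus a deterministic wobble.
interest = list(range(1, 53))
price = [1000 + 200 * x + 800 * math.sin(x) for x in interest]
df = pd.DataFrame({"interest": interest, "price": price})

# Ordinary least squares: price = intercept + slope * interest
fit = smf.ols("price ~ interest", data=df).fit()

alpha = 0.05
slope = fit.params["interest"]
p_value = fit.pvalues["interest"]  # t-test of H0: slope == 0
ci_low, ci_high = fit.conf_int(alpha=alpha).loc["interest"]

print(f"slope = {slope:.1f}, 95% CI = [{ci_low:.1f}, {ci_high:.1f}]")
print(f"R^2 = {fit.rsquared:.3f}, p-value = {p_value:.3g}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Calling `fit.summary()` prints the full regression table, including the coefficient, its t statistic, the p-value, and the confidence interval in one place.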

Data Wrangling

The following steps were taken to prepare the raw data:

Data Visualization

Time Series: BTC Price and Search Interest; BTC Search Interest vs. Price

We can see Bitcoin Price and Search Interest are correlated somewhat for lower prices, while they are less correlated as search interest picks up.


The statsmodels.formula.api library was used to perform the t-test.

Output for alpha = .05 and a 95% confidence interval:

BTC Search Interest vs. Price

At the alpha = .05 level, we reject the null hypothesis and conclude there is a statistically significant relationship between BTC price and Google Search Interest.

While this correlation is high for a single sentiment measure, it is too weak to support a trading strategy that uses the model to determine when BTC is overpriced or underpriced. We would most certainly lose money.

Volatility and Measurement Frequency

Historically, BTC price has been very volatile. The figure below shows five 50-week periods and the % change in weekly mean price relative to the starting price. Over 50-week periods, price changes of -70% and +1,600% were observed:
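The percent change for each window is measured against the first price in that window. A minimal sketch with made-up prices (the function name and the numbers are illustrative assumptions, not values from the dataset):

```python
def pct_change_from_start(prices):
    """Percent change of each price relative to the first price in the window."""
    start = prices[0]
    return [(p - start) * 100 / start for p in prices]


# Hypothetical weekly mean prices within one window.
window = [100, 30, 1700]
print(pct_change_from_start(window))
```

A price that falls from 100 to 30 is a -70% change, and one that rises from 100 to 1,700 is a +1,600% change.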

In this example, the price of BTC is measured at a much greater frequency than Bitcoin Search Interest. This was resolved by our resample operation that took the mean of 10,880 BTC price measurements over a day.
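A resample of this kind can be sketched with pandas as follows. The minute-level frequency, the dates, and the price values are assumptions chosen to keep the example small; only the idea of averaging many price rows into one value per period comes from the text above.

```python
import pandas as pd

# Hypothetical minute-level prices for two days (values are made up).
minutes_per_day = 24 * 60
index = pd.date_range("2020-01-01", periods=2 * minutes_per_day, freq="min")
prices = pd.Series(
    [100.0] * minutes_per_day + [200.0] * minutes_per_day,
    index=index,
    name="price",
)

# Downsample: collapse every calendar day of rows into a single mean price.
daily_mean = prices.resample("D").mean()
print(daily_mean)
```

Swapping the `"D"` rule for `"W"` would give one mean per week instead, which is the natural choice when the search interest series is weekly.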

It follows that if we compared our model's prediction of price to any real-time measurement of price, we would expect the prediction to carry an additional error term equal to the delta between the unknown real-time value and the last measurement of Google Search Interest.

Mean Daily/Weekly Price vs. BTC Search Interest

After seeing our initial R² and thinking through the implications of our resampling, I considered whether another variable in the Bitcoin dataset would correlate more strongly with search interest. Notable points of conjecture:

These led me to suspect that Volume traded (Price * Quantity) would have a stronger correlation with BTC searches and would lose less information in a resample operation that uses sum. Let's see:

Output for alpha = .05 and a 95% confidence interval:

BTC Search Interest vs. Volume Traded ($B)

Fewer outliers than Price vs. Search Interest. We reject the null again. Stronger correlation.


Our results suggest that a 1% increase in BTC Search Interest corresponds to a $250 increase in BTC price or $24 M in volume traded over the time period considered.

A 55% correlation between BTC Volume Traded and Search Interest is quite high. This may suggest that BTC price movement, relative to comparable assets, is largely driven by retail investors. This could be explored by performing the same exercise for precious metals or stocks.

My own investment thesis is that BTC is a speculative asset experiencing boom-bust cycles on top of a very interesting and growing value proposition and infrastructure. Potential follow-on items to improve the BTC price and sentiment correlation:

Finally, I am left with the following questions for further exploration as I continue through my Data Science journey:

Lambda School Student, Past: Aerospace Engineering, MBA, Strategist