Alternative data can easily be overlooked by us retail traders. Often this is because it is hard to obtain or comes with a large price tag attached. Instead, we usually have to make do with whatever data is available on our trading platform of choice, which frequently means sticking to OHLCV candlestick data. However, it doesn't always have to be that way! In this week's post, we will take a look at how to download historical UK Met Office climate data, which could then be used for backtesting in Backtrader or QuantConnect, tinkering in Excel, or importing into any other platform that allows you to use your own data.
Why Look at Climate Data?
On a large scale, the weather has the power to cause widespread devastation, impact supply chains and cause shortages in the supply of goods. When this happens, it will impact businesses and consumers alike. On a smaller scale, the impact of weather on our daily lives can be subtle but significant. For example, a series of wet weekends may prevent us from hitting the high street and spending our pennies. Alternatively, a sustained cold spring might delay people from purchasing newer, lighter clothes. As you can imagine, both of these factors could have quite an impact on retail businesses. In fact, declines in sales have been attributed to the weather before. Take this excerpt from an article written on the BBC in 2016 as an example:
A warm winter and a cold spring has been blamed for a fall in sales at low-cost fashion retailer Primark.
Shoppers left winter clothes on the rails in the run up to Christmas due to unusually warm weather, and a cold March and April depressed sales of summer clothes.
As a result Primark, like other stores, had to cut the price tag to sell them.
The retailer expects like-for-like sales, which ignore new-store sales, to fall 2% for the year to 17 September.
“If the weather’s warm consumers make do with clothes from the previous year,” said Maria Malone, principal lecturer for fashion business at Manchester Metropolitan University.
So with that in mind, having access to weather statistics could (keyword = could!) help you to predict future price movements.
Note that the code in this article was written specifically to download and massage the data supplied by the Met Office at the time of writing. They could change the format or stop supplying the data at any time, which would result in broken code. Unfortunately, this is just one of the perils of obtaining free data. The supplier has no obligation to continue the service and can freely change anything at will.
Copyright (c) 2019 backtest-rookies.com
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
import os

import requests
import pandas as pd
import numpy as np


def get_data(url, n=7):
    # This list will be used for storing each row of the downloaded data. We can
    # then use this list to make a dataframe later.
    data = []

    # The labels will be used for naming the columns in the pandas dataframe.
    labels = ['year', 'month', 'tmax degC', 'tmin degC', 'af days', 'rain mm', 'sun hours', 'notes']

    # Get our data
    r = requests.get(url)

    # First save the text file. It makes it a little easier to remove the
    # first 7 lines.
    with open('temp.txt', 'wb') as f:
        f.write(r.content)

    # Then open the same file and remove the lines.
    with open('temp.txt') as f, open('temp2.txt', 'w') as out:
        for x in range(n):
            next(f)  # Skip the header lines
        for line in f:
            out.write(line)

    with open('temp2.txt') as f:
        for line in f:
            # Get rid of the # marks in the data
            row = line.replace('#', '').split()
            data.append(row)

    # Create our dataframe
    df = pd.DataFrame.from_records(data, columns=labels)

    # Replace missing data with NaN
    df = df.replace('---', np.nan)

    df['datetime'] = pd.to_datetime(df['year'] + '-' + df['month'])

    # Set a datetime index - Will be useful later in backtrader
    df = df.set_index(df['datetime'])

    # Extract the columns we want
    df = df[['tmax degC', 'tmin degC', 'af days', 'rain mm', 'sun hours', 'notes']]

    # Clean up
    os.remove('temp.txt')
    os.remove('temp2.txt')

    return df


if __name__ == '__main__':
    data_url = 'https://www.metoffice.gov.uk/pub/data/weather/uk/climate/stationdata/heathrowdata.txt'
    df = get_data(data_url)
    print(df)
    df.to_csv('Weather_data.csv')
The data we want is stored in a text file and is accessible via a URL link. This makes it quite easy to obtain: a single line utilizing the requests module is all we need. The challenging part of this post is massaging the text file into a format we can work with.
For your information, the data_url in the code above is a direct link to the data we are using in this post.
Note: You can also get access to data from other stations across the UK using this link: https://www.metoffice.gov.uk/public/weather/climate-historic/#?tab=climateHistoric
If you click on the direct link, you will notice that the text file contains a few lines of information at the top describing the data. This is useful information for most people, but we don't want it. We are interested in the table of data below it.
So the challenge of the day is to remove the first n lines of the text file. In the example code, we do this by:
- Downloading & saving the text file locally
- Opening the downloaded text file and then iterating (looping) through the first n lines until we get to the start of the table
- Next, we open a second temporary file.
- Finally, we start writing each line of data from the start of the table in the first file to the second temporary text file.
Admittedly, this seems a little inefficient. If you know a cleaner way of doing this, feel free to share in the comments below!
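For what it's worth, one possible alternative (just a sketch, not the approach used in the code above) is to skip the temporary files entirely and slice the header lines off in memory after downloading:

```python
def strip_header(text, n=7):
    # Split the downloaded text into lines, drop the first n header
    # lines and return the remaining table rows.
    lines = text.splitlines()
    return lines[n:]

# Example with a made-up three-line "file": two header lines followed
# by a single (illustrative, not real) data row.
sample = "Header line 1\nHeader line 2\n1948 1 8.9 3.3"
rows = strip_header(sample, n=2)
print(rows)  # ['1948 1 8.9 3.3']
```

With this, the text from requests (r.text) could be fed straight into the list-building loop without ever touching the disk.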
Now that we have a new text file containing only the table data, we can open it and start to loop through each line. Each line of the file now corresponds to a single row in the table, which is handy. While we are looping through, we use split() to extract each word in the row and store them in a list, giving one list of strings per row.
At the same time, we also remove any instances of the # character. This mark appears in the sunshine data column and just notes that the data was taken from an automatic Kipp & Zonen sensor. If we don't remove it, it could cause issues later when we want to work with the saved data.
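As a quick illustration of that replace/split step (the row below is made up for demonstration, not real station data):

```python
# A hypothetical line as it might appear in the file, with a trailing
# '#' on the sunshine value marking an automatic sensor reading.
line = "2015 1 8.8 2.9 8 57.4 61.8#"

# Remove the '#' marks, then split on whitespace into one string per column.
row = line.replace("#", "").split()
print(row)  # ['2015', '1', '8.8', '2.9', '8', '57.4', '61.8']
```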
Once we have saved the row as a list, we can then append (add) it to another list called data that we created at the start of the get_data() function.
Now that we have a list containing many other lists (a nested list), we are ready to create a pandas dataframe! Once the dataframe is created, we clean up the data inside it by:
- Replacing any missing data (marked with ---) with np.nan. Again, this will help us when we want to start working with the data.
- Converting the year and month to a datetime object. This is not strictly required, as you could combine them into a string with the correct format. However, should you wish to work with the dataframe later, datetime objects are handier.
- Exporting only the columns we want.
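The year/month conversion step can be seen in isolation with a small, self-contained example (the values here are made up):

```python
import pandas as pd

# Year and month arrive as separate string columns; joining them with a
# '-' gives strings such as '2015-1' that pd.to_datetime can parse.
df = pd.DataFrame({'year': ['2015', '2015'], 'month': ['1', '2']})
df['datetime'] = pd.to_datetime(df['year'] + '-' + df['month'])

print(df['datetime'].dt.strftime('%Y-%m').tolist())  # ['2015-01', '2015-02']
```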
After we return the dataframe from the get_data() function, we save it to a CSV file. However, it is worth noting that you don't have to do this. You could do something else with the dataframe, such as feeding the results directly into Backtrader, or playing around calculating averages and creating charts, among other things.
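For instance, one simple thing you could do instead of (or as well as) saving a CSV is to compute yearly averages. A sketch, using a tiny made-up stand-in for the dataframe that get_data() returns (real values would come from the Met Office file):

```python
import numpy as np
import pandas as pd

# A stand-in for the real dataframe: 24 months of invented 'tmax degC'
# values on a monthly datetime index, as get_data() would produce.
idx = pd.date_range('2015-01-01', periods=24, freq='MS')
df = pd.DataFrame({'tmax degC': np.linspace(5.0, 20.0, 24)}, index=idx)

# In the real data the columns come in as strings, so coerce to numeric
# first, then average each calendar year.
tmax = pd.to_numeric(df['tmax degC'], errors='coerce')
yearly = tmax.groupby(tmax.index.year).mean()
print(yearly.round(2))
```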
On running the script, you will see that the dataframe is printed and a Weather_data.csv file will appear in the directory from which the script was run.