OHLCV candlestick data. However, it doesn't always have to be that way! In this week's post, we will take a look at how to download historical UK Met Office climate data, which could then be used for backtesting in Backtrader or QuantConnect, tinkering in Excel, or importing into any other platform that allows you to use your own data.
Why Look at Climate Data?

On a large scale, the weather has the power to cause widespread devastation, disrupt supply chains and cause shortages in the supply of goods. When this happens, it impacts businesses and consumers alike. On a smaller scale, the impact of weather on our daily lives can be subtle but significant. For example, a series of wet weekends may prevent us from hitting the high street and spending our pennies. Alternatively, a sustained cold spring might delay people from purchasing newer, lighter clothes. As you can imagine, both of these factors could have quite an impact on retail businesses. In fact, the weather has been blamed for a decline in sales before. Take this excerpt from a BBC article written in 2016 as an example:
A warm winter and a cold spring has been blamed for a fall in sales at low-cost fashion retailer Primark. Shoppers left winter clothes on the rails in the run up to Christmas due to unusually warm weather, and a cold March and April depressed sales of summer clothes. As a result Primark, like other stores, had to cut the price tag to sell them. The retailer expects like-for-like sales, which ignore new-store sales, to fall 2% for the year to 17 September. “If the weather’s warm consumers make do with clothes from the previous year,” said Maria Malone, principal lecturer for fashion business at Manchester Metropolitan University.

Reference: https://www.bbc.com/news/business-37336511

So with that in mind, having access to weather statistics could (keyword = could!) help you to predict future price movements.
The Code

Note that the code in this article was written specifically to download and massage the data supplied by the Met Office at the time of writing. They could change the format or stop supplying the data at any time, which would result in broken code. Unfortunately, this is just one of the perils of obtaining free data. The supplier has no obligation to continue the service and can freely change anything at will.
```python
'''
Author: www.backtest-rookies.com

MIT License

Copyright (c) 2019 backtest-rookies.com

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
'''
import pandas as pd
import numpy as np
import csv
import requests
import os


def get_data(url, n=7):

    # This list will be used for storing each row of the downloaded data. We
    # can then use this list to make a dataframe later.
    data = []

    # The labels will be used for naming the columns in the pandas dataframe.
    labels = ['year', 'month', 'tmax degC', 'tmin degC', 'af days',
              'rain mm', 'sun hours', 'notes']

    # Get our data
    r = requests.get(url)

    # First save the text file. It makes it a little easier to remove the
    # first 7 lines.
    with open('temp.txt', 'wb') as f:
        f.write(r.content)

    # Then open the same file and remove the lines.
    with open('temp.txt') as f, open('temp2.txt', 'w') as out:
        for x in range(n):
            next(f)
        for line in f:
            out.write(line)

    with open('temp2.txt') as f:
        for line in f:
            # Get rid of the # marks in the data
            row = line.replace('#', '').split()
            data.append(row)

    # Create our dataframe
    df = pd.DataFrame.from_records(data, columns=labels)

    # Replace missing data with NaN
    df = df.replace('---', np.nan)

    df['datetime'] = pd.to_datetime(df['year'] + '-' + df['month'])

    # Set a datetime index - Will be useful later in backtrader
    df = df.set_index(df['datetime'])

    # Extract the columns we want
    df = df[['tmax degC', 'tmin degC', 'af days', 'rain mm', 'sun hours',
             'notes']]

    # Clean up
    os.remove('temp.txt')
    os.remove('temp2.txt')

    return df


if __name__ == '__main__':

    data_url = 'https://www.metoffice.gov.uk/pub/data/weather/uk/climate/stationdata/heathrowdata.txt'

    df = get_data(data_url)
    print(df)
    df.to_csv('Weather_data.csv')
```
Code Commentary

The data we want is stored in a text file and is accessible via a URL link. This makes it quite easy to obtain: a single line utilizing the requests module is all we need. The challenging work of this post is massaging the text file into a format we can work with. For your information, the link below is a direct link to the data we are using in this post.

Met Office Data: https://www.metoffice.gov.uk/pub/data/weather/uk/climate/stationdata/heathrowdata.txt

Note: You can also get access to data from other stations across the UK using this link: https://www.metoffice.gov.uk/public/weather/climate-historic/#?tab=climateHistoric

If you click on the direct link to the text file, you will notice that it contains a few lines of information at the top describing the data. This is useful information for most people, but we don't want it. We are interested in the table of data below it. So the challenge of the day is to remove the first x lines of the text file. In the example code, we do this by:
- Downloading & saving the text file locally.
- Opening the downloaded text file and then iterating (looping) through the first x lines until we get to the start of the table.
- Next, we open a second temporary file.
- Finally, we write each line of data from the start of the table in the first file to the second temporary text file.
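The header-stripping steps above can be sketched in isolation, without the download. This is a minimal illustration, not the article's code: the helper name and the sample text are made up, and a small in-memory string stands in for the Met Office file.

```python
import io


def strip_header(text, n=7):
    """Drop the first n lines of a block of text and return the rest.

    Mirrors the loop in get_data(): advance the line iterator past the
    header, then keep everything that follows.
    """
    f = io.StringIO(text)
    for _ in range(n):
        next(f)          # skip one header line
    return f.read()      # everything after the header


# Illustrative stand-in for the real file: 7 header lines, then one data row.
sample = "\n".join([f"header {i}" for i in range(7)] + ["1948   1   8.9   3.3"])
print(strip_header(sample))  # -> 1948   1   8.9   3.3
```

The same idea could also be written with itertools.islice, but the explicit loop matches the structure used in the article's code.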
From there, we use split() to extract each word in the row and store them in a list, so that each printed list contains the values from a single row of the table. At the same time, we also remove any instances of the # character. This mark appears in the sunshine data column and just notes that the data was taken from an automatic Kipp & Zonen sensor. If we don't remove it, it could cause issues later when we want to work with the saved data. Once we have saved the row as a list, we can then append (add) it to another list called data that we created at the start of the get_data() function. Now that we have a list containing many other lists (a nested list), we are ready to create a pandas dataframe! Once the dataframe is created, we clean up the data inside it by:
- Replacing any missing values (marked as --- in the source file) with np.nan. Again, this will help us when we want to start working with the data.
- Converting the year and month to a datetime object. This is not strictly required, as you could combine them into a string with the correct format. However, should you wish to work with the dataframe later, datetime objects are handier.
- Exporting only the columns we want.
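Those three cleanup steps can be seen in miniature below. The two hard-coded rows are illustrative stand-ins for the parsed Met Office data (a subset of the real columns), not real downloaded values:

```python
import pandas as pd
import numpy as np

# Two illustrative rows in the shape produced by the parsing loop.
data = [['1948', '1', '8.9', '3.3', '---', '85.0'],
        ['1948', '2', '7.9', '2.2', '---', '26.0']]
labels = ['year', 'month', 'tmax degC', 'tmin degC', 'af days', 'rain mm']

df = pd.DataFrame.from_records(data, columns=labels)

# Step 1: replace the '---' missing-data marker with NaN.
df = df.replace('---', np.nan)

# Step 2: combine year and month into a proper datetime index.
df['datetime'] = pd.to_datetime(df['year'] + '-' + df['month'])
df = df.set_index('datetime')

# Step 3: keep only the columns we care about.
df = df[['tmax degC', 'tmin degC', 'af days', 'rain mm']]
print(df)
```

Note that pd.to_datetime happily parses strings like '1948-1', so no explicit format string is needed here.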
Once the dataframe is returned from the get_data() function, we save it to a CSV file. However, it is worth noting that you don't have to do this. You could do something else with the dataframe, such as feeding the results directly into Backtrader, or playing around calculating averages and creating charts, among other things.
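As a quick sketch of one of those "something else" options, here is how you might read the saved CSV back and calculate some simple statistics. The tiny dataframe written at the top is an illustrative stand-in for the real Weather_data.csv, and the example statistics are my own choices, not from the article:

```python
import pandas as pd

# Illustrative stand-in for the real Weather_data.csv produced by the script.
sample = pd.DataFrame(
    {'tmax degC': [8.9, 7.9, 14.2], 'rain mm': [85.0, 26.0, 14.0]},
    index=pd.to_datetime(['1948-01-01', '1948-02-01', '1948-03-01']))
sample.index.name = 'datetime'
sample.to_csv('Weather_data.csv')

# Read it back the same way you would the real file, restoring the
# datetime index that get_data() set up.
df = pd.read_csv('Weather_data.csv', index_col='datetime', parse_dates=True)

# Example calculations: mean monthly rainfall, and the warmest month
# on record by maximum temperature.
print(df['rain mm'].mean())
print(df['tmax degC'].idxmax())
```

Because the index is parsed as datetimes, resampling to quarterly or yearly averages with df.resample() would also be a natural next step.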
Results

On running the script, you will see that the dataframe is printed and a Weather_data.csv file will appear in the directory from which the script was run.
Find This Post Useful?
If this post saved you time and effort, please consider supporting the site! There are many ways to support us, and some won't even cost you a penny.