How to Download Transactions Data and Perform a Backtest (Python)

Part I – Install the CLI Tool

If you have not already done so, please see the installation guide for the CLI tool. The guide also walks through setting up a directory/folder structure for a clean working environment. While this is not mandatory, it will make some of the steps below easier to follow; the code below assumes that directory structure.

Once the CLI tool is installed and the directory structure is set up, move on to the next step to retrieve the data.

 

Part II – Get Data from the API

Navigate to the directory containing the unpacked contents of the installation zip file, then move into its ‘bin’ folder. This folder contains the Python scripts used to retrieve the data. Enter the following command:

cd ~/path-to-project-folder/spd-cli-0.2.0/bin

The next command retrieves the data and writes it to a CSV file that we name “merchant_daily2_chipotle.csv”. In the command below, replace “[Token]” with a valid API key (contact your 7Park sales representative to request one). The example we are using takes the following values:

Merchant: Chipotle

Date Range: 2013-01-01 to 2017-07-23

python get_merchant_daily.py --source "Merchant Intel2" --merchants Chipotle --date_from 2013-01-01 --date_to 2017-07-23 --token [Token] --format csv > ../../output/merchant_daily2_chipotle.csv
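
If the command succeeds, a quick sanity check of the output file is worthwhile before moving on. A minimal sketch using pandas, run from the same bin directory (the path mirrors the redirect target above):

import pandas as pd

# load the freshly downloaded file and eyeball its shape and first rows
df = pd.read_csv('../../output/merchant_daily2_chipotle.csv')
print(df.shape)
print(df.head())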

 

Part III – Setting Up Python Tools to Start Exploring the Data

Now that we have the data in a CSV file, let’s do some analysis.

To interact with the data, we will be using Python in this example, and more specifically Jupyter notebooks. So, let us first check whether Jupyter Notebook is installed.

The command below lists the packages installed in your Python environment; check whether jupyter is among them.

pip list

If it is, skip the following command. If not, install it with:

pip install jupyter
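
To confirm the installation succeeded, you can also check that the notebook package (the package that provides Jupyter Notebook) imports cleanly:

python -c "import notebook; print(notebook.__version__)"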

With Jupyter installed, let’s navigate to the notebooks folder we created in Part I and then launch Jupyter Notebook:

cd ../../notebooks/
jupyter notebook

Running the above opens the contents of the current directory (notebooks/) in your default browser (the folder should be empty right now). Click ‘New’ in the top-right corner and choose ‘root’ or ‘python2’ from the dropdown menu to launch a new Python notebook. A new browser tab will open, adjacent to the current one, with an empty Jupyter notebook. Now we can start writing some code and exploring the data.

Note:

To start our exploration, we will be taking advantage of Python’s libraries/packages (just as we did with Jupyter Notebook). So, our first step will be to import the relevant packages.

If you are missing a package, simply switch to a terminal window (or open a new one) and run

pip install [enter package name]

For example, if you do not have pandas installed, type the following in the terminal window:

pip install pandas

If the installation worked without errors, navigate back to the jupyter notebook in the browser and rerun the import command. This time it should work.

 

Part IV – Working with the Data

Now that you are in the Jupyter notebook in your browser, we can get to work. In the following chunks of code we will load the daily data from the exercise above into a pandas DataFrame, aggregate it into weekly and quarterly DataFrames, and finally make some simple plots.

Let’s start writing some Python code to work with the data. Follow the commands shown below:


Python code begins here:

Import relevant libraries.

If a package is not installed, refer to the guidelines above to install it.

 

import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import datetime
plt.matplotlib.style.use('ggplot')
from scipy.stats import zscore
%matplotlib inline

 

Reading in reported data by company – price and revenue values

In the code chunk that follows, we load Chipotle’s reported quarterly revenue and daily price data. To follow this example you will need a file or DataFrame with Chipotle’s quarterly reported revenue and daily adjusted close prices, along with the SMA 20, SMA 50, and SMA 200 timeseries.

After loading the data, we convert the ‘date’ column to datetime so that we can manipulate dates with pandas. We then use the ISO week column in the daily price data to pull out each week-ending date’s adjusted close price.
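
If your price file has daily adjusted close prices but not the moving-average columns, they can be derived with rolling means. A minimal sketch, assuming a price DataFrame sorted by date with a ‘price’ column (column names chosen to match the tables below):

# simple moving averages over 20-, 50- and 200-day windows
# (the first rows will be NaN until each window fills)
price = price.sort_values('date')
price['sma_20'] = price['price'].rolling(window=20).mean()
price['sma_50'] = price['price'].rolling(window=50).mean()
price['sma_200'] = price['price'].rolling(window=200).mean()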

 

# read in reported revenue and stock price data
rev = pd.read_csv("./../input/CMG_fund.csv")
price = pd.read_csv("./../input/CMG_price.csv")

# reformat the date columns, converting to date objects
rev['date'] = pd.to_datetime(rev.date)
price['date'] = pd.to_datetime(price.date)

# change column names 'revenues' to 'reported_revenue' to avoid confusion in the future:
rev.rename(columns={'revenues': 'reported_revenue'}, inplace=True)

# get week-ending adj close prices in the price data:
# for each iso week, get the max date of the week in the data
price_week_date = price.groupby('iso_week', as_index = False).agg({'date': np.max})
# join it with the daily price data to only get rows with the max date for each iso week
price_weekly = price_week_date.merge(price, how='inner', on=['iso_week','date'])
# sort values by date
price_weekly.sort_values(by='date', inplace = True)

# Looking at the data:
display('Quarterly Revenue Data:',rev.head(5),'Daily Price Data:',price.head(5), 'ISO-Week Ending Date Price Data:', price_weekly.head(5))

 

Quarterly Revenue Data:

ticker date iso_week yr_qrt_fisc ebitda reported_revenue
0 CMG 2010-03-31 2010-13 2010-1 78.305 409.686
1 CMG 2010-06-30 2010-26 2010-2 92.439 466.841
2 CMG 2010-09-30 2010-39 2010-3 95.298 476.874
3 CMG 2010-12-31 2010-52 2010-4 92.209 482.521
4 CMG 2011-03-31 2011-13 2011-1 93.315 509.384

 

Daily Price Data:

ticker date iso_week yr_qrt_fisc sma_20 sma_200 sma_50 total_return_forward_adjusted_price price
0 CMG 2010-01-04 2010-1 2010-1 88.3895 84.1880 86.3686 87.84 87.84
1 CMG 2010-01-05 2010-1 2010-1 88.5995 84.3276 86.3586 89.02 89.02
2 CMG 2010-01-06 2010-1 2010-1 88.7715 84.4450 86.4580 87.32 87.32
3 CMG 2010-01-07 2010-1 2010-1 88.8985 84.5546 86.5552 86.43 86.43
4 CMG 2010-01-08 2010-1 2010-1 89.2350 84.6868 86.7982 91.89 91.89

 

ISO-Week Ending Date Price Data:

iso_week date ticker yr_qrt_fisc sma_20 sma_200 sma_50 total_return_forward_adjusted_price price
0 2010-1 2010-01-08 CMG 2010-1 89.2350 84.6868 86.7982 91.89 91.89
11 2010-2 2010-01-15 CMG 2010-1 91.4005 85.4240 88.3260 98.06 98.06
22 2010-3 2010-01-22 CMG 2010-1 93.1985 86.0129 89.3700 97.77 97.77
33 2010-4 2010-01-29 CMG 2010-1 94.9010 86.6352 90.1878 96.46 96.46
44 2010-5 2010-02-05 CMG 2010-1 97.7285 87.0992 91.4734 95.20 95.20

 

Reading and Transforming 7PD data from our outputs above

Read the data in and see what it looks like. In this step, we also prefix the column names with ‘7P_’ to mark them as 7Park data columns, so they are not confused with the revenue and price datasets loaded above once everything is merged.

 

# Read in the data
data = pd.read_csv('../output/merchant_daily2_chipotle.csv')

# Prefix all 7Park data columns with '7P_' to avoid confusion
data.columns = ['7P_' + s for s in data.columns]

# Let's look at what this data looks like
display('Data:',data.head())
display('Data Types:', data.dtypes)

 

Data:

7P_source 7P_msa 7P_zip_code 7P_merchant 7P_date 7P_revenue 7P_volume
0 Merchant Intel2 NaN NaN Chipotle 2014-01-08 45.1913 5.4645
1 Merchant Intel2 NaN NaN Chipotle 2014-01-09 70.2659 5.4259
2 Merchant Intel2 NaN NaN Chipotle 2014-01-11 95.8801 5.3505
3 Merchant Intel2 NaN NaN Chipotle 2014-01-12 43.3280 5.3163
4 Merchant Intel2 NaN NaN Chipotle 2014-01-17 114.6626 10.1471

 

Data Types:

7P_source       object
7P_msa         float64
7P_zip_code    float64
7P_merchant     object
7P_date         object
7P_revenue     float64
7P_volume      float64
dtype: object

 

As you can see above, the datatype of the 7P_date column is ‘object’. To perform any kind of date manipulation, we need pandas to understand that it is a date column. So, in the next chunk of code, we change the data type of the date column from object to datetime. In addition, to ensure any missing days of data do not cause problems later, we create a continuous date series and merge our DataFrame onto it. Lastly, we add ISO week and calendar quarter columns to the DataFrame so we can eventually aggregate the data into weekly and quarterly timeseries.

 

# change date format from object to datetime format
data['7P_date_orig'] = pd.to_datetime(data['7P_date'])
data.drop('7P_date', axis=1, inplace=True)

# make a continuous dates dataframe to merge with data above to ensure missing dates do not impact future calculations
min_date = data['7P_date_orig'].min()
max_date = data['7P_date_orig'].max()
numdays = (max_date - min_date).days
date_list = [min_date + datetime.timedelta(days=x) for x in range(0, numdays + 1)]  # +1 so the range includes max_date
date_dict = {'7P_date': date_list}
df_dates = pd.DataFrame(date_dict)
df_dates['7P_iso_week'] = df_dates['7P_date'].apply(lambda x: str(x.isocalendar()[0]) + '-' + str(x.isocalendar()[1]))

# merge the dates column with the data dataframe
data = df_dates.merge(data, how = 'left', left_on = '7P_date',right_on = '7P_date_orig')
# dropping redundant 7P_date_orig column
data.drop('7P_date_orig', axis = 1, inplace = True)

# the ISO week column already came over from df_dates in the merge above;
# add a calendar-quarter column keyed to the ISO year (e.g. '2014-1')
data['7P_yr_qrt_fisc'] = data['7P_date'].apply(lambda x: str(x.isocalendar()[0]) + '-' + str(x.quarter))

# display the data to see
display(data.head(5))

 

7P_date 7P_iso_week 7P_source 7P_msa 7P_zip_code 7P_merchant 7P_revenue 7P_volume 7P_yr_qrt_fisc
0 2014-01-08 2014-2 Merchant Intel2 NaN NaN Chipotle 45.1913 5.4645 2014-1
1 2014-01-09 2014-2 Merchant Intel2 NaN NaN Chipotle 70.2659 5.4259 2014-1
2 2014-01-10 2014-2 NaN NaN NaN NaN NaN NaN 2014-1
3 2014-01-11 2014-2 Merchant Intel2 NaN NaN Chipotle 95.8801 5.3505 2014-1
4 2014-01-12 2014-2 Merchant Intel2 NaN NaN Chipotle 43.3280 5.3163 2014-1

 

Now that the dataframe is ready for some transformations, in this chunk we perform the following actions:

  • aggregate the series by week and quarter
  • transform the data into a weekly timeseries DataFrame
  • transform the data into a quarterly timeseries DataFrame
  • display a small snippet of the DataFrames to see what they look like

 

# define grouping columns
grp_cols = ['7P_source','7P_merchant']
grp_cols_qtr = grp_cols + ['7P_yr_qrt_fisc']
grp_cols_wk = grp_cols + ['7P_iso_week']

# group and aggregate data frame by week and quarter into new dataframes:

# quarterly data
data_qtr = data.groupby(grp_cols_qtr, as_index = False).agg({'7P_revenue' : np.sum, '7P_volume': np.sum})

# weekly data
data_wk = data.groupby(grp_cols_wk, as_index = False).agg({'7P_revenue' : np.sum, '7P_volume': np.sum, '7P_date': max})
data_wk.sort_values(by='7P_iso_week', inplace = True)  # note: ISO weeks are strings, so '2014-10' sorts before '2014-2'

# Looking at what they look like:
display('Quarterly 7PD Data:',data_qtr.head(5), 'Weekly 7PD Data:',data_wk.head(5))

 

Quarterly 7PD Data:

7P_source 7P_merchant 7P_yr_qrt_fisc 7P_revenue 7P_volume
0 Merchant Intel2 Chipotle 2014-1 542.2883 48.7693
1 Merchant Intel2 Chipotle 2014-2 730.3900 64.6565
2 Merchant Intel2 Chipotle 2014-3 999.4704 82.1070
3 Merchant Intel2 Chipotle 2014-4 1308.3422 110.2851
4 Merchant Intel2 Chipotle 2015-1 2841.7865 153.0745

 

Weekly 7PD Data:

7P_source 7P_merchant 7P_iso_week 7P_revenue 7P_date 7P_volume
0 Merchant Intel2 Chipotle 2014-10 28.5041 2014-03-07 4.2863
1 Merchant Intel2 Chipotle 2014-11 63.1915 2014-03-15 4.2553
2 Merchant Intel2 Chipotle 2014-12 41.9821 2014-03-21 4.2535
3 Merchant Intel2 Chipotle 2014-13 39.2827 2014-03-26 4.2699
4 Merchant Intel2 Chipotle 2014-14 55.7476 2014-04-06 4.3215
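
Before backtesting, a quick consistency check: every daily row falls into exactly one week and one quarter, so the weekly and quarterly revenue totals should agree (a minimal sketch):

# both aggregates are rollups of the same daily rows, so the overall totals should match
print(data_wk['7P_revenue'].sum())
print(data_qtr['7P_revenue'].sum())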

 

Backtesting Quarterly Data Against Reported Revenue Data

In this snippet of code we perform the following actions, all to build a backtest of reported revenue vs. 7Park revenue:

  • merge quarterly reported revenue timeseries with 7P quarterly aggregated timeseries
  • scale/normalize the merged DataFrame by calculating z-scores for each numeric column
  • plot the merged DataFrame to show a backtest of the revenue

 

# merge reported revenue data with quarterly aggregated 7p data
data_qtr_rev = rev.merge(data_qtr, how='inner', left_on = 'yr_qrt_fisc', right_on = '7P_yr_qrt_fisc')

# normalize revenue and volume columns by calculating z-scores on the timeseries
for col in ['7P_revenue','7P_volume','reported_revenue']:
    data_qtr_rev[col + '_z'] = zscore(data_qtr_rev[col])

# select a subset of the columns from the dataset to be charted/displayed
columns_sublist = ['ticker','date','yr_qrt_fisc','7P_revenue_z','reported_revenue_z']
sub_qtr_rev = data_qtr_rev[columns_sublist].copy()  # copy to avoid SettingWithCopyWarning
sub_qtr_rev.set_index('yr_qrt_fisc', inplace = True)

# Plot the subset
ax = sub_qtr_rev.plot(figsize=(20,8), title = 'Normalized Values Backtest: 7P Revenue vs. Reported Revenue')
ax.set_xlabel("Fiscal Year and Quarter")
ax.set_ylabel("Normalized Quarterly Values")
ax.legend(labels=['7PD Revenue','Reported Revenue'])

 

[Chart: Normalized Values Backtest: 7P Revenue vs. Reported Revenue]
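
Beyond the visual overlay, a simple way to summarize the backtest numerically is the correlation between the normalized series (a sketch, reusing the data_qtr_rev DataFrame from above):

# correlation matrix of the z-scored 7P and reported revenue series
print(data_qtr_rev[['7P_revenue_z', 'reported_revenue_z']].corr())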

 

 

Backtesting Weekly Aggregated Data Against Week-Ending Adj. Close Price MAs

In this snippet we merge the week-ending adj. close price data with the 7P weekly aggregated data to show a comparison of the two. We perform the following actions:

  • merge week-ending adj. close price timeseries with 7P weekly aggregated timeseries
  • scale/normalize the merged DataFrame by calculating z-scores for each numeric column

 

# merge week-ending price data with weekly aggregated 7PD data
data_wk_price = price_weekly.merge(data_wk, how='inner', left_on = 'iso_week', right_on = '7P_iso_week')

# normalize revenue and SMA columns by calculating z-scores (equivalent to scipy.stats.zscore)
zscore_cols_wk = ['7P_revenue','7P_volume','sma_20','sma_50','sma_200']
for col in zscore_cols_wk:
    data_wk_price[col + '_z'] = (data_wk_price[col] - data_wk_price[col].mean())/data_wk_price[col].std(ddof=0)

# select a subset of the columns from the dataset to be charted/displayed
columns_sublist = ['ticker','date','iso_week','7P_revenue_z','sma_20_z', 'sma_50_z']
sub_wk_price = data_wk_price[columns_sublist].copy()  # copy to avoid SettingWithCopyWarning

# set the date column as index
sub_wk_price.set_index('date', inplace=True)

# display
display(data_wk_price.head(5))

 

iso_week date ticker yr_qrt_fisc sma_20 sma_200 sma_50 total_return_forward_adjusted_price price 7P_source 7P_merchant 7P_iso_week 7P_revenue 7P_date 7P_volume 7P_revenue_z 7P_volume_z sma_20_z sma_50_z sma_200_z
0 2014-2 2014-01-10 CMG 2014-1 527.3105 427.1574 529.4446 538.57 538.57 Merchant Intel2 Chipotle 2014-2 254.6653 2014-01-12 21.5572 0.406181 1.913389 -0.135679 -0.138903 -1.261790
1 2014-3 2014-01-17 CMG 2014-1 531.1440 432.3198 529.2906 527.61 527.61 Merchant Intel2 Chipotle 2014-3 114.6626 2014-01-17 10.1471 -0.373535 -0.153536 -0.102250 -0.140282 -1.208697
2 2014-10 2014-03-07 CMG 2014-1 560.1190 462.8286 539.3250 593.41 593.41 Merchant Intel2 Chipotle 2014-10 28.5041 2014-03-07 4.2863 -0.853376 -1.215212 0.150421 -0.050438 -0.894924
3 2014-11 2014-03-14 CMG 2014-1 568.6570 468.1697 544.3706 575.96 575.96 Merchant Intel2 Chipotle 2014-11 63.1915 2014-03-15 4.2553 -0.660192 -1.220827 0.224875 -0.005261 -0.839993
4 2014-12 2014-03-21 CMG 2014-1 580.1110 474.1040 551.1112 611.12 611.12 Merchant Intel2 Chipotle 2014-12 41.9821 2014-03-21 4.2535 -0.778313 -1.221153 0.324757 0.055091 -0.778960

 

In this snippet we plot the final merged timeseries.

 

# plot this data (DataFrame.plot creates its own figure, so no plt.figure() call is needed)
ax = sub_wk_price.plot(figsize= (20,8),title = 'Weekly Aggregated 7PD Revenue vs. Week-ending Adj. Close Price (SMA 20 and 50), Normalized')
ax.set_xlabel("ISO Year and Week")
ax.set_ylabel("Normalized Time Series")
ax.legend(labels=['7PD Revenue','Adj. Close SMA 20','Adj. Close SMA 50'])

 

[Chart: Weekly Aggregated 7PD Revenue vs. Week-ending Adj. Close Price (SMA 20 and 50), Normalized]
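
As with the quarterly backtest, a correlation matrix gives a quick numeric summary of how the weekly 7PD revenue series tracks the price moving averages (a sketch, reusing the data_wk_price DataFrame from above):

# correlation matrix of the z-scored weekly revenue and SMA series
print(data_wk_price[['7P_revenue_z', 'sma_20_z', 'sma_50_z']].corr())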