How to Download Transactions Data and Perform a Backtest (R)

Overview

Note: to complete this example you will need to install the AvenueAPI client for R.

In this example we download data for Chipotle using 7Park’s “Spending Intel” credit card panel. These metrics are later joined with publicly available revenue estimates to illustrate a backtest procedure. To replicate this example you will need a valid 7ParkData API key (contact your 7ParkData sales representative to request a key).

 

Pulling Credit Card Data from the Avenue API

To start, we use the fetch_revenue_series function to quickly download and restructure a credit card series, saving the JSON return as well as two dataframes in both “long” and “wide” formats. We can retrieve the raw JSON object and save it as a structured list in R with the following code:

 

# Load the AvenueAPI client
library(AvenueAPI)

# Store your API key (please contact your sales representative if you do not have a key)
# ave_key <- 'YOUR_KEY_HERE' 

# This initiates an S4 Class object with associated methods for calling the API:
ave <- connect_avenue(api_key=ave_key) 

# Retrieve data for Chipotle, saving the JSON return in the global environment.
chip <- fetch_revenue_series(ave, firm="Chipotle", data_source="cc", start_date='2015-01-01’)

 

The above commands will create a structured data object for Chipotle in your working environment (in this example, within the object named “chip”). The “chip” object is a nested list and contains, in addition to the data itself, information about the API call, any success/failure messages, and a summary of available metrics. While this information provides a helpful orientation to the scope of the data object returned by the API, long and wide data formats are typically easier to work with for quick plotting and statistical analysis. With this in mind, we provide a generic function – transform_avenue_series – to convert the nested JSON return into a long or wide dataframe. These transformations are quickly performed with the following code:

 

# Transform the JSON return into long and wide formats to facilitate quick analysis:
chiplong <- transform_avenue_series(chip)
chipwide <- transform_avenue_series(chip, wide = TRUE) 

 

Aggregating the Dataframe

Having downloaded data for Chipotle, we now aggregate the “wide” dataframe by quarter in order to merge the 7Park parameter estimates with publicly reported revenues.

 

# First, examine the topmost rows of dataset: 
head(chipwide) 

 

##         date   revenue   volume     firm
## 1 2015-01-01  166.3541  16.2890 Chipotle
## 2 2015-01-02 2035.3276 153.2186 Chipotle
## 3 2015-01-03 1197.2990  84.2998 Chipotle
## 4 2015-01-04 1475.9870  86.3647 Chipotle
## 5 2015-01-05 5039.9666 372.4294 Chipotle
## 6 2015-01-06 2059.3876 189.3025 Chipotle

 

Note that, once transformed, data are properly typed by the AvenueAPI package:

 

str(chipwide)

 

## 'data.frame':    939 obs. of  4 variables:
##  $ date   : Date, format: "2015-01-01" "2015-01-02" ...
##  $ revenue: num  166 2035 1197 1476 5040 ...
##  $ volume : num  16.3 153.2 84.3 86.4 372.4 ...
##  $ firm   : chr  "Chipotle" "Chipotle" "Chipotle" "Chipotle" ...

 

Because Chipotle follows a standard fiscal year, we can create a simple calendar quarter key using lubridate:

 

library(lubridate)
chipwide$yr_qrt_fisc <- paste(year(chipwide$date), quarter(chipwide$date), sep='-')

 

Next, we use this key to aggregate the data by quarter with dplyr, using the sum of 7Park’s daily revenue estimate:

 

chipqrt <- 
    filter(chipwide, year(date) < 2017) %>% 
    arrange(date) %>%
    group_by(yr_qrt_fisc) %>% 
    summarize(firm=first(firm),
              revenue=sum(revenue, na.rm=TRUE),
              qrt_end <- last(date)
    )

# This aggregation returns a simple dataframe:
head(chipqrt)

 

## # A tibble: 6 x 4
##   yr_qrt_fisc     firm  revenue `qrt_end <- last(date)`
##                                   
## 1      2015-1 Chipotle 291610.0              2015-03-31
## 2      2015-2 Chipotle 350226.0              2015-06-30
## 3      2015-3 Chipotle 367680.2              2015-09-30
## 4      2015-4 Chipotle 317503.1              2015-12-31
## 5      2016-1 Chipotle 250835.8              2016-03-31
## 6      2016-2 Chipotle 304471.4              2016-06-30

 

Merging the Revenue Series

In the code chunk that follows, we merge the aggregated 7Park series with public revenue estimates for CHG. To follow this example you will need a dataframe with Chipotle’s quarterly revenue. Here, we use the following data structure for 10q revenues:

 

head(chip10q)

 

##   ticker yr_qrt_fisc  ebitda revenues
## 1    CMG      2010-1  78.305  409.686
## 2    CMG      2010-2  92.439  466.841
## 3    CMG      2010-3  95.298  476.874
## 4    CMG      2010-4  92.209  482.521
## 5    CMG      2011-1  93.315  509.384
## 6    CMG      2011-2 102.368  571.561

 

After loading Chipotle’s public revenue estimates into your R session, joining the data is easily accomplished with the following code:

 

# Join the records
btest <- left_join(chipqrt, chip10q, by = "yr_qrt_fisc") %>% arrange

 

Chipotle Revenue Backtest

With our files now merged, R’s lm function provides a straightfoward method for computing an R2 value:

 

# Perform a basic regression of the two series:
summary(lm(revenues ~ revenue, data=btest))

 

## 
## Call:
## lm(formula = revenues ~ revenue, data = btest)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -61.14 -23.13 -19.24  12.20 110.25 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 7.956e+01  1.924e+02   0.413  0.69362   
## revenue     3.084e-03  6.076e-04   5.075  0.00228 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 57.09 on 6 degrees of freedom
## Multiple R-squared:  0.8111, Adjusted R-squared:  0.7796 
## F-statistic: 25.76 on 1 and 6 DF,  p-value: 0.002276

 

Similarly, we can compute a comparable R2 estimate for quarter-over-quarter percent change with the following code:

 

# Calculate Q-over-Q change using a 1-quarter lag for both series:
btest$revenues_10q_QQ <- (btest$revenues - lag(btest$revenues, 1)) / lag(btest$revenues, 1)
btest$revenues_7pd_QQ <- (btest$revenue  - lag(btest$revenue, 1))  / lag(btest$revenue,  1)

# Perform a basic regression of the Q-over-Q variables:
summary(lm(revenues_10q_QQ ~ revenues_7pd_QQ, data=btest))

 

## 
## Call:
## lm(formula = revenues_10q_QQ ~ revenues_7pd_QQ, data = btest)
## 
## Residuals:
##         2         3         4         5         6         7         8 
## -0.047077 -0.005946 -0.048092  0.029528  0.038915  0.024153  0.008519 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -0.01932    0.01494  -1.293 0.252550    
## revenues_7pd_QQ  0.82707    0.10104   8.185 0.000443 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.039 on 5 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.9306, Adjusted R-squared:  0.9167 
## F-statistic:    67 on 1 and 5 DF,  p-value: 0.0004427

 

Comparing the two series, we find an adjusted R-squared value of 0.7796 for the raw values and 0.9167 for quarter-over-quarter percent change.

 

Joining Weekly Stock Data with the 7Park Revenue Estimate

Following the basic pattern of to the revenue backtest, joining stock data can be quickly accomplished with dplyr and lubridate. Here, you will need stock closing price data for Chipotle to replicate the example in your local R environment (if price data are not otherwise available to you, quantmod’s getSymbols function can be used to retrieve daily adjusted close values directly within R). The following data structure for the price data is used in this example:

 

head(chipprice)

 

##         date adj_close
## 1 2010-01-04     87.84
## 2 2010-01-05     89.02
## 3 2010-01-06     87.32
## 4 2010-01-07     86.43
## 5 2010-01-08     91.89
## 6 2010-01-11     96.77

 

In addition to adjusted closing price, we also calculate the 20-, 50-, and 200-day SMA for CHG using TTR:

 

# Use TTR to calculate various SMA values
library(TTR)
chipprice <- arrange(chipprice, date) # Ensure row order
chipprice$sma20 <- as.vector(TTR::SMA(chipprice$adj_close,  n = 20))
chipprice$sma50 <- as.vector(TTR::SMA(chipprice$adj_close,  n = 50))
chipprice$sma200 <- as.vector(TTR::SMA(chipprice$adj_close, n = 200))

 

With the price data loaded into memory, we join these records to the 7Park series using the following code:

 

# Note that the 7Park revenue and volume metrics are generally available on non-trading days.
days <- left_join(chipwide, chipprice, by = "date") %>% arrange(date)

# Next, caculate an indicator for the isoweek using the date vector:
days$iso_week <- paste(isoyear(days$date), isoweek(days$date), sep='-')

# Finally, aggregate the combined series using isoweek:
wtest <- arrange(days, date) %>%
    group_by(iso_week) %>% 
    summarize(firm=first(firm),
              revenue=sum(revenue, na.rm=TRUE),
              volume=sum(volume, na.rm=TRUE),
              # Note: these calls to "last" take the final non-missing value for each isoweek
              # Alternatively, you may also take the weekly mean using, e.g.: 
              #    adj_close_wk_end=mean(adj_close, na.rm=TRUE)
              adj_close_wk_end =dplyr::last(adj_close[which( !is.na(adj_close))]),
              price_last_sma20 =dplyr::last(sma20[which( !is.na(sma20))]),
              price_last_sma50 =dplyr::last(sma50[which( !is.na(sma50))]),
              price_last_sma200=dplyr::last(sma200[which(!is.na(sma200))]),
              date_end=max(date) ) %>% 
    arrange(date_end) %>%
    filter(date_end <= '2017-05-31', date_end > '2015-05-31')

 

Plotting the Results

Using the weekly data, we are now ready to build a chart with ggplot2 that combines 7Park’s revenue series and our various stock metrics. In the code below we also indicate, using dashed vertical lines, several important news events for CHG:

 

library(ggplot2)
ggplot(data = wtest, aes(x = date_end, y = scale(volume))) +
    annotate("text", x=as.Date("2016-05-20"), y=-3, 
             label="Closed > 2000 locations for training", colour = "blue") +
    annotate("text", x=as.Date("2015-07-10"), y=-3, 
             label="Norovirus", colour = "red") +
    annotate("text", x=as.Date("2015-10-10"), y=-3, 
             label="E Coli", colour = "green") +
    theme(plot.title = element_text(hjust = 0.5)) +
    geom_point(size=1, alpha = 0) +
    geom_smooth(aes(y = scale(revenue), colour = "trend"), se = FALSE, method = 'loess') +
    geom_line(aes(y = scale(revenue), colour = "revenue")) +
    geom_line(aes(y = scale(adj_close_wk_end), colour = "adj_close_wk_end")) +
    geom_line(aes(y = scale(price_last_sma20), colour = "price_last_sma20")) +
    geom_vline(xintercept = as.numeric(as.Date("2015-08-15")), linetype = 4, colour = "red") +
    geom_vline(xintercept = as.numeric(as.Date("2015-12-01")), linetype = 4, colour = "red") +
    geom_vline(xintercept = as.numeric(as.Date("2017-07-15")), linetype = 4, colour = "red") +
    geom_vline(xintercept = as.numeric(as.Date("2015-10-30")), linetype = 4, colour = "green") +
    geom_vline(xintercept = as.numeric(as.Date("2015-11-30")), linetype = 4, colour = "green") +
    geom_vline(xintercept = as.numeric(as.Date("2016-02-15")), linetype = 4, colour = "blue") +
    ylab("Normalized Values") + xlab("") +
    ggtitle("Chipotle: 2016 Norovirus (red) & E. Coli (green) outbreaks plotted with various indicators") +
    scale_color_hue(labels=c("Adjusted Close", 
                             "Adjusted Close (SMA20)", 
                             "7Park Revenue",  
                             "7Park Revenue (Trend)")) +
    theme(axis.text.x = element_text(angle = 45, hjust=1), legend.title = element_blank()) +
    xlab("Date") +
    scale_x_date(date_breaks = "1 month", date_labels = "%b %y")