Pulling App Engagement Data with Avenue API (Python)

Part I – Install the CLI Tool

If you have not already done so, please see the installation guide for the CLI tool. The guide also walks through setting up a directory/folder structure for a clean work environment. While this step is not mandatory, it will make some of the steps below easier to follow, and the code below assumes that directory structure.

Once the CLI tool is installed and the directory structure is set up, move on to the next step to retrieve the data.
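For reference, the relative paths used in the commands and code below assume a project layout along these lines (the CLI folder name comes from the installation zip; the other folder names are whatever you chose during setup):

path-to-project-folder/
    spd-cli-0.2.0/
        bin/
    output/
    notebooks/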

 

Part II – Get Data from the API

Navigate to the directory containing the unpacked contents of the installation zip file and move into the 'bin' folder. This folder contains the Python scripts that read the data. Enter the following command:

cd ~/path-to-project-folder/spd-cli-0.2.0/bin

The next command retrieves the data, which we will save to a csv file named “app_usage_daily_facebook.csv”. In the command below, replace “[token]” with a valid API key (contact your 7Park sales representative to request one). The example we are using takes the following values:

App: Facebook

Date Range: 2013-01-01 to 2017-07-23

python get_app_usage_daily.py --pkgs com.facebook.katana --country_codes US --date_from 2013-01-01 --date_to 2017-07-23 --token [token] --format csv > ../../output/app_usage_daily_facebook.csv
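The same pattern works for any other supported app; only the package name (and, if you like, the date range and output file name) changes. For example, a hypothetical pull for Instagram's Android app (assuming the package name com.instagram.android is covered by your key) would look like:

python get_app_usage_daily.py --pkgs com.instagram.android --country_codes US --date_from 2013-01-01 --date_to 2017-07-23 --token [token] --format csv > ../../output/app_usage_daily_instagram.csv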

 

Part III – Setting Up Python Tools to Start Exploring the Data

Now that we have the data in a csv, let’s do some analysis.

To interact with the data, we will be using Python in this example, and more specifically Jupyter notebooks. So, let's check whether Jupyter Notebook is installed in our Python environment.

The command below lists the packages installed for your Python environment; check whether jupyter is in the list.

pip list

If it is, skip the following command. If not, install it with:

pip install jupyter
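If your machine has more than one Python installation, pip may not point at the interpreter you plan to use; an equivalent, more explicit form (assuming python is the interpreter you want) is:

python -m pip install jupyter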

Now that Jupyter is installed, let's navigate to the notebooks folder we created in Part I and then launch Jupyter Notebook:

cd ../../notebooks/
jupyter notebook

Running the commands above opens the contents of the current directory (/notebooks) in your default browser (the folder should be empty right now). Click 'New' in the top-right corner to launch a new Python notebook and choose 'root' or 'python2' from the dropdown menu. A new tab will open next to the current one with a fresh Jupyter notebook. Now we can start writing some code and exploring the data.

Note:

To start our exploration, we will be taking advantage of Python's libraries/packages (just as we did with Jupyter Notebook). So, our first step will be to import the relevant packages in Python.

If you do not have a package, simply navigate to the terminal window (or open a new terminal window) and run

pip install [package name]

For example, if you do not have pandas installed as a library, in the terminal window type:

pip install pandas

If the installation worked without errors, navigate back to the jupyter notebook in the browser and rerun the import command. This time it should work.
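Alternatively, you can usually install a missing package without leaving the notebook: prefixing a command with ! in a cell sends it to the shell. For example:

!pip install pandas

Then rerun the import cell so the new package is picked up.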

 

Part IV – Working with the Data

Now that you are in the Jupyter notebook in your browser, we can start working with the data.

In the following chunks of code we will load the daily data from the exercise above into a pandas DataFrame, aggregate it into a weekly DataFrame, and finally make some simple plots with it.

Let’s start writing some Python code to work with the data. Follow the commands shown below:

 


 

Python code begins here:

Import relevant libraries.

If a package is not installed, refer to the guidelines above to install it.

 

import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import datetime
plt.matplotlib.style.use('ggplot')
from scipy.stats import zscore
%matplotlib inline

 


 

Reading and transforming the 7Park data from our output above

Read the data in and see what it looks like. In this step, we also prefix the column names with ‘7P’ to show they are 7Park data columns. While not so important here, merging with data sets from other sources that have similar column names can often cause confusion.

 

# Read in the data
data = pd.read_csv('../output/app_usage_daily_facebook.csv')

# Add 7P prefix to all 7Park data DataFrames to avoid confusion
data.columns = ['7P_' + s for s in data.columns]

# Let's look at what this data looks like
display('Data:',data.head())
display('Data Types:', data.dtypes)

 

Data:

7P_pkg 7P_name 7P_developer 7P_country_code 7P_date 7P_install_base 7P_active_users 7P_aupib 7P_sessions 7P_spau 7P_megabytes 7P_mbpau 7P_mbps 7P_minutes 7P_mpau 7P_mps
0 com.facebook.katana Facebook Facebook US 2014-03-30 78.2668 56.7322 72.4857 6.9621 12.2719 10.2482 18.0642 1.4720 19.8914 35.0618 2.8571
1 com.facebook.katana Facebook Facebook US 2014-03-31 78.1693 56.6543 72.4763 6.7816 11.9701 8.7023 15.3603 1.2832 18.0353 31.8340 2.6595
2 com.facebook.katana Facebook Facebook US 2014-04-01 78.0438 56.5402 72.4467 6.8353 12.0893 8.5920 15.1962 1.2570 18.2873 32.3439 2.6754
3 com.facebook.katana Facebook Facebook US 2014-04-02 78.1578 56.5367 72.3366 6.8058 12.0379 8.7734 15.5181 1.2891 18.1305 32.0685 2.6640
4 com.facebook.katana Facebook Facebook US 2014-04-03 78.0942 56.3716 72.1841 6.8930 12.2277 8.9171 15.8184 1.2936 18.3423 32.5381 2.6610

 

Data Types:

7P_pkg              object
7P_name             object
7P_developer        object
7P_country_code     object
7P_date             object
7P_install_base    float64
7P_active_users    float64
7P_aupib           float64
7P_sessions        float64
7P_spau            float64
7P_megabytes       float64
7P_mbpau           float64
7P_mbps            float64
7P_minutes         float64
7P_mpau            float64
7P_mps             float64
dtype: object

 

As you can see above, the datatype of the 7P_date column is ‘object’. To perform any kind of date manipulation, we need pandas to understand that it is a date column. So, in the next chunk of code, we change the data type of the date column in the dataframe above from object to datetime. In addition, to ensure any missing days of data do not cause problems later, we create a continuous date series and merge our dataframe onto it. Lastly, we add an ISO week column to the dataframe so we can eventually aggregate the data into a weekly timeseries.

 

# change date format from object to datetime format
data['7P_date_reformatted'] = pd.to_datetime(data['7P_date'])
# drop the 'object' type column as it is now redundant
data.drop('7P_date', axis=1, inplace=True)

# make a continuous dates dataframe to merge with data above to ensure missing dates do not impact future calculations
# get bounds for this date range from the data
min_date = data['7P_date_reformatted'].min()
max_date = data['7P_date_reformatted'].max()
# calculate the num of days between the min and max date
numdays = (max_date - min_date).days
# create a list of dates (numdays + 1 so the final/max date is included)
date_list = [min_date + datetime.timedelta(days=x) for x in range(numdays + 1)]
# append list to a dictionary object
date_dict = {'7P_date': date_list}
# convert dictionary object to dataframe by feeding it to Pandas DataFrame() function
df_dates = pd.DataFrame(date_dict)
# add ISO Week column to the dates dataframe (zero-pad the week number so string sorting keeps weeks in calendar order)
df_dates['7P_iso_week'] = df_dates['7P_date'].apply(lambda x: str(x.isocalendar()[0]) + '-' + str(x.isocalendar()[1]).zfill(2))

# merge the dates column with the data dataframe
data = df_dates.merge(data, how = 'left', left_on = '7P_date',right_on = '7P_date_reformatted')
# drop redundant 7P_date_reformatted column
data.drop('7P_date_reformatted', axis = 1, inplace = True)


# display the data to see
display(data.head(5))

 

7P_date 7P_iso_week 7P_pkg 7P_name 7P_developer 7P_country_code 7P_install_base 7P_active_users 7P_aupib 7P_sessions 7P_spau 7P_megabytes 7P_mbpau 7P_mbps 7P_minutes 7P_mpau 7P_mps
0 2014-03-30 2014-13 com.facebook.katana Facebook Facebook US 78.2668 56.7322 72.4857 6.9621 12.2719 10.2482 18.0642 1.4720 19.8914 35.0618 2.8571
1 2014-03-31 2014-14 com.facebook.katana Facebook Facebook US 78.1693 56.6543 72.4763 6.7816 11.9701 8.7023 15.3603 1.2832 18.0353 31.8340 2.6595
2 2014-04-01 2014-14 com.facebook.katana Facebook Facebook US 78.0438 56.5402 72.4467 6.8353 12.0893 8.5920 15.1962 1.2570 18.2873 32.3439 2.6754
3 2014-04-02 2014-14 com.facebook.katana Facebook Facebook US 78.1578 56.5367 72.3366 6.8058 12.0379 8.7734 15.5181 1.2891 18.1305 32.0685 2.6640
4 2014-04-03 2014-14 com.facebook.katana Facebook Facebook US 78.0942 56.3716 72.1841 6.8930 12.2277 8.9171 15.8184 1.2936 18.3423 32.5381 2.6610
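As a quick optional sanity check, we can count how many calendar days in the range had no data; after the left merge those rows have NaN in every 7Park column:

# count calendar days in the range with no 7Park data (all-NaN rows from the left merge)
missing_days = data['7P_pkg'].isnull().sum()
print('days with no data:', missing_days)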

 

Now that the dataframe is ready for some transformations, in this step we aggregate the daily series into a weekly timeseries. Then, we scale/normalize the weekly dataframe by calculating z-scores for each numeric column. Finally, we display a small snippet of the dataframe to see what it looks like.

 

# define grouping columns
grp_cols = ['7P_name','7P_country_code']
# grp_cols_qtr = grp_cols + ['7P_yr_qrt_fisc']
grp_cols_wk = grp_cols + ['7P_iso_week']

# aggregate daily data to weekly
data_wk = data.groupby(grp_cols_wk, as_index = False).agg({'7P_active_users': np.sum,
                                                           '7P_sessions': np.sum,
                                                           '7P_minutes': np.sum,
                                                           '7P_spau': np.mean,
                                                           '7P_mpau': np.mean})

# normalize the usage metric columns by calculating z-scores on each timeseries
zscore_cols_wk = ['7P_active_users','7P_sessions','7P_minutes','7P_spau','7P_mpau']
for col in zscore_cols_wk:
    data_wk[col + '_z'] = (data_wk[col] - data_wk[col].mean())/data_wk[col].std(ddof=0)

# sort values in the dataframe by iso week
data_wk.sort_values(by='7P_iso_week', inplace = True)
# set iso week as the index for the dataframe
data_wk.set_index('7P_iso_week', inplace = True)
# drop incomplete week
data_wk.drop('2014-13',inplace = True)

# Look at what this data looks like:
display('Weekly Data:',data_wk.head(5))

 

Weekly Data:

7P_name 7P_country_code 7P_active_users 7P_minutes 7P_spau 7P_sessions 7P_mpau 7P_active_users_z 7P_sessions_z 7P_minutes_z 7P_spau_z 7P_mpau_z
7P_iso_week
2014-14 Facebook US 395.8695 128.9065 12.060829 47.7448 32.563257 0.438558 -0.378053 -2.066635 -2.507992 -1.989965
2014-15 Facebook US 395.8762 126.9933 11.957271 47.3354 32.077343 0.438752 -0.466155 -2.150973 -2.784557 -2.059513
2014-16 Facebook US 398.0691 127.9509 12.077529 48.0773 32.137057 0.502206 -0.306499 -2.108760 -2.463392 -2.050967
2014-17 Facebook US 398.6219 130.3051 12.065857 48.1066 32.675571 0.518201 -0.300194 -2.004982 -2.494562 -1.973889
2014-18 Facebook US 401.4398 134.3141 12.294571 49.3564 33.456686 0.599740 -0.031239 -1.828256 -1.883746 -1.862089
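As a side note, the manual z-score loop above is equivalent to using scipy.stats.zscore, which we already imported; its default ddof=0 matches the calculation we wrote by hand:

# equivalent normalization using scipy.stats.zscore (default ddof=0)
for col in zscore_cols_wk:
    data_wk[col + '_z'] = zscore(data_wk[col])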

 


 

Plotting the weekly time series

 

# plot this data
# DataFrame.plot creates its own figure, so a separate plt.figure() call is not needed
ax = data_wk[['7P_active_users_z','7P_sessions_z','7P_minutes_z']].plot(figsize = (20,8), title = 'Weekly Sum of Metrics, Normalized')
ax.set_xlabel("ISO Year and Week")
ax.set_ylabel("Index")
ax.legend(labels=['Active Users','Sessions','Minutes'])

 

[Plot: Weekly Sum of Metrics, Normalized — active users, sessions, and minutes (z-scores) by ISO year and week]

 

 

# plot this data
# again, DataFrame.plot creates its own figure
ax = data_wk[['7P_spau_z','7P_mpau_z']].plot(figsize = (20,8), title = 'Weekly Average of Per User Metrics, Normalized')
ax.set_xlabel("ISO Year and Week")
ax.set_ylabel("Index")
ax.legend(labels=['Sessions per Active User','Minutes per Active User'])

 

[Plot: Weekly Average of Per User Metrics, Normalized — sessions per active user and minutes per active user (z-scores) by ISO year and week]
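Finally, if you would like to keep copies of these charts outside the notebook, the figure behind each plot can be written to disk; the file name below is just an example:

# save the most recent chart to the output folder (png, pdf, svg, etc. all work)
ax.get_figure().savefig('../output/weekly_per_user_metrics_normalized.png', bbox_inches='tight')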