AdX (Google Ad Exchange) Reporting API – Part 1
October 14, 2024 | by Pythonister Mister
Hello stranger Pythonist and welcome! In this series of articles, we will go over the Google Ad Exchange (commonly known as AdX) reporting API using the Python programming language. Let's start with understanding what AdX is and what purpose it serves.
AdX is Google's real-time auction digital marketplace where publishers sell their ad inventory and advertisers bid to show their ads. To visualize how it operates, think of a stock exchange: advertisers compete in real time to display their ads on various publisher websites. The platform allows publishers to sell ad space programmatically through real-time bidding, private auctions, or direct deals. It is aimed specifically at premium publishers who have significant amounts of high-quality ad inventory and want to maximize their programmatic ad revenue.
Now that we have a basic understanding of what AdX is, we can start reviewing the actual code. As I mentioned in the beginning, this will be a series of articles, and we will be going over multiple layers of code and custom libraries.
Again, as always, I am going to repeat and highlight it over and over and over in every single article: PLEASE make sure to set proper indentations when you copy-paste code snippets. Python loves indentations and the code won’t work if they are wrong. Okay, let’s begin.
from adx_api_functions import api_call
from analytics_library import get_secret, get_bucket
from datetime import datetime, timedelta
The first three lines import the required modules and functions:
adx_api_functions is a custom library where all the AdX API-related functions live. This library was written to keep the code clean and to reuse functionality without writing the same code multiple times. The api_call function is used for… (try guessing) making API calls. I will go over the adx_api_functions library in the next article (stay tuned).
analytics_library is another custom library with various helper functions. In this example, we are importing two functions – get_secret and get_bucket. Those functions are responsible for pulling login credentials from a remote repository (AWS Secrets Manager in my case) and the S3 bucket name, respectively. The analytics_library custom library will be thoroughly discussed in a separate article.
Finally, datetime and timedelta are imported from the datetime module, which is used for handling date and time operations.
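Since both custom libraries will only be covered in later articles, here is a rough, illustrative sketch of what minimal stand-ins for get_secret and get_bucket might look like. It assumes AWS Secrets Manager accessed through boto3; only the function names come from the imports above, everything else is an assumption:
import boto3

def get_secret(secret_name, region_name='us-east-1'):
    # Illustrative stand-in: fetch the raw secret string from AWS Secrets Manager
    client = boto3.client('secretsmanager', region_name=region_name)
    return client.get_secret_value(SecretId=secret_name)['SecretString']

def get_bucket():
    # Illustrative stand-in: the real helper presumably looks the bucket name up somewhere
    return 'my-analytics-bucket'  # hypothetical bucket name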
def api_query(**context):
The api_query function is defined after all the required libraries have been imported. All the computations will happen inside this function. The context dictionary is passed as a parameter; I use it to pull values passed in from Apache Airflow, a workflow orchestration platform that is beyond the scope of this article.
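If you are curious how api_query ends up receiving that context, a minimal sketch of wiring it into an Airflow DAG might look like the snippet below. This assumes Airflow 2.x; the DAG id and schedule are made up, and Airflow itself is outside the scope of this series:
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime

with DAG(dag_id='adx_api_report',          # hypothetical DAG id
         start_date=datetime(2024, 10, 1),
         schedule_interval='@daily',
         catchup=False) as dag:
    # Airflow injects the task context (ti, ds, ...) into **context
    query_task = PythonOperator(task_id='api_query', python_callable=api_query)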
    #creds = eval(get_secret('adx-api'))
    #key = creds['pw'][1:-1]
    #email = creds['email']
    key = 'MySup3rStr0ngPW'
    email = 'my_email@gmail.com'
Then we pull the email and password values from a remote repository. The get_secret function returns the stored secret, which is evaluated into a dictionary with the email and password, and we assign those values to the respective variables. For the simplicity of the example, I commented out the secret-pulling part and hardcoded the values.
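As a side note, if the secret is stored as a JSON string, json.loads is a safer way than eval to turn it into a dictionary. A small sketch, assuming the same 'pw' and 'email' keys as in the commented-out code:
    import json

    creds = json.loads(get_secret('adx-api'))
    key = creds['pw'][1:-1]   # same slicing as in the commented-out version
    email = creds['email']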
creds = "{" + f'''
"private_key": "{key}",
"client_email": "{email}",
"token_uri": "https://oauth2.googleapis.com/token"
''' + "}"
    fw = open('/tmp/adx_creds.json', 'w')
    fw.write(creds)
    fw.close()
Next, we create and temporarily save a JSON credentials file on disk. The creds variable holds a JSON-formatted string with the dynamically passed email and key values. The third value, token_uri, is hardcoded.
Then we open/create an adx_creds.json file in the /tmp folder, write the creds string to it, and close the file.
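The same file can also be produced with the json module, which handles quoting and escaping for us. A small equivalent sketch (same keys, same path):
    import json

    creds_dict = {
        'private_key': key,
        'client_email': email,
        'token_uri': 'https://oauth2.googleapis.com/token',
    }
    with open('/tmp/adx_creds.json', 'w') as fw:
        json.dump(creds_dict, fw)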
    try:
        today = context['ti'].xcom_pull(key="init_today")
    except Exception:
        # fall back to today's date when running outside Airflow
        today = str(datetime.today().date())
    namdat = datetime.strptime(today, '%Y-%m-%d').date()
Next, we retrieve and parse today's date. First, we try to pull the date string from the context variable using the xcom_pull function. This is an Apache Airflow function, and it's beyond the scope of this article. If the date cannot be extracted from the context, we fall back to the datetime library. The date is normally always present in Airflow, so the except branch is only triggered when I run the code locally in an editor. Airflow passes the date as a string, which is why the last line converts today into the namdat date variable.
    end_date = namdat - timedelta(days=1)
    start_date = namdat - timedelta(days=7)
    print('namdat: ' + str(namdat))
    print('start_date: ' + str(start_date))
    print('end_date: ' + str(end_date))
Next, we define the report start and end dates. The report end date is today minus one day (yesterday) and the start date is today minus seven days (the same weekday last week). After that, we print out all three dates.
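A quick worked example of the date window (the date here is just for illustration):
from datetime import date, timedelta

namdat = date(2024, 10, 14)            # a Monday, for example
print(namdat - timedelta(days=1))      # 2024-10-13 -> end_date (yesterday)
print(namdat - timedelta(days=7))      # 2024-10-07 -> start_date (same weekday, last week)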
    dims = ['DATE',
            'SITE_NAME',
            'DEVICE_CATEGORY_NAME',
            'AD_UNIT_NAME']
    cols = [
        'AD_EXCHANGE_LINE_ITEM_LEVEL_CLICKS',
        'AD_EXCHANGE_LINE_ITEM_LEVEL_REVENUE',
        'AD_EXCHANGE_LINE_ITEM_LEVEL_IMPRESSIONS']
Then we define the dimensions (dims) and metrics (cols) of the report we want to pull. Think of dimensions as groups and metrics as values that can be counted. You can find the list of all the available metrics and dimensions here or just by googling it.
    s3_bucket = get_bucket()
    s3_key = f'temp/{namdat} - adx_api_report- {start_date} to {end_date}.csv.gz'
The s3_bucket variable holds the S3 bucket name pulled by the get_bucket function. I am going to discuss the get_bucket function in another article.
s3_key holds the file path (object key) inside the S3 bucket. The report file will be saved as a gzip-compressed comma-separated file, .csv.gz.
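To make the key concrete: with the illustrative dates used earlier (namdat 2024-10-14, start_date 2024-10-07, end_date 2024-10-13), the f-string evaluates to:
temp/2024-10-14 - adx_api_report- 2024-10-07 to 2024-10-13.csv.gz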
    print('Downloading report')
    out = api_call(dims=dims, cols=cols, s3_key=s3_key,
                   start_date=start_date, end_date=end_date,
                   s3_bucket=s3_bucket, tz='AD_EXCHANGE')
The print statement notifies us that we are about to download the report. The api_call function is invoked with the following parameters: dimensions, metrics, S3 key, start date, end date, S3 bucket, and the time zone type. The function downloads the report and uploads it to an S3 bucket folder. In the next article, I will go over how the api_call function works.
    print('Download complete')
    if out == -1:
        raise Exception('GAM API query failed or returned nothing')
    return ({"cols": out['header'],
             "n_rows": out['n_rows'],
             "fn": s3_key,
             "bk": s3_bucket})
If the report is downloaded successfully, the api_call function returns the report header and the number of rows in the report. Otherwise, it returns -1 and we raise an exception.
Lastly, the api_query function returns the report header, the number of rows in the report, the S3 key, and the S3 bucket. Those values will be used later for validation and processing by Apache Airflow jobs.
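The real api_call function is the topic of the next article, so for now here is only an illustrative stand-in that mirrors the return shape the code above relies on (a dictionary with 'header' and 'n_rows' on success, or -1 on failure). Everything inside it, including the assumption that the header is simply the dimension and metric names, is a guess for local testing only:
def api_call(dims, cols, s3_key, start_date, end_date, s3_bucket, tz):
    # Stand-in only: the real api_call builds the AdX report query, downloads
    # the result, and uploads the .csv.gz file to S3; it is covered in the next article.
    # On failure the real function returns -1; on success, a dict like this:
    return {'header': dims + cols, 'n_rows': 0}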
Thank you for stopping by and reading the first article in the AdX reporting API series. If you have any questions or concerns please feel free to email me at pythonisterATpythonenthusiastDOTcom.