PythonEnthusiast.com

Sharethrough Reporting API

September 25, 2024 | by Pythonister Mister


Hello Stranger and welcome! Today we will talk about how to connect to the Sharethrough reporting API, get a performance report, and send it to S3 storage. We are going to use Python to implement all of this. But who is Sharethrough to begin with?

Sharethrough is an omnichannel ad exchange platform that leverages AI and technology to provide great ad experiences for publishers and advertisers. The company focuses on sustainable, high-performing ad formats across video, display, and native advertising. By using advanced technology like real-time bidding and AI-driven optimization, Sharethrough aims to improve user engagement, ad relevance, and overall monetization outcomes. It’s interesting to note that the platform prioritizes environmentally conscious ad formats through carbon reduction initiatives. More details can be found at Sharethrough.

Now that we know a little bit more about the company, let’s continue with the code. The code flow will be the following:

  1. Import all required libraries
  2. Define the main function to work in
  3. Set up an access token
  4. Set up dates
  5. Build an API request
  6. Send the request to get the data
  7. Process data
  8. Validate
  9. Save on disk temporarily
  10. Upload to S3

First, as always, we will start with the imports.

But before that, I want to warn you again to keep the indentation correct while copy-pasting the code. Python likes its indentation and won’t run the code if it is not set properly.

import requests
import datetime
import boto3
import csv
import os
from json import loads
from custom_library import get_secret, get_bucket, randstr

Initial imports: the libraries needed for making API requests (requests), handling dates (datetime), working with AWS (boto3), handling CSV files (csv), interacting with the operating system (os), and parsing JSON (json.loads). We also import helper functions from a custom library.

def api_query(**context):

Function definition: defines the function api_query, which accepts an arbitrary number of keyword arguments (**context), used for passing runtime information from Apache Airflow. In some other articles, I will go into detail about how to create Airflow DAGs and build ETL pipelines. The rest of the code will be inside the api_query function.
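For readers who have not used Airflow before, here is a minimal, hypothetical sketch of how api_query could be wired into a DAG with a PythonOperator. The DAG id, schedule, and start date below are placeholders and not part of the original code.

from airflow import DAG
from airflow.operators.python import PythonOperator
import datetime

# Hypothetical DAG wiring (a sketch, not the author's pipeline):
# Airflow passes the task context into api_query as keyword arguments.
with DAG(
    dag_id="sharethrough_report",            # placeholder DAG id
    start_date=datetime.datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    fetch_report = PythonOperator(
        task_id="api_query",
        python_callable=api_query,
    )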

print("Retrieving login details from AWS")
#creds = loads(get_secret('api-sharethrough', osvar=False))
#access_token = creds['access_token']
access_token = 'ACCESS_TOKEN_682'

The first thing we are going to do inside the newly defined function is pull an access token from a password repository. In my code, I use AWS Secrets Manager and a custom function that pulls values from it. In some other articles, I am going to talk about how to define that function and use AWS Secrets Manager. So for now, I commented out the token-fetching code and hard-coded the access_token variable.
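To give an idea of what the commented-out call does, here is a rough, hypothetical sketch of a get_secret helper built on boto3’s Secrets Manager client. The author’s custom_library implementation may look different, and the osvar argument is ignored here because its behavior is not described in the article.

import boto3

def get_secret(secret_name, osvar=False):
    # Hypothetical stand-in for the custom get_secret helper.
    # Returns the raw secret string, e.g. '{"access_token": "..."}',
    # which the caller then parses with json.loads.
    client = boto3.client('secretsmanager')
    response = client.get_secret_value(SecretId=secret_name)
    return response['SecretString']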

    try:
        today = context['ti'].xcom_pull(key="init_today")
    except:
        today = str(datetime.date.today())

    the_now = datetime.datetime.strptime(today, '%Y-%m-%d').date()
    last_week = (the_now - datetime.timedelta(days=7)).strftime("%Y-%m-%d")

Next, we define the today variable. It will hold today’s date as a ‘YYYY-MM-DD’ formatted string. First, we try to get the date from the Airflow context, and if the context does not exist, the date is generated by the datetime library.

After that, the today string is converted to the the_now date object, and the last_week variable is defined by subtracting seven days from the_now.

    query_object = {
        "startDate": last_week, 
        "endDate": today, 
        "groupBy": ["date", "placement_name", "domain", "creative_type", "device_type"],
        "fields": ["rendered_impressions", "pub_earnings", "clicks"], 
        "debug": True
    }

    print("Running stats API call")
    print(f"From: {last_week}")
    print(f"To: {today}")
    print(f"Request parameters: {str(query_object)}")

Then we define the query_object variable, a dictionary with five keys: startDate, endDate, groupBy, fields, and debug. The groupBy and fields keys are the dimensions and metrics of the report. Next, a series of print statements outputs the date range and query parameters for debugging purposes.

    header_data = {
        'Content-Type': 'application/json', 
        'accept': 'application/json', 
        'Authorization': f'Bearer {access_token}', 
    }

    url = "https://publisher-api.sharethrough.com/v2/programmatic"
    
    print("Making API call")
    r = requests.post(url, headers=header_data, json=query_object)

    if r.status_code != 200:
        print(f"Unable to connect - status code is {r.status_code}")
        raise Exception(f"Unable to connect - status code is {r.status_code}")

    response = r.json()

Next, the header_data dictionary is defined. It prepares the headers for the API call, including the content type, the accepted response format, and the authorization token.

The URL endpoint is defined next. Then the API request is made using the requests library: a POST request with the custom-defined headers and payload. If the return status is not 200, we print an error message and raise an exception.

Then we convert the JSON response into a Python dictionary.

    local_filename = "/tmp/sharethrough" + randstr(10) + ".csv"
    results = response['results']
    line_count = len(results)
    headers = list(results[0].keys())
    print(f'headers: {headers}')

After that, we define a local filename variable to store the CSV report file temporarily in the /tmp folder. Then we pull the results from the response dictionary, count the report lines, and extract and print the report headers.
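randstr is another helper from the custom library that is not shown in the article; a simple stand-in (an assumption, not the author’s implementation) could look like this:

import random
import string

def randstr(length):
    # Hypothetical stand-in for the custom randstr helper:
    # returns a random lowercase alphanumeric string of the given length.
    return ''.join(random.choices(string.ascii_lowercase + string.digits, k=length))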

    dats = {i['DATE'] for i in results}
    try:
        start = min(dats)
        end = max(dats)
    except:
        raise Exception('failed to extract dates from data..')

Next, we validate the report dates by creating a set of dates using a set comprehension. Then we extract the min and max dates from the dats set. If we are unable to extract the dates, an exception is raised. We will use those dates later in the final report filename.

    with open(local_filename, 'w') as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=headers)
        writer.writeheader()
        writer.writerows(results)

Then we write the CSV report to a temporary directory. We are using the csv library; you can read more about it here.

    print("Sending file to S3")
    s3r = boto3.resource('s3')
    bucket = get_bucket()

    s3_key = f'temp/{today} - sharethrough - {start} to {end}.csv'
    print(f"Bucket: {bucket}")
    print(f"S3 key: {s3_key}")

Now we approach the end of the code, and only one step is left: uploading the file to AWS S3 for further validation, storage, and database import.

We begin the last step by defining the AWS S3 resource with the resource method of the boto3 library. Then the custom function get_bucket returns the S3 bucket name based on the environment name. I will be going over this custom function in some other article.
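As a rough illustration only, get_bucket might resolve the bucket name from an environment variable; the variable name and fallback bucket below are assumptions, not the author’s implementation.

import os

def get_bucket():
    # Hypothetical sketch of get_bucket: the environment variable
    # and default bucket name are assumptions for illustration.
    return os.environ.get('REPORTS_BUCKET', 'my-reports-bucket')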

The S3 key is defined next. The report name includes three dates: the report date, and the start and end dates of the data. Finally, print statements display the bucket name and S3 key.
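The snippet above stops at printing the bucket and key; the upload call itself is not shown in the original code. With the objects already defined, a minimal way to finish the step might look like this (a sketch using boto3’s upload_file, plus an optional cleanup of the temporary file):

    # Upload the temporary CSV to S3, then remove the local copy.
    s3r.Bucket(bucket).upload_file(local_filename, s3_key)
    os.remove(local_filename)
    print("Upload complete")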

This will conclude the article. If you have any questions or want to add something, please feel free to reach out.
