PythonEnthusiast.com

AppNexus Reporting API (xandr) – PART 2

September 9, 2024 | by Pythonister Mister


PART 1


This is a continuation of the AppNexus reporting API article. To keep each article a reasonable length, I decided to break it into two parts. Again, when you copy the code, please keep the correct indentation, otherwise the code won't run.

Now, after making sure the report is ready, we can download it.

    tmp_dir_raw = '/tmp/appnexus_report_raw.csv'
    crl = f"curl -i -b {cookie_path} -c {cookie_path} "
    crl += f"'https://api.appnexus.com/report-download?id={rid}' > {tmp_dir_raw}"
    res = subprocess.check_output(crl, shell=True)
    print("Report downloaded.")

First, we assign a temporary report path to the tmp_dir_raw variable. Traditionally, temp files are stored in the /tmp/ folder. The report is downloaded in a "raw" form with some unnecessary header information, which we will strip out later.

The next two lines build the curl command, which says "download this report to this file and read and write cookies along the way". The ">" sign in the command redirects the output to a file.

The last two lines execute the curl command using the subprocess module and print a message that the report has been downloaded.
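As an aside, if you prefer to stay in Python instead of shelling out to curl, the download could be sketched with a requests-style HTTP session. The helper name and the session object are my assumptions here, not part of the original script; the session is assumed to already hold a valid auth cookie from the login step in part 1.

```python
# A rough Python equivalent of the curl call (a sketch, not the article's
# code). `session` is assumed to be an authenticated, requests-compatible
# session; `rid` is the report id obtained in part 1.
def download_report(session, rid, dest):
    resp = session.get(
        "https://api.appnexus.com/report-download",
        params={"id": rid},
        stream=True,
    )
    resp.raise_for_status()  # fail loudly on HTTP errors
    with open(dest, "wb") as f:
        for chunk in resp.iter_content(chunk_size=8192):
            f.write(chunk)
```

Streaming the body in chunks avoids holding a large report in memory, which the curl redirect also avoids by writing straight to disk.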

    tmp_dir_final = '/tmp/appnexus_api_report.csv'
    c = 0         # line counter
    wr = False    # flips to True once the real report starts
    dats = set()  # unique dates seen in the report

    with open(tmp_dir_raw, 'r') as f:
        with open(tmp_dir_final, 'w') as f1:
            for line in f:
                if line.startswith('day,placement_name'):
                    wr = True
                    header = line.strip('\n').split(',')
                    f1.write(line)
                    c += 1
                    continue
                if wr:
                    f1.write(line)
                    c += 1
                    dats.add(line.split(',')[0])
    os.system(f'rm {cookie_path}')
    os.system(f'rm {tmp_dir_raw}')
    print("Report processed.")

In the first four lines, we set up variables. The first one holds the path to the processed report, which we also write to the /tmp/ directory. c = 0 is a line counter. Counting the report lines is one way to validate the report: if a report has zero lines, it is invalid, duhh… wr = False is a flag that marks the beginning of the actual report. As I mentioned earlier, the report comes with a metadata header we want to remove, and the wr flag indicates when the actual report starts.

dats = set() is one more piece of verification logic I wanted to share. From a math course you may remember that sets contain no duplicates: every object added to a set is unique. We can then use the built-in min and max functions to find the smallest and largest values in the set, which is perfect for finding the earliest and latest dates in the report. Hang tight, we will see it in action a bit later.
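Here is a tiny standalone illustration of that idea (the dates are made up). It also works because ISO-formatted date strings sort lexicographically in the same order as the dates themselves.

```python
# Sets drop duplicates, and min()/max() on ISO date strings give the
# earliest and latest dates, since they compare lexicographically.
dats = set()
for day in ["2024-09-01", "2024-09-02", "2024-09-01", "2024-09-03"]:
    dats.add(day)

print(len(dats))  # 3 unique dates
print(min(dats))  # 2024-09-01
print(max(dats))  # 2024-09-03
```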

Then, after setting up the variables, we open the temporary raw report in read mode and the final report in write mode, so we can read and write the data respectively. With for line in f we read the raw report line by line. If a line starts with "day,placement_name", it means we have skipped all the metadata header lines and are ready to write the final version of the report to file. The wr flag is set to True. We assign the header variable the value of the first line, as an array of column names produced by stripping the newline character and splitting on commas. Then we write that first line (the header) to the final report file. After that, we increment the line counter and skip the rest of the loop by invoking continue.

On the following iterations, lines are written straight to the file because the wr flag is True. We then increment the counter by one and add the date from each line to the dats set, by splitting the line on commas and taking the first element.
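To make the strip/split step concrete, here is what it produces on a sample header line (the third column name is my illustration, not from the real report):

```python
# The header line with its trailing newline removed, split into a list
# of column names.
line = 'day,placement_name,imps\n'
header = line.strip('\n').split(',')
print(header)  # ['day', 'placement_name', 'imps']
```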

The os.system(f'rm {cookie_path}') and os.system(f'rm {tmp_dir_raw}') lines do some cleanup for us: they delete the cookie and raw report files from the temporary folder. Then the print message indicates that the report has been processed. At this point, we have downloaded a raw report and converted it into a clean CSV file.
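The same cleanup can be done without spawning a shell at all. Here is one way to sketch it; the helper name is mine, not from the article.

```python
import os

# Delete files directly with os.remove instead of shelling out to rm;
# a file that is already gone is simply ignored.
def cleanup(*paths):
    for p in paths:
        try:
            os.remove(p)
        except FileNotFoundError:
            pass  # already deleted, nothing to do
```

Calling cleanup(cookie_path, tmp_dir_raw) would replace the two os.system calls and avoids any surprises from shell quoting of the paths.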

Lastly, we will run one more validation and send the file to AWS S3 bucket.

    try:
        mn = min(dats)
        mx = max(dats)
    except ValueError:
        raise Exception('could not extract dates from report')

It's a try-except block: if something goes wrong inside it, we raise a custom error. Inside the block, we try to get the minimum and maximum dates from the dats set. min and max raise ValueError when the set is empty, so if we cannot extract the dates, the report is faulty and needs attention.
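The same guard can be wrapped in a small function to make the failure mode explicit (the function name is my choice, not the article's):

```python
# min()/max() raise ValueError on an empty set; translate that into a
# clearer error about the report itself.
def date_range(dats):
    try:
        return min(dats), max(dats)
    except ValueError:
        raise Exception('could not extract dates from report')
```

A populated set returns the (earliest, latest) pair, while an empty set, meaning a report with no data rows, raises the custom error.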

    # `today` is assumed to hold today's date string, set earlier in the script
    s3_key = f'temp/{today} - appnexus - {mn} to {mx}.csv'
    bk = "your_bucket"
    s3r = boto3.resource('s3')
    s3r.meta.client.upload_file(tmp_dir_final, bk, s3_key)
    os.remove(tmp_dir_final)

After we fetch the min and max dates from the report, we can use the values in the report name. The s3_key variable holds the S3 object key. The key starts with temp/ because we first put all the files in a temporary folder for further validation. The bk variable stores the name of your S3 bucket. The next two lines create an S3 resource and upload the file. Then the final report is deleted from the local system.
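To make the key format explicit, here is the same f-string pulled into a tiny helper (the helper name and the sample dates are mine, not the article's):

```python
# Build the S3 object key: the covered date range ends up in the file
# name, so it is visible at a glance in the bucket listing.
def build_s3_key(today, mn, mx):
    return f'temp/{today} - appnexus - {mn} to {mx}.csv'

print(build_s3_key('2024-09-09', '2024-09-01', '2024-09-07'))
# temp/2024-09-09 - appnexus - 2024-09-01 to 2024-09-07.csv
```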

In the end, you can return the s3_key, bucket, header, and line count and use those values in the next step, for example in the next Airflow DAG. But this will be all for now.
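One possible shape for that return value, handed to a downstream task for example via Airflow XCom, could look like this; the key names and sample values are illustrative, not from the article.

```python
# Bundle the results a downstream task would need into a plain dict.
def report_summary(s3_key, bucket, header, line_count):
    return {
        's3_key': s3_key,
        'bucket': bucket,
        'columns': len(header),  # column count derived from the header
        'lines': line_count,
    }

summary = report_summary('temp/report.csv', 'your_bucket',
                         ['day', 'placement_name', 'imps'], 101)
print(summary['columns'])  # 3
```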

If you have any questions, please feel free to leave a comment or message me pythonister@pythonenthusiast.com

Thank you
