eCommerce Link Building on Autopilot with Python

Introduction

Links are still an important part of any SEO strategy, and one of the best types of links an eCommerce store can earn is from a brand's stockist page. These links are gold dust and really move the needle for SEO.

The best part is that a stockist link is usually just a quick phone call or email away, and with their high authority these pages can benefit your site in other ways too:

  • They send referral traffic that converts
  • They usually flow PageRank
  • They come from trusted, established domains with strong organic link profiles

Example of a typical stockist page.

Today’s post will walk you through a Python script that shortlists these types of links on autopilot.

Instructions

To get started you’ll need three things:

Quickstart

  1. Download the python script from GitHub and place it into a new folder.
  2. Set up an account with ZenSERP and paste the API key into a plain text file called zenserp_key.txt
  3. Paste the list of brands you’d like to check for stockist links into a plain text file called brands.txt

If done correctly, your folder should contain the script alongside zenserp_key.txt and brands.txt.

Open the command prompt and change directory to where the above files are located by typing:

cd /path/to/directory

Finally, run the code by executing the Python script:

python ecommerece_link_builder.py

Watch as the script does the hard / boring work for you!

Output

Once the script has finished, it’ll output a .csv file in the same folder ready to QA.

368 links checked for homebase.co.uk and compiled into a spreadsheet in 22 minutes on autopilot! That’s a LOT faster than manually Googling and a lot less boring too 😉

The script found some excellent links, such as this one: https://www.meaco.com/pages/meaco-resellers

To view the output in more detail, you can review this Google Sheet.

The only thing left to do is review each opportunity, reach out to the site owner, and ask to be included on the page. This is usually easy thanks to the existing relationship with the supplier; often it’s just a case of picking up the phone or sending an email.

How to Acquire a List of Brands

One thing I’d like to touch on briefly is how to find the list of brands to use with the script. If it’s for your own site, this is usually as straightforward as looking through a spreadsheet of suppliers. If you don’t have that luxury, then it’s time to get creative…

If you’re lucky, the site will have a brands page, such as this one: https://www.homebase.co.uk/brands.list

A list of brands in an easy-to-use, copy-and-pasteable format. Perfect!

If you’re not so lucky, then it’s time to fire up Screaming Frog and do some custom extractions.

Scraping the Brand attribute from Product Schema

Scraping the brand attribute from Product schema with a custom extraction is a surefire way to extract a comprehensive list of brands from a website.

As an example, the Product schema for this product on currys.co.uk exposes the brand: SHARK.

This is easily extractable using custom extractions.

Configuration > Custom > Extraction

To read more about custom extractions, I strongly suggest reading this post on the Screaming Frog blog.
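
If you’d rather skip Screaming Frog, the same data can be pulled with a few lines of Python. This is a minimal sketch, assuming the site exposes its Product markup as JSON-LD; the URL is a placeholder and real pages vary, so treat it as a starting point only.

# minimal sketch: pull the brand out of Product JSON-LD on a single page
# requires: pip install requests beautifulsoup4
import json
import requests
from bs4 import BeautifulSoup

product_url = "https://www.example.com/products/example-product"  # placeholder URL

html = requests.get(product_url, headers={"User-Agent": "Mozilla/5.0"}).text
soup = BeautifulSoup(html, "html.parser")

brands = set()
for tag in soup.find_all("script", type="application/ld+json"):
    try:
        data = json.loads(tag.string)
    except (TypeError, ValueError):
        continue  # skip empty or malformed JSON-LD blocks
    items = data if isinstance(data, list) else [data]  # schema.org allows either
    for item in items:
        if item.get("@type") == "Product":
            brand = item.get("brand")
            if isinstance(brand, dict):  # brand can be a nested Brand object
                brand = brand.get("name")
            if brand:
                brands.add(brand)

print(brands)

Run this over a crawl of product URLs and you have your brands.txt.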

Code Walkthrough

The first thing to do is install the requests and pandas libraries, if you haven’t already.

pip install requests
pip install pandas

Next we import everything we need to run the script.

# import the libraries
import time
import json
import pandas as pd
import requests
import os
startTime = time.time()

This is where the variables used to search Google are set. I recommend updating them for your region and language. All available settings are listed at: https://app.zenserp.com/playground

# zenserp variables used to search google  // https://app.zenserp.com/playground 
search_engine = "google.co.uk"
device = "desktop"
location = "London,England,United Kingdom"
geolocation = "GB"
language = "en"

The next step is to get the current working directory using the os module, so the script knows where to find zenserp_key.txt and brands.txt.

# get the current working directory and print
path = os.getcwd()
print(path)

Once the directory has been set, we can read in both files.

# read in the zenserp.com key used to scrape the serps
with open(path + '/zenserp_key.txt', 'r') as file:
    zenserp_key = file.read().strip()  # strip any trailing newline from the key

# read in the list of brands
with open(path + '/brands.txt', 'r') as file:
    brands = file.read().splitlines()

Next we need to append the word ‘stockists’ to each brand name so we can search Google for the related stockist page. I’m doing this with Pandas because it’s familiar; there are probably a hundred more Pythonic ways to do it, this is just the one I chose.

# make a temp dataframe to append the word 'stockists'
df = pd.DataFrame(brands, columns=["brand"])
df['stockists'] = " Stockists"
df['brand'] = df['brand'] + df['stockists']
total = len(df['brand'])
search_terms = df['brand'].tolist() # dump the search queries to a list (to loop through with the ZenSERP API)
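
For what it’s worth, here is the same step as a plain list comprehension, no dataframe needed; a minimal alternative rather than what the script above uses:

# equivalent one-liner: append ' Stockists' to each brand
search_terms = [brand + " Stockists" for brand in brands]
total = len(search_terms)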

Now let’s make an empty dataframe and some empty lists to store the extracted data.

# make empty list and dataframe to store the extracted data
df_final = pd.DataFrame(None)
url_list = []
description_list = []
title_list = []
query_list = []
df_position = []

Now it’s time to use ZenSERP to search Google for our stockist links. This loops through our list of brands, appending the data returned from Google to lists, ready to be inserted into a dataframe once complete.

count = 0

for i in search_terms:
    count = count + 1
    print("Searching:", i.strip(), count, "of", total)
    headers = {"apikey": zenserp_key}
    params = (
        ("q", i),
        ("device", device),
        ("search_engine", search_engine),
        ("location", location),
        ("gl", geolocation),
        ("hl", language),
        ("apikey", zenserp_key),
    )

    response = requests.get('https://app.zenserp.com/api/v2/search', headers=headers, params=params)

    # parse the JSON response and pull out the organic results
    resp = response.json()
    organic = resp['organic']

    # get the length of the list to iterate over in the loop
    list_len = len(organic)
    pos_counter = 0
    counter = 0
    while counter != list_len:
        access = (organic[counter])
        pos_counter = pos_counter + 1
        df_position.append(pos_counter)
        try:
            my_url = (access['url'])
            url_list.append(my_url)
        except Exception:
            url_list.append("MISSING")
            pass

        try:
            my_description = (access['description'])
            description_list.append(my_description)
        except Exception:
            description_list.append("MISSING")
            pass

        try:
            my_title = (access['title'])
            title_list.append(my_title)
        except Exception:
            title_list.append("MISSING")
            pass

        query = (resp['query'])
        q_access = (query['q'])
        query_list.append(q_access)

        counter = counter +1
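
For reference, here’s the rough shape of a single entry in resp['organic']. The field names match the keys the loop reads; the values are made up for illustration:

# illustrative shape of one organic result (hypothetical values)
example_result = {
    "url": "https://www.example.com/pages/stockists",
    "title": "Find a Stockist | Example Brand",
    "description": "Use our store locator to find your nearest stockist.",
}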

All of the lists are then added to new columns in the dataframe.

# add lists to dataframe columns
df_final['query'] = query_list
df_final['url'] = url_list
df_final['title'] = title_list
df_final['description'] = description_list
df_final['position'] = df_position

The data is then cleaned to remove any missing values and to just keep the first result from Google.

# clean the data!
df_final = df_final[df_final.position == 1] # keep position 1 result for each search only
df_final = df_final[~df_final["url"].isin(['MISSING'])]
df_final = df_final[~df_final["description"].isin(['MISSING'])]
df_final = df_final[~df_final["title"].isin(['MISSING'])]
df_final["temp_url"] = df_final.loc[:, ["url"]]

This removes any result that returned a homepage. The reason? A homepage isn’t a stockist page, so dropping these results reduces QA time significantly.

# Remove homepages. Appending ///// then stripping it back normalises trailing slashes, so every URL's depth can be compared by counting slashes
df_final["temp_url"] = df_final["temp_url"] + "/////"
df_final["temp_url"] = df_final["temp_url"].str.replace("//////", "")
df_final["temp_url"] = df_final["temp_url"].str.replace("/////", "")
df_final['url_depth'] = df_final["temp_url"].str.count("/")
df_final = df_final[df_final['url_depth'] != 2]  # a depth of 2 (just the // in https://) means a homepage
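
If the slash-counting trick feels fragile, urllib.parse from the standard library can do the same job by checking for an empty URL path. This is an alternative sketch, not what the script above does:

# alternative homepage filter: a homepage has an empty (or bare '/') path
from urllib.parse import urlparse

df_final = df_final[df_final["url"].apply(lambda u: urlparse(u).path.strip("/") != "")]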

One final clean-up before we export the data: the columns are reordered into a more logical order, and any duplicate URLs are removed.

# clean and sort the columns
cols = "query", "url", "title", "description"
df_final = df_final.reindex(columns=cols)
df_final.drop_duplicates(subset=['url'], keep="first", inplace=True)

Lastly, we export the .csv file, ready for QA-ing. That’s most of the boring work taken care of, time to get some links!

# export the data
df_final.to_csv(path + '/brand_links_output.csv')
print(f'\nCompleted in {time.time() - startTime:.2f} Seconds')
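
By default, to_csv writes the dataframe index as an unnamed first column. If you’d rather not have it in the spreadsheet, an optional tweak is:

# optional: export without the dataframe index column
df_final.to_csv(path + '/brand_links_output.csv', index=False)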

Conclusion

Stockist links are one of the greatest tools in your arsenal when building links for an eCommerce site, and being able to automate the process makes it that much sweeter. I hope this script saves you a lot of time (and boredom) the next time you’re building links.

If you have any questions, suggestions or queries I’d love to hear them! Ping me on Twitter @LeeFootSEO
