DevLog 02: Automating Wish Lists | Smarter Shopping with Price Monitoring
SUNDAY AUGUST 06 2017 - 11 MIN
Remember when Mark Zuckerberg unveiled his AI assistant during the Christmas break last December? Everyone (including me) was super excited, and many people got especially interested in automation-related stuff. That's when I wrote the first post of this series, pledging to work on small-scale automation projects and write about them along the way.
Well... as expected, life happened, and all the crazy ideas along with all the excitement got lost somewhere in the abyss of procrastinated projects. Only recently did I find the time (and motivation) to start the project once again and write the first "actual" automation script.
Problem
Online shopping has made life a lot easier for us couch potatoes around the world. We lie around half-dead, scroll a few pages on online stores, and summon the products we want to our doorsteps. I, however, am among those cheapskates who wait for the sales seasons for the fulfilment of their wish-lists.
Seemingly, these stores know about the existence of our species too. A lot of our desired items don't see any discounts even in the sales seasons. And even when this isn't the case, impatience goes hand-in-hand with laziness anyway.
"Life is too short to wait for sales seasons" - Cheapo McCheapAss
Frustrated with not being able to buy an item for months, I came up with the idea of a script that could sit silently on my system, monitor the prices of the items I need in the background, and notify me whenever a price drops below a specified baseline.
With a little googling, I found some useful tools for data scraping with Python (I didn't have any prior experience with Python programming before this), which made things very straightforward. The basic recipe consists of the following steps:
- Define a structured list of items along with your desired baseline prices.
- Write a web scraper for the chosen online store to fetch the search results.
- Match the prices for scraped items against those in the wish-list.
- Display Windows notifications and write the "ready to buy" items to a CSV file (which opens in Excel) along with the product links.
- Schedule the script to run at Windows startup.
Through the rest of the post, I'll be demonstrating the procedure. As prerequisites, you'll need Python along with the pip package manager installed on your system. I'd highly recommend using Anaconda as your default Python distribution, as it has almost all of the required modules pre-installed. To keep things simple, I'll limit the code to just a single online store, but hopefully, with a little Python knowledge, you can expand the script to work with as many stores as you want.
Having started learning just a week ago, I consider myself an absolute noob at Python right now. So if you see something stupid in the code, please feel free to correct me, because I only managed to write a "working" system with the limited knowledge I have.
Step 01 : Defining The Wish-List
JSON is now a go-to format for structured data representation, and we'll be using it to define our so-called "wish-list". Every item in the list will consist of the following 3 attributes:

- `description`: A common description for the product that you can find on most of the stores. Since search results on websites usually contain a lot of irrelevant stuff along with the required items, this description will help us identify the matching items among the results.
- `name`: The search term that we will send along with the search query.
- `baseline_price`: The price at which we want to be notified about this item.
I have allowed the K notation as a shorthand for `baseline_price` values in the JSON, so we can optionally specify prices as 1K, 5K, etc. instead of whole numbers.
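Just to make that convention concrete, here's a minimal sketch of how a K-suffixed value expands (the actual parsing lives in the `Product` class in Step 03; the helper name here is my own):

```python
# Minimal sketch of the K-notation shorthand; the real parsing
# happens in the Product class in Step 03.
def expand_k_notation(price):
    price = str(price)
    if price.lower().endswith("k"):
        return int(price[:-1]) * 1000
    return int(price)

print(expand_k_notation("12K"))    # 12000
print(expand_k_notation("13200"))  # 13200
```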
The `wishlist.json` file will look something like this:
```json
{
    "items": [
        {
            "description": "Sapphire Radeon R9 270x",
            "name": "Sapphire R9",
            "baseline_price": "12K"
        },
        {
            "description": "Gigabyte LGA 1151 Z170",
            "name": "Gigabyte Z170",
            "baseline_price": "13200"
        }
    ]
}
```
Now, once the system is set up, all you have to do is manage this file to get notifications for your desired items.
Step 02 : Writing The Web Scraper
Web scrapers are programs that extract large amounts of data from websites and save it in a structured format, either to a local file on your computer or to a database.
Our simple scraper will send requests with our search queries to the specified websites and then pick the target HTML elements from the returned results. You will have to examine the source of the websites you want to scrape before writing the code for them. Most of them represent data in a structured way, with consistent IDs and classes. Once the target tags have been identified, all you need to do is extract their values and parse them into the relevant data structures.
To make things easier, Python has a brilliant and easy-to-use library for data scraping called Beautiful Soup. We will be using it to write our scraper.
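If you haven't used Beautiful Soup before, here's a minimal sketch of the one pattern we rely on throughout: parse an HTML string, then look tags up by class name (the HTML and class names below are made up purely for illustration):

```python
from bs4 import BeautifulSoup

# Made-up HTML just to illustrate the find() pattern used below.
html = '<div class="product"><span class="price">$120</span></div>'
soup = BeautifulSoup(html, "html.parser")

price_span = soup.find("span", {"class": "price"})
print(price_span.text)  # $120
```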
```python
class AmazonScraper:
    # Declare the search URL and the class names to be picked
    BASE_URL = 'https://www.amazon.com/s/ref=nb_sb_noss/' \
               '134-4639554-0290304?url=search-alias%3Daps&' \
               'field-keywords={}'
    PRODUCT_TITLE_CLASS = "a-link-normal s-access-detail-page " \
                          "s-color-twister-title-link " \
                          "a-text-normal"
    PRODUCT_BRAND_CLASS = "a-size-small a-color-secondary"
    PRODUCT_PRICE_CLASS = "a-size-base a-color-base"

    def search_item(self, product):
        # Read the page contents
        query = urllib.parse.quote(product.name)
        url = self.BASE_URL.format(query)
        # Get structured data using Beautiful Soup
        data = bSoup(requests.get(url).text, "html.parser")
        # Find all the item containers
        containers = data.findAll(
            "div", {"class": "a-fixed-left-grid-col a-col-right"}
        )
        # Get item information for each item in the containers
        for item in containers:
            title_div = item.find(
                "a", {"class": self.PRODUCT_TITLE_CLASS}
            )
            price_div = item.find(
                "span", {"class": self.PRODUCT_PRICE_CLASS}
            )
            brand_div = item.findAll(
                "span", {"class": self.PRODUCT_BRAND_CLASS}
            )[1]
            title = title_div.text
            brand = brand_div.text
            price = self.extract_price(price_div)
            link = title_div["href"]
            # Display a Windows notification and write the product
            # details to a csv file when the price picked from
            # the search result is less than or equal to our
            # baseline price for the item and the product's
            # description has a similarity rate of more than 45%
            price = math.floor(float(price))
            baseline_price = float(product.baseline_price)
            if baseline_price >= price > 0:
                prompt = "\"" + title.replace(",", "|") + \
                         "\" is now available in " + \
                         str(price) + " at Amazon (Baseline:" + \
                         " " + str(product.baseline_price) + ")"
                details = get_details(brand, price, title, link)
                if is_similar(title, product.description):
                    print_similarity(title, product.description)
                    display_windows_notification(brand, prompt)
                    write_to_csv(details)

    @staticmethod
    def extract_price(price_div):
        if price_div is None:
            return 0
        # Remove all the symbols received with the price text
        price = re.sub(r'[–—|-]', "", price_div.text).strip()
        price = price.replace("$", "")
        # Convert USD to PKR
        price = float(price) * 100
        return math.floor(price)
```
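One caveat: Amazon often blocks scripted requests that use the default `requests` user agent. If the scraper starts returning empty results, sending a browser-like `User-Agent` header may help. This is a hedged tweak of my own, not part of the original snippet, and the header value is illustrative:

```python
# Hypothetical hardening: pretend to be a regular browser so the
# request is less likely to be rejected (header value is illustrative).
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
response = requests.get(url, headers=HEADERS)
data = bSoup(response.text, "html.parser")
```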
Step 03 : Running The Scraper For Each Product In The Wish-List
Once the main component of our system, i.e. the scraper, is in place, we need a routine that reads our `wishlist.json` file and parses its contents into data models. These model objects will then be passed to our scraper for further processing.
In this routine, we'll also be creating an `available_wishlist_products.csv` file that will carry all the latest ready-to-purchase items.
```python
class Product:
    def __init__(self, product):
        self.description = product["description"]
        self.name = product["name"]
        self.baseline_price = product["baseline_price"]
        # Numbers are allowed in 'K' format (e.g. 3K, 10K etc.)
        # in the json. While parsing, convert prices with the
        # K notation to their numeric values
        has_k = "k" in str(self.baseline_price).lower()
        if has_k:
            self.baseline_price = int(self.baseline_price[:-1]) * 1000


# Read all the items from the wishlist.json
with open('wishlist.json') as data_file:
    items = json.load(data_file)

# Create the csv file to write available items to
filename = "available_wishlist_products.csv"
f = open(filename, "w")
headers = "Date, Title, Brand, Price, Link\n"
f.write(headers)
f.close()

# Parse JSON into Product objects and pass them to the scraper for search
scraper = AmazonScraper()
for i in items["items"]:
    p = Product(i)
    scraper.search_item(p)
    time.sleep(5)
```
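One caveat worth mentioning: if a request fails or Amazon returns a page without the expected tags, `search_item` will raise an exception and kill the whole run. Here's a minimal sketch (my own addition, not part of the original script) that keeps the loop alive by catching errors per product:

```python
# Hypothetical hardening of the main loop: skip a product on
# failure instead of crashing the whole run.
for i in items["items"]:
    p = Product(i)
    try:
        scraper.search_item(p)
    except Exception as e:
        print("Skipping '{}' due to error: {}".format(p.name, e))
    time.sleep(5)
```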
Step 04 : Writing The Utility Functions
You might have noticed calls to many unknown functions in the script we have written so far (e.g. `print_similarity`, `display_windows_notification`, `write_to_csv`). In the last part of the script, we'll be writing these utility functions.
Get Item Details
This utility function prepares the data for writing to the csv file by reformatting the text and generating a comma-separated string from it.
```python
def get_details(brand, price, title, link):
    details = title.replace(",", "|") + ", " + \
              brand.replace(",", "|") + ", " + \
              str(math.floor(float(price))) + ", " + \
              link
    return details
```
Write Details To CSV File
As the name suggests, this function will open our output file and write the details of the available items to it, along with the date. Notice that this time we open the file with the `a` flag (for appending), unlike in the main routine, where we recreate the file every time with the `w` flag.
```python
def write_to_csv(details):
    file = open("available_wishlist_products.csv", "a")
    date = strftime("%a, %d %b %Y", gmtime())
    details = date.replace(",", " ") + ", " + details + "\n"
    file.write(details)
    file.close()
    print("\nProduct Details: ")
    print(details)
```
Check Similarity Between The Product Descriptions
Since searches often return a lot of irrelevant results, it is important to check the resemblance between the description of the product we defined and the item received in the search results. I have set a threshold of 45% similarity for an item to qualify as a legitimate result.
Now, as you may have guessed, checking for likeness between two pieces of text seems like a very difficult task. Thankfully, Python comes to the rescue once again with its difflib.
```python
def is_similar(a, b):
    similarity_percent = SequenceMatcher(None, a, b).ratio() * 100
    return similarity_percent > 45
```
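To get a feel for the ratio, here's a quick comparison between a wish-list description and a search-result title (both strings are made up for illustration):

```python
from difflib import SequenceMatcher

# Hypothetical title/description pair, just to see the ratio in action
title = "Sapphire Radeon R9 270X 2GB GDDR5 Graphics Card"
description = "Sapphire Radeon R9 270x"
print(SequenceMatcher(None, title, description).ratio() * 100)
# Prints a similarity percentage comfortably above the 45% threshold
```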
The next function, `print_similarity`, is pretty much useless, but it helps with debugging when the script is run from a console. You can omit it from your code if you don't want to print results to the console.
```python
def print_similarity(a, b):
    similarity_percent = SequenceMatcher(None, a, b).ratio() * 100
    print("Similarity % between " + b + " and " + a + ": " +
          str(similarity_percent))
```
Displaying Windows Notifications
The last part is to display Windows toast notifications for newly available items. If you are using Anaconda, this will be the only module that you'll have to install yourself:

```
pip install win10toast
```
We're simply creating an instance of `ToastNotifier` and supplying it a title and a message to display, along with an optional icon path and a duration.
```python
def display_windows_notification(title, msg):
    print(msg)
    notification = ToastNotifier()
    notification.show_toast(
        title, msg, icon_path=ICON_PATH, duration=30
    )
```
That was all the code required; the only thing left is scheduling the task. The final script will look like this:
```python
import time
import json
import math
import re
import urllib.parse
from time import strftime, gmtime
from difflib import SequenceMatcher

import requests
from bs4 import BeautifulSoup as bSoup
from win10toast import ToastNotifier

# Path to icon file to display with the notifications
ICON_PATH = "<Path to .ico File>"


class AmazonScraper:
    # Declare the search URL and the class names to be picked
    BASE_URL = 'https://www.amazon.com/s/ref=nb_sb_noss/' \
               '134-4639554-0290304?url=search-alias%3Daps&' \
               'field-keywords={}'
    PRODUCT_TITLE_CLASS = "a-link-normal s-access-detail-page " \
                          "s-color-twister-title-link " \
                          "a-text-normal"
    PRODUCT_BRAND_CLASS = "a-size-small a-color-secondary"
    PRODUCT_PRICE_CLASS = "a-size-base a-color-base"

    def search_item(self, product):
        # Read the page contents
        query = urllib.parse.quote(product.name)
        url = self.BASE_URL.format(query)
        # Get structured data using Beautiful Soup
        data = bSoup(requests.get(url).text, "html.parser")
        # Find all the item containers
        containers = data.findAll(
            "div", {"class": "a-fixed-left-grid-col a-col-right"}
        )
        # Get item information for each item in the containers
        for item in containers:
            title_div = item.find(
                "a", {"class": self.PRODUCT_TITLE_CLASS}
            )
            price_div = item.find(
                "span", {"class": self.PRODUCT_PRICE_CLASS}
            )
            brand_div = item.findAll(
                "span", {"class": self.PRODUCT_BRAND_CLASS}
            )[1]
            title = title_div.text
            brand = brand_div.text
            price = self.extract_price(price_div)
            link = title_div["href"]
            # Display a Windows notification and write the product
            # details to a csv file when the price picked from
            # the search result is less than or equal to our
            # baseline price for the item and the product's
            # description has a similarity rate of more than 45%
            price = math.floor(float(price))
            baseline_price = float(product.baseline_price)
            if baseline_price >= price > 0:
                prompt = "\"" + title.replace(",", "|") + \
                         "\" is now available in " + \
                         str(price) + " at Amazon (Baseline:" + \
                         " " + str(product.baseline_price) + ")"
                details = get_details(brand, price, title, link)
                if is_similar(title, product.description):
                    print_similarity(title, product.description)
                    display_windows_notification(brand, prompt)
                    write_to_csv(details)

    @staticmethod
    def extract_price(price_div):
        if price_div is None:
            return 0
        # Remove all the symbols received with the price text
        price = re.sub(r'[–—|-]', "", price_div.text).strip()
        price = price.replace("$", "")
        # Convert USD to PKR
        price = float(price) * 100
        return math.floor(price)


class Product:
    def __init__(self, product):
        self.description = product["description"]
        self.name = product["name"]
        self.baseline_price = product["baseline_price"]
        # Numbers are allowed in 'K' format (e.g. 3K, 10K etc.)
        # in the json. While parsing, convert prices with the
        # K notation to their numeric values
        has_k = "k" in str(self.baseline_price).lower()
        if has_k:
            self.baseline_price = int(self.baseline_price[:-1]) * 1000


def get_details(brand, price, title, link):
    details = title.replace(",", "|") + ", " + \
              brand.replace(",", "|") + ", " + \
              str(math.floor(float(price))) + ", " + \
              link
    return details


def write_to_csv(details):
    file = open("available_wishlist_products.csv", "a")
    date = strftime("%a, %d %b %Y", gmtime())
    details = date.replace(",", " ") + ", " + details + "\n"
    file.write(details)
    file.close()
    print("\nProduct Details: ")
    print(details)


def is_similar(a, b):
    similarity_percent = SequenceMatcher(None, a, b).ratio() * 100
    return similarity_percent > 45


def print_similarity(a, b):
    similarity_percent = SequenceMatcher(None, a, b).ratio() * 100
    print("Similarity % between " + b + " and " + a + ": " +
          str(similarity_percent))


def display_windows_notification(title, msg):
    print(msg)
    notification = ToastNotifier()
    notification.show_toast(
        title, msg, icon_path=ICON_PATH, duration=30
    )


# Read all the items from the wishlist.json
with open('wishlist.json') as data_file:
    items = json.load(data_file)

# Create the csv file to write available items to
filename = "available_wishlist_products.csv"
f = open(filename, "w")
headers = "Date, Title, Brand, Price, Link\n"
f.write(headers)
f.close()

# Parse JSON into Product objects and pass them to the scraper for search
scraper = AmazonScraper()
for i in items["items"]:
    p = Product(i)
    scraper.search_item(p)
    # IMPORTANT: Delay each request by 5 seconds
    # so Amazon won't suspect our bot
    time.sleep(5)
```
Step 05 : Scheduling Auto Execution Of The Script
Our system is now up and running; you can execute it with Python from the command line. But what's the fun in that if you have to run it yourself every time?
In this final step, we'll set up our script to run at every Windows startup.
- Open a text editor and write the following cmd commands:

```
cd /d "<Path To Your Script Folder>"
start pythonw "<Script Name>.py"
```

- Save the file with a `.bat` extension. Try running the file and see if the program gets executed.
- Create a shortcut of this file and copy it to `C:\Documents and Settings\[User Name]\Start Menu\Programs\Startup` if you are running an older version of Windows, or to `C:\Users\[User Name]\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup` if you are running Windows 10.
That's all! Your script is now ready to execute at every Windows startup. For a quick test, try signing out of Windows and signing back in.
Conclusion
In less than 150 lines of code, we created a reasonably sophisticated price monitoring system that is easy to use and to scale. Since starting to write this post, I've already expanded the script to work with 4 different local and international stores.
While this small system brings a lot of value, it still has some limitations. For example, the scraper for NewEgg worked for a day or two before the website identified it and placed a captcha in front of the results. Even this is not a dead end: with the power of Python's text and image recognition libraries, such small barriers can often be overcome with a few more lines of code. However, this means consistent maintenance and upgrades to keep the system working.
The source code for this project is available in the GitHub repository.
For suggestions and queries, just contact me.