Scraping Amazon Offer Data at Scale: How I Pulled 10,000+ ASINs with Python + Keepa API
1. Introduction
I was asked to download 500 ASINs for 36 categories using the Keepa API.
The base categories were electronics and arts & crafts; within each base category were more specific subcategories of products like cameras, video projectors, and fabrics.
The idea is that once you collect enough products belonging to a specific subcategory, you can begin to mine insights about optimal pricing and sellers' geographic locations.
More specifically, I needed to download seller information as well as the offer history for the past 90 days. 36 categories * 500 ASINs = 18,000 ASINs!
While the above image demonstrates a dashboard, I was generating CSV files. First, I had to make sure Keepa had deals for each category.
Then I had to run a script to confirm that offers were available (ignoring ASINs without them), and a secondary script to get the seller information.
I was given a very clean requirements document listing the 36 categories, with the requirement that I scrape 500 ASINs from each.
What challenges did this present?
Time!! Downloading this much data takes time, especially because not all results meet the criteria.
2. Why Offer History Matters
An offer is a specific instance of a seller listing a product for sale, including the price, condition, fulfillment method (FBA or FBM), shipping terms, and availability.
Multiple offers can exist simultaneously for the same product.
5 Things Offer History Tells You:
- Market Dynamics
- Competition
- Optimal Source and Pricing
- Validate Sales Assumptions
- Protect Against Risks
I had 90 columns, one for each date from 1/1/2025 to 3/31/2025 where each offer was listed.
This gives insight into how frequently offers were being made, and how each offer compares to the Amazon New price.
Also, since offers are keyed by sellerId, one ASIN can have multiple rows of data, with each row holding the dates and offer prices for a single seller.
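To make that layout concrete, here is a minimal sketch (with a made-up ASIN and seller ID) of how one such row can be shaped: an identifying column pair plus 90 date columns, filled in only where that seller actually had an offer:

```python
from datetime import date, timedelta

def make_date_columns(start, days):
    # One "YYYY-MM-DD" column name per day in the window.
    return [(start + timedelta(days=i)).strftime("%Y-%m-%d") for i in range(days)]

# 90 columns covering 1/1/2025 through 3/31/2025
all_dates = make_date_columns(date(2025, 1, 1), 90)

# One row per (ASIN, seller) pair; values are that seller's offer price
row = {"asin": "B00EXAMPLE", "sellerId": "A1FAKESELLER"}
row.update({d: None for d in all_dates})
row["2025-01-15"] = 19.99  # price observed on that date
```

Dates that a seller never listed on stay `None`, which makes gaps in offer activity visible at a glance.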
3. Understanding the Keepa API
I have written a few articles on this topic before, such as this article, where I talk about how I failed to commercialize a tool built on some of this code.
Essentially, Keepa neatly stores all of its data in several endpoints which can be easily parsed by a skilled programmer.
There was a major performance bottleneck at the basic API plan level: each deals request returns only 200 ASINs, and just a fraction of those meet the requirements.
4. Designing the Data Pipeline: Code and Logic
There are 4 programmatic steps that have to be taken here. In order they are:
- Call the Categories endpoint to find the category ID of the required category
- Call Deals Endpoint -> figure out if the ASIN returned has offers
- Call the Product Endpoint -> determine the metadata about the ASIN
- Call the Seller Endpoint -> get seller address info
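The four calls above can be wired together as a small pipeline. The sketch below shows only the control flow; the `fetch_*` callables stand in for the actual Keepa requests, and their names and the response shapes (`offers`, `sellerId`, `sellerName`) are my assumptions for illustration, not a definitive mapping of Keepa's API:

```python
def run_pipeline(fetch_category_id, fetch_deals, fetch_product,
                 fetch_seller, category_name):
    # Step 1: resolve the category name to a Keepa category ID
    cat_id = fetch_category_id(category_name)
    # Step 2: pull candidate ASINs from the deals endpoint
    asins = fetch_deals(cat_id)
    rows = []
    for asin in asins:
        # Step 3: product metadata; skip ASINs with no offers
        product = fetch_product(asin)
        offers = product.get("offers") or []
        if not offers:
            continue
        # Step 4: seller info for each offer on the ASIN
        for offer in offers:
            seller = fetch_seller(offer["sellerId"])
            rows.append({
                "asin": asin,
                "sellerId": offer["sellerId"],
                "sellerName": seller.get("sellerName", ""),
            })
    return rows
```

Injecting the fetchers as arguments also makes the pipeline easy to test with stubs before spending any Keepa tokens.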
The toughest bit of code to write was the parser for the offer CSV returned by the Keepa API: it extracts each unique offer, converts the price into USD, and formats the date as mm/dd/yy.
I will admit I had ChatGPT help me with lots of bits of this code.
```python
from datetime import datetime, timedelta

def parse_offer_price_history(offer, all_dates):
    # Keepa timestamps are minutes elapsed since 2011-01-01
    base_date = datetime(2011, 1, 1)
    history = {}
    offer_csv = offer.get("offerCSV")
    if not offer_csv or not isinstance(offer_csv, list):
        return None
    # offerCSV alternates timestamp/price pairs: [ts, priceCents, ts, ...]
    for i in range(0, len(offer_csv) - 1, 2):
        timestamp_index = offer_csv[i]
        price_cents = offer_csv[i + 1]
        date = base_date + timedelta(minutes=timestamp_index)
        if date >= datetime(2025, 1, 1) and price_cents > 0:
            date_str = date.strftime("%Y-%m-%d")
            history[date_str] = price_cents / 100
    # fill missing dates so every row has the full set of date columns
    return {date: history.get(date) for date in all_dates}
```
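As a sanity check on the conversion, here is a worked example with a made-up offerCSV pair. Keepa timestamps count minutes from 2011-01-01, so an entry of `[7387200, 1999]` decodes like this:

```python
from datetime import datetime, timedelta

base = datetime(2011, 1, 1)
ts, cents = 7387200, 1999           # made-up offerCSV pair
when = base + timedelta(minutes=ts)  # 7,387,200 minutes = 5,130 days
price = cents / 100

when.strftime("%m/%d/%y")  # -> "01/17/25"
price                      # -> 19.99
```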
The base code for this project is over 250 lines long. Pagination must be handled: to get 500+ ASINs, you typically need to walk through several pages of the deals endpoint response to collect enough results. Errors must be caught, and results are saved in CSV format.
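That pagination loop can be factored out like this; `fetch_page` stands in for the real deals request (in Keepa's API the page number goes into the deal selection JSON, so treat the exact parameter wiring as an assumption):

```python
def collect_deal_asins(fetch_page, target=500, max_pages=10):
    # Accumulate ASINs page by page until we have enough or run out.
    # fetch_page(page) -> list of ASINs for that page of deal results.
    asins = []
    page = 0
    while len(asins) < target and page < max_pages:
        batch = fetch_page(page)
        if not batch:  # an empty page means no more results
            break
        asins.extend(batch)
        page += 1
    return asins[:target]
```

With 200 ASINs per page, hitting a 500-ASIN target means at least three pages even before filtering out ASINs with no offers.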
I had to produce many intermediate CSVs to save my results before continuing the processing. Finally, I needed to call the seller endpoint and parse that response:
```python
import requests

url = "https://api.keepa.com/seller"
params = {"key": api_key, "domain": 1, "seller": seller_id}
for _ in range(retries):
    resp = requests.get(url, params=params, timeout=10)
    if resp.status_code == 200:
        break
```
Keepa did have good information on the sellers which was extracted like this:
```python
data = resp.json()
seller_info = data.get("sellers", {}).get(seller_id, {})
name = seller_info.get("sellerName", "")
```
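The same lookup can be packaged into a helper that flattens the seller response into one CSV-ready row. The field names follow the snippet above; the `address` handling is an assumption on my part (Keepa returns the address as a list of strings), so verify it against the seller endpoint docs:

```python
def seller_row(data, seller_id):
    # Flatten one seller object out of Keepa's response into a flat dict.
    info = data.get("sellers", {}).get(seller_id, {})
    return {
        "sellerId": seller_id,
        "sellerName": info.get("sellerName", ""),
        # Assumed shape: a list of address lines, joined for the CSV cell
        "address": ", ".join(info.get("address") or []),
    }
```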
5. Result and Lessons
This project was a massive success, but it took a lot of time and effort to write the code and make sure everything worked correctly.
In total, I pulled offer history data for over 10,000 ASINs using the Keepa API. The entire process took over 10 hours, with pauses built in to respect Keepa's token rate limits.
A few key lessons:
- Token planning is everything: Offer history calls are expensive (in tokens), so batching and prioritizing ASINs is critical if you're on a budget.
- Some ASINs are dead: I found a surprising number of ASINs that returned no offer data or were inactive; it's worth filtering these out ahead of time.
- Retry logic saved me: I built in exponential backoff for timeouts and rate-limit errors, which prevented the script from crashing overnight.
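The backoff pattern I used looks roughly like this (a sketch of the idea, not the exact production code):

```python
import time

def with_backoff(call, retries=5, base_delay=2.0):
    # Retry `call` with exponentially growing sleeps: 2s, 4s, 8s, ...
    for attempt in range(retries):
        try:
            return call()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries, let the error surface
            time.sleep(base_delay * (2 ** attempt))
```

Wrapping each Keepa request in something like `with_backoff(lambda: requests.get(url, params=params, timeout=10))` is what let the script run unattended overnight.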
6. Call to Action
Next, I am going to create a landing page to offer this as a service to FBA sellers. Follow my Medium page for more projects and data science updates.