Amazon Camping & Hiking Products Analysis

18 minute read

Published: September 30, 2025

Overview

This project aims to extract insights about products sold on Amazon using publicly available data from Amazon product listings. Because of the immense number of products available on Amazon, I narrowed the project to focus on Camping & Hiking products. The goal is to identify characteristics that differentiate bestselling products from the rest, which could serve as useful information for Amazon sellers. The process involved scraping the initial dataset from Amazon.com, cleaning and preprocessing the scraped data, and finally analyzing it for insights.

Data Scraping

I used the requests_html library in Python to scrape the product data. The data was scraped using three different URLs: one found by typing “Camping & Hiking” into the Amazon search bar, another found by using Amazon’s drop-down navigation options to get to Camping & Hiking products, and a third that listed only the top one hundred bestselling Camping & Hiking products. I found it necessary to include the last URL because regular searches only include a small number of bestselling products on their results pages, and I wanted more bestselling products in the dataset for analysis. The URL found by navigating to Camping & Hiking products included many more pages of results than the one found by searching directly in the search bar, so using two URLs for regular products (by regular I mean not bestselling) helped add more products to the dataset.
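
The three-source setup described above can be sketched roughly as follows. This is a minimal illustration, not the scraper used for the project: the URLs are placeholders (the real ones are not listed in this post), the `page` query parameter is an assumption about how results are paginated, and `with_page`, `page_urls`, and `fetch_listing_texts` are hypothetical helper names. The CSS selector is left to the caller because none is given here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Placeholder URLs: the three real Amazon URLs are not given in the post.
SOURCE_URLS = {
    "search_bar": "https://example.com/s?k=camping+hiking",
    "navigation": "https://example.com/s?category=camping-hiking",
    "best_sellers": "https://example.com/bestsellers/camping-hiking",
}


def with_page(url, page):
    """Return url with its 'page' query parameter set to the given page number."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query["page"] = str(page)  # assumption: pagination uses a 'page' parameter
    return urlunsplit(parts._replace(query=urlencode(query)))


def page_urls(source, n_pages):
    """List the URLs for pages 1..n_pages of one named source."""
    return [with_page(SOURCE_URLS[source], p) for p in range(1, n_pages + 1)]


def fetch_listing_texts(url, selector):
    """Fetch one results page and return the text of each matching element.

    requests_html is imported lazily so the URL helpers above work without it.
    `selector` is a CSS selector supplied by the caller (hypothetical here).
    """
    from requests_html import HTMLSession

    session = HTMLSession()
    response = session.get(url)
    return [element.text for element in response.html.find(selector)]
```

Keeping the sources in one dictionary makes it easy to record which URL each product came from, which matters later because the best sellers page exposes fewer fields than the regular results pages.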

The data was scraped from the product listings on the search results page rather than individual product pages. The following data was included for all products when available:

Product Title
ASIN (a unique identification number for each Amazon product)
Price
Star Rating (out of 5 stars)
Number of Reviews
Best Seller indicator (for products not found on the best sellers page, this was marked by a “Best Seller” tag in the product image)
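
As an illustration of how one scraped listing with the fields above might be represented before being loaded into a dataframe, here is a small sketch. The class name `ProductListing`, the field names, and the sample values are all hypothetical. It uses `None` for missing values, which is an assumption made for this example rather than the convention used in the project.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ProductListing:
    """One product as it appears on a search results page (hypothetical layout)."""

    title: str
    asin: str                           # unique Amazon product identifier
    price: Optional[float] = None       # None when the listing shows no price
    star_rating: Optional[float] = None  # out of 5 stars
    num_reviews: Optional[int] = None
    best_seller: bool = False           # True if tagged or from the best sellers page


# Made-up example values, purely for illustration.
example = ProductListing(
    title="Example 2-Person Tent",
    asin="B000000000",
    price=59.99,
    star_rating=4.6,
    num_reviews=1234,
    best_seller=True,
)

# asdict() gives a plain dict, so a list of these converts directly to table rows.
row = asdict(example)
```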

Products listed on the best sellers page included a little less information in their listings. The following data was collected for products on regular search results pages but not for those on the best sellers list. Because this data was only available for a portion of the dataset, I focused my analysis on the previously listed variables and used the following variables as supplementary information.

Number of units bought in the past month
Free delivery availability (products listed as having free delivery)
Coupon availability (products that included a coupon)
Page Number (this information was technically available for products on the bestsellers page but wasn’t as meaningful to collect since the bestsellers page was an ordered list of products from 1 to 100)
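
The supplementary fields above could be derived from a listing's raw text along these lines. This is a sketch under stated assumptions: it assumes the text contains phrases such as "1K+ bought in past month", "FREE delivery", and "coupon", and the function names are hypothetical. The exact wording on the scraped pages may differ, so the patterns would need checking against real output.

```python
import re

# Assumed phrasing: "<number>[K]+ bought in past month", e.g. "1K+" or "500+".
BOUGHT_RE = re.compile(r"(\d+(?:\.\d+)?)(K?)\+? bought in past month", re.IGNORECASE)


def bought_past_month(text):
    """Return the lower-bound purchase count, or None if the phrase is absent."""
    match = BOUGHT_RE.search(text)
    if match is None:
        return None
    count = float(match.group(1))
    if match.group(2).upper() == "K":
        count *= 1000  # "1K+" means at least 1,000
    return int(count)


def has_free_delivery(text):
    """True if the listing text mentions free delivery (case-insensitive)."""
    return "free delivery" in text.lower()


def has_coupon(text):
    """True if the listing text mentions a coupon (case-insensitive)."""
    return "coupon" in text.lower()
```

Because the listings report purchases as a floor ("1K+"), the parsed number is a lower bound rather than an exact count.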

Some of the data was extracted using patterns in the product text and the requests_html .search() functionality. For example, product star ratings were always written in the form “{rating} out of 5 stars”, so this information was easy to extract directly in Python before putting the data in a dataframe. Other information, like free delivery and coupon availability, was extracted from the entire text output in the cleaning and preprocessing stage. Most of the data was extracted from each product listing’s text output, except for the title and ASIN. The ASIN was not displayed in the product text, so I pulled it from another part of the HTML. The title was displayed in the product text, but because titles differ so much in length and wording, I found it easier to extract the complete title from another part of the HTML. I used try/except blocks to keep the scraper running when a product didn’t display certain information, and inserted ‘NA’ for unavailable data.
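
The pattern-plus-fallback approach described above can be sketched as follows. To keep the example dependency-free it uses the standard library `re` module as a stand-in for the requests_html .search() call, so it is not the code used in the project. The helper names are hypothetical, and the price pattern is an assumption about how prices appear in the listing text.

```python
import re

RATING_RE = re.compile(r"(\d(?:\.\d)?) out of 5 stars")
PRICE_RE = re.compile(r"\$(\d[\d,]*\.\d{2})")  # assumed format, e.g. "$1,059.99"


def star_rating(text):
    """Parse '{rating} out of 5 stars'; raises AttributeError if absent."""
    return float(RATING_RE.search(text).group(1))


def price(text):
    """Parse the first dollar amount; raises AttributeError if absent."""
    return float(PRICE_RE.search(text).group(1).replace(",", ""))


def extract_or_na(extractor, text):
    """Run one extractor and fall back to 'NA' so a single gap doesn't stop the scrape."""
    try:
        return extractor(text)
    except (AttributeError, ValueError):
        # AttributeError: the pattern did not match (search() returned None).
        # ValueError: the matched text could not be converted to a number.
        return "NA"


listing_text = "Example Tent 4.6 out of 5 stars 1,234 $59.99"
record = {
    "star_rating": extract_or_na(star_rating, listing_text),
    "price": extract_or_na(price, listing_text),
}
```

Catching only the specific exceptions that a missing field produces keeps unrelated bugs visible instead of silently turning them into ‘NA’ values.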