Web scraping means collecting information from websites. It allows businesses of all sizes to get first-hand market information, such as competitor pricing, customer reviews, and best sellers. It has been gaining popularity as businesses of all sizes increase their online presence, especially during the pandemic, when lockdowns made customers rely even more on online shopping and research.
In this article, I will use Amazon.com as an example to show, conceptually, what you could achieve for your business by setting up a web scraping routine without any prior coding experience. At the end, I will share some ideas as food for thought.
As a disclaimer, I would like to let you know upfront that I am also writing this article to promote a course I created on Udemy, which I will describe in more detail toward the end. However, my intent to share this exciting tool, which businesses of all sizes can leverage, is genuine.
Imagine you are a business owner and your business faces fierce competition. It is clear to you that pricing is a main factor for your target customers. To maximize your revenue, you would likely set your price just slightly below your main competitors’. Simple, right? But how?
There are three main ways to get competitor pricing: (1.) from your retailer clients, the businesses that sell products from both you and your competitors, (2.) from a market research company, and (3.) from the competitors’ online presence, either their direct channel or their retailer clients. The first two are often referred to as second-hand approaches, while the last one is a first-hand approach.
Challenges for Second-Hand Approach
I have tried all three approaches myself when I had to closely monitor retail prices in both Canada and the USA for a multi-billion dollar manufacturer. Here are some drawbacks of the first two, the second-hand approaches.
(1.) Getting it from your retailer clients: there is no incentive for them to help you, and you might have a hard time explaining to them that you would like to maximize your profit. Most of our retailer clients took a step back from us after learning our intention. They feared that we might increase our wholesale price to them once we knew more about competitors’ pricing. In addition, there is no easy way to validate the information they provide. It is in their best interest to convince you to decrease your wholesale price to them.
(2.) Getting it from a market research company: pricing reports from market research companies are often aggregated, meaning you only get to see the average by product category, region, or period of time, and they are often outdated. In addition, they might be cost-prohibitive for small businesses. For example, we had to pay about USD 10,000 to 20,000 every year for a monthly price report covering just 4 retailers in the United States.
We faced these challenges, and they likely apply to your business as well. In the next section, I will share how I address them.
Advantages of First-Hand Approach
In contrast, collecting retail prices yourself, in-house, addresses the challenges above:
(1.) Reliability: you and your team collect information from a source that is public to everyone, so it is a true reflection of how much your retailer clients are charging their customers.
(2.) Cost effectiveness: most web scraping tools offer a free tier that allows you to do web scraping easily. You can decide whether to upgrade to a paid, premium version after you see the benefit.
(3.) Timeliness: once you set up your first web scraper, you can choose how often to run it, be it monthly, weekly, or even daily. You make that decision, and it does not incur any additional cost.
(4.) Flexibility: quite often, the information you want to collect depends on your understanding of the market landscape, so it is a moving target. With the knowledge to set up a web scraper, you can make adjustments as needed, again without any extra cost.
(5.) Full Data Access: you have full control over which website you scrape, when you collect data, and how detailed the data is. You have access to every row of data collected, instead of data aggregated over a period that is long past.
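To make the difference between row-level and aggregated data concrete, here is a minimal Python sketch. The retailers, products, and prices are invented for illustration and are not real scraped data, and the function names are hypothetical as well. It shows how a category average, the kind of figure an aggregated report gives you, can hide the one competitor price that matters for your decision.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical row-level records, as a scraper might collect them.
# All names and prices below are invented for illustration.
ROWS = [
    {"retailer": "Retailer A", "category": "Blenders", "product": "Competitor X200", "price": 79.99},
    {"retailer": "Retailer B", "category": "Blenders", "product": "Competitor X200", "price": 69.99},
    {"retailer": "Retailer A", "category": "Blenders", "product": "Competitor Z50", "price": 129.99},
    {"retailer": "Retailer B", "category": "Blenders", "product": "Competitor Z50", "price": 134.99},
]


def average_by_category(rows):
    """Aggregated view: one average price per category."""
    prices = defaultdict(list)
    for row in rows:
        prices[row["category"]].append(row["price"])
    return {category: round(mean(values), 2) for category, values in prices.items()}


def lowest_price_by_product(rows):
    """Row-level view: the cheapest offer for each product and who sells it."""
    best = {}
    for row in rows:
        current = best.get(row["product"])
        if current is None or row["price"] < current["price"]:
            best[row["product"]] = row
    return {product: (row["retailer"], row["price"]) for product, row in best.items()}


if __name__ == "__main__":
    print(average_by_category(ROWS))
    print(lowest_price_by_product(ROWS))
```

With only the category average in hand, you would not know that one retailer sells a competing product well below it. The row-level view keeps that detail available.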
Tools for First-Hand Approach
Knowing the advantages of the first-hand approach, you might be surprised how easily it can be implemented without any prior coding experience. You might not even need to take my Udemy course if you are comfortable trying out new applications.
First, you need a web scraping tool. I have tried most of them and would recommend these three to start with: (1.) Octoparse, my tool of choice, which I have been using since 2014, (2.) ParseHub, a great alternative if you cannot get Octoparse to work with the website you would like to scrape, and (3.) WebScraper.io, a lightweight alternative that comes as a Google Chrome extension, which saves you some trouble explaining it to your IT team and boss before you have convinced them of the power of web scraping. All three tools come with great documentation and tutorials.
There are other great web scraping tools that let you do the same job without coding, and I would encourage you to test them all out. Some tools might work better for particular sites than others.
These are the main reasons I decided to use Octoparse: (1.) you can easily import and export your tasks and share them with your team, (2.) its free tier offers more than the others, with up to 10,000 URLs and 10,000 lines to export, and (3.) it has a really good YouTube channel. In addition, the paid version comes with many pre-configured web scrapers for the most popular commercial websites, including Amazon.com.
Here is their pricing plan at the time of publishing this article. If your budget allows for it, I would encourage you to go ahead and make use of their templates. However, you can easily build a web scraper for Amazon with their free plan and my Udemy course.
Octoparse also provides a referral program that lets you help promote it and earn credits to use its popular paid services.
Here is my Octoparse referral link, which you can use if you have enjoyed this article so far and would like to encourage me by signing up for Octoparse: https://www.octoparse.com/signup?re=Av34zyxS
Snapshot for Using Octoparse
In a nutshell, this is what you see in Octoparse. On the left is the Workflow, which you can easily set up by clicking in the in-app browser on the right and choosing one of the options in the Tips window.
Below is a snapshot of a running Octoparse task. The browser appears again at the top, showing real-time data extraction.
Below it is the data extraction window, which shows the data as it is being collected. Once data collection is done, you can export it to an Excel file, a format you are likely familiar with.
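Once the data is exported, even a few lines of code can turn it into a pricing decision. The following Python sketch is only an illustration under stated assumptions: it assumes the exported sheet has been saved as CSV, the column names (`product`, `seller`, `price`) and the sample rows are hypothetical, and the 1% undercut is an arbitrary choice rather than a recommendation. It finds the lowest competitor price for a product and suggests a price slightly below it, as in the scenario described at the start of this article.

```python
import csv
import io

# Stand-in for an exported file; a real run would open the saved CSV instead.
# Column names and rows are hypothetical.
SAMPLE_EXPORT = """product,seller,price
Widget,Competitor A,$24.99
Widget,Competitor B,$22.49
Widget,Competitor C,$23.00
"""


def parse_price(text):
    """Turn a scraped price string such as '$1,299.00' into a float."""
    return float(text.replace("$", "").replace(",", "").strip())


def suggest_price(csv_file, product, undercut=0.01):
    """Return a price `undercut` (as a fraction) below the lowest competitor price."""
    prices = [
        parse_price(row["price"])
        for row in csv.DictReader(csv_file)
        if row["product"] == product
    ]
    if not prices:
        raise ValueError(f"no prices found for {product!r}")
    return round(min(prices) * (1 - undercut), 2)


if __name__ == "__main__":
    print(suggest_price(io.StringIO(SAMPLE_EXPORT), "Widget"))
```

To use it on a real export, you would pass an opened file instead of the in-memory sample, for example `open("export.csv", newline="")`, where the file name is a placeholder.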
Getting Insights with Tableau Data Visualization
In this part, I would like to use some of the course projects as examples to illustrate what you can do with the data you collect. For those of you not so familiar with Tableau, it is a popular tool for analyzing large amounts of data through simple drag-and-drop and visualizing it to reveal insights you might not otherwise see.
Project 1. Scraping Amazon’s Best Sellers for All Departments
Project 2. Scraping Amazon Products Using Search Key Words
Project 3. Scraping Amazon Reviews by Country
Data Visualization Examples
Tableau visualizations are meant to be interactive, but the following are just static screenshots. Please check out the free previews of my Udemy course (see details at the end) to see them in action and the rationale behind them. Thank you.
I hope you enjoyed this article. Please leave me a comment if you have any questions. Thank you : )
Below is the link to the Udemy course in case you’d like to check out the preview videos and see it in action. The course also touches upon data processing and data visualization with Tableau, as I feel this would benefit my students the most.