Scheduling a website scraping job

In previous tutorials we learned how to create a scraping agent that uses CSS selectors and regular expressions (regex) to extract data from a website. This tutorial covers the scheduling feature, which automates data collection on a particular day, at a particular time, or on a recurring basis.

Scheduling is one of the most useful features of Data Scraping Studio. It is powered by the Windows task scheduling service, and it lets you configure a scraping agent to run automatically whenever you want. For example:

  • Extract a stock price from a website every 5 minutes and write the output to a CSV file on your desktop.
  • Extract prices from competitor websites every hour and import the output into SQL.
  • Extract new jobs from a jobs portal on Monday, Wednesday and Friday at 10:00am and post the data to a server.

[Image: scheduling website scraping]

Single job scheduler

To run a scraping agent automatically at a scheduled time, click the "Add" button and locate the scraping agent file on your hard disk. Most scraping agents are located in the "My Documents/Data Scraping Studio/My Agents" directory on your C: drive. Select the agent, and it will launch automatically at the scheduled time. For example, if you usually review certain data when you arrive at the office around 10am, you can create a scraping agent and schedule it to run at 9:45am every weekday, so that the data is collected automatically and saved to a file on your desktop before you need it.

[Image: add new website scraping job in scheduler]

To set the days and the time, go to the "Schedule" tab and select the "Schedule Type" (Daily, Weekly or Monthly).

Daily Schedule: For example, the daily schedule below starts at 08:00pm and then runs every 10 minutes indefinitely, using the repeat option.

[Image: daily scraper schedule]
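
To make the repeat option concrete, the first few run times of a schedule like this one can be worked out with a short Python sketch. It only illustrates the timing; it is not how Data Scraping Studio computes its schedule, and the function name and the start date are assumptions chosen for the example.

```python
from datetime import datetime, timedelta
from itertools import islice


def repeat_times(start, every_minutes):
    """Yield run times beginning at `start`, repeating every `every_minutes` minutes."""
    current = start
    while True:  # "forever": the caller decides how many runs to look at
        yield current
        current += timedelta(minutes=every_minutes)


# Assumed start date; only the 08:00pm start time and the 10-minute
# interval come from the daily schedule example.
start = datetime(2016, 3, 1, 20, 0)
first_runs = [t.strftime("%H:%M") for t in islice(repeat_times(start, 10), 4)]
print(first_runs)  # ['20:00', '20:10', '20:20', '20:30']
```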

Weekly Schedule: The schedule selected below runs every weekday at 9:00am, starting March 1st, 2016 and ending April 1st, 2016.

[Image: weekly scraper schedule]
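
The dates covered by a weekday schedule with a start and end date can be listed with a small Python sketch. It assumes that the end date is inclusive, which this tutorial does not state, and the function name is made up for the example.

```python
from datetime import date, timedelta


def weekday_runs(first_day, last_day):
    """Return every Monday-to-Friday date from first_day to last_day, inclusive."""
    days = []
    current = first_day
    while current <= last_day:  # assumption: the end date itself still runs
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            days.append(current)
        current += timedelta(days=1)
    return days


runs = weekday_runs(date(2016, 3, 1), date(2016, 4, 1))
print(len(runs), runs[0], runs[-1])
```

Each date in the list corresponds to one 9:00am run of the agent.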

Batch Scheduling

When you click Browse, you can select any number of scraping agents and run them in parallel as a batch. This speeds up the job, and you do not need to create a separate schedule for each scraping agent.

[Image: batch scheduling scraper]

Batch scheduling executes all the selected scraping agents in parallel.

Data Scraping Studio will wake up, extract the data, save the output to the local drive or post it to a server as configured in the agent, and then go back to sleep.

[Image: batch website scraper with scheduling]

Command Line Arguments

Argument          Description
--ExitOnComplete  Data Scraping Studio closes automatically when scraping completes for all the agents in a particular instance. For example, if 4 agents are scheduled as a batch with the --ExitOnComplete argument, the exe closes when all 4 agents complete.
--HideOnTray      The Data Scraping Studio GUI is hidden in the system tray.

Both of these arguments are added (options checked) in the scheduler.

Scheduling with a batch script (*.bat)

A batch file (*.bat) is a text file that contains one or more commands and has a .bat extension. When you type the file name at the command prompt or double-click the file, cmd.exe runs the commands sequentially, in the order they appear in the file.

You can use a batch script to execute one or more scraping agents with a double-click, or schedule it with Task Scheduler to run automatically whenever you want.
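
If you schedule the batch file with Windows Task Scheduler directly, one option is the built-in schtasks command. The Python sketch below only builds the argument list for it. The task name and the .bat path are hypothetical, and the 5-minute interval is borrowed from the stock price example earlier in this tutorial.

```python
import subprocess


def schtasks_every_n_minutes(task_name, bat_path, minutes):
    """Build a `schtasks /Create` argument list that runs bat_path every `minutes` minutes."""
    return [
        "schtasks", "/Create",
        "/TN", task_name,     # task name shown in Task Scheduler
        "/TR", bat_path,      # program (here: the batch file) to run
        "/SC", "MINUTE",      # schedule type: minute-based
        "/MO", str(minutes),  # modifier: run every N minutes
    ]


# Hypothetical task name and path, for illustration only.
args = schtasks_every_n_minutes("StockPriceScrape", r"C:\Scraping\name.bat", 5)
print(subprocess.list2cmdline(args))

# To actually register the task on a Windows machine, uncomment:
# subprocess.run(args, check=True)
```

The example path deliberately contains no spaces; a path with spaces is likely to need extra quoting inside the /TR value.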

One-line format: path-of-exe {arguments} {scraping agent 1} {scraping agent 2}...

"C:\Users\Vikas Rathee\Desktop\DataScraping.exe" --HideOnTray "C:\Users\Vikas Rathee\Documents\Data Scraping Studio\My Agents\Demo\CSS Selector\Hidden_HTML_tags.scraping" "C:\Users\Vikas Rathee\Documents\Data Scraping Studio\My Agents\Demo\CSS Selector\Product_List.scraping"

Enter the details in a text file in the above format and save it as name.bat.
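
If you have many agents, typing the line by hand gets error-prone, mostly because every path that contains spaces must be wrapped in double quotes. As a rough sketch, a few lines of Python can assemble the line and save it. The exe location and agent file names below are hypothetical, and the helper name is made up for this example; only the two arguments come from the table above.

```python
from pathlib import Path


def build_bat_line(exe_path, arguments, agent_paths):
    """Return one batch-file line: quoted exe, bare arguments, quoted agent paths."""
    quoted_agents = [f'"{p}"' for p in agent_paths]
    return " ".join([f'"{exe_path}"', *arguments, *quoted_agents])


# Hypothetical locations; substitute your own exe and .scraping files.
line = build_bat_line(
    r"C:\Tools\DataScraping.exe",
    ["--HideOnTray", "--ExitOnComplete"],
    [r"C:\My Agents\Product_List.scraping", r"C:\My Agents\Job_List.scraping"],
)

# Save the line as name.bat in the current directory.
Path("name.bat").write_text(line + "\n")
print(line)
```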

[Image: batch script for website scraping]
