The DataScraping.co hosted scraping app is an easy-to-use, powerful suite for website scraping. With the hosted scraping application, you or your business can:
- Create and host scraping agents online
- Enter URLs manually, or use the advanced list option to create a list of URLs that is easy to manage for batch crawling
- Start and schedule your scraping agent to extract data automatically on the go
- Scrape data anonymously using our managed, distributed servers with thousands of proxies in the cloud
- Get email alerts when your extraction job completes, or configure a webhook to post the data to your server when it is ready
- Use the REST API to start agents, schedule agents, get the data, change URLs, and more
- And many more...
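As a rough illustration of the webhook option in the list above, here is a minimal sketch of a receiver on your own server, written with Python's standard library. The payload format is not described in this document, so the sketch assumes the data arrives as a JSON body that is either a list of records or a single object. The names `parse_records`, `WebhookHandler`, and `serve`, and the port number, are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_records(body: bytes) -> list:
    """Decode a webhook body into a list of records.

    Assumption: the body is JSON, either a list of records or one object.
    """
    payload = json.loads(body.decode("utf-8"))
    return payload if isinstance(payload, list) else [payload]


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes announced by the client.
        length = int(self.headers.get("Content-Length", 0))
        try:
            records = parse_records(self.rfile.read(length))
        except ValueError:
            # Covers malformed JSON and invalid UTF-8.
            self.send_response(400)
            self.end_headers()
            return
        print(f"received {len(records)} record(s)")
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Listen for webhook posts until interrupted (port is an arbitrary choice)."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Calling `serve()` starts the listener. In practice you would replace the `print` call with whatever stores or processes the scraped records.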
Creating an agent for the hosted scraping app works the same way as for the desktop app, and you can use the Chrome extension to set up selectors for the fields you want to scrape. Install the Chrome extension, go to the web page you want to scrape, and launch the extension. It displays a panel on the right side, as in the screenshot below.
Once the extension panel is visible, click the "New" button to add a field and give the field a name; I named my first field "ProductName". Then click the "(asterisk)" button to enable the point-and-click feature, which generates CSS selectors automatically when you click the HTML element you want to scrape. For example, I want to scrape product names in this field, so I clicked a product name; the extension generated the selector and highlighted the other products in the list that match the same selector.
Sometimes other items are selected as well because they share the same CSS class or selector. In that case, click the yellow highlighted items to deselect them, or write your selector manually; you can learn how from here.
The extension highlights the matching results and also shows a result preview. Once you are satisfied with the result and the number of records matches your expectation, click the "Accept" button to save the field in your agent.
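To make the idea of "matching items" more concrete, the sketch below mimics what a simple tag-plus-class selector such as `h2.product-name` does: every element carrying that class matches, which is why one click can highlight a whole list. It is only an illustration using Python's standard `html.parser`, not the extension's actual selector engine. The sample HTML, the class names, and the helper `select_text` are made up for the example.

```python
from html.parser import HTMLParser


class ClassTextCollector(HTMLParser):
    """Collect the text of every <tag> element that carries a given class.

    Simplified stand-in for a CSS selector like `h2.product-name`;
    it does not handle nested elements of the same tag.
    """

    def __init__(self, tag, cls):
        super().__init__()
        self.tag, self.cls = tag, cls
        self.results = []
        self._buffer = None  # None means "not inside a matching element"

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.cls in classes:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.tag and self._buffer is not None:
            self.results.append("".join(self._buffer).strip())
            self._buffer = None


def select_text(html, tag, cls):
    parser = ClassTextCollector(tag, cls)
    parser.feed(html)
    return parser.results


SAMPLE = """
<ul>
  <li><h2 class="product-name">Red Kettle</h2></li>
  <li><h2 class="product-name featured">Blue Mug</h2></li>
  <li><h2 class="promo-title">Summer Sale</h2></li>
</ul>
"""

print(select_text(SAMPLE, "h2", "product-name"))  # ['Red Kettle', 'Blue Mug']
```

A broader selector (for example, every `h2`) would also pick up the promo title, which is the kind of unwanted match you deselect or exclude with a more specific selector.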
Now follow the same process to add as many fields as you want, for text, attribute, or HTML items, to scrape anything from HTML pages. If you want to extract a link, image, or any other attribute from an HTML tag, use the ATTR option from the "Extract" drop-down, which displays a new text box where you can enter the name of the attribute to extract instead of plain TEXT or HTML.
For example:
- Image scraping: For images I want to extract the "src" value, so after generating my selector I selected the ATTR option and entered "src" in the corresponding text box to tell the extractor that I need the value of src in the output instead of the entire HTML of the image.
- Link scraping: To scrape URL links, write your selector, select the ATTR option, and enter "href" in the corresponding text box to tell the extractor that you need the value of href in the output instead of the entire HTML or text.
- The ATTR (attribute) option is a powerful extractor feature and can be used to extract any attribute from an HTML tag.
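The difference between extracting text and extracting an attribute can be sketched in a few lines of Python. This is a hedged illustration of the concept behind the ATTR option, not the extractor's real implementation. The helper `extract_attr` and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collect the value of one attribute from every occurrence of a tag."""

    def __init__(self, tag, attr):
        super().__init__()
        self.tag, self.attr = tag, attr
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            value = dict(attrs).get(self.attr)
            if value is not None:  # skip tags that lack the attribute
                self.values.append(value)


def extract_attr(html, tag, attr):
    parser = AttrCollector(tag, attr)
    parser.feed(html)
    return parser.values


SAMPLE = (
    '<a href="/p/1"><img src="/img/1.jpg" alt="Red Kettle"></a>'
    '<a href="/p/2"><img src="/img/2.jpg" alt="Blue Mug"></a>'
)

print(extract_attr(SAMPLE, "img", "src"))  # ['/img/1.jpg', '/img/2.jpg']
print(extract_attr(SAMPLE, "a", "href"))   # ['/p/1', '/p/2']
```

The same pattern applies to any other attribute name, such as `alt` or `title`, which matches the point that ATTR is not limited to `src` and `href`.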
Once you have set up all the fields, click the "Done" button and the dialog box below will appear. Enter the API Id and Admin API Key in the text boxes under "Send to Cloud Hosted App" and click the "Save" button; the agent will be created in your online account. (If you don't have an API Id and key, you can get them by logging in and going to your account page in the hosted app online.)
The API Id and Key are stored in your Chrome local storage the first time you enter them, so they are remembered for future use. If you want to change them later, just paste the new values; otherwise the stored values will continue to be used.
Once the agent is created, you can click the link in the success message, which takes you to the agent page, where you can start, schedule, and further configure the agent to manage and automate your data collection using the hosted scraping app online.
All scraping agent files (*.scraping) work on both the desktop app and the hosted app. If you want, you can also upload or download the agent file from the cloud app after logging in.