Dynamic Inputs in Data Scraping Studio

Are you a professional data extractor? Does your organization extract data from hundreds of websites every day? Are you working on a large data mining project?

Then you know that web scraping depends on the HTML elements or particular patterns of a website, and that most websites today have their own structure, which will rarely be the same across all of your target sites. That means that to extract data from a hundred websites, you would need to create a hundred web scraping agents.

But that's not the case with Data Scraping Studio. Introducing dynamic inputs: a simple and scalable way to set the extraction pattern, and the values of most other fields, dynamically from an input file.


The data processing engine populates these dynamic fields from the input file at run time and extracts the matching results.

Sample Input (JSON)

[
  {
    "COLUMN1": "http://www.ebay.com/product-1",
    "COLUMN2": "<div class='ebay-title'>([^<]+)</div>",
    "COLUMN3": "<div class='ebay-price'>([^<]+)</div>",
    "COLUMN4": "<img class='ebay image' src='([^']+)'",
    "COLUMN5": "<div class='ebay-discount'>([^<]+)</div>"
  },
  {
    "COLUMN1": "http://www.amazon.com/product-1",
    "COLUMN2": "<div class='amazon-title'>([^<]+)</div>",
    "COLUMN3": "<div class='amazon-price'>([^<]+)</div>",
    "COLUMN4": "<img class='amazon image' src='([^']+)'",
    "COLUMN5": "<div class='amazon-discount'>([^<]+)</div>"
  },
  {
    "COLUMN1": "http://www.flipkart.com/product-1",
    "COLUMN2": "<div class='flipkart-title'>([^<]+)</div>",
    "COLUMN3": "<div class='flipkart-price'>([^<]+)</div>",
    "COLUMN4": "<img class='flipkart image' src='([^']+)'",
    "COLUMN5": "<div class='flipkart-discount'>([^<]+)</div>"
  }
]
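To make the idea concrete, here is a minimal sketch of how per-row patterns like the ones above could be applied. It uses Python's standard re module as a stand-in and is not Data Scraping Studio's actual engine. The function name extract_fields and the sample HTML are hypothetical, and reading COLUMN1 as the URL and the remaining columns as regex patterns is an assumption inferred from the sample input.

```python
import json
import re

# A shortened copy of the sample input: one row, two pattern columns.
INPUT_JSON = """
[
  {
    "COLUMN1": "http://www.ebay.com/product-1",
    "COLUMN2": "<div class='ebay-title'>([^<]+)</div>",
    "COLUMN3": "<div class='ebay-price'>([^<]+)</div>"
  }
]
"""


def extract_fields(row, html):
    """Apply each pattern column in `row` to `html` (hypothetical helper).

    Assumes COLUMN1 holds the URL and every other column holds a regex
    with one capture group. Returns None for patterns that do not match.
    """
    result = {"url": row["COLUMN1"]}
    for column, pattern in row.items():
        if column == "COLUMN1":
            continue
        match = re.search(pattern, html)
        result[column] = match.group(1).strip() if match else None
    return result


rows = json.loads(INPUT_JSON)

# Invented HTML standing in for the downloaded page.
sample_html = (
    "<div class='ebay-title'>Blue Widget</div>"
    "<div class='ebay-price'>$9.99</div>"
)

record = extract_fields(rows[0], sample_html)
print(record)
```

Because the patterns travel with each row, the same function handles the eBay, Amazon, and Flipkart rows without any site-specific code.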

Using dynamic inputs, you and your organization can:

  1. Extract data from 100 or more websites with one web scraping agent by passing the extraction pattern dynamically.
  2. Change the proxy (IP address) for web requests whenever you want, and only for the URLs you choose.
  3. Set the user-agent string for web requests dynamically.
  4. Set a custom HTTP Referer for web requests dynamically.
  5. And more...
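The request-level settings in the list can be sketched the same way. The example below is only an illustration using Python's standard urllib.request, not the product's own mechanism. The column names PROXY, USER_AGENT, and REFERER, the helper build_request, and the example values are all hypothetical. The request is built but never sent.

```python
import urllib.request


def build_request(row):
    """Build a request and opener from one input row (hypothetical helper).

    Assumes COLUMN1 holds the URL; PROXY, USER_AGENT, and REFERER are
    optional, made-up column names used only for this sketch.
    """
    headers = {}
    if row.get("USER_AGENT"):
        headers["User-Agent"] = row["USER_AGENT"]
    if row.get("REFERER"):
        headers["Referer"] = row["REFERER"]

    request = urllib.request.Request(row["COLUMN1"], headers=headers)

    handlers = []
    if row.get("PROXY"):
        # Route both HTTP and HTTPS traffic through the row's proxy.
        proxy_map = {"http": row["PROXY"], "https": row["PROXY"]}
        handlers.append(urllib.request.ProxyHandler(proxy_map))

    opener = urllib.request.build_opener(*handlers)
    return request, opener


# Example row with invented values; nothing is sent over the network here.
row = {
    "COLUMN1": "http://www.ebay.com/product-1",
    "PROXY": "http://127.0.0.1:8080",
    "USER_AGENT": "ExampleBot/1.0",
    "REFERER": "http://www.example.com/",
}

request, opener = build_request(row)
# Calling opener.open(request) would perform the fetch with these settings.
```

Each row carries its own settings, so two URLs in the same input file can use different proxies or user-agent strings.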
