What are the Best Data Scraping Tools?

Lisa Falcone Rock Star Marketer available for company poised for growth

October 17th, 2015

I am looking to scrape data from a few websites on a regular basis. Keep in mind that the budget is very modest.

What data scraping tools would you recommend (free or cheap), or do you have any contacts that you'd recommend for this purpose?

Thanks in advance for your feedback.

Lisa


Paul Garcia President at TABLE

July 1st, 2017

Lisa, it may be illegal to scrape data from other websites if you use the data to do a number of things, especially to make money or to send email. Before you spend your "modest" budget scraping data, get some actual legal advice on whether what you plan to do will put you on the receiving end of a VERY expensive lawsuit for theft of information or for violating the CAN-SPAM Act. Just because it's online does not mean it is public domain or free.

Peter K Chen

October 17th, 2015

import.io

Michael Brill Technology startup exec focused on AI-driven products

October 17th, 2015

As always, it depends on what you need to scrape, your skillset and budget. It's really a big world. I've tried and failed with products like import.io and Kimono and have written maybe 10-20 scrapers... it is highly dependent on your skillset and the nature of the sites you want to scrape. Some take 10 minutes; others are basically impossible.

My quick recommendation is that you use Upwork et al to hire a contractor to write your scrapers. They are pretty easy to write if you have the skillset and you can get your basic site scraper for, say, $100.





Thomas III Small Business Branding & Expanding Consultant

July 4th, 2017

Yea Lisa, just go out and pick up some python real quick. NBD!


I'd encourage you to look at a tool called Grepsr. It is much like the once-popular Kimono. It is a very easy-to-use browser extension and most of the time will enable very simple visual selection of your data. The intro tier is free, with paid tiers by volume. My guess is you'd likely be able to stay under the paid tiers. You can find it in the Chrome store.

Armando Vieira Data Scientist, entrepreneur, speaker

October 18th, 2015

The R package rvest is very easy to use and does the job. In Python there are plenty of options.
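To give a feel for what "writing a scraper" means in practice, here is a minimal sketch in Python using only the standard library. It pulls the text out of every element with a given CSS class. The sample HTML, the class names ("name", "price") and the helper names are all made up for illustration; a real site will need its own selectors, and you would fetch the page first instead of using a hard-coded string.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element carrying a given CSS class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0        # > 0 while we are inside a matching element
        self.results = []
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth > 0:
            self.depth += 1   # nested tag inside a matching element
        elif self.target_class in classes:
            self.depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:   # the matching element just closed
            self.results.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self.depth > 0:
            self._buffer.append(data)


def extract_by_class(html, target_class):
    """Return the text of all elements whose class list contains target_class."""
    parser = ClassTextExtractor(target_class)
    parser.feed(html)
    return parser.results


# Hypothetical page fragment, standing in for a downloaded page.
SAMPLE_HTML = """
<ul>
  <li class="item"><span class="name">Widget</span> <span class="price">$9.99</span></li>
  <li class="item"><span class="name">Gadget</span> <span class="price">$24.50</span></li>
</ul>
"""

if __name__ == "__main__":
    print(extract_by_class(SAMPLE_HTML, "name"))
    print(extract_by_class(SAMPLE_HTML, "price"))
```

Dedicated libraries do this with far less code, but the shape of the job is the same: find the markup pattern that wraps your data, then extract it.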

Peter Johnston Businesses are composed of pixels, bytes & atoms. All 3 change constantly. I make that change +ve.

October 18th, 2015

There are two approaches here.

The first is batch - to do a scrape on a one-off or regular basis. This can be a chore, repeating the same task over and over.

The other is track - to dynamically link so that changes in the target site are reflected in the data you have access to.
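If the target site offers no feed or API, a rough approximation of tracking is to re-fetch the page on a schedule and only act when the content has actually changed. Below is a minimal Python sketch of that idea using a content fingerprint. The function names are my own, and the fetching and scheduling are left out; this only shows the comparison step.

```python
import hashlib


def fingerprint(content: str) -> str:
    """Hash page content, ignoring differences in whitespace."""
    normalized = " ".join(content.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(previous_fingerprint: str, content: str):
    """Return (changed, new_fingerprint) for freshly fetched content."""
    current = fingerprint(content)
    return current != previous_fingerprint, current


if __name__ == "__main__":
    # Illustrative strings standing in for two fetches of the same page.
    stored = fingerprint("Price: 10")
    changed, stored = has_changed(stored, "Price: 12")
    print("changed:", changed)
```

You would store the fingerprint between runs and re-scrape in full only when it differs.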

Increasingly we are moving to this sort of dynamic linking. This again splits into two - those who would be happy for you to track them and those which would not.

For friendly dynamic linking, consider an API. Ask them to share data with you and often give them something in kind as the main payment - a commission, perhaps, or even just recognition of source.

One other thing to consider is doing your own modelling from the data. If you have either dynamic data or regular snapshots to create a timeline, you can start to see what the data is doing over time and predict what it might do in future. Eventually this can get good enough that you are almost in charge, being able to set the figure before they do and simply using their real-time data as confirmation.
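As a very simple illustration of modelling from snapshots, the Python sketch below fits a straight line through a series of observations and extrapolates one step ahead. The numbers are invented purely for the example, and a straight-line fit is just the simplest possible model, not a recommendation; real data usually needs something more careful.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(xs, ys, x_next):
    """Extrapolate the fitted line to a new x value."""
    slope, intercept = fit_line(xs, ys)
    return slope * x_next + intercept


if __name__ == "__main__":
    # Made-up snapshots: day number and the value scraped on that day.
    days = [0, 1, 2, 3]
    values = [10.0, 12.0, 14.0, 16.0]
    print(predict(days, values, 4))
```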

As well as scraping tools, you may wish to look into dynamic linking tools and data modelling and prediction.


Amit Tiwari DME at OTS Solutions

July 4th, 2017

You will find the best data scraper tools at the lowest prices here:

technocomsoft.com/web-data-scraper.html

Amol R

October 18th, 2015

I found diffbot quite helpful. You can run tests on their site and see if your target site works well with their API.

If you can employ a tech resource, then Scrapy (Python-based) is a good option.

However, none of these tools can offer brainless scraping. You must tweak them to get the results you are looking for.

Stefan Smiljkovic Founder at Vanila.io - Web Studio

October 18th, 2015

There are a lot of tools you can use, but you need to have technical knowledge.

- https://github.com/lapwinglabs/x-ray
- https://github.com/segmentio/nightmare
- https://github.com/n1k0/casperjs
- https://github.com/ariya/phantomjs

You can also reach me at www.vanila.io to give me more info on what you want to scrape, and I will advise you on it.

Jared Ruplinger Senior Technical Consultant at comScore, Inc.

October 18th, 2015

I am no longer using rup@nomadfx.com because of the high volume of spam that I get there. If you sent me an email, I probably didn't read it. If you know who I am and I know who you are, send future emails to my first name @ my last name dot org. Remember, it is .org not .com. If that is hard to figure out, think of it this way. If my name was Jimmy Fallon, you would send it to jimmy@fallon.org. But my name isn't Jimmy Fallon. That would be weird. So, figure it out and send me an email. ;-)