Links about Web Scraping

Because everyone is so excited about web scraping, we've put together a collection of links to get you started.

Conceptual Introduction to Scraping

I Don’t Need No Stinking API: Web Scraping For Fun and Profit - a blog post by Hartley Brody. It is a high-level discussion with no code, more of a conceptual introduction. Brody has also written a book; I haven't read it, so I can't speak to its quality.

Web Scraping with Python

Here is a collection of links about web scraping with the Python programming language. Several other languages and platforms are popular with web scraping folks, namely Ruby and Node.js. For our purposes I'm going to focus on web scraping tutorials that use Python.

Here is a really basic and thorough introduction to web scraping on Python for Beginners, a site with a bunch of tutorials for doing various tasks with Python.

Greg Reda has written a very popular introduction to web scraping, Web Scraping 101 with Python (03 Mar 2013):

He also wrote a follow-up post, More web scraping with Python (and a map):

I am also including a link to some meta discussion about the post on Hacker News, a website for sharing and discussing programming links. HN Discussion about:

Arun Ravindran wrote a blog post in response to Greg Reda's post. He thinks there is an even easier way to introduce Python web scraping:

I'm also including a link to a Reddit thread, What do you recommend to read to start web scraping?:

Here is another post, Easy Web Scraping with Python, by Miguel Grinberg.
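To give a flavor of what the tutorials above cover, here is a minimal sketch of the core idea: parse some HTML and pull out the pieces you care about. It uses only Python's standard library (html.parser) rather than any of the libraries those posts use. The sample HTML, the LinkCollector class, and the extract_links helper are all made up for illustration. A real scraper would first download the page, which this sketch skips.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag we are currently inside
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text while we are inside a link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Hypothetical helper: return a list of (href, text) tuples."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up page content standing in for a downloaded document.
SAMPLE_HTML = """
<html><body>
  <h1>Reading list</h1>
  <ul>
    <li><a href="/posts/scraping-101">Scraping 101</a></li>
    <li><a href="/posts/more-scraping">More scraping</a></li>
  </ul>
</body></html>
"""

if __name__ == "__main__":
    for href, text in extract_links(SAMPLE_HTML):
        print(href, text)
```

The libraries discussed in the posts above exist largely so you don't have to write this kind of event-handling class by hand.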

Advanced Web Scraping

This is a fairly terse introduction to Python web scraping with the lxml and Requests libraries:

A comprehensive overview of web scraping, with code (using lxml and Requests) and conceptual material too:

Using lxml instead of Beautiful Soup for parsing HTML:
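As a rough illustration of what parsing with lxml looks like, the sketch below runs XPath queries over a small HTML string. The sample markup and the extract_films helper are invented for this example and don't come from the linked post. It assumes lxml is installed (pip install lxml).

```python
from lxml import html

# Made-up markup standing in for a downloaded page.
SAMPLE_HTML = """
<div id="films">
  <div class="film"><h2>Alpha</h2><span class="year">1999</span></div>
  <div class="film"><h2>Beta</h2><span class="year">2004</span></div>
</div>
"""


def extract_films(page_source):
    """Hypothetical helper: return a list of (title, year) tuples."""
    tree = html.fromstring(page_source)
    results = []
    # XPath: every <div class="film"> anywhere at or below the parsed root.
    for row in tree.xpath('.//div[@class="film"]'):
        title = row.xpath("./h2/text()")[0].strip()
        year = int(row.xpath('./span[@class="year"]/text()')[0])
        results.append((title, year))
    return results


if __name__ == "__main__":
    for title, year in extract_films(SAMPLE_HTML):
        print(title, year)
```

With Beautiful Soup, the equivalent row selection would be something like soup.find_all("div", class_="film"). The XPath version packs the same query into a single string.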

Web Scraping as a Service

Import.io is a new (for-pay) service (here is some web discussion about it)

Another service is Tubes.io (and some web discussion):

Scrapinghub is a new service for running recurring web scrapes. It is run by the folks who created the Scrapy Python library for writing web spiders: