Web Scraping of Quotes from Famous People using Python Take 3

SUMMARY: The purpose of this project is to practice web scraping by extracting specific pieces of information from a website. The web scraping code was written in Python 3 and leveraged the Scrapy framework [https://scrapy.org/] maintained by Scrapinghub [https://scrapinghub.com/].

INTRODUCTION: A demo website created by Scrapinghub lists quotes from famous people. It exposes several endpoints that present the quotes in different ways, and each endpoint poses a different scraping challenge. For this Take 3 iteration, the Python script scrapes the quotes displayed on an infinite-scrolling page. Because that page loads its quotes from a JSON API as the reader scrolls, the spider requests the API pages directly instead of parsing HTML.

Starting URLs: http://quotes.toscrape.com/scroll

import json
import scrapy

class ScrollSpider(scrapy.Spider):
    name = "scroll"
    api_url = 'http://quotes.toscrape.com/api/quotes?page={}'
    start_urls = [api_url.format(1)]

    def parse(self, response):
        data = json.loads(response.text)
        for quote in data['quotes']:
            yield {
                'author_name': quote['author']['name'],
                'text': quote['text'],
                'tags': quote['tags'],
                'author_url': quote['author']['goodreads_link'],
            }

        # follow the pagination flag: request the next API page while one exists
        if data['has_next']:
            next_page = data['page'] + 1
            yield scrapy.Request(url=self.api_url.format(next_page), callback=self.parse)
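
The parsing and pagination logic of the spider can also be tried without Scrapy or a network connection. The following is a minimal sketch using only the standard library. The helper name extract_items and the SAMPLE_PAGE payload are hypothetical and exist only for illustration; the payload assumes the same field layout that the spider reads (quotes, author, text, tags, has_next, page) and is not real output from the site.

```python
import json

API_URL = "http://quotes.toscrape.com/api/quotes?page={}"

# Hypothetical payload, shaped like the fields the spider reads.
# The values are placeholders, not data returned by the demo site.
SAMPLE_PAGE = json.dumps({
    "has_next": True,
    "page": 1,
    "quotes": [
        {
            "author": {
                "name": "Example Author",
                "goodreads_link": "/author/example",
            },
            "tags": ["example", "placeholder"],
            "text": "A placeholder quote.",
        }
    ],
})


def extract_items(payload):
    """Return (items, next_url) for one JSON page of quotes.

    next_url is None when the payload reports no further page.
    """
    data = json.loads(payload)
    items = [
        {
            "author_name": quote["author"]["name"],
            "text": quote["text"],
            "tags": quote["tags"],
            "author_url": quote["author"]["goodreads_link"],
        }
        for quote in data["quotes"]
    ]
    # Build the next page URL only while the API says another page exists.
    next_url = API_URL.format(data["page"] + 1) if data["has_next"] else None
    return items, next_url


items, next_url = extract_items(SAMPLE_PAGE)
print(items)
print(next_url)
```

Keeping this logic in a plain function makes it easy to check the field mapping against a saved response before running a full crawl.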

The source code and JSON output can be found here on GitHub.