SUMMARY: The purpose of this project is to practice web scraping by extracting specific pieces of information from a website. The Python scraping code uses the Scrapy framework.
INTRODUCTION: The Neural Information Processing Systems Conference (NeurIPS) hosts its collection of papers at https://papers.nips.cc/. This script starts from the listing page for the 2017 conference, follows the link to each individual paper page, and collects the link to every PDF document it finds. It also downloads the PDF documents as part of the scraping process.
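The link-collection step described above can be illustrated with a minimal sketch. It deliberately uses only the Python standard library instead of Scrapy, so that the extraction logic can be run in isolation; it is not the project's actual spider. The names `LinkCollector`, `extract_links`, and `SAMPLE_PAPER_PAGE` are hypothetical, and the sample markup and paper URL are made up for illustration rather than copied from the real site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url, suffix=""):
    """Return absolute URLs for links in `html` whose href ends with `suffix`.

    The suffix comparison is case-insensitive; an empty suffix keeps every link.
    """
    parser = LinkCollector()
    parser.feed(html)
    return [
        urljoin(base_url, href)  # resolve relative hrefs against the page URL
        for href in parser.hrefs
        if href.lower().endswith(suffix)
    ]


# Hypothetical markup standing in for an individual paper page (assumption,
# not the real page structure of papers.nips.cc).
SAMPLE_PAPER_PAGE = """
<html><body>
  <a href="/paper/1234-example-paper.pdf">[PDF]</a>
  <a href="/paper/1234-example-paper/bibtex">[BibTeX]</a>
</body></html>
"""

pdf_links = extract_links(
    SAMPLE_PAPER_PAGE,
    "https://papers.nips.cc/paper/1234-example-paper",  # hypothetical page URL
    ".pdf",
)
print(pdf_links)
```

In a Scrapy spider, the same idea would typically be expressed with selectors on the response object and `response.urljoin`, with the download step handed to Scrapy's `FilesPipeline`; the exact selectors depend on the site's markup, which this sketch does not attempt to reproduce.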
The source code and JSON output are available on GitHub.