Simple Yet Efficient Web Extraction with OXPath, Part II
Tim Furche
In the second part of this class, we will look at more complex wrappers, as well as the maintenance and management of large-scale extraction infrastructure. We’ll walk you through several examples of how to create wrappers, driven by real use cases from finance and competitive pricing. For these examples, we’ll use the OXPath Firefox IDE, which lets you develop OXPath wrappers with familiar Firefox developer tools. We will discuss how to make wrappers robust and maintainable through a small set of wrapper design patterns. OXPath’s open-source engine is able to deal with many of the issues that make Web scraping a pain, from buffer management to auto-complete fields.

However, we will also show the limits of the engine and how to deal with them. We will conclude the presentation with best practices for deploying and scheduling the resulting wrappers, e.g., for repeated extraction to keep extracted data up to date.

Level: Advanced