Large-Scale, High-Accuracy Entity Extraction Made Easy
Tim Furche
Big Data is a great opportunity to make smarter decisions. But it is also a great challenge, in particular where Big Data comes as huge collections of raw text, logs, tweets, etc. Entity and relation extraction are crucial components in turning such collections of unstructured text into more meaningful, “smart” data. There exists a plethora of commercial and open-source services or tools for extracting entities such as cities, company names, or prices from documents. Unfortunately, traditional services have suffered from a trifecta of challenges: low coverage, inconsistent accuracy, and complex, tool-specific APIs.

In this class, we will introduce a recent open-source API, ROSEAnn, which provides a simple, uniform interface for most of the existing extraction services and tools out there. We will walk through several scenarios for using ROSEAnn, from detecting mentions of a company to more complex cases combining the detection of several entity types. In addition to providing a uniform interface, ROSEAnn also allows you to easily “scale up” the accuracy and coverage of your entity extraction by a smart integration of an arbitrary number of extraction services. On entity types where the underlying services overlap, accuracy is improved (by reconciling the different results); where they don’t overlap, coverage is increased. At the end of this class, you will be able to deploy automatic entity and relation extraction easily, and make use of the integration features of ROSEAnn to achieve entity extraction with unparalleled coverage and accuracy.
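To make the integration idea concrete, here is a minimal Python sketch of how results from several extraction services might be combined. It does not use the ROSEAnn API and is not ROSEAnn's reconciliation algorithm: the `Annotation` type, the `reconcile` function, the service names, and the sample annotations are all hypothetical, and a simple majority vote over identical text spans stands in for whatever reconciliation ROSEAnn actually performs.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Annotation(NamedTuple):
    """One entity mention as reported by one (hypothetical) service."""
    start: int        # character offset where the mention begins
    end: int          # character offset just past the mention
    entity_type: str  # e.g. "Company", "City"
    source: str       # name of the service that produced the annotation


def reconcile(annotations):
    """Merge annotations from several services, grouped by exact span.

    A span reported by a single service is kept as-is (more coverage).
    A span reported by several services gets the most frequent type
    (a majority vote, as a stand-in for real reconciliation).
    Ties go to the type that was seen first.
    """
    by_span = defaultdict(list)
    for ann in annotations:
        by_span[(ann.start, ann.end)].append(ann)

    merged = []
    for (start, end), group in sorted(by_span.items()):
        votes = Counter(ann.entity_type for ann in group)
        winner, count = votes.most_common(1)[0]
        # Keep the vote count and group size as a crude confidence signal.
        merged.append((start, end, winner, count, len(group)))
    return merged


text = "Apple opened a store in Paris."

# Made-up output of three imaginary services for the sentence above.
sample = [
    Annotation(0, 5, "Company", "service_a"),
    Annotation(0, 5, "Company", "service_b"),
    Annotation(0, 5, "Fruit", "service_c"),
    Annotation(24, 29, "City", "service_b"),
]

result = reconcile(sample)
for start, end, entity_type, count, total in result:
    print(f"{text[start:end]!r}: {entity_type} ({count} of {total} services)")
```

In this toy input, the three services disagree on "Apple" and the vote settles on "Company", while "Paris" is found by only one service and is kept anyway. That mirrors the two effects described above: reconciliation where services overlap, added coverage where they do not.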

Level: Advanced