Metadata Management: The Emerging Big Data Discipline
Kurt Cagle
In many organizations, there is a nuclear pile lurking, quietly building up heat, which, if left unchecked, will ultimately react explosively. Content- and asset-management systems, built with an eye toward tracking hundreds or even thousands of individual assets for reuse, are now straining as their users begin dealing with millions of assets, content fragments, and media files. Add to this the increasingly tenuous ability of archivists and digital librarians to keep up with classifying these resources, the often inadequate approaches to building effective taxonomies, and the difficulty of maintaining these systems for users who are not technically savvy, and you have a recipe for disaster that will likely have repercussions throughout the organization.

Metadata management is an emerging discipline that takes a different tack toward solving this Big Data problem. It combines Big Data tools such as Hadoop MapReduce with NoSQL databases, highly indexed textual content, document enrichment, and semantic processing to build a framework for managing, combining, and curating these content fragments. Such solutions apply primarily in the publishing and media spaces, but because many larger organizations are becoming increasingly publishing-oriented (especially as a byproduct of marketing), metadata management is relevant well beyond those industries.

This class examines what metadata is and what the discipline of metadata management looks like, then explores examples with live tools that illustrate how metadata-management systems can be built to meet the needs of organizations today and tomorrow.

This class is intended to be a deep dive into the tools and techniques of metadata management, so we will be looking at code and architecture for building such systems. Attendees will benefit from having attended "From Big Data to Smart Data," where many of the core concepts are first presented.

Level: Advanced