Words of Wisdom about Data Structure

Borrowed, with her permission, from Janice Eklund

 

Words of Wisdom to be found online

Some recent posts by Janice Eklund (UC Berkeley) on the VRA listserv very succinctly and clearly described the impacts of choosing a metadata structure. I am happy that she has given me permission to publish them here as well.

From Janice Eklund:

For images of cultural objects, there are many arguments for using a relational model (tables linked to other tables via intersection tables) for your cataloging tool and a flat model (rows and columns like a spreadsheet) for your presentation tool, but they all boil down to two concepts: complexity and consistency.  Data about cultural objects is often complex, and this complexity cannot be captured efficiently in a flat data model because you have to leave space in every record to accommodate the most complex object you will ever encounter.  This adds up to a lot of wasted space, and wasted space means more money and hardware needed for storage, backup, preservation, etc.  It's much more efficient to catalog in a relational environment, where data can be entered once and then linked to many other records.

Data consistency is the other compelling reason to catalog in a relational environment.  Once there is more than one person doing data entry, the potential for data inconsistency increases exponentially.  Differences in opinion, spelling, and transliteration in the source material make it hard enough for one person to keep things consistent.  In a relational model based on the Core categories, one work or collection record is established with appropriate links to one or more titles, dates, artist names, etc., and then individual image records, each representing a "view" of the work, may be linked to that one work record.  This way, all the descriptive data about the work is entered once, and every image that shows this work inherits the same information.  This data consistency ensures that when you search for things, you get consistent results returned.
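As a rough illustration of the "enter once, link many times" idea, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (work, agent, work_agent, image) are hypothetical and chosen only for this example; they are not the VRA Core schema or any particular cataloging tool's layout.

```python
import sqlite3

# In-memory database; all table and column names here are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE work  (id INTEGER PRIMARY KEY, title TEXT, date TEXT);
CREATE TABLE agent (id INTEGER PRIMARY KEY, name TEXT);
-- Intersection table: a work may have many agents, an agent many works.
CREATE TABLE work_agent (work_id  INTEGER REFERENCES work(id),
                         agent_id INTEGER REFERENCES agent(id));
-- Each image record is one "view" of a work and stores only a link to it.
CREATE TABLE image (id INTEGER PRIMARY KEY, view TEXT,
                    work_id INTEGER REFERENCES work(id));
""")

# The descriptive data about the work is entered exactly once.
conn.execute("INSERT INTO work VALUES (1, 'Sample Vase', 'ca. 1900')")
conn.execute("INSERT INTO agent VALUES (1, 'Sample Maker')")
conn.execute("INSERT INTO work_agent VALUES (1, 1)")
conn.executemany(
    "INSERT INTO image VALUES (?, ?, 1)",
    [(1, "front view"), (2, "detail of handle")],
)

def image_descriptions(conn):
    """Return one row per image, with work data pulled in through the links."""
    return conn.execute("""
        SELECT image.view, work.title, agent.name, work.date
        FROM image
        JOIN work       ON work.id = image.work_id
        JOIN work_agent ON work_agent.work_id = work.id
        JOIN agent      ON agent.id = work_agent.agent_id
        ORDER BY image.id
    """).fetchall()
```

Because the maker's name lives in a single row of the agent table, correcting it there changes what every linked image reports, which is the consistency benefit described above.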

Management and service of the digital assets (the image files), on the other hand, is handled quite well by a flat model because the descriptive fields that apply to each image (file size, pixel dimensions, photographer who captured it, date it was captured, etc.) are fairly straightforward and not generally subject to scholarly debate.  A lot of image metadata can even be harvested automatically by the capture device and the digital asset management tool.  If the relational work data can be concatenated or "flattened" and then imported into a select number of descriptive fields in a digital asset management tool, then you have the best of both worlds: consistent, complex, descriptive metadata about works linked to multiple individual views of those works in an efficient discovery and access tool.
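The "flattening" step can be sketched in a few lines of Python. This is only an illustration under assumed names: the field names, the sample values, and the "; " separator are all hypothetical, and a real digital asset management tool will have its own import format.

```python
def flatten_work(work, sep="; "):
    """Concatenate each repeating work field into a single text value."""
    return {field: sep.join(values) for field, values in work.items()}

def flat_image_record(image, work):
    """Build one spreadsheet-style row: image fields plus flattened work fields."""
    record = dict(image)               # technical metadata, one value per field
    record.update(flatten_work(work))  # descriptive metadata, concatenated
    return record

# Hypothetical work record with repeating values, as a relational tool might export it.
work = {
    "title": ["Sample Vase", "Vase with Handles"],
    "agent": ["Sample Maker"],
    "date": ["ca. 1900"],
}

# Hypothetical technical metadata for one view of that work.
image = {"filename": "vase_front.tif", "pixel_width": 4000, "pixel_height": 3000}

row = flat_image_record(image, work)
```

Every image of the same work would receive the same flattened descriptive fields, so the flat presentation tool stays consistent with the relational catalog as long as the export is re-run after corrections.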

Regarding the flavor of data structure that you might choose:

the power of metadata crosswalks.

If you look at the metadata crosswalk at http://getty.edu/research/conducting_research/standards/intrometadata/metadata_element_sets.html you’ll see that many different metadata standards map to Dublin Core. But the reason all these other metadata standards were developed was that Dublin Core did not meet their needs. The DC metadata element set was developed to provide basic information elements to improve indexing and retrieval of resources on the Web. But different communities found that these basic elements were often not enough to provide the kind of access they needed for their collection materials. So MARCXML and MODS were developed for library materials, CDWA was developed for museum objects, EAD was developed for archival materials, and the VRA Core was developed for images of all of these materials. As a very basic set, DC represents the lowest common denominator among all these more specialized standards. If you look at the crosswalk closely, you’ll see that the mapping from a complex schema like CDWA or VRA Core to the DC elements isn’t always satisfactory, but it does allow searching across collections if the more complex data is mapped to the DC element set. (There’s a good discussion of metadata mapping at http://getty.edu/research/conducting_research/standards/intrometadata/path.html.)
So the answer to the question “why not just use Dublin Core” is “because we don’t have to.”  There are better alternatives that meet the more specific needs of our collection materials that will talk to Dublin Core without sacrificing the richness and complexity of data that describes cultural works and the images of those works.
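To make the crosswalk idea concrete, here is a small Python sketch. The mapping table below is a simplified assumption made up for illustration, loosely modeled on how richer element names might map to Dublin Core; it is not the published Getty crosswalk, and the sample record is invented.

```python
from collections import defaultdict

# Illustrative mapping from richer element names to Dublin Core elements.
# This table is an assumption for the example, not an official crosswalk.
CROSSWALK = {
    "title": "title",
    "agent": "creator",
    "date": "date",
    "material": "format",
    "measurements": "format",  # two distinct source elements share one DC element
    "worktype": "type",
}

def to_dublin_core(record, crosswalk=CROSSWALK):
    """Map a record to DC elements; return (dc_record, unmapped_element_names)."""
    dc = defaultdict(list)
    unmapped = []
    for element, value in record.items():
        target = crosswalk.get(element)
        if target is None:
            unmapped.append(element)  # no DC home in this mapping: detail is lost
        else:
            dc[target].append(value)
    return dict(dc), unmapped

# Invented sample record using the richer element names.
record = {
    "title": "Sample Painting",
    "agent": "Sample Painter",
    "material": "oil on canvas",
    "measurements": "100 x 80 cm",
    "inscription": "signed lower right",
}

dc_record, lost = to_dublin_core(record)
```

In this sketch, material and measurements both land in the single DC format element, and inscription has nowhere to go. That is the sense in which the mapping "isn't always satisfactory": the DC view supports cross-collection searching, while the richer record remains the source of truth.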