PNLA Quarterly
Digital Postcard Collections: Consistency and Retrieval
Lynne Noone can be reached at: firstname.lastname@example.org
Many institutions maintain large digital collections of local historical significance, documenting particular places, cultures and events. Different metadata schemes are used to describe these digital objects across the institutions that maintain these collections. Dublin Core is a commonly used scheme for describing many of them. While some institutions choose to create their metadata locally, many universities and digital archives use CONTENTdm to help in the creation and display of metadata records that describe the works in their collections. Many of these institutions then share their records through a large repository called OAIster. While Dublin Core offers flexibility in the creation of metadata used to describe digital objects, how effectively this metadata achieves interoperability is an important issue in subject searching. This paper discusses how interoperable subject and location search terms are in five different digital libraries using CONTENTdm.
Statement of the Problem
Digital library collections are created for the use of both the general public and academic institutions. The ability to create metadata records locally gives the creator of a collection the flexibility to design schemes that reflect the local nature of specific objects in that collection. CONTENTdm allows libraries to create records for objects at the local level by customizing CONTENTdm templates to meet their local needs. CONTENTdm libraries can share their metadata through OAIster, a repository of metadata records now searchable through WorldCat. For users searching outside the library's local collection, however, the vocabulary chosen for subject headings and the fields chosen to record the location and format of these local items may inhibit interoperability, even though the libraries intend just the opposite.
CONTENTdm is software that handles storage, management and delivery of digital collections. It is managed by OCLC (OCLC CONTENTdm overview, website). CONTENTdm is a way for libraries to quickly and easily create metadata and store digital collections either on their own servers with CONTENTdm software or on the server of OCLC. Many libraries, like the University of South Carolina, choose CONTENTdm for its ease of use and stable support. It also is an attractive solution to the building of a digital collection for those institutions that do not have the resources to create digital collections without this type of support (Swain, 2006, 58).
CONTENTdm allows for a variety of controlled vocabularies: the software offers 10 integrated thesauri to choose from, and the creator of metadata for a collection also has the option to use a vocabulary of his or her own design (OCLC CONTENTdm collection building and management, website). CONTENTdm is flexible, so a variety of data standards such as XML, Dublin Core and METS can be used in building a collection (OCLC CONTENTdm collection building and management, website).
Open Archives Initiative
The Open Archives Initiative (OAI) aims to "develop and promote interoperability standards that aim to facilitate the efficient dissemination of content" (Open Archives Initiative, website). With the OAI, different types of metadata can be harvested from different repositories, making access to information easier for the user. The OAI aims to promote interoperability standards that will aid in the effective retrieval of digital information (Open Archives Initiative, website).
The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) came out of the OAI. The OAI-PMH helps to bring data together in one place to enable the searching of the data at the same time. It defines a mechanism for harvesting metadata from different repositories (Open Archives Initiative, website).
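To make the harvesting mechanism more concrete, the following is a minimal Python sketch of what a harvester does: it builds a ListRecords request asking for unqualified Dublin Core (the oai_dc metadata prefix) and reads the Dublin Core elements out of the response. The repository address, the set name and the sample response are invented for illustration and do not come from any of the collections studied; no network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical repository endpoint, used only for illustration.
BASE_URL = "https://repository.example.org/oai"

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
DC_PREFIX = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optional: restrict harvesting to one set
    return base_url + "?" + urlencode(params)


# Invented, trimmed response containing a single record.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:postcards/1</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Main Street, looking north</dc:title>
          <dc:subject>Postcards</dc:subject>
          <dc:type>Image</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Return one dict per harvested record, grouping Dublin Core values by element name."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        fields = {}
        for el in rec.iter():
            if el.tag.startswith(DC_PREFIX):
                name = el.tag[len(DC_PREFIX):]
                fields.setdefault(name, []).append((el.text or "").strip())
        records.append({"identifier": identifier, "fields": fields})
    return records


if __name__ == "__main__":
    print(list_records_url(BASE_URL, set_spec="postcards"))
    for record in parse_records(SAMPLE_RESPONSE):
        print(record["identifier"], record["fields"])
```

Because every repository answers the same request in the same XML structure, a service such as OAIster can gather records from many collections into one index. The protocol does not, however, make the values inside the Dublin Core elements consistent from one collection to the next.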
OAIster is a union catalog that contains millions of records harvested from open archive collections using the OAI-PMH. OAIster is currently managed by OCLC and is available for searching through WorldCat (OCLC. The OAIster database, website).
Users of CONTENTdm have the option to upload their records to WorldCat, making them discoverable by users searching through WorldCat (OCLC. CONTENTdm, overview. Website).
While digital library projects and organizations are plentiful, so too are the problems surrounding them. Despite all the metadata being created, there is no single standard that every metadata creator uses. While XML is a clear choice for creating metadata records because it allows flexibility in how the data is described, the problem of interoperability remains. Interoperability, as described by NISO, is "the ability of multiple systems with different hardware and software platforms, data structures, and interfaces to exchange data with minimal loss of content and functionality" (NISO, 2004, 2). Interoperability is among the most important issues in creating good metadata records (Chan, Zeng, 2006, 3).
One of the strengths of metadata is that it can be adapted to the type of collection it describes. At the same time, this flexibility can work against a user who runs an integrated search in a repository of records created with different schemes. A user may assume that a particular search term will retrieve all relevant records from different collections. However, if two collections describe similar elements differently (for example, by choosing different fields for locations or different vocabularies for subject headings), relevant records may be missed.
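The effect can be illustrated with a small Python sketch. The two records below are invented, and their field names are assumptions chosen only to show two collections placing the same place name and the same object type in different fields.

```python
# Invented records: field names and values are illustrative only.
RECORDS = [
    {   # Collection A puts the state in the subject field.
        "collection": ["A"],
        "title": ["Railroad depot"],
        "subject": ["Postcards", "South Carolina"],
        "county": ["Example County"],
    },
    {   # Collection B puts the state in a coverage field instead.
        "collection": ["B"],
        "title": ["Beach view"],
        "type": ["Postcards"],
        "coverage": ["South Carolina"],
    },
]


def search_field(records, field, term):
    """Return records whose given field contains the term (case-insensitive)."""
    needle = term.lower()
    return [r for r in records
            if any(needle in value.lower() for value in r.get(field, []))]


def search_any_field(records, term):
    """Return records in which any field contains the term (keyword search)."""
    needle = term.lower()
    return [r for r in records
            if any(needle in value.lower()
                   for values in r.values() for value in values)]


if __name__ == "__main__":
    subject_hits = search_field(RECORDS, "subject", "South Carolina")
    keyword_hits = search_any_field(RECORDS, "South Carolina")
    print(len(subject_hits), "record(s) found by subject search")
    print(len(keyword_hits), "record(s) found by keyword search")
```

In this sketch a search limited to the subject field finds only the record from collection A, while a keyword search across all fields finds both. A user who relies on the fielded search would never learn that the second record exists.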
Throughout the digitization of library cataloging records, MARC (Machine-Readable Cataloging) has been the standard metadata format. This standard format for cataloging records has been a consistent and successful method for library systems (Chan & Zeng, 2006, 4). For digital libraries, however, this approach to standards and uniformity does not work for describing different digital objects. Different schemes are created to serve different needs and audiences, and a one-size-fits-all standard does not work (NISO, 2004, 13).
One way for digital libraries to achieve maximum interoperability would be for all collections to adopt the same controlled vocabulary for subject headings rather than amend the vocabulary to reflect their local collection (Nicholson, Shiri, 2003, 58). As stated above, however, objects from different digital collections have different characteristics, and describing them accurately requires the flexibility to use vocabulary that is appropriate to the item. With so many different types of objects being described, a variety of vocabularies have been developed to describe them: Library of Congress Subject Headings (LCSH), the Art and Architecture Thesaurus and Medical Subject Headings, to name a few. This inconsistency in subject heading creation raises the question of how best to approach interoperability for subject access (Mitchell, 2006, 20).
Another consideration in the discussion of interoperability is the role of the user. While the library community has a certain comfort level with LCSH, the average user will search with a more general term or keyword (Mitchell, 2006, 21). If metadata records are being designed for ease of user searching, then the rigidity of LCSH runs counter to the concept of a user-centered system. Lois Mai Chan and Theodora Hodges addressed this issue in 2000 and suggested that a simpler form of LCSH is necessary for the future (Chan, Hodges, 2000, 225). A variety of vocabularies already exist to describe different types of materials. For example, the Thesaurus for Graphic Materials (TGM) was created as "a tool for indexing visual materials by subject and genre/format" (Library of Congress. Thesaurus for Graphic Materials II. Website). Chan and Hodges, however, are looking at newer and even simpler options. Faceted Application of Subject Terminology (FAST) is one such vocabulary, created by "adapting the LCSH with a simplified syntax ... to retain the very rich vocabulary of LCSH while making the schema easier to understand, control, apply and use" (OCLC. FAST. Website).
In traditional cataloging, librarians are trained to create records using MARC, LCSH and authority control. Through great care in record creation, library OPACs have achieved interoperability for the books and other media entered into them, and these records are successfully shared through federated catalogs like WorldCat. Digital collections, however, raise different considerations than traditional library collections do. For one thing, digital collections describe a variety of objects that each require vocabulary specific to the object. Also, metadata records for digital collections are not necessarily created by librarians trained in LCSH and authority control; often they are created by students and paraprofessionals. Vocabularies like FAST may help to create some consistency in subject heading creation. It will be interesting to observe whether vocabularies like FAST become a standard in the creation of metadata records for digital resources, since FAST is designed to be easy to use, understand and maintain by people without extensive training (Dean, 2004, 333).
Five digital libraries currently using CONTENTdm and sharing their collections through OAIster were identified for the survey. CONTENTdm libraries displaying their collections in OAIster were selected because the choice to display records in OAIster implies a desire to share those records with a broad range of users. Also, many digital libraries choose CONTENTdm to help create their digital collections, and having all the libraries use the same database software allows for some level of uniformity. From each library, three postcards in the collection were randomly selected for analysis. A comparative analysis was done using the following criteria:
Boolean Search in Digital Collection (Diagram D1)
Comparison of Digital Collection Retrieval and OAIster Retrieval in Boolean Searches (Diagram D2)
Although the sample of records for this study is small, a few things stand out and lend themselves to further examination. The Boolean searches suggest that creation of metadata at the local level has a direct impact on users searching in a repository like OAIster (through WorldCat). Using vocabulary to describe an object with terms specific to its local nature may become a barrier to retrieval for someone unfamiliar with these terms. Take for example the University of South Carolina's digital collection (Diagram R5). The terms used for subject headings do not fall under LCSH, TGM or any other recognizable vocabulary. One of the terms used in subject headings is "Latta SC--Pictorial work" (Appendix 5. University of South Carolina, Image 2), which, unless a user is familiar with what or where Latta SC is, is too obscure a search term for the general public. However, other terms in the subject headings for this image are useful for retrieval, so the creator of the subject terms was thoughtful in creating the heading. Keeping the more local subject heading does have its purpose in describing the image, but does not help in retrieval.
The other barrier to user discovery is the choice of field where the location information is placed. In the University of South Carolina's collection, general location information (i.e. state) is in the subject area, and more specific location information is in two separate data fields, county and region (Appendix 5. University of South Carolina image 1, image 2, image 3). In running a Boolean search for the University of South Carolina, it is difficult to determine what to use for a location term. If a specific region is used, there is a high percentage of relevant results (Diagram D1). The same is true for searching in OAIster. However, a user searching by location in OAIster would have to know the exact county or region to find any records for these particular subjects. If the user searched with a more general location term like "South Carolina", the results would be less precise and most likely thousands of records would be retrieved.
In the University of Miami's collection (Diagram R2), problems also arise from the field where the location is defined. As with the University of South Carolina, the location is defined in three separate fields, in this case state, county and city. It is again difficult to determine which definition of location to use in the Boolean search. In the case of Image 1 (Appendix 2. University of Miami, Image 1), the Boolean search using terms from this image produces more precise results than searches using terms from images 2 and 3 (Appendix 2. University of Miami, Image 2 and 3). As with the University of South Carolina, it is necessary to be familiar with the regional locations of the postcards in order to achieve this high level of precision when searching by location. When the same search is run in OAIster, the number of retrieved records is similar. This may indicate that if location search terms are too specific, they do not allow for the retrieval of additional records that are relevant but not tied to that specific location.
In the comparison of Boolean searches using "postcard" and location information in each digital collection and in OAIster, the results vary widely within each sample group of postcards, and there is not enough data to reach any conclusions (Diagram D2). One reason the results vary so widely may be that the location terms are so specific to each collection that it is hit or miss whether those terms are also used to describe works in another collection, which would result in greater retrieval. When location is identified using county and region fields, these terms can be missed because users tend to search using broader terms. When the location is more general, the percentage of results seems to increase; however, this analysis is not within the scope of this project.
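The trade-off between narrow and broad location terms can be expressed with the standard measures of precision (the share of retrieved records that are relevant) and recall (the share of relevant records that are retrieved). The Python sketch below uses invented record counts, not the figures behind Diagrams D1 and D2, and is offered only as one way such percentages might be calculated.

```python
def precision(retrieved_ids, relevant_ids):
    """Share of retrieved records that are relevant."""
    retrieved = set(retrieved_ids)
    if not retrieved:
        return 0.0
    return len(retrieved & set(relevant_ids)) / len(retrieved)


def recall(retrieved_ids, relevant_ids):
    """Share of relevant records that were retrieved."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(set(retrieved_ids) & relevant) / len(relevant)


# Hypothetical numbers: 30 relevant postcards in a repository of 1000 records.
relevant = set(range(30))

# A narrow search (exact county name) returns 4 records, 3 of them relevant.
narrow_search = [0, 1, 2, 999]

# A broad search (state name) returns every record in the repository.
broad_search = list(range(1000))

if __name__ == "__main__":
    for label, hits in (("narrow", narrow_search), ("broad", broad_search)):
        print(label,
              "precision:", round(precision(hits, relevant), 2),
              "recall:", round(recall(hits, relevant), 2))
```

With these invented numbers the narrow search is precise but finds only a few of the relevant postcards, while the broad search finds all of them buried among many irrelevant records.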
Postcard Locations (Diagram D3)
How location is defined is important to the retrieval of records for a search (Diagram D3). If the location is defined too broadly, as in the University of Washington's collection (for example, United State--Washington-Seattle; Appendix 4. University of Washington, Image 2), a search retrieves results from beyond Seattle. However, when the location is more specific, as in the University of South Carolina's collection, which is region and county specific (Appendix 5. University of South Carolina, Image 1), a user who does not know the exact names of these counties would find it difficult to retrieve any records from the collection by searching with locations. In the University of Washington's collection (Appendix 4. University of Washington, Image 1, 2 and 3), it was difficult even to find a postcard in the sample group by searching with location. As noted in Diagram D3, there is no one particular field used to describe "location" in the libraries studied. Though the scope of this study was small, it is possible to infer that digital collections in general have a variety of ways to display "location", and this disparity of location description does not aid interoperability.
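One way an aggregator might reduce this disparity is to map each collection's location fields onto a single list of place names before indexing. The Python sketch below assumes a hypothetical field map and invented record values. The field names loosely follow the patterns described above, but they are not taken from the actual CONTENTdm configurations, and the approach is not one the libraries studied are known to use.

```python
# Hypothetical mapping from collection name to the fields holding location data.
FIELD_MAP = {
    "south_carolina": ["region", "county"],
    "miami": ["state", "county", "city"],
    "washington": ["location"],
}


def normalized_locations(collection, record):
    """Collect place names from a record's location fields into one ordered list.

    Values written as a hierarchy with "--" separators are split into parts.
    """
    places = []
    for field in FIELD_MAP.get(collection, []):
        for value in record.get(field, []):
            for part in value.split("--"):
                part = part.strip()
                if part and part not in places:
                    places.append(part)
    return places


if __name__ == "__main__":
    # Invented sample records.
    miami_record = {"state": ["Florida"], "county": ["Example County"],
                    "city": ["Example City"]}
    washington_record = {"location": ["United States--Washington--Seattle"]}
    print(normalized_locations("miami", miami_record))
    print(normalized_locations("washington", washington_record))
```

After a step like this, a search on one common location field could match a state, county or city name regardless of which field the originating collection used.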
Postcard Fields (Diagram D4)
In four of the five collections examined, the creator of the metadata chose to define "postcard" in the subject field. No creator chose the format field to define postcard, most likely because the metadata describes the representation of the object in digital form, not the object itself. Only one library in the study chose to define postcard in the "type" field (Diagram D4). It is notable that there is a consensus in describing the work as a representation of the work. No matter how "postcard" is defined (by type, in subject, etc.), records are retrieved when doing a general search with "postcards". How the creator of the metadata chooses to define "postcard" does not seem to affect retrieval of postcard records.
Refinable After Initial Search (Diagram D5)
Whether or not a set of retrieved records is refinable matters to a user. Among the collections examined, only one library, the University of Miami, allowed for refinement after a search was completed, leaving 80% of the libraries in this study with no way to refine a search (Diagram D5). This is frustrating, since if a search does not retrieve desirable results, the user must go back to the search field and try again. When an option to refine a search is presented, the user is given ideas for how to refine the search further based on fields that the user might not have considered. This is a useful feature of a database, especially if users are a consideration in how the metadata is created and searched. One need only look to many library OPACs and the search interfaces of online retail sites like Amazon.com and Barnes and Noble to see this as an effective feature of a database interface.
While postcard is not listed in the "format" field in any of the records examined (Diagram D4), format is one of the ways to refine a search in OAIster. However, since "format" of the object is not defined as "postcard" for any of the postcards looked at for this study, this option to refine a search in OAIster is irrelevant here. How postcards are defined in a metadata record most likely does not inhibit searching on OAIster, although it is hard to definitively conclude this (Diagram D2).
Controlled Vocabularies Used to Describe Subject Headings (Diagram D6)
As expected, a variety of vocabularies are used. Most notably missing is LCSH. It is difficult to tell how these vocabularies affect interoperability, as localization of the subject headings makes it difficult to draw like-for-like comparisons. If users are searching with keywords for subjects, however, and not controlled vocabulary, it is unclear whether the controlled vocabulary in the subject headings makes a difference. It seems that while controlled vocabulary is important for consistency of recall, the information that the local subject headings contain is important for discovery of information about the work being described. One thing that is clear from the different vocabularies used for each collection is that digital collections tend to rely on local subject headings to describe the collection. How subjects are defined in OAIster is where interoperability becomes an issue. It is difficult to define a measurement for this. More research would need to be done to draw definitive conclusions about how choice of field for the term "postcard" affects search retrieval. How the different controlled vocabularies affect interoperability is not clear from the research performed for this paper.
The variety of subject headings and of fields in which location was defined made it challenging to analyze how these fields aided or hindered access and interoperability. Without consistent fields used for location and the same vocabulary used for subject headings, comparisons were difficult. Because of the flexibility of metadata, the challenge lay in finding common ground to analyze. The original idea of this project was to compare the benefits of using one metadata scheme over another in describing postcards in digital collections; however, it was difficult to find actual metadata code to look at. Many digital libraries use CONTENTdm to create, manage and store their records, so there was no way to view the source code. Also, because of the flexible nature of metadata, it is difficult to compare the creation of one collection with another.
It was also difficult to determine the effectiveness of subject headings without having an idea of how a user might conduct a search and what knowledge a user possesses at the onset of a search.
While there is a consistent use of the subject fields in the postcard metadata records studied, the vocabulary within these fields is not consistent. Also, there is an inconsistent use of fields to describe the locations of the postcards within each collection. Finally, in the collections studied, there is a tendency to describe the postcards as digital object representations rather than as actual postcards; however, the term "postcard" is listed in a variety of fields. This is in line with Cole and Shreeves's findings that after creators of metadata determine whether they will describe the object in the collection as a representation of the work or as the work itself, the term used to describe the work is not held to any standard (Cole and Shreeves, 2004, 175).
Perhaps the key is not to focus on whether local subject headings are effective for record retrieval, but rather to determine whether other fields, such as format, type and location, should have more standardization in order to help in retrieval. Focusing on how particular formats are defined and in what fields location information is recorded can be useful in helping users find relevant records. The information in the local subject headings is helpful for the user once a record is found and can help in locating more information and learning more about an item.
Location fields are also important for interoperability, and making location information more consistent and not too narrow in definition can help the end user. While the flexibility of metadata creation allows for a rich content of local information to be available to the user, it must be findable in order to be useful. Digital objects that are customized locally are not necessarily optimized for retrieval.
This project initially set out to find whether it is more useful for subject headings and location information to have general or specific descriptions. There is a thin line between a topic so broad that users retrieve too many records and one so narrow that a user may never find the record at all. If a subject heading is deconstructed to the point of becoming generic, users may not be able to find records limited to a place (Qiang, 2008, 108). If the subject heading is too specific, however, users may never find information unless they know exactly the local term to search. Although this paper does not set out to analyze user interfaces, refinable searches can be useful in helping users better define their topic. Users are accustomed to interfaces from library catalogs and retail databases that suggest further searches based on the search terms they have used. Finally, the concept of creating controlled vocabularies based on already existing vocabularies (like FAST) is also worth exploring further.
As more and more digital collections become available to the general public, the impact of subject heading vocabulary design, of consistent fields for location information, and of the description of format for the work being represented will continue to be explored. This will happen as more people see the potential of accessing the digital collections that exist and as metadata creators find more ways to achieve interoperability.
Chan, L., & Hodges, T. (2000). Entering the millennium: a new century for LCSH. Cataloging & Classification Quarterly, 29 (1/2), 225-34. Retrieved November 25, 2009, from Library Lit & Inf Full Text database.
Chan, L., & Zeng, M. (2006). Metadata Interoperability and Standardization - A Study of Methodology, Part I: Achieving Interoperability at the Schema Level. D-Lib Magazine, 12 (6), p. 1. Retrieved November 16, 2009, from Library Lit & Inf Full Text database.
Cole, T.W., & Shreeves, S. L. (2004). Lessons learned from the Illinois OAI Metadata Harvesting Project. Retrieved Wednesday, November 25, 2009 from http://books.google.com/books?id=2nzLQHn1WAUC&printsec=frontcover&source=gbs_v2_summary_r&cad=0#v=onepage&q=&f=false
Dean, R. (2004). FAST: Development of Simplified Headings for Metadata. Cataloging & Classification Quarterly, 39 (1/2), 331-52. Retrieved November 29, 2009, from Library Lit & Inf Full Text database.
Dragon, P. (2009). Name Authority Control in Local Digitization Projects and the Eastern North Carolina Postcard Collection. Library Resources & Technical Services, 53 (3), 185-96. Retrieved November 15, 2009, from Library Lit & Inf Full Text database.
Hillmann, D. (2008). Present at the Creation. Technicalities, 28 (3), 7-9. Retrieved November 15, 2009, from Library Lit & Inf Full Text database.
Library of Congress. Thesaurus for Graphic Materials II. Website. Retrieved Sunday, November 22, 2009 from http://www.loc.gov/rr/print/tgm2/
Mitchell, N. (2006). Metadata basics: A literature survey and subject analysis. The Southeastern Librarian, 54 (3), 18-24. Retrieved November 24, 2009, from Library Lit & Inf Full Text database.
Nicholson, D., & Shiri, A. (2003). Interoperability in subject searching and browsing. OCLC Systems & Services, 19 (2), 58-61. Retrieved November 24, 2009, from Library Lit & Inf Full Text database.
NISO (2004). Understanding metadata. National Information Standards Organization. NISO Press. Bethesda, MD. Retrieved November 16, 2009, from www.niso.org/standards/resources/UnderstandingMetadata.pdf
NISO Metadata Principle 1. Webpage. Retrieved Thursday November 19, 2009 from http://framework.niso.org/node/38
OCLC. CONTENTdm Overview. Website. Retrieved Sunday, November 21, 2009 from http://www.oclc.org/contentdm/overview/default.htm
OCLC. FAST. Website. Retrieved Sunday, November 21, 2009 from http://www.oclc.org/research/activities/fast/default.htm
OCLC. The OAIster Database. Website. Retrieved Sunday November 22, 2009 from http://www.oclc.org/oaister/
Open Archives Initiative. Website. Retrieved Sunday November 20, 2009 from http://www.openarchives.org/documents/FAQ.html#What%20is%20the%20mission%20of%20the%20Open%20Archives%20Initiative
Open Archives Initiative Metadata Harvesting Project. Website. Retrieved Monday November 16, 2009 from http://uilib-oai.sourceforge.net/
Qiang, J. (2008). Is FAST the Right Direction for a New System of Subject Cataloging and Metadata?. Cataloging & Classification Quarterly, 45 (3), 91-110. Retrieved November 27, 2009, from Library Lit & Inf Full Text database.
Swain, S. (2006). University of South Carolina, CONTENTdm, and the Ege Leaves. The Southeastern Librarian, 54 (1), 58-9. Retrieved November 22, 2009, from Library Lit & Inf Full Text database.
Library Postcard Collections Used for Project
Santa Clara University Digital Collections, Projects and Initiatives
University of Miami Libraries Digital Initiatives
University of Louisville Libraries Digital Collections
University Libraries University of Washington Digital Collections
University of South Carolina Digital Collections