PNLA Quarterly
Folksonomies for Digital Resources
Jill Bauhs Jensen
Jill Bauhs Jensen works with newspapers across North America at Shoom Inc, providing electronic tearsheets and invoicing. She has been integral in developing B2B information services throughout her career, and has a passion for well-designed user interfaces. Jill is an MLIS student at San Jose State, and can be reached at: email@example.com
With the advent of the second wave of Internet functionality (Web 2.0) came the ability for users to apply "tags" or descriptive words to digital resources, categorizing them and allowing for easier retrieval by both the tagger and other users. Considering the vastness of the Internet, does user tagging hold the key to cataloging the Web? "In order to manage the Web's massive amount of data, the process of social networking, through tagging, becomes an appealing option, especially since this tagged information is practically labeled by the user and can be shared" (Snipes, 2007).
"The collection of user-assigned tags is referred to commonly as a folksonomy" (Spiteri, 2007). The word "folksonomy" is rooted in the word taxonomy, but rather than describing a hierarchical organizational system, it denotes a flat system "of the people", where all terms are equal. Creator Thomas Vander Wal elaborates:
Folksonomy is the result of personal free tagging of information and objects (anything with a URL) for one's own retrieval. The tagging is done in a social environment (shared and open to others). The act of tagging is done by the person consuming the information.
The value in this external tagging is derived from people using their own vocabulary and adding explicit meaning, which may come from inferred understanding of the information/object. The people are not so much categorizing as providing a means to connect items and to provide their meaning in their own understanding. (Vander Wal, 2005)
Tags are often aggregated to create searchable metadata, creating broad, general categories. Suster (2006) defines tagging as "a democratic and distributed classification method." User-generated metadata stands in contrast to traditional cataloging, which places content into pre-defined categories and sub-categories (Shirky).
Occasionally, sites that permit social tagging include tag clouds, or visual representations of the most used tags. The larger the font in the tag cloud, the more frequently that tag appears within the metadata (Snuderl, 2008; Steele, 2009).
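The relationship between tag frequency and font size can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `tag_cloud_sizes`, the point sizes, and the linear scaling are hypothetical choices, not the method of any particular site, and the sample tags are borrowed from the del.icio.us examples quoted later in this article.

```python
from collections import Counter

def tag_cloud_sizes(tags, min_pt=10, max_pt=32):
    """Map each distinct tag to a font size in points, scaled linearly by its count."""
    counts = Counter(tag.lower() for tag in tags)
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = high - low
    sizes = {}
    for tag, n in counts.items():
        # If every tag is equally frequent, all tags get the smallest size.
        scale = (n - low) / span if span else 0.0
        sizes[tag] = round(min_pt + scale * (max_pt - min_pt))
    return sizes

# Hypothetical tags gathered from several users' bookmarks.
sample = ["tools", "todo", "tools", "coupons", "tools", "todo"]
print(tag_cloud_sizes(sample))
```

Real tag clouds often use logarithmic rather than linear scaling so that one very popular tag does not dwarf the rest; the linear version is shown here only because it is the simplest to read.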
Tags generally fall into one of seven types: descriptive (subject of the resource), type ("video", "blog", "image"), ownership, opinion of the resource, self-reference ("myarticle"), organizational (e.g., course number) and, lastly, playful ("squaredcircle") (Smith, 2008).
Cataloging Digital Assets
Today, our world abounds with vast digital assets in the form of written words, images, sound and video, and much of this information remains uncatalogued by information professionals. In addition, conventional library organizational structures are stilted, and not conducive to the natural language searches users are accustomed to, thanks to search engines like Yahoo and Google (Buckland). Can user-generated metadata bridge the divide between traditional classification and user expectations that search language be both flexible and current? If so, can the quality of tagging be enhanced to aid in both precision and recall when searching?
Undoubtedly because the field is new, the literature on folksonomies focuses primarily on the theoretical advantages and disadvantages of user-generated metadata. In addition, several groups have undertaken examinations of tagging on commercial sites, and integration of folksonomies with traditional cataloging in library settings. Finally, several writers debate whether user tags are worth investing in at all.
Advantages and Disadvantages of User-Generated Metadata
Much of the literature on user-generated metadata to date focuses on the benefits and pitfalls of folksonomies. Numerous advantages to using folksonomies are detailed. High on the list is inclusiveness:
"Social tagging's great strength lies in its openness; the terms are devised and implemented by actual users, so by definition, they reflect the language of those users" (McElfresh, 2008).
Spiteri (2007) agrees. "They reflect the vocabulary of the users, regardless of viewpoint, background, bias, and so forth". Because they are in the vernacular, and from the point of view of the lay person, user tags can be more comprehensive than cataloging created by subject experts:
Kroski (2007) adds that "because folksonomies include alternative views together with popular ones, they present a unique opportunity to discover ... interests of the minority that lie at the 'tail' end of a power law, or statistical distribution." West (2007) believes that this leads to "more access points and richer metadata, which results in the materials being more findable."
Hand-in-hand with inclusiveness is the ability of folksonomies to be updated quickly. McElfresh (2008) deems them "nimbler and more flexible than controlled vocabularies" and later adds:
Kroski sees similar benefits. Folksonomies are current and flexible, and there is no need to predict categories in advance. They offer the potential to discover "unknown and unexpected resources," leading to exploration. Objects can be tagged in multiple categories, offering multifaceted richness. Tags follow "desire lines." "It's not about the right or the wrong way to categorize something and it's not about accuracy or authority, it's about remembering."
In addition, folksonomies are simple (McElfresh), low-cost (Kroski, Furner), easily used (Kroski), empowering (Furner), share the workload (McElfresh) and create a spirit of sharing within the community (Kroski, West).
One final benefit is overlooked by most of the literature: tags also link directly to the tagged resource. "Tagging is an online, hyperlinked activity. When an item such as a bookmark, a picture, or even a person's profile has a tag added to it, the tag becomes a clickable link to more items associated with that tag at either a personal or a system-wide level" (West). While this may seem simple, even obvious, it is equivalent to the card catalog automatically retrieving the book from the shelf for the user, a huge benefit.
Folksonomies are not without their disadvantages. One of the most interesting criticisms is that tags, by their very democratic nature, can be contradictory.
If I tag an article with the subject "white horse" and you tag it "black horse", that is all right since both can coexist in a folksonomy classification scheme. The problem with relativism is the question: "relative to what?" Each Internet user is bringing to bear on the item a different linguistic and cultural background. Although this is an inherent strength of folksonomies (since it recognizes many valuable individual perspectives), it can also lead to the existence of contraries. (Peterson, 2006)
Spiteri (2007), meanwhile, sees problems with polysemy. A word like "port" has multiple meanings, ranging from a type of wine to definitions relating to ships. As Steele (2009) puts it, "polysemy is another problem with tagging related to the tagger's vocabulary selection. In this case, however, the user may select a word that has more than one similar meaning."
Synonymy is the converse problem. The same concept may have different spellings (West) or different names: Spiteri uses the example of Apple Macintosh computers, where users may select tags like "mac," "Macintosh" and "apple" to describe the same device.
Both Spiteri and Steele point out that plurality is another potential pitfall. "Tagging would rely on the user to search both the singular and plural forms, since the original tagger would be likely to enter the tag in only one of the forms" (Steele, 2009).
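One way a tagging system could take the singular/plural burden off the searcher is to fold both forms to a single key when indexing and when searching. The sketch below is a deliberately naive illustration: `fold_plural`, `build_index`, and `search` are hypothetical names, the suffix rules cover only a few common English patterns (and will mishandle irregular words), and the sample photos are invented.

```python
from collections import defaultdict

def fold_plural(tag):
    """Return a crude singular form so that 'dogs' and 'dog' share one index key."""
    t = tag.strip().lower()
    if len(t) > 4 and t.endswith("ies"):
        return t[:-3] + "y"           # taxonomies -> taxonomy
    if len(t) > 4 and t.endswith(("xes", "ches", "shes")):
        return t[:-2]                 # boxes -> box
    if len(t) > 3 and t.endswith("s") and not t.endswith("ss"):
        return t[:-1]                 # dogs -> dog
    return t

def build_index(tagged_items):
    """Build a mapping from folded tag to the set of items carrying it."""
    index = defaultdict(set)
    for item, tags in tagged_items.items():
        for tag in tags:
            index[fold_plural(tag)].add(item)
    return index

def search(index, query):
    """Look up a query after folding it the same way the tags were folded."""
    return sorted(index.get(fold_plural(query), set()))

# Hypothetical tagged photos.
photos = {"photo1": ["dog", "beach"], "photo2": ["dogs"], "photo3": ["Taxonomies"]}
index = build_index(photos)
print(search(index, "dogs"))      # finds both the singular and plural tag
print(search(index, "taxonomy"))
```

A production system would more likely rely on an established stemmer or lemmatizer than on hand-written suffix rules.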
Kroski (2007) adds that folksonomies lack hierarchy, and taxonomies "provide a deeper, more robust classification of entities. Such systems allow users a finer granularity in searching for resources". Complete recall is also affected. "Because of the lack of synonym control, a folksonomy search will not effect a complete results list because of the use of similar tags" (Kroski). "With tagging, it is up to the user to tag for both the broader and narrower terms if the resources will be retrieved" (Steele, 2009).
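The recall problem caused by missing synonym control can be illustrated with a small sketch that expands a query to every tag in the same "synonym ring" before searching. Everything here is an assumption made for illustration: the names `SYNONYM_RINGS`, `expand_query`, and `search` are hypothetical, the ring reuses Spiteri's Macintosh example, and the tagged items are invented.

```python
# One hand-built synonym ring, taken from the Macintosh example in the text.
SYNONYM_RINGS = [{"mac", "macintosh", "apple"}]

def expand_query(tag, rings=SYNONYM_RINGS):
    """Return the query tag plus every tag that shares a synonym ring with it."""
    tag = tag.lower()
    expanded = {tag}
    for ring in rings:
        if tag in ring:
            expanded |= ring
    return expanded

def search(tag_index, query):
    """Collect the items filed under the query tag or any of its ring members."""
    hits = set()
    for tag in expand_query(query):
        hits |= tag_index.get(tag, set())
    return sorted(hits)

# Hypothetical index: tag -> items that carry it.
tag_index = {
    "mac": {"review-a"},
    "macintosh": {"review-b"},
    "apple": {"review-c", "pie-recipe"},
}
print(search(tag_index, "mac"))
```

Without expansion, a search for "mac" would return only one of the three reviews. With expansion, all three are recalled, but the pie recipe tagged "apple" comes along too: the gain in recall costs some precision, because "apple" is itself a polysemous tag.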
Malicious activity is another concern (Snipes, 2007). "A user can cause harm by tagging resources with inappropriate terms" (Steele, 2009). "With social tagging, the very openness that empowers users (or at the very least, draws people in) also leaves the tagging system - and any classification systems built upon it - exposed to vandalism or other abuse" (McElfresh, 2008).
Finally, Snipes (2007) is concerned that the tags be credible (in other words, accurate) and consistent, and respect the privacy of the user.
Despite the limitations of folksonomies, the potential exists for far-reaching effects of user tagging of electronic resources. As Peterson states, "Applying folksonomy tags has the potential to be very popular" (2008, p.4).
Commercial Sites Using Folksonomies
Several commercial sites that have implemented user tagging are frequently mentioned in the literature. They include del.icio.us, flickr.com, and librarything.com. West describes them plainly:
del.icio.us - social bookmarking where users of the site globally share bookmarks tagged with words like "todo" or "tools" or "coupons" to help them find and remember their bookmarks.
flickr.com - photo sharing where people add tags to their images as a form of textual metadata to both help them find their own photographs as well as locate similarly tagged photographs by others.
librarything.com - online personal library where people can add tags like "toread" or "thriller" or "topten" to the books they enter into their online libraries and see who else has books with the same or similar tags.
According to Macgregor (2006), "del.icio.us is arguably the most developed and possibly the most collaborative." When the user is tagging a new URL, the site provides a list of "popular" tags that were previously used to describe that resource. "These common tags can then be used in a subsequent user search strategy" (Macgregor). Macgregor goes on to detail Golder and Huberman's study of del.icio.us tagging:
They found that the users of collaborative tagging systems exhibited much variety in the sets of tags they employ. The frequency of tag use and what the tags themselves described was also found to vary greatly between users. However, the data also suggested that there existed some measure of regularity in the tags being assigned by users.
Spiteri (2007) used the daily tag logs from del.icio.us, among other sites, comparing the tags with "the National Information Standards Organization (NISO) guidelines for the construction of controlled vocabularies." She found that the tags and guidelines had a close correspondence in some areas (concepts, single terms, spelling) but that count nouns and ambiguous tags still presented problems. Her conclusion was that providing del.icio.us users with additional guidelines and links to reference sources could mitigate some problems. "Folksonomies could serve as a powerful, flexible tool for increasing the user-friendliness and interactivity of public library catalogs."
Flickr.com allows users to form collections of images, and then tag those images for either public or private viewing. According to Dye (2006):
On Flickr, the focus is less on how to promote something within a community and more on how to increase the findability of personal content. ... Although the burden of creating metadata rests mostly with the person who posts the content, social groups can use Flickr to create group tags to collect all members' photos tagged with a particular keyword together - something that Flickr calls tagography.
The LibraryThing website describes itself as "an online service to help people catalog their books easily" (LibraryThing, 2009). It incorporates data from Amazon, the Library of Congress and other world libraries in assisting users in the cataloging process. Rolla (2009) conducted a study comparing "LibraryThing tags for a group of books and the library-supplied subject headings for the same books." Not surprisingly, he found that professional catalogers and LibraryThing users described resources very differently. Rather than viewing this as a problem, he believes that "because of these differences, user tags can enhance subject access to library materials, but they cannot entirely replace controlled vocabularies such as the Library of Congress subject headings." He concludes that "adding user tags to library catalogs could help improve subject access to collections."
Incorporating Folksonomies in Library Settings
Of course, library professionals are most interested in the possible inclusion of user-generated metadata within more traditional library catalogs. "We could harness folksonomies to build 'hybrid catalogs,' strengthening the catalog services we provide to our patrons" (McElfresh, 2008).
Spiteri (2006) outlines three ways that libraries can implement user tagging. First, "allow users of public library catalogues to create and organize their own personal information space in the catalogue" (p. 76). Adding their own notes, and references to outside links, enriches the user's personal catalog.
Second, "Allow users to supplement the existing controlled vocabulary in the catalogue ... with their own metatags" (p. 76). Doing so allows for more natural (and current) language. And, third, "Folksonomies could be used to foster online communities of interest ... public tags can be viewed by other catalogue users with similar interests; this use of folksonomies could facilitate the sharing and exchange of information" (pp. 76-77).
Kroski (2008) contributes additional means of using folksonomies. "Libraries are making use of social cataloging applications as tools to cataloging new titles in a browsable, interactive, user-focused community" (p. 81). The Danbury Library in Connecticut has integrated LibraryThing into its OPAC (Online Public Access Catalog), allowing users to tag and recommend books.
By implementing this service, the Danbury Library not only had channeled additional functionality for its library patrons, but incorporated a massive resource of focused user-generated content. This customizable service enabled them to enhance their user experience while maintaining the previous functionality of their OPAC.
Mendes, Quinonez-Skinner and Skaggs (2009) studied LibraryThing for Libraries at California State University, Northridge (CSUN) and found:
The University of Pennsylvania has developed a closed system called PennTags that allows users within the Penn community to tag "web sites, articles in the library's database, and records in both the video catalog and Franklin, the library's OPAC" (Steele, 2009). "Folksonomy terms exist side by side with the LCSH headings ... although they are not yet prevalent in the catalog" (Peterson, 2008). Unfortunately, users are unable to search the OPAC by these tags (Steele).
Steele documents several other libraries that are using folksonomies within their collections. Ohio State University Libraries are using one tag, leisurereading, "to create a list of books the library owns that patrons can read for leisure. ... Once the patron clicks on the individual book, they can see several other tags created by other LibraryThing members." He also mentions that Ann Arbor District Library (AADL) has developed SOPAC, or "social online public access catalog."
Finally, Montana State University is using folksonomies to tag Electronic Theses and Dissertations (ETDs). Peterson (2009), an Associate Professor and Information Resources Specialist at the university, writes that "patrons are using folksonomy tags, and the usage of the tags is increasing." But, "it appears that the uses of LCSH and folksonomy are quite different, and that these parallel modes of access should continue to maximize usability and ease of access to the database."
One of the more subtle hurdles user-generated metadata may need to overcome is the skepticism of some information professionals. Snipes (2007) enumerates some expert concerns. Are the tags credible? Can the information be trusted? What about malicious postings? How do users safeguard their privacy? "At this time, our students still need the security and dependability of the traditional, controlled information seeking methods." Yet, at the same time she concedes, "Folksonomy is here to stay, and it shows promise of evolving into a vital structure of information retrieval for our students in the future."
Other professionals have gradually warmed to the benefits of user-generated metadata. In 2006, Peterson expressed her misgivings:
In 2008, her position softened slightly. "... Subject cataloging and user-generated tags will probably coexist." Yet, within a year she concluded, after studying their usage at the Montana State University libraries, that "Patrons are using folksonomy tags, and the usage of the tags is increasing. ... Usage of the tags is evidence that permitting folksonomy tags in the ETD database has met patrons' needs."
While Peterson was concerned about philosophical relativism, Shirky's point of view is the opposite. "It comes down ultimately to a question of philosophy. ... [If] you believe that we make sense of the world, if we are, from a bunch of different points of view, applying some kind of sense to the world, then you don't privilege one top level of sense-making over the other." The collective wisdom of the users of the system has inherent benefit.
Macgregor and McCulloch (2006) similarly see benefit in incorporating user tags into existing cataloging:
Or, as West (2007) concludes, "Anything that can help our users find information should be a net gain to librarians."
Improving Tagging Quality
This project addresses the question of whether subject matter experts who are not information professionals will improve the quality of their tagging when provided with simple instructions. In other words, does guidance increase the value of tags?
The commercial photo-sharing site Flickr.com was chosen as the platform for this study because of its ease of use and simple tagging feature. Two private Flickr accounts were created to allow users to upload photos with the understanding that they would not be repurposed for commercial or other uses. (Image theft has recently been an issue for some participants.)
Several user communities of Portuguese Water Dog enthusiasts were asked to participate. This group was chosen for several reasons, including their subject matter expertise and their access to digital photographs taken in a variety of settings, from formal dog shows and trials, to casual at-home photography.
Participants' demographics are similar to those of purebred dog enthusiasts in general: a typical contributor is female, middle-aged, and unmarried, with no children living at home.
Participants were told that their photos would remain private, but that they could tag other photos within the Flickr account as well as their own. A sample photo with tags was uploaded to the account for those given tagging instructions, but it was not identified as such to the participants. However, each participant could view all photos that had been uploaded to the account and model their tagging on other photos if they liked.
Contributors were divided into two groups and asked to upload one or more photos and tag them. The first group, as a control, was given no instructions beyond the web address and login information for their Flickr account. The second group was also given login information, and provided with the following instructions:
(Photos may be viewed using the login information found in Appendix A.)
Twelve photos were uploaded to the control group account in addition to the sample photo, and each photo was tagged an average of three times. In addition, six of the twelve were given a descriptive title, either on the participant's local computer or when the photo was uploaded; five pictures were given descriptions.
By contrast, twenty-nine photos were uploaded to the account used by participants given instructions. Again, each photo averaged three tags, although one image has ten tags and seven have none. The majority of photos (25) have a descriptive title, and fourteen have descriptions.
The following table details tags and descriptions that were suggested in the instructions:
(A complete list of tags for each group, and corresponding tag clouds, can be found in Appendix B.)
At first blush, it appears that the quantity of tags was virtually the same for both groups. However, when photos without tags are eliminated, the instruction group averages four tags per photo, compared with the control group's three per image. (Images without tags were primarily those uploaded on behalf of a contributor.) In addition, the photo descriptions in the instructions group were often highly detailed, containing data that would be included in tags by information professionals. For example, photo IMG_4459 does not have a descriptive title, but the description contains detailed information, including call names, breeds and MPH of the dogs:
Kibble Power - 265 DP with 20 PAW DRIVE !!! (recently clocked at 15+ MPH!) The dog driving Porties (Annie, Gladys, Chuckie, Jibby) and Qika the Bouvier de Flanders.
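The instruction group's averages reported above can be checked with a little arithmetic. The sketch below uses only figures stated in the text (29 photos, 7 untagged, an average of three tags per photo); because the reported average is itself rounded, the total tag count derived from it is an approximation, not a figure from the study data.

```python
photos_uploaded = 29   # instruction group, from the text
photos_untagged = 7    # photos with no tags, from the text
avg_tags_all = 3       # rounded average over all photos, from the text

# Approximate total, reconstructed from the rounded average.
approx_total_tags = photos_uploaded * avg_tags_all
tagged_photos = photos_uploaded - photos_untagged

# Average over only the photos that actually received tags.
avg_tags_tagged = approx_total_tags / tagged_photos
print(tagged_photos, round(avg_tags_tagged, 2))
```

Dropping the seven untagged photos leaves 22, and roughly 87 tags spread over 22 photos comes to just under four per photo, consistent with the figure given for the instruction group.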
Overall, there was a higher level of participation by those in the instructions group. This may be attributed to the inclusion of a definition of tagging in the instructions. (Participants' computer skills varied from extremely familiar with uploading digital photos to a complete lack of comfort in this area. However, when approached about participating most contributors indicated that they did not have a clear understanding of tagging, so a description of tagging was added to the instructions.)
Whether the instructions led to a higher quality of tags, contributing to better precision and recall, is hard to determine. It is apparent that a much larger group of participants would be required to make this determination. But, the inclusion of instructions did lead to much more creative and varied tags. Rather than restricting the efforts of the contributors, including instructions appears to have allowed them more flexibility in their tagging. Delightful tags like "expression", "face", "kibblepower" and "watchyourdrinkaroundhim" add richness to the image descriptions.
One of the surprises of this study was how few of the participants included basic information about the dogs, including breed, kennel and call name (e.g. "Trixie"). In addition, the registered name of the dog, the definitive description of a purebred dog, was not used as a single tag. This was highly unexpected from a group of experts where this information begins even the most casual of conversations.
While this study of improving the quality of user tagging is not definitive, a few conclusions can be drawn. Users may respond more creatively in their tagging when they are given information explaining what tagging is and offering examples. Rather than feeling hemmed in by instructions, they appear to be more willing to expand their tagging horizons.
For many users, describing a resource is less compartmentalized than it might be for an information professional. Lay users treat title, description and tags as interchangeable, and search software (and perhaps the tagging mechanism itself) should accommodate this fluidity, even exploit it.
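A search that accommodates this fluidity might simply treat title, description, and tags as one searchable text. The following is a minimal sketch under stated assumptions: the `Photo` class and the `matches` and `search` functions are hypothetical and do not reflect Flickr's actual search; the first record reuses the IMG_4459 description quoted earlier, and the second record is invented.

```python
from dataclasses import dataclass, field

@dataclass
class Photo:
    photo_id: str
    title: str = ""
    description: str = ""
    tags: list = field(default_factory=list)

def matches(photo, term):
    """True if the term appears in the title, description, or any tag (case-insensitive)."""
    haystack = " ".join([photo.title, photo.description, *photo.tags]).lower()
    return term.lower() in haystack

def search(photos, term):
    """Return the ids of all photos that match the term in any field."""
    return [p.photo_id for p in photos if matches(p, term)]

photos = [
    # Description taken from the study; this photo has no title or tags here.
    Photo("IMG_4459", description="Kibble Power - 265 DP with 20 PAW DRIVE !!!"),
    # Invented record whose only mention of the phrase is in a tag.
    Photo("IMG_0001", title="Water trial", tags=["retrieve", "kibblepower"]),
]
print(search(photos, "kibble"))
print(search(photos, "trial"))
```

Plain substring matching is used here for brevity; it finds "kibble" inside both the description "Kibble Power" and the tag "kibblepower", but it would also produce false matches on short terms, so a real system would want tokenization and ranking.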
Finally, participants are more enthusiastic when offered some instruction, which in turn leads to higher levels of participation and creativity. Providing direction may be the first step in harnessing user-generated tagging for supplemental cataloging of digital resources.
The frontier of user-generated metadata is only now being explored, but the possibilities are intriguing. "The overall usefulness of folksonomies is not called into question; just how they can be refined without losing the openness that makes them so popular" (Peterson, 2006).
Photos may be viewed at www.flickr.com using the following login information.
Tag Clouds and Tag Lists
References
LibraryThing. (2009). About LibraryThing. Retrieved October 18, 2009, from www.librarything.com/about
Buckland, M. (1999). Vocabulary as a central concept in library and information science. In T. Aparac et al. (Eds.), Digital libraries: Interdisciplinary concepts, challenges, and opportunities: Proceedings of the Third International Conference on Conceptions of Library and Information Science, Dubrovnik, Croatia, 23-26 May 1999 (pp. 3-12). Zagreb: Lokve. Retrieved November 6, 2009, from http://www.sims.berkeley.edu/~buckland/colisvoc.htm
Cattuto, C. (2008). Emergent community structure in social tagging systems. Advances in Complex Systems, 11 (4), 597-608.
Dye, J. (2006). Folksonomy. EContent, 29 (3), 38-43.
Fry, E. (2007). Of torquetums, flute cases, and puff sleeves. Art Documentation, 26 (1), 21-27.
Furner, J. (2008). User tagging of library resources. International Cataloguing and Bibliographic Control, 37 (3), 47-51.
Golder, S. A., & Huberman, B. A. (2005). The structure of collaborative tagging systems. Retrieved October 17, 2009, from http://www.hpl.hp.com/research/idl/papers/tags/tags.pdf
Kroski, E. (2007). Folksonomies and user-based tagging. In N. Courtney (Ed.), Library 2.0 and beyond (pp. 91-103). Westport, CT: Libraries Unlimited.
Macgregor, G. (2006). Collaborative tagging as a knowledge organisation and resource discovery tool. Library Review, 55 (5).
McElfresh, L. K. (2008). Folksonomies and the future of subject cataloging. Technicalities, 28 (2), 3-6.
Mendes, L. H. (2009). Subjecting the catalog to tagging. Library Hi Tech, 27 (1).
Pera, M. S. (2009). Sophisticated library search strategy using folksonomies and similarity matching. Journal of the American Society for Information Science and Technology, 60 (7), 1392-1406.
Peterson, E. (2006). Beneath the metadata. D-Lib Magazine, 12 (11), 1.
Peterson, E. (2008). Parallel systems. Library Philosophy and Practice, 2008, 1-5.
Peterson, E. (2009). Patron preferences for folksonomy tags. Evidence Based Library and Information Practice, 4 (1), 53-56.
Rolla, P. J. (2009). User tags versus subject headings. Library Resources & Technical Services, 53 (3), 174-184.
Sanders, D. (2008). Tag--you're it! American Libraries, 39 (11), 52-54.
Shirky, C. Ontology is overrated: Categories, links, and tags. Retrieved October 17, 2009, from http://shirky.com/writings/ontology_overrated.html
Smith, G. (2008). Tagging: People-powered metadata for the social web. Berkeley, CA: New Riders. Retrieved from http://proquest.safaribooksonline.com.libaccess.sjlibrary.org/?uiCode=csusanjose&xmlId=9780321550149
Snipes, P. R. (2007). Folksonomy vs. Minnie Earl and Melville. Library Media Connection, 25 (7), 54-56.
Snuderl, K. (2008). Tagging. Statistical Journal of the IAOS, 25 (3), 125-132.
Spiteri, L. (2006). The use of folksonomies in public library catalogues. Serials Librarian, 51 (2), 75-89.
Spiteri, L. F. (2007). Structure and form of folksonomy tags. Information Technology and Libraries, 26 (3), 13-25.
Steele, T. (2009). New cooperative cataloging. Library Hi Tech, 27 (1), 68-77.
Suster, M. (2006). Folksonomy. AIIM E-Doc Magazine, 20 (6), 20-21.
Thomas, M. (2009). To tag or not to tag? Library Hi Tech, 27 (3).
Trant, J. (2009). Tagging, folksonomy and art museums. JODI: Journal of Digital Information, 10 (1), 5.
Vander Wal, T. (2005). Folksonomy definition and Wikipedia. Retrieved October 17, 2009, from http://vanderwal.net/random/entrysel.php?blog=1750
West, J. (2007). Subject headings 2.0. Library Media Connection, 25 (7), 58-59.