GLIFOS-Media: Rich Media Archiving
Rich-media preservation
As posted on November 4, 2009, The University of Texas Libraries Human Rights Documentation Initiative (HRDI) has been working with the Kigali Genocide Memorial Centre in Rwanda on a pilot digital archiving program that takes advantage of a rich media platform called GLIFOS media. GLIFOS provides a social media tool kit that was originally created to meet the needs of a distance learning program at the Universidad Francisco Marroquín in Guatemala, but it also shows promise as a tool for human rights archiving (see the article “Non-custodial archiving: U Texas and Kigali Memorial Center” at WITNESS Media Archive). As a rich-media wiki, GLIFOS is designed to integrate digital video, audio, text, and image documents through a process that “automates the production, cataloguing, digital preservation, access, and delivery of rich-media over diverse data transport platforms and presentation devices.” GLIFOS media accomplishes this by presenting related documents–for example, video of a lecture, a transcript of the same, and associated PowerPoint slides–in a synchronized fashion: when a user highlights a particular segment of a transcript, the program locates and plays the corresponding segment of the video and also locates the related PowerPoint slide. This ability to seamlessly synchronize and present related digital media translates well to the human rights context by allowing for the cataloging and integration of video material, documents containing testimonies, photographs, and transcripts. Materials that all relate to a single event can be pulled together and presented in a holistic fashion, which is useful for activism and scholarship.
GML: The Key to Preservation
In order to support the presentation of this integrated information for users, GLIFOS needed to ensure that materials can be read and accessed across existing digital presentation platforms (e.g., web browsers, DVDs, CD) and readers (e.g., PCs or PDAs), as well as on platforms yet-to-be-created (for more detail, see “XML Saves the Day,” an article written by the developers in 2005). This was accomplished by indexing and annotating all digital documents stored in the GLIFOS repository with an XML-based language called the “GLIFOS Markup Language,” or GML. The claim is that “GML is technology, platform, and format independent” (Ibid), thus allowing for preservation of established relationships between materials. Basically, GML allows users of GLIFOS to create a metafile that records the relationships between related multi-media records held in a repository in such a way that those relationships are maintained across a variety of media reading platforms. This is possible because GML is a significantly stripped-down markup language that requires little or no translation from one reader to the next, so content is preserved as technology changes and evolves.
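To make the idea of a relationship metafile concrete, here is a minimal sketch of the kind of lookup such a file would have to support. It is not GML and does not reproduce the GLIFOS schema, which this post does not describe; the names SyncPoint and SyncIndex, the field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SyncPoint:
    """One hypothetical link between a transcript segment, a video range, and a slide."""
    transcript_id: str
    video_start_s: float
    video_end_s: float
    slide_number: Optional[int] = None


class SyncIndex:
    """Illustrative index over sync points; an assumption, not the GLIFOS data model."""

    def __init__(self, points):
        # Look up by transcript segment, or scan by playback time.
        self._by_transcript = {p.transcript_id: p for p in points}
        self._points = sorted(points, key=lambda p: p.video_start_s)

    def for_transcript(self, transcript_id):
        """Return the sync point for a highlighted transcript segment, or None."""
        return self._by_transcript.get(transcript_id)

    def at_time(self, seconds):
        """Return the sync point whose video range contains the given time, or None."""
        for p in self._points:
            if p.video_start_s <= seconds < p.video_end_s:
                return p
        return None


# Hypothetical sample data: two transcript paragraphs of one lecture.
index = SyncIndex([
    SyncPoint("para-1", 0.0, 42.5, slide_number=1),
    SyncPoint("para-2", 42.5, 90.0, slide_number=2),
])
```

Because the relationships live in plain data rather than in any one player, the same index could in principle drive a web page, a DVD menu, or a reader that does not exist yet, which is the preservation argument made above.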
GLIFOS and Human Rights Documentation
Given that GLIFOS is designed to catalog, index, and synchronize a wide variety of digital media types, it proves to be a promising tool for aiding in digital archiving. The GLIFOS GML protocol allows the program to access and present cataloged materials through the meta-relationships it establishes for records; and because GML is a streamlined markup language that allows multiple platforms to present and read digital documents, these relationships have been successfully maintained when migrated to entirely new data reading and presentation platforms. As long as the repository of documents that GLIFOS accesses remains intact, both in terms of the materials stored there and their associated metadata, and as long as new media platforms continue to read older video and image media files, use of the GLIFOS Markup Language aids in preservation by providing a means of cataloging and indexing documents using GML, as well as preserving the synchronized links and interactions that GLIFOS establishes between related documents over time.
[1] See http://www.glifos.com/wiki/images/f/f5/Arias_reichenbach_pasch_mLearn2005.pdf
Amateur Footage to Documentary: A Death in Tehran
One of the themes followed in this blog is the role of amateur photography and videography in contemporary human rights documentation efforts. As illustrated by organizations like WITNESS and the Chiapas Media Project, digital video cameras in citizens’ hands are becoming powerful tools for calling our attention to human rights violations and mobilizing action. A recent example of this was the June 2009 Presidential Election Protests in Tehran, Iran; for several weeks following an election result that many felt was fraudulent, Iranian citizens protested to demand a revote and faced violent retaliation from their own government. During this time, protesters used digital cameras and cell phones to record events they witnessed and then transmitted the images they created via a variety of social media platforms on the World Wide Web. One of the most galvanizing and appalling events witnessed this way was the death of Neda Agha-Soltan, who was shot by paramilitary forces authorized by the government to disperse the protests with force. Bystanders with cell phones recorded Miss Agha-Soltan’s death and within hours, the recordings had gone viral on the web. As Neil Genzlinger of The New York Times observes:
Neda Agha-Soltan died one of history’s most-watched deaths. It was in the midst of the fury over the Iranian presidential election of June 12, and for a moment it seemed as if the young woman’s final moments — shot on a Tehran street on June 20 as a protest swirled around her, dying at the scene while a cellphone camera recorded it all in images soon flashed around the world — would be the start of something earthshaking.
However, paradoxically, nothing earthshaking occurred. Instead of fueling further protests, events died down shortly after Neda’s death. It is hard to say exactly why this happened–it could have been fear of more deaths, or it could have been that the Iranian government somehow convinced people that deaths like Neda’s were caused by Western agents acting to upset Iranian peace. Whatever the case, PBS’s Frontline has created and aired a documentary titled “A Death in Tehran” about Neda’s death that incorporates a significant amount of amateur footage capturing not just Neda’s death, but also her participation in the protests over several days. The use of this footage in a well-documented investigative film demonstrates the potential of amateur digital documentation as a resource for informed coverage of human rights events. The entire documentary is available on the Frontline Website. Further commentary about the documentary and the role of amateur footage in creating it can be found in today’s edition of The Lede Blog, as well as in a review written on November 16, 2009, which appeared in the New York Times.
UT-Austin Library Web Clipper: Follow Up Questions

Image courtesy of www.deepspace.com
A couple of weeks ago, Kevin Wood (University of Texas Libraries at Austin) and I posted an article with the title “Archiving Web Pages: UT-Austin Library’s Web Clipper,” where we described an innovative solution to capturing and preserving fragile human rights material from the World Wide Web. The post generated a number of interesting questions, so we have decided to post this follow-up in a Q&A style to provide additional information on how the Web Clipper works. Special thanks again to Kevin for taking the time to craft answers to these questions. Please do not hesitate to contact me with more questions if you have them. We will be writing updates on the Web Clipper progress as Kevin and his team continue to develop it and will do our best to answer your questions here as we do so. –Sarah
The UT Libraries’ Web Clipper
As part of a Bridgeway-funded initiative, the University of Texas Libraries at Austin is engaged in a project developing a means for harvesting and preserving fragile or endangered Web materials related to human rights violations and genocide. Having tried a number of available technologies for harvesting Web material and finding them to be unsatisfactory for their needs, a team of developers created an in-house Web Clipper program designed to meet the libraries’ specific needs for preserving Web material. A full description of the Web Clipper is available here. What follows is a series of responses to questions generated from the first post about the Web Clipper.
Q1: When the clipper clips, does it save the file in the original formats (e.g., html, with all the associated files)?
A: Yes, to the extent possible. There are challenges with JavaScript and streaming media that we are still working on with the new clipper. In those cases we rely on attachments (see the answer to question 2 below). Before designing the new Web Clipper, we’d gone through a few different clipping strategies and were not pleased with any. Zotero does a good job of capturing what you see, but makes modifications to the files, thus complicating preservation. Placing Firefox behind a proxy captures a lot, but misses content that relies on user interactions if those interactions don’t occur. Heritrix does the best job, but we’ve seen it struggle with more than 10% of the pages that have been clipped.
Q2: Are there limitations on what the Web Clipper can and cannot capture?
A: There are limitations to what our new Web Clipper can automatically capture, but it has the ability to accept attachments. Extensions like DownloadHelper (a free Firefox extension for downloading and converting videos from many sites with minimum effort) can turn a streaming video into a file that can then be attached to a clipping. The final format of the attachment depends on the tool used to create it, but generally matches the original.
Q3: Are the graduate research assistants who are testing the Clipper capturing multiple instances of the same site over time, or are these one-off?
A: Each capture is a one-off. The Web Clipper allows users to dive deeper into sites and capture individual pages rather than whole sites (sometimes a site that wouldn’t normally carry relevant human rights information has an article or blog post that we want to preserve). Where one might use tools such as Archive-It, WAS, WAX or Web Curator Tool to capture an entire blog, one uses the Web Clipper to capture and describe a single blog post or article, for example.
Q4: When the clipped files are submitted to The University of Texas Libraries’ DSpace (the local repository), is the submission process simple? That is, is there an automated process created?
A: Yes, this process is automated. We use the SWORD (Simple Web-service Offering Repository Deposit) protocol as the interface between the Web Clipper and DSpace for ingestion. A script runs periodically, identifies new clippings and pushes them into the repository.
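For readers curious what such a periodic script might involve, the following is a minimal sketch and not the UT Libraries' actual code, which is not shown in this post. It assumes clippings are staged as zip packages in a spool directory and deposited with the SWORD 1.x convention of an HTTP POST carrying an X-Packaging header; the function names, the spool layout, and the example URL are hypothetical, and authentication is left out.

```python
import urllib.request
from pathlib import Path

# Packaging identifier accepted by DSpace's SWORD v1 interface for METS-based packages.
METS_DSPACE_SIP = "http://purl.org/net/sword-types/METSDSpaceSIP"


def find_new_clippings(spool_dir, already_deposited):
    """Return zip packages in spool_dir whose names are not in already_deposited."""
    return sorted(
        p for p in Path(spool_dir).glob("*.zip") if p.name not in already_deposited
    )


def build_deposit_request(collection_url, package_name, payload):
    """Build (but do not send) a SWORD-style deposit request for one package.

    collection_url is a hypothetical deposit URL for the target collection.
    """
    headers = {
        "Content-Type": "application/zip",
        "Content-Disposition": f"filename={package_name}",
        "X-Packaging": METS_DSPACE_SIP,
    }
    return urllib.request.Request(
        collection_url, data=payload, headers=headers, method="POST"
    )
```

A scheduled job could loop over find_new_clippings, pass each request to urllib.request.urlopen with credentials added, and record the package name once the repository confirms the deposit.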
Q5: Regarding the use of a local Wayback machine for preserving the clipped materials: Are you capturing clipped material via Wayback in addition to DSpace, or is this all the same process with just one instance of the preserved site? If the latter, how does one set up a local Wayback version?
A: There is only one instance of the preserved site. The repository contains a link out to the Wayback machine, not the preserved clipping itself. The link allows a user to open the original record in the DSpace repository. Although we could store ARC files (the Internet Archive’s container format for captured web content) in the repository, they wouldn’t be of much use to our users as such, so we’re only exposing the content through a local Wayback instance. We use the open source version of the Wayback Machine.
Q6: Is access to the clipped documents restricted, or are they open to everyone via UT Libraries’ digital repository? Are there any privacy or confidentiality issues associated with the clipped material?
A: The clippings will be open to everyone, but while we’re in development they’re restricted. We haven’t seen any privacy or confidentiality issues with our clipped material. All of the clippings come from the public web.
Archiveros sin Fronteras: Barcelona Group Archives Fragile Human Rights Documentation
Archiveros sin Fronteras (or Archivists Without Borders, in English) is a non-profit organization operating out of Barcelona, Spain. The goal of the organization is to preserve endangered human rights documentation throughout the Spanish-speaking world. As they state on their web page:
Los valores de identidad, memoria, derecho a la información y defensa de los derechos humanos son elementos consubstanciales en nuestro trabajo cotidiano y constituyen valores universales que inspiran la manera de actuar de nuestra entidad.
The principles of identity, memory, right to information and defence of human rights are inherent elements in our daily work and constitute universal values that inspire the way of acting of our organization.
The organization has a number of projects currently underway (project descriptions are not currently available in English) in Fez, Morocco and Cataluña, Spain, as well as a salvage operation in Latin America (Recuperación de Archivos y Documentos en el Cono Sur y de Dictaduras y Gobiernos represivos en Iberoamèrica (2005-2007)) to preserve documentation of repressive regimes. One of their larger ongoing projects in Latin America is helping to organize Guatemala’s National Police Archive.
The site offers project descriptions and updates, an archive of project reports, and announcements for various international events in human rights archiving. There are also links to news items concerning human rights archiving efforts in the Spanish-speaking world. Some of these resources are translated to English, but the majority are only in Spanish.
UT Human Rights Archiving and GLIFOS
T-Kay Sangwand, the human rights archivist at the University of Texas Libraries in Austin, has contributed a guest post to the WITNESS Media Archive blog to close out Grace Lile’s series for Archives Month last month. The post discusses a non-custodial archiving arrangement that the University of Texas Libraries has established with the Kigali Memorial Centre (KMC) in Rwanda. Funded by the Bridgeway Foundation and the University of Texas Libraries, the project–called the Human Rights Documentation Initiative (HRDI)–consists of a collaborative effort to digitize, preserve, and catalogue a variety of documentation from the Rwandan genocide. In order to accomplish this, HRDI project team members traveled to Rwanda this summer to help KMC set up an archiving system that utilizes the GLIFOS media toolkit–a rich-media storage program and reader developed in Guatemala:
In order to facilitate access to KMC materials, the HRDI has been working with the Guatemala-based company, Glifos, that provides powerful software that allows for cataloging, indexing, and syncing audiovisual materials with transcripts and other materials for enhanced access. Using Glifos, the HRDI built a prototype for a digital archive for KMC and in July 2008, three members of the HRDI project team (Christian Kelleher, T-Kay Sangwand, and Amy Hamilton) traveled to Rwanda to demo the prototype.
A unique piece of this project is the supportive role that the University of Texas Libraries is playing as KMC establishes and maintains their archive. Specifically, the library is serving as a repository of the digitized materials created at Kigali, while Kigali maintains the original collection of physical paper documents, film footage, or audio recordings. GLIFOS will allow users in Rwanda to directly access the digital materials held in the Texas repository. See the entire article at the WITNESS Media Archive for the complete discussion of this project.
As illustrated by the HRDI project at Texas, the GLIFOS program proves to be a good means of cataloguing, indexing, and preserving rich-media content (that is, video, text, audio, and even materials in multiple languages) in a way that allows for ease of archiving and ease of access and use. A future post on this blog will discuss the technical specifications of GLIFOS in terms of its utility for digital archiving.
Interesting NPR Series: The End of Privacy

Image courtesy of www.NPR.org
This week, National Public Radio’s All Things Considered has been airing a series called “The End of Privacy.” The articles call attention to the fact that social media platforms (e.g., Facebook, Twitter, and MySpace), email and web crawling services (e.g., Gmail/Google or Yahoo!), and even cell phone services harvest, store and act upon a large amount of personal and identifying information gathered from users. The journalists for these pieces investigate the legal and economic ramifications surrounding the use of personal data in Web applications in general; however, listening to these articles raises important questions for human rights in particular. Specifically, if activists are going to use these platforms to rapidly distribute materials and mobilize actions, they need to be aware of the fact that they are not necessarily working anonymously, though they may assume that they are.
As reflected in several posts on this blog, the use of social media is changing the face of human rights activism and thus causing NGOs and archivists alike to seek means of capturing and preserving the fleeting, transient, and ephemeral information that flies across the Web at the speed of rapidly typing fingers. One of the challenges that immediately rises to the surface in this is protecting the privacy and safety of victims, witnesses, and even activists from further injury by oppressive regimes–a problem that becomes further complicated if we recognize that Web applications are capturing detailed personal information that repressive regimes can access and use against the people who post the materials. Of course, the problem of privacy and safety is not new to human rights–as Valerie Love at the University of Connecticut observes in a recent post to the WITNESS Media Archive:
In recent years, archival institutions and organizations have become increasingly concerned with issues regarding human rights records and archival collections. Questions of access, privacy, politics, trust, and ensuring the safety of those documenting abuses and potentially controversial records all impact archivists working with human rights collections.
This observation applies to the content of documents and how they can impact the individuals represented within them, but when we extend the situation to the web, these privacy issues become even more complicated–and not only for the privacy issues related to using the web described above. As people increasingly use the Web to post video and images of events and abuses they witness, the anonymity of the people captured in those materials is compromised; unfortunately, well-meaning witnesses post materials to the web that can help governments identify individuals involved in human rights protests, for example, and repressive regimes take advantage of those images to identify and arrest the represented individuals. In addition, the person who posts may not know that they can be tracked through their Web use, which puts yet another person potentially at risk. One thing that NPR’s “The End of Privacy” series makes clear is that users need to be aware that they are increasingly vulnerable to extensive data harvesting–data that can identify specific users, their preferences and activities, and even their physical location–and take measures to try to protect their anonymity.
Valerie Love for WITNESS Media Archive: Call for Human Rights Roundtable in SAA
As posted earlier this month, WITNESS archivist Grace Lile is honoring Archives Month by dedicating October’s blog posts on the WITNESS Media Archive to human rights archiving. Today’s entry was submitted by Valerie Love (the Curator for Human Rights and Alternative Press Collections at the Thomas J. Dodd Research Center at the University of Connecticut) and calls attention to the fact that there is as yet no established organization for collaboration between human rights archivists and human rights field workers focusing on issues of documentation–an issue that CRL also seeks to address through the Human Rights Electronic Evidence Study. As a step toward addressing this lack, Valerie and T-Kay Sangwand (the Human Rights Archivist for the University of Texas Libraries-Austin) have drafted a petition for establishing a Human Rights Roundtable within the Society of American Archivists. As Valerie notes:
…despite the proliferation of conferences and online information sites regarding human rights archives, there is not yet a space or group [dedicated to human rights] within the largest archival organization in the United States. T-Kay Sangwand of the University of Texas and I are currently petitioning to create a human rights roundtable within the Society of American Archivists (SAA). Informal gatherings of archivists concerned with human rights issues occurred at the SAA meeting in San Francisco in 2008 and at Austin in 2009, but the creation of a official roundtable would formalize current efforts to collaborate and share information on archives and human rights in the United States.
Please read the full post at the WITNESS Media Archive to learn more about this effort and its importance to human rights archiving and field work. If you are interested in learning more about this effort, please contact Valerie at valeri.love@uconn.edu.
Chiapas Media Project Video Collection
The Chiapas Media Project (CMP) is a media activism group founded twelve years ago to supply media equipment and training to indigenous groups in southern Mexico so that they can document and raise consciousness of the realities of their lives in a region that has experienced continuous political and social repression stemming from the Aguas Blancas massacre of 17 indigenous farmers on June 28, 1995. CMP forms a bi-national partnership with Promedios de Comunicacion Comunitaria, located in San Cristobal de Las Casas, Chiapas–an organization that provides local communities with access to equipment for editing and producing documentary films for local media distribution, and for sale as a means of supporting their mission. As stated on the Chiapas Media Project homepage:
For many people who live in the developed world use of video cameras, VCR’s, TV’s, and computers is a daily occurrence. But when one speaks with indigenous peoples about access to this technology they say it is only a dream. For centuries outsiders have represented indigenous people and their cultures. Recently there has been an effort to get new communication technology into the hands of indigenous people so that they can represent themselves, with their own words and images. This is what the Chiapas Media Project (CMP)/Promedios is attempting to do in Southern Mexico.
For an example of the documentary materials they produce, see “Eye’s on What’s Inside: The Militarization of Guerrero,” on YouTube. This 35-minute film provides a glimpse into the poor, subsistence agricultural lives of the Ma Phee of Barranca, Guerrero, as well as the challenges they face related to the constant presence of the Mexican military in their communities. The purported purpose of the military is to monitor the area for narcotrafficking and the illegal production of marijuana and poppies for heroin (crops that some of these desperately poor farmers do produce), but the locals argue that the military’s real purpose is to suppress social resistance that calls for equal treatment under the law. The film contains footage of military personnel harassing citizens at a road-side check point, as well as documentary testimony from two local women who were sexually assaulted by members of the military, but are unable to take their cases to court.
The Chiapas Media Project has produced approximately 30 documentary films for sale as DVDs through their office in Chicago–see their on-line catalog for a full listing. Pricing is available for individual or institutional/library sale. Several of the documentaries are entirely produced by community members, while the rest are the result of close collaborations between local human rights organizations and local community members. Funds from purchases go to supporting local media activism efforts in southern Mexico.
Capturing & Archiving Web Pages: UT-Austin Library’s Web Clipper
The following was co-authored with Kevin Wood at the University of Texas Libraries at Austin. The post describes a promising experimental archiving strategy that the UT Libraries is developing for harvesting and preserving primary resources from the Web. Special thanks to Kevin for contributing his expertise and time by co-authoring this post.
–Sarah
University of Texas Libraries-Austin’s Web Clipper Project for Human Rights
Developer: Kevin Wood

Example of a Web page clipped from the web for archiving as a primary resource. Image: Kevin Wood, University of Texas Libraries-Austin
Background
In July of 2008, the University of Texas Libraries received a grant from the Bridgeway Foundation to support efforts to collect and preserve fragile records (records that are at risk of destruction either from environmental conditions or human activity) of human rights conflicts and genocide. These funds are helping the library to develop new means for collecting and cataloguing “fragile or transient Web sites of human rights advocacy and genocide watch;” sites that are important because the internet has become a primary means for distributing both information and misinformation about human rights abuses and for documenting human rights events. Thus these fragile Web sites become valuable primary resources for survivors, scholars, and activists as they pursue their work in human rights (see the library’s grant announcement for a press release on the grant).
Harvesting Web Sites for Archiving
In their first attempt to establish a reliable means for harvesting Web sites for preservation, archivists at the University of Texas Libraries used Zotero, a free Firefox extension that allows users to collect, manage and cite online resources for research. The program allows users to capture copies of webpages and catalog them in a bibliographic program that functions much like EndNote or Bookends. Archivists at the University of Texas planned to use the program to pull specific documentation of human rights events off of the internet and then submit the collected pages to their institutional repository for cataloging and preservation. However, Zotero wound up not meeting their needs. Zotero is geared toward individual work from a desktop; therefore, when it harvests a page, it changes links to be relative to the individual’s desktop rather than saving the original links as they are built into the webpage of interest. In terms of archiving and preservation, this is problematic because it calls into question the authenticity of the captured pages. Zotero can be made to keep the original links, but it was not originally designed to do so, so this becomes a cumbersome process, and as Zotero continues to evolve in the direction of meeting the needs of individual users, this work-around process becomes that much more difficult to maintain.
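The link-rewriting problem can be illustrated with a short sketch that compares the links in a page as served with the links in a captured copy. This is not how Zotero or the Web Clipper works internally; it is a hypothetical check written with Python's standard html.parser module, and the helper names are assumptions made for illustration.

```python
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Collect href and src attribute values in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value is not None:
                self.links.append(value)


def extract_links(html):
    """Return every href/src value found in an HTML string."""
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


def rewritten_links(original_html, captured_html):
    """Pair links by position and return (original, captured) pairs that differ.

    Assumes the capture kept the same elements in the same order.
    """
    before = extract_links(original_html)
    after = extract_links(captured_html)
    return [(b, a) for b, a in zip(before, after) if b != a]
```

An empty result suggests the capture preserved the page's links as published; any pair in the result marks a link that the capture tool altered.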
The solution for this problem is the in-house creation of a custom web clipper program that harvests pages without modifying them. It functions as a Firefox plug-in and was built from the bottom up borrowing heavily from open source programs that already have some of the right functionality for the libraries’ human rights archiving needs. The designer wants to keep the coding footprint of the web clipper as small as possible to minimize the deployment and maintenance burden. Therefore, the main logic of the clipper will be hosted on a server and accessed on individual machines or terminals through web services. Eventually, this will allow patrons to use the clipper from anywhere in the library system as a harvesting tool. The goal is to centralize the clipping process as much as possible without the need of customizing individual machines, thus streamlining collection, cataloging, and preservation processes.
The prototype clipper is currently housed on two computers at the library in Austin and graduate research assistants are actively clipping web pages for archiving. As they clip a page (see the image above for an example of a clipped page), users enter metadata in predetermined fields and then assign descriptive terms as tags for subject and content cataloging. Users can either select from a thesaurus of human rights terms (in this case, they are beginning with the thesaurus from WITNESS and extending it with terms as appropriate) or assign arbitrary keywords. Though users have complete control over clipping, documenting, and tagging a Web page, a moderator or manager determines if new terms should be added to the thesaurus.
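The tagging workflow described above can be sketched as a small function that separates thesaurus terms from free keywords awaiting a moderator's decision. The function name, the case-insensitive matching rule, and the sample terms are assumptions for illustration; the post does not describe how the clipper actually implements this.

```python
def split_tags(tags, thesaurus):
    """Split user tags into controlled terms and candidate terms for moderator review.

    Matching ignores case and surrounding whitespace (an assumed rule).
    """
    known = {term.casefold(): term for term in thesaurus}
    controlled, candidates = [], []
    for tag in tags:
        cleaned = tag.strip()
        if not cleaned:
            continue  # skip empty input
        key = cleaned.casefold()
        if key in known:
            controlled.append(known[key])  # normalize to the thesaurus spelling
        else:
            candidates.append(cleaned)  # kept on the clipping, queued for review
    return controlled, candidates
```

Both lists would stay attached to the clipping; only the candidates list would go to the moderator who decides whether a term joins the thesaurus.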
Regardless of whether a new term makes it into the thesaurus, the pages clipped by users get stored in the archive. Once items are clipped and tagged with descriptive terms, they are ingested into the UT Libraries’ institutional repository, based on DSpace. Metadata are stored in the repository with a link to a local instance of Internet Archive’s Wayback Machine. These copies appear exactly as the pages appeared when the material was first clipped and submitted for preservation, thus maintaining their value as primary resources.
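As a rough illustration of the link stored with the metadata, Wayback-style replay addresses are conventionally built from a 14-digit capture timestamp followed by the original URL. The sketch below assumes that convention; the function name and the base address are hypothetical and do not describe the UT Libraries' actual deployment.

```python
from datetime import datetime


def wayback_url(base, captured_at, original_url):
    """Build a Wayback-style replay URL: <base>/<YYYYMMDDhhmmss>/<original URL>."""
    timestamp = captured_at.strftime("%Y%m%d%H%M%S")
    return f"{base.rstrip('/')}/{timestamp}/{original_url}"
```

A repository record would then carry this address in a metadata field, while the captured content itself stays in the Wayback instance.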
Metadata & the Future of Mobile Web Access
As we move forward with the electronic evidence study at CRL, one challenge keeps rising to the surface in all of our conversations, literature reviews, and reviews of internet resources. In a word: Metadata. Collecting metadata and context information seems to be particularly challenging for digitally or electronically created documentation; in response, archivists and preservationists are striving to create simple, user-friendly means of capturing the information that will ensure that preserved documents will serve a long and useful life. Such processes would allow documents to serve as evidence for continued activism, policy work, scholarship, legal action, and even for maintaining national memories of events once crises are past and democratic processes get improved or established (see the recent post on ADAM at Amnesty International for an example of one recently developed strategy). After all, as the UN has recently established in its report Right to Truth, documentation and archives are fundamental to ensuring that all individuals’ human rights are supported and protected.
Automatic Metadata?
Given the challenges discussed above, it’s interesting to learn that the next generation of mobile Web devices will have the ability to automatically collect metadata such as geographic location, temporal information, and context for materials generated on them. As stated in “Data-rich Internet Needs Context, New Modes of Consumption and Serendipity:”
In the future, metadata will be available on our mobile phones and it will provide computers with contextual information around data that developers create, according to Marc Davis, partner at Invention Arts and former chief scientist of Yahoo Mobile. By bridging the gap between pieces of information, particularly geolocation data, temporal information (when something is created) and other contextual information that Davis called the “who, what, when and where” clues, we’ll be able to help machines filter through data in ways that are more relevant for us (Jennifer Martinez, GigaOM).
Of course, this capacity is being developed for programming and product development purposes, but if such data could be easily accessed by non-developers, they would have positive implications for digitally created human rights documentation, too. If we can automatically have information about where, when, and in what context a document was created, we have a better chance of gleaning relevant materials–materials that might otherwise be lost for lack of context–for sustained engagement in activism, scholarship and legal action.
