The Documentalist

HURIDOCS & IHRDA Collaborate on Caselaw Analyzer

Posted in Reviews, technology by Sarah on November 17, 2010

HURIDOCS has been working with the Institute for Human Rights and Development in Africa (IHRDA) to develop an on-line tool called the Caselaw Analyzer, which will allow IHRDA (and other groups in the future) to “easily browse inter-related court decisions, to quickly access the primary caselaw for a given type of [human rights] violation, to highlight and comment relevant sections of a decision, and to share their commentary with their colleagues and work collaboratively on case research” (see the full description of the program and project here). The goal of the project was to give IHRDA quick access both to the body of court decisions the organization has collected from the African Commission on Human and Peoples’ Rights (ACHPR) and to the caselaw research its lawyers have already done, so that this research can be applied to new cases the organization takes on. You can access IHRDA’s Caselaw Analyzer instance here. As stated in HURIDOCS’ description of the program, there is a particular challenge associated with using and managing caselaw that this new program will help organizations like IHRDA address:

Court decisions often reference specific paragraphs of other decisions, which in turn reference other decisions. Caselaw Analyser allows you to easily navigate from one decision to the next. And because it uses inline-browsing to jump straight to the quoted paragraph, you don’t even lose your page. Also, all incoming and outgoing citations are listed next to the text of the decision.

Also, CaseLaw Analyser will use a specifically designed CaseRank algorithm, meaning that the decisions that are the most often referenced, will appear first (like Google’s PageRank). This gives the user an indication of which cases are the most important: the primary caselaw.
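HURIDOCS does not describe how CaseRank is actually computed, so the following is only a rough sketch of the PageRank-style idea the description points to, applied to a citation graph. The function name, the damping value, and the four decisions "A" to "D" are all hypothetical and chosen purely for illustration.

```python
def case_rank_sketch(citations, damping=0.85, iterations=50):
    """Score decisions with a plain PageRank-style iteration.

    `citations` maps each decision to the list of decisions it cites.
    This is NOT HURIDOCS's CaseRank, only the textbook PageRank idea.
    """
    # Collect every decision that cites or is cited.
    decisions = set(citations)
    for cited in citations.values():
        decisions.update(cited)
    decisions = sorted(decisions)
    n = len(decisions)

    # Start with an equal score for every decision.
    score = {d: 1.0 / n for d in decisions}
    for _ in range(iterations):
        new_score = {d: (1.0 - damping) / n for d in decisions}
        for d in decisions:
            cited = citations.get(d, [])
            if cited:
                # A decision passes its score on to the decisions it cites.
                share = damping * score[d] / len(cited)
                for c in cited:
                    new_score[c] += share
            else:
                # A decision that cites nothing spreads its score evenly.
                share = damping * score[d] / n
                for c in decisions:
                    new_score[c] += share
        score = new_score
    return score


# Hypothetical citation graph: "C" is cited by every other decision.
example_citations = {"A": ["C"], "B": ["C"], "C": [], "D": ["A", "C"]}
example_scores = case_rank_sketch(example_citations)
example_ranking = sorted(example_scores, key=example_scores.get, reverse=True)
print(example_ranking)
```

In this toy graph the most-cited decision comes out on top, which is the behaviour the description attributes to CaseRank; a real deployment would presumably also have to handle paragraph-level citations.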

In addition to assisting in the organization and management of such data, in the future, the program will also offer a social networking and collaboration feature that will allow multiple individuals within an organization to develop a case together on-line.  This feature will permit organization members working over a broad geographic area to consolidate notes, caselaw, and opinions as they develop new cases.


Building a private cloud server to cut IT costs in human rights

Posted in technology by Sarah on September 1, 2010

In a press release dated August 31, 2010, Riverbed Technology, an IT performance company, announced that International Justice Mission (IJM) has created a private cloud server, built on Riverbed® Steelhead® appliances, to support its digital documentation activities. As stated in the press release, though the upfront investment in the technology is substantial, the long-term savings in tech support are considerable. The system also gives IJM more reliable access to the internet as well as ample back-up and storage capacity.

IJM is a “human rights organization that secures justice for victims of slavery, sexual exploitation, and other forms of violent oppression” (press release), and it relies heavily on email for communicating case information and protecting the safety of workers. However, with 13 offices spread across the U.S., Asia, Africa and Latin America, dependence on dial-up or satellite service made for unreliable connectivity. IJM explored ways of centralizing its IT needs in-house and found that purchasing servers, bandwidth and constant systems upgrades would be cost-prohibitive, so it decided to build its own cloud. As IJM’s vice president of information systems, John Lax, stated:

“As we faced trying to reduce IT expenditures, we decided to centralize our email servers to reduce software and hardware costs — in effect build a private cloud… However, in order to ensure email was accessible in our 13 remote offices, we chose WAN optimization instead of costly bandwidth upgrades. Based on our research we determined that bandwidth upgrades would not address the latency issues we were experiencing.”

The press release describes the result of this decision:

As a result of its Steelhead appliance deployment, IJM was able to build a private cloud that addressed its Exchange 2007 performance and upgrade issues, while avoiding costly bandwidth upgrades and accelerating access to critical applications. “For example,” said Lax, “in Uganda we have a 128K satellite link delivering only 64K performance, which is insufficient to support an office of 12-14 workers. With Riverbed WAN optimization, we get five times the throughput with the performance of a 256K link. The Riverbed Steelhead appliances effectively doubled our bandwidth capacity while saving us nearly $60,000 in Uganda alone during one year.”
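Lax’s point that bandwidth upgrades would not fix latency can be illustrated with a very simple model: the time to complete a chatty exchange is roughly the time to push the bytes through the link plus the time spent waiting on round trips. The payload size, round-trip count, and 0.6 second satellite round-trip time below are illustrative assumptions, not IJM’s measurements, and the model ignores everything else a real network does.

```python
def transfer_seconds(payload_bytes, link_bits_per_second, round_trips, rtt_seconds):
    """Rough transfer time: serialization delay plus round-trip waiting."""
    serialization = payload_bytes * 8 / link_bits_per_second
    waiting = round_trips * rtt_seconds
    return serialization + waiting


# Assumed workload: 100,000 bytes exchanged over 200 request/response round trips.
PAYLOAD_BYTES = 100_000
ROUND_TRIPS = 200
SATELLITE_RTT = 0.6  # seconds, an assumed figure for a satellite hop

baseline = transfer_seconds(PAYLOAD_BYTES, 128_000, ROUND_TRIPS, SATELLITE_RTT)
double_bandwidth = transfer_seconds(PAYLOAD_BYTES, 256_000, ROUND_TRIPS, SATELLITE_RTT)
half_round_trips = transfer_seconds(PAYLOAD_BYTES, 128_000, ROUND_TRIPS // 2, SATELLITE_RTT)

print(f"128K link:              {baseline:.1f} s")
print(f"256K link:              {double_bandwidth:.1f} s")
print(f"128K, half round trips: {half_round_trips:.1f} s")
```

Under these assumptions, doubling the bandwidth saves only a few seconds, while halving the number of round trips nearly halves the total time, which is broadly the kind of gain WAN optimization is aiming for.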

See the entire article, “International Justice Mission Builds Global Private Cloud and Cuts IT Costs With Riverbed,” for further details on this innovative approach to cutting IT costs in a digital human rights world.

Humorous Take on How Digital Data Die

Posted in technology by Sarah on March 25, 2010

A colleague at HURIDOCS forwarded the following blog link to me and it touches on so many issues that are relevant to archiving and preservation in human rights that I am reblogging it here for you to consider.  It’s a fun read with important information.  Enjoy!  –Sarah

Giz Explains: How Data Dies (and How it Can Be Saved)

Bits don’t have expiration dates. But memories will only live forever if the media and file formats holding them remain intact and coherent. Time can be as deadly to data storage as it is to carbon-based life forms.

There are lots of ways data can die: YouTube can pull a video offline before anybody snags it, your hard drive can crash, taking ultra-rare Grateful Dead bootlegs that you never got a chance to upload to Usenet with it, or maybe you designed a brilliant piece of visual art a decade ago in some kooky file format that simply doesn’t exist anymore, and there’s no possible way to view the file without traveling to some creepy dude’s basement a thousand miles away.

What we’re talking about is digital rot—or data rot or bit decay or whatever you’d like to call it—systemic processes which can mean death to data. Kind of a problem when you’d like to keep it around forever. Let’s paint this in broad strokes: You can roughly break the major kinds of rot into hardware, software and network. That is, the hardware that breaks down, the formats that go extinct, and the online stuff that vanishes one way or another.

Please read the entire post here for an informative discussion of how hardware, formats and “online stuff” can trip us up as we try to preserve data, and also for some possible solutions to these problems.

Good Resources on Digital Activism at Global Voices Online

Posted in Resources, technology by Sarah on March 9, 2010

In the article “The Technology for Transparency Review, Part 1” posted on Global Voices Online on March 2, 2010, David Sasaki reviews a number of recent efforts to highlight how various groups are making use of on-line activism, social media, and other digital resources to document human rights issues and generate a wider sphere of activism. The article is dense with links to resources and reviews of their efficacy and is part of a larger initiative by Global Voices Online called the “Technology for Transparency Network” (tag line: “tracking civic engagement technology worldwide”). As stated on their webpage:

The Technology for Transparency Network is a  participatory research mapping [initiative] intended to gain a better understanding of the current state of online technology projects that increase transparency, government accountability, and civic engagement in Latin America, Sub-Saharan Africa, Southeast Asia, South Asia, China, and Central & Eastern Europe. The project is co-funded by Open Society Institute’s Information Program and Omidyar Network’s Media, Markets & Transparency initiative, and aims to inform both programs’ future investments toward transparency, accountability, and civic engagement technology projects.

The project, which is a collaboration with Global Voices’ outreach program Rising Voices, serves as a reaction to economic and political changes in mainstream media that have compromised the ability of reporters to engage in deeply investigative reporting that serves as a check on corrupt officials, business people, and politicians. Quoting David Simon’s observations about the current vacuum in investigative journalism, “it is going to be one of the great times to be a corrupt politician.” Thus the need to understand and capitalize on the tools that are gaining power and influence within the investigative world. In order to accomplish this work, the Technology for Transparency Network has assembled a team of veteran Global Voices Online reporters and leading transparency activists from around the world. In the network’s words: “We will document in-depth as many technology for transparency projects as possible to gain a better understanding of their current impact, obstacles, and future potential.”

The results of this work will be a series of case studies, podcasts, and a traditional .pdf report that will be released in May.  This report will highlight:

…the most innovative and effective tools and tactics related to technology for transparency projects. The report will make recommendations to funders, activists, NGOs, and government officials regarding the current obstacles to effectively applying technology to improve transparency, accountability, and civic engagement. It will also aggregate and evaluate the best ideas and strategies to overcome those obstacles.

Case studies are plotted on a map and, to date, have been conducted in Kenya, Brazil, Cambodia, Chile, China, India, Jordan, Mexico, Malaysia and Zimbabwe. The page also offers updates on on-going projects and links to other mapping and monitoring projects. For a more detailed discussion of all of these efforts see David Sasaki’s article published January 19, 2010.

Google’s GeoEye-1 Satellite Service and Human Rights

Posted in technology by Sarah on January 7, 2010

In the last year, Google joined forces with GeoEye, the world’s largest space imaging corporation (Wikipedia), to provide users with access to some of the most detailed images of the earth’s surface currently available. Though GeoEye’s primary customers are United States defense and intelligence agencies, Google acquired exclusive online mapping use of images generated by the GeoEye-1–the company’s flagship satellite–for their Google Maps and Google Earth applications (Wikipedia and loudoni, December 28, 2009). As Kate Hurowitz, a Google spokesperson, observes, “The GeoEye-1 satellite has the highest resolution color imagery available in the commercial market place and will produce high-quality imagery with a very accurate geo-location” (Digital Media, August 29, 2008). Such information could be quite useful to human rights organizations that utilize crisis mapping as part of their monitoring and activism efforts. Given that crisis mapping engines such as Ushahidi (described here and here) create media mashups that allow users to post events they witness to interactive Google Maps, the detailed imagery from the GeoEye-1 could increase the accuracy of such information through more precise location information and by providing increased visual detail about the physical context of reported events. Furthermore, this sort of detailed imaging could serve to provide evidence of how areas are ravaged by wide-spread violence by comparing images taken of the area at several different points in time.

GeoEye Foundation Supports Human Rights

Image courtesy of the GeoEye Foundation

Though GeoEye’s business model is primarily focused on selling contracts to private enterprises, the company also seeks to serve academic and non-governmental organizations by providing free imagery to students and NGOs for research purposes through its charitable GeoEye Foundation. Since March 2007, the GeoEye Foundation has provided 90 imagery grants covering 85,000 square kilometers of imagery (loudoni, December 28, 2009). As stated at the foundation’s Web page, “Foundation awardees have spanned a variety of academic backgrounds, including archaeology, human rights, climate change, forestry, geospatial intelligence, and land cover assessments.”

The GeoEye Foundation is a 501(c)(3) non-profit organization founded to support the company’s belief that it has a social responsibility to share its imagery and to support efforts to train future professionals in means of monitoring the earth, which it focuses on in three ways (GeoEye Foundation):

  • Fostering the growth of the next generation of geospatial technology professionals
  • Providing satellite imagery to students and faculty to advance research in geographic information systems and technology as well as environmental studies
  • Assisting non-governmental organizations in humanitarian support missions

For an example of how imagery from GeoEye satellites can serve human rights, see the Porta Farm Zimbabwe case study published on the Foundation’s Web page. This case study illustrates how detailed satellite imagery provided evidence of human rights abuses during Zimbabwe’s Operation Murambatsvina (Restore Order). During this campaign in 2005, the government carried out mass forced evictions and demolished homes and businesses in informal settlements such as Porta Farm, destroying the homes and livelihoods of hundreds of thousands of Zimbabweans. As the case study observes:

The Zimbabwe government began Operation Murambatsvina (Restore Order) in May 2005. This was a program of mass forced evictions and demolition of homes and businesses. The government carried out Operation Murambatsvina in the winter and during a period of food shortage. This increased the hysteria. One UN report estimated the number displaced to be 700,000. In late June, during a several day period, the government instituted forced demolitions at Porta Farm. Local human rights monitors reported that during the disorganized demolition several deaths occurred, including those of children. Bulldozers executed the main demolition process at the end of July 2005.

The images in this case study show the existence of the large and thriving Porta Farm village in 2000, followed by an image of the area after its demolition in 2005. The new image shows only the village roads–all housing and commercial structures are gone, necessitating the displacement of the hundreds of families that had lived there.

GeoEye’s Image Quality and Resolution

The GeoEye-1 satellite is able to produce some of the most detailed images of the earth’s surface taken from space. As the satellite circles the earth every 98 minutes (loudoni, December 28, 2009), traveling at a velocity of 4.5 miles per second (Digital Media, October 8, 2008), and at an altitude of 425 miles in a sun-synchronous orbit (Wikipedia), the GeoEye-1 is capable of capturing details on the earth’s surface as small as 41 cm, or 16 inches, in size. That said, government security regulations limit commercial images–such as those that appear on Google–to a resolution of 50 cm, or 20 inches. However, this is still quite detailed, as illustrated by the image below. This image of the modern Library of Alexandria (apropos for a blog dedicated to archiving technologies) was captured on May 30, 2009 at a resolution of 50 cm and demonstrates the level of detail that can be captured for public use by the cameras on the GeoEye-1 satellite.

The modern Library of Alexandria (Bibliotheca Alexandrina) in Alexandria, Egypt.

As of October 2009, the GeoEye-1 had captured approximately 340 million square kilometers of imagery of the earth’s surface. It combines two technologies to accomplish this: a military GPS receiver and star trackers, both of which allow the satellite to accurately identify the location of captured images. Although it is impossible to change the orbit of a satellite once it is in motion, the angle of the GeoEye-1 can be shifted up to 60 degrees by operators on the ground, thus effectively expanding the “range of vision” that the satellite has (loudoni, December 28, 2009). The company’s imaging abilities should soon increase further, as it is currently developing the GeoEye-2 satellite, which is scheduled to launch by 2013. This satellite will have an imaging resolution of 25 centimeters, or approximately 10 inches.
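As a rough sanity check on the orbital figures and unit conversions quoted in this section, the sketch below computes the period and speed of an idealized circular orbit at an altitude of 425 miles and converts the quoted resolutions from centimeters to inches. It assumes a spherical earth with a mean radius of 6,371 km and the standard gravitational parameter; the function names are mine, and the real GeoEye-1 orbit will differ slightly.

```python
import math

MU_EARTH = 3.986004418e14      # earth's gravitational parameter, m^3/s^2
EARTH_RADIUS_M = 6_371_000.0   # mean earth radius (simplifying assumption)
METERS_PER_MILE = 1609.344
CM_PER_INCH = 2.54


def circular_orbit(altitude_miles):
    """Return (period in minutes, speed in miles per second) for a circular orbit."""
    radius_m = EARTH_RADIUS_M + altitude_miles * METERS_PER_MILE
    period_minutes = 2 * math.pi * math.sqrt(radius_m ** 3 / MU_EARTH) / 60
    speed_miles_per_second = math.sqrt(MU_EARTH / radius_m) / METERS_PER_MILE
    return period_minutes, speed_miles_per_second


def cm_to_inches(cm):
    """Convert centimeters to inches."""
    return cm / CM_PER_INCH


period, speed = circular_orbit(425)
print(f"period: {period:.1f} minutes, speed: {speed:.2f} miles per second")
for resolution_cm in (41, 50, 25):
    print(f"{resolution_cm} cm is about {cm_to_inches(resolution_cm):.1f} inches")
```

The idealized orbit comes out close to the quoted 98 minutes and a little above the quoted 4.5 miles per second, so the sourced figures are mutually consistent to within rounding.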

The GeoEye satellites are licensed by the National Oceanic and Atmospheric Administration (NOAA), which is a division of the U.S. Department of Commerce. Given that satellite images could reveal sensitive information from the point of view of national security, NOAA has the right to pull them; however, it has not done so to date. Moreover, Mark Brender, the head of GeoEye, does not feel that NOAA would be able to successfully shutter the cameras on the GeoEye satellites, stating:

In the event that NOAA does cancel service for GeoEye, the news media can contest it as a First Amendment issue, … , because space is non-sovereign. This prohibits “shutter control” by the government.

Thus GeoEye stands to serve non-government information services well into the future.

Ushahidi: Using Social Media to Track Crises

Posted in Reports, technology by Sarah on December 30, 2009

In the field of crisis mapping (see iRevolution for an ongoing discussion of this field), the Ushahidi platform is gaining a strong foothold as an affordable and easy-to-use technology for capturing “distributed” information (that is to say, from multiple and scattered sources) about crisis events and providing a visual representation of the process of the crisis.  Ushahidi accomplishes this by posting incoming information on an on-line interactive map in near real-time as events unfold.  The platform allows users to submit digitally created documentation of events they witness primarily via cell phones (e.g., text messages, photos, or video recordings), but also from computers–basically by any means that allows access to the web and therefore access to a dedicated instance of the Ushahidi platform.  As stated on the Ushahidi webpage:

The Ushahidi Engine is a platform that allows anyone to gather distributed data via SMS, email or web and visualize it on a map or timeline. Our goal is to create the simplest way of aggregating information from the public for use in crisis response.

This tool was originally created to help raise awareness of and mobilize intervention in the post-election violence that erupted in Kenya in January, 2008 but has been further developed so that a range of grassroots efforts can adopt the tool to map events of concern to them.  The “Our Work” page at Ushahidi provides a list of the various organizations that have built the platform into their activism efforts.

Background: Violence in Kenya after Presidential Elections

On December 27, 2007 Kenya’s incumbent president, Mwai Kibaki, was declared the winner of that day’s presidential election.  However, supporters of the Orange Democratic Movement’s candidate, Raila Odinga, contested this decision, claiming election fraud; indeed, according to a New York Times article from January 17, 2008, independent election observers reported that the election was rigged at the last minute to ensure the incumbent’s victory. In response to Kibaki’s swearing-in in January, 2008, violence erupted across Kenya.  At first the violence was related to protests held by Odinga supporters, but it quickly morphed into targeted ethnic violence against the Kikuyu people–the community that Kibaki is from.  In a particularly brutal moment, 50 unarmed Kikuyus were burned in a church on New Year’s Day (warning-there are some graphic images at this link).  All told, in January of 2008, approximately 600 people died and around 600,000 people were displaced.

In response to this situation, Ory Okolloh (a graduate of Harvard Law from Kenya) launched Ushahidi–a platform for tracking events as they unfolded in Kenya. The platform allows citizens who participate in, witness, or become victims of events to post information via SMS to the Ushahidi platform, which then publishes the information on-line and locates the reported event on a Google map in near real-time. Over the course of several months, thousands of text messages, videos, and photographs were submitted to the nascent platform–largely via cell phones. As described by the Ushahidi website:

Ushahidi, which means “testimony” in Swahili, is a website that was initially developed to map reports of violence in Kenya after the post-election fallout at the beginning of 2008. Ushahidi’s roots are in the collaboration of Kenyan citizen journalists during a time of crisis. The website was used to map incidents of violence and peace efforts throughout the country based on reports submitted via the web and mobile phone. This initial deployment of Ushahidi had 45,000 users in Kenya, and was the catalyst for us realizing there was a need for a platform based on it, which could be used by others around the world.

Ushahidi was designed specifically to capitalize on cell phones and mobile access to the web because cell phone use in Kenya at the time was more wide-spread than computer use–largely because there was very little infrastructure of land-based internet access through telephone wires or cables.  The materials collected for this first use of Ushahidi have been archived for future use.

What is Ushahidi?

The Ushahidi platform itself is open source and modifiable so that any person or organization can set it up to meet their particular needs for the visualization of information.  According to the Ushahidi “Our Work” page, the platform consists of a simple mashup that pulls user-generated material into a Google map in order to create an interactive interface that allows viewers to see literally where in the world a particular piece of information was generated or submitted.  This is possible because a mashup is an application that pulls data and functionality from multiple external sources via APIs, or Application Programming Interfaces (see our Web Ecology post for more on APIs), in order to create a new service (see Wikipedia for further details on mashups).
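To make the mashup idea concrete, here is a minimal sketch of the data-preparation half of such an application: it takes reports gathered from different channels and turns them into a GeoJSON FeatureCollection, a common format that web mapping tools can plot. This is not Ushahidi’s actual code or data model; the function name, the report fields, and the two sample reports are hypothetical.

```python
import json


def reports_to_geojson(reports):
    """Convert a list of report dicts into a GeoJSON FeatureCollection."""
    features = []
    for report in reports:
        features.append({
            "type": "Feature",
            # GeoJSON stores coordinates as [longitude, latitude].
            "geometry": {
                "type": "Point",
                "coordinates": [report["lon"], report["lat"]],
            },
            "properties": {
                "title": report["title"],
                "source": report["source"],
                "time": report["time"],
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical reports arriving through two different channels.
sample_reports = [
    {"title": "Road blocked", "source": "sms", "time": "2008-01-02T09:15",
     "lat": -1.2921, "lon": 36.8219},
    {"title": "Peace meeting", "source": "web", "time": "2008-01-03T14:00",
     "lat": -0.0917, "lon": 34.7680},
]

collection = reports_to_geojson(sample_reports)
print(json.dumps(collection, indent=2))
```

The resulting document could then be handed to whichever mapping API the deployment uses; keeping the incoming channel in the `source` property is one simple way to let a map distinguish SMS reports from web submissions.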

According to Ushahidi’s developers (a team of largely volunteer programmers and designers from Africa, Europe, and the United States), the platform needs to be “agnostic,” which is to say, it should be able to work with as many platforms, tools, and devices (e.g., cell phones, cameras, computers) as possible so that organizations can use the platform with whatever technology or materials they already have to hand. To this end, the Ushahidi Lab is constantly working to integrate new devices and platforms into the system as they emerge. For example, the team is currently working on creating a smart phone application for sending and receiving rich data from the Ushahidi platform on iPhones, G-Phones, and other multi-media wireless telephone devices.

Because the goal is to make sure that the Ushahidi platform draws seamlessly from multiple data sources, the developers work to ensure two levels of interoperability: 1) that software applications that already support information-aggregation get incorporated into the platform; and 2) that the out-flow of information from Ushahidi to users can work with platforms for data visualization other than Ushahidi. Thus, the platform currently integrates with a variety of platforms on two levels–data that come into Ushahidi for presentation and data that leave Ushahidi for reading. The platforms that Ushahidi is currently able to draw data from include: Twitter, Jaiku, and Instant Messaging clients of various sorts. Platforms that can read the visual data produced by Ushahidi include: Grip, Many Eyes, GeoCommons, CMS modules (such as Drupal), and blog plug-ins or widgets (e.g., WordPress, Movable Type, Blogger).

One key challenge that the developers have been working on is devising a means of verifying information as it comes into the Ushahidi platform. Currently, verification has to be conducted by a human moderator, but they are working on an automated verification system called “Swift River.” This initiative will help organizations to verify incoming information from a variety of sources, which will help them to deal with and present massive amounts of reliable citizen-generated data in real-time.

Ushahidi Downloads

Ushahidi is freely available to download at their home page–simply click on the “Version 1.0–Mogadishu” download button on the left side of the screen. To install it, the following are needed: server space and someone with some programming skill. The Ushahidi team hopes to have future versions that will be easier for “non-technologists” to use, but for the moment, some minimal setup and tweaking by a programmer are necessary to get Ushahidi running appropriately. Technical advice is available at the website. Other downloads include modules for Android, Java Phones, and Windows Mobile.

MobileActive: Supporting Social Action Through Mobile Technology

Posted in Reviews, technology by Sarah on December 21, 2009

Image courtesy of MobileActive

MobileActive is an organization that seeks to bring together research and user experiences of mobile technology to support social action.  As observed on their “about” page:

Mobile phones are proliferating at astounding rates across socio-economic and cultural boundaries, revolutionizing the way we organize ourselves.

With more than 4.5 billion mobile subscriptions in circulation in 2009, they are found in every corner of the world, used by people to communicate with each other, and access and deliver information and services. These trends are highly promising for NGOs and civil society organizations that can now engage people on issues that matter most — through always-on, always-on-hand devices. MobileActive’s vision is to help organizations make use of the most ubiquitous communications technology in the world with data, tools, and how-to resources; build a network of practitioners and technologists in a supportive community of practice; and highlight and explore the many innovative campaigns and projects — their lessons learned.

The website contains a blog that updates information on a variety of projects, research resources (including an annotated bibliography of print and web resources), and a comprehensive search function. Many of the issues they cover relate to human rights in its many guises–economic rights, crisis prevention or response, gender rights, environment, etc. The group also supports Citizen Media through mobile devices.

MobileActive has a video presence on the web through Vimeo, a site that hosts a number of social impact videos in general and is well worth a search to see what it has. MobileActive has published two videos to date–both available here.

WikiLeaks: Creative Use of Social Media & Internet Security Protocols for Activism

Posted in technology by Sarah on December 3, 2009

573,000 9/11 Text Messages Posted to the Web

Starting on November 25, 2009, an on-line whistle-blowing organization known as WikiLeaks started publishing over 500,000 pager messages that were intercepted in New York and Washington, D.C. on September 11, 2001 during the attacks on the World Trade Center and the Pentagon. The text messages, which are available here, were released “live” for 24 hours starting at 3:00 am on November 25 and ending at 3:00 am November 26, 2009. During this period, the release of each of the 573,000 messages was synchronized with its original send time from September 11 to September 12, 2001. The purpose of releasing these text messages to the public is to foster a deeper understanding of events surrounding the 9/11 attacks by creating an archive that provides, “…a completely objective record of the defining moment of our time,” with the hope that the messages’ inclusion in “…the historical record will lead to a nuanced understanding of how this event led to death, opportunism and war” (WikiLeaks 9/11 Pager Data).
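The “live” release can be pictured as shifting every original timestamp forward by a fixed offset. The sketch below shows that arithmetic only; WikiLeaks has not published how its release was scheduled, and the assumption that the original 24-hour window also began at 3:00 am on September 11, 2001 is inferred from the release window rather than stated. Time zones are ignored, and the names are hypothetical.

```python
from datetime import datetime

# Assumed start of the original window and stated start of the replay.
ORIGINAL_START = datetime(2001, 9, 11, 3, 0, 0)
REPLAY_START = datetime(2009, 11, 25, 3, 0, 0)
OFFSET = REPLAY_START - ORIGINAL_START


def release_time(sent_at):
    """Return when a message sent at `sent_at` would be released in the replay."""
    return sent_at + OFFSET


first_sample = release_time(datetime(2001, 9, 11, 8, 51, 31))
print(first_sample)
```

Under this assumption, a message sent at 8:51:31 am on September 11, 2001 would be released at 8:51:31 am on November 25, 2009.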

WikiLeaks–acting consistently with their mission and purpose (see below)–did not reveal the source of these text messages, stating only that the messages were intercepted from four major US  pager services.  According to a New York Times article written by Jennifer Lee on December 1, 2009, the messages were intercepted by a program that had been monitoring these sorts of communications before September 11, 2001 in order to raise awareness of issues surrounding privacy and data retention.   Texts were intercepted from pagers carried by members of the Pentagon, FBI, FEMA, and the New York Police Department, as well as from messages generated by computers within the World Trade Center reporting on faults in investment firms housed in the twin towers.  Below is a sample of the sorts of messages that circulated between 8:51 am and 10:05 am on the morning of September 11, 2001.  Some relate to mundane daily business matters while others relate to the terrorist events as they unfolded.

Example of leaked text messages released in “real-time” (courtesy of WikiLeaks)

8:51:31 AM|Please call Pentagon Weather|UNCLASSIFIED
           Please call Pentagon Weather.......reference 1030 Meeting.....703-695-0406
           ANDREW J. TERZAKIS, Lt
           AND VESSY.
10:05:57 AM Please don't leave the building. One of the towers just collapsed!
           PLease, please be careful. Repeat,

What is WikiLeaks?

According to the WikiLeaks web page, the organization “…was founded by Chinese dissidents, journalists, mathematicians and startup company technologists, from the US, Taiwan, Europe, Australia and South Africa” (WikiLeaks About).  These individuals came together to create a forum where people can publicly share documents about questionable government decisions, corruption, and human rights abuses with much less personal risk than that associated with traditional whistle-blowing  outlets.

WikiLeaks takes advantage of the wiki collaborative content production platform to support the anonymous release of sensitive documents for public review, thus aiming for a higher level of scrutiny than is typically accomplished by traditional media organizations or intelligence agencies.  Theoretically, the veracity, validity, credibility, and plausibility of the documents are greatly increased by being scrutinized by many more individuals and from many more points of view than is traditionally possible in journalism or intelligence activities.  In a sense, WikiLeaks is the “first intelligence agency of the people” (ibid).  Members of communities from which leaked documents originate can access  them in the wiki format, where they can help interpret them and explain their relevance while maintaining their own anonymity and therefore their safety.    For example, “[i]f a document comes from the Chinese government, the entire Chinese dissident community and diaspora can freely scrutinize and discuss it; if a document arrives from Iran, the entire Farsi community can analyze it and put it in context.  Sample analyses are available here.”  As stated on the WikiLeaks About page:

Wikileaks is a multi-jurisdictional organization to protect internal dissidents, whistleblowers, journalists and bloggers who face legal or other threats related to publishing. Our primary interest is in exposing oppressive regimes in Asia, the former Soviet bloc, Sub-Saharan Africa and the Middle East, but we are of assistance to people of all nations who wish to reveal unethical behavior in their governments and corporations. We aim for maximum political impact. We have received over 1.2 million documents so far from dissident communities and anonymous sources.

We believe that transparency in government activities leads to reduced corruption, better government and stronger democracies. All governments can benefit from increased scrutiny by the world community, as well as their own people. We believe this scrutiny requires information. Historically that information has been costly – in terms of human life and human rights. But with technological advances – the internet, and cryptography – the risks of conveying important information can be lowered.

WikiLeaks has combined a number of technologies to create an uncensorable version of Wikipedia (though there is no legal relationship with Wikipedia).  The technologies they draw from include MediaWiki, OpenSSL (an open source implementation of Secure Sockets Layer, a cryptographic protocol for internet security), FreeNet (a decentralized, anonymous information distribution network), Tor (an onion router for secure message transmission; see the Tor post on this blog), and PGP (Pretty Good Privacy, a cryptographic privacy and authentication program).  By combining these technologies with their own in-house programming, WikiLeaks is able to provide a space for untraceable document leaking and commentary that “combines the protection and anonymity of cutting-edge cryptographic technologies with the transparency and simplicity of a wiki interface” (WikiLeaks About).

Other Users of Social Media Build on WikiLeaks’s 9/11 Text Messages

The 9/11 text messages release of November 25, 2009 is just one example of the sort of information that can be acquired and shared when there is a safe forum for anonymous document circulation.  True to WikiLeaks’s stated goal of providing a means by which members of the public can comment and elaborate upon the documents released on their platform, at least two researchers have used the internet to begin to analyze and make sense of the content of the text messages (see Jennifer Lee’s Times article for an overview of these projects).

On his blog “Neoformix: Discovering and Illustrating Patterns in Data,” Jeff Clark has created and published a video (available in his blog post) of a “Phrase Burst Visualization” of the waxing and waning of 100 representative (as determined by Mr. Clark himself) words or phrases from the 9/11 messages over the course of the 24 hours captured in the sample provided to WikiLeaks.  In the visualization, words and phrases emerge and recede from view such that “[t]he larger the text the more frequently it was used during the 12 hour period[analyzed]. Text appears bright[ly] during the times of high usage and fades away otherwise. […] This phrase burst visualization is basically a word cloud where the brightness of the words varies according to how prominent the words were during specific periods of time.”  He has also provided a time-line graphic measuring the peak of each of the 100 terms between 8:00 am and 8:00 pm on September 11, 2001.
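The counting that sits behind a visualization of this kind can be sketched in a few lines of Python. This is only an illustration of the general idea (how often a term appears in each hour), not Mr. Clark's actual method; the function names and the sample messages are invented for the example.

```python
from collections import Counter, defaultdict
from datetime import datetime

def hourly_term_counts(messages, terms):
    """Count how often each term appears in each hour of the day."""
    counts = defaultdict(Counter)  # term -> Counter mapping hour -> occurrences
    for timestamp, text in messages:
        lowered = text.lower()
        for term in terms:
            hits = lowered.count(term.lower())
            if hits:
                counts[term][timestamp.hour] += hits
    return counts

def peak_hour(counts, term):
    """Return the hour in which a term was used most often, or None."""
    if not counts[term]:
        return None
    return counts[term].most_common(1)[0][0]

# Invented sample data for illustration only (not taken from the archive).
sample = [
    (datetime(2001, 9, 11, 9, 5), "please call home"),
    (datetime(2001, 9, 11, 9, 40), "call me when you can, call home"),
    (datetime(2001, 9, 11, 13, 15), "are you ok"),
]
counts = hourly_term_counts(sample, ["call", "ok"])
```

A display layer could then map each hourly count to text brightness and each total to text size, which is roughly what the quoted description suggests.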

Colin Keigher, another programmer, has created a searchable database of the WikiLeaks 9/11 text messages that allows users to search by keywords or phrases, as well as run Boolean searches.  The tool makes it easier to find specific information than it is in the 40-megabyte file housed at the WikiLeaks site, where a user would need to read through the messages as presented.  The 9/11 Page Search Index, as the tool is called, is available on-line in a no-frills presentation of the search field with almost no explanation of what it searches or why it was created, but it is an interesting example of how individuals can engage with the materials presented at WikiLeaks.
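For readers curious about how keyword and Boolean search over a pile of messages can work, here is a minimal sketch using an inverted index (a map from each word to the messages that contain it). It is an assumption about the general technique, not a description of how Mr. Keigher's tool is built, and the sample messages are invented.

```python
import re
from collections import defaultdict

def build_index(messages):
    """Map each lowercase word to the set of message ids containing it."""
    index = defaultdict(set)
    for msg_id, text in enumerate(messages):
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(msg_id)
    return index

def search_all(index, *words):
    """Boolean AND: ids of messages containing every word."""
    sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()

def search_any(index, *words):
    """Boolean OR: ids of messages containing at least one word."""
    return set().union(*(index.get(w.lower(), set()) for w in words))

# Invented sample messages for illustration only.
messages = [
    "Please call the office",
    "Call home when you can",
    "Office closed today",
]
index = build_index(messages)
```

With this index, `search_all(index, "call", "office")` finds only the first message, while `search_any(index, "home", "closed")` finds the second and third.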

GLIFOS-Media: Rich Media Archiving

Posted in Archiving Solutions, Reviews, technology by Sarah on November 19, 2009

Rich-media preservation

As posted on November 4, 2009, The University of Texas Libraries Human Rights Documentation Initiative (HRDI) has been working with the Kigali Genocide Memorial Centre in Rwanda on a pilot digital archiving program that takes advantage of a rich media platform called GLIFOS media.  GLIFOS provides a social media tool kit that was originally created to meet the needs of a distance learning program at the Universidad Francisco Marroquín in Guatemala, but it also shows promise as a tool for human rights archiving (see the article “Non-custodial archiving: U Texas and Kigali Memorial Center” at WITNESS Media Archive).  As a rich-media wiki, GLIFOS is designed to integrate digital video, audio, text, and image documents through a process that “automates the production, cataloguing, digital preservation, access, and delivery of rich-media over diverse data transport platforms and presentation devices.”  GLIFOS media accomplishes this by presenting related documents (for example, video of a lecture, a transcript of the same, and associated PowerPoint slides) in a synchronized fashion, such that when a user highlights a particular segment of a transcript, the program locates and plays the corresponding segment of the video and also locates the related PowerPoint slide. This ability to seamlessly synchronize and present related digital media translates well to the human rights context by allowing for the cataloging and integration of video material, documents containing testimonies, photographs, and transcripts.  Materials that all relate to a single event can be pulled together and presented in a holistic fashion, which is useful for activism and scholarship.

GML: The Key to Preservation

In order to support the presentation of this integrated information for users, GLIFOS needed to ensure that materials can be read and accessed across existing digital presentation platforms (e.g., web browsers, DVDs, CDs) and readers (e.g., PCs or PDAs), as well as on platforms yet to be created (see “XML Saves the Day,” an article written by the developers in 2005, for more detail).  This was accomplished by indexing and annotating all digital documents stored in the GLIFOS repository with an XML-based language called the “GLIFOS Markup Language,” or GML.  The claim is that “GML is technology, platform, and format independent” (ibid), thus allowing for preservation of established relationships between materials.  Basically, GML allows users of GLIFOS to create a metafile that defines the relationships between related multi-media records held in a repository in such a way that those relationships are maintained across a variety of media reading platforms.  This is possible because GML is a significantly stripped-down markup language that requires little or no translation from one reader to the next, so content is preserved as technology changes and evolves.
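To make the idea of a relationship metafile more concrete, here is a small Python sketch. The XML below is NOT real GML: the element and attribute names (`record`, `segment`, `start`, `end`, `transcript`, `slide`) are invented purely for illustration. The sketch only shows how a plain XML file could tie a stretch of video to a transcript passage and a slide, so that any reader able to parse XML can recover the links.

```python
import xml.etree.ElementTree as ET

# Hypothetical metafile: names are invented for this sketch and are
# not taken from the GML specification.
METAFILE = """
<record title="Sample lecture">
  <segment start="0" end="60" transcript="Introduction" slide="1"/>
  <segment start="60" end="150" transcript="Main argument" slide="2"/>
  <segment start="150" end="200" transcript="Closing remarks" slide="3"/>
</record>
"""

def load_segments(xml_text):
    """Parse the metafile into a list of segment dictionaries."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "start": int(seg.get("start")),
            "end": int(seg.get("end")),
            "transcript": seg.get("transcript"),
            "slide": int(seg.get("slide")),
        }
        for seg in root.iter("segment")
    ]

def segment_at(segments, seconds):
    """Find the segment that covers a given video time, or None."""
    for seg in segments:
        if seg["start"] <= seconds < seg["end"]:
            return seg
    return None

def seek_for_transcript(segments, phrase):
    """Find the video start time for a transcript passage, or None."""
    for seg in segments:
        if seg["transcript"] == phrase:
            return seg["start"]
    return None

segments = load_segments(METAFILE)
```

Here, highlighting "Closing remarks" would send the player to second 150, and a viewer 75 seconds into the video would be shown slide 2. Because the links live in plain text rather than in any one player, they can be carried over to a new presentation platform.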

GLIFOS and Human Rights Documentation

Given that GLIFOS is designed to catalog, index, and synchronize a wide variety of digital media types, it is a promising tool for digital archiving. GML allows the program to access and present cataloged materials through the meta-relationships it establishes between records; and because GML is a streamlined markup language that multiple platforms can read, these relationships have been successfully maintained when migrated to entirely new data reading and presentation platforms.  Two conditions apply: the repository of documents that GLIFOS accesses must remain intact, both in terms of the materials stored there and their associated metadata, and new media platforms must continue to read older video and image media files.  As long as both hold, GML aids preservation by providing a means of cataloging and indexing documents and by preserving, over time, the synchronized links and interactions that GLIFOS establishes between related documents.



UT-Austin Library Web Clipper: Follow Up Questions

Posted in Archiving Solutions, technology by Sarah on November 10, 2009



A couple of weeks ago, Kevin Wood (University of Texas Libraries at Austin) and I posted an article with the title “Archiving Web Pages: UT-Austin Library’s Web Clipper,” where we described an innovative solution to capturing and preserving fragile human rights material from the World Wide Web.  The post generated a number of interesting questions, so we have decided to post this follow up in a Q&A style to provide additional information on how the Web Clipper works.  Special thanks again to Kevin for taking the time to craft answers to these questions.  Please do not hesitate to contact me with more questions if you have them.  We will be writing updates on the Web Clipper progress as Kevin and his team continue to develop it and will do our best to answer your questions here as we do so.  –Sarah

The UT Libraries’ Web Clipper

As part of a Bridgeway-funded initiative, the University of Texas Libraries at Austin is engaged in a project to develop a means of harvesting and preserving fragile or endangered Web materials related to human rights violations and genocide.  Having tried a number of available technologies for harvesting Web material and found them unsatisfactory, a team of developers created an in-house Web Clipper program designed to meet the libraries’ specific needs for preserving Web material.  A full description of the Web Clipper is available here.  What follows is a series of responses to questions generated from the first post about the Web Clipper.

Q1: When the clipper clips, does it save the file in the original formats (e.g., html, with all the associated files)?

A: Yes, to the extent possible.  There are challenges with JavaScript and streaming media that we are still working on with the new clipper.  In those cases we rely on attachments (see the answer to question 2 below).  Before designing the new Web Clipper, we’d gone through a few different clipping strategies and were not pleased with any.  Zotero does a good job of capturing what you see, but makes modifications to the files, thus complicating preservation.  Placing Firefox behind a proxy captures a lot, but misses content that relies on user interactions if those interactions don’t occur.  Heritrix does the best job, but we’ve seen it struggle with more than 10% of the pages that have been clipped.

Q2: Are there limitations on what the Web Clipper can and cannot capture?

A: There are limitations to what our new Web Clipper can automatically capture, but it has the ability to accept attachments.  Extensions like DownloadHelper (a free Firefox extension for downloading and converting videos from many sites with minimum effort) can turn a streaming video into a file that can then be attached to a clipping.  The final format of the attachment depends on the tool used to create it, but generally matches the original.

Q3: Are the graduate research assistants who are testing the Clipper capturing multiple instances of the same site over time, or are these one-off?

A: Each capture is a one-off.  The Web Clipper allows users to dive deeper into sites and capture individual pages rather than whole sites (sometimes a site that wouldn’t normally carry relevant human rights information has an article or blog post that we want to preserve).  Where one might use tools such as Archive-It, WAS, WAX or Web Curator Tool to capture an entire blog, one uses the Web Clipper to capture and describe a single blog post or article, for example.

Q4: When the clipped files are submitted to The University of Texas Libraries’ DSpace (the local repository), is the submission  process simple? That is, is there an automated process created?

A: Yes, this process is automated.  We use SWORD (Simple Web-service Offering Repository Deposit) as the interface between the Web Clipper and DSpace for ingestion.  A script runs periodically, identifies new clippings and pushes them into the repository.
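As a rough illustration of what such a periodic script might do, the Python sketch below picks out clippings created since the last run and describes the deposit request for each one. It is not the UT Libraries' script. The record fields, the collection URL, and the function names are invented, and the headers are an assumption based on SWORD 1.3 conventions for zipped packages; they should be checked against the repository's own service document. Nothing is actually sent over the network.

```python
from datetime import datetime

def select_new_clippings(clippings, last_run):
    """Return the clippings created after the previous run of the script."""
    return [c for c in clippings if c["created"] > last_run]

def build_deposit(clipping, collection_url):
    """Describe the HTTP POST that would deposit one packaged clipping."""
    return {
        "method": "POST",
        "url": collection_url,
        "headers": {
            # Assumed SWORD 1.3 style headers for a zipped METS package.
            "Content-Type": "application/zip",
            "Content-Disposition": f"filename={clipping['id']}.zip",
            "X-Packaging": "http://purl.org/net/sword-types/METSDSpaceSIP",
        },
        "package": clipping["package_path"],
    }

# Hypothetical records and collection URL, for illustration only.
clippings = [
    {"id": "clip-001", "created": datetime(2009, 11, 1, 8, 0),
     "package_path": "/tmp/clip-001.zip"},
    {"id": "clip-002", "created": datetime(2009, 11, 9, 16, 30),
     "package_path": "/tmp/clip-002.zip"},
]
last_run = datetime(2009, 11, 5, 0, 0)
collection_url = "https://repository.example.edu/sword/deposit/123456789/1"

pending = [build_deposit(c, collection_url)
           for c in select_new_clippings(clippings, last_run)]
```

A real script would also need to send each request with credentials, record the time of the run, and handle failed deposits so that a clipping is neither lost nor pushed twice.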

Q5: Regarding the use of a local Wayback machine for preserving the clipped materials: Are you capturing clipped material via Wayback in addition to DSpace, or is this all the same process with just one instance of the preserved site? If the latter, how does one set up a local Wayback version?

A: There is only one instance of the preserved site.  The repository contains a link out to the Wayback Machine, not the preserved clipping itself.  The link allows a user to open the original record in the DSpace repository.  Although we could store ARC files (the Internet Archive’s container format for archived web content) in the repository, they wouldn’t be of much use to our users as such, so we’re only exposing the content through a local Wayback instance.  We use the open source version of the Wayback Machine.
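A link out to a Wayback instance typically follows a simple pattern: the base address of the Wayback server, a 14-digit capture timestamp, and then the original URL. The Python sketch below builds such a link. The host name and the example page are hypothetical, and the exact path prefix depends on how a local Wayback installation is configured.

```python
from datetime import datetime

def wayback_url(base_url, captured_at, original_url):
    """Build a replay link: base address, 14-digit timestamp, original URL."""
    timestamp = captured_at.strftime("%Y%m%d%H%M%S")
    return f"{base_url.rstrip('/')}/{timestamp}/{original_url}"

# Hypothetical local Wayback address and clipped page.
link = wayback_url(
    "http://wayback.example.edu/wayback/",
    datetime(2009, 11, 10, 14, 30, 0),
    "http://example.org/blog/post-1",
)
```

A repository record could store a link of this kind alongside the clipping's descriptive metadata, leaving the archived files themselves to the Wayback server.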

Q6: Is access to the clipped documents restricted, or are they open to everyone via UT Libraries’ digital repository? Are there any privacy or confidentiality issues associated with the clipped material?

A: The clippings will be open to everyone, but while we’re in development they’re restricted.  We haven’t seen any privacy or confidentiality issues with our clipped material.  All of the clippings come from the public web.