The Library of Congress is still working on plans to create a
searchable archive of nearly every public tweet ever sent, but the
challenges inherent in that task are making it a slow process.
That is understandable given the substantial growth in tweets in
recent years; the LoC is essentially trying to tame a very rapidly
growing dataset.
If it ever happens, a searchable archive of tweets could prove
valuable to researchers, analysts, marketers and others. You can imagine
brands wanting to search for Twitter trends surrounding major
product/service announcements, or researchers looking for Twitter
activity surrounding major world events.
On Friday, Gayle Osterberg, the Library’s Director of Communications,
announced that the LoC is now getting about 500 million tweets per day,
up from about 140 million when the project began in February 2011. She spelled out some of the challenges that the project poses:
Currently, executing a single search of just the fixed
2006-2010 archive on the Library’s systems could take 24 hours. This is
an inadequate situation in which to begin offering access to
researchers, as it so severely limits the number of possible searches.
The Library has assessed existing software and hardware
solutions that divide and simultaneously search large data sets to
reduce search time, so-called “distributed and parallel computing”. To
achieve a significant reduction of search time, however, would require
an extensive infrastructure of hundreds if not thousands of servers.
This is cost-prohibitive and impractical for a public institution.
In a Washington Post article,
Deputy Librarian of Congress Robert Dizard Jr. says the collection will
eventually be made available only within the Library itself so that its
archive doesn’t compete with commercial services that offer Twitter
archives — that’s part of the agreement with Twitter.
But, as Gary Price said on INFOdocket, it doesn’t sound like any of that will happen anytime soon.
Twitter itself recently began letting users download their own tweet history, but the company doesn’t appear to have any plans to offer a historical search engine of its own.
Pawan Kumar