old problems, bigger data

The concept of big data has become popular in recent years, with many books and articles written, master's degrees offered, and even art exhibitions exploring the subject. It is associated with the feeling of information overload.

The feeling of being overwhelmed by information is not new to humankind. The overseers of the Great Library of Alexandria probably shared this feeling while attempting to collect all of the world’s knowledge in one place, confiscating every text that passed through their port in order to make a copy of it before it could be lost to them. Paul Otlet, one of the fathers of information science, described the problem in a 1903 article:

“Today, there exist collections of books comprising more than two million volumes and whose annual accessions are more than one hundred thousand volumes. They have had to come to grips with quite new problems arising, on the one hand from difficulties of storage, classification and circulation of such tremendous masses of materials situated in the centres of large cities, and on the other hand, from new ideas within the research community about what it should be able to gain from such resources.”

Paul Otlet (quoted in Day 2014, pp. 17-18)

Finding ways of collecting and organising has been our way of dealing with a wealth of information for almost as long as we have been translating our thoughts, activities, and communications into documents.

“The first library to contain all knowledge”: Ashurbanipal’s Library at Nineveh, 7th century BC

So why is big data such a talked-about issue these days? The so-called information age in which we are living has made the feeling of information overload a reality on a much larger scale. Computers and network technologies have given us the ability to record so much data every day that it cannot easily be managed by traditional means.

“Every day, enough new data are being generated to fill all US libraries eight times over.”

(Floridi 2014, p. 13)

Floridi calls ours a hyperhistorical society, meaning one which has become mostly dependent on ICTs for human progress and welfare (with at least 70% of a country’s GDP being dependent on intangible, information-related goods rather than material goods). It has become extremely difficult to function in society without being a part of the network and constantly allowing data to be collected about your communications and movements. I recently bought a coffee cup, thinking I would reduce my use of disposable cups and save the planet, only to discover that it has the ability to pay for my coffee and collect loyalty points. Even coffee is not safe from the internet of things!

Why is big data relevant to Library and Information Science? In our class discussions about data, we identified many ways in which it is relevant to LIS. The skills involved in information management have never been more important than now, when we are surrounded by so much data and information every day. Much of this is stored in formats which may soon become obsolete or fall prey to plastic rot. The vast numbers of constantly updating and changing websites and social media pages of the Web 2.0 era present problems of preservation and curation. As information professionals, are we responsible for this?

We also collect data on library users and their use of our resources. This can be helpful in improving services, but privacy is an important issue to consider. Last year, librarians in Japan were outraged when a newspaper published novelist Haruki Murakami’s borrowing record from his school library. These were paper records that someone happened to stumble across, but we can imagine how much more information could be released about a person’s reading habits if their digital user record or their browser history were made public.

In the age of email, I wonder if the days of publishing The Letters of [significant person] are gone forever, or if biographers of the near future will develop ways of collecting and editing emails and social media posts. Perhaps people will curate and archive these things themselves in order to leave behind an image that they are happy with.

 

Day, R. (2014) Indexing It All: The Subject in the Age of Documentation, Information, and Data. Cambridge, MA: MIT Press.

Floridi, L. (2014) The Fourth Revolution: How the Infosphere is Reshaping Human Reality. Oxford: Oxford University Press.