In a recent tweet, Professor David Lankes asked a seemingly easy question:
And he got quite a few responses:
There are quite a few more responses, but you get the drift: librarians don’t have a common definition of information in practice. Which is weird, given the primacy of information in librarianship. But it’s entirely understandable. ‘Information’ is a tricky word, and the responses to Lankes’s tweet further underscore that librarians mean all sorts of different (sometimes even contradictory) things by it. But I don’t think it has to be that way, and I’d like to recommend Luciano Floridi’s Information: A Very Short Introduction (Oxford: Oxford Univ. Press, 2010. ISBN: 9780199551378) as essential reading for librarians interested in the concept of information (for a much abbreviated version, see the Stanford Encyclopedia of Philosophy entry “Semantic Conceptions of Information”).
The semantic conception of information
Luciano Floridi is sort of the architect of the philosophy of information and his Information: A Very Short Introduction is a great starting point for librarians interested in an account of information that coheres with the information types and processes we deal in. This rather slim, pocket-sized book is accessible to information novices, though the implications of Floridi’s semantic approach to information are relevant to library professionals at any level. Building off of an entry in the SEP, Information provides a “map of the main senses in which one may speak of information” (p. 2).
Chapter 1 discusses the nature of our current “information revolution”, defined as a “process of dislocation and reassessment of our fundamental nature and role in the universe” sparked by information and communication technologies (p. 12). And though Floridi isn’t naively idealistic like the more popular information technology pundits (e.g., Kurzweil, Shirky, Vinge, etc.), the chapter is still a bit of a diversion from the meat of the book: mapping the meaning of information. Chapter 2 is where you’ll find the conceptual heart of the text, and though it addresses several core concepts in information theory, I’ll just cut to the chase: here’s the general definition of information (GDI), presented on page 21:
σ is an instance of information, understood as semantic content, if and only if:
(GDI.1) σ consists of n data, for n ≥ 1;
(GDI.2) the data are well-formed;
(GDI.3) the well-formed data are meaningful.
Put another way,
Information is well-formed, meaningful data.
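If it helps to see the three clauses as a checklist, here is a minimal sketch in Python. It is my own toy illustration, not anything from Floridi’s book: the function names, the made-up “call number” syntax, and the two-entry vocabulary are all assumptions chosen to keep the example small.

```python
from typing import Callable, Sequence


def is_semantic_information(
    data: Sequence[str],
    is_well_formed: Callable[[Sequence[str]], bool],
    is_meaningful: Callable[[Sequence[str]], bool],
) -> bool:
    """Return True only when all three GDI clauses hold for `data`."""
    has_data = len(data) >= 1  # GDI.1: n data, for n >= 1
    # GDI.2 (syntax) and GDI.3 (semantics) are supplied by the chosen system.
    return has_data and is_well_formed(data) and is_meaningful(data)


# Hypothetical toy system: a "call number" is one class letter plus digits,
# and only the letters listed here have a meaning assigned to them.
VOCABULARY = {"Q": "Science", "Z": "Bibliography and library science"}


def well_formed(data: Sequence[str]) -> bool:
    """Syntax check: each datum is a letter followed by one or more digits."""
    return all(len(d) >= 2 and d[0].isalpha() and d[1:].isdigit() for d in data)


def meaningful(data: Sequence[str]) -> bool:
    """Semantics check: the class letter must be in the toy vocabulary."""
    return all(d[0] in VOCABULARY for d in data)


if __name__ == "__main__":
    for candidate in (["Z665"], ["665Z"], ["X123"], []):
        verdict = is_semantic_information(candidate, well_formed, meaningful)
        print(candidate, verdict)
```

The point of passing the syntax and semantics checks in as arguments is that GDI itself says nothing about which system, code, or language is in play; swap in different rules and the same three-part test still applies.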
That information is a species of data is generally uncontroversial, though it’s helpful to adopt a coherent definition of data. Floridi provides a diaphoric one: a datum is a difference or lack of conformity within some context (p. 23). You’ll probably note that this is a variation on MacKay’s (1969) “distinction that makes a difference” or Bateson’s (1972) “difference which makes a difference.” Really, though, it’s the ideas of well-formedness and meaningfulness that set GDI apart from the more technical conceptions common in electrical engineering. Floridi explains that to say that data are well-formed is just to say that “the data are rightly put together, according to the rules (syntax) that govern the chosen system, code or language being used” (pp. 20-21). And meaningfulness entails that “the data must comply with the meanings (semantics) of the chosen system, code or language in question” (p. 21), keeping in mind that semantic information is not necessarily linguistic (e.g., images can be meaningful). In fact, Floridi points out that GDI entails that “the actual format, medium and language in which data, and hence information, are encoded is often irrelevant and disregardable” (p. 25). This result should be of particular interest to librarians, especially given the increasingly complicated and competitive world of information resources in our purview.
The remainder of Chapter 2 analyzes several key concepts and distinctions, including analogue and digital data, binary data, and the various types of data and information that fit GDI. The latter discussion should be especially enlightening for librarians. You see, data come in a few varieties: primary data, secondary data, metadata, operational data, and derivative data. Primary data are “the principal data stored in a database” or document (p. 30). Secondary data are “the converse of primary data, constituted by their absence” (p. 30). Metadata are “indications about some other (usually primary) data” (p. 31). Operational data are “data regarding the operations of the whole data system” (p. 31). And derivative data are “data that can be extracted from some other data” through inference, deduction, or similar means (p. 31). It follows that we can describe semantic information in much the same way: primary information, secondary information, and so on. I highly recommend that we librarians pay close attention to these distinctions. In particular, the distinction between primary data and secondary (and derivative) data can help make sense of the crucial distinction between something being information and something being informative. For example, in a series of blog comments on 3D printing (Hugh Rundle vs. David Lankes), the question was raised as to whether the plastic doodads created on a Makerbot are information and, if so, whether 3D printing is relevant to libraries. It should be clear that the 3D printed objects are not themselves primary information, though they do transmit secondary or derivative information. Whether libraries should be tasked with stewardship of all forms of information, or whether they should limit their domain to, say, primary data and metadata, is an open question and a clear professional dividing line.
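To make the five data types above a bit more concrete, here is a rough sketch in Python. The enum descriptions paraphrase the definitions quoted above; everything else (the library-flavored examples, the `busiest_month` helper, the sample checkout dates) is my own hypothetical illustration and not something Floridi offers. The helper shows the one type that is easiest to demonstrate in code: derivative data, extracted from other data by looking for a pattern.

```python
from collections import Counter
from enum import Enum


class DataType(Enum):
    PRIMARY = "the principal data stored in a database or document"
    SECONDARY = "the converse of primary data, constituted by their absence"
    META = "indications about some other (usually primary) data"
    OPERATIONAL = "data regarding the operations of the whole data system"
    DERIVATIVE = "data that can be extracted from some other data"


# Hypothetical library examples, tagged by type (my reading, not Floridi's).
EXAMPLES = {
    "full text of a digitized pamphlet": DataType.PRIMARY,
    "a title with no checkouts at all": DataType.SECONDARY,
    "file format and shelf location of an item": DataType.META,
    "how long the catalog index takes to rebuild": DataType.OPERATIONAL,
    "borrowing patterns inferred from checkout dates": DataType.DERIVATIVE,
}


def busiest_month(checkout_dates: list[str]) -> str:
    """Derive the most common YYYY-MM from ISO dates (YYYY-MM-DD).

    Assumes a non-empty list; ties go to the month seen first.
    """
    months = Counter(date[:7] for date in checkout_dates)
    return months.most_common(1)[0][0]


if __name__ == "__main__":
    log = ["2013-01-05", "2013-03-11", "2013-03-20"]  # made-up checkout log
    print(busiest_month(log))
    for example, kind in EXAMPLES.items():
        print(f"{kind.name:<12} {example}")
```

Nothing in the checkout log says “March was the busiest month”; that claim only exists once someone goes looking for it, which is roughly what makes it derivative rather than primary.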
The rest of the book
Whew! That’s a lot of theory. But the book keeps on trucking. Chapter 3 discusses non-semantic conceptions of information by way of discussing Shannon’s Mathematical Theory of Communication (which, by the way, is probably the most important paper in the history of information theory and shame on you if you haven’t read it!). Chapter 4 discusses various constraints and affordances of semantic information. Floridi raises the important question of whether semantic information is necessarily true, discusses degrees of informativeness, Hintikka’s (1973) “scandal of deduction”, and the Bar-Hillel-Carnap Paradox (1953). Whether information is necessarily true is a particularly interesting concern for librarians interested in information literacy, where evaluation plays a prominent role. Likewise, defining semantic information as well-formed, meaningful, and true data can help to make sense of misinformation and disinformation. Chapters 5-7 address physical, biological, and economic information as notable subsets of semantic information. Chapter 8 concludes the text with an overview of the ethics of information and, in a short epilogue, Floridi seems to advocate for treating information ethics as a form of “holistic environmentalism” (p. 119).
Though ostensibly a book about information in general, Information is really an argument for the relevance of the concept of semantic information. Floridi’s overarching division between semantic and non-semantic (i.e., Shannon) information is best laid out by analogy:
[T]he difference between information in Shannon’s sense and semantic information is comparable to the difference between a Newtonian description of the physical laws describing the dynamics of a tennis game and the description of the same game as a Wimbledon final by a commentator. (p. 48)
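A quick way to see what the analogy is getting at: the sketch below computes a simple character-frequency entropy in Python. This is a simplified, empirical stand-in for Shannon’s measure (it treats each character as an independent symbol, which is my assumption, not a claim about Shannon’s full source model), and the example sentences are my own. Two messages built from exactly the same characters get the same score, even though they mean very different things.

```python
import math
from collections import Counter


def shannon_entropy(message: str) -> float:
    """Average bits per character, estimated from character frequencies."""
    counts = Counter(message)
    total = len(message)
    # Sorting the counts keeps the summation order independent of the
    # order in which characters happen to appear in the message.
    return -sum(
        (count / total) * math.log2(count / total)
        for count in sorted(counts.values())
    )


if __name__ == "__main__":
    first = "dog bites man"
    second = "man bites dog"
    # Same characters, same frequencies, so the measure cannot tell them apart.
    print(shannon_entropy(first), shannon_entropy(second))
```

The measure is doing its job perfectly well here; it just isn’t in the business of telling the two headlines apart. That’s the commentator’s job.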
Picking the “right” information
So, there are a lot of competing definitions of ‘information’ out there. Yet, as Losee (1997) explains, “most definitions of information refer only to the subset of information as studied in that particular discipline” (p. 254). So, what a librarian means by information and what an electrical engineer means by information are usually very different things. And both are quite different from the necessarily imprecise colloquial use of information. But, there’s nothing wrong with polysemy. Likewise, there’s nothing wrong with imprecision in ordinary language: we have meaningful conversations about information all of the time and we don’t act like nit-picky trolls or pedantic jerks about it. Pieter Adriaans (2012) offers a helpful analogy:
The situation that seems to emerge [with the concept of information] is not unlike the concept of energy: there are various formal sub-theories about energy (kinetic, potential, electrical, chemical, nuclear) with well-defined transformations between them. Apart from that, the term ‘energy’ is used loosely in colloquial speech.
Anyway, what we need is a conception of information that addresses the types of information and information processes most relevant to the practice of librarianship. I don’t want to go down the rabbit hole of “what is librarianship”, so let’s just consider the normal information types to be documents in the functional sense (à la Paul Otlet or Suzanne Briet) and normal information processes to involve things like archiving, organizing, accessing, and preserving said documents, keeping in mind that documents are not necessarily physical and not necessarily linguistic. Broadly, a document is “any material basis for extending our knowledge” (Schürmeyer, 1935, quoted in Buckland, 1997). For more on functional documentation, see Michael Buckland’s 1997 “What is a ‘document’?”
Picking the “right” information for library science means picking a conception of information that comports with documents and related processes. This entails that we need a conception that is concerned with meaningfulness and with knowledge (cf. Schürmeyer). Non-semantic approaches like Shannon’s are useful for engineers and computer scientists, but they are inapplicable to library science insofar as they are concerned with signal transfer and computability, rather than meaningfulness. Basically, if things like documents, learning, knowledge, or meaningfulness are relevant to libraries and librarians, we need a conception of information that addresses meaning…and that’s the semantic conception. Thus, as an outline of semantic information, Floridi’s book is essential reading in the philosophy of LIS and I urge you to pick up a copy. [And just as a reminder, you can read an abbreviated version of Floridi’s book in the SEP: Semantic Conceptions of Information]
Stuff I cited
Adriaans, Pieter. “Information.” The Stanford Encyclopedia of Philosophy (2012). http://plato.stanford.edu/entries/information/
Bar-Hillel, Yehoshua and Rudolf Carnap. “Semantic Information.” The British Journal for the Philosophy of Science 4 (1953): 147-157.
Bateson, Gregory. Steps to an Ecology of Mind. New York: Ballantine Books, 1972.
Buckland, Michael. “What is a ‘Document’?” Journal of the American Society for Information Science 48 (1997): 804-809.
Floridi, Luciano. Information: A Very Short Introduction. Oxford: Oxford University Press, 2010.
Hintikka, Jaakko. Logic, Language-Games and Information: Kantian Themes in the Philosophy of Logic. Oxford: Clarendon Press, 1973.
Losee, Robert M. “A Discipline Independent Definition of Information.” Journal of the American Society for Information Science 48 (1997): 254-269.
MacKay, Donald M. Information, Mechanism and Meaning. Cambridge: MIT Press, 1969.
Shannon, Claude. “A Mathematical Theory of Communication.” The Bell System Technical Journal 27 (1948): 379-423, 623-656.