Abstract. Could Extensible Markup Language (XML), in itself a minor technical innovation, be the harbinger of an upcoming revolution in man's communication and, even further, cognition? The author believes so, and, to make his point, begins with a short survey of man's cognitive history to date, as seen by today's major anthropologists. He then proceeds to analyze what he sees as the ongoing transition from a culture based on externalized symbols to one based on externalized intelligence. Some speculations on the possible concrete impacts of XML on information gathering/transmission and consumption are finally presented.
According to anthropologist Edward T. Hall1, man shares with very few other animal species (such as the bowerbird) the capacity to develop extensions. Thus we have substituted for a very slow biological evolution a much faster evolution through our extensions.
The first extensions of man have been, of course, his tools. But, as professor Merlin W. Donald duly stresses in his remarkable book Origins of the Modern Mind2, it is the acquisition of language that marked the first considerable step in human cognitive evolution3. Through language, each individual human was no longer a prisoner of his own brain. He could exchange much more information with other individuals. And, still more important, he could transmit it to younger generations.
Before language emerged, men had, of course, like other animal species, already been able to exchange a limited amount of information. But this information was essentially of a concrete, immediate nature (such as mimicry signaling the approach of a predator or indicating a source of food). With language, an intermediate, abstract medium was created—symbols, which initially were meant to represent concrete events but, in fact, acquired quite a life of their own, becoming, to a large extent, independent of the concrete objects they were intended to represent4.
The second major extension of man has been writing. With writing, man became able not only to transmit information from person to person in close contact, but also to store it externally. Hence messages were no longer a fleeting phenomenon, requiring the simultaneous presence of an emitter and a receiver, but could be, as it were, solidified, acquiring durability and fixity. They could be transmitted in space and time with almost perfect accuracy.
That the emergence of writing has had a tremendous impact on man's civilization (a "major cognitive re-organization", in Donald's words) has been fairly well demonstrated, the seminal work on this subject remaining Walter J. Ong's remarkable Orality and Literacy5. According to Ong and others, the Greek invention of the alphabet6 might well be the predominant cause of the so-called "Greek miracle"—the classical Greek thinkers being the first humans not merely to use alphabetical writing, but to internalize it, the first ones to think in writing, with all the new powers of abstraction that this opened.
Donald's stages of cognitive evolution may be summarized as follows:

Episodic culture (apes) | mammalian cognition, based on event perception
First transition (Homo erectus) | mimetic skill and auto-cueing
Mimetic culture | based on learned action
Second transition (Homo sapiens) | lexical invention
Mythic culture | based on oral language and narrative thought
Third transition (invention of writing) | externalization of memory (exograms)
Theoretic culture | based on a symbiosis with external symbols
The invention of the printing press was another—albeit minor—revolution7: messages could now be duplicated at very low cost, thus acquiring still more survivability (manuscripts have so often been lost) and fixity. And more than anything else, printing permitted an enormous diffusion of reading and writing abilities.
Thus written (material) messages, printed and handwritten, have come to be a commonplace, cheap and reliable commodity. But still they remain essentially inert, i.e., they require human readers to acquire any meaning at all. A book that no one opens, or a book written in a language that no one understands anymore, is obviously a dead thing. And almost the same can be said of a "difficult" book, i.e. one requiring a very sophisticated reader to understand it (think of an abstruse scientific or philosophic work), or of a very ancient book referring to a context no longer extant. The written/material word definitely needs humans with well-stored brains to come alive at all.
Hence the necessity in literate societies of a very ponderous educational/training system. For having learnt to read and write is far from enough. In order to be able to read a "difficult" book you must have learnt:
- the specialized language (vocabulary and concepts) of its domain;
- the background knowledge and context it takes for granted.
If, in one domain, we lack one or both of these prerequisites, the existence of books (even huge libraries of them) will be of little or no use to us8. Thus it can be said that in the age of print the diffusion of knowledge is often more potential (the books are there, in homes and libraries) than real (very few people actually read them!).
And now comes the third (or fourth, according to Donald's classification) major human cognitive transition, the one we are currently experiencing: the age of immaterial external messages that, to a large extent, will no longer depend on human brains to come alive.
Fourth transition | externalization of symbol processing
"Synnoetic" culture | based on a symbiosis with external artificial minds (exonoos = memory + processor)
Symbols, which, as already mentioned, were somehow alive from the beginning but heavily dependent on human interpretation, now truly acquire a life of their own. They depend to a much lesser extent on well-stored human brains to be interpreted and transformed into actions.
The first of these "active texts" to appear historically was, of course, the computer program, which is at the same time text and pure action. Written in specialized programming languages, it is human-made, but not really human-readable (only machine-readable).
Computer programs were first applied to help humans perform otherwise overwhelming tasks: scientific calculations (e.g. astronomical calculations) on the one hand, and the processing of huge amounts of data (e.g. censuses) on the other. These were clearly highly technical tasks, far removed from the world of human daily natural-language interactions. At that time computers and software were very expensive tools reserved for a caste of specialists, and few people ever dreamed that they could become a cheap commodity to be found in almost every office and every home. And even if the thing had been deemed technically and economically feasible, most people then would have asked "What for?9".
Computer programs were first associated exclusively with data files, i.e., highly structured stores of information, mostly numerical, conceived on the model of the card-file cabinet. A far cry from texts in natural language!
It is ironic that when electronic text processing first appeared, it was considered a rather mundane offshoot of the noble art of computing. It is true that electronic texts were first used for very trivial reasons—more than anything else, because they were an easy, quick and cheap way of producing and modifying printed texts.
But very soon the advantages of immaterial electronic texts became obvious. They could be distributed worldwide at the speed of light and at very low cost, and were potentially eternal10. But still the final objective remained essentially the production (with no limits in space or time) of printed material.
SGML, adopted as a standard in 1986 (ISO 8879), was the first systematic11 attempt at creating real electronic documents, i.e. documents that were no longer paper documents in electronic form. The main idea behind it was to separate the (logical) content of a document from its (material/printed) form. But still the final intention was mainly to produce printed documents, albeit more economically, a unique (logical) document being transformed automatically into different printed formats (e.g. maintenance documents for airplanes being produced in different formats suiting the presentation requirements of different airlines or air forces). SGML was a breakthrough, but it was so complex that its handling had to be restricted to specialists (either technical writers in industry or textual scholars in the humanities).
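The principle of separating logical content from printed form is easy to illustrate with today's tools. In the following sketch (Python is used here and throughout merely as a convenient notation; the element names are invented for the example), one and the same logical document is rendered in two different printed presentations:

    import xml.etree.ElementTree as ET

    # A logical document fragment: the markup records WHAT each piece is,
    # not how it should look on paper.
    logical = """
    <task>
      <title>Replacing the fuel filter</title>
      <step>Depressurize the fuel line.</step>
      <step>Remove the filter housing.</step>
    </task>
    """

    task = ET.fromstring(logical)

    # Two different "stylesheets" over the same logical content,
    # in the spirit of one manual printed for two airlines.
    def render_for_airline_a(t):
        lines = [t.findtext("title").upper()]
        lines += ["- " + s.text for s in t.findall("step")]
        return "\n".join(lines)

    def render_for_airline_b(t):
        lines = ["Task: " + t.findtext("title")]
        lines += ["%d. %s" % (i, s.text)
                  for i, s in enumerate(t.findall("step"), 1)]
        return "\n".join(lines)

    print(render_for_airline_a(task))
    print(render_for_airline_b(task))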
Developing almost in parallel was on-line, interactive documentation, the first form of documentation to be purely electronic. And with it came the popularization of the hypertextual link12. Still, this form of documentation remained a "help", ancillary to paper documentation.
From 1992 on, the WWW and HTML (devised ca. 1990) became a reality and popularized electronic hypertextual documents to a huge extent13. From 1995 on, search engines have demonstrated the staggering capacities for information retrieval made possible by the WWW.
Still, some dissatisfaction remains. If I search the WWW for a person named Cook, I cannot restrict my search engine to something like "human:Cook", and so it will overwhelm me with cooking recipes. And if I download a catalog, it will have been generated on the fly from a database and, in the process, will have lost most of the intelligence contained in that database. So, off-line, on my computer, I won't be able to perform very sophisticated queries or calculations on that catalog (even if downloaded applets could help me a little). And it will be difficult to ask a robot to perform a complex search across different on-line catalogs.
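The difference that markup would make here is easy to demonstrate. Suppose, hypothetically, that pages tagged personal names with a <person> element (the tag name is invented for the example); a minimal sketch:

    import xml.etree.ElementTree as ET

    page = """
    <doc>
      <p>Captain <person>Cook</person> charted the Pacific.</p>
      <p>Cook the onions until golden.</p>
    </doc>
    """
    root = ET.fromstring(page)

    # A plain full-text search cannot tell the navigator from the recipe:
    print("".join(root.itertext()).count("Cook"))   # 2

    # A markup-aware search can restrict itself to tagged persons:
    print([e.text for e in root.iter("person")])    # ['Cook']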
So the time appears to be ripe for a new step forward, one that will combine the powers of SGML-type markup, of hyperlinks and of the WWW. XML comes of age!
Granted that XML is just a link in a chain of continuous technical improvements—and thus not a revolution in itself—it is nonetheless arguable that its concrete impact might well prove considerable. The following are but a few unsystematic attempts at fathoming such an impact.
Tomorrow our word-processing software won't yield proprietary code anymore, but human-readable, standard, and yet indefinitely versatile XML-encoded text. And as soon as we deal with a "formattable" subject (say, personal names and addresses, or bibliographical references, or cooking recipes), a specific add-on software will come forward and semi-automatically add XML tags to the text being keyed in. Imagine a technical writer writing a maintenance manual: every time he refers to a specific part, his specialized word processor will search the product database for the corresponding part number, ask him whenever in doubt, and tag the text accordingly. If the part is one that appears more than once in the product structure, the software will ask for discrimination and again tag accordingly.
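A toy version of such an add-on might look as follows (the parts table is a stand-in for a real product database, which would also let the writer arbitrate ambiguous cases interactively):

    import re

    # Hypothetical excerpt of a product database: part name -> part number(s).
    PARTS = {
        "fuel filter": ["P-1044"],
        "drain valve": ["P-2210", "P-8832"],  # occurs twice in the product structure
    }

    def tag_parts(text):
        """Wrap known part names in <part> tags as the writer types."""
        for name, numbers in PARTS.items():
            if len(numbers) == 1:
                repl = '<part num="%s">%s</part>' % (numbers[0], name)
            else:
                # Several candidates: a real add-on would ask the writer
                # to discriminate; here we merely flag the ambiguity.
                repl = '<part num="?" candidates="%s">%s</part>' % (
                    " ".join(numbers), name)
            text = re.sub(re.escape(name), repl, text)
        return text

    print(tag_parts("Open the drain valve, then replace the fuel filter."))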
Database systems will of course be able to yield XML text in response to queries, with minimal loss of information (i.e., first names will remain first names, part numbers will remain part numbers). So writers will be able at any moment to query databases and receive XML-tagged text ready for insertion into the text they are writing. Conversely, it will be easy to update databases automatically from XML text.
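Both directions of this exchange can already be sketched with today's tools, assuming a deliberately trivial schema (table and element names are invented for the example):

    import sqlite3
    import xml.etree.ElementTree as ET

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE parts (num TEXT PRIMARY KEY, name TEXT)")
    con.execute("INSERT INTO parts VALUES ('P-1044', 'fuel filter')")

    # Database -> XML: a part number remains a part number.
    root = ET.Element("parts")
    for num, name in con.execute("SELECT num, name FROM parts"):
        ET.SubElement(root, "part", num=num).text = name
    xml_text = ET.tostring(root, encoding="unicode")
    print(xml_text)  # <parts><part num="P-1044">fuel filter</part></parts>

    # XML -> database: the reverse trip is just as direct.
    for part in ET.fromstring(xml_text).iter("part"):
        con.execute("INSERT OR REPLACE INTO parts VALUES (?, ?)",
                    (part.get("num"), part.text))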
So the frontier between (non-formatted) text and (formatted) data will vanish. With XML, not only will different word processors be able to exchange texts between themselves with minimum loss of information, but so will DBMSs between themselves, DBMSs with WPs, and WPs with DBMSs.
The enhanced linking features of XML, the new capabilities it will lend to search engines for spotting relevant information (Cook the person and not cook the verb!), and its interoperability, once added to the ubiquity of the WWW, will generate tremendous new possibilities for information gathering.
For example, personnel in charge of repairing a piece of equipment will not only be able to navigate from the maintenance manual to the operator's manual (using predefined links or not), but also to navigate to and fro between manuals and product databases; and then, if necessary, through the WWW, to suppliers' databases, in order to check availability or interchangeability and place on-line orders.
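The linking machinery this presupposes can be glimpsed in the XML linking proposals (what became the W3C XLink recommendation; the namespace below is the one eventually standardized, and the URL is of course fictitious). Any element can be declared a link, pointing from a manual straight into a supplier's database:

    import xml.etree.ElementTree as ET

    manual = """
    <step xmlns:xlink="http://www.w3.org/1999/xlink">
      Replace the
      <part num="P-1044"
            xlink:type="simple"
            xlink:href="https://supplier.example/parts/P-1044">fuel filter</part>.
    </step>
    """

    XLINK = "{http://www.w3.org/1999/xlink}"
    for el in ET.fromstring(manual).iter():
        href = el.get(XLINK + "href")
        if href:
            # Any element can carry a link: a browser, a robot or a CAD
            # package can follow it straight into the supplier's database.
            print(el.text, "->", href)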
Obviously these generalized navigation capabilities will also be a great asset for continuous, on-the-spot training—since the answer to any question will never be more than a few clicks away.
Right now, with the WWW, everyone has access to a huge store of (almost) free information, but the problem is to exploit it. With XML we'll have two novel possibilities. The first will be to download information and process it locally. Thus an architect will download the catalogs of his favorite material suppliers and process them off-line, directly from his CAD software, in order to cost-optimize the building he is currently designing.
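In rough outline, and assuming the suppliers publish their catalogs in some agreed XML vocabulary (the element names and figures below are invented), such off-line processing could look like this:

    import xml.etree.ElementTree as ET

    catalog = """
    <catalog supplier="Acme Materials">
      <item ref="BEAM-190" strength="380" price="98.00">steel beam</item>
      <item ref="BEAM-200" strength="420" price="115.00">steel beam</item>
      <item ref="BEAM-210" strength="460" price="149.50">steel beam</item>
    </catalog>
    """

    def cheapest(catalog_xml, min_strength):
        """The kind of query a CAD package could run off-line over a
        downloaded catalog: the cheapest item meeting a requirement."""
        items = ET.fromstring(catalog_xml).iter("item")
        adequate = [i for i in items
                    if float(i.get("strength")) >= min_strength]
        return min(adequate, key=lambda i: float(i.get("price")),
                   default=None)

    best = cheapest(catalog, min_strength=400)
    print(best.get("ref"), best.get("price"))   # BEAM-200 115.00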
The second possibility will be to use robots. For example, a manufacturer of electronic appliances will have a robot locate, anywhere in the world, the supplier of an electronic part with such-and-such performance at minimal cost.
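Such a robot becomes conceptually simple once every supplier speaks the same markup; a sketch, with hypothetical supplier endpoints and the same invented catalog vocabulary as above:

    import xml.etree.ElementTree as ET
    from urllib.request import urlopen

    # Hypothetical endpoints, each returning a catalog in the same markup.
    SUPPLIERS = [
        "https://supplier-a.example/catalog.xml",
        "https://supplier-b.example/catalog.xml",
    ]

    def best_offer(min_performance):
        """Scan every supplier's catalog and return the cheapest part
        meeting the required performance."""
        offers = []
        for url in SUPPLIERS:
            with urlopen(url) as resp:
                root = ET.parse(resp).getroot()
            for item in root.iter("item"):
                if float(item.get("performance", "0")) >= min_performance:
                    offers.append((float(item.get("price")), url,
                                   item.get("ref")))
        return min(offers, default=None)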
Another possible use of XML, one that seems very promising although it has not been much discussed yet, would be to encode words and phrases specifically so as to eliminate linguistic (lexical or syntactical) ambiguities in the original (source) version of a document, in order to facilitate subsequent machine translation into any other (target) language.
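No standard vocabulary exists for this, so the markup below is entirely hypothetical: the author of the source text records, once and for all, which sense of an ambiguous word is intended, and that information travels with the word into every target language:

    import xml.etree.ElementTree as ET

    # A hypothetical disambiguation markup for ambiguous source words.
    source = """
    <sentence>
      The <w lemma="crane" pos="noun" sense="lifting-machine">crane</w>
      lifted the beam.
    </sentence>
    """

    for w in ET.fromstring(source).iter("w"):
        # A machine translator no longer needs to guess: the intended
        # sense is recorded in the source text itself.
        print(w.text, "=>", w.get("pos"), "/", w.get("sense"))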
It is still a bit hazardous to try to imagine in full detail how the Extensible Markup Language will transform the lives of tomorrow's information gatherers and users (the probable successors of today's writers and readers). Paradoxically, it may be safer to predict—as the author hopes to have convincingly suggested—that its advent will mark a turning point in the transition towards the upcoming culture of externalized intelligence.