Foreword
Most people are not aware of ambiguity when reading a text: the general feeling is that every word carries one, and only one, ‘suitable’ meaning. Ambiguity may be present, however, and it surfaces when we hear or detect puns in discourse. Puns require ambiguity. In discourse, though, puns are the exception, not the rule. We cannot rely on puns alone as the basis for a consideration of ambiguity.
Lexicological tradition, and particularly the works of lexicographers (i.e., dictionaries), are permanent reminders of the various meanings that many words may take on. Dictionaries take ambiguity for granted. More recently, corpus linguistics has added some ‘quantitative’ evidence to this fact of language. Miller (1990) reports that the 121 most frequent nouns in English have 7.8 meanings each on average. The conclusion seems to be that words and sentences carry the seeds of ambiguity, in the sense that words or sentences can be assigned different meanings even when they have the same form. On the other hand, we also notice that this assumption applies to words taken as isolated items, as they usually appear in dictionaries; once words enter discourse, they apparently lose their potential for ambiguity. The condition for entering discourse appears to be that words are previously ‘disambiguated’: only one of their potential meanings is selected, and the other possible meanings are blocked.
Word sense disambiguation (WSD) is one of the greatest challenges in computational linguistics. A computer capable of discovering whether the item table, as used in a text, means ‘a piece of furniture consisting of a flat, horizontal top usually set on legs’ or perhaps ‘a compact arrangement of related facts, figures, values, etc. in orderly sequence, and usually in rows and columns, for convenience of reference’ (as specified in Webster’s dictionary) is of great help for extracting and arranging information as it occurs in real language use. The amount of language generated on the Internet in a single day (the equivalent of approximately 17 million books, ‘enough to fill 37,000 buildings the size of the U.S. Library of Congress in a year’, according to Newsweek magazine, June 26, 2006) illustrates the importance of the issue. From the perspective of computational linguistics, disambiguation procedures are based on identifying the appropriate meaning of polysemous words, the one that is activated in the context in which the word is used. Context, therefore, is taken to be the key to discovering the right meaning among the various possible candidates.
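The context-based procedure just described can be illustrated with a toy gloss-overlap heuristic in the spirit of the well-known Lesk algorithm. The two glosses below paraphrase the Webster’s senses of table quoted above; the sense labels, function name, and overlap scoring are illustrative assumptions for this sketch, not the method discussed in the book.

```python
# A minimal sketch of context-based disambiguation for "table":
# pick the sense whose dictionary gloss shares the most word forms
# with the surrounding context (a simplified Lesk-style heuristic).

SENSES = {
    # Glosses paraphrased from the two Webster's definitions quoted above.
    "furniture": "a piece of furniture consisting of a flat horizontal "
                 "top usually set on legs",
    "data": "a compact arrangement of related facts figures values in "
            "orderly sequence in rows and columns for convenience of reference",
}

def disambiguate(context: str) -> str:
    """Return the sense label whose gloss overlaps most with the context."""
    context_words = set(context.lower().split())

    def overlap(sense: str) -> int:
        # Count word forms shared by the gloss and the context.
        return len(context_words & set(SENSES[sense].split()))

    return max(SENSES, key=overlap)

# Two contexts activate two different senses of the same form:
print(disambiguate("the results are summarised in a table of figures and values"))
print(disambiguate("a wooden table usually set on four legs"))
```

Crude as it is, the sketch makes the foreword’s point concrete: the same word form receives different senses purely as a function of the context in which it occurs.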
WSD proceeds on the assumption that the word is the basic lexical unit, very much as if it were the carrier of meaning. Traditional dictionaries are built on the same premises. But what happens if lexical units of meaning are not words, or not only words? The use of corpora in linguistic analysis has reinforced the role of context in the shaping of meaning. More often than expected, the meaning of a word is not only ‘dependent’ on context: it fully belongs to it; it forms part of a complex set of linguistic elements within which it has lost its independence. At least in those instances, the role of context in the shaping of meaning is not only relevant and necessary; it raises a question about what the lexical unit really consists of: a word, as traditionally accepted, or a larger unit shaped by a set of lexical elements to which the word belongs. WSD should not be restricted to words if the unit of meaning extends beyond word boundaries.
From Words to Lexical Units deals with the core of what should precede WSD: the units of meaning to be disambiguated, their nature, their limits and their scope. In its short history of disambiguating distinct meanings of words in discourse, computational linguistics has not really taken lexical semantics into account; it relies heavily on words as they are dealt with in dictionaries, finite in number and each with a closed set of senses. If meaning is to be built on discourse, however, rather than on isolated items (words), lexical units as repositories of meaning should be approached from a different angle. A deeper and better understanding of language and meaning is required.
This book by Moisés Almela raises precisely the issue of what is to be taken as a lexical unit and challenges the traditional stand on it. With powerful theoretical coherence and skilful argumentation, the author calls into question the autonomy of the word as an independent element in the construction of meaning, and hence the appropriateness of taking it as the lexical unit in the identification of meaning. Hundreds of examples taken from current corpora of English and Spanish are carefully and skilfully analysed and discussed. The data gathered and the logical outcome of the analysis lead the author to state that lexical units of meaning should not simply be equated with ‘words’. He concludes that lexical units operate at different discourse levels and are better named extended lexical units (ELUs).
Redefining lexical units in this direction requires a thorough review of current assumptions about the relationships between words and meaning, and some important adjustments in related fields. Words alone, to say the least, cannot be attributed an almost absolute role in conveying and demarcating meaning in discourse; idiomaticity and the ELU should be adequately defined vis-à-vis each other; and the structure of meaning has to be more tightly related to text.
The study of meaning from the perspective of corpus linguistics has only just begun. This book is an excellent example of such an approach, a seminal work that challenges as well as comforts inquisitive minds.

Aquilino Sánchez
University of Murcia