Why lexicographers should take more notice of phraseology, collocations, and creative language use
English dictionaries since 1755 have attempted to present succinct statements of the meaning(s) of each word. A word may have more than one meaning but, so the theory goes, each meaning can in principle be summarized in a neat paraphrase that is substitutable (in context) for the target word (the definiendum). Such paraphrases must be so worded that the substitution can be made without changing the truth of what is said – salva veritate, in Leibniz’s famous phrase. Building on Leibniz, philosophers of language such as Anna Wierzbicka have argued that the duty of the lexicographer is to “seek the invariant”.
In this presentation, I argue that this view of word meaning and definition may be all very well as a principle for developing stipulative definitions of terminology in scientific discourse, but it has led to serious misunderstandings about the nature of meaning in natural language, creating insuperable obstacles for the understanding of how word meaning works. As a result, linguists from Bloomfield to Chomsky and philosophers of language from Leibniz to Russell – great thinkers all – have been unable to say anything true or useful about meaning in language.
I argue that, instead, lexicographers should aim to discover patterns of word use in large corpora, and associate meanings with patterns instead of (or as well as) words in isolation.
They should also distinguish normal uses of each word from exploitations of norms.
At the end of the alphabet
For lexicographers, the alphabet has limited utility; in a time of instant database search results, the sole advantage of alphabetical order is entirely lost, and its disadvantages dominate. Alphabetical order is artificial, it fragments, it places unrelated items next to each other, it does not allow editors or users to make useful connections between meanings, and it prohibits browsing words on a particular subject. In a language such as Scots, with widespread spelling variability, the disadvantages increase yet further. Given, therefore, that alphabetical editing of a dictionary exists purely because of efficiency of access in the dying situation of a reader dealing with a print volume, it follows that by definition such an ordering is intrinsically uninteresting for an editor, telling us nothing about a dictionary entry other than its opening configuration of letters. A thematic and non-alphabetical thesaurus structure, by contrast, has both a more immediately useful classificatory system and a far longer historical pedigree.
To admit that in the digital age we are at the end of the alphabet therefore frees lexicographers to work with a synthesis of the conceptual vocabulary of a language, and with the relative place of each concept in the wider conceptual system of which it is a part. This paper outlines the argument above against the alphabetical ordering of a dictionary, and then presents the essential characteristics of a diachronic thesaurus. These characteristics, the Samuels and Kay principles, are named after the Directors of the Historical Thesaurus of English, who conceived and implemented them over the past fifty years at Glasgow. Any serious attempt to arrange and edit diachronic lexical data by meaning should pay particular attention to these core principles, which have not before been presented together. The paper concludes by outlining the consequences and utility of each principle, and in so doing demonstrates both their indispensable nature and the keen foresight of Professors Michael Samuels and Christian Kay.
The Oxford English Dictionary Online: work in progress, and future plans
The project to revise the OED has now been in progress for over twenty years. Since 2000, OED Online has been published in quarterly updates, including both new words and extensive revisions of the existing material, from Old English to current use. Much of this revision has been made possible by access to digitized texts in searchable form, such as Early English Books Online, Google Books, and a plethora of newspapers. In recent years we have also started to use parsed and tagged corpora to help, for example, identify new words and lemmas, assess what constitutes typical use, and establish accurate word frequencies, but there is a discrepancy between the range of coverage provided by the full-text databases and the chronological limitations of the corpora available to us. We would like the next stage of OED’s development to be driven by the application of quantitative corpus methods to earlier periods of English in a diachronic perspective, allowing for more accurate mapping of semantic change. The potential of this approach is clear even within the limited resources we have now.
The DSL: enhancements past, present and future
The Dictionary of the Scots Language (DSL) is one of many online dictionaries whose content was originally published in print during the 20th century – the heyday of such multi-volume dictionaries – and which have now been digitised. These digitised versions have reached various stages of development, ranging from a simple reproduction of the print version to a fully functioning online resource. Like many of its fellow dictionaries, DSL has reached a stage somewhere between these points.
In this talk I shall outline some of the milestones we have passed, some that we are in the process of overtaking and those that we see ahead. I shall give examples of some of the problems we have encountered, some of which apply to all similar endeavours and some of which apply more particularly to DSL – for example, dealing with a language which has significant regional variation and no standardised orthography, and dealing with content which comprises two separate parent dictionaries which differed in their approach to the task in various ways.
I will finish by giving an example of one of the many challenges we must tackle in bringing the DSL firmly into the 21st century.
Faclair na Gàidhlig: retrospect and prospects
Some years after it was first mooted, a new sort of Gaelic dictionary started to be planned in the Scottish Universities in the 1990s, utilising computer-based techniques to generate a corpus of words for analysis, and applying lexicographical principles exemplified in such historical dictionaries as OED and DOST. This project took shape in the early 2000s as Faclair na Gàidhlig. As the concept became clearer, so did the scale of the task; and so, simultaneously, did the conviction that FnaG had a central role to play in fulfilling educational needs and national aspirations for the future well-being of Gaelic. It is worth recalling at this point how FnaG evolved, and highlighting the key circumstances and critical factors which enabled it to progress to the point we have reached today.
It is likewise important to recall several particular challenges which Scottish Gaelic presents to the lexicographer. Some of these are purely linguistic; others are rooted in the historical and social circumstances of Gaelic speakers, but with significant effects on the consistency of the data available to lexicographers. Many lessons have been learned from the experiences of lexicographers working with other languages, while some questions have required us to customise or extemporise. Many decisions have been taken, though answers to some thorny questions are still emerging.
As we now move towards the completion of the foundational phase of FnaG it is important to look ahead, to envisage and prepare for the different sorts of challenge which will confront the project as a new generation of lexicographers goes to work on the historical dictionary of Scottish Gaelic.
Innovations in Slovenian (e-)lexicography: from (semi-)automatic data extraction to crowdsourcing and beyond
In recent years, lexicography has witnessed the emergence of the digital medium, which has brought many new possibilities for dictionary makers and lexicographers, and has also led to the emergence of the field of e-lexicography. Yet the digital medium has also brought new challenges: lexicographers have larger and larger corpora at their disposal but, because users now expect quick access to up-to-date dictionary information, very little time in which to analyse them. This problem is especially significant when it comes to compiling a dictionary from scratch, a prospect faced by the team working on the Dictionary of Contemporary Slovene Language (DCSL).
In my presentation, I will first describe the five-phase lexicographical process of DCSL (Kosem et al. 2013, Gantar et al., in print) and then focus on the first two phases, namely automatic extraction of lexical data from the corpus and post-processing of extracted data, the latter also including the use of crowdsourcing for data clean-up. I will show that the benefits of such an approach are considerable, and that it does not result in any loss of relevant information for subsequent lexicographic analysis, nor, consequently, in any loss of reliability of dictionary content.
Carolin Müller-Spitzer & Sascha Wolfer
A quantitative view on dictionary use: Potentials and limitations of log file analyses
We introduce four research questions that can be addressed using log files of online dictionaries:
(1) Are words that occur more frequently in everyday language also looked up more frequently in a dictionary?
(2) Are polysemic words visited more frequently than monosemic words?
(3) How can we investigate temporal effects on visiting frequency?
(4) What portions of Wiktionary stay “in the dark” (i.e., are not visited at all or very infrequently)?
For almost all analyses of log file data, additional information is necessary, like corpus frequency of headwords or information that can be extracted from the dictionary article itself (e.g. part-of-speech of the headword or number of senses). We will focus on the methodological side of the analyses, proposing a quantitative view on the data. Apart from that, we will also discuss what limitations we face when dealing with log file data.
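As an illustration only, research questions (1) and (4) could be approached along the following lines. This is a minimal Python sketch, not the authors' method: the headwords and all counts are invented, the function names are hypothetical, and Spearman's rank correlation is simply one plausible way to relate corpus frequency to lookup counts.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks.

    Assumes neither list is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Invented data: headword -> (corpus frequency, number of lookups in the log).
entries = {
    "house": (9500, 120),
    "run": (8700, 340),
    "serendipity": (40, 95),
    "of": (250000, 3),
    "lexeme": (15, 0),
}

corpus_freq = [freq for freq, _ in entries.values()]
lookups = [count for _, count in entries.values()]

# Question (1): do frequent words attract more lookups?
print("Spearman rho:", spearman(corpus_freq, lookups))

# Question (4): which headwords stay "in the dark" (never looked up)?
dark = [word for word, (_, count) in entries.items() if count == 0]
print("Never visited:", dark)
```

A rank-based measure is used here because both corpus frequencies and lookup counts tend to be heavily skewed; with real log files the two lists would be joined on the headword rather than typed in by hand.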
Democratizing the Dictionary: the challenges and opportunities presented by crowdsourcing content
In the last ten years, and with the near-ubiquity of interactive digital media, the concept of crowdsourcing and the expectation of increasingly direct relationships with customers have become norms in product development and business practice. The publishing industry is not unaffected by such trends, but how does an established and reputable dictionary publisher make its content more accessible to the public without losing its authority? In 2012, Collins opened up its dictionary website to registered users, encouraging new-word submissions, opinions, and comments. I will outline the methods used by Collins to manage such submissions and embrace the positive opportunities afforded by this democratization, and consider how we address the inevitable challenges, both foreseen and unforeseen, brought about by the process. I will demonstrate how Collins has utilized this user-generated content for the betterment of its dictionaries, and how it may continue to do so in the future.
More than just a dictionary: unlocking the content of a historical dictionary while adapting it for dynamic electronic presentation
Twenty years ago, Oxford University Press published A Dictionary of South African English on Historical Principles (DSAEHist), the culmination of more than 25 years of research by the Dictionary Unit for South African English at Rhodes University in South Africa. A work of significant linguistic and cultural scholarship, the dictionary gained added importance because of its gestation during the period of South Africa’s transition to a democratic society.
In 2014, as a first step towards bringing this unique record of the development of English in South Africa into the digital era, the Dictionary Unit launched a pilot online version of the DSAEHist. Like many of its kind, the online dictionary is, at present, largely an on-screen replication of the print edition. However, in collaboration with Professor Uli Heid at the University of Hildesheim, work has begun on adapting the dictionary text to support publication on multiple electronic platforms, and, in the process, on reviewing the overall dictionary design.
This presentation will briefly contextualise the dictionary and its importance to contemporary South Africa, before outlining some of the early steps being taken to enhance its accessibility and adaptability to a broad range of online users.