Creation of the Thesaurus

The first stage of the TOE was the creation of an archive of slips[1]. A slip was made for each meaning found in the standard dictionaries. As each word-sense slip was made, sorting numbers were chosen for it from the 1962 edition of Roget's Thesaurus[2]. Each slip was given at least one number, often two, sometimes more, and cross-reference slips were written for each number other than the first. All these slips were filed in the Glasgow archive according to Roget number. For checking purposes the compiler (Jane Roberts) kept alphabetically arranged duplicates for each sense excerpted.

Given the categories of information available in current dictionaries, phrases, whether verb phrases or idioms, could be covered only haphazardly. Consultation of the completed letters of the DOE reveals a wealth of new collocational detail that must be taken into account in fuller lexical field studies. This thesaurus is, for the most part, a presentation of concepts lexicalized in Old English as single items[3]. Where obvious groups were clearly identifiable when the contents of the standard dictionaries were examined, these were excerpted and have been included, but all too little is known of multiword structures in Old English.

Once compilation from the standard dictionaries was complete, microfiche checklists, both alphabetical and notional, were made from the duplicate slips retained by the compiler[4]. This stage was completed in 1982. It must be emphasized, therefore, that the compilation of the Old English slips for the HT was mostly completed before the publication of the MCOE.

In order to create the Skeleton Old English Thesaurus, information essential for checking purposes was entered on punched cards: Old English form; putative related OED forms; and Roget number(s) for each sense excerpted. Where this minimal information was insufficient to differentiate two or more slips, the slips liable to confusion were given numbers. A record was thus made of the word senses excerpted from the Anglo-Saxon dictionaries, to allow checking at later stages in the compilation of the HT. The completion of the microfiche checklists provided a clearer overview of the experimental Old English materials contributed to the Glasgow archive. From 1982, therefore, we had available for research use both an alphabetic index to the Old English supplementary materials and a skeleton thesaurus arranged according to Roget-derived classificatory numbers, and copies were in use in English departments at both King's College London and the University of Glasgow. (A copy was also used by the DOE team in Toronto.) They were of particular value in London, where most of the Old English side of the thesaurus was centred, giving access not only to both alphabetic and notional checklists but also to a complete notional listing for every Roget number used on each slip. This useful product of the machine-generated listings gave us a summary, not otherwise easily retrievable, of all the Old English slips assigned to each Roget number.
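The card record and the two checklists described above can be pictured with a short modern sketch. It is an illustration only, not a reconstruction of the 1982 programs: the names `Slip`, `number_confusable` and `build_checklists` are hypothetical, and the Roget numbers and OED forms in the sample data are placeholders rather than values taken from the archive.

```python
from collections import defaultdict
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Slip:
    """One excerpted word sense, holding only the fields named for the cards."""
    oe_form: str            # Old English form
    oed_forms: tuple        # putative related OED forms
    roget_numbers: tuple    # first number is the main filing place
    number: int = 0         # 0 = no distinguishing number needed


def number_confusable(slips):
    """Give running numbers to slips that the card fields cannot tell apart."""
    groups = defaultdict(list)
    for slip in slips:
        groups[(slip.oe_form, slip.oed_forms, slip.roget_numbers)].append(slip)
    numbered = []
    for group in groups.values():
        if len(group) == 1:
            numbered.append(group[0])
        else:
            numbered.extend(replace(s, number=i) for i, s in enumerate(group, 1))
    return numbered


def build_checklists(slips):
    """Return (alphabetical checklist, notional checklist keyed by Roget number)."""
    alphabetical = sorted(slips, key=lambda s: (s.oe_form, s.number))
    notional = defaultdict(list)
    for slip in alphabetical:
        # A slip appears under every Roget number it carries, so the
        # cross-references are included in the notional listing.
        for roget in slip.roget_numbers:
            notional[roget].append(slip)
    return alphabetical, dict(sorted(notional.items()))


# Placeholder data: the numbers are invented for the example.
slips = number_confusable([
    Slip("hlaford", ("lord",), (741, 965)),
    Slip("dom", ("doom",), (480,)),
    Slip("cyning", ("king",), (741,)),
    Slip("dom", ("doom",), (480,)),
])
alphabetical, notional = build_checklists(slips)
```

Looking up `notional[741]` in this sketch returns every slip assigned that number, whether as its first number or as a cross-reference, which is the kind of summary the machine-generated listings supplied.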

The first stage in generating a fully computer-based system for the thesaurus was to create a structure to hold the information from the slips in a database, using dBase II software on Apricot microcomputers. In this phase floppy discs containing newly converted materials and corrections were regularly exchanged between Glasgow and King's. In 1989 all the current dBase files were converted at King's, by a somewhat convoluted process made necessary by differences in storage formats, and loaded into a new system using INGRES relational database software. Initially the database was on a personal computer, but by 1992 the volume of material in electronic form was such that it became desirable to allow simultaneous access by several people. As a result the database was transferred to a VAX mainframe computer. Later, the database was moved to a UNIX machine set up as a 'database server', but still using INGRES software. Once the HT materials in Glasgow were themselves moved into INGRES format, there was the additional advantage that data files could be transferred electronically between King's College and Glasgow over 'JANET' (Joint Academic NETwork), the communications system which links all UK research and higher education institutions.

Holding the information from the slips in electronic form made it possible for the classification(s) of a word or group of words to be modified readily and for listings of all the entries in a category or sub-category to be produced at will. The database software also allowed different combinations of information to be produced in various sequences, for example by category or by word. Hence both the Thesaurus and the Index were initially extracted from the database by report programs. In order to prepare these materials for publication, a further process was created to make them ready for input to LaTeX typesetting software. The LaTeX process produced the camera-ready copy sent to the publisher.
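The kinds of retrieval and reclassification just described can be sketched with a small relational example. This is a sketch under stated assumptions, not the project's schema: Python's built-in `sqlite3` module stands in for the dBase and INGRES systems, and the table names, column names, category headings and sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (id INTEGER PRIMARY KEY, heading TEXT NOT NULL);
CREATE TABLE word     (id INTEGER PRIMARY KEY, form TEXT NOT NULL);
-- One row per classification of a word; a word may have several.
CREATE TABLE sense (
    word_id     INTEGER NOT NULL REFERENCES word(id),
    category_id INTEGER NOT NULL REFERENCES category(id),
    PRIMARY KEY (word_id, category_id)
);
""")
# Placeholder rows, invented for the example.
conn.executemany("INSERT INTO category VALUES (?, ?)",
                 [(1, "Ruler"), (2, "Deity")])
conn.executemany("INSERT INTO word VALUES (?, ?)",
                 [(1, "cyning"), (2, "dryhten"), (3, "hlaford")])
conn.executemany("INSERT INTO sense VALUES (?, ?)",
                 [(1, 1), (2, 1), (2, 2), (3, 1)])


def words_in_category(conn, category_id):
    """Thesaurus-style listing: all word forms filed under one category."""
    rows = conn.execute(
        "SELECT w.form FROM sense s JOIN word w ON w.id = s.word_id "
        "WHERE s.category_id = ? ORDER BY w.form", (category_id,))
    return [row[0] for row in rows]


def categories_of_word(conn, form):
    """Index-style listing: all category headings under which a form appears."""
    rows = conn.execute(
        "SELECT c.heading FROM sense s "
        "JOIN word w ON w.id = s.word_id "
        "JOIN category c ON c.id = s.category_id "
        "WHERE w.form = ? ORDER BY c.id", (form,))
    return [row[0] for row in rows]


def reclassify(conn, form, old_category_id, new_category_id):
    """Move one classification of a word from one category to another."""
    conn.execute(
        "UPDATE sense SET category_id = ? "
        "WHERE category_id = ? "
        "AND word_id IN (SELECT id FROM word WHERE form = ?)",
        (new_category_id, old_category_id, form))
```

Calling `words_in_category(conn, 1)` gives the by-category sequence and `categories_of_word(conn, "dryhten")` the by-word sequence; after `reclassify(conn, "hlaford", 1, 2)` the same queries reflect the change without any re-filing.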

The present database is in MySQL format, with the data in separate relational tables (an advance not possible when the original database was created). It mirrors the structure of the larger HT database, and is online as part of Glasgow’s platform for the electronic release of HT-style thesauri.

[1] Jane Roberts, 'Some Problems of a Thesaurus Maker', Problems of Old English Lexicography, ed. Alfred Bammesberger, Eichstätter Beiträge, 15 (1985), 229-43. See also

[2] Robert A. Dutch, ed., Roget's Thesaurus of English Words and Phrases, London, 1962.

[3] Not all words from the language's closed systems are included.

[4] Skeleton Old English Thesaurus, 1982. Research tool on 9 microfiches. Prepared by Jane Roberts with Christine Brown at the King's College London Computing Centre.