Thesauri are knowledge organization systems in the form of controlled vocabularies that organize and relate the concepts and terms of a scientific domain according to specific semantic rules. Unlike dictionaries, which present all the different definitions, meanings and usages of a word, thesauri restrict the scope of each concept to the selected meanings that serve the needs of a specific domain. The intended meaning of a term is usually indicated by its place in the hierarchical structure of the thesaurus. For each concept, a thesaurus contains a brief note that defines the meaning and use of the concept, together with the lexical variants, synonyms and alternative expressions of each term.
They are based on natural language, which is transformed into an “artificial” and normalized language where concepts are represented by several terms, each of which has only one meaning.
They help people retrieve the desired information by guiding the indexer and the researcher to select the same term for the same concept. A search is considered successful when it identifies and retrieves as much as possible of the data stored in the information systems that is relevant to the research questions posed by domain experts. Thesauri also facilitate the integration of heterogeneous sources, which is particularly important when the various institutions that create information (such as museums, archives, art galleries, etc.) wish to share data with each other.
The need for thesauri arises for the most part from the ambiguities of natural language, namely:
- The same word can have more than one meaning, so the same word is used to represent different concepts.
- A concept can be represented by two or more words that have the same or similar meanings, so the desired content may be difficult to retrieve because it is described by different but equivalent terms.
Thesauri aim to minimize the ambiguities of natural language by providing:
- A list of terms acceptable to domain experts, each of them representing only one concept.
- Mechanisms for structuring and using those terms, such as:
  - Restricting the scope and meaning of a concept within the domain of a particular scientific field,
  - Using the equivalence relationship to link terms that are synonymous or quasi-synonymous,
  - Distinguishing among homographs,
  - Building hierarchies in which the higher-level terms represent a class and the subordinate terms refer to its members,
  - Relating concepts that are semantically related but are neither synonyms nor part of the same hierarchical branch,
  - Providing several displays of the concepts and their relationships.
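The mechanisms listed above can be illustrated with a small data structure. The following is a minimal sketch, not a reference implementation: the `Concept` and `Thesaurus` classes, their field names, and the sample terms are all assumptions made for illustration. It shows a scope note, the equivalence relationship (alternative terms pointing to one preferred term), homographs distinguished by a parenthetical qualifier, a hierarchical (broader/narrower) link, and an associative (related) link.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Concept:
    """One concept, identified by its single preferred term (hypothetical model)."""
    preferred: str                                       # the one term that stands for the concept
    scope_note: str = ""                                 # restricts the meaning within the domain
    alt_terms: Set[str] = field(default_factory=set)     # equivalence: synonyms, variants
    broader: Optional[str] = None                        # hierarchy: preferred term of the parent
    related: Set[str] = field(default_factory=set)       # associative links


class Thesaurus:
    def __init__(self) -> None:
        self.concepts: Dict[str, Concept] = {}
        self.entry_terms: Dict[str, str] = {}            # any accepted term -> preferred term

    def add(self, concept: Concept) -> None:
        self.concepts[concept.preferred] = concept
        for term in {concept.preferred, *concept.alt_terms}:
            self.entry_terms[term.lower()] = concept.preferred

    def preferred_term(self, term: str) -> Optional[str]:
        """Follow the equivalence relationship; None if the term is not in the vocabulary."""
        return self.entry_terms.get(term.lower())

    def narrower(self, preferred: str) -> List[str]:
        """Members of the class named by `preferred` (direct children only)."""
        return sorted(c.preferred for c in self.concepts.values() if c.broader == preferred)


# Illustrative sample data (invented for this sketch).
th = Thesaurus()
th.add(Concept("containers"))
# Homographs are kept apart by a qualifier in parentheses.
th.add(Concept("vessels (containers)", "Hollow objects used to hold substances.",
               broader="containers"))
th.add(Concept("vessels (watercraft)", "Craft designed to travel on water.",
               alt_terms={"watercraft"}))
th.add(Concept("amphorae", "Two-handled storage jars.",
               alt_terms={"amphoras", "amphora"},
               broader="vessels (containers)",
               related={"wine trade"}))

print(th.preferred_term("Amphoras"))         # amphorae
print(th.narrower("vessels (containers)"))   # ['amphorae']
print(th.preferred_term("vessels"))          # None: the bare homograph is not an entry term
```

In this sketch the unqualified word "vessels" is deliberately not an entry term, so a user has to pick one of the two qualified concepts; a real thesaurus might instead list both candidates.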
The greater detail and information contained in a thesaurus help users find the most appropriate term more easily than in a simple, unstructured controlled vocabulary.
Thesauri serve to:
- Guide the indexer and the researcher to select the same term for the same concept.
- Enable interoperability between different scientific fields.
- Allow access, compatibility and comparison across heterogeneous classification systems.
- Minimize semantic ambiguity by ensuring uniformity and consistency in the storage and retrieval of content.
- Achieve the greatest economy in the process of organizing terms in a common framework.
- Increase the effectiveness of online information retrieval.
- Standardize classification and cataloging expressions, thus creating a systematic language of communication among the experts of a scientific field.
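The first of these goals, having the indexer and the researcher arrive at the same term for the same concept, can be sketched in a few lines. This is a hedged illustration only: the `ENTRY_TERMS` mapping, the record identifiers, and the function names are hypothetical, and a real system would load its vocabulary from a maintained thesaurus rather than a hard-coded dictionary.

```python
# Hypothetical entry vocabulary: every accepted variant maps to one preferred term.
ENTRY_TERMS = {
    "amphorae": "amphorae",
    "amphoras": "amphorae",
    "amphora": "amphorae",
    "sherds": "sherds",
    "shards": "sherds",
    "potsherds": "sherds",
}


def normalize(term):
    """Return the preferred term for a free-text term, or None if it is unknown."""
    return ENTRY_TERMS.get(term.strip().lower())


def build_index(records):
    """Indexer side: store each record under preferred terms only."""
    index = {}
    for record_id, free_terms in records.items():
        for term in free_terms:
            preferred = normalize(term)
            if preferred is not None:
                index.setdefault(preferred, set()).add(record_id)
    return index


def search(index, query):
    """Researcher side: normalize the query the same way before looking it up."""
    preferred = normalize(query)
    return sorted(index.get(preferred, set()))


# Three institutions describe their holdings with different but equivalent words.
RECORDS = {
    "museum-001": ["Amphoras"],
    "archive-17": ["amphora", "potsherds"],
    "gallery-3": ["shards"],
}
INDEX = build_index(RECORDS)

print(search(INDEX, "amphorae"))  # ['archive-17', 'museum-001']
print(search(INDEX, "Sherds"))    # ['archive-17', 'gallery-3']
```

Because both sides pass through the same `normalize` step, records described with different variants are retrieved by a single query, which is the uniformity in storage and retrieval that the list above describes.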