In contemporary linguistics, agglutination usually refers to the kind of morphological derivation in which there is a one-to-one correspondence between affixes and syntactical categories. Languages that use agglutination widely are called agglutinative languages. For example, the Hungarian word hajókon `on ships' may be divided into a root hajó with two endings -k and -on, expressing respectively the plural number (hajó-k `ships') and the location `on' something (hajó-n `on a ship'). Moreover, the ending is so regular that the Hungarian Wiktionary simply marks this case as "-on/-en/-ön" (in English it is called the superessive). In contrast, in the Czech translation v lodích, the location is expressed by a combination of a separate word (the preposition v `in') and the locative plural ending -ích, which is added to the stem loď `ship' and cannot be subdivided into a part expressing the plural and a part expressing the locative case. Therefore Czech is not an agglutinative language.
Agglutinative languages are often contrasted both with languages in which syntactic structure is expressed solely by means of word order and auxiliary words (isolating languages) and with languages in which a single affix typically expresses several syntactic categories and a single category may be expressed by several different affixes (as is the case in inflectional (fusional) languages). However, both fusional and isolating languages may use agglutination in the most-often-used constructs, and use agglutination heavily in certain contexts, such as word derivation. This is the case in English, which has an agglutinated plural marker -(e)s and derived words such as shame·less·ness.
Agglutinative suffixes are often attached irrespective of syllable boundaries, for example by adding a consonant to the syllable coda, as in English tie – ties. Agglutinative languages also have large inventories of enclitics, which native speakers can and do separate from the word root in daily usage.
Note that the term agglutination is sometimes used more generally to refer to the morphological process of adding suffixes or other morphemes to the base of a word. This is treated in more detail in the section on other uses of the term.
Hungarian uses agglutination extensively in almost every part of its grammar. The suffixes follow each other in a specific order and can be stacked in large numbers, resulting in words that convey complex meanings in a very compact form. An example is fiaiéi, where the root "fi-" means "son", the subsequent four vowels are all separate suffixes, and the whole word means "[properties] of his/her sons". The nested possessive structure and the expression of plurals are quite remarkable (note that Hungarian has no grammatical gender).
Agglutination is used very heavily in some Native American languages, such as Nahuatl, Quechua, Tz'utujil, Kaqchikel, Cha'palaachi and K'iche, where one word can contain enough morphemes to convey the meaning of what would be a complex sentence in other languages.
Agglutination is also a common feature of Basque. The conjugation of verbs, for example, is done by adding different prefixes or suffixes to the root of the verb: dakartzat, which means 'I bring them', is formed from da (indicating present tense), kar (the root of the verb ekarri, 'bring'), tza (indicating plural) and t (indicating the subject, in this case "I"). Another example is declension: etxean = "in the house", where etxe = "house".
Almost all of the Philippine languages also belong to this category. This enables them, especially Filipino, to form new words from simple base forms. An example is nakakapagpabagabag, which means causing someone or something to be upset and is formed from the root bagabag, which means upset/upsetting.
Japanese is also an agglutinating language, adding information such as negation, passive voice, past tense, honorific degree and causality in the verb form. Common examples would be hatarakaseraretara (働かせられたら), which combines causative, passive or potential, and conditional conjugations to arrive at two meanings depending on context "if (subject) had been made to work..." and "if (subject) could make (object) work", and tabetakunakatta (食べたくなかった), which combines desire, negation, and past tense conjugations to mean "(subject) did not want to eat".
Turkish is another agglutinating language: the expression Avustralyalılaştıramadıklarımızdanmışsınızcasına is pronounced as one word in Turkish, but it can be translated into English as "as if you were one of those whom we could not make resemble the Australian people."
All Dravidian languages, including Kannada, Telugu, Malayalam and Tamil, are agglutinative. In Tamil, agglutination is used to a very high degree both in formal written forms (e.g. sevvaanam "red sky") and in colloquial spoken forms of the language (e.g. sokkathangam "pure gold").
Esperanto is a constructed auxiliary language with highly regular grammar and agglutinative word morphology. See Esperanto vocabulary.
Whilst agglutination is characteristic of certain language families, it would be facile to conclude that when several languages in a similar geographic area are all agglutinative, they must be phylogenetically related. In particular, such a conclusion formerly led linguists to propose the so-called Ural-Altaic language family, which would (in the largest scope ever proposed) include Uralic and Turkic languages as well as Mongolian, Korean and Japanese. However, contemporary linguistics views this proposal as controversial.
On the other hand, it is also the case that some languages that have developed from agglutinative proto-languages have lost this feature. For example, contemporary Estonian, which is so closely related to Finnish that the two languages are mutually intelligible, has shifted towards the fusional type. (It has also lost other features typical of the Uralic family, such as vowel harmony.)
The number of slots for a given part of speech can be surprisingly high. For example, a finite Korean verb has seven slots (the brackets indicate parts of morphemes which may be omitted in some phonological environments):
# honorific: -(ǔ)si is used when the speaker is honouring the subject of the sentence
# tense: (ə)s' for completed (past) action or state; when this slot is empty, the tense is interpreted as present
# experiential-contrastive aspect: (ə)s'; doubling the past tense marker means "the subject has had the experience described by the verb"
# modal: kes' is used with first-person subjects only for definite future, and with second- or third-person subjects also for probable present or past
# formal: (sǔ)pni expresses politeness to the hearer
# retrospective aspect: tə indicates that the speaker recollects what he observed in the past and reports it in the present situation
# mood: ta for declarative, k'a for interrogative, la for imperative, ca for propositive, yo for polite declarative, and a large number of other possible mood markers
Moreover, passive and causative verbal forms can be derived by adding suffixes to the base, which could be seen as a zeroth slot; however, passives are not used as commonly as in English, and many verbs do not allow passivization at all.
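The slot structure described above can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the function name build_verb and the dictionary layout are hypothetical, the bracketed segments of the suffixes are simply dropped, and no phonological adjustment is made, so the output is a schematic concatenation of morphs rather than an attested surface form. The only combinatorial restriction modelled is the one mentioned in the text, namely that at most one of the two aspect slots may be filled.

```python
# Slot numbers and names follow the list above; everything else is illustrative.
SLOT_NAMES = {
    1: "honorific",
    2: "tense",
    3: "experiential-contrastive aspect",
    4: "modal",
    5: "formal",
    6: "retrospective aspect",
    7: "mood",
}
ASPECT_SLOTS = (3, 6)  # only one of these may hold a non-empty suffix


def build_verb(base, fillers):
    """Concatenate the suffixes in `fillers` (slot number -> suffix) in slot order.

    Assumption: plain concatenation, no phonological rules are applied.
    """
    unknown = set(fillers) - set(SLOT_NAMES)
    if unknown:
        raise ValueError(f"unknown slots: {sorted(unknown)}")
    if all(fillers.get(slot) for slot in ASPECT_SLOTS):
        raise ValueError("only one aspect slot may be filled")
    # Sorting the slot numbers enforces the fixed order of the suffixes.
    return base + "".join(fillers[slot] for slot in sorted(fillers))


# Base ka 'to go' with the past tense marker (slot 2) and declarative mood (slot 7).
print(build_verb("ka", {7: "ta", 2: "s'"}))
```

Because the slots are sorted before concatenation, the caller may list them in any order; the fixed ordering of the morphemes is a property of the template, not of the input.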
Even though some combinations of suffixes are not possible (e.g. only one of the aspect slots may be filled with a non-empty suffix), over 400 verb forms may be formed from a single base. Here are a few examples formed from the word root ka `to go'; the numbers indicate which slots contain non-empty suffixes:
{|
| yu-le || m-tu || m-moja || m-refu || a-li || y-e || ki-soma || ki-le || ki-tabu || ki-refu
|-
| 1sg-that || 1sg-person || 1sg-one || 1sg-tall || 1sg-he-past || 7sg-rel.-it || 7sg-read || 7sg-that || 7sg-book || 7sg-long
|}
`That one tall person who read that long book.'
{|
| wa-le || wa-tu || wa-wili || wa-refu || wa-li || (w)-o || vi-soma || vi-le || vi-tabu || vi-refu
|-
| 1pl-that || 1pl-person || 1pl-two || 1pl-tall || 1pl-he-past || 7pl-rel.-it || 7pl-read || 7pl-that || 7pl-book || 7pl-long
|}
`Those two tall people who read those long books.'
In the same paper, Greenberg proposed several other indices, many of which turn out to be relevant to the study of agglutination. The synthetic index is the average number of morphemes per word, with the lowest conceivable value equal to 1 for isolating (analytic) languages and real-life values rarely exceeding 3. The compounding index is equal to the average number of root morphemes per word (as opposed to derivational and inflectional morphemes). The derivational, inflectional, prefixial and suffixial indices correspond respectively to the average number of derivational and inflectional morphemes, prefixes and suffixes.
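As a rough illustration of how these indices are computed, the following Python sketch takes a small hand-segmented sample and divides the relevant morpheme counts by the number of words. The sample consists of three English words that appear elsewhere in this article; the segmentation and tagging are assumptions made for the example, and the function name greenberg_indices is hypothetical. A real study would of course use a running text of some length rather than three isolated words.

```python
from collections import Counter

# Each word is a list of (morph, kind, position) triples.
# kind: "root", "derivational" or "inflectional"; position: "prefix", "suffix" or None.
SAMPLE = [
    [("shame", "root", None), ("less", "derivational", "suffix"),
     ("ness", "derivational", "suffix")],
    [("tie", "root", None), ("s", "inflectional", "suffix")],
    [("un", "derivational", "prefix"), ("whole", "root", None),
     ("some", "derivational", "suffix"), ("ness", "derivational", "suffix")],
]


def greenberg_indices(words):
    """Return the per-word averages described in the paragraph above."""
    n_words = len(words)
    kinds = Counter(kind for word in words for _, kind, _ in word)
    positions = Counter(pos for word in words for _, _, pos in word if pos)
    n_morphemes = sum(len(word) for word in words)
    return {
        "synthetic": n_morphemes / n_words,
        "compounding": kinds["root"] / n_words,
        "derivational": kinds["derivational"] / n_words,
        "inflectional": kinds["inflectional"] / n_words,
        "prefixial": positions["prefix"] / n_words,
        "suffixial": positions["suffix"] / n_words,
    }


for name, value in greenberg_indices(SAMPLE).items():
    print(f"{name}: {value:.2f}")
```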
Here is a table of sample values:
Several examples from Finnish will illustrate how these two rules and other phonological processes lead to deviations from the basic one-to-one relationship between morphs and their syntactic and semantic function. No phonological rule is applied in the declension of talo `house'. However, the second example illustrates several kinds of phonological phenomena.
{|
|-
| talo `house' || märkä paita `a wet shirt' || the roots contain the consonant cluster -rk- and the consonant -t-
|-
| talo-n `of the house' || märä-n paida-n `of a wet shirt' || consonant gradation: the genitive suffix -n closes the preceding syllable; rk -> r, t -> d
|-
| talo-ssa `in the house' || märä-ssä paida-ssa `in a wet shirt' || vowel harmony: a word containing ä may not contain the vowels a, o, u; an allomorph of the inessive ending -ssa/-ssä is used
|-
| talo-i-ssa `in the houses' || mär-i-ssä paido-i-ssa `in wet shirts' || phonological rules also imply different vowel changes when the plural marker -i- meets a stem-final vowel
|}
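The choice between the two allomorphs of the inessive ending can be sketched as a simple rule in Python. This is a minimal illustration, not a full model of Finnish: the function name inessive is hypothetical, the stem is assumed to have already undergone consonant gradation (märä-, paida-), and compounds and loanwords, which complicate vowel harmony, are ignored.

```python
BACK_VOWELS = set("aou")


def inessive(stem):
    """Attach -ssa to stems containing a back vowel (a, o, u), otherwise -ssä.

    Assumption: `stem` is a single non-compound stem in its already graded form.
    """
    has_back_vowel = any(ch in BACK_VOWELS for ch in stem.lower())
    return stem + ("ssa" if has_back_vowel else "ssä")


for stem in ("talo", "märä", "paida"):
    print(stem, "->", inessive(stem))
```

The morpheme "inessive" is therefore still a single unit of meaning; only its surface shape depends on the vowels of the stem.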
English is capable of agglutinating morphemes of solely Germanic origin, as in un-whole-some-ness, but generally speaking the longest words are assembled from forms of Latin or Ancient Greek origin. The classic example is antidisestablishmentarianism. Agglutinative languages often have more complex derivational agglutination than isolating languages, so they can do the same to a much larger extent. For example, in Hungarian, a word such as elnemzetietleníthetetlenségnek, which means "for [the purposes of] undenationalizationability", can find actual use. Similarly, there are words that are meaningful but probably never used, such as legeslegmegszentségteleníttethetetlenebbjeitekként, which means "like the most of most undesecratable ones of you" but is hard even for native speakers to decipher when heard. Such words can be extended further by inflectional agglutination. For example, the official Guinness world record is Finnish epäjärjestelmällistyttämättömyydellänsäkäänköhän "I wonder if – even with his/her quality of not having been made unsystematized". It has the derived word epäjärjestelmällistyttämättömyys as the root and is lengthened with the inflectional endings -llänsäkäänköhän. However, this word is grammatically unusual, since -kään "also" is used only in negative clauses, but -kö (question) only in question clauses.
A very popular Turkish example is Çekoslovakyalılaştırabildiklerimizden miydiniz?, which is actually a single word, although the question suffixes (miydiniz in this case) are written separately. It stands for "Were you one of those whom we failed to assimilate as a Czechoslovakian?". This historical reference is used as a joke about individuals who are hard to change or who stick out in a group.
On the other hand, Afyonkarahisarlılaştırabildiklerimizdenmişsinizcesine is a longer word and, since it is written without spaces, its status as a single word surprises no one; it stands for "as if you were one of the people whom we made resemble those from Afyonkarahisar". A more recent claimant is the Turkish word muvaffakiyetsizleştiricileştiriveremeyebileceklerimizdenmişsinizcesine, which means something like "(you are talking) as if you were one of those whom we cannot easily convert into an unsuccessful-person-maker" (someone who un-educates people to make them unsuccessful).
Georgian is also a highly agglutinative language. For example, the word gadmosakontrrevolucieleblebisnairebisatvisaco (გადმოსაკონტრრევოლუციელებლებისნაირებისათვისაცო) would mean "(someone unspecified) said that it is also for those who are like the ones who need to be counter-revolutionized again/back".
Especially in some older literature, agglutinative is sometimes used as a synonym for synthetic. In that case, it embraces what we call agglutinative and inflectional languages, and it is an antonym of analytic or isolating. Besides the clear etymological motivation (after all, inflectional endings are also "glued" to the stems), this more general usage is justified by the fact that the distinction between agglutinative and inflectional languages is not a sharp one, as we have already seen.
In the second half of the 19th century, many linguists believed that there is a natural cycle of language evolution: function words of the isolating type are glued to their head-words, so that the language becomes agglutinative; later morphs become merged through phonological processes, and what comes out is an inflectional language; finally inflectional endings are often dropped in quick speech, inflection is omitted and the language goes back to the isolating type.
The following passage from Lord (1960) demonstrates well the whole range of meanings that the word agglutination may have.
(Agglutination...) consists of the welding together of two or more terms constantly occurring as a syntagmatic group into a single unit, which becomes either difficult or impossible to analyse thereafter. Agglutination takes various forms. In French, welding becomes complete fusion. Latin hanc horam `at this hour' is the French adverbial unit encore. Old French tous jours becomes toujours, and dès jà (`since now') déjà (`already'). In English, on the other hand, apart from rare combinations such as good-bye from God be with you, walnut from Wales nut, window from wind-eye (O.N. vindauga), the units making up the agglutinated forms retain their identity. Words like blackbird and beefeater are a different kettle of fish; they retain their units but their ultimate meaning is not fully deducible from these units. (...)
Saussure preferred to distinguish between compound words and truly synthesised or agglutinated combinations.
Even more problems occur with the recognition of word forms. Modern linguistic methods are largely based on the exploitation of corpora; however, when the number of possible word forms is large, any corpus will necessarily contain only a small fraction of them. Hajič (2010) claims that computer space and power are so cheap nowadays that all possible word forms may be generated beforehand and stored in the form of a lexicon listing all possible interpretations of any given word form. (The data structure of the lexicon has to be optimized so that the search is quick and efficient.) According to Hajič, it is the disambiguation of these word forms which is difficult (more so for inflective languages, where the ambiguity is high, than for agglutinative languages).
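The idea of generating every word form in advance can be illustrated with a toy Python sketch. It is not Hajič's system; the names build_lexicon, plural and superessive are hypothetical, and the two attachment rules are deliberately simplistic assumptions that cover only the Hungarian example hajó `ship' used at the beginning of this article (plural -k after a vowel-final stem, superessive -n after a vowel and -on after a consonant in a back-vowel word). A hash table stands in for the optimized lexicon data structure.

```python
from itertools import product

VOWELS = set("aáeéiíoóöőuúüű")


def plural(stem):
    # Assumption: only vowel-final stems such as hajó are handled.
    return stem + "k"


def superessive(stem):
    # Assumption: back-vowel words only; -n after a vowel, -on after a consonant.
    return stem + ("n" if stem[-1] in VOWELS else "on")


# Optional suffix slots in the order in which they attach.
SLOTS = [("PL", plural), ("SUPERESSIVE", superessive)]


def build_lexicon(roots):
    """Map every generated surface form to the list of its (root, tags) analyses."""
    lexicon = {}
    for root in roots:
        # Each slot is independently left empty (False) or filled (True).
        for choices in product([False, True], repeat=len(SLOTS)):
            form, tags = root, []
            for use, (tag, attach) in zip(choices, SLOTS):
                if use:
                    form = attach(form)
                    tags.append(tag)
            lexicon.setdefault(form, []).append((root, tuple(tags)))
    return lexicon


lexicon = build_lexicon(["hajó"])
print(lexicon["hajókon"])
```

With k optional slots the number of generated forms per root grows as 2 to the power k (more if a slot has several possible fillers), which is why the size of such a lexicon becomes an issue for heavily agglutinative languages.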
Other authors do not share Hajič's view that space is no issue. Instead of listing all possible word forms in a lexicon, they implement word form analysis with modules which try to break up the surface form into a sequence of morphemes occurring in an order permissible by the language. The problem with such an analysis is the large number of morpheme boundaries typical of agglutinative languages. A word of an inflectional language has only one ending, and therefore the number of possible divisions of a word into the base and the ending is only linear in the length of the word. In an agglutinative language, where several suffixes are concatenated at the end of the word, the number of different divisions which have to be checked for consistency is large. This approach was used, for example, in the development of a system for Arabic, where agglutination occurs when articles, prepositions and conjunctions are joined with the following word and pronouns are joined with the preceding word. See Grefenstette et al. (2005) for more details.
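The analysis-by-segmentation approach can be sketched as a small recursive search in Python. This is an illustrative toy, not the system of Grefenstette et al.: the function name segment, the root list and the suffix inventory are assumptions built around the Hungarian example hajókon, and the slot numbers merely encode that the plural precedes the case ending. Every cut point after the root is tried, and a division is kept only if all pieces are known suffixes in a permissible order.

```python
ROOTS = {"hajó"}
# suffix -> (slot number, tag); a suffix may only follow suffixes from lower slots.
SUFFIXES = {
    "k": (1, "PL"),
    "n": (2, "SUPERESSIVE"),
    "on": (2, "SUPERESSIVE"),
}


def segment(word):
    """Return every division of `word` into a known root plus ordered suffixes."""
    results = []

    def walk(rest, last_slot, morphs):
        if not rest:
            results.append(morphs)
            return
        # Try every possible boundary inside the remaining material.
        for cut in range(1, len(rest) + 1):
            piece = rest[:cut]
            if piece in SUFFIXES and SUFFIXES[piece][0] > last_slot:
                walk(rest[cut:], SUFFIXES[piece][0], morphs + [piece])

    for cut in range(1, len(word) + 1):
        if word[:cut] in ROOTS:
            walk(word[cut:], 0, [word[:cut]])
    return results


print(segment("hajókon"))  # one analysis: root + plural + superessive
print(segment("hajónk"))   # no analysis: the plural cannot follow the case ending
```

The sketch checks only the order of the suffixes; it does not check whether the right allomorph was chosen for the phonological context, which a real analyser would also have to verify for each candidate division.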
Mwana Simba, a web-page about Swahili grammar.
Bernard Comrie (editor): The World's Major Languages, Oxford University Press, New York – Oxford 1990.
Keith Denning, Suzanne Kemmer (eds.): On language: selected writings of Joseph H. Greenberg, Stanford University Press, 1990. Selected parts are available on Google Books.
Victoria Fromkin, Robert Rodman, Nina Hyams: An Introduction to Language, Thomson Wadsworth, 2007.
Joseph H. Greenberg: A quantitative approach to the morphological typology of language, 1960. Available through JSTOR and in Denning et al. (1990), p. 3-25. There is also a good short summary.
Gregory Grefenstette, Nasredine Semmar, Faïza Elkateb-Gara: Modifying a Natural Language Processing System for European Languages to Treat Arabic in Information Processing and Information Retrieval Applications, Computational Approaches to Semitic Languages – Workshop Proceedings, University of Michigan 2005, p. 31-38.
Jan Hajič: Reliving the history: the beginnings of statistical machine translation and languages with rich morphology, IceTAL'10 Proceedings of the 7th international conference on Advances in natural language processing, Springer-Verlag Berlin, Heidelberg, 2010.
Helena Lehečková: Úvod do ugrofinistiky, Státní pedagogické nakladatelství, Praha 1983.
Robert Lord: Teach Yourself Comparative Linguistics, The English Universities Press Ltd., St Paul's House, London 1967 (first edition 1966).
Hans Christian Luschützky: Uvedení do typologie jazyků, Filozofická fakulta Univerzity Karlovy, Praha 2003.
J. Vendryes: Language – A Linguistic Introduction to History, Kegan Paul, Trench, Trubner Co., Ltd., London 1925 (translated by Paul Radin)
This text is licensed under the Creative Commons CC-BY-SA License. This text was originally published on Wikipedia and was developed by the Wikipedia community.