This is the main page of the website that hosts a linguistic corpus of the Adyghe language. Annotated Adyghe texts are searched via a web interface. Search hits are individual sentences; there is no access to entire texts. You can search by character sequences, morphs and their combinations, grammatical tags, Russian translations of words, and the position of a word in the sentence. The search can be narrowed down to a subset of texts, selected e.g. by genre. Morphological annotation of words was carried out automatically and was not checked manually; the authors of the corpus take no responsibility for its correctness. The table below outlines the main characteristics of the corpus. More detailed information about the texts and the tagsets used in the corpus can be found further down the page.
|Size||8.25 million words|
|Texts||contemporary press: 81.2%; 20th-century fiction: 8.7%; Quran and Bible: 6.4%; other: 3.7%|
A language corpus is a collection of texts in that language which has been enriched with additional linguistic information, called annotation, and, preferably, equipped with a search engine. Here you will find a short list of frequently asked questions about the Adyghe corpus.
— Who needs corpora?
First of all, corpora are used by linguists. The search engine and the annotation of a corpus are designed in such a way that you can make linguistic queries such as “find all nouns in the genitive case” or “find all forms of the word цӏыф followed by a verb”. Apart from linguists, a corpus can be a useful tool for language teachers, language learners, and even native speakers.
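To give a rough idea of what a query like “find all forms of the word цӏыф followed by a verb” means, here is a minimal Python sketch. It is not the corpus search engine: the token layout (dictionaries with `lemma` and `pos` keys), the tag names `N`, `V`, `ADJ`, the function name, and the placeholder lemmas `lemma_a`, `lemma_b`, `lemma_c` are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical token layout: every token carries a lemma and a part-of-speech tag.
Token = Dict[str, str]
Sentence = List[Token]


def lemma_followed_by_pos(sentences: List[Sentence], lemma: str, pos: str) -> List[Sentence]:
    """Return the sentences in which a form of `lemma` is immediately followed by a `pos` word."""
    hits = []
    for sentence in sentences:
        # Walk over adjacent token pairs (current token, next token).
        for current, following in zip(sentence, sentence[1:]):
            if current["lemma"] == lemma and following["pos"] == pos:
                hits.append(sentence)
                break  # one match is enough to report the sentence
    return hits


# Toy data with placeholder lemmas; only цӏыф is taken from the text above.
toy_corpus = [
    [{"lemma": "цӏыф", "pos": "N"}, {"lemma": "lemma_a", "pos": "V"}],
    [{"lemma": "цӏыф", "pos": "N"}, {"lemma": "lemma_b", "pos": "ADJ"}],
    [{"lemma": "lemma_c", "pos": "N"}, {"lemma": "lemma_a", "pos": "V"}],
]

query_hits = lemma_followed_by_pos(toy_corpus, "цӏыф", "V")
```

The query matches on the lemma and not on the surface form, which is why any inflected form of the word is found as long as it has been lemmatized.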
— Can I use the corpus as a library?
No, this corpus is not designed for that. When you work with a corpus, you make a query, i.e. search for a particular word, phrase or construction, and get back all sentences that contain what you searched for. By default, the sentences are shown in random order. You can expand the context of each sentence you get, i.e. look at its neighboring sentences. However, you may do so only a limited number of times for each sentence. Therefore, it is impossible to read an entire text in the corpus. This restriction exists to protect copyright.
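The behavior described above (hits in random order, context expansion capped per sentence) can be pictured with the following Python sketch. It only illustrates the idea and says nothing about how the site is actually implemented; the class `Hit`, the function `search`, and the limit of two expansions are hypothetical, since the real limit is not stated here.

```python
import random
from typing import List, Optional


class Hit:
    """One search hit: a sentence plus a window of visible neighbors."""

    def __init__(self, text: List[str], index: int, max_expansions: int = 2):
        self.text = text                    # all sentences of the source text
        self.left = self.right = index      # visible window, initially the hit only
        self.remaining = max_expansions     # assumed limit, not the real one

    def expand(self) -> bool:
        """Reveal one more sentence on each side, if the limit allows it."""
        if self.remaining == 0:
            return False
        self.remaining -= 1
        self.left = max(0, self.left - 1)
        self.right = min(len(self.text) - 1, self.right + 1)
        return True

    def visible(self) -> List[str]:
        return self.text[self.left:self.right + 1]


def search(texts: List[List[str]], word: str, seed: Optional[int] = None) -> List[Hit]:
    """Collect sentences containing `word` and return them in random order."""
    hits = [
        Hit(text, i)
        for text in texts
        for i, sentence in enumerate(text)
        if word in sentence.split()
    ]
    random.Random(seed).shuffle(hits)
    return hits


# Toy text of nine placeholder sentences; the fifth one contains the search word.
toy_text = ["s0", "s1", "s2", "s3", "s4 target", "s5", "s6", "s7", "s8"]
hit = search([toy_text], "target")[0]
expansion_results = [hit.expand() for _ in range(3)]
```

With a limit of two expansions, the third call to `expand` is refused, so at most five of the nine sentences ever become visible and the full text cannot be reconstructed from a single hit.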
— Can I use the corpus as a dictionary?
Each Adyghe word in the corpus has a Russian translation (no English translations are available at the moment). However, the translations are only provided as auxiliary information for users who do not speak Adyghe. They are kept short and simple by design: they do not list all senses and do not provide usage examples, as real dictionaries do. If you want to know how to translate a word, the right way to do so is to consult a dictionary.
— What is morphological annotation and how do you get it?
The corpora located here are lemmatized and morphologically annotated. Lemmatization means that each word in the texts is annotated with its lemma, i.e. its dictionary (citation) form. Morphological annotation means that each word is annotated for its grammatical features, such as part of speech, number, case, tense, etc. Since the corpus in question is too large for manual annotation to be feasible, it was annotated automatically with a program called a morphological analyzer. The analyzer uses a manually compiled grammatical dictionary and a formalized description of Adyghe inflection. Unfortunately, automatic annotation means that, first, out-of-vocabulary words are not annotated, and, second, some words receive several alternative analyses and therefore remain ambiguous.
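The following Python sketch shows, in a very simplified form, how a dictionary-based analyzer ends up with both unannotated and ambiguous words. It is not the analyzer used for this corpus. The stems, suffixes, and tags are invented placeholders and not real Adyghe; the names `STEMS`, `SUFFIXES`, `Analysis`, and `analyze` are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class Analysis(NamedTuple):
    lemma: str
    tags: Tuple[str, ...]


# Invented toy "grammatical dictionary": stem -> part of speech (NOT real Adyghe).
STEMS: Dict[str, str] = {"kat": "N", "kata": "V"}

# Invented toy "description of inflection": part of speech -> suffix -> tags.
SUFFIXES: Dict[str, Dict[str, Tuple[str, ...]]] = {
    "N": {"": ("sg",), "a": ("pl",)},
    "V": {"": ("imp",), "s": ("pst",)},
}


def analyze(word: str) -> List[Analysis]:
    """Return every stem + suffix split of `word` that the toy grammar licenses."""
    analyses = []
    for stem, pos in STEMS.items():
        if not word.startswith(stem):
            continue
        suffix = word[len(stem):]
        tags = SUFFIXES[pos].get(suffix)
        if tags is not None:
            analyses.append(Analysis(lemma=stem, tags=(pos,) + tags))
    return analyses


ambiguous = analyze("kata")   # noun stem "kat" + "a", or bare verb stem "kata"
unknown = analyze("zzz")      # no stem in the dictionary matches
```

Here `analyze("kata")` yields two analyses because the toy grammar licenses two different splits, while `analyze("zzz")` yields an empty list because no dictionary stem matches. Without a disambiguation step, both outcomes are passed on to the annotated text as they are.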
The corpus comprises the following texts:
|“Adyghe Maq” newspaper||Articles (2009–2017)|
|“Argumenty i fakty” newspaper||Articles (2011–2017)|
|“NatPress” news agency||Articles (2007–2011)|
|Wikipedia in Adyghe||Articles|
|Iskhak Mashbash (Мэшбащӏэ Исхьэкъ)||Адыгэхэр|
|Iskhak Mashbash (Мэшбащӏэ Исхьэкъ)||Айщэт|
|Iskhak Mashbash (Мэшбащӏэ Исхьэкъ)||Джасус|
|Iskhak Mashbash (Мэшбащӏэ Исхьэкъ)||Къокӏыпӏэмрэ Къухьэпӏэмрэ|
|Iskhak Mashbash (Мэшбащӏэ Исхьэкъ)||Мэшбащӏэ Исхьэкъ и усэхэр|
|Iskhak Mashbash (Мэшбащӏэ Исхьэкъ)||Рафыгъэхэр|
|Iskhak Mashbash (Мэшбащӏэ Исхьэкъ)||Хэхэсхэр|
|Iskhak Mashbash (Мэшбащӏэ Исхьэкъ)||Чӏыгу-огу зэнэсым сыда щыӏэр?|
|Iskhak Mashbash (Мэшбащӏэ Исхьэкъ)||Тыжьын тас|
|Iskhak Mashbash (Мэшбащӏэ Исхьэкъ)||Литературэр – сищыӏэныгъ. Статьяхэр, зэӏукӏэхэм къащиӏуагъэхэр, зэдэгущыӏэгъухэр|
|Nalbiy Kuek (Куекъо Налбый)||Абадзахэмэ ян|
|Wilhelm Busch||Jesus — unser Schicksal|
|Цуекъо Нэфсэт||Къэзгъэзэжьыгъэ налмэс-налкъутэхэр. Тыркуем щыпсэурэ адыгэхэм ялъэпкъ ӏушыныгъэ щыщхэр|
|Цуекъо Нэфсэт||Адыгэ ӏорӏуатэхэр. Я 3–4-рэ томхэр|
If you have questions, would like to propose collaboration, or have noticed an error in the corpus (other than ambiguous analyses, which are not corrected manually), please contact Yury Lander.