THE ESSENCE OF MODELING AND SEGMENTATION OF THE KARAKALPAK AND UZBEK LANGUAGES

Authors

  • Munisa Xudoyberganova, Master’s Student, National University of Uzbekistan

DOI:

https://doi.org/10.62536/sjehss.2025.v3.i8.pp1-6

Abstract

This article examines the modeling and segmentation of the Karakalpak and Uzbek languages within the framework of computational linguistics and Natural Language Processing (NLP). Because both languages share an agglutinative morphological structure, effective computational processing requires detailed morphological, syntactic, and semantic analysis. The study emphasizes accurate segmentation at the sentence, word, and morpheme levels as a foundational step for NLP applications such as machine translation and morphological parsing. It also addresses the underrepresentation of Karakalpak in digital linguistic resources and advocates the creation of structured parallel corpora.
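The three segmentation levels named in the abstract can be illustrated with a minimal Python sketch. It is not the method used in the article: the function names, the tiny Uzbek suffix list, and the greedy right-to-left suffix stripping are all assumptions chosen for illustration. A realistic system would need a lexicon and phonological rules, and a Karakalpak suffix inventory would have to be supplied in the same way.

```python
import re

# Hypothetical, deliberately tiny suffix inventory (illustration only).
# Real Uzbek or Karakalpak morphology needs a lexicon and phonological rules.
UZBEK_SUFFIXES = ["lar", "imiz", "ning", "da", "ni", "ga"]


def split_sentences(text):
    """Sentence level: split after ., ! or ? followed by whitespace.

    Naive on purpose: abbreviations and initials are not handled.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def split_words(sentence):
    """Word level: runs of letters, allowing word-internal apostrophes."""
    return re.findall(r"[^\W\d_]+(?:['’ʻ][^\W\d_]+)*", sentence)


def split_morphemes(word, suffixes=UZBEK_SUFFIXES, min_stem=3):
    """Morpheme level: greedily strip known suffixes from the right.

    Stops when no listed suffix matches or the stem would become
    shorter than min_stem characters.
    """
    stem = word.lower()
    found = []
    ordered = sorted(suffixes, key=len, reverse=True)  # try longest first
    stripped = True
    while stripped:
        stripped = False
        for suffix in ordered:
            if stem.endswith(suffix) and len(stem) - len(suffix) >= min_stem:
                stem = stem[: -len(suffix)]
                found.insert(0, suffix)
                stripped = True
                break
    return [stem] + found


if __name__ == "__main__":
    text = "Biz maktabga bordik. Kitoblarimizda rasmlar bor."
    for sentence in split_sentences(text):
        for word in split_words(sentence):
            print(word, split_morphemes(word))
```

For the Uzbek word "kitoblarimizda" ("in our books"), the sketch yields kitob + lar + imiz + da: the stem "book" followed by the plural, first-person plural possessive, and locative suffixes. Greedy stripping over-segments any stem that happens to end in a listed suffix, which is one reason morphological analyzers for agglutinative languages usually rely on a stem lexicon.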

Published

2025-08-11

Section

Articles

How to Cite

THE ESSENCE OF MODELING AND SEGMENTATION OF THE KARAKALPAK AND UZBEK LANGUAGES. (2025). Sciental Journal of Education Humanities and Social Sciences, 3(8), 1-6. https://doi.org/10.62536/sjehss.2025.v3.i8.pp1-6