This digital corpus was developed as part of a postdoctoral research project funded by TÜBİTAK (The Scientific and Technological Research Council of Turkey) through its 2219 International Postdoctoral Research Fellowship Program. The project was conducted at the Institute for the Interdisciplinary Study of Language Evolution and the Zurich Center for Linguistics at the University of Zurich, under the academic supervision of Prof. Dr. Paul Widmer and Dr. Dagmar Jung.
The corpus is based on a curated selection of 20 texts written in a Northern dialect of Zazaki (Kırmancki) and serves as the foundation for a structured, linguistically annotated Zaza language resource.
Each text is enriched with detailed linguistic metadata, including:
The annotations span several linguistic domains and subcategories:
Annotation was performed with the digital tool CATMA (Computer Assisted Text Markup and Analysis). Some texts are already fully annotated, while annotation of the others is still in progress. Annotations will continue to be added incrementally.
Annotations are organized into six categories: Morphology, Syntax, Morphosyntax, Semantics, Phrase Structure, and Lexicology. Hovering the mouse over a word displays its annotations. Annotations can be downloaded for each individual text or as a combined dataset. Both words and annotation layers can be searched through the interface, and matching results are highlighted in color for easy identification.
Users can explore the corpus through:
This corpus was developed to serve multiple purposes:
This corpus was created by Assoc. Prof. Dr. İlyas Arslan.
All included texts were previously published by their respective authors, who authorized their use in the corpus.
For academic or other use, please cite the corpus as follows:
Arslan, İ. (2025). Zaza Text Corpus. University of Zurich & Munzur University. https://ilyasarslan62.github.io/zazatextcorpus/