Contribute Tamil, Sinhala & English text data to train Mozhii.AI — a multilingual AI built for Sri Lanka.
Mozhii.AI is an open-source language model initiative focused on Tamil, Sinhala, and English. We're building high-quality datasets through community contributions to train AI models that truly understand our languages and cultures.
Built by the community, for the community. Every submission enriches our shared language heritage.
Your personal information is never included in the dataset. Every submission is reviewed before it is added.
Approved data is automatically formatted into JSONL (JSON Lines), a format widely used for AI training data.
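For contributors curious about what JSONL looks like, here is a minimal sketch in Python. It is not Mozhii.AI's actual pipeline: the `to_jsonl` helper and the `language` and `text` field names are assumptions made up for illustration, since the real schema is not described here. The idea is simply one JSON object per line.

```python
import json


def to_jsonl(records):
    """Serialize an iterable of dicts to a JSONL string: one JSON object per line."""
    # ensure_ascii=False keeps Tamil and Sinhala characters readable
    # instead of turning them into \uXXXX escape sequences.
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


# Hypothetical approved submissions; the field names are illustrative only.
approved = [
    {"language": "ta", "text": "வணக்கம்"},
    {"language": "si", "text": "ආයුබෝවන්"},
    {"language": "en", "text": "Hello"},
]

jsonl = to_jsonl(approved)
print(jsonl, end="")
```

Because `json.dumps` escapes any newline inside a text value, each record stays on a single line, which lets training tools read the file one record at a time.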
Tamil and Sinhala are severely underrepresented in global AI datasets. Your contribution helps bridge this gap and ensures our languages are not left behind in the AI revolution.
Language carries culture. By building AI that understands our languages, we preserve and promote the rich cultural heritage embedded in Tamil and Sinhala.
From translation to education, healthcare to government, multilingual AI can improve services for every citizen of Sri Lanka.
Every text you submit directly improves Mozhii.AI's ability to understand, translate, and generate content in your language. You're shaping the future of AI in Sri Lanka.
Gather text data in Tamil, Sinhala, or English. Accepted formats: .txt, .pdf, .docx, .csv, or images.
Fill in the form below, paste your text or upload files, and submit your contribution.
Our team reviews every submission for quality and compliance before adding it to the dataset.
Approved data is used to train and improve Mozhii.AI, making it smarter and more accurate.
Complete the steps below to share your valuable language data with Mozhii.AI.
Feedback, thoughts, problems — we'd love to hear from you!