Open Source Data Initiative

Train Tomorrow's AI
Build the Dataset
Shape Language AI
Power the Future of Tamil & Sinhala AI

Contribute Tamil, Sinhala & English text data to train Mozhii.AI — a multilingual AI built for Sri Lanka.

40+ Contributors
8+ Datasets
3 Languages

Building Language AI for South Asia

Mozhii.AI is an open-source language model initiative focused on Tamil, Sinhala, and English. We're building high-quality datasets through community contributions to train AI models that truly understand our languages and cultures.

🌍

Community-Driven

Built by the community, for the community. Every submission enriches our shared language heritage.

🔒

Privacy-First

Your personal information is never included in the dataset. All submissions are reviewed before inclusion.

🤖

AI-Ready Format

Approved data is automatically formatted into JSONL (JSON Lines), a widely used format for AI training data.
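As a rough illustration of what JSONL output can look like, the sketch below writes a few approved submissions as one JSON object per line. The field names (`language`, `text`) and the `to_jsonl` helper are assumptions made for this example, not Mozhii.AI's actual schema or pipeline.

```python
import json


def to_jsonl(records):
    """Serialize an iterable of dicts to a JSONL string (one JSON object per line)."""
    # ensure_ascii=False keeps Tamil and Sinhala characters readable
    # instead of turning them into \uXXXX escapes.
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


# Hypothetical approved submissions; the field names are assumed,
# not taken from the project's real dataset format.
approved = [
    {"language": "ta", "text": "வணக்கம்"},
    {"language": "si", "text": "ආයුබෝවන්"},
    {"language": "en", "text": "Hello"},
]

jsonl = to_jsonl(approved)
```

Each line of `jsonl` can be parsed on its own with `json.loads`, which is what makes the format convenient for streaming large training sets. The example records carry no name or email, in keeping with the privacy statement above.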

Why Your Data Matters

01. Underrepresented Languages

Tamil and Sinhala are severely underrepresented in global AI datasets. Your contribution helps bridge this gap and ensures our languages are not left behind in the AI revolution.

02. Cultural Preservation

Language carries culture. By building AI that understands our languages, we preserve and promote the rich cultural heritage embedded in Tamil and Sinhala.

03. Better Technology for All

From translation to education, healthcare to government — multilingual AI improves services for every citizen of Sri Lanka.

04. Your Impact

Every text you submit directly improves Mozhii.AI's ability to understand, translate, and generate content in your language. You're shaping the future of AI in Sri Lanka.

Simple, Transparent, Impactful

01. Prepare Your Data

Gather text data in Tamil, Sinhala, or English. Accepted formats: .txt, .pdf, .docx, .csv, or images.

02. Submit Your Contribution

Fill in the form below, paste your text or upload files, and submit your contribution.

03. Our Team Reviews It

Our team reviews every submission for quality and compliance before adding it to the dataset.

04. Powers Mozhii.AI

Approved data is used to train and improve Mozhii.AI, making it smarter and more accurate.

Submit Your Contribution

Complete the steps below to share your valuable language data with Mozhii.AI.

1. Language: Select your language
2. Your Info: Name & email
3. Upload: Text, docs, or photos
4. Submit: Confirm & send

Choose Language

Which language is your data in?

Your Details

Let us know who's contributing

Upload Your Data

Choose a data format below

Drop Documents Here

or browse files

.txt, .pdf, .docx, .csv — Max 20MB — Up to 5 files

Drop Images Here

or browse images

.png, .jpg, .jpeg, .gif, .webp — Max 20MB
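The upload limits listed above can be checked before submitting. The following is a minimal sketch of such a check, not the site's actual validation logic: the `validate_upload` helper is hypothetical, and it assumes that 20MB is a per-file cap of 20 * 1024 * 1024 bytes and that the five-file limit applies to documents only, since no count is stated for images.

```python
from pathlib import Path

# Limits taken from the upload hints above. Reading 20MB as a per-file cap
# of 20 * 1024 * 1024 bytes is an assumption.
MAX_BYTES = 20 * 1024 * 1024
MAX_DOCUMENTS = 5
DOCUMENT_EXTS = {".txt", ".pdf", ".docx", ".csv"}
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def validate_upload(files):
    """Check (filename, size_in_bytes) pairs against the stated limits.

    Returns a list of error messages; an empty list means the batch passes.
    """
    errors = []
    documents = 0
    for name, size in files:
        ext = Path(name).suffix.lower()  # case-insensitive extension match
        if ext in DOCUMENT_EXTS:
            documents += 1
        elif ext not in IMAGE_EXTS:
            errors.append(f"{name}: unsupported file type")
        if size > MAX_BYTES:
            errors.append(f"{name}: larger than 20MB")
    if documents > MAX_DOCUMENTS:
        errors.append(f"too many documents: {documents} (limit {MAX_DOCUMENTS})")
    return errors
```

For example, `validate_upload([("notes.txt", 1024), ("scan.PNG", 2048)])` returns an empty list, while a file named `app.exe` would be reported as an unsupported type.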

Tell Us Something

Feedback, thoughts, problems — we'd love to hear from you!