AI Engineer — Model Training & AI Exploration
About The Project
We are building a comprehensive Quran Recitation Learning Platform — a production system that helps users practice and improve their Quran recitation using real-time AI-powered speech recognition, Tajweed rule analysis, and personalized audio feedback. The platform consists of a React Native mobile app, a FastAPI backend, and multiple GPU-accelerated microservices.
Our AI pipeline currently processes thousands of audio recordings, combining ASR (Automatic Speech Recognition), Tajweed analysis, pronunciation validation, and TTS (Text-to-Speech) feedback generation — all running as containerized gRPC microservices with CUDA acceleration.
Role Overview
We are looking for an AI Engineer to own and advance the model training pipeline and explore new AI approaches to improve our Quran recitation system. You will work with production ASR models and Tajweed analysis — improving accuracy, reducing latency, and expanding capabilities.
This is a hands-on role focused on fine-tuning, evaluation, scoring improvement, and AI R&D — not just API integration. You will be the primary person responsible for improving the AI models and the scoring system.
What You'll Do
Scoring Improvement
- Improve the methodology behind Tajweed scoring and word-error calculation
- Build a test-harness script for automated scoring evaluation
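As a sketch of the kind of evaluation harness this involves — a minimal word error rate (WER) computation via Levenshtein alignment over words. This is an illustrative baseline, not code from the current pipeline; in practice a library such as `jiwer` would typically be used:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

The same dynamic program over characters instead of words yields CER, the other core metric listed below.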
Model Training & Fine-Tuning
- Fine-tune ASR models for Quranic Arabic using NVIDIA NeMo (FastConformer Hybrid RNNT/CTC architecture)
- Train and optimize custom models for Tajweed rule detection (currently Whisper-based)
- Train pronunciation validation models using Wav2Vec2 for harakat (diacritics) error detection
- Build and maintain training data pipelines — data collection, cleaning, augmentation, and quality control
- Develop evaluation harnesses with automated metrics (WER, CER, Tajweed accuracy, speaker similarity)
- Manage experiment tracking (MLflow / Weights & Biases) and model versioning
AI Exploration & R&D
- Research and prototype new architectures for Quranic Arabic ASR (conformer variants, whisper fine-tuning, custom tokenizers)
- Explore on-device / edge deployment of lightweight ASR models for mobile inference
- Experiment with LLM-based approaches for contextual recitation feedback and error explanation
- Benchmark alternative models (e.g., Whisper large-v3, SeamlessM4T, custom conformer) against current pipeline
- Research voice activity detection (VAD) and audio segmentation optimized for Quranic recitation patterns
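To make the VAD research direction concrete, here is a toy energy-threshold voice activity detector over 16 kHz PCM frames. This is purely illustrative (real candidates would be model-based, e.g. Silero VAD, tuned to elongation and pause patterns in recitation); the function name and threshold are assumptions, not part of the current system:

```python
def simple_vad(samples: list[float], frame_len: int = 320, threshold: float = 0.01) -> list[bool]:
    """Flag each 20 ms frame (320 samples at 16 kHz) as speech when its
    RMS energy exceeds a fixed threshold. A toy baseline only."""
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = (sum(x * x for x in frame) / frame_len) ** 0.5
        flags.append(rms > threshold)
    return flags
```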
Current System You'll Improve
Our AI Pipeline Today
Mobile App (React Native)
↓ Audio (WAV 16kHz)
Backend (FastAPI + Socket.IO)
↓ gRPC
├── QuranASRNemo (port 50051) -- NeMo FastConformer, streaming + offline
├── QuranASRTajweed (port 50053) -- Whisper-based Tajweed rule detection
├── QuranASRWav2Vec2 (port 50054) -- Raw pronunciation validation
└── QuranFeedback (port 50052) -- Coqui XTTS v2 TTS with voice cloning ## Disabled for now
↓
Weighted Scoring → Accuracy + Tajweed Violations + Pronunciation Errors ## Needs improvement
↓
Audio Feedback (TTS) + Text Feedback → Mobile App ## Disabled for now
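The weighted-scoring step in the pipeline above can be sketched roughly as follows. The 45/25/15/15 weights mirror the current heuristic split mentioned under known areas for improvement; the component names (in particular the fourth, "fluency") are illustrative guesses, not the codebase's actual identifiers:

```python
# Heuristic 45/25/15/15 split; not empirically calibrated (a known issue).
WEIGHTS = {
    "asr_accuracy": 0.45,    # NeMo FastConformer word-level accuracy
    "tajweed": 0.25,         # Whisper-based Tajweed rule compliance
    "pronunciation": 0.15,   # Wav2Vec2 harakat validation
    "fluency": 0.15,         # hypothetical fourth signal
}

def recitation_score(components: dict[str, float]) -> float:
    """Combine per-model scores (each in [0, 1]) into one weighted score."""
    return sum(WEIGHTS[name] * components.get(name, 0.0) for name in WEIGHTS)
```

Calibrating these weights against human-rated recitations would be one of the first scoring-improvement tasks.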
Known Areas For Improvement You'd Tackle
- Hardcoded confidence scores (currently fixed at 0.9 regardless of actual model output)
- GPU inference serialization bottleneck (single lock, no batching)
- No model versioning or experiment tracking infrastructure
- Scoring thresholds lack empirical calibration (current heuristic: 45/25/15/15 split)
- TTS voice cloning path bug (hardcoded speaker reference)
- No training data pipeline or data quality tooling exists yet
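On the hardcoded-confidence issue above, one common replacement is deriving utterance confidence from the model's own per-token log-probabilities rather than returning a fixed 0.9. A minimal sketch under that assumption (the function name is illustrative):

```python
import math

def utterance_confidence(token_logprobs: list[float]) -> float:
    """Geometric mean of per-token probabilities, computed in log space.
    Replaces a hardcoded constant with a value tied to model output."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))
```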
Notes
Model training and fine-tuning are not the primary focus for now, but they are welcome if you want to pursue them.