AI Engineer - Speech

AJARI.AI

Indonesia

Fresher

Posted 20 hours ago

Job Description

Company Description

AJARI TECHNOLOGIES is a technology company building an AI-powered learning and workforce development platform. Our solutions are designed for governments, institutions, schools, universities, and large organizations to personalize education, upskill their people, and transform learning outcomes at a national level. We work to scale education to meet the diverse needs of our clients, ensuring impactful results and continuous improvement.

About the Role

This is a full-time on-site role for an AI Engineer (NLP). The AI Engineer will be responsible for working on Natural Language Processing (NLP) projects, developing and implementing machine learning solutions, and creating algorithmic models. Daily tasks will include coding, testing, and maintaining software applications related to NLP and LLM-based systems, such as chatbots, RAG pipelines, and agentic workflows. The role involves close collaboration with product managers, backend engineers, and other stakeholders to design, build, and deploy scalable NLP solutions into production.

Responsibilities:

ASR (Speech-to-Text)

Design and develop an end-to-end ASR pipeline: audio preprocessing, decoding, post-processing, and product integration.
Improve ASR accuracy through measurable evaluation (e.g., **WER**) and domain-specific experiments (noise, accents, multilingual speech).
Integrate production components such as **Voice Activity Detection (VAD)** and **speaker diarization** for use cases like call analytics and meeting transcription.

TTS (Text-to-Speech)

Develop / fine-tune TTS models to produce natural, expressive, real-time speech output.
Build a TTS evaluation framework (e.g., **MOS (Mean Opinion Score)**, intelligibility, and latency) and continuously iterate on quality improvements.

Inference Optimization & Production

Optimize ASR/TTS inference for low latency and high throughput: profiling, batching/streaming, memory optimization, and resource utilization.
Accelerate GPU inference and handle deployment.
Establish production-grade practices: monitoring (latency, error rate), rollback strategy, A/B evaluation, and incident readiness.

Must Have:

Strong Python skills and experience building models with **PyTorch/TensorFlow**.
Hands-on experience in ASR (training/fine-tuning/inference) and evaluation metrics (e.g., **WER**).
Experience building AI systems for production (API/service development, containerization, observability, ML CI/CD).
Familiarity with performance engineering (profiling, bottleneck analysis, inference capacity planning).

Nice to Have:

Experience with streaming ASR and/or pipeline components such as VAD and diarization.
Experience optimizing GPU inference.
Experience with open-source and closed-source TTS systems and with quality evaluation (e.g., MOS).
Experience in voice domains: call centers, meetings, voice assistants, telephony, noisy environments.

Tech Stack: