Founding AI Engineer

Highfive

Indonesia

5-10 Years

Posted 18 hours ago

Job Description

Our client is a US-based startup building the neutral routing layer for voice AI: one API for every provider (STT/TTS/V2V), routed by language, latency, cost, and quality. Built for multilingual voice agents.

Responsibilities

Own and ship Company's Voice Reliability Index (VRI), our public, weekly updated benchmark across voice providers (OpenAI, Google, ElevenLabs, ...) measuring p50/p95 latency, successful-turn rate, language-specific WER/MOS, and cost per minute.
Build the recommendation engine that powers Company's core value prop: when a customer onboards (a YC voice-agent dev, a contact-center vendor, a multilingual app), we answer "for your specific use case, the optimal stack is X with system prompt Y and tool-call config Z", with a benchmark-backed guarantee.
Lead TTS voice-identity preservation across providers. Decide build vs. partner, then staff and ship it.
Define the canonical metric system (routed turns, success rate, latency budget) used across product, deck, dashboard, and investor updates.
Expand benchmarks to multilingual coverage (Indonesian, Vietnamese, Cantonese, Thai, Uzbek).
Work directly with the CEO and the 4-person founding team, with daily async collaboration with engineering in Central Asia.
Ship public artifacts weekly: leaderboard updates, methodology posts, regression cards, incident postmortems. The benchmark is also the marketing engine.
Own the public face of Company's eval credibility on Twitter, GitHub, Hugging Face. Be quotable.

Qualifications

5-8+ years building production AI/ML systems, ideally as a founding/early engineer at an AI-native or voice/speech startup.
Direct experience with real-time voice agent pipelines (STT -> LLM -> TTS over WebRTC / WebSocket).
Has built or shipped voice agents in production, not just experimented. Pipecat / LiveKit familiarity is a strong plus.
Strong evaluation / benchmarking instincts: has shipped public benchmarks, contributed to leaderboards (Hugging Face Open ASR, TTS Arena, LMSYS, SEA-HELM), or built internal eval pipelines that drove product decisions.
Familiar with RAGAS / DeepEval / Promptfoo / TruLens.
Multi-LLM workflow experience: has built pipelines that route subtasks to different models (e.g. GPT-4o for vision, Gemini for spatial, Claude for reasoning). Understands the "right model per use case" pattern.
Statistical and methodological rigor: percentiles not averages, reproducibility, version control for datasets, environmental robustness.
Open-source / public-shipping track record: GitHub repos with stars, technical blog posts, public benchmarks, conference talks, or active Twitter/X presence in the AI/voice community.
AI-native operator: daily user of Claude Code, Cursor, GPT, and agent workflows. Treats benchmarking and research as engineering problems that can be automated with agents.

What you bring

Required skills

Tech stack: Python, TypeScript, ML evaluation, statistical methodology, voice AI pipelines (STT/TTS/LLM), WebRTC/WebSocket, LiveKit/Pipecat, OpenAI/Anthropic/Google APIs, Hugging Face, GitHub Actions, Docker.
Industry: voice AI infrastructure, AI-native B2B, dev tools, speech tech, conversational AI.
Language: fluent English (written + spoken). Native Bahasa Indonesia is a plus for curating benchmark datasets in regional languages.
Comfortable being a public technical voice for the company on Twitter, GitHub, Hugging Face, Discord.

Preferred skills