Highfive

Founding AI Engineer

5-10 Years
  • Posted 18 hours ago

Job Description

Our Client - A US-based startup that provides the neutral routing layer for voice AI: one API for every provider (STT/TTS/V2V), with requests routed by language, latency, cost, and quality. Built for multilingual voice agents.

Responsibilities

  • Own and ship Company's Voice Reliability Index (VRI), our public, weekly updated benchmark across voice providers (OpenAI, Google, ElevenLabs...) measuring p50/p95 latency, successful-turn rate, language-specific WER/MOS, and cost per minute.
  • Build the recommendation engine that powers Company's core value prop: when a customer onboards (a YC voice-agent dev, a contact-center vendor, a multilingual app), we answer: for your specific use case, the optimal stack is X with system prompt Y and tool-call config Z, backed by a benchmark-based guarantee.
  • Lead TTS voice-identity preservation across providers. Decide build-vs-partner, staff and ship.
  • Define the canonical metric system (routed turns, success rate, latency budget) used across product, deck, dashboard, and investor updates.
  • Expand benchmarks to multilingual coverage (Indonesian, Vietnamese, Cantonese, Thai, Uzbek).
  • Work directly with the CEO and the 4-person founding team, with daily async collaboration with engineering in Central Asia.
  • Ship public artifacts weekly: leaderboard updates, methodology posts, regression cards, incident postmortems. The benchmark is also the marketing engine.
  • Own the public face of Company's eval credibility on Twitter, GitHub, Hugging Face. Be quotable.

Qualifications

  • 5-8+ years building production AI/ML systems, ideally as a founding/early engineer at an AI-native or voice/speech startup.
  • Direct experience with real-time voice agent pipelines (STT -> LLM -> TTS over WebRTC / WebSocket).
  • Has built or shipped voice agents in production, not just experimented. Pipecat / LiveKit familiarity is a strong plus.
  • Strong evaluation / benchmarking instincts: has shipped public benchmarks, contributed to leaderboards (Hugging Face Open ASR, TTS Arena, LMSYS, SEA-HELM), or built internal eval pipelines that drove product decisions.
  • Familiar with RAGAS / DeepEval / Promptfoo / TruLens.
  • Multi-LLM workflow experience: has built pipelines that route subtasks to different models (e.g. GPT-4o for vision, Gemini for spatial, Claude for reasoning). Understands the "right model per use case" pattern.
  • Statistical and methodological rigor: percentiles not averages, reproducibility, version control for datasets, environmental robustness.
  • Open-source / public-shipping track record: GitHub repos with stars, technical blog posts, public benchmarks, conference talks, or active Twitter/X presence in the AI/voice community.
  • AI-native operator: daily user of Claude Code, Cursor, GPT, agent workflows. Treats benchmarking + research as engineering problems automatable with agents.

What you bring

Required skills

  • Tech stack: Python, TypeScript, ML evaluation, statistical methodology, voice AI pipelines (STT/TTS/LLM), WebRTC/WebSocket, LiveKit/Pipecat, OpenAI/Anthropic/Google APIs, Hugging Face, GitHub Actions, Docker.
  • Industry: voice AI infrastructure, AI-native B2B, dev tools, speech tech, conversational AI.
  • Language: fluent English (written + spoken). Native Bahasa Indonesia is a plus for curating benchmark datasets in regional languages.
  • Comfortable being a public technical voice for the company on Twitter, GitHub, Hugging Face, Discord.

Preferred skills

  • Pipecat, LiveKit, Daily.co, RAGAS, DeepEval, Promptfoo, TruLens, Hugging Face

Work arrangement

  • Indonesia-based, with 3 hours of daily overlap with US Pacific (early morning local time)
  • Fully remote

Job ID: 148519839