
Search by job, company or skills
Company Description
Straitpoint is an AI-driven business and technology company dedicated to designing, building, and operating modern digital systems. The company develops proprietary platforms and partners with organizations to address critical business and technology challenges. In this project, we are creating an agentic AI-powered children's storytelling app.
About the Role
Quality at an agentic AI startup is not just about whether buttons work. It is about whether AI behaves.
As our Senior QA / SDET, you will own the entire quality engineering function for a platform where the output is dynamically generated stories, real-time audio, and AI-driven interactions experienced by children ages 26.
You will build intelligent test systems that evaluate not just code correctness, but AI output quality, safety, and consistency.
This role is for someone who thinks like an engineer, tests like a detective, and embraces AI as both the product being tested and the tool used for testing.
Key Responsibilities
Test Architecture & Automation
Design and own the end-to-end test automation strategy across mobile (Flutter / iOS), backend services, and AI pipelines
Build robust, maintainable test frameworks from the ground up using modern SDET practices
Develop automated regression, integration, and performance test suites integrated into CI/CD pipelines
Maintain test environments that mirror production, including on-device AI model behavior
AI & LLM Quality Engineering
Design evaluation frameworks for LLM-generated story content, assessing coherence, age-appropriateness, safety, and engagement quality
Build automated pipelines that run LLM outputs through scoring rubrics, safety classifiers, and regression benchmarks
Detect and track AI-specific failure modes such as hallucinations, prompt injection vulnerabilities, inconsistent persona behavior, and content policy violations
Collaborate with AI engineers to define measurable quality thresholds for model updates and prompt changes
Test agentic workflows end-to-end, including multi-step LLM chains, tool-calling sequences, and cross-session context management
Mobile & Platform QA
Own test coverage for the iPad-first Flutter application across device types, OS versions, and edge cases
Test on-device AI model integration (Apple Intelligence / Core ML), including offline scenarios, memory constraints, and model fallback behavior
Validate audio playback, TTS synchronization, and interactive media features across varying network conditions
Ensure builds meet App Store submission and review standards
Child Safety & Compliance Testing
Develop specialized test protocols for child-facing AI content, including adversarial prompt testing and edge-case content scenarios
Validate COPPA and GDPR-K compliance in data handling, consent flows, and parental management features
Conduct structured red-teaming exercises targeting child safety guardrails in the AI content pipeline
Quality Culture & Process
Define and track quality KPIs across engineering, including defect escape rate, test coverage, eval pass rates, and incident metrics
Embed quality gates into CI/CD workflows and enforce them as deployment blockers
Mentor junior engineers on testing best practices and AI-aware quality thinking
Produce clear, well-documented bug reports and drive issues to resolution with product and engineering
Requirements
Must Have
5+ years in QA engineering or SDET roles, with at least 2 years in a senior or lead capacity
Strong programming skills in Python, Dart, TypeScript, or similar. You write code, not just test scripts
Proven experience building test automation frameworks from scratch
Hands-on experience with mobile testing, iOS or Flutter preferred
Solid understanding of CI/CD integration and writing reliable automated tests
Active use of AI tools in daily engineering workflows
Strong debugging and root cause analysis skills across complex systems
Nice to Have
Direct experience writing LLM evals or working with evaluation frameworks (LangSmith, RAGAS, PromptFoo, or custom-built)
Familiarity with AI safety concepts such as content moderation, output filtering, prompt injection, and jailbreak testing
Experience testing on-device ML models or Core ML integrations on Apple platforms
Background in testing audio or real-time interactive applications
Knowledge of child safety standards in digital products, including COPPA and GDPR-K
Experience in early-stage startups where quality infrastructure was built from zero
Technical Stack You'll Work With
Mobile: Flutter, iOS (Swift / SwiftUI), TestFlight
AI / LLM: Anthropic Claude, Apple Intelligence (on-device), Core ML
Backend: Node.js / TypeScript services, REST and streaming APIs
CI/CD: GitHub Actions
Cloud: AWS
Testing Tools: Your recommendations will help shape our stack
The Mindset We're Hiring For
You treat AI outputs as first-class test subjects, not black boxes
You are proactive and anticipate failure modes before they ship
You are comfortable owning quality end-to-end in a fast-moving environment
You recognize the responsibility of building for toddlers and young children
You use AI to write better tests and automate repetitive QA work without cutting corners
You document what you build so the team can sustain it
What Success Looks Like in 90 Days
By the end of your first 90 days, you will have:
Audited the existing test coverage
Shipped at least one automated eval pipeline for AI story content
Integrated quality gates into CI/CD
Produced a clear testing roadmap for the next two quarters
Why Join Straitpoint
Location: Sudirman, CBD, Jakarta
Work Model: Full-Time, WFO
Apply via LinkedIn or send your CV to: [Confidential Information]
Job ID: 143147947