About the Company
AI Rudder is at the forefront of AI innovation, developing advanced Large Language Models (LLMs) and AI-powered solutions that enhance how businesses interact with technology. Our Data & Evaluation team plays a critical role in ensuring that our AI models are accurate, reliable, and aligned with real-world use cases.
We are looking for a Data & Evaluation LLM Intern to join our AI/Data & Evaluation team. This role is ideal for students or recent graduates who are curious about AI, enjoy working with language and data, and want hands-on experience evaluating and improving cutting-edge LLMs.
Responsibilities
- Support evaluation workflows for Large Language Models (LLMs) across chat, voice, and task-based use cases.
- Review model outputs for quality, factuality, instruction-following, safety, tone, and user experience.
- Label and annotate datasets according to internal guidelines, maintaining high consistency.
- Assist in prompt testing, A/B comparisons, and regression checks after model updates.
- Identify recurring failure patterns and summarize findings for senior team members.
- Prepare reports, trackers, and dashboards for quality metrics and project progress.
- Collaborate with Data, Product, QA, and Operations teams on improvement cycles.
- Help organize datasets for training, fine-tuning, and benchmarking.
Requirements
- Currently pursuing, or recently graduated with, a Bachelor's degree in Data Science, Linguistics, Computer Science, Cognitive Science, Statistics, Mathematics, or a related field
- Strong logical and analytical thinking, with the ability to work independently and responsibly
- High sensitivity to data, excellent writing and analytical skills; comfortable working with large volumes of text
- Proficient in English (speaking, reading, writing); proficiency in Indonesian is a plus
- Familiarity with AI / ChatGPT / LLM tools and prompt behavior
- Basic knowledge of Python, SQL, JSON, or data preprocessing
- Experience using Jira, Notion, or project management tools
- Familiarity with model evaluation approaches (human evaluation vs. automated evaluation)
- Background in linguistics or AI-related coursework
Additional Information
Employment Type: Internship
Duration: Minimum 3 months
Work Days: At least 4 days per week
Location: Indonesia
Work Environment: Fast-paced AI & technology startup
Learning Opportunity: Hands-on experience in LLM evaluation, data annotation, and AI model improvement