About the Company
AI Rudder is at the forefront of AI innovation, developing advanced Large Language Models (LLMs) and AI-powered solutions that enhance how businesses interact with technology. Our Data & Evaluation team plays a critical role in ensuring that our AI models are accurate, reliable, and aligned with real-world use cases.
We are looking for a Data & Evaluation LLM Intern to join our AI/Data & Evaluation team. This role is ideal for students or recent graduates who are curious about AI, enjoy working with language and data, and want hands-on experience evaluating and improving cutting-edge LLMs.
Responsibilities
- Support evaluation workflows for Large Language Models (LLMs) across chat, voice, and task-based use cases.
- Review model outputs for quality, factuality, instruction-following, safety, tone, and user experience.
- Label and annotate datasets according to internal guidelines, maintaining high consistency.
- Assist in prompt testing, A/B comparisons, and regression checks after model updates.
- Identify recurring failure patterns and summarize findings for senior team members.
- Prepare reports, trackers, and dashboards for quality metrics and project progress.
- Collaborate with Data, Product, QA, and Operations teams on improvement cycles.
- Help organize datasets for training, fine-tuning, and benchmarking.
Requirements
- Currently pursuing, or recently graduated with, a Bachelor's degree in Data Science, Linguistics, Computer Science, Cognitive Science, Statistics, Mathematics, or a related field
- Strong logical and analytical thinking, with the ability to work independently and responsibly
- High sensitivity to data, excellent writing and analytical skills; comfortable working with large volumes of text
- Proficient in English (speaking, reading, writing); proficiency in Indonesian is a plus
- Familiarity with AI / ChatGPT / LLM tools and prompt behavior
- Basic knowledge of Python, SQL, JSON, or data preprocessing
- Experience using Jira, Notion, or project management tools
- Familiarity with model evaluation approaches (human evaluation vs. automated evaluation)
- Background in linguistics or AI-related coursework
Additional Information
Employment Type: Internship
Duration: Minimum 3 months
Work Days: At least 4 days per week
Location: Indonesia
Work Environment: Fast-paced AI & technology startup
Learning Opportunity: Hands-on experience in LLM evaluation, data annotation, and AI model improvement