Job Description
Company Overview:
Our organization is a leading innovator in cybersecurity, cloud, and AI solutions, dedicated to developing cutting-edge products and services that address the evolving needs of the technology landscape. We operate in Indonesia, a rapidly developing market where technological advancement is driving ever-growing demand for advanced tech solutions. We are an AI-native company committed to continuous improvement, helping our customers unlock their full revenue potential.
Role Summary
As a Data Engineer, you will play a crucial role in building and managing the data pipelines that are essential for training and fine-tuning our Large Language Models (LLMs), with a specific focus on the Indonesian language. You will be responsible for designing, building, and maintaining a robust and scalable data infrastructure. You will collaborate closely with our team of Data Scientists and Machine Learning Engineers to ensure the availability of high-quality, clean, and structured Indonesian language data for developing accurate and locally relevant AI models.
Key Responsibilities
- Build and Manage Data Pipelines: Design, develop, and maintain ETL (Extract, Transform, Load) processes to collect and process Indonesian text data from various sources, such as databases, APIs, and log files.
- Data Collection and Integration: Gather complex and relevant datasets tailored to business needs, particularly Indonesian text data that covers a wide range of dialects and linguistic styles.
- Data Cleaning and Pre-processing: Clean data to handle inconsistent, duplicate, or corrupted records, and transform raw data into a usable format for training machine learning models.
- Data Architecture: Design and implement an efficient and scalable data architecture, including data warehouses and data lakes, to store and manage large volumes of data.
- Ensure Data Quality: Develop data validation methods and analysis tools to ensure the integrity and accuracy of the data used for model training.
- Team Collaboration: Work closely with Data Scientists to understand their data requirements and provide ready-to-use data for the fine-tuning and evaluation of LLMs.
- Performance Optimization: Monitor and optimize the performance of data pipelines to ensure efficiency and scalability, especially when handling very large volumes of data.
Requirements
Qualifications & Experience:
Education
- Required: Bachelor's degree in Computer Science, Engineering, Information Technology, or a related quantitative field.
- Preferred: Master's degree in Computer Science.
Experience
- Required: Minimum of 3 to 5 years of hands-on experience in a data engineering role, particularly in projects involving big data and machine learning, with a proven track record of designing and implementing data pipelines and architecture, including data ingestion, storage, processing, and delivery.
- Preferred: Experience with Big Data technologies (e.g., Hadoop, Spark). Experience with cloud platforms (e.g., AWS, GCP, Azure) and their associated data services. Familiarity with DevOps/DataOps principles for CI/CD.
Required Skills
- Technical Skills:
  - Programming Languages: High proficiency in programming languages such as Python, SQL, and Scala.
  - Databases: Deep understanding of relational databases (like MySQL, PostgreSQL) and NoSQL databases (like MongoDB).
  - Big Data Tools: Hands-on experience with big data technologies such as Apache Spark, Hadoop, and Kafka.
  - Cloud Computing: Knowledge of cloud platforms like Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure.
  - ETL Tools: Familiarity with ETL and workflow orchestration tools like Apache Airflow, Talend, or Stitch.
- Soft Skills:
  - Strong analytical and problem-solving abilities.
  - Excellent communication and teamwork skills to collaborate effectively with various teams.
  - Ability to work independently in a dynamic environment.
Competencies
- Technical:
  - Architecture Design
  - Business Needs Analysis
  - Data Analysis and Interpretation
  - Infrastructure Design
  - Software Design
  - Solution Architecture
  - System Architecture Design
  - System Configuration Management
  - System Integration
- Leadership:
  - Applied Learning
  - Building Customer Loyalty
  - Business Awareness
  - Collaborating
  - Continuous Improvement
  - Planning & Organizing
  - Quality Orientation