Engineer - Cloud Infrastructure Operation

Lintasarta

Indonesia

3-5 Years

Job Description

About Lintasarta:

As Indonesia's leading Information and Communication Technology (ICT) provider, Lintasarta is at the forefront of accelerating digital transformation for businesses across the nation. We empower enterprises with innovative, high-impact technology solutions. At the core of everything we do is our commitment to our fundamental values: Innovation, Collaboration, Agility, Resilience, and Ethics (ICARE). By joining our team, you will be part of a dynamic environment that pushes technological boundaries and builds mission-critical digital infrastructure for Indonesia's future.

The Role:

Are you passionate about building highly reliable, efficient, and scalable production systems? We are looking for a dedicated Site Reliability Engineer (SRE) to act as the vital bridge between our development and operations teams. In this critical role, you will focus heavily on automation, comprehensive monitoring, and swift incident response to ensure our mission-critical services run flawlessly and continuously adapt to growing demands.

What You Will Do:

Ensure Reliability & Performance: Architect and maintain high availability (HA) and disaster recovery (DR) solutions while proactively identifying and resolving system bottlenecks (CPU, memory, latency, and throughput).
Drive Monitoring & Observability: Build and manage comprehensive monitoring dashboards (using the USE and RED methods) and define Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs) for critical services.
Manage Incidents & Changes: Participate in on-call rotations for rapid incident response, lead blameless post-mortems to propose long-term systemic improvements, and manage system changes using Infrastructure as Code (IaC) principles.
Automate & Plan Capacity: Automate deployment, scaling, and recovery processes, and conduct strategic capacity planning based on traffic growth trends.
Guarantee Security & Compliance: Ensure strict adherence to security standards, including environment isolation, least privilege access, reliable backups, and meticulous patch management.

What You Bring to the Table:

Education & Experience: Minimum Bachelor's Degree (S1) in Informatics Engineering, Information Systems, Computer Science, or a related field, coupled with 3–5 years of proven experience as an SRE, DevOps Engineer, System Engineer, or Backend Engineer.
System & Infrastructure Mastery: Deep expertise in Linux/Unix environments (debugging, performance tuning, shell scripting), familiarity with cloud platforms (specifically AWS: EC2, EKS, RDS, S3), and a strong grasp of containerization and orchestration (Docker, Kubernetes).
Programming & Scripting: Proficiency in at least one programming language (Go, Python, or Ruby) and robust scripting abilities (Bash) for operational task automation.
Monitoring & Observability Tools: Hands-on experience with Prometheus, Grafana, Cortex, and either the ELK Stack (Elasticsearch, Logstash, Kibana) or Loki.
Automation & CI/CD: Strong expertise in IaC tools like Terraform or Terragrunt (Ansible is a plus), along with CI/CD pipelines such as Jenkins, GitLab CI, ArgoCD, or GitHub Actions.
Incident Management Culture: Experience with on-call alerting tools (Alertmanager, PagerDuty), a blameless post-mortem culture, and a solid foundation in problem-solving. Bonus points if you have experience with Chaos Engineering (Chaos Mesh, Gremlin, Chaos Monkey).
Exceptional Soft Skills: Strong analytical thinking, excellent communication within Agile/Scrum teams, and the ability to document runbooks and incident reports meticulously.