About The Job
VIDA Digital Identity is Indonesia's leading provider of digital identity verification, digital signature, and trust services, serving enterprises and government institutions with high standards of security, compliance, and reliability.
We are seeking an experienced Site Reliability Engineering (SRE) Lead to drive the reliability, scalability, and operational excellence of VIDA's core infrastructure across both data centers and cloud environments.
The ideal candidate will have deep expertise in data center operations, infrastructure reliability, and automation, with strong experience in regulated SaaS environments.
Responsibilities
1. Site Reliability & Infrastructure Management
- Lead the SRE function to maintain high availability and performance across all environments.
- Manage robust, scalable, and secure infrastructure supporting VIDA's digital identity and trust platforms.
- Establish monitoring, alerting, and incident response systems to proactively detect and mitigate service disruptions.
- Drive automation in deployment, scaling, and recovery processes to reduce manual effort.
2. Data Center Operations
- Oversee VIDA's physical and hybrid data center operations, ensuring performance, security, and uptime SLAs.
- Collaborate with network engineers, cloud architects, and system admins to maintain seamless connectivity and integration.
- Establish and maintain Disaster Recovery (DR) and Business Continuity Plans (BCP) aligned with service obligations.
3. Reliability Engineering & Continuous Improvement
- Build and maintain observability frameworks for system health and performance monitoring.
- Conduct root cause analyses (RCA) for incidents and implement corrective actions.
- Partner with development teams to embed reliability and performance improvements into the software delivery process.
4. Leadership & Team Development
- Lead and mentor a team of SREs and infrastructure engineers.
- Collaborate cross-functionally with Engineering, Security, Compliance, and Product teams.
- Establish and maintain documentation and standard operating procedures (SOPs) for infrastructure management.
Qualifications & Experience
Must Have:
- Bachelor's degree in Computer Science, Information Systems, or a related technical field.
- 8+ years of experience in SRE, Infrastructure, or DevOps roles with at least 3 years in a leadership position.
- Strong technical expertise in data center operations, networking, load balancing, storage systems, and server infrastructure.
- Strong knowledge of networking (TCP/IP, BGP routing, switching, VLANs, firewalls, VPNs, Transit IP).
- Experience managing hybrid infrastructure environments (on-premise and cloud).
- Experience with Linux systems administration, containerization (Docker/Kubernetes), and Infrastructure as Code (Terraform, Ansible).
Preferred:
- Experience in SaaS or regulated industries.
- Familiarity with cryptographic systems, PKI, and HSM management.