About You
- You're a go-getter who can juggle multiple priorities (or wear multiple hats) and thrive in a fast-paced, agile environment
- You enjoy doing purpose-led and meaningful work
- You have a strong thirst for knowledge and are driven to find solutions that don't exist yet
- You are comfortable with ambiguity and extremely resourceful (in your past life, you could've been a detective)
- You always find a way to get things done without sacrificing the quality of your work, your integrity, or your values
- No task is off limits for you
- You are humble, put the team's success ahead of your own, and are eager to help those around you
- You don't shy away from challenges and can bounce back from setbacks
What You'll Do and What Success Looks Like in This Role
- Manage and oversee the handling of critical incidents in a timely and structured manner, ensuring effective root cause analysis (RCA) and implementation of preventive actions.
- Drive operational automation initiatives, including auto-remediation, automated health checks, and improvement of monitoring alerts to enhance efficiency and reduce recurring issues.
- Monitor service performance against internal and external Service Level Agreements (SLAs), including collaboration with vendors and regulators.
- Ensure all system changes are conducted based on proper risk analysis and adhere to change management procedures.
- Maintain compliance with internal policies, security standards, and applicable regulatory requirements.
- Promote a data-driven culture focused on preventive measures and continuous improvement across IT operations.
What Is Required and What We're Looking For
- Bachelor's degree in Information Technology, Information Systems, Computer Science, or a related field.
- Minimum of 7 years of experience in IT Operations, Service Assurance, Incident/Problem Management, SRE, DevOps, or other technology operations functions within banking, fintech, or digital platforms.
- Proven experience in implementing modern operational approaches, including observability, automation, reliability metrics, post-incident reviews, and continuous improvement.
- Strong understanding of the ITIL framework and IT operational governance.
- Hands-on experience with monitoring and observability tools.
- Solid understanding of cloud-native architecture, APIs/microservices, and large-scale digital services.
- Ability to design or lead the implementation of operational automation.
- Familiarity with OJK/BI regulations and ISO 27001 standards related to incident management, change management, and DR/BCP.
- Strong leadership in handling major incidents and coordinating cross-functional teams.
- Excellent communication skills, with the ability to bridge technical and non-technical stakeholders.
- Strong analytical thinking and ability to make quick decisions under pressure.
- Results-oriented, adaptable, and equipped with structured problem-solving capabilities.