About The Role:
We are looking for a proactive and experienced Senior DevOps Engineer with a minimum of 4 years of hands-on experience. You will play a crucial role in managing our cloud infrastructure, automating our software delivery processes, and ensuring the high availability of our services. You'll work closely with multiple engineering teams and business units, taking ownership of various projects from conception to completion.
What You Will Do:
- Architect and manage CI/CD pipelines that support continuous integration and deployment across our systems.
- Maintain and scale our Kubernetes infrastructure, ensuring it meets our growing demands for performance and reliability.
- Implement and manage infrastructure as code (IaC) using Terraform, provisioning and configuring cloud resources in an automated and consistent manner.
- Automate workflows and processes using tools like GitHub Actions.
- Administer Cloudflare for DNS management, security, and performance optimization.
- Manage secrets and sensitive data using a dedicated secret manager to enhance security.
- Build and maintain monitoring and alerting systems using the Grafana/Prometheus stack to ensure system health and proactively identify issues.
- Collaborate with engineering teams to provide support and expertise on infrastructure-related needs.
- Lead Site Reliability initiatives to improve system uptime, performance, and overall resilience.
- Respond to and resolve production emergencies with a sense of urgency and professionalism.
What You'll Bring:
- Minimum 4 years of professional experience in a DevOps or Site Reliability Engineering role.
- Deep expertise in CI/CD principles and automation.
- Extensive experience with Kubernetes and containerization.
- Proficiency in Terraform for managing cloud infrastructure.
- Hands-on experience with GitHub Actions or similar workflow automation tools.
- Familiarity with Cloudflare or other content delivery networks (CDNs).
- Experience with secret management tools.
- Strong knowledge of the Grafana/Prometheus stack for monitoring and observability.
- Excellent problem-solving and communication skills, with the ability to manage multiple projects and priorities simultaneously.
- A proactive and ownership-driven mindset, especially during critical incidents.