About The Role
We are looking for a proactive and experienced Senior DevOps Engineer with a minimum of 4 years of hands-on experience. You will play a crucial role in managing our cloud infrastructure, automating our software delivery processes, and ensuring the high availability of our services. You'll work closely with multiple engineering teams and business units, taking ownership of various projects from conception to completion.
What You Will Do:
- Architect and manage CI/CD pipelines to enable continuous integration and deployment across our systems.
- Maintain and scale our Kubernetes infrastructure, ensuring it meets our growing demands for performance and reliability.
- Implement and manage infrastructure as code (IaC) using Terraform, provisioning and configuring cloud resources in an automated and consistent manner.
- Automate workflows and processes using tools like GitHub Actions.
- Administer Cloudflare for DNS management, security, and performance optimization.
- Manage secrets and sensitive data using a dedicated secret manager to enhance security.
- Build and maintain monitoring and alerting systems using the Grafana/Prometheus stack to ensure system health and proactively identify issues.
- Collaborate with engineering teams to provide support and expertise on infrastructure-related needs.
- Lead Site Reliability initiatives to improve system uptime, performance, and overall resilience.
- Respond to and resolve production emergencies with a sense of urgency and professionalism.
What You'll Bring:
- Minimum 4 years of professional experience in a DevOps or Site Reliability Engineering role.
- Deep expertise in CI/CD principles and automation.
- Extensive experience with Kubernetes and containerization.
- Proficiency in Terraform for managing cloud infrastructure.
- Hands-on experience with GitHub Actions or similar workflow automation tools.
- Familiarity with Cloudflare or other content delivery networks (CDNs).
- Experience with secret management tools.
- Strong knowledge of the Grafana/Prometheus stack for monitoring and observability.
- Excellent problem-solving and communication skills, with the ability to manage multiple projects and priorities simultaneously.
- A proactive and ownership-driven mindset, especially during critical incidents.