We are currently supporting a leading organization in the transportation sector in their search for an experienced Senior DevOps professional. This role partners closely with key stakeholders and plays a critical role in driving infrastructure scalability, system reliability, and continuous improvement initiatives across the organization. It is a high-impact opportunity suited for someone who excels in a fast-paced, technology-driven environment and can adapt quickly to evolving operational and business needs.
Responsibilities
- Lead the implementation of Infrastructure as Code (IaC) using tools such as Terraform and Ansible to ensure scalable and maintainable infrastructure.
- Optimize the Software Development Life Cycle (SDLC) to enable smooth and reliable releases, incorporating advanced deployment strategies such as canary releases, blue-green deployments, and rollback mechanisms.
- Design and maintain comprehensive monitoring and automated alerting systems, including ownership of log management and log rotation, and ensure on-call engineers can use logs effectively during incident response.
- Proactively drive cloud cost optimization by applying industry best practices and selecting the most cost-effective resource configurations.
- Ensure all critical systems have sufficient redundancy (both technical systems and team knowledge) to maintain high availability (HA) at all times.
- Collaborate with the Information Security (InfoSec) team to prioritize and execute patching for critical CVEs at both infrastructure and container levels.
- Manage and actively participate in a 24/7 on-call rotation, ensuring the team is well-prepared to respond to production incidents with clear and effective procedures.
- Mentor and guide team members to identify and eliminate process inefficiencies, while continuously introducing and reinforcing industry best practices.
Requirements
- 8–10 years of experience in DevOps, Site Reliability Engineering (SRE), and/or Cloud Infrastructure.
- Extensive hands-on experience with containerization and orchestration technologies (e.g., Docker, Kubernetes) and public cloud platforms (e.g., AWS, GCP, Huawei Cloud).
- Strong proficiency in scripting languages (Python, Bash) and Infrastructure as Code (IaC) tools such as Ansible and Terraform.
- Solid understanding of networking concepts, cybersecurity principles, and certificate/encryption management.
- Proven track record in managing high-traffic systems and executing zero-downtime migration strategies.