Reporting to the Data, AI & Platforms Manager, you will be responsible for leading the daily technical execution and engineering quality of the Platform Engineering team through hands-on work, technical guidance, review, mentoring, and coordination across DevOps, Platform Engineering, and Data Platform activities.
Your primary objective is to act as the Manager's main technical counterpart and operational right hand for the platform domain, translating defined architecture, priorities, and engineering standards into consistent technical execution. Leading primarily through technical seniority, hands-on expertise, and execution ownership, you will ensure platform services remain stable, observable, secure, maintainable, and production-ready. You will also coordinate team dependencies, manage ITIL governance, lead incident response, and handle Change Advisory Board (CAB) processes to safeguard the integrity of the production environment.
Key Responsibilities
- Act as the primary technical reference and senior escalation point for the Platform Engineering team, independently diagnosing and resolving complex platform, infrastructure, database, and reliability issues
- Translate defined architectures, priorities, and engineering standards into practical implementation tasks, technical guidance, review criteria, and robust production-readiness controls
- Review and approve technical designs, scripts, pipelines, configurations, Infrastructure as Code (IaC), operational runbooks, deployment plans, and engineering changes
- Design, build, maintain, and troubleshoot enterprise CI/CD pipelines, deployment workflows, release processes, rollback mechanisms, artifact handling, and strict quality gates to ensure complete deployment traceability
- Lead hands-on Linux and Kubernetes operations, orchestrating complex troubleshooting, performance profiling, advanced networking, storage management, resource optimization, and rapid service recovery
- Write, maintain, and rigorously review Infrastructure as Code (IaC) using Terraform, Ansible, or equivalent tools to ensure infrastructure changes are entirely reproducible and version-controlled
- Build and mature monitoring and observability platforms using Prometheus, Grafana, OpenSearch, or ELK to maximize dashboard coverage, alert quality, service-health visibility, and anomaly detection
- Provide senior technical ownership and governance for PostgreSQL, MySQL, Hadoop, and Data Lakehouse infrastructure, driving cluster configuration, health monitoring, high availability, replication, and capacity management
- Enforce ITIL governance framework guidelines, systematically managing ticketing systems, request chains, and the Change Advisory Board (CAB) process to control operational risk and production changes
- Lead technical incident response and crisis triage, coordinating active service recovery, conducting meticulous root-cause analysis, and delivering comprehensive post-incident documentation and follow-up reliability improvements
- Coordinate day-to-day technical activities, dependencies, blockers, and delivery requirements among platform engineering colleagues through assertive technical direction
- Mentor and develop platform engineers through constructive code reviews, hands-on pairing sessions, troubleshooting guidance, knowledge sharing, and structured technical feedback
- Build and deploy automation scripts, internal APIs, webhooks, and utilities using Python and Bash to aggressively eliminate manual effort and reduce operational friction
- Utilize and govern AI-assisted engineering tools, agentic assistants, and operational copilots responsibly, while guiding the team on safe practices for log analysis, scripting, and technical documentation
- Maintain up-to-date, production-grade documentation, including live architecture notes, operational runbooks, IaC codebases, incident records, and technical decision logs
Is this you?
- 6+ years of experience in Platform Engineering, DevOps, or Systems Engineering with a proven track record as a technical lead, demonstrating the strong, assertive leadership needed to oversee three major domains (Data, AI, and Platform) simultaneously
- Bachelor's degree in Computer Science, Information Technology, Information Systems, Engineering, or a related field, together with excellent stakeholder-management skills and fluency in English
- Deep hands-on expertise in enterprise Linux administration and advanced Kubernetes cluster management, including workload scheduling, networking, ingress, storage, and pod-level troubleshooting
- Extensive experience with automated CI/CD tools (GitLab CI/CD, Jenkins, ArgoCD) and Infrastructure as Code platforms (Terraform, Ansible) to ensure changes are version-controlled and reproducible
- Substantial experience with monitoring stacks (Prometheus, Grafana, OpenSearch, ELK) combined with solid operational knowledge of relational databases (PostgreSQL/MySQL) and Big Data environments (Hadoop/Data Lakehouse)
- Practical understanding of ITIL governance practices, ticketing systems, request chains, and Change Advisory Board (CAB) process alignment, alongside a solid grasp of platform security practices (RBAC, secrets management, network policies)
- Strong scripting capability using Python and Bash for operational tooling, with the ability to responsibly use AI-assisted engineering tools and copilots for troubleshooting and log analysis
- Relevant professional certifications (CKA, CKAD, CKS, RHCE, ITIL, cloud, or database certifications) are highly preferred, and exposure to advanced data frameworks (Iceberg, Spark, Trino) or GitOps/Service Mesh tooling is a strong plus
Be Part of the ATI Journey
At ATI Business Group, our aim in working with our clients is to support their continued growth by providing cost-effective technology and talented, scalable people resources on demand. ATI's singular focus on serving the travel and hospitality business communities across the globe has been remarkably successful. Since commencing in 2002, we have grown to over 1,300 employees providing services to our clients worldwide.
We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, assessing responses, and identifying potential inconsistencies or verification signals in application materials based on available information. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are made by humans. If you would like more information about how your data is processed, please contact us.