Reporting to the Data, AI & Platforms Manager, you will be responsible for the day-to-day operation, maintenance, troubleshooting, and continuous improvement of CI/CD pipelines, Linux and Kubernetes environments, monitoring and observability platforms, relational databases, and Hadoop/Data Lakehouse infrastructure.
Your primary objective is to independently own assigned technical tasks, operational activities, incidents, and improvement initiatives, strengthening hands-on technical execution across the DevOps, Platform Engineering, and Data Platform domains. Working under the technical direction of the Platform Engineering Lead, you will keep platform services stable, observable, secure, and production-ready, and proactively manage morning operations to support seamless 24/7 infrastructure availability.
Key Responsibilities
- Design, build, maintain, and troubleshoot CI/CD pipelines using GitLab CI/CD, Jenkins, ArgoCD, or equivalent tools to enable scalable and consistent deployment operations
- Optimize and streamline software build, test, deployment, rollback, artifact management, environment promotion, and deployment-traceability processes across business systems
- Manage daily Linux system operations, including service management, log investigation, process troubleshooting, networking checks, storage monitoring, permissions, and routine maintenance
- Deploy, monitor, and troubleshoot Kubernetes workloads, resources, services, ingress, ConfigMaps, secrets, persistent storage, and common cluster-level operational problems
- Develop and maintain Infrastructure as Code (IaC) configurations, and apply changes in a controlled, systematic way, using Terraform, Ansible, or equivalent technologies
- Engineer backend automation, operational utilities, APIs, webhooks, and integrations using Python and Bash scripts to reduce manual effort and accelerate platform tasks
- Automate monitoring and observability configurations, dashboards, log collection, and alerts using platforms such as Prometheus, Grafana, OpenSearch, ELK, or similar tools
- Operate database and data platform environments, performing routine health checks, connectivity validation, backup verification, and resource monitoring for PostgreSQL, MySQL, and Hadoop/Data Lakehouse infrastructure
- Investigate infrastructure alerts, analyze logs, collect diagnostic information, and support active incident response and comprehensive root-cause analysis
- Collaborate closely with Data, Platform, DevOps, and Business teams to translate requirements into production-ready solutions while following platform security standards covering RBAC, secrets handling, and access control
- Maintain high-quality technical documentation for architecture, operational runbooks, configuration records, validation evidence, and troubleshooting procedures
- Progressively assume ownership of more complex tasks, operational areas, and critical incidents as your technical capability and experience grow
Is this you?
- Bachelor's degree in Computer Science, Information Technology, Information Systems, Engineering, or a related field
- 3+ years of hands-on experience in Platform Engineering, System Administration, DevOps, or Infrastructure Operations
- Strong capability in Data Administration, Back-ops, or Reliability Engineering (RE)
- Deep proficiency with container orchestration using Kubernetes and managing Virtual Machines (VMs) in a Linux environment
- Solid experience building and managing automation workflows using CI/CD tools, specifically GitLab CI or GitHub Actions
- Practical experience implementing and utilizing infrastructure monitoring and observability tools (such as Prometheus, Grafana, or equivalent)
- Familiarity with Cloud infrastructure administration and operations
- Good communication and collaboration skills, with spoken English proficiency sufficient for smooth daily coordination
- Proven ability to work independently, take ownership of technical incidents, and drive them to resolution
- Professional certification such as RHCSA (Red Hat Certified System Administrator) is a strong plus
- Prior experience with GCP (Google Cloud Platform) Administration is highly preferred
- Exposure to or experience in handling Big Data infrastructure and Data Lakehouse environments (e.g., Hadoop) is a significant advantage
Be Part of the ATI Journey
At ATI Business Group, our aim in working with our clients is to support their continued growth by providing cost-effective technology and talented, scalable people resources on demand. ATI's singular focus on serving the travel and hospitality business communities across the globe has been remarkably successful. Since commencing operations in 2002, we have grown to over 1,300 employees providing services to our clients worldwide.
We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, assessing responses, and identifying potential inconsistencies or verification signals in application materials based on available information. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are made by humans. If you would like more information about how your data is processed, please contact us.