About The Role
CosmosGrid is a modern DevOps and Cloud Engineering consultancy delivering scalable infrastructure, automation, 24/7 support, and secure private AI solutions. Our engineers work across global time zones to support clients with precision, clarity, and technical excellence.
Key Responsibilities
- Design, build, and maintain cloud-native infrastructure on AWS using Kubernetes, Terraform, and modern DevOps tooling.
- Implement CI/CD pipelines using GitHub Actions, GitLab CI, or Jenkins, ensuring fast, reliable delivery.
- Manage Kubernetes clusters, troubleshoot workloads, optimize scaling, and ensure platform security.
- Deploy and configure observability stacks (Prometheus, Grafana, Loki, Alertmanager) to monitor system performance.
- Support infrastructure automation, configuration management, and GitOps practices (ArgoCD/Flux).
- Participate in the on-call rotation as part of CosmosGrid's global 24/7 DevOps support model.
- Collaborate closely with client engineering teams to deliver solutions aligned with business and technical goals.
- Identify and implement cloud cost optimizations using FinOps principles.
- Contribute to documentation, internal tooling, and best practices across the organization.
Required Qualifications
- 3+ years of hands-on DevOps, Cloud Engineering, or SRE experience.
- Strong experience with AWS (EC2, VPC, IAM, S3, EKS, CloudWatch, etc.).
- Proficiency in Kubernetes administration and troubleshooting.
- Solid experience with Terraform and Infrastructure as Code workflows.
- Hands-on experience with CI/CD tools (GitHub Actions, GitLab CI, Jenkins).
- Familiarity with observability tools: Prometheus, Grafana, Loki, ELK, Alertmanager.
- Strong scripting ability in Python, Bash, or Go.
- Understanding of networking concepts (DNS, load balancing, proxies, ingress).
- Experience implementing DevOps best practices: automation, repeatability, scalability.
- Comfortable communicating with clients and working in fast-paced, distributed teams.
Preferred Skills (Nice To Have)
- Experience with Karpenter, Bottlerocket, or EKS cost/performance tuning.
- Familiarity with GitOps tooling (ArgoCD, Flux).
- Understanding of MLOps architectures or experience deploying AI/LLM workloads.
- Experience with Vault, KMS, or other secrets management tools.
- Exposure to multi-cloud environments (Azure, GCP).
What We Look For
- Engineers who are curious, resourceful, and enjoy solving hard problems.
- People who take ownership and deliver with reliability and professionalism.
- Strong communicators who thrive in collaborative, client-facing environments.
- Engineers with a passion for cloud-native technologies and continuous learning.