About the Role:
As a Tech Ops Specialist, you will provide frontline technical support for Krom's production systems, helping ensure operational stability and a seamless customer experience. You will be the first technical point of contact for Customer Service: performing initial issue assessments, reproducing reported problems, analyzing logs, gathering technical evidence, and providing investigation updates under the guidance of the Tech Ops Lead or Incident Owner.
This role focuses on operational support, technical investigation, issue management, monitoring, documentation, and cross-functional coordination rather than software development.
Operational Issue Handling & Customer Service Support
- Receive and investigate issues reported by the Customer Service team.
- Perform initial assessments based on information provided by customers or internal stakeholders.
- Identify issue symptoms, customer impact, and potentially affected system components.
- Reproduce reported issues to validate them and narrow down likely root causes.
- Gather critical information such as user IDs, transaction IDs, timestamps, screenshots, and error messages.
- Provide investigation updates and communicate findings to the Customer Service team.
- Ensure issue reports contain sufficient information before escalation to relevant teams.
Technical Investigation & Log Analysis
- Analyze logs using observability and monitoring tools such as Scalyr, Kibana, Grafana, Datadog, or similar platforms.
- Identify error patterns, failed requests, timeouts, integration issues, and system anomalies.
- Trace transaction and request flows when necessary.
- Collect technical evidence to support engineering teams during root cause analysis.
- Document investigation findings clearly and systematically.
Issue Follow-up & Cross-Functional Coordination
- Follow up with engineering and related teams to ensure timely issue resolution.
- Collaborate with Backend, Mobile, QA, Infrastructure, Security, Data, Product, and other stakeholders.
- Work with QA teams to reproduce issues when required.
- Communicate technical updates to internal stakeholders.
- Ensure incidents and ongoing issues are tracked and followed through to closure.
Third-Party Operational Support
- Coordinate issue investigations involving third-party providers, payment gateways, switchers, vendors, and technology partners.
- Gather and provide technical information required by external parties.
- Monitor progress and resolution status of third-party issues.
- Communicate updates and risks to internal stakeholders.
Monitoring & Operational Excellence
- Monitor alerts, error trends, and overall system health.
- Identify recurring issues and potential operational risks.
- Escalate anomalies and incidents appropriately.
- Participate in on-call rotations or shift coverage as required.
- Ensure incidents and operational issues are properly documented and tracked.
Documentation & Knowledge Management
- Create and maintain troubleshooting guides, SOPs, and operational runbooks.
- Document known issues, workarounds, escalation paths, and investigation outcomes.
- Contribute to the team's knowledge base to improve operational efficiency and response times.
- Maintain clear and accessible operational documentation.
About You:
We're looking for someone who is analytical, detail-oriented, and passionate about solving production issues in a fast-paced environment. You enjoy troubleshooting complex systems, collaborating across teams, and taking ownership of operational reliability.
Minimum Qualifications
- Experience in Tech Operations, Production Support, Application Support, Incident Support, or a similar role.
- Strong troubleshooting, problem-solving, and analytical thinking skills.
- Hands-on experience with log monitoring and observability tools such as Scalyr, Kibana, Grafana, Datadog, or equivalent platforms.
- Understanding of APIs, backend systems, mobile/web application flows, and basic database querying.
- Experience using REST clients such as Postman, Bruno, or similar tools.
- Ability to work effectively during incidents and under pressure.
- Strong communication skills, particularly when translating technical issues for non-technical stakeholders.
- High sense of ownership, accountability, and urgency in handling operational issues.
- Willingness to participate in on-call rotations or shift schedules.
Nice to Have
- Experience in fintech, banking, digital banking, payment systems, or other high-availability environments.
- Familiarity with incident management processes, service-level agreements (SLAs), and operational best practices.
- Experience investigating issues involving third-party integrations.
- Understanding of payment transaction flows and troubleshooting within payment ecosystems.
- Familiarity with Jira, Confluence, Slack, cloud platforms, and web-based operational tools.