About the Role
We're looking for a Senior Software Engineer to design, build, and sustain high-availability, fault-tolerant real-time systems that power connected industrial environments. You'll work across both on-premise nodes and cloud backends, ensuring robust integration between hardware interfaces, data pipelines, and production infrastructure.
This role combines deep systems engineering with hands-on development. It is ideal for someone who thrives on debugging in production, optimizing low-latency performance, and building resilient distributed architectures designed to stay available through failures.
What You'll Do
- Design and implement scalable, maintainable, and fault-tolerant software for real-time industrial applications.
- Deploy and debug production systems, ensuring reliability and uptime across edge and cloud environments.
- Integrate with hardware interfaces and manage data pipelines across hybrid environments.
- Lead the design of resilient architectures, including circuit breakers, retries, and failover strategies.
- Implement real-time monitoring, alerting, and observability tools (e.g., Prometheus, Grafana, ELK).
- Own incident response and post-mortem processes, driving continuous reliability improvements.
- Contribute to CI/CD, automated testing, and system-level QA to ensure consistent deployments.
- Collaborate with AI, hardware, and infrastructure teams to deliver performant, deterministic systems.
Travel: Up to 15% (primarily Quebec & Ontario, with regular opportunities for U.S. travel)
Qualifications
- 3 to 7 years of software engineering experience, including production deployment and debugging (not just academic or research projects).
- Proven experience designing and implementing scaled production environments, ideally within remote real-time or IoT systems.
- Strong programming skills in C/C++, Python, and TypeScript/React.
- Hands-on experience with Linux-based development and production environments.
- Solid understanding of networking, storage, and compute resources in on-prem environments (bare metal, VMs, Kubernetes clusters).
- Experience building low-latency, deterministic systems with strict timing constraints.
- Familiarity with embedded hardware systems and interfacing software with hardware components.
- Expertise in monitoring, alerting, and observability frameworks (Prometheus, Grafana, ELK stack).
- Strong skills in testing, CI/CD pipelines, debugging, and performance optimization.