Site Reliability Engineer

About the team

Onomondo is on a mission to revolutionize global IoT connectivity. We’re here to redefine how connected devices communicate, and we need great engineers to help us push the boundaries of what’s possible.

Our work directly impacts millions of devices globally, ensuring seamless, low-latency, and highly available connectivity for mission-critical applications. Our services enable secure authentication, signaling, and messaging for devices using Onomondo SIM cards, across 680+ networks in 180+ countries.

We are now hiring an experienced Site Reliability Engineer to join our Platform Engineering team. You will be embedded closely within the Core Network squad to help achieve our ambitious high-availability goals. The Core Network squad plays a crucial role in building and optimizing a reliable, secure and scalable cellular network, empowering IoT businesses worldwide.

You will work with Kubernetes, Terraform, AWS, the LGTM stack (Loki, Grafana, Tempo, Mimir), OpenTelemetry, GitOps tools, Node.js, Golang, Redis and Postgres, applying reliability engineering practices to ensure resilience, scalability and observability across the stack.

What you’ll be doing

As a Site Reliability Engineer at Onomondo, you’ll work at the intersection of infrastructure, platform tooling and network systems. Your focus will be helping the Core Network squad achieve new levels of reliability, with hands-on contributions in automation, monitoring, incident response and performance tuning.

You’ll work closely with platform and product engineers to ensure our core services can scale smoothly and recover gracefully, keeping our global IoT infrastructure resilient and visible at all times.

Your role will include:

Driving reliability and performance improvements across the Core Network systems.
Defining, deploying, tracking and reporting uptime metrics.
Partnering with engineers to establish SLOs, SLIs, and incident response playbooks.
Building tools and automations that support scalable operations.
Designing and improving monitoring, alerting and observability.
Contributing to infrastructure-as-code practices and CI/CD pipelines.
Participating in pre-mortems, production readiness reviews and incident postmortems to drive learning and continuous improvement.
Bringing a reliability mindset into everything from planning to system design.

What you’ll bring:

We’re looking for someone who’s passionate about reliability and thrives in environments where systems thinking and hands-on engineering go hand in hand. You might come from a software engineering or infrastructure background, but you’re driven by how systems behave in the real world, especially under pressure.

You have experience in a Site Reliability Engineering, Platform Engineering or similar role, focused on reliability and operations at scale.
You’ve worked with cloud environments (preferably AWS) and containerized systems like Kubernetes.
You enjoy digging into distributed systems and uncovering failure points before they become incidents.
You’re comfortable with infrastructure-as-code tools like Terraform and have experience automating complex workflows.
You’ve built or maintained robust monitoring, logging and alerting systems using tools like Prometheus, Grafana or similar.
You thrive on collaboration and can work effectively with product squads, platform teams and leadership to deliver resilient systems.
You’re a strong communicator, especially when sharing insights during incident response, reviews, or technical planning.
Bonus: You’re curious about cellular protocols, embedded systems, or the unique challenges of global connectivity.

Why join Onomondo?

A playful, ambitious culture where people are trusted to do what they do best
A workspace that’s one-of-a-kind, in both design and energy
Legendary lunches, snack heaven, and events that actually bring people together
Room to bring your personality and ideas into the way we work and collaborate

Our hiring process:

We care deeply about creating a fair and inclusive process.

That means:

We don’t need your picture or a cover letter, just your CV.
We select candidates based on skills and relevant work experience for the role.

We’re fast but thoughtful—our goal is to ensure you feel informed, respected, and excited throughout the journey.

30-minute online screening call with our recruiter, Christian Payne.
In-person skills interview with David (interim manager for Platform Engineering) and Kasper (Staff Engineer).
Culture interview with Dana (VP of Engineering) and one more team member.
Meeting with our Co-Founder, Michael, and our CTO, Henrik.

Ready to Make an Impact?

Please send us your CV, and let’s start the conversation.

Learn more about us and other opportunities at onomondo.com/careers
