About this role

Why work at Nebius

Nebius is leading a new era in cloud computing to serve the global AI economy. We create the tools and resources our customers need to solve real-world challenges and transform industries, without massive infrastructure costs or the need to build large in-house AI/ML teams. Our employees work at the cutting edge of AI cloud infrastructure alongside some of the most experienced and innovative leaders and engineers in the field.

Where we work

Headquartered in Amsterdam and listed on Nasdaq, Nebius has a global footprint with R&D hubs across Europe, North America, and Israel. The team of over 1400 employees includes more than 400 highly skilled engineers with deep expertise across hardware and software engineering, as well as an in-house AI R&D team.

Responsibilities

Nebius is looking for a Senior Software Engineer to join the Hardware Infrastructure Observability team. You're welcome to work from our office in Amsterdam. We build and run low-level monitoring for servers and data center engineering systems to ensure reliability at scale. We also design and operate maintenance and remediation systems that enable safe, predictable fleet-wide changes and keep the infrastructure healthy.

Design and develop services and agents that provide deep visibility into a large server fleet and DC engineering systems

Evolve our metrics, aggregation, and alerting pipelines and improve signal quality

Build maintenance workflows and automation that keep fleets healthy

Investigate incidents hands-on (including on-host debugging) and drive root-cause fixes

Collaborate with hardware, networking, and DC operations to improve reliability

We expect you to have:

5+ years of professional software engineering experience

Excellent knowledge of Python and Golang, or readiness to pick up these languages quickly

Strong Linux fundamentals

Ability to write reliable code and dig into complex problems

Working proficiency in English

It will be an added bonus if you have:

Solid understanding of modern server architecture and its components

Experience with Prometheus-compatible metrics, monitoring, and alerting stacks (such as VictoriaMetrics)

Good knowledge of computer networks

Experience designing, developing, and running high-load distributed systems

We conduct coding interviews as part of the process.

Senior Software Engineer in Hardware Infrastructure Observability
