Senior Systems Engineer – Performance & Reliability

Gdańsk, Pomeranian Voivodeship, Poland | Competitive

About this role

Salary Range: PLN 350,700 - 474,400 + Benefits + Equity

Subject to alignment with the responsibilities and duties of the role.

Requirements

  • Strong software engineering experience, typically gained across multiple projects or systems over several years
  • Experience working in Linux-based environments, ideally with distributed or high-performance systems
  • Proficiency in Python
  • Experience with automation and CI/CD systems (e.g. GitLab CI, Jenkins, GitHub Actions)
  • Ability to design, implement, and run experiments or tests that produce meaningful results
  • Ability to interpret results and communicate findings clearly, with an emphasis on accuracy and usefulness to decision-making
  • Comfortable working in areas where requirements are not fully defined and judgement is required

Nice to have

  • Experience working with large-scale or distributed systems (e.g. clusters, cloud platforms, HPC environments)
  • Experience with performance, reliability, or systems-level testing/measurement
  • Familiarity with pytest or similar frameworks for structured test/measurement execution
  • Experience analysing system behaviour under load (compute, network, or ML workloads)
  • Experience working with containerisation, orchestration, or provisioning systems (e.g. Docker, Kubernetes, OpenStack)
  • Proficiency in other applications programming languages (e.g. C++)
  • Exposure to data analysis, statistics, or interpreting variability in results

About Graphcore

At Graphcore, we’re building the future of AI compute. We’re a team of semiconductor, software and AI experts, with deep experience in creating the complete AI compute stack - from silicon and software to infrastructure at datacenter scale. As part of the SoftBank Group, backed by significant long-term investment, we are delivering key technology into the fast-growing SoftBank AI ecosystem. To meet the vast and exciting AI opportunity, Graphcore is expanding its teams around the world. We are bringing together the brightest minds to solve the toughest problems, in a place where everyone has the opportunity to make an impact on the company, our products and the future of artificial intelligence.

Job Summary

We decide whether entire racks of machines are good enough to enter production. Our team builds and runs measurements on large-scale Linux clusters, from a single rack up to datacentre-scale systems. We use these measurements to determine system performance, reliability, and whether a system can be trusted in production.

You will work across:

  • Designing and running measurements on distributed systems
  • Analysing performance and reliability results
  • Improving systems that execute measurements at scale

Typical work includes:

  • Running measurements on large-scale Linux clusters (rack-level and beyond)
  • Using and extending tools such as pytest for measurement execution
  • Measuring compute, network, and ML workload performance
  • Analysing variability and repeatability of results

Engineers in this team are not limited to a single area. You may work on infrastructure, workloads, or analysis depending on the problem. You are free to specialise, but the team is responsible for ensuring complete coverage. This role is not purely infrastructure or data analysis. It combines systems engineering, measurement, and interpretation.

We are looking for engineers who:

  • Are comfortable working with distributed systems at scale
  • Can reason about performance and reliability
  • Prefer clear, evidence-based conclusions over assumptions

Selection criteria: Our engineers typically bring significant practical experience and sound engineering judgement. Depth in one area is valued, but the ability to work across boundaries is equally important.

Job Details

Posted: 7 May 2026
Closes: 6 June 2026

Senior Systems Engineer – Performance & Reliability at Graphcore | EuroTalent AI