At Graphcore, we’re building the future of AI compute.
We’re a team of semiconductor, software and AI experts, with deep experience in creating the complete AI compute stack - from silicon and software to infrastructure at datacenter scale.
As part of the SoftBank Group, backed by significant long-term investment, we are delivering key technology into the fast-growing SoftBank AI ecosystem.
To meet the vast and exciting AI opportunity, Graphcore is expanding its teams around the world.
We are bringing together the brightest minds to solve the toughest problems, in a place where everyone has the opportunity to make an impact on the company, our products and the future of artificial intelligence.
Job Summary
We are looking for a highly experienced Lead Linux Engineering Support Engineer to guide and develop a small team supporting engineering systems in a fast-paced, AI-centered environment.
The position requires strong Linux skills combined with leadership, automation, and DevOps approaches to maintain systems that are reliable, scalable, and easy to support. A key responsibility is developing and managing a configuration-as-code environment, in which system configuration and operations are handled through automation, pipelines, and source control rather than manual intervention.
You will be responsible for leading incident response, driving operational improvements, and setting standards for how Linux systems are managed and supported across the organization.
Although the role includes leadership responsibilities, it will initially be hands-on, with direct involvement in troubleshooting, system support, and automation efforts as you build team capability and scale processes.
Collaborating closely with engineering groups, platform engineers, and infrastructure experts, you will ensure systems stay stable, efficient, and aligned with changing business and product delivery requirements.
The Team
You’ll be joining a multi-disciplinary team with strong technical skills and a very supportive culture. We work closely together and regularly share knowledge, and your skills will make a direct impact on our business. It’s an exciting and pivotal moment for us right now, with plenty of new projects ahead. If you’re looking to solve interesting problems and see your work deliver real-world results, this is the team for you!
Responsibilities and Duties
Guide, mentor, and develop a team of Linux Engineering Support Engineers, defining clear roles, responsibilities, and ways of working together
Own and oversee support for Linux-based systems and engineering environments, ensuring stability, performance, and availability
Act as a point of contact for complex technical issues and outages, providing hands-on support when a customer concern arises
Diagnose and resolve high-impact system and interoperability issues across mixed and distributed environments
Perform hands-on investigation and troubleshooting to understand issues and drive effective solutions
Direct incident response efforts, encompassing triage, coordination, and resolution
Own and lead Root Cause Analysis (RCA) processes, ensuring preventative improvements are identified and applied
Establish and improve incident management processes, driving operational maturity and reliability
Drive adoption of automation and configuration-as-code practices across Linux systems
Ensure system changes are delivered through controlled, auditable processes wherever possible
Oversee the development and implementation of automation solutions for system management and operational tasks
Promote and support the use of Git-based workflows and CI/CD pipelines for configuration and operational processes
Identify and prioritize opportunities to reduce manual effort through automation and improved tooling
Collaborate with engineering teams to support development environments and meet system requirements
Act as a senior technical liaison between engineering teams and infrastructure/platform functions
Support onboarding of new systems, services, and environments using standardized and automated approaches
Ensure system configurations stay consistent and aligned with established standards and governance
Oversee integration points (e.g. identity, CI/CD, tooling) and ensure issues are resolved effectively
Identify and drive improvements in system performance, scalability, and maintainability
Contribute to and enforce documentation, standards, and operational guidelines
Ensure systems meet audit, compliance, and governance requirements, with full traceability of changes
Candidate Profile