Job Description
The Position
Join a global healthcare biopharma technology team in Prague as a Senior Site Reliability Engineer. You will own reliability, scalability, and security for critical platforms and applications that support medicines, vaccines, and animal health products worldwide. This role combines hands-on engineering with strategic leadership: you’ll drive modernization, shape operational best practices, mentor engineers, and collaborate across products and regions to deliver resilient, high-performing systems.
What will you do?
Lead reliability efforts: define and own SLOs/SLIs, error budgets, availability goals, and improvement plans for services under your scope.
Design and implement robust, scalable architectures and automation that minimize operational toil and support rapid delivery.
Build and maintain IaC, CI/CD pipelines, and automated testing to ensure consistent, auditable deployments.
Drive observability and proactive monitoring: implement metrics, logging, tracing, and alerting that enable data-driven reliability improvements.
Lead incident response for complex issues, coordinate cross-team resolution, conduct blameless postmortems, and drive preventative actions.
Establish and codify operational frameworks, runbooks, and standard methodologies to raise operational maturity across teams.
Coach and mentor SRE and DevOps engineers; foster knowledge sharing and create a pipeline of technical leaders.
Partner with product engineering, security, platform, and vendor teams to align technology roadmaps, compliance, and lifecycle management (patching, upgrades).
Perform capacity planning, performance analysis, and troubleshooting for distributed systems and cloud platforms.
Produce and maintain technical documentation, architecture diagrams, installation qualification/validation artifacts, and compliance evidence when required.
Qualifications, Skills & Experience Required
BSc in Computer Science, IT, Engineering, or equivalent experience; advanced degree a plus.
5+ years of hands-on experience in SRE, Platform, or DevOps engineering with proven ownership of reliability for production systems.
Deep experience with cloud platforms (AWS preferred; Azure/GCP experience valuable) and cloud-native services.
Strong skills with IaC (Terraform, CloudFormation, or similar), Git-based workflows, and mature CI/CD pipelines.
Demonstrated experience defining and operating SLOs, SLIs, SLAs, and error budget processes.
Proficiency with observability tooling (Prometheus, Grafana, ELK/EFK, OpenTelemetry, distributed tracing).
Solid programming/scripting skills (Python, Go, Bash, PowerShell, or similar) and experience building automation and tooling.
Strong knowledge of networking (VPCs, VPNs, load balancing, firewalls), cloud security best practices, and compliance frameworks.
Experience with platform engineering patterns, multi-tenant platforms, and managing analytics or data platforms is a plus.
Familiarity with conversational BI tools, Power Platform, and emerging LLM/Copilot concepts.
Excellent problem-solving, communication, and stakeholder-management skills; experience working with global, cross-functional teams.
Relevant certifications (AWS Professional, Kubernetes, etc.) are a plus.