Senior Site Reliability Engineer - Observability (x/f/m)

Berlin, Berlin, Germany | Competitive | Hybrid | 0 applicants

About this role

Doctolib’s engineering environment is rich, and we build innovative products and features that aim to make life easier for doctors and patients every day. We are looking for a Senior Site Reliability Engineer to keep Doctolib’s production systems running smoothly. You will also be a key player in supporting the exponential growth of Doctolib’s services.

Responsibilities

As a Senior Site Reliability Engineer within the Core Reliability & Observability team, you will play a pivotal role in shaping the company’s observability strategy and ensuring our platform remains reliable, debuggable, and scalable. This role sits at the intersection of infrastructure, developer experience, and product engineering, with a particular focus on building and evolving the foundations of logging, metrics, tracing, and alerting across the organization.

  • Lead the observability strategy across the platform, with an emphasis on building scalable, developer-friendly logging and tracing capabilities.
  • Identify and lead large-scale cross-cutting reliability initiatives, including improvements to our incident detection, response, and postmortem analysis capabilities.
  • Take part in the on-call rotation, and actively contribute to improving our on-call experience by refining alerting, reducing noise, and ensuring actionable telemetry.

Who you are

You could be our next teammate if you:

  • Have solid hands-on experience (3+ years) with a large-scale production platform
  • Have proven experience with cloud platforms such as AWS, Azure or Google Cloud
  • Have a solid understanding of containerization and orchestration technologies (Docker and Kubernetes)
  • Have a strong understanding of Helm for managing Kubernetes manifests and ArgoCD for GitOps workflows
  • Have deep expertise in observability tooling and architecture, such as:
      • Logging: Fluent Bit, OpenTelemetry, Loki, Elasticsearch, Logstash, Vector
      • Tracing: OpenTelemetry or proprietary APMs
      • Metrics: Prometheus, Thanos, Datadog, or equivalent
  • Are proficient in at least one programming language (Ruby, Python, Go, Java, etc.) and have a deep understanding of infrastructure-as-code principles
  • Have experience with monitoring and observability tools
  • Like troubleshooting performance issues in complex environments
  • Speak English

Job Details

Posted: 7 April 2026
Closes: 7 May 2026
Work mode: Hybrid
