Senior/Staff Machine Learning Engineer - Health Evaluation - AI Teams (x/f/m)

Paris, France | Competitive | Hybrid

About this role

What you’ll do

At Doctolib, we're on a mission to transform how healthcare is delivered by harnessing the power of AI.

As a Senior/Staff Machine Learning Engineer, you’ll play a key role in designing, implementing, and scaling the evaluation framework that ensures our AI Health Companion behaves safely, reliably, and helpfully for millions of patients and practitioners.

You’ll join a cross-functional team of Machine Learning Engineers, Product Engineers, and Medical Experts to build robust evaluation pipelines for agentic AI systems — models capable of reasoning, planning, and interacting with complex healthcare data.

Your responsibilities include, but are not limited to:

Define and own the evaluation strategy for our agentic AI system: metrics, protocols, datasets, and tooling

Implement and maintain automated evaluation pipelines to monitor model quality, safety, and alignment across iterations

Run systematic experiments to assess reasoning, factuality, robustness, and user experience

Collaborate closely with model developers and research scientists to provide insights and drive iterative improvement

Contribute to research and internal knowledge sharing on LLM evaluation methodologies and best practices

About our tech environment

Our solutions are built on a single, fully cloud-native platform that supports web and mobile app interfaces and multiple languages, and adapts to country-specific and healthcare-specialty requirements. To address these challenges, we are modularizing the platform into reusable components that run in a distributed architecture.

Our stack is composed of Rails, TypeScript, Java, Python, Kotlin, Swift, and React Native

We leverage AI ethically across our products to empower patients and health professionals.

Who you are

Before you read on — if you don't have the exact profile described below, but you feel this job description matches your skill set, we still encourage you to apply.

MSc or PhD in Computer Science, Machine Learning, Data Science, or related field

7+ years of hands-on experience working with large language models (e.g., GPT, Claude, Llama, or BERT-like architectures)

Proven experience in evaluating agentic or reasoning systems (e.g., autonomous agents, tool-using LLMs, dialogue systems, or task-oriented assistants)

Strong track record in experiment design, metric definition, and evaluation automation

Ability to bridge research and production, influencing modeling and product decisions

Excellent communication skills and a collaborative mindset

Now it would be fantastic if:

You have experience in the clinical or medical domain and sensitivity to ethical or regulatory challenges in healthcare AI


Benefits

  • Free health insurance for you and your children
  • Parent Care Program: receive one additional month of leave on top of the legal parental leave
  • Free mental health and coaching services through our partner Moka.care
  • For caregivers and workers with disabilities, a package including an adaptation of the remote policy, extra days off for medical reasons, and psychological support
  • Work from EU countries and the UK for up to 10 days per year, thanks to our flexibility days policy
  • Work Council subsidy to refund part of a sports club membership or creative class
  • Up to 14 days of RTT
  • Lunch vouchers with a Swile card

Interview process

  • Recruiter interview
  • Technical Deep Dive
  • Data System Design
  • Behavioral Interview
  • At least one reference check

Job Details

  • Permanent position
  • Full time
  • Workplace: hybrid, in our Levallois office
  • Start date: as soon as possible
  • Posted: 1 April 2026
  • Closes: 1 May 2026
  • Work mode: Hybrid
