About us
Owkin is an AI company on a mission to solve the complexity of biology. It is building the first Biology Super Intelligence (BASI) by combining powerful biological large language models, multimodal patient data, and agentic software. At the heart of this system is Owkin K, an AI copilot and its new LLM fine-tuned on biology called Owkin Zero, used by researchers, clinicians, and drug developers to better understand biology, validate scientific hypotheses, and deliver better diagnostics and therapies faster.
This position is remote and based in the UK or Germany. Please submit your CV in English.

About the role
This is a critical role at the intersection of advanced AI and high-stakes biology. You will be instrumental in building the core agentic technology that powers Owkin’s Data Transformation Agent (DTA), directly enabling the creation of "ML-ready" multimodal datasets. You will be given the autonomy to drive the technical direction of your projects, choosing the best tools and development path to accomplish your mission. You will be joining a highly skilled, international team with a passion for both cutting-edge technology and scientific discovery.

In particular, you will:
Lead Agent Development: Drive the development of Owkin’s Data Transformation Agent (DTA) and actively contribute to the core components and agents of our platform, "K-Pro."
Orchestrate Data Workflows: Design, implement, and maintain complex data transformation workflows, leveraging tools like Apache Airflow for robust and scalable orchestration.
Deployment and Integration: Manage the seamless integration and deployment of the DTA within Owkin's broader Data Platform infrastructure.
Ensure Code Excellence: Define and enforce robust engineering practices, including Test-Driven Development (TDD), best practices for deployment, QA, and infrastructure maintenance, and code reviews that uphold high standards.
Technical Leadership: Guide key technical choices and tradeoffs, plan the development goals and milestones for software packages, and focus on performance optimization.
Stakeholder Collaboration: Interface directly with internal customers and research engineers to gather requirements and efficiently resolve implementation bottlenecks.

Required qualifications / experience
3+ years of industry experience in a similar role, with a BS in software engineering, computer science, applied mathematics, or an associated field.
Deep Python expertise and familiarity with other programming languages.
Strong knowledge of LLMs and Agentic Systems principles, or a demonstrable ability and interest in rapidly acquiring this expertise to build multi-agent software.
Experience with multi-omics data (e.g., genomics, proteomics, imaging) or other complex multimodal datasets, including preprocessing and transformation.
Expertise in documentation, specification, robust testing, continuous deployment, and code optimization.
Familiarity with Agile/Scrum development methodologies.
Solid understanding of software architecture concepts.
Knowledge of cloud computing services (GCP, AWS, Azure) and container-based deployments (Docker, Kubernetes).
Development experience in Linux/Unix-like environments.

Preferred qualifications / bonus skills
Direct experience with data orchestration tools such as Apache Airflow.
Experience working with spatial transcriptomics data.
Knowledge of different technologies for distributed computing, databases, and serialization.
Experience with deep learning frameworks (TensorFlow, PyTorch, etc.).
Knowledge of applied mathematics concepts (signal & image processing, statistics, etc.).
Contributions to OSS projects.
Contributions to publications at machine learning conferences and journals (NeurIPS, ICML, ICLR, etc.).