Senior ML Engineer (Token Factory)

Germany; Israel; Netherlands; Prague, Czech Republic; Remote - Europe; United Kingdom
Competitive

About this role

About Nebius:

Nebius is leading a new era in cloud infrastructure for the global AI economy. We are building a full-stack AI cloud platform that supports developers and enterprises from data and model training through to production deployment, without the cost and complexity of building large in-house AI/ML infrastructure.

Responsibilities

  • Token Factory is a part of Nebius Cloud, one of the world’s largest GPU clouds, running tens of thousands of GPUs. We are building an inference & fine-tuning platform that makes every kind of foundation model — text, vision, audio, and emerging multimodal architectures — fast, reliable, and effortless to train & deploy at massive scale.
  • Some of the directions we are currently working on that you can be part of:
  • Advanced Fine-Tuning: Enhancing fine-tuning methodologies - both LoRA-based and full-parameter - for cutting-edge LLMs (e.g., GPT-OSS, Kimi K2.5, DeepSeek V3.1/V3.2, GLM-4.7), focusing on both model quality and training efficiency.
  • Inference Optimization: Identifying LLM inference bottlenecks to drive production speedups. This involves building model training and evaluation pipelines in JAX for speculative decoding, experimenting with architectures (dense/MoE, auto-regressive/parallel), and deriving scaling laws to guide resource allocation.
  • Low-Precision Training & Inference: Investigating low-precision (FP8, NVFP4/MXFP4) methodologies for supervised fine-tuning and reinforcement learning, spanning both inference and training and optimized for modern hardware.

We expect you to have:

  • A profound understanding of the theoretical foundations of machine learning and reinforcement learning
  • Deep expertise in modern deep learning for language processing and generation
  • Experience with training large models on multiple computational nodes
  • Reasonable understanding of the performance aspects of large neural network training (sharding strategies, custom kernels, hardware features, etc.)
  • Strong software engineering skills (we mostly use Python)
  • Deep experience with modern deep learning frameworks (we use JAX)
  • Proficiency in contemporary software engineering approaches, including CI/CD, version control and unit testing
  • Strong communication and leadership abilities

About Nebius Group

Built by engineers, for engineers. From large-scale GPU orchestration to inference optimization, we own the hard problems across compute, storage, networking and applied AI. Listed on Nasdaq (NBIS) and headquartered in Amsterdam, we have a global footprint with R&D hubs across Europe, the UK, North America and Israel. Our team of 1,500+ includes hundreds of engineers with deep expertise across hardware, software and AI R&D.

Job Details

Posted: 11 May 2026
Closes: 10 June 2026
