8 of 8 Slurm Workload Manager Jobs in the UK

Solutions Architect - HPC/AI/ML

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
CoreWeave platform. On top of this foundation, our customers are building AI/ML and High Performance Computing workloads, often running on the Slurm workload manager, that are changing the world. As a Senior Solutions Architect at CoreWeave, you will play a vital and dynamic role. … functionality, and performance, contributing regularly to discussions about product strategy and architecture. Lead periodic technical reviews and assessments of customer workloads, pinpointing opportunities for workload optimization and recommending suitable solutions. Stay abreast of the latest developments and trends in cloud computing and infrastructure, sharing your thought leadership with customers ...

Infrastructure Engineer

Hiring Organisation
Spectrum IT Recruitment
Location
Southampton, Hampshire, United Kingdom
Employment Type
Permanent
Salary
£55,000 - £65,000/annum
Secrets management platforms, including HashiCorp Vault. Software licence management tools such as FlexLM/FlexNet. CI/CD platforms including Azure DevOps. Artifactory administration. Slurm Workload Manager. Windows Server, Active Directory, Microsoft 365, Azure and Entra ID administration. Spectrum IT Recruitment (South) Limited is acting as an Employment ...

Linux System Administrator

Hiring Organisation
Spectrum IT Recruitment Limited
Location
Southampton, Hampshire, South East, United Kingdom
Employment Type
Permanent
Salary
£65,000
Secrets management platforms, including HashiCorp Vault. Software licence management tools such as FlexLM/FlexNet. CI/CD platforms including Azure DevOps. Artifactory administration. Slurm Workload Manager. Windows Server, Active Directory, Microsoft 365, Azure and Entra ID administration. Spectrum IT Recruitment (South) Limited is acting as an Employment ...

Site Reliability Engineer

Hiring Organisation
Generative Engineering
Location
London Area, United Kingdom
default reflex. Scaling and cost work — Fargate vs Lambda trade-offs, autoscaling, spot fleets, capacity planning. HPC/batch compute experience (AWS Batch, ParallelCluster, Slurm, Karpenter) for heavy simulation or ML workloads. GPU infrastructure: CUDA-aware scheduling, GPU operator, driver pain. Nix experience (inc Nix Flakes) for reproducible builds ...

Senior Engineering Lead, Chem-Bio

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
record of leading technical work in a team. Strong infrastructure and platform skills — experience with cloud environments (AWS), container orchestration (Kubernetes), and job scheduling (Slurm or similar). Demonstrated experience leading or managing engineers — whether through formal line management, tech-leading a team, or running hiring pipelines. ...

Research Engineer, Pre-Training

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
Background in numerical computing, HPC, or distributed systems, including familiarity with GPUs/TPUs, high-performance networking (NVLink/InfiniBand), Kubernetes/Slurm, and OS internals. Expertise in Python and deep experience with modern deep learning frameworks (PyTorch and/or JAX). Advanced degree (MS or PhD) in Computer ...

Research Engineer, Machine Learning – Paris/London/Zurich/Warsaw

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
+ years working on large‐scale ML codebases. Hands‐on with PyTorch, JAX or TensorFlow; comfortable with distributed training (DeepSpeed/FSDP/SLURM/K8s). Experience in deep learning, NLP or LLMs; bonus for CUDA or data‐pipeline chops. Strong software‐design instincts: testing, code review ...

Senior Staff+ Software Engineer (Kubernetes Platform)

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
controllers — so it stays responsive as object counts and node counts grow by orders of magnitude. And we build the core cluster services every workload depends on, like service discovery, so they hold up under the same pressure. We make sure the control plane is fast, correct, and always … accelerator fleets, including custom scheduling plugins and policies for gang scheduling, topology awareness, and preemption. Scale the Kubernetes control plane (apiserver, etcd, controller-manager) to support clusters far beyond typical limits, and find the next bottleneck before it finds us. Design, build, and operate core cluster services such ...