21 of 21 Slurm Workload Manager Jobs in the UK

Infrastructure / DevOps Lead

Hiring Organisation: Jobleads-UK
Location: United Kingdom

details. Nice to Have: Experience managing physical data centres, co-location facilities, or hybrid infrastructure environments. Working knowledge of ML orchestration frameworks (e.g., Ray, Slurm, Kubeflow). Background in media pipelines, VFX tooling, or media compliance standards (MPA, ISO 27001). Prior experience working in a hybrid startup/ ...

Enterprise Architect - AI

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

continuous, declarative platform delivery. Pipeline orchestration: Kubeflow Pipelines, Apache Airflow, or Argo Workflows to orchestrate multi-stage training, fine-tuning, and inference pipelines. Cluster & workload scheduling: Slurm, Run:ai, and NVIDIA Base Command Manager for GPU job scheduling; Kubernetes-native GPU scheduling including device plugins ...

Founding AI Infrastructure Engineer

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

technical direction of the company from day one. What we’re looking for: Large-scale distributed training. PyTorch and modern deep-learning frameworks. Kubernetes, Slurm or GPU orchestration platforms. AWS and specialist GPU cloud providers. High-performance computing and distributed systems. Training optimisation, memory management and networking. MLOps tooling ...

ML Infrastructure Engineer

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

Claude Code, Codex, Kimi Code, Pi Agent, Droid, or similar agentic coding systems as a development surface. Experience with GPU clusters on Kubernetes, Slurm, Ray, custom schedulers, or cloud GPU orchestration. NCCL, UCX, NVSHMEM, RDMA, InfiniBand, RoCE, or EFA. Rust, C++, CUDA, Go, or systems-level performance work ...

HPC Specialist Architect - Energy Industry (AWS)

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

more of the following programming languages: C++, Python, CUDA, or Bash. Experience in architecting an HPC platform with scheduling middleware (e.g., Slurm, Torque, Symphony or GridServer) and in deployment, tuning and management of HPC technologies in a multi-user environment. High-level understanding of the underlying infrastructure platform ...

Lead AI Infrastructure & Distributed Systems Engineer

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

engineer who thrives in early-stage startup environments and prefers broad systems ownership over narrow specialisation. Technical Expertise: Strong production background with AWS, Kubernetes, Slurm, PyTorch, and distributed training frameworks. Deep hands-on experience with GPU compute optimisation, cluster scheduling, and high-performance networking is essential. Relevant Background: Experience ...

Research Engineer, Pre-Training

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

Background in numerical computing, HPC, or distributed systems, including familiarity with GPUs/TPUs, high-performance networking (NVLink/InfiniBand), Kubernetes/Slurm, and OS internals. Expertise in Python and deep experience with modern deep learning frameworks (PyTorch and/or JAX). Advanced degree (MS or PhD) in Computer ...

Senior AI Infrastructure Engineer - Scale Multi-GPU Training

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

will architect and optimize distributed training across multiple GPUs and machines in AWS, eliminate bottlenecks in the data path, and manage cluster orchestration with Slurm and Kubernetes. The role requires deep PyTorch expertise, familiarity with transformer models, and experience deploying production AI systems. ...

AI Infrastructure Engineer

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

will eliminate bottlenecks in the data path to ensure training is as fast and capital-efficient as possible, alongside managing cluster orchestration using Slurm and Kubernetes while preparing to expand into specialised GPU providers. Finally, you will master the stack from PyTorch-based learning libraries to complex data ...

Research Engineer, Machine Learning – Paris/London/Zurich/Warsaw

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

+ years working on large‐scale ML codebases. Hands‐on with PyTorch, JAX or TensorFlow; comfortable with distributed training (DeepSpeed/FSDP/SLURM/K8s). Experience in deep learning, NLP or LLMs; bonus for CUDA or data‐pipeline chops. Strong software‐design instincts: testing, code review ...

RF Signature Analyst

Hiring Organisation: MASS Consultants
Location: Fareham, Hampshire, South East, United Kingdom
Employment Type: Permanent
Salary: £55,000

SolidWorks or RhinoCAD. STEM degree. It would be great if you also have: Understanding of Linux environments and High-Performance Computing (HPC) systems, including Slurm. Understanding of advanced combat air, weapons or UAS (Uncrewed Air Systems) capabilities. Experience within the defence sector and/or an understanding of survivability ...

AI Inference Engineer

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

multi-tenant serving or SLA-driven infrastructure. Background at a hyperscaler, frontier AI lab, or large-scale distributed inference system. Familiarity with Kubernetes/Slurm for cluster orchestration. Interest or experience in energy markets, grid systems, or sustainability-focused compute. Benefits: Competitive salary and an equity sign-on bonus. ...

Quant Developer (C++)

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

global team. Python, Q/kdb+. Testing methodologies (unit tests, regression tests). Dev workflow: SVN, Git, JIRA, code reviews, etc. Grid and cluster tools (especially SLURM). The minimum base salary for this role is $120,000 if located in New York. This expectation is based on available information ...

MLOps Engineer

Hiring Organisation: Jobleads-UK
Location: Oxford, England, United Kingdom

across on-premises accelerator clusters and cloud (GPU/CPU) for training and simulation workloads. Drive infrastructure-as-code practices: containerisation, orchestration (Kubernetes/Slurm), and reproducible environment management. Contribute to the internal developer platform: self-service tooling, documentation, and runbooks that raise engineering productivity across the company. What … Experience with experiment tracking and model lifecycle management tools (MLflow, W&B, DVC, or similar). Solid understanding of containerisation (Docker) and orchestration (Kubernetes or Slurm) for distributed compute workloads. Infrastructure-as-code mindset: Terraform, Ansible, or equivalent; CI/CD pipelines (GitHub Actions, Jenkins, or similar). Experience with hardware ...

High Performance Computer Scientist /HPC Developer

Hiring Organisation: IT Graduate Recruitment
Location: London, South East, England, United Kingdom
Employment Type: Full-Time
Salary: £50,000 per annum

large-scale distributed systems. Research experience involving computational workloads. Experience with parallel programming (MPI, OpenMP, CUDA). Knowledge of scheduling systems such as Slurm, PBS or LSF. Contributions to technical projects, open source or research communities. Experience working with advanced computing environments. Academic Focus: We are particularly interested … Parallel Programming, MPI, OpenMP, Multithreading, Concurrency, Algorithms, Data Structures, Systems Design, Kernel Development, Networking, Storage Systems, Distributed Storage, Automation, Infrastructure Automation, Shell Scripting, Bash, Slurm, PBS, LSF, Workload Scheduling, Resource Management, Linux Administration, Server Infrastructure, Cloud Infrastructure, AWS HPC, Azure HPC, Data Processing, Machine Learning Infrastructure, AI Infrastructure ...

Solution Architect - GPU & HPC

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

winning them requires more than a great sales team. The Solutions Architect sits at the intersection of sales, infrastructure, and the customer, translating complex workload requirements into technically sound, commercially viable solutions on the Hyperstack platform. You’ll be the primary technical authority through the sales cycle: engaging directly … proposal, and delivery handover — acting as the primary technical authority for GPU cloud solution design. Engage directly with prospective and existing customers to understand workload requirements, technical constraints, and commercial objectives, producing detailed solution designs including architecture diagrams, network topology, storage configurations, and GPU resource allocation models. Collaborate closely ...

Senior Cloud Engineer (K8S)

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

cloud platforms. Experience with solutions for monitoring and observability, e.g. Grafana, Prometheus, OpenSearch/Elasticsearch, Loki. Experience with High Performance Computing (HPC) environments using SLURM or similar batch workload solutions. Programming experience with Python 3 utilising classes and inheritance. Benefits: In addition to a competitive salary, Graphcore offers flexible ...

Senior Software Engineer, Inference Platform

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

model lifecycle management is highly desirable. Bonus/Good to Have: HPC & Cluster Management: Experience handling large-scale HPC clusters using Kubernetes and Slurm for job scheduling, resource allocation, and workload orchestration. Data Engineering: Expertise with data pipelines, ETL systems, and large-scale data processing frameworks. Systems-Level ...

Senior Staff+ Software Engineer, Kubernetes Platform

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

controllers — so it stays responsive as object counts and node counts grow by orders of magnitude. And we build the core cluster services every workload depends on, like service discovery, so they hold up under the same pressure. We make sure the control plane is fast, correct, and always … accelerator fleets, including custom scheduling plugins and policies for gang scheduling, topology awareness, and preemption Scale the Kubernetes control plane (apiserver, etcd, controller-manager) to support clusters far beyond typical limits, and find the next bottleneck before it finds us Design, build, and operate core cluster services such ...

Platform Engineer

Hiring Organisation: Technical Futures Ltd
Location: Cambridge, Cambridgeshire, England, United Kingdom
Employment Type: Full-Time
Salary: £55,000 - £80,000 per annum

code quality. Applications are welcomed from mid-level up to senior-level engineers; knowledge of Cloud computing or HPC job management (such as Slurm) and of identity and authorization flows (such as OAuth2/OIDC) is highly beneficial. This cutting-edge technology company, focused on optimizing complex engineering systems, seeks … failure. Focus on code quality. Some of the following should complement the skills above: Experience of Cloud computing or HPC job management (such as Slurm). Identity and authorization flows such as OIDC/OAuth2. Deploying containerized services on Linux (such as Podman). Infrastructure as code (such ...

Backend Engineer

Hiring Organisation: Technical Futures Ltd
Location: CB2, Cambridge, Cambridgeshire, United Kingdom
Employment Type: Permanent
Salary: £55,000 - £70,000 per annum depending on experience + Shares + Hybrid + 30DH

between them; with experience of modern Python tooling and packaging, some knowledge of Cloud computing or HPC job management (such as Slurm), and of identity and authorization flows (such as OAuth2/OIDC). This cutting-edge technology company, focused on optimizing complex engineering systems, seeks a top-class Backend Engineer … code quality. Some/most of the following should support the skills above: Experience of Cloud computing or HPC job management (such as Slurm). Identity and authorization flows such as OIDC/OAuth2. Deploying containerized services on Linux (such as Podman). Infrastructure as code (such as Ansible ...