2 of 2 Slurm Workload Manager Jobs in London

HPC Systems Administrator

Hiring Organisation
Accenture
Location
London, South East, England, United Kingdom
Employment Type
Full-Time
Salary
Competitive salary
Systems Administrator, Assoc Manager. Salary: Competitive salary and package (depending on level of experience). Locations: UK, London (must be willing to travel to client sites throughout the UK on an ad hoc basis). Accenture are partnering with scaled … related incidents, implementing preventive measures as needed. Required Skills: • Expertise in an HPC environment, including GPU cluster administration (e.g., NVIDIA, AMD) and workload schedulers such as SLURM or PBS. • Proficiency with AI model training workflows and experience supporting popular AI/ML frameworks (e.g., TensorFlow, PyTorch, CUDA). ...

Site Reliability Engineer - Data Centers

Hiring Organisation
TGS International Group
Location
London Area, United Kingdom
speed interconnects where applicable. Orchestration & Benchmarking: Provision and configure GPU clusters using automated workflows. Execute and analyse performance and stability benchmarks orchestrated via a workload scheduler. Validate results against expected performance and reliability thresholds. Test Framework & Automation: Maintain and extend the automated validation framework built using Python and Ansible … skills. High standards for system reliability, consistency, and documentation. Preferred/Desirable: Experience working with GPU-based or high-performance compute environments. Familiarity with workload schedulers (e.g. Slurm or similar tools). Understanding of data centre hardware lifecycle and server validation processes. Exposure to high-speed networking technologies. Experience ...