2 of 2 Slurm Workload Manager Jobs in London

Principal Platform Engineer

Hiring Organisation
Ncounter
Location
Canary Wharf, London, England, United Kingdom
Employment Type
Full-Time
Salary
£150,000 - £170,000 per annum
DevOps/SRE setting: Linux, Kubernetes, public cloud; Prometheus, Grafana, telemetry and full observability tooling; GitLab, Bitbucket and modern CI/CD. Bonus: Slurm, HPC.
What they're looking for: 8+ years of engineering with Python or Go; a strong systems engineering mindset; confident in design discussions, delivering clean and reliable ...

Site Reliability Engineer - Data Centers

Hiring Organisation
TGS International Group
Location
London Area, United Kingdom
... speed interconnects where applicable.
Orchestration & Benchmarking: Provision and configure GPU clusters using automated workflows. Execute and analyse performance and stability benchmarks orchestrated via a workload scheduler. Validate results against expected performance and reliability thresholds.
Test Framework & Automation: Maintain and extend the automated validation framework built using Python and Ansible … skills. High standards for system reliability, consistency, and documentation.
Preferred/Desirable: Experience working with GPU-based or high-performance compute environments. Familiarity with workload schedulers (e.g. Slurm or similar tools). Understanding of data centre hardware lifecycle and server validation processes. Exposure to high-speed networking technologies. Experience ...