Saffron Walden, Essex, South East, United Kingdom Hybrid / WFH Options
EMBL-EBI
or more modalities Experience developing or integrating image visualisation systems Experience with NoSQL databases, such as MongoDB Experience with batch scheduling systems such as SLURM Experience with containerisation (e.g. Docker) and container orchestration (e.g. Kubernetes) Infrastructure-as-code deployment tools such as Ansible or Terraform Experience working in an more »
systems, CI/CD, etc.) Attention to detail needed to manage and debug production services. Experience with research clusters and implementing tools such as Slurm workload manager. Job Duties: Own the lifecycle of our Linux-based servers and applications across our multiple business environments. Automate and troubleshoot a more »
magnitude of training runs Explore novel synthetic data generation techniques Engineer robust, high-performance inference Experience Technical: Have experience operating orchestration systems such as SLURM, Ray, or similar. Experience in creating and managing multi-instance clusters for data and model parallel training across GPUs/TPUs, preferably using PyTorch more »
high-performance inference platforms Collaborate in defining and steering their evolving inference and training stack Experience Technical: Have experience operating orchestration systems such as SLURM, Ray, or similar. Experience in creating and managing multi-instance clusters for data and model parallel training across GPUs/TPUs, preferably using PyTorch more »
of your team, ideally for a leading AI research laboratory, or a pioneering AI business Key Requirements: Python and PyTorch expertise Experience in SLURM, Ray, or similar Graphics Processing Units (GPUs) Experience in creating and managing HPC clusters for ML models Experience in efficiently serving large ML models more »
with key stakeholders for enterprise customers. Technical Experience High Performance Computers – (Supporting Users) Configuration, and management of HPC Infrastructure Linux MPI InfiniBand Job schedulers SLURM Contract Details: PAYE Contract - Competitive Rate 18 Months Contract Remote - UK Based Including Training and Upskilling It’s an amazing opportunity to be a more »
Python and Bash, expertise in automation tools like Ansible, and experience with operating platforms at scale using cluster management systems like Kubernetes, OpenStack and Slurm. Additionally, you will be actively involved in troubleshooting networking issues, and deploying infrastructure as code using CI/CD pipelines. Key Responsibilities: Linux Administration … provisioning, configuration management, and application deployment. Platform Operations at Scale: Experience in operating platforms at scale, utilising cluster management systems such as Kubernetes or Slurm to manage high-performance computing workloads efficiently. Networking Skills: Strong networking skills including troubleshooting network issues, understanding network topology, protocols, and ensuring efficient traffic … adapt to a fast-paced, dynamic work environment and prioritise tasks effectively. Certifications such as Certified Kubernetes Administrator (CKA), Certified OpenStack Administrator or Certified Slurm Administrator (CSA) would be advantageous. more »
SLES Enterprise Mandatory technical skills: Linux administration. Such as: SuSE or RedHat - any modern Linux distribution admin experience will be considered. Cluster management solutions. Such as: Bright Cluster manager, PXE booting, OpenHPC, Warewulf or Rocks. Beneficial technical skills: Experience using or managing HPC clusters. Such as: Beowulf, OpenStack or Hadoop. Experience managing batch scheduling systems. Such as … PBS Pro, Slurm, SGE/UGE, Microsoft Scheduler. Experience with scientific or engineering applications. Such as LSDyna, Altair Hyperworks, Abaqus. Scripting skills. Primarily Bash, but any shell scripting along with Python and Perl. Beneficial 'Soft' skills: Good problem-solving skills. Strong stakeholder management skills. Strong communication skills. Please be aware that you will be joining the more »
provisioning, configuration management, and application deployment. Platform Operations at Scale: Experience in operating platforms at scale, utilising cluster management systems such as Kubernetes or Slurm to manage high-performance computing workloads efficiently. Networking Skills: Strong networking skills including troubleshooting network issues, understanding network topology, protocols, and ensuring efficient traffic … environment. Capacity to adapt to a fast-paced, dynamic work environment and prioritise tasks effectively. Certifications such as Certified Kubernetes Administrator (CKA) or Certified Slurm Administrator (CSA) would be advantageous. more »
Ethernet), processors (Intel/AMD/ARM/NVIDIA), parallel file systems, and data center infrastructure. Additional skills in MPI, parallel job scheduling (e.g., SLURM), and management & monitoring tools (e.g., Icinga, Prometheus, Grafana) are advantageous. Requirements: Eligible and willing to undergo UK Govt. security clearance. Proven experience as a more »