Agile events and activities. Team technologies used include Python, Conda, Behavior-Driven Development (pytest-bdd, Cucumber), Gherkin, Ubuntu, Docker, Jenkins, Bash, Groovy, C++/CUDA, JIRA, and GitHub. Work schedule is flexible, but some overlap with team members in different time zones will be required (two regular meetings per week
another engineering field. Examples include nonlinear estimation, numerical simulation, nonlinear optimization, and control theory. Experience in the following would be beneficial but not mandatory: CUDA C/C++ GPU computing, high-performance computing, scientific computing, natural language processing, computer vision. Compensation and Benefits: Base Salary Range
training and serving foundation models at scale (federated learning a bonus); distributed computing frameworks (e.g., Spark, Dask) and high-performance computing frameworks (MPI, OpenMP, CUDA, Triton); cloud computing (on hyper-scaler platforms, e.g., AWS, Azure, GCP); building machine learning models and pipelines in Python, using common libraries and frameworks
MLIR, Triton, etc.). Expertise in tailoring algorithms and ML models to exploit GPU strengths and minimize weaknesses. Knowledge of low-level GPU programming (CUDA, OpenCL, etc.) and performance tuning techniques. Understanding of modern GPU architectures, memory hierarchies, and performance bottlenecks. Ability to develop and utilize sophisticated performance models
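For orientation only, a minimal sketch of the memory-hierarchy-aware kernel work this posting alludes to: a tiled matrix transpose in CUDA that stages data through shared memory so both the global load and store stay coalesced. The tile size, matrix shape, and use of unified memory are illustrative assumptions, not requirements from the advert.

```cuda
// Illustrative sketch: tiled matrix transpose using shared memory.
#include <cstdio>
#include <cuda_runtime.h>

constexpr int TILE = 32;

__global__ void transpose_tiled(const float* in, float* out, int rows, int cols) {
    // +1 padding avoids shared-memory bank conflicts on the transposed read.
    __shared__ float tile[TILE][TILE + 1];

    int x = blockIdx.x * TILE + threadIdx.x;  // column in the input
    int y = blockIdx.y * TILE + threadIdx.y;  // row in the input
    if (x < cols && y < rows)
        tile[threadIdx.y][threadIdx.x] = in[y * cols + x];  // coalesced load

    __syncthreads();

    // Swap block indices so the store into the transposed matrix is also coalesced.
    x = blockIdx.y * TILE + threadIdx.x;
    y = blockIdx.x * TILE + threadIdx.y;
    if (x < rows && y < cols)
        out[y * rows + x] = tile[threadIdx.x][threadIdx.y];  // coalesced store
}

int main() {
    const int rows = 1024, cols = 1024;  // assumed sizes for the sketch
    float *in, *out;
    cudaMallocManaged(&in, rows * cols * sizeof(float));
    cudaMallocManaged(&out, rows * cols * sizeof(float));
    for (int i = 0; i < rows * cols; ++i) in[i] = float(i);

    dim3 block(TILE, TILE);
    dim3 grid((cols + TILE - 1) / TILE, (rows + TILE - 1) / TILE);
    transpose_tiled<<<grid, block>>>(in, out, rows, cols);
    cudaDeviceSynchronize();

    printf("out[1] = %f (expect %f)\n", out[1], float(cols));
    cudaFree(in); cudaFree(out);
    return 0;
}
```

The shared-memory staging and the +1 padding are the two standard moves for this exercise: the former keeps both global transactions coalesced, the latter removes bank conflicts on the strided shared-memory read.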
latency, high-performance, real-time video or image processing software. Experience developing or implementing real-time image processing algorithms using hardware acceleration. Experience with CUDA or OpenCL. Experience with TensorRT, Triton, or equivalent AI acceleration/inferencing frameworks. Ability to write clear, maintainable, and well-documented code. Capability to
medical device development. Technical Expertise: Experience with multi-tasking systems (real-time preferred) and familiarity with signal processing or AI/ML applications using CUDA on GPUs (preferred), and medical device communication protocols (HL7, FHIR). Development Approach: Knowledge of agile methodologies and best practices in software development. Tools & Practices: Proficiency
deep learning, including multivariate calculus, linear algebra, and optimization techniques. Proficient in Python and deep learning frameworks such as TensorFlow and PyTorch. Experience with CUDA kernels and GPU profiling is a plus. Excellent communication skills, with the ability to present complex technical ideas to both technical and non-technical
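As an illustrative aside (not taken from the posting), a bare-bones example of CUDA-event timing, which is the simplest form of the kernel-level GPU profiling mentioned above; the SAXPY kernel and problem size are arbitrary placeholders.

```cuda
// Illustrative sketch: timing a simple SAXPY kernel with CUDA events.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;  // assumed problem size
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, x, y);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // also guarantees the kernel has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);  // elapsed GPU time between the two events
    printf("saxpy on %d elements: %.3f ms, y[0] = %.1f (expect 5.0)\n", n, ms, y[0]);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(x); cudaFree(y);
    return 0;
}
```

Event timing like this is typically a first pass; deeper profiling would lean on dedicated tools rather than hand-rolled timers.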
learning for molecules and proteins (ideally with some background in chemistry and biological sciences). Lower-level programming for hardware efficiency, e.g., C++/CUDA/Triton. Practical familiarity with hardware capabilities for deep learning – threads, caches, vector & matrix engines, data dependencies, bus widths, and throttling. Practical familiarity with
complex machine learning algorithms into scalable, production-quality code, with proficiency in Python and a strong understanding of optimization techniques (experience with Cython and CUDA is a plus). Experience in developing Large Language Models (LLMs) is advantageous. In-depth understanding of computer architecture and its implications for AI
Background: Experience in highly regulated industries, preferably in medical device development. Technical Expertise: Experience with multi-tasking systems, Linux and RTOS, FPGAs, microcontrollers, CUDA, communication protocols (e.g., I2C, SPI, UART, USB, Ethernet, PCIe), driver development, and familiarity with GPU-based signal processing (preferred). Development Approach: Knowledge of
experience developing and training deep learning models using PyTorch and/or JAX. Excellent experience with scikit-learn, pandas, and NumPy. Experience with CUDA or Triton. Experience with computer vision, natural language processing, and graph neural networks. Experience with a wide range of generative modelling techniques, including diffusion
at top-tier conferences like NeurIPS, CVPR, ICRA, ICLR, CoRL, etc. Strong software engineering experience in Python and other relevant languages (e.g., C++ and CUDA). Experience bringing an ML research concept through to production at scale. This is a full-time role based in our office in London.
on low-precision arithmetic, deep learning models including large generative models for language, vision, and other modalities. Experience writing C++/Triton/CUDA kernels for performance optimisation of ML models. Have contributed to open-source projects or published research papers in relevant fields. Knowledge of cloud computing
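Purely as a hedged sketch of the low-precision kernel work the posting above describes: an elementwise FP16 kernel that stores operands in half precision but accumulates in FP32, a common mixed-precision pattern. The kernel name, sizes, and the accumulate-in-float choice are illustrative assumptions, not taken from the advert.

```cuda
// Illustrative sketch: half-precision storage with single-precision arithmetic.
#include <cstdio>
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// y = a*x + y, with FP16 storage and FP32 accumulation to limit rounding error.
__global__ void haxpy(int n, float a, const __half* x, __half* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float acc = a * __half2float(x[i]) + __half2float(y[i]);
        y[i] = __float2half(acc);
    }
}

int main() {
    const int n = 1 << 20;  // assumed problem size
    __half *x, *y;
    cudaMallocManaged(&x, n * sizeof(__half));
    cudaMallocManaged(&y, n * sizeof(__half));
    for (int i = 0; i < n; ++i) { x[i] = __float2half(1.0f); y[i] = __float2half(2.0f); }

    haxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %.1f (expect 5.0)\n", __half2float(y[0]));
    cudaFree(x); cudaFree(y);
    return 0;
}
```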
detailed breakdown of all the technologies we use: Backend: Python. Frontend: TypeScript and React. Kubernetes for deployment. GCP for underlying infrastructure. Machine Learning: PyTorch, CUDA, Ray. We encourage people from all backgrounds, cultures, and skill levels to apply. It is okay to not meet all requirements listed as we
well as NVIDIA GPU ecosystems and optimization stacks. Highly metric-driven. Strong Python and C++ skills. Bonus Qualifications: Experience optimizing kernels with Triton or CUDA. Enjoy completely reimagining and reconstructing production systems. Experience with large models (>100M parameters).
Hands-on with monitoring tools like Prometheus and Grafana. Nice to have: Experience building Developer Experience (DevX) tools and workflows. Familiarity with GPU setups (CUDA, TensorFlow, etc.). Strong networking and network security knowledge. Linux/Unix skills and shell scripting. A degree in Computer Science or a related field.
London, South East England, United Kingdom (Hybrid / WFH options)
Velocity Tech
vehicle software on commercial automobiles, and/or knowledge of ASPICE, DriveOS, or AUTOSAR. Proven experience in GPU programming and optimization, with proficiency in CUDA, OpenCL, or other GPU programming frameworks. Experience with QNX or similar real-time operating systems. A Master’s degree or higher in Computer Science
the boundaries of model performance. You'll also work on re-implementing models efficiently using PyTorch and underlying technologies like CUDA kernels and Torch compilation techniques. This would include: Evaluating and optimising compute resource usage (e.g., Hopper GPUs) for cost and time efficiency at training and
decoding, and transmission at scale (e.g., HLS, WebRTC, and FFmpeg). Accelerator experience. You've developed GPU kernels and/or ML compilers (e.g., CUDA, OpenCL, TensorRT plugins, MLIR, TVM, etc.). Real-time experience. You've optimized systems to meet strict utilization and latency requirements with tools such
of experience in HPC environments, particularly for AI/ML workloads. Proficiency in parallel programming, distributed systems, and HPC-specific libraries (e.g., MPI, OpenMP, CUDA, ROCm). Hands-on experience with at least one hardware platform (e.g., NVIDIA GPUs, AMD GPUs, TPUs, FPGAs, or custom ASICs). Familiarity with