organisational skills. Additional skills in any of the following are also beneficial: C#, WinForms, WPF, Qt/QML framework, HTML5, experience with algorithmic problems (OpenCL, CUDA), machine learning and AI, OpenGL, OpenGL shaders, VTK, OSG, Vulkan, JIRA, TestRail, TFS, Git, Jenkins, csh/bash, batch files, CMake, PowerShell. About the …
Agile events and activities. Team technologies used include Python, Conda, Behavior Driven Development (PyTest-BDD, Cucumber), Gherkin, Ubuntu, Docker, Jenkins, Bash, Groovy, C++/CUDA, JIRA, and GitHub. The work schedule is flexible, but some overlap with team members in different time zones will be required (two regular meetings per week …
another engineering field. Examples include nonlinear estimation, numerical simulation, nonlinear optimization, and control theory. Experience in the following would be beneficial but not mandatory: CUDA C/C++ GPU computing, high-performance computing, scientific computing, natural language processing, computer vision. Compensation and Benefits: Base Salary Range …
MLIR, Triton, etc.). Expertise in tailoring algorithms and ML models to exploit GPU strengths and minimize weaknesses. Knowledge of low-level GPU programming (CUDA, OpenCL, etc.) and performance tuning techniques. Understanding of modern GPU architectures, memory hierarchies, and performance bottlenecks. Ability to develop and utilize sophisticated performance models …
medical device development. Technical Expertise: Experience with multi-tasking systems (real-time preferable) and familiarity with signal processing or AI/ML applications using CUDA on GPUs (preferred) and with medical device communications protocols (HL7, FHIR). Development Approach: Knowledge of agile methodologies and best practices in software development. Tools & Practices: Proficiency …
complex machine learning algorithms into scalable, production-quality code, with proficiency in Python and a strong understanding of optimization techniques (experience with Cython and CUDA is a plus). Experience in developing Large Language Models (LLMs) is advantageous. In-depth understanding of computer architecture and its implications on AI …
a strong focus on memory management, multi-threading, and low-level performance optimizations. Experience with GPU architectures (e.g., NVIDIA, AMD) and programming frameworks like CUDA, OpenCL, and TensorFlow. Understanding of machine learning algorithms, including model training and inference, and how to optimize these for GPU-based computation. Strong knowledge …
ML frameworks. Experience optimizing deep learning performance on accelerator hardware. Solid knowledge of deep learning algorithms and compute patterns. Strong programming skills in C++, CUDA, or OpenCL. Background in performance profiling and optimization. BS/MS in Computer Science, Electrical Engineering, or a related field. Interested? Send your CV …
Background: Experience in highly regulated industries, preferably in medical device development. Technical Expertise: Experience with multi-tasking systems, Linux and RTOS, FPGAs, microcontrollers, CUDA, communication protocols (e.g. I2C, SPI, UART, USB, Ethernet, PCIe), and driver development, plus familiarity with signal processing using GPUs (preferred). Development Approach: Knowledge of …
on low-precision arithmetic and deep learning models, including large generative models for language, vision and other modalities. Experience writing C++/Triton/CUDA kernels for performance optimisation of ML models. Have contributed to open-source projects or published research papers in relevant fields. Knowledge of cloud computing …
detailed breakdown of all the technologies we use: Backend: Python; Frontend: TypeScript and React; Kubernetes for deployment; GCP for underlying infrastructure; Machine Learning: PyTorch, CUDA, Ray. We encourage people from all backgrounds, cultures, and skill levels to apply. It is okay not to meet all the requirements listed, as we …
South West London, London, United Kingdom Hybrid / WFH Options
La Fosse
Sports tech experience: Background applying AI/ML in the sports domain for data generation or insights. Systems optimisation: Knowledge of GPU kernel development (CUDA, OpenCL, etc.), real-time system optimisation (e.g., NVIDIA Nsight), or experience working with embedded SoCs (NVIDIA, Qualcomm, etc.). If you're interested in …
the boundaries of model performance. You'll also work on re-implementing models efficiently using PyTorch and underlying technologies such as CUDA kernels and Torch compilation techniques. This would include: Evaluating and optimising compute resource usage (e.g., Hopper GPUs) for cost and time efficiency at training and …
Hands-on with monitoring tools like Prometheus and Grafana. Nice to have: Experience building Developer Experience (DevX) tools and workflows. Familiarity with GPU setups (CUDA, TensorFlow, etc.). Strong networking and network security knowledge. Linux/Unix skills and shell scripting. A degree in Computer Science or a related field …
decoding, and transmission at scale (e.g. HLS, WebRTC, and FFmpeg). Accelerator experience. You've developed GPU kernels and/or ML compilers (e.g., CUDA, OpenCL, TensorRT Plugins, MLIR, TVM, etc.). Real-time experience. You've optimized systems to meet strict utilization and latency requirements with tools such …
and enthusiasm for exploring new methods and technologies. Can effectively manage multiple responsibilities and adjust to shifting priorities. Responsibilities: Design and develop Python and CUDA/HIP C++ code that enables distributed training of multimodal LLMs ingesting text, audio, images, or video data. Build and maintain cutting-edge infrastructure …
to our rapidly growing team. The role will be exposed to a broad tech stack (e.g. ReactJS, Python, REST & GraphQL, OpenCV, PyTorch, GCP, AWS & CUDA, Kubernetes) and the cutting edge of computer vision and deep learning. Qualifications: The right candidate will have a proven track record of relevant publications …
frameworks, is highly desirable. Experience working with GPUs: Hands-on experience in configuring, managing, and optimising GPU resources for computational tasks, including familiarity with CUDA, TensorFlow, or similar frameworks, is a strong advantage. Deep understanding of networking, protocols and network security concepts. Good familiarity with UNIX-like operating systems …
hands-on experience with real-time, low-latency ML pipelines in high-performance environments. They should possess strong engineering skills, including expertise in Python, CUDA, or C++. They should also have knowledge of machine learning frameworks such as PyTorch, TensorFlow, or JAX. Additionally, the candidate should be proficient in …
on proficiency in C/C++. 3. Working experience in GPU or GPGPU UMD driver development. 4. Proficiency and working experience with GPGPU APIs such as CUDA/HIP/OpenCL. Preferred Qualifications: 5. Familiarity with CUDA or ROCm development and debugging. 6. Good understanding of GPU hardware/software architecture, including …
outputs from the ML team onto video pipelines. Develop efficient inference pipelines for running AI models in real time on constrained hardware. Implement custom CUDA kernels. Collaborate with cross-functional teams, including ML researchers, embedded software engineers, and UI/UX designers, to integrate ML solutions seamlessly into products. … deep learning frameworks such as TensorFlow or PyTorch. Hands-on experience and strong theoretical knowledge in quantization and pruning. Experience with kernel development using CUDA or OpenCL for image processing. Hands-on experience with TensorRT, embedded hardware accelerators and ONNX. Strong proficiency in both C++ and Python. Software …