providers such as Azure, AWS or GCP. Proficiency using Infrastructure as Code (IaC) tools such as Terraform (preferred), Ansible, or CloudFormation. Experience with monitoring, observability and logging tools such as Datadog, Prometheus, Grafana, or similar. Proven track record of maintaining highly available and performant production environments. Ability to identify and More ❯
Develop a baseline monitoring and tooling concept for cloud to address the need for compliance infrastructure reporting within agile deliveries as part of our Observability strategy. Develop concepts and tools for chargeback and showback (Financial Instrumentation) in a multicloud context. Implement and mature a cloud forecasting and capacity management solution More ❯
secure applications and infrastructure. Strong communication skills, with the ability to convey and/or understand complex technical concepts clearly and concisely. SRE skills including observability and telemetry monitoring. HashiCorp Suite (Packer, Terraform, Vault, Vagrant, Consul). Containerisation using Docker, Kubernetes, OpenShift & Helm. Programming skills using languages such as Python, Go, Java More ❯
Actions, CircleCI) and orchestration technologies (e.g., Kubernetes, Docker). Proficiency in scripting and programming languages (e.g., Python, Bash, Go). Experience with monitoring and observability tools (e.g., Prometheus, Grafana, Datadog). Solid understanding of security best practices, compliance standards, and DevSecOps. Proven ability to manage and deliver complex projects More ❯
CD Pipeline Development: Develop and maintain robust CI/CD pipelines for continuous integration and deployment of ML models and related infrastructure. Monitoring and Observability: Build and maintain comprehensive monitoring and alerting systems for our ML infrastructure and models, leveraging tools like Datadog to ensure system health and performance. Collaboration More ❯
Proficiency in cloud platforms (AWS, GCP), Linux/Unix, and container ecosystems (Kubernetes, Docker). Experience with CI/CD, GitOps (ArgoCD/FluxCD), and observability tools (Prometheus, ELK, OpenTelemetry). Skilled in at least one programming or scripting language (Python, Java, Bash, etc.). Networking and security fundamentals, including protocols and best More ❯
skills. Preferred Skills: Experience with TDD, BDD, and automated testing frameworks (PyTest, Selenium). Familiarity with security best practices in software development. Knowledge of observability tools like Prometheus, Grafana, and ELK stack. More ❯
from Home. The Role: Engineer and automate their SaaS solution for security, reliability and scale. Create and/or improve the tools that enable observability on availability and performance of cloud services. Apply DevOps principles and practices to speed up the delivery of change to the SaaS environment. Collaborate with More ❯
balancers (F5, HAProxy, Nginx) and network monitoring tools. Experience in DNS management and troubleshooting. Experience in network security best practices. Proficiency in monitoring and observability tools (Prometheus, Grafana, Splunk). Proficiency in at least one scripting language (Python, Bash) for automation. Experience with CI/CD pipeline management and DevOps More ❯
GitLab CI/Jenkins). Automate deployments and monitoring for multiple environments. Implement Infrastructure as Code using Terraform. Manage containerised environments with Docker & Kubernetes. Enhance observability with tools like Prometheus, Grafana, and Datadog. Collaborate closely with developers, testers, and platform teams. 🧰 Tech Stack You'll Use: Cloud: AWS (core services: EC2 More ❯
high availability, optimal performance, and reliability across production and non-production environments. This includes working on incident response, capacity planning, WAN optimization, and system observability using tools like Prometheus and Grafana. Key Responsibilities: Administer and maintain Solace PubSub+ appliances and software brokers across environments (on-prem and cloud). More ❯
available. We combine problem-solving skills with software and systems engineering to take a proactive approach in building fault-tolerant and secure systems, improving observability and zealously automating away toil. In this role you will: Use your site reliability expertise to design, operate and support Preqin's infrastructure, middleware and More ❯
CD tools and workflows (e.g., GitHub Actions, Jenkins, GitLab CI). Expertise in Infrastructure-as-Code using Terraform (or similar tools). Experience with observability tools (e.g., Prometheus, Grafana, ELK, Datadog). Strong communication and collaboration skills. Bonus Points For: Experience in containerization and orchestration (e.g., Docker, Kubernetes). Background More ❯
or CloudFormation. Implement CI/CD pipelines, enabling continuous integration and continuous deployment for mission-critical applications. Monitor system performance, availability, and security, implementing observability best practices. Work in an Agile environment, engaging with stakeholders to understand requirements and deliver iterative improvements. Your skills and experience Essential: Experience deploying and More ❯
City of London, London, Farringdon, United Kingdom Hybrid / WFH Options
83zero Ltd
tools, such as Terraform, CloudFormation, ARM, or Pulumi. Expertise in building secure applications and infrastructure, with strong knowledge of security practices. SRE skills, including observability and telemetry monitoring. Hands-on experience with the HashiCorp Suite (Packer, Terraform, Vault, Vagrant, Consul). Experience in containerisation using Docker, Kubernetes, OpenShift, and Helm. More ❯
Employment Type: Permanent
Salary: £60000 - £80000/annum benefits, perks, and healthcare opti
Terraform and ARM templates. Hands-on experience and understanding of containerization and orchestration with Azure Kubernetes and Docker. Design and implement monitoring and observability solutions to ensure the health and performance of cloud resources and applications. Identify opportunities to optimize cloud resources, improve performance, and reduce costs through monitoring More ❯
enhance internal DevOps culture, tooling, and CI/CD processes. Collaborate cross-functionally to continuously innovate and improve development workflows and system operations. Foster observability and reliability across live systems through best-in-class monitoring and automation. Day to Day: Collaborate with engineers and architects to define and implement cloud More ❯
Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills: Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go; Java experience a plus. Hands-on AWS expertise More ❯
City of London, London, United Kingdom Hybrid / WFH Options
Future Talent Group
Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills: Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go; Java experience a plus. Hands-on AWS expertise More ❯
East London, London, United Kingdom Hybrid / WFH Options
Future Talent Group
Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills: Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go; Java experience a plus. Hands-on AWS expertise More ❯
Central London / West End, London, United Kingdom Hybrid / WFH Options
Future Talent Group
Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills: Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go; Java experience a plus. Hands-on AWS expertise More ❯
Bury, Greater Manchester, United Kingdom Hybrid / WFH Options
Future Talent Group
Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills: Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go; Java experience a plus. Hands-on AWS expertise More ❯