owned the uptime and performance of user-facing systems. Comfortable participating in (and improving) on-call rotations and incident management. Experience setting up/tuning observability (Prometheus, Grafana, CloudWatch, OpenTelemetry, etc.). Build great tooling & abstractions: You’ve built internal tools, libraries, or platforms on top of cloud providers so product teams can move faster with fewer foot-guns. You…
scalability, and uptime.
✅ Design and implement Infrastructure as Code using Terraform and modern DevOps practices.
✅ Optimize Azure environments for cost, performance, and resilience.
✅ Own observability with Prometheus, Grafana, and OpenTelemetry to reduce detection and response times.
✅ Build and streamline CI/CD pipelines (Azure DevOps, GitHub Actions) for faster, safer deployments.
✅ Lead and mentor a team of engineers, define technical roadmaps…
cloud (preferably Azure) using Terraform and Kubernetes. Manage CI/CD pipelines using GitHub Actions and ensure smooth delivery to production. Own monitoring, alerting, and observability, using tools like OpenTelemetry and Dynatrace. Security & Compliance: Champion secure coding practices and data protection across services. Collaboration & Mentoring: Work closely with product owners, engineering leads, and other stakeholders to shape technical solutions. Mentor…
with observability tools, APM, log analytics, and infrastructure monitoring. Proficiency in scripting or programming languages (e.g., Java, Python, JavaScript). Certifications in Dynatrace, AWS, Azure, or GCP. Familiarity with OpenTelemetry, FluentBit, Cribl, or similar data pipeline tools. Ability to translate technical capabilities into business value, aligning observability solutions with customer KPIs and strategic goals. Excellent communication and presentation skills. Ability to…
will be helping the client move to an AIOps environment. What you'll need to succeed: Extensive experience in observability/SRE/platform engineering roles. Strong experience with OpenTelemetry, Prometheus, Grafana, Splunk, Elastic, etc. Python, Go or Java programming. Experience with Terraform, Helm or other IaC tools. What you'll get in return: An exciting opportunity to join an…
West London, London, United Kingdom Hybrid/Remote Options
Staffworx Limited
development. Familiarity with testing frameworks (Vitest, Playwright) for both API and end-to-end testing. Experience with Docker, Helm, YAML, Kubernetes, and cloud-native deployments. Telemetry tools: Prometheus, Grafana, OpenTelemetry, Datadog, APM tools. Understanding of infrastructure-as-code and CI/CD pipelines. Ability to improve codebases and influence architectural direction. Experience mentoring or coaching engineers. Please send updated CV…
Wigan, Lancashire, England, United Kingdom Hybrid/Remote Options
Searchability
or .NET preferred)
* Cloud experience, ideally AWS, and knowledge of container orchestration (Kubernetes) and Infrastructure as Code (Terraform)
* Experience with monitoring and observability tools such as Grafana, Prometheus or OpenTelemetry
* Strong understanding of networking fundamentals and distributed systems
* Ability to collaborate effectively with engineering, operations and product teams
TO BE CONSIDERED: Please either apply through this advert or email me…
Swindon, Wiltshire, South West England, United Kingdom Hybrid/Remote Options
Humana
Become a part of our caring community and help us put health first Why Join Enterprise Observability Engineering? The Enterprise Observability Engineering team is a high-impact, high-autonomy group focused on building intelligent, scalable, and resilient observability solutions. We…
Strong experience with AWS, GCP, or Azure, plus Kubernetes or other containerized environments. Proficiency in search and query languages (KQL, PromQL, SPL, Lucene, Elasticsearch DSL, etc.). Understanding of OpenTelemetry standards, metrics, logs, traces, and observability best practices. Experience with APIs, Infrastructure as Code, and data ingestion workflows. Excellent communication and documentation skills across technical and non-technical audiences. Preferred…
their success. Available to work 9-5 pm BST, and help to offset the current team coverage. Nice to Have: Experience with instrumentation and distributed tracing tools such as OpenTelemetry. An understanding of modern observability practices and/or other observability solutions. Familiarity with using and troubleshooting any of the following technologies or similar: Cloud networking and administration, including Kubernetes…
Warwick, Warwickshire, West Midlands, United Kingdom Hybrid/Remote Options
Sanderson Government and Defence
Elasticsearch clusters, Kibana dashboards, and Logstash pipelines. Integrate SIEM with cloud-native observability tools (AWS CloudWatch, Azure Monitor, GCP Operations Suite). Automate log collection and enrichment using Beats, OpenTelemetry, and scripting. Security Use Cases & Threat Detection: Build and maintain SIEM use cases, alerts, and dashboards for threat detection. Map detection rules to frameworks like MITRE ATT&CK, STRIDE, and…
Wigan, Lancashire, England, United Kingdom Hybrid/Remote Options
Searchability
SITE RELIABILITY ENGINEER ESSENTIAL SKILLS At least 2 years' experience working as an SRE Deep understanding of system reliability, scalability and performance tuning Experience with observability tools (Grafana, Prometheus, OpenTelemetry) Proficiency in a programming language such as Go or .NET for automation and debugging Hands-on experience with AWS or another major cloud platform Knowledge of Kubernetes, Terraform, and Infrastructure … process and submit (subject to required skills) your application to our client in conjunction with this vacancy only. KEY SKILLS SRE, Site Reliability Engineering, AWS, Kubernetes, Terraform, Grafana, Prometheus, OpenTelemetry, Go, .NET, Cloud Infrastructure, Observability, CI/CD, DevOps, Automation, Performance Tuning, Incident Management
About Us Have you ever wanted to build something that doesn't just improve the status quo, but makes it 100x better? Not just a small step forward, but a complete reinvention. That's what we're doing, and we…