into microservices architectures. In-depth Linux/Unix experience, emphasizing system performance tuning and automation. Familiarity with monitoring, logging, and observability tools (e.g., Prometheus, Grafana, Loki, OTel, ELK stack) to ensure system reliability and performance. Experience in developing and working with backend application technologies (e.g., Express, Django). Benefits we More ❯
at least one programming language that compiles to machine code such as Rust, C++, or Go. Expert knowledge of monitoring technologies such as Prometheus, Grafana, and PagerDuty. Expert knowledge of deployment technologies such as Pulumi or Terraform. Expert knowledge of Kubernetes. Responsibilities: Improving our observability by adding/adjusting metrics. More ❯
with CI/CD tools (GitHub Actions, Jenkins, AWS CodePipeline), and integrating data-centric workflows. Familiarity with monitoring and logging tools (e.g., Prometheus, Loki, Grafana) in application and data-intensive environments. Proficiency in Configuration Management tools (Chef, Puppet, Ansible) and data orchestration tools (e.g., Airflow, Prefect). Strong background in More ❯
or PowerShell for automation. Understanding of AWS networking concepts, including VPCs, subnets, and security groups. Experience with monitoring and logging solutions such as Prometheus, Grafana, ELK Stack, or AWS CloudWatch. Familiarity with Zero Trust security models and best practices for securing cloud workloads. Ability to troubleshoot complex infrastructure issues and More ❯
london, south east england, united kingdom Hybrid / WFH Options
LHH
or PowerShell for automation. Understanding of AWS networking concepts, including VPCs, subnets, and security groups. Experience with monitoring and logging solutions such as Prometheus, Grafana, ELK Stack, or AWS CloudWatch. Familiarity with Zero Trust security models and best practices for securing cloud workloads. Ability to troubleshoot complex infrastructure issues and More ❯
Bash, or PowerShell for automation. Understanding of AWS networking concepts, including VPCs, subnets, security groups. Experience with monitoring and logging solutions, such as Prometheus, Grafana, ELK Stack, or AWS CloudWatch. Familiarity with Zero Trust security models and best practices for securing cloud workloads. Ability to troubleshoot complex infrastructure issues and More ❯
end-to-end delivery of solutions. Expert knowledge of SRE fundamentals and a commitment to best practice. Fluency with common observability tooling like Prometheus, Grafana, OTel and CloudWatch. Experience analysing and building data telemetry, querying (PromQL), modelling, pipelines and dashboards to provide concise, focused insights and alerts for distributed systems More ❯
cloud-native environments at scale. Exposure to high-load, high-performance systems and large-scale microservices architectures. Experience with observability and monitoring frameworks (OpenTelemetry, Grafana, Prometheus). Knowledge of Graph Databases and AI integration in platform operations is a plus. Experience mentoring junior engineers and leading cross-functional initiatives. Why More ❯
and/or NoSQL is a plus). Excellent scripting skills in Bash and Python. Experience with monitoring and logging tools such as Prometheus and Grafana is essential. Strong problem-solving and troubleshooting abilities. Excellent communication and collaboration abilities. For United Kingdom NSC Roles: Support the product as part of its More ❯
with TDD, BDD, and automated testing frameworks (PyTest, Selenium). Familiarity with security best practices in software development. Knowledge of observability tools like Prometheus, Grafana, and ELK stack. More ❯
e.g., GitHub Actions, Jenkins, GitLab CI). Expertise in Infrastructure-as-Code using Terraform (or similar tools). Experience with observability tools (e.g., Prometheus, Grafana, ELK, Datadog). Strong communication and collaboration skills. Bonus Points For: Experience in containerization and orchestration (e.g., Docker, Kubernetes). Background in performance tuning and More ❯
containerization for applications and their subsequent orchestration within Kubernetes environments. Experience working on at least one monitoring/observability stack (Datadog, ELK, Splunk, Loki, Grafana). Strong knowledge of Unix or Linux. Strong communication skills to collaborate with various stakeholders. Able to work independently in a fast-paced environment. Detail More ❯
using Infrastructure as Code (IaC) tools such as Terraform (preferred), Ansible, or CloudFormation. Experience with monitoring, observability and logging tools such as DataDog, Prometheus, Grafana, or similar. Proven track record of maintaining highly-available and performant production environments. Ability to identify and implement effective mitigation strategies and operational playbooks. Useful More ❯
container orchestration tools such as Docker and Kubernetes. Observability champion, with experience of designing and building monitoring and logging tools such as CloudWatch, ELK, and Grafana. Strong scripting skills in Bash, JavaScript or similar. Knowledge of SecDevOps security best practices and experience implementing security controls in a cloud environment including SIEM More ❯
production environment. Proficiency in CI/CD pipelines and tooling (e.g., Jenkins, GitLab CI, CircleCI). Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, DataDog). Strong understanding of cloud security best practices, including IAM, firewall configurations, and network segmentation. Experience with cloud cost management and optimization strategies. Excellent More ❯
as code (IaC) tools such as Terraform, Ansible, or Chef for automation and configuration management. Strong understanding of monitoring and observability tools like Prometheus, Grafana, Azure App Insights for proactive system monitoring and troubleshooting. Knowledge of networking, security principles, and best practices in a cloud environment. Demonstrated experience of CI More ❯
data throughput in Java and C++. We use Airflow for workflow management, Kafka for data pipelines, Bitbucket for source control, Jenkins for continuous integration, Grafana + Prometheus for metrics collection, ELK for log shipping and monitoring, Docker and Kubernetes for containerisation, OpenStack for our private cloud, Ansible and Terraform for More ❯
in containerisation (Docker) and orchestration (Kubernetes), with a focus on scalability and resilience. Hands-on experience with monitoring, observability, and incident management tools (Prometheus, Grafana, ELK, Azure Monitor, Application Insights, Kusto) and a data-driven approach to improving system reliability. Strategic mindset, able to align technical initiatives with business goals More ❯
fundamentals. Containerisation: Practical experience with Docker (Swarm or Kubernetes) for container orchestration and management. Monitoring and Alerting: Familiarity with monitoring and analytics tools like Grafana, ELK, and Prometheus for system visibility and performance insights. Version Control and Collaboration: Knowledge of Git/GitHub/GitLab for code management, along with More ❯
Hands-on experience with CI/CD, containerization and orchestration tools (Docker, Kubernetes). Knowledge of monitoring, logging, alerting and observability tools (Prometheus, Grafana, ELK Stack, Datadog). Familiarity with infrastructure-as-code tools like Terraform or CloudFormation. Proficiency in scripting languages (Python, Go, Bash) and knowledge of software More ❯
Experience with disaster recovery and redundancy strategies in both cloud and on-premises environments. Proficiency with leading monitoring tools, such as Datadog, Splunk, Prometheus, Grafana, ELK Stack, and New Relic. Programming expertise, especially in systems programming languages (e.g., Java, Kotlin, Scala) and databases (e.g., SQL Server, PostgreSQL). Familiarity with More ❯
networking, security, and system architecture. Proficient in scripting languages (Java, Golang, Python, Bash, or similar). Experience with monitoring and observability tools (DataDog, Prometheus, Grafana). Knowledge of database management systems (PostgreSQL, Bigtable). Understanding of API and microservices architecture. Strong people leadership skills with at least a year in More ❯