automation. Experience with CI/CD pipeline management and DevOps practices. Strong understanding of disaster recovery and business continuity planning. Experience with performance tuning and capacity planning. Understanding of chaos engineering principles and practices. Skills in cost optimization for cloud infrastructure. Specific Tools and Techniques: Experience in using cloud native monitoring tools like AWS CloudWatch, Azure Monitor, and
salary and annual discretionary bonus. Pension contributions, in addition to Health Insurance, Life Assurance. 25 days Annual Leave. What You’ll Be Doing: This is a hands-on and strategic engineering role where you’ll be responsible for ensuring production stability across a highly dynamic microservices architecture hosted in Azure. You’ll have end-to-end ownership over reliability … . Automating recovery, scaling, and monitoring across distributed systems. Collaborating with cross-functional teams to align platform strategy and reliability goals. What You’ll Bring: 5+ years in software engineering or SRE/production infrastructure roles. Strong experience with Java (Spring) and cloud platforms (ideally Azure). Proven track record in building and maintaining mission-critical systems. Deep understanding … of Kubernetes, observability tooling (Grafana, Prometheus, ELK, etc.), and Infrastructure as Code (Terraform, Bicep). Ability to lead technical conversations across Engineering and Product. Bonus points if you bring: Experience in fintech, crypto, or regulated digital infrastructure. RDBMS performance tuning (MS SQL). Knowledge of SLAs/SLOs/chaos engineering and platform risk management.
generation omni-commerce Gateway. We are currently hiring a Principal/Distinguished Engineer to support teams within this domain. In this role you will lead highly technical and strategic engineering initiatives on mission-critical platforms across our team, enabling every engineer to do their best work. Your role will be tasked with solving the most complex, challenging technical problems across … this team to meet our demanding needs. You will play an influential role in partnership with the engineering leadership group and other cross-divisional VPs of Engineering, owning technical vision and direction as well as Developer Experience. In order to excel in this role you will possess: Great communication skills. Ability to influence across teams and with senior stakeholders. … to speed on the latest and greatest happenings within technology. Strong appreciation of Event Storming and DDD having applied these methodologies in shaping microservices architectures. Experience in creating/engineering Cloud Native Architectures. Additional Experience (nice to have). Some experience with Model Context Protocol/AI having had some experience in how this can shape the future of
and manage reliability, feature flags and cloud costs. The Harness Software Delivery Platform includes modules for CI, CD, Cloud Cost Management, Feature Flags, Service Reliability Management, Security Testing Orchestration, Chaos Engineering, Software Engineering Insights and continues to expand at an incredibly fast pace. Harness is led by technologist and entrepreneur Jyoti Bansal, who founded AppDynamics and sold
Staff Software Engineer, AI Reliability Engineering London, UK About Anthropic Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build … maintaining SLO/SLA frameworks for business-critical services Are comfortable working with both traditional metrics (latency, availability) and AI-specific metrics (model performance, training convergence) Have experience with chaos engineering and systematic resilience testing Can effectively bridge the gap between ML engineers and infrastructure teams Have excellent communication skills Strong candidates may also: Have experience operating large
AWS Fault Injection Service is a fully managed service for running fault injection experiments to improve an application's performance, observability, and resilience. Fault injection experiments are used in chaos engineering, which is the practice of stressing an application by creating disruptive events in testing or production environments. Examples of these events are a sudden increase in CPU or … naturally customer centric and thrive in a fast-paced environment that requires strong technical and business judgment and solid written and verbal communication skills. You are experienced in leading engineering teams, helping individuals grow and making the team effective, while remaining humble and fun! If this sounds like the right challenge for you, then please apply today! Key job … s why you'll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional. BASIC QUALIFICATIONS - 2+ years of engineering team management experience - Knowledge of engineering practices and patterns for the full software/hardware/networks development life cycle, including coding standards, code reviews, source control management
and manage reliability, feature flags and cloud costs. The Harness Software Delivery Platform includes modules for CI, CD, Cloud Cost Management, Feature Flags, Service Reliability Management, Security Testing Orchestration, Chaos Engineering, Software Engineering Insights and continues to expand at an incredibly fast pace. Harness is led by technologist and entrepreneur Jyoti Bansal, who founded AppDynamics and sold … afraid of being data driven - including using Salesforce and other tools to track your progress Managing full sales cycle from prospect to close Collaborating with other teams, including sales engineering and sales development About You A proven track record of driving and closing enterprise deals Account planning and execution skills Ability to sell C-Level and across both IT
and manage reliability, feature flags and cloud costs. The Harness Software Delivery Platform includes modules for CI, CD, Cloud Cost Management, Feature Flags, Service Reliability Management, Security Testing Orchestration, Chaos Engineering, Software Engineering Insights and continues to expand at an incredibly fast pace. Harness is led by technologist and entrepreneur Jyoti Bansal, who founded AppDynamics and sold … afraid of being data driven - including using Salesforce and other tools to track your progress Managing full sales cycle from prospect to close Collaborating with other teams, including sales engineering and sales development About You A proven track record of driving and closing deals Account planning and execution skills Ability to sell C-Level and across both IT and
in technology operations, who is looking to broaden their skillset. After developing your specialist skills you are now looking for opportunities to grow and learn more about wider resilience, chaos engineering and cloud services - we will support, provide guidance and mentor you. Nevertheless, we are open to other experiences as we are creating a new diverse and dynamic