automation.
Collaborate & Support: Work hand-in-hand with cross-functional teams and help develop the junior SRE through mentoring and knowledge sharing.
Monitor & Troubleshoot: Strengthen monitoring systems (moving from Nagios to Datadog) and take ownership of incident management.
What You'll Bring
Solid experience in SRE or DevOps roles within cloud environments (AWS preferred). Confidence with infrastructure-as-code …
procedures. Implement proactive monitoring measures to detect and prevent issues.
Monitor & Troubleshoot
Troubleshoot system issues using logs, monitoring tools, and a methodical approach. Oversee and enhance system monitoring with Nagios, with a transition to Datadog.
Incident Management
Support incident management processes, including post-mortems and follow-up actions. Communicate outcomes with customers clearly and effectively.
What We’re Looking For
… security best practices. Version control experience (e.g., Git). Strong troubleshooting and root cause analysis skills.
Desirable Skills
Experience with Kubernetes and/or other cloud platforms. Familiarity with Nagios, Datadog, or similar monitoring tools. Exposure to CI/CD systems such as TeamCity, AWS CodeBuild, AWS CodePipeline, or ArgoCD.
Personal Attributes
Proactive, curious, and process-driven. Enjoys collaboration and …