The Resiliency team is part of the Production Engineering organization, which builds, operates, and improves the heart of Shopify’s technical platform and unlocks the power of planet-scale infrastructure for all of Shopify’s merchants, buyers, and developers.
- Responding to automated alerts and executing playbooks.
- Managing ongoing incidents, using your understanding of Shopify to involve the right teams and resolve them as quickly as possible.
- Cleaning up the noise in our signals so that we can understand the system and debug problems easily.
- Acting as a force multiplier across and within engineering departments.
- Collaborating with high-calibre engineering teams across Shopify to help them create resilient systems.
- Ensuring we never fail for the same reason twice.
- Helping teams build tools to automate the toil of on-call duties.
- Following up on each meaningful incident to ensure the appropriate learnings are extracted and teams know what to do next.
- Setting standards with teams for building resilient, debuggable systems.
- Comfort with hands-on development, navigating multiple programming languages (Java, Python, Go, etc.), digging deep into the stack, and using cloud infrastructure (AWS, GCE, Azure, Kubernetes, Docker).
- You understand the meaning of continuous improvement and evolving systems.
- Strong software engineering skills, primarily in backend software development.
- Experience working with a variety of open-source software, including NGINX, Redis, Memcached, and MySQL.
- Familiarity with network and web protocols, from IP to HTTP.
- A commitment to and drive for quality, technical excellence, and results.
- You reject the idea that on call has to be a terrible, disruptive experience.
- Experience handling multiple on-call shifts for mission-critical systems, and responsibility for the tools and processes used to debug and correct failures.
- You understand how to improve difficult situations through short and iterative projects.
- You know what good observability looks like, but more importantly, how to get there.
- Experience with mentorship and helping teammates level up their craft and technical skills.
- You’ve navigated more than one incident through to the retrospective process.
Vacancy Type: Full Time
Job Location: Cambridge, Ontario, CA
Application Deadline: N/A