Lead Site Reliability Engineer

Post Date

Jul 17, 2025

Location

Vancouver, British Columbia

ZIP/Postal Code

V6B1Z3

Job Type

Perm

Who Can Apply

Candidates must be legally authorized to work in Canada

Job Description

Insight Global is looking for a Lead Site Reliability Engineer/DevOps Engineer to join our client in the Artificial Intelligence space on a full-time, permanent basis. This is a hybrid role that requires the successful candidate to work on-site in downtown Vancouver one day per week. In this role, you will be responsible for building robust infrastructure and maintaining the health of internal systems. Ideal candidates will have experience working in a SaaS start-up environment in a lead capacity. There is a large emphasis on monitoring and alerting, as you'll be the person ensuring the health of the systems through actionable alerts. While New Relic is the main monitoring tool used, experience with similar tools is just as valuable. From a cloud perspective, strong prior AWS experience is a must-have. Additionally, strong experience with infrastructure as code tools such as Terraform, along with Docker and containerization, is a must-have requirement. Lastly, the successful candidate should have a solid understanding of cloud security and compliance best practices, including SOC 2 readiness and audit support as it pertains to cost savings.

We are a company committed to creating inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity employer that believes everyone matters. Qualified candidates will receive consideration for employment opportunities without regard to race, religion, sex, age, marital status, national origin, sexual orientation, citizenship status, disability, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please submit a request through the Human Resources Request Form. The EEOC "Know Your Rights" Poster is also available.

To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/

Required Skills & Experience

- 8+ years' experience working as a Site Reliability Engineer or DevOps Engineer, most recently in a lead capacity
- Extensive experience improving system health by creating actionable alerts with monitoring and alerting tools such as New Relic, Grafana, Prometheus, or PagerDuty
- Strong knowledge and working experience in an AWS environment
- Expert-level Infrastructure as Code experience with Terraform or similar tools, as well as Docker and containerization
- Strong understanding of cloud security and best practices for SOC 2 readiness and audit support

Nice to Have Skills & Experience

- Understanding of scripting and programming languages such as Python and Bash
- Ability to understand backend code written in JavaScript/TypeScript
- Experience working in MongoDB or similar databases

Benefit packages for this role will start on the 31st day of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401k retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.