Who Can Apply
- Candidates must be legally authorized to work in Canada
Job Description
Insight Global is looking for a Site Reliability Engineer to join one of Vancouver's largest retail/e-commerce companies on a 6-month contract, with on-site work required 3 days a week. In this role, you will join the Foundations team, which is responsible for observability and monitoring within Site Reliability Engineering and guides the digital organization in improving the practice of reliability at lululemon. This is a consultative enablement team that provides guidance and support to product engineering teams so they can build high-quality, resilient software systems through the use of monitoring tools and practices. SRE partners with many product engineering teams across digital and beyond to infuse the concepts and practices of reliability into engineering processes and deliverables. The Foundations team owns the management of the monitoring tools and the best practices for using those tools to provide total visibility into systems. This role requires a vision and strategy for monitoring, and for managing it across a disparate organization.
As an SRE, you will be responsible for designing, implementing, and maintaining robust monitoring solutions, creating insightful dashboards, identifying relevant metrics, and driving efficient problem management practices. You will help identify opportunities to mature observability, as well as roadblocks to success for digital teams, and then clear those roadblocks. You will partner closely with Product Owners and Scrum Masters to manage scope and strike a balance between support and investment work. You are expected to clearly communicate risks to deliverables to your partners.
We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.
Required Skills & Experience
• 5+ years as a Site Reliability Engineer with expertise in observability and monitoring
• Proficient with observability and monitoring tools such as Datadog, Splunk, and New Relic
• Experience engaging with teams to discuss Service Level Objectives (SLOs) and Service Level Indicators (SLIs), with enough understanding of performance, latency, and availability to create meaningful SLOs
• Proficient in building custom dashboards that go beyond standard, out-of-the-box solutions, including pulling data from APIs and tailoring dashboards to reflect various services
• Incident management experience, including collecting data for metrics such as mean time to detect, restore, and resolve issues, and the ability to conduct root cause analyses after incidents
Nice to Have Skills & Experience
• Prior experience in e-commerce or high-availability digital platforms
• Background in product ownership or leading reliability-focused initiatives
Benefit packages for this role will start on the 31st day of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401k retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.