System Engineer

Post Date

Mar 14, 2025

Location

Vancouver, British Columbia

ZIP/Postal Code

V6Z2H3
Canada
Jul 23, 2025 Insight Global

Job Type

Contract

Category

Software Engineering

Req #

VAN-769251

Pay Rate

$48 - $60 (hourly estimate)

Who Can Apply

  • Candidates must be legally authorized to work in Canada

Job Description

Insight Global is looking for a System Engineer to join one of North America's largest retail and wellness companies. The System Engineer will be part of a technically strong team of engineers who are responsible for ensuring all solutions developed in the Retail Engineering teams meet our load and performance expectations and can scale and grow with the organization.

Their duties include:

- Collaborate with onshore and offshore resources and ensure alignment on priorities
- Collaborate with a cross-functional team to develop performance designs, test strategies, and plans.
- Identify performance bottlenecks across all tiers, components, and layers.
- Conduct performance and capacity optimization analysis and studies to improve the effectiveness of applications.
- Understand the architecture of applications and technology stack to recommend appropriate strategies and ensure the system performance is within defined SLAs.
- Experience in identifying potential failures and their impact, and in setting up failure simulation scenarios
- In-depth understanding of distributed systems, microservices architecture, and containerization technologies (such as Docker and Kubernetes)
- Knowledge of resiliency design patterns and their best practices (Circuit Breaker, Timeout/Time Limit, Retry, Bulkhead, Fallback, etc.)
- Knowledge of best practices in software development, testing, and deployment, including CI/CD pipelines and automation tools
- Analysis and resolution of critical and complex application issues (crashes, hung threads, memory leaks, etc.) and performance tuning based on RCA
- Excellent problem-solving skills, with the ability to troubleshoot complex issues and develop effective solutions
- Develop performance and test scripts to simulate real-world scenarios
- Conduct proofs of concept for engineering and testing tools, and demonstrate the feasibility of implementing the solution, with business justification
- Hands-on experience with HA and DR simulations
- Strong proficiency in at least one industry-standard chaos engineering tool (Chaos Monkey, Chaos Toolkit, Gremlin, Litmus, etc.) and experience customizing or building chaos tools using Python or another scripting/programming language
- Hands-on experience analyzing and measuring MTTD/MTTR using widely adopted monitoring and incident management tools
- Strong communication and collaboration skills, with the ability to work effectively in a fast-paced, team-oriented environment
- Monitor all infrastructure and systems installations, including configuration, testing, and maintenance for uninterrupted operations
- Build tools to automate IT operations management, including CI/CD, monitoring/alerting, and incident response

We are a company committed to creating inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity employer that believes everyone matters. Qualified candidates will receive consideration for employment opportunities without regard to race, religion, sex, age, marital status, national origin, sexual orientation, citizenship status, disability, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please submit a request through the Human Resources Request Form. The EEOC "Know Your Rights" Poster is available here.

To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/

Required Skills & Experience

- 5+ years of experience with testing frameworks (ideally JMeter or LoadRunner)
- 4+ years of experience in chaos engineering and resiliency validation engineering
- 3+ years of testing experience with SaaS-based products such as Splunk and Datadog, or similar observability tools
- Experience with APM tools such as Datadog and Dynatrace, and monitoring tools such as Prometheus, Grafana, and Splunk, across Windows and UNIX platforms, AWS Cloud, and Kubernetes
- Experience with integrating performance testing/monitoring into CI/CD Pipelines with GitLab

Benefit packages for this role will start on the 31st day of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401k retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.