Remote, Site Reliability Engineer

Post Date

Feb 04, 2026

Location

Pomona, California

ZIP/Postal Code

91768

Job Type

Contract-to-perm

Job Description

Insight Global is seeking a highly skilled Remote Site Reliability Engineer to join the IT Operations team and maintain the reliability, performance, and availability of the enterprise platforms. In this role, you'll be responsible for monitoring platform health, responding to critical incidents, implementing automation, and continuously improving our observability and reliability tooling. You must have experience with the following tech stack: Prometheus, Grafana, Datadog, Python/PowerShell/Shell/Go, and Azure DevOps. To be successful in this role, you must be self-sufficient and proactive in resolving issues, comfortable working with incomplete details and driving clarity, and able to collaborate effectively with cross-functional teams.

We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.

Required Skills & Experience

- 5+ years of experience as a Site Reliability Engineer (SRE).
- Experience using Grafana to create dashboards, define Service Level Indicators (SLIs) and Service Level Objectives (SLOs), and track capacity and latency trends.
- Experience using Prometheus to monitor and optimize performance and to measure reliability and error budgets.
- Experience working with Datadog to collect logs and traces and to provide deep APM and container visibility.
- Experience supporting Continuous Integration/Continuous Delivery (CI/CD), automation, and deployments with Azure DevOps.
- Experience with scripting and automation utilizing Python and/or PowerShell.
- Experience working in Scrum teams to plan, scope, and create technical solutions for new product capabilities.

Nice to Have Skills & Experience

- Experience implementing Infrastructure-as-Code (IaC) with Terraform for automated provisioning.
- Experience containerizing applications using Docker and orchestrating deployments with Kubernetes.
- Experience using Ansible to configure VMs after provisioning and to automate application deployments.
- Experience deploying workloads using GitOps patterns with Flux and/or Helm.
- Azure Fundamentals (AZ-900)
- Azure Administrator Associate (AZ-104)
- Azure DevOps Engineer Expert (AZ-400)
- Terraform Associate (HashiCorp) - Infrastructure as Code (IaC) for Azure environments.
- Certified Kubernetes Administrator (CKA) - Container orchestration, often paired with AKS.
- Certified Kubernetes Security Specialist (CKS)

Benefit packages for this role will start on the 1st day of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401k retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.