Hybrid (San Jose or Raleigh) - Site Reliability Engineer

Post Date

Jan 12, 2026

Location

San Jose, California

ZIP/Postal Code

95134

Job Type

Contract

Job Description

A large software and networking company is looking for a Site Reliability Engineer to join a growing team (ideally hybrid in San Jose or RTP). This person will have a core focus on platform automation and will help lead the automation and AI integration initiative for a cloud-native platform built on Kubernetes, Helm, Terraform, AWS, and GitHub Actions. The project transforms developer onboarding to the platform, eliminating manual processes through intelligent automation.

Day to day:
- Develop automation solutions primarily in Python with supporting Shell scripting
- Build tools and automation to reduce manual operations and improve team efficiency
- Integrate and maintain secrets management using HashiCorp Vault
- Automate deployment, release, and infrastructure management workflows
- Manage container registries and artifact repositories in Artifactory

We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.

Required Skills & Experience

- 5+ years in Site Reliability Engineering, DevOps, and/or Platform Engineering roles
- Proven track record of building automation solutions that reduce manual operations (ideally in Ansible)
- Excellent Python skills along with Shell scripting
- Experience managing enterprise-scale infrastructure and deployments
- Experience with AI/ML infrastructure or platforms
- Experience working in global teams across multiple time zones
- Strong hands-on experience with Kubernetes (AWS EKS and/or OpenShift)
- Proficiency in Python scripting (primary language) and Shell scripting
- Deep understanding of infrastructure-as-code principles with Terraform/Terragrunt
- Experience building and maintaining CI/CD pipelines, particularly with GitHub Actions
- Solid knowledge of monitoring and observability tools (Splunk, Grafana, Prometheus)
- Experience with container technologies (Docker) and artifact management (Artifactory)
- Understanding of API gateways (Apache APISIX or similar platforms)
- Experience with policy engines (Kyverno or similar tools)
- Knowledge of secrets management solutions (HashiCorp Vault or similar)
- AWS cloud platform experience

Nice to Have Skills & Experience

- Previous Cisco experience
- Background in multi-cluster Kubernetes environments
- Terraform/Terragrunt module development experience
- Experience with infrastructure migration projects
- Contributions to open-source infrastructure projects

Benefit packages for this role start on the first day of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401(k) retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.