INTL - EU - Senior Site Reliability Engineer

Post Date

Apr 28, 2025

Location

Novato, California

ZIP/Postal Code

94949

Job Type

Contract

Job Description

Systems Design, Scaling & Resilience
Build and operate distributed Unix-based systems (Ubuntu, Debian, Red Hat, CentOS).
Implement auto-scaling and self-healing infrastructure.
Tune kernel, filesystems, and networking parameters.
Ensure timely security patching and compliance.
Integrate Linux systems with enterprise auth services (AD, LDAP, Kerberos).

Automation & Infrastructure as Code
Design and maintain automation tools (Terraform, Ansible, Pulumi).
Automate configuration, service rollout, and patching.
Develop backend automation in Python, Go, or Ruby.
Extend platform automation APIs and workflows.

Observability, Monitoring & Incident Response
Develop observability pipelines (Datadog, Grafana, open-source tools).
Create service-level dashboards and alerts.
Participate in 24/7 on-call rotation and incident management.
Conduct post-mortems and root cause analysis.

Multi-Cloud Platform Engineering
Manage systems across AWS, GCP, and on-prem platforms.
Architect high-availability systems with multi-region failover.
Implement backup, recovery, and DR workflows.
Support hybrid environments (VMware/vSphere, container-based platforms).

Collaboration, Standards & Enablement
Work closely with backend and DevOps teams.
Contribute to system reliability standards and documentation.
Mentor engineers on Unix system performance and debugging.

We are a company committed to creating inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity employer that believes everyone matters. Qualified candidates will receive consideration for employment opportunities without regard to race, religion, sex, age, marital status, national origin, sexual orientation, citizenship status, disability, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please submit a request using the Human Resources Request Form. The EEOC "Know Your Rights" Poster is available here.

To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/

Required Skills & Experience

7+ years in SRE, Infrastructure, or Systems Engineering roles.
Extensive experience with Unix/Linux systems.
Strong debugging and optimization skills.
Experience with AWS and/or GCP.
Strong programming skills in Python and shell scripting.
Deep understanding of CI/CD workflows and GitOps practices.
Expertise with Terraform, Ansible, or similar IaC tools.
Experience with hybrid infrastructure (cloud/on-prem).
Hands-on experience with observability tools.
Ability to troubleshoot complex reliability issues.

Nice to Have Skills & Experience

Experience with live game infrastructure.
Contributions to open-source tooling.
Familiarity with telemetry systems (ETL, Flink/ZooKeeper, Kinesis).
Familiarity with service mesh (Linkerd, Istio) and Kubernetes-native architecture.
Experience using Datadog for monitoring and visualization.
Experience with MySQL/PostgreSQL on RDS and on bare-metal installations.

Benefit packages for this role will start on the 1st day of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401k retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.