Site Reliability Engineer

Post Date

Sep 12, 2025

Location

Plano, Texas

ZIP/Postal Code

75024
US
Dec 20, 2025 Insight Global

Job Type

Contract

Category

Programmer / Developer

Req #

DAL-ccb13d39-253f-4e95-a731-8e0dc01bd42d

Pay Rate

$47 - $59 (hourly estimate)

Job Description

-Contribute to developing and implementing policies, tools, and initiatives that improve platform health and developer productivity.
-Infrastructure monitoring: measure, analyze, regularly assess, and improve the reliability of core infrastructure components (networking equipment, compute, databases, caching layers), with emphasis on redundancy, fault tolerance, and scalable failover strategies.
-Participate in setting service level objectives (SLOs) and recovery point/recovery time objectives (RPO/RTO); implement capabilities (backup/restore procedures) to meet them; develop and conduct regular exercises to validate recovery procedures.
-Ensure robust backup/restore procedures: perform regular backup validation and protect critical data across regions and environments.
-Forecast growth and model failure domains; ensure capacity buffers and scalable architectures that avoid single points of failure and tolerate component failures.
-Maintain and improve the reliability, availability, and performance of production services, with a focus on reducing incident frequency and recovery/restoration time.
-Design, implement, and operate monitoring, alerting, logging, and tracing solutions to provide end-to-end visibility of systems and dependencies.
-Respond to and resolve production incidents, participate in post-incident reviews, and help implement corrective actions.
-Build and maintain runbooks, standard operating procedures, and automation to reduce toil in common operations tasks.
-Collaborate with software engineers to optimize code for reliability, scalability, and resilience; assist with capacity planning and performance tuning.
-Implement modern CI/CD pipelines and deployment strategies, including blue/green and canary releases, to ensure safe software rollouts.
-Manage infrastructure as code to provision, scale, and maintain cloud environments.
-Enforce security and compliance best practices in the production environment, including access controls and secrets management.

We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.

Required Skills & Experience

-7 years of experience in DevOps or a related field.
-Strong Linux/Unix administration skills and proficiency in at least one scripting language (e.g., Python, Bash).
-Experience with cloud platforms (AWS/Azure/GCP), containerization (Docker), and container orchestration (Kubernetes).
-Experience with monitoring/observability tools (Prometheus, Grafana, ELK/EFK, OpenTelemetry).
-Solid understanding of incident management processes, on-call practices, and post-mortem analysis.
-Knowledge of CI/CD concepts and tooling (e.g., Jenkins, GitHub Actions, GitLab CI) and automation scripting.
-Strong problem-solving, debugging, and communication skills; ability to work in a collaborative cross-functional environment.

Nice to Have Skills & Experience

-Observability experience (logging and monitoring).
-Background with AI tools.
-Bachelor’s degree in Information Technology, Computer Science, or a related field (or equivalent practical experience).
-IT service management certification (ITIL Foundation or equivalent).
-Government security clearance or experience in regulated environments.

Benefit packages for this role will start on the 31st day of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401k retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.