Site Reliability Engineer

Post Date

Sep 16, 2025

Location

Raleigh, North Carolina

ZIP/Postal Code

27606

Job Type

Permanent

Job Description

• General SRE Work: Ensure the reliability, automation, and operational excellence of our services.
• Cloud/Database Engineering: Collaborate closely with our DBA team to manage both relational and non-relational database technologies.
• Database Deployment Automation: Help automate the deployment of database environments.
• Migrations: Support ongoing migrations, such as Oracle to Postgres.
• Performance Troubleshooting: Troubleshoot performance issues and assist with query optimization.
• Capacity Planning: Work on capacity planning and high availability configurations.
• Technology Exposure: Gain exposure to technologies like DocumentDB, DynamoDB, Redis, and Cassandra as part of our broader platform support strategy.
Design for Reliability:
• Provide guidance and influence key decisions on systems design and production readiness.
• Validate the impact of designs on on-call responsibilities, maintenance, cost, security, developer efficiency, and deployment velocity.
• Drive consensus on technical standards, architecture, and roadmap initiatives.
• Build trust with internal stakeholders and external customers by demonstrating the implementation of SRE and security practices.
Disaster Recovery:
• Contribute to business continuity plans by demonstrating how SRE practices enhance disaster recovery (DR) strategies.
• Act as an escalation point during crises, providing clear executive-level communication and engineering direction to teams.
• Ensure DR tests are scheduled and remediation tasks are completed in a timely manner.
Incident Management:
• Analyze incident trends and make recommendations for architectural, design, or operational changes that support business investments.
• Collaborate with stakeholders to define priority levels and appropriate response times for products or services.
• Work closely with Product Owners and senior leadership to make critical decisions during major incidents.
Observability:
• Promote observability best practices across technology and product teams, presenting both internally and at external conferences and webinars.
• Foster collaborative relationships with senior stakeholders and Product Owners to align on the value of Service Level Objectives (SLOs) and balance feature velocity with reliability.
• Provide recommendations for changes in system designs or operating models based on performance and reliability reports.
Platforms and Automation:
• Seek engineering and architecture consensus on new standard components and services to be included in the Paved Road, such as platforms and CI/CD tools.
• Advocate for adoption, standardization, and broader contribution through inner sourcing, drawing on a deep understanding of its benefits to the Software Development Life Cycle (SDLC).
• Champion the use of the Paved Road and work to eliminate inefficiencies and silos within the technology organization.
Reliability Culture:
• Evangelize SRE principles and practices across technology and product teams.
• Simplify complex reliability and security topics, making them accessible for a broader audience.
• Define key SRE learning topics and integrate them into a structured training strategy and learning program.
• Mentor junior SREs, providing long-term direction for their career development.

We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.

Required Skills & Experience

• Experience in Site Reliability Engineering.
• Familiarity with cloud/database engineering tasks.
• Knowledge of AWS and of both relational and non-relational database technologies.
• Strong troubleshooting and query optimization skills.
• Ability to work collaboratively with cross-functional teams.
• Proven experience in Site Reliability Engineering, DevOps, or related technical fields.
• Strong understanding of system design, disaster recovery, incident management, and observability practices.
• Experience with CI/CD pipelines, automation tools, and platform engineering.
• Excellent communication skills, with the ability to engage effectively with both technical and non-technical stakeholders.
• A passion for fostering a culture of reliability and continuous improvement.

Benefit packages for this role start on the 31st day of employment and include medical, dental, and vision insurance; HSA, FSA, and DCFSA account options; and 401(k) retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.