Job Description
We are seeking an experienced Site Reliability Engineer (SRE) to support a large-scale enterprise platform that powers microservices across multiple global business divisions. The platform, built on Azure PaaS components with a .NET/C# backend, serves a growing number of products and consumers.
This SRE will serve as the first point of contact for production issues during US business hours, responsible for investigating incidents, diagnosing whether issues are infrastructure- or code-related, configuring and maintaining monitoring/alerting, and ensuring platform resiliency as adoption scales. The role requires hands-on knowledge of Azure PaaS services, application-level debugging, and the ability to engage development teams when deeper code-level resolution is needed.
The SRE will join a growing reliability team and work closely with platform engineering, development, and infrastructure teams. The platform is not fully containerized and relies heavily on Azure PaaS components, though Kubernetes workloads are part of the broader ecosystem. This individual will report to the SRE team lead and play a critical role in ensuring US-timezone coverage and support continuity.
We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.
Required Skills & Experience
5–7 years of professional experience in a reliability, platform operations, or SRE-focused role.
Strong hands-on experience with Azure PaaS services, including:
Azure App Services (scaling, health mechanisms, plan configurations)
Azure SQL Database
Cosmos DB
Azure Data Factory
Application Insights (monitoring, alerting, configuration)
Proficiency in .NET, C#, and Web APIs — must be able to read and investigate code to determine whether issues are configuration-related or development-related.
Experience with monitoring, alerting, and observability tooling — ability to configure alerts, set up scale-out/scale-in policies, and build resiliency into platform services.
Understanding of microservices architecture — must be able to understand what individual services do (e.g., user service, contact service) and how they interact.
Incident response and triage experience — comfortable being the first point of contact for production issues, investigating App Service behavior, API health, and infrastructure status, and escalating to development teams as needed.
Strong communication skills — ability to collaborate across SRE, platform engineering, infrastructure, and development teams.
Nice to Have Skills & Experience
AWS experience — the platform is expanding to AWS; prior knowledge of AWS services would be beneficial as the team gears up for multi-cloud support.
Prior SRE experience or development-to-SRE career path — existing SRE team members transitioned from development roles; a similar background would be ideal.
Experience with Elastic (ELK Stack) for logging and observability.
Kubernetes / container orchestration familiarity — while the platform is not fully containerized, some workloads run on Kubernetes clusters and exposure would be helpful.
Experience supporting enterprise-scale platforms with high adoption across multiple business divisions or product lines.
Exposure to SRE process setup, including runbooks, on-call rotations, and incident management frameworks.
Interest or experience in innovative tooling such as AI agents for monitoring or automated remediation.
Benefit packages for this role will start on the 1st day of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401k retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.