Job Description
This role sits at the core of a global banking platform responsible for the real-time movement of money across critical payment systems. You'll own the reliability, performance, and scalability of systems that process millions of transactions daily, ensuring they run with near-zero downtime and meet strict latency requirements.
This is not a traditional DevOps or support role; it is a production ownership role. You are responsible for how systems behave in live environments and for ensuring they remain resilient under constant load and change.
We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.
Required Skills & Experience
• Deep experience with production systems at scale (not just building them, but owning them in production)
• Strong background in distributed systems / microservices architecture
• Expertise in incident management + complex troubleshooting
• Experience with observability tools (Splunk, Dynatrace, Prometheus, Grafana, etc.)
• Strong automation mindset (Python, Bash, Terraform, CI/CD)
• Experience with cloud platforms (AWS/Azure) + containerization (Kubernetes)
• Proven ability to operate in high-pressure, real-time environments
• Financial services / payments experience strongly preferred
Nice to Have Skills & Experience
• Experience defining SRE frameworks (SLOs, error budgets, reliability strategy)
• Exposure to payment systems (ACH, Zelle, card processing, trading platforms)
• Experience improving system resiliency at enterprise scale
• Ability to influence engineering standards and reliability practices org-wide
Benefits packages for this role start on the first day of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401(k) retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.