Platform Performance & Stability: Optimize and maintain OpenShift platform infrastructure to ensure maximum uptime and optimal performance
Infrastructure Automation: Develop and maintain comprehensive Ansible playbooks to implement platform enhancements and ensure robust disaster recovery capabilities
Monitoring Enhancements: Partner with the Observability team to refine monitoring solutions, implement predictive alerting, and continuously improve platform visibility
Application Infrastructure Support: Provide expert platform-level troubleshooting assistance to the SRE team for complex application issues
Enterprise Update Management: Lead the planning and execution of bi-annual OpenShift platform updates with minimal service disruption across all environments
Cross-Team Change Coordination: Serve as platform ambassador in change management processes, aligning OpenShift modifications with application team requirements
Process Standardization: Engineer repeatable, automated workflows using Ansible Automation Platform to minimize manual intervention
Technical Knowledge Management: Develop and maintain comprehensive technical documentation that enables efficient platform operations
Vendor Relationship Management: Effectively manage vendor support relationships, drive issue resolution, and influence platform roadmaps
Platform Evolution: Lead implementation projects for new platform-level applications and utilities that enhance OpenShift capabilities
Operator Lifecycle Management: Oversee the deployment, configuration, and updates of OpenShift operators to extend platform functionality
Technology Adoption Planning: Research and evaluate emerging OpenShift features and capabilities to drive continuous platform improvement
Incident Resolution: Own the full lifecycle of platform-related incidents in ServiceNow, ensuring efficient response and comprehensive documentation
Resource Utilization Tracking: Maintain accurate time tracking for projects, incidents, and administrative tasks using ServiceNow
On-Call Rotation: Participate in an on-call rotation every fifth week with the Unix team, delivering rapid response and resolution for Priority 1 incidents
We are a company committed to creating inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity employer that believes everyone matters. Qualified candidates will receive consideration for employment opportunities without regard to race, religion, sex, age, marital status, national origin, sexual orientation, citizenship status, disability, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to Human Resources Request Form. The EEOC "Know Your Rights" Poster is available here.
To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/
-3+ years of experience with Red Hat OpenShift and Kubernetes, including troubleshooting and upgrades.
-Strong focus on automation, including tools such as Jenkins, Ansible, and GitHub.
-Experience with Nutanix hardware and AHV hypervisor a plus.
-Experience with AWS or Microsoft Azure a plus.
Benefit packages for this role will start on the 31st day of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401k retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.