Job Description
Key Responsibilities
• Define standards for monitoring the reliability, availability, maintainability and performance of sponsor-owned and operated systems.
• Design and architect operational solutions for managing applications and infrastructure.
• Drive service acceptance by adopting new processes into operations, developing new monitoring to expose risks, and automating repeatable actions.
• Partner with service and product owners to establish key performance indicators to identify trends and achieve better outcomes.
• Provide deep troubleshooting for production issues.
• Engage with service owners to maximize a team’s ability to quickly identify and remediate the root causes of performance issues, ensuring rapid recovery from service interruptions.
• Build and/or use tools to correlate disparate data sets in an efficient and automated way to help teams quickly identify the root cause of issues and understand how different problems relate to each other.
• Coordinate with the sponsor to support major incidents, large-scale deployments and SecOps user support.
We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.
Required Skills & Experience
• Working knowledge of K8s, Docker, Helm and automated deployment via pipeline (e.g. Concourse or Jenkins)
• Familiarity with distributed version control systems such as Git
• Experience with AWS cloud services
• Experience setting up monitoring and observability solutions across sponsor-owned systems, tools and data feeds
• Proficient in scripting with Python and Java
• Willingness to work onsite full time
• Ability and willingness to share on-call responsibilities
• Advanced knowledge of Unix/Linux systems, with high comfort level at the command line
• Experience with other cloud services providers beyond AWS
• Experience with CloudWatch or other monitoring tools within AWS
• Familiarity with Prometheus/Grafana or other monitoring tools for ETL feeds, APIs, servers, C2S services, networks and AI/ML capabilities
• Good understanding of networking fundamentals
Benefit packages for this role will start on the 31st day of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401k retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.