Job Description
Insight Global is looking for a Site Reliability Engineer (SRE) to join a client project supporting the client's Container Organization Platform. This person will join a team of 15-20 engineers supporting 90+ AWS accounts.
* Collaborate with the engineering team on projects as the expert on reliability, performance, and efficiency.
* Automate away the work of managing capacity, safely deploying software, and mitigating system failures.
* Design and develop tools and integrations.
* Collaborate with the team to review code and provide feedback.
* Troubleshoot and root-cause issues in a fast-paced environment, and implement solutions to prevent them from recurring.
* Participate in a 24x7x365 on-call rotation to respond to alerts and outages: 1 week per month (4 other people currently share this rotation).
* Look for areas to improve: remove bottlenecks, eliminate waste, improve performance, and reduce costs.
Required Skills & Experience
* Previous experience in an SRE or related role: DevOps, platform engineering, software engineering
* CS Degree (or related field) and/or a demonstrable, solid understanding of CS fundamentals.
* Proficient coder: strong with at least one programming language (Python, Golang, or Java a plus).
* Deep understanding of Linux system internals / OS fundamentals.
* Experience with distributed / highly available systems architecture, theory and practice.
* Understanding of container and orchestration tools like Docker, Kubernetes, Mesos, etc. [we run EKS]
* Experience with an infrastructure-as-code tool (Terraform, CloudFormation, etc.) [Terraform preferred]
Nice to Have Skills & Experience
* Previous experience building and maintaining production systems in the cloud (AWS preferred)
* Knowledge of security best practices operating in the cloud
* Experience with configuration management tools like Chef, Puppet, or Ansible
* Working knowledge of networking and common internet protocols (HTTP, SSL, DNS, TCP/IP)
* Previous experience working on production, user-facing internet applications at scale
Benefit packages for this role will start on the 31st day of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401k retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.