Job Description
We're seeking a passionate Site Reliability Engineer to join our team, focused on optimizing and scaling systems to boost developer velocity and shorten time-to-market. You'll customize and build OpenSearch dashboards that display critical metrics such as user login types, successful and unsuccessful login attempts, MFA abandonment rates, and hit counts by market, region, and product (e.g., checking, savings, banking). You'll share knowledge and build tools and frameworks to empower engineering teams. Familiarity with logging frameworks like Logback or Log4j (including appenders) is critical for improving our logging strategies. Java knowledge is a plus for contributing to logging and system enhancements. If you're driven to poke holes, build robust solutions, and thrive in a cloud-agnostic environment, we'd love to hear from you!
Compensation:
$60-70/hr
Exact compensation may vary based on several factors, including skills, experience, and education.
Employees in this role will enjoy a comprehensive benefits package starting on day one of employment, including options for medical, dental, and vision insurance. Eligibility to enroll in the 401(k) retirement plan begins after 90 days of employment. Additionally, employees in this role will have access to paid sick leave and other paid time off benefits as required under the applicable law of the worksite location.
We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.
Required Skills & Experience
5+ years' experience as a Site Reliability Engineer or in a similar role.
Strong knowledge of monitoring and logging tools (e.g., Splunk, Grafana, Prometheus, or Datadog).
Experience with OpenSearch and creating OpenSearch dashboards.
If no experience with OpenSearch, must have expertise with the ELK stack (Elasticsearch, Logstash, Kibana).
Experience with Java, Scala, or Kotlin, sufficient to write logging code for engineering teams or to modify code for observability.
Experience with OpenTelemetry or other tracing and observability tools (e.g., Spring Boot Actuator, Dynatrace).
Logging experience with frameworks like Logback, Log4j, or similar.
Excellent problem-solving skills and attention to detail.
Strong communication skills and ability to work collaboratively in a team environment.
Nice to Have Skills & Experience
Bachelor's degree in Computer Science, Engineering, or a related field.
Experience with GitHub Actions.
Experience with data lakes or data warehouses such as GCP's BigQuery.
Benefit packages for this role will start on the 31st day of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401k retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.