Network Infrastructure Engineer

Post Date

Feb 19, 2026

Location

Milpitas, California

ZIP/Postal Code

95035
US
Apr 25, 2026 Insight Global

Job Type

Contract

Category

Network Engineer

Req #

AUS-6ec3d087-619e-49e6-8ccd-b9d0cbff36cc

Pay Rate

$54 - $68 (hourly estimate)

Job Description

We’re looking for a hands-on Network Observability & Operations Engineer to own the ongoing health, visibility, and operational stability of our data center network fabric. You will design and implement end-to-end telemetry, advanced monitoring, and health dashboards; build alerting for fiber/optic degradation and link state changes; and establish operational workflows that keep the fabric stable after implementation handoff. This role is critical in ensuring seamless interoperability across compute, storage, and network domains.

Network Context
-Architecture: Leaf–spine–style fabric (currently VLAN-based segmentation)
-Traffic Mix:
--Low-speed control/management networks
--High-speed data acquisition paths
-Roadmap Considerations:
--RDMA / RoCE not currently used, but considered for the future
--Potential evolution toward an L3 routed fabric at larger scale (beta phase)
-Vendor/Tech Stack: Includes Spectrum‑X telemetry and advanced QoS policy constructs

What You’ll Do
-Build Post-Handoff Observability
--Design and own network telemetry, advanced monitoring, and health dashboards for day‑2+ operations
--Define and mature SLOs/SLIs for fabric health, latency, and loss
-End-to-End Monitoring & Alerting
--Detect and alert on fiber/optic degradation, port failures, link state changes
--Incorporate environmental signals (e.g., construction activity, cable damage) into risk/impact models
--Integrate and operationalize Spectrum‑X telemetry
-Operational Excellence
--Establish workflows from detection → alerting → triage → remediation
--Create runbooks, playbooks, and auto-remediation where appropriate
--Implement and refine advanced QoS policies, including class-based and lossless queues (where applicable)
-Cross-Domain Troubleshooting
--Serve as the escalation point for integration issues across compute, storage, and network
--Collaborate with platform/service teams to isolate bottlenecks and reduce MTTR
-Stability & Scale
--Maintain fabric stability after the professional services (PS) team exits
--Contribute to design validation and readiness for a future L3 routed fabric at scale (beta → GA)

We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.

Required Skills & Experience

-5+ years in network operations/engineering focused on data center leaf–spine or similar high-availability fabrics
-Observability Expertise: Building/operating telemetry pipelines, time-series monitoring, and health dashboards
-Protocols & Platforms: Strong grasp of Ethernet, VLANs, MLAG/vPC, LACP, link/optic health, and QoS models; familiarity with EVPN/VXLAN and L3 routing (BGP) a plus
-Tooling: Hands-on experience with several of the following: Prometheus, Grafana, InfluxDB, OpenTelemetry, Telegraf, Elasticsearch/Logstash/Kibana (ELK), Splunk, Kafka, NetFlow/IPFIX, sFlow, vendor SDKs/telemetry frameworks (e.g., gNMI, OpenConfig)
-Automation: Proficiency in Python or Go for data processing and automation; experience with Ansible/Terraform preferred
-Diagnostics: Skilled with packet capture/flow analysis, optics diagnostics, link flap analysis, and path tracing
-Mindset: Bias for operational rigor, clear documentation, and collaborative incident response

Nice to Have Skills & Experience

-Exposure to Spectrum‑X telemetry
-Experience with RDMA/RoCE environments or lossless Ethernet QoS
-Prior work implementing EVPN/VXLAN and migrating from L2 to L3 routed fabrics
-Experience integrating environmental telemetry (vibration/construction sensors, OT systems)
-Knowledge of SRE practices (SLO/SLI, error budgets, postmortems)

Benefit packages for this role start on the first day of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401(k) retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.