Job Description
The FinOps Private Cloud Infrastructure Architect leads the end-to-end architecture, metering, and operational governance for private cloud infrastructure supporting LLM and agentic AI workloads, including GPU and accelerated compute platforms.
This role is accountable for ensuring accurate internal cloud usage metering, cost transparency, observability, and retention governance across a hybrid data center environment. The architect owns the Platform API Inventory and Collection Interval Validation Matrix across the AI ecosystem, ensuring all required telemetry is inventoried, validated, and collected at correct intervals to meet FinOps, security, reliability, auditability, and regulatory requirements.
The role requires hands-on FinOps experience within a large financial organization and owns the per-platform telemetry retention audit, a critical enabler for resilience, recovery, and warm-up operational readiness following incidents, maintenance, patching, or disaster recovery events.
________________________________________
Key Responsibilities
Private Cloud & AI Infrastructure Architecture
• Lead the architecture and governance of private cloud infrastructure supporting LLM and agentic AI platforms
• Architect and govern GPU and accelerated compute platforms, including cluster design, scheduling, capacity planning, and lifecycle management
• Design and operate infrastructure within a hybrid data center model, spanning private cloud, on-prem virtualization, container platforms, storage, and network
________________________________________
FinOps & Usage Metering
• Lead the implementation of internal cloud usage metering for private cloud platforms
• Own FinOps governance for infrastructure platforms, including:
o Showback / chargeback models
o Cost allocation and unit economics
o Capacity and usage transparency
• Partner with Finance and Engineering to align infrastructure cost models with business consumption
________________________________________
Platform Telemetry & API Governance
• Own the Platform API Inventory and Collection Interval Validation Matrix
• Ensure all platform, infrastructure, observability, and cost telemetry APIs are:
o Properly inventoried
o Actively validated
o Collected at correct intervals
• Govern telemetry coverage across:
o Metrics
o Logs
o Traces
o Billing and cost data
o Capacity signals
o Model-serving and AI platform telemetry
• Ensure telemetry programs meet security, audit, risk, and reliability standards
________________________________________
Observability, Retention & Recovery Readiness
• Own per-platform telemetry retention audits, including data availability and completeness
• Ensure retention policies support:
o Incident investigation
o Compliance and audit requirements
o Capacity and cost analysis
o Warm-up recovery design, enabling rapid restoration of operational readiness after outages, upgrades, or DR events
• Partner with resilience and recovery teams to validate operational dependencies and recovery paths
________________________________________
Stakeholder Alignment & Governance
• Partner with Engineering, Platform, Finance, Risk, Security, and Operations teams
• Serve as the authoritative architectural voice for private cloud FinOps and AI infrastructure telemetry
• Communicate architectural decisions, risks, and trade-offs clearly to senior stakeholders
We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.
Required Skills & Experience
• 10+ years of experience in infrastructure architecture, platform engineering, or private cloud engineering within large-scale enterprise environments
• Demonstrated experience designing and operating hybrid data center infrastructure
• Hands-on experience with GPU platforms and accelerated compute operations
• Proven ownership of observability and telemetry programs, including:
o API inventory and validation
o Metrics, logs, and traces strategy
o Collection interval tuning
o Data quality and reliability controls
• Direct FinOps experience in a large organization, including infrastructure cost governance
• Strong understanding of:
o Resilience and recovery engineering
o Data retention strategies
o Operational readiness and warm-up dependencies
• Excellent stakeholder management and ability to influence across engineering, finance, and risk organizations
________________________________________
Preferred Qualifications
• FinOps Certified Practitioner or FinOps Certified Professional
Benefit packages for this role will start on the 1st day of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401(k) retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.