Who Can Apply
- Candidates must be legally authorized to work in Canada
Job Description
Insight Global is looking for a Senior Software Engineer (AI Platform) to join an enterprise AAA game company on a permanent, hybrid basis in Vancouver, BC. The Senior Software Engineer will help build and operate a centralized AI platform used across multiple AAA game franchises. This role is focused on production systems, from initial build through deployment and long-term operation, rather than research or application-level feature development. The AI Platform team provides shared infrastructure, tooling, and services that enable teams to train, deploy, monitor, and operate AI and Generative AI models at scale. Your work will be actively used by internal teams, powering live services, analytics, personalization, and player-facing AI capabilities.
Key responsibilities include:
• Design, build, and operate production AI platforms supporting the full ML lifecycle: data ingestion, training, validation, deployment, monitoring, and iteration
• Own systems end-to-end, from initial architecture to deployment, observability, optimization, and long-term maintenance in a live environment
• Deploy and host ML models as services (cloud or bare metal with GPUs), ensuring reliability, performance, and operational readiness
• Build containerized, scalable infrastructure using Docker and Kubernetes (or ECS/GKE), including cluster management, autoscaling, and resource optimization
• Implement MLOps workflows: CI/CD, model versioning, rollout/rollback strategies, metrics, alerts, and drift detection
• Provision and manage cloud infrastructure using Terraform, with hands-on knowledge of core cloud components (compute, load balancing, storage, IAM/policies)
• Partner with game engineers, ML engineers, and data teams to translate real production needs into reusable platform capabilities
• Establish standards for reliability, observability, cost efficiency, and security across the platform
• Mentor other engineers, reviewing architecture and guiding best practices for production ML systems
We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.
Required Skills & Experience
• 5+ years of professional experience building production systems that are actively in use (e.g., processing real traffic, supporting live users, or powering internal teams)
• Demonstrated ownership of systems from initial build through deployment and ongoing operation (monitoring, metrics, alerts, iteration, etc.)
• Hands-on experience with ML model development or fine-tuning (e.g., deep learning models, LLMs, diffusion models) and improving model performance or efficiency
• Experience deploying ML models to production (cloud or bare metal), not just local experimentation
• Solid experience with:
o Docker and Kubernetes (or ECS/GKE)
o Cloud platforms (AWS, GCP, or Azure)
o Infrastructure as Code (Terraform preferred)
• Proven ability to describe concrete outcomes: what you built, how it was used, and what improved (performance, cost, accuracy, reliability, scale, etc.)
Nice to Have Skills & Experience
• Experience with live-service games, real-time systems, or high-throughput platforms (e.g., recommendations, search, fraud detection)
• Experience building internal, multi-tenant platforms or shared services
• Exposure to Generative AI in production environments (e.g., content generation, agents, personalization)
• Understanding of latency and performance trade-offs in large-scale systems
• Experience enabling other teams to adopt platforms successfully (documentation, tooling, change management)
Benefit packages for this role start on the first day of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401(k) retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.