Role Overview
We are looking for a senior, hands-on platform engineer to design and build cloud-like GPU and data platforms in on-prem and hybrid environments, with a strong focus on performance, scale, and autonomy.
Key Responsibilities
GPU-as-a-Service / GPU Virtualization
- Design and build GPUaaS platforms for AI/ML and data workloads
- Implement GPU orchestration and scheduling using Kubernetes
- Enable GPU sharing, isolation, and quota management using technologies such as MIG, vGPU, or device plugins
- Optimize GPU utilization, performance, and cost efficiency
- Integrate GPU platforms with multi-tenant access control, billing, and observability
Required Skills & Experience
- 8+ years of experience in infrastructure, platform, or systems engineering
- Proven experience building GPU virtualization or GPUaaS platforms
- Strong understanding of Linux internals, Kubernetes networking, and storage systems such as Ceph
- Experience with Kubernetes and containerized workloads
- Proficiency in automation and scripting (Python, Bash, Go, or similar)
- Experience building automation or autonomous systems that interact with real infrastructure
Preferred Skills
- Experience with NVIDIA GPU technologies (MIG, vGPU, CUDA, DCGM)
- Experience with Kubernetes GPU device plugins and scheduling
- Experience with distributed storage systems (Ceph, object storage, etc.)
- Experience building multi-tenant platforms with strong isolation and security using CNI plugins such as Cilium or Calico