Key Focus & Responsibilities:
* Set up, manage, and scale GPU cluster orchestration using Kubernetes or Slurm.
* Implement high-throughput inference serving frameworks (such as vLLM or SGLang) with continuous batching.
* Architect and manage model versioning, pipeline monitoring, and local logging infrastructure.
* Build and maintain secure CI/CD pipelines designed for a fully air-gapped, on-premises network environment.
Requirements:
* Solid experience managing high-end GPU infrastructure and multi-node systems.
* Proficiency with containerization and orchestration (Docker, Kubernetes) and cluster management tools.
* Hands-on experience optimizing models for efficient inference serving (vLLM, TensorRT-LLM, etc.).
* Ability to work in an air-gapped environment without relying on cloud services (AWS, GCP, Azure).
What We Offer:
* Hands-on environment with cutting-edge, local multi-node GPU infrastructure.
* Competitive salary.
Pay: ₹25,000.00 - ₹30,000.00 per month
Benefits:
- Commuter assistance
- Food provided
Work Location: In person