MLOps Engineer (On-Premise)

Key Focus & Responsibilities:

* Set up, manage, and scale GPU cluster orchestration using Kubernetes or Slurm.

* Implement high-throughput inference serving frameworks (such as vLLM or SGLang) with continuous batching.

* Architect and manage model versioning, pipeline monitoring, and local logging infrastructure.

* Build and maintain secure CI/CD pipelines designed for a strict, fully air-gapped, on-premise network environment.

Requirements:

* Solid experience managing high-end GPU infrastructure and multi-node systems.

* Proficiency with containerization and orchestration (Docker, Kubernetes) and cluster management tools.

* Hands-on experience optimizing models for efficient inference serving (vLLM, TensorRT-LLM, etc.).

* Ability to work in an air-gapped environment without relying on cloud services (AWS, GCP, Azure).

What We Offer:

* Hands-on environment with cutting-edge, local multi-node GPU infrastructure.

* Competitive salary.

Pay: ₹25,000.00 - ₹30,000.00 per month

Work Location: In person