We are looking for a highly skilled and visionary Lead Computer Vision AI Engineer. In this role, you will architect, build, and scale state-of-the-art computer vision systems that solve complex problems for our internal operations and external client projects. You will be supported by a robust, cross-functional engineering team, including Full-Stack .NET, Angular, React, and Node.js developers, Database Engineers, Software Architects, Frontend Designers, Flutter Mobile App Developers, and other AI Engineers. Your primary focus will be designing the core vision models, multimodal pipelines, and deployment architectures, ensuring they are scalable, highly accurate, and optimized for both cloud and edge environments.
Key Responsibilities
- Design and implement computer vision pipelines utilizing the latest architectures, including Vision Transformers (ViT), Vision-Language Models (VLMs), and multimodal foundation models.
- Build foundational vision architectures for object detection, segmentation (e.g., SAM, YOLOv8/YOLO-World), and video understanding that can be fine-tuned and deployed across various client projects with minimal friction.
- Multimodal Integration: Leverage Vision-Language Models (e.g., GPT-4V, Gemini, Qwen-VL, LLaVA) to bridge the gap between visual data and natural language reasoning, enabling complex visual Q&A and automated reporting.
- Optimize models for real-time inference across diverse hardware environments, from high-performance cloud GPUs to edge devices (NVIDIA Jetson, mobile devices), using TensorRT, ONNX, and OpenVINO.
- Implement robust MLOps pipelines for continuous data ingestion, model retraining, versioning, and observability to monitor model drift and performance.
Required Skills
- Deep expertise in Vision Transformers (ViT), CNNs, and modern detection/segmentation models (YOLOv8+, Segment Anything Model).
- Experience with 3D computer vision, NeRFs, or spatial computing.
Multimodal & VLMs
- Hands-on experience with Vision-Language Models (VLMs) and multimodal foundation models (e.g., CLIP, LLaVA, Qwen-VL).
- Experience fine-tuning VLMs for domain-specific visual reasoning tasks.
Model Optimization & Edge
- Proficiency in model quantization, pruning, and deployment using ONNX, TensorRT, or OpenVINO.
- Experience deploying models to edge devices (NVIDIA Jetson, iOS/Android via CoreML/TFLite).
Programming & Frameworks
- Expert-level Python, PyTorch, and OpenCV.
- Familiarity with C++ for high-performance inference, or C#/.NET to interface with our existing teams.
Production MLOps
- Experience with ML lifecycle tools (MLflow, Weights & Biases, DVC) and containerized deployments.
- Ability to translate complex CV concepts into actionable API contracts for the engineering team.
Please submit your resume along with a portfolio, GitHub link, or case studies showcasing any advanced computer vision projects, multimodal systems, or production deployments to [email protected].
Pay: ₹387,601.08 - ₹1,591,170.07 per year
Application Question(s):
- Are you able to join immediately?
Experience:
- Lead computer vision AI: 3 years (Required)
Work Location: In person