Experience Required: 3-8 Years
No. of vacancies: 2
Job Type: Full Time
Vacancy Role: WFO (Work From Office)
Job Category: Development
Job Description
We are seeking a highly skilled data scientist with strong expertise in Computer Vision and Generative AI to join our AI team. The ideal candidate will have hands-on experience developing, fine-tuning, and deploying state-of-the-art vision and diffusion models for real-world applications. You will work on advanced image understanding, segmentation, object detection, depth estimation, image generation, and image editing systems.
Roles & Responsibilities
- Design, train, fine-tune, and deploy computer vision and generative AI models.
- Develop solutions for object detection, segmentation, depth estimation, image inpainting, and virtual staging applications.
- Build and optimize end-to-end pipelines for image understanding and image generation tasks.
- Evaluate model performance using appropriate metrics and implement improvements.
- Create and maintain data annotation, training, validation, and testing workflows.
- Work closely with engineering teams to productionize AI models and services.
- Research and implement the latest advancements in computer vision, diffusion models, and multimodal AI systems.
- Optimize models for inference speed, memory consumption, and scalability.
- Develop robust APIs and model-serving solutions for production environments.
- Document experiments, model architectures, and deployment processes.
Qualifications
- 3+ years of hands-on experience in Machine Learning, Deep Learning, Computer Vision, and Generative AI.
- Proven experience developing, optimizing, and deploying production-grade AI solutions.
- Strong expertise in computer vision models and tasks, including object detection (RF-DETR, DETR variants, the YOLO family, Faster R-CNN), semantic and instance segmentation (Mask2Former, Segment Anything Model (SAM)), and monocular depth estimation (Depth Anything, Depth Anything V2).
- Hands-on experience with generative AI and diffusion models, including Stable Diffusion XL (SDXL), ControlNet, image inpainting/outpainting, image-to-image pipelines, LoRA training and fine-tuning, and the Hugging Face Diffusers library.
- Strong understanding of CNNs, Transformers, Vision Transformers (ViTs), attention mechanisms, and modern deep learning architectures.
- Advanced proficiency in PyTorch, including model training, fine-tuning, and hyperparameter optimization, and in evaluating performance with metrics such as mAP, IoU, precision, recall, and F1 score.
- Strong Python programming skills with experience in FastAPI, Flask, or similar backend frameworks.
- Experience with Docker, containerized deployments, Linux environments, Git, and collaborative development workflows.
- Familiarity with cloud platforms such as AWS, GCP, Azure, or RunPod.
- Experience in dataset preparation, augmentation, annotation, and quality control using tools such as CVAT, Label Studio, Roboflow, or similar platforms.
- Knowledge of multimodal AI systems, vision-language models (VLMs), MLOps practices, CI/CD pipelines, distributed training, and GPU optimization.
- Familiarity with OpenCV, image processing techniques, and synthetic data generation workflows.
- Experience working on projects involving virtual staging, furniture detection and removal, empty room generation, medical image segmentation, industrial inspection systems, depth-aware image editing, real estate AI solutions, or multi-model vision pipelines.
- Strong analytical thinking, problem-solving, research capabilities, and the ability to independently implement emerging AI technologies.
- Excellent communication, collaboration, and technical documentation skills.
Pay: ₹700,000.00 - ₹2,000,000.00 per year
Benefits:
- Flexible schedule
- Leave encashment
- Provident Fund
Work Location: In person