Remote Senior software engineer python Jobs – 132 Open Jobs Today

Senior Software Engineer - Model Training & AI Evals
Chegg —Remote
- Full-time
- Remote
- Laboratory experience
- Research
- Master's degree
4h
Applied AI Engineer (Automation)
Fusemachines —Remote
- Full-time
- Remote
- Azure
- Master's degree
- Databases
Firmware Engineer
Ethernovia, Inc. —Remote
- Remote
- Computer Science
- Master's degree
- C++
- Paid time off
- Flexible schedule
Quick apply
Machine Learning Engineer III
Avalara —Remote
- Remote
- Azure
- Computer Science
- Master's degree
- Health insurance
- Paid time off
Senior Site Reliability Engineer, Tenant Services: Geo
GitLab —Remote
- Weekend availability
- Remote
- AWS
- Scripting
- Communication skills
- Health insurance
- Paid time off
Sr. AWS DevOps Engineer- Kubernetes Expertise
Cloudelligent —Remote
- Full-time
- Remote
- Computer science
- DevOps
- English
Quick apply
13d
Lead Artificial Intelligence/Machine Learning Engineer
Ciklum —Remote
- Full-time
- Remote
- Master's degree
- AWS
- Machine learning
- Health insurance
2d
Senior Generative AI Engineer (India)
Codvo.ai —Remote
- Remote
- Computer Science
- Databases
- AWS
Principal Platform Engineer
Insight Software —Remote
- Full-time
- Remote
- Azure
- DevOps
- English
Principal Software Engineer, Connectors & Integrations
Avalara —Remote
- Remote
- Oracle
- Computer Science
- Java
- Health insurance
- Paid time off
Technical Support Analyst Lead
CSG —Remote
- Weekend availability
- Remote
- Oracle
- Computer Science
- English
Senior Full-Stack Web Developer (Django REST / React / Azure)
Teamficient —Remote
- Up to ₹1,500 a month
- Full-time
- Remote
- Azure
- Computer Science
- JavaScript
Quick apply
Technical Lead - Python
3Pillar —Remote
- Full-time
- Remote
- English
- AWS
- JavaScript
20h
Lead Frontend Engineer
ScaleReal Technologies —Remote
- Remote
- System design
- JavaScript
- APIs
Python Developer with MCP (Model Context Protocol)
Elfonze Technologies —Remote
- Full-time
- Remote
- System design
- SQL
- AWS
9d
Senior Recruiter – IT Services: USA & Europe
InOpTra Digital —Remote
- Full-time
- Remote
- Azure
- Oracle
- Java
Senior Golang Developer
Qube Cinema —Remote
- Full-time
- Remote
- System design
- Java
- AWS
AI Architect - IN
Rackspace —Remote
- Full-time
- Remote
- Master's degree
- AWS
- Software development
Automation Lead
OpenBots Inc. —Remote
- Remote
- Azure
- Oracle
- Computer Science
Quick apply
Team Lead/Senior Specialist - AI
BuzzBoard —Remote
- Full-time
- Remote
- Master's degree
- Databases
- APIs
Quick apply

Senior Software Engineer - Model Training & AI Evals

Chegg - Remote

Job details

Full-time
4 hours ago

Qualifications

CI/CD
Law
PyTorch
Enterprise Software
Laboratory experience
Research
Master's degree
Quality control
Continuous integration
APIs
AI
Python

Full job description

ABOUT THE ROLE

We are looking for a Senior Engineer to join our AI team at the intersection of evaluation science, post-training, and foundation model development. You will own our end-to-end eval and benchmarking infrastructure — the critical feedback loop that drives every major model improvement — while contributing hands-on to post-training pipelines for industry-specific vertical foundation models.

This role is ideal for someone who has worked directly inside an LLM lab and understands what rigorous evaluation looks like at scale: designing the taxonomy of skills being measured, identifying failure modes, engineering synthetic data to close capability gaps, and translating eval signals into actionable training decisions.

WHAT YOU'LL DO

Evaluation & Benchmarking

Design and own task-level evaluation frameworks for LLM agents and base models, covering multi-step reasoning, tool/API use, instruction following, and domain knowledge — grounded in real user failure modes rather than off-the-shelf benchmark suites.
Build comparative benchmarking pipelines to assess leading frontier models (GPT-4o, Gemini, Claude, Llama, Mistral, etc.) against each other and against internal models, with structured analysis of where each model family fails, regresses, or excels across subjects, topics, and task types.
Produce capability gap reports that quantify performance deltas across dimensions such as subject-matter accuracy, reasoning depth, factual consistency, and refusal behaviour.
Track model version regressions across provider releases to maintain a living competitive intelligence benchmark.
Develop domain-specific benchmarks tailored to vertical use-cases (e.g., STEM tutoring, legal, finance, healthcare) — including problem taxonomy design, rubric definition, and inter-annotator agreement pipelines.
Define and drive synthetic data generation strategies to systematically address model shortcomings in specific subjects, topics, and skill areas:
- Identify low-performance clusters from eval results and translate them into targeted data generation prompts and pipelines.
- Design LLM-assisted pipelines for generating high-quality, diverse, and verifiable synthetic training and evaluation data at scale.
- Validate synthetic data quality through auto-eval, human review, and downstream model performance lift experiments.
Build automated regression suites integrated into CI/CD workflows to detect capability degradation across fine-tuning runs and model updates.
Partner with product, curriculum, and research teams to translate eval insights into prioritized post-training and data flywheel decisions.
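
As an illustration of the regression suites and low-performance cluster analysis described above, here is a minimal Python sketch of a per-subject regression check between two checkpoints. The function name `find_regressions`, the subject names, the accuracy values, and the 0.02 tolerance are all hypothetical and chosen only for illustration. They are not taken from the role description.

```python
from typing import Dict, List, Tuple


def find_regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.02,
) -> List[Tuple[str, float]]:
    """Return (subject, delta) pairs where the candidate's accuracy fell
    more than `tolerance` below the baseline, worst regression first."""
    regressions = []
    for subject, base_acc in baseline.items():
        # A subject missing from the candidate run is treated as 0.0 accuracy
        # so that dropped coverage is flagged rather than silently ignored.
        delta = candidate.get(subject, 0.0) - base_acc
        if delta < -tolerance:
            regressions.append((subject, delta))
    return sorted(regressions, key=lambda item: item[1])


# Hypothetical per-subject accuracies from two checkpoints.
baseline = {"algebra": 0.82, "chemistry": 0.75, "contracts": 0.68}
candidate = {"algebra": 0.84, "chemistry": 0.66, "contracts": 0.67}

failures = find_regressions(baseline, candidate)
for subject, delta in failures:
    print(f"REGRESSION {subject}: {delta:+.3f}")
```

In a CI job, a non-empty `failures` list could be used to fail the build, and the flagged subjects could seed targeted synthetic data generation.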

Post-Training & Fine-Tuning

Lead or directly contribute to SFT, RLHF, RLAIF, and DPO training runs on industry-specific vertical foundation models — from dataset design through training execution and eval-gated release.
Curate and engineer high-quality instruction-tuning and preference datasets for domain adaptation, with hands-on experience distinguishing signal from noise in annotation pipelines.
Define data quality criteria, rejection sampling strategies, and deduplication pipelines for SFT corpora.
Design preference pair construction methodologies and reward model training setups grounded in domain-specific quality rubrics.
Implement and experiment with alignment techniques including reward modelling, process reward models (PRMs), and constitutional/RLAIF approaches.
Run ablation studies and controlled experiments to attribute model behaviour changes to specific data or training interventions — not just report final numbers.
Contribute to continual pre-training and domain-adaptive fine-tuning pipelines for vertical models, including domain data sourcing, mixing strategies, and curriculum design.
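
For readers less familiar with DPO, the sketch below computes the standard DPO loss for a single preference pair, -log σ(β[(log π(y_w) - log π_ref(y_w)) - (log π(y_l) - log π_ref(y_l))]), in plain Python. The function and argument names are illustrative, the log-probabilities are assumed to be already summed over response tokens, and `beta=0.1` is only an example value. A real training run would compute this over batches in PyTorch or JAX.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one preference pair, given summed token
    log-probabilities of each response under the policy and reference."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) equals softplus(-margin); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

When the policy matches the reference the margin is zero and the loss is log 2. The loss falls below that as the policy raises the chosen response's likelihood relative to the rejected one.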

Infrastructure & Tooling

Build scalable eval pipelines that run automatically on every training checkpoint and integrate into CI/CD for continuous model quality tracking.
Maintain model cards, eval leaderboards, and internal dashboards providing visibility across experiments for both technical and non-technical stakeholders.
Ensure reproducibility through rigorous experiment tracking (W&B, MLflow, or equivalent), versioned datasets, and documented training configs.

WHO YOU ARE

Required

5+ years of ML/AI engineering experience, with at least 2–3 years focused on large language models.
Lab pedigree: Direct, hands-on experience at an LLM lab, AI research organization, or equivalent frontier AI team — you have shipped models, not just called APIs.
Familiarity with the full model lifecycle: pre-training data, post-training alignment, eval, and production deployment.
Deep practical expertise in post-training methods:
- SFT, RLHF, RLAIF, DPO, PPO, from dataset construction through training and eval-gated release.
- Experience with reward modeling, preference data curation, and quality control for alignment pipelines.
Demonstrated experience designing LLM evaluation frameworks beyond standard benchmarks — including task-level evals for agentic or multi-step workflows.
Hands-on experience building synthetic data generation pipelines to address specific model capability gaps:
- Designing targeted generation prompts based on eval failure analysis.
- Validating synthetic data quality through downstream model performance experiments.
Proven track record of comparative benchmarking across leading foundation models, with structured analysis of capability shortcomings by subject, skill, or task type.
Experience training or fine-tuning vertical/industry-specific foundation models — domain data curation, continual pre-training, or domain-adaptive SFT.
Strong software engineering fundamentals: Python, PyTorch or JAX, and distributed training.

Preferred

Publications or applied research contributions in LLM evaluation, alignment, or post-training.
Experience with multi-modal models or agents with external tool/API use.
Exposure to red-teaming, adversarial evaluation, or safety benchmarking.
Model distillation, speculative decoding, or inference optimization experience.
Prior experience in an education, STEM, legal, biomedical, or enterprise software vertical.

WHAT SUCCESS LOOKS LIKE

30 Days

Fully onboarded into training infra and eval repos. Running existing benchmarks end-to-end and producing a written gap analysis identifying missing coverage.

60 Days

Shipped at least one new domain-specific benchmark and one synthetic data generation pipeline addressing a known model gap. CI-integrated eval running on every checkpoint.

3 Months

Standardized the model evaluation framework for foundation models. Owning golden dataset strategies for fine-tuning, with measurable subject-accuracy gains.

6 Months

Recognized internally as the authority on model quality and competitive benchmarking. Eval insights are directly driving roadmap prioritization.

Why do we exist?

Students are working harder than ever before to stabilize their future. Our recent research study, State of the Student, shows that nearly 3 out of 4 students are working to support themselves through college and 1 in 3 students feel pressure to spend more than they can afford. We founded our business on providing affordable textbook rental options to address these issues. Since then, we’ve expanded our offerings to supplement many facets of higher education through Chegg Study, Chegg Math, Chegg Writing, Chegg Internships, Chegg Skills, and more to support students beyond their college experience. These offerings ease financial concerns for students by modernizing their learning experience. We exist so students everywhere have a smarter, faster, more affordable way to student.

About Us

What is Chegg?

An ‘always on’ digital learning platform.

Chegg puts students first. Everything we build in this company is student-focused, making us the leading student-first connected learning platform. Chegg strives to improve the overall return on investment in education by helping students learn more in less time and at a lower cost. We achieve this by providing students with a multitude of educational tools, from affordable textbook rentals to Chegg Study, which supplements their learning through 24/7 tutor access, step-by-step help with questions, and more.

Chegg is a publicly held company based in Santa Clara, California, and trades on the NYSE under the symbol CHGG.
