Senior Data Scientist

RealPage, Inc. - Ranga Reddy District, Telangana

Job details

14 days ago

Qualifications

TensorFlow
Relational databases
Data structures
NumPy
Databases
SQL
Pandas
Data management
PostgreSQL
Natural language processing
APIs
Flask
Communication skills
Python
Master data management

Full job description

Overview:

We are looking for an end-to-end Data Scientist to design, build, and maintain ML-powered systems that solve core data quality and classification problems across the business. You will own the full lifecycle — from exploratory analysis and feature engineering through model training, deployment, and ongoing performance monitoring. The work spans entity resolution (identifying duplicate records across large datasets) and multi-class classification models that drive decision-making across a variety of business domains.

Responsibilities:

What You'll Do

Own the end-to-end model lifecycle: problem framing, data exploration, feature engineering, model training, evaluation, deployment, and monitoring
Build and maintain entity resolution systems that detect duplicate records using supervised ML and string similarity techniques
Develop classification models that categorize unstructured or semi-structured data into meaningful business categories
Engineer features from messy, real-world text data — names, addresses, free-text fields — using string matching algorithms, phonetic encoding, n-grams, and other NLP techniques
Design candidate retrieval and indexing strategies to make models performant at scale
Tune thresholds, scoring logic, and rule-based overrides to balance precision and recall for production use cases
Maintain production model artifacts and data pipelines, ensuring models stay current as underlying data evolves
Collaborate with engineering and product teams to understand requirements and translate business problems into well-scoped modeling tasks
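
As an illustration of the threshold tuning mentioned above, here is a minimal sketch of how a match-score cutoff might be chosen against a precision target. The function names, the sample scores and labels, and the candidate thresholds are all hypothetical and are not taken from the team's actual tooling.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when every score >= threshold is called a match."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and y for p, y in zip(predicted, labels))
    fp = sum(p and not y for p, y in zip(predicted, labels))
    fn = sum((not p) and y for p, y in zip(predicted, labels))
    # By convention here, an empty denominator counts as a perfect score.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


def lowest_threshold_meeting_precision(scores, labels, min_precision, candidates):
    """Return (threshold, precision, recall) for the lowest candidate threshold
    whose precision reaches min_precision, or None if no candidate does.

    Recall never increases as the threshold rises, so the lowest qualifying
    threshold is also the one that keeps the most recall.
    """
    for threshold in sorted(candidates):
        precision, recall = precision_recall(scores, labels, threshold)
        if precision >= min_precision:
            return threshold, precision, recall
    return None


# Hypothetical model scores for six candidate pairs and their true labels.
scores = [0.95, 0.9, 0.8, 0.6, 0.4, 0.2]
labels = [True, True, False, True, False, False]

result = lowest_threshold_meeting_precision(scores, labels, 0.9, [0.5, 0.7, 0.85])
```

In practice the candidate thresholds would likely come from a held-out validation set, and rule-based overrides would be applied after the score cutoff.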

Qualifications:

10+ years of experience building and deploying ML models end-to-end (not just notebooks)
Strong Python skills — pandas, NumPy, scikit-learn, XGBoost or similar gradient boosting frameworks
Hands-on experience with record linkage, entity resolution, or deduplication problems
Experience building classification models (binary and multi-class) on structured and semi-structured data
Deep familiarity with string similarity algorithms: edit distance, sequence matching, phonetic encoding, shingling
Strong feature engineering instincts — ability to extract signal from noisy, inconsistently formatted data
Comfort working with large serialized data structures and understanding memory/performance tradeoffs in production contexts
Experience with SQL and relational databases (PostgreSQL or similar)
Clear communication skills — ability to explain model behavior and tradeoffs to non-technical stakeholders
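
For candidates unsure what the string similarity work looks like, the sketch below computes three pairwise features for a pair of text fields: a normalized edit-distance similarity, a sequence-matching ratio from the standard library's difflib, and a Jaccard score over character trigram shingles. The helper names and the normalization step are assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Character n-gram shingles, padded so short strings still yield shingles."""
    padded = " " * (n - 1) + text + " " * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace (a deliberately simple choice)."""
    return " ".join(text.lower().split())


def pair_features(left: str, right: str) -> dict[str, float]:
    """Similarity features in [0, 1] for one candidate pair of strings."""
    l, r = normalize(left), normalize(right)
    longest = max(len(l), len(r)) or 1
    return {
        "edit_similarity": 1.0 - levenshtein(l, r) / longest,
        "sequence_ratio": SequenceMatcher(None, l, r).ratio(),
        "trigram_jaccard": jaccard(char_ngrams(l), char_ngrams(r)),
    }
```

Features like these would typically be fed, alongside phonetic-encoding matches, into a supervised classifier such as a gradient boosting model.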

Nice to Have

Experience with blocking and indexing strategies for scalable record linkage
Background in NLP, text normalization, or information extraction
Familiarity with model serving in API contexts (Flask, FastAPI, or similar)
Experience in data quality, master data management, or marketplace domains
Exposure to deep learning frameworks (PyTorch, TensorFlow) for text classification
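
To make the blocking item above concrete, here is a minimal sketch of one common approach: records are grouped by a cheap key, and only records sharing a key are compared, which avoids scoring every possible pair. The record fields (id, name, postal_code) and the key definition are hypothetical choices for this example.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record: dict) -> str:
    """Cheap key: first three alphanumeric characters of the name plus postal code."""
    name = "".join(ch for ch in record["name"].lower() if ch.isalnum())
    return f"{name[:3]}|{record['postal_code']}"


def candidate_pairs(records: list[dict]) -> set[tuple[int, int]]:
    """Return id pairs that share a blocking key; other pairs are never compared."""
    blocks = defaultdict(list)
    for record in records:
        blocks[blocking_key(record)].append(record["id"])
    pairs = set()
    for ids in blocks.values():
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))
    return pairs


# Hypothetical records: only ids 1 and 2 share both a name prefix and a postal code.
records = [
    {"id": 1, "name": "Acme Corp", "postal_code": "75001"},
    {"id": 2, "name": "ACME Corporation", "postal_code": "75001"},
    {"id": 3, "name": "Acme Corp", "postal_code": "10001"},
    {"id": 4, "name": "Zenith Ltd", "postal_code": "75001"},
]

pairs = candidate_pairs(records)
```

A single strict key like this one trades recall for speed, so real systems often combine several keys and take the union of the resulting candidate pairs.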
