Data Science - Senior Data Scientist (Deep Learner)

Paytm
Bengaluru, Karnataka

Job details

Qualifications

TensorFlow
Statistics
Doctoral degree
PyTorch
Software deployment
Data structures
Spark
Git
Research
Master's degree
SQL
AWS
Docker
Machine learning
Distributed systems
Deep learning
Data science
Graph databases
Python
Product lifecycle management

Full job description

Job ID: 839236

3 - 6 Years

1 Opening

Bengaluru

Data Scientist - Paytm Large Model (PLM)

About the Role

We're seeking Big & Deep Learners to join our pioneering team developing Paytm Large Model (PLM), a foundation model for digital payment intelligence.

You'll be at the forefront of:

- Building and tuning large transformer architectures on large-scale data from payments and other activities (onboarding, login, order, invest, …)
- Building an encoder to create embeddings that capture the DNA of payment ecosystems
- Further building a decoder or autoregressive generative model for behavioural prediction/detection
- Doing whatever it takes to enrich and use the model for downstream applications: IFT/DPO/RPO…

Key Responsibilities

- Design and develop large-scale foundation models for learning from payment transactions, user behavior, merchant patterns, etc.
- Create sophisticated encoder architectures to generate embeddings that capture nuanced relationships in financial data
- Build a decoder-only model as well for generative downstream tasks
- Build and optimize training pipelines for processing billions of payment transactions
- Develop evaluation frameworks for measuring model performance across multiple financial use cases
- Implement fine-tuning mechanisms for model adaptation across different payment environments
- Collaborate with the MLOps team for production deployment and scaling
- Research and implement the latest advances in foundation models and their applications to financial data
- Make the model wiser every day by enriching it through state-of-the-art IFT, DPO, PPO, …

Required Qualifications

- Solid in the 3 pillars of learning from data: Optimization, Statistics and Linear Algebra
- Ph.D. or Master's degree in Deep Learning or a related field
- Strong track record in building and deploying foundation models, preferably on non-text/image data such as time series, temporal point processes or, in general, numerical vector sequences
- Deep understanding of transformer architectures and their variations
- Expertise in PyTorch or TensorFlow for large-scale model development
- Experience with distributed training systems and optimization techniques
- Strong programming skills in Python and ML deployment frameworks
- Must be good at data structures and algorithms
- Understanding of payment systems and financial data structures

Preferred Experience

- Previous work with financial or transaction data
- Experience with large encoder/decoder models and representation learning
- Knowledge of privacy-preserving ML techniques
- Background in self-supervised learning approaches
- Familiarity with MLOps and production deployment of large models
- Experience with model compression and optimization techniques
- Working knowledge of knowledge graph representation learning using Graph Neural Networks (GraphSAGE/GAT/Graph Transformer, …) is a plus

Technical Skills

- Languages: Python, PySpark, SQL
- Frameworks: PyTorch/TensorFlow, Hugging Face
- Infrastructure: Cloud platforms (AWS/GCP), distributed training systems
- Tools: Git, Docker, ML experiment tracking platforms
- Processing: Spark, distributed computing

Soft Skills

- Strong research orientation with practical implementation skills
- Ability to translate complex technical concepts to business stakeholders
- Excellence in technical writing and documentation
- Collaborative mindset for cross-functional team projects

What We Offer

- Opportunity to work on cutting-edge foundation models in fintech
- High-impact role shaping the future of payment intelligence
- Access to large-scale computing resources and datasets
- Collaborative research and development environment
- Competitive compensation and benefits
- Professional development and conference opportunities

Required Mathematical Background

Linear Algebra
- Vector spaces and subspaces
- Eigenvalues and eigenvectors
- Matrix factorization techniques
- High-dimensional linear transformations
- Computational complexity of matrix operations

Optimization
- Convex optimization theory
- Gradient descent and its variants
- Stochastic optimization methods
- Constrained optimization
- Loss function design and optimization
- Learning rate scheduling strategies

Statistics & Probability
- Probability distributions and their properties
- Statistical inference and hypothesis testing
- Bayesian statistics and modeling
- Time series analysis
- Sampling techniques
- Information theory concepts

Deep Learning Foundations
- Backpropagation and computational graphs
- Neural network architectures
- Regularization techniques
- Normalization methods
- Loss functions and their gradients
- Model calibration and uncertainty

Required Transformer Architecture Knowledge

Attention Mechanisms
- Self-attention: theory and implementation
- Scaled dot-product attention
- Key-Query-Value concept and computations
- Attention masking techniques
- Efficient attention variants
- Sparse attention mechanisms

Multi-Head Attention
- Parallel attention head design
- Head projection techniques
- Cross-attention mechanisms
- Head pruning and importance scoring
- Multi-query attention
- Grouped-query attention

Positional Encoding
- Absolute positional encodings
- Relative positional encodings
- Rotary positional encoding (RoPE)
- ALiBi position encoding
- T5-style relative position biases
- Implementation of various positional encoding schemes

Tokenization & Embedding
- Subword tokenization algorithms
- Byte-Pair Encoding (BPE)
- WordPiece and SentencePiece
- Numerical feature tokenization
- Categorical encoding strategies
- Embedding table design and optimization

Architecture Specifics
- Transformer blocks and their variants
- Feed-forward network design
- Layer normalization approaches
- Residual connections
- Activation functions
- Architecture search and scaling laws

Specialized Knowledge for Financial Data
- Time-aware attention mechanisms
- Handling numerical features in transformers
- Variable-length sequence processing
- Missing value handling
- Hierarchical attention for transaction data
- Multi-modal fusion techniques

Technical Skills

Implementation
- Vectorized implementation of attention
- Efficient batch processing
- Memory optimization techniques
- CUDA optimization fundamentals
- Distributed training patterns
- Model parallelism strategies

Frameworks & Tools
- PyTorch/TensorFlow proficiency
- Hugging Face Transformers library
- Distributed training frameworks
- GPU programming basics
- ML experiment tracking platforms
- Version control for ML models
