Working at Citi is far more than just a job. A career with us means joining a team of more than 230,000 dedicated people from around the globe. At Citi, you’ll have the opportunity to grow your career, give back to your community and make a real impact.
Role Summary
We are looking for a mid-level Python Developer with combined experience in Data Engineering and AI/NLP engineering. The candidate will build NLP pipelines using libraries such as Flair, BERT, and LLM frameworks, and will also work on large-scale data processing using PySpark, Pandas, and related data tools. The role includes developing APIs, integrating with platform services, and supporting CI/CD deployments using GitHub and LightSpeed Enterprise.
- Develop and optimize ETL/data processing jobs using PySpark, Pandas, PyArrow, and related libraries.
- Work with Parquet files using FastParquet or pyarrow.parquet for efficient data processing.
- Implement data parsing and serialization using json, ujson, or orjson for high-performance JSON handling.
- Build and maintain NLP pipelines using Flair, BERT, and LLM-based models.
- Develop scalable ingestion and data transformation pipelines for AI and analytics use cases.
- Build and maintain Flask-based APIs for model inference and service integrations.
- Use regular expressions for text cleaning, parsing, and NLP preprocessing.
- Integrate caching and fast lookups using Redis.
- Manage and deploy ML models using MLflow for tracking and versioning.
- Support CI/CD workflows using GitHub, LightSpeed Enterprise, and deployment pipelines.
- Create and maintain Autosys JILs for job scheduling and automation.
- Use basic Linux commands for troubleshooting, operations, and deployment tasks.
- Monitor application and system health using ITRS Geneos.
- Write unit tests and improve automation test coverage (PyTest/unittest).
- Work with REST APIs, microservices, and basic shell scripting.
- Work with cloud services (ECS), including boto3.
- 8+ years of hands-on Python programming experience.
- Strong fundamentals in Python, OOP, and design patterns.
- Experience with NLP libraries such as Flair, BERT, HuggingFace Transformers, or similar.
- Solid experience with PySpark, Pandas, PyArrow, and distributed data pipelines.
- Proficient in working with Parquet using FastParquet or pyarrow.parquet.
- Familiarity with fast JSON parsing libraries (json, ujson, orjson).
- Experience building APIs using Flask (FastAPI is a plus).
- Experience with MLflow for model tracking and deployment.
- Good understanding of CI/CD practices and Git workflows.
- Experience working with Redis or similar in-memory stores.
- Experience with Autosys JILs for job scheduling.
- Comfortable with Linux command line and shell scripting.
- Strong debugging, problem-solving, and teamwork skills.
- Exposure to cloud services; AWS boto3 experience is an asset.
- Experience with Polars or Dask for high-performance data processing.
- Experience with PyTorch or TensorFlow for model training.
- Experience with Docker, Kubernetes, or containerized deployments.
- Experience with monitoring tools such as ITRS Geneos.
- Experience with FastAPI, Airflow, or Prefect.
- Technology
- Applications Development
- Full time
- Please see the requirements listed above.
- For complementary skills, please see above and/or contact the recruiter.
Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law.
If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi.
View Citi’s EEO Policy Statement and the Know Your Rights poster.