At Avolta (SIX: AVOL), our people are the driving force behind our success. With a team of over 76,000 individuals representing more than 150 nationalities, we are a truly global company driven by passion, innovation, and excellence.
Born from the combination of Dufry and Autogrill, Avolta is redefining the travel experience through the dedication and expertise of our diverse workforce. Across 73 countries and 1,000 locations, our teams bring energy, creativity, and commitment to delivering world-class travel retail and food & beverage experiences.
We operate across multiple channels - including airports, motorways, cruise ships, ports, railways, and more - offering endless opportunities for collaboration and growth. Our people are empowered to make an impact, supported by a culture that values teamwork, development, and innovation.
Sustainability and social responsibility are embedded in our strategy, ensuring we grow in a way that benefits both our employees and the communities we serve.
Are you looking for a dynamic, international career where your contributions truly matter? Join Avolta and be part of a team that’s shaping the future of travel - together.
Avolta is scaling a competitive intelligence platform to monitor pricing and assortment data across 1,000+ retail competitors globally. As a Backend Engineer in the Bangalore scraping team, you will design, build and maintain the web data extraction systems that power this intelligence. This is a hands-on engineering role: you will spend the majority of your time writing Python, debugging complex scraping failures, reverse-engineering websites, and improving the reliability of existing data pipelines. You will work closely with the Lead Scraping Engineer to implement architectural patterns and with the QA/Ops analyst to validate the data you produce.
This role requires genuine, production-grade experience with web scraping — not hobbyist scripts, but robust systems that run daily against real-world sites with anti-bot protections, JavaScript rendering, session management and frequent structural changes.
DAY-TO-DAY RESPONSIBILITIES
Scraper Development (Primary — 60% of time)
- Implement new scrapers for assigned competitor websites using the team's standard Scrapy-based framework, with Playwright or Selenium for JavaScript-heavy targets.
- Reverse-engineer target websites by analysing browser network traffic in Chrome DevTools: identify XHR/fetch API calls, understand request headers, cookies and session flows, detect fingerprinting patterns.
- Parse structured and semi-structured HTML using BeautifulSoup4 and lxml; write robust XPath and CSS selectors that are resilient to minor DOM changes.
- Extract and normalise product data: price (including multi-currency), promotional mechanics, assortment (SKU name, brand, category), availability and URL.
- Handle pagination: offset-based, cursor-based, infinite scroll, AJAX-loaded content and URL-pattern iteration.
- Implement session management for scrapers requiring login flows, including form submission, CSRF token handling, and cookie persistence.
- Identify and implement evasion strategies: randomised delays, user-agent rotation, request header spoofing, referer injection.
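For illustration only, the multi-currency price normalisation mentioned in the list above might look like the minimal sketch below. The function name `normalise_price`, the symbol table and the separator heuristic are assumptions made for this example; they are not taken from the team's framework, and the sketch uses only the Python standard library.

```python
import re
from decimal import Decimal
from typing import Optional, Tuple

# Hypothetical symbol-to-ISO mapping; a real scraper would cover more currencies.
CURRENCY_SYMBOLS = {"€": "EUR", "$": "USD", "£": "GBP"}
ISO_CODE = re.compile(r"\b(EUR|USD|GBP|CHF)\b")
# A run of digits that may contain '.', ',', apostrophes or spaces as separators.
NUMBER = re.compile(r"\d[\d.,'\s]*\d|\d")


def normalise_price(raw: str) -> Tuple[Decimal, Optional[str]]:
    """Return (amount, ISO currency code or None) parsed from a raw price string."""
    currency = None
    match = ISO_CODE.search(raw)
    if match:
        currency = match.group(1)
    else:
        for symbol, code in CURRENCY_SYMBOLS.items():
            if symbol in raw:
                currency = code
                break

    number = NUMBER.search(raw)
    if number is None:
        raise ValueError(f"no numeric amount in {raw!r}")
    # Spaces and apostrophes are only ever thousands marks, so drop them first.
    digits = re.sub(r"[\s']", "", number.group(0))

    # Assumed heuristic: the last '.' or ',' is a decimal separator only when
    # one or two digits follow it; otherwise all separators are thousands marks.
    last_sep = max(digits.rfind("."), digits.rfind(","))
    if last_sep != -1 and len(digits) - last_sep - 1 in (1, 2):
        whole = re.sub(r"[.,]", "", digits[:last_sep])
        frac = digits[last_sep + 1:]
        return Decimal(f"{whole}.{frac}"), currency
    return Decimal(re.sub(r"[.,]", "", digits)), currency
```

With this sketch, `normalise_price("€1.299,00")` and `normalise_price("$1,299.00")` both yield an amount of 1299.00, tagged `EUR` and `USD` respectively. The heuristic is deliberately simple: a string such as `1.299` is read as a thousands-separated integer, so sites that quote three decimal places would need site-specific handling.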
Platform Integration & Quality (Secondary — 30% of time)
- Write unit tests for all parser functions using pytest; maintain >80% test coverage on new code.
- Document each scraper: target URL patterns, data fields extracted, known failure modes, approximate run time and output volume.
- Integrate scrapers into the orchestration layer (Airflow DAGs) according to team standards; configure scheduling, retries and alerting.
- Monitor your scrapers in production: review daily success/failure dashboards, triage failures within agreed SLA, fix breakages promptly.
- Participate in code reviews: review peers' scrapers for correctness, resilience and adherence to team coding standards.
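As a rough idea of what unit-testing a parser function can look like, here is a small sketch in the plain-assert style that pytest discovers. The fixture markup, the `ProductParser` class and the `parse_products` function are hypothetical. To keep the example free of third-party dependencies it uses the standard library's `html.parser` in place of BeautifulSoup4 or lxml, which the role itself would use.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional

# Hypothetical fixture: a trimmed product listing as a scraper might receive it.
FIXTURE = """
<ul>
  <li class="product"><span class="name">Alpine Chocolate 200g</span><span class="price">CHF 12.50</span></li>
  <li class="product"><span class="name">Travel Adapter</span><span class="price">CHF 29.00</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collect the text of name/price spans inside each li.product element."""

    def __init__(self) -> None:
        super().__init__()
        self.products: List[Dict[str, str]] = []
        self._field: Optional[str] = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and "product" in classes:
            self.products.append({})
        elif tag == "span" and self.products:
            self._field = next((c for c in classes if c in ("name", "price")), None)

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        if self._field and data.strip():
            self.products[-1][self._field] = data.strip()


def parse_products(html: str) -> List[Dict[str, str]]:
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return parser.products


# pytest collects functions named test_*; plain asserts are sufficient.
def test_extracts_every_product():
    names = [p["name"] for p in parse_products(FIXTURE)]
    assert names == ["Alpine Chocolate 200g", "Travel Adapter"]


def test_keeps_raw_price_text():
    assert parse_products(FIXTURE)[0]["price"] == "CHF 12.50"


def test_empty_page_yields_no_products():
    assert parse_products("<html><body></body></html>") == []
```

Keeping the parser a pure function of the HTML string, separate from any network code, is what makes it testable against saved fixtures like this one.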
Maintenance & Incident Response (10% of time)
- Investigate and resolve scraper failures caused by site structure changes, increased anti-bot aggression, IP blocks or infrastructure issues.
- Classify failure root causes and propose framework-level improvements to the Lead Scraping Engineer where patterns emerge.
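One possible way to make root-cause classification repeatable is a small rule-based function, sketched below. The `FailureCause` categories mirror the causes listed above, but the function `classify_failure`, its inputs, the marker strings and the status-code rules are all assumptions for illustration, not an established team convention.

```python
from enum import Enum
from typing import Optional


class FailureCause(Enum):
    IP_BLOCK = "ip_block"
    ANTI_BOT = "anti_bot"
    STRUCTURE_CHANGE = "structure_change"
    INFRASTRUCTURE = "infrastructure"
    UNKNOWN = "unknown"


# Hypothetical marker strings; real challenge pages vary by vendor and over time.
CHALLENGE_MARKERS = ("captcha", "verify you are human")


def classify_failure(status: Optional[int], body: str, items_parsed: int) -> FailureCause:
    """Map one failed run to a coarse root cause.

    status is None when no HTTP response arrived at all (timeout, DNS error).
    """
    if status is None:
        return FailureCause.INFRASTRUCTURE
    lowered = body.lower()
    if any(marker in lowered for marker in CHALLENGE_MARKERS):
        return FailureCause.ANTI_BOT
    if status in (403, 429):
        return FailureCause.IP_BLOCK
    if status >= 500:
        return FailureCause.INFRASTRUCTURE
    if status == 200 and items_parsed == 0:
        # The page loaded but the selectors matched nothing.
        return FailureCause.STRUCTURE_CHANGE
    return FailureCause.UNKNOWN
```

Counting the resulting categories per site over time is one simple way to spot the recurring patterns worth raising as framework-level improvements.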
TECHNICAL SKILLS — REQUIRED
- Python 3.8+ (primary language, mandatory)
- HTTP protocol: headers, cookies, sessions, redirects
- Scrapy framework (spider development, middlewares, pipelines)
- requests / httpx / aiohttp (HTTP client libraries)
- BeautifulSoup4 + lxml (HTML/XML parsing)
- JSON and XML parsing, data normalisation
- Selenium WebDriver (dynamic page interaction)
- Regular expressions (re module)
- Playwright for Python (modern headless browser automation)
- pytest (unit and integration testing)
- XPath and CSS selector authoring
- Git (branching, PRs, code review workflow)
- Chrome DevTools Network tab (request analysis)
- Basic SQL (SELECT, JOIN, WHERE, aggregations)
TECHNICAL SKILLS — STRONG ADVANTAGE
- Experience bypassing common anti-bot systems: Cloudflare Bot Management, Akamai Bot Manager, PerimeterX, DataDome, hCaptcha, reCAPTCHA v2/v3.
- Reverse-engineering private or undocumented APIs: intercepting mobile app traffic with Charles Proxy or mitmproxy, analysing obfuscated JavaScript.
- Asynchronous Python: asyncio, concurrent.futures, understanding of event loop mechanics.
- Docker: containerising scrapers, writing Dockerfiles, basic docker-compose.
- Redis: using as a request queue or deduplication cache.
- Rotating proxy integration: experience with Bright Data, Oxylabs, Smartproxy or similar residential/datacenter proxy providers.
- Basic JavaScript / Node.js: reading and understanding frontend JS to identify data sources.
- Experience with e-commerce, travel-retail or price comparison platforms specifically.
- CAPTCHA solving services: 2Captcha, Anti-Captcha, CapSolver.
EXPERIENCE & QUALIFICATIONS
- 3-5 years of professional software engineering experience.
- Minimum 1.5 years of hands-on, production web scraping — this must be explicitly evidenced in your CV with concrete examples (number of sites, data volume, live production use).
- Bachelor's degree in Computer Science, Software Engineering or equivalent; strong equivalent practical experience considered.
- Ability to demonstrate past work: GitHub repository, portfolio of scrapers, or ability to walk through a past project in technical detail during interview.
- Fluent English (written and spoken); clear technical communication skills.
WHAT GOOD LOOKS LIKE IN THIS ROLE
- You can open a competitor website, spend 20 minutes in DevTools, and identify whether it's best scraped via HTML parsing, API interception or headless browser — and explain your reasoning.
- You write scrapers that continue running for weeks without manual intervention, not scrapers that break on the first DOM change.
- You proactively document your work so that a colleague can maintain your scraper without asking you questions.
- You care about the quality and accuracy of the data you produce, not just whether the scraper runs.
Due to certain email system settings, some of our messages may occasionally land in your junk or spam folder. To ensure you don’t miss any important updates regarding your application, please check these folders regularly and mark our emails as ‘Not Spam’ if needed.
We look forward to connecting with you soon!