Jérôme Crevoisier
Senior backend engineer
Hi there!
I’m a passionate software engineer with a strong background in automation, API development, data engineering, and AI integration. I’ve built a wide range of systems, from intelligent data pipelines to machine learning workflows, always with a focus on performance, scalability, and real-world impact.
While I have deep experience in web scraping and browser automation, my work extends far beyond that. I love building robust backend services, designing clean, maintainable architectures, and working across the full stack when needed.
I thrive in startup environments where I can take ownership, move fast, and help shape products from the ground up. Whether it’s prototyping a new idea or scaling a core system, I bring a hands-on, product-driven mindset to every project.
At my core, I’m driven by curiosity, creativity, and a love for building useful things, with clean code and smart automation as my tools of choice.
My tech stack
Languages: Python · TypeScript · SQL
Backend Development: FastAPI · Flask · Django · Node.js · Express · REST API design · async programming · Celery · RabbitMQ · CI/CD pipelines · pytest
Parsing & LLMs: LangChain · LlamaIndex · Haystack · OpenAI · Hugging Face · Unstructured.io · Regex
Scraping & Automation: Playwright · Puppeteer · Selenium · Scrapy · BeautifulSoup · pdfplumber · requests · aiohttp · Tor · Proxy Pools · Stealth Browsers
Databases: SQL (PostgreSQL, MySQL) · NoSQL (MongoDB, Redis)
Deployment & DevOps: Docker · GitHub Actions · Kubernetes · GCP · AWS
My project portfolio

A production-ready FastAPI application providing AI chat with dual-provider support (OpenAI and Hugging Face), JWT authentication, and asynchronous task processing. The project demonstrates modern API architecture with containerized microservices, background job processing, and scalable AI integrations, making it a solid base for conversational applications and AI-powered services.
Technologies:
Python · FastAPI · OpenAI API · Hugging Face · JWT Authentication · Celery · Redis · SQLAlchemy · Docker · Pydantic · AsyncIO · RESTful API
https://github.com/jcrevoisier/ai-chat

A production-ready scraping infrastructure designed to manage large-scale data collection, processing, and monitoring, all fully containerized with Docker.
This project integrates modern tools to orchestrate scrapers, schedule jobs, store and serve data, and monitor performance, ideal for real-world data pipelines and cloud deployment.
Technologies:
Python · Scrapy · FastAPI · Celery · Redis · PostgreSQL · Docker · Prometheus · Grafana · Google Cloud Platform

A toolkit demonstrating multiple approaches to solving CAPTCHAs, including reCAPTCHA, hCaptcha, and custom image-based challenges, using the 2Captcha and Anti-Captcha APIs alongside machine learning techniques.
Technologies:
Python · 2Captcha API · Anti-Captcha API · Machine Learning · OCR · .env Configuration

A Python library and Dockerized tool for anonymous web scraping with the Tor network. Enables automatic IP rotation, rate limiting, retry handling, and user-agent spoofing to avoid blocks and scraping restrictions.
Technologies:
Python · Tor · SOCKS Proxy · Docker · Requests · BeautifulSoup · User-Agent Rotation

A tool that combines web scraping with LLM-based post-processing to transform unstructured product and article data into clean, structured formats. Supports OpenAI, Hugging Face, and LangChain for parsing and summarization tasks.
Technologies:
Python · OpenAI API · Hugging Face Transformers · LangChain · BeautifulSoup · Requests · Pandas

A TypeScript library that mimics realistic human interaction for browser automation, simulating mouse movement, typing, scrolling, and randomness to help bypass basic bot detection systems.
Technologies:
TypeScript · Playwright · Browser Automation · User Simulation · Anti-Bot Evasion
https://github.com/jcrevoisier/human-behavior-simulation-kit

A hands-on toolkit for identifying and interacting with undocumented APIs by inspecting browser traffic. Includes real-world examples from Twitter, Indeed, and Yelp, with clean Python implementations and best practices for handling headers, tokens, and pagination.
Technologies:
Python · requests · httpx · DevTools Protocol · HAR Analysis
https://github.com/jcrevoisier/api-reverse-engineering-playbook

A stealth-focused web scraper built with TypeScript and Playwright, featuring proxy rotation, CAPTCHA solving via 2Captcha, and realistic human interaction simulation to bypass anti-bot mechanisms.
Technologies:
TypeScript · Playwright · 2Captcha API · Proxy Rotation · Human Behavior Emulation · dotenv

Extracted and reformatted technical content from an API documentation page into a clean, structured reference document. Focused on clarity, logical structure, and adherence to documentation best practices. Delivered a polished result suitable for internal use or developer onboarding.
Technologies:
Python · BeautifulSoup · re (Regex) · pdfplumber · Markdown · Text Cleaning · Formatting Automation

Developed a custom web scraper to extract assisted and independent living facility data from Caring.com across all 50 U.S. states. Automated deep navigation, pagination handling, and JavaScript rendering. Enriched data by estimating bed counts using external AI-driven web searches and structured parsing of third-party sources.
Technologies:
Python · Playwright · BeautifulSoup · pandas · Regex · DuckDuckGo API · CSV/Excel Export · Stealth Browser Automation
Designed, developed, maintained, and continuously enhanced a custom scraper to detect and extract illegal movie streaming links across piracy websites. The solution operated at scale and included detection of mirror sites, automated content categorization, and frequent bypassing of obfuscation and anti-bot techniques, including CAPTCHA solving and IP rotation.
Technologies:
Python · TypeScript · Selenium · AWS Lambda · Kubernetes · Headless Browsers · Cloudflare Bypassing · CAPTCHA Solving · Proxy Rotation · Anti-Bot Evasion Techniques · AWS · Docker · MongoDB · Redis
Built and maintained a complete suite of web scrapers to collect structured data from major social media platforms, including Twitter, Telegram, Facebook, Instagram, TikTok, LinkedIn, Reddit, Quora, YouTube, and Dailymotion. These scrapers handled a wide range of content types (posts, profiles, comments, reactions, media), overcoming anti-bot protections and rate limits. Solutions were containerized and deployed at scale using cloud infrastructure.
Technologies:
Python · Playwright · Selenium · Scrapy · Headless Browsers · Proxy Rotation · CAPTCHA Solving · Rate Limiting Bypass · AWS · Docker · Kubernetes · MongoDB · Redis
Developed computer vision models to analyze images and videos from social media, detecting and classifying logos, people, activities, and objects. This engine supported brand monitoring, threat detection, and visual trend analysis. The models were trained on custom-labeled datasets and optimized for performance in production environments.
Technologies:
Python · PyTorch · OpenCV · TensorFlow · FastAPI · Redis · Docker · AWS S3 · MediaPipe · YOLOv5 · Label Studio
Designed and trained language models to analyze the sentiment of Twitter posts at scale. The system identified relevant mentions, classified sentiment (positive, negative, neutral), and generated real-time insights for clients in the entertainment, politics, and consumer sectors.
Technologies:
Python · Hugging Face Transformers · FastAPI · Redis · Twitter API · AWS Lambda · PyTorch · TextBlob · Pandas
Developed and maintained a system using Large Language Models (LLMs) to automatically analyze and classify web pages as piracy-related or not. The tool processed large volumes of URLs daily, performing semantic analysis on the content, detecting piracy keywords, and flagging high-risk pages for review.
Technologies:
Python · OpenAI GPT · LangChain · Playwright · BeautifulSoup · Redis · AWS Lambda · FastAPI · Hugging Face · Pandas
Designed, developed, and maintained robust RESTful APIs to give enterprise clients secure access to intelligence data scraped from multiple sources. Initially built with Flask and progressively migrated to FastAPI for performance and async support. Some legacy services also integrated with a Ruby on Rails backend. Implemented efficient query filtering, scalable pagination, and flexible data access endpoints. The APIs were deployed via AWS ECS with tasks running on EC2, and used S3 for data exports and backups.
Technologies:
Python · Flask · FastAPI · Ruby on Rails · PostgreSQL · Redis · OpenAPI · Docker · CI/CD · AWS ECS · EC2 · S3 · CloudWatch · Route 53

Developed machine learning algorithms in Python and R to analyze thousands of customer support calls. Focused on extracting actionable insights from audio data, including emotion detection, keyword spotting, and topic modeling. Deployed models in Databricks to enable real-time dashboarding and reporting for operational teams. Worked closely with business stakeholders to ensure visual outputs were intuitive and aligned with KPIs.
Technologies:
Python · R · Databricks · Scikit-learn · NLTK · Librosa · PyDub · Spark · Pandas · Matplotlib · SQL
My contacts
📧 Email: crevoisierj@hotmail.com
📱 Phone: +34 647 840 103
🔗 LinkedIn: linkedin.com/in/crevoisierjerome/
💻 GitHub: github.com/jcrevoisier
✍️ Medium: medium.com/@jromecrevoisier