TinderJob is an advanced analytics and automation project built for DataTalent Solutions S.L., a Spanish company that designs and delivers technology reskilling programs. Using three complementary data sources (one of them a custom web scraper) and a Streamlit dashboard, TinderJob answers hard questions about Spain’s live tech job market: which skills are in demand, what salaries look like, where the hidden biases are, and how consultants should act on that evidence. The project is named after its centrepiece feature, TinderMatch, a CV-powered recommendation engine that swipes through real job listings and surfaces the ones that best fit a candidate’s actual technical profile.
Business Context
DataTalent Solutions S.L. needs to optimise its technology training and reskilling programs based on real market evidence, not assumptions. TinderJob was commissioned to provide empirical, reproducible answers to five critical business questions:

| # | Question | Why It Matters |
|---|---|---|
| 1 | Skills Demanded — Which technical skills are requested most frequently in data roles across Spain? | Drives curriculum design for reskilling cohorts |
| 2 | Salary Distribution — Are there salary biases by gender, geographic location, or contract type? | Informs compensation benchmarking and equity reporting |
| 3 | Leading Sectors — Which industrial sectors concentrate the highest volume of offers and the best salary bands? | Guides client partnerships and sector-focused bootcamps |
| 4 | Market Correlations — What is the mathematical relationship between years of experience, technical skills, and salary? | Enables ROI modelling for training investments |
| 5 | Impact of Biased Data — What is the strategic risk of making decisions based on incomplete (MNAR) or under-represented data? | Supports ethical AI and responsible analytics governance |
Strategic Pivot and Data Sources
The project originally planned to use a LinkedIn job listings dataset. During QA, however, the team identified a critical geolocation bias: every record in the recommended dataset belonged to the United States, which made it invalid for a Spain-focused analysis. To resolve this and align fully with the client’s objectives, the team pivoted to a combination of three complementary sources:

Stack Overflow Developer Survey
The 2025/2026 global developer survey, filtered to extract Spain-specific data profiles. Ideal for deep analysis of demographic and ethical biases in the tech workforce.
DS Salaries Dataset
A curated collection of data science and tech role salaries from around the world. Provides international salary benchmarks to contextualise Spanish market figures. Must be downloaded separately — see Quickstart.
Tecnoempleo Scraper
A custom-built scraper that extracts live job listings in real time from Tecnoempleo, Spain’s leading tech job portal. Covers 24 professional tech profiles and captures titles, companies, locations, salaries, contract types, skills, and direct offer URLs.
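The geolocation check that triggered the pivot can be illustrated with a small sketch. It assumes each record is a dictionary with a `country` field; the field name, the `country_coverage` helper, and the sample rows are hypothetical and are not taken from the project's code.

```python
from collections import Counter


def country_coverage(records, target_country, field="country"):
    """Return (share of records in target_country, counts per country).

    `field` is an assumed column name; adjust it to the real schema.
    """
    counts = Counter(r.get(field) for r in records)
    total = sum(counts.values())
    share = counts.get(target_country, 0) / total if total else 0.0
    return share, counts


# Hypothetical rows mimicking a dataset where every record is US-based.
sample = [
    {"title": "Data Engineer", "country": "US"},
    {"title": "ML Engineer", "country": "US"},
    {"title": "Data Analyst", "country": "US"},
]

share, counts = country_coverage(sample, "ES")
print(f"Spain share: {share:.1%}; countries seen: {dict(counts)}")
```

A share of zero for the target country is the kind of result that would justify rejecting a dataset before any further analysis.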
Repository Architecture
The project follows a modular, clean structure that maximises reproducibility and simplifies maintenance:

| Directory / File | Purpose |
|---|---|
| `app/streamlit_app.py` | Main Streamlit dashboard: all visualisation tabs and the TinderMatch engine |
| `data/raw/` | Raw outputs from the scraper (`tecnoempleo_jobs.csv`) and the DS Salaries dataset |
| `data/processed/` | Cleaned, deduplicated, salary-parsed dataset (`clean_tecnoempleo_jobs.csv`) |
| `data/metadata/` | Schema documentation and field dictionaries |
| `notebooks/` | Three sequential EDA notebooks covering descriptive stats, correlations, and bias analysis |
| `src/scraper/` | `extract_tecnoempleo.py`: the Tecnoempleo web scraper |
| `src/data_processing/` | `clean_tecnoempleo_data.py`: the full cleaning and normalisation pipeline |
| `src/analytics/` | Supporting analytical utilities used by the notebooks and dashboard |
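
The cleaning stage is described elsewhere in this documentation as including IQR-based outlier detection. The following is a minimal sketch of that general technique using only the standard library. The function names, the 1.5 multiplier, and the sample salaries are assumptions for illustration, not the contents of `clean_tecnoempleo_data.py`.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences at k * IQR beyond the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Return the values that fall outside the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if v < low or v > high]


# Hypothetical annual salaries in euros, with one implausible entry.
salaries = [24000, 28000, 30000, 32000, 35000, 38000, 42000, 250000]
print(flag_outliers(salaries))
```

Flagging rather than silently dropping the values keeps the decision about what to exclude visible to whoever reviews the cleaned dataset.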
Team
TinderJob was delivered by a five-person agile squad at DataTalent Solutions S.L., each member owning a focused technical domain to ensure parallel delivery without bottlenecks:

| Member | Role | Specialization | GitHub |
|---|---|---|---|
| Verónica Melero | Product Owner | Front-end Developer | @vmelero13 |
| Elena Díaz | Scrum Master | QA & Presentation Lead | @HelenDiMo |
| Adriana Aránguez | Developer | Analytics & Bias Reporter | @adrianaarang |
| Joel Ibarra | Developer | Data Cleaning & Integration | @jowel2701 |
| Luis El Allali | Developer | Scraper Engineer | @luiselallali18-hub |
Key Findings at a Glance
Based on 1,148 live Tecnoempleo listings scraped across 24 tech profiles, the analysis surfaces findings directly actionable by DataTalent Solutions S.L.:

- Python (168 offers), Java (159), and SQL (96) are the top three most demanded skills in Spain’s tech market.
- Only 19.3% of listings publish an explicit salary. The remaining 80.7% are missing in a pattern that correlates with job profile and seniority level rather than occurring at random, which the analysis treats as MNAR.
- Data Scientist leads by volume (84 offers), closely followed by Programador and Soporte Técnico (76 each).
- 35.3% of roles are hybrid, with only 7.3% fully on-site — remote-first is far from the norm in Spain.
- The global DS Salaries dataset places the median senior data science salary at €124,660/year, versus €51,980 for junior roles — a 2.4× experience premium.
- Correlation between remote work and salary is only 0.13 — barely meaningful — contradicting the common assumption that remote roles pay better.
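
Two of the figures above can be reproduced with simple arithmetic, and the salary-missingness claim can be probed by comparing missing rates across groups. The sketch below uses the quoted numbers directly. The `missing_rate_by_group` helper, the field names, and the sample rows are hypothetical. A difference in missing rates between groups shows that missingness is not completely random, but on its own it does not distinguish MAR from MNAR.

```python
from collections import defaultdict

# Figures quoted in the findings above.
senior_median = 124_660
junior_median = 51_980
premium = senior_median / junior_median
print(f"Experience premium: {premium:.1f}x")

published_share = 0.193
missing_share = 1 - published_share
print(f"Listings without salary: {missing_share:.1%}")


def missing_rate_by_group(rows, group_field, value_field):
    """Return {group: fraction of rows whose value_field is None}."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for row in rows:
        group = row[group_field]
        totals[group] += 1
        if row.get(value_field) is None:
            missing[group] += 1
    return {g: missing[g] / totals[g] for g in totals}


# Hypothetical listings; field names are assumptions, not the real schema.
rows = [
    {"profile": "Data Scientist", "salary": 45000},
    {"profile": "Data Scientist", "salary": None},
    {"profile": "Soporte Técnico", "salary": None},
    {"profile": "Soporte Técnico", "salary": None},
]
rates = missing_rate_by_group(rows, "profile", "salary")
print(rates)
```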
Explore the Project
Quickstart
Clone the repo, install dependencies, run the scraper and cleaning pipeline, and launch the dashboard in under 10 minutes.
Data Pipeline
Deep dive into the scraper architecture, cleaning logic, salary parsing, and IQR-based outlier detection.
Analysis Notebooks
Explore the three sequential EDA notebooks: descriptive stats, market correlations, and the ethics/bias report.
Dashboard & TinderMatch
Learn how the Streamlit app works, what each analytical tab shows, and how the CV-matching engine ranks job offers.
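
As a rough intuition for CV-to-offer matching, the sketch below ranks offers by Jaccard similarity between a candidate's skills and each offer's skills. This is a stand-in technique chosen for illustration; this page does not say how TinderMatch actually scores offers. The function names, the `skills` and `title` fields, and the sample offers are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_offers(cv_skills, offers, top_n=3):
    """Return up to top_n (score, title) pairs, best match first."""
    cv = {s.lower() for s in cv_skills}
    scored = [
        (jaccard(cv, {s.lower() for s in offer["skills"]}), offer["title"])
        for offer in offers
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


# Hypothetical offers and candidate skills.
offers = [
    {"title": "Backend Developer", "skills": ["Java", "SQL"]},
    {"title": "Soporte Técnico", "skills": ["Windows"]},
    {"title": "Data Scientist", "skills": ["Python", "SQL", "Pandas"]},
]
ranking = rank_offers(["python", "sql"], offers)
for score, title in ranking:
    print(f"{score:.2f}  {title}")
```

Set overlap ignores skill weighting and seniority, so a production engine would likely need richer features than this sketch uses.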
TinderJob is an educational and research project developed for DataTalent Solutions S.L. All scraped data is used exclusively for statistical analysis and curriculum design. The project applies ethical rate-limiting to Tecnoempleo requests and does not store, redistribute, or commercialise any scraped content. The DS Salaries dataset and Stack Overflow Developer Survey are used under their respective open licences for non-commercial research.