TinderJob is an advanced analytics and automation project built for DataTalent Solutions S.L., a Spanish company that designs and delivers technology reskilling programs. Combining three complementary data sources (including a custom web scraper) with a Streamlit dashboard, TinderJob answers hard questions about Spain’s live tech job market: which skills are in demand, what salaries look like, where the hidden biases are, and how consultants should act on that evidence. The project is named after its centrepiece feature, TinderMatch, a CV-powered recommendation engine that swipes through more than a thousand real job listings and surfaces the ones that best fit a candidate’s actual technical profile.
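TinderMatch's actual scoring logic lives in `app/streamlit_app.py` and is not described on this page. Purely as an illustration, the sketch below ranks offers by Jaccard overlap between a CV's skill set and each offer's skill set. The `Offer` type, the `rank_offers` function, and the choice of Jaccard similarity are all assumptions, not the project's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """Hypothetical minimal job offer: a title plus the skills it asks for."""
    title: str
    skills: frozenset


def jaccard(a: frozenset, b: frozenset) -> float:
    """|A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_offers(cv_skills: set, offers: list, top_n: int = 5) -> list:
    """Return (score, offer) pairs for the top_n offers, best match first.

    Skill names are lower-cased so "Python" and "python" count as the same skill.
    """
    cv = frozenset(s.lower() for s in cv_skills)
    scored = [(jaccard(cv, frozenset(s.lower() for s in o.skills)), o) for o in offers]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


offers = [
    Offer("Java Developer", frozenset({"Java", "Spring"})),
    Offer("Data Scientist", frozenset({"Python", "SQL", "Machine Learning"})),
]
for score, offer in rank_offers({"python", "sql"}, offers):
    print(f"{score:.2f}  {offer.title}")  # Data Scientist (0.67) is listed before Java Developer (0.00)
```

Jaccard similarity penalises offers that ask for many skills the CV lacks, which is one reasonable notion of "fit"; a production engine might instead weight skills by demand or seniority.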

Business Context

DataTalent Solutions S.L. needs to optimise its technology training and reskilling programs based on real market evidence, not assumptions. TinderJob was commissioned to provide empirical, reproducible answers to five critical business questions:
| # | Question | Why It Matters |
|---|----------|----------------|
| 1 | Skills Demanded: Which technical skills are requested most frequently in data roles across Spain? | Drives curriculum design for reskilling cohorts |
| 2 | Salary Distribution: Are there salary biases by gender, geographic location, or contract type? | Informs compensation benchmarking and equity reporting |
| 3 | Leading Sectors: Which industrial sectors concentrate the highest volume of offers and the best salary bands? | Guides client partnerships and sector-focused bootcamps |
| 4 | Market Correlations: What is the mathematical relationship between years of experience, technical skills, and salary? | Enables ROI modelling for training investments |
| 5 | Impact of Biased Data: What is the strategic risk of making decisions based on incomplete (MNAR) or under-represented data? | Supports ethical AI and responsible analytics governance |

Strategic Pivot and Data Sources

The project originally planned to use a LinkedIn job listings dataset. However, after applying rigorous QA criteria, the team identified a critical geolocation bias: every record in the recommended dataset belonged to the United States, which made it unusable for a Spain-focused analysis. To resolve this and align fully with the client’s objectives, the team made a strategic pivot and combined three complementary sources:

Stack Overflow Developer Survey

The 2025/2026 global developer survey, filtered to extract Spain-specific data profiles. Ideal for deep analysis of demographic and ethical biases in the tech workforce.
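Filtering the survey to Spain is essentially a row filter on the respondent's country. A minimal standard-library sketch, assuming the public results CSV has a `Country` column whose value for Spanish respondents is exactly `Spain` (check the schema file shipped with the survey year you use); `filter_country` is a hypothetical helper:

```python
import csv
import io
from typing import Dict, List, TextIO


def filter_country(stream: TextIO, country: str = "Spain", column: str = "Country") -> List[Dict[str, str]]:
    """Return the survey rows whose `column` value equals `country` (after trimming spaces)."""
    reader = csv.DictReader(stream)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"column {column!r} not found in the survey header")
    # Short rows leave missing fields as None, hence the `or ""` guard.
    return [row for row in reader if (row[column] or "").strip() == country]


# In-memory sample; for the real file pass open(path, newline="", encoding="utf-8").
sample = io.StringIO("ResponseId,Country\n1,Spain\n2,Germany\n3,Spain\n")
spain_rows = filter_country(sample)
print(len(spain_rows))  # 2
```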

DS Salaries Dataset

A curated collection of data science and tech role salaries from around the world. Provides international salary benchmarks to contextualise Spanish market figures. Must be downloaded separately — see Quickstart.
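Benchmarks such as the junior-versus-senior comparison in the key findings reduce to a grouped median and a ratio. The sketch below assumes rows with an `experience_level` code (for example `EN` for entry level and `SE` for senior, as used in common public releases of this dataset) and a numeric `salary` column; both column names and the helper functions are assumptions to adapt to your copy of the file.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Mapping


def median_by_level(
    rows: Iterable[Mapping[str, str]],
    level_col: str = "experience_level",
    salary_col: str = "salary",
) -> Dict[str, float]:
    """Median salary per experience level, skipping rows with a missing or non-numeric salary."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        try:
            level = row[level_col]
            salary = float(row[salary_col])
        except (KeyError, TypeError, ValueError):
            continue  # skip malformed rows instead of imputing a value
        groups[level].append(salary)
    return {level: median(values) for level, values in groups.items()}


def experience_premium(medians: Mapping[str, float], senior: str = "SE", junior: str = "EN") -> float:
    """Ratio of the senior median to the junior median."""
    return medians[senior] / medians[junior]


# Using the medians quoted in the key findings:
print(round(experience_premium({"SE": 124_660, "EN": 51_980}), 1))  # 2.4
```

Parsing the salary before touching the group dictionary matters: otherwise a malformed row would create an empty group and `median` would raise on it.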

Tecnoempleo Scraper

A custom-built scraper that extracts live job listings in real time from Tecnoempleo, Spain’s leading tech job portal. Covers 24 professional tech profiles and captures titles, companies, locations, salaries, contract types, skills, and direct offer URLs.
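As a hypothetical sketch of the scraper's overall shape (not the code in `extract_tecnoempleo.py`), the block below models one listing with the fields named above and walks each profile's result pages with a fixed pause between requests, the kind of ethical rate-limiting the project notice describes. `fetch_page`, the page limit, and the 2-second delay are assumptions; no Tecnoempleo URL patterns or HTML selectors are implied.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class JobOffer:
    """One scraped listing, with the fields the scraper is described as capturing."""
    title: str
    company: str
    location: str
    salary: Optional[str]          # raw text; many listings publish no salary
    contract_type: Optional[str]
    skills: List[str] = field(default_factory=list)
    url: str = ""


def crawl_profiles(
    profiles: Iterable[str],
    fetch_page: Callable[[str, int], List[JobOffer]],
    max_pages: int = 3,
    delay_s: float = 2.0,
) -> List[JobOffer]:
    """Collect offers for each profile, sleeping delay_s seconds between requests.

    fetch_page(profile, page) is a hypothetical hook that downloads and parses one
    results page and returns an empty list once a profile has no more results.
    """
    offers: List[JobOffer] = []
    first_request = True
    for profile in profiles:
        for page in range(1, max_pages + 1):
            if not first_request:
                time.sleep(delay_s)  # fixed pause between consecutive requests
            first_request = False
            batch = fetch_page(profile, page)
            if not batch:
                break  # this profile has no further pages
            offers.extend(batch)
    return offers
```

Injecting `fetch_page` keeps the crawl loop testable offline with a fake fetcher and keeps network and parsing details in one place.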

Repository Architecture

The project follows a modular, clean structure that maximises reproducibility and simplifies maintenance:
TinderJob/
├── app/
│   └── streamlit_app.py
├── data/
│   ├── metadata/
│   ├── processed/
│   └── raw/
├── docs/
├── notebooks/
├── src/
│   ├── analytics/
│   ├── data_processing/
│   └── scraper/
├── requirements.txt
└── README.md
| Directory / File | Purpose |
|------------------|---------|
| app/streamlit_app.py | Main Streamlit dashboard: all visualisation tabs and the TinderMatch engine |
| data/raw/ | Raw outputs from the scraper (tecnoempleo_jobs.csv) and the DS Salaries dataset |
| data/processed/ | Cleaned, deduplicated, salary-parsed dataset (clean_tecnoempleo_jobs.csv) |
| data/metadata/ | Schema documentation and field dictionaries |
| notebooks/ | Three sequential EDA notebooks covering descriptive stats, correlations, and bias analysis |
| src/scraper/ | extract_tecnoempleo.py: the Tecnoempleo web scraper |
| src/data_processing/ | clean_tecnoempleo_data.py: the full cleaning and normalisation pipeline |
| src/analytics/ | Supporting analytical utilities used by the notebooks and dashboard |
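The processed file is described as cleaned, deduplicated and salary-parsed. One way to sketch the last two steps, assuming each raw row has `url` and `salary` columns and that salaries are written in Spanish format such as `30.000 € - 40.000 €`: duplicates are dropped by offer URL, and a salary range is reduced to its midpoint in a new, hypothetical `salary_eur` column. The real `clean_tecnoempleo_data.py` may use different rules.

```python
import re
from typing import Dict, List, Optional

# Spanish-style amounts: "30.000" (dot as thousands separator) or plain "45000".
_AMOUNT = re.compile(r"\d{1,3}(?:\.\d{3})+|\d+")


def parse_salary(text: Optional[str]) -> Optional[float]:
    """Midpoint of the amounts found in a salary string; None if it contains no number."""
    if not text:
        return None
    amounts = [float(m.replace(".", "")) for m in _AMOUNT.findall(text)]
    if not amounts:
        return None
    return (min(amounts) + max(amounts)) / 2  # a single amount is its own midpoint


def clean_offers(rows: List[Dict[str, str]], key: str = "url") -> List[dict]:
    """Drop repeated offers (first occurrence wins) and add a numeric salary_eur column.

    Rows with an empty key cannot be matched, so they are all kept.
    """
    seen = set()
    cleaned = []
    for row in rows:
        k = (row.get(key) or "").strip()
        if k:
            if k in seen:
                continue
            seen.add(k)
        cleaned.append({**row, "salary_eur": parse_salary(row.get("salary"))})
    return cleaned


print(parse_salary("30.000 € - 40.000 € brutos/año"))  # 35000.0
```

Salary strings that contain unrelated numbers (weekly hours, years of experience) would fool this simple pattern, so a real pipeline needs stricter matching.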

Team

TinderJob was delivered by a five-person agile squad at DataTalent Solutions S.L., each member owning a focused technical domain to ensure parallel delivery without bottlenecks:
| Member | Role | Specialization | GitHub |
|--------|------|----------------|--------|
| Verónica Melero | Product Owner | Front-end Developer | @vmelero13 |
| Elena Díaz | Scrum Master | QA & Presentation Lead | @HelenDiMo |
| Adriana Aránguez | Developer | Analytics & Bias Reporter | @adrianaarang |
| Joel Ibarra | Developer | Data Cleaning & Integration | @jowel2701 |
| Luis El Allali | Developer | Scraper Engineer | @luiselallali18-hub |

Key Findings at a Glance

Based on 1,148 live Tecnoempleo listings scraped across 24 tech profiles, the analysis surfaces findings directly actionable by DataTalent Solutions S.L.:
  • Python (168 offers), Java (159), and SQL (96) are the top three most demanded skills in Spain’s tech market.
  • Only 19.3% of listings publish an explicit salary. The remaining 80.7% follow an MNAR (missing not at random) pattern: whether a salary is published correlates with job profile and seniority level rather than occurring randomly.
  • Data Scientist leads by volume (84 offers), closely followed by Programador and Soporte Técnico (76 each).
  • 35.3% of roles are hybrid, with only 7.3% fully on-site — remote-first is far from the norm in Spain.
  • The global DS Salaries dataset places the median senior data science salary at €124,660/year, versus €51,980 for junior roles — a 2.4× experience premium.
  • Correlation between remote work and salary is only 0.13, a weak relationship that undercuts the common assumption that remote roles pay better.
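Each of these figures reduces to a simple aggregate over the cleaned listings. The sketch below shows three of them: the salary disclosure rate per profile (the check behind the MNAR finding), skill frequency, and a correlation coefficient, assuming the reported 0.13 is a Pearson coefficient. The column names (`profile`, `salary_eur`, `skills`) and helper functions are assumptions, and the notebooks may compute these differently.

```python
import math
from collections import Counter, defaultdict


def disclosure_rate_by(rows, group_col="profile", salary_col="salary_eur"):
    """Share of listings in each group that publish a salary (non-None value).

    Large gaps between groups show that salaries are not missing completely at random.
    """
    flags = defaultdict(list)
    for row in rows:
        flags[row[group_col]].append(row.get(salary_col) is not None)
    return {group: sum(f) / len(f) for group, f in flags.items()}


def top_skills(rows, n=3, skills_col="skills"):
    """Most frequent skills, counting each skill at most once per listing."""
    counts = Counter()
    for row in rows:
        counts.update(set(row.get(skills_col, [])))
    return counts.most_common(n)


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (undefined if either is constant)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


listings = [
    {"profile": "Data Scientist", "salary_eur": 42000.0, "skills": ["Python", "SQL"]},
    {"profile": "Data Scientist", "salary_eur": None, "skills": ["Python"]},
    {"profile": "Programador", "salary_eur": None, "skills": ["Java", "Python"]},
]
print(disclosure_rate_by(listings))  # {'Data Scientist': 0.5, 'Programador': 0.0}
print(top_skills(listings, n=1))     # [('Python', 3)]
```

Counting each skill once per listing keeps a single offer that repeats "Python" several times from inflating its rank.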

Explore the Project

Quickstart

Clone the repo, install dependencies, run the scraper and cleaning pipeline, and launch the dashboard in under 10 minutes.

Data Pipeline

Deep dive into the scraper architecture, cleaning logic, salary parsing, and IQR-based outlier detection.
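IQR-based outlier detection usually means Tukey's fences: values below Q1 − k·IQR or above Q3 + k·IQR are flagged. A standard-library sketch, assuming the conventional k = 1.5 (the pipeline's actual multiplier is not stated here) and `statistics.quantiles` with `method="inclusive"`; other quantile conventions can move the fences slightly. The helper names are hypothetical.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Tukey fences (Q1 - k*IQR, Q3 + k*IQR) for a list of numbers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(values, k=1.5):
    """Separate values inside the fences from those outside them."""
    low, high = iqr_bounds(values, k)
    inliers = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return inliers, outliers


salaries = [30000, 32000, 35000, 38000, 40000, 250000]
inliers, outliers = split_outliers(salaries)
print(outliers)  # [250000]
```

Quartiles are robust to the extreme values they are meant to catch, which is why IQR fences suit skewed salary data better than mean ± standard-deviation rules.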

Analysis Notebooks

Explore the three sequential EDA notebooks: descriptive stats, market correlations, and the ethics/bias report.

Dashboard & TinderMatch

Learn how the Streamlit app works, what each analytical tab shows, and how the CV-matching engine ranks job offers.
TinderJob is an educational and research project developed for DataTalent Solutions S.L. All scraped data is used exclusively for statistical analysis and curriculum design. The project applies ethical rate-limiting to Tecnoempleo requests and does not store, redistribute, or commercialise any scraped content. The DS Salaries dataset and Stack Overflow Developer Survey are used under their respective open licences for non-commercial research.
