This page is the authoritative field-level reference for both datasets produced by the TinderJob data pipeline. The raw dataset (data/raw/tecnoempleo_jobs.csv) is the direct output of the Tecnoempleo scraper and contains eight columns exactly as extracted from the portal. The processed dataset (data/processed/clean_tecnoempleo_jobs.csv) extends the raw schema with six additional columns derived during the cleaning pipeline: work modality, clean city name, three numeric salary columns (minimum, maximum, and midpoint), and an outlier flag. Both datasets are documented in full below, including types, nullability, representative examples, and caveats.
Raw Dataset Fields
These eight columns are present in data/raw/tecnoempleo_jobs.csv after every scraper run. They are stored verbatim; no normalization or transformation has been applied.
titulo: The job offer title exactly as it appears in the Tecnoempleo listing card.
Example: "Data Scientist Senior"
skills: A comma-separated list of technology and skill badges extracted from the listing card. Deduplicated at scrape time to remove repeated badges within a single offer.
Example: "Python, SQL, Power BI, Machine Learning"
salario: The raw salary range text as it appears on the listing, including currency symbols and formatting. null when no salary is disclosed.
Example: "30.000€ - 40.000€"
| Field | Type | Nullable | Description | Example |
|---|---|---|---|---|
| titulo | string | No | Job offer title | "Data Scientist Senior" |
| empresa | string | Yes | Hiring company name | "Accenture" |
| ubicacion | string | Yes | City or province with embedded work modality | "Madrid (Híbrido)" |
| salario | string | Yes | Raw salary range text | "30.000€ - 40.000€" |
| tipo_contrato | string | Yes | Contract type extracted from the detail page | "Indefinido" |
| skills | string | Yes | Comma-separated required tech skills | "Python, SQL" |
| busqueda | string | No | Search term slug that surfaced this offer | "data-scientist" |
| url | string | Yes | Full absolute URL to the offer on Tecnoempleo | "https://www.tecnoempleo.com/oferta/..." |
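The nullability rules in the raw field table can be checked mechanically before any cleaning step. The following is a minimal sketch using only the Python standard library; the function name validate_raw_rows, the sample rows, and the assumption that a null is written as an empty CSV cell are illustrative and not taken from the TinderJob code.

```python
import csv
import io

# Column names and nullability as listed in the raw field table.
RAW_COLUMNS = ("titulo", "empresa", "ubicacion", "salario",
               "tipo_contrato", "skills", "busqueda", "url")
NON_NULLABLE = ("titulo", "busqueda")


def validate_raw_rows(handle):
    """Return (row_number, problem) tuples found in a raw CSV stream."""
    reader = csv.DictReader(handle)
    missing = [c for c in RAW_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [(0, "missing columns: " + ", ".join(missing))]
    problems = []
    for number, row in enumerate(reader, start=1):
        for column in NON_NULLABLE:
            # Assumption: a null is serialized as an empty CSV cell.
            if not (row[column] or "").strip():
                problems.append((number, column + " is empty"))
    return problems


# Made-up two-row sample; the second row has no titulo.
sample = io.StringIO(
    "titulo,empresa,ubicacion,salario,tipo_contrato,skills,busqueda,url\n"
    'Data Scientist Senior,Accenture,Madrid (Híbrido),30.000€ - 40.000€,'
    'Indefinido,"Python, SQL",data-scientist,\n'
    ",,,,,,data-scientist,\n"
)
print(validate_raw_rows(sample))
```

To check the real file, the same function could be given open("data/raw/tecnoempleo_jobs.csv", newline="", encoding="utf-8"); the encoding is an assumption.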
The titulo and busqueda columns are the only fields guaranteed to be non-null: titulo is used as the primary deduplication key and busqueda is injected by the scraper itself. All other fields depend on what Tecnoempleo exposes for a given listing.
Processed Dataset Additional Fields
The cleaning pipeline appends six derived columns to the raw schema. These are present exclusively in data/processed/clean_tecnoempleo_jobs.csv.
| Field | Type | Description |
|---|---|---|
| modalidad | string | Work modality parsed from ubicacion: 'En Remoto', 'Híbrido', 'Presencial', or 'No especificado' |
| ciudad | string | Clean city name, with suffixes such as " - españa" and "(híbrido)" stripped from ubicacion |
| salario_min | float | Minimum salary boundary (annual EUR) parsed from the raw salario text; NaN when unparseable |
| salario_max | float | Maximum salary boundary (annual EUR) parsed from the raw salario text; NaN when unparseable |
| salario_medio | float | Arithmetic mean of salario_min and salario_max; NaN when either bound is missing |
| es_outlier | bool | True if salario_medio falls outside the asymmetric IQR bounds [Q1 − 1.5×IQR, Q3 + 3×IQR] |
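The derived columns in the table above can be illustrated with a short standard-library sketch. It is not the pipeline's actual code: the helper names, the keyword matching used for modalidad, the treatment of "." as a thousands separator, the handling of a single salary figure as the minimum, and the quartile interpolation method are all assumptions made for the example.

```python
import math
import re
import statistics

# Hypothetical keyword -> label mapping for the modalidad column.
MODALITY_KEYWORDS = (("remoto", "En Remoto"),
                     ("híbrido", "Híbrido"),
                     ("presencial", "Presencial"))


def parse_ubicacion(ubicacion):
    """Split a raw ubicacion string into (modalidad, ciudad)."""
    text = (ubicacion or "").strip()
    lowered = text.lower()
    modalidad = "No especificado"
    for keyword, label in MODALITY_KEYWORDS:
        if keyword in lowered:
            modalidad = label
            break
    # Drop parenthesised tags, then a trailing " - españa" suffix.
    ciudad = re.sub(r"\([^)]*\)", "", text).strip()
    ciudad = re.sub(r"\s*-\s*españa\s*$", "", ciudad, flags=re.IGNORECASE)
    return modalidad, ciudad.strip()


def parse_salario(salario):
    """Return (salario_min, salario_max, salario_medio); NaN where missing."""
    # Assumption: "." is a thousands separator, as in "30.000€ - 40.000€".
    tokens = re.findall(r"\d[\d.]*", salario or "")
    numbers = [float(token.replace(".", "")) for token in tokens]
    low = numbers[0] if len(numbers) >= 1 else math.nan
    high = numbers[1] if len(numbers) >= 2 else math.nan
    # NaN propagates through the mean when either bound is missing.
    return low, high, (low + high) / 2


def flag_outliers(medios):
    """Flag values outside [Q1 - 1.5*IQR, Q3 + 3*IQR]; NaN is never flagged."""
    valid = [m for m in medios if not math.isnan(m)]
    # Assumption: quartiles use linear interpolation ("inclusive" method).
    q1, _, q3 = statistics.quantiles(valid, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 3 * iqr
    return [(not math.isnan(m)) and (m < lower or m > upper) for m in medios]


print(parse_ubicacion("Madrid (Híbrido)"))
print(parse_salario("30.000€ - 40.000€"))
```

The upper multiplier of 3 makes the bound looser on the high side than on the low side, so a high salary has to sit much further from the quartiles than a low one before it is flagged.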
Rows where salario_min < 10,000 are removed entirely from the processed dataset. These arise from parsing artefacts (e.g., a salary text that contains a year like "2024" as its first numeric token) and are not viable salary observations.
Data Sources
Tecnoempleo.com
Primary source for all job listing data. Tecnoempleo is Spain’s leading technology employment portal and is scraped in real time using the TinderJob scraper module.
| Metadata | Value |
|---|---|
| Extraction date | 27/05/2026 |
| Total raw records | 1,151 |
| Records after cleaning | 1,147 |
| Search profiles covered | 24 |
DS Salaries Reference Dataset
Secondary dataset used for international salary benchmarking and experience-level comparisons across the analytics and data science market.
| Metadata | Value |
|---|---|
| File | data/raw/ds_salaries.csv |
| Total records | 607 |
| Spain (ES) share | 2.3% (14 records) |
DS Salaries Reference Dataset
The data/raw/ds_salaries.csv file is a curated dataset of data science compensation figures used to provide international and experience-level benchmarking context alongside the live Tecnoempleo data. The following columns are used in the TinderJob analysis:
| Column | Description |
|---|---|
| experience_level | Seniority tier: EN (Entry), MI (Mid), SE (Senior), EX (Executive) |
| salary_in_usd | Annual compensation in USD, converted to EUR using a fixed rate of ×0.92 |
| company_size | Employer size: S (small), M (medium), L (large) |
| remote_ratio | Percentage of work performed remotely: 0, 50, or 100 |
| work_year | Year the salary was reported |
| company_location | ISO 3166-1 alpha-2 country code of the hiring company |
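As an illustration of how these columns might be combined, the sketch below converts salary_in_usd to EUR at the fixed 0.92 rate and summarizes it per experience level. The function name median_eur_by_experience, the choice of the median as the summary statistic, and the sample rows are assumptions for the example; only the column names, the tier codes, and the rate come from the table above.

```python
import csv
import io
import statistics
from collections import defaultdict

USD_TO_EUR = 0.92  # fixed rate stated in the column table
EXPERIENCE_LABELS = {"EN": "Entry", "MI": "Mid", "SE": "Senior", "EX": "Executive"}


def median_eur_by_experience(handle, country=None):
    """Median annual salary in EUR per experience level.

    When country is given, only rows whose company_location matches are used.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(handle):
        if country and row["company_location"] != country:
            continue
        eur = float(row["salary_in_usd"]) * USD_TO_EUR
        groups[row["experience_level"]].append(eur)
    return {EXPERIENCE_LABELS.get(level, level): statistics.median(values)
            for level, values in groups.items()}


# Made-up rows for demonstration only; not taken from ds_salaries.csv.
sample = io.StringIO(
    "work_year,experience_level,salary_in_usd,remote_ratio,company_location,company_size\n"
    "2022,SE,100000,100,ES,M\n"
    "2022,SE,50000,50,ES,L\n"
    "2021,EN,20000,0,US,S\n"
)
print(median_eur_by_experience(sample, country="ES"))
```

Because Spain accounts for only 14 of the 607 records, any per-level figure restricted to company_location ES rests on very few observations and is best read as indicative.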