This page is the authoritative field-level reference for both datasets produced by the TinderJob data pipeline. The raw dataset (data/raw/tecnoempleo_jobs.csv) is the direct output of the Tecnoempleo scraper and contains eight columns exactly as extracted from the portal. The processed dataset (data/processed/clean_tecnoempleo_jobs.csv) extends the raw schema with six columns derived during the cleaning pipeline: work modality, clean city name, three numeric salary fields (minimum, maximum, and mean), and an outlier flag. Both datasets are documented in full below, including types, nullability, representative examples, and caveats.

Raw Dataset Fields

All eight columns listed in the table below are present in data/raw/tecnoempleo_jobs.csv after every scraper run. They are stored verbatim, with no normalization or transformation applied. Three of them are described in more detail first.
titulo (string, required)
The job offer title exactly as it appears in the Tecnoempleo listing card.
Example: "Data Scientist Senior"

skills (string, nullable)
A comma-separated list of the technology and skill badges extracted from the listing card. Badges repeated within a single offer are deduplicated at scrape time.
Example: "Python, SQL, Power BI, Machine Learning"

salario (string, nullable)
The raw salary range text as it appears on the listing, including currency symbols and thousands separators. null when no salary is disclosed.
Example: "30.000€ - 40.000€"
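The scraper's badge deduplication is not shown on this page. The following is a minimal sketch, assuming badges arrive as a list of strings in page order and that two badges are duplicates only when their trimmed text matches exactly. The function names `join_skill_badges` and `split_skills` are hypothetical.

```python
import math


def join_skill_badges(badges):
    """Build the `skills` string: drop repeated badges, keep first-seen order.

    Returns None when no badges were found, mirroring a null `skills` cell.
    """
    seen = set()
    unique = []
    for badge in badges:
        label = badge.strip()
        if label and label not in seen:
            seen.add(label)
            unique.append(label)
    return ", ".join(unique) if unique else None


def split_skills(skills):
    """Turn a raw `skills` cell back into a list; None or NaN gives []."""
    if not isinstance(skills, str) or not skills.strip():
        return []
    return [s.strip() for s in skills.split(",") if s.strip()]


badges = ["Python", "SQL", "Python", "Power BI", "Machine Learning", "SQL"]
print(join_skill_badges(badges))  # Python, SQL, Power BI, Machine Learning
print(split_skills(math.nan))     # []
```

Keeping first-seen order means the string still reflects the badge order on the listing card. A set alone would lose that order.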
The full raw field reference is provided in the table below:
| Field | Type | Nullable | Description | Example |
|---|---|---|---|---|
| titulo | string | No | Job offer title | "Data Scientist Senior" |
| empresa | string | Yes | Hiring company name | "Accenture" |
| ubicacion | string | Yes | City or province with embedded work modality | "Madrid (Híbrido)" |
| salario | string | Yes | Raw salary range text | "30.000€ - 40.000€" |
| tipo_contrato | string | Yes | Contract type extracted from the detail page | "Indefinido" |
| skills | string | Yes | Comma-separated required tech skills | "Python, SQL" |
| busqueda | string | No | Search term slug that surfaced this offer | "data-scientist" |
| url | string | Yes | Full absolute URL to the offer on Tecnoempleo | "https://www.tecnoempleo.com/oferta/..." |
The titulo and busqueda columns are the only fields guaranteed to be non-null — titulo is used as the primary deduplication key and busqueda is injected by the scraper itself. All other fields depend on what Tecnoempleo exposes for a given listing.
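A loader can enforce these guarantees before any cleaning runs. The following is a minimal sketch using pandas. The function name `validate_raw` is hypothetical, and it assumes deduplication uses titulo alone and keeps the first occurrence. The real pipeline may treat titulo as only the primary key among several.

```python
import pandas as pd

# Column names from the raw field reference table
RAW_COLUMNS = ["titulo", "empresa", "ubicacion", "salario",
               "tipo_contrato", "skills", "busqueda", "url"]
# The only columns documented as guaranteed non-null
NON_NULL_COLUMNS = ["titulo", "busqueda"]


def validate_raw(df: pd.DataFrame) -> pd.DataFrame:
    """Check the raw schema, then drop repeated offers by titulo."""
    missing = [col for col in RAW_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing raw columns: {missing}")
    null_counts = df[NON_NULL_COLUMNS].isna().sum()
    if null_counts.any():
        raise ValueError(f"unexpected nulls: {null_counts[null_counts > 0].to_dict()}")
    # Assumption: titulo alone is the key, first occurrence wins
    return df.drop_duplicates(subset="titulo", keep="first").reset_index(drop=True)


# Usage (path from this page):
# raw = validate_raw(pd.read_csv("data/raw/tecnoempleo_jobs.csv"))
```

`pd.read_csv` reads empty cells as NaN, so a "null" in this reference corresponds to `isna()` being true after loading.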

Processed Dataset Additional Fields

The cleaning pipeline appends six derived columns to the raw schema. These are present exclusively in data/processed/clean_tecnoempleo_jobs.csv.
| Field | Type | Description |
|---|---|---|
| modalidad | string | Work modality parsed from ubicacion: 'En Remoto', 'Híbrido', 'Presencial', or 'No especificado' |
| ciudad | string | Clean city name: ubicacion with modality and country suffixes (e.g., "(híbrido)", " - españa") stripped |
| salario_min | float | Lower salary bound (annual EUR) parsed from the raw salario text; NaN when unparseable |
| salario_max | float | Upper salary bound (annual EUR) parsed from the raw salario text; NaN when unparseable |
| salario_medio | float | Arithmetic mean of salario_min and salario_max; NaN when either bound is missing |
| es_outlier | bool | True if salario_medio falls outside the asymmetric IQR bounds [Q1 − 1.5×IQR, Q3 + 3×IQR] |
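This page does not show the exact rules the pipeline uses to derive modalidad and ciudad. The following is a minimal sketch with a hypothetical `parse_ubicacion` helper. It assumes the modality appears as a keyword, usually in parentheses, and that " - España" is the only country suffix to strip. Case normalization of the city name is not covered.

```python
import re

# Checked in order; the first keyword found decides the modality
MODALITY_PATTERNS = [
    ("En Remoto", re.compile(r"remoto", re.IGNORECASE)),
    ("Híbrido", re.compile(r"h[ií]brido", re.IGNORECASE)),
    ("Presencial", re.compile(r"presencial", re.IGNORECASE)),
]

# Assumed suffixes: "(… remoto/híbrido/presencial …)" and " - España"
SUFFIX_RE = re.compile(
    r"\s*\([^)]*(?:remoto|h[ií]brido|presencial)[^)]*\)|\s*-\s*españa",
    re.IGNORECASE,
)


def parse_ubicacion(ubicacion):
    """Return (modalidad, ciudad) for one raw ubicacion value."""
    if not isinstance(ubicacion, str) or not ubicacion.strip():
        return "No especificado", None
    modalidad = "No especificado"
    for label, pattern in MODALITY_PATTERNS:
        if pattern.search(ubicacion):
            modalidad = label
            break
    ciudad = SUFFIX_RE.sub("", ubicacion).strip() or None
    return modalidad, ciudad


print(parse_ubicacion("Madrid (Híbrido)"))    # ('Híbrido', 'Madrid')
print(parse_ubicacion("Barcelona - España"))  # ('No especificado', 'Barcelona')
```

On a DataFrame, the two columns could be filled with `df["modalidad"], df["ciudad"] = zip(*df["ubicacion"].map(parse_ubicacion))`.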
Rows where salario_min < 10,000 are removed entirely from the processed dataset. These arise from parsing artefacts (e.g., a salary text that contains a year like "2024" as its first numeric token) and are not viable salary observations.

Data Sources

Tecnoempleo.com

Primary source for all job listing data: Spain’s leading technology employment portal, scraped directly with the TinderJob scraper module. The figures below describe the snapshot collected on the extraction date.
| Metadata | Value |
|---|---|
| Extraction date | 27/05/2026 |
| Total raw records | 1,151 |
| Records after cleaning | 1,147 |
| Search profiles covered | 24 |

DS Salaries Reference Dataset

Secondary dataset used for international salary benchmarking and experience-level comparisons across the analytics and data science market.
| Metadata | Value |
|---|---|
| File | data/raw/ds_salaries.csv |
| Total records | 607 |
| Spain (ES) share | 2.3% (14 records) |

DS Salaries Reference Dataset

The data/raw/ds_salaries.csv file is a curated dataset of data science compensation figures used to provide international and experience-level benchmarking context alongside the live Tecnoempleo data. The following columns are used in the TinderJob analysis:
| Column | Description |
|---|---|
| experience_level | Seniority tier: EN (Entry), MI (Mid), SE (Senior), EX (Executive) |
| salary_in_usd | Annual compensation in USD; converted to EUR for the analysis at a fixed rate of 0.92 EUR per USD |
| company_size | Employer size: S (small), M (medium), L (large) |
| remote_ratio | Percentage of work performed remotely: 0, 50, or 100 |
| work_year | Year the salary was reported |
| company_location | ISO 3166-1 alpha-2 country code of the hiring company |
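The fixed-rate conversion and an experience-level comparison can be sketched in a few lines of pandas. The rate 0.92 and the level codes come from the table above. The helper names and the `salary_in_eur` column are hypothetical, and using the median as the summary statistic is an illustrative choice.

```python
import pandas as pd

USD_TO_EUR = 0.92  # fixed conversion rate documented on this page

# Seniority codes as listed in the column table
EXPERIENCE_LABELS = {"EN": "Entry", "MI": "Mid", "SE": "Senior", "EX": "Executive"}


def add_salary_eur(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a hypothetical salary_in_eur column."""
    out = df.copy()
    out["salary_in_eur"] = out["salary_in_usd"] * USD_TO_EUR
    return out


def median_eur_by_level(df: pd.DataFrame) -> pd.Series:
    """Median EUR salary per seniority tier, indexed by readable label."""
    eur = add_salary_eur(df)
    labels = eur["experience_level"].map(EXPERIENCE_LABELS)
    return eur["salary_in_eur"].groupby(labels).median()


# Usage (path from this page):
# print(median_eur_by_level(pd.read_csv("data/raw/ds_salaries.csv")))
```

A single fixed rate keeps results reproducible across runs. It ignores exchange-rate movement between the different work_year values.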
Spain (ES) represents only 2.3% of records in this dataset — 14 out of 607 rows. Conclusions drawn from this dataset about Spanish market conditions should be treated as illustrative benchmarks rather than statistically robust estimates. The Tecnoempleo scraped data is the authoritative source for Spain-specific analysis in this project.
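The Spain share quoted here can be recomputed from the file. This page does not state which column defines a "Spain record". The sketch below assumes company_location, and `country_share` is a hypothetical helper.

```python
import pandas as pd


def country_share(df: pd.DataFrame, country: str = "ES") -> tuple[int, float]:
    """Count rows whose company_location equals `country` and their % of all rows."""
    count = int((df["company_location"] == country).sum())
    return count, 100.0 * count / len(df)


# Usage (path from this page):
# n, pct = country_share(pd.read_csv("data/raw/ds_salaries.csv"))
# print(f"ES: {n} rows ({pct:.1f}%)")
```

For scale, 14 of 607 rows is about 2.3%. A per-level split of 14 rows leaves only a handful of observations per tier, which is why these figures serve as benchmarks rather than estimates.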
