The Sesgos (Bias) tab makes dataset limitations transparent and directly actionable for the consultants and analysts who rely on TinderJob outputs. Every dataset carries assumptions, gaps, and structural choices made during collection — and when those gaps are invisible, they produce misleading conclusions. Before DataTalent uses any figure from TinderJob to design a curriculum, advise a candidate, or present to management, the team must understand exactly where the data falls short and how that affects interpretation. This tab surfaces three specific, quantified biases and closes with six concrete strategic recommendations.

Bias Summary Table

The following table reproduces the executive summary rendered at the top of the Sesgos tab:

| Bias | Dataset | Type | % Affected | Impact |
| --- | --- | --- | --- | --- |
| MNAR: Missing Salaries | Tecnoempleo | MNAR | 80.7% null | No salary analysis possible from Tecnoempleo |
| Search Term Bias | Tecnoempleo | Selection | 24 fixed terms | Profiles outside these 24 are excluded |
| Geographic Underrepresentation | DS Salaries | Geographic | 2.3% (14/607) | Spanish stats from this dataset unreliable |

MNAR (Missing Not At Random) means that the probability of a value being missing depends on the value itself. Here, companies that don’t publish salaries tend to be those paying below-market rates. This is the most dangerous type of missing data: filling the gaps from the published salaries would systematically overestimate pay, because the published figures skew high, while simply ignoring the gaps hides a real market pattern.
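
The effect of MNAR on a naive estimate can be illustrated with a toy example. This is a minimal sketch, not TinderJob code: the salary values and the 35,000 publication threshold are invented for illustration and do not come from Tecnoempleo.

```python
from statistics import mean

# Hypothetical salaries (EUR); these are NOT Tecnoempleo figures.
true_salaries = [22000, 24000, 26000, 28000, 30000,
                 45000, 50000, 55000, 60000, 70000]

# Assumed MNAR mechanism: offers paying under 35,000 never publish a salary.
published = [s if s >= 35000 else None for s in true_salaries]

observed = [s for s in published if s is not None]
null_share = published.count(None) / len(published)

# Naive imputation: fill every gap with the mean of the published salaries.
observed_mean = mean(observed)
imputed = [s if s is not None else observed_mean for s in published]

true_mean = mean(true_salaries)
imputed_mean = mean(imputed)

print(f"null share:   {null_share:.0%}")
print(f"true mean:    {true_mean:.0f}")
print(f"imputed mean: {imputed_mean:.0f}")
```

Because only the higher salaries are visible in this toy data, the imputed mean lands well above the true mean. No imputation strategy that relies solely on the observed values can recover the hidden low end.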

Chart 1 — Selection Bias: Offers by Search Term

A horizontal bar chart showing the number of offers per busqueda search term, sorted ascending. This chart is deliberately identical in structure to the demand chart in the Mercado España tab — but the framing here is critical. The dataset contains only 24 predefined search terms. These were chosen at scraper design time and represent the developer’s prior assumptions about which tech profiles matter. Any role with a title that doesn’t map to one of these 24 terms — product manager, QA engineer, site reliability engineer, blockchain developer, and hundreds of others — is structurally absent from the dataset, not because those roles don’t exist in Spain, but because they were never scraped. This is a selection bias by design, not a data quality failure. The remedy is straightforward: expand the scraper’s search term list before the next data collection run.
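
The two ideas in this section, counting offers per `busqueda` term and checking which roles the term list can never capture, can be sketched as follows. The sample rows and the three-term subset are hypothetical; only the `busqueda` field name and the example role titles come from the text above.

```python
from collections import Counter

# Hypothetical sample: the `busqueda` value attached to each scraped offer.
busqueda_per_offer = [
    "data engineer", "data engineer", "data engineer",
    "data scientist", "data scientist",
    "devops",
]

# Offers per search term, sorted ascending as in the chart.
offers_by_term = sorted(Counter(busqueda_per_offer).items(), key=lambda kv: kv[1])

# Hypothetical subset of the 24 scraper terms (the real list is not shown here).
scraped_terms = {"data engineer", "data scientist", "devops"}

# Roles to audit against the term list.
roles_to_check = [
    "data engineer", "product manager", "qa engineer",
    "site reliability engineer", "blockchain developer",
]

# A role with no matching term can never appear in the dataset.
uncovered = [role for role in roles_to_check if role not in scraped_terms]

print(offers_by_term)
print("not covered by the scraper:", uncovered)
```

Running an audit like this before each collection run gives a concrete list of candidate terms to add.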

Chart 2 — Geographic Underrepresentation: DS Salaries

A bar chart showing the top 15 countries by record count in the DS Salaries dataset, with Spain (country code ES) highlighted in Tinder red and all other countries in pink.

| Country | Records | Share |
| --- | --- | --- |
| US | 355 | 58.5% |
| Spain (ES) | 14 | 2.3% |

The United States dominates the dataset with 355 records, roughly 25 times Spain’s 14 entries. While the DS Salaries dataset is used throughout the dashboard as a salary reference, this chart makes explicit that it is a US-centric dataset. Any salary figure presented to a Spanish candidate as a “benchmark” carries significant uncertainty at the country level and should be labeled as a directional global reference, not a local market guarantee.
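
The shares and the US-to-Spain ratio follow directly from the record counts quoted in this section (355, 14, and a total of 607). A short sketch of the arithmetic, with variable names chosen for illustration:

```python
# Record counts quoted in this section of the documentation.
TOTAL_RECORDS = 607
records = {"US": 355, "ES": 14}

# Share of the dataset per country, in percent, rounded to one decimal.
shares = {code: round(100 * n / TOTAL_RECORDS, 1) for code, n in records.items()}

# How many US records exist for each Spanish record.
us_to_es_ratio = records["US"] / records["ES"]

print(shares)
print(f"US/ES ratio: {us_to_es_ratio:.1f}")
```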

Strategic Recommendations for DataTalent

The following six recommendations are rendered directly in the Sesgos tab as a numbered list:
  1. Always communicate the median salary (€93,444), never the mean. The right-skewed salary distribution makes the mean (€103,314) unrepresentative of what a typical tech worker earns.
  2. Do not use Tecnoempleo as a salary source. With 80.7% of salary values null and the MNAR mechanism confirmed, any salary analysis built on Tecnoempleo data would be unreliable and potentially misleading.
  3. Expand the scraper’s search term list to reduce selection bias. Prioritize high-growth roles currently absent from the 24 scraped terms.
  4. Complement with Spanish-specific sources (InfoJobs, LinkedIn Spain) for salary benchmarking. These sources have higher Spanish record density and are more representative of local compensation norms.
  5. Do not train automated selection or recommendation models on these datasets without applying debiasing techniques first. Models trained on biased data reproduce and amplify those biases at scale.
  6. Communicate uncertainty to management. All figures derived from this dataset are directional and indicative, not precise market measurements. Present them with appropriate confidence intervals or explicit caveats.
Building automated hiring recommendation or scoring models on these datasets without first addressing MNAR salary missingness, search term selection bias, and Spain’s geographic underrepresentation in DS Salaries could perpetuate structural hiring inequalities. Debiasing is not optional — it is an ethical prerequisite for any downstream automated decision-making system.
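
Recommendations 1 and 6 can be illustrated together: report the median on right-skewed data, and attach an interval to it. The sketch below uses a percentile bootstrap, which is one common choice and not necessarily what the dashboard does. The 14 salary values are invented (the count merely mirrors Spain's record count), and `bootstrap_median_ci` is a hypothetical helper.

```python
import random
from statistics import mean, median

# Hypothetical right-skewed salaries (EUR); not taken from DS Salaries.
salaries = [38000, 42000, 45000, 48000, 50000, 52000, 55000,
            58000, 60000, 65000, 70000, 80000, 120000, 250000]

sample_mean = mean(salaries)      # pulled upward by the two large values
sample_median = median(salaries)  # closer to what a typical record looks like


def bootstrap_median_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the median (illustrative helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    medians = sorted(
        median(rng.choices(values, k=len(values))) for _ in range(n_resamples)
    )
    lo = medians[int((alpha / 2) * n_resamples)]
    hi = medians[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


lo, hi = bootstrap_median_ci(salaries)
print(f"mean={sample_mean:.0f}  median={sample_median:.0f}")
print(f"95% bootstrap interval for the median: ({lo:.0f}, {hi:.0f})")
```

With so few records the interval is wide, which is the point: a figure based on 14 rows should be presented with its uncertainty, not as a precise benchmark.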
