Bias Dashboard: MNAR Salaries, Selection Bias, and Geographic Gaps

The Sesgos (Bias) tab makes dataset limitations transparent and directly actionable for the consultants and analysts who rely on TinderJob outputs. Every dataset carries assumptions, gaps, and structural choices made during collection — and when those gaps are invisible, they produce misleading conclusions. Before DataTalent uses any figure from TinderJob to design a curriculum, advise a candidate, or present to management, the team must understand exactly where the data falls short and how that affects interpretation. This tab surfaces three specific, quantified biases and closes with six concrete strategic recommendations.

Bias Summary Table

The following table reproduces the executive summary rendered at the top of the Sesgos tab:

Bias	Dataset	Type	% Affected	Impact
MNAR — Missing Salaries	Tecnoempleo	MNAR	80.7% null	No salary analysis possible from Tecnoempleo
Search Term Bias	Tecnoempleo	Selection	24 fixed terms	Profiles outside these 24 are excluded
Geographic Underrepresentation	DS Salaries	Geographic	2.3% (14/607)	Spanish stats from this dataset unreliable

MNAR (Missing Not At Random) means the missingness is correlated with the missing value itself: companies that don’t publish salaries tend to be those paying below-market rates. This is the most dangerous type of missing data because naive imputation from the published salaries would systematically overestimate pay, and ignoring the gap hides a real market pattern.
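A small simulation can make the direction of this bias concrete. The sketch below is illustrative only: the salary distribution and the publication probabilities are assumptions chosen so that roughly 80% of values end up null, not figures taken from Tecnoempleo.

```python
import random
import statistics

def simulate_mnar(n=10_000, seed=42):
    """Simulate offers where low salaries are more likely to be hidden."""
    rng = random.Random(seed)
    # Hypothetical "true" salaries; lognormal gives a right-skewed shape.
    true_salaries = [rng.lognormvariate(10.5, 0.4) for _ in range(n)]
    cutoff = statistics.median(true_salaries)
    observed = []
    for salary in true_salaries:
        # Assumed mechanism: below-median offers publish their salary 5%
        # of the time, above-median offers 35% of the time.
        p_publish = 0.05 if salary < cutoff else 0.35
        if rng.random() < p_publish:
            observed.append(salary)
    return true_salaries, observed

true_salaries, observed = simulate_mnar()
true_mean = statistics.mean(true_salaries)
observed_mean = statistics.mean(observed)
null_share = 1 - len(observed) / len(true_salaries)

print(f"null share:    {null_share:.1%}")
print(f"true mean:     {true_mean:,.0f}")
print(f"observed mean: {observed_mean:,.0f}")
```

Under these assumptions the mean of the published salaries sits above the true mean, so filling the nulls with it would push every imputed value too high.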

Chart 1 — Selection Bias: Offers by Search Term

A horizontal bar chart showing the number of offers per busqueda search term, sorted ascending. This chart is deliberately identical in structure to the demand chart in the Mercado España tab — but the framing here is critical. The dataset contains only 24 predefined search terms. These were chosen at scraper design time and represent the developer’s prior assumptions about which tech profiles matter. Any role with a title that doesn’t map to one of these 24 terms — product manager, QA engineer, site reliability engineer, blockchain developer, and hundreds of others — is structurally absent from the dataset, not because those roles don’t exist in Spain, but because they were never scraped. This is a selection bias by design, not a data quality failure. The remedy is straightforward: expand the scraper’s search term list before the next data collection run.
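As a rough illustration of both the chart's aggregation and the coverage gap, the sketch below counts offers per `busqueda` value and checks whether a role title would be reached by any scraped term. The sample rows, the `titulo` column name, and the substring-matching rule are assumptions for illustration; the real scraper's matching is done by the job site's own search.

```python
from collections import Counter

# Hypothetical sample rows. `busqueda` is the search-term column named in
# the chart description; `titulo` is an assumed name for the offer title.
offers = [
    {"titulo": "Senior Data Engineer", "busqueda": "data engineer"},
    {"titulo": "Data Engineer (AWS)", "busqueda": "data engineer"},
    {"titulo": "Python Backend Developer", "busqueda": "python"},
    {"titulo": "Data Engineer Junior", "busqueda": "data engineer"},
    {"titulo": "Python Developer", "busqueda": "python"},
    {"titulo": "DevOps Engineer", "busqueda": "devops"},
]

def offers_per_term(rows):
    """Count offers per search term, sorted ascending like the chart."""
    counts = Counter(row["busqueda"] for row in rows)
    return sorted(counts.items(), key=lambda item: item[1])

def is_covered(role_title, search_terms):
    """Return True if any scraped term appears in the role title."""
    title = role_title.lower()
    return any(term.lower() in title for term in search_terms)

term_counts = offers_per_term(offers)
scraped_terms = [term for term, _ in term_counts]

for role in ["Data Engineer", "QA Engineer", "Site Reliability Engineer"]:
    status = "covered" if is_covered(role, scraped_terms) else "never scraped"
    print(f"{role}: {status}")
```

A check like this, run against a list of roles the team cares about, is one way to decide which terms to add before the next collection run.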

Chart 2 — Geographic Underrepresentation: DS Salaries

A bar chart showing the top 15 countries by record count in the DS Salaries dataset, with Spain (country code ES) highlighted in Tinder red and all other countries in pink.

Country	Records	Share
US	355	58.5%
Spain (ES)	14	2.3%

The United States dominates the dataset with 355 records, about 25 times as many as Spain’s 14 entries. While the DS Salaries dataset is used throughout the dashboard as a salary reference, this chart makes explicit that it is a US-centric dataset. Any salary figure presented to a Spanish candidate as a “benchmark” carries significant uncertainty at the country level and should be labeled as a directional global reference, not a local market guarantee.
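The shares in the table can be reproduced from the record counts. This is a minimal sketch using only the figures quoted on this page (607 total records, 355 for the US, 14 for Spain); the variable names are illustrative.

```python
TOTAL_RECORDS = 607                 # total rows in DS Salaries, per the summary table
records = {"US": 355, "ES": 14}     # counts quoted in the table above

# Share of the dataset held by each country.
shares = {country: count / TOTAL_RECORDS for country, count in records.items()}

# How many US records exist for every Spanish record.
ratio_us_to_es = records["US"] / records["ES"]

for country, share in shares.items():
    print(f"{country}: {share:.1%}")        # US: 58.5%, ES: 2.3%
print(f"US/ES ratio: {ratio_us_to_es:.1f}")  # 25.4
```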

Strategic Recommendations for DataTalent

The following six recommendations are rendered directly in the Sesgos tab as a numbered list:

1. Always communicate the median salary (€93,444), never the mean. The right-skewed salary distribution makes the mean (€103,314) unrepresentative of what a typical tech worker earns.
2. Do not use Tecnoempleo as a salary source. With 80.7% of salary values null and the MNAR mechanism confirmed, any salary analysis built on Tecnoempleo data would be unreliable and potentially misleading.
3. Expand the scraper’s search term list to reduce selection bias. Prioritize high-growth roles currently absent from the 24 scraped terms.
4. Complement with Spanish-specific sources (InfoJobs, LinkedIn Spain) for salary benchmarking. These sources have higher Spanish record density and are more representative of local compensation norms.
5. Do not train automated selection or recommendation models on these datasets without applying debiasing techniques first. Models trained on biased data reproduce and amplify those biases at scale.
6. Communicate uncertainty to management. All figures derived from this dataset are directional and indicative, not precise market measurements. Present them with appropriate confidence intervals or explicit caveats.
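The first and last recommendations can be sketched together: report the median rather than the mean, and attach an interval to it. The example below uses a small hypothetical right-skewed sample (not the dashboard's data) and a percentile bootstrap, which is one common way to get a confidence interval for a median; the function name and parameters are illustrative.

```python
import random
import statistics

# Hypothetical right-skewed salary sample; a few high earners pull the mean up.
salaries = [60_000, 70_000, 75_000, 80_000, 85_000, 90_000,
            95_000, 100_000, 110_000, 130_000, 180_000, 300_000]

def bootstrap_median_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the median."""
    rng = random.Random(seed)
    # Resample with replacement and record the median of each resample.
    medians = sorted(
        statistics.median(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lower = medians[int((alpha / 2) * n_resamples)]
    upper = medians[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)
ci_low, ci_high = bootstrap_median_ci(salaries)

print(f"mean:   {mean_salary:,.0f}")
print(f"median: {median_salary:,.0f}")
print(f"95% bootstrap CI for the median: {ci_low:,.0f} to {ci_high:,.0f}")
```

With a sample this small the interval is wide, which is the point: a figure built on 14 Spanish records deserves a similarly explicit caveat.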

Building automated hiring recommendation or scoring models on these datasets without first addressing MNAR salary missingness, search term selection bias, and Spain’s geographic underrepresentation in DS Salaries could perpetuate structural hiring inequalities. Debiasing is not optional — it is an ethical prerequisite for any downstream automated decision-making system.
