The TinderJob Streamlit dashboard is the primary interactive interface of the TinderJob project. Built with Streamlit and Plotly, it serves two distinct audiences: DataTalent consultants who need deep market analytics to inform curriculum and reskilling decisions, and job candidates who want a practical, CV-driven tool to discover their best-fit opportunities in the Spanish tech market. Every chart, filter, and recommendation in the dashboard is grounded in data scraped from Tecnoempleo and in the DS Salaries global benchmark dataset.

Launching the Dashboard

From the project root, run:
streamlit run app/streamlit_app.py
The app opens at http://localhost:8501 by default. Keep the terminal running: Streamlit watches the source files and, when the code changes, offers to rerun the app (or reruns it automatically once "Always rerun" is enabled).

Dashboard Tabs

The dashboard is organized into five tabs, each covering a distinct analytical domain. Use the cards below to navigate to the detailed documentation for each tab.

📍 Spanish Market

Demand by tech profile, top 20 skills ranked by offer count, and work modality distribution across the Spanish market.

💰 Salary Analysis

Global salary benchmarks by experience level, company size, and year using the DS Salaries dataset.

🎲 Conditional Probability

P(A|B) models for high salary by level, remote work by company size, and flexible work by city.

⚖️ Bias Insights

Transparent audit of MNAR salary missingness, selection bias from fixed search terms, and geographic underrepresentation.

💘 TinderMatch

Upload your CV and receive a ranked list of Tecnoempleo offers scored by skill compatibility percentage.
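
As a rough illustration of the P(A|B) estimates mentioned for the Conditional Probability tab, a conditional probability can be computed as a relative frequency: P(A|B) = count(A and B) / count(B). The sketch below assumes pandas, the DS Salaries column names experience_level and salary_in_usd, a hypothetical 100,000 USD threshold for "high salary", and made-up toy rows; it is not taken from the dashboard's code.

```python
import pandas as pd

def conditional_probability(event, given):
    """Estimate P(event | given) from two aligned boolean Series."""
    n_given = given.sum()
    if n_given == 0:
        return float("nan")  # undefined when the condition never occurs
    return (event & given).sum() / n_given

# Toy rows for illustration only (not real DS Salaries records).
salaries = pd.DataFrame({
    "experience_level": ["SE", "SE", "SE", "SE", "EN", "EN"],
    "salary_in_usd": [150000, 120000, 90000, 80000, 60000, 110000],
})

HIGH_SALARY = 100000  # hypothetical threshold for "high salary"
is_high = salaries["salary_in_usd"] > HIGH_SALARY
is_senior = salaries["experience_level"] == "SE"

# Two of the four senior rows exceed the threshold.
p_high_given_senior = conditional_probability(is_high, is_senior)
```

The same helper could be reused for the other pairs the tab lists (remote work given company size, flexible work given city) by swapping the two boolean masks.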

Global Filters (Sidebar)

The left sidebar exposes three selectbox filters that apply globally to the KPI header row and to the main charts rendered above the tab interface. All three default to Todas / Cualquiera, meaning no filter is applied.
| Filter | Type | Values |
|---|---|---|
| Ciudad | selectbox | Todas + all unique city values from the dataset, sorted alphabetically |
| Skill | selectbox | Todas + all unique skills extracted by splitting the skills column on commas |
| Tipo de posición | selectbox | Todas + all unique values from the tipo_contrato column |
When a filter is set, only rows matching that value are included in the filtered DataFrame. Multiple filters stack — selecting both a city and a skill will return only offers in that city that require that skill.
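
The stacking behaviour can be sketched as a chain of boolean masks combined with AND. This is a minimal sketch under stated assumptions, not the dashboard's actual code: the function name apply_filters is hypothetical, the column names come from the data source description, and matching a skill as a whole comma-separated token (so that "Java" does not match "JavaScript") is an assumption about how the skill filter behaves.

```python
import pandas as pd

ALL = "Todas"  # sentinel meaning "no filter", matching the sidebar default

def apply_filters(df, ciudad=ALL, skill=ALL, tipo=ALL):
    """Return only the rows of df that satisfy every active filter."""
    mask = pd.Series(True, index=df.index)
    if ciudad != ALL:
        mask &= df["ciudad"] == ciudad
    if skill != ALL:
        # Split the comma-separated skills string and compare whole tokens.
        tokens = df["skills"].fillna("").str.split(",")
        mask &= tokens.apply(lambda items: skill in [s.strip() for s in items])
    if tipo != ALL:
        mask &= df["tipo_contrato"] == tipo
    return df[mask]

# Toy offers for illustration only.
offers = pd.DataFrame({
    "ciudad": ["Madrid", "Madrid", "Barcelona"],
    "skills": ["Python, SQL", "JavaScript, React", "Python, Java"],
    "tipo_contrato": ["Indefinido", "Indefinido", "Temporal"],
})

# City and skill stack: only the Madrid offer that requires Python remains.
filtered = apply_filters(offers, ciudad="Madrid", skill="Python")
```

Leaving every argument at its default returns the DataFrame unchanged, which mirrors the "no filter is applied" state of the sidebar.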

KPI Header

Three metric cards are rendered immediately below the page header, before the tab interface. They always reflect the current sidebar filter selection:
| KPI | Calculation |
|---|---|
| Salario medio | mean() of the salario_medio column across filtered offers, displayed in EUR |
| Ofertas | len() of the filtered DataFrame: the total number of matching job postings |
| Ciudad líder | mode()[0] of the ciudad column: the city appearing most frequently in filtered results |
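
The three calculations can be sketched with pandas as follows. The function name compute_kpis and the empty-result guard are assumptions added for illustration; the guard is there because mode()[0] raises an error when the filtered DataFrame has no rows.

```python
import pandas as pd

def compute_kpis(filtered):
    """Compute the three header metrics from the filtered offers."""
    if filtered.empty:
        # Assumption: fall back to placeholders when no offer matches.
        return {"salario_medio": None, "ofertas": 0, "ciudad_lider": None}
    return {
        # mean() skips missing salaries (NaN) by default.
        "salario_medio": filtered["salario_medio"].mean(),
        "ofertas": len(filtered),
        # mode() returns the most frequent values; [0] takes the first one.
        "ciudad_lider": filtered["ciudad"].mode()[0],
    }

# Toy offers for illustration only; one salary is missing.
offers = pd.DataFrame({
    "ciudad": ["Madrid", "Madrid", "Barcelona"],
    "salario_medio": [30000.0, None, 40000.0],
})

kpis = compute_kpis(offers)
```

Because mean() ignores missing values, the average salary here is computed over the two offers that publish one, while the offer count still includes all three rows.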

Tech Stack

The dashboard relies on the following Python libraries:
  • streamlit — reactive UI framework; manages state, tabs, sidebar, widgets, and layout columns
  • plotly — interactive charts including horizontal bar charts, histograms with violin marginals, scatter plots with OLS trendlines, pie charts, and heatmaps (px.imshow)
  • pdfplumber — PDF text extraction used in the TinderMatch tab to parse uploaded CV files
  • pandas + scipy — data loading, groupby aggregations, pivot tables, and statistical tests (Shapiro-Wilk normality check)

Data Sources

Two CSV files must be present at the paths below before launching the dashboard:
| File | Description | Loader |
|---|---|---|
| data/processed/clean_tecnoempleo_jobs.csv | Scraped and cleaned Tecnoempleo job offers, including ciudad, skills, salario_medio, modalidad, tipo_contrato, busqueda, and url columns | load_data() |
| data/raw/ds_salaries.csv | DS Salaries global dataset (607 records); salary_in_usd is automatically converted to EUR at a 0.92 exchange rate on load | load_salaries() |
Both loaders are decorated with @st.cache_data, so each file is read from disk only on the first call and is then served from Streamlit's in-memory cache on later reruns.
To generate clean_tecnoempleo_jobs.csv, run the Tecnoempleo scraper and the data cleaning pipeline. Then download ds_salaries.csv and place it manually in data/raw/ before starting the app.
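
The conversion performed on load can be sketched as below. This is an illustrative version, not the project's actual load_salaries(): the output column name salary_in_eur is an assumption, the Streamlit decorator is left out so the snippet runs without a Streamlit server, and an in-memory sample stands in for data/raw/ds_salaries.csv.

```python
import io
import pandas as pd

USD_TO_EUR = 0.92  # fixed exchange rate applied on load

def load_salaries(source):
    """Read the DS Salaries CSV and add a salary column in EUR."""
    df = pd.read_csv(source)
    # Assumption: converted values are stored in a new salary_in_eur column.
    df["salary_in_eur"] = df["salary_in_usd"] * USD_TO_EUR
    return df

# In the app, the loader would carry the @st.cache_data decorator and be
# called with the real file path; a tiny in-memory CSV is used here instead.
sample = io.StringIO("experience_level,salary_in_usd\nSE,100000\nEN,50000\n")
salaries = load_salaries(sample)
```

Keeping the rate in a named constant makes it easy to find and update, since a fixed 0.92 rate will drift from the actual USD/EUR exchange rate over time.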
