This is the second notebook in the series (analisis_02_correlaciones_agrupaciones_probabilidad.ipynb), building directly on the descriptive foundation established in Notebook 1. Where the first notebook characterised the shape and quality of the data, this one uses it to answer specific business questions. Through multivariate correlation analysis, ranked skill-demand groupings, and conditional probability modeling, it translates raw job market data into actionable findings: which variables actually predict salary, which company types offer the most remote flexibility, and how sharply the probability of earning above the median shifts with experience level.
Multivariate Analysis
Understanding which numeric variables move together, and how strongly, is the starting point for any salary prediction or segmentation model. The correlation matrix captures pairwise linear relationships across the three key continuous variables in the DS Salaries dataset.
Salary vs. Remote Ratio
Correlation: 0.13 — effectively negligible. Remote work status has almost no linear relationship with salary level.
Salary vs. Work Year
Correlation: ~0.17 — the strongest of the three, but still very weak. Salaries have drifted upward slightly over the years covered.
Remote Ratio vs. Work Year
Correlation: positive but weak — remote work availability increased over time, but not dramatically within this dataset.
The maximum correlation observed across all three pairs is 0.17, so neither remote ratio nor work year is a strong linear predictor of salary in isolation. Experience level, a categorical variable, is a far more powerful predictor, as the conditional probability analysis below demonstrates.
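As a rough illustration of how such pairwise coefficients are obtained, the sketch below computes Pearson correlations for every pair of columns using only the standard library. The column names (`salary_in_usd`, `remote_ratio`, `work_year`) and the toy values are assumptions made for the example; they are not taken from the notebook and will not reproduce the 0.13 or 0.17 figures.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_pairs(cols):
    """Map each unordered pair of column names to its Pearson coefficient."""
    return {(a, b): pearson(cols[a], cols[b]) for a, b in combinations(cols, 2)}


# Hypothetical toy data; column names are assumed, not confirmed by the source.
columns = {
    "salary_in_usd": [60000, 85000, 120000, 95000, 150000, 70000],
    "remote_ratio": [0, 100, 50, 100, 0, 50],
    "work_year": [2020, 2021, 2022, 2021, 2022, 2020],
}

pairs = correlation_pairs(columns)
for (a, b), r in pairs.items():
    print(f"{a} vs {b}: {r:+.2f}")
```

With the data loaded into a pandas DataFrame, `DataFrame.corr()` returns the same coefficients as a full matrix.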
Business Groupings
Aggregation by category reveals the structure of the job market in ways that correlations cannot. Two groupings are particularly important for TinderJob’s product: skill demand from Tecnoempleo, and salary by experience level from DS Salaries.
Top 20 Skills by Demand
Skills are stored as comma-separated strings in the `skills` column. Exploding this column and counting occurrences gives an unambiguous demand ranking:
| Rank | Skill | Job Postings |
|---|---|---|
| 1 | Python | 168 |
| 2 | Java | 159 |
| 3 | SQL | 96 |
| 4 | Angular | 61 |
| 5 | Azure | 58 |
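The split-and-count step behind this ranking can be sketched as follows. The sample postings are hypothetical, and counting each skill at most once per posting is an assumption suggested by the "Job Postings" column header rather than something the source states.

```python
from collections import Counter

# Hypothetical postings; the notebook reads these from the Tecnoempleo data.
postings = [
    {"title": "Backend Developer", "skills": "Python, SQL, Docker"},
    {"title": "Data Engineer", "skills": "Python, SQL, Azure"},
    {"title": "Full Stack Developer", "skills": "Java, Angular, SQL"},
    {"title": "QA Engineer", "skills": ""},  # postings without skills are skipped
]


def rank_skills(rows, top_n=20):
    """Return the top_n (skill, posting_count) pairs, most demanded first."""
    counts = Counter()
    for row in rows:
        # Split the comma-separated string, trim whitespace, drop empty entries.
        # The set ensures a skill is counted once per posting (an assumption).
        skills = {s.strip() for s in row["skills"].split(",") if s.strip()}
        counts.update(skills)
    return counts.most_common(top_n)


ranking = rank_skills(postings)
for position, (skill, n) in enumerate(ranking, start=1):
    print(f"{position:>2}. {skill:<10} {n}")
```

In pandas the same idea is usually written as `df["skills"].str.split(",").explode().str.strip().value_counts()`, which counts every occurrence rather than once per posting.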
Salary Median by Experience Level
Grouping DS Salaries records by experience label reveals how salary scales with seniority.
Work Modality Distribution by City
A pivot table cross-referencing city and work modality (presencial, híbrido, remoto) reveals geographic patterns in how companies offer flexible arrangements.
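A minimal sketch of such a cross-tabulation, using only the standard library, might look like this. The `(city, modality)` pairs are invented for illustration and say nothing about the real distribution.

```python
from collections import Counter, defaultdict

MODALITIES = ("presencial", "híbrido", "remoto")

# Hypothetical (city, modality) pairs standing in for the real postings.
postings = [
    ("Madrid", "presencial"), ("Madrid", "híbrido"), ("Madrid", "presencial"),
    ("Barcelona", "remoto"), ("Barcelona", "presencial"),
    ("Alcobendas", "híbrido"),
]


def modality_pivot(rows):
    """Count postings per city and modality, filling absent cells with 0."""
    table = defaultdict(Counter)
    for city, modality in rows:
        table[city][modality] += 1
    return {
        city: {m: counts[m] for m in MODALITIES}
        for city, counts in table.items()
    }


pivot = modality_pivot(postings)
print(f"{'city':<12}" + "".join(f"{m:>12}" for m in MODALITIES))
for city, counts in sorted(pivot.items()):
    print(f"{city:<12}" + "".join(f"{counts[m]:>12}" for m in MODALITIES))
```

With pandas, `pd.crosstab` applied to the city and modality columns produces an equivalent table.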
Conditional Probability Modeling
Conditional probability, written P(A | B), is the probability of event A given that condition B is known. It turns the grouping analysis into predictive statements. Three business scenarios are modeled below: the first two use the DS Salaries dataset and the third uses the Tecnoempleo dataset.
P(High Salary | Experience Level)
Business question: What is the probability that a candidate earns above the global salary median, given their experience level?
The most significant inflection point in salary probability is the Mid-level → Senior transition, not Junior → Mid-level. The probability rises by roughly 33 percentage points between Mid-level and Senior (~40% to 73.2%), compared with roughly 29 points between Junior and Mid-level (11.4% to ~40%). This insight should inform TinderJob’s career progression guidance: the most valuable investment a candidate can make is acquiring the credentials and experience to cross the Senior threshold.
Results by experience level:
| Experience Level | P(High Salary) |
|---|---|
| Junior | 11.4% |
| Mid-level | ~40% |
| Senior | 73.2% |
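One way to arrive at figures of this kind is to flag each record as above or below the global median and then take the share of flagged records within each experience level. The sketch below does this on invented `(level, salary)` pairs, so its output is unrelated to the percentages in the table.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (experience_level, salary) records, for illustration only.
records = [
    ("Junior", 40000), ("Junior", 52000), ("Junior", 95000),
    ("Mid-level", 70000), ("Mid-level", 88000), ("Mid-level", 110000),
    ("Senior", 105000), ("Senior", 130000), ("Senior", 160000), ("Senior", 80000),
]


def p_high_salary_by_level(rows):
    """Return (global median, {level: share of salaries above that median})."""
    threshold = median(salary for _, salary in rows)
    totals = defaultdict(int)
    highs = defaultdict(int)
    for level, salary in rows:
        totals[level] += 1
        highs[level] += int(salary > threshold)
    return threshold, {level: highs[level] / totals[level] for level in totals}


threshold, probs = p_high_salary_by_level(records)
print(f"global median: {threshold}")
for level, p in probs.items():
    print(f"P(high salary | {level}) = {p:.1%}")
```

The threshold is computed once over all records, not per group, which is what makes the per-level shares comparable.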
P(Remote Work | Company Size)
Business question: What is the probability of 100% remote work, given company size?
Company size is encoded in DS Salaries as S (small, <50 employees), M (medium, 50–250), and L (large, >250).
Medium-sized companies offer the highest probability of full remote work, likely because they combine operational flexibility with fewer of the rigid office-attendance policies that large enterprises tend to enforce. Candidates prioritising remote work should specifically target the medium-size segment.
| Company Size | P(100% Remote) |
|---|---|
| Small (<50) | Moderate |
| Medium (50–250) | 69.3% — highest |
| Large (>250) | 53.5% |
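The estimate P(A | B) = count(A and B) / count(B) can be written as a small reusable helper. In the sketch below, the helper name `conditional_probability`, the field names `company_size` and `remote_ratio`, and the sample rows are all assumptions made for illustration; a `remote_ratio` of 100 is taken to mean fully remote.

```python
def conditional_probability(rows, event, given):
    """Estimate P(event | given) as count(event and given) / count(given)."""
    matching = [row for row in rows if given(row)]
    if not matching:
        return None  # the condition never occurs, so the estimate is undefined
    return sum(1 for row in matching if event(row)) / len(matching)


def is_fully_remote(row):
    return row["remote_ratio"] == 100


# Hypothetical records; field names are assumed, not confirmed by the source.
rows = [
    {"company_size": "S", "remote_ratio": 100},
    {"company_size": "S", "remote_ratio": 0},
    {"company_size": "M", "remote_ratio": 100},
    {"company_size": "M", "remote_ratio": 100},
    {"company_size": "M", "remote_ratio": 50},
    {"company_size": "L", "remote_ratio": 100},
    {"company_size": "L", "remote_ratio": 0},
]

p_remote = {
    size: conditional_probability(
        rows, is_fully_remote, lambda r, s=size: r["company_size"] == s
    )
    for size in ("S", "M", "L")
}
for size, p in p_remote.items():
    print(f"P(100% remote | size={size}) = {p:.1%}")
```

Passing the event and the condition as predicates keeps the helper independent of any particular column, so the same function covers other scenarios of the form P(A | B).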
P(Flexible Work | City)
Business question: What is the probability of hybrid or remote work given the city of the job listing, for cities with at least 5 postings?
Alcobendas — home to many multinational tech firms’ Spanish headquarters — leads the ranking with 86% flexible modality probability. Madrid and Barcelona, despite their volume of listings, offer roughly equal and comparatively lower flexibility rates, suggesting that density of competition may correlate with stricter in-office requirements.
| City | P(Hybrid or Remote) |
|---|---|
| Alcobendas | 86% |
| Almería | 76% |
| Barcelona | 45% |
| Madrid | 44% |
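The minimum-postings filter matters because a city with one or two listings would otherwise show a misleading 0% or 100%. A sketch of the per-city calculation with that filter is shown below; the postings are invented, and the threshold of 5 is the only value taken from the text.

```python
from collections import defaultdict

FLEXIBLE = {"híbrido", "remoto"}
MIN_POSTINGS = 5  # threshold stated in the business question

# Hypothetical (city, modality) pairs; counts are invented for illustration.
postings = (
    [("Madrid", "presencial")] * 3 + [("Madrid", "híbrido")] * 2
    + [("Alcobendas", "híbrido")] * 4 + [("Alcobendas", "remoto")]
    + [("Alcobendas", "presencial")]
    + [("Toledo", "remoto")] * 2  # only 2 postings, so it is filtered out
)


def p_flexible_by_city(rows, min_postings=MIN_POSTINGS):
    """Return {city: P(hybrid or remote)} for cities with enough postings."""
    totals = defaultdict(int)
    flexible = defaultdict(int)
    for city, modality in rows:
        totals[city] += 1
        flexible[city] += int(modality in FLEXIBLE)
    probs = {
        city: flexible[city] / n
        for city, n in totals.items()
        if n >= min_postings
    }
    # Sort from most to least flexible for a ranking-style display.
    return dict(sorted(probs.items(), key=lambda kv: kv[1], reverse=True))


result = p_flexible_by_city(postings)
for city, p in result.items():
    print(f"{city:<12} {p:.0%}")
```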
Key Findings
Remote ≠ Salary Predictor
Remote ratio correlates with salary at only 0.13 — it is not a meaningful predictor. Experience level is the dominant variable.
Target Medium Companies
Candidates seeking remote work should focus on medium-sized companies (50–250 employees), where P(100% remote) reaches 69.3%.
Senior Threshold Effect
Reaching Senior level raises the probability of a high salary more than sixfold compared to Junior level (73.2% vs. 11.4%). The Mid → Senior jump is the most financially impactful career move.