
Overview

The statistics dashboard provides instant insights into the filtering results, including total records, filtered tasks, occurrence rates, and per-keyword breakdown.

Core Function

Statistics generation is handled by gerar_estatisticas() in app.py:55:
def gerar_estatisticas(df_original, df_filtrado, palavras_chave):
    """Gera estatísticas simples dos dados"""
    total = len(df_original)
    filtrados = len(df_filtrado)
    
    stats = {
        'total': total,
        'filtrados': filtrados,
        'percentual': round((filtrados/total)*100, 1) if total > 0 else 0,
        'por_palavra': {}
    }
    
    for palavra in palavras_chave:
        if not df_filtrado.empty:
            count = int(df_filtrado['Relato'].str.contains(
                palavra, case=False, na=False
            ).sum())
            if count > 0:
                stats['por_palavra'][palavra] = count
    
    return stats
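A minimal usage sketch follows. The sample DataFrames are invented, the regex-join filter is a simplified stand-in for the app's actual filtering step, and the function is repeated so the snippet runs standalone:

```python
import pandas as pd

# gerar_estatisticas as defined above, repeated so this snippet is self-contained
def gerar_estatisticas(df_original, df_filtrado, palavras_chave):
    """Gera estatísticas simples dos dados"""
    total = len(df_original)
    filtrados = len(df_filtrado)

    stats = {
        'total': total,
        'filtrados': filtrados,
        'percentual': round((filtrados / total) * 100, 1) if total > 0 else 0,
        'por_palavra': {}
    }

    for palavra in palavras_chave:
        if not df_filtrado.empty:
            count = int(df_filtrado['Relato'].str.contains(
                palavra, case=False, na=False
            ).sum())
            if count > 0:
                stats['por_palavra'][palavra] = count

    return stats

# Invented sample data: four task reports, two of which match a keyword
df_original = pd.DataFrame({'Relato': [
    'Monitor quebrado na sala 2',
    'Solicitar peça para a impressora',
    'Limpeza geral do laboratório',
    'Nenhum problema encontrado',
]})
palavras_chave = ['quebrado', 'solicitar peça']
mask = df_original['Relato'].str.contains(
    '|'.join(palavras_chave), case=False, na=False)
df_filtrado = df_original[mask]

print(gerar_estatisticas(df_original, df_filtrado, palavras_chave))
# {'total': 4, 'filtrados': 2, 'percentual': 50.0,
#  'por_palavra': {'quebrado': 1, 'solicitar peça': 1}}
```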

Metrics Provided

Total Records

Count of all rows in the original file (before filtering)

Filtered Tasks

Number of tasks matching at least one keyword

Occurrence Rate

Percentage of filtered tasks relative to total records

Per-Keyword Breakdown

Individual match count for each keyword used

Statistics Structure

The function returns a dictionary with this structure:
{
    'total': 1523,              # Total rows in original file
    'filtrados': 147,           # Rows matching keywords
    'percentual': 9.7,          # Percentage (1 decimal place)
    'por_palavra': {            # Per-keyword counts
        'quebrado': 45,
        'solicitar peça': 32,
        'trocar cabo': 28,
        'danificado': 18,
        'instalar': 24
    }
}

Calculation Details

Percentage Formula

percentual = round((filtrados/total)*100, 1) if total > 0 else 0
  • Divides filtered count by total count
  • Multiplies by 100 to get percentage
  • Rounds to 1 decimal place using round()
  • Returns 0 if total is 0 (prevents division by zero)
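The guarded expression can be exercised in isolation; `taxa_ocorrencia` is a hypothetical wrapper name used only for this illustration:

```python
def taxa_ocorrencia(filtrados, total):
    # Same guarded expression used in gerar_estatisticas
    return round((filtrados / total) * 100, 1) if total > 0 else 0

print(taxa_ocorrencia(147, 1523))  # 9.7 -- matches the sample structure above
print(taxa_ocorrencia(0, 0))      # 0   -- empty file, no ZeroDivisionError
```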

Per-Keyword Counting

Each keyword is counted independently using pandas string matching:
for palavra in palavras_chave:
    if not df_filtrado.empty:
        count = int(df_filtrado['Relato'].str.contains(
            palavra, case=False, na=False
        ).sum())
        if count > 0:
            stats['por_palavra'][palavra] = count
Keywords with zero matches are excluded from the por_palavra dictionary to keep the output clean.
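The matching itself is plain pandas string handling; a standalone illustration with invented data:

```python
import pandas as pd

relatos = pd.Series(['Cabo QUEBRADO na mesa 3',
                     'Solicitar peça para o projetor',
                     None])  # a missing report

# case=False -> case-insensitive match; na=False -> missing values
# count as "no match" instead of propagating NaN into the sum
mask = relatos.str.contains('quebrado', case=False, na=False)
print(int(mask.sum()))  # 1
```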

Important Behaviors

Overlapping Matches: A single task can match multiple keywords, so the sum of the per-keyword counts may exceed the total filtered count. Example: a task containing “trocar cabo quebrado” matches both “trocar cabo” and “quebrado”.
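The overlap can be demonstrated standalone with an invented one-row example:

```python
import pandas as pd

# One task whose text matches two different keywords
df = pd.DataFrame({'Relato': ['Trocar cabo quebrado do monitor']})
palavras = ['trocar cabo', 'quebrado']

counts = {p: int(df['Relato'].str.contains(p, case=False, na=False).sum())
          for p in palavras}
print(counts)                # {'trocar cabo': 1, 'quebrado': 1}
print(sum(counts.values()))  # 2 -- exceeds the single filtered row
```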

Empty DataFrame Handling

The function safely handles empty results:
if not df_filtrado.empty:
    # Only calculate per-keyword stats if there are results
This prevents errors when no tasks match any keywords.
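With an empty filtered result, the guard leaves the per-keyword dictionary untouched; a standalone sketch of the same pattern:

```python
import pandas as pd

df_filtrado = pd.DataFrame(columns=['Relato'])  # no task matched any keyword
por_palavra = {}
if not df_filtrado.empty:
    # Skipped entirely: no string matching runs against the empty frame
    por_palavra['quebrado'] = int(
        df_filtrado['Relato'].str.contains('quebrado', case=False, na=False).sum())
print(por_palavra)  # {}
```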

Usage in Application

Statistics are generated during file upload and stored in the Flask session:
# From app.py:141
stats = gerar_estatisticas(df_original, resultado_final, palavras_chave)
session['last_stats'] = stats
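The session round-trip can be exercised with Flask's test client. The `/set` and `/stats` routes below are hypothetical, written only to illustrate storing and reading `last_stats`; they are not part of app.py:

```python
from flask import Flask, jsonify, session

app = Flask(__name__)
app.secret_key = 'dev'  # sessions require a secret key

@app.route('/set')
def set_stats():  # hypothetical route for illustration
    session['last_stats'] = {'total': 1523, 'filtrados': 147, 'percentual': 9.7}
    return 'ok'

@app.route('/stats')
def get_stats():  # hypothetical route for illustration
    return jsonify(session.get('last_stats', {}))

with app.test_client() as client:
    client.get('/set')
    print(client.get('/stats').get_json())
```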

Display in Results Page

The stats dictionary is passed to the resultado.html template:
return render_template('resultado.html', 
                       table=tabela_html, 
                       has_results=not resultado_final.empty,
                       stats=stats,
                       palavras_utilizadas=palavras_chave)

Statistics in Exports

Statistics are exported to a dedicated “Estatísticas” sheet:
stats_df = pd.DataFrame([
    ['Total de Registros', stats.get('total', 'N/A')],
    ['Tarefas Encontradas', stats.get('filtrados', 'N/A')],
    ['Taxa de Ocorrência (%)', stats.get('percentual', 'N/A')],
    ['Data de Geração', datetime.now().strftime('%d/%m/%Y %H:%M')]
], columns=['Métrica', 'Valor'])
stats_df.to_excel(writer, index=False, sheet_name='Estatísticas')

Data Types

Field         Type    Example
total         int     1523
filtrados     int     147
percentual    float   9.7
por_palavra   dict    {"quebrado": 45}
The por_palavra counts are explicitly converted to int using int() to ensure consistent JSON serialization when storing in Flask sessions.
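The reason for the cast, shown standalone: summing a pandas boolean mask yields a NumPy integer, which Python's json module (used when serializing session data) rejects:

```python
import json
import pandas as pd

count = pd.Series(['quebrado', 'ok']).str.contains('quebrado').sum()
print(type(count))  # a NumPy integer type, not the built-in int

try:
    json.dumps({'quebrado': count})
except TypeError as e:
    print('not serializable:', e)

print(json.dumps({'quebrado': int(count)}))  # {"quebrado": 1}
```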

Performance

Statistics generation is fast even for large datasets:
  • Uses vectorized pandas operations
  • Runs in O(n × k) where n = filtered rows, k = number of keywords
  • Typically completes in less than 100ms for files with 10,000 rows
