Skip to main content

Application Overview

Extrator de Tarefas Auvo is a Flask-based web application that processes task reports from the Auvo system, filters them by keywords, and generates downloadable reports.

Technology Stack

Backend Framework

Flask 3.1.2 - Lightweight WSGI web application framework

Data Processing

Pandas 2.3.2 - DataFrame manipulation and CSV/Excel I/O

PDF Generation

WeasyPrint 66.0 - HTML to PDF conversion engine

Excel Processing

openpyxl 3.1.5 - Excel file reading and writing
Reference: requirements.txt:8,14-15,27

Application Structure

extrator-tarefas-auvo/
├── app.py                 # Main application file
├── requirements.txt       # Python dependencies
├── .env                   # Environment variables (not in repo)
├── temp/                  # Temporary file storage (auto-created)
├── templates/             # Jinja2 HTML templates
│   ├── index.html        # Home/upload page
│   ├── config.html       # Keyword configuration
│   ├── historico.html    # Processing history
│   └── resultado.html    # Results display
└── static/               # CSS, JS, images (if any)

Application Initialization

from flask import Flask
import os
from dotenv import load_dotenv

# Load environment variables from .env
load_dotenv()

# Initialize Flask application
app = Flask(__name__)
app.secret_key = os.getenv('SECRET_KEY')

# Configure temporary file storage
TEMP_FOLDER = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'temp')
if not os.path.exists(TEMP_FOLDER):
    os.makedirs(TEMP_FOLDER)
Reference: app.py:1-21

Core Helper Functions

Converts URLs to clickable HTML links for display.
def criar_links(url):
    """Transforma URLs em links HTML clicáveis"""
    if isinstance(url, str) and url.startswith('http'):
        return f'<a href="{url}" target="_blank">{url}</a>'
    return url
Reference: app.py:26-30
url
string
Value from OS Digital column
return
string
HTML anchor tag if URL detected, otherwise original value

processar_arquivo()

Core file processing function that reads and filters data.
def processar_arquivo(file, palavras_chave):
    """Processa o arquivo CSV ou Excel e retorna os dados filtrados"""
    filename = file.filename.lower()
    
    # Read file based on extension
    if filename.endswith('.csv'):
        df = pd.read_csv(file, skiprows=5)
    elif filename.endswith(('.xls', '.xlsx')):
        df = pd.read_excel(file, skiprows=5, engine='openpyxl')
    else:
        raise ValueError("Formato de arquivo não suportado. Use .csv, .xls ou .xlsx")

    # Build regex pattern from keywords
    regex_busca = '|'.join(palavras_chave)
    
    # Filter rows containing keywords in Relato column
    coluna_descricao = 'Relato'
    necessidades = df[df[coluna_descricao].astype(str).str.contains(
        regex_busca, case=False, na=False
    )].copy()
    
    # Return original and filtered DataFrames
    colunas_resultado = ['Data', 'Cliente', 'Endereco', 'OS Digital', 'Relato']
    return df, necessidades[colunas_resultado]
Reference: app.py:32-53
file
FileStorage
Uploaded file object from Flask request
palavras_chave
list[string]
Keywords to search for in Relato column
df
DataFrame
Complete original DataFrame with all rows
necessidades
DataFrame
Filtered DataFrame containing only matching rows with selected columns

gerar_estatisticas()

Generates statistics about filtering results.
def gerar_estatisticas(df_original, df_filtrado, palavras_chave):
    """Gera estatísticas simples dos dados"""
    total = len(df_original)
    filtrados = len(df_filtrado)
    
    stats = {
        'total': total,
        'filtrados': filtrados,
        'percentual': round((filtrados/total)*100, 1) if total > 0 else 0,
        'por_palavra': {}
    }
    
    # Count occurrences per keyword
    for palavra in palavras_chave:
        if not df_filtrado.empty:
            count = int(df_filtrado['Relato'].str.contains(
                palavra, case=False, na=False
            ).sum())
            if count > 0:
                stats['por_palavra'][palavra] = count
    
    return stats
Reference: app.py:55-73
stats
object
Statistics object with total, filtered count, percentage, and per-keyword breakdown

salvar_historico()

Stores processing history in session.
def salvar_historico(filename, stats):
    """Salva o histórico de processamentos na sessão"""
    hist = session.get('historico', [])
    hist.insert(0, {
        'arquivo': filename,
        'data': datetime.now().strftime('%d/%m/%Y %H:%M'),
        'encontrados': stats['filtrados'],
        'total': stats['total']
    })
    session['historico'] = hist[:10]  # Keep only last 10 entries
Reference: app.py:75-84

get_dataframe_from_temp_file()

Retrieves processed results from temporary storage.
def get_dataframe_from_temp_file():
    """Pega o nome do arquivo da sessão e o lê como um DataFrame."""
    temp_filename = session.get('temp_filename')
    if not temp_filename:
        return None
    
    file_path = os.path.join(TEMP_FOLDER, temp_filename)
    
    if os.path.exists(file_path):
        return pd.read_csv(file_path)
    return None
Reference: app.py:87-98
Returns None if temp file is missing or session has expired. Routes must handle this gracefully.

Application Routes

GET / (index)

Home page with file upload form.
@app.route('/')
def index():
    error = request.args.get('error')
    return render_template('index.html', error=error)
Reference: app.py:103-106
template
string
error
string
Error message from query parameter (displayed to user)

GET/POST /config

Keyword configuration interface.
@app.route('/config', methods=['GET', 'POST'])
def config():
    if request.method == 'POST':
        keywords = request.form.get('keywords', '').split(',')
        session['custom_keywords'] = [k.strip() for k in keywords if k.strip()]
        return redirect(url_for('index'))
    
    current_keywords = session.get('custom_keywords', DEFAULT_KEYWORDS)
    return render_template('config.html', keywords=', '.join(current_keywords))
Reference: app.py:108-116 POST Parameters:
keywords
string
Comma-separated list of keywords
GET Response:
template
string
keywords
string
Current keywords joined by commas

GET /historico

Displays processing history.
@app.route('/historico')
def historico():
    hist = session.get('historico', [])
    return render_template('historico.html', historico=hist)
Reference: app.py:118-121
template
string
historico
list[object]
Last 10 processing operations

POST /upload

Handles file upload and processing.
@app.route('/upload', methods=['POST'])
def upload_file():
    # 1. Validate file presence
    if 'arquivo_excel' not in request.files:
        return redirect(url_for('index', error='Nenhum arquivo selecionado'))
    
    file = request.files['arquivo_excel']
    if file.filename == '':
        return redirect(url_for('index', error='Nenhum arquivo selecionado'))

    # 2. Validate file extension
    if not file.filename.lower().endswith(('.csv', '.xls', '.xlsx')):
        return redirect(url_for('index', 
            error='Por favor, selecione um arquivo .csv, .xls ou .xlsx'))

    try:
        # 3. Get keywords from session
        palavras_chave = session.get('custom_keywords', DEFAULT_KEYWORDS)
        
        # 4. Process file
        df_original, resultado_final = processar_arquivo(file, palavras_chave)
        stats = gerar_estatisticas(df_original, resultado_final, palavras_chave)
        
        # 5. Save to temporary file
        temp_filename = f"{uuid.uuid4().hex}.csv"
        file_path = os.path.join(TEMP_FOLDER, temp_filename)
        resultado_final.to_csv(file_path, index=False)
        
        # 6. Store filename in session
        session['temp_filename'] = temp_filename
        session['last_stats'] = stats
        salvar_historico(file.filename, stats)
        
        # 7. Prepare HTML table for display
        resultado_para_exibicao = resultado_final.copy()
        if not resultado_final.empty:
            resultado_para_exibicao['OS Digital'] = \
                resultado_para_exibicao['OS Digital'].apply(criar_links)
        
        tabela_html = resultado_para_exibicao.to_html(
            classes="table table-striped table-hover",
            table_id="tabela-resultados",
            index=False,
            justify="left",
            border=0,
            escape=False
        )
        
        # 8. Render results
        return render_template('resultado.html', 
                             table=tabela_html, 
                             has_results=not resultado_final.empty,
                             stats=stats,
                             palavras_utilizadas=palavras_chave)
        
    except Exception as e:
        print(f"Ocorreu um erro: {e}")
        return redirect(url_for('index', error=f'Erro ao processar o arquivo: {str(e)}'))
Reference: app.py:123-176 POST Parameters:
arquivo_excel
file
required
File upload field (accepts CSV, XLS, XLSX)
Success Response:
template
string
table
string
HTML table string with Bootstrap classes
has_results
boolean
True if filtered results exist
stats
object
Statistics dictionary
palavras_utilizadas
list[string]
Keywords used for filtering

GET /download/excel

Exports filtered results as Excel file.
@app.route('/download/excel')
def download_excel():
    # 1. Retrieve data from temp file
    df = get_dataframe_from_temp_file()
    if df is None:
        return redirect(url_for('index', 
            error="Os resultados expiraram. Por favor, processe o arquivo novamente."))
    
    stats = session.get('last_stats', {})
    
    # 2. Create Excel file in memory
    output = io.BytesIO()
    with pd.ExcelWriter(output, engine='openpyxl') as writer:
        # Sheet 1: Filtered tasks
        df.to_excel(writer, index=False, sheet_name='Tarefas Encontradas')
        
        # Sheet 2: Statistics
        stats_df = pd.DataFrame([
            ['Total de Registros', stats.get('total', 'N/A')],
            ['Tarefas Encontradas', stats.get('filtrados', 'N/A')],
            ['Taxa de Ocorrência (%)', stats.get('percentual', 'N/A')],
            ['Data de Geração', datetime.now().strftime('%d/%m/%Y %H:%M')]
        ], columns=['Métrica', 'Valor'])
        stats_df.to_excel(writer, index=False, sheet_name='Estatísticas')
    
    output.seek(0)
    
    # 3. Return as downloadable file
    return Response(
        output,
        mimetype="application/vnd.openxmlformats-officedocument.spreadsheet.sheet",
        headers={"Content-Disposition": "attachment;filename=relatorio_filtrado.xlsx"}
    )
Reference: app.py:178-204
mimetype
string
application/vnd.openxmlformats-officedocument.spreadsheet.sheet
filename
string

GET /download/pdf

Exports filtered results as PDF report.
@app.route('/download/pdf')
def download_pdf():
    # 1. Retrieve data from temp file
    df = get_dataframe_from_temp_file()
    if df is None:
        return redirect(url_for('index', 
            error="Os resultados expiraram. Por favor, processe o arquivo novamente."))
    
    stats = session.get('last_stats', {})
    palavras_utilizadas = session.get('custom_keywords', [])
    
    # 2. Prepare data with clickable links
    df_render = df.copy()
    if not df.empty:
        df_render['OS Digital'] = df_render['OS Digital'].apply(criar_links)

    # 3. Generate HTML table
    tabela_html = df_render.to_html(
        index=False, escape=False, classes="tabela-pdf"
    ) if not df.empty else "<p>Nenhuma tarefa encontrada.</p>"
    
    # 4. Build complete HTML document with styling
    full_html = f"""
    <html>
      <head>
        <meta charset="utf-8">
        <style>
          body {{ font-family: Arial, sans-serif; margin: 20px; color: #333; }}
          h1 {{ color: #0056b3; }}
          .header {{ padding: 20px; margin-bottom: 20px; 
                     border-radius: 5px; background-color: #f8f9fa; }}
          .stats {{ display: flex; justify-content: space-around; 
                    margin: 20px 0; background-color: #e9ecef; 
                    padding: 15px; border-radius: 5px; }}
          .stat-box {{ text-align: center; }}
          .stat-number {{ font-size: 24px; font-weight: bold; color: #007bff; }}
          .palavras-chave {{ background-color: #fff3cd; padding: 15px; 
                             margin: 20px 0; border-radius: 5px; 
                             border-left: 4px solid #ffc107; }}
          .tabela-pdf {{ border-collapse: collapse; width: 100%; font-size: 10px; }}
          .tabela-pdf th, .tabela-pdf td {{ text-align: left; padding: 6px; 
                                            border: 1px solid #ddd; }}
          .tabela-pdf th {{ background-color: #f2f2f2; }}
          a {{ color: #007bff; text-decoration: none; }}
        </style>
      </head>
      <body>
        <div class="header">
          <h1>Relatório de Tarefas com Ação Necessária</h1>
          <p><strong>Gerado em:</strong> 
             {datetime.now().strftime('%d/%m/%Y às %H:%M')}</p>
        </div>
        <div class="stats">
          <div class="stat-box">
            <div class="stat-number">{stats.get('total', 'N/A')}</div>
            <div>Total de Registros</div>
          </div>
          <div class="stat-box">
            <div class="stat-number">{stats.get('filtrados', 'N/A')}</div>
            <div>Tarefas Encontradas</div>
          </div>
          <div class="stat-box">
            <div class="stat-number">{stats.get('percentual', 'N/A')}%</div>
            <div>Taxa de Ocorrência</div>
          </div>
        </div>
        <div class="palavras-chave">
          <h3>Palavras-chave utilizadas:</h3>
          <p>{', '.join(palavras_utilizadas)}</p>
        </div>
        {tabela_html}
      </body>
    </html>
    """
    
    # 5. Convert HTML to PDF using WeasyPrint
    pdf = HTML(string=full_html).write_pdf()

    # 6. Return as downloadable file
    return Response(
        pdf,
        mimetype="application/pdf",
        headers={"Content-Disposition": "attachment;filename=relatorio_filtrado.pdf"}
    )
Reference: app.py:206-263
mimetype
string
filename
string

File Processing Flow

┌─────────────────────────────────────────────────────────────┐
│ 1. User uploads CSV/Excel file via web form                │
│    POST /upload with multipart/form-data                   │
└────────────────────┬────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────┐
│ 2. File validation                                          │
│    - Check file exists                                      │
│    - Validate extension (.csv, .xls, .xlsx)                │
└────────────────────┬────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────┐
│ 3. processar_arquivo(file, keywords)                        │
│    - Read file with pandas (skiprows=5)                    │
│    - Build regex pattern from keywords                     │
│    - Filter rows where Relato matches pattern              │
│    - Extract specific columns                              │
└────────────────────┬────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────┐
│ 4. gerar_estatisticas(original_df, filtered_df, keywords)  │
│    - Count total and filtered rows                         │
│    - Calculate percentage                                  │
│    - Count matches per keyword                             │
└────────────────────┬────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────┐
│ 5. Save results to temporary file                          │
│    - Generate UUID filename                                │
│    - Save filtered DataFrame as CSV                        │
│    - Store filename in session                             │
└────────────────────┬────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────┐
│ 6. Display results                                          │
│    - Convert OS Digital URLs to links                      │
│    - Generate HTML table                                   │
│    - Render resultado.html template                        │
└────────────────────┬────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────┐
│ 7. Optional: Export results                                 │
│    - GET /download/excel → XLSX with 2 sheets              │
│    - GET /download/pdf → Styled PDF report                 │
│    Both read from temp file using session filename         │
└─────────────────────────────────────────────────────────────┘

Temporary File Storage Architecture

Storage Strategy

Instead of storing DataFrames in session (which has size limits), the application uses a file-based approach:
  1. Generate unique filename using UUID
  2. Save filtered results as CSV to temp/ directory
  3. Store only filename in session (lightweight)
  4. Read from file when needed for exports

Benefits

Session Size

Session contains only a small filename string instead of large data

Memory Efficiency

Data is on disk, not in memory, reducing RAM usage

Consistency

Same data used for Excel and PDF exports

Security

UUID filenames prevent enumeration attacks

Limitations

No Automatic CleanupTemporary files persist indefinitely. Implement a cleanup strategy:
  • Cron job to delete files older than X hours
  • Application startup cleanup routine
  • Periodic manual maintenance
Session DependencyIf session expires or is cleared, the reference to temp files is lost, even though files still exist on disk.

Error Handling

File Processing Errors

try:
    df_original, resultado_final = processar_arquivo(file, palavras_chave)
    # ... processing ...
except Exception as e:
    print(f"Ocorreu um erro: {e}")  # Server-side logging
    return redirect(url_for('index', 
        error=f'Erro ao processar o arquivo: {str(e)}'))
Reference: app.py:173-176

Missing Temp File Errors

df = get_dataframe_from_temp_file()
if df is None:
    return redirect(url_for('index', 
        error="Os resultados expiraram. Por favor, processe o arquivo novamente.")
Reference: app.py:182-183, 209-211

Application Entry Point

if __name__ == '__main__':
    app.run(debug=True)
Reference: app.py:265-266
Debug Mode in Productiondebug=True should NEVER be used in production. Set debug=False or omit it entirely for production deployment.

Production Deployment Considerations

WSGI Server

Use Gunicorn or uWSGI instead of Flask’s built-in server
gunicorn -w 4 -b 0.0.0.0:8000 app:app

Environment Variables

  • Set SECRET_KEY via environment, not .env file
  • Use strong, random key in production
  • Never commit secrets to version control

File Cleanup

Implement scheduled cleanup of temp/ directory
# Crontab example: delete files older than 6 hours
0 * * * * find /path/to/temp -type f -mmin +360 -delete

Session Storage

Consider Redis or database-backed sessions for:
  • Multiple application instances
  • Load balancing scenarios
  • Persistent sessions across restarts

Security Headers

Add security middleware for production:
  • HTTPS enforcement
  • CSRF protection
  • Content Security Policy
  • Rate limiting on /upload

Build docs developers (and LLMs) love