Architecture - Extrator de Tarefas Auvo

Application Overview

Extrator de Tarefas Auvo is a Flask-based web application that processes task reports from the Auvo system, filters them by keywords, and generates downloadable reports.

Technology Stack

Backend Framework

Flask 3.1.2 - Lightweight WSGI web application framework

Data Processing

Pandas 2.3.2 - DataFrame manipulation and CSV/Excel I/O

PDF Generation

WeasyPrint 66.0 - HTML to PDF conversion engine

Excel Processing

openpyxl 3.1.5 - Excel file reading and writing

Reference: requirements.txt:8,14-15,27

Application Structure

extrator-tarefas-auvo/
├── app.py                 # Main application file
├── requirements.txt       # Python dependencies
├── .env                   # Environment variables (not in repo)
├── temp/                  # Temporary file storage (auto-created)
├── templates/             # Jinja2 HTML templates
│   ├── index.html        # Home/upload page
│   ├── config.html       # Keyword configuration
│   ├── historico.html    # Processing history
│   └── resultado.html    # Results display
└── static/               # CSS, JS, images (if any)

Application Initialization

from flask import Flask
import os
from dotenv import load_dotenv

# Load environment variables from .env
load_dotenv()

# Initialize Flask application
app = Flask(__name__)
app.secret_key = os.getenv('SECRET_KEY')

# Configure temporary file storage
TEMP_FOLDER = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'temp')
if not os.path.exists(TEMP_FOLDER):
    os.makedirs(TEMP_FOLDER)

Reference: app.py:1-21

Core Helper Functions

criar_links()

Converts URLs to clickable HTML links for display.

def criar_links(url):
    """Transforma URLs em links HTML clicáveis"""
    if isinstance(url, str) and url.startswith('http'):
        return f'<a href="{url}" target="_blank">{url}</a>'
    return url

Reference: app.py:26-30

url

string

Value from OS Digital column

return

string

HTML anchor tag if URL detected, otherwise original value

processar_arquivo()

Core file processing function that reads and filters data.

def processar_arquivo(file, palavras_chave):
    """Processa o arquivo CSV ou Excel e retorna os dados filtrados"""
    filename = file.filename.lower()
    
    # Read file based on extension
    if filename.endswith('.csv'):
        df = pd.read_csv(file, skiprows=5)
    elif filename.endswith(('.xls', '.xlsx')):
        df = pd.read_excel(file, skiprows=5, engine='openpyxl')
    else:
        raise ValueError("Formato de arquivo não suportado. Use .csv, .xls ou .xlsx")

    # Build regex pattern from keywords
    regex_busca = '|'.join(palavras_chave)
    
    # Filter rows containing keywords in Relato column
    coluna_descricao = 'Relato'
    necessidades = df[df[coluna_descricao].astype(str).str.contains(
        regex_busca, case=False, na=False
    )].copy()
    
    # Return original and filtered DataFrames
    colunas_resultado = ['Data', 'Cliente', 'Endereco', 'OS Digital', 'Relato']
    return df, necessidades[colunas_resultado]

Reference: app.py:32-53

file

FileStorage

Uploaded file object from Flask request

palavras_chave

list[string]

Keywords to search for in Relato column

DataFrame

Complete original DataFrame with all rows

necessidades

DataFrame

Filtered DataFrame containing only matching rows with selected columns

gerar_estatisticas()

Generates statistics about filtering results.

def gerar_estatisticas(df_original, df_filtrado, palavras_chave):
    """Gera estatísticas simples dos dados"""
    total = len(df_original)
    filtrados = len(df_filtrado)
    
    stats = {
        'total': total,
        'filtrados': filtrados,
        'percentual': round((filtrados/total)*100, 1) if total > 0 else 0,
        'por_palavra': {}
    }
    
    # Count occurrences per keyword
    for palavra in palavras_chave:
        if not df_filtrado.empty:
            count = int(df_filtrado['Relato'].str.contains(
                palavra, case=False, na=False
            ).sum())
            if count > 0:
                stats['por_palavra'][palavra] = count
    
    return stats

Reference: app.py:55-73

stats

object

Statistics object with total, filtered count, percentage, and per-keyword breakdown

salvar_historico()

Stores processing history in session.

def salvar_historico(filename, stats):
    """Salva o histórico de processamentos na sessão"""
    hist = session.get('historico', [])
    hist.insert(0, {
        'arquivo': filename,
        'data': datetime.now().strftime('%d/%m/%Y %H:%M'),
        'encontrados': stats['filtrados'],
        'total': stats['total']
    })
    session['historico'] = hist[:10]  # Keep only last 10 entries

Reference: app.py:75-84

get_dataframe_from_temp_file()

Retrieves processed results from temporary storage.

def get_dataframe_from_temp_file():
    """Pega o nome do arquivo da sessão e o lê como um DataFrame."""
    temp_filename = session.get('temp_filename')
    if not temp_filename:
        return None
    
    file_path = os.path.join(TEMP_FOLDER, temp_filename)
    
    if os.path.exists(file_path):
        return pd.read_csv(file_path)
    return None

Reference: app.py:87-98

Returns None if temp file is missing or session has expired. Routes must handle this gracefully.

Application Routes

GET / (index)

Home page with file upload form.

@app.route('/')
def index():
    error = request.args.get('error')
    return render_template('index.html', error=error)

Reference: app.py:103-106

template

string

error

string

Error message from query parameter (displayed to user)

GET/POST /config

Keyword configuration interface.

@app.route('/config', methods=['GET', 'POST'])
def config():
    if request.method == 'POST':
        keywords = request.form.get('keywords', '').split(',')
        session['custom_keywords'] = [k.strip() for k in keywords if k.strip()]
        return redirect(url_for('index'))
    
    current_keywords = session.get('custom_keywords', DEFAULT_KEYWORDS)
    return render_template('config.html', keywords=', '.join(current_keywords))

Reference: app.py:108-116 POST Parameters:

keywords

string

Comma-separated list of keywords

GET Response:

template

string

keywords

string

Current keywords joined by commas

GET /historico

Displays processing history.

@app.route('/historico')
def historico():
    hist = session.get('historico', [])
    return render_template('historico.html', historico=hist)

Reference: app.py:118-121

template

string

historico

list[object]

Last 10 processing operations

POST /upload

Handles file upload and processing.

@app.route('/upload', methods=['POST'])
def upload_file():
    # 1. Validate file presence
    if 'arquivo_excel' not in request.files:
        return redirect(url_for('index', error='Nenhum arquivo selecionado'))
    
    file = request.files['arquivo_excel']
    if file.filename == '':
        return redirect(url_for('index', error='Nenhum arquivo selecionado'))

    # 2. Validate file extension
    if not file.filename.lower().endswith(('.csv', '.xls', '.xlsx')):
        return redirect(url_for('index', 
            error='Por favor, selecione um arquivo .csv, .xls ou .xlsx'))

    try:
        # 3. Get keywords from session
        palavras_chave = session.get('custom_keywords', DEFAULT_KEYWORDS)
        
        # 4. Process file
        df_original, resultado_final = processar_arquivo(file, palavras_chave)
        stats = gerar_estatisticas(df_original, resultado_final, palavras_chave)
        
        # 5. Save to temporary file
        temp_filename = f"{uuid.uuid4().hex}.csv"
        file_path = os.path.join(TEMP_FOLDER, temp_filename)
        resultado_final.to_csv(file_path, index=False)
        
        # 6. Store filename in session
        session['temp_filename'] = temp_filename
        session['last_stats'] = stats
        salvar_historico(file.filename, stats)
        
        # 7. Prepare HTML table for display
        resultado_para_exibicao = resultado_final.copy()
        if not resultado_final.empty:
            resultado_para_exibicao['OS Digital'] = \
                resultado_para_exibicao['OS Digital'].apply(criar_links)
        
        tabela_html = resultado_para_exibicao.to_html(
            classes="table table-striped table-hover",
            table_id="tabela-resultados",
            index=False,
            justify="left",
            border=0,
            escape=False
        )
        
        # 8. Render results
        return render_template('resultado.html', 
                             table=tabela_html, 
                             has_results=not resultado_final.empty,
                             stats=stats,
                             palavras_utilizadas=palavras_chave)
        
    except Exception as e:
        print(f"Ocorreu um erro: {e}")
        return redirect(url_for('index', error=f'Erro ao processar o arquivo: {str(e)}'))

Reference: app.py:123-176 POST Parameters:

arquivo_excel

file

required

File upload field (accepts CSV, XLS, XLSX)

Success Response:

template

string

table

string

HTML table string with Bootstrap classes

has_results

boolean

True if filtered results exist

stats

object

Statistics dictionary

palavras_utilizadas

list[string]

Keywords used for filtering

GET /download/excel

Exports filtered results as Excel file.

@app.route('/download/excel')
def download_excel():
    # 1. Retrieve data from temp file
    df = get_dataframe_from_temp_file()
    if df is None:
        return redirect(url_for('index', 
            error="Os resultados expiraram. Por favor, processe o arquivo novamente."))
    
    stats = session.get('last_stats', {})
    
    # 2. Create Excel file in memory
    output = io.BytesIO()
    with pd.ExcelWriter(output, engine='openpyxl') as writer:
        # Sheet 1: Filtered tasks
        df.to_excel(writer, index=False, sheet_name='Tarefas Encontradas')
        
        # Sheet 2: Statistics
        stats_df = pd.DataFrame([
            ['Total de Registros', stats.get('total', 'N/A')],
            ['Tarefas Encontradas', stats.get('filtrados', 'N/A')],
            ['Taxa de Ocorrência (%)', stats.get('percentual', 'N/A')],
            ['Data de Geração', datetime.now().strftime('%d/%m/%Y %H:%M')]
        ], columns=['Métrica', 'Valor'])
        stats_df.to_excel(writer, index=False, sheet_name='Estatísticas')
    
    output.seek(0)
    
    # 3. Return as downloadable file
    return Response(
        output,
        mimetype="application/vnd.openxmlformats-officedocument.spreadsheet.sheet",
        headers={"Content-Disposition": "attachment;filename=relatorio_filtrado.xlsx"}
    )

Reference: app.py:178-204

mimetype

string

application/vnd.openxmlformats-officedocument.spreadsheet.sheet

filename

string

GET /download/pdf

Exports filtered results as PDF report.

@app.route('/download/pdf')
def download_pdf():
    # 1. Retrieve data from temp file
    df = get_dataframe_from_temp_file()
    if df is None:
        return redirect(url_for('index', 
            error="Os resultados expiraram. Por favor, processe o arquivo novamente."))
    
    stats = session.get('last_stats', {})
    palavras_utilizadas = session.get('custom_keywords', [])
    
    # 2. Prepare data with clickable links
    df_render = df.copy()
    if not df.empty:
        df_render['OS Digital'] = df_render['OS Digital'].apply(criar_links)

    # 3. Generate HTML table
    tabela_html = df_render.to_html(
        index=False, escape=False, classes="tabela-pdf"
    ) if not df.empty else "<p>Nenhuma tarefa encontrada.</p>"
    
    # 4. Build complete HTML document with styling
    full_html = f"""
    <html>
      <head>
        <meta charset="utf-8">
        <style>
          body {{ font-family: Arial, sans-serif; margin: 20px; color: #333; }}
          h1 {{ color: #0056b3; }}
          .header {{ padding: 20px; margin-bottom: 20px; 
                     border-radius: 5px; background-color: #f8f9fa; }}
          .stats {{ display: flex; justify-content: space-around; 
                    margin: 20px 0; background-color: #e9ecef; 
                    padding: 15px; border-radius: 5px; }}
          .stat-box {{ text-align: center; }}
          .stat-number {{ font-size: 24px; font-weight: bold; color: #007bff; }}
          .palavras-chave {{ background-color: #fff3cd; padding: 15px; 
                             margin: 20px 0; border-radius: 5px; 
                             border-left: 4px solid #ffc107; }}
          .tabela-pdf {{ border-collapse: collapse; width: 100%; font-size: 10px; }}
          .tabela-pdf th, .tabela-pdf td {{ text-align: left; padding: 6px; 
                                            border: 1px solid #ddd; }}
          .tabela-pdf th {{ background-color: #f2f2f2; }}
          a {{ color: #007bff; text-decoration: none; }}
        </style>
      </head>
      <body>
        <div class="header">
          <h1>Relatório de Tarefas com Ação Necessária</h1>
          <p><strong>Gerado em:</strong> 
             {datetime.now().strftime('%d/%m/%Y às %H:%M')}</p>
        </div>
        <div class="stats">
          <div class="stat-box">
            <div class="stat-number">{stats.get('total', 'N/A')}</div>
            <div>Total de Registros</div>
          </div>
          <div class="stat-box">
            <div class="stat-number">{stats.get('filtrados', 'N/A')}</div>
            <div>Tarefas Encontradas</div>
          </div>
          <div class="stat-box">
            <div class="stat-number">{stats.get('percentual', 'N/A')}%</div>
            <div>Taxa de Ocorrência</div>
          </div>
        </div>
        <div class="palavras-chave">
          <h3>Palavras-chave utilizadas:</h3>
          <p>{', '.join(palavras_utilizadas)}</p>
        </div>
        {tabela_html}
      </body>
    </html>
    """
    
    # 5. Convert HTML to PDF using WeasyPrint
    pdf = HTML(string=full_html).write_pdf()

    # 6. Return as downloadable file
    return Response(
        pdf,
        mimetype="application/pdf",
        headers={"Content-Disposition": "attachment;filename=relatorio_filtrado.pdf"}
    )

Reference: app.py:206-263

mimetype

string

filename

string

File Processing Flow

┌─────────────────────────────────────────────────────────────┐
│ 1. User uploads CSV/Excel file via web form                │
│    POST /upload with multipart/form-data                   │
└────────────────────┬────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────────┐
│ 2. File validation                                          │
│    - Check file exists                                      │
│    - Validate extension (.csv, .xls, .xlsx)                │
└────────────────────┬────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────────┐
│ 3. processar_arquivo(file, keywords)                        │
│    - Read file with pandas (skiprows=5)                    │
│    - Build regex pattern from keywords                     │
│    - Filter rows where Relato matches pattern              │
│    - Extract specific columns                              │
└────────────────────┬────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────────┐
│ 4. gerar_estatisticas(original_df, filtered_df, keywords)  │
│    - Count total and filtered rows                         │
│    - Calculate percentage                                  │
│    - Count matches per keyword                             │
└────────────────────┬────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────────┐
│ 5. Save results to temporary file                          │
│    - Generate UUID filename                                │
│    - Save filtered DataFrame as CSV                        │
│    - Store filename in session                             │
└────────────────────┬────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────────┐
│ 6. Display results                                          │
│    - Convert OS Digital URLs to links                      │
│    - Generate HTML table                                   │
│    - Render resultado.html template                        │
└────────────────────┬────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────────┐
│ 7. Optional: Export results                                 │
│    - GET /download/excel → XLSX with 2 sheets              │
│    - GET /download/pdf → Styled PDF report                 │
│    Both read from temp file using session filename         │
└─────────────────────────────────────────────────────────────┘

Temporary File Storage Architecture

Storage Strategy

Instead of storing DataFrames in session (which has size limits), the application uses a file-based approach:

Generate unique filename using UUID
Save filtered results as CSV to temp/ directory
Store only filename in session (lightweight)
Read from file when needed for exports

Benefits

Session Size

Session contains only a small filename string instead of large data

Memory Efficiency

Data is on disk, not in memory, reducing RAM usage

Consistency

Same data used for Excel and PDF exports

Security

UUID filenames prevent enumeration attacks

Limitations

No Automatic CleanupTemporary files persist indefinitely. Implement a cleanup strategy:

Cron job to delete files older than X hours
Application startup cleanup routine
Periodic manual maintenance

Session DependencyIf session expires or is cleared, the reference to temp files is lost, even though files still exist on disk.

Error Handling

File Processing Errors

try:
    df_original, resultado_final = processar_arquivo(file, palavras_chave)
    # ... processing ...
except Exception as e:
    print(f"Ocorreu um erro: {e}")  # Server-side logging
    return redirect(url_for('index', 
        error=f'Erro ao processar o arquivo: {str(e)}'))

Reference: app.py:173-176

Missing Temp File Errors

df = get_dataframe_from_temp_file()
if df is None:
    return redirect(url_for('index', 
        error="Os resultados expiraram. Por favor, processe o arquivo novamente.")

Reference: app.py:182-183, 209-211

Application Entry Point

if __name__ == '__main__':
    app.run(debug=True)

Reference: app.py:265-266

Debug Mode in Productiondebug=True should NEVER be used in production. Set debug=False or omit it entirely for production deployment.

Production Deployment Considerations

WSGI Server

Use Gunicorn or uWSGI instead of Flask’s built-in server

gunicorn -w 4 -b 0.0.0.0:8000 app:app

Environment Variables

Set SECRET_KEY via environment, not .env file
Use strong, random key in production
Never commit secrets to version control

File Cleanup

Implement scheduled cleanup of temp/ directory

# Crontab example: delete files older than 6 hours
0 * * * * find /path/to/temp -type f -mmin +360 -delete

Session Storage

Consider Redis or database-backed sessions for:

Multiple application instances
Load balancing scenarios
Persistent sessions across restarts

Security Headers

Add security middleware for production:

HTTPS enforcement
CSRF protection
Content Security Policy
Rate limiting on /upload

Getting Started

User Guide

Features

Technical Reference

Support

​Application Overview

​Technology Stack

Backend Framework

Data Processing

PDF Generation

Excel Processing

​Application Structure

​Application Initialization

​Core Helper Functions

​criar_links()

​processar_arquivo()

​gerar_estatisticas()

​salvar_historico()

​get_dataframe_from_temp_file()

​Application Routes

​GET / (index)

​GET/POST /config

​GET /historico

​POST /upload

​GET /download/excel

​GET /download/pdf

​File Processing Flow

​Temporary File Storage Architecture

​Storage Strategy

​Benefits

Session Size

Memory Efficiency

Consistency

Security

​Limitations

​Error Handling

​File Processing Errors

​Missing Temp File Errors

​Application Entry Point

​Production Deployment Considerations

WSGI Server

Environment Variables

File Cleanup

Session Storage

Security Headers

Build docs developers (and LLMs) love

Application Overview

Technology Stack

Application Structure

Application Initialization

Core Helper Functions

criar_links()

processar_arquivo()

gerar_estatisticas()

salvar_historico()

get_dataframe_from_temp_file()

Application Routes

GET / (index)

GET/POST /config

GET /historico

POST /upload

GET /download/excel

GET /download/pdf

File Processing Flow

Temporary File Storage Architecture

Storage Strategy

Benefits

Limitations

Error Handling

File Processing Errors

Missing Temp File Errors

Application Entry Point

Production Deployment Considerations