VIGIA integrates with the US Food and Drug Administration (FDA) to retrieve medical device recalls, early alerts, and safety communications from the FDA’s public portal. The system automatically translates content to Spanish and normalizes product information.
    class FDAItem(dict):
        titulo: str                          # Translated title
        medicamento: str                     # Generic type + brand (multiline)
        evento: str                          # Event description / reason
        url: str                             # Source URL
        fecha_publicada: Optional[datetime]  # Publication date (NY timezone)
Example:
    {
        "titulo": "Alerta temprana para el sistema de acceso vascular WATCHMAN",
        "medicamento": "Sistema de acceso\nWATCHMAN",
        "evento": "El dispositivo puede desprenderse durante el procedimiento, causando complicaciones vasculares graves.",
        "url": "https://www.fda.gov/medical-devices/medical-device-recalls/...",
        "fecha_publicada": datetime(2024, 8, 5, 12, 0, tzinfo=ZoneInfo('America/New_York'))
    }
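In the example above, `medicamento` carries the generic device type and the brand on separate lines. A minimal sketch of how such a value could be assembled follows; `compose_medicamento` is a hypothetical helper introduced for illustration, not a function from the source.

```python
def compose_medicamento(generic: str, brand: str = "") -> str:
    """Join generic device type and brand with a newline, skipping empty parts.

    Hypothetical helper for illustration; the real pipeline may differ.
    """
    parts = [p.strip() for p in (generic, brand) if p and p.strip()]
    return "\n".join(parts)


print(repr(compose_medicamento("Sistema de acceso", "WATCHMAN")))
print(repr(compose_medicamento("Catéter")))
```

Skipping empty parts keeps the field to a single line when no brand could be identified.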
    def scrape_fda(url: str) -> List[FDAItem]:
        """
        Entry point for FDA scraping.
        - If URL is index page: collects and parses up to 60 detail pages
        - If URL is detail page: parses single item
        """
        path = re.sub(r"^https?://[^/]+", "", (url or "").strip())
        items: List[FDAItem] = []

        if INDEX_PATH_RE.match(path):
            detail_links = _collect_detail_links(url, limit=INDEX_LIMIT)
            for href in detail_links:
                it = _parse_detail(href)
                if it:
                    items.append(it)
            return items

        if DETAIL_PATH_RE.match(path):
            it = _parse_detail(url)
            return [it] if it else []

        # Fallback: try as detail
        it = _parse_detail(url)
        return [it] if it else []
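The routing step in `scrape_fda` depends on `INDEX_PATH_RE` and `DETAIL_PATH_RE`, whose definitions are not shown here. The sketch below illustrates the same path-based dispatch with assumed patterns. Both regular expressions and the `classify_fda_url` helper are hypothetical, and the detail slug in the usage lines is a placeholder.

```python
import re

# Hypothetical patterns, assumed for illustration only; the module's real
# INDEX_PATH_RE and DETAIL_PATH_RE are not shown in this section.
INDEX_PATH_RE = re.compile(r"^/medical-devices/medical-device-recalls/?$")
DETAIL_PATH_RE = re.compile(r"^/medical-devices/medical-device-recalls/[^/]+/?$")


def classify_fda_url(url: str) -> str:
    """Return 'index', 'detail', or 'unknown' for an FDA URL."""
    # Same normalization as scrape_fda: drop scheme and host, keep the path.
    path = re.sub(r"^https?://[^/]+", "", (url or "").strip())
    if INDEX_PATH_RE.match(path):
        return "index"
    if DETAIL_PATH_RE.match(path):
        return "detail"
    return "unknown"


print(classify_fda_url("https://www.fda.gov/medical-devices/medical-device-recalls"))
print(classify_fda_url("https://www.fda.gov/medical-devices/medical-device-recalls/placeholder-slug"))
```

In `scrape_fda` the "unknown" case is not rejected: the URL is still handed to `_parse_detail` as a fallback.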
    _BRAND_ALLCAPS_RE = re.compile(r"\b([A-Z][A-Z0-9\-]{3,})\b")
    _BRAND_CAMEL_RE = re.compile(r"\b([A-Z][a-zA-Z0-9\-]{3,})\b")
    _BRAND_STOP = {
        "ACCESS", "SYSTEM", "SET", "INFUSION", "STENT", "CATHETER",
        "DEVICE", "PUMP", "VALVE", "SENSOR", "MONITOR", "CIRCUIT"
    }

    def _guess_brand_better(*texts: str) -> str:
        joined = " || ".join([_norm(x or "") for x in texts if x])

        # Try all-caps patterns first (e.g., WATCHMAN, DEXCOM)
        for m in _BRAND_ALLCAPS_RE.finditer(joined):
            c = m.group(1)
            if c.upper() not in _BRAND_STOP:
                return c

        # Then try CamelCase
        for m in _BRAND_CAMEL_RE.finditer(joined):
            c = m.group(1)
            if c.upper() not in _BRAND_STOP:
                return c

        return ""
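To see how the two-pass ordering and the stop list interact, here is a condensed, self-contained sketch of the same heuristic. The name `guess_brand` and the shortened stop list are assumptions made for brevity. It also skips the `_norm` step, whose definition is not shown in this section.

```python
import re

ALLCAPS_RE = re.compile(r"\b([A-Z][A-Z0-9\-]{3,})\b")
CAPITALIZED_RE = re.compile(r"\b([A-Z][a-zA-Z0-9\-]{3,})\b")
STOP = {"ACCESS", "SYSTEM", "CATHETER", "DEVICE", "PUMP"}  # abbreviated stop list


def guess_brand(text: str) -> str:
    """Return the first non-generic token, preferring all-caps matches."""
    for pattern in (ALLCAPS_RE, CAPITALIZED_RE):
        for m in pattern.finditer(text):
            if m.group(1).upper() not in STOP:
                return m.group(1)
    return ""


print(guess_brand("Early Alert: WATCHMAN Access System"))  # WATCHMAN
print(guess_brand("ACCESS SYSTEM from Dexcom"))            # Dexcom
print(guess_brand("Early Alert issued"))                   # Early
```

The last call shows a limitation of the second pass: when no all-caps token is present, the capitalized pattern also matches ordinary words such as `Early`, so results from that pass are best treated as candidates rather than confirmed brands.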
When GEMINI_API_KEY is configured, the system uses Gemini 1.5 Flash to refine extracted data:
    def _llm_refine_fields(raw: Dict[str, Any]) -> Optional[Dict[str, str]]:
        """
        Uses Gemini to normalize fields to Spanish and identify generic products.

        Returns:
        {
            "titulo_es": "Translated and cleaned title",
            "evento_es": "1-3 sentence event summary",
            "producto_generico_es": "Generic device type (brief)",
            "marca_o_linea": "Brand/series if clear",
            "modelo_o_variante": "Model/lot if adds value"
        }
        """
        model = genai.GenerativeModel("gemini-1.5-flash")

        prompt = f"""You are a regulatory analyst preparing health reports in SPANISH.
    You will receive information extracted from an FDA page (recalls/early alerts) and must return
    STRICT JSON with this EXACT form (no extra text):
    {{
      "titulo_es": "...",
      "evento_es": "...",
      "producto_generico_es": "...",
      "marca_o_linea": "...",
      "modelo_o_variante": "..."
    }}

    Rules:
    - "titulo_es": translate and adjust title to Spanish, clear and concise
    - "evento_es": summarize in 1-3 sentences the FIRST PARAGRAPH explaining cause
    - "producto_generico_es": return GENERIC device name in Spanish, brief
      (e.g.: "Sistema de acceso", "Conjunto de infusión", "Stent vascular", "Catéter",
      "Monitor de glucosa (CGM)", "Ingrediente farmacéutico activo (IFA)")
    - "marca_o_linea": if there's a clear brand/series, indicate it; otherwise ""
    - "modelo_o_variante": lot/model/variant if adds value; otherwise ""
    - Do not invent data. If something is unclear, put "" in that field.
    - Respond ONLY the JSON, no comments.

    Data:
    - title_en: {json.dumps(raw.get("title_en"))}
    - reason_paragraph_en: {json.dumps(raw.get("reason_en"))}
    - product_candidates: {json.dumps(raw.get("product_candidates"), ensure_ascii=False)}
    - page_excerpt_en: {json.dumps(raw.get("page_excerpt"))}
    """.strip()

        resp = model.generate_content(prompt)
        txt = (resp.text or "").strip()

        # Extract JSON from response
        start = txt.find("{")
        end = txt.rfind("}")
        if start >= 0 and end > start:
            txt = txt[start:end+1]

        data = json.loads(txt)
        return data
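The function is declared as returning `Optional[Dict[str, str]]`, but as shown, `json.loads` raises when the model's reply contains no valid JSON object. One possible way to make the parsing step fail soft is sketched below. `parse_llm_json` and `EXPECTED_KEYS` are hypothetical names introduced here, not part of the source.

```python
import json
from typing import Dict, Optional

EXPECTED_KEYS = (
    "titulo_es",
    "evento_es",
    "producto_generico_es",
    "marca_o_linea",
    "modelo_o_variante",
)


def parse_llm_json(txt: str) -> Optional[Dict[str, str]]:
    """Extract the outermost {...} span and return the five expected fields.

    Returns None when no JSON object can be decoded.
    """
    start, end = txt.find("{"), txt.rfind("}")
    if start < 0 or end <= start:
        return None
    try:
        data = json.loads(txt[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    # Missing or non-string values become "", matching the prompt's convention.
    return {
        k: data[k].strip() if isinstance(data.get(k), str) else ""
        for k in EXPECTED_KEYS
    }


reply = 'Here is the JSON: {"titulo_es": "Alerta temprana", "marca_o_linea": "WATCHMAN"}'
print(parse_llm_json(reply))
print(parse_llm_json("no JSON in this reply"))
```

Returning `None` lets the caller fall back to the heuristically extracted fields instead of aborting the scrape for that item.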
q (required): Search term (name/IFA) in Spanish or English
max_results (optional): Maximum results (1-25, default: 10)
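The parameter constraints above can be checked before any search runs. The following is a minimal sketch under stated assumptions: `validate_search_params` is a hypothetical helper, and whether the real endpoint rejects or clamps out-of-range values is not stated in this section. The sketch rejects them.

```python
from typing import Optional, Tuple


def validate_search_params(
    q: Optional[str], max_results: Optional[int] = None
) -> Tuple[str, int]:
    """Validate q (required) and max_results (1-25, default 10)."""
    term = (q or "").strip()
    if not term:
        raise ValueError("q is required")
    if max_results is None:
        return term, 10  # documented default
    if not 1 <= max_results <= 25:
        raise ValueError("max_results must be between 1 and 25")
    return term, max_results


print(validate_search_params(" stent "))
print(validate_search_params("catéter", 25))
```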
Response:
    [
      {
        "titulo": "Alerta temprana para el sistema de acceso vascular WATCHMAN",
        "medicamento": "Sistema de acceso\nWATCHMAN",
        "evento": "El dispositivo puede desprenderse durante el procedimiento, causando complicaciones vasculares graves.",
        "url": "https://www.fda.gov/medical-devices/medical-device-recalls/...",
        "fecha_publicada": "2024-08-05T16:00:00Z"
      }
    ]
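In the response, `fecha_publicada` appears as a UTC timestamp, while `FDAItem` stores it in New York time. A short sketch of that conversion follows, assuming a hypothetical `to_utc_z` helper; the actual serialization code is not shown in this section.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def to_utc_z(dt: datetime) -> str:
    """Format an aware datetime as an ISO 8601 UTC string with a 'Z' suffix."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Noon in New York on 5 August 2024 falls in daylight time (UTC-4).
ny_noon = datetime(2024, 8, 5, 12, 0, tzinfo=ZoneInfo("America/New_York"))
print(to_utc_z(ny_noon))  # 2024-08-05T16:00:00Z
```

Rejecting naive datetimes avoids silently interpreting them in the server's local timezone.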