Cause: Backend cannot reach InfluxDB (wrong URL or network issue).
Solution:
Verify INFLUX_URL matches your InfluxDB Cloud region:
    # US East
    INFLUX_URL=https://us-east-1-1.aws.cloud2.influxdata.com
    # US West
    INFLUX_URL=https://us-west-2-1.aws.cloud2.influxdata.com
    # EU Central
    INFLUX_URL=https://eu-central-1-1.aws.cloud2.influxdata.com
Test connectivity:
curl -I $INFLUX_URL/health
Check firewall/VPN settings blocking port 443
Error: Query returned 0 results despite data existing
Symptom:
Expected data for Motor-01, got 0 results
Cause: Flux query filter applied before pivot() (see ENGINEERING_LOG Phase 2).
Solution:
WRONG:
Explanation: Tag-based filters must come after pivot() when using pivoted column names.
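The ordering rule can be sketched as a small query builder. This is only an illustration: the bucket, measurement, and column names (`sensors`, `motor`, `asset_id`) are hypothetical, and it assumes the filtered column only exists once pivot() has turned field names into columns.

```python
def build_pivoted_query(bucket, measurement, column, value, start="-1h"):
    """Build a Flux query that filters on a pivoted column AFTER pivot()."""
    return "\n".join([
        f'from(bucket: "{bucket}")',
        f"  |> range(start: {start})",
        # Filtering on _measurement is safe before pivot(): the column already exists.
        f'  |> filter(fn: (r) => r._measurement == "{measurement}")',
        '  |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")',
        # The pivoted column only exists from this point on, so filter here.
        f'  |> filter(fn: (r) => r.{column} == "{value}")',
    ])


# Hypothetical names, for illustration only.
query = build_pivoted_query("sensors", "motor", "asset_id", "Motor-01")
print(query)
```

Placing the second filter before pivot() would reference a column that does not exist yet, which is consistent with the empty result described above.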
Error: Data available but queries return empty
Symptom:
Integration tests fail intermittently with 0 results immediately after writes.
Cause: InfluxDB 2.x has eventual consistency (see ENGINEERING_LOG Phase 2).
Solution:
Add a delay after writes before querying:
    import time

    # Write data
    db.write_sensor_event(...)

    # Wait for data to become queryable
    time.sleep(5)  # Minimum 5 seconds for InfluxDB Cloud

    # Now query
    results = db.query_sensor_data(...)
Best Practice: For production, use write confirmations via InfluxDB’s /write response.
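In tests, a fixed sleep can be replaced by polling until the data shows up or a deadline passes. The helper below is a minimal sketch, not part of the codebase: `wait_for_results` and its parameters are hypothetical, and `query_fn` stands in for whatever callable runs the query.

```python
import time


def wait_for_results(query_fn, timeout_s=10.0, interval_s=0.5,
                     sleep=time.sleep, clock=time.monotonic):
    """Call query_fn until it returns a non-empty result or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        results = query_fn()
        if results:
            return results
        if clock() >= deadline:
            return results  # Still empty: the caller decides how to fail.
        sleep(interval_s)
```

A test would then call something like `wait_for_results(lambda: db.query_sensor_data(...))` and assert on the returned rows, which returns as soon as the write is visible instead of always waiting the full 5 seconds.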
Error: name 'np' is not defined (Type Annotations)
Symptom:
NameError: name 'np' is not defined
Cause: Type annotations are evaluated at import time, but numpy is lazy-loaded (see ENGINEERING_LOG Phase 18).
Solution:
Add this to the top of ML modules:
    from __future__ import annotations  # MUST be first import

    # numpy is imported inside the function, not at module level
    def score(self, X: np.ndarray):  # Annotation is now a string
        import numpy as np  # Lazy import
        # ...
Access to fetch at 'http://localhost:8000/health' from origin 'http://localhost:3001' has been blocked by CORS policy
Cause: Frontend running on alternate port (3001) not in CORS allowed origins (see ENGINEERING_LOG Phase 12).
Solution:
Add the port to backend/api/main.py:
    app.add_middleware(
        CORSMiddleware,
        allow_origins=[
            "http://localhost:3000",
            "http://localhost:3001",  # Add this
            "http://localhost:5173",
            "http://127.0.0.1:3001",  # And this
            # ...
        ],
        allow_methods=["GET", "POST", "PUT", "DELETE", "OPTIONS"],
        allow_headers=["*"],
    )
Restart the backend:
docker-compose restart backend
Error: Method PUT not allowed
Symptom:
405 Method Not Allowed: PUT requests blocked by CORS
Cause: PUT not in allow_methods (see ENGINEERING_LOG Phase 20).
Solution:
Update CORS config:
After 15 minutes of inactivity, first request fails or times out.
Cause: Render free tier spins down containers after inactivity. Cold start takes 30-60 seconds (see ENGINEERING_LOG Phase 18).
Solution:
Option 1: Keep-Alive Heartbeat (Implemented)
The frontend sends a ping every 10 minutes:
Option 2: Upgrade to Render Starter
$7/month removes cold starts and spin-downs.
Option 3: External Keep-Alive Service
Use UptimeRobot (free) to ping /health every 5 minutes.
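If a hosted monitor is not available, the same keep-alive idea can be approximated with a small script run from any scheduler. This is a sketch under assumptions: the `ping` helper is hypothetical and the service URL is a placeholder to replace with your own.

```python
import urllib.request


def ping(url, timeout_s=10.0, opener=urllib.request.urlopen):
    """Return True if the URL answers with a 2xx status, False on any network error."""
    try:
        with opener(url, timeout=timeout_s) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


if __name__ == "__main__":
    # Placeholder URL: substitute the real service address.
    print(ping("https://your-service.example/health"))
```

Scheduling this every 5 minutes keeps the interval below the 15-minute spin-down window described above.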
Error: Container killed during startup
Symptom:
Render logs show:
Starting service...
Importing sklearn...
[KILLED] Out of memory
Cause: Heavy ML imports (sklearn, numpy, pandas) at module level exceed the 512MB RAM limit (see ENGINEERING_LOG Phase 18).
Solution:
Lazy-load ML dependencies:
    # ❌ DON'T: Module-level imports
    import numpy as np
    from sklearn.ensemble import IsolationForest

    class BatchAnomalyDetector:
        def train(self, data):
            # Use np and IsolationForest
            ...

    # ✅ DO: Lazy imports inside functions
    class BatchAnomalyDetector:
        def train(self, data):
            import numpy as np
            from sklearn.ensemble import IsolationForest
            # Now use them
            ...
Also add:
from __future__ import annotations # First line
This defers type annotation evaluation, preventing import-time failures.
Health check timeout
Symptom:
Render dashboard shows “Health check failed” during startup.
Cause: /health endpoint loads heavy ML modules, exceeding health check timeout.
Solution:
Use a lightweight /ping endpoint for health checks:
    @app.get("/ping")
    def ping():
        return {"status": "ok"}  # No DB, no ML imports
Symptom:
Degradation Index (DI) increases during healthy monitoring.
Cause: Self-Harming DI bug: healthy noise accumulates phantom damage (see ENGINEERING_LOG Phase 20).
Solution:
Implement dead-zone in assessor.py:
    HEALTHY_FLOOR = 0.65  # Scores below this = zero damage

    if batch_score < HEALTHY_FLOOR:
        effective_severity = 0.0  # No damage
    else:
        # Remap scores ≥ 0.65 to [0, 1]
        effective_severity = (batch_score - HEALTHY_FLOOR) / (1.0 - HEALTHY_FLOOR)

    # Only effective_severity > 0 accumulates DI
    DI_increment = (effective_severity ** 2) * SENSITIVITY_CONSTANT * dt
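The dead-zone logic is easier to unit-test when wrapped in a pure function. The sketch below mirrors the snippet above; the function name `di_increment` is hypothetical, and the sensitivity value is passed in because the real SENSITIVITY_CONSTANT is not given here.

```python
HEALTHY_FLOOR = 0.65  # Scores below this contribute zero damage


def di_increment(batch_score, dt, sensitivity):
    """Return the DI increment for one step, with a dead-zone below HEALTHY_FLOOR."""
    if batch_score < HEALTHY_FLOOR:
        return 0.0
    # Remap [HEALTHY_FLOOR, 1.0] onto [0.0, 1.0].
    effective_severity = (batch_score - HEALTHY_FLOOR) / (1.0 - HEALTHY_FLOOR)
    return (effective_severity ** 2) * sensitivity * dt


# Illustrative sensitivity only; the production constant may differ.
print(di_increment(0.40, dt=1.0, sensitivity=0.01))  # healthy noise
print(di_increment(0.90, dt=1.0, sensitivity=0.01))  # real degradation
```

Because the severity is squared, scores just above the floor add very little damage, while scores near 1.0 dominate the accumulated DI.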
Jitter faults not detected
Symptom:
Motor with high vibration variance (σ=0.17g) but normal mean (0.15g) shows health=100%.
Cause: Legacy v2 model only sees 1Hz averages, not variance (see ENGINEERING_LOG Phase 15).
Solution:
Ensure batch model (v3) is active:
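To see why a mean-only feature misses this fault, compare two synthetic windows with the same average. The samples below are invented for illustration (they are not recorded sensor data) and are chosen so the jittery window reproduces the mean and σ quoted in the symptom.

```python
import statistics

# Synthetic windows: same mean (0.15 g), very different spread.
steady = [0.15] * 10
jittery = [0.15 - 0.17, 0.15 + 0.17] * 5


def summarize(samples):
    """Return the mean and population standard deviation of one window."""
    return {
        "mean": statistics.fmean(samples),
        "std": statistics.pstdev(samples),
    }


print(summarize(steady))
print(summarize(jittery))
```

A model that only receives the per-second mean sees both windows as identical, so a detector needs the standard deviation (or a similar spread feature) as an input to catch jitter.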
Symptom:
Chart shows a single data point suspended mid-axis, not anchored to the X-axis.
Cause: connectNulls=true connects a single point to empty space (see ENGINEERING_LOG Phase 16).
Solution:
Only render lines when ≥2 points exist:
Symptom:
Y-axis auto-scales to the data range, making a 0.01g vibration change look like a spike.
Cause: Auto-scaling Y-axis domain (see ENGINEERING_LOG Phase 16).
Solution:
Use fixed domains per signal type:
Symptom:
Time axis shows 0-60s and expands to 0-120s instead of sliding.
Cause: domain={['dataMin', 'dataMax']} grows with data (see ENGINEERING_LOG Phase 16).
Solution:
Hard-code a 60s right-anchored window:
    <XAxis
      dataKey="timestamp"
      domain={[Date.now() - 60000, Date.now()]} // Last 60 seconds
      type="number"
      tickFormatter={(ts) => new Date(ts).toLocaleTimeString()}
    />
Symptom:
Downloaded Excel report has a blank Anomaly_Score column.
Cause: Anomaly scores are only computed at ingestion time, not at report generation (see ENGINEERING_LOG Phase 19).
Solution:
Compute range-check scores in generator.py during report creation:
    for row in sensor_data:
        # Check if value exceeds baseline bounds
        v = row["voltage_v"]
        v_min, v_max = baseline["voltage_v"]
        if v < v_min or v > v_max:
            row["anomaly_score"] = min((abs(v - v_min) / v_min), 1.0)
        else:
            row["anomaly_score"] = 0.0
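The loop above measures every violation from v_min, including values above v_max, and divides by v_min without guarding against zero. One possible variant, offered as a sketch rather than the project's actual scoring rule, measures the distance from whichever bound was crossed. The function name `range_check_score` is hypothetical.

```python
def range_check_score(value, lower, upper):
    """Score 0.0 inside [lower, upper]; otherwise relative overshoot, capped at 1.0."""
    if lower <= value <= upper:
        return 0.0
    # Measure from the bound that was actually violated.
    nearest = lower if value < lower else upper
    scale = abs(nearest)
    if scale == 0:
        return 1.0  # Avoid division by zero; treat any violation as maximal.
    return min(abs(value - nearest) / scale, 1.0)


# Example with an assumed 220-240 V baseline.
print(range_check_score(230.0, 220.0, 240.0))
print(range_check_score(264.0, 220.0, 240.0))
```

With this variant a reading 10% above the upper bound and a reading 10% below the lower bound receive the same score.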
Operator logs show test gibberish (asyfkk)
Symptom:
PDF reports include operator log notes like “asyfkk” or “test123456”.
Cause: No validation on operator log input (see ENGINEERING_LOG Phase 19).
Solution:
Sanitize logs in report generators:
    import re

    # Note: this pattern only rejects unexpected characters; purely
    # alphanumeric gibberish such as "asyfkk" still matches it.
    VALID_LOG_PATTERN = re.compile(r"^[a-zA-Z0-9\s.,!?;:'\"\-]+$")

    for log in operator_logs:
        if not VALID_LOG_PATTERN.match(log["description"]):
            log["description"] = "Maintenance event recorded"
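A character whitelist on its own accepts single-token entries such as "asyfkk", because they contain only letters and digits. One way to tighten the check, shown here as a heuristic sketch, is to also require a minimum number of words. The `MIN_WORDS` threshold and the `sanitize_description` helper are assumptions, not part of the existing generators.

```python
import re

VALID_LOG_PATTERN = re.compile(r"^[a-zA-Z0-9\s.,!?;:'\"\-]+$")
MIN_WORDS = 2  # Hypothetical threshold: single-token notes are treated as noise


def sanitize_description(description, fallback="Maintenance event recorded"):
    """Return the description if it looks like a real note, else the fallback."""
    text = description.strip()
    has_valid_chars = bool(VALID_LOG_PATTERN.match(text))
    has_enough_words = len(text.split()) >= MIN_WORDS
    return text if has_valid_chars and has_enough_words else fallback


print(sanitize_description("asyfkk"))
print(sanitize_description("Replaced bearing on Motor-01"))
```

The word-count rule is deliberately crude: it will also replace legitimate one-word notes, so the threshold should be tuned against real operator logs.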
ReportLab error: 'Canvas' object has no attribute 'stroke'
Symptom:
AttributeError: 'Canvas' object has no attribute 'stroke'
Cause: ReportLab API doesn’t have canvas.stroke() (see ENGINEERING_LOG Phase 10).
Solution:
Use drawPath() for arcs:
Warning: INFLUX_TOKEN not found (but it exists in .env)
Symptom:
WARNING: INFLUX_TOKEN environment variable not set
But .env file has INFLUX_TOKEN=...
Cause: Validation checks os.environ instead of the settings object (see ENGINEERING_LOG Phase 20).
Solution:
Check the settings object, not raw env:
    import os

    # ❌ WRONG
    if not os.environ.get("INFLUX_TOKEN"):
        print("WARNING: Token missing")

    # ✅ CORRECT
    from backend.config import settings

    if not settings.influx_token:
        print("WARNING: Token missing")
requirements.txt lists packages not installed
Symptom:
ERROR: Could not find a version that satisfies the requirement xyz==1.2.3
Cause: requirements.txt manually edited with wrong versions.
Solution:
Regenerate from actual environment:
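Before regenerating, it can help to see which pins disagree with what is actually installed. The following is a minimal sketch that handles only simple `name==version` lines; the helper names `parse_pins` and `find_mismatches` are hypothetical.

```python
from importlib import metadata


def parse_pins(text):
    """Extract {package: version} from simple 'name==version' requirement lines."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # Drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins, installed_version=metadata.version):
    """Return {package: (pinned, installed_or_None)} for every pin that differs."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            actual = installed_version(name)
        except metadata.PackageNotFoundError:
            actual = None  # Package is not installed at all
        if actual != pinned:
            mismatches[name] = (pinned, actual)
    return mismatches
```

Calling `find_mismatches(parse_pins(open("requirements.txt").read()))` in the active environment lists each pin whose version is wrong or whose package is missing.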