Overview

This guide covers common issues, error messages, and their solutions, drawn from real engineering problems documented in ENGINEERING_LOG.md.

InfluxDB Connection Issues

Symptom:
ERROR: InfluxDBError: 401 Unauthorized
Cause: Invalid InfluxDB token or expired credentials.
Solution:
  1. Verify token in .env file:
    cat backend/.env | grep INFLUX_TOKEN
    
  2. Generate a new token in InfluxDB Cloud:
    • Go to InfluxDB Cloud
    • Navigate to Data > API Tokens
    • Click Generate API Token → All Access Token
    • Copy and update INFLUX_TOKEN in .env
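  3. Recreate the backend container so it picks up the new token (a plain docker-compose restart does not apply changed environment variables):
    docker-compose up -d --force-recreate backend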
    
Symptom:
ERROR: [Errno 111] Connection refused
Cause: Backend cannot reach InfluxDB (wrong URL or network issue).
Solution:
  1. Verify INFLUX_URL matches your InfluxDB Cloud region:
    # US East
    INFLUX_URL=https://us-east-1-1.aws.cloud2.influxdata.com
    
    # US West
    INFLUX_URL=https://us-west-2-1.aws.cloud2.influxdata.com
    
    # EU Central
    INFLUX_URL=https://eu-central-1-1.aws.cloud2.influxdata.com
    
  2. Test connectivity:
    curl -I $INFLUX_URL/health
    
  3. Check whether a firewall or VPN is blocking outbound HTTPS traffic on port 443.
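Steps 1 and 2 can be scripted. The sketch below is a hypothetical helper (not part of the project) that flags common INFLUX_URL mistakes and probes the /health endpoint using only the standard library. It assumes you are on InfluxDB Cloud, whose hosts end in .cloud2.influxdata.com, and that the client expects the bare base URL without an /api/v2 path.

```python
from typing import List
from urllib.error import HTTPError, URLError
from urllib.parse import urlparse
from urllib.request import urlopen


def validate_influx_url(url: str) -> List[str]:
    """Return a list of likely problems with an InfluxDB Cloud base URL."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme != "https":
        problems.append("scheme should be https (InfluxDB Cloud serves on port 443)")
    host = parsed.hostname or ""
    if not host.endswith(".cloud2.influxdata.com"):
        problems.append(f"host {host!r} is not an InfluxDB Cloud endpoint")
    if parsed.path not in ("", "/"):
        problems.append("use the bare base URL, without a path such as /api/v2")
    return problems


def check_health(url: str, timeout: float = 5.0) -> str:
    """GET <url>/health and summarize the outcome (performs a network call)."""
    try:
        with urlopen(url.rstrip("/") + "/health", timeout=timeout) as resp:
            return f"reachable: HTTP {resp.status}"
    except HTTPError as exc:  # the server answered, but with an error status
        return f"reachable: HTTP {exc.code}"
    except URLError as exc:  # connection refused, DNS failure, TLS problems
        return f"unreachable: {exc.reason}"
    except OSError as exc:  # e.g. a read timeout after the connection opened
        return f"unreachable: {exc}"
```

Run validate_influx_url(os.environ["INFLUX_URL"]) first; only when it returns an empty list is check_health worth calling. Self-hosted InfluxDB OSS URLs such as http://localhost:8086 are flagged by these Cloud-specific checks, which is expected.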
Symptom:
Expected data for Motor-01, got 0 results
Cause: Flux query filter applied before pivot() (see ENGINEERING_LOG Phase 2).
Solution:
WRONG:
from(bucket: "sensor_data")
  |> range(start: -1h)
  |> filter(fn: (r) => r.asset_id == "Motor-01")  // ❌ Before pivot
  |> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")
CORRECT:
from(bucket: "sensor_data")
  |> range(start: -1h)
  |> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")
  |> filter(fn: (r) => r.asset_id == "Motor-01")  // ✅ After pivot
Explanation: A filter that references a pivoted column name must come after pivot(), because that column does not exist on the rows until the pivot has run.
Symptom: Integration tests fail intermittently with 0 results immediately after writes.
Cause: InfluxDB 2.x has eventual consistency (see ENGINEERING_LOG Phase 2).
Solution: Add a delay after writes before querying:
import time

# Write data
db.write_sensor_event(...)

# Wait for data to become queryable
time.sleep(5)  # Minimum 5 seconds for InfluxDB Cloud

# Now query
results = db.query_sensor_data(...)
Best Practice: For production, use write confirmations via InfluxDB’s /write response.
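A fixed sleep is either longer than necessary or, under load, too short. One alternative, sketched here as a hypothetical helper rather than project code, is to poll the query until the expected rows appear or a timeout expires. query_sensor_data comes from the example above; the helper name and its defaults are assumptions.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def wait_for_results(
    query: Callable[[], Sequence[T]],
    timeout_s: float = 30.0,
    interval_s: float = 1.0,
    min_rows: int = 1,
) -> Sequence[T]:
    """Re-run `query` until it returns at least `min_rows` rows or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        rows = query()
        if len(rows) >= min_rows:
            return rows
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"expected >= {min_rows} rows within {timeout_s}s, got {len(rows)}"
            )
        time.sleep(interval_s)
```

In a test this becomes results = wait_for_results(lambda: db.query_sensor_data(...)), which returns as soon as the data is queryable instead of always waiting the full 5 seconds.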

Model Loading Errors

Symptom:
ModuleNotFoundError: No module named 'sklearn'
Cause: scikit-learn is not installed or the virtual environment is not activated.
Solution:
# Activate virtual environment
source venv/bin/activate  # Linux/Mac
.\venv\Scripts\activate   # Windows

# Install dependencies
pip install -r requirements.txt

# Verify installation
python -c "import sklearn; print(sklearn.__version__)"
Symptom:
NameError: name 'np' is not defined
Cause: Type annotations are evaluated when the module is imported, but numpy is lazy-loaded (see ENGINEERING_LOG Phase 18).
Solution: Put the future import at the very top of each ML module and keep numpy imports inside functions:
from __future__ import annotations  # MUST be the first statement in the module

class BatchAnomalyDetector:
    def score(self, X: np.ndarray):  # Annotation is stored as a string, not evaluated at import
        import numpy as np  # Lazy import, runs only when score() is called
        # ...
Why: from __future__ import annotations defers annotation evaluation (PEP 563).
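The effect of PEP 563 is easy to check in isolation. In this sketch np is deliberately never imported at module level, yet the class still defines cleanly because the annotation is kept as the string "np.ndarray". The statistics call is an assumed stand-in for real numpy work, so the example runs without numpy installed.

```python
from __future__ import annotations  # must be the first statement in the module


class BatchAnomalyDetector:
    def score(self, X: np.ndarray) -> float:  # `np` is not defined at module level
        import statistics  # stand-in for a lazy `import numpy as np`

        return statistics.fmean(X)
```

One caveat: anything that resolves annotations at runtime, such as typing.get_type_hints(), will still raise NameError for np here, so the pattern suits internal ML classes whose annotations nothing inspects.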
Symptom:
FileNotFoundError: backend/models/Motor-01_batch_detector_v3.pkl
Cause: The model hasn’t been trained yet, or the file was deleted.
Solution:
  1. Check if models directory exists:
    ls -la backend/models/
    
  2. Calibrate the system to train models:
    curl -X POST http://localhost:8000/system/calibrate \
      -H "Content-Type: application/json" \
      -d '{"asset_id": "Motor-01", "duration_seconds": 300}'
    
  3. Or retrain manually:
    python -m scripts.retrain_batch_model --asset Motor-01 --seconds 300
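To fail with an actionable message instead of a bare FileNotFoundError, the loader could check for the file first. The helper below is hypothetical; it assumes the backend/models/<asset>_batch_detector_v3.pkl naming shown in the error above and points at the calibrate and retrain options from steps 2 and 3.

```python
from pathlib import Path

MODELS_DIR = Path("backend/models")  # assumed location, taken from the error above


def batch_model_path(asset_id: str, models_dir: Path = MODELS_DIR) -> Path:
    """Build the expected v3 batch model path for an asset."""
    return models_dir / f"{asset_id}_batch_detector_v3.pkl"


def require_batch_model(asset_id: str, models_dir: Path = MODELS_DIR) -> Path:
    """Return the model path, or raise with the commands that create it."""
    path = batch_model_path(asset_id, models_dir)
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found. Calibrate via POST /system/calibrate or run: "
            f"python -m scripts.retrain_batch_model --asset {asset_id} --seconds 300"
        )
    return path
```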
    
Symptom:
ValueError: X has 12 features, but IsolationForest is expecting 16 features as input.
Cause: Feature engineering code changed, but the old model is still loaded.
Solution:
  1. Delete old models:
    rm backend/models/*.pkl
    
  2. Retrain from scratch:
    python -m scripts.retrain_batch_model --asset Motor-01 --seconds 600
    
Changing feature definitions invalidates existing models. Always retrain when features change.
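Fitted scikit-learn estimators record their input width in the n_features_in_ attribute, so a stale model can be caught before scoring. The guard below is a sketch, not project code; it relies only on that attribute and accepts any object that has it.

```python
from typing import Sequence


def features_match(model: object, features: Sequence[float]) -> bool:
    """True if `features` has the length the fitted model was trained on."""
    expected = getattr(model, "n_features_in_", None)
    return expected is None or expected == len(features)


def ensure_features_match(model: object, features: Sequence[float]) -> None:
    """Raise a descriptive error instead of scikit-learn's generic one."""
    if not features_match(model, features):
        expected = getattr(model, "n_features_in_")
        raise RuntimeError(
            f"Model expects {expected} features but got {len(features)}. "
            "Feature definitions changed: delete the old .pkl files and retrain."
        )
```

Returning True when the attribute is missing keeps the guard harmless for unfitted or non-scikit-learn objects.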

CORS Issues

Symptom:
Access to fetch at 'http://localhost:8000/health' from origin 'http://localhost:3001'
has been blocked by CORS policy
Cause: Frontend running on an alternate port (3001) that is not in the CORS allowed origins (see ENGINEERING_LOG Phase 12).
Solution: Add the port to backend/api/main.py:
from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=[
        "http://localhost:3000",
        "http://localhost:3001",  # Add this
        "http://localhost:5173",
        "http://127.0.0.1:3001",  # And this
        # ...
    ],
    allow_methods=["GET", "POST", "PUT", "DELETE", "OPTIONS"],
    allow_headers=["*"],
)
Restart the backend:
docker-compose restart backend
Symptom:
405 Method Not Allowed: PUT requests blocked by CORS
Cause: PUT is not in allow_methods (see ENGINEERING_LOG Phase 20).
Solution: Update the CORS config:
allow_methods=["GET", "POST", "PUT", "DELETE", "OPTIONS"],  # Add PUT, DELETE, OPTIONS
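Hard-coding every port means editing main.py whenever the frontend moves. One alternative, sketched here: read extra origins from an environment variable. CORS_ORIGINS is a hypothetical variable name, not one the project defines; the defaults mirror the list above.

```python
import os
from typing import List, Optional

DEFAULT_ORIGINS = [
    "http://localhost:3000",
    "http://localhost:3001",
    "http://localhost:5173",
    "http://127.0.0.1:3001",
]


def cors_origins(env_value: Optional[str] = None) -> List[str]:
    """Merge default origins with a comma-separated CORS_ORIGINS value."""
    raw = os.environ.get("CORS_ORIGINS", "") if env_value is None else env_value
    # Browsers send the Origin header without a trailing slash, so strip it here
    extra = [origin.strip().rstrip("/") for origin in raw.split(",") if origin.strip()]
    return list(dict.fromkeys(DEFAULT_ORIGINS + extra))  # de-duplicate, keep order
```

With this in place, CORSMiddleware would receive allow_origins=cors_origins(), and a new frontend URL only needs an environment change.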

Render Free Tier Issues

Symptom:
Error: 503 Site Can't Be Reached
After 15 minutes of inactivity, the first request fails or times out.
Cause: The Render free tier spins down containers after inactivity, and a cold start takes 30-60 seconds (see ENGINEERING_LOG Phase 18).
Solution:
Option 1: Keep-Alive Heartbeat (Implemented)
The frontend sends a ping every 10 minutes:
setInterval(() => {
  fetch(`${API_URL}/ping`).catch(() => {});
}, 10 * 60 * 1000);
Option 2: Upgrade to Render Starter
$7/month removes cold starts and spin-downs.
Option 3: External Keep-Alive Service
Use UptimeRobot (free) to ping /health every 5 minutes.
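If you would rather not depend on an open browser tab or a third-party service, Option 3 can also be a small script on any always-on machine. This is a hypothetical standard-library sketch; the URL you pass (for example the service's /ping endpoint) and the 10-minute interval are assumptions to adjust. The 60-second timeout leaves room for the 30-60 second cold start described above.

```python
import time
import urllib.request
from typing import Callable, Optional


def ping_once(url: str, timeout: float = 60.0) -> bool:
    """Return True if `url` answers with a 2xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError, HTTPError and socket timeouts all derive from OSError
        return False


def keep_alive(
    url: str,
    interval_s: float = 600.0,
    iterations: Optional[int] = None,
    ping: Callable[[str], bool] = ping_once,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Ping `url` every `interval_s` seconds and return how many pings succeeded.

    iterations=None runs forever; `ping` and `sleep` are injectable for testing.
    """
    succeeded = 0
    done = 0
    while iterations is None or done < iterations:
        if ping(url):
            succeeded += 1
        done += 1
        if iterations is None or done < iterations:
            sleep(interval_s)
    return succeeded
```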
Symptom: Render logs show:
Starting service...
Importing sklearn...
[KILLED] Out of memory
Cause: Heavy ML imports (sklearn, numpy, pandas) at module level exceed the 512MB RAM limit (see ENGINEERING_LOG Phase 18).
Solution: Lazy-load ML dependencies:
# ❌ DON'T: Module-level imports
import numpy as np
from sklearn.ensemble import IsolationForest

class BatchAnomalyDetector:
    def train(self, data):
        ...  # Uses np and IsolationForest, both loaded at import time

# ✅ DO: Lazy imports inside functions
class BatchAnomalyDetector:
    def train(self, data):
        import numpy as np
        from sklearn.ensemble import IsolationForest
        ...  # Both modules load only when train() is first called
Also add:
from __future__ import annotations  # First line
This defers type annotation evaluation, so annotations such as np.ndarray no longer raise NameError when the module is imported.
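Function-level imports are the approach used here. For a module referenced from many functions, the standard library offers an alternative technique, sketched below and adapted from the lazy-import recipe in the importlib documentation: importlib.util.LazyLoader returns a module object whose code only executes on first attribute access. Note that find_spec() imports parent packages eagerly, so lazy_import("sklearn.ensemble") would still load the top-level sklearn package.

```python
import importlib.util
import sys
from types import ModuleType


def lazy_import(name: str) -> ModuleType:
    """Return `name` as a module that executes on first attribute access."""
    if name in sys.modules:  # already imported (eagerly or lazily): reuse it
        return sys.modules[name]
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        raise ModuleNotFoundError(f"No module named {name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # defers the real execution until first use
    return module
```

For example, np = lazy_import("numpy") at module level costs little until the first np.<attribute> access, which is when numpy's memory is actually allocated.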
Symptom: Render dashboard shows “Health check failed” during startup.
Cause: The /health endpoint loads heavy ML modules, exceeding the health check timeout.
Solution: Use a lightweight /ping endpoint for health checks:
@app.get("/ping")
def ping():
    return {"status": "ok"}  # No DB, no ML imports
Update Render health check path:
  1. Open Render Dashboard → Service Settings
  2. Health Check Path: /ping
  3. Save

Windows Development Issues

Symptom: Vercel deployment fails:
Error 126: Permission denied: node_modules/.bin/vite
Cause: Windows binaries in node_modules/ were committed to Git (see README).
Solution:
  1. Add node_modules/ to .gitignore:
    node_modules/
    
  2. Stop tracking it in Git (this removes it from the index, not from past commits):
    git rm -r --cached node_modules/
    git commit -m "Remove node_modules from Git"
    git push
    
  3. Vercel will install dependencies on Linux during build
NEVER commit node_modules/ from Windows. It causes cross-platform deployment failures.
Symptom:
'venv\Scripts\activate' is not recognized as an internal or external command
Cause: Using Linux command syntax on Windows.
Solution: Use the correct activation command for your shell:
# PowerShell
.\venv\Scripts\Activate.ps1

# Command Prompt
.\venv\Scripts\activate.bat

Data Quality Issues

Symptom: System shows red anomaly lines during normal operation (no fault injected).
Cause: Three potential issues (see ENGINEERING_LOG Phase 17):
  1. Overly sensitive range checks (10% tolerance too strict)
  2. Majority aggregation threshold too low (15% anomalous points)
  3. No event debouncing (single-tick transitions)
Solution:
1. Widen the tolerance in system_routes.py and integration_routes.py:
# Change from 10% to 25%
tolerance = 0.25
2. Require majority vote in database.py:
# At least 15/100 points must be anomalous
is_faulty = 1 if is_faulty_val >= 0.15 else 0
3. Add debounce in EventEngine:
# Require 2 consecutive faulty seconds before firing event
if self._consecutive_faulty_count >= 2:
    self._fire_anomaly_detected()
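Steps 2 and 3 can be sketched together in plain Python. This is an illustration, not the project's EventEngine: window_is_faulty applies the 15% threshold from step 2, and the hypothetical AnomalyDebouncer reports an event only once a faulty run reaches 2 consecutive ticks. It fires once per run (== rather than >=), which is an assumption; the snippet above does not say whether EventEngine fires again on later faulty ticks.

```python
from typing import Sequence

AGGREGATION_THRESHOLD = 0.15  # fraction of anomalous points, from step 2
DEBOUNCE_TICKS = 2  # consecutive faulty seconds, from step 3


def window_is_faulty(point_flags: Sequence[bool], threshold: float = AGGREGATION_THRESHOLD) -> bool:
    """True if at least `threshold` of the points in the window are anomalous."""
    if not point_flags:
        return False
    return sum(point_flags) / len(point_flags) >= threshold


class AnomalyDebouncer:
    """Hypothetical stand-in for EventEngine's debounce logic."""

    def __init__(self, required_ticks: int = DEBOUNCE_TICKS) -> None:
        self.required_ticks = required_ticks
        self._consecutive_faulty_count = 0

    def on_tick(self, is_faulty: bool) -> bool:
        """Return True on the tick where an anomaly event should fire."""
        if not is_faulty:
            self._consecutive_faulty_count = 0  # any healthy tick resets the run
            return False
        self._consecutive_faulty_count += 1
        # Fire exactly once per faulty run, when it reaches the required length
        return self._consecutive_faulty_count == self.required_ticks
```

A single-tick blip (faulty, then healthy) never fires, which is the flicker the debounce is meant to suppress.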
Symptom: Degradation Index (DI) increases during healthy monitoring.
Cause: Self-harming DI bug: healthy noise accumulates phantom damage (see ENGINEERING_LOG Phase 20).
Solution: Implement a dead zone in assessor.py:
HEALTHY_FLOOR = 0.65  # Scores below this = zero damage

if batch_score < HEALTHY_FLOOR:
    effective_severity = 0.0  # No damage
else:
    # Remap scores ≥ 0.65 to [0, 1]
    effective_severity = (batch_score - HEALTHY_FLOOR) / (1.0 - HEALTHY_FLOOR)

# Only effective_severity > 0 accumulates DI
DI_increment = (effective_severity ** 2) * SENSITIVITY_CONSTANT * dt
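Wrapped as functions, the dead zone is easy to unit-test. In this sketch HEALTHY_FLOOR comes from the snippet above, while sensitivity is passed in because the real SENSITIVITY_CONSTANT value isn't given here; the min(..., 1.0) clamp for scores above 1.0 is an added assumption.

```python
HEALTHY_FLOOR = 0.65  # scores below this count as zero damage


def effective_severity(batch_score: float, floor: float = HEALTHY_FLOOR) -> float:
    """Map a batch score to [0, 1], with a dead zone below `floor`."""
    if batch_score < floor:
        return 0.0
    return min((batch_score - floor) / (1.0 - floor), 1.0)


def di_increment(batch_score: float, dt: float, sensitivity: float) -> float:
    """DI added over time step `dt`; `sensitivity` stands in for SENSITIVITY_CONSTANT."""
    return effective_severity(batch_score) ** 2 * sensitivity * dt
```

For example, a batch score of 0.825 maps to an effective severity of (0.825 - 0.65) / 0.35 = 0.5, so it accumulates 0.25 × sensitivity × dt, while any score below 0.65 adds nothing no matter how long healthy monitoring runs.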
Symptom: A motor with high vibration variance (σ=0.17g) but a normal mean (0.15g) shows health=100%.
Cause: The legacy v2 model only sees 1Hz averages, not variance (see ENGINEERING_LOG Phase 15).
Solution: Ensure the batch model (v3) is active:
  1. Check model file exists:
    ls -la backend/models/*_batch_detector_v3.pkl
    
  2. If missing, retrain:
    python -m scripts.retrain_batch_model --asset Motor-01 --seconds 600
    
  3. Restart backend to load batch model:
    docker-compose restart backend
    
Why v3 detects jitter:
  • v3 has std and peak_to_peak features
  • v2 only has mean (blind to variance)
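The difference is visible with a few lines of arithmetic. This sketch computes mean, std and peak-to-peak over one window of vibration samples; it is not the v3 feature extractor, only the three statistics named above. The synthetic jittery signal, an assumption for illustration, alternates ±0.17g around a 0.15g mean.

```python
import statistics
from typing import Dict, Sequence


def window_features(samples: Sequence[float]) -> Dict[str, float]:
    """Summarize one window of vibration samples (in g)."""
    return {
        "mean": statistics.fmean(samples),
        "std": statistics.pstdev(samples),
        "peak_to_peak": max(samples) - min(samples),
    }


steady = [0.15] * 100
jittery = [0.15 + (0.17 if i % 2 else -0.17) for i in range(100)]
```

Both windows have a mean of about 0.15g, so a mean-only model (v2) cannot tell them apart, while std (about 0.17g) and peak_to_peak (about 0.34g) separate them clearly.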

Chart Visualization Issues

Symptom: Chart shows a single data point suspended mid-axis, not anchored to the X-axis.
Cause: connectNulls=true connects a single point to empty space (see ENGINEERING_LOG Phase 16).
Solution: Only render lines when ≥2 points exist:
{data.length >= 2 && (
  <Line
    type="monotone"
    dataKey="voltage_v"
    stroke="#3B82F6"
    connectNulls={false}  // Don't connect across gaps
  />
)}
Symptom: Y-axis auto-scales to the data range, making a 0.01g vibration change look like a spike.
Cause: Auto-scaling Y-axis domain (see ENGINEERING_LOG Phase 16).
Solution: Use fixed domains per signal type:
{/* Voltage axis */}
<YAxis yAxisId="voltage" domain={[0, 300]} />

{/* Current axis (hidden) */}
<YAxis yAxisId="current" domain={[0, 40]} hide />

{/* Vibration axis */}
<YAxis yAxisId="vibration" domain={[0, 2.0]} orientation="right" />
Symptom: Time axis shows 0-60s and expands to 0-120s instead of sliding.
Cause: domain={['dataMin', 'dataMax']} grows with the data (see ENGINEERING_LOG Phase 16).
Solution: Hard-code a 60s right-anchored window:
<XAxis
  dataKey="timestamp"
  type="number"
  domain={[Date.now() - 60000, Date.now()]}  // Last 60 seconds
  allowDataOverflow  // Without this, Recharts widens the domain to fit older points
  tickFormatter={(ts) => new Date(ts).toLocaleTimeString()}
/>

Report Generation Issues

Symptom: Downloaded Excel report has a blank Anomaly_Score column.
Cause: Anomaly scores are only computed at ingestion time, not at report generation (see ENGINEERING_LOG Phase 19).
Solution: Compute range-check scores in generator.py during report creation:
for row in sensor_data:
    # Score how far the value falls outside the baseline bounds
    v = row["voltage_v"]
    v_min, v_max = baseline["voltage_v"]

    if v < v_min:
        row["anomaly_score"] = min((v_min - v) / v_min, 1.0)  # distance below the lower bound
    elif v > v_max:
        row["anomaly_score"] = min((v - v_max) / v_max, 1.0)  # distance above the upper bound
    else:
        row["anomaly_score"] = 0.0
Symptom: PDF reports include operator log notes like “asyfkk” or “test123456”.
Cause: No validation on operator log input (see ENGINEERING_LOG Phase 19).
Solution: Sanitize logs in report generators:
import re

VALID_LOG_PATTERN = re.compile(r"^[a-zA-Z0-9\s.,!?;:'\"\-]+$")

for log in operator_logs:
    if not VALID_LOG_PATTERN.match(log["description"]):
        log["description"] = "Maintenance event recorded"
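Note that the character whitelist alone accepts any run of letters and digits, so entries like "asyfkk" and "test123456" still pass it. A sketch of one extra, hypothetical heuristic: also require at least two purely alphabetic words. The threshold and helper names are assumptions, and a heuristic like this still lets some nonsense through (for example "asdf qwer").

```python
import re

VALID_LOG_PATTERN = re.compile(r"[a-zA-Z0-9\s.,!?;:'\"\-]+")
FALLBACK_DESCRIPTION = "Maintenance event recorded"
PUNCTUATION = ".,!?;:'\"-"


def is_meaningful(text: str, min_words: int = 2) -> bool:
    """Whitelist check plus a minimum count of purely alphabetic words."""
    if not VALID_LOG_PATTERN.fullmatch(text):
        return False
    words = [token.strip(PUNCTUATION) for token in text.split()]
    return sum(1 for word in words if len(word) >= 2 and word.isalpha()) >= min_words


def sanitize_description(text: str) -> str:
    """Return the trimmed note, or a neutral placeholder for junk input."""
    return text.strip() if is_meaningful(text) else FALLBACK_DESCRIPTION
```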
Symptom:
AttributeError: 'Canvas' object has no attribute 'stroke'
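Cause: The ReportLab Canvas has no stroke() method (see ENGINEERING_LOG Phase 10).
Solution: Build the arc as a path and draw it with drawPath():
# ❌ WRONG
canvas.arc(...)
canvas.stroke()

# ✅ CORRECT
# PDFPathObject.arc() takes the bounding box of the full ellipse (x1, y1, x2, y2),
# then a start angle and an extent (sweep), both in degrees
path = canvas.beginPath()
path.arc(x - r, y - r, x + r, y + r, startAng=start_angle, extent=end_angle - start_angle)
canvas.drawPath(path, stroke=1, fill=0)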

Environment Configuration

Symptom:
WARNING: INFLUX_TOKEN environment variable not set
But .env file has INFLUX_TOKEN=...
Cause: Validation checks os.environ instead of the settings object (see ENGINEERING_LOG Phase 20).
Solution: Check the settings object, not the raw environment:
# ❌ WRONG
import os

if not os.environ.get("INFLUX_TOKEN"):
    print("WARNING: Token missing")

# ✅ CORRECT
from backend.config import settings

if not settings.influx_token:
    print("WARNING: Token missing")
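Why the two checks disagree: settings loaders such as pydantic-settings read .env into the settings object without exporting the values to os.environ. Whether backend.config uses pydantic-settings is an assumption. The stdlib-only sketch below mimics that behavior with a hypothetical Settings dataclass and load_settings() helper: the token lands on the object while os.environ is left untouched.

```python
import os
from dataclasses import dataclass


@dataclass
class Settings:
    influx_token: str = ""
    influx_url: str = ""


def load_settings(env_text: str) -> Settings:
    """Parse KEY=VALUE lines from .env text; real env vars take precedence."""
    values = {}
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    # Nothing is written to os.environ, mirroring typical settings loaders
    return Settings(
        influx_token=os.environ.get("INFLUX_TOKEN", values.get("INFLUX_TOKEN", "")),
        influx_url=os.environ.get("INFLUX_URL", values.get("INFLUX_URL", "")),
    )
```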
Symptom:
ERROR: Could not find a version that satisfies the requirement xyz==1.2.3
Cause: requirements.txt was manually edited with wrong versions.
Solution: Regenerate it from the actual environment:
# Activate venv
source venv/bin/activate

# Freeze installed packages
pip freeze > requirements.txt

# Remove editable (-e) installs, if any (GNU sed; on macOS use sed -i '' instead)
sed -i '/^-e /d' requirements.txt

Getting Help

If your issue isn’t covered here:
1. Check Engineering Log

Review ENGINEERING_LOG.md for detailed technical context on past issues.
2. Enable Debug Logging

# Add to .env
LOG_LEVEL=DEBUG

# Recreate the backend (a plain restart does not apply changed environment variables)
docker-compose up -d --force-recreate backend

# View detailed logs
docker-compose logs -f backend
3. Run Health Checks

# Backend health
curl http://localhost:8000/health

# InfluxDB health
curl -H "Authorization: Token $INFLUX_TOKEN" $INFLUX_URL/health

# System state
curl http://localhost:8000/system/state
4. Open GitHub Issue

If still stuck, open an issue at GitHub Issues with:
  • Error message and full stack trace
  • Steps to reproduce
  • Environment (Docker/systemd, OS, Python version)
  • Relevant logs

Related pages
  • Monitoring: production monitoring best practices
  • Model Retraining: fixing model accuracy issues
  • InfluxDB Setup: complete database configuration guide
  • API Reference: API endpoint documentation
