ScraperService - Mi API BCV

Overview

The ScraperService class (app/Services/ScraperService.php) is responsible for fetching exchange rate data from bank websites. It handles HTTP requests, HTML parsing, and data extraction using Symfony’s DomCrawler component.

Class Structure

namespace App\Services;

use Illuminate\Support\Facades\Http;
use Symfony\Component\DomCrawler\Crawler;

class ScraperService
{
    public function scrapeData(string $url, string $banco) { /* ... */ }
    private function parseBanplusData($crawler) { /* ... */ }
    private function parseBNCData($crawler) { /* ... */ }
    private function parseBCVData($crawler) { /* ... */ }
    private function cleanValue($value) { /* ... */ }
}

Main Method: scrapeData()

The entry point for all scraping operations.

public function scrapeData(string $url, string $banco)
{
    $response = Http::withoutVerifying()->withHeaders([
        'User-Agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
        'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9',
        'Accept-Language' => 'es-ES,es;q=0.9,en;q=0.8',
        'Referer' => $url,
    ])->get($url);

    if ($response->status() !== 200) {
        return null;
    }

    $crawler = new Crawler($response->body());

    return match ($banco) {
        'banplus' => $this->parseBanplusData($crawler),
        'bnc' => $this->parseBNCData($crawler),
        'bcv' => $this->parseBCVData($crawler),
        default  => 0.00,
    };
}

HTTP Client Configuration

withoutVerifying()

Disables SSL certificate verification. This is sometimes necessary for Venezuelan bank sites with SSL issues.

Disable SSL verification only when absolutely necessary and only for sites you trust, because it removes the protection against man-in-the-middle attacks.

Custom Headers

User-Agent: Mimics Chrome browser to avoid blocking
Accept: Declares accepted content types
Accept-Language: Prefers Spanish content
Referer: Sets the referring URL

Response Handling

Returns null for non-200 status codes, allowing the caller to handle failures gracefully.

Bank Routing

Uses PHP 8’s match expression to route to the appropriate parser based on bank identifier.
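
One consequence of this routing is worth illustrating: match compares with strict identity (===), so the bank identifier must match exactly, including case. The sketch below is a standalone stand-in for the routing step, not the service itself; routeBank() and the returned parser names are hypothetical and exist only for illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the routing step in scrapeData(): it returns the
// name of the parser that would run instead of calling it.
function routeBank(string $banco): string|float
{
    return match ($banco) {
        'banplus' => 'parseBanplusData',
        'bnc'     => 'parseBNCData',
        'bcv'     => 'parseBCVData',
        default   => 0.00, // same default as scrapeData(): unknown banks yield 0.00
    };
}

var_dump(routeBank('bcv')); // a parser name
var_dump(routeBank('BCV')); // different case, falls through to the default
var_dump(routeBank('bdv')); // unknown identifier, also the default

// Normalizing the identifier first avoids the case mismatch.
var_dump(routeBank(strtolower(trim(' BCV '))));
```

Because unknown identifiers produce 0.00 rather than null or an exception, a caller that only checks for null could mistake a typo in the identifier for a real rate of zero.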

Parser Methods

parseBanplusData()

Extracts exchange rate from Banplus news ticker.

private function parseBanplusData($crawler)
{
    $element = $crawler->filter('.awb-news-ticker-link');
    if ($element->count() === 0) {
        throw new \Exception('No se encontró el elemento esperado en Banplus');
    }

    $text = $element->text();

    $valor = null;
    if (preg_match('/tasa de cambio\s+(.*)/', $text, $matches)) {
        if (preg_match_all('/[0-9]+,[0-9]+/', $matches[1], $coincidencias) && !empty($coincidencias[0])) {
            $valor = trim($coincidencias[0][0]);
        }
    }

    if ($valor === null) {
        $valor = $text;
    }

    return $this->cleanValue($valor);
}

Selector: .awb-news-ticker-link
Pattern: "tasa de cambio XX,XX"
Regex: /[0-9]+,[0-9]+/ extracts a decimal number with a comma separator
Fallback: if the regex fails, the entire text is passed to cleanValue()
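
To see how the two-stage extraction and its fallback behave, the sketch below applies the same regular expressions to plain strings instead of a crawler node. The function extractBanplusValue() and both sample texts are hypothetical; the real Banplus ticker wording may differ.

```php
<?php
declare(strict_types=1);

// Standalone copy of the extraction steps in parseBanplusData(). Returns the
// raw string that would be handed to cleanValue().
function extractBanplusValue(string $text): string
{
    if (preg_match('/tasa de cambio\s+(.*)/', $text, $matches)) {
        if (preg_match_all('/[0-9]+,[0-9]+/', $matches[1], $found) && !empty($found[0])) {
            return trim($found[0][0]); // first comma-decimal number after the phrase
        }
    }

    return $text; // fallback: the whole text goes to cleanValue()
}

// Same normalization expression as cleanValue().
function cleanValue(string $value): float
{
    return (float) preg_replace('/[^0-9.]/', '', str_replace(',', '.', $value));
}

// Hypothetical ticker texts.
$ok       = 'Banplus informa: tasa de cambio Bs. 36,52 por USD';
$noPhrase = 'Horario especial los dias 24 y 31 de diciembre';

var_dump(cleanValue(extractBanplusValue($ok)));       // float(36.52)
var_dump(cleanValue(extractBanplusValue($noPhrase))); // float(2431)
```

The second result shows the risk of the fallback: when the phrase is missing, every digit in the text is concatenated into a number that looks valid but is meaningless, so a caller may want to range-check the result. Note also that the pattern is case-sensitive, so "Tasa de cambio" would take the fallback path.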

parseBNCData()

Extracts USD purchase rate from Banco Nacional de Crédito.

private function parseBNCData($crawler)
{
    $items = $crawler->filter('.ItemSpace')->each(function (Crawler $node) {
        $text = $node->text();
        return str_contains($text, 'USD $ Compra Bs:') ? $text : null;
    });

    $filteredItems = array_values(array_filter($items));
    if (empty($filteredItems)) {
        throw new \Exception('No se encontró el texto esperado en BNC');
    }

    preg_match_all('/[0-9]+,[0-9]+/', $filteredItems[0], $matches);
    if (empty($matches[0])) {
        throw new \Exception('No se encontró el valor numérico en BNC');
    }

    $value = $matches[0][0];
    return $this->cleanValue($value);
}

Selector: .ItemSpace
Search Text: “USD $ Compra Bs:”
Strategy:

Find all .ItemSpace elements
Filter for the one containing “USD $ Compra Bs:”
Extract numeric value with regex
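
The three steps above can be tried without any dependencies. The sketch below swaps DomCrawler for PHP's built-in DOM extension (DOMDocument and DOMXPath) and runs against hypothetical markup; the real BNC page may be structured differently, and findBncUsdBuyRate() is an illustrative name. Unlike parseBNCData(), it returns null instead of throwing when nothing is found.

```php
<?php
declare(strict_types=1);

// Hypothetical markup for illustration only.
$html = <<<'HTML'
<div class="ItemSpace">EUR Compra Bs: 40,10</div>
<div class="ItemSpace">USD $ Compra Bs: 36,52</div>
<div class="ItemSpace">USD $ Venta Bs: 36,90</div>
HTML;

function findBncUsdBuyRate(string $html): ?string
{
    $previous = libxml_use_internal_errors(true); // tolerate imperfect HTML
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // 1. Find all elements whose class list contains "ItemSpace".
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' ItemSpace ')]");

    foreach ($nodes as $node) {
        // 2. Keep the first element containing the search text.
        if (!str_contains($node->textContent, 'USD $ Compra Bs:')) {
            continue;
        }
        // 3. Extract the first comma-decimal number from it.
        if (preg_match('/[0-9]+,[0-9]+/', $node->textContent, $m)) {
            return $m[0];
        }
        return null;
    }

    return null;
}

var_dump(findBncUsdBuyRate($html)); // string(5) "36,52"
```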

parseBCVData()

Extracts official rate from Banco Central de Venezuela.

private function parseBCVData($crawler)
{
    $element = $crawler->filter('#dolar');
    if ($element->count() === 0) {
        throw new \Exception('No se encontró el elemento esperado en BCV');
    }

    $text = $element->text();

    $valor = null; // initialize so the fallback check below is well-defined

    if (preg_match('/USD\s+(.*)/', $text, $matches)) {
        if (preg_match_all('/[0-9]+,[0-9]+/', $matches[1], $coincidencias) && !empty($coincidencias[0])) {
            $valor = trim($coincidencias[0][0]);
        }
    }

    if ($valor === null) {
        $valor = $text;
    }

    return $this->cleanValue($valor);
}

Selector: #dolar
Pattern: "USD XX,XX"
Regex: /USD\s+(.*)/ captures everything after “USD”, then /[0-9]+,[0-9]+/ extracts the number

Value Normalization

cleanValue()

Converts extracted strings to float values.

private function cleanValue($value)
{
    return (float) preg_replace('/[^0-9.]/', '', str_replace(',', '.', $value));
}

Process:

Replace comma with period (Spanish → English decimal)
Remove all non-numeric characters except periods
Cast to float

Examples:

"69,50" → 69.50
"Tasa: 69,50 Bs" → 69.50
"69.50" → 69.50
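
One input shape not covered by these examples is a value with a thousands separator. The sketch below reproduces the cleanValue() expression as a standalone function and compares it with a hypothetical alternative, cleanVenezuelanValue(), which assumes the input always uses "." for thousands and "," for decimals. That assumption is not something the service guarantees.

```php
<?php
declare(strict_types=1);

// Same expression as cleanValue() in ScraperService.
function cleanValue(string $value): float
{
    return (float) preg_replace('/[^0-9.]/', '', str_replace(',', '.', $value));
}

// Hypothetical alternative for inputs that always use "." for thousands and
// "," for decimals. It would misread "69.50" as 6950.
function cleanVenezuelanValue(string $value): float
{
    $digits = preg_replace('/[^0-9.,]/', '', $value); // keep digits and separators
    $digits = str_replace('.', '', $digits);          // drop thousands separators
    return (float) str_replace(',', '.', $digits);    // comma becomes the decimal point
}

var_dump(cleanValue('69,50'));              // float(69.5)
var_dump(cleanValue('1.234,56'));           // float(1.234)
var_dump(cleanVenezuelanValue('1.234,56')); // float(1234.56)
```

With cleanValue(), "1.234,56" becomes the string "1.234.56", and PHP's float cast stops at the second period. This only matters if a scraped rate ever reaches four integer digits or the regex hands over a value with a thousands separator.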

Error Handling

Exceptions Thrown

throw new \Exception('No se encontró el elemento esperado en Banplus');
throw new \Exception('No se encontró el texto esperado en BNC');
throw new \Exception('No se encontró el valor numérico en BNC');
throw new \Exception('No se encontró el elemento esperado en BCV');

Note: the actual source code has a bug on line 85 of ScraperService.php, where parseBCVData() throws an exception whose message says “Banplus” instead of “BCV”. The code example above shows the corrected message.

These exceptions are caught by the FetchExchangeRates command, which logs them and continues with the next bank.

Null Returns

Returning null for HTTP failures allows the strategy to return null, which the command interprets as “no value fetched.”
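
A caller therefore has to distinguish three outcomes: a rate, null, or an exception. The sketch below shows one way to handle them in a loop. It is not the FetchExchangeRates command; the closures in $sources are hypothetical stand-ins for scrapeData() calls, and the arrays stand in for logging and persistence.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-ins for scrapeData() calls: one succeeds, one simulates
// an HTTP failure (null), one simulates a parser exception.
$sources = [
    'bcv'     => fn (): ?float => 36.52,
    'bnc'     => fn (): ?float => null,
    'banplus' => function (): ?float {
        throw new Exception('No se encontró el elemento esperado en Banplus');
    },
];

$rates  = [];
$errors = [];

foreach ($sources as $banco => $fetch) {
    try {
        $rate = $fetch();
    } catch (Exception $e) {
        $errors[$banco] = $e->getMessage(); // record and continue with the next bank
        continue;
    }

    if ($rate === null) {
        $errors[$banco] = 'no value fetched';
        continue;
    }

    $rates[$banco] = $rate;
}

print_r($rates);
print_r($errors);
```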

DomCrawler Usage

The Symfony DomCrawler component provides jQuery-like CSS selectors for HTML parsing. The filter() method relies on the symfony/css-selector package being installed.

Basic Filtering

$crawler->filter('#dolar')           // ID selector
$crawler->filter('.ItemSpace')       // Class selector
$crawler->filter('.awb-news-ticker-link') // Class selector

Extracting Text

$element->text()  // Get text content of element

Iteration

$crawler->filter('.ItemSpace')->each(function (Crawler $node) {
    return $node->text();
});

Count Check

if ($element->count() === 0) {
    throw new \Exception('Element not found');
}

Regex Patterns

Extracting Decimals

preg_match_all('/[0-9]+,[0-9]+/', $text, $matches);

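Matches: "69,50" (comma as decimal separator). The pattern does not understand thousands separators: in "1.234,56" it matches only "234,56", and in "1,234,56" it matches only "1,234".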
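
To check what the pattern actually captures, the sketch below runs it against a few sample strings and compares it with a hypothetical thousands-aware variant. The helper allMatches() and the $thousandsAware pattern are illustrative and not part of ScraperService.

```php
<?php
declare(strict_types=1);

// Returns every match of $pattern in $text.
function allMatches(string $pattern, string $text): array
{
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

$original = '/[0-9]+,[0-9]+/';
// Hypothetical variant: optional ".ddd" thousands groups before the comma.
$thousandsAware = '/[0-9]+(?:\.[0-9]{3})*,[0-9]+/';

print_r(allMatches($original, '69,50'));          // one match: 69,50
print_r(allMatches($original, '1.234,56'));       // one match: 234,56
print_r(allMatches($thousandsAware, '1.234,56')); // one match: 1.234,56
print_r(allMatches($original, 'USD 36'));         // no match: there is no comma
```

A value matched by the thousands-aware variant would still need its periods removed before being passed to cleanValue().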

Pattern Matching

preg_match('/tasa de cambio\s+(.*)/', $text, $matches);

Captures everything after “tasa de cambio” into $matches[1].

preg_match('/USD\s+(.*)/', $text, $matches);

Captures everything after “USD” into $matches[1].

Testing Scrapers

You can test individual parsers by calling the service directly:

use App\Services\ScraperService;

$scraper = new ScraperService();
$rate = $scraper->scrapeData('https://www.bcv.org.ve/', 'bcv');

echo "BCV Rate: {$rate}";

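Tip: create a dedicated Artisan command for debugging scrapers without triggering the full update process. Keep in mind that scrapeData() can return null or throw, so the echo above prints an empty value on an HTTP failure.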

Common Issues

SSL Certificate Errors

Problem: cURL error 60: SSL certificate problem
Solution: Already handled with withoutVerifying()

Bot Detection

Problem: 403 Forbidden or CAPTCHA responses
Solution:

Use realistic User-Agent headers
Add delays between requests
Rotate IP addresses if necessary

Selector Changes

Problem: A bank redesigns its website and the selectors break
Solution:

Inspect new HTML structure
Update selector in parser method
Test thoroughly
Consider monitoring for selector changes

Rate Format Changes

Problem: A bank changes how it displays rates
Solution:

Update regex pattern
Test with real examples
Add fallback logic if possible

Performance Considerations

HTTP Timeouts

Laravel’s HTTP client times out after 30 seconds by default. For slow bank sites, raise the limit explicitly:

Http::timeout(60)->withoutVerifying()->get($url);

Parallel Requests

Currently scrapers run sequentially. For faster updates, use Http::pool():

$responses = Http::pool(fn ($pool) => [
    $pool->get('https://www.bcv.org.ve/'),
    $pool->get('https://www.banplus.com/'),
    $pool->get('https://www.bnc.com.ve/'),
]);

Retry Logic

Add automatic retries for transient failures:

Http::retry(3, 100)->withoutVerifying()->get($url);
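
Http::retry(3, 100) asks the client to attempt the request up to three times, waiting 100 milliseconds between attempts. The plain-PHP sketch below only illustrates that retry-with-delay idea outside the framework; retryWithDelay() is a hypothetical helper, not part of Laravel, and the flaky closure simulates a request.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: calls $operation up to $times times, sleeping
// $sleepMs milliseconds between attempts, and rethrows the last failure.
function retryWithDelay(int $times, int $sleepMs, callable $operation): mixed
{
    for ($attempt = 1; ; $attempt++) {
        try {
            return $operation($attempt);
        } catch (Exception $e) {
            if ($attempt >= $times) {
                throw $e; // out of attempts: surface the error to the caller
            }
            usleep($sleepMs * 1000);
        }
    }
}

// Simulated flaky request: fails twice, then succeeds.
$calls = 0;
$body = retryWithDelay(3, 100, function (int $attempt) use (&$calls): string {
    $calls++;
    if ($attempt < 3) {
        throw new RuntimeException("temporary failure on attempt {$attempt}");
    }
    return '<html>ok</html>';
});

var_dump($calls); // int(3)
var_dump($body);
```

Retries help with transient network errors only; a broken selector or a changed rate format fails the same way on every attempt.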

Extending ScraperService

Adding a New Parser

Add parse method

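private function parseNuevoBancoData($crawler)
{
    $element = $crawler->filter('.exchange-rate');
    if ($element->count() === 0) {
        // Same guard as the existing parsers
        throw new \Exception('No se encontró el elemento esperado en NuevoBanco');
    }

    $text = $element->text();
    // ... extraction logic
    return $this->cleanValue($text);
}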

Add to match expression

return match ($banco) {
    'banplus' => $this->parseBanplusData($crawler),
    'bnc' => $this->parseBNCData($crawler),
    'bcv' => $this->parseBCVData($crawler),
    'nuevobanco' => $this->parseNuevoBancoData($crawler),
    default  => 0.00,
};

Create strategy class

See Adding Banks for the complete guide.

Next Steps

Strategy Pattern

Learn how strategies use ScraperService

Adding Banks

Complete guide to adding new data sources
