Every time a client calls GET /feed, DailyNews automatically fetches the homepages of El País (https://elpais.com/) and El Mundo (https://elmundo.es/), parses the HTML with Cheerio, and extracts the top 5 article headlines from each source. The freshly scraped articles are then persisted to MongoDB before the full feed list is returned to the client. Currently two news sources are supported, El País and El Mundo, and new sources can be added by implementing a single interface.
How Scraping Works
The scraping pipeline is triggered inside FeedService.getAllFeeds() and follows a clear chain of responsibility:

1. FeedService iterates the scrapers array
   FeedService is constructed with an array of ScrapperRepositoryInterface implementations. On getAllFeeds(), it loops over every scraper in that array.
2. ScrapperService wraps each scraper
   For each scraper, a new ScrapperService instance is created, delegating the actual fetch-and-parse logic to the underlying ScrapperRepositoryInterface implementation via scrapperService.getTopNews().
3. Each scraper fetches and parses HTML
   The concrete repository class (e.g., ElPaisScrapperRepository) calls the native fetch API to retrieve the news homepage, then passes the raw HTML to Cheerio. It selects article elements and reads CSS-targeted child nodes for the title, author, description, link, and portrait image, stopping after 5 items.
4. Results are accumulated
   Scraped Feed objects from all sources are pushed into a shared scrappedFeeds array.
5. saveScrappedFeeds() persists new items
   The accumulated results are passed to feedRepository.saveScrappedFeeds(), which deduplicates by link before inserting only new articles into MongoDB.

ScrapperRepositoryInterface
Every news source must implement this minimal interface. A single async method, getTopNews(), is responsible for fetching and returning a list of Feed objects.
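A minimal sketch of this contract, assuming Feed carries exactly the five fields the scrapers extract (title, author, description, link, portrait); the project's real type may hold more. StaticScrapper and collectScrappedFeeds are hypothetical names used only for illustration: the helper mirrors the loop described under How Scraping Works and is not the project's FeedService.

```typescript
// Assumed shape of a scraped article: the five fields the scrapers extract.
interface Feed {
  title: string;
  author: string;
  description: string;
  link: string;
  portrait: string;
}

// The contract every news source implements: one async method.
interface ScrapperRepositoryInterface {
  getTopNews(): Promise<Feed[]>;
}

// Thin wrapper that delegates to whichever scraper it is given.
class ScrapperService {
  private readonly repository: ScrapperRepositoryInterface;

  constructor(repository: ScrapperRepositoryInterface) {
    this.repository = repository;
  }

  getTopNews(): Promise<Feed[]> {
    return this.repository.getTopNews();
  }
}

// Hypothetical stand-in for a real scraper: returns canned articles, no network.
class StaticScrapper implements ScrapperRepositoryInterface {
  private readonly feeds: Feed[];

  constructor(feeds: Feed[]) {
    this.feeds = feeds;
  }

  async getTopNews(): Promise<Feed[]> {
    return this.feeds;
  }
}

// Hypothetical helper mirroring the loop in getAllFeeds(): wrap each scraper,
// await its top news, and accumulate everything in one array.
async function collectScrappedFeeds(
  scrappers: ScrapperRepositoryInterface[],
): Promise<Feed[]> {
  const scrappedFeeds: Feed[] = [];
  for (const scrapper of scrappers) {
    const scrapperService = new ScrapperService(scrapper);
    scrappedFeeds.push(...(await scrapperService.getTopNews()));
  }
  return scrappedFeeds;
}
```

This sketch awaits the sources one at a time. Whether the project runs them sequentially or concurrently is not stated here.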
El País Scraper
ElPaisScrapperRepository fetches https://elpais.com/ and selects all article elements, capping results at 5. It targets the following CSS structure inside each article:
| Field | CSS Selector |
|---|---|
| title | h2 (inner text) |
| author | a.c_a_a (first anchor with class c_a_a) |
| description | p.c_d |
| link | header a (first anchor inside the <header>) |
| portrait | img.c_m_e._re.lazyload.a_m-h, falling back to img |
El Mundo Scraper
ElMundoScrapperRepository fetches https://elmundo.es/ with an explicit Content-Type header and selects all article elements, capping results at 5. Unlike the El País scraper, it uses a manual feedCount counter (rather than the Cheerio index i) so that articles without a link are skipped entirely — those entries are typically video-only cards.
| Field | CSS Selector |
|---|---|
| title | h2 (inner text) |
| author | span.ue-c-cover-content__byline-name (with "Redacción: " prefix stripped) |
| description | div.ue-c-cover-content__footer |
| link | header a (first anchor inside the <header>; the article is skipped if empty) |
| portrait | img.ue-c-cover-content__image, falling back to img |
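The link-skipping counter can be sketched independently of Cheerio. The example below assumes the per-article values have already been read from the DOM into a hypothetical RawArticle record. pickTopArticles, stripBylinePrefix, RawArticle, and the two constants are illustrative names, not the repository's own.

```typescript
interface Feed {
  title: string;
  author: string;
  description: string;
  link: string;
  portrait: string;
}

// Hypothetical intermediate record: values as read from one <article> element.
interface RawArticle {
  title: string;
  byline: string;
  description: string;
  link: string;
  portrait: string;
}

const MAX_FEEDS = 5;
const BYLINE_PREFIX = "Redacción: ";

// Remove the "Redacción: " prefix when the byline starts with it.
function stripBylinePrefix(byline: string): string {
  const withoutPrefix =
    byline.indexOf(BYLINE_PREFIX) === 0
      ? byline.slice(BYLINE_PREFIX.length)
      : byline;
  return withoutPrefix.trim();
}

function pickTopArticles(rawArticles: RawArticle[]): Feed[] {
  const feeds: Feed[] = [];
  let feedCount = 0; // counts accepted articles, not visited ones

  for (const raw of rawArticles) {
    if (feedCount >= MAX_FEEDS) break;
    if (!raw.link) continue; // no link: skip without using up one of the 5 slots

    feeds.push({
      title: raw.title,
      author: stripBylinePrefix(raw.byline),
      description: raw.description,
      link: raw.link,
      portrait: raw.portrait,
    });
    feedCount++;
  }
  return feeds;
}
```

Counting accepted items rather than visited ones means a skipped card does not reduce the number of articles returned, as long as the page contains at least five linked articles.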
Adding a New News Source
To add a third news source, implement ScrapperRepositoryInterface and register the new class in feedController.ts.
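A minimal sketch of what such a scraper and its registration could look like. To keep it runnable without network access or Cheerio, the HTML download and the parsing are injected as two hypothetical functions (HtmlLoader and ArticleParser); in the project these roles are played by the native fetch API and a Cheerio selection. The URL, the buildScrappers helper, and the Feed shape are likewise assumptions.

```typescript
interface Feed {
  title: string;
  author: string;
  description: string;
  link: string;
  portrait: string;
}

interface ScrapperRepositoryInterface {
  getTopNews(): Promise<Feed[]>;
}

// Hypothetical seams standing in for fetch and for the Cheerio parsing step.
type HtmlLoader = (url: string) => Promise<string>;
type ArticleParser = (html: string) => Feed[];

class MyNewsScrapperRepository implements ScrapperRepositoryInterface {
  private readonly url = "https://example.com/"; // placeholder homepage
  private readonly loadHtml: HtmlLoader;
  private readonly parseArticles: ArticleParser;

  constructor(loadHtml: HtmlLoader, parseArticles: ArticleParser) {
    this.loadHtml = loadHtml;
    this.parseArticles = parseArticles;
  }

  async getTopNews(): Promise<Feed[]> {
    const html = await this.loadHtml(this.url);
    // Same cap as the existing scrapers: keep the first 5 articles.
    return this.parseArticles(html).slice(0, 5);
  }
}

// Hypothetical registration helper: in feedController.ts the new instance
// would sit alongside the existing El País and El Mundo scrapers.
function buildScrappers(
  loadHtml: HtmlLoader,
  parseArticles: ArticleParser,
): ScrapperRepositoryInterface[] {
  return [new MyNewsScrapperRepository(loadHtml, parseArticles)];
}
```

Injecting the loader and parser is a choice made for this sketch only; the existing scrapers call fetch and Cheerio directly.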
1. Create the scraper file
   Add a new file, for example src/infrastructure/repositories/scrapper/mynews/MyNewsScrapperRepository.ts.
2. Implement ScrapperRepositoryInterface
   Implement getTopNews() to fetch the target URL and parse headlines with Cheerio.
3. Register the scraper in feedController.ts
   Import and add your new class to the scrappers array in src/api/controllers/feedController.ts.

Persistence
After scraping completes, FeedService.getAllFeeds() calls feedRepository.saveScrappedFeeds(scrappedFeeds). The repository implementation handles deduplication: it queries MongoDB for any existing documents whose link field matches any of the scraped links, builds a Set of known links, and then calls insertMany() only for items whose links are not already in the database. This means repeated GET /feed calls will not produce duplicate entries in MongoDB.
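The deduplication steps can be sketched against an in-memory stand-in for the collection. FeedStore, InMemoryFeedStore, and findByLinks are hypothetical names introduced for this example, and returning the newly inserted items is an assumption; only the sequence (look up existing links, build a Set, insert what is missing) comes from the description above.

```typescript
interface Feed {
  title: string;
  author: string;
  description: string;
  link: string;
  portrait: string;
}

// Hypothetical minimal storage contract standing in for the MongoDB collection.
interface FeedStore {
  findByLinks(links: string[]): Promise<Feed[]>;
  insertMany(feeds: Feed[]): Promise<void>;
}

// In-memory implementation so the sketch runs without a database.
class InMemoryFeedStore implements FeedStore {
  readonly docs: Feed[] = [];

  async findByLinks(links: string[]): Promise<Feed[]> {
    return this.docs.filter((doc) => links.indexOf(doc.link) !== -1);
  }

  async insertMany(feeds: Feed[]): Promise<void> {
    this.docs.push(...feeds);
  }
}

async function saveScrappedFeeds(
  store: FeedStore,
  scrappedFeeds: Feed[],
): Promise<Feed[]> {
  // 1. Fetch stored documents whose link matches any scraped link.
  const links = scrappedFeeds.map((feed) => feed.link);
  const existing = await store.findByLinks(links);

  // 2. Build a Set of known links for constant-time membership checks.
  const knownLinks = new Set(existing.map((doc) => doc.link));

  // 3. Insert only the items whose link is not stored yet.
  const newFeeds = scrappedFeeds.filter((feed) => !knownLinks.has(feed.link));
  if (newFeeds.length > 0) {
    await store.insertMany(newFeeds); // skip the write when nothing is new
  }
  return newFeeds;
}
```

Against MongoDB, a lookup like findByLinks is commonly expressed with the $in operator on the link field; whether the project's repository does so is an assumption.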
Some scraped articles may have an empty portrait field. This happens when the news homepage does not include a visible <img> tag within the article element, for example when the hero image is loaded lazily via JavaScript after the initial HTML response. Fetching each article's detail page to retrieve the image would significantly slow down the scraping process and is therefore not implemented.