MARLO is a Java web application that stores research-management data in a phase-aware MySQL database, renders form-driven screens via Apache Struts 2, and exposes a REST API via Spring MVC. All production data sits in AWS (RDS + S3), feeds a Microsoft Fabric / Power BI analytics pipeline, and is consumed by AI services running on AWS Bedrock. This page explains how those pieces fit together and why the platform is designed the way it is.
Phase-aware data model
The central design decision in MARLO is that phases are first-class citizens. Every record that changes over time (deliverables, innovations, partners, outcome impact case reports, budgets) carries an explicit phase_id foreign key.
The phases table enumerates every cycle moment across all programs:
| Phase name | When it is active | Typical work |
|---|---|---|
| POWB (Plan of Work and Budget) | Start of the year | Annual planning per cluster |
| UpKeep | Mid-year | Progress monitoring and adjustments |
| Annual Report (AR) | End of year | Narrative reporting, evidence upload, QA |
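As a rough illustration of what carrying a phase_id means in practice, the sketch below keeps one row per deliverable per phase, so the same deliverable can hold different content in POWB and in AR. The class name, the method names, and the in-memory map are hypothetical stand-ins, not MARLO code, and replication between phases is deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: one logical deliverable, one row of content per phase. */
final class PhaseAwareStore {
    // Key is "<deliverableId>:<phaseId>", mimicking a lookup on (deliverable id, phase_id).
    private final Map<String, String> titleByDeliverableAndPhase = new HashMap<>();

    /** Stores the title that the deliverable has in the given phase. */
    void save(long deliverableId, long phaseId, String title) {
        titleByDeliverableAndPhase.put(deliverableId + ":" + phaseId, title);
    }

    /** Returns the title recorded for that phase, or null if there is no row for it. */
    String titleIn(long deliverableId, long phaseId) {
        return titleByDeliverableAndPhase.get(deliverableId + ":" + phaseId);
    }

    public static void main(String[] args) {
        PhaseAwareStore store = new PhaseAwareStore();
        store.save(42L, 1L, "Toolkit (planned)");   // phase 1 stands for a POWB phase
        store.save(42L, 3L, "Toolkit (delivered)"); // phase 3 stands for an AR phase
        System.out.println(store.titleIn(42L, 1L));
        System.out.println(store.titleIn(42L, 3L));
    }
}
```

The point of the composite key is that a query always names the phase it wants, so reading "the deliverable" without a phase is not a meaningful operation in this model.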
Forward-only replication
Past phases are immutable: you cannot edit a record that belongs to a closed phase. This is a deliberate design constraint that preserves the audit trail. Changes made in an open phase replicate forward to later phases, never backward:
- A save in POWB replicates to UpKeep and to AR.
- A save in UpKeep replicates to AR.
- A delete follows the same chain — removing a deliverable from POWB removes it from all future phases.
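The replication rules above can be sketched as a small in-memory model. This is a minimal illustration under stated assumptions, not the ManagerImpl contract itself: the class ForwardReplication, its methods, and the nested maps are all hypothetical, and the chain is limited to the three phases named in the table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of forward-only replication across an ordered phase chain. */
final class ForwardReplication {
    // Phase order as listed in the phases table.
    static final List<String> CHAIN = Arrays.asList("POWB", "UpKeep", "AR");

    // phase name -> (record id -> value); stands in for phase-scoped database rows.
    private final Map<String, Map<Long, String>> rows = new LinkedHashMap<>();
    private final List<String> closed = new ArrayList<>();

    ForwardReplication() {
        for (String phase : CHAIN) {
            rows.put(phase, new LinkedHashMap<Long, String>());
        }
    }

    /** Marks a phase as closed; later writes that start in it are rejected. */
    void close(String phase) {
        closed.add(phase);
    }

    /** Writes to the given phase and every later phase; earlier phases are untouched. */
    void save(String phase, long id, String value) {
        for (String target : targets(phase)) {
            rows.get(target).put(id, value);
        }
    }

    /** Removes the record from the given phase and every later phase. */
    void delete(String phase, long id) {
        for (String target : targets(phase)) {
            rows.get(target).remove(id);
        }
    }

    String read(String phase, long id) {
        return rows.get(phase).get(id);
    }

    // Returns the starting phase plus all phases after it, or throws if it is unknown or closed.
    private List<String> targets(String phase) {
        int start = CHAIN.indexOf(phase);
        if (start < 0) {
            throw new IllegalArgumentException("Unknown phase: " + phase);
        }
        if (closed.contains(phase)) {
            throw new IllegalStateException("Phase is closed: " + phase);
        }
        return CHAIN.subList(start, CHAIN.size());
    }
}
```

In this sketch a save in UpKeep leaves the POWB row exactly as it was, which is the property that keeps the audit trail intact.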
Multi-tenant model
MARLO runs multiple research programs on a single instance. Each program is called a Global Unit, which can be one of three types:
- CRP (CGIAR Research Program): for multi-country, multi-cluster research programs such as AICCRA and CCAFS.
- Platform: for cross-cutting service platforms.
- Center: for CGIAR Centers with their own reporting requirements.
Each Global Unit has its own:
- Phases and annual cycle dates
- Roles and user assignments
- Partner and location data
- Feature flags (“specificities”) that enable or disable program-specific behavior
The Global Unit acronym appears in the URL, for example /projects/aiccra/description.do, and the platform enforces this boundary through authentication interceptors on every request.
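The example URL /projects/aiccra/description.do suggests that the Global Unit acronym travels as a path segment. Assuming that layout, a boundary check could look roughly like the sketch below. GlobalUnitGuard, its methods, and the set of allowed acronyms are hypothetical; the real check lives in MARLO's interceptors and is not reproduced here.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of a per-request Global Unit boundary check. */
final class GlobalUnitGuard {
    // Matches /<section>/<acronym>/<action>.do and captures the acronym segment.
    private static final Pattern ACTION_URL =
            Pattern.compile("^/[^/]+/([^/]+)/[^/]+\\.do$");

    /** Returns the Global Unit acronym in the path, or null if the path has another shape. */
    static String acronymOf(String path) {
        Matcher matcher = ACTION_URL.matcher(path);
        return matcher.matches() ? matcher.group(1) : null;
    }

    /** True only when the path names a Global Unit that the user is allowed to enter. */
    static boolean mayEnter(String path, Set<String> allowedAcronyms) {
        String acronym = acronymOf(path);
        return acronym != null && allowedAcronyms.contains(acronym);
    }
}
```

A caller would pass the request path and the set of acronyms the signed-in user belongs to, and reject the request when mayEnter returns false.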
How data flows: PMRL cycle
Data enters through Struts 2 form actions (cluster coordinators filling in sections), passes through the manager layer (which applies validation and phase replication), persists to MySQL on AWS RDS, and eventually flows into the Microsoft Fabric Lakehouse for analytics.
Dual web architecture
MARLO uses two distinct web layers, each with a different responsibility:
Struts 2: form-driven UI
All interactive screens used by cluster coordinators, PMU staff, and QA reviewers are Struts 2 actions. URLs end in .do:
- /projects/{crp}/projectDeliverable.do: deliverable entry form
- /powb/{crp}/financialPlan.do: POWB financial plan
- /annualReport/{crp}/melia.do: AR MELIA section
- /qualityAssessment/{crp}/detail.do: QA review screen
Spring MVC: REST API
External integrations use the REST API under /api/v2/*. These endpoints are used by:
- The CLARISA reference-data service (institutions, geographies, taxonomies)
- Power BI / Microsoft Fabric ingestion pipelines
- AWS AI services (Reports Generator, Chatbot, Text Mining)
- Quality assurance token-based clients
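The two URL conventions in this section, the .do suffix for Struts 2 screens and the /api/ prefix for the REST API, can be pictured as a simple path classifier. The sketch below is only an illustration of that split; WebLayerRouter and its Layer enum are hypothetical, and the actual routing is configured in the frameworks rather than written by hand like this.

```java
/** Hypothetical sketch: which web layer would own a given request path. */
final class WebLayerRouter {
    enum Layer { STRUTS_ACTION, SPRING_REST, OTHER }

    /** Classifies a request path by the URL conventions described in this section. */
    static Layer classify(String path) {
        // Checked first so that nothing under /api/ is ever treated as a Struts action.
        if (path.startsWith("/api/")) {
            return Layer.SPRING_REST;
        }
        if (path.endsWith(".do")) {
            return Layer.STRUTS_ACTION;
        }
        return Layer.OTHER;
    }

    public static void main(String[] args) {
        System.out.println(classify("/projects/aiccra/description.do"));
        System.out.println(classify("/api/v2/anything"));
    }
}
```

Testing the /api/ prefix before the .do suffix is the design choice that matters: a path under /api/ stays with the REST layer even if it happens to end in .do.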
Struts 2 is explicitly excluded from handling /api/* paths. These two layers do not overlap. All form-based user interactions go through Struts; all machine-to-machine integrations go through Spring MVC.
Module structure
MARLO is a multi-module Maven project. The modules divide responsibility cleanly:
| Module | What it contains |
|---|---|
| marlo-data | 540+ JPA entities, Manager interfaces and implementations, DAOs, audit listeners. The domain layer. |
| marlo-web | Struts actions, Spring MVC REST controllers, FreeMarker templates, validators, interceptors, Flyway SQL migrations, frontend assets. |
| marlo-core | Cross-cutting configuration: Apache Shiro security wiring, Hibernate session factory, database config. |
| marlo-utils | Pure utility classes for dates, strings, Excel/CSV/PDF processing, and JSON helpers. |
| marlo-parent | Root Maven aggregator: declares dependency versions and plugin configuration. No executable code. |
marlo-data follows a four-layer pattern:
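The list of layers did not survive on this page. Going by the module table (entities, DAOs, Manager interfaces, Manager implementations), one plausible reading is sketched below. Every type name here is hypothetical and the DAO is an in-memory stand-in, so treat it as an illustration of the layering idea rather than MARLO's actual classes.

```java
import java.util.HashMap;
import java.util.Map;

// Layer 1 (assumed): entity, a plain object that maps to a table row.
final class Deliverable {
    final long id;
    final String title;

    Deliverable(long id, String title) {
        this.id = id;
        this.title = title;
    }
}

// Layer 2 (assumed): DAO, the only layer that touches storage.
interface DeliverableDao {
    Deliverable find(long id);
    void save(Deliverable deliverable);
}

// In-memory stand-in for a database-backed DAO.
final class InMemoryDeliverableDao implements DeliverableDao {
    private final Map<Long, Deliverable> rows = new HashMap<>();

    public Deliverable find(long id) {
        return rows.get(id);
    }

    public void save(Deliverable deliverable) {
        rows.put(deliverable.id, deliverable);
    }
}

// Layer 3 (assumed): Manager interface, the contract that callers depend on.
interface DeliverableManager {
    Deliverable getById(long id);
    Deliverable saveDeliverable(Deliverable deliverable);
}

// Layer 4 (assumed): Manager implementation, where rules sit above the DAO.
final class DeliverableManagerImpl implements DeliverableManager {
    private final DeliverableDao dao;

    DeliverableManagerImpl(DeliverableDao dao) {
        this.dao = dao;
    }

    public Deliverable getById(long id) {
        return dao.find(id);
    }

    public Deliverable saveDeliverable(Deliverable deliverable) {
        // A simple rule applied before persisting; real validation is richer.
        if (deliverable.title == null || deliverable.title.trim().isEmpty()) {
            throw new IllegalArgumentException("title is required");
        }
        dao.save(deliverable);
        return deliverable;
    }
}
```

Callers such as web actions would depend only on the Manager interface, which is what lets validation and phase replication sit in one place.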
Integration points
| Integration | Direction | Purpose |
|---|---|---|
| CGIAR Active Directory | Inbound (auth) | Enterprise single sign-on via Apache Shiro |
| CLARISA | Outbound (REST) | Reference data: institutions, countries, partner types, taxonomies |
| Microsoft Fabric / Power BI | Outbound (extract + embed) | Bronze / Silver / Gold Lakehouse; embedded dashboards refreshed every 8 hours (results) and 30 minutes (QA) |
| AWS Bedrock (Claude, Titan) | Outbound | AI narrative generation, embeddings, RAG pipelines |
| Amazon OpenSearch | Outbound | Vector indices for Reports Generator and Chatbot |
| AWS S3 | Outbound | Document storage, daily backups |
| CGSpace | Outbound (REST) | Open-access deposit metadata and DOI lookup |
| Pusher | Outbound (WebSocket) | Real-time in-platform notifications |
CLARISA
CLARISA is the CGIAR reference-data service. MARLO calls CLARISA to populate institution selectors, geographic lookups, and partner-type lists. These tables are read-only from the MARLO user perspective, so do not hand-edit CLARISA-backed records directly in the database.
Power BI and Microsoft Fabric
Operational data is extracted from MySQL and ingested into a Microsoft Fabric Lakehouse following a Bronze (raw) → Silver (cleaned) → Gold (aggregated) pipeline. Power BI dashboards are embedded directly in MARLO screens and in a public embed tool. Key dashboards include cluster completeness status, QA progress, and results indicators.
AI services
Three AI capabilities are available when enabled for a program:
- Text Mining: surfaces related literature and prior results from MARLO data.
- Reports Generator: produces cluster-level narratives grounded in structured MARLO records using AWS Bedrock (Claude) and Amazon OpenSearch vector indices.
- Chatbot: conversational interface for exploring program data.
Security model
Authentication uses Apache Shiro with the CGIAR Active Directory as the primary identity realm. An internal MD5-backed fallback exists for programs that do not use CGIAR AD. Authorization is layered:
- Identity: the users table identifies the person.
- Program access: the crp_users table controls which Global Units a user can enter.
- Role assignment: the user_role table controls what a user can do within a program.
Permission checks (canEditProject, canEditDeliverable, canEditPowbSynthesis, etc.) are evaluated before the action class runs.
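The three authorization layers can be sketched as three lookups that must all succeed. This is a simplified model under stated assumptions: LayeredAuthorization and its methods are hypothetical, the in-memory collections only stand in for the users, crp_users, and user_role tables, and roles are collapsed directly into permission names such as canEditDeliverable.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the identity, program access, and role layers. */
final class LayeredAuthorization {
    private final Set<String> users = new HashSet<>();                        // stands in for users
    private final Map<String, Set<String>> unitsByUser = new HashMap<>();     // stands in for crp_users
    private final Map<String, Set<String>> permsByUserUnit = new HashMap<>(); // stands in for user_role

    /** Registers the user, admits them to the Global Unit, and grants the permissions there. */
    void grant(String user, String unit, String... permissions) {
        users.add(user);
        unitsByUser.computeIfAbsent(user, k -> new HashSet<String>()).add(unit);
        permsByUserUnit.computeIfAbsent(user + "@" + unit, k -> new HashSet<String>())
                .addAll(Arrays.asList(permissions));
    }

    /** All three layers must pass, in order, for the action to be allowed. */
    boolean isAllowed(String user, String unit, String permission) {
        // 1. Identity: is this a known person?
        if (!users.contains(user)) {
            return false;
        }
        // 2. Program access: may this person enter the Global Unit?
        if (!unitsByUser.getOrDefault(user, Collections.<String>emptySet()).contains(unit)) {
            return false;
        }
        // 3. Role assignment: does the person hold the permission inside that unit?
        return permsByUserUnit
                .getOrDefault(user + "@" + unit, Collections.<String>emptySet())
                .contains(permission);
    }
}
```

Because permissions are keyed by user and Global Unit together, a permission granted in one program never leaks into another.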
Next steps
- Annual planning workflow. Walk through what cluster coordinators do during the POWB phase.
- Quality assurance. How the QA review cycle works across phases.
- Analytics and Power BI. How MARLO data flows into the Fabric Lakehouse and Power BI dashboards.
- Developer: phase replication. Technical details of the ManagerImpl forward-only replication contract.