Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/andresshm/fini-marketing-intelligence/llms.txt

Use this file to discover all available pages before exploring further.

RFM segmentation is a behaviour-based customer classification technique that scores every buyer on three independent dimensions — how recently they purchased, how often they purchase, and how much revenue they generate. By combining those three scores into a composite profile, the Fini platform can classify its entire customer base into five actionable segments without requiring manual rules or machine-learning infrastructure. The pipeline reads raw transaction data directly from fact_sales, computes all metrics in pandas, and writes the final scored table back to PostgreSQL as customer_rfm, ready for use in Power BI or downstream campaign tooling.

The Three RFM Dimensions

Each dimension is scored from 1 to 5 using pd.qcut, which divides customers into five equal-sized quantile bins. This means scores are always relative to the current customer population rather than fixed absolute thresholds.

Recency (R)

Days since the customer’s last purchase, measured from a snapshot date of max(sale_date) + 1 day. Scored 5 = most recent down to 1 = least recent. Lower recency days → higher R score.

Frequency (F)

Total number of purchase transactions per customer. Scored 1 = fewest purchases up to 5 = most purchases. Higher transaction count → higher F score.

Monetary (M)

Total cumulative revenue generated by the customer. Scored 1 = lowest revenue up to 5 = highest revenue. Higher lifetime spend → higher M score.

RFM Calculation

The script reads customer_id, sale_date, and revenue from fact_sales, then derives the three raw metrics through a single groupby aggregation before scoring each with pd.qcut.
sales = pd.read_sql("""
SELECT
    customer_id,
    sale_date,
    revenue
FROM fact_sales
""", engine)

sales["sale_date"] = pd.to_datetime(
    sales["sale_date"]
)

snapshot_date = (
    sales["sale_date"].max()
    + pd.Timedelta(days=1)
)

rfm = sales.groupby("customer_id").agg({
    "sale_date": lambda x: (
        snapshot_date - x.max()
    ).days,
    "customer_id": "count",
    "revenue": "sum"
})

rfm.columns = [
    "recency",
    "frequency",
    "monetary"
]

rfm.reset_index(inplace=True)

rfm["R_score"] = pd.qcut(
    rfm["recency"],
    5,
    labels=[5, 4, 3, 2, 1]
).astype(int)

rfm["F_score"] = pd.qcut(
    rfm["frequency"],
    5,
    labels=[1, 2, 3, 4, 5]
).astype(int)

rfm["M_score"] = pd.qcut(
    rfm["monetary"],
    5,
    labels=[1, 2, 3, 4, 5]
).astype(int)
The snapshot date is always derived dynamically as the latest sale_date in the dataset plus one day. This ensures recency is computed consistently regardless of when the pipeline runs.

Customer Segments

After scoring, each customer is assigned to exactly one segment by the segment_customer function, which applies priority-ordered conditions. A customer matches the first rule that applies; all remaining customers fall into Others.
def segment_customer(row):

    if row["R_score"] >= 4 and \
       row["F_score"] >= 4 and \
       row["M_score"] >= 4:

        return "Champions"

    elif row["F_score"] >= 4:

        return "Loyal Customers"

    elif row["R_score"] >= 4:

        return "Potential Loyalists"

    elif row["R_score"] <= 2 and \
         row["F_score"] >= 3:

        return "At Risk"

    else:
        return "Others"


rfm["segment"] = rfm.apply(
    segment_customer,
    axis=1
)

Champions

Rule: R_score ≥ 4 AND F_score ≥ 4 AND M_score ≥ 4Customers who purchased recently, purchase frequently, and generate the highest revenue. The brand’s most valuable cohort — ideal targets for loyalty rewards, early product access, and brand ambassador programmes.

Loyal Customers

Rule: F_score ≥ 4 (any R and M)High-frequency buyers regardless of recency or spend level. They return consistently and respond well to subscription offers, bulk discounts, and frequency-based rewards.

Potential Loyalists

Rule: R_score ≥ 4 (any F and M)Recently active customers who have not yet reached high purchase frequency. With the right nurture campaigns — such as onboarding sequences or second-purchase incentives — they can be converted into Loyal Customers.

At Risk

Rule: R_score ≤ 2 AND F_score ≥ 3Previously frequent buyers whose purchasing activity has dropped off significantly. Win-back campaigns, personalised re-engagement offers, and lapsed-customer discounts are recommended for this group.
Customers who do not meet the criteria for any of the four named segments are labelled Others. This group is typically low-frequency, low-recency, and low-spend — occasional buyers who have not yet established a pattern.

Output Table: customer_rfm

The final scored DataFrame is written to PostgreSQL using to_sql with if_exists="replace", which drops and recreates the table on every pipeline run to keep scores current.
rfm.to_sql(
    "customer_rfm",
    engine,
    if_exists="replace",
    index=False
)
The customer_rfm table contains one row per customer with the following columns:
ColumnTypeDescription
customer_idtext / integerUnique customer identifier, matches dim_customers.customer_id
recencyintegerDays since the customer’s last purchase relative to the snapshot date
frequencyintegerTotal number of purchase transactions in fact_sales
monetarynumericTotal cumulative revenue generated by the customer
R_scoreintegerRecency quantile score (1–5); 5 = most recent
F_scoreintegerFrequency quantile score (1–5); 5 = most frequent
M_scoreintegerMonetary quantile score (1–5); 5 = highest spend
segmenttextAssigned segment label: Champions, Loyal Customers, Potential Loyalists, At Risk, or Others
Connect the customer_rfm table directly to Power BI using the PostgreSQL connector and use the segment column as a slicer. This lets marketing teams filter any dashboard visual — revenue, product mix, regional map — by customer segment without writing additional SQL.

Build docs developers (and LLMs) love