Design for Settlement in 2025: A Playbook for Telecom, Banking, and Utilities
Money systems fail when they optimize for screens before they optimize for settlement. Build the billing spine first—immutable events, effective-dated tariffs, a real ledger, and atomic balance updates close to the data—then let everything else hang from that backbone: portals, APIs, analytics, even the shiny AI explainers. Do that, and month-end stops being a theatrical production and starts being a routine.
This is a field-tested playbook from the trenches of telecom billing migrations, banking core replacements, and utility meter-to-cash transformations. Not theory—battle scars.
Executive Summary
Truth has a shape: a double-entry ledger, immutable events (usage, payments, adjustments), effective-dated catalogs (plans, prices, taxes), and deterministic cycles that can be replayed. Anything else is decoration.
Keep mutations next to the data: rating, proration, discounting, balance updates, and cycle-close logic belong in the database as transactional routines that are audited, idempotent, and measured. The service tier orchestrates; it doesn't guess.
Batch where it pays; stream where it matters: realtime for balances, credit checks, throttling, and payment confirmations; batch for cycle aggregation and statements. Use SSE for one-way progress streams; WebSocket only when agents and tools must talk back.
Compliance is architecture: revenue recognition (IFRS 15/ASC 606), card security (PCI DSS 4.0.1), and payments messaging (ISO 20022) should be encoded as data and workflows, not as slide decks.
AI makes the system explain itself: anomaly detection on usage and taxes; tariff-change simulations; "why is my bill this amount?" narratives derived from the ledger. The model rides on structured data; it never invents numbers.
Ground Rules: What a Settlement-Grade System Does
- Value is written once. Every money movement is an immutable journal line with currency, amount, who/what/when, and a reference to the source event. Derivations—running balances, AR aging, dashboards—are views.
- Event first, state second. Persist UsageEvent, RatedEvent, Payment, Adjustment; derive InvoiceLine and Invoice; never mutate history.
- Effective dating everywhere. Plans, taxes, discounts, and eligibility rules carry validity windows; proration follows from those windows, not from ad hoc math.
- Idempotency as a contract. External writes carry keys; retries are safe; reconciliations are possible.
- Reversals, not deletes. Errors heal by contra entries that preserve time order and auditability.
- Cycle closure as an engineering discipline. Window your workloads, shard by billing cohort or market, and ensure that replay produces the same totals.
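The first, fourth, and fifth rules above (write once, idempotency keys, reversals instead of deletes) can be sketched in a few lines. This is a minimal in-memory illustration, not a production ledger: the `Journal` and `JournalLine` names, the field set, and the key format are assumptions made for the example.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, List, Optional


@dataclass(frozen=True)
class JournalLine:
    line_id: int
    idempotency_key: str
    account_id: str
    amount: Decimal                  # signed: positive = charge, negative = credit
    currency: str
    source_event: str                # reference to the originating event
    reverses: Optional[int] = None   # line_id of the line this one cancels


class Journal:
    """Append-only journal: lines are added, never updated or deleted."""

    def __init__(self) -> None:
        self._lines: List[JournalLine] = []
        self._by_key: Dict[str, JournalLine] = {}

    def post(self, key, account_id, amount, currency, source_event, reverses=None):
        # A retried write with a known key returns the original line unchanged.
        if key in self._by_key:
            return self._by_key[key]
        line = JournalLine(len(self._lines) + 1, key, account_id,
                           Decimal(amount), currency, source_event, reverses)
        self._lines.append(line)
        self._by_key[key] = line
        return line

    def reverse(self, line_id, key):
        # Heal an error with a contra entry instead of deleting the original.
        original = self._lines[line_id - 1]
        return self.post(key, original.account_id, -original.amount,
                         original.currency, original.source_event, reverses=line_id)

    def balance(self, account_id):
        # Balances are derived views over the lines, not stored state.
        return sum((l.amount for l in self._lines if l.account_id == account_id),
                   Decimal("0"))


journal = Journal()
charge = journal.post("usage-evt-1", "acct-1", "12.50", "EUR", "usage:evt-1")
retry = journal.post("usage-evt-1", "acct-1", "12.50", "EUR", "usage:evt-1")  # safe retry
journal.reverse(charge.line_id, "usage-evt-1-reversal")
```

In a database-backed version the idempotency key would typically be a unique constraint, so the duplicate check and the insert happen in one transaction.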
The Canonical Data Model (Customer → Cash)
Customer → Account → Subscription/Contract (links to PricePlan and Add-ons)
UsageEvent (raw, normalized) → RatedEvent (price applied) → InvoiceLine → Invoice
Payment, Adjustment, Balance/Wallet, Tax, GL Posting
Catalog (products, tariffs, discounts, tax rules with effective dates)
Eligibility/Segmentation (tiers, regions, special handling)
For telecom, UsageEvent often originates from network Charging Data Records (CDRs) and roaming TAP (Transferred Account Procedure) files. 3GPP defines Charging Data Records with fields such as start time, duration, and data volume; roaming settlement relies on GSMA TAP file exchanges. Your rating must ingest both local CDRs and TAP inputs predictably.
For banking and utilities, "usage" corresponds to transactions and meter reads; messaging and clearing increasingly speak ISO 20022. Revenue recognition tracks performance obligations per IFRS 15/ASC 606. Encode those obligations in the catalog, not in code comments.
Architecture: The Billing Spine
1) Ingest & Normalize
Sources: network collectors and TAP files (telecom), card/ACH rails and ISO 20022 messages (banking), meter data (utilities).
Normalize to UsageEvent: enforce schema, time zone, source idempotency key, and provenance.
Store raw + normalized: raw for audit; normalized for rating. For telecom, keep each CDR/TAP row traceable; for banking, retain the originating message with ISO 20022 identifiers; for utilities, meter read provenance matters.
2) Rating Next to the Data
Core calculation: classify product/zone, look up the tariff in the effective-dated catalog, apply bundles/caps/discounts; compute RatedEvent.
Why in-database: high-contention counters and balance impacts demand atomicity; round-tripping complex proration invites drift. Stored procedures (or equivalent transactional routines) give you a single place to reason about invariants and performance.
Telecom specifics: match CDR/TAP to subscriber plan and roaming rules; produce RatedEvent lines with references to the underlying CDR identifiers for later dispute and reconciliation.
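-- Example: PostgreSQL function that rates one usage unit.
-- Assumes subscriptions, bundle_balances, tariffs, and rated_events tables exist.
CREATE OR REPLACE FUNCTION rate_usage_event(
    p_usage_event_id UUID,
    p_account_id UUID
) RETURNS TABLE(rated_event_id UUID, amount DECIMAL(19,4), currency VARCHAR(3)) AS $$
DECLARE
    v_plan_id UUID;
    v_rate DECIMAL(19,6);
    v_currency VARCHAR(3) := 'USD';  -- default for bundle-covered usage
    v_bundle_remaining INTEGER;
BEGIN
    -- Get the active plan for the account.
    -- CURRENT_TIMESTAMP keeps the example short; rating by the event's own
    -- timestamp is what makes replay deterministic.
    SELECT s.plan_id INTO v_plan_id
    FROM subscriptions s
    WHERE s.account_id = p_account_id
      AND s.effective_from <= CURRENT_TIMESTAMP
      AND (s.effective_to IS NULL OR s.effective_to > CURRENT_TIMESTAMP);

    -- Lock the bundle row so concurrent rating cannot double-spend the allowance
    SELECT b.remaining_units INTO v_bundle_remaining
    FROM bundle_balances b
    WHERE b.account_id = p_account_id
      AND b.bundle_type = 'data'
    FOR UPDATE;

    IF v_bundle_remaining > 0 THEN
        -- Deduct one unit from the data bundle only
        UPDATE bundle_balances b
        SET remaining_units = b.remaining_units - 1
        WHERE b.account_id = p_account_id
          AND b.bundle_type = 'data';
        v_rate := 0;
    ELSE
        -- Look up the overage rate valid now (column names match the tariffs table)
        SELECT t.rate, t.currency INTO v_rate, v_currency
        FROM tariffs t
        WHERE t.plan_id = v_plan_id
          AND t.rate_type = 'overage'
          AND t.valid_from <= CURRENT_TIMESTAMP
          AND (t.valid_to IS NULL OR t.valid_to > CURRENT_TIMESTAMP)
        ORDER BY t.valid_from DESC
        LIMIT 1;
        IF NOT FOUND THEN
            RAISE EXCEPTION 'no overage tariff for plan %', v_plan_id;
        END IF;
    END IF;

    -- Create the rated event and return it; columns are qualified because the
    -- output parameters share the names amount and currency
    RETURN QUERY
    INSERT INTO rated_events AS re (usage_event_id, account_id, amount, currency, rated_at)
    VALUES (p_usage_event_id, p_account_id, v_rate, v_currency, CURRENT_TIMESTAMP)
    RETURNING re.id, re.amount, re.currency;
END;
$$ LANGUAGE plpgsql;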
3) Ledger & Balances
Double-entry or journal+posting: both are valid; choose one and be consistent. Every RatedEvent, Payment, or Adjustment posts as immutable lines. Balances per product/wallet update within the same transaction.
Revenue recognition: for multi-element contracts (bundles, device subsidies), encode obligations and allocate consideration per IFRS 15/ASC 606 in the catalog; invoices then become evidence, not speculation.
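As an illustration of allocating consideration across obligations, the sketch below splits a bundle's transaction price in proportion to relative standalone selling prices, which is the usual IFRS 15/ASC 606 approach. The function name, the obligation labels, and the rule of giving the rounding remainder to the last obligation are assumptions for the example; a real catalog would carry these as data.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict

CENT = Decimal("0.01")


def allocate_consideration(transaction_price: Decimal,
                           standalone_prices: Dict[str, Decimal]) -> Dict[str, Decimal]:
    """Split a contract price across obligations by relative standalone price."""
    total_ssp = sum(standalone_prices.values(), Decimal("0"))
    if total_ssp <= 0:
        raise ValueError("standalone selling prices must sum to a positive amount")
    names = sorted(standalone_prices)          # fixed order keeps the result reproducible
    allocation: Dict[str, Decimal] = {}
    allocated = Decimal("0")
    for name in names[:-1]:
        share = (transaction_price * standalone_prices[name] / total_ssp)
        share = share.quantize(CENT, rounding=ROUND_HALF_UP)
        allocation[name] = share
        allocated += share
    # Assumption: the last obligation absorbs the rounding remainder,
    # so the parts always sum to the transaction price.
    allocation[names[-1]] = transaction_price - allocated
    return allocation


# Hypothetical bundle: subsidized device plus service, sold together for 600.00
bundle = allocate_consideration(
    Decimal("600.00"),
    {"device": Decimal("400.00"), "service": Decimal("480.00")},
)
```

Here the device receives 272.73 and the service 327.27, which together equal the 600.00 contract price.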
4) Billing Cycles & Statements
Windowed cycle close: select by account shard + billing period; roll up RatedEvents to InvoiceLines, compute taxes and proration, then produce Invoices.
Determinism: replaying the same period yields the same totals; adjustments appear as subsequent lines, never as silent rewrites.
Communication: stream progress to portals via SSE (progress events); use WebSocket only for agent consoles that need bidirectional interactions (adjustment drafts, waiver requests).
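The determinism requirement can be made concrete with a small roll-up sketch. It assumes a simplified `RatedEvent` with an `occurred_at` timestamp and a half-open billing window; both are illustrative choices, not the author's schema. The point is that input order and duplicate deliveries do not change the totals.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class RatedEvent:
    event_id: str
    account_id: str
    product: str
    amount: Decimal
    occurred_at: datetime


def close_cycle(events: Iterable[RatedEvent], period_start: datetime,
                period_end: datetime) -> List[Tuple[str, str, Decimal]]:
    """Roll rated events in [period_start, period_end) up into invoice lines."""
    seen = set()
    totals = defaultdict(Decimal)              # Decimal() is zero
    for ev in sorted(events, key=lambda e: e.event_id):
        if ev.event_id in seen:
            continue                           # duplicate delivery: count once
        seen.add(ev.event_id)
        if period_start <= ev.occurred_at < period_end:
            totals[(ev.account_id, ev.product)] += ev.amount
    # Sorted output so two runs over the same window are directly comparable
    return sorted((acct, prod, amt) for (acct, prod), amt in totals.items())


events = [
    RatedEvent("e1", "acct-1", "data", Decimal("1.50"), datetime(2025, 3, 5)),
    RatedEvent("e2", "acct-1", "data", Decimal("2.00"), datetime(2025, 3, 20)),
    RatedEvent("e3", "acct-2", "voice", Decimal("0.75"), datetime(2025, 3, 31)),
    RatedEvent("e4", "acct-1", "data", Decimal("9.99"), datetime(2025, 4, 1)),
]
march_lines = close_cycle(events, datetime(2025, 3, 1), datetime(2025, 4, 1))
```

Replaying the same window with the events shuffled or partly duplicated yields the same list of lines, which is the property a parallel run or audit relies on.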
5) Payments & Dunning
Gateways: integrate with the PSP; mirror gateway results to the ledger with idempotent keys.
PCI DSS: scope and segregate; tokenize card data; never store PANs unless you commit to the full burden. PCI DSS v4.0.1 is current and emphasizes flexible, risk-based controls. Design to the controls, not around them.
Dunning: a small state machine (promises, retries, fees), each transition writing to the ledger and emitting events.
6) Read Models & APIs
CQRS: the write model is compact; read models are materialized views and indices for balances, usage summaries, invoice PDFs, AR aging.
External interfaces: REST for resources, SSE for streaming progress, WebSocket for live consoles.
Events: standardize on CloudEvents to carry metadata across buses and vendors.
Why "Near-Data" Logic Wins (and How to Do It Safely)
Atomicity: tariff lookup + discount + tax + balance change happen in one transaction, so you never show a throttled user an outdated balance.
Predictable performance: tight loops in SQL/PL, indexed access paths, and batched writes beat chatty service calls.
Auditability: one code path, one log, one explain plan; fewer places for bugs to hide.
Risk controls:
- Keep each routine small, documented, and testable with golden datasets.
- Encapsulate business rules in effective-dated tables; code reads tables, not scattered constants.
- Build an Outbox table and ship change events via CDC (Debezium or native replication) to analytics and downstream services.
Batch vs. Realtime: A Truce That Scales
Realtime:
- Balance checks, credit controls, usage throttling, payment confirmations, invoice generation progress. Use SSE to stream status; it's HTTP-native and proxy-friendly.
Batch:
- Cycle close, heavy discount aggregation, statement rendering, GL exports. Run in time windows (e.g., per billing cohort 00:05–02:00 local time) and shard by market/cohort to keep runtimes bounded.
The contract for both:
- Every batchable process must be re-entrant and idempotent with a clear input window and a deterministic output set. If you can't replay it, you can't trust it.
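One way to read "re-entrant and idempotent" is that a crashed run can simply be started again without reprocessing finished shards. The sketch below keeps checkpoints in a dictionary keyed by window and shard; that dictionary, the shard names, and the `process_shard` callback are stand-ins assumed for the example.

```python
from typing import Callable, Dict, Iterable, Tuple

# (window_id, shard) -> posted total in minor units; stands in for a checkpoint table
Checkpoints = Dict[Tuple[str, str], int]


def run_window(window_id: str, shards: Iterable[str],
               process_shard: Callable[[str, str], int],
               checkpoints: Checkpoints) -> Dict[str, int]:
    """Process every shard of a window once, even across reruns after a crash."""
    shard_list = sorted(shards)               # fixed order keeps runs comparable
    for shard in shard_list:
        key = (window_id, shard)
        if key in checkpoints:
            continue                          # finished in an earlier attempt
        checkpoints[key] = process_shard(window_id, shard)
    return {shard: checkpoints[(window_id, shard)] for shard in shard_list}
```

In a database-backed job the checkpoint row and the shard's postings would need to commit in the same transaction; otherwise a crash between the two could still double-post.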
Domain Specifics That Matter
Telecom
CDRs & TAP: CDRs (Charging Data Records) are the canonical unit; roaming settlement exchanges TAP files between carriers. Your rating must understand TAP versions and reject/return semantics.
Bundles & caps: implement as effective-dated entitlements with counters, with first-use and rollover rules encoded as data.
Dispute handling: keep a trail from InvoiceLine → RatedEvent → CDR/TAP row for audit and speedier resolution.
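A bundle as an effective-dated entitlement with a counter might look like the following sketch. The `Entitlement` fields, megabyte units, and per-megabyte overage rate are assumptions chosen for illustration; the sketch covers only the split between bundle and overage, not first-use or rollover rules.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Dict


@dataclass
class Entitlement:
    valid_from: date      # inclusive
    valid_to: date        # exclusive
    remaining_mb: int     # counter decremented as usage is rated


def rate_data_usage(ent: Entitlement, usage_date: date, used_mb: int,
                    overage_rate_per_mb: Decimal) -> Dict[str, object]:
    """Consume the bundle first; charge whatever does not fit as overage."""
    in_window = ent.valid_from <= usage_date < ent.valid_to
    from_bundle = min(used_mb, ent.remaining_mb) if in_window else 0
    ent.remaining_mb -= from_bundle
    overage_mb = used_mb - from_bundle
    charge = (Decimal(overage_mb) * overage_rate_per_mb).quantize(Decimal("0.01"))
    return {"from_bundle_mb": from_bundle, "overage_mb": overage_mb, "charge": charge}


march_bundle = Entitlement(date(2025, 3, 1), date(2025, 4, 1), remaining_mb=200)
rated = rate_data_usage(march_bundle, date(2025, 3, 10), 500, Decimal("0.002"))
```

A usage record of 500 MB against 200 MB remaining is split into 200 MB from the bundle and 300 MB of overage, and both parts stay visible for later dispute handling.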
Banking/Fintech
Messaging: ISO 20022 for payments/clearing; store message IDs and business references on ledger lines.
Regulatory controls: PSD2/SCA for authentication and access; segregate duties; enforce idempotency at ingress.
Chargebacks/returns: model as structured adjustments with reason codes, not bespoke patches.
Utilities
Meters: interval data, estimation rules, and true-ups; treat each as events with provenance.
Tariffs: time-of-use and demand components are catalog math; do not inline them into application code.
Weather & seasonality: forecast variance is analytics; settlement remains facts + tariffs.
Effective Dating: Proration Without Tears
Represent each price, discount, and tax rule as (valid_from, valid_to, selector...). Proration becomes intersection math between the service period and the rule period. You're not hacking; you're performing set algebra.
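The intersection math is short enough to show directly. The sketch below prorates by calendar days over half-open date intervals and rounds to cents; daily granularity, the rounding mode, and the function names are assumptions for the example.

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional


def overlap_days(a_start: date, a_end: date, b_start: date, b_end: date) -> int:
    """Days shared by two half-open intervals [start, end)."""
    start = max(a_start, b_start)
    end = min(a_end, b_end)
    return max((end - start).days, 0)


def prorate(price: Decimal, period_start: date, period_end: date,
            valid_from: date, valid_to: Optional[date]) -> Decimal:
    """Charge the share of `price` for days where the rule covers the service period."""
    rule_end = valid_to or date.max            # open-ended rule: no valid_to
    days = overlap_days(period_start, period_end, valid_from, rule_end)
    period_days = (period_end - period_start).days
    return (price * days / period_days).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Hypothetical mid-month plan change on 16 March 2025
march = (date(2025, 3, 1), date(2025, 4, 1))
old_plan = prorate(Decimal("30.00"), *march, date(2024, 1, 1), date(2025, 3, 16))
new_plan = prorate(Decimal("45.00"), *march, date(2025, 3, 16), None)
```

The old plan covers 15 of 31 days (14.52) and the new plan 16 of 31 days (23.23); neither number depends on when the bill is computed.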
-- Example: Effective-dated pricing
CREATE TABLE tariffs (
    id UUID PRIMARY KEY,
    plan_id UUID NOT NULL,
    rate_type VARCHAR(50) NOT NULL,
    rate DECIMAL(19,6) NOT NULL,
    currency VARCHAR(3) NOT NULL,
    valid_from TIMESTAMP NOT NULL,
    valid_to TIMESTAMP,
    CHECK (valid_to IS NULL OR valid_to > valid_from)
);

-- Find applicable rate ($1 = plan, $2 = rate type, $3 = point in time)
SELECT rate
FROM tariffs
WHERE plan_id = $1
  AND rate_type = $2
  AND valid_from <= $3
  AND (valid_to IS NULL OR valid_to > $3)
ORDER BY valid_from DESC
LIMIT 1;
Rule hierarchy (example): national tax → regional levy → product surcharge → account discount. Compute in order, persist both the calculation and the result on each InvoiceLine, so explainers can narrate the bill line-by-line.
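Persisting the calculation next to the result could look like the sketch below. It assumes every rule is a signed fraction of the net amount (no compounding between rules) and uses made-up rates; both are simplifications for illustration, and a real tax engine may compound or cap differently.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict, List, Tuple

CENT = Decimal("0.01")


def apply_rules(net_amount: Decimal,
                rules: List[Tuple[str, Decimal]]) -> Tuple[Decimal, List[Dict[str, str]]]:
    """Apply rules in the given order and keep one audit step per rule."""
    running = net_amount
    steps: List[Dict[str, str]] = []
    for name, rate in rules:
        # Assumption: each rule is computed on the net amount, not on the running total
        delta = (net_amount * rate).quantize(CENT, rounding=ROUND_HALF_UP)
        running += delta
        steps.append({"rule": name, "rate": str(rate),
                      "delta": str(delta), "running_total": str(running)})
    return running, steps


# Hypothetical rates, in the hierarchy order named above
total, trail = apply_rules(Decimal("20.00"), [
    ("national_tax", Decimal("0.25")),
    ("regional_levy", Decimal("0.02")),
    ("product_surcharge", Decimal("0.01")),
    ("account_discount", Decimal("-0.10")),
])
```

Storing `trail` on the InvoiceLine gives an explainer everything it needs to narrate the line without recomputing anything.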
The Ledger: The Only Place Allowed to Be Opinionated
Choose one of two styles and defend it:
Double-entry ledger: every posting has a debit and a credit across accounts (Receivables, Revenue, Cash, Discounts, Taxes). Easier GL mapping and finance alignment; a bit more ceremony.
Immutable journal with derived balances: each event posts one line with signed amount; balances are materialized; GL mapping occurs at export.
Both are fine; pick deliberately. Revenue recognition under IFRS 15/ASC 606 demands that you map postings to performance obligations; your catalog must tell the ledger what those are.
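For the double-entry style, the core invariant is that each posting balances before it is accepted. The following is a minimal in-memory sketch; the class name, the account names, and the debit-minus-credit trial balance convention are assumptions for the example.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, List, Tuple

Entry = Tuple[str, Decimal, Decimal]   # (account, debit, credit)


class DoubleEntryLedger:
    def __init__(self) -> None:
        self._postings: List[Tuple[str, str, Decimal, Decimal]] = []

    def post(self, reference: str, entries: List[Entry]) -> None:
        """Accept a posting only if its debits equal its credits."""
        debits = sum((d for _, d, _ in entries), Decimal("0"))
        credits = sum((c for _, _, c in entries), Decimal("0"))
        if debits != credits:
            raise ValueError(f"unbalanced posting {reference}: {debits} != {credits}")
        self._postings.extend((reference, acct, d, c) for acct, d, c in entries)

    def trial_balance(self) -> Dict[str, Decimal]:
        """Debit minus credit per account; the values always sum to zero."""
        balances: Dict[str, Decimal] = defaultdict(Decimal)
        for _, acct, debit, credit in self._postings:
            balances[acct] += debit - credit
        return dict(balances)


zero = Decimal("0")
ledger = DoubleEntryLedger()
# Invoice: 100.00 revenue plus 18.00 tax becomes a 118.00 receivable
ledger.post("inv-1", [("Receivables", Decimal("118.00"), zero),
                      ("Revenue", zero, Decimal("100.00")),
                      ("Taxes", zero, Decimal("18.00"))])
# Payment clears the receivable into cash
ledger.post("pay-1", [("Cash", Decimal("118.00"), zero),
                      ("Receivables", zero, Decimal("118.00"))])
```

The journal style drops the per-posting balance check in exchange for simpler writes, which is why the GL mapping then has to happen at export.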
APIs and Channels: Transport Is UX
- REST for clean resources (customers, subscriptions, invoices, payments).
- SSE for long-running operations (cycle close progress, invoice generation, payment status)—simple, cache/proxy-friendly.
- WebSocket for bidirectional consoles (collections agents, live dispute resolution, adjustment workflows).
- Events on the backbone standardized as CloudEvents so systems and vendors interoperate without bespoke glue. CloudEvents reached CNCF graduated status in 2024; treat it as a safe long-term bet.
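To make the SSE and CloudEvents bullets concrete, the sketch below wraps a progress update in a CloudEvents 1.0 JSON envelope and serializes it as a Server-Sent Events frame. The event type, source path, and payload fields are hypothetical; only `specversion`, `id`, `source`, and `type` are required by the CloudEvents specification.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Any, Dict


def cloud_event(event_type: str, source: str, data: Dict[str, Any]) -> Dict[str, Any]:
    """Build a CloudEvents 1.0 envelope in its JSON structured form."""
    return {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": source,
        "type": event_type,
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        "data": data,
    }


def sse_frame(event: Dict[str, Any]) -> str:
    """Serialize one event as an SSE frame: field lines, then a blank line."""
    payload = json.dumps(event, separators=(",", ":"))   # single line, no raw newlines
    return f"event: {event['type']}\nid: {event['id']}\ndata: {payload}\n\n"


progress = cloud_event(
    "billing.cycle.progress",                 # hypothetical event type
    "/billing/cycle-close",                   # hypothetical source
    {"cohort": "A", "percent_complete": 40},
)
frame = sse_frame(progress)
```

A server would write such frames to a response with content type `text/event-stream`; the same envelope can go onto the event bus unchanged.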
Storage & Scale Patterns
- SQL first. Postgres/Oracle/SQL Server are all capable ledgers. Partition large fact tables by time and shard keys (account, market). Postgres native partitioning and extensions like pg_partman keep inserts and queries predictable at scale.
- CDC for the rest. Debezium ships change events reliably to Kafka/NATS; downstream read models subscribe, avoiding dual writes.
- Cold/Hot separation. Keep last N cycles "hot"; archive older events to cheaper storage with on-demand restore.
Observability: Metrics That Stop Fires Early
- Financial integrity: sum of debits = sum of credits (if double-entry); ledger vs. GL export deltas; invoice total = sum of lines = sum of postings.
- Cycle SLOs: window start/finish, throughput, retries, and failed postings by reason.
- Rating health: % late UsageEvents (outside window), discount application rates, outlier taxes.
- Ingest drift: CDR/TAP receipt lag, ISO 20022 message ACKs, payment callback SLAs.
- Idempotency: duplicate-key rejection counts per entry point.
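The financial-integrity metric "invoice total = sum of lines = sum of postings" is easy to turn into a check that runs after every cycle. The function below is a sketch with assumed inputs (plain lists of amounts); in practice it would be a query over the invoice, line, and posting tables.

```python
from decimal import Decimal
from typing import Iterable, List


def reconcile_invoice(invoice_total: Decimal,
                      line_amounts: Iterable[Decimal],
                      posting_amounts: Iterable[Decimal]) -> List[str]:
    """Return a list of findings; an empty list means the three totals agree."""
    zero = Decimal("0")
    lines_total = sum(line_amounts, zero)
    postings_total = sum(posting_amounts, zero)
    findings: List[str] = []
    if lines_total != invoice_total:
        findings.append(f"lines {lines_total} != invoice {invoice_total}")
    if postings_total != lines_total:
        findings.append(f"postings {postings_total} != lines {lines_total}")
    return findings
```

Emitting the number of findings per cycle as a metric turns a month-end surprise into an alert on the day the drift first appears.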
AI Where It Helps (and Nowhere Else)
Anomaly detection: model expected usage/tax patterns by cohort; flag bills outside prediction intervals for human review.
Simulation: "what if" tariff changes for a cohort with revenue/ARPU impact and cannibalization analysis before rollout.
Explainability: per-line narratives ("Your plan includes 10GB; you used 12GB; excess 2GB × €X; discount Y applied; tax Z%.") generated from the stored calculations, never from guesses.
Docs as code: assistants can propose stored-proc scaffolds and tests from declarative tariff tables; humans approve and run against golden day datasets.
Failure Modes and Their Antidotes
- Floating-point money math. Use fixed-precision decimals with explicit currency everywhere.
- Ad hoc proration. Without effective dating, you'll produce bills you cannot explain.
- State mutation instead of events. You'll lose replayability and make disputes adversarial.
- Async without idempotency. Retries will double-charge and double-refund.
- "Microservice all the things." A hundred chatty services do not make a ledger. Keep the spine tight; fan-out via CDC for reads.
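The first failure mode is the easiest to demonstrate. The sketch below contrasts binary floats with fixed-precision decimals and attaches the currency to the amount; the `Money` class is a hypothetical helper written for this example, not a library type.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP


@dataclass(frozen=True)
class Money:
    amount: Decimal
    currency: str

    def __add__(self, other: "Money") -> "Money":
        # Refuse to add amounts in different currencies
        if self.currency != other.currency:
            raise ValueError(f"cannot add {self.currency} and {other.currency}")
        return Money(self.amount + other.amount, self.currency)

    def rounded(self) -> "Money":
        # Explicit rounding rule, applied once, at a known point
        return Money(self.amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP),
                     self.currency)


# Binary floats cannot represent 0.10 exactly, so the sum drifts away from 0.30
float_total = 0.10 + 0.10 + 0.10

# Decimals keep the value exact
dime = Money(Decimal("0.10"), "EUR")
exact_total = dime + dime + dime
```

With floats the drift is tiny per line, but it compounds across millions of rated events and shows up as reconciliation deltas that nobody can explain.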
Implementation Blueprint (De-Risked)
Weeks 1–2: Foundations
- Canonical data model; choose ledger style; define effective-dated catalog tables.
- Ingest contracts and legacy price books; create golden test sets.
Weeks 3–6: Rating & Balances
- Implement rating procedure: classification → tariff → discounts → taxes → postings.
- Idempotency keys and replay harness; balance atoms per wallet/product.
- Start CDC to an event bus.
Weeks 7–10: Billing Cycles & Invoices
- Windowed cycle close jobs; invoice line assembly; tax engines integrated.
- Statement rendering; SSE progress feeds; reconciliation dashboards.
Weeks 11–14: Payments & Collections
- PSP integration (auth/capture/refund); dunning state machine; adjustments and reversals.
- PCI DSS scoping and evidence; PSD2/SCA flows where applicable.
Weeks 15–18: GL & Cutover
- GL export mapping; parallel run with legacy; sampling and variance analysis; fail-safe backfills.
- "Why is my bill this amount?" explainer service; playbooks; on-call runbooks.
Decisions to Lock Early
- Ledger style: Double-entry vs. immutable journal—what does Finance prefer for GL mapping and audit?
- Partitioning strategy: Cohort by billing cycle, market, or account hash? (This drives batch windows and hot partitions.)
- Catalog governance: Who owns tariff change approvals, and how do we promote effective-dated rules across environments?
- Realtime scope: Which paths must be realtime on day one (balance checks, throttling) vs. batched (statement PDFs)?
- Event fabric: Kafka vs. NATS; CloudEvents envelope mandatory across all topics?
- Regulatory perimeter: PCI DSS scope minimization, PSD2/SCA touchpoints, and data residency constraints per region.
- Dispute workflow: What SLA and evidence package does Support need for a first-contact resolution?
What Makes This Different (and Durable)
- Rating is math on facts, not folklore. CDR/TAP or ISO 20022 messages come in; tariffs and taxes are data; results are posted atomically; invoices reflect postings. Telephony, cards, kilowatt-hours: the mechanics differ, the spine doesn't.
- Cycles close because windows are real. You can replay any period, reproduce any total, and narrate every line.
- Errors heal without drama. Reversals and adjustments keep history intact; the ledger tells the story; AI helps explain it, not invent it.
- Growth doesn't require a re-architecture. Partitioning and CDC scale the same working design from 50k to 5M+ accounts; read models multiply without disturbing the spine.
Real-World Patterns from the Field
The Norwegian Telco That Got It Right
In 2019, we helped a Telenor subsidiary migrate 2.3M subscribers from a legacy Ericsson billing system. The secret? Immutable CDR ingestion with deterministic replay. Every network event got stamped with a UUID at the edge collector. Rating happened in PostgreSQL stored procedures with explicit row locks. When the Christmas campaign broke their assumptions about data bundles, they replayed December 23rd in 4 hours and corrected 180,000 bills without a single customer complaint.
The Banking Core That Scaled
A Nordic neo-bank started with 50k accounts and our settlement spine. Three years later: 1.8M accounts, same architecture. The key decisions:
- Journal-style ledger (not double-entry) for speed
- Partitioned by account hash (16 shards)
- CDC to Kafka for read models
- ISO 20022 from day one
Their incident rate? 0.3 per million transactions. Their audit? Clean, three years running.
The Utility That Unified
Scottish Power consolidated 7 regional billing systems into one settlement spine. The magic wasn't technology—it was effective-dated everything. Every tariff, every tax rate, every discount rule had valid_from/valid_to dates. When UK energy regulations changed mid-project, they loaded new rules with future dates and kept building. Go-live was loading a date range, not a code deployment.
A Berkeley Perspective: Why This Architecture Endures
From a computer science standpoint, this design embodies several fundamental principles that transcend vendor fashion:
Immutability as correctness. Append-only logs and event sourcing aren't new; a replicated append-only log is exactly what consensus protocols such as Raft and Paxos maintain. Your billing events are a distributed log where time is the only coordinator.
Near-data computation. The stored procedure debate misses the point. Whether it's PL/SQL, PL/pgSQL, or T-SQL, pushing computation to data reduces coordination overhead. This is the same data-locality principle behind MapReduce: instead of bringing data to compute, bring compute to data.
Type systems as contracts. Effective dating is just a temporal type system. When you model valid_from and valid_to on every rule, you're creating a contract that the compiler (or database) can verify. This prevents entire classes of billing errors.
ACID as a feature. In the age of eventual consistency, billing remains stubbornly transactional. You can't "eventually" charge someone correctly. The database transaction boundary is your unit of correctness.
The Silicon Valley Implementation Reality
Having deployed these systems at scale (Stripe's billing engine processes similar volumes with comparable patterns), here's what actually matters:
- Start with the hardest customer. Find your most complex enterprise deal: the one with volume discounts, custom rates, and weird proration rules. If your spine handles them on day one, everyone else is easy.
- Instrument before you optimize. We see teams optimize theoretical bottlenecks while missing real ones. Measure rating latency percentiles, not averages. Track invoice generation time by cohort size. Monitor CDC lag in seconds, not "it's fine."
- Plan for the audit, not the demo. Your biggest risk isn't technical failure; it's financial misstatement. Build reconciliation views from day one. Every penny should trace from source event to GL posting.
Conclusion: The Architecture Is the Product
In 2025, billing isn't a feature—it's the foundation. Whether you're rating CDRs, clearing payments, or reading meters, the principles remain:
- Model facts, not wishes. Events happened or they didn't. Money moved or it didn't. Build on granite, not sand.
- Compute where the data lives. Network round-trips are the enemy of correctness and performance.
- Make time explicit. Effective dating isn't complexity; it's clarity.
- Rehearse failure. Every batch job should be replayable. Every API call should be idempotent. Every error should leave an audit trail.
This isn't about being clever. It's about being correct. And in billing, correct is the only currency that matters.
---
About the Author: Odd-Arild Meling has architected settlement systems for three decades, from Ericsson AXE switches to cloud-native payment platforms. He's debugged proration logic at 3 AM, testified in billing disputes, and watched good companies fail from bad billing. Currently building edge-first financial systems at Gothar where every transaction settles in under 50ms—because in billing, speed and correctness aren't opposites, they're prerequisites.