Turning Sparse Relationship Data into a Complete Economic Map
How RedGraphs Estimates Hidden Money Flows Between Companies
Let’s start with a question we often hear from our clients:
“If only five percent of company relationships have actual reported dollar values, what’s the value of estimating the other ninety-five?”
It’s a great question, and the answer is what makes RedGraphs unique.
Every relationship in our dataset is real. Each one comes from public disclosures verified by S&P Global and represents a genuine customer and supplier link where money changes hands. For most relationships, the only missing piece is how much money changes hands.
Ignoring those 95% of links would be like trying to model the economy by looking only at companies that disclose their top customers. You would be missing almost all of the structure. Our job is to fill in the blanks, not by guessing, but by applying a rigorous mathematical method called Iterative Proportional Fitting (IPF). This patented approach lets us estimate the most likely money flows across the entire network in a way that is consistent, auditable, and grounded in financial reality.
We also ensure there is no forward-looking bias. Every estimate is calculated strictly based on the information available at that moment in time, just as it would have been seen by the market on that date. It is a true point-in-time network.
What We Are Estimating
Imagine the economy as a giant network of companies, each one a node, and every commercial relationship between them a link. Formally, let G = (V, E) be a directed graph where each edge (i, j) in E represents sales from supplier i to customer j, with payment flowing from j back to i. We want to estimate the dollar value xij ≥ 0 for every link.
We know some of these values directly, which we call x̂ij (x-hat), but most are missing. We also know each company’s total revenue and cost of goods sold. Revenue gives the row sums (what each supplier receives across all its customers) and spend gives the column sums (what each buyer pays across all its suppliers), and all flows must satisfy both. The goal is to fill in the unknowns in a way that is mathematically consistent with those totals.
Formally, we estimate X = {xij} such that it:
- Respects known values x̂ij for reported relationships
- Matches each company’s total revenue Ri and spend Cj
- Remains as close as possible to a prior belief qij about the intensity of each link (based on text, filings, or similarity)
Why This Matters for Investors
For quant funds and macro analysts, the benefits are immediate:
- Complete coverage. Every corporate connection is real. Estimating missing values reveals the true network topology and the hidden dependencies and second-order exposures that filings alone cannot show.
- Accounting consistency. IPF enforces hard balance constraints so portfolio-level exposures aggregate perfectly to reported financials.
- Signal discovery. Once you have a dollar-weighted network, you can trace shocks, measure concentration risk, and compute contagion and centrality metrics that were previously impossible.
- No forward bias. The entire network is built with point-in-time logic, meaning every historical snapshot reflects what was known on that date and is suitable for backtesting.
- Auditability. Every estimated flow can be traced to its constraints, priors, and convergence steps.
How IPF Works
Under the hood, IPF solves an optimization problem that looks like this:
\[ \min_{x_{ij}\ge 0} \sum_{(i,j)\in E} \left( x_{ij}\log\frac{x_{ij}}{q_{ij}} - x_{ij} + q_{ij} \right) \]
subject to the following linear constraints:
- \[\, x_{ij} = \hat{x}_{ij} \,\] for all known values
- \[\, \sum_j x_{ij} = R_i^{\ast} \,\] for each supplier’s total revenue
- \[\, \sum_i x_{ij} = C_j^{\ast} \,\] for each buyer’s total spend
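This objective minimizes the information divergence between the final flows xij and their priors qij, while forcing the totals to align with the companies’ reported financial statements. In statistics, this is called a Kullback–Leibler projection: among all matrices that satisfy the constraints, it selects the one closest to the prior. Under the standard log-linear interpretation of IPF, that projection corresponds to the maximum likelihood estimate of the full matrix X.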
For every unreported link, the solution has a simple multiplicative form:
\[ x_{ij}^{\star} = a_i\, b_j\, q_{ij} \]
where ai and bj are scaling multipliers chosen so that every row and column adds up to the right totals. The multipliers themselves have no closed-form expression, so IPF finds them by alternating between row and column adjustments until the totals match to within a chosen tolerance.
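The multiplicative form follows from the first-order conditions of the optimization problem. As a short sketch, introduce Lagrange multipliers λi for the row constraints and μj for the column constraints (these symbols are our notation for illustration, not part of the original formulation) and consider only the unreported links:
\[ \mathcal{L} = \sum_{(i,j)\in E} \left( x_{ij}\log\frac{x_{ij}}{q_{ij}} - x_{ij} + q_{ij} \right) - \sum_i \lambda_i \Big( \sum_j x_{ij} - R_i^{\ast} \Big) - \sum_j \mu_j \Big( \sum_i x_{ij} - C_j^{\ast} \Big) \]
Setting the derivative with respect to xij to zero gives
\[ \frac{\partial \mathcal{L}}{\partial x_{ij}} = \log\frac{x_{ij}}{q_{ij}} - \lambda_i - \mu_j = 0 \quad\Longrightarrow\quad x_{ij}^{\star} = e^{\lambda_i}\, e^{\mu_j}\, q_{ij} \]
so that ai = exp(λi) and bj = exp(μj). Each row-scaling or column-scaling pass of IPF can be read as updating one set of multipliers while holding the other fixed.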
The Algorithm in Simple Terms
- Start with what you know. Initialize xij = qij everywhere and lock in any actual reported values.
- Scale each supplier. Adjust rows so each company’s outgoing flows add up to its total revenue.
- Scale each buyer. Adjust columns so each company’s incoming flows add up to its total cost or spend.
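- Repeat until balance. Alternate the two scaling steps until row and column totals both match within a chosen tolerance. The result is the maximum-likelihood configuration of flows.
It is simple, elegant, and proven to converge to a unique solution regardless of network size, provided the constraints are mutually consistent (for example, reported values do not exceed a company’s totals, and the revenue and spend totals being matched add up to the same amount). In practice, we can scale this to millions of relationships in just a few passes.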
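The steps above can be sketched in a few lines of NumPy. This is a minimal illustration on a small dense matrix, not the production implementation: the function name `ipf_with_fixed`, its parameters, and the use of `nan` to mark unreported values are all assumptions made for the example. Reported flows are held fixed by subtracting them from the totals and balancing only the remaining cells.

```python
import numpy as np

def ipf_with_fixed(q, row_totals, col_totals, known=None, tol=1e-9, max_iter=1000):
    """Illustrative IPF: scale prior q to match totals, keeping reported cells fixed.

    q          : (n, m) non-negative priors; a zero means "no link" and stays zero
    row_totals : length-n supplier revenues
    col_totals : length-m buyer spends
    known      : (n, m) reported values, np.nan where unreported (hypothetical encoding)
    Returns (X, converged).
    """
    q = np.asarray(q, dtype=float)
    row_totals = np.asarray(row_totals, dtype=float)
    col_totals = np.asarray(col_totals, dtype=float)
    if known is None:
        known = np.full(q.shape, np.nan)
    known = np.asarray(known, dtype=float)

    fixed = ~np.isnan(known)
    fixed_vals = np.where(fixed, known, 0.0)

    # Residual totals: what is left to distribute after the reported flows.
    r = row_totals - fixed_vals.sum(axis=1)
    c = col_totals - fixed_vals.sum(axis=0)
    if (r < -tol).any() or (c < -tol).any():
        raise ValueError("reported flows exceed a row or column total")
    r = np.clip(r, 0.0, None)
    c = np.clip(c, 0.0, None)

    # Free cells start at the prior; fixed cells are excluded from scaling.
    x = np.where(fixed, 0.0, q)
    converged = False
    for _ in range(max_iter):
        # Row step: multiplier a_i = residual revenue / current row sum.
        row_sums = x.sum(axis=1)
        a = np.divide(r, row_sums, out=np.zeros_like(r), where=row_sums > 0)
        x *= a[:, None]
        # Column step: multiplier b_j = residual spend / current column sum.
        col_sums = x.sum(axis=0)
        b = np.divide(c, col_sums, out=np.zeros_like(c), where=col_sums > 0)
        x *= b[None, :]
        # Stop once both margins match within tolerance.
        row_err = np.abs(x.sum(axis=1) - r).max()
        col_err = np.abs(x.sum(axis=0) - c).max()
        if max(row_err, col_err) <= tol:
            converged = True
            break
    return x + fixed_vals, converged
```

For instance, calling it with priors `[[1, 2], [3, 4]]`, revenues `[30, 70]`, spends `[40, 60]`, and a single reported value of 10 in the top-left cell leaves only one feasible completion, `[[10, 20], [30, 40]]`. If the totals are inconsistent, the loop runs to `max_iter` and the function reports `converged = False` rather than returning a silently unbalanced matrix.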
Example
Suppose supplier A sells to two buyers, B and C. We know A’s total revenue is 100, and that one reported relationship is xAB = 20. We do not know xAC, but we have a prior belief that B accounts for 40% and C for 60% of A’s business.
With xAB locked at 20, the remaining 80 of A’s revenue must go to the only unreported link, so IPF converges immediately to xAC = 80, overriding the 60 that the prior alone would have implied. If we add more constraints, such as C’s total cost or sector averages, the algorithm refines the estimate to match all of them simultaneously. Multiply that logic across millions of companies and you get a coherent, data-driven picture of the global flow of money.
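The arithmetic of this single-supplier case can be checked with a few lines of Python. This is a sketch of the row-scaling step only. The helper `fill_row` is a hypothetical name, and the second scenario, with a third buyer D and its prior shares, is invented purely to show how the prior splits the residual when more than one link is unreported.

```python
def fill_row(revenue, reported, prior):
    """Spread the unreported residual over free buyers in proportion to the prior."""
    residual = revenue - sum(reported.values())
    free = {buyer: w for buyer, w in prior.items() if buyer not in reported}
    scale = residual / sum(free.values())  # plays the role of the row multiplier a_i
    flows = dict(reported)                 # reported values stay locked
    flows.update({buyer: scale * w for buyer, w in free.items()})
    return flows

# The example from the text: revenue 100, reported x_AB = 20, prior B 40% / C 60%.
flows = fill_row(100.0, {"B": 20.0}, {"B": 0.4, "C": 0.6})
print(round(flows["C"], 6))  # x_AC, equal to 80 up to floating-point rounding

# Hypothetical extension: a third buyer D, with assumed prior shares.
flows3 = fill_row(100.0, {"B": 20.0}, {"B": 0.4, "C": 0.45, "D": 0.15})
print({k: round(v, 6) for k, v in flows3.items()})  # residual 80 split 3:1 between C and D
```

In the two-buyer case the prior has no influence because only one link is free. In the three-buyer variant the residual of 80 is divided between C and D in the 0.45 to 0.15 ratio of their priors, which is where the prior starts to matter.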
Why Not Just Use Actual Values?
Because the economy does not stop where disclosure does. Real corporate networks are dense, interconnected systems. If we only used links with reported amounts, we would discard roughly 95% of the known relationships and, with them, most of the structure that explains how those systems function. IPF fills in the missing values in a way that respects all accounting and structural realities, creating a complete and self-consistent map of global dependencies.
That completeness is what allows RedGraphs to unlock signals such as supplier contagion, customer momentum, concentration risk, and sector fragility. These features consistently show alpha in our backtests on Russell 2000 and 3000 universes.
No Forward Bias: Point-in-Time Precision
All estimates are built using a strict two-axis point-in-time framework. One axis tracks when a document was published and first processed by our system, and the other records the periods covered in that document. This means you can build a network as of May 2025 using only documents known by that date, even if they reference earlier years. It is the same principle used in financial backtesting, where no future information is ever used.
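A two-axis filter of this kind might look like the following sketch. The record layout is an assumption made for illustration: `Disclosure`, `known_at`, `period_end`, and `snapshot` are hypothetical names, and the sample records are invented. Only the principle comes from the text, namely that a snapshot may use a document only if it was known on or before the as-of date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class Disclosure:
    supplier: str
    customer: str
    period_end: date          # axis 1: the period the document covers
    known_at: date            # axis 2: when it was published and first processed
    amount: Optional[float]   # reported dollar value, if any

def snapshot(records, as_of):
    """Return the latest-period record per link, using only documents known by as_of."""
    visible = [r for r in records if r.known_at <= as_of]
    latest = {}
    # Sort so that later periods (and later knowledge dates) overwrite earlier ones.
    for r in sorted(visible, key=lambda r: (r.period_end, r.known_at)):
        latest[(r.supplier, r.customer)] = r
    return latest

# Invented sample data: the FY2024 filing only becomes known in June 2025.
records = [
    Disclosure("A", "B", date(2023, 12, 31), date(2024, 3, 1), 20.0),
    Disclosure("A", "B", date(2024, 12, 31), date(2025, 6, 15), 25.0),
]
may_view = snapshot(records, date(2025, 5, 31))
july_view = snapshot(records, date(2025, 7, 1))
```

Here the May 2025 snapshot still sees the FY2023 figure, because the FY2024 document had not yet been processed on that date, while the July 2025 snapshot picks up the newer one. Filtering on `known_at` rather than `period_end` is what keeps later filings out of earlier snapshots.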
What This Unlocks
- Event propagation: Model how revenue surprises spread through supply chains.
- Concentration risk: Identify fragile networks or overexposed suppliers.
- Network momentum: Weight classic factors by upstream and downstream dollar centrality.
- Scenario testing: Simulate shocks, policy changes, or demand shifts in seconds.
Closing Thoughts
Estimating unknown values is not about guessing. It is about completing the picture. IPF gives us a mathematically sound, economically faithful way to infer hidden money flows between companies, producing a network that is both realistic and analytically powerful.
Once the network is complete, it becomes a living map of the global economy that lets you trace shocks, measure dependencies, and find alpha in the most unlikely places.
That is the beauty of RedGraphs: turning sparse public data into a coherent, measurable, and investable model of how money actually moves through the world.