METHODOLOGY & TRANSPARENCY

How the FOMC Engine Works

A complete, honest account of how this simulation produces its results — what it sees, how it reasons, how accurate it has been, and exactly where it can be wrong — so you can decide for yourself how much to trust it.


0 Start here

This tool runs a simulation of a Federal Open Market Committee (FOMC) meeting. It populates a virtual committee with language-model "personas" of the real participants, feeds them the same kind of economic information a real member would read before a meeting, lets them deliberate and vote, and reports the rate decision the simulated committee reached. When the meeting being simulated has already happened, we show the engine's call next to what the Fed actually did. When it hasn't happened yet, we deliberately show nothing for the real outcome — only the engine's forward-looking call.

The honest one-screen summary

What gives it credibility

On a 17-meeting backtest (Mar 2024 – Mar 2026) it called the direction correctly in 16 of 17 meetings (94.1%), with a mean absolute error of 1.5 basis points.
It uses only information that was available before each meeting — no peeking at the answer.
Every claim in this document is traceable to the actual code and to a dated evaluation snapshot.

What should temper your trust

Each result you see is one random draw of a stochastic process. Run it again and it can differ.
The headline accuracy is from one model configuration on 17 meetings; your own run, with your own provider, may not match it.
It is a research/educational simulation, not a forecast you should trade on.

How to read this document

Every chapter opens with a plain-language box like this one. If you just want the gist, read the boxes and the glossary. If you work in monetary policy or markets and want the mechanics, the detail below each box is for you. The two chapters that matter most for trust are §6 (track record) and §7 (limits).

1 What the engine is, and what it is not

In plain terms

Think of it as a flight simulator for an FOMC meeting. It is a careful imitation of how the committee reasons, not a crystal ball and not a feed from inside the Fed. It can be useful and still be wrong.

The claim, in one sentence

Given the economic information available before a particular FOMC meeting, the engine simulates the committee's deliberation and reports the policy decision that simulation reaches, together with a vote breakdown, a policy statement, and meeting minutes written in the committee's style.

What it is not

Not a statistical forecasting model. It does not regress rates on macro variables. It reasons in natural language through personas, which is a strength for realism and a weakness for reproducibility.
Not trained on the answer. The simulation for a given meeting is constructed only from information dated before that meeting. The real decision is never an input.
Not the real FOMC. The personas are models of real people, assembled from public biographies and statements. They are not the individuals, and they do not have access to the confidential staff materials (the Tealbook) that real members see.
Not a market or a prediction market. It does not aggregate trader money or crowd forecasts.
Not financial advice. See §7.

The intellectual lineage

The approach follows a line of research on using large-language-model agents to simulate committee decision-making — most directly the "FOMC in silico" dual-track framework of Kazinnik & Sinclair (2025), and earlier work by Park et al. (2022) and Horton (2023) on generative agents. This engine productizes the LLM-deliberation half of that research and runs it on live, point-in-time data.

2 What the committee sees: the inputs

In plain terms

Before the meeting, every simulated member is handed the same packet: the latest economic numbers, a current "state of the regions" report, recent news and speeches, and a reminder of what the Fed did last time. The quality of the answer depends entirely on the quality and timing of this packet.

2.1 Macro indicators

The committee receives a dashboard of core U.S. indicators covering growth and output, inflation and prices, the labor market, and financial conditions (for example real GDP, CPI and core PCE, the unemployment rate and nonfarm payrolls, the fed funds rate and Treasury yields). These are sourced from official series (FRED and BLS) by a separate daily pipeline and persisted in render-ready form. The application reads the persisted copy; it does not call FRED on the request path.

2.2 The synthetic Beige Book

Be aware

The regional economic narrative the committee reads is synthetic. It combines the most recent official Federal Reserve Beige Book with current web research across the twelve districts to produce an up-to-date assessment. It is a reasonable, sourced approximation — not an official Fed document — and it can drift from, or misread, conditions on the ground.

This design exists because the official Beige Book is published only eight times a year and is often weeks stale by meeting day; the synthetic addendum keeps the regional picture current. Where the official and synthetic assessments disagree, that disagreement is itself information the committee weighs.

2.3 News and speeches

The packet includes a curated set of recent economic headlines and the recent public speeches of the participants. These shape how a persona frames the trade-off between inflation and employment.

2.4 The prior meeting and the regime anchor

The engine inherits a policy regime from the previous meeting — easing, tightening, or neutral — derived first from the prior rate change (a cut of 5bp or more → easing; a hike of 5bp or more → tightening) and otherwise from explicit transition language in the prior statement, defaulting to neutral. This single anchor is, by the project's own evaluation, the largest accuracy lever: without it, a flat "hold is most likely" prior suppresses cuts during an active easing cycle.
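The rule above can be sketched as a small function. This is a minimal illustration, not the engine's code: the function name `infer_regime` and the transition phrases are hypothetical, and only the 5 bp thresholds and the neutral default come from the description.

```python
# Hypothetical transition phrases; the document does not list the real ones.
EASING_HINTS = ("begin reducing", "lower the target range")
TIGHTENING_HINTS = ("raise the target range", "additional firming")


def infer_regime(prior_change_bp: float, prior_statement: str = "") -> str:
    """Return 'easing', 'tightening', or 'neutral' for the next meeting."""
    # Step 1: the prior rate change decides if it is 5 bp or more either way.
    if prior_change_bp <= -5:
        return "easing"
    if prior_change_bp >= 5:
        return "tightening"
    # Step 2: otherwise look for explicit transition language.
    text = prior_statement.lower()
    if any(hint in text for hint in EASING_HINTS):
        return "easing"
    if any(hint in text for hint in TIGHTENING_HINTS):
        return "tightening"
    # Step 3: default.
    return "neutral"
```

Under these assumptions, a prior cut of 25 bp yields "easing" regardless of the statement text, and a prior hold with no transition language yields "neutral".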

2.5 Point-in-time discipline (why this matters for trust)

Why this matters

A backtest is only honest if the model cannot see the future. Each simulation is assembled from data dated before the meeting it simulates, and the real decision is fetched on a separate channel that the simulation never reads. This is what lets us claim the track record in §6 is free of look-ahead in its inputs rather than a product of hindsight. (§6 notes where the persona calibration still overlaps the test window.)
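One way to picture the point-in-time rule is as a filter on publication dates. The sketch below is an illustration under stated assumptions: the `Observation` type and the `point_in_time` helper are hypothetical names, and it assumes each data point carries the date it became public.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    series: str
    published: date  # when the figure became public, not the period it covers
    value: float


def point_in_time(observations: list[Observation], meeting_day: date) -> list[Observation]:
    """Keep only observations that were public strictly before the meeting."""
    return [obs for obs in observations if obs.published < meeting_day]
```

Filtering on the publication date rather than the reference period matters because a figure describing March may only be released in April.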

3 How the personas are built

In plain terms

Each seat at the table is a character sheet: who the person is, how hawkish or dovish they lean, how data-driven they are, and how often they have historically broken from the group. The engine builds these from public records, then steers each agent to stay in character through the whole meeting.

3.1 Who is in the room

The committee is modeled as a 19-participant body (seven Governors and twelve Reserve Bank presidents), with the twelve voting seats resolved per year according to the real FOMC rotation. Non-voting participants still speak in the deliberation, as they do in real meetings.

3.2 The source of truth for each persona

Personas are authored from a curated members database holding each participant's biography, tenure, education, district, and public policy record. The runtime does not read the database live; personas are exported to files the engine loads.

3.3 The four-axis trait vector

Each persona carries a calibrated four-axis vector that steers the agent's tone and vote:

hawkishness (−1 dove … +1 hawk)
data_dependence (0 … 1 — how much they move with incoming data)
guidance_weight (0 … 1 — how much they anchor on prior forward guidance)
dissent_propensity (0 … 1 — baseline willingness to break from consensus)

The vector is re-asserted to the agent immediately before it votes, because deliberation pressure across a dozen turns can otherwise push a minority-stance member toward the majority.
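The four axes and their ranges can be sketched as a small validated record. The class name `TraitVector` and the wording of the reminder text are assumptions for illustration; only the axis names and ranges come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraitVector:
    hawkishness: float         # -1 (dove) to +1 (hawk)
    data_dependence: float     # 0 to 1
    guidance_weight: float     # 0 to 1
    dissent_propensity: float  # 0 to 1

    def __post_init__(self) -> None:
        if not -1.0 <= self.hawkishness <= 1.0:
            raise ValueError("hawkishness must be in [-1, 1]")
        for name in ("data_dependence", "guidance_weight", "dissent_propensity"):
            if not 0.0 <= getattr(self, name) <= 1.0:
                raise ValueError(f"{name} must be in [0, 1]")

    def reminder(self) -> str:
        """Text re-asserted to the agent just before it votes (hypothetical wording)."""
        return (
            "Before you vote, recall your profile: "
            f"hawkishness={self.hawkishness:+.2f}, "
            f"data_dependence={self.data_dependence:.2f}, "
            f"guidance_weight={self.guidance_weight:.2f}, "
            f"dissent_propensity={self.dissent_propensity:.2f}."
        )
```

Validating the ranges at construction time means a mis-exported persona file fails loudly instead of silently steering an agent with an out-of-range value.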

3.4 Historical dissent priors

On top of the trait vector, each member is given a person-specific base rate of dissent computed from their real historical voting record. This is more empirically grounded than a one-size-fits-all rule and is what keeps reliable consensus-voters voting with the chair and habitual dissenters willing to break on close calls.
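A person-specific base rate can be sketched as a simple frequency with shrinkage toward a committee-wide rate. The shrinkage step, the function name `dissent_base_rate`, and the default values are assumptions made for this illustration; the document only states that the rate is computed from each member's real voting record.

```python
def dissent_base_rate(
    votes: list[bool],
    prior_rate: float = 0.05,    # assumed committee-wide dissent rate
    prior_weight: float = 10.0,  # assumed strength of that prior, in pseudo-votes
) -> float:
    """Estimate a member's dissent rate; each entry in votes is True for a dissent."""
    dissents = sum(votes)
    # With few recorded votes the estimate stays near prior_rate;
    # with many votes it approaches the member's observed frequency.
    return (dissents + prior_rate * prior_weight) / (len(votes) + prior_weight)
```

A member with no voting history gets exactly the assumed prior rate, which is one plausible way to handle new appointees with little public record.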

Be aware

A persona is an inferred model of a real person, not the person. Stance estimates are built from public statements and can be wrong, out of date, or miss a private change of view. New appointees with little public record are the hardest to model.

4 The simulation, step by step

In plain terms

The meeting runs like a real one: a staff briefing, members forming views, a moderated debate, the chair proposing a rate, a vote, and finally a written statement and minutes. End to end it is roughly 28 model calls.

Phase 1: Briefing. A staff agent synthesizes the input packet (§2) into a concise pre-meeting brief shared, byte-for-byte, with every member so the deliberation starts from a common information set.
Phase 2: Committee initialization. Each member is instantiated with the shared brief plus its own persona and trait vector. (The shared portion is identical across members, which lets the providers' prompt cache cut cost.)
Phase 3: Deliberation. Members speak in turn over roughly a dozen turns: presenting views, responding to one another, and weighing the employment–inflation trade-off in character.
Phase 4: Chair proposal. The chair summarizes the discussion and proposes an agenda rate, anchored on the inherited regime (§2.4).
Phase 5: Voting. Each member casts an explicit assent or dissent against the proposal, with a one-line reason on dissent, and submits an individual rate projection used to build the dot plot.
Phase 6: Summary, statement & minutes. The engine assembles the vote, then writes a policy statement and meeting minutes in the committee's documentary style.
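The byte-for-byte shared brief in Phases 1 and 2 can be illustrated with a short sketch. The helper names `member_prompt` and `shared_prefix_digest` are hypothetical, and the exact prompt layout is an assumption. The only point shown is that placing the identical brief first gives every member the same prompt prefix, which is the property a provider-side prompt cache relies on.

```python
import hashlib


def member_prompt(shared_brief: str, persona: str) -> str:
    """Shared brief first, member-specific persona last (assumed layout)."""
    return shared_brief + "\n\n" + persona


def shared_prefix_digest(prompt: str, shared_brief: str) -> str:
    """Hash the leading bytes of a prompt that should equal the shared brief."""
    prefix_len = len(shared_brief.encode("utf-8"))
    return hashlib.sha256(prompt.encode("utf-8")[:prefix_len]).hexdigest()
```

If two members' prompts produce the same digest for the same brief, their prefixes are byte-identical. Putting the persona first instead would make every prompt diverge from the first byte.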

A full run makes roughly 28 model calls. On the reference configuration this costs roughly $0.20 of the visitor's own API budget per simulation. The run streams to your browser live, so you watch the meeting unfold rather than waiting for a finished transcript.

5 From debate to a number

In plain terms

The chair's proposal is turned into a concrete call — hold, cut, or hike, by how many basis points, and the resulting target range — plus a dot plot of where each member sees rates heading.

The headline call has three parts: a direction (HOLD / CUT / HIKE) read from the chair's structured proposal; a basis-point change parsed from that proposal; and the resulting target range, computed by shifting the prior range by that change. Each member's individual rate projection feeds the Summary of Economic Projections dot plot on quarterly (March / June / September / December) meetings, which is overlaid against the Fed's real published dots once a meeting is released.
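The range arithmetic is simple enough to show directly. In this sketch the function names are hypothetical, and the direction is derived from the sign of the change as a simplification; the document says the engine reads the direction from the chair's structured proposal.

```python
def direction_from_change(change_bp: float) -> str:
    """Simplified: infer HOLD / CUT / HIKE from the sign of the change."""
    if change_bp < 0:
        return "CUT"
    if change_bp > 0:
        return "HIKE"
    return "HOLD"


def shift_range(lower_pct: float, upper_pct: float, change_bp: float) -> tuple[float, float]:
    """Shift the prior target range by a basis-point change (100 bp = 1 percentage point)."""
    delta = change_bp / 100.0
    # Round to two decimals to avoid floating-point residue in displayed values.
    return round(lower_pct + delta, 2), round(upper_pct + delta, 2)
```

For example, a prior range of 4.25% to 4.50% and a change of -25 bp gives a CUT with a new range of 4.00% to 4.25%.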

A known soft spot

The basis-point figure is currently extracted from the chair's free-text proposal by pattern-matching. It is robust in practice but is a text-parsing step, not a closed-form aggregation of member votes. Hardening this into a deterministic rule is on the roadmap.
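To make the soft spot concrete, here is a minimal sketch of what pattern-matching a free-text proposal can look like. The patterns and the function name `parse_change_bp` are assumptions for illustration, not the engine's actual expressions.

```python
import re
from typing import Optional

# Assumed patterns; the engine's real expressions are not given in this document.
_MAGNITUDE = re.compile(r"(\d+(?:\.\d+)?)\s*(?:bp|bps|basis points?)\b", re.IGNORECASE)
_CUT = re.compile(r"\b(?:cut|lower|reduce|reduction|decrease)", re.IGNORECASE)
_HIKE = re.compile(r"\b(?:hike|raise|increase)", re.IGNORECASE)
_HOLD = re.compile(r"\b(?:hold|maintain|unchanged)", re.IGNORECASE)


def parse_change_bp(proposal: str) -> Optional[float]:
    """Return the signed change in bp, 0 for an explicit hold, or None if unclear."""
    match = _MAGNITUDE.search(proposal)
    if match is None:
        return 0.0 if _HOLD.search(proposal) else None
    magnitude = float(match.group(1))
    if _CUT.search(proposal):
        return -magnitude
    if _HIKE.search(proposal):
        return magnitude
    return None  # a number was found but no direction word
```

Returning None for ambiguous text, rather than guessing, lets a caller reject or retry the proposal. A sketch like this also shows why the step is fragile: wording the patterns do not anticipate falls through.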

6 How much should you trust it?

In plain terms

On a backtest of 17 recent meetings the engine got the direction right in 16 and was off by about 1.5 basis points on average. That is a real result, but it comes from a small sample and one model configuration, and a backtest is not a promise about the future.

6.1 How the evaluation was run

The engine was run on 17 FOMC meetings from March 2024 to March 2026, each simulated point-in-time (the model saw only pre-meeting data) and scored blind against the real outcome. Text outputs were scored with sentence embeddings and a three-judge LLM ensemble.

6.2 The headline numbers

Direction accuracy: 94.1% (16/17)
Mean absolute error: 1.5 bp
Vote alignment: 82.8%
Statement fidelity (judge): 4.07/5

What these do — and do not — mean:

Direction accuracy (94.1%) — the engine called hold vs. cut vs. hike correctly in 16 of 17 meetings. It does not mean it will be right 94% of the time on the next meeting; small samples carry wide error bars.
Mean absolute error (1.5 bp) — the average gap between the engine's basis-point call and the real change. Low partly because most meetings were holds.
Vote alignment (82.8%) — share of individual member votes that matched the real record. Dissents are the hardest thing to get right and are where most of the gap lives.
Statement fidelity (4.07/5) — how close the engine's written statement reads to the real one, per an LLM-judge ensemble. This is a stylistic/semantic similarity score, not a measure of decision correctness.
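The remark about wide error bars can be made concrete with a confidence interval for a proportion. The sketch below uses the Wilson score interval, which is a standard choice for small samples. It is an illustration added here, not a figure from the evaluation snapshot, and it treats meetings as independent trials, which is a simplification.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


low, high = wilson_interval(16, 17)
print(f"16/17 correct: 95% interval roughly {low:.0%} to {high:.0%}")
```

For 16 of 17 the interval runs from roughly 73% to 99%, which is why 94.1% should not be read as a precise hit rate.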

6.3 The one miss

The single directional miss was 7 May 2025: the engine called a cut (−38 bp) while the committee held. It is shown here on purpose — a transparency document that hides its failures is not transparent. The 2025 regime-shift failure mode that produced it has since been addressed by the inherited-regime anchor (§2.4), but the meeting remains a fair warning that turning points are where the engine is weakest.

6.4 Per-meeting record

Meeting	Real	Engine	Engine Δbp	Regime	Match
2024-03-20	HOLD	HOLD	0	neutral	✓
2024-05-01	HOLD	HOLD	0	neutral	✓
2024-06-12	HOLD	HOLD	0	neutral	✓
2024-07-31	HOLD	HOLD	0	neutral	✓
2024-09-18	CUT −50	CUT	−50	neutral	✓
2024-11-07	CUT −25	CUT	−25	easing	✓
2024-12-18	CUT −25	CUT	−25	easing	✓
2025-01-29	HOLD	HOLD	0	easing	✓
2025-03-19	HOLD	HOLD	0	neutral	✓
2025-05-07	HOLD	CUT	−38	neutral	✕
2025-06-18	HOLD	HOLD	0	neutral	✓
2025-07-30	HOLD	HOLD	0	neutral	✓
2025-09-17	CUT −25	CUT	−25	neutral	✓
2025-10-29	CUT −25	CUT	−25	easing	✓
2025-12-10	CUT −25	CUT	−25	easing	✓
2026-01-28	HOLD	HOLD	0	easing	✓
2026-03-18	HOLD	HOLD	0	neutral	✓
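The Real and Engine columns above can be tallied directly. The sketch below transcribes those two columns and recomputes the direction accuracy, along with the share of meetings that were holds. The always-HOLD comparison is an added illustration of how hold-heavy the window is, not a metric from the evaluation snapshot.

```python
# (meeting, real direction, engine direction), transcribed from the table above.
RECORD = [
    ("2024-03-20", "HOLD", "HOLD"), ("2024-05-01", "HOLD", "HOLD"),
    ("2024-06-12", "HOLD", "HOLD"), ("2024-07-31", "HOLD", "HOLD"),
    ("2024-09-18", "CUT", "CUT"), ("2024-11-07", "CUT", "CUT"),
    ("2024-12-18", "CUT", "CUT"), ("2025-01-29", "HOLD", "HOLD"),
    ("2025-03-19", "HOLD", "HOLD"), ("2025-05-07", "HOLD", "CUT"),
    ("2025-06-18", "HOLD", "HOLD"), ("2025-07-30", "HOLD", "HOLD"),
    ("2025-09-17", "CUT", "CUT"), ("2025-10-29", "CUT", "CUT"),
    ("2025-12-10", "CUT", "CUT"), ("2026-01-28", "HOLD", "HOLD"),
    ("2026-03-18", "HOLD", "HOLD"),
]

matches = sum(real == engine for _, real, engine in RECORD)
holds = sum(real == "HOLD" for _, real, _ in RECORD)

print(f"direction accuracy: {matches}/{len(RECORD)} = {matches / len(RECORD):.1%}")
print(f"real holds: {holds}/{len(RECORD)} = {holds / len(RECORD):.1%}")
```

The tally reproduces 16 of 17 (94.1%). It also shows that 11 of the 17 real outcomes were holds, so a rule that always answered HOLD would have matched about 65% of this window.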

Read the track record carefully

Seventeen meetings is a small sample, and the window was dominated by holds and a clean easing cycle — relatively easy to call. The personas and dissent priors are calibrated on historical behavior that overlaps this window, so the test is not fully out-of-sample at the persona level. And the headline numbers come from one model configuration; a live run with a different provider or model can perform differently. Treat 94.1% as "this approach has been promising on recent history," not as a probability for the next meeting.

7 Limitations and sources of uncertainty

In plain terms

Here is everything we think could make a given result unreliable. Read this chapter before you lean on any single run.

7.1 You are seeing a single random draw

The biggest limitation

A simulation is a stochastic process. The result on your screen is one sample, not a distribution. Run the same meeting again and the debate, the vote split, and sometimes the call itself can change. The research this engine is based on runs the simulation ~100 times to get a distribution; the public demo currently runs it once for cost reasons. Until a probabilistic view ships (it is on the roadmap), treat one run as one opinion, not a consensus estimate.
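The difference between one draw and a distribution can be sketched as follows. The `run_once` function is a stand-in for a full simulation and its weights are invented purely so the example runs; nothing here reflects real engine output.

```python
import random
from collections import Counter

CALLS = ("HOLD", "CUT", "HIKE")


def run_once(rng: random.Random) -> str:
    """Placeholder for one full simulation; the weights below are invented."""
    return rng.choices(CALLS, weights=[0.70, 0.25, 0.05])[0]


def call_distribution(n_runs: int = 100, seed: int = 0) -> dict[str, float]:
    """Repeat the simulation and report the share of runs ending in each call."""
    rng = random.Random(seed)
    counts = Counter(run_once(rng) for _ in range(n_runs))
    return {call: counts[call] / n_runs for call in CALLS}
```

A single call to `run_once` corresponds to what the public demo shows today. The dictionary returned by `call_distribution` is the kind of probabilistic view the text describes as being on the roadmap.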

7.2 Model nondeterminism and model dependence

Language models are nondeterministic, and results depend on which provider and model you choose. The published track record (§6) is for one specific configuration. Your run with a different model is a different experiment and may be more or less accurate.

7.3 Synthetic and inferred inputs

The regional narrative is a synthetic Beige Book (§2.2) and the participants are inferred personas (§3). Both are sourced approximations of reality and can be wrong, especially for new participants or fast-moving regional conditions.

7.4 Small evaluation window, turning points, novel shocks

The engine has been tested on 17 mostly calm meetings. It is weakest exactly where it matters most: turning points (the one miss was a turning-point meeting) and novel shocks with no historical analog. Regime changes, surprise data revisions, and political-pressure episodes are harder than the average meeting in the backtest.

7.5 It is not a market forecast or financial advice

The engine reasons about what a committee might decide; it does not price assets, does not know what markets have already priced in, and is not calibrated for trading. Do not make investment, hedging, or risk decisions on the basis of its output.

7.6 What we deliberately do not do

For any meeting whose real outcome has not yet been published, the engine refuses to display a real decision at all — the "Committee's Call" card stays blurred. We would rather show nothing than invent a number or imply we know the answer. See §8.

8 Honesty by design

In plain terms

Several choices in the product exist specifically to stop it from misleading you, even by accident.

The blur. Unreleased meetings render the real-decision card as a frosted placeholder labeled "To Be Released." No invented values, ever.
Fact and opinion travel on separate channels. The engine's simulation (an opinion) streams over one channel; the real, published decision (a fact) is served from a separate, cacheable endpoint. The "unreleased" state is therefore a true data condition, not a UI trick.
Point-in-time inputs. No future data enters a simulation (§2.5).
The failures are shown. The one miss is in the table in §6, not hidden.
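The idea that "unreleased" is a data condition rather than a display trick can be sketched in a few lines. The function name `real_decision_card` and the field names are hypothetical; only the "To Be Released" label comes from the text above.

```python
from typing import Optional


def real_decision_card(published: Optional[dict]) -> dict:
    """Build the real-decision card from the published record, if one exists."""
    if published is None:
        # No record exists yet, so there is nothing to show and nothing to invent.
        return {"state": "unreleased", "label": "To Be Released"}
    return {
        "state": "released",
        "direction": published["direction"],
        "change_bp": published["change_bp"],
    }
```

Because the card is built only from the published record, and the simulation's output is never passed to this function, there is no path by which an engine guess could appear as a real decision.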

9 Privacy and your API key

In plain terms

You bring your own model API key. It is used to run the meeting and is never written to disk, logged, or shown to anyone. The engine also limits how long a single run can keep spending against it.

Held in memory only. Your key lives in your browser tab's memory and in the server's memory for the duration of one simulation. It is never written to disk, a cookie, or local storage, and is discarded when the run ends.
Encrypted in transit. It travels over a TLS-protected WebSocket on the simulation's opening message — never in a URL.
Never logged; redacted from errors. Provider error messages that might echo a key fragment are scrubbed before they reach you, and the key is never written to server logs.
Spend is bounded. A run is automatically stopped if you walk away, and there is a hard time cap, so an abandoned simulation cannot keep burning your budget.
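Scrubbing key fragments from provider errors can be sketched as below. This is a minimal illustration under assumptions: the function name `redact_key`, the eight-character minimum fragment length, and the placeholder text are all hypothetical, and the sketch only covers fragments taken from the start or end of the key.

```python
def redact_key(message: str, api_key: str, min_fragment: int = 8) -> str:
    """Replace the key, and any leading or trailing fragment of it, in a message."""
    fragments = {api_key}
    for length in range(min_fragment, len(api_key)):
        fragments.add(api_key[:length])   # leading fragment
        fragments.add(api_key[-length:])  # trailing fragment
    # Replace the longest fragments first so shorter ones do not split them.
    for fragment in sorted(fragments, key=len, reverse=True):
        if fragment:
            message = message.replace(fragment, "[REDACTED]")
    return message
```

For example, with a fake key `sk-test-1234567890abcdef`, an error that echoes `sk-test-1234...` comes back with that prefix replaced by `[REDACTED]`.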

10 Glossary

FOMC: Federal Open Market Committee — the body within the Federal Reserve that sets U.S. monetary policy, including the federal funds rate target.
Federal funds rate / target range: The Fed's main policy interest rate, set as a range (e.g. 4.25%–4.50%). "Effective" fed funds is where the market actually trades within that range.
Basis point (bp): One hundredth of a percentage point. A 25 bp cut lowers the rate by 0.25 percentage points.
Hold / Cut / Hike: The three possible policy directions: keep the rate, lower it, or raise it.
Hawkish / Dovish: A hawk prioritizes fighting inflation (favors higher rates); a dove prioritizes employment and growth (favors lower rates).
Dissent: A formal vote against the committee's decision. Real dissents are rare (historically well under 10% of votes) and are the hardest thing for the engine to predict.
Beige Book: The Fed's qualitative report on economic conditions across its twelve districts, published eight times a year. This tool uses a synthetic, more-current version (§2.2).
SEP / Dot plot: Summary of Economic Projections — each participant's view of where rates should head, published quarterly as an anonymous scatter of "dots."
Regime: Here, the policy stance inherited from the prior meeting: easing, tightening, or neutral (§2.4).
Point-in-time: Using only data that existed at a chosen past moment, so a backtest cannot accidentally see the future.

11 Versions and sources

This document describes the engine as built and is kept in step with it. Where it cites numbers, those numbers are a snapshot of an upstream evaluation artifact and are not hand-edited here.

Item	Source / value
Accuracy figures (§6)	Evaluation snapshot dated 2026-04-24, eval window 2024-03 → 2026-03, n = 17 meetings, reference engine configuration.
Text-fidelity scoring	Sentence-embedding similarity plus a three-judge LLM ensemble; mean-across-judges reported.
Pipeline & inputs (§2–§5)	Described from the engine and API source as of this document's revision.
Research lineage (§1)	Kazinnik & Sinclair (2025), "FOMC in Silico"; Park et al. (2022); Horton (2023).

Disclaimer

This tool is a research and educational simulation. It is not investment, financial, legal, or tax advice, and it is not affiliated with or endorsed by the Federal Reserve. Outputs can be incorrect. Do not rely on it for financial decisions.