Architecture
Four-layer model
The source design distinguishes four architectural layers — treat them as layers, not optional features. Skipping any one breaks the others.
| Layer | What it answers |
|---|---|
| Entity | What things exist in the graph (companies, sectors, indicators, statement items, events, people, products, regimes, documents). |
| Relationship | How those entities connect — categorical (BELONGS_TO, COMPETES_WITH), inter-company (SUPPLIES, CUSTOMER_OF), causal (INFLUENCES with magnitude/sign/lag/regime), temporal (TRIGGERED), provenance (SOURCED_FROM). |
| Property | Attributes carried by nodes and edges — time-bounded, sourced, confidence-rated. |
| Query | How the system retrieves and reasons — Cypher patterns, multi-hop traversals, the LLM-driven natural-language interface on top. |
Properties are time-bounded from day one. Without temporal property bindings, you cannot answer "who was Apple's CEO when the Vision Pro launched?" — and you have no way to backtest historical state.
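A minimal sketch of what a time-bounded, sourced, confidence-rated property might look like, and how an "as of" lookup answers the CEO question above. The `TimedProperty` class, the `value_as_of` helper, and the field names are assumptions made for illustration, not the project's actual schema; the source handles and confidence values are placeholders.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TimedProperty:
    """One property value with its validity window, source, and confidence."""
    entity: str
    name: str
    value: str
    valid_from: date
    valid_to: Optional[date]  # None means "still current"
    source: str               # handle to the supporting document (placeholder)
    confidence: float         # 0.0 to 1.0 (placeholder scale)


def value_as_of(props, entity, name, as_of):
    """Return the binding whose window [valid_from, valid_to) contains as_of."""
    for p in props:
        if p.entity != entity or p.name != name:
            continue
        if p.valid_from <= as_of and (p.valid_to is None or as_of < p.valid_to):
            return p
    return None


# Illustrative seed rows; sources and confidences are made up for the example.
props = [
    TimedProperty("AAPL", "ceo", "Steve Jobs",
                  date(1997, 9, 16), date(2011, 8, 24), "doc:placeholder-1", 0.9),
    TimedProperty("AAPL", "ceo", "Tim Cook",
                  date(2011, 8, 24), None, "doc:placeholder-2", 0.9),
]

vision_pro_launch = date(2024, 2, 2)
hit = value_as_of(props, "AAPL", "ceo", vision_pro_launch)
print(hit.value if hit else "no binding for that date")
```

The half-open window keeps adjacent bindings from overlapping on the handover date, and the same lookup with a historical `as_of` is what a backtest would rely on.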
Reasoning patterns
Six reusable Cypher templates cover the great majority of useful financial questions. The LLM's job is to map a natural-language question to the right pattern, fill in parameters, and compose the result — never to write Cypher freehand.
| Pattern | Question shape |
|---|---|
| `causal_attribution` | "Why is X happening?" |
| `forward_propagation` | "What happens if X moves?" |
| `comparative_analysis` | "How does A compare to B?" |
| `supply_chain_traversal` | "Who depends on X?" |
| `risk_decomposition` | "What are the biggest risks to X?" |
| `regime_counterfactual` | "How would X behave under a different regime?" |
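One way to keep the LLM from writing Cypher freehand is a registry that maps each pattern name to a fixed, parameterized template. The sketch below shows two of the six patterns. The Cypher text, the `PATTERNS` dictionary, the `build_query` helper, and the parameter names are all hypothetical; only the pattern names and the `SUPPLIES` / `INFLUENCES` relationship types come from this page.

```python
from typing import Any

# Hypothetical templates: the query text is fixed and values travel as
# Cypher parameters ($name), never as interpolated strings.
PATTERNS: dict[str, dict[str, Any]] = {
    "supply_chain_traversal": {
        "cypher": (
            "MATCH (x:Company {ticker: $ticker})-[:SUPPLIES*1..3]->(d:Company) "
            "RETURN DISTINCT d.ticker AS dependent"
        ),
        "params": {"ticker"},
    },
    "causal_attribution": {
        "cypher": (
            "MATCH (d)-[r:INFLUENCES]->(t {id: $target_id}) "
            "RETURN d.id AS driver, r.sign AS sign, r.magnitude AS magnitude"
        ),
        "params": {"target_id"},
    },
}


def build_query(pattern: str, params: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Look up a template and check that exactly its parameters are supplied."""
    spec = PATTERNS.get(pattern)
    if spec is None:
        raise ValueError(f"unknown pattern: {pattern!r}")
    missing = spec["params"] - set(params)
    extra = set(params) - spec["params"]
    if missing or extra:
        raise ValueError(
            f"bad parameters for {pattern}: "
            f"missing={sorted(missing)} extra={sorted(extra)}"
        )
    return spec["cypher"], dict(params)


cypher, bound = build_query("supply_chain_traversal", {"ticker": "TSM"})
print(cypher)
print(bound)
```

The returned pair is what a driver's parameterized `run` call would receive, so the model's output only ever selects a key and fills slots.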
Six-stage LLM pipeline
```mermaid
flowchart LR
    Q[User question] --> I[1. Intent<br/>classification]
    I --> P[2. Parameter<br/>extraction + validation]
    P --> X[3. Templated<br/>query execution]
    X --> E[4. Result<br/>enrichment]
    E --> C[5. Answer<br/>composition]
    C --> V[6. Verification<br/>claim-to-result]
    V --> A[Cited answer]
```
- The LLM never writes raw query code; it only fills slots in pre-tested patterns.
- Validation in stage 2 catches most hallucinations early (a fabricated ticker fails to resolve).
- Verification in stage 6 catches numerical hallucinations in composition (any claim not present in the structured result is rejected).
- Every stage produces a logged artifact — the audit trail is what makes the system defensible.
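The two guard stages and the audit trail described in the bullets above can be sketched without any LLM in the loop. Everything here is an assumption made for illustration: the `Trace` class, the `KNOWN_TICKERS` set standing in for entity resolution against the graph, the function names, and the rule that stage 6 compares only numeric claims. The sample rows are invented.

```python
import re
from dataclasses import dataclass, field

# Stand-in for resolving tickers against the graph (hypothetical).
KNOWN_TICKERS = {"AAPL", "MSFT", "TSM"}
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


@dataclass
class Trace:
    """Collects one artifact per stage so a Q&A can be audited later."""
    artifacts: list = field(default_factory=list)

    def log(self, stage: str, payload: dict) -> None:
        self.artifacts.append({"stage": stage, "payload": payload})


def validate_params(params: dict, trace: Trace) -> dict:
    """Stage 2: reject parameters that do not resolve to a known entity."""
    ticker = params.get("ticker")
    ok = ticker in KNOWN_TICKERS
    trace.log("2_parameter_validation", {"params": params, "ok": ok})
    if not ok:
        raise ValueError(f"unresolved ticker: {ticker!r}")
    return params


def verify_answer(answer: str, rows: list, trace: Trace) -> bool:
    """Stage 6: every number in the answer must appear in the structured result."""
    allowed = {
        float(v)
        for row in rows
        for v in row.values()
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    }
    claimed = [float(n) for n in NUMBER.findall(answer)]
    unsupported = [n for n in claimed if n not in allowed]
    trace.log("6_verification", {"claimed": claimed, "unsupported": unsupported})
    return not unsupported


trace = Trace()
validate_params({"ticker": "TSM"}, trace)
rows = [{"supplier": "TSM", "revenue_share": 0.25}]  # invented result row
print(verify_answer("TSM accounts for 0.25 of revenue.", rows, trace))
print(verify_answer("TSM accounts for 0.40 of revenue.", rows, trace))
print(len(trace.artifacts))
```

A production check would also need to cover non-numeric claims (entity names, dates, direction of effect); exact numeric matching is only the simplest instance of claim-to-result verification.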
Hybrid storage
The graph is the source of relational truth. It is not the time-series store, the document store, or the vector store.
| Store | Role |
|---|---|
| Neo4j | Entities, relationships, structural properties. Source of truth for traversal. |
| Postgres + TimescaleDB | Indicator time-series, financial historical series, market data. |
| Postgres + pgvector | Embeddings of news, transcripts, 10-K business descriptions. |
| Postgres (audit schema) | Append-only audit log of every Q&A. |
| S3-compatible (MinIO/R2) | Raw filings, PDFs, large unstructured artifacts. |
Graph nodes hold handles (URLs, IDs) into the other stores. The orchestrator joins them at query time.
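As a rough illustration of the handle pattern, the sketch below keeps only identifiers on the graph side and lets an orchestrator function pull the payload from a second store at query time. The `IndicatorNode` class, the `hydrate` function, the handle formats, and the in-memory dictionaries standing in for Neo4j and TimescaleDB are all hypothetical, and the series values are dummy numbers rather than real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndicatorNode:
    """Graph-side view of an indicator: identity plus handles, no payload."""
    node_id: str
    name: str
    series_id: str   # handle into the time-series store
    source_url: str  # handle into the object store


# In-memory stand-ins for the graph and the time-series store.
GRAPH = {
    "ind:example": IndicatorNode(
        node_id="ind:example",
        name="Example indicator",
        series_id="ts:example",
        source_url="s3://placeholder-bucket/example.pdf",
    ),
}
TIMESERIES = {
    "ts:example": [("2024-01", 1.0), ("2024-02", 2.0)],  # dummy values
}


def hydrate(node_id: str, last_n: int = 1) -> dict:
    """Join a graph node with the tail of its series at query time."""
    node = GRAPH[node_id]
    points = TIMESERIES.get(node.series_id, [])[-last_n:]
    return {
        "node_id": node.node_id,
        "name": node.name,
        "latest": points,
        "source_url": node.source_url,
    }


print(hydrate("ind:example"))
```

Keeping the series out of the node means a traversal never drags bulk data along with it; only the nodes that survive the query are hydrated.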
Free-tier deployment topology
```mermaid
flowchart TB
    subgraph GH["GitHub (free, public repo)"]
        SRC["Source · CI · Actions cron"]
        REG["GHCR<br/>(container registry)"]
        PG["GitHub Pages<br/>(static UI + docs)"]
        REL["Releases / LFS<br/>(Parquet snapshots)"]
    end
    subgraph PARTNERS["Free-tier partners"]
        NEO["Neo4j AuraDB Free<br/>200k nodes / 400k rels"]
        SUPA["Supabase Free<br/>Postgres + pgvector"]
        R2["Cloudflare R2 Free<br/>10 GB object store"]
        HFS["Hugging Face Space<br/>FastAPI container"]
    end
    USER[User browser<br/>BYOK Anthropic key] --> PG
    PG --> HFS
    HFS --> NEO
    HFS --> SUPA
    HFS --> R2
    USER --> ANT[Anthropic API]
    SRC -- scheduled ingest --> NEO
    SRC -- scheduled ingest --> SUPA
    REG -- container pull --> HFS
    SRC -- mkdocs build --> PG
```
Repo layout
```
market-view/
├── schema/          # Cypher constraints, node listing, migrations, seed YAML
├── ingestion/       # Source-specific workers (EDGAR, FRED, Treasury, news)
├── kg/              # Neo4j client, repositories, the 6 query patterns
├── orchestrator/    # Six-stage pipeline
├── llm/             # Provider-pluggable client, versioned prompts
├── api/             # FastAPI surface
├── eval/            # Benchmark + adversarial sets, runner, metrics
├── ui/              # Next.js, exports to static (Phase 4)
├── docs/            # MkDocs → GitHub Pages
└── tests/           # unit + integration (testcontainers) + golden patterns
```
Phased delivery
Phase 0 (this commit): skeleton + docs site + local stack. Phase 4 is the first end-to-end live demo on the free tier. See SYSTEM_REQUIREMENTS §5 (in the parent directory) for the full 11-phase plan.