Architecture

Four-layer model

The source design distinguishes four architectural layers — treat them as layers, not optional features. Skipping any one breaks the others.

| Layer | What it answers |
| --- | --- |
| Entity | What things exist in the graph (companies, sectors, indicators, statement items, events, people, products, regimes, documents). |
| Relationship | How those entities connect: categorical (BELONGS_TO, COMPETES_WITH), inter-company (SUPPLIES, CUSTOMER_OF), causal (INFLUENCES with magnitude/sign/lag/regime), temporal (TRIGGERED), provenance (SOURCED_FROM). |
| Property | Attributes carried by nodes and edges: time-bounded, sourced, confidence-rated. |
| Query | How the system retrieves and reasons: Cypher patterns, multi-hop traversals, and the LLM-driven natural-language interface on top. |

Properties are time-bounded from day one. Without temporal property bindings, you cannot answer "who was Apple's CEO when the Vision Pro launched?" — and you have no way to backtest historical state.
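
As a rough illustration of a temporal property binding, the sketch below stores each property value with a validity interval, a source handle, and a confidence score, then looks up the value that held on a given date. The class name `TimedProperty`, the helper `value_as_of`, the half-open interval convention, and all names and dates in the sample data are assumptions made for this example, not part of the schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TimedProperty:
    """One value of a property, valid over [valid_from, valid_to)."""
    value: str
    valid_from: date
    valid_to: Optional[date]  # None means "still current"
    source: str               # provenance handle (hypothetical format)
    confidence: float


def value_as_of(history: list[TimedProperty], when: date) -> Optional[TimedProperty]:
    """Return the binding whose validity interval contains `when`, if any."""
    for prop in history:
        ends_after = prop.valid_to is None or when < prop.valid_to
        if prop.valid_from <= when and ends_after:
            return prop
    return None


# Hypothetical CEO history for a made-up company; not ingested data.
ceo_history = [
    TimedProperty("Alice Example", date(2005, 1, 1), date(2015, 6, 1), "doc:filing-001", 0.95),
    TimedProperty("Bob Example", date(2015, 6, 1), None, "doc:filing-002", 0.95),
]

product_launch = date(2012, 3, 15)  # hypothetical launch date
ceo_at_launch = value_as_of(ceo_history, product_launch)
print(ceo_at_launch.value)  # Alice Example
```

The same lookup run against a range of historical dates is what makes backtesting possible: the graph answers with the state that held then, not the state that holds now.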

Reasoning patterns

Six reusable Cypher templates cover the great majority of useful financial questions. The LLM's job is to map a natural-language question to the right pattern, fill in parameters, and compose the result — never to write Cypher freehand.

| Pattern | Question shape |
| --- | --- |
| `causal_attribution` | "Why is X happening?" |
| `forward_propagation` | "What happens if X moves?" |
| `comparative_analysis` | "How does A compare to B?" |
| `supply_chain_traversal` | "Who depends on X?" |
| `risk_decomposition` | "What are the biggest risks to X?" |
| `regime_counterfactual` | "How would X behave under a different regime?" |
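
One way to hold the LLM to slot-filling is a registry that maps each pattern name to a parameterised template and its required slots. The sketch below shows a single entry. The Cypher text, the `Company` label, the `ticker` property, and the names `QueryPattern`, `PATTERNS`, and `fill_slots` are all hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryPattern:
    name: str
    cypher: str               # parameterised template; values bind via $params
    required: frozenset       # slot names the caller must supply


# Hypothetical template body: labels, properties and hop limit are illustrative.
PATTERNS = {
    "supply_chain_traversal": QueryPattern(
        name="supply_chain_traversal",
        cypher=(
            "MATCH (x:Company {ticker: $ticker})-[:SUPPLIES*1..3]->(d:Company) "
            "RETURN DISTINCT d.ticker AS dependent LIMIT $limit"
        ),
        required=frozenset({"ticker", "limit"}),
    ),
}


def fill_slots(pattern_name: str, params: dict) -> tuple:
    """Return (cypher, params) for a known pattern, or raise ValueError."""
    pattern = PATTERNS.get(pattern_name)
    if pattern is None:
        raise ValueError(f"unknown pattern: {pattern_name}")
    given = set(params)
    missing = pattern.required - given
    extra = given - pattern.required
    if missing or extra:
        raise ValueError(f"bad slots: missing={sorted(missing)} extra={sorted(extra)}")
    # Values are never interpolated into the query text; they travel as
    # driver parameters alongside the fixed template.
    return pattern.cypher, dict(params)


cypher, bound = fill_slots("supply_chain_traversal", {"ticker": "AAA", "limit": 25})
print(cypher)
print(bound)
```

Because the template text is fixed and values are passed separately, the model can choose only among pre-tested queries and cannot alter their structure.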

Six-stage LLM pipeline

flowchart LR
    Q[User question] --> I[1. Intent<br/>classification]
    I --> P[2. Parameter<br/>extraction + validation]
    P --> X[3. Templated<br/>query execution]
    X --> E[4. Result<br/>enrichment]
    E --> C[5. Answer<br/>composition]
    C --> V[6. Verification<br/>claim-to-result]
    V --> A[Cited answer]

- The LLM never writes raw query code; it only fills slots in pre-tested patterns.
- Validation in stage 2 catches most hallucinations early (a fabricated ticker fails to resolve).
- Verification in stage 6 catches numerical hallucinations in composition (any claim not present in the structured result is rejected).
- Every stage produces a logged artifact; the audit trail is what makes the system defensible.
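
The six stages can be sketched as plain functions chained together, with each stage appending an artifact to an audit log. Everything here is a stand-in under stated assumptions: the LLM calls are replaced by trivial rules, `KNOWN_TICKERS` and `FAKE_GRAPH` are hypothetical in-memory data, and the function names are illustrative rather than the orchestrator's actual API.

```python
import json
import re

KNOWN_TICKERS = {"AAA", "BBB"}  # hypothetical entity resolver data
FAKE_GRAPH = {"AAA": [{"dependent": "BBB", "revenue_share_pct": 12.5}]}


class StageError(Exception):
    """Raised when a stage rejects its input."""


def classify_intent(question: str) -> str:
    # Stage 1 stand-in: a real system would ask the LLM to pick a pattern name.
    return "supply_chain_traversal" if "depends on" in question.lower() else "unknown"


def extract_params(question: str) -> dict:
    # Stage 2: extract slots, then validate them against known entities.
    match = re.search(r"\b[A-Z]{2,5}\b", question)
    ticker = match.group(0) if match else None
    if ticker not in KNOWN_TICKERS:
        raise StageError(f"ticker did not resolve: {ticker!r}")
    return {"ticker": ticker}


def execute(pattern: str, params: dict) -> list:
    # Stage 3 stand-in for running the pre-tested template against the graph.
    return FAKE_GRAPH.get(params["ticker"], [])


def enrich(rows: list) -> list:
    # Stage 4 stand-in: attach a provenance handle to each row.
    return [{**row, "source": "doc:hypothetical-001"} for row in rows]


def compose(rows: list) -> str:
    # Stage 5 stand-in: a real system would have the LLM draft this text.
    parts = [f"{r['dependent']} ({r['revenue_share_pct']}% of revenue)" for r in rows]
    return "Dependents: " + ", ".join(parts)


def verify(answer: str, rows: list) -> str:
    # Stage 6: every number in the answer must appear in the structured result.
    allowed = {str(v) for r in rows for v in r.values() if isinstance(v, (int, float))}
    for number in re.findall(r"\d+(?:\.\d+)?", answer):
        if number not in allowed:
            raise StageError(f"unsupported numeric claim: {number}")
    return answer


def answer_question(question: str, audit: list) -> str:
    def log(stage: str, artifact) -> None:
        audit.append({"stage": stage, "artifact": json.dumps(artifact)})

    intent = classify_intent(question)
    log("1_intent", intent)
    if intent == "unknown":
        raise StageError("no pattern matches this question")
    params = extract_params(question)
    log("2_params", params)
    rows = execute(intent, params)
    log("3_result", rows)
    enriched = enrich(rows)
    log("4_enriched", enriched)
    draft = compose(enriched)
    log("5_draft", draft)
    final = verify(draft, enriched)
    log("6_verified", final)
    return final


audit_log = []
print(answer_question("Who depends on AAA?", audit_log))
print([entry["stage"] for entry in audit_log])
```

A question naming an unknown ticker fails at stage 2 before any query runs, and a draft containing a number absent from the result rows fails at stage 6, so both kinds of hallucination surface as explicit errors instead of reaching the user.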

Hybrid storage

The graph is the source of relational truth. It is not the time-series store, the document store, or the vector store.

| Store | Role |
| --- | --- |
| Neo4j | Entities, relationships, structural properties. Source of truth for traversal. |
| Postgres + TimescaleDB | Indicator time-series, financial historical series, market data. |
| Postgres + pgvector | Embeddings of news, transcripts, 10-K business descriptions. |
| Postgres (audit schema) | Append-only audit log of every Q&A. |
| S3-compatible (MinIO/R2) | Raw filings, PDFs, large unstructured artifacts. |

Graph nodes hold handles (URLs, IDs) into the other stores. The orchestrator joins them at query time.
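
A minimal sketch of that query-time join, assuming a node that carries one handle per external store. The dictionaries stand in for the time-series and object stores, and the names `IndicatorNode`, `hydrate`, and the handle formats are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IndicatorNode:
    """Graph-side record: structural fields plus handles into other stores."""
    node_id: str
    name: str
    series_id: str      # handle into the time-series store
    document_key: str   # handle into the object store


# In-memory stand-ins for the non-graph stores (hypothetical contents).
TIMESERIES = {"series:demo-rate": [("2024-01-01", 1.0), ("2024-02-01", 1.5)]}
OBJECTS = {"raw/demo-release.pdf": b"%PDF-placeholder"}


def hydrate(node: IndicatorNode) -> dict:
    """Join one graph node with the payloads its handles point to."""
    return {
        "node_id": node.node_id,
        "name": node.name,
        "series": TIMESERIES.get(node.series_id, []),
        "document_bytes": len(OBJECTS.get(node.document_key, b"")),
    }


node = IndicatorNode(
    node_id="ind-1",
    name="Demo rate",
    series_id="series:demo-rate",
    document_key="raw/demo-release.pdf",
)
print(hydrate(node))
```

Keeping only handles on the node means the graph stays small and traversal-focused, while bulky series and documents live in stores suited to them.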

Free-tier deployment topology

flowchart TB
    subgraph GH["GitHub (free, public repo)"]
        SRC["Source · CI · Actions cron"]
        REG["GHCR<br/>(container registry)"]
        PG["GitHub Pages<br/>(static UI + docs)"]
        REL["Releases / LFS<br/>(Parquet snapshots)"]
    end

    subgraph PARTNERS["Free-tier partners"]
        NEO["Neo4j AuraDB Free<br/>200k nodes / 400k rels"]
        SUPA["Supabase Free<br/>Postgres + pgvector"]
        R2["Cloudflare R2 Free<br/>10 GB object store"]
        HFS["Hugging Face Space<br/>FastAPI container"]
    end

    USER[User browser<br/>BYOK Anthropic key] --> PG
    PG --> HFS
    HFS --> NEO
    HFS --> SUPA
    HFS --> R2
    USER --> ANT[Anthropic API]

    SRC -- scheduled ingest --> NEO
    SRC -- scheduled ingest --> SUPA
    REG -- container pull --> HFS
    SRC -- mkdocs build --> PG
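
Because the AuraDB Free node in the diagram is capped at 200k nodes and 400k relationships, a scheduled ingest job may want to check headroom before writing. The sketch below is one possible guard. The function names and the way current counts are obtained are assumptions; only the two limits come from the diagram.

```python
# Limits as stated in the topology diagram for Neo4j AuraDB Free.
AURA_FREE_LIMITS = {"nodes": 200_000, "relationships": 400_000}


def headroom(current: dict, incoming: dict, limits: dict = AURA_FREE_LIMITS) -> dict:
    """Remaining capacity per kind after applying `incoming`; negative means over."""
    return {
        kind: limits[kind] - current.get(kind, 0) - incoming.get(kind, 0)
        for kind in limits
    }


def can_ingest(current: dict, incoming: dict) -> bool:
    """True only if the batch fits within every limit."""
    return all(left >= 0 for left in headroom(current, incoming).values())


# Hypothetical counts, e.g. gathered from the database before a cron run.
current_counts = {"nodes": 150_000, "relationships": 390_000}
batch = {"nodes": 10_000, "relationships": 20_000}

print(headroom(current_counts, batch))
print(can_ingest(current_counts, batch))
```

In this example the batch fits the node limit but would exceed the relationship limit, so the guard returns `False` and the job can skip or trim the batch.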

Repo layout

market-view/
├── schema/         # Cypher constraints, node listing, migrations, seed YAML
├── ingestion/      # Source-specific workers (EDGAR, FRED, Treasury, news)
├── kg/             # Neo4j client, repositories, the 6 query patterns
├── orchestrator/   # Six-stage pipeline
├── llm/            # Provider-pluggable client, versioned prompts
├── api/            # FastAPI surface
├── eval/           # Benchmark + adversarial sets, runner, metrics
├── ui/             # Next.js, exports to static (Phase 4)
├── docs/           # MkDocs → GitHub Pages
└── tests/          # unit + integration (testcontainers) + golden patterns

Phased delivery

Phase 0 (this commit): skeleton + docs site + local stack. Phase 4 is the first end-to-end live demo on the free tier. See SYSTEM_REQUIREMENTS §5 (in the parent directory) for the full 11-phase plan.