Architecture

Four-layer model

The source design distinguishes four architectural layers — treat them as layers, not optional features. Skipping any one breaks the others.

| Layer | What it answers |
| --- | --- |
| Entity | What things exist in the graph (companies, sectors, indicators, statement items, events, people, products, regimes, documents). |
| Relationship | How those entities connect: categorical (BELONGS_TO, COMPETES_WITH), inter-company (SUPPLIES, CUSTOMER_OF), causal (INFLUENCES with magnitude/sign/lag/regime), temporal (TRIGGERED), provenance (SOURCED_FROM). |
| Property | Attributes carried by nodes and edges: time-bounded, sourced, confidence-rated. |
| Query | How the system retrieves and reasons: Cypher patterns, multi-hop traversals, and the LLM-driven natural-language interface on top. |

Properties are time-bounded from day one. Without temporal property bindings, you cannot answer "who was Apple's CEO when the Vision Pro launched?" — and you have no way to backtest historical state.
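
To make the idea of a temporal property binding concrete, here is a minimal sketch of an "as of" lookup. The `BoundProperty` shape, its field names, and the sample data are assumptions for illustration only; they are not taken from the project's schema, and the names and dates are placeholders rather than real company history.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class BoundProperty:
    """One time-bounded, sourced, confidence-rated value (hypothetical shape)."""
    key: str
    value: str
    valid_from: date
    valid_to: Optional[date]  # None means "still current"
    source: str
    confidence: float


def as_of(props: list[BoundProperty], key: str, when: date) -> Optional[BoundProperty]:
    """Return the binding of `key` that was valid on `when`, or None if unknown."""
    for p in props:
        if p.key != key:
            continue
        # Half-open interval: valid_from <= when < valid_to
        if p.valid_from <= when and (p.valid_to is None or when < p.valid_to):
            return p
    return None


# Placeholder data, not real history.
history = [
    BoundProperty("ceo", "Person A", date(2010, 1, 1), date(2018, 6, 1), "filing-001", 0.99),
    BoundProperty("ceo", "Person B", date(2018, 6, 1), None, "filing-002", 0.97),
]

launch_day = date(2016, 3, 15)  # hypothetical event date
holder = as_of(history, "ceo", launch_day)
print(holder.value if holder else "unknown")
```

Using half-open intervals means a handover date belongs to exactly one binding, so a lookup on that day never returns two answers.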

Reasoning patterns

Six reusable Cypher templates cover the great majority of useful financial questions. The LLM's job is to map a natural-language question to the right pattern, fill in parameters, and compose the result — never to write Cypher freehand.

| Pattern | Question shape |
| --- | --- |
| causal_attribution | "Why is X happening?" |
| forward_propagation | "What happens if X moves?" |
| comparative_analysis | "How does A compare to B?" |
| supply_chain_traversal | "Who depends on X?" |
| risk_decomposition | "What are the biggest risks to X?" |
| regime_counterfactual | "How would X behave under a different regime?" |
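
One way to enforce "fill slots, never write Cypher freehand" is a registry that maps each pattern name to a fixed, parameterised query and rejects anything else. The sketch below covers two of the six patterns. The Cypher text, node labels, property names, the direction of SUPPLIES, and the parameter names are all assumptions for illustration, not the project's actual templates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    name: str
    cypher: str               # uses $placeholders; never built by string formatting
    required: frozenset[str]  # parameter names the LLM must supply


# Hypothetical templates: labels, properties, and parameters are illustrative.
PATTERNS = {
    "supply_chain_traversal": Pattern(
        name="supply_chain_traversal",
        cypher=(
            "MATCH (x:Company {ticker: $ticker})-[:SUPPLIES*1..3]->(d:Company) "
            "RETURN DISTINCT d.ticker AS dependent LIMIT $limit"
        ),
        required=frozenset({"ticker", "limit"}),
    ),
    "causal_attribution": Pattern(
        name="causal_attribution",
        cypher=(
            "MATCH (d)-[r:INFLUENCES]->(t {id: $target_id}) "
            "RETURN d.id AS driver, r.sign AS sign, r.magnitude AS magnitude "
            "ORDER BY abs(r.magnitude) DESC LIMIT $limit"
        ),
        required=frozenset({"target_id", "limit"}),
    ),
}


def build_query(pattern_name: str, params: dict) -> tuple[str, dict]:
    """Return (cypher, params) for a known pattern, or raise on any mismatch."""
    if pattern_name not in PATTERNS:
        raise ValueError(f"unknown pattern: {pattern_name!r}")
    pattern = PATTERNS[pattern_name]
    supplied = set(params)
    if supplied != pattern.required:
        missing = sorted(pattern.required - supplied)
        extra = sorted(supplied - pattern.required)
        raise ValueError(f"bad parameters: missing={missing} extra={extra}")
    return pattern.cypher, dict(params)


query, bound = build_query("supply_chain_traversal", {"ticker": "XYZ", "limit": 10})
print(query)
print(bound)
```

The returned pair is the shape a driver call takes (for example, the Neo4j Python driver's `session.run(query, parameters)`), so the LLM output only ever reaches the database as bound parameter values.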

Six-stage LLM pipeline

flowchart LR
    Q[User question] --> I[1. Intent<br/>classification]
    I --> P[2. Parameter<br/>extraction + validation]
    P --> X[3. Templated<br/>query execution]
    X --> E[4. Result<br/>enrichment]
    E --> C[5. Answer<br/>composition]
    C --> V[6. Verification<br/>claim-to-result]
    V --> A[Cited answer]
  • The LLM never writes raw query code; it only fills slots in pre-tested patterns.
  • Validation in stage 2 catches most hallucinations early (a fabricated ticker fails to resolve).
  • Verification in stage 6 catches numerical hallucinations in composition (any claim not present in the structured result is rejected).
  • Every stage produces a logged artifact — the audit trail is what makes the system defensible.
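
The stages above can be sketched as a single function that threads each artifact through an audit list. This is a simplified illustration, not the project's orchestrator: `KNOWN_TICKERS`, the stage callables, and the string-based numeric check are assumptions, stage 4 is a pass-through placeholder, and the verifier ignores signs and number formatting.

```python
import re

KNOWN_TICKERS = {"AAA", "BBB"}  # hypothetical entity index; stands in for a graph lookup


def validate_params(params: dict) -> dict:
    """Stage 2: reject parameters that do not resolve to a known entity."""
    ticker = params.get("ticker")
    if ticker not in KNOWN_TICKERS:
        raise ValueError(f"unresolved ticker: {ticker!r}")
    return params


def numbers_in(text: str) -> set[str]:
    """Extract unsigned numeric tokens such as '12.5' or '3' from text."""
    return set(re.findall(r"\d+(?:\.\d+)?", text))


def verify_answer(answer: str, rows: list[dict]) -> list[str]:
    """Stage 6: return numeric claims in the answer that no result row supports."""
    supported: set[str] = set()
    for row in rows:
        for value in row.values():
            supported |= numbers_in(str(value))
    return sorted(numbers_in(answer) - supported)


def run_pipeline(question, classify, extract, execute, compose):
    """Run all six stages, logging one artifact per stage."""
    audit = []

    def log(stage, artifact):
        audit.append({"stage": stage, "artifact": artifact})
        return artifact

    intent = log("intent", classify(question))
    params = log("params", validate_params(extract(question, intent)))
    rows = log("results", execute(intent, params))
    enriched = log("enrichment", rows)  # stage 4 placeholder: pass-through
    answer = log("answer", compose(question, enriched))
    unsupported = log("verification", verify_answer(answer, enriched))
    if unsupported:
        raise ValueError(f"unsupported numeric claims: {unsupported}")
    return answer, audit


# Stub stages standing in for the LLM calls and the graph query.
answer, audit = run_pipeline(
    "Why is AAA growing?",
    classify=lambda q: "causal_attribution",
    extract=lambda q, intent: {"ticker": "AAA"},
    execute=lambda intent, params: [{"ticker": "AAA", "growth_pct": 12.5}],
    compose=lambda q, rows: "AAA grew 12.5% over the period.",
)
print(answer)
print([entry["stage"] for entry in audit])
```

Because the check in stage 6 is a set difference against the structured rows, a composed answer that cites a figure absent from the query result fails closed instead of reaching the user.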

Hybrid storage

The graph is the source of relational truth. It is not the time-series store, the document store, or the vector store.

| Store | Role |
| --- | --- |
| Neo4j | Entities, relationships, structural properties. Source of truth for traversal. |
| Postgres + TimescaleDB | Indicator time-series, financial historical series, market data. |
| Postgres + pgvector | Embeddings of news, transcripts, 10-K business descriptions. |
| Postgres (audit schema) | Append-only audit log of every Q&A. |
| S3-compatible (MinIO/R2) | Raw filings, PDFs, large unstructured artifacts. |

Graph nodes hold handles (URLs, IDs) into the other stores. The orchestrator joins them at query time.
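
A minimal sketch of the handle-and-join idea follows, with dictionaries standing in for the time-series and object stores. The `IndicatorNode` fields, the `hydrate` helper, and the sample identifiers are hypothetical; a real orchestrator would query the stores listed above instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndicatorNode:
    """Graph node that carries handles, not payloads (field names are hypothetical)."""
    node_id: str
    name: str
    series_id: str     # handle into the time-series store
    document_url: str  # handle into the object store


# In-memory stand-ins for the external stores; the data is illustrative only.
TIME_SERIES = {"series-1": [("2024-01", 3.1), ("2024-02", 3.2)]}
OBJECTS = {"s3://bucket/doc-1.pdf": b"%PDF-placeholder"}


def hydrate(node: IndicatorNode, time_series: dict, objects: dict) -> dict:
    """Join a graph node with the payloads its handles point to."""
    return {
        "node_id": node.node_id,
        "name": node.name,
        "series": time_series.get(node.series_id, []),
        "document_bytes": len(objects.get(node.document_url, b"")),
    }


node = IndicatorNode("n1", "Example indicator", "series-1", "s3://bucket/doc-1.pdf")
record = hydrate(node, TIME_SERIES, OBJECTS)
print(record["series"][-1])
print(record["document_bytes"])
```

Keeping only handles on the node means the graph stays small and traversal-friendly, while bulky series and documents are fetched only for the nodes a query actually returns.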

Free-tier deployment topology

flowchart TB
    subgraph GH["GitHub (free, public repo)"]
        SRC["Source · CI · Actions cron"]
        REG["GHCR<br/>(container registry)"]
        PG["GitHub Pages<br/>(static UI + docs)"]
        REL["Releases / LFS<br/>(Parquet snapshots)"]
    end

    subgraph PARTNERS["Free-tier partners"]
        NEO["Neo4j AuraDB Free<br/>200k nodes / 400k rels"]
        SUPA["Supabase Free<br/>Postgres + pgvector"]
        R2["Cloudflare R2 Free<br/>10 GB object store"]
        HFS["Hugging Face Space<br/>FastAPI container"]
    end

    USER[User browser<br/>BYOK Anthropic key] --> PG
    PG --> HFS
    HFS --> NEO
    HFS --> SUPA
    HFS --> R2
    USER --> ANT[Anthropic API]

    SRC -- scheduled ingest --> NEO
    SRC -- scheduled ingest --> SUPA
    REG -- container pull --> HFS
    SRC -- mkdocs build --> PG
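
Because the graph tier in this topology is capped, a scheduled ingest may want to check headroom before writing. The sketch below hard-codes the node and relationship limits shown in the diagram; the function name and the sample counts are hypothetical, and the limits should be confirmed against the provider's current terms.

```python
# Limits copied from the topology diagram above (assumed to be current).
AURA_FREE_MAX_NODES = 200_000
AURA_FREE_MAX_RELS = 400_000


def check_capacity(current_nodes: int, current_rels: int,
                   new_nodes: int, new_rels: int) -> dict:
    """Report whether a planned ingest batch fits within the free-tier caps."""
    nodes_after = current_nodes + new_nodes
    rels_after = current_rels + new_rels
    return {
        "nodes_remaining": AURA_FREE_MAX_NODES - nodes_after,
        "rels_remaining": AURA_FREE_MAX_RELS - rels_after,
        "fits": nodes_after <= AURA_FREE_MAX_NODES and rels_after <= AURA_FREE_MAX_RELS,
    }


# Hypothetical counts for illustration.
report = check_capacity(150_000, 390_000, 20_000, 15_000)
print(report)
```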

Repo layout

market-view/
├── schema/         # Cypher constraints, node listing, migrations, seed YAML
├── ingestion/      # Source-specific workers (EDGAR, FRED, Treasury, news)
├── kg/             # Neo4j client, repositories, the 6 query patterns
├── orchestrator/   # Six-stage pipeline
├── llm/            # Provider-pluggable client, versioned prompts
├── api/            # FastAPI surface
├── eval/           # Benchmark + adversarial sets, runner, metrics
├── ui/             # Next.js, exports to static (Phase 4)
├── docs/           # MkDocs → GitHub Pages
└── tests/          # unit + integration (testcontainers) + golden patterns

Phased delivery

Phase 0 (this commit): skeleton + docs site + local stack. Phase 4 is the first end-to-end live demo on the free tier. See SYSTEM_REQUIREMENTS §5 (in the parent directory) for the full 11-phase plan.