Guide

Everything about how Atlas works, where its data comes from, how it forms its judgments, and how to get the most out of it.

What Atlas is

Atlas maps Siemens Energy's publicly disclosed portfolio against cyber-regulation regimes per country. It answers three questions for Irina and the wider SCADA / Comms / Cyber org:

Where does regulation bite us today (and where is it about to)?
Which products does each obligation actually apply to - with citations?
What's the gap between regulatory expectation and Siemens Energy's posture? (computed offline in the Preparedness Workbook - see below)

Where the data comes from

Data	Source	How it's refreshed
Countries & tiers	Curated list (SE legal entities + HVDC/Gamesa footprint + extraterritorial regimes)	Manual seed; admins can re-run or promote tier in UI
Jurisdictions	Known regulators + frameworks (NCSC, BSI, FERC/NERC, NCA, ANSSI, etc.)	Curated; extensible via admin
Portfolio items	`siemens-energy.com` public product hub - LLM-navigated (no blind crawl)	Admin action "Discover portfolio"
Regulation text	Local authoritative baselines (CRA, CAF v3.1/v3.2/v4) + curated summaries (NIS2, NERC CIP, PSTI, NCA OTCC)	Admin action "Ingest local baselines" (~10-15 min)
Horizon instruments	Public trackers + government announcements (CRSB, CRA phased deadlines, NERC CIP-015, EU AI Act Annex III, CAF v4 adoption, OTCC v2 draft)	Admin action "Seed regulations" or daily horizon-scan
Applicability assertions	Computed: embedding retrieval → rerank → `analyst` LLM judgment, grounded in regulation chunks	Admin action "Run applicability engine"

The globe

Colour of each country polygon:

Bright purple - Mapped. Active regulatory regime + significant SE footprint. Fully detailed in Atlas.
Mid purple - Indexed. Meaningful footprint or emerging regulation; less depth.
Grey - Watchlist. Horizon regulation or minor footprint.
Amber glow - country has at least one upcoming instrument on the horizon within 24 months.

Click any country to drill into it. Drag to rotate (auto-rotation pauses for 20s). Scroll / pinch to zoom. A bottom-left legend mirrors the colour meaning.

Country panel (left sidebar)

For a mapped country, Atlas shows:

Tier pill (mapped / indexed / watchlist) - human-editable by admins
Rationale - why this tier
Regulatory exposure gauge - see below
SE footprint - why SE cares about this country
Jurisdictions - the regulators and frameworks in play (e.g. UK-NCSC, EU-NIS2, UK-PSTI)
Portfolio applicability - which SE offerings have Atlas-judged applicability. Click "evidence pack" to download an SE-branded PDF for a specific (product × country) pair.
Star / pin products - click the ☆ next to a portfolio item to pin it to the top of the list. Pinned items appear in a "★ Pinned" block above the rest. Pins are per user but apply across all countries: star HVDC once and it stays at the top in every country where HVDC has applicability, so you can jump country-to-country without re-pinning.

Regulatory exposure score

The computation:

For each obligation in a country for which Atlas holds an active applicability assertion:
• Verdict applies → row score = confidence × 1.0
• Verdict partially_applies → row score = confidence × 0.5
• Verdict does_not_apply / uncertain → excluded
Average the row scores for each portfolio item, then average across portfolio items → country exposure
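A minimal Python sketch of the computation above, assuming each assertion arrives as a dict with hypothetical keys item, verdict and confidence (0 to 1). It also assumes that a portfolio item with no scoring rows is left out of the cross-item average rather than counted as zero; the guide does not say which Atlas does.

```python
from statistics import mean

# Weight per verdict; verdicts not listed (does_not_apply, uncertain) are excluded.
VERDICT_WEIGHT = {"applies": 1.0, "partially_applies": 0.5}

def country_exposure(assertions):
    """assertions: dicts with 'item', 'verdict', 'confidence' for one country.
    Returns the exposure score, or None if no row scores at all."""
    per_item = {}
    for a in assertions:
        weight = VERDICT_WEIGHT.get(a["verdict"])
        if weight is None:
            continue  # does_not_apply / uncertain rows are excluded
        per_item.setdefault(a["item"], []).append(a["confidence"] * weight)
    if not per_item:
        return None
    # Average the rows within each item, then average the item averages.
    return mean(mean(rows) for rows in per_item.values())
```

For example, HVDC rows scoring 0.9 (applies, confidence 0.9) and 0.4 (partially_applies, confidence 0.8) average to 0.65; a second item averaging 0.6 gives a country exposure of 0.625.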

For actual preparedness (met / partial / not met per obligation) use the Preparedness Workbook export and fill in the Compliance Status column locally. The workbook's charts and formulas recompute real preparedness as you type.

Horizon radar

Instruments that have been announced, drafted, or assigned an effective date within the next 24 months. Examples currently tracked:

UK Cyber Security & Resilience Bill (CRSB)
EU CRA Article 14 (vuln-reporting) - 11 Sep 2026; CRA full - 11 Dec 2027
EU AI Act Annex III - 2 Aug 2026
UK CAF v4 adoption by OFGEM for 2026/27 audit cycle
US NERC CIP-015 (INSM) - 2028; CIP-003-9 - 2026
Saudi NCA OTCC v2 - draft, on watch

Open from the right-hand dock (🌐 icon). Badge number = countries with at least one upcoming instrument. Click an item to jump to that country.

Ask Atlas (chat)

A retrieval-augmented chat scoped to the regulation corpus. Ask things like:

"What does the CRA say about vulnerability handling?"
"Does NERC CIP-007 apply to transformers?"
"What's in CAF B4 about secure configuration?"

Every factual claim in Atlas's answer is followed by a numeric citation such as [1], which maps to the regulation chunk Atlas used as evidence. Click a citation to open the full source text in a modal.

Atlas will not hallucinate regulation content - if the corpus doesn't cover your question, it will say so and suggest where to look. Questions outside the corpus (weather, personal chat, roleplay, opinions) and classic prompt-injection patterns ("ignore previous instructions…") are politely refused.

Chat responses are rendered with minimal markdown - **bold**, *italic*, `code`, and bullet lists - with inline [N] citations and tappable source cards below each answer.
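One way to do that rendering, sketched in Python under assumptions: the answer is plain text using only the markdown subset listed above, citations are [N] markers numbered from 1, and the link target #source-N and CSS class cite are hypothetical names. Escaping HTML first means model output cannot inject markup, and an out-of-range [N] stays plain text instead of linking to a source that does not exist.

```python
import html
import re

def _inline(text, num_sources):
    s = html.escape(text)  # escape first so model output cannot inject markup
    s = re.sub(r"`([^`]+)`", r"<code>\1</code>", s)
    s = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", s)
    s = re.sub(r"\*(.+?)\*", r"<em>\1</em>", s)

    def cite(m):
        n = int(m.group(1))
        if 1 <= n <= num_sources:  # only link citations that map to a source card
            return f'<a class="cite" href="#source-{n}">[{n}]</a>'
        return m.group(0)

    return re.sub(r"\[(\d+)\]", cite, s)

def render_answer(text, num_sources):
    """Render **bold**, *italic*, `code`, "- " bullets and [N] citations to HTML."""
    out, in_list = [], False
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines carry no content in this subset
        is_bullet = line.startswith(("- ", "* "))
        if is_bullet != in_list:  # open or close the <ul> on list boundaries
            out.append("<ul>" if is_bullet else "</ul>")
            in_list = is_bullet
        body = _inline(line[2:] if is_bullet else line, num_sources)
        out.append(f"<li>{body}</li>" if is_bullet else f"<p>{body}</p>")
    if in_list:
        out.append("</ul>")
    return "\n".join(out)
```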

Exports

Three flavours, each with "INDICATIVE · pending review" and the public-data disclaimer on every page:

Bulk workbook (XLSX) + Word narrative (DOCX) - everything Atlas knows, filterable by items / instruments / confidence / verdicts. Good for bulk review.
Preparedness Workbook (XLSX) - see next section. The flagship Excel deliverable.
Evidence Pack (PDF) - one SE-branded PDF for a specific (portfolio item × country) pair. Contains executive summary, applicable obligations, verbatim source extracts, and provenance. Suitable to hand to a customer procurement team.

Preparedness Workbook

This is the tool that bridges Atlas (public data) and SE's internal compliance position (confidential data).

What Atlas ships you:

Sheet Instructions - usage notes and formula explanation
Sheet Compliance Matrix - one row per (portfolio item × obligation × country) triple that Atlas judged applicable, with Atlas verdict + confidence, an empty Compliance Status dropdown, and Evidence Ref and Reviewer columns
Sheet Country Dashboard - auto-computed per-country preparedness % + known-review-coverage % + bar charts
Sheet Portfolio Heatmap - item × country grid, colour-scaled preparedness
Sheet Gaps - auto-filtered Not Met + Partially Met rows, with a copy of all row context
Sheet Provenance - Atlas version + regulation version dates + content hashes

Usage: fill in the Compliance Status column (dropdown: Met / Partially Met / Not Met / N/A / Unknown). All charts, scores, and the dashboard update live. Nothing is uploaded back to Atlas.
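The guide does not spell out the dashboard formula, so the Python sketch below is an assumption for illustration only: Met scores 1, Partially Met 0.5 and Not Met 0; N/A rows drop out entirely; Unknown and blank rows count against review coverage but not against preparedness. The workbook's Instructions sheet documents the formula Atlas actually ships.

```python
# Assumed scoring weights; not taken from the shipped workbook.
STATUS_WEIGHT = {"Met": 1.0, "Partially Met": 0.5, "Not Met": 0.0}

def country_dashboard(statuses):
    """statuses: Compliance Status values for one country ('' = not yet filled).
    Returns (preparedness_pct, coverage_pct); None where undefined."""
    in_scope = [s for s in statuses if s != "N/A"]          # N/A leaves the denominator
    known = [s for s in in_scope if s in STATUS_WEIGHT]     # rows with a real verdict
    coverage = 100 * len(known) / len(in_scope) if in_scope else None
    preparedness = (100 * sum(STATUS_WEIGHT[s] for s in known) / len(known)
                    if known else None)
    return preparedness, coverage
```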

Evidence packs

Per-product per-country PDF with cover, executive summary, applicable obligations (with Atlas's verdict, confidence, and rationale), verbatim source extracts from the regulation text, a table of obligations that don't apply (with reasons), and a provenance page listing regulation version dates and source URLs. Irina can hand this unchanged to a customer procurement or legal team.

Applicability assertions

The core Atlas judgment. For each (portfolio item × obligation × country) triple:

Embed the obligation text (BGE-large, 1024-dim)
Retrieve top candidate chunks from the regulation corpus via vector kNN (sqlite-vec)
Rerank with bge-reranker-v2-m3
Ask the analyst role (Gemma 4 26B) for a grounded verdict: applies / partially_applies / does_not_apply / uncertain
Require numeric citations into the evidence chunks; store confidence + rationale
Freeze the evidence chunk hashes so that if the regulation later changes, Atlas can identify every assertion whose evidence moved
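The six steps above, sketched in Python for a single triple. The model and vector-store calls are injected as plain callables (embed, knn, rerank and ask_analyst are hypothetical names standing in for BGE-large, sqlite-vec, bge-reranker-v2-m3 and Gemma), and the reply shape {verdict, confidence, rationale} is assumed. Two further choices are assumptions too: a reply without a valid [N] citation is downgraded to uncertain rather than stored as-is, and only the cited evidence chunks have their hashes frozen.

```python
import hashlib
import re
from dataclasses import dataclass, field

VERDICTS = {"applies", "partially_applies", "does_not_apply", "uncertain"}

@dataclass
class Assertion:
    item: str
    obligation_id: str
    country: str
    verdict: str
    confidence: float
    rationale: str
    evidence_chunk_hashes: list = field(default_factory=list)

def chunk_hash(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def judge_triple(item, obligation, country, embed, knn, rerank, ask_analyst,
                 k=50, top_n=8):  # k and top_n are illustrative, not documented values
    query = obligation["text"]
    candidates = knn(embed(query), k)               # vector kNN over regulation chunks
    evidence = rerank(query, candidates)[:top_n]    # best-first after reranking
    reply = ask_analyst(item, obligation, country, evidence)
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", reply.get("rationale", ""))}
    grounded = bool(cited) and cited <= set(range(1, len(evidence) + 1))
    if reply.get("verdict") not in VERDICTS or not grounded:
        # Ungrounded or malformed reply: keep the triple, but only as uncertain.
        return Assertion(item, obligation["id"], country, "uncertain", 0.0,
                         reply.get("rationale", ""))
    # Freeze hashes of the cited chunks so later regulation edits can be traced.
    frozen = [chunk_hash(evidence[n - 1]) for n in sorted(cited)]
    return Assertion(item, obligation["id"], country, reply["verdict"],
                     float(reply["confidence"]), reply["rationale"], frozen)
```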

At scale we use an embedding-based relevance pre-filter (top-20 most-similar obligations per item, cosine ≥ 0.25) to avoid asking the LLM to judge obviously-irrelevant pairs (e.g. PSTI default-passwords for a subsea transformer).
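The pre-filter can be sketched in plain Python, assuming every portfolio item and every obligation already has an embedding vector (a 1024-dim BGE-large vector in Atlas; short toy vectors in the usage check). top_k=20 and min_cos=0.25 are the figures quoted above; the function and argument names are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def prefilter(item_vec, obligation_vecs, top_k=20, min_cos=0.25):
    """Ids of the top_k obligations most similar to one portfolio item,
    keeping only those with cosine >= min_cos. obligation_vecs: {id: vector}."""
    scored = [(cosine(item_vec, v), oid) for oid, v in obligation_vecs.items()]
    kept = [(s, oid) for s, oid in scored if s >= min_cos]
    kept.sort(key=lambda p: (-p[0], p[1]))  # most similar first; id breaks ties
    return [oid for _, oid in kept[:top_k]]
```

Only the surviving pairs are sent to the analyst, so LLM cost scales with items × 20 rather than items × all obligations.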

Change cascade (horizon scan)

Runs daily at 03:17 UK time (APScheduler). For each registered horizon source URL:

HEAD → compare ETag / Last-Modified. Unchanged → bail.
Fetch body → SHA256 hash. Unchanged → bail.
Re-chunk + per-chunk embed diff (cosine < 0.98). Pinpoint what moved.
LLM summary of the diff → change_events row.
Any assertion whose frozen evidence_chunk_hashes intersects the removed set is marked stale.
Telegram digest is sent to Bill.
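Steps 1, 2, 3 and 5 of the cascade can be sketched as pure Python functions. Assumptions: response headers arrive as a dict with canonical header names, each chunk version is a dict keyed by the chunk's SHA256 hash with its embedding as value, and an old chunk counts as removed when no chunk in the new version reaches cosine 0.98 against it. Fetching, the LLM summary and the Telegram step are left out, and all names are hypothetical.

```python
import hashlib
import math

def unchanged_by_headers(old, new):
    """Step 1: compare HTTP validators, strongest first."""
    for key in ("ETag", "Last-Modified"):
        if old.get(key) and new.get(key):
            return old[key] == new[key]
    return False  # nothing comparable: go on to fetch the body

def body_unchanged(old_sha256, body):
    """Step 2: compare the SHA256 of the fetched body with the stored hash."""
    return hashlib.sha256(body).hexdigest() == old_sha256

def _cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def removed_chunks(old_chunks, new_chunks, threshold=0.98):
    """Step 3: old chunk hashes with no close counterpart in the new version."""
    removed = set()
    for h, vec in old_chunks.items():
        if h in new_chunks:
            continue  # byte-identical chunk survived
        best = max((_cos(vec, v) for v in new_chunks.values()), default=0.0)
        if best < threshold:
            removed.add(h)
    return removed

def mark_stale(assertions, removed):
    """Step 5: flag assertions whose frozen evidence touches a removed chunk."""
    stale_ids = []
    for a in assertions:
        if removed.intersection(a["evidence_chunk_hashes"]):
            a["stale"] = True
            stale_ids.append(a["id"])
    return stale_ids
```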

How the LLMs are used

Role	Model	Used for
`analyst`	Gemma 4 26B-A4B-heretic Q8_0 (round-robin across 4 DGX Spark nodes)	Applicability judgment, obligation extraction, Ask-Atlas answers, regulation diff summaries
`coder`	Qwen3.6-35B-A3B-heretic (spark-53:8002)	Tool-calling portfolio discovery, structured JSON extraction from siemens-energy.com pages
`embed-large`	BGE-large-en-v1.5 (spark-52:8000, 1024-dim)	Chunk embedding for vector retrieval; relevance pre-filter for assertions
`rerank`	bge-reranker-v2-m3 (spark-51:8000)	Top-K reranking of retrieved chunks before handing to analyst

All LLM inference is on-premise (Bill's homelab DGX Spark cluster). No request leaves the internal network. Thinking mode is explicitly disabled (chat_template_kwargs.enable_thinking = false) for latency; model temperatures and sampler settings match the fleet's published no-think recipes.
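The request side of that setup can be sketched as follows, assuming an OpenAI-compatible chat-completions server that accepts chat_template_kwargs (vLLM does, for instance). The node URLs and the served model name are placeholders, since this guide does not list the four analyst nodes, and sampler settings are passed through by the caller rather than invented here. The sketch only builds the request; sending it is left to whatever HTTP client the service uses.

```python
import itertools
import threading

# Placeholder URLs: the four analyst nodes are not named in this guide.
ANALYST_NODES = [f"http://analyst-node-{i}:8000/v1/chat/completions" for i in range(1, 5)]

class AnalystRouter:
    """Round-robin over analyst nodes; thread-safe so parallel workers can share it."""

    def __init__(self, nodes=ANALYST_NODES, model="analyst"):
        self._cycle = itertools.cycle(nodes)
        self._lock = threading.Lock()
        self._model = model  # placeholder served-model name

    def build(self, messages, **sampling):
        with self._lock:
            url = next(self._cycle)
        payload = {
            "model": self._model,
            "messages": messages,
            # Ask the chat template to skip the thinking block, for latency.
            "chat_template_kwargs": {"enable_thinking": False},
            **sampling,  # temperature etc. from the no-think recipe
        }
        return url, payload
```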

Users & roles

admin - full access, including Actions menu, ingest triggers, user management, assertion runs.
user - read-only access to the pitch surface: globe, country drill-down, chat, evidence packs, exports. Actions menu is hidden.

Admins manage users from Actions → Manage users: add, reset passwords, disable, promote/demote, delete.

Admin actions (what each button does)

Action	What it does	Time
Seed countries	Idempotent curated seed of 45 countries + 71 jurisdictions	Instant
Seed regulations	Curated summaries of NIS2 / CRA / PSTI / NERC CIP / OTCC + horizon instruments	~15s
Ingest local baselines	Full-text ingest of CRA + CAF v3.1 / v3.2 / v4 from the read-only baselines mount - replaces curated summaries with authoritative text, cascade-invalidates dependent assertions	~10-15 min
Discover portfolio	LLM-driven navigation of siemens-energy.com to rediscover portfolio items	~5-8 min
Run applicability engine	Full cross-product with embedding pre-filter → top 20 per item → analyst judgment, 10-way parallel	~10-20 min
Run horizon scan	Fires the daily scheduler manually	~1-5 min

Applicability vs commercial presence

Two different concepts, easy to confuse. Atlas today answers one of them:

Regulatory applicability (what "Portfolio applicability" in the country panel means today): given the cyber regulations in country X, which SE products do those regs bind? Driven entirely by the regulation text Atlas has ingested. If a country has registered jurisdictions but no regulation text loaded yet, Atlas cannot compute applicability for it and the country panel shows a "data gap" state, NOT a statement that SE has no products there.
Commercial / delivery presence: where has SE actually delivered, sold, or operated this product? HVDC interconnectors in Ireland, Gamesa wind farms in Brazil, Siemens Energy syncons in the UK. This is a separate data layer Atlas does not have today — it's planned for v0.3 as the Delivery Footprint Layer. The sources will be public (press releases, 4C Offshore for wind, GCCIA / ENTSO-E for HVDC, investor-day maps).

When v0.3 lands, the globe will have a colour-mode toggle: Regulatory exposure vs SE delivery presence vs Both.

Regulation browser (`/docs`)

Three-pane reader for every regulation Atlas has ingested. Left: instrument list. Middle: outline built from the document's heading path. Right: clause text with a language toggle (EN / DE / FR / IT / ES) and in-document semantic search. Every clause has a copy-link button that yields a permanent URL like /docs#CAF-v3.1/clause-42 — drop those links into emails or decks and they open straight to the clause.

Translations are generated on demand, cached on the chunk content hash, and survive future re-ingests. "Show original" toggle keeps the English source one click away.
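A minimal sketch of that caching rule, assuming the translator is a callable (text, language) -> text and using an in-memory dict in place of whatever table Atlas really uses. Keying on the SHA256 of the clause text rather than on a chunk id is what lets translations survive a re-ingest: identical text hashes to the same key, while changed text misses and is re-translated.

```python
import hashlib

class TranslationCache:
    """Translations keyed on (sha256 of clause text, target language)."""

    def __init__(self, translate):
        self._translate = translate  # callable(text, lang) -> str, e.g. an LLM call
        self._store = {}             # stand-in for a persistent cache table
        self.misses = 0

    def get(self, chunk_text, lang):
        if lang == "EN":
            return chunk_text  # English source is shown as-is ("Show original")
        key = (hashlib.sha256(chunk_text.encode("utf-8")).hexdigest(), lang)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._translate(chunk_text, lang)
        return self._store[key]
```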

Delivery Footprint Layer

Separate from regulatory applicability. Tracks where Siemens Energy has publicly-announced project deliveries — HVDC interconnectors, offshore grid connections, gas turbines, transformers, GIS, syncons, hydrogen, grid software. All sources are public (press releases, investor maps, regulator-published project lists). Closes the confusing "country shows nothing" state from v0.2 by answering the other natural question: "does SE actually sell/operate here?".

Scenario simulator

Actions → 🔮. Pick any horizon instrument and Atlas runs a dry-run cascade: if this instrument came into force today, how many live assertions would go stale, how many portfolio items affected, which in-force instruments overlap with it. Uses embedding similarity over the horizon instrument's summary. Nothing is persisted.
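The dry run can be sketched as below, assuming embeddings exist for the horizon instrument's summary, for each obligation, and for each in-force instrument's summary. The similarity threshold of 0.6 is a placeholder (the guide gives no figure) and all names are hypothetical. The function only counts; nothing is written back.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def simulate(horizon_vec, obligation_vecs, assertions, in_force_vecs, threshold=0.6):
    """Read-only what-if for one horizon instrument.

    obligation_vecs: {obligation_id: embedding}
    assertions: dicts with 'obligation_id', 'item', 'active'
    in_force_vecs: {instrument_name: summary embedding}
    """
    touched = {oid for oid, v in obligation_vecs.items()
               if cosine(horizon_vec, v) >= threshold}
    hit = [a for a in assertions if a["active"] and a["obligation_id"] in touched]
    overlaps = sorted(name for name, v in in_force_vecs.items()
                      if cosine(horizon_vec, v) >= threshold)
    return {
        "would_go_stale": len(hit),
        "items_affected": len({a["item"] for a in hit}),
        "overlapping_instruments": overlaps,
    }
```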

Obligation cross-walk

Actions → 🔗. Type a keyword or concept ("supply chain", "incident reporting in 24h", "MFA for remote access") and Atlas returns the semantically equivalent clauses across all ingested regulations. Useful for proving "a control that meets X also covers Y% of Z". Each result deep-links into the Regulation Browser at the exact clause.
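A sketch of the cross-walk ranking, assuming the query and every clause are already embedded. It keeps the best clause per instrument above a placeholder cutoff (0.5 here, not a documented Atlas value) and builds deep links in the /docs#<instrument>/<clause> form used by the Regulation browser. Function and parameter names are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def crosswalk(query_vec, clauses, min_cos=0.5, per_instrument=1):
    """clauses: (instrument, clause_id, embedding) tuples across all regulations.
    Returns the best matches per instrument, strongest first, with deep links."""
    by_instrument = {}
    for instrument, clause_id, vec in clauses:
        s = cosine(query_vec, vec)
        if s >= min_cos:
            by_instrument.setdefault(instrument, []).append((s, clause_id))
    results = []
    for instrument, hits in by_instrument.items():
        hits.sort(reverse=True)  # highest similarity first within the instrument
        for s, clause_id in hits[:per_instrument]:
            results.append({
                "instrument": instrument,
                "clause": clause_id,
                "score": round(s, 3),
                "link": f"/docs#{instrument}/{clause_id}",
            })
    results.sort(key=lambda r: -r["score"])
    return results
```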

Auto-narrated pitch demo

Actions → ▶ Play pitch demo. Atlas takes over the camera and flies through UK → Germany → USA → Saudi → India with 50 seconds of voice-over subtitles explaining the value at each stop. Esc to stop. Useful when you want Atlas to tell its own story.

v0.5 show-stopper pack

v0.5 shipped 23 features in one pass. None of them replaced existing behaviour - all are additive, reachable from the Actions menu, the header, or keyboard shortcuts. Everything here respects the public-data contract: nothing internal to SE is stored.

Feature	Where	What it does
3D obligation graph	/graph	Force-directed network of obligations, edges between semantically-grouped cross-instrument pairs, coloured by theme.
Regulatory Gantt	/gantt	2020-2030 timeline of every instrument and milestone. Today-line overlay. Purple = active, amber = horizon, dashed = draft.
Spec-compare	Actions → Spec-compare	Paste or drop a product spec. Atlas extracts claims, kNN-matches each to regulation clauses, and returns an obligation map.
Obligation → control mapper	Any obligation panel	Click "Controls for this" - maps the obligation to ISO 27001 Annex A / NIST CSF / IEC 62443 candidates.
Citation copy	Obligation panel	One-click copy of a formatted citation string to clipboard.
TL;DR	Assertion panel	One-line LLM summary of any assertion or clause.
Rebuttal generator	Actions → Rebuttal	Paste a client pushback, get a citation-backed counter plus plain-English explanation.
Chart wizard	Actions → Chart	Ask for a chart in natural language, get an inline SVG back.
Weekly snapshot	Actions → Save snapshot	Point-in-time JSON of state. Lets Irina do reproducible before/after comparisons.
Threaded comments	Any target page	Comments (with @mentions) attach to countries, obligations, or instruments.
Export sign-off	Export modal	Submit an export for reviewer ack. The approval chain is stored in `export_signoffs`.
Saved views	Actions → Save view	Encodes the current state (country, filters, timeline, dock) into a URL hash for sharing.
Model card	Actions → Model card	Public accountability: which LLMs we use, where they run, what for, and calibration state.
Transparency report	Actions → Transparency	Quarterly numbers: assertions, calibration agreement %, red-team refusal rate, change events.
Payload signing	`POST /api/v1/sign`	HMAC-SHA256 over any JSON payload for tamper-evidence demos. Verify via `/verify`. HMAC uses a shared secret, so it proves integrity but not true non-repudiation.
Source verifier	Actions → Verify sources	Re-fetches every cited URL and flags 404s, redirects, or content drift vs stored hash.
DOCX drag-and-drop	Actions → Drop DOCX	Drop a regulation doc onto Atlas. It extracts text, chunks, embeds, and stages for admin review.
In-app feedback	Floating 💬 (bottom-right)	Sends a message straight to Bill's Telegram; also stored in the `feedback` table.
Weekly post composer	Actions → Weekly post	Turns the week's change events into a LinkedIn-ready 3-paragraph post.
Voice commands	Shift+V	Web Speech API listener. Spoken phrases route to searches, page jumps, or power-tool actions.
Delivery timeline player	Actions → Play timeline	Animated walkthrough of the change-event feed - good for demos.
Power tools group	Actions menu	New menu section bundles all the above into one reachable place.
Globe tier filter	Sidebar	Hide indexed / watchlist tiers to declutter the globe for presentations.
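The signing row can be illustrated with Python's standard hmac module. Two parts are assumptions: that the JSON is canonicalised (sorted keys, no whitespace) before signing, so the same object always yields the same signature, and that the key is a server-side secret held as bytes. The function names are hypothetical and do not describe the endpoint's actual code.

```python
import hashlib
import hmac
import json

def canonical(payload):
    """Serialise deterministically so equal JSON objects sign identically."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")

def sign(payload, key):
    """Hex HMAC-SHA256 of the canonical JSON; key is a bytes secret."""
    return hmac.new(key, canonical(payload), hashlib.sha256).hexdigest()

def verify(payload, signature, key):
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(sign(payload, key), signature)
```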

Forecast vs actual: ~110 human-hours forecast, ~10.5 minutes Rex-time actual. Compression is above the usual 1h=1min baseline because v0.5 reuses v0.4's routing, modal, dock, and action-menu scaffolding - this is surface work, not new architecture.

v0.6.0 — corpus consolidation

Atlas's corpus moved to real regulation text on 2026-04-24. Before this release, only the CAF UK baselines had full text; everything else was 3–6 chunks of curated summary. Now ~30 instruments carry real clause text, and every EU member state Atlas tracks has proper applicability coverage (not just Germany).

Metric	Before	After
Instruments with real content	CAF only	~30
Total obligations	~1,440	6,266
Active assertions	~1,440	2,667
EU member states with coverage	1 (DEU, against a duplicate stub)	14
Canonical instruments (deduped)	mixed duplicates	49 unique

Noteworthy architecture shifts:

EU-wide instruments live under an "EU" pseudo-country. NIS2, CRA and the AI Act Annex III are each a single record. Applicability judgements run once per (portfolio × obligation), then a SQL fan-out duplicates the assertion row across the 14 EU member states. Code-wise, countries.py UNIONs EU-wide instruments into any member state's view via an EU_MEMBER_ISO3 constant. See memory-api decision #55 for the rationale.
Instrument dedup. Pre-v0.6, each of NIS2/CRA/AI-Act had two rows: an old DEU-anchored curated stub and an EU-anchored real ingest. The stubs are gone, and their 393 stale DEU assertions were cascade-deleted with them.
Data-gap banners in /docs. Regulations whose authoritative text can't be programmatically fetched (ANEEL-964 behind Cloudflare, KISA-ISMS-P behind Java form-POSTs, SOCI-RMP by design) now carry a visible red or amber banner explaining the access limitation and the resolution path.
Tier 2 narrated tour. Site tour picker has a fourth option — "Deep tour (live navigation)" — that navigates to /graph, narrates the network + auto-types a search, then to /gantt and auto-zooms to 2026, then back to the globe for the outro.
thetunnel integration. For geo-blocked sources (Malaysia CSMA, Korea KISA) we call http://192.168.0.251:8080/fetch with {url, country, session, preload_url, verify_tls} to grab the content through a country-selected Proton WG exit. Brazil ANEEL cannot be fetched via VPN at all (Cloudflare blanket-blocks all datacentre/VPN IPs); it needs a residential fetch.
CRSB weekly watch. The UK Cyber Security and Resilience Bill was introduced to Parliament on 12 Nov 2025. Atlas now re-fetches the bill PDF every Sunday at 04:12 and, on a content-hash change, auto-re-ingests it. This catches each stage's amendments without manual intervention.
PDF extractor upgrade. pdfplumber is now the primary extractor, with pypdf kept as fallback. pypdf dropped first letters on decorative PDFs (Security → ecurity); pdfplumber extracts these correctly and handles ligatures properly.
Embedder degrades gracefully. On a 400 Bad Request from BGE (chunk exceeds 512-token window), the embedder now bisects the batch and skips the single oversized chunk with a warning, rather than killing the whole ingest. One missing embedding beats zero.
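The bisect-and-skip behaviour can be sketched as a short recursive function. BadRequest and embed_batch are hypothetical stand-ins for the embedding server's HTTP 400 and the batch call; a skipped chunk comes back as None so the result still lines up one-to-one with the input. Isolating one oversized chunk in a batch of n takes about log2(n) rounds of splitting.

```python
import logging

log = logging.getLogger("embedder")

class BadRequest(Exception):
    """Stand-in for the HTTP 400 returned when a chunk exceeds the token window."""

def embed_resilient(chunks, embed_batch):
    """embed_batch(list[str]) -> list[vector]; raises BadRequest on a 400.
    Returns one vector per chunk, or None for a chunk that had to be skipped."""
    if not chunks:
        return []
    try:
        return embed_batch(chunks)
    except BadRequest:
        if len(chunks) == 1:
            log.warning("skipping oversized chunk (%d chars)", len(chunks[0]))
            return [None]
        mid = len(chunks) // 2  # bisect and retry each half independently
        return (embed_resilient(chunks[:mid], embed_batch)
                + embed_resilient(chunks[mid:], embed_batch))
```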

Suppliers + risk matrix

Track key third-party suppliers and which portfolio items they feed into. Atlas joins this against active applicability assertions to produce the supply-chain risk matrix: for every supplier × country × product, how many regulatory obligations fire. The hotspots surface first.

Access via Actions → 🏭 Suppliers + risk matrix. Every supplier carries:

Origin country — primary country of origin, picked from a dropdown of known countries (geopolitical risk indicator).
Category — free-text, comma-separated for multi-category suppliers (e.g. "firmware, hardware, cloud-services"). Rendered as pills in the list.
Markets served — multi-select of countries the supplier actually operates in. Empty = global. Set this for country-scoped suppliers (e.g. "Insight — UK only") so they don't pollute the risk matrix with irrelevant country rows.
Linked portfolio items — expandable per-supplier view. Link a supplier to any portfolio item with a severity (direct / tier-2 / tier-3).

The risk matrix respects the markets filter — a UK-only supplier shows only UK rows, even if the product it feeds is deployed in 10 countries.
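A Python sketch of that join, assuming hypothetical shapes: suppliers maps each supplier to its set of market ISO3 codes (an empty set meaning global), links holds (supplier, item, severity) tuples, and assertions are the active rows for every country. It also assumes an obligation "fires" when its verdict is applies or partially_applies. Hotspots sort first by obligation count.

```python
from collections import Counter

FIRING = ("applies", "partially_applies")  # assumed definition of "fires"

def risk_matrix(suppliers, links, assertions):
    """Return (supplier, country, item, severity, obligation_count) rows,
    highest obligation count first."""
    fired = Counter((a["item"], a["country"]) for a in assertions
                    if a["verdict"] in FIRING)
    rows = []
    for supplier, item, severity in links:
        markets = suppliers[supplier]
        for (fired_item, country), n in fired.items():
            if fired_item != item:
                continue
            if markets and country not in markets:
                continue  # country-scoped supplier: drop rows outside its markets
            rows.append((supplier, country, item, severity, n))
    rows.sort(key=lambda r: (-r[4], r[0], r[1], r[2]))
    return rows
```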

Stub ingestion pipeline

Atlas can ingest a regulation end-to-end (fetch the authoritative source, parse it, translate if needed, chunk, embed, extract obligations) from a single admin endpoint. In one pass this turns a thin curated summary (PSTI, for instance, had just 5 chunks) into full clause text; the CRA ingest, for example, produced 262 chunks with 434 extracted obligations.

Sources live in a research manifest (one JSON file per region at /app/data/research/manifest-*.json) listing authoritative URL, fallback URLs, format, language, and known gotchas per instrument. The pipeline is driven by the manifests — adding a regulation is a manifest edit plus one instrument-placement mapping in stub_ingest.py.

Endpoints (Actions → Admin → Stub ingestion):

POST /admin/stubs/fetch - download + parse + stage. Idempotent; skips already-staged files. An optional short_codes query parameter limits the run to the listed instruments.
POST /admin/stubs/ingest - process staged files: translate non-English via Gemma 4/Qwen3.6, chunk, embed, extract obligations. Requires fetch first.
POST /admin/stubs/full - fetch then ingest in one go. Used by the overnight cron.
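A sketch of how the fetch step could choose its work from the manifests, assuming each manifest-*.json file holds a JSON list of entries with short_code, url and optional fallback_urls keys (the guide lists these fields but not their exact names). It applies the two rules stated above, the short_codes filter and skipping anything already staged, which also means a manually dropped <short_code>/source.bin is left alone rather than overwritten. The function name is hypothetical.

```python
import json
from pathlib import Path

def plan_fetch(manifest_dir, staging_dir, short_codes=None):
    """Return manifest entries that still need fetching, in manifest order."""
    wanted = set(short_codes) if short_codes else None
    plan = []
    for path in sorted(Path(manifest_dir).glob("manifest-*.json")):
        for entry in json.loads(path.read_text(encoding="utf-8")):
            code = entry["short_code"]
            if wanted is not None and code not in wanted:
                continue  # short_codes filter
            if (Path(staging_dir) / code / "source.bin").exists():
                continue  # already staged, or dropped in manually
            urls = [entry["url"], *entry.get("fallback_urls", [])]
            plan.append({"short_code": code, "urls": urls})
    return plan
```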

Known limitations per source:

Cloudflare-protected sites (ANEEL, sometimes NERC) — may need a VPN-sourced manual PDF drop into /app/data/stub_staging/<short_code>/source.bin.
EUR-Lex — frontend is AWS WAF-gated. We route through publications.europa.eu/resource/celex/... which serves clean XHTML.
Flaky TLS chains (cea.nic.in, isms.kisa.or.kr) — the fetcher's verify=False quirk handles these.
Portal click-through gates (Malaysia AGC, KISA jsessionid) — manual PDF drop required.

Automatic tour (pitch demo)

Actions → ▶ Play pitch demo — a 60-90 second auto-narrated walkthrough. Camera flies between countries, a UK-English voice reads the captions, and the obligation counts in the script are pulled live from the corpus so they never go stale.

Controls bottom-right while playing: ⏸ pause (Space), ⏭ skip to next scene (→), ✕ stop (Esc). Scene progress shows N/12.

The underlying Narrated Tour player is a zero-dep reusable pattern — drop-in files at /files/appdata/config/shared/narrated-tour/ for any other project that wants the same behaviour.

Non-goals

Legal advice. Atlas is an indicator; it is not a substitute for counsel.
Real-time regulation ingest. Sources refresh on their cadence (weekly/monthly at most).
Ingestion of internal SE data. Ever.
Generative regulation interpretation without citations. Atlas refuses to answer outside the corpus.

Maintained by Bill for Irina · all information used is available to the general public, nothing internal to Siemens Energy.