Architecture · v0.8.1

Architecture overview

This page explains how the Agentic Software Factory is built — from the UI layer down to the adapters that talk to the coding CLIs. It targets architects and developers who want a feel for layers, modules, data flows and the key design decisions without diving straight into the source. If you'd rather have a step-by-step usage walkthrough, the introduction is the better starting point.

What is the architectural goal?

The platform is a control plane for AI-assisted software development. It bundles four jobs into a single web application: project capture, artifact generation, run orchestration and quality assessment. You don't drive the Claude Code, Codex, Gemini or Aider CLIs from your shell — the platform invokes them as subprocesses, collects their output, persists it in the database and surfaces it live in the UI.

The architecture follows two very classic principles, deliberately not dressed up with buzzwords for v1:

Hexagonal per module: every functional module (e.g. run, wizard, settings) has its own domain, application, web and infrastructure layer. Domain depends on nothing, application only on domain, web on application, infrastructure implements domain ports.
Local execution: the platform is one Spring Boot process, plus a Postgres database in Docker next to it. No cluster, no Kafka, no service mesh. That's intentional — a lead developer in a mid-sized company should be able to run it on a laptop or a small VM.

Browser → Spring MVC + Thymeleaf → Application services → Domain model
                                                              ↓
                       PostgreSQL ←─ Repositories ←─ Adapters (Claude / Codex / Gemini / Aider / Mock / Git / Filesystem)

Mantra: The platform is small and readable. Once you understand the module boundaries, you understand the platform.

Layers and modules

Inside the Maven module app, all functional modules live under io.softwarefabrik.app.<module>. Each module has at most four sub-packages:

Per-module layer layout

domain/ — entities, value objects, repository interfaces, business services. No Spring, JPA or web annotations.
application/ — use-case services that orchestrate domain methods. This is where transaction boundaries (@Transactional) live.
web/ — Spring MVC controllers, Thymeleaf bindings, UI DTOs, REST endpoints for SSE.
infrastructure/ — JPA implementations of repositories, external adapters, filesystem access, scheduled jobs.

Dependency rules

domain depends on nothing.
application depends only on domain.
web depends on application, not directly on domain.
infrastructure implements domain ports and may use Spring magic.
No module reaches across into another module's domain package — only via that module's application service.

ArchUnit tests under app/src/test/java/.../architecture/ pin those rules. A wrong dependency turns CI red — architecture is enforced by automated tests, not by good intentions.
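
To make the rules concrete, here is a minimal plain-Java sketch that encodes them as a lookup table. It is not how the project enforces them (that remains the job of the ArchUnit tests). The class LayerRules and its methods are hypothetical, and the cross-module rule is one possible reading of the last dependency rule above.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: the dependency rules as data. Real enforcement lives in ArchUnit tests. */
final class LayerRules {
    private static final String BASE = "io.softwarefabrik.app.";

    // Layers a given layer may depend on inside its own module.
    private static final Map<String, Set<String>> ALLOWED_IN_MODULE = Map.of(
            "domain", Set.of(),
            "application", Set.of("domain"),
            "web", Set.of("application"),
            "infrastructure", Set.of("domain"));

    /** Splits io.softwarefabrik.app.<module>.<layer>[...] into {module, layer}. */
    static String[] moduleAndLayer(String packageName) {
        if (!packageName.startsWith(BASE)) {
            throw new IllegalArgumentException("Not a platform package: " + packageName);
        }
        String[] parts = packageName.substring(BASE.length()).split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected <module>.<layer>: " + packageName);
        }
        return new String[] {parts[0], parts[1]};
    }

    static boolean isAllowed(String fromPackage, String toPackage) {
        String[] from = moduleAndLayer(fromPackage);
        String[] to = moduleAndLayer(toPackage);
        if (from[0].equals(to[0])) {
            // Same module: the same layer is fine, otherwise consult the table.
            return from[1].equals(to[1])
                    || ALLOWED_IN_MODULE.getOrDefault(from[1], Set.of()).contains(to[1]);
        }
        // Assumption: across modules only the other module's application layer
        // is reachable, and domain stays free of dependencies.
        return !from[1].equals("domain") && to[1].equals("application");
    }
}
```

With this table, a call from io.softwarefabrik.app.run.web into io.softwarefabrik.app.run.infrastructure is rejected, which is the kind of violation that turns CI red.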

v0.8.1 modules at a glance:

agent, audit, common, conductor, execution, git, license, policy, projectdefinition, prompt, qualitygate, review, run, secrets, security, settings, team, validation, web, wizard

Modules in detail

Short profiles per module — handy when reading the code or hunting for a feature:

agent

Manages the agent roles (Architect, Developer, Reviewer, QA, Security, Documentation, Merge/Release) including preferredModel per role. Main classes: AgentDefinition, AgentRoleService. Routes: /agents.

audit

Append-only audit log for security-relevant events — login attempts, setting changes, run starts, approval decisions. Main class: AuditEvent. Table audit_event.

common

Shared utilities without business meaning: ID generator, clock abstraction, generic validators. Kept deliberately small.

conductor

New since v0.6.0: before every run, the conductor writes .claude/settings.local.json (plugin whitelist + skills path) and one .claude/agents/<role>.md per active team member into the workspace. Main classes: PluginCatalog, SkillsCatalog, ConductorWorkspaceWriter, ConductorRunPreparation.

execution

The heart of run execution: ExecutionAdapterRegistry, the sandbox implementations LocalProcessSandbox and ContainerProcessSandbox with ExecutionSandboxFactory (selected via the setting execution.sandbox.variant), and adapters for Claude, Codex, Gemini, Aider and Mock. Each adapter has its own infrastructure/<name> sub-package. Since v0.7.0, claudecode/ additionally contains ClaudeStreamJsonParser for live token events and TokenEstimator for local pre-estimates.

git

Workspace Git operations: git init, auto-commits, diff and log output for the run detail page. Uses JGit.

license

License logic: DEMO / COMMUNITY / FULL tiers, lease-JWT verification against the Keycloak public key, limit enforcement. Routes: /license, /admin/license.

policy

Approval policies: which phases need a human gate, which run through. Stores ApprovalDecision records.

projectdefinition

The ProjectDefinition entity with all editor fields (vision, audience, tech, architecture, security, …). Routes: /projects, /projects/{id}/edit.

prompt

Generates the six Markdown artifacts (PROJECT.md, INSTRUCTIONS.md, AGENTS.md, WORKFLOW.md, DEFINITION_OF_DONE.md, README.md) from project fields. Templates live under resources/prompts/.

qualitygate

Aggregates reviewer findings into the quality-gate verdict (PASSED / WARNING / FAILED / SKIPPED / ERROR). Special rules for SECURITY/HIGH and ARCHITECTURE/CRITICAL. Routes: /runs/{id}/quality-gate.

review

Reviewer implementations: aider-review, claude-review, security, architecture-reviewer, hallucination-review. CLI invocations in read-only mode or static heuristics.

run

The run model with phases, status, logs, token usage and workspace path. Main classes: Run, RunPhase, RunOrchestrationService, WorkspaceService. Routes: /runs, /runs/{id}.

secrets

Encrypted storage of API keys (Anthropic, OpenAI, Gemini) in Postgres. AES-GCM with master key from SOFTWAREFABRIK_SECRETS_MASTER_KEY. Routes: /integrations.

security

Spring Security configuration: login, bootstrap admin, role model (USER / ADMIN), CSRF, BCrypt passwords. Main class: SecurityConfig.

settings

Global platform defaults at /einstellungen with a 5-minute TTL cache and override order PROJECT > USER > GLOBAL > YAML. Table app_setting (V9). Since v0.7.0 also the key execution.sandbox.variant for the container sandbox.

team

Per-project team composition from agent roles. Serialised into AGENTS.md.

validation

Per-run build validation: mvn verify, npm run build or a configured build command. Writes BuildResult.

web

UI cross-cutting concerns: global layouts, kopfbereich.html, theme toggle, dashboard controller. No business logic.

wizard

The four-step assistant at /wizard. Main classes: WizardController, WizardService, WizardDraft, TemplateRegistry, ToggleRegistry, VersionLookupClient, WizardCostEstimator. Tables wizard_draft (V12), version_cache (V13). Six templates registered (Spring Boot, Static Frontend, .NET, Python, Node, Existing-Repo-Import); v0.8.x adds progress stepper, objective preview and local cost estimate in step 4.

Data flows: from wizard to run

A typical project lifecycle has five stations. Each one writes persistently to Postgres, each is visible in the UI:

┌──────────┐   ┌────────────┐   ┌────────────┐   ┌───────┐   ┌──────────────┐
│ Wizard   │ → │ Project    │ → │ Artifacts  │ → │ Run   │ → │ Quality gate │
│ /wizard  │   │ (DRAFT)    │   │ (Markdown) │   │       │   │ (Verdict)    │
└──────────┘   └────────────┘   └────────────┘   └───────┘   └──────────────┘
     │              │                │              │                │
     ▼              ▼                ▼              ▼                ▼
 wizard_draft   project_def   prompt_artifact   run, run_phase    quality_gate_run
                                                run_log,
                                                token_usage

Wizard collects answers in wizard_draft (survives browser reload and server restart). On completion a ProjectDefinition is created and the draft flips to completed.
ProjectDefinition holds all editor fields. Status starts as DRAFT and moves to READY after artifact generation.
Artifacts are produced by PromptAssemblyService from project fields. They're editable — the agent always reads the most recently saved version.
Run orchestrates the phases. RunOrchestrationService is asynchronous (Spring @Async); the UI doesn't poll, it listens via SSE.
Quality gate fires at the end or on demand, runs reviewers in parallel and aggregates findings into a verdict.
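
The last station can be illustrated with a small sketch: reviewers run concurrently and their findings are folded into one verdict. The verdict names and the two special category/severity pairs come from the qualitygate profile. How exactly they map to a verdict is not stated on this page, so the mapping below (blocking pairs give FAILED, any other finding gives WARNING, a crashed reviewer gives ERROR, no reviewers gives SKIPPED) is an assumption, as are all class and method names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.function.Supplier;

/** Hypothetical sketch of parallel review plus verdict aggregation. */
final class QualityGateSketch {
    enum Verdict { PASSED, WARNING, FAILED, SKIPPED, ERROR }

    static final class Finding {
        final String category;
        final String severity;

        Finding(String category, String severity) {
            this.category = category;
            this.severity = severity;
        }
    }

    static Verdict evaluate(List<Supplier<List<Finding>>> reviewers) {
        if (reviewers.isEmpty()) {
            return Verdict.SKIPPED; // assumption: nothing to run means SKIPPED
        }
        // Start every reviewer before joining any, so they run concurrently.
        List<CompletableFuture<List<Finding>>> futures = new ArrayList<>();
        for (Supplier<List<Finding>> reviewer : reviewers) {
            futures.add(CompletableFuture.supplyAsync(reviewer));
        }
        List<Finding> findings = new ArrayList<>();
        try {
            for (CompletableFuture<List<Finding>> future : futures) {
                findings.addAll(future.join());
            }
        } catch (CompletionException e) {
            return Verdict.ERROR; // assumption: a crashed reviewer yields ERROR
        }
        for (Finding finding : findings) {
            if (isBlocking(finding)) {
                return Verdict.FAILED;
            }
        }
        return findings.isEmpty() ? Verdict.PASSED : Verdict.WARNING;
    }

    // Assumption: the two "special rule" pairs force a FAILED verdict.
    private static boolean isBlocking(Finding f) {
        return ("SECURITY".equals(f.category) && "HIGH".equals(f.severity))
                || ("ARCHITECTURE".equals(f.category) && "CRITICAL".equals(f.severity));
    }
}
```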

Persistence: Postgres + Flyway

The platform uses PostgreSQL as the single data source. Schema changes happen exclusively via Flyway migrations under app/src/main/resources/db/migration/. Each migration is numbered (V<n>__<name>.sql) and immutable once it's been in production — new changes go in as additional migrations, never as edits to old ones.

Migration	Content
`V1`	Initial schema: `project_definition`, `agent_definition`, `team`, `run`, `run_phase`.
`V2`	Logs and token usage: `run_log`, `token_usage_event`.
`V3`	Approval policies: `approval_policy`, `approval_decision`.
`V4`	Encrypted secrets: `integration_secret`.
`V5`	Audit log: `audit_event`.
`V6`	Quality-gate tables: `quality_gate_run`, `review_finding`.
`V7`	License tables: `license`, `license_lease`.
`V8`	Build results: `build_result`.
`V9`	v0.4.0: `app_setting` with scope column (`GLOBAL` / `USER` / `PROJECT`) and audit trail.
`V10`	Run tags and quick-start markers.
`V11`	v0.4.0: `agent_preferred_model` — column for role-specific model selection.
`V12`	v0.4.0: `wizard_draft` — JSON state column, step counter, template ID, cleanup TTL.
`V13`	v0.4.0: `version_cache` — cache key, JSON value, `refreshed_at`, `last_error`. Java-computed staleness, no DB computed column (so the H2 test profile keeps working).
`V14`	v0.6.0: extended `agent_definition` for the conductor (mission, active skills).
`V15`	v0.7.0: `run_template` — adapter, team, objective, optional project scope, audit fields. Source for "start run from template".
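
The naming convention V<n>__<name>.sql has one practical consequence worth spelling out: versions order numerically, so V10 comes after V9 even though it sorts before it as a string. The following sketch parses such names and sorts them. It only handles the integer versions used in the table above, and the class name and the example file names in the usage note are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for migration file names of the form V<n>__<name>.sql. */
final class MigrationFileName {
    private static final Pattern PATTERN = Pattern.compile("V(\\d+)__(\\w+)\\.sql");

    final int version;
    final String name;

    private MigrationFileName(int version, String name) {
        this.version = version;
        this.name = name;
    }

    static MigrationFileName parse(String fileName) {
        Matcher matcher = PATTERN.matcher(fileName);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Not a versioned migration: " + fileName);
        }
        return new MigrationFileName(Integer.parseInt(matcher.group(1)), matcher.group(2));
    }

    /** Returns a copy sorted by numeric version, not by string order. */
    static List<String> sortByVersion(List<String> fileNames) {
        List<String> sorted = new ArrayList<>(fileNames);
        sorted.sort(Comparator.comparingInt((String f) -> parse(f).version));
        return sorted;
    }
}
```

For example, sorting V10__run_tags.sql, V9__app_setting.sql and V2__logs.sql this way yields V2, V9, V10, whereas a plain string sort would put V10 first.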

Test strategy: Tests run against H2 in Postgres-compatibility mode. To make that work, migrations avoid Postgres-specific features like GENERATED ALWAYS AS. Where unavoidable, a db.migration.h2/ override file fills the gap.

Adapter registry

Adapters are Spring beans that implement the ExecutionAdapter interface. At startup the ExecutionAdapterRegistry collects all available beans into a map <name → adapter>. The name comes from a constant on the adapter (e.g. "claudecode", "codex", "gemini", "aider", "mock").

When a run starts, the adapter is resolved in this order:

Run-specific override — if the user picked an adapter when creating the run.
Project default — if the project has a preferredAdapter.
User default — if the logged-in user set a USER-scope setting.
Global default — from app_setting with scope GLOBAL.
YAML default — from application.yml as a last fallback.

Apart from the run-specific override, this is the general override order PROJECT > USER > GLOBAL > YAML, which is implemented by SettingService and applies everywhere on the platform, not just to adapters.

// simplified
ExecutionAdapter adapter = registry.byName(
    settingService.resolve("execution.adapter.default", scope)
);
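
A slightly fuller sketch of the registry side, without Spring: in the real platform the adapter list would be the injected beans, while here it is passed in by hand. ExecutionAdapter, ExecutionAdapterRegistry and byName are named in the text above. The name() accessor and the error handling are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Minimal stand-in for the adapter interface; name() is an assumed accessor. */
interface ExecutionAdapter {
    String name();
}

/** Sketch of a name-to-adapter registry built once at startup. */
final class ExecutionAdapterRegistry {
    private final Map<String, ExecutionAdapter> adapters = new LinkedHashMap<>();

    ExecutionAdapterRegistry(List<ExecutionAdapter> available) {
        for (ExecutionAdapter adapter : available) {
            // Two adapters with the same name would make resolution ambiguous.
            if (adapters.putIfAbsent(adapter.name(), adapter) != null) {
                throw new IllegalStateException("Duplicate adapter name: " + adapter.name());
            }
        }
    }

    ExecutionAdapter byName(String name) {
        ExecutionAdapter adapter = adapters.get(name);
        if (adapter == null) {
            throw new IllegalArgumentException(
                    "Unknown adapter '" + name + "', registered: " + adapters.keySet());
        }
        return adapter;
    }

    Set<String> names() {
        return adapters.keySet();
    }
}
```

Failing fast on an unknown name keeps a typo in a setting from silently falling through to a different adapter.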

Mock adapter: always registered, works without an API key. It writes deterministic pseudo tokens into the workspace and is ideal for learning the mechanics or reproducing bugs locally — at zero cost.

Version cache and @Scheduled jobs

So the wizard can always pre-fill current stable versions, the platform keeps a cache of version lookups. The cache refreshes daily at 03:00 via a @Scheduled job; a second job at 04:00 deletes expired wizard_draft rows (TTL 30 days).

Version lookups follow the strategy pattern: a VersionLookupClient interface, three implementations:

MavenCentralLookupClient — queries https://search.maven.org/solrsearch/select for Java libraries (Spring Boot, ArchUnit, OWASP Dependency-Check).
GithubReleasesLookupClient — fetches the latest release tag from https://api.github.com/repos/<owner>/<repo>/releases/latest (e.g. for the Trivy CLI).
NpmRegistryLookupClient — queries https://registry.npmjs.org/<package> for Node packages (Playwright).

The cache key is a functional constant, not a technical path — e.g. spring-boot.3.x, archunit.latest, owasp.dependency-check.latest, trivy.cli.latest, playwright.npm.latest. That lets us swap the source without breaking consumers.

Staleness is computed in Java (threshold: 25 hours, a small buffer over the 03:00 job): now - refreshed_at > 25h. We deliberately do not use a Postgres GENERATED column, because the H2 test profile would trip over it.
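
The staleness rule is small enough to show in full. This is a sketch of the comparison described above. The class name is hypothetical, and treating a row that was never refreshed as stale is an assumption.

```java
import java.time.Duration;
import java.time.Instant;

/** Sketch of the Java-side staleness check: now - refreshed_at > 25h. */
final class VersionCacheStaleness {
    static final Duration THRESHOLD = Duration.ofHours(25);

    static boolean isStale(Instant refreshedAt, Instant now) {
        if (refreshedAt == null) {
            return true; // assumption: never refreshed counts as stale
        }
        return Duration.between(refreshedAt, now).compareTo(THRESHOLD) > 0;
    }
}
```

Passing now as a parameter instead of calling Instant.now() inside keeps the check deterministic in tests, which fits the clock abstraction mentioned for the common module.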

Admins can inspect the cache at /einstellungen/wizard/versions. Each row shows cache key, returned value, refreshed_at, last_error and a Stale badge; a button "Refresh now" triggers a synchronous lookup for that key.

Sandbox model

Every run gets its own workspace — a local directory under SOFTWAREFABRIK_WORKSPACES_ROOT/<project-slug>/<run-id>. The adapter (e.g. claudecode) is launched in that directory as its own process (LocalProcessSandbox). Each run thus has its own Git history, its own build output, its own node_modules or target folder — runs cannot accidentally clobber each other.
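
A sketch of how such a workspace path could be derived safely. The layout <root>/<project-slug>/<run-id> is from the text above. The class name and the guard against path traversal are illustrative assumptions, not a description of WorkspaceService.

```java
import java.nio.file.Path;

/** Hypothetical helper that resolves <root>/<project-slug>/<run-id>. */
final class WorkspacePaths {
    static Path workspaceFor(Path root, String projectSlug, String runId) {
        Path normalizedRoot = root.toAbsolutePath().normalize();
        Path workspace = normalizedRoot.resolve(projectSlug).resolve(runId).normalize();
        // Reject slugs or IDs that escape the root or change the nesting depth,
        // for example "../etc" or "a/b".
        boolean insideRoot = workspace.startsWith(normalizedRoot);
        boolean exactlyTwoLevels =
                workspace.getNameCount() == normalizedRoot.getNameCount() + 2;
        if (!insideRoot || !exactlyTwoLevels) {
            throw new IllegalArgumentException(
                    "Invalid workspace coordinates: " + projectSlug + "/" + runId);
        }
        return workspace;
    }
}
```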

LocalProcessSandbox sets environment variables cleanly per process (no System.setenv at platform level), terminates processes hard on cancel/timeout (destroyForcibly()) and writes stdout and stderr line-by-line into the run_log table while pushing the same lines into the SSE stream.
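
The three behaviours in that paragraph map onto standard ProcessBuilder and Process calls. The sketch below is a simplified stand-in, not the actual LocalProcessSandbox: it merges stderr into stdout, hands each line to a caller-supplied sink (which in the platform would write to run_log and the SSE stream), and all names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch of per-process env, line-wise output and hard kill on timeout. */
final class LocalProcessSketch {
    static ProcessBuilder prepare(Path workspace, List<String> command, Map<String, String> env) {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.directory(workspace.toFile());
        builder.environment().putAll(env); // affects only the child, not the platform JVM
        builder.redirectErrorStream(true); // simplification: one merged output stream
        return builder;
    }

    static int run(ProcessBuilder builder, Duration timeout, Consumer<String> lineSink)
            throws IOException, InterruptedException {
        Process process = builder.start();
        Thread pump = new Thread(() -> {
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    lineSink.accept(line);
                }
            } catch (IOException ignored) {
                // The stream closes when the process is killed; nothing left to read.
            }
        });
        pump.setDaemon(true);
        pump.start();
        if (!process.waitFor(timeout.toMillis(), TimeUnit.MILLISECONDS)) {
            process.destroyForcibly(); // hard termination on timeout
            process.waitFor();
        }
        pump.join(1000);
        return process.exitValue();
    }
}
```

Reading output on a separate thread matters: if nobody drains the pipe, a chatty child process can block on a full buffer and never reach the timeout path cleanly.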

Container variant: Since v0.7.0 a ContainerProcessSandbox ships alongside (ADR-0011 Accepted, variant B). It starts every agent in an ephemeral Docker or Podman container with --cpus 2 --memory 4g --pids-limit 512 --read-only, a bind mount on the workspace and --network=none by default. It is enabled via the setting execution.sandbox.variant=container; if Docker is missing from the PATH, the ExecutionSandboxFactory falls back to the local variant with a log warning. Single-tenant hosts are unaffected; Enterprise-tier installations can switch variants without code changes.
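
As a rough illustration, the container invocation could be assembled like this. The resource and isolation flags are the ones listed above. The --rm flag (to get an ephemeral container), the /workspace mount point, the image name and the class itself are assumptions.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical builder for the container command line. */
final class ContainerCommandSketch {
    static List<String> build(String runtime, Path workspace, String image,
                              List<String> agentCommand, boolean allowNetwork) {
        List<String> command = new ArrayList<>();
        command.add(runtime); // "docker" or "podman"
        command.add("run");
        command.add("--rm"); // assumption: remove the container when the agent exits
        command.addAll(List.of("--cpus", "2", "--memory", "4g", "--pids-limit", "512"));
        command.add("--read-only"); // root filesystem is read-only, the bind mount stays writable
        if (!allowNetwork) {
            command.add("--network=none");
        }
        // Assumed mount point inside the container.
        command.addAll(List.of("-v", workspace.toAbsolutePath() + ":/workspace", "-w", "/workspace"));
        command.add(image);
        command.addAll(agentCommand);
        return command;
    }
}
```

The resulting list can be handed to a ProcessBuilder unchanged, so the container variant and the local variant can share the same output handling.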

Security: bootstrap admin, audit, settings, approvals

Security in this architecture is not one module, it's a cross-cutting discipline. The points that matter:

Bootstrap admin: created at first start from SOFTWAREFABRIK_ADMIN_USER + ...PASSWORD. If the password is omitted, no admin is created — weak defaults would be a bug.
BCrypt: all passwords are BCrypt-hashed with cost 12. Cleartext exists only for the duration of the login request and is not kept in memory afterwards.
Encrypted secrets: API keys (Anthropic, OpenAI, …) are AES-GCM-encrypted under the master key from SOFTWAREFABRIK_SECRETS_MASTER_KEY in Postgres. Never plaintext, never in logs.
Audit log: every setting change, every login, every run start, every approval decision is recorded in audit_event. Append-only, with timestamp, user subject and a diff of the change.
Approval policies: critical phases (e.g. EXECUTION, COMPLETION) wait for an explicit user decision. The default policy is conservative — approval enabled at plan and review boundaries.
CVE hygiene: pgJDBC 42.7.11 (CVE-2026-42198 fix). docker-compose.yml pins the Postgres mapping to 127.0.0.1:5432 instead of 0.0.0.0:5432, so the port isn't accidentally exposed externally. Trivy/OWASP scans run per CI build.
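
For the encrypted-secrets point, a minimal AES-GCM round trip with the JDK's own javax.crypto API looks roughly like this. It is a sketch under assumptions: the class name, the storage layout (12-byte IV followed by ciphertext and tag) and the way the master key bytes are obtained from SOFTWAREFABRIK_SECRETS_MASTER_KEY are not taken from the platform's code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical AES-GCM wrapper: stored value = IV || ciphertext || tag. */
final class SecretCipherSketch {
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;

    private final SecretKey key;
    private final SecureRandom random = new SecureRandom();

    /** masterKey must be 16, 24 or 32 raw bytes; decoding the env value is up to the caller. */
    SecretCipherSketch(byte[] masterKey) {
        this.key = new SecretKeySpec(masterKey, "AES");
    }

    byte[] encrypt(String plaintext) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        random.nextBytes(iv); // a fresh IV per encryption is essential for GCM
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
        return ByteBuffer.allocate(iv.length + ciphertext.length).put(iv).put(ciphertext).array();
    }

    String decrypt(byte[] stored) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
        // Throws AEADBadTagException if the stored bytes were modified.
        byte[] plaintext = cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
        return new String(plaintext, StandardCharsets.UTF_8);
    }
}
```

Because GCM is authenticated, a tampered database row fails decryption instead of yielding a silently corrupted API key.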

Settings architecture

The settings module is the platform's control panel. It combines three design choices that work together:

Scope hierarchy: every setting belongs to a scope (GLOBAL, USER, PROJECT). SettingService.resolve(key, context) searches narrow-to-wide: PROJECT first, then USER, then GLOBAL, then YAML.
5-minute TTL cache: SettingService caches resolved values, so a resolve is essentially free 99 % of the time. UI changes don't hard-invalidate the cache — at most 5 minutes later the new value is live. No app restart required.
Audit trail: every write to app_setting creates an audit_event entry with subject, key, old value, new value. Who changed what and when is always traceable.
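
The first two choices can be sketched together: a narrow-to-wide lookup wrapped in a time-based cache. This is a simplified illustration, not the real SettingService. It ignores which project or user is asking, is not thread-safe, and every name in it is hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical resolver: PROJECT > USER > GLOBAL > YAML with a 5-minute TTL cache. */
final class SettingResolverSketch {
    enum Scope { PROJECT, USER, GLOBAL, YAML } // declaration order is the lookup order

    private static final Duration TTL = Duration.ofMinutes(5);

    private static final class CachedValue {
        final String value; // null means "no scope defines this key"
        final Instant expiresAt;

        CachedValue(String value, Instant expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<Scope, Map<String, String>> store;
    private final Supplier<Instant> clock;
    private final Map<String, CachedValue> cache = new HashMap<>();

    SettingResolverSketch(Map<Scope, Map<String, String>> store, Supplier<Instant> clock) {
        this.store = store;
        this.clock = clock;
    }

    Optional<String> resolve(String key) {
        Instant now = clock.get();
        CachedValue cached = cache.get(key);
        if (cached != null && now.isBefore(cached.expiresAt)) {
            return Optional.ofNullable(cached.value); // served from cache
        }
        String resolved = null;
        for (Scope scope : Scope.values()) {
            String candidate = store.getOrDefault(scope, Map.of()).get(key);
            if (candidate != null) {
                resolved = candidate; // narrowest scope wins
                break;
            }
        }
        cache.put(key, new CachedValue(resolved, now.plus(TTL)));
        return Optional.ofNullable(resolved);
    }
}
```

The sketch also shows the trade-off named above: a value written at PROJECT scope stays invisible until the cached entry expires, at most five minutes later.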

Concrete settings managed today:

Key	Meaning	Typical value
`workspace.root`	Root directory for all run workspaces	`/var/softwarefabrik/workspaces`
`git.user.name`	Git author for auto-commits	`Software Factory Bot`
`git.user.email`	Git email	`bot@softwarefabrik.local`
`execution.adapter.default`	Default execution adapter	`claudecode`
`execution.model.claudecode`	Default model for Claude Code	`claude-sonnet-4-6`
`execution.model.codex`	Default model for Codex	`gpt-5`
`budget.tokens.daily`	Daily token cap	`2000000`
`budget.tokens.weekly`	Weekly token cap	`10000000`
`budget.threshold.soft`	Soft threshold in percent (warns only)	`80`

Live streaming and SSE

The run detail page streams three things live from the backend to the browser: logs, phase updates and the token counter. Instead of polling, the platform uses Server-Sent Events (SSE): an open HTTP stream that pushes one-way messages from the backend.

Heartbeat: every 20 seconds the server sends an empty :keepalive comment. That keeps the connection alive and exposes dead TCP sockets quickly.
Auto-reconnect: the browser EventSource reconnects on its own; the platform sets retry: 5000, i.e. a 5-second reconnect delay.
Lost-tail replay: on reconnect the server first replays the last 50 log entries so no gap appears.
Backpressure: each SSE session has a bounded buffer. If the browser falls behind (e.g. tab inactive), the connection is closed cleanly instead of letting the heap grow.
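
On the wire, these mechanisms are plain text frames plus a small amount of server-side state. The sketch below formats the frames according to the SSE wire format and keeps a bounded tail for replay. The event name in the usage note and all class and method names are hypothetical. In the platform itself this would sit behind Spring's SSE support.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical helpers for SSE frames and lost-tail replay. */
final class SseSketch {
    /** Tells the browser how long to wait before reconnecting. */
    static String retryFrame(long millis) {
        return "retry: " + millis + "\n\n";
    }

    /** A comment line (leading colon) that clients ignore but that keeps the socket busy. */
    static String keepAliveFrame() {
        return ":keepalive\n\n";
    }

    /** One named event; multi-line data becomes one data: line per line. */
    static String eventFrame(String event, String data) {
        StringBuilder frame = new StringBuilder("event: ").append(event).append('\n');
        for (String line : data.split("\n", -1)) {
            frame.append("data: ").append(line).append('\n');
        }
        return frame.append('\n').toString();
    }

    /** Keeps only the most recent entries so a reconnecting client can be caught up. */
    static final class ReplayBuffer {
        private final int capacity;
        private final Deque<String> tail = new ArrayDeque<>();

        ReplayBuffer(int capacity) {
            this.capacity = capacity;
        }

        synchronized void add(String line) {
            if (tail.size() == capacity) {
                tail.removeFirst(); // drop the oldest entry
            }
            tail.addLast(line);
        }

        synchronized List<String> snapshot() {
            return new ArrayList<>(tail);
        }
    }
}
```

A reconnect handler would send retryFrame(5000) once, replay new ReplayBuffer(50).snapshot() as eventFrame("log", line) frames, and then continue with live lines and a keepAliveFrame() every 20 seconds.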

SSE endpoints sit at /runs/{id}/stream/logs, /runs/{id}/stream/phases, /runs/{id}/stream/tokens. If a reverse proxy blocks HTTP streaming (some corporate setups do), the UI falls back to a 2-second polling loop with the same payload.

Test strategy and coverage gate

Tests are not optional; they are part of the architecture. Three pillars:

Unit tests: per service class. Located at app/src/test/java/.../<module>/<…Service>Test.java. Fast, no Spring context, Mockito-based.
Integration tests: with Spring Boot test slices (@WebMvcTest, @DataJpaTest) against H2.
ArchUnit tests: pin layer and module boundaries. Reach from web straight into infrastructure and a test fails.

Coverage gate: JaCoCo 80/80 (lines + branches). A service merged without tests doesn't pass mvn verify. That's intentional — the platform is no longer a prototype.

CI pipeline (.github/workflows/ci.yml): build + test + ArchUnit + JaCoCo + OWASP Dependency-Check + Trivy file scan. OWASP runs non-blocking without an NVD key, Trivy only in its dedicated job.

What is deliberately NOT in the architecture

What the platform omits is as important as what it does. These omissions keep v1 readable and extensible:

No multi-tenancy: one instance = one team. Multiple tenants need multiple instances. Multi-tenant sharding is v0.7 at the earliest.
No built-in web IDE: the platform doesn't open a code editor in the browser. You work on the workspace with your local tools (VS Code, IntelliJ, …).
No dedicated build server: mvn verify or npm run build run inside the workspace, not on a build cluster. Anyone who wants to scale that has to build their own infrastructure.
No LLM hosting of its own: the platform only invokes coding CLIs (Claude Code, Codex, Gemini, Aider). The LLM behind them is the vendor's job. Air-gapped setups combine a local CLI with a local LLM (e.g. Ollama).
No real-time co-editing: two users on the same run = last-writer-wins. Optimistic locking lands in v0.6.
No plugin API for external adapters: in v1 adapters are added as Java classes in the codebase, not loaded as plugins. A plugin API is on the backlog (ADR-0014).

Mantra: v1 doesn't need to do everything. It needs to set the right boundaries and extension points cleanly. That keeps the code readable six months from now.

License and identity stack

Beside the product backend there is a separate stack for identity and licensing, decoupled from the platform DB:

Keycloak as the OIDC provider (users, roles, device authorization grant).
License service as a Spring Boot microservice with its own PostgreSQL.
Lease JWT (RS256, 7-day validity) with embedded per-tier limits.
Offline verification in the client via the embedded public key, fail closed on lease expiry.
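
Offline verification of an RS256 lease boils down to an RSA signature check plus an expiry comparison. The sketch below uses only the JDK to show the mechanics. A production client would more likely rely on a JWT library and a real JSON parser, and the class name and the regex-based handling of the standard exp claim are simplifications made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.PublicKey;
import java.security.Signature;
import java.time.Instant;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical offline check for an RS256-signed lease JWT; any doubt means "invalid". */
final class LeaseVerifierSketch {
    private static final Pattern EXP = Pattern.compile("\"exp\"\\s*:\\s*(\\d+)");

    static boolean isValid(String jwt, PublicKey publicKey, Instant now) {
        try {
            String[] parts = jwt.split("\\.");
            if (parts.length != 3) {
                return false;
            }
            Base64.Decoder decoder = Base64.getUrlDecoder();
            String header = new String(decoder.decode(parts[0]), StandardCharsets.UTF_8);
            if (!header.contains("\"RS256\"")) {
                return false; // crude algorithm check, enough for a sketch
            }
            // RS256 = RSASSA-PKCS1-v1_5 with SHA-256 over "<header>.<payload>".
            Signature signature = Signature.getInstance("SHA256withRSA");
            signature.initVerify(publicKey);
            signature.update((parts[0] + "." + parts[1]).getBytes(StandardCharsets.US_ASCII));
            if (!signature.verify(decoder.decode(parts[2]))) {
                return false;
            }
            String payload = new String(decoder.decode(parts[1]), StandardCharsets.UTF_8);
            Matcher exp = EXP.matcher(payload);
            if (!exp.find()) {
                return false; // a lease without expiry is rejected
            }
            return now.getEpochSecond() < Long.parseLong(exp.group(1));
        } catch (GeneralSecurityException | IllegalArgumentException e) {
            return false; // fail closed on malformed input or crypto errors
        }
    }
}
```

Every error path returns false, which is the fail-closed behaviour described above: an expired, malformed or tampered lease never unlocks anything.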

The client starts in a registration-free DEMO mode (mock adapter only, hard-coded limits, no server contact). After registration via the device flow, a COMMUNITY license is created automatically; an admin can upgrade the tier to FULL in the admin UI. The lease determines which adapter may be used and how run creation and team size are limited.

Where to go next

Introduction: Usage-oriented walkthrough from login to first run.

Quick start: Shortest path to a first local run with the mock adapter.

Tutorial: Full workflow against Claude Code.

Whitepaper: 76 pages of conceptual background.

FAQ: Answers to typical questions.

Live demo: Click through a real instance.