Architecture · v0.8.1

Architecture overview

This page explains how the Agentic Software Factory is built, from the UI layer down to the adapters that talk to the coding CLIs. It targets architects and developers who want a feel for layers, modules, data flows and the key design decisions without diving straight into the source. If you'd rather have a step-by-step usage walkthrough, the introduction is the better starting point.

What is the architectural goal?

The platform is a control plane for AI-assisted software development. It bundles four jobs into a single web application: project capture, artifact generation, run orchestration and quality assessment. You don't drive the Claude Code, Codex, Gemini or Aider CLIs from your shell; instead, the platform invokes them as subprocesses, collects their output, persists it in the database and surfaces it live in the UI.

The architecture follows two very classic principles, deliberately not dressed up with buzzwords for v1:

Browser → Spring MVC + Thymeleaf → Application services → Domain model
                                                              ↓
                       PostgreSQL ←─ Repositories ←─ Adapters (Claude / Codex / Gemini / Aider / Mock / Git / Filesystem)
Mantra: The platform is small and readable. Once you understand the module boundaries, you understand the platform.

Layers and modules

Inside the Maven module app, all functional modules live under io.softwarefabrik.app.<module>. Each module has at most four sub-packages:

Per-module layer layout

  • domain/: entities, value objects, repository interfaces, business services. No Spring, JPA or web annotations.
  • application/: use-case services that orchestrate domain methods. This is where transaction boundaries (@Transactional) live.
  • web/: Spring MVC controllers, Thymeleaf bindings, UI DTOs, REST endpoints for SSE.
  • infrastructure/: JPA implementations of repositories, external adapters, filesystem access, scheduled jobs.

Dependency rules

  • domain depends on nothing.
  • application depends only on domain.
  • web depends on application, not directly on domain.
  • infrastructure implements domain ports and may use Spring magic.
  • No module reaches across into another module's domain package; access goes only through that module's application service.

ArchUnit tests under app/src/test/java/.../architecture/ pin those rules. A wrong dependency turns CI red: architecture is enforced by automated tests, not by good intentions.
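To make the dependency rules concrete, here is a minimal plain-Java sketch of the same checks. It is not the project's ArchUnit test suite; the class LayerRulesSketch, the Layer enum and the method isAllowed are hypothetical names. It also assumes that infrastructure may depend only on the domain ports of its own module, and it ignores the shared common module.

```java
/** Plain-Java illustration of the layer rules; the real enforcement uses ArchUnit. */
class LayerRulesSketch {

    enum Layer { DOMAIN, APPLICATION, WEB, INFRASTRUCTURE }

    /** Returns true if code in (fromModule, from) may depend on code in (toModule, to). */
    static boolean isAllowed(String fromModule, Layer from, String toModule, Layer to) {
        boolean sameModule = fromModule.equals(toModule);
        if (sameModule && from == to) {
            return true; // classes in the same package may use each other
        }
        if (from == Layer.DOMAIN) {
            return false; // domain depends on nothing
        }
        if (!sameModule) {
            return to == Layer.APPLICATION; // cross-module access only via the application service
        }
        switch (from) {
            case APPLICATION:
                return to == Layer.DOMAIN;
            case WEB:
                return to == Layer.APPLICATION;
            case INFRASTRUCTURE:
                return to == Layer.DOMAIN; // assumption: infrastructure only implements domain ports
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("run", Layer.WEB, "run", Layer.APPLICATION)); // true
        System.out.println(isAllowed("run", Layer.WEB, "run", Layer.DOMAIN));      // false
        System.out.println(isAllowed("run", Layer.APPLICATION, "git", Layer.DOMAIN)); // false
    }
}
```

An ArchUnit rule expresses the same idea declaratively against the compiled classes, which is why a violation shows up as a failing test rather than as a review comment.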

v0.8.1 modules at a glance:

agent, audit, common, conductor, execution, git, license, policy, projectdefinition, prompt, qualitygate, review, run, secrets, security, settings, team, validation, web, wizard

Modules in detail

Short profiles per module, handy when reading the code or hunting for a feature:

agent

Manages the agent roles (Architect, Developer, Reviewer, QA, Security, Documentation, Merge/Release) including preferredModel per role. Main classes: AgentDefinition, AgentRoleService. Routes: /agents.

audit

Append-only audit log for security-relevant events: login attempts, setting changes, run starts, approval decisions. Main class: AuditEvent. Table audit_event.

common

Shared utilities without business meaning: ID generator, clock abstraction, generic validators. Kept deliberately small.

conductor

New since v0.6.0: before every run, the conductor writes .claude/settings.local.json (plugin whitelist + skills path) and one .claude/agents/<role>.md per active team member into the workspace. Main classes: PluginCatalog, SkillsCatalog, ConductorWorkspaceWriter, ConductorRunPreparation.

execution

The heart of run execution: ExecutionAdapterRegistry, sandbox implementations LocalProcessSandbox + ContainerProcessSandbox with ExecutionSandboxFactory (selection via setting execution.sandbox.variant), adapters for Claude, Codex, Gemini, Aider and Mock. One infrastructure/<name> sub-package per adapter. Under claudecode/, since v0.7.0, additionally ClaudeStreamJsonParser for live token events and TokenEstimator for local pre-estimates.

git

Workspace Git operations: git init, auto-commits, diff and log output for the run detail page. Uses jgit.

license

License logic: DEMO / COMMUNITY / FULL tiers, lease-JWT verification against the Keycloak public key, limit enforcement. Routes: /license, /admin/license.

policy

Approval policies: which phases need a human gate, which run through. Stores ApprovalDecision records.

projectdefinition

The ProjectDefinition entity with all editor fields (vision, audience, tech, architecture, security, …). Routes: /projects, /projects/{id}/edit.

prompt

Generates the six Markdown artifacts (PROJECT.md, INSTRUCTIONS.md, AGENTS.md, WORKFLOW.md, DEFINITION_OF_DONE.md, README.md) from project fields. Templates live under resources/prompts/.

qualitygate

Aggregates reviewer findings into the quality-gate verdict (PASSED / WARNING / FAILED / SKIPPED / ERROR). Special rules for SECURITY/HIGH and ARCHITECTURE/CRITICAL. Routes: /runs/{id}/quality-gate.

review

Reviewer implementations: aider-review, claude-review, security, architecture-reviewer, hallucination-review. CLI invocations in read-only mode or static heuristics.

run

The run model with phases, status, logs, token usage and workspace path. Main classes: Run, RunPhase, RunOrchestrationService, WorkspaceService. Routes: /runs, /runs/{id}.

secrets

Encrypted storage of API keys (Anthropic, OpenAI, Gemini) in Postgres. AES-GCM with master key from SOFTWAREFABRIK_SECRETS_MASTER_KEY. Routes: /integrations.
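As an illustration of AES-GCM secret storage, the following sketch uses only the JDK's javax.crypto API. It is not the platform's implementation. The class name SecretCipherSketch, the storage layout (Base64 of IV followed by ciphertext and tag) and the 32-byte key are assumptions; the source does not say how the value of SOFTWAREFABRIK_SECRETS_MASTER_KEY is encoded or how the column is laid out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical AES-GCM helper; layout and key handling are assumptions. */
class SecretCipherSketch {
    private static final int IV_BYTES = 12;  // commonly recommended nonce size for GCM
    private static final int TAG_BITS = 128; // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Encrypts the plaintext and returns Base64(iv || ciphertext+tag). */
    static String encrypt(byte[] masterKey, String plaintext) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv); // fresh nonce per encryption, never reused with the same key
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(masterKey, "AES"),
                    new GCMParameterSpec(TAG_BITS, iv));
            byte[] ciphertext = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            byte[] out = ByteBuffer.allocate(iv.length + ciphertext.length)
                    .put(iv).put(ciphertext).array();
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    /** Reverses encrypt(); fails if the key is wrong or the stored value was modified. */
    static String decrypt(byte[] masterKey, String stored) {
        try {
            byte[] in = Base64.getDecoder().decode(stored);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(masterKey, "AES"),
                    new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
            byte[] plaintext = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
            return new String(plaintext, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed (wrong key or tampered data)", e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[32]; // random demo key, not a real master key
        RANDOM.nextBytes(key);
        String stored = encrypt(key, "sk-demo-not-a-real-key");
        System.out.println(decrypt(key, stored));
    }
}
```

Because GCM authenticates the ciphertext, a modified database value fails on decrypt instead of yielding a silently corrupted API key.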

security

Spring Security configuration: login, bootstrap admin, role model (USER / ADMIN), CSRF, BCrypt passwords. Main class: SecurityConfig.

settings

Global platform defaults at /einstellungen with a 5-minute TTL cache and override order PROJECT > USER > GLOBAL > YAML. Table app_setting (V9). Since v0.7.0 also the key execution.sandbox.variant for the container sandbox.

team

Per-project team composition from agent roles. Serialised into AGENTS.md.

validation

Per-run build validation: mvn verify, npm run build or a configured build command. Writes BuildResult.

web

UI cross-cutting concerns: global layouts, kopfbereich.html, theme toggle, dashboard controller. No business logic.

wizard

The four-step assistant at /wizard. Main classes: WizardController, WizardService, WizardDraft, TemplateRegistry, ToggleRegistry, VersionLookupClient, WizardCostEstimator. Tables wizard_draft (V12), version_cache (V13). Six templates registered (Spring Boot, Static Frontend, .NET, Python, Node, Existing-Repo-Import); v0.8.x adds progress stepper, objective preview and local cost estimate in step 4.

Data flows: from wizard to run

A typical project lifecycle has five stations. Each one writes persistently to Postgres, each is visible in the UI:

┌──────────┐   ┌────────────┐   ┌────────────┐   ┌───────┐   ┌──────────────┐
│ Wizard   │ → │ Project    │ → │ Artifacts  │ → │ Run   │ → │ Quality gate │
│ /wizard  │   │ (DRAFT)    │   │ (Markdown) │   │       │   │ (Verdict)    │
└──────────┘   └────────────┘   └────────────┘   └───────┘   └──────────────┘
     │              │                │              │                │
     ▼              ▼                ▼              ▼                ▼
 wizard_draft   project_def   prompt_artifact   run, run_phase    quality_gate_run
                                                run_log,
                                                token_usage
  1. Wizard collects answers in wizard_draft (survives browser reload and server restart). On completion a ProjectDefinition is created and the draft flips to completed.
  2. ProjectDefinition holds all editor fields. Status starts as DRAFT and moves to READY after artifact generation.
  3. Artifacts are produced by PromptAssemblyService from project fields. They're editable; the agent always reads the most recently saved version.
  4. Run orchestrates the phases. RunOrchestrationService is asynchronous (Spring @Async); the UI doesn't poll, it listens via SSE.
  5. Quality gate fires at the end or on-demand, runs reviewers in parallel, aggregates findings into a verdict.

Persistence: Postgres + Flyway

The platform uses PostgreSQL as the single data source. Schema changes happen exclusively via Flyway migrations under app/src/main/resources/db/migration/. Each migration is numbered (V<n>__<name>.sql) and immutable once it's been in production; new changes go in as additional migrations, never as edits to old ones.

Migration  Content
V1         Initial schema: project_definition, agent_definition, team, run, run_phase.
V2         Logs and token usage: run_log, token_usage_event.
V3         Approval policies: approval_policy, approval_decision.
V4         Encrypted secrets: integration_secret.
V5         Audit log: audit_event.
V6         Quality-gate tables: quality_gate_run, review_finding.
V7         License tables: license, license_lease.
V8         Build results: build_result.
V9         v0.4.0: app_setting with scope column (GLOBAL / USER / PROJECT) and audit trail.
V10        Run tags and quick-start markers.
V11        v0.4.0: agent_preferred_model, a column for role-specific model selection.
V12        v0.4.0: wizard_draft with JSON state column, step counter, template ID, cleanup TTL.
V13        v0.4.0: version_cache with cache key, JSON value, refreshed_at, last_error. Java-computed staleness, no DB computed column (so the H2 test profile keeps working).
V14        v0.6.0: extended agent_definition for the conductor (mission, active skills).
V15        v0.7.0: run_template with adapter, team, objective, optional project scope, audit fields. Source for "start run from template".

Test strategy: Tests run against H2 in Postgres-compatibility mode. To make that work, migrations avoid Postgres-specific features like GENERATED ALWAYS AS. Where unavoidable, a db.migration.h2/ override file fills the gap.
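The naming convention V<n>__<name>.sql can be checked mechanically. The sketch below is a hypothetical helper, not part of Flyway or of the platform: it parses the version out of a file name and computes the number the next migration would get. The file names in main are invented for illustration, since the source lists only version numbers, not file names.

```java
import java.util.List;
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for the V<n>__<name>.sql convention described above. */
class MigrationNameSketch {
    // "V", a numeric version, two underscores, a name, ".sql"
    private static final Pattern NAME = Pattern.compile("V(\\d{1,6})__([A-Za-z0-9_]+)\\.sql");

    /** Returns the version number, or empty if the file name does not follow the convention. */
    static OptionalInt versionOf(String fileName) {
        Matcher m = NAME.matcher(fileName);
        return m.matches() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }

    /** New changes are added as additional migrations, so the next version is max + 1. */
    static int nextVersion(List<String> fileNames) {
        return fileNames.stream()
                .map(MigrationNameSketch::versionOf)
                .filter(OptionalInt::isPresent)
                .mapToInt(OptionalInt::getAsInt)
                .max()
                .orElse(0) + 1;
    }

    public static void main(String[] args) {
        // Invented file names, for illustration only
        List<String> files = List.of("V14__conductor_fields.sql", "V15__run_template.sql", "notes.txt");
        System.out.println(versionOf("V15__run_template.sql")); // OptionalInt[15]
        System.out.println(nextVersion(files));                  // 16
    }
}
```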

Adapter registry

Adapters are Spring beans that implement the ExecutionAdapter interface. At startup the ExecutionAdapterRegistry collects all available beans into a map <name → adapter>. The name comes from a constant on the adapter (e.g. "claudecode", "codex", "gemini", "aider", "mock").

When a run starts, the adapter is resolved in this order:

  1. Run-specific override β€” if the user picked an adapter when creating the run.
  2. Project default β€” if the project has a preferredAdapter.
  3. User default β€” if the logged-in user set a USER-scope setting.
  4. Global default β€” from app_setting with scope GLOBAL.
  5. YAML default β€” from application.yml as a last fallback.

This override order (PROJECT > USER > GLOBAL > YAML) is implemented by SettingService and applies everywhere on the platform, not just to adapters.

// simplified
ExecutionAdapter adapter = registry.byName(
    settingService.resolve("execution.adapter.default", scope)
);
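A slightly fuller sketch of the same idea, without Spring: a registry that maps names to adapters, and a resolver in which the first configured candidate wins. The one-method ExecutionAdapter interface here is a stand-in for the real one, and registry and resolveName are hypothetical helper names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Stream;

/** Hypothetical, Spring-free sketch of registry lookup plus resolution order. */
class AdapterResolutionSketch {

    /** Minimal stand-in for the real ExecutionAdapter interface. */
    interface ExecutionAdapter {
        String name();
    }

    /** Collects adapters into a name -> adapter map and rejects duplicate names. */
    static Map<String, ExecutionAdapter> registry(List<ExecutionAdapter> adapters) {
        Map<String, ExecutionAdapter> byName = new LinkedHashMap<>();
        for (ExecutionAdapter adapter : adapters) {
            if (byName.putIfAbsent(adapter.name(), adapter) != null) {
                throw new IllegalStateException("duplicate adapter name: " + adapter.name());
            }
        }
        return byName;
    }

    /** Candidates are passed from highest to lowest priority; the first present one wins. */
    @SafeVarargs
    static String resolveName(Optional<String>... candidates) {
        return Stream.of(candidates)
                .flatMap(Optional::stream)
                .findFirst()
                .orElseThrow(() -> new IllegalStateException("no adapter configured"));
    }

    public static void main(String[] args) {
        ExecutionAdapter claude = () -> "claudecode";
        ExecutionAdapter mock = () -> "mock";
        Map<String, ExecutionAdapter> registry = registry(List.of(claude, mock));

        String name = resolveName(
                Optional.empty(),           // run-specific override: none
                Optional.empty(),           // project default: none
                Optional.of("mock"),        // user default
                Optional.of("claudecode"),  // global default
                Optional.of("claudecode")); // YAML fallback
        System.out.println(registry.get(name).name()); // mock
    }
}
```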
Mock adapter: always registered, works without an API key. It writes deterministic pseudo tokens into the workspace and is ideal for learning the mechanics or reproducing bugs locally, at zero cost.

Version cache and @Scheduled jobs

So the wizard can always pre-fill current stable versions, the platform keeps a cache of version lookups. The cache refreshes daily at 03:00 via a @Scheduled job; a second job at 04:00 deletes expired wizard_draft rows (TTL 30 days).

Version lookups follow the strategy pattern: a VersionLookupClient interface with three implementations.

The cache key is a functional constant, not a technical path, e.g. spring-boot.3.x, archunit.latest, owasp.dependency-check.latest, trivy.cli.latest, playwright.npm.latest. That lets us swap the source without breaking consumers.

Staleness is computed in Java (threshold: 25 hours, a small buffer over the 03:00 job): now - refreshed_at > 25h. We deliberately do not use a Postgres GENERATED column, because the H2 test profile would trip over it.
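The staleness rule is a one-line comparison with java.time. A minimal sketch follows; the class and method names are hypothetical, and only the 25-hour threshold and the formula come from the text above.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper showing the rule now - refreshed_at > 25h. */
class StalenessSketch {
    static final Duration THRESHOLD = Duration.ofHours(25);

    /** An entry is stale once strictly more than 25 hours have passed since its last refresh. */
    static boolean isStale(Instant refreshedAt, Instant now) {
        return Duration.between(refreshedAt, now).compareTo(THRESHOLD) > 0;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2025-01-02T04:30:00Z");
        System.out.println(isStale(Instant.parse("2025-01-01T03:00:00Z"), now)); // true (25.5 h old)
        System.out.println(isStale(Instant.parse("2025-01-02T03:00:00Z"), now)); // false (1.5 h old)
    }
}
```

Passing now as a parameter instead of calling Instant.now() inside the method keeps the rule testable without waiting or mocking the system clock.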

Admins can inspect the cache at /einstellungen/wizard/versions. Each row shows cache key, returned value, refreshed_at, last_error and a Stale badge; a button "Refresh now" triggers a synchronous lookup for that key.

Sandbox model

Every run gets its own workspace: a local directory under SOFTWAREFABRIK_WORKSPACES_ROOT/<project-slug>/<run-id>. The adapter (e.g. claudecode) is launched in that directory as its own process (LocalProcessSandbox). Each run thus has its own Git history, its own build output, its own node_modules or target folder, so runs cannot accidentally clobber each other.
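The workspace layout can be sketched with java.nio.file.Path. The containment check is an assumption added for illustration (the source does not describe how slugs are validated), and workspaceFor is a hypothetical name.

```java
import java.nio.file.Path;

/** Hypothetical sketch of the <root>/<project-slug>/<run-id> layout. */
class WorkspacePathSketch {

    /** Builds the workspace path and rejects inputs that would leave the expected layout. */
    static Path workspaceFor(Path root, String projectSlug, String runId) {
        Path base = root.toAbsolutePath().normalize();
        Path workspace = base.resolve(projectSlug).resolve(runId).normalize();
        // Assumption: exactly two extra path segments below the root, no "..", no nested slashes
        boolean insideRoot = workspace.startsWith(base)
                && workspace.getNameCount() == base.getNameCount() + 2;
        if (!insideRoot) {
            throw new IllegalArgumentException("invalid workspace location: " + workspace);
        }
        return workspace;
    }

    public static void main(String[] args) {
        Path root = Path.of("workspaces"); // stand-in for SOFTWAREFABRIK_WORKSPACES_ROOT
        System.out.println(workspaceFor(root, "demo-shop", "42"));
    }
}
```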

LocalProcessSandbox sets environment variables cleanly per process (no System.setenv at platform level), terminates processes hard on cancel/timeout (destroyForcibly()) and writes stdout and stderr line-by-line into the run_log table while pushing the same lines into the SSE stream.
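The behaviour described for LocalProcessSandbox maps closely onto java.lang.ProcessBuilder. The following is a simplified sketch under stated assumptions, not the actual class: stderr is merged into stdout, output lines go to a generic Consumer instead of run_log and SSE, and the TIMED_OUT constant is invented.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch of a per-run process launch with timeout and line-wise output. */
class ProcessSandboxSketch {

    /** Value returned when the process had to be killed after the timeout (invented constant). */
    static final int TIMED_OUT = -1;

    static int run(List<String> command, Map<String, String> env, Path workDir,
                   Duration timeout, Consumer<String> lineSink) {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.directory(workDir.toFile());
        builder.environment().putAll(env);  // affects only the child process
        builder.redirectErrorStream(true);  // assumption: merge stderr into stdout
        try {
            Process process = builder.start();
            // Read output on a separate thread so the timeout still fires if the child hangs
            Thread pump = new Thread(() -> pump(process, lineSink));
            pump.start();
            boolean finished = process.waitFor(timeout.toMillis(), TimeUnit.MILLISECONDS);
            if (!finished) {
                process.destroyForcibly().waitFor(); // hard stop on timeout
            }
            pump.join();
            return finished ? process.exitValue() : TIMED_OUT;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for process", e);
        }
    }

    private static void pump(Process process, Consumer<String> lineSink) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lineSink.accept(line); // the platform would persist and stream the line here
            }
        } catch (IOException e) {
            lineSink.accept("[sandbox] output stream closed: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        // Uses the running JVM's own java binary as a harmless demo command
        String java = Path.of(System.getProperty("java.home"), "bin", "java").toString();
        int exit = run(List.of(java, "-version"), Map.of("SKETCH_RUN_ID", "demo"),
                Path.of("."), Duration.ofSeconds(30), System.out::println);
        System.out.println("exit code: " + exit);
    }
}
```

Setting variables through builder.environment() changes only the child's environment map, which is what keeps API keys for one run out of the platform process and out of other runs.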

Container variant: Since v0.7.0 a ContainerProcessSandbox ships alongside (ADR-0011 Accepted, variant B). It starts every agent in an ephemeral Docker or Podman container with --cpus 2 --memory 4g --pids-limit 512 --read-only, a bind mount on the workspace and --network=none by default. It is enabled via the setting execution.sandbox.variant=container; if Docker is missing from the PATH, the ExecutionSandboxFactory falls back to the local variant with a log warning. Single-tenant hosts notice nothing; Enterprise tier can flip without code changes.
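How such a container invocation might be assembled is sketched below. The resource flags are the ones listed above. The --rm flag, the /workspace mount target, the -w working directory, the image name and the variant value "local" are assumptions made for illustration, as are all class and method names.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the container command line and the fallback decision. */
class ContainerCommandSketch {

    /** Builds the argument list for an ephemeral, resource-limited container run. */
    static List<String> command(String runtime, Path workspace, String image, List<String> agentCommand) {
        List<String> cmd = new ArrayList<>(List.of(
                runtime, "run", "--rm",                              // --rm: assumption for "ephemeral"
                "--cpus", "2", "--memory", "4g", "--pids-limit", "512",
                "--read-only", "--network=none",
                "-v", workspace.toAbsolutePath() + ":/workspace",     // bind mount; target path assumed
                "-w", "/workspace",
                image));
        cmd.addAll(agentCommand);
        return cmd;
    }

    /** Falls back to the local variant when the container runtime is not available. */
    static String chooseVariant(String configured, boolean runtimeAvailable) {
        if ("container".equals(configured) && !runtimeAvailable) {
            System.err.println("WARN: container runtime not found, falling back to local sandbox");
            return "local"; // assumed name of the local variant
        }
        return configured;
    }

    public static void main(String[] args) {
        List<String> cmd = command("docker", Path.of("workspaces", "demo-shop", "42"),
                "example/agent-image:latest", List.of("claude", "--version")); // invented image and command
        System.out.println(String.join(" ", cmd));
        System.out.println(chooseVariant("container", false)); // local
    }
}
```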

Security: bootstrap admin, audit, settings, approvals

Security in this architecture is not one module, it's a cross-cutting discipline. The points that matter:

Settings architecture

The settings module is the platform's control panel. It combines three design choices that work together:

  1. Scope hierarchy: every setting belongs to a scope (GLOBAL, USER, PROJECT). SettingService.resolve(key, context) searches narrow-to-wide: PROJECT first, then USER, then GLOBAL, then YAML.
  2. 5-minute TTL cache: SettingService caches resolved values, so a resolve is essentially free 99 % of the time. UI changes don't hard-invalidate the cache; the new value is live at most 5 minutes later. No app restart required.
  3. Audit trail: every write to app_setting creates an audit_event entry with subject, key, old value, new value. Who changed what and when is always traceable.
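The TTL behaviour from point 2 can be sketched in a few lines. This is an illustrative stand-in, not SettingService itself: the loader function represents the scope lookup (PROJECT, USER, GLOBAL, YAML), the clock is injected so the expiry is testable, and all names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical TTL cache in front of a setting lookup. */
class SettingCacheSketch {

    private record Entry(Optional<String> value, Instant loadedAt) {}

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final Function<String, Optional<String>> loader; // stands in for the scope lookup
    private final Duration ttl;
    private final Supplier<Instant> clock;

    SettingCacheSketch(Function<String, Optional<String>> loader, Duration ttl, Supplier<Instant> clock) {
        this.loader = loader;
        this.ttl = ttl;
        this.clock = clock;
    }

    /** Returns the cached value while it is younger than the TTL, otherwise reloads it. */
    Optional<String> resolve(String key) {
        Instant now = clock.get();
        Entry entry = cache.get(key);
        boolean expired = entry == null || !now.isBefore(entry.loadedAt().plus(ttl));
        if (expired) {
            entry = new Entry(loader.apply(key), now);
            cache.put(key, entry);
        }
        return entry.value();
    }

    public static void main(String[] args) {
        Map<String, String> table = new ConcurrentHashMap<>(Map.of("execution.adapter.default", "mock"));
        Instant[] now = { Instant.parse("2025-01-01T00:00:00Z") };
        SettingCacheSketch settings = new SettingCacheSketch(
                key -> Optional.ofNullable(table.get(key)), Duration.ofMinutes(5), () -> now[0]);

        System.out.println(settings.resolve("execution.adapter.default")); // Optional[mock]
        table.put("execution.adapter.default", "claudecode");              // change via the UI
        now[0] = now[0].plus(Duration.ofMinutes(4));
        System.out.println(settings.resolve("execution.adapter.default")); // Optional[mock], still cached
        now[0] = now[0].plus(Duration.ofMinutes(1));
        System.out.println(settings.resolve("execution.adapter.default")); // Optional[claudecode]
    }
}
```

The trade-off is the one the text names: writes never touch the cache, so a change can take up to one TTL to become visible, in exchange for not needing any invalidation logic.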

Concrete settings managed today:

Key                         Meaning                                  Typical value
workspace.root              Root directory for all run workspaces    /var/softwarefabrik/workspaces
git.user.name               Git author for auto-commits              Software Factory Bot
git.user.email              Git email                                bot@softwarefabrik.local
execution.adapter.default   Default execution adapter                claudecode
execution.model.claudecode  Default model for Claude Code            claude-sonnet-4-6
execution.model.codex       Default model for Codex                  gpt-5
budget.tokens.daily         Daily token cap                          2000000
budget.tokens.weekly        Weekly token cap                         10000000
budget.threshold.soft       Soft threshold in percent (warns only)   80
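With the typical values above, the soft threshold for the daily budget lands at 80 % of 2,000,000, i.e. 1,600,000 tokens. A small arithmetic sketch follows; the State enum and evaluate are hypothetical, and what the platform actually does once the cap itself is reached is an assumption, since the source only says the soft threshold warns.

```java
/** Hypothetical budget check using the typical values from the settings table. */
class TokenBudgetSketch {

    enum State { OK, SOFT_WARNING, CAP_REACHED }

    /** Compares usage against the cap and the soft threshold (in percent of the cap). */
    static State evaluate(long usedTokens, long cap, int softThresholdPercent) {
        if (usedTokens >= cap) {
            return State.CAP_REACHED; // assumption: the cap is a hard limit
        }
        // Integer form of used / cap >= percent / 100, avoiding floating point
        if (usedTokens * 100 >= cap * softThresholdPercent) {
            return State.SOFT_WARNING;
        }
        return State.OK;
    }

    public static void main(String[] args) {
        long dailyCap = 2_000_000L; // budget.tokens.daily
        int softPercent = 80;       // budget.threshold.soft
        System.out.println(evaluate(1_599_999L, dailyCap, softPercent)); // OK
        System.out.println(evaluate(1_600_000L, dailyCap, softPercent)); // SOFT_WARNING
        System.out.println(evaluate(2_000_000L, dailyCap, softPercent)); // CAP_REACHED
    }
}
```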

Live streaming and SSE

The run detail page streams three things live from the backend to the browser: logs, phase updates and the token counter. Instead of polling, the platform uses Server-Sent Events (SSE): an open HTTP stream that pushes one-way messages from the backend.

SSE endpoints sit at /runs/{id}/stream/logs, /runs/{id}/stream/phases, /runs/{id}/stream/tokens. If a reverse proxy blocks HTTP streaming (some corporate setups do), the UI falls back to a 2-second polling loop with the same payload.
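On the wire, an SSE message is plain text: an optional event: line, one data: line per line of payload, and a blank line that ends the message. The sketch below builds such a frame by hand to show the format. In a Spring MVC application this is normally handled by the framework, and the event name "log" is an invented example, not necessarily what the platform sends.

```java
/** Hypothetical helper that renders one Server-Sent Events frame as text. */
class SseFrameSketch {

    /** Formats a named event; multi-line payloads become several data: lines. */
    static String frame(String event, String data) {
        if (event.contains("\n") || event.contains("\r")) {
            throw new IllegalArgumentException("event name must be a single line");
        }
        StringBuilder sb = new StringBuilder();
        sb.append("event: ").append(event).append('\n');
        for (String line : data.split("\r\n|\r|\n", -1)) {
            sb.append("data: ").append(line).append('\n');
        }
        sb.append('\n'); // blank line terminates the event
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(frame("log", "[INFO] phase started"));
        System.out.print(frame("log", "line one\nline two"));
    }
}
```

On the browser side, an EventSource pointed at one of the stream endpoints receives each frame as one message, with the data: lines joined by newlines.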

Test strategy and coverage gate

Tests are not optional; they are part of the architecture. Three pillars:

Coverage gate: JaCoCo 80/80 (lines + branches). A service merged without tests doesn't pass mvn verify. That's intentional: the platform is no longer a prototype.

CI pipeline (.github/workflows/ci.yml): build + test + ArchUnit + JaCoCo + OWASP Dependency-Check + Trivy file scan. OWASP runs non-blocking without an NVD key, Trivy only in its dedicated job.

What is deliberately NOT in the architecture

What the platform omits is as important as what it does. These omissions keep v1 readable and extensible:

Mantra: v1 doesn't need to do everything. It needs to set the right boundaries and extension points cleanly. That keeps the code readable six months from now.

License and identity stack

Besides the product backend there is a separate stack for identity and licensing, decoupled from the platform DB.

The client starts in a registration-free DEMO mode (mock adapter only, hard-coded limits, no server contact). After registration via the device flow, a COMMUNITY license is created automatically; an admin can upgrade the tier to FULL in the admin UI. The lease determines which adapter may be used and how run creation and team size are limited.

Where to go next

  • Introduction: Usage-oriented walkthrough from login to first run.
  • Quick start: Shortest path to a first local run with the mock adapter.
  • Tutorial: Full workflow against Claude Code.
  • Whitepaper: 76 pages of conceptual background.
  • FAQ: Answers to typical questions.
  • Live demo: Click through a real instance.