Architecture Overview

AMP is built as a distributed system designed for reliability, scalability, and provider flexibility.

System Architecture

graph TB
    subgraph "Clients"
        Web[Web Dashboard]
        CLI[CLI]
        SDK[SDKs]
    end

    subgraph "API Layer"
        LB[Load Balancer]
        API1[API Server]
        API2[API Server]
    end

    subgraph "Processing Layer"
        NATS[NATS JetStream]
        W1[Worker]
        W2[Worker]
        W3[Worker]
    end

    subgraph "Data Layer"
        PG[(PostgreSQL)]
        Redis[(Redis)]
    end

    subgraph "External Services"
        Clerk[Clerk Auth]
        Claude[Claude]
        OpenAI[OpenAI]
        Metricool[Metricool]
    end

    Web --> LB
    CLI --> LB
    SDK --> LB

    LB --> API1
    LB --> API2

    API1 --> NATS
    API2 --> NATS
    API1 --> PG
    API2 --> PG
    API1 --> Redis
    API2 --> Redis

    NATS --> W1
    NATS --> W2
    NATS --> W3

    W1 --> PG
    W2 --> PG
    W3 --> PG

    W1 --> Claude
    W1 --> OpenAI
    W1 --> Metricool

    API1 --> Clerk
    API2 --> Clerk

Components

API Server

The API server handles all HTTP traffic. It is built with Go and the chi router.

Responsibilities:

  • REST API endpoints
  • Request authentication and authorization
  • Rate limiting
  • Job submission to queue
  • Real-time status queries

Key characteristics:

  • Stateless — scales horizontally
  • Request timeout: 30 seconds
  • Graceful shutdown with in-flight request completion
  • Structured logging with request correlation

Endpoints structure:

/health         → Health check (no auth)
/ready          → Readiness check (no auth)
/api/v1/...     → API endpoints (authenticated)
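
A minimal routing sketch with chi, showing the split between unauthenticated probes and the authenticated /api/v1 group (requireAuth and the placeholder handler are illustrative, not the actual server wiring):

package main

import (
    "log"
    "net/http"
    "time"

    "github.com/go-chi/chi/v5"
    "github.com/go-chi/chi/v5/middleware"
)

// requireAuth stands in for the real authentication middleware.
func requireAuth(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if r.Header.Get("Authorization") == "" {
            http.Error(w, "unauthorized", http.StatusUnauthorized)
            return
        }
        next.ServeHTTP(w, r)
    })
}

func main() {
    r := chi.NewRouter()
    r.Use(middleware.RequestID)                 // request correlation
    r.Use(middleware.Logger)                    // structured request logging
    r.Use(middleware.Timeout(30 * time.Second)) // matches the 30-second request timeout

    // Unauthenticated probes used by the load balancer.
    r.Get("/health", func(w http.ResponseWriter, _ *http.Request) { w.WriteHeader(http.StatusOK) })
    r.Get("/ready", func(w http.ResponseWriter, _ *http.Request) { w.WriteHeader(http.StatusOK) })

    // Everything under /api/v1 passes through authentication.
    r.Route("/api/v1", func(r chi.Router) {
        r.Use(requireAuth)
        r.Get("/missions", func(w http.ResponseWriter, _ *http.Request) {
            w.Write([]byte(`[]`)) // placeholder handler
        })
    })

    log.Fatal(http.ListenAndServe(":8080", r))
}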

Worker Service

Workers process jobs from the NATS queue. Each worker handles one job at a time through the complete pipeline.

Responsibilities:

  • Job dequeuing from NATS JetStream
  • Pipeline stage execution
  • Provider API calls (LLM, images, publishing)
  • Result persistence
  • Retry handling

Key characteristics:

  • Configurable concurrency (default: 4 workers per instance)
  • Manual acknowledgment for reliability
  • Automatic retry with exponential backoff
  • Graceful drain on shutdown
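
A hedged sketch of the consume loop, assuming the nats.go client; the subject and durable names follow the NATS JetStream configuration below, and process stands in for the pipeline:

package worker

import (
    "context"
    "time"

    "github.com/nats-io/nats.go"
)

// consume pulls jobs one at a time and only acknowledges after the full
// pipeline has run, so a crash before Ack leads to redelivery.
func consume(ctx context.Context, js nats.JetStreamContext, process func([]byte) error) error {
    sub, err := js.PullSubscribe("amp.jobs.*", "amp-workers")
    if err != nil {
        return err
    }
    for {
        select {
        case <-ctx.Done():
            return sub.Drain() // graceful drain on shutdown
        default:
        }
        msgs, err := sub.Fetch(1, nats.MaxWait(5*time.Second))
        if err != nil {
            continue // no pending jobs within the wait window; poll again
        }
        for _, msg := range msgs {
            if err := process(msg.Data); err != nil {
                msg.Nak() // negative ack: NATS redelivers, up to MaxDeliver
                continue
            }
            msg.Ack() // removes the message from the work queue
        }
    }
}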

PostgreSQL

Primary data store for all persistent state.

Stored data:

  • User accounts and authentication
  • Tenant configuration
  • Missions and their parameters
  • Generated content
  • Publishing history
  • Analytics data
  • Job state and history

Schema highlights:

  • UUID primary keys throughout
  • JSONB for flexible structured data (config, metadata)
  • Proper foreign key constraints
  • Indexes on common query patterns
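
As an illustration of those conventions, a missions row might map to a struct like the hypothetical one below (column names are assumptions, not the actual schema):

package model

import (
    "encoding/json"
    "time"

    "github.com/google/uuid"
)

// Mission is a hypothetical mapping of a missions row.
type Mission struct {
    ID        uuid.UUID       `db:"id"`         // UUID primary key
    TenantID  uuid.UUID       `db:"tenant_id"`  // foreign key to the owning tenant
    Config    json.RawMessage `db:"config"`     // JSONB: mission parameters
    Metadata  json.RawMessage `db:"metadata"`   // JSONB: flexible metadata
    CreatedAt time.Time       `db:"created_at"`
}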

Redis

Caching and ephemeral state.

Uses:

  • Session caching
  • Rate limit counters
  • API response caching
  • Real-time job status
  • Provider health status
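
Rate limit counters, for example, can be kept as per-tenant, per-window keys. A hedged sketch with the go-redis client (key format and window size are assumptions):

package ratelimit

import (
    "context"
    "fmt"
    "time"

    "github.com/redis/go-redis/v9"
)

// Allow increments the tenant's counter for the current one-minute window
// and reports whether the request is still within the limit.
func Allow(ctx context.Context, rdb *redis.Client, tenantID string, limit int64) (bool, error) {
    window := time.Now().Unix() / 60
    key := fmt.Sprintf("ratelimit:%s:%d", tenantID, window)

    count, err := rdb.Incr(ctx, key).Result()
    if err != nil {
        return false, err
    }
    if count == 1 {
        // First hit in this window: expire the counter with the window.
        rdb.Expire(ctx, key, time.Minute)
    }
    return count <= limit, nil
}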

NATS JetStream

Durable message queue for async job processing.

Configuration:

  • Stream: AMP_JOBS
  • Subjects: amp.jobs.*
  • Retention: WorkQueue (messages deleted after ack)
  • Max delivery attempts: 3
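
That configuration could be created once at startup, roughly as follows (a sketch with the nats.go client; the durable consumer name is an assumption):

package main

import (
    "log"

    "github.com/nats-io/nats.go"
)

func main() {
    nc, err := nats.Connect(nats.DefaultURL)
    if err != nil {
        log.Fatal(err)
    }
    defer nc.Close()

    js, err := nc.JetStream()
    if err != nil {
        log.Fatal(err)
    }

    // Work-queue retention: a message is removed once a consumer acks it.
    if _, err := js.AddStream(&nats.StreamConfig{
        Name:      "AMP_JOBS",
        Subjects:  []string{"amp.jobs.*"},
        Retention: nats.WorkQueuePolicy,
    }); err != nil {
        log.Fatal(err)
    }

    // The durable consumer caps redelivery at 3 attempts.
    if _, err := js.AddConsumer("AMP_JOBS", &nats.ConsumerConfig{
        Durable:    "amp-workers",
        AckPolicy:  nats.AckExplicitPolicy,
        MaxDeliver: 3,
    }); err != nil {
        log.Fatal(err)
    }
}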

Message flow:

sequenceDiagram
    participant API
    participant NATS
    participant Worker
    participant DB

    API->>NATS: Publish job (amp.jobs.create)
    NATS->>Worker: Deliver message
    Worker->>DB: Update job status (running)
    Worker->>Worker: Execute pipeline
    Worker->>DB: Save results
    Worker->>NATS: Acknowledge message

Provider Abstraction Layer (PAL)

PAL decouples AMP from specific service providers.

graph LR
    subgraph "AMP Core"
        Pipeline[Pipeline]
        PAL[Provider Abstraction Layer]
    end

    subgraph "LLM Providers"
        Claude[Claude]
        GPT[OpenAI GPT]
        Gemini[Google Gemini]
    end

    subgraph "Image Providers"
        DALLE[DALL-E]
        Stability[Stability AI]
        Replicate[Replicate]
    end

    subgraph "Publishing"
        Metricool[Metricool]
    end

    Pipeline --> PAL
    PAL --> Claude
    PAL --> GPT
    PAL --> Gemini
    PAL --> DALLE
    PAL --> Stability
    PAL --> Replicate
    PAL --> Metricool

Provider Interface

All providers implement a common interface:

type Provider interface {
    Name() string
    Type() ProviderType
    Capabilities() []Capability
    Health(ctx context.Context) error
}

type LLMProvider interface {
    Provider
    Complete(ctx context.Context, req CompletionRequest) (*CompletionResponse, error)
    Chat(ctx context.Context, req ChatRequest) (*ChatResponse, error)
}

type ImageProvider interface {
    Provider
    Generate(ctx context.Context, req ImageRequest) (*ImageResponse, error)
}
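
A concrete provider satisfies these interfaces and hides the vendor API behind them. The stub below is a hedged sketch using the types above; it is not the actual Claude adapter, and the request translation is omitted:

// claudeProvider is a hypothetical adapter; the real one is not shown here.
type claudeProvider struct {
    apiKey string
}

func (p *claudeProvider) Name() string               { return "claude" }
func (p *claudeProvider) Type() ProviderType         { return ProviderTypeLLM }
func (p *claudeProvider) Capabilities() []Capability { return []Capability{CapabilityVision} }

func (p *claudeProvider) Health(ctx context.Context) error {
    // A lightweight provider call (or a cached status) would go here.
    return nil
}

func (p *claudeProvider) Complete(ctx context.Context, req CompletionRequest) (*CompletionResponse, error) {
    // Translate req into the vendor's wire format, call the API,
    // and map the response back into CompletionResponse.
    return &CompletionResponse{}, nil
}

func (p *claudeProvider) Chat(ctx context.Context, req ChatRequest) (*ChatResponse, error) {
    return &ChatResponse{}, nil
}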

Provider Selection

PAL selects providers based on:

  1. Tenant preference — Configured default provider
  2. Capability match — Required features (vision, long context, etc.)
  3. Availability — Health check status
  4. Cost — Within budget constraints

// Selector chooses the best available provider
provider, err := selector.Select(ctx, SelectionCriteria{
    Type:         ProviderTypeLLM,
    Capabilities: []Capability{CapabilityVision},
    MaxCost:      0.05,  // per request
    Tenant:       tenant,
})

Request Flow

API Request

sequenceDiagram
    participant Client
    participant Middleware
    participant Handler
    participant Service
    participant DB

    Client->>Middleware: HTTP Request
    Middleware->>Middleware: Generate Request ID
    Middleware->>Middleware: Validate Auth
    Middleware->>Middleware: Extract Tenant
    Middleware->>Middleware: Check Rate Limit
    Middleware->>Handler: Authorized Request
    Handler->>Service: Business Logic
    Service->>DB: Data Operations
    DB-->>Service: Result
    Service-->>Handler: Response Data
    Handler-->>Client: HTTP Response
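
Each middleware step enriches the request context before the handler runs. A hedged sketch of the tenant-extraction step (the context key and header fallback are illustrative; in practice the tenant comes from the validated credentials):

package api

import (
    "context"
    "net/http"
)

// tenantKey is a hypothetical context key for the resolved tenant ID.
type tenantKey struct{}

func extractTenant(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // The real middleware would read the tenant from the validated API key
        // or JWT claims; the header lookup here is only a stand-in.
        tenantID := r.Header.Get("X-Tenant-ID")
        if tenantID == "" {
            http.Error(w, "tenant not resolved", http.StatusForbidden)
            return
        }
        ctx := context.WithValue(r.Context(), tenantKey{}, tenantID)
        next.ServeHTTP(w, r.WithContext(ctx))
    })
}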

Job Processing

sequenceDiagram
    participant API
    participant NATS
    participant Worker
    participant PAL
    participant Providers
    participant DB

    API->>DB: Create Job Record
    API->>NATS: Publish Job Message
    NATS->>Worker: Deliver Job

    loop Each Pipeline Stage
        Worker->>DB: Update Stage Status
        Worker->>PAL: Request Processing
        PAL->>Providers: API Call
        Providers-->>PAL: Response
        PAL-->>Worker: Processed Result
        Worker->>DB: Save Stage Output
    end

    Worker->>DB: Mark Job Complete
    Worker->>NATS: Acknowledge Message
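
Inside the worker, the loop amounts to running stages in order and persisting each output, so a redelivered job can resume from its last checkpoint. A hedged sketch (the Stage and JobStore interfaces are assumptions, not the actual contracts):

package worker

import "context"

// Stage is a hypothetical pipeline stage.
type Stage interface {
    Name() string
    Run(ctx context.Context, in []byte) ([]byte, error)
}

// JobStore is a hypothetical persistence interface for job state.
type JobStore interface {
    SetStageStatus(ctx context.Context, jobID, stage, status string) error
    SaveStageOutput(ctx context.Context, jobID, stage string, out []byte) error
    MarkComplete(ctx context.Context, jobID string) error
}

// runPipeline feeds each stage's output into the next and checkpoints it.
func runPipeline(ctx context.Context, jobID string, stages []Stage, store JobStore, input []byte) error {
    data := input
    for _, s := range stages {
        if err := store.SetStageStatus(ctx, jobID, s.Name(), "running"); err != nil {
            return err
        }
        out, err := s.Run(ctx, data)
        if err != nil {
            store.SetStageStatus(ctx, jobID, s.Name(), "failed")
            return err // unacked message is redelivered and retried
        }
        if err := store.SaveStageOutput(ctx, jobID, s.Name(), out); err != nil {
            return err
        }
        data = out
    }
    return store.MarkComplete(ctx, jobID)
}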

Deployment Architecture

Single Node (Development)

graph TB
    subgraph "Single Server"
        API[API Server]
        Worker[Worker]
        PG[(PostgreSQL)]
        Redis[(Redis)]
        NATS[NATS]
    end

    API --> PG
    API --> Redis
    API --> NATS
    Worker --> PG
    Worker --> NATS

Multi-Node (Production)

graph TB
    subgraph "Load Balancer"
        LB[nginx / ALB]
    end

    subgraph "API Tier"
        API1[API Server 1]
        API2[API Server 2]
        API3[API Server 3]
    end

    subgraph "Worker Tier"
        W1[Worker Pool 1]
        W2[Worker Pool 2]
    end

    subgraph "Data Tier"
        PG[(PostgreSQL Primary)]
        PG_R[(PostgreSQL Replica)]
        Redis[(Redis Cluster)]
        NATS[(NATS Cluster)]
    end

    LB --> API1
    LB --> API2
    LB --> API3

    API1 --> NATS
    API2 --> NATS
    API3 --> NATS

    NATS --> W1
    NATS --> W2

    API1 --> PG
    W1 --> PG
    PG --> PG_R

    API1 --> Redis

Security Architecture

Authentication Flow

sequenceDiagram
    participant Client
    participant API
    participant Clerk

    alt API Key
        Client->>API: Authorization: Bearer amp_live_xxx
        API->>API: Validate Key Hash
        API-->>Client: Authenticated
    else JWT Token
        Client->>API: Authorization: Bearer eyJ...
        API->>Clerk: Verify Token
        Clerk-->>API: Valid + Claims
        API-->>Client: Authenticated
    end
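
The branch can be decided from the credential's shape: opaque amp_ keys are checked against a stored hash locally, while JWTs are verified with Clerk. A hedged sketch (the verifier interfaces and the claim name are assumptions, not the Clerk SDK):

package auth

import (
    "context"
    "strings"
)

// APIKeyStore and JWTVerifier are hypothetical abstractions over the two paths.
type APIKeyStore interface {
    ValidateHash(ctx context.Context, rawKey string) (tenantID string, err error)
}

type JWTVerifier interface {
    Verify(ctx context.Context, token string) (claims map[string]any, err error)
}

// authenticate returns the tenant ID for a valid credential.
func authenticate(ctx context.Context, authz string, keys APIKeyStore, jwts JWTVerifier) (string, error) {
    token := strings.TrimPrefix(authz, "Bearer ")
    if strings.HasPrefix(token, "amp_") {
        // API key (e.g. amp_live_xxx): validated locally against its hash.
        return keys.ValidateHash(ctx, token)
    }
    // JWT: verified against Clerk; the claim name is illustrative.
    claims, err := jwts.Verify(ctx, token)
    if err != nil {
        return "", err
    }
    tenant, _ := claims["tenant_id"].(string)
    return tenant, nil
}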

Data Isolation

Tenant A                    Tenant B
┌─────────────────┐        ┌─────────────────┐
│ Missions        │        │ Missions        │
│ Content         │        │ Content         │
│ Brand Context   │        │ Brand Context   │
│ Analytics       │        │ Analytics       │
│ API Keys        │        │ API Keys        │
└─────────────────┘        └─────────────────┘
        │                          │
        └──────────┬───────────────┘
            PostgreSQL
         (Row-Level Security)
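
With row-level security, every query must carry the tenant identity. A hedged sketch using pgx that scopes a transaction to one tenant (the app.current_tenant setting name and the policy it feeds are assumptions):

package db

import (
    "context"

    "github.com/jackc/pgx/v5"
    "github.com/jackc/pgx/v5/pgxpool"
)

// withTenant runs fn inside a transaction whose queries are filtered by the
// RLS policies to a single tenant's rows.
func withTenant(ctx context.Context, pool *pgxpool.Pool, tenantID string, fn func(tx pgx.Tx) error) error {
    tx, err := pool.Begin(ctx)
    if err != nil {
        return err
    }
    defer tx.Rollback(ctx) // no-op once the transaction has committed

    // is_local = true keeps the setting transaction-scoped, so a pooled
    // connection never carries one tenant's scope into the next request.
    if _, err := tx.Exec(ctx, "SELECT set_config('app.current_tenant', $1, true)", tenantID); err != nil {
        return err
    }
    if err := fn(tx); err != nil {
        return err
    }
    return tx.Commit(ctx)
}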

Performance Characteristics

Component     Metric                    Target
API Server    Request latency (p99)     < 100ms
API Server    Throughput                1000 req/s per instance
Worker        Job throughput            10 jobs/min per worker
Database      Connection pool           25 per API instance
NATS          Message delivery          < 10ms
Redis         Cache hit ratio           > 90%

Failure Handling

API Server Failure

  • Load balancer health checks detect failure
  • Traffic routes to healthy instances
  • Stateless design means no session loss

Worker Failure

  • NATS redelivers unacknowledged messages
  • Job resumes from last checkpoint
  • Automatic retry with backoff

Database Failure

  • Connection pool handles transient failures
  • Read replicas for query failover
  • Point-in-time recovery for data loss

Provider Failure

  • PAL fallback to alternative provider
  • Cached responses for repeated requests
  • Circuit breaker prevents cascade
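
A minimal circuit breaker around provider calls might look like the sketch below (the thresholds and recovery behavior are illustrative; the actual PAL implementation is not shown):

package pal

import (
    "errors"
    "sync"
    "time"
)

var ErrCircuitOpen = errors.New("provider circuit open")

// breaker trips after maxFailures consecutive errors and rejects calls for
// cooldown, so a failing provider is skipped instead of stacking timeouts.
type breaker struct {
    mu          sync.Mutex
    failures    int
    maxFailures int
    cooldown    time.Duration
    openedAt    time.Time
}

func (b *breaker) Call(fn func() error) error {
    b.mu.Lock()
    if b.failures >= b.maxFailures && time.Since(b.openedAt) < b.cooldown {
        b.mu.Unlock()
        return ErrCircuitOpen // fail fast; PAL can fall back to another provider
    }
    b.mu.Unlock()

    err := fn()

    b.mu.Lock()
    defer b.mu.Unlock()
    if err != nil {
        b.failures++
        if b.failures >= b.maxFailures {
            b.openedAt = time.Now() // (re)open the circuit
        }
        return err
    }
    b.failures = 0 // a success closes the circuit again
    return nil
}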