Architecture Overview

AMP is built as a distributed system designed for reliability, scalability, and provider flexibility.

System Architecture

graph TB
    subgraph "Clients"
        Web[Web Dashboard]
        CLI[CLI]
        SDK[SDKs]
    end

    subgraph "API Layer"
        LB[Load Balancer]
        API1[API Server]
        API2[API Server]
    end

    subgraph "Processing Layer"
        NATS[NATS JetStream]
        W1[Worker]
        W2[Worker]
        W3[Worker]
    end

    subgraph "Data Layer"
        PG[(PostgreSQL)]
        Redis[(Redis)]
    end

    subgraph "External Services"
        Clerk[Clerk Auth]
        Claude[Claude]
        OpenAI[OpenAI]
        Metricool[Metricool]
    end

    Web --> LB
    CLI --> LB
    SDK --> LB

    LB --> API1
    LB --> API2

    API1 --> NATS
    API2 --> NATS
    API1 --> PG
    API2 --> PG
    API1 --> Redis
    API2 --> Redis

    NATS --> W1
    NATS --> W2
    NATS --> W3

    W1 --> PG
    W2 --> PG
    W3 --> PG

    W1 --> Claude
    W1 --> OpenAI
    W1 --> Metricool

    API1 --> Clerk
    API2 --> Clerk

Components

API Server

The API server handles all HTTP traffic. It is built with Go and the chi router.

Responsibilities:

  • REST API endpoints
  • Request authentication and authorization
  • Rate limiting
  • Job submission to queue
  • Real-time status queries

Key characteristics:

  • Stateless — scales horizontally
  • Request timeout: 30 seconds
  • Graceful shutdown with in-flight request completion
  • Structured logging with request correlation

Endpoints structure:

/health         → Health check (no auth)
/ready          → Readiness check (no auth)
/api/v1/...     → API endpoints (authenticated)
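
A minimal routing sketch with chi, showing the split between unauthenticated probes and the authenticated /api/v1 group (requireAuth and the placeholder handler are illustrative, not the actual server wiring):

package main

import (
    "log"
    "net/http"
    "time"

    "github.com/go-chi/chi/v5"
    "github.com/go-chi/chi/v5/middleware"
)

// requireAuth stands in for the real authentication middleware.
func requireAuth(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if r.Header.Get("Authorization") == "" {
            http.Error(w, "unauthorized", http.StatusUnauthorized)
            return
        }
        next.ServeHTTP(w, r)
    })
}

func main() {
    r := chi.NewRouter()
    r.Use(middleware.RequestID)                 // request correlation
    r.Use(middleware.Logger)                    // structured request logging
    r.Use(middleware.Timeout(30 * time.Second)) // matches the 30-second request timeout

    // Unauthenticated probes used by the load balancer.
    r.Get("/health", func(w http.ResponseWriter, _ *http.Request) { w.WriteHeader(http.StatusOK) })
    r.Get("/ready", func(w http.ResponseWriter, _ *http.Request) { w.WriteHeader(http.StatusOK) })

    // Everything under /api/v1 passes through authentication.
    r.Route("/api/v1", func(r chi.Router) {
        r.Use(requireAuth)
        r.Get("/missions", func(w http.ResponseWriter, _ *http.Request) {
            w.Write([]byte(`[]`)) // placeholder handler
        })
    })

    log.Fatal(http.ListenAndServe(":8080", r))
}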

Worker Service

Workers process jobs from the NATS queue. Each worker handles one job at a time through the complete pipeline.

Responsibilities:

  • Job dequeuing from NATS JetStream
  • Pipeline stage execution
  • Provider API calls (LLM, images, publishing)
  • Result persistence
  • Retry handling

Key characteristics:

  • Configurable concurrency (default: 4 workers per instance)
  • Manual acknowledgment for reliability
  • Automatic retry with exponential backoff
  • Graceful drain on shutdown
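
A hedged sketch of the consume loop, assuming the nats.go client; the subject and durable names follow the NATS JetStream configuration below, and process stands in for the pipeline:

package worker

import (
    "context"
    "time"

    "github.com/nats-io/nats.go"
)

// consume pulls jobs one at a time and only acknowledges after the full
// pipeline has run, so a crash before Ack leads to redelivery.
func consume(ctx context.Context, js nats.JetStreamContext, process func([]byte) error) error {
    sub, err := js.PullSubscribe("amp.jobs.*", "amp-workers")
    if err != nil {
        return err
    }
    for {
        select {
        case <-ctx.Done():
            return sub.Drain() // graceful drain on shutdown
        default:
        }
        msgs, err := sub.Fetch(1, nats.MaxWait(5*time.Second))
        if err != nil {
            continue // no pending jobs within the wait window; poll again
        }
        for _, msg := range msgs {
            if err := process(msg.Data); err != nil {
                msg.Nak() // negative ack: NATS redelivers, up to MaxDeliver
                continue
            }
            msg.Ack() // removes the message from the work queue
        }
    }
}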

PostgreSQL

Primary data store for all persistent state.

Stored data:

  • User accounts and authentication
  • Tenant configuration
  • Missions and their parameters
  • Generated content
  • Publishing history
  • Analytics data
  • Job state and history

Schema highlights:

  • UUID primary keys throughout
  • JSONB for flexible structured data (config, metadata)
  • Proper foreign key constraints
  • Indexes on common query patterns
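
As an illustration of those conventions, a missions row might map to a struct like the hypothetical one below (column names are assumptions, not the actual schema):

package model

import (
    "encoding/json"
    "time"

    "github.com/google/uuid"
)

// Mission is a hypothetical mapping of a missions row.
type Mission struct {
    ID        uuid.UUID       `db:"id"`         // UUID primary key
    TenantID  uuid.UUID       `db:"tenant_id"`  // foreign key to the owning tenant
    Config    json.RawMessage `db:"config"`     // JSONB: mission parameters
    Metadata  json.RawMessage `db:"metadata"`   // JSONB: flexible metadata
    CreatedAt time.Time       `db:"created_at"`
}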

Redis

Caching and ephemeral state.

Uses:

  • Session caching
  • Rate limit counters
  • API response caching
  • Real-time job status
  • Provider health status
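
Rate limit counters, for example, can be kept as per-tenant, per-window keys. A hedged sketch with the go-redis client (key format and window size are assumptions):

package ratelimit

import (
    "context"
    "fmt"
    "time"

    "github.com/redis/go-redis/v9"
)

// Allow increments the tenant's counter for the current one-minute window
// and reports whether the request is still within the limit.
func Allow(ctx context.Context, rdb *redis.Client, tenantID string, limit int64) (bool, error) {
    window := time.Now().Unix() / 60
    key := fmt.Sprintf("ratelimit:%s:%d", tenantID, window)

    count, err := rdb.Incr(ctx, key).Result()
    if err != nil {
        return false, err
    }
    if count == 1 {
        // First hit in this window: expire the counter with the window.
        rdb.Expire(ctx, key, time.Minute)
    }
    return count <= limit, nil
}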

NATS JetStream

Durable message queue for async job processing.

Configuration:

  • Stream: AMP_JOBS
  • Subjects: amp.jobs.*
  • Retention: WorkQueue (messages deleted after ack)
  • Max delivery attempts: 3
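
That configuration could be created once at startup, roughly as follows (a sketch with the nats.go client; the durable consumer name is an assumption):

package main

import (
    "log"

    "github.com/nats-io/nats.go"
)

func main() {
    nc, err := nats.Connect(nats.DefaultURL)
    if err != nil {
        log.Fatal(err)
    }
    defer nc.Close()

    js, err := nc.JetStream()
    if err != nil {
        log.Fatal(err)
    }

    // Work-queue retention: a message is removed once a consumer acks it.
    if _, err := js.AddStream(&nats.StreamConfig{
        Name:      "AMP_JOBS",
        Subjects:  []string{"amp.jobs.*"},
        Retention: nats.WorkQueuePolicy,
    }); err != nil {
        log.Fatal(err)
    }

    // The durable consumer caps redelivery at 3 attempts.
    if _, err := js.AddConsumer("AMP_JOBS", &nats.ConsumerConfig{
        Durable:    "amp-workers",
        AckPolicy:  nats.AckExplicitPolicy,
        MaxDeliver: 3,
    }); err != nil {
        log.Fatal(err)
    }
}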

Message flow:

sequenceDiagram
    participant API
    participant NATS
    participant Worker
    participant DB

    API->>NATS: Publish job (amp.jobs.create)
    NATS->>Worker: Deliver message
    Worker->>DB: Update job status (running)
    Worker->>Worker: Execute pipeline
    Worker->>DB: Save results
    Worker->>NATS: Acknowledge message

Provider Abstraction Layer (PAL)

PAL decouples AMP from specific service providers.

graph LR
    subgraph "AMP Core"
        Pipeline[Pipeline]
        PAL[Provider Abstraction Layer]
    end

    subgraph "LLM Providers"
        Claude[Claude]
        GPT[OpenAI GPT]
        Gemini[Google Gemini]
    end

    subgraph "Image Providers"
        DALLE[DALL-E]
        Stability[Stability AI]
        Replicate[Replicate]
    end

    subgraph "Publishing"
        Metricool[Metricool]
    end

    Pipeline --> PAL
    PAL --> Claude
    PAL --> GPT
    PAL --> Gemini
    PAL --> DALLE
    PAL --> Stability
    PAL --> Replicate
    PAL --> Metricool

Provider Interface

All providers implement a common interface:

type Provider interface {
    Name() string
    Type() ProviderType
    Capabilities() []Capability
    Health(ctx context.Context) error
}

type LLMProvider interface {
    Provider
    Complete(ctx context.Context, req CompletionRequest) (*CompletionResponse, error)
    Chat(ctx context.Context, req ChatRequest) (*ChatResponse, error)
}

type ImageProvider interface {
    Provider
    Generate(ctx context.Context, req ImageRequest) (*ImageResponse, error)
}
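
A concrete provider satisfies these interfaces and hides the vendor API behind them. The stub below is a hedged sketch using the types above; it is not the actual Claude adapter, and the request translation is omitted:

// claudeProvider is a hypothetical adapter; the real one is not shown here.
type claudeProvider struct {
    apiKey string
}

func (p *claudeProvider) Name() string               { return "claude" }
func (p *claudeProvider) Type() ProviderType         { return ProviderTypeLLM }
func (p *claudeProvider) Capabilities() []Capability { return []Capability{CapabilityVision} }

func (p *claudeProvider) Health(ctx context.Context) error {
    // A lightweight provider call (or a cached status) would go here.
    return nil
}

func (p *claudeProvider) Complete(ctx context.Context, req CompletionRequest) (*CompletionResponse, error) {
    // Translate req into the vendor's wire format, call the API,
    // and map the response back into CompletionResponse.
    return &CompletionResponse{}, nil
}

func (p *claudeProvider) Chat(ctx context.Context, req ChatRequest) (*ChatResponse, error) {
    return &ChatResponse{}, nil
}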

Provider Selection

PAL selects providers based on:

  1. Tenant preference — Configured default provider
  2. Capability match — Required features (vision, long context, etc.)
  3. Availability — Health check status
  4. Cost — Within budget constraints

// Selector chooses the best available provider
provider, err := selector.Select(ctx, SelectionCriteria{
    Type:         ProviderTypeLLM,
    Capabilities: []Capability{CapabilityVision},
    MaxCost:      0.05,  // per request
    Tenant:       tenant,
})

Request Flow

API Request

sequenceDiagram
    participant Client
    participant Middleware
    participant Handler
    participant Service
    participant DB

    Client->>Middleware: HTTP Request
    Middleware->>Middleware: Generate Request ID
    Middleware->>Middleware: Validate Auth
    Middleware->>Middleware: Extract Tenant
    Middleware->>Middleware: Check Rate Limit
    Middleware->>Handler: Authorized Request
    Handler->>Service: Business Logic
    Service->>DB: Data Operations
    DB-->>Service: Result
    Service-->>Handler: Response Data
    Handler-->>Client: HTTP Response
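
Each middleware step enriches the request context before the handler runs. A hedged sketch of the tenant-extraction step (the context key and header fallback are illustrative; in practice the tenant comes from the validated credentials):

package api

import (
    "context"
    "net/http"
)

// tenantKey is a hypothetical context key for the resolved tenant ID.
type tenantKey struct{}

func extractTenant(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // The real middleware would read the tenant from the validated API key
        // or JWT claims; the header lookup here is only a stand-in.
        tenantID := r.Header.Get("X-Tenant-ID")
        if tenantID == "" {
            http.Error(w, "tenant not resolved", http.StatusForbidden)
            return
        }
        ctx := context.WithValue(r.Context(), tenantKey{}, tenantID)
        next.ServeHTTP(w, r.WithContext(ctx))
    })
}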

Job Processing

sequenceDiagram
    participant API
    participant NATS
    participant Worker
    participant PAL
    participant Providers
    participant DB

    API->>DB: Create Job Record
    API->>NATS: Publish Job Message
    NATS->>Worker: Deliver Job

    loop Each Pipeline Stage
        Worker->>DB: Update Stage Status
        Worker->>PAL: Request Processing
        PAL->>Providers: API Call
        Providers-->>PAL: Response
        PAL-->>Worker: Processed Result
        Worker->>DB: Save Stage Output
    end

    Worker->>DB: Mark Job Complete
    Worker->>NATS: Acknowledge Message
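
Inside the worker, the loop amounts to running stages in order and persisting each output, so a redelivered job can resume from its last checkpoint. A hedged sketch (the Stage and JobStore interfaces are assumptions, not the actual contracts):

package worker

import "context"

// Stage is a hypothetical pipeline stage.
type Stage interface {
    Name() string
    Run(ctx context.Context, in []byte) ([]byte, error)
}

// JobStore is a hypothetical persistence interface for job state.
type JobStore interface {
    SetStageStatus(ctx context.Context, jobID, stage, status string) error
    SaveStageOutput(ctx context.Context, jobID, stage string, out []byte) error
    MarkComplete(ctx context.Context, jobID string) error
}

// runPipeline feeds each stage's output into the next and checkpoints it.
func runPipeline(ctx context.Context, jobID string, stages []Stage, store JobStore, input []byte) error {
    data := input
    for _, s := range stages {
        if err := store.SetStageStatus(ctx, jobID, s.Name(), "running"); err != nil {
            return err
        }
        out, err := s.Run(ctx, data)
        if err != nil {
            store.SetStageStatus(ctx, jobID, s.Name(), "failed")
            return err // unacked message is redelivered and retried
        }
        if err := store.SaveStageOutput(ctx, jobID, s.Name(), out); err != nil {
            return err
        }
        data = out
    }
    return store.MarkComplete(ctx, jobID)
}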

Deployment Architecture

Single Node (Development)

graph TB
    subgraph "Single Server"
        API[API Server]
        Worker[Worker]
        PG[(PostgreSQL)]
        Redis[(Redis)]
        NATS[NATS]
    end

    API --> PG
    API --> Redis
    API --> NATS
    Worker --> PG
    Worker --> NATS

Multi-Node (Production)

graph TB
    subgraph "Load Balancer"
        LB[nginx / ALB]
    end

    subgraph "API Tier"
        API1[API Server 1]
        API2[API Server 2]
        API3[API Server 3]
    end

    subgraph "Worker Tier"
        W1[Worker Pool 1]
        W2[Worker Pool 2]
    end

    subgraph "Data Tier"
        PG[(PostgreSQL Primary)]
        PG_R[(PostgreSQL Replica)]
        Redis[(Redis Cluster)]
        NATS[(NATS Cluster)]
    end

    LB --> API1
    LB --> API2
    LB --> API3

    API1 --> NATS
    API2 --> NATS
    API3 --> NATS

    NATS --> W1
    NATS --> W2

    API1 --> PG
    W1 --> PG
    PG --> PG_R

    API1 --> Redis

Security Architecture

Authentication Flow

sequenceDiagram
    participant Client
    participant API
    participant Clerk

    alt API Key
        Client->>API: Authorization: Bearer amp_live_xxx
        API->>API: Validate Key Hash
        API-->>Client: Authenticated
    else JWT Token
        Client->>API: Authorization: Bearer eyJ...
        API->>Clerk: Verify Token
        Clerk-->>API: Valid + Claims
        API-->>Client: Authenticated
    end
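
The branch can be decided from the credential's shape: opaque amp_ keys are checked against a stored hash locally, while JWTs are verified with Clerk. A hedged sketch (the verifier interfaces and the claim name are assumptions, not the Clerk SDK):

package auth

import (
    "context"
    "strings"
)

// APIKeyStore and JWTVerifier are hypothetical abstractions over the two paths.
type APIKeyStore interface {
    ValidateHash(ctx context.Context, rawKey string) (tenantID string, err error)
}

type JWTVerifier interface {
    Verify(ctx context.Context, token string) (claims map[string]any, err error)
}

// authenticate returns the tenant ID for a valid credential.
func authenticate(ctx context.Context, authz string, keys APIKeyStore, jwts JWTVerifier) (string, error) {
    token := strings.TrimPrefix(authz, "Bearer ")
    if strings.HasPrefix(token, "amp_") {
        // API key (e.g. amp_live_xxx): validated locally against its hash.
        return keys.ValidateHash(ctx, token)
    }
    // JWT: verified against Clerk; the claim name is illustrative.
    claims, err := jwts.Verify(ctx, token)
    if err != nil {
        return "", err
    }
    tenant, _ := claims["tenant_id"].(string)
    return tenant, nil
}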

Data Isolation

Tenant A                    Tenant B
┌─────────────────┐        ┌─────────────────┐
│ Missions        │        │ Missions        │
│ Content         │        │ Content         │
│ Brand Context   │        │ Brand Context   │
│ Analytics       │        │ Analytics       │
│ API Keys        │        │ API Keys        │
└─────────────────┘        └─────────────────┘
        │                          │
        └──────────┬───────────────┘
            PostgreSQL
         (Row-Level Security)
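
With row-level security, every query must carry the tenant identity. A hedged sketch using pgx that scopes a transaction to one tenant (the app.current_tenant setting name and the policy it feeds are assumptions):

package db

import (
    "context"

    "github.com/jackc/pgx/v5"
    "github.com/jackc/pgx/v5/pgxpool"
)

// withTenant runs fn inside a transaction whose queries are filtered by the
// RLS policies to a single tenant's rows.
func withTenant(ctx context.Context, pool *pgxpool.Pool, tenantID string, fn func(tx pgx.Tx) error) error {
    tx, err := pool.Begin(ctx)
    if err != nil {
        return err
    }
    defer tx.Rollback(ctx) // no-op once the transaction has committed

    // is_local = true keeps the setting transaction-scoped, so a pooled
    // connection never carries one tenant's scope into the next request.
    if _, err := tx.Exec(ctx, "SELECT set_config('app.current_tenant', $1, true)", tenantID); err != nil {
        return err
    }
    if err := fn(tx); err != nil {
        return err
    }
    return tx.Commit(ctx)
}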

Performance Characteristics

Component     Metric                    Target
API Server    Request latency (p99)     < 100ms
API Server    Throughput                1000 req/s per instance
Worker        Job throughput            10 jobs/min per worker
Database      Connection pool           25 per API instance
NATS          Message delivery          < 10ms
Redis         Cache hit ratio           > 90%

Failure Handling

API Server Failure

  • Load balancer health checks detect failure
  • Traffic routes to healthy instances
  • Stateless design means no session loss

Worker Failure

  • NATS redelivers unacknowledged messages
  • Job resumes from last checkpoint
  • Automatic retry with backoff

Database Failure

  • Connection pool handles transient failures
  • Read replicas for query failover
  • Point-in-time recovery for data loss

Provider Failure

  • PAL fallback to alternative provider
  • Cached responses for repeated requests
  • Circuit breaker prevents cascade
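
A minimal circuit breaker around provider calls might look like the sketch below (the thresholds and recovery behavior are illustrative; the actual PAL implementation is not shown):

package pal

import (
    "errors"
    "sync"
    "time"
)

var ErrCircuitOpen = errors.New("provider circuit open")

// breaker trips after maxFailures consecutive errors and rejects calls for
// cooldown, so a failing provider is skipped instead of stacking timeouts.
type breaker struct {
    mu          sync.Mutex
    failures    int
    maxFailures int
    cooldown    time.Duration
    openedAt    time.Time
}

func (b *breaker) Call(fn func() error) error {
    b.mu.Lock()
    if b.failures >= b.maxFailures && time.Since(b.openedAt) < b.cooldown {
        b.mu.Unlock()
        return ErrCircuitOpen // fail fast; PAL can fall back to another provider
    }
    b.mu.Unlock()

    err := fn()

    b.mu.Lock()
    defer b.mu.Unlock()
    if err != nil {
        b.failures++
        if b.failures >= b.maxFailures {
            b.openedAt = time.Now() // (re)open the circuit
        }
        return err
    }
    b.failures = 0 // a success closes the circuit again
    return nil
}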