Architecture Overview¶
AMP is built as a distributed system designed for reliability, scalability, and provider flexibility.
System Architecture¶
```mermaid
graph TB
    subgraph "Clients"
        Web[Web Dashboard]
        CLI[CLI]
        SDK[SDKs]
    end
    subgraph "API Layer"
        LB[Load Balancer]
        API1[API Server]
        API2[API Server]
    end
    subgraph "Processing Layer"
        NATS[NATS JetStream]
        W1[Worker]
        W2[Worker]
        W3[Worker]
    end
    subgraph "Data Layer"
        PG[(PostgreSQL)]
        Redis[(Redis)]
    end
    subgraph "External Services"
        Clerk[Clerk Auth]
        Claude[Claude]
        OpenAI[OpenAI]
        Metricool[Metricool]
    end
    Web --> LB
    CLI --> LB
    SDK --> LB
    LB --> API1
    LB --> API2
    API1 --> NATS
    API2 --> NATS
    API1 --> PG
    API2 --> PG
    API1 --> Redis
    API2 --> Redis
    NATS --> W1
    NATS --> W2
    NATS --> W3
    W1 --> PG
    W2 --> PG
    W3 --> PG
    W1 --> Claude
    W1 --> OpenAI
    W1 --> Metricool
    API1 --> Clerk
    API2 --> Clerk
```

Components¶
API Server¶
The API server handles all HTTP traffic. It is written in Go using the Chi router.
Responsibilities:
- REST API endpoints
- Request authentication and authorization
- Rate limiting
- Job submission to queue
- Real-time status queries
Key characteristics:
- Stateless — scales horizontally
- Request timeout: 30 seconds
- Graceful shutdown with in-flight request completion
- Structured logging with request correlation
Endpoints structure:
```
/health      → Health check (no auth)
/ready       → Readiness check (no auth)
/api/v1/...  → API endpoints (authenticated)
```
Worker Service¶
Workers process jobs from the NATS queue. Each worker handles one job at a time through the complete pipeline.
Responsibilities:
- Job dequeuing from NATS JetStream
- Pipeline stage execution
- Provider API calls (LLM, images, publishing)
- Result persistence
- Retry handling
Key characteristics:
- Configurable concurrency (default: 4 workers per instance)
- Manual acknowledgment for reliability
- Automatic retry with exponential backoff
- Graceful drain on shutdown
PostgreSQL¶
Primary data store for all persistent state.
Stored data:
- User accounts and authentication
- Tenant configuration
- Missions and their parameters
- Generated content
- Publishing history
- Analytics data
- Job state and history
Schema highlights:
- UUID primary keys throughout
- JSONB for flexible structured data (config, metadata)
- Proper foreign key constraints
- Indexes on common query patterns
Redis¶
Caching and ephemeral state.
Uses:
- Session caching
- Rate limit counters
- API response caching
- Real-time job status
- Provider health status
NATS JetStream¶
Durable message queue for async job processing.
Configuration:
- Stream: `AMP_JOBS`
- Subjects: `amp.jobs.*`
- Retention: WorkQueue (messages deleted after ack)
- Max delivery attempts: 3
Message flow:
```mermaid
sequenceDiagram
    participant API
    participant NATS
    participant Worker
    participant DB
    API->>NATS: Publish job (amp.jobs.create)
    NATS->>Worker: Deliver message
    Worker->>DB: Update job status (running)
    Worker->>Worker: Execute pipeline
    Worker->>DB: Save results
    Worker->>NATS: Acknowledge message
```

Provider Abstraction Layer (PAL)¶
PAL decouples AMP from specific service providers.
```mermaid
graph LR
    subgraph "AMP Core"
        Pipeline[Pipeline]
        PAL[Provider Abstraction Layer]
    end
    subgraph "LLM Providers"
        Claude[Claude]
        GPT[OpenAI GPT]
        Gemini[Google Gemini]
    end
    subgraph "Image Providers"
        DALLE[DALL-E]
        Stability[Stability AI]
        Replicate[Replicate]
    end
    subgraph "Publishing"
        Metricool[Metricool]
    end
    Pipeline --> PAL
    PAL --> Claude
    PAL --> GPT
    PAL --> Gemini
    PAL --> DALLE
    PAL --> Stability
    PAL --> Replicate
    PAL --> Metricool
```

Provider Interface¶
All providers implement a common interface:
```go
type Provider interface {
    Name() string
    Type() ProviderType
    Capabilities() []Capability
    Health(ctx context.Context) error
}

type LLMProvider interface {
    Provider
    Complete(ctx context.Context, req CompletionRequest) (*CompletionResponse, error)
    Chat(ctx context.Context, req ChatRequest) (*ChatResponse, error)
}

type ImageProvider interface {
    Provider
    Generate(ctx context.Context, req ImageRequest) (*ImageResponse, error)
}
```
Provider Selection¶
PAL selects providers based on:
- Tenant preference — Configured default provider
- Capability match — Required features (vision, long context, etc.)
- Availability — Health check status
- Cost — Within budget constraints
```go
// Selector chooses the best available provider
provider, err := selector.Select(ctx, SelectionCriteria{
    Type:         ProviderTypeLLM,
    Capabilities: []Capability{CapabilityVision},
    MaxCost:      0.05, // per request
    Tenant:       tenant,
})
```
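The selection rules above can be sketched as a filter-then-rank pass: drop candidates that are unhealthy, over budget, or missing a required capability, then pick the cheapest survivor. All types and field names below are simplified stand-ins for the real PAL types, and tenant preference (which would normally pre-order the candidate list) is omitted for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

// candidate is a simplified view of a registered provider.
type candidate struct {
	Name    string
	Caps    map[string]bool
	Healthy bool
	Cost    float64 // estimated cost per request
}

// selectProvider applies the documented order: capability match,
// availability, and budget act as filters; cost breaks the tie.
func selectProvider(cands []candidate, needed []string, maxCost float64) (string, error) {
	var best *candidate
	for i := range cands {
		c := &cands[i]
		ok := c.Healthy && c.Cost <= maxCost
		for _, need := range needed {
			if !c.Caps[need] {
				ok = false
			}
		}
		if ok && (best == nil || c.Cost < best.Cost) {
			best = c
		}
	}
	if best == nil {
		return "", errors.New("no provider satisfies the criteria")
	}
	return best.Name, nil
}

func main() {
	cands := []candidate{
		{Name: "claude", Caps: map[string]bool{"vision": true}, Healthy: true, Cost: 0.03},
		{Name: "gpt", Caps: map[string]bool{"vision": true}, Healthy: false, Cost: 0.02},
		{Name: "gemini", Caps: map[string]bool{"vision": false}, Healthy: true, Cost: 0.01},
	}
	name, _ := selectProvider(cands, []string{"vision"}, 0.05)
	fmt.Println(name) // claude: gpt is unhealthy, gemini lacks vision
}
```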
Request Flow¶
API Request¶
```mermaid
sequenceDiagram
    participant Client
    participant Middleware
    participant Handler
    participant Service
    participant DB
    Client->>Middleware: HTTP Request
    Middleware->>Middleware: Generate Request ID
    Middleware->>Middleware: Validate Auth
    Middleware->>Middleware: Extract Tenant
    Middleware->>Middleware: Check Rate Limit
    Middleware->>Handler: Authorized Request
    Handler->>Service: Business Logic
    Service->>DB: Data Operations
    DB-->>Service: Result
    Service-->>Handler: Response Data
    Handler-->>Client: HTTP Response
```

Job Processing¶
```mermaid
sequenceDiagram
    participant API
    participant NATS
    participant Worker
    participant PAL
    participant Providers
    participant DB
    API->>DB: Create Job Record
    API->>NATS: Publish Job Message
    NATS->>Worker: Deliver Job
    loop Each Pipeline Stage
        Worker->>DB: Update Stage Status
        Worker->>PAL: Request Processing
        PAL->>Providers: API Call
        Providers-->>PAL: Response
        PAL-->>Worker: Processed Result
        Worker->>DB: Save Stage Output
    end
    Worker->>DB: Mark Job Complete
    Worker->>NATS: Acknowledge Message
```

Deployment Architecture¶
Single Node (Development)¶
```mermaid
graph TB
    subgraph "Single Server"
        API[API Server]
        Worker[Worker]
        PG[(PostgreSQL)]
        Redis[(Redis)]
        NATS[NATS]
    end
    API --> PG
    API --> Redis
    API --> NATS
    Worker --> PG
    Worker --> NATS
```

Production (Recommended)¶
```mermaid
graph TB
    subgraph "Load Balancer"
        LB[nginx / ALB]
    end
    subgraph "API Tier"
        API1[API Server 1]
        API2[API Server 2]
        API3[API Server 3]
    end
    subgraph "Worker Tier"
        W1[Worker Pool 1]
        W2[Worker Pool 2]
    end
    subgraph "Data Tier"
        PG[(PostgreSQL Primary)]
        PG_R[(PostgreSQL Replica)]
        Redis[(Redis Cluster)]
        NATS[(NATS Cluster)]
    end
    LB --> API1
    LB --> API2
    LB --> API3
    API1 --> NATS
    API2 --> NATS
    API3 --> NATS
    NATS --> W1
    NATS --> W2
    API1 --> PG
    W1 --> PG
    PG --> PG_R
    API1 --> Redis
```

Security Architecture¶
Authentication Flow¶
```mermaid
sequenceDiagram
    participant Client
    participant API
    participant Clerk
    alt API Key
        Client->>API: Authorization: Bearer amp_live_xxx
        API->>API: Validate Key Hash
        API-->>Client: Authenticated
    else JWT Token
        Client->>API: Authorization: Bearer eyJ...
        API->>Clerk: Verify Token
        Clerk-->>API: Valid + Claims
        API-->>Client: Authenticated
    end
```

Data Isolation¶
```
Tenant A                     Tenant B
┌─────────────────┐          ┌─────────────────┐
│ Missions        │          │ Missions        │
│ Content         │          │ Content         │
│ Brand Context   │          │ Brand Context   │
│ Analytics       │          │ Analytics       │
│ API Keys        │          │ API Keys        │
└─────────────────┘          └─────────────────┘
         │                            │
         └──────────┬─────────────────┘
                    │
               PostgreSQL
          (Row-Level Security)
```
Performance Characteristics¶
| Component | Metric | Target |
|---|---|---|
| API Server | Request latency (p99) | < 100ms |
| API Server | Throughput | 1000 req/s per instance |
| Worker | Job throughput | 10 jobs/min per worker |
| Database | Connection pool | 25 per API instance |
| NATS | Message delivery | < 10ms |
| Redis | Cache hit ratio | > 90% |
Failure Handling¶
API Server Failure¶
- Load balancer health checks detect failure
- Traffic routes to healthy instances
- Stateless design means no session loss
Worker Failure¶
- NATS redelivers unacknowledged messages
- Job resumes from last checkpoint
- Automatic retry with backoff
Database Failure¶
- Connection pool handles transient failures
- Read replicas for query failover
- Point-in-time recovery for data loss
Provider Failure¶
- PAL fallback to alternative provider
- Cached responses for repeated requests
- Circuit breaker prevents cascade