harb/CI_MIGRATION.md
johba 4277f19b68 feature/ci (#84)
Co-authored-by: openhands <openhands@all-hands.dev>
Reviewed-on: https://codeberg.org/johba/harb/pulls/84
2026-02-02 19:24:57 +01:00

# CI Migration: Composite Integration Service (Option A)
## Overview
The E2E pipeline has been refactored to use a **composite integration service** that bundles the entire Harb stack into a single Docker image. This removes the separate Docker-in-Docker service from the pipeline definition and cuts total E2E time from roughly 8-10 minutes to 5-6 minutes (see Performance Comparison).
## Architecture
### Before (Docker-in-Docker)
```
Woodpecker Pipeline
├─ Service: docker:dind (privileged)
└─ Step: run-e2e
   ├─ Install docker CLI + docker-compose
   ├─ Run ./scripts/dev.sh start (nested containers)
   │  ├─ anvil
   │  ├─ postgres
   │  ├─ bootstrap
   │  ├─ ponder
   │  ├─ webapp
   │  ├─ landing
   │  ├─ txn-bot
   │  └─ caddy
   └─ Run Playwright tests
```
**Issues**:
- ~3-5 minutes stack startup overhead per run
- Complex nested container management
- Docker-in-Docker reliability issues
- Dependency reinstallation in every step
### After (Composite Service)
```
Woodpecker Pipeline
├─ Service: harb/integration (contains full stack)
│  └─ Manages internal docker-compose lifecycle
├─ Step: wait-for-stack (30-60s)
└─ Step: run-e2e-tests (Playwright only)
```
**Benefits**:
- **3-5 minutes faster** - Stack starts in parallel with pipeline setup
- **Simpler** - No DinD complexity, standard service pattern
- **Reliable** - Single health check, clearer failure modes
- **Reusable** - Same image for local testing and CI
## Components
### 1. Integration Image (`docker/Dockerfile.integration`)
- Base: `docker:27-dind`
- Bundles: Full project + docker-compose
- Entrypoint: Starts dockerd + Harb stack automatically
- Healthcheck: Validates GraphQL endpoint is responsive
### 2. CI Compose File (`docker-compose.ci.yml`)
- Simplified interface for local testing
- Exposes port 8081 for stack access
- Persists Docker state in named volume
### 3. New E2E Pipeline (`.woodpecker/e2e-new.yml`)
- Service: `harb/integration` (stack)
- Step 1: Wait for stack health
- Step 2: Run Playwright tests
- Step 3: Collect artifacts
### 4. Build Script (`scripts/build-integration-image.sh`)
- Builds integration image
- Pushes to registry
- Includes local testing instructions
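
The `REGISTRY` override used in the migration steps suggests the script composes the image reference from environment variables. A minimal sketch of that idea, not the actual script: the `image_ref` helper, the `TAG` variable, and the default registry value are assumptions for illustration.

```bash
# Hypothetical helper: build the image reference from optional overrides.
# REGISTRY defaults to the registry named in this document; TAG is an assumed variable.
image_ref() {
  local registry="${REGISTRY:-registry.sovraigns.network}"
  local tag="${TAG:-latest}"
  printf '%s/harb/integration:%s\n' "$registry" "$tag"
}

# With REGISTRY and TAG unset, this prints the reference pushed in the steps below.
image_ref
```

Keeping the reference in one place means the build, tag, and push commands cannot drift apart when the registry changes.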
## Migration Steps
### 1. Build the Integration Image
```bash
# Build locally
./scripts/build-integration-image.sh
# Or with custom registry
REGISTRY=localhost:5000 ./scripts/build-integration-image.sh
```
### 2. Push to Registry
```bash
# Login to registry (if using sovraigns.network registry)
docker login registry.sovraigns.network -u ciuser
# Push
docker push registry.sovraigns.network/harb/integration:latest
```
### 3. Activate New Pipeline
```bash
# Backup old E2E pipeline
mv .woodpecker/e2e.yml .woodpecker/e2e-old.yml
# Activate new pipeline
mv .woodpecker/e2e-new.yml .woodpecker/e2e.yml
# Commit changes
git add .woodpecker/e2e.yml docker/ scripts/build-integration-image.sh
git commit -m "ci: migrate E2E to composite integration service"
```
### 4. Update CI Image Build Workflow
Add the following to the release pipeline, or create a dedicated workflow:
```yaml
# .woodpecker/build-ci-images.yml
kind: pipeline
type: docker
name: build-integration-image
when:
  event:
    - push
    - tag
  branch:
    - main
    - master
steps:
  - name: build-and-push
    image: docker:27-dind
    privileged: true
    environment:
      DOCKER_HOST: tcp://docker:2375
      REGISTRY_USER:
        from_secret: registry_user
      REGISTRY_PASSWORD:
        from_secret: registry_password
    commands:
      - docker login registry.sovraigns.network -u $REGISTRY_USER -p $REGISTRY_PASSWORD
      - ./scripts/build-integration-image.sh
      - docker push registry.sovraigns.network/harb/integration:latest
```
## Local Testing
### Test Integration Image Directly
```bash
# Start the stack container
docker run --rm --privileged -p 8081:8081 \
  registry.sovraigns.network/harb/integration:latest
# Wait for health (in another terminal)
curl http://localhost:8081/api/graphql
# Run E2E tests against it
npm run test:e2e
```
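
A single `curl` will fail while the stack is still booting, so a retry loop is usually more practical. The sketch below is one way to do it, under stated assumptions: `wait_until` and `stack_ready` are hypothetical helper names, and the probe assumes the GraphQL endpoint answers a trivial `{ __typename }` query once the stack is up.

```bash
# wait_until RETRIES INTERVAL CMD...: re-run CMD until it succeeds or retries run out.
wait_until() {
  local retries="$1" interval="$2"
  shift 2
  local attempt=1
  while [ "$attempt" -le "$retries" ]; do
    if "$@"; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  return 1
}

# Probe (assumption): POST a trivial GraphQL query; -f makes curl fail on HTTP errors.
stack_ready() {
  curl -sf -o /dev/null -X POST \
    -H 'Content-Type: application/json' \
    -d '{"query":"{ __typename }"}' \
    http://localhost:8081/api/graphql
}
```

For example, `wait_until 30 5 stack_ready && npm run test:e2e` polls every 5 seconds for up to 30 attempts and only runs the tests once the probe succeeds.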
### Test via docker-compose.ci.yml
```bash
# Start stack
docker-compose -f docker-compose.ci.yml up -d
# Wait for healthy
docker-compose -f docker-compose.ci.yml ps
# Run tests
npm run test:e2e
# Cleanup
docker-compose -f docker-compose.ci.yml down -v
```
## Rollback Plan
If issues arise, revert to old pipeline:
```bash
# Restore old pipeline
mv .woodpecker/e2e-old.yml .woodpecker/e2e.yml
# Commit
git add .woodpecker/e2e.yml
git commit -m "ci: rollback to DinD E2E pipeline"
git push
```
## Performance Comparison
| Metric | Before (DinD) | After (Composite) | Improvement |
|--------|---------------|-------------------|-------------|
| Stack startup | ~180-240s | ~60-90s | **~2-3 min faster** |
| Total E2E time | ~8-10 min | ~5-6 min | **~40% faster** |
| Complexity | High (nested) | Low (standard) | Simpler |
| Reliability | Medium | High | More stable |
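
The ~40% figure can be roughly reproduced from the ranges in the table. A quick check, assuming the midpoint of each range is representative (the source does not say how the percentage was derived):

```bash
# Midpoints of the "Total E2E time" ranges, converted to seconds.
before=$(( (8 + 10) * 60 / 2 ))   # 540 s
after=$(( (5 + 6) * 60 / 2 ))     # 330 s

# Relative saving, using integer arithmetic (truncates toward zero).
saving=$(( (before - after) * 100 / before ))
echo "${saving}%"   # prints 38%
```

The result, about 38%, is consistent with the rounded "~40% faster" entry.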
## Troubleshooting
### Image build fails
```bash
# Check kraiken-lib builds successfully
./scripts/build-kraiken-lib.sh
# Build with verbose output
docker build -f docker/Dockerfile.integration --progress=plain .
```
### Stack doesn't start in CI
```bash
# Check service logs in Woodpecker
# Services run detached, logs available via Woodpecker UI
# Test locally first
docker run --rm --privileged -p 8081:8081 \
  registry.sovraigns.network/harb/integration:latest
```
### Healthcheck times out
- Default timeout: 120s start period + 30 retries × 5s = ~270s max
- First run is slower (pulling images, building)
- Subsequent runs use cached layers (~60-90s)
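
The worst-case figure above is simple arithmetic over the three healthcheck settings. As a sketch for recomputing it when tuning the values (the variable names are illustrative, not taken from the Dockerfile):

```bash
# Worst-case wait before the healthcheck gives up, using the values stated above.
start_period=120   # seconds of grace before failures count
retries=30         # failed probes tolerated after the start period
interval=5         # seconds between probes

max_wait=$((start_period + retries * interval))
echo "${max_wait}s"   # prints 270s
```

If first runs regularly exceed this budget, raising the start period is likely the least disruptive change, since it does not slow failure detection once the stack is up.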
## Future Improvements
1. **Multi-stage build** - Separate build and runtime images
2. **Layer caching** - Optimize Dockerfile for faster rebuilds
3. **Parallel services** - Start independent services concurrently
4. **Resource limits** - Add memory/CPU constraints for CI
5. **Image variants** - Separate images for different test suites
## Podman to Docker Migration
As part of this work, the Woodpecker agent was migrated from Podman to Docker:

**Changes made**:
- Updated `/etc/woodpecker/agent.env`:
  - `WOODPECKER_BACKEND_DOCKER_HOST=unix:///var/run/docker.sock`
- Added `ci` user to `docker` group
- Restarted `woodpecker-agent` service

**Agent label update** (optional, cosmetic):
```bash
# /etc/woodpecker/agent.env
WOODPECKER_AGENT_LABELS=docker=true # (was podman=true)
```
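
To confirm the agent configuration after the migration, a small check along these lines may help. It is a sketch only: `agent_uses_docker` is a hypothetical helper, and it assumes the setting appears in the env file on its own line exactly as listed above.

```bash
# Hypothetical check: does the agent env file point Woodpecker at the Docker socket?
# -F treats the pattern as a fixed string, -x requires a whole-line match, -q is silent.
agent_uses_docker() {
  local env_file="${1:-/etc/woodpecker/agent.env}"
  grep -qxF 'WOODPECKER_BACKEND_DOCKER_HOST=unix:///var/run/docker.sock' "$env_file"
}
```

Running `agent_uses_docker && id -nG ci | grep -qw docker` would additionally confirm that the `ci` user is in the `docker` group.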
## Questions?
See `CLAUDE.md` for overall stack architecture and `INTEGRATION_TEST_STATUS.md` for E2E test details.