harb/CI_MIGRATION.md
johba 4277f19b68 feature/ci (#84)
Co-authored-by: openhands <openhands@all-hands.dev>
Reviewed-on: https://codeberg.org/johba/harb/pulls/84
2026-02-02 19:24:57 +01:00

CI Migration: Composite Integration Service (Option A)

Overview

The E2E pipeline has been refactored to use a composite integration service that bundles the entire Harb stack into a single Docker image. This removes Docker-in-Docker setup from the pipeline itself (the image still runs its own dockerd internally) and cuts total E2E time from roughly 8-10 minutes to 5-6 minutes.

Architecture

Before (Docker-in-Docker)

Woodpecker Pipeline
├─ Service: docker:dind (privileged)
└─ Step: run-e2e
   ├─ Install docker CLI + docker-compose
   ├─ Run ./scripts/dev.sh start (nested containers)
   │  ├─ anvil
   │  ├─ postgres
   │  ├─ bootstrap
   │  ├─ ponder
   │  ├─ webapp
   │  ├─ landing
   │  ├─ txn-bot
   │  └─ caddy
   └─ Run Playwright tests

Issues:

  • ~3-4 minutes (180-240s) stack startup overhead per run
  • Complex nested container management
  • Docker-in-Docker reliability issues
  • Dependency reinstallation in every step

After (Composite Service)

Woodpecker Pipeline
├─ Service: harb/integration (contains full stack)
│  └─ Manages internal docker-compose lifecycle
├─ Step: wait-for-stack (30-60s)
└─ Step: run-e2e-tests (Playwright only)

Benefits:

  • ~3-4 minutes faster per run - Stack starts in parallel with pipeline setup
  • Simpler - No DinD setup in the pipeline itself (the image runs its own dockerd internally), standard service pattern
  • Reliable - Single health check, clearer failure modes
  • Reusable - Same image for local testing and CI

Components

1. Integration Image (docker/Dockerfile.integration)

  • Base: docker:27-dind
  • Bundles: Full project + docker-compose
  • Entrypoint: Starts dockerd + Harb stack automatically
  • Healthcheck: Validates GraphQL endpoint is responsive

2. CI Compose File (docker-compose.ci.yml)

  • Simplified interface for local testing
  • Exposes port 8081 for stack access
  • Persists Docker state in named volume

3. New E2E Pipeline (.woodpecker/e2e-new.yml)

  • Service: harb/integration (stack)
  • Step 1: Wait for stack health
  • Step 2: Run Playwright tests
  • Step 3: Collect artifacts
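The "wait for stack health" step can be sketched as a small polling loop. This is a minimal sketch, not the pipeline's actual step: the function name `wait_for_stack`, its arguments, and any probe command passed to it are assumptions made for illustration.

```shell
# Hypothetical helper: run a probe command until it succeeds or the
# retry budget is exhausted.
wait_for_stack() {
  probe=$1      # command that exits 0 once the stack is up
  retries=$2    # maximum number of attempts
  interval=$3   # seconds to sleep between attempts
  attempt=1
  while [ "$attempt" -le "$retries" ]; do
    if sh -c "$probe" >/dev/null 2>&1; then
      echo "stack healthy after $attempt attempt(s)"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  echo "stack not healthy after $retries attempts" >&2
  return 1
}
```

A local run might look like `wait_for_stack "curl -fsS http://localhost:8081/api/graphql" 30 5`. In CI the host name depends on how the service is named in the pipeline, which this document does not specify.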

4. Build Script (scripts/build-integration-image.sh)

  • Builds integration image
  • Pushes to registry
  • Includes local testing instructions

Migration Steps

1. Build the Integration Image

# Build locally
./scripts/build-integration-image.sh

# Or with custom registry
REGISTRY=localhost:5000 ./scripts/build-integration-image.sh
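How the build script turns `REGISTRY` into an image reference is not shown in this document. One plausible shape, assuming the script falls back to the registry and `latest` tag used in the examples here, is sketched below. The helper name `image_ref` and the `TAG` variable are hypothetical.

```shell
# Hypothetical helper: compose the image reference from REGISTRY and TAG,
# falling back to the registry and tag used elsewhere in this document.
image_ref() {
  registry=${REGISTRY:-registry.sovraigns.network}
  tag=${TAG:-latest}
  echo "$registry/harb/integration:$tag"
}
```

With `REGISTRY=localhost:5000` exported, `image_ref` would yield `localhost:5000/harb/integration:latest`, which is the reference to push or run when testing against a local registry.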

2. Push to Registry

# Login to registry (if using sovraigns.network registry)
docker login registry.sovraigns.network -u ciuser

# Push
docker push registry.sovraigns.network/harb/integration:latest

3. Activate New Pipeline

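# Back up the old E2E pipeline
mv .woodpecker/e2e.yml .woodpecker/e2e-old.yml

# Activate the new pipeline
mv .woodpecker/e2e-new.yml .woodpecker/e2e.yml

# Note: Woodpecker runs every workflow file in .woodpecker/, so
# e2e-old.yml stays active unless it is disabled or moved elsewhere.

# Commit changes (stage all of .woodpecker/ so the renames and the
# e2e-old.yml backup used by the rollback plan are recorded)
git add .woodpecker/ docker/ docker-compose.ci.yml scripts/build-integration-image.sh
git commit -m "ci: migrate E2E to composite integration service"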

4. Update CI Image Build Workflow

Add to release pipeline or create dedicated workflow:

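# .woodpecker/build-ci-images.yml
# Woodpecker takes the workflow name from the file name, so the
# Drone-style kind/type/name keys are omitted.
when:
  event:
    - push
    - tag
  branch:
    - main
    - master

services:
  # Assumption: a DinD service named "docker" backs DOCKER_HOST below.
  # Only this service needs privileged mode.
  - name: docker
    image: docker:27-dind
    privileged: true
    environment:
      DOCKER_TLS_CERTDIR: ""  # serve plain TCP on port 2375

steps:
  - name: build-and-push
    image: docker:27-dind
    environment:
      DOCKER_HOST: tcp://docker:2375
      REGISTRY_USER:
        from_secret: registry_user
      REGISTRY_PASSWORD:
        from_secret: registry_password
    commands:
      # Pass the password on stdin so it does not appear in the process list
      - echo "$REGISTRY_PASSWORD" | docker login registry.sovraigns.network -u "$REGISTRY_USER" --password-stdin
      - ./scripts/build-integration-image.sh
      - docker push registry.sovraigns.network/harb/integration:latest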

Local Testing

Test Integration Image Directly

# Start the stack container
docker run --rm --privileged -p 8081:8081 \
  registry.sovraigns.network/harb/integration:latest

# Check health (in another terminal); repeat until the endpoint responds
curl http://localhost:8081/api/graphql

# Run E2E tests against it
npm run test:e2e
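A plain GET may not be accepted by every GraphQL server, so a stricter readiness probe can send a minimal query instead. The sketch below assumes the endpoint accepts standard JSON POST requests. The helper name `graphql_ready` is hypothetical, and `{ __typename }` is used only because it is valid against any GraphQL schema.

```shell
# Hypothetical helper: probe a GraphQL endpoint with a minimal query.
# curl -f makes HTTP error responses (4xx/5xx) count as failures.
graphql_ready() {
  url=$1
  curl -fsS -X POST \
    -H 'Content-Type: application/json' \
    --data '{"query":"{ __typename }"}' \
    "$url" >/dev/null
}
```

For example, `until graphql_ready http://localhost:8081/api/graphql; do sleep 5; done` blocks until the stack answers, after which `npm run test:e2e` can be started.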

Test via docker-compose.ci.yml

# Start stack
docker-compose -f docker-compose.ci.yml up -d

# Check that the stack reports healthy (re-run until it does)
docker-compose -f docker-compose.ci.yml ps

# Run tests
npm run test:e2e

# Cleanup
docker-compose -f docker-compose.ci.yml down -v

Rollback Plan

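If issues arise, revert to the old pipeline:

# Restore the old pipeline (this overwrites the new e2e.yml)
mv .woodpecker/e2e-old.yml .woodpecker/e2e.yml

# Commit (stage all of .woodpecker/ so the removal of e2e-old.yml is recorded)
git add .woodpecker/
git commit -m "ci: rollback to DinD E2E pipeline"
git push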

Performance Comparison

| Metric | Before (DinD) | After (Composite) | Improvement |
|---|---|---|---|
| Stack startup | ~180-240s | ~60-90s | ~2-3 min faster |
| Total E2E time | ~8-10 min | ~5-6 min | ~40% faster |
| Complexity | High (nested) | Low (standard) | Simpler |
| Reliability | Medium | High | More stable |

Troubleshooting

Image build fails

# Check kraiken-lib builds successfully
./scripts/build-kraiken-lib.sh

# Build with verbose output
docker build -f docker/Dockerfile.integration --progress=plain .

Stack doesn't start in CI

# Check service logs in Woodpecker
# Services run detached, logs available via Woodpecker UI

# Test locally first
docker run --rm --privileged -p 8081:8081 \
  registry.sovraigns.network/harb/integration:latest

Healthcheck times out

  • Default timeout: 120s start period + 30 retries × 5s = ~270s max
  • First run is slower (pulling images, building)
  • Subsequent runs use cached layers (~60-90s)
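The timeout budget above is simple arithmetic: the start period plus retries times the interval. The sketch below works it through. It ignores any per-check timeout, and the function name `max_health_wait` is made up for illustration.

```shell
# Upper bound (in seconds) before a health check gives up:
# start period + retries * interval.
max_health_wait() {
  start_period=$1
  retries=$2
  interval=$3
  echo $((start_period + retries * interval))
}

max_health_wait 120 30 5   # prints 270
```

If first runs regularly approach this limit, raising the start period is the least intrusive knob, since it only delays when failures start counting.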

Future Improvements

  1. Multi-stage build - Separate build and runtime images
  2. Layer caching - Optimize Dockerfile for faster rebuilds
  3. Parallel services - Start independent services concurrently
  4. Resource limits - Add memory/CPU constraints for CI
  5. Image variants - Separate images for different test suites

Podman to Docker Migration

As part of this work, the Woodpecker agent was migrated from Podman to Docker:

Changes made:

  • Updated /etc/woodpecker/agent.env:
    • WOODPECKER_BACKEND_DOCKER_HOST=unix:///var/run/docker.sock
  • Added ci user to docker group
  • Restarted woodpecker-agent service

Agent label update (optional, cosmetic):

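# /etc/woodpecker/agent.env
# Was podman=true. Keep comments on their own line: env-file loaders
# may read an inline comment as part of the value.
WOODPECKER_AGENT_LABELS=docker=true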
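After the agent change, a quick sanity check is to confirm that the ci user really is in the docker group. The sketch below is one way to do that; the helper name `has_group` is hypothetical, and it only inspects a group list passed in as text.

```shell
# Hypothetical helper: return 0 if the space-separated group list in $2
# contains the group named in $1 (whole-word match).
has_group() {
  case " $2 " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

On the agent host this could be used as `has_group docker "$(id -nG ci)" && echo "ci can use the Docker socket"`. Group membership only takes effect for new sessions, which is why the woodpecker-agent service needed a restart.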

Questions?

See CLAUDE.md for overall stack architecture and INTEGRATION_TEST_STATUS.md for E2E test details.