refactor: consolidate CI and local dev orchestration (#108)

## Summary
- Extract shared bootstrap functions into `scripts/bootstrap-common.sh` (eliminates ~120 lines of duplicated forge/cast commands from e2e.yml)
- Create reusable `scripts/wait-for-service.sh` for health checks (replaces 60-line inline wait-for-stack)
- Merge dev and CI entrypoints into unified scripts branching on `CI` env var (delete `docker/ci-entrypoints/`)
- Replace 4 per-service CI Dockerfiles with parameterized `docker/Dockerfile.service-ci`
- Add `sync-tax-rates.mjs` to CI image builder stage
- Fix: CI now grants txnBot recenter access (was missing)
- Fix: txnBot funding parameterized (CI=10eth, local=1eth)
- Delete 5 obsolete migration docs and 4 DinD integration files

Net: -1540 lines removed

Closes #107

## Test plan
- [ ] E2E pipeline passes (bootstrap sources shared script, services use old images with commands override)
- [ ] build-ci-images pipeline builds all 4 services with unified Dockerfile
- [ ] Local dev stack boots via `./scripts/dev.sh start`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: openhands <openhands@all-hands.dev>
Reviewed-on: https://codeberg.org/johba/harb/pulls/108
This commit is contained in:
johba 2026-02-03 12:07:28 +01:00
parent 4277f19b68
commit e5e1308e72
45 changed files with 882 additions and 2627 deletions

View file

@ -11,13 +11,33 @@
3. **Compete** - Snatch undervalued positions to optimise returns.
## Operating the Stack
- Start everything with `nohup ./scripts/dev.sh start &` and stop via `./scripts/dev.sh stop`. Do not launch services individually.
- **Restart modes** for faster iteration:
- `./scripts/dev.sh restart --light` - Fast restart (~10-20s): only webapp + txnbot, preserves Anvil/Ponder state. Use for frontend changes.
- `./scripts/dev.sh restart --full` - Full restart (~3-4min): redeploys contracts, fresh state. Use for contract changes.
- Supported environments: `BASE_SEPOLIA_LOCAL_FORK` (default Anvil fork), `BASE_SEPOLIA`, and `BASE`. Match contract addresses and RPCs accordingly.
- The stack uses Docker containers orchestrated via docker-compose. The script boots Anvil, deploys contracts, seeds liquidity, starts Ponder, launches the landing site, and runs the txnBot. Wait for logs to settle before manual testing.
- **Prerequisites**: Docker Engine (Linux) or Colima (Mac). See installation instructions below.
### Quick Start
```bash
nohup ./scripts/dev.sh start & # start (takes ~3-6 min first time)
tail -f nohup.out # watch progress
./scripts/dev.sh health # verify all services healthy
./scripts/dev.sh stop # stop and clean up
```
Do not launch services individually — `dev.sh` enforces phased startup with health gates.
### Restart Modes
- `./scripts/dev.sh restart --light` — Fast (~10-20s): only webapp + txnbot, preserves Anvil/Ponder state. Use for frontend changes.
- `./scripts/dev.sh restart --full` — Full (~3-6min): redeploys contracts, fresh state. Use for contract changes.
### Common Pitfalls
- **Docker disk full**: `dev.sh start` refuses to run if Docker disk usage exceeds 20GB. Fix: `./scripts/dev.sh stop` (auto-prunes) or `docker system prune -af --volumes`.
- **Stale Ponder state**: If Ponder fails with schema errors after contract changes, delete its state: `rm -rf services/ponder/.ponder/` then `./scripts/dev.sh restart --full`.
- **kraiken-lib out of date**: If services fail with import errors or missing exports, rebuild: `./scripts/build-kraiken-lib.sh`. The dev script does this automatically on `start`, but manual rebuilds are needed if you change kraiken-lib while the stack is already running.
- **Container not found errors**: `dev.sh` expects Docker Compose v2 container names (`harb-anvil-1`, hyphens not underscores). Verify with `docker compose version`.
- **Port conflicts**: The stack uses ports 8545 (Anvil), 5173 (webapp), 5174 (landing), 42069 (Ponder), 43069 (txnBot), 8081 (Caddy). Check with `lsof -i :<port>` if startup fails.
- **npm ci failures in containers**: Named Docker volumes cache `node_modules/`. If dependencies change and installs fail, remove the volume: `docker volume rm harb_webapp_node_modules` (or similar), then restart.
### Environments
Supported: `BASE_SEPOLIA_LOCAL_FORK` (default Anvil fork), `BASE_SEPOLIA`, and `BASE`. Match contract addresses and RPCs accordingly.
### Prerequisites
Docker Engine (Linux) or Colima (Mac). See `docs/docker.md` for installation.
## Component Guides
- `onchain/` - Solidity + Foundry contracts, deploy scripts, and fuzzing helpers ([details](onchain/AGENTS.md)).
@ -30,13 +50,13 @@
- Contracts: run `forge build`, `forge test`, and `forge snapshot` inside `onchain/`.
- Fuzzing: scripts under `onchain/analysis/` (e.g., `./analysis/run-fuzzing.sh [optimizer] debugCSV`) generate replayable scenarios.
- Integration: after the stack boots, inspect Anvil logs, hit `http://localhost:8081/api/graphql` for Ponder, and poll `http://localhost:8081/api/txn/status` for txnBot health.
- **E2E Tests**: Playwright-based full-stack tests in `tests/e2e/` verify complete user journeys (mint ETH → swap KRK → stake). Run with `npm run test:e2e` from repo root. Tests use mocked wallet provider with Anvil accounts and automatically start/stop the stack. See `INTEGRATION_TEST_STATUS.md` and `SWAP_VERIFICATION.md` for details.
- **E2E Tests**: Playwright-based full-stack tests in `tests/e2e/` verify complete user journeys (mint ETH → swap KRK → stake). Run with `npm run test:e2e` from repo root. Tests use mocked wallet provider with Anvil accounts. In CI, the Woodpecker e2e pipeline runs these against pre-built service images.
## Version Validation System
- **Contract VERSION**: `Kraiken.sol` exposes a `VERSION` constant (currently v1) that must be incremented for breaking changes to TAX_RATES, events, or core data structures.
- **Ponder Validation**: On startup, Ponder reads the contract VERSION and validates against `COMPATIBLE_CONTRACT_VERSIONS` in `kraiken-lib/src/version.ts`. Fails hard (exit 1) on mismatch to prevent indexing wrong data.
- **Frontend Check**: Web-app validates `KRAIKEN_LIB_VERSION` at runtime (currently placeholder; future: query Ponder GraphQL for full 3-way validation).
- **CI Enforcement**: GitHub workflow validates that contract VERSION is in `COMPATIBLE_CONTRACT_VERSIONS` before merging PRs.
- **CI Enforcement**: Woodpecker `release.yml` pipeline validates that contract VERSION matches `COMPATIBLE_CONTRACT_VERSIONS` before release.
- See `VERSION_VALIDATION.md` for complete architecture, workflows, and troubleshooting.
## Docker Installation & Setup
@ -48,7 +68,8 @@
docker ps # verify installation
```
- **Container Orchestration**: `docker-compose.yml` has NO `depends_on` declarations. All service ordering is handled in `scripts/dev.sh` via phased startup with explicit health checks.
- **Startup Phases**: (1) Create all containers, (2) Start anvil+postgres and wait for healthy, (3) Start bootstrap and wait for completion, (4) Start ponder and wait for healthy, (5) Start webapp/landing/txn-bot, (6) Start caddy.
- **Startup Phases**: (1) Start anvil+postgres and wait for healthy, (2) Start bootstrap and wait for exit, (3) Start ponder and wait for healthy, (4) Start webapp/landing/txn-bot, (5) Start caddy, (6) Smoke test via `scripts/wait-for-service.sh`.
- **Shared Bootstrap**: Contract deployment, seeding, and funding logic lives in `scripts/bootstrap-common.sh`, sourced by both `containers/bootstrap.sh` (local dev) and `scripts/ci-bootstrap.sh` (CI). Constants (FEE_DEST, WETH, SWAP_ROUTER, default keys) are defined once there.
- **Logging Configuration**: All services have log rotation configured (max 10MB per file, 3 files max = 30MB per container) to prevent disk bloat. Logs are automatically rotated by Docker.
- **Disk Management** (Portable, No Per-Machine Setup Required):
- **20GB Hard Limit**: The stack enforces a 20GB total Docker disk usage limit (images + containers + volumes + build cache).
@ -98,8 +119,8 @@
- **Logs**: `journalctl -u woodpecker-server -f` (NOT `docker logs`)
### Pipeline Configs
- `.woodpecker/build-ci-images.yml` — Builds Docker CI images. Triggers on **push** to `master` or `feature/ci` when files in `docker/`, `.woodpecker/`, `kraiken-lib/`, `onchain/out/`, or `web-app/` change.
- `.woodpecker/e2e.yml` — Runs Playwright E2E tests. Triggers on **pull_request** to `master`.
- `.woodpecker/build-ci-images.yml` — Builds Docker CI images using unified `docker/Dockerfile.service-ci`. Triggers on **push** to `master` or `feature/ci` when files in `docker/`, `.woodpecker/`, `containers/`, `kraiken-lib/`, `onchain/`, `services/`, `web-app/`, or `landing/` change.
- `.woodpecker/e2e.yml` — Runs Playwright E2E tests. Bootstrap step sources `scripts/bootstrap-common.sh` for shared deploy/seed logic. Health checks use `scripts/wait-for-service.sh`. Triggers on **pull_request** to `master`.
- Pipeline numbering: even = build-ci-images (push events), odd = E2E (pull_request events). This is not guaranteed but was the observed pattern.
### Monitoring Pipelines via DB
@ -133,17 +154,47 @@ PGPASSWORD='<db_password>' psql -h 127.0.0.1 -U woodpecker -d woodpecker -c \
- **API auth limitation**: The server caches user token hashes in memory. Inserting a token directly into the DB does not work without restarting the server (`sudo systemctl restart woodpecker-server`).
### CI Docker Images
- `docker/Dockerfile.webapp-ci` — Webapp CI image with Vite dev server.
- **Symlinks fix** (lines 57-59): Creates `/web-app`, `/kraiken-lib`, `/onchain` symlinks to work around Vite's `removeBase()` stripping `/app/` prefix from filesystem paths.
- `docker/Dockerfile.service-ci` — Unified parameterized Dockerfile for all service CI images (ponder, webapp, landing, txnBot). Uses `--build-arg` for service-specific configuration (SERVICE_DIR, SERVICE_PORT, ENTRYPOINT_SCRIPT, NEEDS_SYMLINKS, etc.).
- **sync-tax-rates**: Builder stage runs `scripts/sync-tax-rates.mjs` to sync tax rates from `Stake.sol` into kraiken-lib before TypeScript compilation.
- **Symlinks fix** (webapp only, `NEEDS_SYMLINKS=true`): Creates `/web-app`, `/kraiken-lib`, `/onchain` symlinks to work around Vite's `removeBase()` stripping `/app/` prefix from filesystem paths.
- **CI env detection** (`CI=true`): Disables Vue DevTools plugin in `vite.config.ts` to prevent 500 errors caused by path resolution issues with `/app/` base path.
- **HEALTHCHECK**: `--retries=84 --interval=5s` = 420s (7 min) total wait, aligned with `wait-for-stack` step timeout.
- **HEALTHCHECK**: Configurable via build args; webapp uses `--retries=84 --interval=5s` = 420s (7 min), aligned with `wait-for-stack` step timeout.
- **Shared entrypoints**: Each service uses a unified entrypoint script (`containers/<service>-entrypoint.sh`) that branches on `CI=true` env var for CI vs local dev paths. Common helpers in `containers/entrypoint-common.sh`.
- **Shared bootstrap**: `scripts/bootstrap-common.sh` contains shared contract deployment, seeding, and funding functions used by both `containers/bootstrap.sh` (local dev) and `.woodpecker/e2e.yml` (CI).
- CI images are tagged with git SHA and `latest`, pushed to a local registry.
### CI Agent & Registry Auth
- **Agent**: Runs as user `ci` (uid 1001) on `harb-staging`, same host as the dev environment. Binary at `/usr/local/bin/woodpecker-agent`.
- **Registry credentials**: The `ci` user must have Docker auth configured at `/home/ci/.docker/config.json` to pull private images from `registry.niovi.voyage`. If images fail to pull with "no basic auth credentials", fix with:
```bash
sudo mkdir -p /home/ci/.docker
sudo cp /home/debian/.docker/config.json /home/ci/.docker/config.json
sudo chown -R ci:ci /home/ci/.docker
sudo chmod 600 /home/ci/.docker/config.json
```
- **Shared Docker daemon**: The `ci` and `debian` users share the same Docker daemon. Running `docker system prune` as `debian` removes images cached for CI pipelines. If CI image pulls fail after a prune, either fix registry auth (above) or pre-pull images as `debian`: `docker pull registry.niovi.voyage/harb/ponder-ci:latest` etc.
### CI Debugging Tips
- If pipelines aren't being created after a push, check Codeberg webhook delivery logs first.
- The Woodpecker server needs `sudo` to restart. Without it, you cannot: refresh API tokens, clear cached state, or recover from webhook auth issues.
- E2E pipeline failures often come from `wait-for-stack` timing out. Check the webapp HEALTHCHECK alignment and Ponder indexing time.
- The `web-app/vite.config.ts` `allowedHosts` array must include container hostnames (`webapp`, `caddy`) for health checks to succeed inside Docker networks.
- **Never use `bash -lc`** in Woodpecker pipeline commands — login shell resets PATH via `/etc/profile`, losing Foundry and other tools set by Docker ENV. Use `bash -c` instead.
## Codeberg API Access
- **Auth**: Codeberg API tokens are stored in `~/.netrc` (standard `curl --netrc` format, `chmod 600`):
```
machine codeberg.org
login johba
password <api-token>
```
The `password` field holds the API token — this is standard `.netrc` convention, not an actual password.
- **Generate tokens** at `https://codeberg.org/user/settings/applications`.
- **Usage**: Pass `--netrc` to curl for authenticated Codeberg API calls:
```bash
curl --netrc -s https://codeberg.org/api/v1/repos/johba/harb/issues | jq '.[0].title'
```
- **Note**: The repo uses SSH for git push/pull (`ssh://git@codeberg.org`), so `.netrc` is only used for REST API interactions (issues, PRs, releases).
## References
- Deployment history: `onchain/deployments-local.json`, `onchain/broadcast/`.