Troubleshooting
Start with glci doctor — it runs pre-flight checks on Docker, the daemon, your GitLab token, CI config, and git repository in one command:
glci doctor
Daemon logs
Check the daemon log first when something goes wrong:
glci daemon logs # last 50 lines
glci daemon logs -F # follow in real time
glci daemon logs -n 0 # full log
Or read directly:
cat ~/.glci/daemon.log
tail -f ~/.glci/daemon.log
Daemon won’t start
If glci run fails with “daemon did not start within 30s”:
| Cause | Fix |
|---|---|
| Stale socket file | rm ~/.glci/daemon.sock |
| Stale PID file | rm ~/.glci/daemon.pid |
| Permission error | Ensure ~/.glci/ is writable |
Force a clean restart:
glci daemon stop --force
glci daemon start
Daemon crashes or misbehaves
On startup, the daemon automatically recovers from a previous crash. If issues persist:
glci daemon status
glci daemon stop --force
glci daemon start
# Nuclear option: clean all daemon state
glci daemon stop --force
rm -rf ~/.glci/daemon.pid ~/.glci/daemon.sock ~/.glci/daemon.log
glci daemon start
Docker image missing or stale
If glci run fails with “glci:local Docker image not found” or “image is stale”:
make docker
The CLI checks that the glci:local image matches the binary’s build commit. After upgrading, always re-run make docker.
Segmentation faults on Apple Silicon (Colima)
If jobs crash with signal: segmentation fault (core dumped) from Go toolchain binaries, Colima needs Rosetta enabled:
colima stop
colima delete # needed if changing --vm-type
colima start --vm-type=vz --vz-rosetta --cpu 12 --memory 16
Pipeline hangs or won’t cancel
glci stop <pipeline-id>
glci stop <id> handles orphaned pipelines automatically — if the daemon lost track of a pipeline (e.g., after a restart), it force-removes leftover containers and the pipeline network, then marks it as canceled in history. Restarting the daemon is only needed if force-stop itself doesn’t resolve the issue:
glci daemon stop
glci daemon start
“Waiting for pipeline preparation to finish…”
What you see: Running glci run shows Waiting for pipeline preparation to finish... and does not start immediately.
Cause: The daemon serializes pipeline preparation per directory. Another glci run in the same project is already being prepared, so your request is queued until it finishes.
This is normal. The pipeline will start automatically once the earlier preparation completes. Press Ctrl+C to cancel if you don’t want to wait.
Config template errors
Template parse error
What you see: runner.config_template: invalid Go template: ... or runners.<name>.config_template: invalid Go template: ... in glci config output.
Cause: The Go template syntax is invalid (unclosed braces, unknown functions, etc.).
Fix: Check your template syntax. Common mistakes:
| Error | Fix |
|---|---|
| unexpected "}" in command | Missing opening {{ |
| function "xyz" not defined | Only standard Go template functions are available |
| unexpected EOF | Unclosed {{ block |
Template render error
What you see: rendering custom template for runner "<name>": ... in daemon logs.
Cause: The template references a field that doesn’t exist in the template context.
Fix: Use only fields from the template variable reference. Common fields: .URL, .Executor, .DefaultImage, .PullPolicy.
Named runner name invalid
What you see: runners.<name>: name must match [a-zA-Z0-9][a-zA-Z0-9_-]* in glci config output.
Fix: Runner names must start with a letter or digit and contain only letters, digits, hyphens, and underscores.
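To check a candidate name before adding it to the config, the documented pattern can be tested in a shell. This is a minimal sketch: is_valid_runner_name is a hypothetical helper, not a glci command, and it assumes the pattern applies to the whole name.

```shell
# Hypothetical helper (not a glci command): succeeds when the name matches
# the documented pattern [a-zA-Z0-9][a-zA-Z0-9_-]*, anchored to the whole name.
is_valid_runner_name() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-zA-Z0-9][a-zA-Z0-9_-]*$'
}

for name in gpu gpu-large_2 -gpu "my runner"; do
  if is_valid_runner_name "$name"; then
    echo "ok:      $name"
  else
    echo "invalid: $name"
  fi
done
```

The first two names pass; -gpu fails because of the leading hyphen, and "my runner" fails because of the space.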
Jobs not routing to named runner
What you see: A job runs on the default runner instead of the named runner.
Cause: None of the entries in the job’s tags: list matches a named runner. Tag matching is exact and case-sensitive.
Fix: Ensure the job has a tags: entry that exactly matches the runner name defined in [runners.<name>]:
# .gitlab-ci.yml
my-job:
  tags: [gpu] # must match [runners.gpu] in config
  script: echo "runs on GPU runner"
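The exact, case-sensitive matching described above can be sketched as a plain string comparison. route_job is a hypothetical helper, not part of glci. Which runner wins when several tags match is not stated here, so picking the first match is an assumption.

```shell
# Hypothetical helper (not a glci command).
# usage: route_job "<space-separated job tags>" "<space-separated runner names>"
# Prints the first runner whose name equals one of the tags, else "default".
# Assumption: the first match wins when several tags match.
route_job() {
  for tag in $1; do
    for runner in $2; do
      if [ "$tag" = "$runner" ]; then
        echo "$runner"
        return 0
      fi
    done
  done
  echo "default"
}

route_job "gpu" "gpu arm"         # prints: gpu
route_job "GPU" "gpu arm"         # prints: default (case differs)
route_job "docker arm" "gpu arm"  # prints: arm
```

The second call shows the common failure mode: a tag that differs only in case falls through to the default runner.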
Docker networking issues
Each build gets its own Docker network via FF_NETWORK_PER_BUILD. Job containers reach the mock server via extra_hosts (host-gateway). The runner and mock containers share the per-pipeline network for Docker DNS resolution. Problems here usually surface as connection timeouts or host resolution failures inside jobs.
Container can’t reach the mock server
What you see: Jobs fail with errors like connection refused, no such host, or could not resolve host when trying to reach $CI_SERVER_URL or $CI_REGISTRY.
| Cause | Fix |
|---|---|
| Mock server container crashed | Check glci daemon logs for mock server not healthy errors. Restart the daemon. |
| Network was removed mid-run | Run glci daemon stop --force && glci daemon start to recreate networks. |
| Firewall blocking container-to-container traffic | On Linux, check iptables -L -n for DROP rules on the docker0 or br-* interfaces. |
Debugging steps:
# List pipeline networks
docker network ls --filter name=glci-net-
# Inspect a specific pipeline network — look for Containers section
docker network inspect glci-net-<pipeline-id>
# Check if the mock server container is running and attached
docker ps --filter name=glci-mock
DNS resolution failures inside jobs
What you see: getaddrinfo or DNS resolution failed errors for external hosts (e.g., registry.gitlab.com, github.com).
Cause: The job container’s DNS resolver can’t reach an upstream DNS server. Common with custom Docker networks and restricted host DNS configs.
Fix: Check the host’s /etc/resolv.conf or Docker’s DNS settings. If using Colima, restart with --dns to override:
colima start --dns 8.8.8.8 --dns 1.1.1.1
Or configure extra hosts in .glciconfig.toml to bypass DNS for known hosts:
[network.extra_hosts]
entries = ["internal-registry.corp:10.0.0.50"]
Port conflicts
What you see: address already in use errors in the daemon log when starting a pipeline.
Cause: The mock server requires port 39741 (default) on the Docker host. This port may be taken by another process or a previous daemon that didn’t shut down cleanly. Registry listeners can also conflict if their bind addresses overlap.
Fix:
# Find what's using the port
lsof -i :39741
# Force-restart the daemon to clean up stale listeners
glci daemon stop --force
glci daemon start
You can change the mock server port or registry bind addresses in ~/.glci/config.toml:
[network]
mock_server_port = 39741 # default; change requires daemon restart
registry_bind = "127.0.0.1:0" # HTTPS listener
registry_http_bind = "0.0.0.0:0" # HTTP listener
Note: Changing mock_server_port requires a full daemon restart to take effect.
Token & authentication failures
GitLab token not found
What you see: gitlab token not set (set GITLAB_TOKEN env var or use --token) or no GitLab token or project configured, using offline parser.
Cause: No token is available, so glci falls back to the offline parser. This means include: project:, include: component:, and remote CI/CD variable fetching are all disabled.
Fix: Configure a token using any of these methods (first match wins):
# Option 1: environment variable
export GITLAB_TOKEN="glpat-xxxxxxxxxxxxxxxxxxxx"
# Option 2: glab CLI (token is picked up automatically)
glab auth login
# Option 3: config file
cat >> .glciconfig.toml <<'EOF'
[gitlab]
token = "$GITLAB_TOKEN" # env var references are expanded
EOF
# Option 4: command-line flag
glci run --token "glpat-xxxxxxxxxxxxxxxxxxxx"
For self-managed GitLab instances, also set the URL:
export GITLAB_URL="https://gitlab.example.com"
# or in .glciconfig.toml:
# [gitlab]
# url = "https://gitlab.example.com"
CI_JOB_TOKEN permissions differ from production
What you see: API calls inside jobs succeed locally but fail in real CI (or vice versa), because CI_JOB_TOKEN in glci is actually your personal access token (with different scopes).
Cause: glci forwards your host GitLab token as CI_JOB_TOKEN. In production, CI_JOB_TOKEN is a short-lived token scoped to the job.
Fix:
# Disable token forwarding entirely
glci run --no-token
# Or test with reduced secrets
glci run --secrets none
“API returned 401” or “API returned 403” when fetching variables
What you see: Warnings in daemon logs like could not fetch project variables: API returned 401.
| Cause | Fix |
|---|---|
| Token expired or revoked | Generate a new PAT on GitLab and update GITLAB_TOKEN |
| Token lacks api or read_api scope | Create a token with at least read_api scope |
| Wrong project detected | Override with glci run --project group/subgroup/project |
| Self-hosted GitLab, wrong URL | Set [gitlab] url in .glciconfig.toml |
Secrets cache is stale
What you see: Variable values don’t match what you see on GitLab, even after updating them.
Cause: Remote variables are cached in daemon memory for 6 hours by default.
Fix:
# Force a fresh fetch for this run
glci run --refresh-secrets
Or change the TTL in .glciconfig.toml:
[gitlab]
secrets_ttl = "0" # disable caching entirely
Registry & image issues
Push fails with “unknown blob” or “manifest invalid”
What you see: docker push $CI_REGISTRY_IMAGE fails with errors about unknown blobs or invalid manifests.
Cause: The embedded registry lost its blob storage (e.g., after a docker volume rm glci-registry or glci system prune --all).
Fix:
# Rebuild and re-push — the registry volume was wiped
glci registry clean
glci run
“image blobs not found in local registry”
What you see: glci registry pull fails with image blobs not found in local registry (image may have been proxied from upstream without caching blobs — re-push the image to persist it).
Cause: The image was pulled through the registry as a read-through cache hit. The manifest is stored locally but the blobs were streamed directly from upstream. Only images explicitly pushed to $CI_REGISTRY have their blobs stored.
Fix: Re-push the image from your pipeline (use docker push $CI_REGISTRY_IMAGE/...) so blobs are stored locally.
Insecure registry / TLS certificate errors
What you see: x509: certificate signed by unknown authority or server gave HTTP response to HTTPS client when pulling from or pushing to the embedded registry.
Cause: The embedded registry uses a self-signed CA. glci automatically configures trust, but some scenarios break it:
| Cause | Fix |
|---|---|
| DinD service container doesn’t trust the CA | glci should inject certs automatically. Check daemon logs for warning: writing CA cert errors. Restart daemon. |
| Colima VM doesn’t have the CA | Restart daemon — it installs CA certs via colima ssh on startup. |
| Buildkit/buildx doesn’t trust the CA | glci injects buildkitd.toml with the registry marked as insecure. If this fails, check glci daemon logs for buildkit config errors. |
| Stale certs after daemon restart | Delete cert dirs and restart: rm -rf ~/.glci/registry-certs* && glci daemon stop --force && glci daemon start |
Cross-platform image issues
What you see: exec format error when running containers, or build failures with wrong-architecture binaries.
Cause: The image was built for a different CPU architecture (e.g., amd64 image on Apple Silicon).
Fix:
# Ensure QEMU binfmt handlers are registered (glci does this automatically)
docker run --privileged --rm tonistiigi/binfmt --install all
# For Colima with Rosetta (preferred for Apple Silicon)
colima stop
colima delete
colima start --vm-type=vz --vz-rosetta --cpu 12 --memory 16
If glci daemon logs shows warning: could not install QEMU binfmt handlers, automatic QEMU registration failed. In that case, run the docker run --privileged command above manually.
$CI_REGISTRY images fail with wrong platform
What you see: Jobs that use $CI_REGISTRY/... as their image fail with no matching manifest for linux/arm64/v8 (or another platform) even though the image exists on the real registry.
Pulling docker image 127.0.0.1:32768/group/project/image:v1.0 ...
ERROR: Job failed: failed to pull image "127.0.0.1:32768/group/project/image:v1.0"
with specified policies [if-not-present]: Error response from daemon:
no matching manifest for linux/arm64/v8 in the manifest list entries
Cause: $CI_REGISTRY resolves to the embedded registry (127.0.0.1:<port>), which proxies the image from upstream. When the upstream image is a multi-arch manifest list, the local Docker daemon selects its native platform (e.g., linux/arm64 on Apple Silicon), but the image may only provide linux/amd64 manifests.
This is common with projects that reference $CI_REGISTRY images to avoid external dependencies.
Fix: Two approaches:
- Push multi-arch images: if you control the upstream images, build and push them as multi-arch manifests (e.g., with docker buildx build --platform linux/amd64,linux/arm64). This fixes the problem at the source for all consumers.
- Use per-job field overrides to force the correct platform without touching .gitlab-ci.yml:
# .glciconfig.toml
# Nested form
[jobs."renovate_validate".image.docker]
platform = "linux/amd64"
# Or flat dotted key form (equivalent)
[jobs."renovate_validate"]
"image.docker.platform" = "linux/amd64"
# Or override the image entirely to bypass the registry proxy
[jobs."renovate_validate".image]
name = "registry.gitlab.com/group/project/image:v1.0"
Push-through mirror failures
What you see: docker push succeeds locally but the image doesn’t appear on the upstream registry, or pushes fail with setting push-through config: HTTP 4xx/5xx.
| Cause | Fix |
|---|---|
| No upstream credentials configured | Add [registry.upstream] with username and password in .glciconfig.toml |
| Token lacks write_registry scope | Create a deploy token or PAT with write_registry |
| Upstream registry unreachable | Check connectivity: curl -s https://registry.gitlab.com/v2/ |
# .glciconfig.toml
[registry]
push_through = true
[registry.upstream]
username = "deploy-token"
password = "$REGISTRY_WRITE_TOKEN"
Remote Docker issues
When DOCKER_HOST points to a remote machine (TCP or SSH), the glci daemon and the Docker daemon run on different hosts. This changes how bind mounts, networking, and port forwarding work.
Bind mount failures
What you see: Job containers start but files are missing, or mounts fail with no such file or directory.
Cause: Bind mounts reference paths on the Docker host, not your local machine. When Docker is remote, /Users/you/project doesn’t exist on the remote host.
Fix: glci works around this by uploading your project as a tarball into the mock server container, so standard job execution works. Custom volume mounts in your CI config, however, still resolve against the remote host’s filesystem, so those paths must exist there.
Localhost port forwarding doesn’t work
What you see: Jobs try to reach 127.0.0.1:<port> for the mock server or registry, but get connection refused.
Cause: 127.0.0.1 inside a container on the remote Docker host refers to that container’s loopback, not your local machine. The mock server runs as a container on the remote host and is reachable by container name on the pipeline network, not via localhost.
Fix: This should work automatically — glci connects mock and runner containers to the same Docker network. If it doesn’t, check glci daemon logs for network errors.
Docker context detection
glci resolves the Docker host at daemon startup with this priority:
- [docker] host in .glciconfig.toml (highest)
- Docker context (docker context inspect)
- DOCKER_HOST environment variable
- Default Docker socket
Set the host explicitly in config to override all auto-detection:
[docker]
host = "ssh://my-server"
Check what’s currently detected with glci config show --network.
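The priority order above amounts to "first non-empty source wins". The sketch below illustrates that rule only. resolve_docker_host is a hypothetical helper, not glci's implementation, and the fallback socket path is an assumption (the usual local Docker socket on Linux).

```shell
# Hypothetical helper (not a glci command).
# usage: resolve_docker_host "<config host>" "<context host>" "<DOCKER_HOST value>"
# Pass an empty string for any source that is not set.
resolve_docker_host() {
  for candidate in "$1" "$2" "$3"; do
    if [ -n "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  # Assumed default: the standard local Docker socket path on Linux.
  echo "unix:///var/run/docker.sock"
}

resolve_docker_host "ssh://my-server" "unix:///tmp/ctx.sock" "tcp://10.0.0.5:2375"  # prints: ssh://my-server
resolve_docker_host "" "" "tcp://10.0.0.5:2375"  # prints: tcp://10.0.0.5:2375
resolve_docker_host "" "" ""                     # prints: unix:///var/run/docker.sock
```

In the first call a stray DOCKER_HOST in the environment has no effect, because the config entry sits above it in the priority list.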
Variable resolution issues
Variables not resolving
What you see: Job scripts contain literal $MY_VARIABLE instead of its value, or variables are empty.
Cause: The variable isn’t defined at any level, or it’s defined at a lower-precedence level and overridden to empty.
Variable precedence (lowest to highest):
| Priority | Source | How to set |
|---|---|---|
| 1 (lowest) | CI-derived | Automatic (git SHA, branch, etc.) |
| 2 | Global YAML | variables: at top of .gitlab-ci.yml |
| 3 | Group variables | Fetched from GitLab API |
| 4 | Project variables | Fetched from GitLab API |
| 5 | --env flags | glci run --env KEY=VALUE |
| 6 | --env-file | glci run --env-file .env.local |
| 7 | .glci.env | Auto-loaded from project root |
| 8 | Dotenv artifacts | artifacts: reports: dotenv: from dependency jobs |
| 9 (highest) | Job YAML | variables: inside a job definition |
Debugging steps:
# Override a specific variable for testing
glci run --env MY_VARIABLE=test_value
# Use --secrets none to test without remote variables
glci run --secrets none --env MY_VARIABLE=test_value
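To see how a higher-precedence layer can blank out a value, the precedence table can be modelled as KEY=VALUE files merged from lowest to highest. This is a rough sketch, not glci's resolver: merge_layers and the file names are hypothetical, and each layer is assumed to be a flat KEY=VALUE file.

```shell
# Hypothetical helper (not a glci command).
# usage: merge_layers lowest.env ... highest.env
# Each file holds KEY=VALUE lines; a key in a later file replaces the same key
# from an earlier file, even when the later value is empty.
merge_layers() {
  awk '
    {
      eq = index($0, "=")
      if (eq < 2) next                      # skip lines without a KEY= prefix
      key = substr($0, 1, eq - 1)
      if (!(key in val)) order[++n] = key   # remember first-seen order
      val[key] = substr($0, eq + 1)
    }
    END { for (i = 1; i <= n; i++) print order[i] "=" val[order[i]] }
  ' "$@"
}

dir=$(mktemp -d)
printf 'MY_VARIABLE=from-project\nREGION=eu\n' > "$dir/project.env"
printf 'MY_VARIABLE=\n' > "$dir/job.env"   # higher layer sets it to empty
merge_layers "$dir/project.env" "$dir/job.env"   # prints MY_VARIABLE= and REGION=eu
rm -rf "$dir"
```

The project-level value is lost because the job-level layer defines the same key with an empty value. This is the "overridden to empty" case from the cause above.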
Dotenv variables not appearing in downstream jobs
What you see: A producer job creates a dotenv report artifact but the consumer job does not have the expected variables.
| Cause | Fix |
|---|---|
| Consumer in the same stage with no needs: | Dotenv vars propagate automatically from prior stages. For same-stage producers, add needs: [producer] |
| Dotenv file has invalid format | Keys must match [a-zA-Z_][a-zA-Z0-9_]*. Lines with invalid keys are skipped silently |
| Too many variables | Only the first 20 variables are kept (matching GitLab’s default limit) |
| File too large | Decompressed dotenv file must be under 5 MB |
| Parse error in dotenv file | Check daemon logs for warning: failed to parse dotenv artifact messages |
Note: needs: { job: producer, artifacts: false } blocks file artifact downloads but dotenv variables still propagate — matching GitLab CI behavior.
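To preview which lines of a dotenv report would survive the key and count rules in the table, a filter along these lines may help. filter_dotenv is a hypothetical helper, not glci's parser. It assumes the 20-variable limit counts only lines with valid keys, and it does not model the size check.

```shell
# Hypothetical helper (not a glci command).
# usage: filter_dotenv report.env
# Keeps lines whose key matches [a-zA-Z_][a-zA-Z0-9_]*, drops the rest
# silently, and stops printing after 20 accepted variables.
filter_dotenv() {
  LC_ALL=C awk '
    /^[a-zA-Z_][a-zA-Z0-9_]*=/ {
      if (++kept <= 20) print
    }
  ' "$1"
}

report=$(mktemp)
printf 'VERSION=1.2.3\n2FAST=no\nBUILD_ID=42\nnot a variable\n' > "$report"
filter_dotenv "$report"   # prints VERSION=1.2.3 and BUILD_ID=42
rm -f "$report"
```

Comparing the filtered output with the original file shows which lines were dropped without a warning.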
.glci.env not loading
What you see: Variables defined in .glci.env are not available in jobs.
| Cause | Fix |
|---|---|
| File is in the wrong directory | .glci.env must be in the project root (same directory as .gitlab-ci.yml) |
| File has syntax errors | Each line must be KEY=VALUE. No spaces around =. No quotes needed. |
| File has a BOM or wrong line endings | Save as UTF-8 without BOM, with LF line endings |
Example .glci.env:
MY_SECRET=s3cr3t
DEPLOY_TOKEN=glpat-xxxx
DB_PASSWORD=hunter2
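The format problems listed in the table are hard to spot by eye, so a quick check like the following can help. check_env_file is a hypothetical helper, not a glci command. It uses only standard tools (head, od, grep) and covers just the BOM, line-ending, and spacing rules.

```shell
# Hypothetical helper (not a glci command).
# usage: check_env_file .glci.env
# Prints one line per problem found and returns non-zero if there is any.
check_env_file() {
  problems=0
  # A UTF-8 BOM is the byte sequence EF BB BF at the very start of the file.
  if [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
    echo "BOM found at start of file"
    problems=1
  fi
  # CRLF line endings leave a carriage return before each newline.
  if LC_ALL=C grep -q "$(printf '\r')\$" "$1"; then
    echo "CRLF line endings found"
    problems=1
  fi
  # A space directly before or after the first "=" on a line.
  if LC_ALL=C grep -Eq '^[^=]* =|^[^=]*= ' "$1"; then
    echo "spaces around = found"
    problems=1
  fi
  return "$problems"
}

envfile=$(mktemp)
printf 'MY_SECRET=s3cr3t\r\nDB_PASSWORD = hunter2\r\n' > "$envfile"
check_env_file "$envfile" || echo "fix the file before running glci"
rm -f "$envfile"
```

For the sample file this reports the CRLF line endings and the spaces around =, then prints the reminder.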
--secrets none still shows some variables
What you see: Variables like CI_REGISTRY, CI_PROJECT_PATH, etc. are present even with --secrets none.
Cause: --secrets none only disables fetching remote variables from the GitLab API (project and group variables). CI-derived variables (git info, registry URLs) and YAML-defined variables are always resolved.
Rule evaluation issues
Jobs unexpectedly skipped
What you see: no jobs to run: requested [job-name] but none matched after rules evaluation or a job you expected is missing from glci show.
Cause: Rules evaluated to when: never in the simulated context. By default, glci simulates a merge_request context.
Debugging steps:
# See which jobs are included in the default context
glci show
# Compare with a different context
glci show --context branch=main
glci show --context tag=v1.0
# Check a specific job
glci jobs # lists all jobs and their when: status
Common reasons jobs are excluded:
| Rule | Why it doesn’t match | Fix |
|---|---|---|
| if: $CI_PIPELINE_SOURCE == "push" | Default context is merge_request | Use --context branch=main |
| if: $CI_COMMIT_TAG | No tag in context | Use --context tag=v1.0 |
| if: $CI_COMMIT_BRANCH == "main" | Your branch isn’t main | Use --context branch=main |
| changes: [path/**] | No diff context available | changes: matches everything when no diff is available, so this usually isn’t the problem. Check other clauses. |
| exists: [file.txt] | File doesn’t exist locally | Create the file or adjust the rule |
Jobs unexpectedly included
What you see: A job that should be skipped (e.g., deploy jobs) runs anyway.
Cause: The default merge_request context may match rules that your real CI wouldn’t, or rules: changes: matches everything because glci has no diff context.
Fix:
# Simulate the exact context you want
glci run --context branch=feature-x
# Skip specific jobs by name
glci run --skip "deploy*"
Context simulation not matching real CI
What you see: Jobs appear in glci show that don’t appear in the GitLab pipeline (or vice versa).
Cause: glci evaluates rules locally with the simulated context. Some differences from real CI:
- $CI_PIPELINE_SOURCE is set to push (for branch/tag) or merge_request_event (for merge_request context); your real pipeline may have a different source
- rules: changes: may behave differently because glci uses local git diff while GitLab compares against the target branch
- Protected/unprotected branch distinctions are not enforced locally
Fix: Use --context and --env together to match your real CI environment as closely as possible:
glci show --context merge_request --mr-source feature --mr-target main
Performance tips
Slow image pulls
The embedded registry acts as a pull-through cache – the first pull is slow but subsequent pulls are instant.
Slow startup due to secrets fetch
Remote variable fetching adds latency to pipeline startup. If you don’t need remote secrets:
glci run --secrets none # skip all remote variable fetching
glci run --secrets project # skip group variables (the slower fetch, due to pagination)
Reducing disk usage
glci system df # check what glci is using
glci system cache clean # wipe CI cache
glci system prune # clean unused containers, networks, volumes
glci system prune --all # also remove registry data and history
Known limitations
- CI_JOB_TOKEN is forwarded from the host’s GitLab token, so permissions may differ from production. Use --no-token to disable.
- include: project: and include: component: require a GitLab API token (other include types work offline). Component version selectors (@~latest, @~N, @~N.M) are supported and resolve to the latest matching semver release tag.
- Protected variables are fetched regardless of branch protection status.
- Kubernetes executor is not supported — only Docker.
- Child pipelines: trigger: include: supports up to 2 levels of nesting (matching GitLab).
- Cross-project triggers: trigger: project: resolves from local directories or clones from GitLab.
- Crash recovery requires both mock server and runner containers to survive. Secrets are held only in daemon memory (mlocked, never written to disk) and are lost on daemon restart, so resumed pipelines always re-fetch remote variables from the GitLab API.