Troubleshooting

Start with glci doctor — it runs pre-flight checks on Docker, the daemon, your GitLab token, CI config, and git repository in one command:

glci doctor

Daemon logs

Check the daemon log first when something goes wrong:

glci daemon logs        # last 50 lines
glci daemon logs -F     # follow in real time
glci daemon logs -n 0   # full log

Or read directly:

cat ~/.glci/daemon.log
tail -f ~/.glci/daemon.log

Daemon won’t start

If glci run fails with “daemon did not start within 30s”:

Cause	Fix
Socket conflict (stale socket file)	`rm ~/.glci/daemon.sock`
Stale PID file	`rm ~/.glci/daemon.pid`
Permission error	Ensure `~/.glci/` is writable
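
Before deleting files by hand, it can help to confirm that the PID file really is stale. The sketch below assumes that ~/.glci/daemon.pid holds a single numeric process ID, which this page does not state. pid_file_is_stale is a hypothetical helper, not a glci command.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when the PID file names a process
# that is no longer running. Assumes the file holds one numeric PID.
pid_file_is_stale() {
  pid_file=$1
  [ -f "$pid_file" ] || return 1          # no file, so nothing is stale
  pid=$(tr -d '[:space:]' < "$pid_file")
  case "$pid" in
    ''|*[!0-9]*) return 0 ;;              # empty or non-numeric: treat as stale
  esac
  # kill -0 sends no signal; it only tests whether the process can be signalled.
  # It also fails for processes owned by another user, so run this as the
  # user that owns the daemon.
  if kill -0 "$pid" 2>/dev/null; then
    return 1                              # a process with this PID is running
  fi
  return 0
}
```

For example, pid_file_is_stale ~/.glci/daemon.pid && rm ~/.glci/daemon.pid removes the file only when no matching process is alive.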

Force a clean restart:

glci daemon stop --force
glci daemon start

Daemon crashes or misbehaves

On startup, the daemon automatically recovers from a previous crash. If issues persist:

glci daemon status
glci daemon stop --force
glci daemon start

# Nuclear option: clean all daemon state
glci daemon stop --force
rm -rf ~/.glci/daemon.pid ~/.glci/daemon.sock ~/.glci/daemon.log
glci daemon start

Docker image missing or stale

If glci run fails with “glci:local Docker image not found” or “image is stale”:

make docker

The CLI checks that the glci:local image matches the binary’s build commit. After upgrading, always re-run make docker.

Segmentation faults on Apple Silicon (Colima)

If jobs crash with `signal: segmentation fault (core dumped)` from Go toolchain binaries, Colima needs Rosetta enabled:

colima stop
colima delete  # needed if changing --vm-type
colima start --vm-type=vz --vz-rosetta --cpu 12 --memory 16

Pipeline hangs or won’t cancel

glci stop <pipeline-id>

glci stop <id> handles orphaned pipelines automatically — if the daemon lost track of a pipeline (e.g., after a restart), it force-removes leftover containers and the pipeline network, then marks it as canceled in history. Restarting the daemon is only needed if force-stop itself doesn’t resolve the issue:

glci daemon stop
glci daemon start

“Waiting for pipeline preparation to finish…”

What you see: Running glci run shows `Waiting for pipeline preparation to finish...` and the pipeline does not start immediately.

Cause: The daemon serializes pipeline preparation per directory. Another glci run in the same project is already being prepared, so your request is queued until it finishes.

This is normal. The pipeline will start automatically once the earlier preparation completes. Press Ctrl+C to cancel if you don’t want to wait.

Config template errors

Template parse error

What you see: `runner.config_template: invalid Go template: ...` or `runners.<name>.config_template: invalid Go template: ...` in glci config output.

Cause: The Go template syntax is invalid (unclosed braces, unknown functions, etc.).

Fix: Check your template syntax. Common mistakes:

Error	Fix
`unexpected "}" in command`	Missing opening `{{`
`function "xyz" not defined`	Only standard Go template functions are available
`unexpected EOF`	Unclosed `{{` block

Template render error

What you see: `rendering custom template for runner "<name>": ...` in daemon logs.

Cause: The template references a field that doesn’t exist in the template context.

Fix: Use only fields from the template variable reference. Common fields: .URL, .Executor, .DefaultImage, .PullPolicy.

Named runner name invalid

What you see: `runners.<name>: name must match [a-zA-Z0-9][a-zA-Z0-9_-]*` in glci config output.

Fix: Runner names must start with a letter or digit and contain only letters, digits, hyphens, and underscores.
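
To check a candidate name before editing the config, the pattern from the error message can be applied directly. This is a minimal sketch in POSIX shell; is_valid_runner_name is a hypothetical helper, and glci's own validation remains the authority.

```shell
#!/bin/sh
# Sketch: test a name against [a-zA-Z0-9][a-zA-Z0-9_-]*
# The body runs in a subshell so the locale change stays local.
is_valid_runner_name() (
  LC_ALL=C                            # keep bracket ranges ASCII-only
  case "$1" in
    '')               exit 1 ;;       # empty name
    [!a-zA-Z0-9]*)    exit 1 ;;       # first character must be a letter or digit
    *[!a-zA-Z0-9_-]*) exit 1 ;;       # only letters, digits, '_' and '-' allowed
  esac
  exit 0
)
```

With this helper, gpu and my-runner_2 pass, while -gpu, _gpu, and "gpu runner" are rejected.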

Jobs not routing to named runner

What you see: A job runs on the default runner instead of the named runner.

Cause: None of the job’s tags: entries matches a named runner name. Tag matching is exact and case-sensitive.

Fix: Ensure the job has a tags: entry that exactly matches the runner name defined in [runners.<name>]:

# .gitlab-ci.yml
my-job:
  tags: [gpu]    # must match [runners.gpu] in config
  script: echo "runs on GPU runner"

Docker networking issues

Each build gets its own Docker network via FF_NETWORK_PER_BUILD. Job containers reach the mock server via extra_hosts (host-gateway). The runner and mock containers share the per-pipeline network for Docker DNS resolution. Problems here usually surface as connection timeouts or host resolution failures inside jobs.

Container can’t reach the mock server

What you see: Jobs fail with errors like `connection refused`, `no such host`, or `could not resolve host` when trying to reach $CI_SERVER_URL or $CI_REGISTRY.

Cause	Fix
Mock server container crashed	Check `glci daemon logs` for `mock server not healthy` errors. Restart the daemon.
Network was removed mid-run	Run `glci daemon stop --force && glci daemon start` to recreate networks.
Firewall blocking container-to-container traffic	On Linux, check `iptables -L -n` for DROP rules on the `docker0` or `br-*` interfaces.

Debugging steps:

# List pipeline networks
docker network ls --filter name=glci-net-

# Inspect a specific pipeline network — look for Containers section
docker network inspect glci-net-<pipeline-id>

# Check if the mock server container is running and attached
docker ps --filter name=glci-mock

DNS resolution failures inside jobs

What you see: `getaddrinfo` or `DNS resolution failed` errors for external hosts (e.g., registry.gitlab.com, github.com).

Cause: The job container’s DNS resolver can’t reach an upstream DNS server. Common with custom Docker networks and restricted host DNS configs.

Fix: Check the host’s /etc/resolv.conf or Docker’s DNS settings. If using Colima, restart with --dns to override:

colima start --dns 8.8.8.8 --dns 1.1.1.1

Or configure extra hosts in .glciconfig.toml to bypass DNS for known hosts:

[network.extra_hosts]
entries = ["internal-registry.corp:10.0.0.50"]

Port conflicts

What you see: `address already in use` errors in the daemon log when starting a pipeline.

Cause: The mock server requires port 39741 (default) on the Docker host. This port may be taken by another process or a previous daemon that didn’t shut down cleanly. Registry listeners can also conflict if their bind addresses overlap.

Fix:

# Find what's using the port
lsof -i :39741

# Force-restart the daemon to clean up stale listeners
glci daemon stop --force
glci daemon start

You can change the mock server port or registry bind addresses in ~/.glci/config.toml:

[network]
mock_server_port = 39741           # default; change requires daemon restart
registry_bind = "127.0.0.1:0"      # HTTPS listener
registry_http_bind = "0.0.0.0:0"   # HTTP listener

Note: Changing mock_server_port requires a full daemon restart to take effect.

Token & authentication failures

GitLab token not found

What you see: `gitlab token not set (set GITLAB_TOKEN env var or use --token)` or `no GitLab token or project configured, using offline parser`.

Cause: No token is available, so glci falls back to the offline parser. This means include: project:, include: component:, and remote CI/CD variable fetching are all disabled.

Fix: Configure a token using any of these methods (first match wins):

# Option 1: environment variable
export GITLAB_TOKEN="glpat-xxxxxxxxxxxxxxxxxxxx"

# Option 2: glab CLI (token is picked up automatically)
glab auth login

# Option 3: config file
cat >> .glciconfig.toml <<'EOF'
[gitlab]
token = "$GITLAB_TOKEN"   # env var references are expanded
EOF

# Option 4: command-line flag
glci run --token "glpat-xxxxxxxxxxxxxxxxxxxx"

For self-managed GitLab instances, also set the URL:

export GITLAB_URL="https://gitlab.example.com"
# or in .glciconfig.toml:
# [gitlab]
# url = "https://gitlab.example.com"

CI_JOB_TOKEN permissions differ from production

What you see: API calls inside jobs succeed locally but fail in real CI (or vice versa), because CI_JOB_TOKEN in glci is actually your personal access token (with different scopes).

Cause: glci forwards your host GitLab token as CI_JOB_TOKEN. In production, CI_JOB_TOKEN is a short-lived token scoped to the job.

Fix:

# Disable token forwarding entirely
glci run --no-token

# Or test with reduced secrets
glci run --secrets none

“API returned 401” or “API returned 403” when fetching variables

What you see: Warnings in daemon logs like `could not fetch project variables: API returned 401`.

Cause	Fix
Token expired or revoked	Generate a new PAT on GitLab and update `GITLAB_TOKEN`
Token lacks `api` or `read_api` scope	Create a token with at least `read_api` scope
Wrong project detected	Override with `glci run --project group/subgroup/project`
Self-hosted GitLab, wrong URL	Set `[gitlab] url` in `.glciconfig.toml`

Instance-level CI/CD variables aren’t applied

What you see: A variable defined as an instance-level CI/CD variable on a self-managed GitLab instance isn’t set in your jobs, or an include: that references it (e.g. project: $_GITLAB_TEMPLATES_REPO) fails with HTTP 404 because the variable wasn’t expanded.

Cause: glci fetches instance variables from GET /admin/ci/variables, which requires an administrator token. With a non-admin token GitLab returns 403 and glci skips them (could not fetch instance variables in the daemon log). Project and group variables are unaffected. Separately, remote variables are fetched in parallel with config parsing, so even an admin-fetched instance variable is not available for include: path expansion.

Fix: Supply the variable locally so it is available everywhere, including include resolution:

# .glci.env (gitignored), or --env KEY=VALUE, or a pipeline preset
_GITLAB_TEMPLATES_REPO=project/gitlab_templates
_GITLAB_TEMPLATES_REF=1.0.x

Instance variables are fetched under --secrets all (the default for glci run). See Variables & Secrets for the full precedence order and include-expansion behavior.

Secrets cache is stale

What you see: Variable values don’t match what you see on GitLab, even after updating them.

Cause: Remote variables are cached in daemon memory for 6 hours by default.

Fix:

# Force a fresh fetch for this run
glci run --refresh-secrets

Or change the TTL in .glciconfig.toml:

[gitlab]
secrets_ttl = "0"   # disable caching entirely

Registry & image issues

Push fails with “unknown blob” or “manifest invalid”

What you see: `docker push $CI_REGISTRY_IMAGE` fails with errors about unknown blobs or invalid manifests.

Cause: The embedded registry lost its blob storage (e.g., after a docker volume rm glci-registry or glci system prune --all).

Fix:

# Rebuild and re-push — the registry volume was wiped
glci registry clean
glci run

“image blobs not found in local registry”

What you see: glci registry pull fails with image blobs not found in local registry (image may have been proxied from upstream without caching blobs — re-push the image to persist it).

Cause: The image was pulled through the registry as a read-through cache hit. The manifest is stored locally but the blobs were streamed directly from upstream. Only images explicitly pushed to $CI_REGISTRY have their blobs stored.

Fix: Re-push the image from your pipeline (use docker push $CI_REGISTRY_IMAGE/...) so blobs are stored locally.

Insecure registry / TLS certificate errors

What you see: `x509: certificate signed by unknown authority` or `server gave HTTP response to HTTPS client` when pulling from or pushing to the embedded registry.

Cause: The embedded registry uses a self-signed CA. glci automatically configures trust, but some scenarios break it:

Cause	Fix
DinD service container doesn’t trust the CA	glci should inject certs automatically. Check daemon logs for `warning: writing CA cert` errors. Restart daemon.
Colima VM doesn’t have the CA	Restart daemon — it installs CA certs via `colima ssh` on startup.
Buildkit/buildx doesn’t trust the CA	glci injects `buildkitd.toml` with the registry marked as insecure. If this fails, check `glci daemon logs` for buildkit config errors.
Stale certs after daemon restart	Delete cert dirs and restart: `rm -rf ~/.glci/registry-certs* && glci daemon stop --force && glci daemon start`

Cross-platform image issues

What you see: `exec format error` when running containers, or build failures with wrong-architecture binaries.

Cause: The image was built for a different CPU architecture (e.g., amd64 image on Apple Silicon).

Fix:

# Ensure QEMU binfmt handlers are registered (glci does this automatically)
docker run --privileged --rm tonistiigi/binfmt --install all

# For Colima with Rosetta (preferred for Apple Silicon)
colima stop
colima delete
colima start --vm-type=vz --vz-rosetta --cpu 12 --memory 16

If glci daemon logs shows warning: could not install QEMU binfmt handlers, QEMU registration failed. Run the docker run --privileged command above manually.

`$CI_REGISTRY` images fail with wrong platform

What you see: Jobs that use $CI_REGISTRY/... as their image fail with `no matching manifest for linux/arm64/v8` (or another platform) even though the image exists on the real registry.

Pulling docker image 127.0.0.1:32768/group/project/image:v1.0 ...
ERROR: Job failed: failed to pull image "127.0.0.1:32768/group/project/image:v1.0"
  with specified policies [if-not-present]: Error response from daemon:
  no matching manifest for linux/arm64/v8 in the manifest list entries

Cause: $CI_REGISTRY resolves to the embedded registry (127.0.0.1:<port>), which proxies the image from upstream. When the upstream image is a multi-arch manifest list, the local Docker daemon selects its native platform (e.g., linux/arm64 on Apple Silicon), but the image may only provide linux/amd64 manifests.

This is common with projects that reference $CI_REGISTRY images to avoid external dependencies.
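
To see which platform a machine will ask for, its architecture can be mapped to Docker's platform string and compared against what the image offers. The helper below is a sketch covering only the two common architectures; docker_platform_for is a hypothetical name.

```shell
#!/bin/sh
# Sketch: map a machine architecture (as printed by `uname -m`) to the
# platform string Docker uses. Only the common cases are covered.
docker_platform_for() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    arm64|aarch64) echo "linux/arm64" ;;
    *)             echo "unknown"; return 1 ;;
  esac
}
```

Run docker_platform_for "$(uname -m)" on the machine where the Docker daemon runs. With Colima or a remote Docker host that is the VM or the remote machine, not necessarily your laptop. The platforms an image provides can be listed with docker buildx imagetools inspect <image>.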

Fix: Two approaches:

1. Push multi-arch images: if you control the upstream images, build and push them as multi-arch manifests (e.g., with docker buildx build --platform linux/amd64,linux/arm64). This fixes the problem at the source for all consumers.
2. Use per-job field overrides to force the correct platform without touching .gitlab-ci.yml:

# .glciconfig.toml

# Nested form
[jobs."renovate_validate".image.docker]
platform = "linux/amd64"

# Or flat dotted key form (equivalent)
[jobs."renovate_validate"]
"image.docker.platform" = "linux/amd64"

# Or override the image entirely to bypass the registry proxy
[jobs."renovate_validate".image]
name = "registry.gitlab.com/group/project/image:v1.0"

# Or apply to every job in the project with a glob pattern
[jobs."*".image.docker]
platform = "linux/amd64"

Push-through mirror failures

What you see: docker push succeeds locally but the image doesn’t appear on the upstream registry, or pushes fail with `setting push-through config: HTTP 4xx/5xx`.

Cause	Fix
No upstream credentials configured	Add `[registry.upstream]` with `username` and `password` in `.glciconfig.toml`
Token lacks `write_registry` scope	Create a deploy token or PAT with `write_registry`
Upstream registry unreachable	Check connectivity: `curl -s https://registry.gitlab.com/v2/`

# .glciconfig.toml
[registry]
push_through = true

[registry.upstream]
username = "deploy-token"
password = "$REGISTRY_WRITE_TOKEN"

Remote Docker issues

When DOCKER_HOST points to a remote machine (TCP or SSH), the glci daemon and the Docker daemon run on different hosts. This changes how bind mounts, networking, and port forwarding work.

Relay proxy containers

When a named runner targets a Docker daemon on a different machine, glci deploys a relay proxy container (glci-proxy-*) on the remote host to bridge mock server communication. The relay uses the glci:local image (pulled automatically).

What you see: Jobs on the remote runner fail with `connection refused` or `response is not application/json` when trying to reach the mock server.

Possible causes:

- The glci:local image is missing or outdated on the remote host. Rebuild with make docker and ensure it’s available on the remote daemon.
- The relay container crashed. Check docker ps -a on the remote host for exited glci-proxy-* containers.
- Protocol mismatch after upgrading glci: old relay containers may use an incompatible muxproto format. Clean up all glci containers on the remote host: docker rm -f $(docker ps -aq --filter name=glci-)

Cleanup: If relay containers are left behind after a crash, remove them on the remote host:

docker --context <remote> rm -f $(docker --context <remote> ps -aq --filter name=glci-proxy-)

Bind mount failures

What you see: Job containers start but files are missing, or mounts fail with `no such file or directory`.

Cause: Bind mounts reference paths on the Docker host, not your local machine. When Docker is remote, /Users/you/project doesn’t exist on the remote host.

Fix: glci works around this by uploading your project as a tarball into the mock server container, so standard job execution works. Custom volume mounts in your CI config, however, still refer to paths on the remote host.

Localhost port forwarding doesn’t work

What you see: Jobs try to reach 127.0.0.1:<port> for the mock server or registry, but get `connection refused`.

Cause: 127.0.0.1 inside a container on the remote Docker host refers to that container’s loopback, not your local machine. The mock server runs as a container on the remote host and is reachable by container name on the pipeline network, not via localhost.

Fix: This should work automatically — glci connects mock and runner containers to the same Docker network. If it doesn’t, check glci daemon logs for network errors.

Docker context detection

glci resolves the Docker host at daemon startup in this priority order:

1. [docker] host in .glciconfig.toml (highest)
2. Docker context (docker context inspect)
3. DOCKER_HOST environment variable
4. Default Docker socket

Set the host explicitly in config to override all auto-detection:

[docker]
host = "ssh://my-server"

Check what’s currently detected with glci config show --network.
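
The priority order can be read as "first non-empty source wins". The sketch below models that with three inputs passed as arguments. The real detection is internal to glci, resolve_docker_host is a hypothetical name, and the fallback socket path is the conventional Linux default rather than something this page specifies.

```shell
#!/bin/sh
# Sketch of the priority order listed above; not glci's implementation.
# Arguments (all supplied by the caller):
#   $1 = [docker] host from the config file
#   $2 = host from the active Docker context
#   $3 = value of DOCKER_HOST
resolve_docker_host() {
  for candidate in "$1" "$2" "$3"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"      # first non-empty source wins
      return 0
    fi
  done
  # Assumption: conventional default socket on Linux.
  printf '%s\n' "unix:///var/run/docker.sock"
}
```

For example, resolve_docker_host "" "" "tcp://10.0.0.5:2376" prints the DOCKER_HOST value, while adding a config host as the first argument makes that value win.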

Variable resolution issues

Variables not resolving

What you see: Job scripts contain literal $MY_VARIABLE instead of its value, or variables are empty.

Cause: The variable isn’t defined at any level, or it’s defined at a lower-precedence level and a higher-precedence source overrides it with an empty value.

Variable precedence (lowest to highest):

Priority	Source	How to set
1 (lowest)	CI-derived	Automatic (git SHA, branch, etc.)
2	Global YAML	`variables:` at top of `.gitlab-ci.yml`
3	Instance variables	Fetched from GitLab API (admin token; `--secrets all`)
4	Group variables	Fetched from GitLab API (`--secrets all`)
5	Project variables	Fetched from GitLab API
6	`--env` flags	`glci run --env KEY=VALUE`
7	`--env-file`	`glci run --env-file .env.local`
8	Pipeline preset env	`env` of a `--pipeline`/context preset in `.glciconfig.toml`
9	`.glci.env`	Auto-loaded from project root
10	Dotenv artifacts	`artifacts: reports: dotenv:` from dependency jobs
11 (highest)	Job YAML	`variables:` inside a job definition
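
One way to reason about the table is to treat each source as a layer and apply the layers from lowest to highest, letting later layers overwrite earlier ones. The sketch below does this for plain KEY=VALUE files standing in for the layers. The file names in the usage note are hypothetical, and this is not how glci stores or merges variables internally.

```shell
#!/bin/sh
# Sketch: merge KEY=VALUE files given in lowest-to-highest precedence order.
# For each key, the value from the last file that defines it wins, and the
# winning file is printed next to it.
merge_layers() {
  awk '
    {
      eq = index($0, "=")
      if (eq < 2) next                        # skip lines without KEY=
      key = substr($0, 1, eq - 1)
      if (!(key in value)) order[++n] = key   # remember first-seen order
      value[key] = substr($0, eq + 1)         # later files overwrite earlier ones
      src[key] = FILENAME
    }
    END {
      for (i = 1; i <= n; i++)
        printf "%s=%s  (from %s)\n", order[i], value[order[i]], src[order[i]]
    }
  ' "$@"
}
```

Calling merge_layers global.env project.env job.env shows, for each variable, the final value and the layer it came from. An empty value in a higher layer still wins, which is how a variable can end up empty even though a lower layer defines it.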

Debugging steps:

# Override a specific variable for testing
glci run --env MY_VARIABLE=test_value

# Use --secrets none to test without remote variables
glci run --secrets none --env MY_VARIABLE=test_value

Dotenv variables not appearing in downstream jobs

What you see: A producer job creates a dotenv report artifact but the consumer job does not have the expected variables.

Cause	Fix
Consumer in the same stage with no `needs:`	Dotenv vars propagate automatically from prior stages. For same-stage producers, add `needs: [producer]`
Dotenv file has invalid format	Keys must match `[a-zA-Z_][a-zA-Z0-9_]*`. Lines with invalid keys are skipped silently
Too many variables	Only the first 20 variables are kept (matching GitLab’s default limit)
File too large	Decompressed dotenv file must be under 5 MB
Parse error in dotenv file	Check daemon logs for `warning: failed to parse dotenv artifact` messages

Note: needs: { job: producer, artifacts: false } blocks file artifact downloads but dotenv variables still propagate — matching GitLab CI behavior.
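
To preview which lines of a dotenv file would survive the key-format and count rules in the table, a filter along these lines can help. It is a sketch of the documented rules only, not glci's parser. Skipping lines without an = sign is an assumption, and filter_dotenv is a hypothetical name.

```shell
#!/bin/sh
# Sketch of two dotenv rules from the table above: keys must match
# [a-zA-Z_][a-zA-Z0-9_]* and only the first 20 variables are kept.
# Usage: filter_dotenv FILE [MAX]
filter_dotenv() {
  LC_ALL=C awk -v max="${2:-20}" '
    {
      eq = index($0, "=")
      if (eq < 2) next                               # assumption: no "=" or empty key is skipped
      key = substr($0, 1, eq - 1)
      if (key !~ /^[a-zA-Z_][a-zA-Z0-9_]*$/) next    # invalid key: skipped silently
      if (++kept > max) exit                         # keep only the first max variables
      print
    }
  ' "$1"
}
```

Comparing filter_dotenv build.env with the original file (for example with diff) shows which lines would be dropped.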

`.glci.env` not loading

What you see: Variables defined in .glci.env are not available in jobs.

Cause	Fix
File is in the wrong directory	`.glci.env` must be in the project root (same directory as `.gitlab-ci.yml`)
File has syntax errors	Each line must be `KEY=VALUE`. No spaces around `=`. No quotes needed.
File has a BOM or wrong line endings	Save as UTF-8 without BOM, with LF line endings

Example .glci.env:

MY_SECRET=s3cr3t
DEPLOY_TOKEN=glpat-xxxx
DB_PASSWORD=hunter2
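
The three causes in the table can be checked mechanically. The following is a rough lint sketch, not a glci feature. lint_glci_env is a hypothetical helper, and treating blank lines and lines starting with # as ignorable is an assumption about the file format.

```shell
#!/bin/sh
# Sketch: flag the .glci.env problems listed in the table above.
# Prints one message per problem and returns non-zero if any were found.
lint_glci_env() {
  file=$1
  rc=0
  # A UTF-8 BOM is the byte sequence EF BB BF at the start of the file.
  if [ "$(head -c 3 "$file" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
    echo "BOM: file starts with a UTF-8 byte order mark"; rc=1
  fi
  # Any carriage return indicates CRLF (or stray CR) line endings.
  if LC_ALL=C grep -q "$(printf '\r')" "$file"; then
    echo "CRLF: file contains carriage returns"; rc=1
  fi
  # Report lines that are not KEY=VALUE or that have spaces around "=".
  awk '
    { sub(/\r$/, "") }                 # CR is already reported above
    /^$/ || /^#/ { next }              # assumption: blanks and comments are ignored
    !/^[^= \t]+=/ || /^[^=]*=[ \t]/ {
      printf "line %d: expected KEY=VALUE with no spaces around =\n", NR
      bad = 1
    }
    END { exit (bad ? 1 : 0) }
  ' "$file" || rc=1
  return "$rc"
}
```

Running lint_glci_env .glci.env from the project root prints nothing and returns 0 for a file like the example above.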

`--secrets none` still shows some variables

What you see: Variables like CI_REGISTRY, CI_PROJECT_PATH, etc. are present even with --secrets none.

Cause: --secrets none only disables fetching remote variables from the GitLab API (instance, group, and project variables). CI-derived variables (git info, registry URLs) and YAML-defined variables are always resolved.

Rule evaluation issues

Jobs unexpectedly skipped

What you see: `no jobs to run: requested [job-name] but none matched after rules evaluation`, or a job you expected is missing from glci show.

Cause: Rules evaluated to when: never in the simulated context. By default, glci simulates a merge_request context.

Debugging steps:

# See which jobs are included in the default context
glci show

# Compare with a different context
glci show --context branch=main
glci show --context tag=v1.0

# Check a specific job
glci jobs   # lists all jobs and their when: status

Common reasons jobs are excluded:

Rule	Why it doesn’t match	Fix
`if: $CI_PIPELINE_SOURCE == "push"`	Default context is `merge_request`	Use `--context branch=main`
`if: $CI_COMMIT_TAG`	No tag in context	Use `--context tag=v1.0`
`if: $CI_COMMIT_BRANCH == "main"`	Your branch isn’t `main`	Use `--context branch=main`
`changes: [path/**]`	No diff context available	`changes:` matches everything when no diff is available, so this usually isn’t the problem. Check other clauses.
`exists: [file.txt]`	File is gitignored, or doesn’t exist	`exists:` matches your non-ignored files (tracked + untracked, minus `.gitignore`). Un-ignore the file or create it; gitignored build output never matches
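
The file set described in the exists: row (tracked plus untracked, minus gitignored) corresponds closely to what git ls-files can list. The wrapper below is a sketch for checking whether a path is visible under that definition. exists_candidates is a hypothetical helper, and glci's own matching may differ in detail.

```shell
#!/bin/sh
# Sketch: list tracked and untracked files, excluding anything gitignored.
# Usage: exists_candidates [REPO_DIR]   (defaults to the current directory)
exists_candidates() {
  # --cached: files in the index (tracked)
  # --others: untracked files
  # --exclude-standard: apply .gitignore and the other standard ignore files
  git -C "${1:-.}" ls-files --cached --others --exclude-standard
}
```

For example, exists_candidates | grep -x 'file.txt' prints the path only if it is not ignored. Paths are relative to the repository directory passed in.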

Jobs unexpectedly included

What you see: A job that should be skipped (e.g., deploy jobs) runs anyway.

Cause: The default merge_request context may match rules that your real CI wouldn’t. Alternatively, rules: changes: matches everything when glci has no diff context.

Fix:

# Simulate the exact context you want
glci run --context branch=feature-x

# Skip specific jobs by name
glci run --skip "deploy*"

Context simulation not matching real CI

What you see: Jobs appear in glci show that don’t appear in the GitLab pipeline (or vice versa).

Cause: glci evaluates rules locally with the simulated context. Some differences from real CI:

- $CI_PIPELINE_SOURCE is set to push (for branch/tag) or merge_request_event (for merge_request context); your real pipeline may have a different source
- rules: changes: may behave differently because glci uses local git diff while GitLab compares against the target branch
- rules: exists: evaluates your local working tree minus gitignored files (tracked + untracked), while GitLab evaluates the committed tree at the pipeline SHA, so uncommitted/untracked files satisfy exists: locally but won’t on GitLab until committed
- Protected/unprotected branch distinctions are not enforced locally
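
The first difference is the one that most often explains a mismatch with rules that test $CI_PIPELINE_SOURCE. As a quick reference, the mapping described above can be written as a small lookup. This is a sketch of the documented behavior only, and pipeline_source_for_context is a hypothetical name.

```shell
#!/bin/sh
# Sketch: the CI_PIPELINE_SOURCE value that rules see for a given --context,
# as described in the list above.
pipeline_source_for_context() {
  case "$1" in
    merge_request)  echo "merge_request_event" ;;
    branch=*|tag=*) echo "push" ;;
    *)              echo "unrecognized context: $1" >&2; return 1 ;;
  esac
}
```

So a rule such as if: $CI_PIPELINE_SOURCE == "push" can only match under a branch= or tag= context, never under the default merge_request context.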

Fix: Use --context and --env together to match your real CI environment as closely as possible:

glci show --context merge_request --mr-source feature --mr-target main

Performance tips

Slow image pulls

The embedded registry acts as a pull-through cache: the first pull of an image is slow, but subsequent pulls are much faster.

Slow startup due to secrets fetch

Remote variable fetching adds latency to pipeline startup. If you don’t need remote secrets:

glci run --secrets none      # skip all remote variable fetching
glci run --secrets project   # fetch project variables only; skips group variables, which are slower due to pagination

Reducing disk usage

glci system df               # check what glci is using
glci system cache clean      # wipe CI cache
glci system prune            # clean unused containers, networks, volumes
glci system prune --all      # also remove registry data and history

Known limitations

- CI_JOB_TOKEN is forwarded from the host’s GitLab token, so permissions may differ from production. Use --no-token to disable.
- include: project: and include: component: require a GitLab API token (other include types work offline). Component version selectors (@~latest, @~N, @~N.M) are supported and resolve to the latest matching semver release tag.
- Protected variables are fetched regardless of branch protection status.
- Kubernetes executor is not supported; only the Docker executor is.
- Child pipelines: trigger: include: supports up to 2 levels of nesting (matching GitLab).
- Dynamic child pipelines: trigger:include:artifact runs a child pipeline whose YAML a generator job produced. The generator must run before the trigger job (earlier stage or needs:); otherwise the trigger fails with `artifact include "…": job "…" produced no artifacts`. Each artifact-sourced config file is capped at 5 MiB and an empty file is rejected.
- Cross-project triggers: trigger: project: resolves from local directories or clones from GitLab.
- Crash recovery requires both mock server and runner containers to survive. Secrets are held only in daemon memory (mlocked, never written to disk) and are lost on daemon restart, so resumed pipelines always re-fetch remote variables from the GitLab API.
