
Troubleshooting

This guide helps diagnose and resolve common issues when deploying and integrating Rafiki with your digital wallet.

Before troubleshooting issues, ensure you’ve completed all required customizations:

| File/Location | Variables to update |
|---|---|
| terraform.tfvars | project_id, domain_name, region |
| cluster-issuer.yaml | email field |
| argocd/ingress.yaml | host fields |
| ingress-nginx/values.yaml | loadBalancerIP |
| rafiki/values.yaml | All YOUR_DOMAIN.com references |
| wallet/values.yaml | YOUR_DOMAIN.com, YOUR_REGISTRY |
| monitoring/values.yaml | Domain references, passwords, SMTP settings |
| backup/postgres-backup.yaml | GCS bucket, project ID |
| Environment secrets | Database passwords, API tokens |
| DNS records | Point domains to the static IP |

Also confirm that the following security items are in place:

| Security item | Description |
|---|---|
| TLS certificates | Let’s Encrypt configured for all domains |
| Database passwords | Strong, randomly generated passwords |
| API secrets | 32-byte secrets for auth and webhooks |
| Network policies | Enabled to restrict pod-to-pod communication |
| RBAC | Dedicated service accounts with least-privilege permissions |
| Image security | Official images with known vulnerabilities patched |
| Backup encryption | KMS encryption for backup data |
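A quick way to find customizations that were missed is to scan the configuration tree for placeholder strings that are still present. The sketch below assumes the placeholders appear literally as YOUR_DOMAIN.com and YOUR_REGISTRY, as in the table, and that the files use .yaml, .yml, or .tfvars extensions. find_placeholders is a hypothetical helper, not part of Rafiki.

```shell
#!/usr/bin/env bash
# Hypothetical helper: scan a config tree for template placeholders that were
# never replaced. Assumes the placeholders appear literally as YOUR_DOMAIN.com
# or YOUR_REGISTRY; extend the pattern if your templates use others.
# Returns 0 if clean, 1 if placeholders remain, 2 if grep itself failed.
find_placeholders() {
  local root="${1:-.}"
  # -r: recurse, -n: show line numbers, -E: extended regex (| alternation)
  grep -rnE --include='*.yaml' --include='*.yml' --include='*.tfvars' \
    'YOUR_DOMAIN\.com|YOUR_REGISTRY' "$root"
  case $? in
    0) echo "Placeholders remain in the files listed above." >&2; return 1 ;;
    1) echo "No placeholders found under $root." >&2; return 0 ;;
    *) echo "grep failed while scanning $root." >&2; return 2 ;;
  esac
}

# Example (from the repository root): find_placeholders .
```

grep exits with 1 when nothing matches, so the helper maps that case to success. A genuine grep error (exit 2) is reported separately so it is not mistaken for a clean tree.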

Solutions for GCP permission, API, and quota errors:

  1. Check GCP permissions:

    # Check which account is active, then review the project's IAM bindings
    gcloud auth list
    gcloud projects get-iam-policy PROJECT_ID
    # Grant the GKE admin role if it is missing
    gcloud projects add-iam-policy-binding PROJECT_ID \
      --member="user:your-email@domain.com" \
      --role="roles/container.admin"
  2. Enable required APIs:

    gcloud services enable container.googleapis.com
    gcloud services enable compute.googleapis.com
    gcloud services enable dns.googleapis.com
  3. Check quota limits:

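    # Project-wide quotas
    gcloud compute project-info describe --project=PROJECT_ID
    # Regional quotas, such as CPUs and in-use IP addresses, used by GKE nodes
    gcloud compute regions describe REGION --project=PROJECT_ID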
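The three checks above can be scripted as a rough preflight before running Terraform. The sketch below uses only standard gcloud flags (--flatten, --filter, --format="value(...)"). The function names are hypothetical. has_role only sees direct user bindings on the project, so roles granted through groups or inherited from a folder or organization will not show up. For a service account, the member prefix would be serviceAccount: instead of user:.

```shell
#!/usr/bin/env bash
# Preflight sketch for the GCP checks above. Function names are hypothetical;
# the gcloud subcommands and formatting flags are standard.

# Succeeds if EMAIL holds ROLE through a direct user binding on PROJECT.
# Group grants and inherited (folder/org) roles are NOT detected.
has_role() {
  local project="$1" email="$2" role="$3"
  gcloud projects get-iam-policy "$project" \
    --flatten="bindings[].members" \
    --filter="bindings.members:user:${email}" \
    --format="value(bindings.role)" |
    grep -qxF "$role"   # exact, whole-line match on the role name
}

# Prints each required API that is not yet enabled in the active project.
missing_apis() {
  local enabled api
  enabled="$(gcloud services list --enabled --format='value(config.name)')" || return 2
  for api in container.googleapis.com compute.googleapis.com dns.googleapis.com; do
    printf '%s\n' "$enabled" | grep -qxF "$api" || echo "$api"
  done
}

# Prints regional quotas whose usage is at or above THRESHOLD percent
# (default 80). value() output is tab-separated: metric, usage, limit.
near_quota() {
  local region="$1" threshold="${2:-80}"
  gcloud compute regions describe "$region" \
    --flatten="quotas[]" \
    --format="value(quotas.metric,quotas.usage,quotas.limit)" |
    awk -F'\t' -v t="$threshold" '$3 > 0 && $2 * 100 / $3 >= t { print $1, $2 "/" $3 }'
}

# Examples:
#   has_role PROJECT_ID your-email@domain.com roles/container.admin || echo "role missing"
#   apis="$(missing_apis)"; [ -n "$apis" ] && gcloud services enable $apis
#   near_quota us-central1 90
```

Quotas with a limit of 0 are skipped so the percentage calculation never divides by zero.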

Symptoms:

  • Terraform hangs on cluster creation
  • Cluster shows “PROVISIONING” status for extended time

Solutions:

  1. Check region availability:

    # List available zones in the region
    gcloud compute zones list --filter="region:us-central1"
    # Retry in a different region
    terraform apply -var="region=us-east1"
  2. Reduce initial node count:

    terraform apply -var="min_node_count=1"
  3. Check for resource conflicts:

    # List existing clusters
    gcloud container clusters list
    # Delete a stale cluster if it is no longer needed (this cannot be undone)
    gcloud container clusters delete OLD_CLUSTER_NAME --region=REGION
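While Terraform appears to hang, it can help to poll the cluster status directly. The sketch below is a hypothetical helper built on gcloud container clusters describe with --format='value(status)'. The default timeout and interval are arbitrary choices. It returns as soon as GKE reports RUNNING, fails fast on ERROR or DEGRADED, and otherwise gives up after the timeout.

```shell
#!/usr/bin/env bash
# Sketch: poll a GKE cluster's status while Terraform is creating it.
# wait_for_cluster is a hypothetical helper; NAME and REGION are your values.
# Usage: wait_for_cluster NAME REGION [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
wait_for_cluster() {
  local name="$1" region="$2" timeout="${3:-1800}" interval="${4:-30}"
  local waited=0 status
  while [ "$waited" -lt "$timeout" ]; do
    status="$(gcloud container clusters describe "$name" \
      --region="$region" --format='value(status)')" || return 2
    echo "$(date +%T) $name: $status" >&2
    case "$status" in
      RUNNING) return 0 ;;           # cluster is ready
      ERROR | DEGRADED) return 1 ;;  # no point waiting further
    esac
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "Still not RUNNING after ${timeout}s; inspect 'gcloud container operations list'." >&2
  return 1
}

# Example: wait_for_cluster OLD_CLUSTER_NAME us-central1 1200 20
```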

Symptoms:

  • Certificate shows READY: False status
  • TLS certificate secret not created
  • HTTPS connections fail with certificate errors
  • Let’s Encrypt certificate challenges failing
For example, the affected certificate reports READY as False:

kubectl get certificates -A
NAMESPACE   NAME              READY   SECRET            AGE
rafiki      rafiki-auth-tls   False   rafiki-auth-tls   10m
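With many domains, it is easier to list only the certificates that are not ready. The sketch below reads the Ready condition of each cert-manager Certificate through kubectl's JSONPath output. not_ready_certs is a hypothetical helper name. Certificates with no Ready condition yet are also reported.

```shell
#!/usr/bin/env bash
# Sketch: print namespace/name for every Certificate whose Ready condition
# is not "True". not_ready_certs is a hypothetical helper.
not_ready_certs() {
  # JSONPath emits one tab-separated line per Certificate: ns, name, Ready status
  local fmt='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}'
  fmt="$fmt"'{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}'
  kubectl get certificates -A -o jsonpath="$fmt" |
    awk -F'\t' '$3 != "True" { print $1 "/" $2 }'
}

# Once a fix is applied, block until a certificate is issued, e.g.:
#   kubectl wait --for=condition=Ready certificate/rafiki-auth-tls -n rafiki --timeout=300s
```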

Solutions:

  1. Verify DNS records are correctly configured:

    # Check that the domain resolves to your cluster's static IP
    nslookup auth.YOUR_DOMAIN.com
    dig auth.YOUR_DOMAIN.com
    # Confirm the ingress has been assigned the static IP (ADDRESS column)
    kubectl get ingress -A
  2. Check cert-manager is running properly:

    # Verify cert-manager pods are healthy
    kubectl get pods -n cert-manager
    # Check cluster-issuer status
    kubectl get clusterissuer
    kubectl describe clusterissuer letsencrypt-prod

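  3. Check that firewall rules allow HTTP/HTTPS traffic:

    # The HTTP-01 challenge must be able to reach your domain on port 80
    gcloud compute firewall-rules list
    # Any HTTP response (even a 404) shows the request reaches the cluster;
    # a timeout or refused connection points to a firewall or DNS problem
    curl -I http://auth.YOUR_DOMAIN.com/.well-known/acme-challenge/test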
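The DNS check from step 1 can be made explicit by comparing the resolved address with the ingress load balancer IP. This sketch assumes the ingress-nginx Service is named ingress-nginx-controller in the ingress-nginx namespace, the Helm chart default and consistent with the controller deployment name used in the debugging commands. Adjust the name if your install differs. ingress_ip and check_dns are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Sketch: confirm a hostname resolves to the ingress load balancer IP.
# ASSUMPTION: the Service is ingress-nginx-controller in namespace ingress-nginx.
ingress_ip() {
  kubectl get service ingress-nginx-controller -n ingress-nginx \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Usage: check_dns HOST EXPECTED_IP
check_dns() {
  local host="$1" expected="$2" resolved
  # dig +short prints any CNAME targets first; the last line is an A record
  resolved="$(dig +short A "$host" | tail -n 1)"
  if [ -z "$resolved" ]; then
    echo "$host does not resolve yet" >&2
    return 1
  fi
  if [ "$resolved" != "$expected" ]; then
    echo "$host resolves to $resolved, expected $expected" >&2
    return 1
  fi
  echo "$host -> $resolved"
}

# Example: check_dns auth.YOUR_DOMAIN.com "$(ingress_ip)"
```

If a host has several A records, only the last one printed is compared, so treat a mismatch as a prompt to inspect the full dig output.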

Debugging:

# Check certificate status
kubectl describe certificate rafiki-auth-tls -n rafiki
# Check certificate requests
kubectl get certificaterequests -A
kubectl describe certificaterequest <name> -n rafiki
# Check cert-manager logs
kubectl logs -n cert-manager deployment/cert-manager
kubectl logs -n cert-manager deployment/cert-manager-webhook
kubectl logs -n cert-manager deployment/cert-manager-cainjector
# Check ingress-nginx logs for HTTP challenges
kubectl logs -n ingress-nginx deployment/ingress-nginx-controller
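For ACME issuers, cert-manager also creates Order and Challenge resources, and a stuck Challenge usually records why it is stuck in its status. The sketch below summarizes Challenges that have not reached the valid state. It uses the spec.dnsName, status.state, and status.reason fields of cert-manager's Challenge resource. pending_challenges is a hypothetical helper. Challenges are normally cleaned up after they succeed, so an empty result is expected once issuance works.

```shell
#!/usr/bin/env bash
# Sketch: summarize ACME Challenges that are not yet "valid".
# pending_challenges is a hypothetical helper; the fields read here are
# spec.dnsName, status.state and status.reason on cert-manager Challenges.
pending_challenges() {
  local fmt='{range .items[*]}{.metadata.namespace}{"\t"}{.spec.dnsName}{"\t"}'
  fmt="$fmt"'{.status.state}{"\t"}{.status.reason}{"\n"}{end}'
  kubectl get challenges -A -o jsonpath="$fmt" |
    awk -F'\t' '$3 != "valid" {
      state = ($3 == "" ? "unknown" : $3)   # state is empty before processing
      print $1, $2, "[" state "]", $4
    }'
}

# Then inspect a specific one: kubectl describe challenge <name> -n rafiki
```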