Troubleshooting
This guide helps diagnose and resolve common issues when deploying and integrating Rafiki with your digital wallet.
Before troubleshooting issues, ensure you’ve completed all required customizations:
File/Location | Variables to Update |
---|---|
`terraform.tfvars` | `project_id`, `domain_name`, `region` |
`cluster-issuer.yaml` | `email` field |
`argocd/ingress.yaml` | `host` fields |
`ingress-nginx/values.yaml` | `loadBalancerIP` |
`rafiki/values.yaml` | All `YOUR_DOMAIN.com` references |
`wallet/values.yaml` | `YOUR_DOMAIN.com`, `YOUR_REGISTRY` |
`monitoring/values.yaml` | Domain references, passwords, SMTP settings |
`backup/postgres-backup.yaml` | GCS bucket, project ID |
Environment secrets | Database passwords, API tokens |
DNS records | Point domains to the static IP |
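Several rows in this checklist come down to replacing template placeholders, so a quick search for leftovers catches many missed edits. The following is a minimal sketch that assumes all of the files listed above sit under one directory. It only looks for `YOUR_DOMAIN.com` and `YOUR_REGISTRY`, so passwords, the static IP, and DNS records still need a manual check.

```bash
# Print every file:line that still contains a template placeholder.
# Assumption: the configuration files live under one directory (default: .).
find_placeholders() {
  local dir="${1:-.}"
  # -r recurse, -n line numbers, -I skip binary files, -E extended regex
  grep -rnIE 'YOUR_DOMAIN\.com|YOUR_REGISTRY' "$dir"
}

# Returns 0 when nothing is left, 1 when placeholders remain,
# and 2 when grep itself failed (for example, a missing directory).
check_customizations() {
  local dir="${1:-.}" status=0
  find_placeholders "$dir" || status=$?
  case $status in
    0) echo "Unreplaced placeholders found (listed above)." >&2; return 1 ;;
    1) echo "No placeholders found in $dir."; return 0 ;;
    *) echo "grep failed while scanning $dir." >&2; return 2 ;;
  esac
}
```

For example, `check_customizations ./deploy` (the directory name is hypothetical) lists each matching line and returns a non-zero status if anything still needs editing.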

Also confirm that the following security items are in place:

Security Item | Description |
---|---|
TLS Certificates | Let’s Encrypt configured for all domains |
Database Passwords | Strong, randomly generated passwords |
API Secrets | Random 32-byte secrets for auth and webhooks |
Network Policies | Enabled to restrict pod-to-pod communication |
RBAC | Dedicated service accounts with only the permissions they need |
Image Security | Official images with known vulnerabilities patched |
Backup Encryption | KMS encryption for backup data |
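One way to produce a 32-byte secret is to read 32 bytes from `/dev/urandom` and hex-encode them. The sketch below does this with `head` and `od`, so it does not depend on OpenSSL (`openssl rand -hex 32` yields the same kind of value). Whether a particular Rafiki setting expects hex or base64 is an assumption to confirm against its chart values before use.

```bash
# Print a random secret of N bytes (default 32) as lowercase hex.
gen_secret() {
  local bytes="${1:-32}"
  # head -c reads exactly N bytes; od -An -tx1 prints them as hex pairs
  # without offsets; tr removes the spaces and line breaks od inserts.
  head -c "$bytes" /dev/urandom | od -An -tx1 | tr -d ' \n'
  echo
}
```

`gen_secret` prints a 64-character hex string; `gen_secret 16` prints a 32-character one.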
If deployment fails with permission, API, or quota errors, try these solutions:

1. Check GCP permissions:

   ```bash
   # Verify the active account has the required permissions
   gcloud auth list
   gcloud projects get-iam-policy PROJECT_ID

   # Add required roles
   gcloud projects add-iam-policy-binding PROJECT_ID \
     --member="user:your-email@domain.com" \
     --role="roles/container.admin"
   ```

2. Enable required APIs:

   ```bash
   gcloud services enable container.googleapis.com
   gcloud services enable compute.googleapis.com
   gcloud services enable dns.googleapis.com
   ```

3. Check quota limits:

   ```bash
   gcloud compute project-info describe --project=PROJECT_ID
   ```
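The API and quota checks can be scripted so that only problems are printed. In this sketch both functions read `gcloud` output on standard input; the `--format` and `--flatten` expressions in the comments are assumptions to verify with `gcloud topic formats` and `gcloud topic flatten`. The 80% threshold is an arbitrary default, and regional quotas such as CPUs appear under `gcloud compute regions describe REGION` rather than in the project-wide output.

```bash
# APIs this guide enables.
REQUIRED_APIS=(container.googleapis.com compute.googleapis.com dns.googleapis.com)

# stdin: one enabled service name per line, e.g. from (assumed format)
#   gcloud services list --enabled --format="value(config.name)"
# Prints each required API that is missing from the input.
missing_apis() {
  local enabled api
  enabled=$(cat)
  for api in "${REQUIRED_APIS[@]}"; do
    # -q quiet, -x whole-line match, -F fixed string
    if ! grep -qxF "$api" <<<"$enabled"; then
      echo "$api"
    fi
  done
}

# stdin: CSV lines "metric,limit,usage", e.g. from (assumed format)
#   gcloud compute regions describe REGION --flatten="quotas[]" \
#     --format="csv[no-heading](quotas.metric,quotas.limit,quotas.usage)"
# Prints "metric usage/limit" for quotas at or above the threshold percent.
quotas_near_limit() {
  local threshold="${1:-80}"
  # Skip zero limits to avoid dividing by zero.
  awk -F, -v t="$threshold" '$2 > 0 && ($3 / $2) * 100 >= t + 0 { print $1 " " $3 "/" $2 }'
}
```

For example, piping the enabled-services list into `missing_apis` prints nothing once all three APIs are on, and `quotas_near_limit 90` narrows the quota report to metrics at 90% or more.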
Symptoms:
- Terraform hangs on cluster creation
- Cluster shows “PROVISIONING” status for an extended time
Solutions:

1. Check region availability:

   ```bash
   # List available zones in the region
   gcloud compute zones list --filter="region:us-central1"

   # Try a different region
   terraform apply -var="region=us-east1"
   ```

2. Reduce the initial node count:

   ```bash
   terraform apply -var="min_node_count=1"
   ```

3. Check for resource conflicts:

   ```bash
   # List existing clusters
   gcloud container clusters list

   # Clean up if needed
   gcloud container clusters delete OLD_CLUSTER_NAME --region=REGION
   ```
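While a cluster is slow to come up, polling its status directly shows more than watching Terraform. This sketch repeats a status command until it prints the expected value, gives up after a fixed number of attempts, and stops early on the GKE `ERROR` and `DEGRADED` states, which usually need intervention rather than more waiting. The attempt count and interval are caller-supplied placeholders.

```bash
# Poll a command until its stdout equals the expected value.
# Arguments: expected value, max attempts, seconds between attempts, command...
# Returns 0 on a match, 1 after the last attempt, 2 on ERROR or DEGRADED.
wait_for_status() {
  local expected="$1" attempts="$2" interval="$3"
  shift 3
  local i status
  for ((i = 1; i <= attempts; i++)); do
    # "|| true" keeps a failed call from aborting scripts that use set -e.
    status=$("$@" 2>/dev/null) || true
    echo "attempt $i/$attempts: ${status:-<no output>}" >&2
    if [[ "$status" == "$expected" ]]; then
      return 0
    fi
    if [[ "$status" == "ERROR" || "$status" == "DEGRADED" ]]; then
      return 2
    fi
    if (( i < attempts )); then
      sleep "$interval"
    fi
  done
  return 1
}
```

For example, `wait_for_status RUNNING 40 30 gcloud container clusters describe CLUSTER_NAME --region=REGION --format="value(status)"` checks up to 40 times, 30 seconds apart (about 20 minutes).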
Symptoms:
- Certificate shows `READY` as `False`
- TLS certificate secret not created
- HTTPS connections fail with certificate errors
- Let’s Encrypt certificate challenges failing

For example:

```bash
kubectl get certificates -A
```

```text
NAMESPACE   NAME              READY   SECRET            AGE
rafiki      rafiki-auth-tls   False   rafiki-auth-tls   10m
```
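When many certificates exist, filtering for the ones that are not ready saves scanning the table by eye. This sketch parses the plain-text output of `kubectl get certificates`, locating columns by header name so it works with or without `-A`. Parsing text is a convenience here; `-o json` is the sturdier choice for anything long-lived.

```bash
# stdin: output of `kubectl get certificates` (optionally with -A).
# Prints [namespace/]name for every certificate whose READY column is not True.
not_ready_certs() {
  awk '
    NR == 1 {
      # Map each header name to its column number.
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    $col["READY"] != "True" {
      # Add the namespace prefix only when the NAMESPACE column is present.
      prefix = ("NAMESPACE" in col) ? ($col["NAMESPACE"] "/") : ""
      print prefix $col["NAME"]
    }'
}
```

`kubectl get certificates -A | not_ready_certs` prints one `namespace/name` line per certificate that is not ready. To block until a single certificate is issued instead, `kubectl wait --for=condition=Ready certificate/rafiki-auth-tls -n rafiki --timeout=5m` waits on its `Ready` condition.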
Solutions:

1. Verify that DNS records are correctly configured:

   ```bash
   # Check whether the domain resolves to your cluster IP
   nslookup auth.YOUR_DOMAIN.com
   dig auth.YOUR_DOMAIN.com

   # Verify that the static IP is assigned
   kubectl get ingress -A
   ```

2. Check that cert-manager is running properly:

   ```bash
   # Verify cert-manager pods are healthy
   kubectl get pods -n cert-manager

   # Check cluster-issuer status
   kubectl get clusterissuer
   kubectl describe clusterissuer letsencrypt-prod
   ```

3. Check that firewall rules allow HTTP/HTTPS traffic:

   ```bash
   # Ensure the HTTP challenge can reach your domain
   gcloud compute firewall-rules list
   curl -I http://auth.YOUR_DOMAIN.com/.well-known/acme-challenge/test
   ```
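The DNS and firewall checks both ask whether an HTTP request for the domain reaches the cluster's ingress. This sketch turns their output into pass/fail results. It assumes you know the static IP the domain should resolve to, and it relies on common behavior: for a made-up token such as `test`, ingress-nginx or cert-manager's solver normally answers `404`, which still shows the request got through, while `curl -w '%{http_code}'` prints `000` when no HTTP response arrived at all.

```bash
# stdin: resolved addresses, e.g. from `dig +short auth.YOUR_DOMAIN.com`
# $1: the static IP the domain is expected to resolve to
# Succeeds if any resolved line is exactly that IP (CNAME lines are ignored).
resolves_to() {
  local expected="$1"
  grep -qxF "$expected"
}

# $1: HTTP status code for a request to the ACME challenge path
# Prints a one-line interpretation of the result.
classify_challenge_probe() {
  case "$1" in
    000) echo "unreachable: check DNS, firewall rules, and the load balancer" ;;
    404) echo "reachable: port 80 reaches the cluster (404 is expected for a test token)" ;;
    *)   echo "unexpected status $1: inspect the ingress-nginx logs" ;;
  esac
}
```

For example, `dig +short auth.YOUR_DOMAIN.com | resolves_to STATIC_IP && echo ok` (with `STATIC_IP` as a placeholder), and `classify_challenge_probe "$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 http://auth.YOUR_DOMAIN.com/.well-known/acme-challenge/test)"`.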
Debugging:

```bash
# Check certificate status
kubectl describe certificate rafiki-auth-tls -n rafiki

# Check certificate requests
kubectl get certificaterequests -A
kubectl describe certificaterequest <name> -n rafiki

# Check cert-manager logs
kubectl logs -n cert-manager deployment/cert-manager
kubectl logs -n cert-manager deployment/cert-manager-webhook
kubectl logs -n cert-manager deployment/cert-manager-cainjector

# Check ingress-nginx logs for HTTP challenges
kubectl logs -n ingress-nginx deployment/ingress-nginx-controller
```
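When comparing attempts or reporting an issue, it helps to capture all of the debugging output in one pass. This sketch runs the commands above in sequence, keeps going when one fails, and supports a dry run. The certificate name and namespace default to the example values; the `--tail=200` limit and the `kubectl get orders,challenges -A` step (cert-manager's ACME resources) are additions not in the list above.

```bash
# Echo a command, then run it unless DRY_RUN=1; report failures and continue.
run() {
  echo "+ $*"
  if [[ "${DRY_RUN:-0}" != "1" ]]; then
    "$@" || echo "(command failed with status $?)"
  fi
}

# Arguments: certificate name and namespace (defaults match the example).
collect_cert_debug() {
  local cert="${1:-rafiki-auth-tls}" ns="${2:-rafiki}"
  run kubectl describe certificate "$cert" -n "$ns"
  run kubectl get certificaterequests -A
  # Added step: ACME orders and challenges show why validation is stuck.
  run kubectl get orders,challenges -A
  local d
  for d in cert-manager cert-manager-webhook cert-manager-cainjector; do
    run kubectl logs -n cert-manager "deployment/$d" --tail=200
  done
  run kubectl logs -n ingress-nginx deployment/ingress-nginx-controller --tail=200
}
```

`collect_cert_debug rafiki-auth-tls rafiki > cert-debug.txt 2>&1` saves everything to one file, and `DRY_RUN=1 collect_cert_debug` only prints the commands it would run.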