How to Troubleshoot Teranode (Kubernetes Operator)
Last modified: 6-March-2025
Index
- Health Checks and System Monitoring
- Service Status
- Detailed Container/Pod Health
- Configuring Health Checks
- Viewing Health Check Logs
- Monitoring System Resources
- Viewing Global Logs
- Viewing Logs for Specific Microservices
- Useful Options for Log Viewing
- Checking Logs for Specific Teranode Microservices
- Redirecting Logs to a File
- Check Services Dashboard
- Recovery Procedures
Health Checks and System Monitoring
Service Status
kubectl get pods -n teranode-operator
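On a busy namespace it can help to list only the pods that need attention. The following is a minimal sketch, not part of the Teranode tooling: `unhealthy_pods` is a hypothetical helper that reads the `--no-headers` output of the command above and prints pods whose status is not Running or whose READY count is incomplete.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print pods that are not fully ready.
# Reads `kubectl get pods --no-headers` output on stdin:
#   NAME  READY  STATUS  RESTARTS  AGE
unhealthy_pods() {
  awk '{
    split($2, r, "/")                 # READY column, e.g. "1/2"
    if ($3 == "Completed") next       # finished Jobs are not a problem
    if ($3 != "Running" || r[1] != r[2]) print $1, $2, $3
  }'
}

# Usage:
#   kubectl get pods -n teranode-operator --no-headers | unhealthy_pods
```

An empty result means every pod is Running with all of its containers ready.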
Detailed Container/Pod Health
kubectl describe pod <pod-name> -n teranode-operator
Configuring Health Checks
In your Deployment or StatefulSet specification:
spec:
  template:
    spec:
      containers:
        - name: teranode-blockchain
          ...
          readinessProbe:
            httpGet:
              path: /health
              port: 8087
            periodSeconds: 30
            timeoutSeconds: 10
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /health
              port: 8087
            periodSeconds: 30
            timeoutSeconds: 10
            failureThreshold: 3
            initialDelaySeconds: 40
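These values determine how quickly Kubernetes reacts to a failing service. A failed liveness probe restarts the container, and a failed readiness probe removes the pod from its Service endpoints, but only after failureThreshold consecutive failures. As a rough upper bound (an approximation that ignores kubelet scheduling jitter), detection can take up to failureThreshold probe periods plus one probe timeout. The helper name below is hypothetical:

```shell
# Rough upper bound, in seconds, on how long a failing container can go
# unnoticed: failureThreshold probe periods plus one probe timeout.
# This is an approximation, not an exact kubelet guarantee.
probe_detection_seconds() {
  period=$1 timeout=$2 threshold=$3
  echo $(( threshold * period + timeout ))
}

# Values from the probe configuration above: 3 * 30 + 10
probe_detection_seconds 30 10 3   # prints 100
```

If that window is too long for your deployment, lowering periodSeconds shortens it, at the cost of more frequent probe traffic.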
Viewing Health Check Logs
Health check results are typically logged in the pod events:
kubectl describe pod <pod-name> -n teranode-operator
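To see probe failures across the whole namespace rather than one pod at a time, the events can also be queried directly. This sketch assumes the standard Kubernetes behavior of recording failed probes as events with reason Unhealthy; `probe_failures` is a hypothetical wrapper name.

```shell
# Hypothetical helper: list probe-failure events in a namespace
# (defaults to teranode-operator), oldest first.
probe_failures() {
  ns=${1:-teranode-operator}
  kubectl get events -n "$ns" \
    --field-selector reason=Unhealthy \
    --sort-by=.lastTimestamp
}

# Usage:
#   probe_failures                  # default namespace
#   probe_failures my-namespace     # any other namespace
```

Note that Kubernetes keeps events only for a limited time, so older failures may no longer be listed.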
Monitoring System Resources
- Use kubectl top to view resource usage:
kubectl top pods -n teranode-operator
kubectl top nodes
In addition:
- Consider setting up Prometheus and Grafana for more comprehensive monitoring.
- Look for services consuming unusually high resources.
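One way to spot services consuming unusually high resources is to filter the kubectl top output against a threshold. The sketch below is illustrative only: `pods_over_memory` is a hypothetical helper, and the threshold you pass is your own choice. kubectl top requires the Metrics Server to be installed in the cluster.

```shell
# Hypothetical helper: print pods whose memory usage exceeds a limit in Mi.
# Reads `kubectl top pods --no-headers` output on stdin:
#   NAME  CPU(cores)  MEMORY(bytes)    e.g. "asset-0  250m  512Mi"
pods_over_memory() {
  awk -v limit="$1" '{
    mem = $3
    if (mem ~ /Gi$/)      { sub(/Gi$/, "", mem); mem = mem * 1024 }
    else if (mem ~ /Mi$/) { sub(/Mi$/, "", mem); mem = mem + 0 }
    else if (mem ~ /Ki$/) { sub(/Ki$/, "", mem); mem = mem / 1024 }
    else next                         # unknown unit: skip the line
    if (mem > limit) print $1, $3
  }'
}

# Usage (2048 Mi is an arbitrary example threshold):
#   kubectl top pods -n teranode-operator --no-headers | pods_over_memory 2048
```

For a quick ranked view without any filtering, kubectl top pods also accepts --sort-by=memory or --sort-by=cpu.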
Viewing Global Logs
kubectl logs -n teranode-operator -l app.kubernetes.io/part-of=teranode-operator
kubectl logs -n teranode-operator -f -l app.kubernetes.io/part-of=teranode-operator # Follow logs in real-time
kubectl logs -n teranode-operator --tail=100 -l app.kubernetes.io/part-of=teranode-operator # View only the most recent logs
Viewing Logs for Specific Microservices
kubectl logs -n teranode-operator <pod-name>
Useful Options for Log Viewing
- Show timestamps:
kubectl logs -n teranode-operator <pod-name> --timestamps=true
- Limit output:
kubectl logs -n teranode-operator <pod-name> --tail=50
- Since time:
kubectl logs -n teranode-operator <pod-name> --since-time="2023-07-01T00:00:00Z"
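These options combine well with standard text tools. As a small sketch, the hypothetical helper below counts log lines that mention "error"; it assumes error lines contain that word in some letter case, which may not match the exact Teranode log format. The --since and --previous flags in the usage lines are standard kubectl logs options: --since limits output to a relative duration, and --previous shows the logs of the previous container instance after a restart.

```shell
# Hypothetical helper: count log lines on stdin that mention "error"
# (case-insensitive). Assumes error lines contain that word.
count_error_lines() {
  grep -ci 'error' || true   # grep exits 1 on zero matches; still print 0
}

# Usage:
#   kubectl logs -n teranode-operator <pod-name> --since=1h | count_error_lines
#   kubectl logs -n teranode-operator <pod-name> --previous | count_error_lines
```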
Checking Logs for Specific Teranode Microservices
Replace [service_name] or <pod-name> with the appropriate service or pod name:
- Propagation Service (service name: propagation)
- Blockchain Service (service name: blockchain)
- Asset Service (service name: asset)
- Block Validation Service (service name: block-validator)
- P2P Service (service name: p2p)
- Block Assembly Service (service name: block-assembly)
- Subtree Validation Service (service name: subtree-validator)
- Miner Service (service name: miner)
- RPC Server (service name: rpc)
- Block Persister Service (service name: block-persister)
- UTXO Persister Service (service name: utxo-persister)
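To go from a service name in the list above to concrete pod names, one option is to filter the pod list by name. This sketch assumes that pod names contain the service name, which depends on how your deployment names its pods; `pods_for_service` is a hypothetical helper.

```shell
# Hypothetical helper: keep pod names that contain a given service name.
# Reads `kubectl get pods -o name` output on stdin, e.g. "pod/propagation-abc".
# Assumes pod names contain the service name; adjust for your deployment.
pods_for_service() {
  sed 's|^pod/||' | grep -F -- "$1" || true
}

# Usage: show the last 50 log lines of every propagation pod.
#   for p in $(kubectl get pods -n teranode-operator -o name | pods_for_service propagation); do
#     kubectl logs -n teranode-operator "$p" --tail=50
#   done
```

Short names such as rpc may match unrelated pods, so check the filtered list before relying on it.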
Redirecting Logs to a File
kubectl logs -n teranode-operator -l app.kubernetes.io/part-of=teranode-operator > teranode_logs.txt
kubectl logs -n teranode-operator <pod-name> > pod_logs.txt
Remember to replace placeholders like [service_name], <pod-name>, and label selectors with the appropriate values for your Teranode setup.
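When collecting logs for a bug report, it can be convenient to save one file per pod instead of a single combined file. The following is a sketch under stated assumptions: `save_pod_logs` and the default directory name teranode-logs are hypothetical, while the kubectl flags used are standard.

```shell
# Hypothetical helper: save each pod's logs to <dir>/<pod>.log.
# $1: namespace (default teranode-operator)
# $2: output directory (default teranode-logs, an arbitrary name)
save_pod_logs() {
  ns=${1:-teranode-operator}
  dir=${2:-teranode-logs}
  mkdir -p "$dir"
  for pod in $(kubectl get pods -n "$ns" -o name); do
    name=${pod#pod/}                  # strip the "pod/" prefix
    kubectl logs -n "$ns" "$name" --timestamps=true --all-containers=true \
      > "$dir/$name.log" 2>&1
  done
}

# Usage:
#   save_pod_logs                               # default namespace and directory
#   save_pod_logs teranode-operator /tmp/logs   # explicit values
```

Errors from kubectl are written into the same file, so a pod that cannot be read still leaves a trace of what went wrong.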
Check Services Dashboard
Check your Grafana TERANODE Service Overview dashboard:
- Check that there are no blocks in the queue (Queued Blocks in Block Validation). Expect little or no queueing, and no upward trend. Three queued blocks are already a concern.
- Check that the propagation instances handle roughly the same load, so that the load is evenly distributed across all propagation servers. See the Propagation Processed Transactions per Instance diagram.
- Check that the cache follows a sustainable pattern rather than growing "exponentially" (see both the Tx Meta Cache in Block Validation and Tx Meta Cache Size in Block Validation diagrams).
- Check that goroutines (Goroutines graph) are not creeping up or reaching excessive levels.
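Several of the checks above ask whether a value is "creeping up". If you export a series of samples from a dashboard panel, that condition can be tested mechanically. The sketch below is one possible reading of the term, not a Teranode definition: `is_creeping_up` is a hypothetical helper that succeeds when the samples never decrease and end higher than they started.

```shell
# Hypothetical helper: exit 0 if the samples on stdin (one number per line,
# oldest first) never decrease and end higher than they started.
is_creeping_up() {
  awk 'NR == 1   { first = $1; prev = $1; next }
       $1 < prev { dropped = 1 }
                 { prev = $1 }
       END       { exit !(NR > 1 && !dropped && prev > first) }'
}

# Usage with made-up queue depths:
#   printf '%s\n' 0 1 1 3 | is_creeping_up && echo "queue is creeping up"
```

A series that rises and then drains again does not count as creeping up under this definition, which matches the expectation that brief queueing is tolerable while a steady climb is not.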
Recovery Procedures
Third-Party Component Failure
Teranode depends heavily on third-party components. Postgres, Kafka, and Aerospike are critical for Teranode operations, and the node cannot run without them.
If a third-party service fails, restore it first. Once it is back, restart Teranode cleanly by following the instructions in the How to Start and Stop Teranode in Kubernetes guide.
Should you encounter a bug, please report it following the instructions in the Bug Reporting section.