Management & Operations
This guide covers the day-to-day management and operational aspects of the PipeOps Kubernetes Agent, including upgrades, monitoring, backup, and lifecycle management.
Agent Lifecycle Management
Checking Agent Status
Kubernetes Deployment:
# Check pod status
kubectl get pods -n pipeops-system -l app=pipeops-agent
# Check deployment health
kubectl get deployment pipeops-agent -n pipeops-system
# View detailed status
kubectl describe deployment pipeops-agent -n pipeops-system
Systemd Service:
# Check service status
sudo systemctl status pipeops-agent
# View recent logs
sudo journalctl -u pipeops-agent -n 100
Docker Container:
# Check container status
docker ps | grep pipeops-agent
# View container details
docker inspect pipeops-agent
Viewing Logs
Kubernetes:
# View current logs
kubectl logs deployment/pipeops-agent -n pipeops-system
# Follow logs in real-time
kubectl logs -f deployment/pipeops-agent -n pipeops-system
# View logs from previous pod instance
kubectl logs deployment/pipeops-agent -n pipeops-system --previous
# View logs with timestamps
kubectl logs deployment/pipeops-agent -n pipeops-system --timestamps
# View last 100 lines
kubectl logs deployment/pipeops-agent -n pipeops-system --tail=100
Systemd:
# View all logs
sudo journalctl -u pipeops-agent
# Follow logs
sudo journalctl -u pipeops-agent -f
# View logs since boot
sudo journalctl -u pipeops-agent -b
# View logs from last hour
sudo journalctl -u pipeops-agent --since "1 hour ago"
Docker:
# View logs
docker logs pipeops-agent
# Follow logs
docker logs -f pipeops-agent
# View last 100 lines
docker logs pipeops-agent --tail 100
Restarting the Agent
Kubernetes:
# Restart by deleting pod (deployment recreates it)
kubectl rollout restart deployment/pipeops-agent -n pipeops-system
# Or delete the pod directly
kubectl delete pod -n pipeops-system -l app=pipeops-agent
Systemd:
# Restart service
sudo systemctl restart pipeops-agent
# Reload configuration
sudo systemctl reload pipeops-agent
Docker:
# Restart container
docker restart pipeops-agent
# Stop and start
docker stop pipeops-agent
docker start pipeops-agent
Upgrading the Agent
Version Check
Check your current agent version:
# Kubernetes
kubectl get deployment pipeops-agent -n pipeops-system -o jsonpath='{.spec.template.spec.containers[0].image}'
# Binary
pipeops-agent version
# Docker
docker inspect pipeops-agent | grep Image
Check available versions:
# GitHub releases
curl -s https://api.github.com/repos/PipeOpsHQ/pipeops-k8-agent/releases/latest | grep tag_name
# Helm chart versions
helm search repo pipeops/pipeops-agent --versions
Helm Upgrade
Upgrade to Latest Version:
# Upgrade to latest version
helm upgrade pipeops-agent oci://ghcr.io/pipeopshq/pipeops-agent \
--namespace pipeops-system \
--reuse-values
Upgrade to Specific Version:
helm upgrade pipeops-agent oci://ghcr.io/pipeopshq/pipeops-agent:1.2.3 \
--namespace pipeops-system \
--reuse-values
Upgrade with New Configuration:
# Create updated values file
cat > updated-values.yaml <<EOF
agent:
image:
tag: "v1.2.3"
resources:
limits:
cpu: "1000m"
memory: "1Gi"
EOF
# Apply upgrade
helm upgrade pipeops-agent oci://ghcr.io/pipeopshq/pipeops-agent \
-f updated-values.yaml \
--namespace pipeops-system \
--reuse-values
Dry Run (Test Before Upgrade):
helm upgrade pipeops-agent oci://ghcr.io/pipeopshq/pipeops-agent \
--namespace pipeops-system \
--reuse-values \
--dry-run --debug
Kubernetes Manifest Upgrade
# Generate new manifests from helm chart
helm template pipeops-agent oci://ghcr.io/pipeopshq/pipeops-agent:1.2.3 \
--namespace pipeops-system > pipeops-agent-v1.2.3.yaml
# Review changes
diff pipeops-agent-current.yaml pipeops-agent-v1.2.3.yaml
# Apply upgrade
kubectl apply -f pipeops-agent-v1.2.3.yaml
Binary Upgrade
Linux:
# Download new version
curl -LO https://github.com/PipeOpsHQ/pipeops-k8-agent/releases/download/v1.2.3/pipeops-agent-linux-amd64
# Verify checksum (optional but recommended)
curl -LO https://github.com/PipeOpsHQ/pipeops-k8-agent/releases/download/v1.2.3/checksums.txt
sha256sum -c checksums.txt --ignore-missing
# Stop service
sudo systemctl stop pipeops-agent
# Backup current binary
sudo cp /usr/local/bin/pipeops-agent /usr/local/bin/pipeops-agent.backup
# Replace binary
chmod +x pipeops-agent-linux-amd64
sudo mv pipeops-agent-linux-amd64 /usr/local/bin/pipeops-agent
# Start service
sudo systemctl start pipeops-agent
# Verify upgrade
pipeops-agent version
Docker:
# Pull new image
docker pull ghcr.io/pipeopshq/pipeops-k8-agent:v1.2.3
# Stop and remove old container
docker stop pipeops-agent
docker rm pipeops-agent
# Start with new image
docker run -d \
--name pipeops-agent \
--restart always \
-e PIPEOPS_TOKEN="your-api-token" \
-e CLUSTER_NAME="your-cluster" \
-v $HOME/.kube/config:/config/.kube/config:ro \
ghcr.io/pipeopshq/pipeops-k8-agent:v1.2.3
Rollback
If an upgrade causes issues, you can rollback:
Helm Rollback:
# View upgrade history
helm history pipeops-agent -n pipeops-system
# Rollback to previous version
helm rollback pipeops-agent -n pipeops-system
# Rollback to specific revision
helm rollback pipeops-agent 3 -n pipeops-system
Binary Rollback:
# Stop service
sudo systemctl stop pipeops-agent
# Restore backup
sudo cp /usr/local/bin/pipeops-agent.backup /usr/local/bin/pipeops-agent
# Start service
sudo systemctl start pipeops-agent
Configuration Management
Updating Configuration
Helm (Recommended):
# Update single value
helm upgrade pipeops-agent oci://ghcr.io/pipeopshq/pipeops-agent \
--set agent.cluster.name="new-name" \
--namespace pipeops-system \
--reuse-values
# Update multiple values from file
helm upgrade pipeops-agent oci://ghcr.io/pipeopshq/pipeops-agent \
-f updated-values.yaml \
--namespace pipeops-system \
--reuse-values
Kubernetes ConfigMap:
# Edit ConfigMap directly
kubectl edit configmap pipeops-agent-config -n pipeops-system
# Or update from file
kubectl create configmap pipeops-agent-config \
--from-file=config.yaml \
--namespace pipeops-system \
--dry-run=client -o yaml | kubectl apply -f -
# Restart agent to apply changes
kubectl rollout restart deployment/pipeops-agent -n pipeops-system
Binary Configuration:
# Edit configuration file
sudo nano /etc/pipeops/config.yaml
# Reload service
sudo systemctl reload pipeops-agent
Updating Secrets
Update API Token:
# Create new secret
kubectl create secret generic pipeops-agent-config \
--from-literal=token=new-api-token \
--namespace pipeops-system \
--dry-run=client -o yaml | kubectl apply -f -
# Restart agent
kubectl rollout restart deployment/pipeops-agent -n pipeops-system
Using External Secret Management:
# Example: Using External Secrets Operator
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: pipeops-agent-secrets
namespace: pipeops-system
spec:
refreshInterval: 1h
secretStoreRef:
name: aws-secrets-manager
kind: SecretStore
target:
name: pipeops-agent-config
data:
- secretKey: token
remoteRef:
key: pipeops/agent-token
Resource Management
Monitoring Resource Usage
# Check pod resource usage
kubectl top pod -n pipeops-system
# Check node resource usage
kubectl top node
# Detailed resource metrics
kubectl describe pod -n pipeops-system -l app=pipeops-agent | grep -A 5 "Limits\|Requests"
Adjusting Resource Limits
Increase Resources:
helm upgrade pipeops-agent pipeops/pipeops-agent \
--set agent.resources.limits.cpu="1000m" \
--set agent.resources.limits.memory="1Gi" \
--set agent.resources.requests.cpu="500m" \
--set agent.resources.requests.memory="512Mi" \
--namespace pipeops-system \
--reuse-values
Monitoring Resource Recommendations:
# Install metrics-server if not already installed
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Get resource recommendations (if VPA is installed)
kubectl describe vpa pipeops-agent -n pipeops-system
Backup and Recovery
Backing Up Agent Configuration
Kubernetes:
# Backup all agent resources
kubectl get all,configmap,secret -n pipeops-system -o yaml > pipeops-agent-backup.yaml
# Backup Helm values
helm get values pipeops-agent -n pipeops-system > pipeops-agent-values-backup.yaml
Binary:
# Backup configuration
sudo cp -r /etc/pipeops /etc/pipeops.backup.$(date +%Y%m%d)
# Backup binary
sudo cp /usr/local/bin/pipeops-agent /usr/local/bin/pipeops-agent.backup.$(date +%Y%m%d)
Backing Up Monitoring Data
Prometheus:
# Backup Prometheus data directory
kubectl exec -n pipeops-monitoring deployment/prometheus-server -- tar czf /tmp/prometheus-backup.tar.gz /data
kubectl cp pipeops-monitoring/prometheus-server-xxx:/tmp/prometheus-backup.tar.gz ./prometheus-backup.tar.gz
Grafana:
# Export all dashboards
kubectl exec -n pipeops-monitoring deployment/grafana -- \
grafana-cli --homepath /usr/share/grafana admin export-dashboards > grafana-dashboards-backup.json
# Backup Grafana database
kubectl exec -n pipeops-monitoring deployment/grafana -- \
sqlite3 /var/lib/grafana/grafana.db .dump > grafana-db-backup.sql
Disaster Recovery
Complete Reinstallation:
# 1. Backup current configuration
helm get values pipeops-agent -n pipeops-system > backup-values.yaml
# 2. Uninstall agent
helm uninstall pipeops-agent -n pipeops-system
# 3. Reinstall with backed-up configuration
helm install pipeops-agent pipeops/pipeops-agent \
-f backup-values.yaml \
--namespace pipeops-system \
--create-namespace
Restore from Backup:
# Restore Kubernetes resources
kubectl apply -f pipeops-agent-backup.yaml
# Restore monitoring data
kubectl cp ./prometheus-backup.tar.gz pipeops-monitoring/prometheus-server-xxx:/tmp/
kubectl exec -n pipeops-monitoring deployment/prometheus-server -- \
tar xzf /tmp/prometheus-backup.tar.gz -C /
Health Checks
Agent Health Endpoints
The agent exposes health check endpoints:
# Liveness check
kubectl exec -n pipeops-system deployment/pipeops-agent -- curl http://localhost:8081/healthz
# Readiness check
kubectl exec -n pipeops-system deployment/pipeops-agent -- curl http://localhost:8081/readyz
# Metrics endpoint
kubectl exec -n pipeops-system deployment/pipeops-agent -- curl http://localhost:9091/metrics
Automated Health Monitoring
Configure Prometheus Alerts:
groups:
- name: pipeops-agent-health
rules:
- alert: PipeOpsAgentDown
expr: up{job="pipeops-agent"} == 0
for: 5m
labels:
severity: critical
annotations:
summary: "PipeOps Agent is down"
description: "The PipeOps Agent has been down for more than 5 minutes"
- alert: PipeOpsAgentHighMemory
expr: container_memory_working_set_bytes{pod=~"pipeops-agent.*"} / container_spec_memory_limit_bytes > 0.9
for: 10m
labels:
severity: warning
annotations:
summary: "PipeOps Agent high memory usage"
Maintenance Tasks
Log Rotation
Kubernetes (automatic): Kubernetes automatically rotates container logs. Configure if needed:
# kubelet config
containerLogMaxSize: "10Mi"
containerLogMaxFiles: 5
Systemd:
# Configure journald log rotation
sudo nano /etc/systemd/journald.conf
# Add/modify:
# SystemMaxUse=1G
# SystemKeepFree=2G
# MaxRetentionSec=7day
# Restart journald
sudo systemctl restart systemd-journald
Cleanup Old Resources
# Remove failed pods
kubectl delete pods --field-selector status.phase=Failed -n pipeops-system
# Remove completed jobs
kubectl delete jobs --field-selector status.successful=1 -n pipeops-system
# Cleanup evicted pods
kubectl get pods -n pipeops-system | grep Evicted | awk '{print $1}' | xargs kubectl delete pod -n pipeops-system
Certificate Rotation
If using custom certificates:
# Update certificate secret
kubectl create secret tls pipeops-agent-tls \
--cert=new-cert.crt \
--key=new-key.key \
--namespace pipeops-system \
--dry-run=client -o yaml | kubectl apply -f -
# Restart agent
kubectl rollout restart deployment/pipeops-agent -n pipeops-system
Security Best Practices
Regular Updates
- Enable automatic security updates for your OS
- Subscribe to PipeOps security advisories
- Regularly update to latest agent version
- Keep Kubernetes cluster updated
Token Rotation
# Generate new token in PipeOps dashboard
# Update secret with new token
kubectl create secret generic pipeops-agent-config \
--from-literal=token=new-token \
--namespace pipeops-system \
--dry-run=client -o yaml | kubectl apply -f -
# Restart agent
kubectl rollout restart deployment/pipeops-agent -n pipeops-system
Audit Logs
Enable Kubernetes audit logging to track agent activities:
# kube-apiserver audit policy
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: RequestResponse
namespaces: ["pipeops-system"]
verbs: ["create", "update", "patch", "delete"]
Monitoring Agent Metrics
Key Metrics to Monitor
Agent Health:
pipeops_agent_up- Agent running statuspipeops_agent_connected- Connection to PipeOps APIpipeops_tunnel_active- Tunnel status
Resource Usage:
pipeops_agent_cpu_usage- CPU utilizationpipeops_agent_memory_usage- Memory utilizationpipeops_agent_goroutines- Number of Go routines
API Metrics:
pipeops_api_requests_total- Total API requestspipeops_api_request_duration_seconds- Request latencypipeops_api_errors_total- API errors
Prometheus Queries
# Agent uptime
time() - pipeops_agent_start_time_seconds
# API request rate
rate(pipeops_api_requests_total[5m])
# Error rate
rate(pipeops_api_errors_total[5m]) / rate(pipeops_api_requests_total[5m])
# Memory usage percentage
pipeops_agent_memory_usage / pipeops_agent_memory_limit * 100
Troubleshooting Common Operations
Agent Won't Start
# Check logs for errors
kubectl logs deployment/pipeops-agent -n pipeops-system
# Common issues:
# 1. Invalid token - check secret
# 2. Network connectivity - check network policies
# 3. Resource constraints - check node capacity
# 4. Configuration errors - validate config
High Resource Usage
# Check current usage
kubectl top pod -n pipeops-system
# Identify causes:
# - Too frequent metric scraping
# - Large log volume
# - Memory leaks (check for increasing trends)
# Solutions:
# - Increase resource limits
# - Adjust scrape intervals
# - Update to latest version (may include fixes)
Connection Issues
# Test connectivity to PipeOps API
kubectl exec -n pipeops-system deployment/pipeops-agent -- \
curl -v https://api.pipeops.sh/health
# Check DNS resolution
kubectl exec -n pipeops-system deployment/pipeops-agent -- \
nslookup api.pipeops.sh
# Verify network policies
kubectl get networkpolicies -n pipeops-system
Next Steps
- Troubleshooting — Detailed troubleshooting guide
- API Reference — Agent API documentation
- Configuration Reference — Complete configuration options