Kubernetes Integration Guide
1. Overview
The Vega Platform collects data and metrics from Kubernetes clusters through the Vega Kubernetes Metrics Agent. The agent gathers, processes, and transmits the metrics listed below to the Vega Platform, giving users visibility into their Kubernetes environments.
Key Features
- Node Metrics: Comprehensive data for each node, including capacity, allocatable resources, and utilization
- Pod Metrics: Resource metrics for all pods, including requests, limits, and actual usage
- Cluster Metrics: Aggregated cluster-wide statistics
- Storage Metrics: Detailed metrics for persistent volumes (PVs) and claims (PVCs)
- Namespace Metrics: Resource quotas, limit ranges, and consumption
- Workload Metrics: Monitoring for deployments, stateful sets, and daemon sets
- Network Metrics: Services and ingress metrics
- Autoscaling Metrics: HPA performance and scaling metrics
- Replication Metrics: Data for controllers and replica sets
2. Prerequisites
System Requirements
- Kubernetes cluster version 1.30 or higher
- Helm v3.0 or higher
- Outbound access to:
  - Container image repository (public.ecr.aws/c0f8b9o4/vegacloud)
  - api.vegacloud.io (port 443), for pre-signed URL retrieval
  - vegametricsocean.s3.us-west-2.amazonaws.com (port 443), for uploading data
Access Requirements
- Kubernetes Administrator privileges
- RBAC permissions for:
  - Creating deployments
  - Creating cluster roles
  - Reading metrics
- API access credentials (clientId and clientSecret)
Pre-Installation Checks
- Verify cluster access:
kubectl cluster-info
kubectl auth can-i create deployment
kubectl auth can-i create clusterrole
- Verify Helm installation:
helm version
- Check network connectivity:
curl -v https://api.vegacloud.io/health
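The pre-installation checks can be combined into one function that reports every result in a single pass. This is a minimal sketch, not part of the Vega tooling: the name `vega_preflight` is illustrative, and it assumes `kubectl`, `helm`, and `curl` are on the PATH. It tests the two endpoints listed under System Requirements and treats any HTTP response as proof of connectivity.

```shell
# Hypothetical helper: runs the pre-installation checks and reports each result.
# Usage: vega_preflight   (returns non-zero if any check fails)
vega_preflight() {
  failures=0

  # RBAC checks: "kubectl auth can-i" exits non-zero when the answer is "no".
  for resource in deployment clusterrole; do
    if kubectl auth can-i create "$resource" >/dev/null 2>&1; then
      echo "ok:   can create $resource"
    else
      echo "FAIL: cannot create $resource"
      failures=$((failures + 1))
    fi
  done

  # Helm must be installed and runnable.
  if helm version --short >/dev/null 2>&1; then
    echo "ok:   helm is installed"
  else
    echo "FAIL: helm not found or not working"
    failures=$((failures + 1))
  fi

  # Outbound HTTPS: any HTTP response (even 403) proves the connection works,
  # so only curl's own exit status is checked, not the HTTP status code.
  for host in api.vegacloud.io vegametricsocean.s3.us-west-2.amazonaws.com; do
    if curl -sS -o /dev/null --connect-timeout 10 "https://$host" 2>/dev/null; then
      echo "ok:   reached $host:443"
    else
      echo "FAIL: cannot reach $host:443"
      failures=$((failures + 1))
    fi
  done

  [ "$failures" -eq 0 ]
}
```

Because the function returns non-zero when any check fails, it can gate an install step, for example `vega_preflight && helm repo update`.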
3. Installation
Quick Start
- Add the Vega Helm repository:
helm repo add vegacloud https://vegacloud.github.io/charts/
helm repo update
- Verify repository addition:
helm search repo vegacloud
- Install the agent:
helm install vega-metrics vegacloud/vega-metrics-agent \
--set vega.clientId="your-client-id" \
--set vega.clientSecret="your-client-secret" \
--set vega.orgSlug="your-org-slug" \
--set vega.clusterName="your-cluster-name"
Please Note: Insecure mode is enabled by default so that the agent installs on a wide range of Kubernetes clusters without extra setup. If your cluster has internal TLS certificates deployed and configured, we strongly recommend enabling secure mode for the agent by setting env.VEGA_INSECURE=false.
4. Configuration
Basic Configuration
Essential Parameters
Parameter | Description | Default | Required |
---|---|---|---|
vega.clientId | Client ID from API registration | "" | Yes |
vega.clientSecret | Client Secret from API registration | "" | Yes |
vega.orgSlug | Organization slug | "" | Yes |
vega.clusterName | Kubernetes cluster name | "" | Yes |
env.VEGA_INSECURE | Enable/disable insecure mode | true | No |
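The parameters in the table can also be kept in a values file rather than repeated as `--set` flags. The sketch below assumes the chart accepts the same keys as nested YAML, which is how Helm normally maps `--set` paths. The function name and the `VEGA_CLIENT_ID` and `VEGA_CLIENT_SECRET` environment variables are illustrative. It sets env.VEGA_INSECURE to false, so remove that line if the cluster has no internal TLS.

```shell
# Hypothetical helper: writes a Helm values file with the parameters above.
# Usage: write_vega_values <file> <org-slug> <cluster-name>
# VEGA_CLIENT_ID and VEGA_CLIENT_SECRET are illustrative variable names, read
# from the environment so the credentials stay out of shell history.
write_vega_values() {
  values_file=$1
  org_slug=$2
  cluster_name=$3

  # Abort with a message if either credential is missing.
  : "${VEGA_CLIENT_ID:?set VEGA_CLIENT_ID first}"
  : "${VEGA_CLIENT_SECRET:?set VEGA_CLIENT_SECRET first}"

  # The file holds a secret, so a newly created file is owner-readable only.
  ( umask 077
    cat > "$values_file" <<EOF
vega:
  clientId: "${VEGA_CLIENT_ID}"
  clientSecret: "${VEGA_CLIENT_SECRET}"
  orgSlug: "${org_slug}"
  clusterName: "${cluster_name}"
env:
  # false enables secure mode; the chart default is true (insecure).
  VEGA_INSECURE: false
EOF
  )
}
```

The generated file can then be passed to Helm with `-f`, for example `helm install vega-metrics vegacloud/vega-metrics-agent -f vega-values.yaml`.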
5. Verification & Monitoring
Installation Verification
- Check pod status:
kubectl get pods -n vegacloud -l app=metrics-agent
- View agent logs:
kubectl logs -f deployment/metrics-agent -n vegacloud
- Verify metrics collection:
kubectl top nodes
kubectl top pods -A
- Check agent connectivity:
kubectl exec -it deployment/metrics-agent -n vegacloud -- curl -v https://api.vegacloud.io/health
Health Monitoring
- Monitor agent health:
kubectl describe pod -l app=metrics-agent -n vegacloud
- Check resource usage:
kubectl top pod -l app=metrics-agent -n vegacloud
- View recent events:
kubectl get events -n vegacloud --field-selector involvedObject.name=vega-metrics
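The verification steps can be scripted so that a failed rollout produces diagnostics straight away. This is a minimal sketch: the function name `vega_verify`, the 120-second timeout, and the 20-line log tail are arbitrary choices, while the namespace, deployment name, and label are taken from the commands above.

```shell
# Hypothetical helper: waits for the agent rollout, then prints pod status and
# recent log lines.
# Usage: vega_verify [namespace] [timeout]
vega_verify() {
  namespace=${1:-vegacloud}
  timeout=${2:-120s}

  # Blocks until the deployment is fully rolled out or the timeout expires.
  if ! kubectl rollout status deployment/metrics-agent \
      -n "$namespace" --timeout="$timeout"; then
    echo "rollout did not complete within $timeout" >&2
    # Pod events usually explain image pull or scheduling failures.
    kubectl describe pod -l app=metrics-agent -n "$namespace" >&2
    return 1
  fi

  kubectl get pods -n "$namespace" -l app=metrics-agent
  kubectl logs deployment/metrics-agent -n "$namespace" --tail=20
}
```

Unlike `kubectl logs -f`, the `--tail=20` form returns immediately, which suits use in CI pipelines.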
6. Maintenance
Upgrades
Option 1: Upgrade via Helm Chart
# Update repository
helm repo update
# Upgrade with existing values
helm upgrade vega-metrics vegacloud/vega-metrics-agent \
--namespace vegacloud \
--reuse-values
Option 2: Update Container Image
- Check available container versions:
docker pull public.ecr.aws/c0f8b9o4/vegacloud/vega-metrics-agent
docker images public.ecr.aws/c0f8b9o4/vegacloud/vega-metrics-agent
- Update the deployment with new image:
kubectl set image deployment/metrics-agent \
vega-metrics-agent=public.ecr.aws/c0f8b9o4/vegacloud/vega-metrics-agent:latest \
-n vegacloud
- Monitor the rolling update:
kubectl rollout status deployment/metrics-agent -n vegacloud
Note: Replace :latest with a specific version tag for better version control.
Version-Specific Helm Upgrade
helm upgrade vega-metrics vegacloud/vega-metrics-agent \
--version 1.2.3 \
--namespace vegacloud \
--reuse-values
Backup and Recovery
- Export current configuration:
helm get values vega-metrics -n vegacloud > vega-metrics-backup.yaml
- Export secrets:
kubectl get secret -n vegacloud vega-metrics-agent-secret -o yaml > vega-metrics-agent-secret-backup.yaml
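Both exports can be wrapped in one function that writes to a timestamped directory, so repeated backups do not overwrite each other. This is a sketch under the assumption that the release, namespace, and secret names match the commands above. The function name and the directory naming scheme are illustrative.

```shell
# Hypothetical helper: saves the release values and the agent secret into a
# timestamped directory and prints the directory name.
# Usage: vega_backup [directory]
vega_backup() {
  backup_dir=${1:-vega-backup-$(date +%Y%m%d-%H%M%S)}

  # The secret export contains credentials: keep the directory private.
  mkdir -p "$backup_dir" && chmod 700 "$backup_dir" || return 1

  helm get values vega-metrics -n vegacloud \
    > "$backup_dir/vega-metrics-backup.yaml" || return 1
  kubectl get secret -n vegacloud vega-metrics-agent-secret -o yaml \
    > "$backup_dir/vega-metrics-agent-secret-backup.yaml" || return 1

  echo "$backup_dir"
}
```

The exported secret is only base64-encoded, not encrypted, so store the backup directory somewhere with restricted access.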
Uninstallation
helm uninstall vega-metrics -n vegacloud
kubectl delete namespace vegacloud # Optional: removes the namespace
7. Troubleshooting
Common Issues
- Pod in CrashLoopBackOff
  - Check logs:
    kubectl logs -f deployment/metrics-agent -n vegacloud
  - Verify credentials:
    kubectl get secret vega-metrics-agent-secret -n vegacloud
  - Check resource limits:
    kubectl describe pod -l app=metrics-agent -n vegacloud
- Connection Issues
  - Verify network policies allow outbound traffic
  - Check if proxy configuration is needed
  - Ensure correct orgSlug is configured
  - Verify DNS resolution:
    kubectl run test-dns --image=busybox:1.28 --rm -it --restart=Never -- nslookup api.vegacloud.io
- Authentication Failures
  - Verify clientId and clientSecret are correct
  - Check secret creation:
    kubectl describe secret vega-metrics-agent-secret -n vegacloud
  - Validate API access:
    curl -v -H "Authorization: Bearer $TOKEN" https://api.vegacloud.io/health
Debugging Steps
- Enable debug logging:
helm upgrade vega-metrics vegacloud/vega-metrics-agent \
--set env.LOG_LEVEL=DEBUG \
--namespace vegacloud \
--reuse-values
- Check RBAC permissions:
kubectl auth can-i --list --as system:serviceaccount:vegacloud:vega-metrics-agent
- Verify network connectivity:
kubectl run test-net --image=busybox:1.28 --rm -it --restart=Never -- wget -q -O- https://api.vegacloud.io/health
8. Reference
Resource Recommendations
Based on the number of nodes in your cluster, here are the suggested CPU and memory requirements for the agent:
Nodes | CPU Request | CPU Limit | Memory Request | Memory Limit |
---|---|---|---|---|
< 100 | 500m | 1000m | 2Gi | 4Gi |
100-200 | 1000m | 1500m | 4Gi | 8Gi |
200-500 | 1500m | 2000m | 8Gi | 16Gi |
500-1000 | 2000m | 3000m | 16Gi | 24Gi |
1000+ | 3000m | - | 24Gi | - |
To apply these recommendations at install time (the values shown are for a cluster with fewer than 100 nodes):
helm install vega-metrics vegacloud/vega-metrics-agent \
--set vega.clientId="your-client-id" \
--set vega.clientSecret="your-client-secret" \
--set vega.orgSlug="your-org-slug" \
--set vega.clusterName="your-cluster-name" \
--set resources.requests.cpu="500m" \
--set resources.requests.memory="2Gi" \
--set resources.limits.cpu="1000m" \
--set resources.limits.memory="4Gi"
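The table lookup can be automated by mapping a node count to the matching `--set` flags. This is a sketch with an illustrative function name. The table's ranges share their endpoints (100, 200, 500, 1000), so it assigns each shared endpoint to the larger tier, which is an assumption rather than documented behavior.

```shell
# Hypothetical helper: prints the --set flags for a given node count, using
# the Resource Recommendations table. Shared range endpoints (100, 200, ...)
# are assigned to the larger tier, which is an assumption.
# Usage: vega_resource_flags "$(kubectl get nodes --no-headers | wc -l | tr -d ' ')"
vega_resource_flags() {
  nodes=$1
  if   [ "$nodes" -lt 100 ];  then cpu_req=500m;  cpu_lim=1000m; mem_req=2Gi;  mem_lim=4Gi
  elif [ "$nodes" -lt 200 ];  then cpu_req=1000m; cpu_lim=1500m; mem_req=4Gi;  mem_lim=8Gi
  elif [ "$nodes" -lt 500 ];  then cpu_req=1500m; cpu_lim=2000m; mem_req=8Gi;  mem_lim=16Gi
  elif [ "$nodes" -lt 1000 ]; then cpu_req=2000m; cpu_lim=3000m; mem_req=16Gi; mem_lim=24Gi
  else                             cpu_req=3000m; cpu_lim=;      mem_req=24Gi; mem_lim=
  fi

  echo "--set resources.requests.cpu=$cpu_req"
  echo "--set resources.requests.memory=$mem_req"
  # The 1000+ tier lists no limits, so none are emitted for it.
  [ -z "$cpu_lim" ] || echo "--set resources.limits.cpu=$cpu_lim"
  [ -z "$mem_lim" ] || echo "--set resources.limits.memory=$mem_lim"
}
```

The output is meant to be expanded unquoted into a Helm command, for example `helm upgrade vega-metrics vegacloud/vega-metrics-agent --namespace vegacloud --reuse-values $(vega_resource_flags 150)`.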
Common Commands
Operational Commands
# Check agent status
kubectl get pods -n vegacloud -l app=metrics-agent
# View agent configuration
helm get values vega-metrics -n vegacloud
# Force agent restart
kubectl rollout restart deployment/metrics-agent -n vegacloud
# Scale agent
kubectl scale deployment/metrics-agent -n vegacloud --replicas=2
Debugging Commands
# Check agent permissions
kubectl auth can-i --list --as system:serviceaccount:vegacloud:vega-metrics-agent
# View agent events
kubectl get events -n vegacloud --sort-by='.lastTimestamp'
# Check resource usage
kubectl top pod -l app=metrics-agent -n vegacloud
# View detailed pod information
kubectl describe pod -l app=metrics-agent -n vegacloud