Architecture Overview
Neurox is designed so that the Control plane components and the Workload management components can be deployed to separate Kubernetes clusters when infrastructure or networking constraints require it.
Control plane cluster
Where GPU workloads are configured and managed
Communicates with all of its joined Workload clusters
Requires ingress and persistent disk
Hosts the endpoints for web access to the Neurox Control Portal
Workload management cluster
Where GPU workloads run
Communicates with its Control cluster
Does not require ingress
Does not require persistent disk
Directly interacts with the Kubernetes API
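The requirements in the two lists above can be captured as a small preflight check. This is a minimal sketch, not part of Neurox: the `ClusterRole` class, its field names, and the `missing_prerequisites` helper are hypothetical, and only the ingress and persistent disk requirements come from the lists.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterRole:
    """Prerequisites of one cluster type (hypothetical model, not a Neurox API)."""
    name: str
    needs_ingress: bool
    needs_persistent_disk: bool


# Requirement values are taken from the lists above; the names are illustrative.
CONTROL = ClusterRole("control", needs_ingress=True, needs_persistent_disk=True)
WORKLOAD = ClusterRole("workload", needs_ingress=False, needs_persistent_disk=False)


def missing_prerequisites(
    role: ClusterRole, has_ingress: bool, has_persistent_disk: bool
) -> list[str]:
    """Return the prerequisites a target cluster lacks for the given role."""
    missing = []
    if role.needs_ingress and not has_ingress:
        missing.append("ingress")
    if role.needs_persistent_disk and not has_persistent_disk:
        missing.append("persistent disk")
    return missing


if __name__ == "__main__":
    # A cluster with neither ingress nor persistent disk can still take the Workload role.
    print(missing_prerequisites(WORKLOAD, has_ingress=False, has_persistent_disk=False))
    # The same cluster lacks both prerequisites for the Control plane role.
    print(missing_prerequisites(CONTROL, has_ingress=False, has_persistent_disk=False))
```

A check like this makes the asymmetry explicit: any cluster that can reach its Control cluster is a candidate for the Workload role, while the Control plane role needs both ingress and persistent storage.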
Components
Each type of cluster runs the following components.
Control plane cluster
Portal: Hosts the web app
API: Endpoint for web app requests
Redis: Persists Workload and configuration data
Thanos: Receives and aggregates metrics from all Workload clusters
Relay Server: Manages the secure connections from Workload clusters
Authentication Server: Manages user sessions
Identity Provider Connector: Manages user authentication
Workload management cluster
Workload Manager: Directly manages GPU workloads running in Kubernetes
Agent: Collects GPU metrics
Prometheus: Forwards metrics to the Control cluster
Relay Client: Establishes a secure connection to the Control cluster
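The component lists above can be summarized as a lookup table together with the links that cross the cluster boundary. This is an illustrative sketch: the dictionary layout and helper names are hypothetical, and pairing Prometheus with Thanos and Relay Client with Relay Server is an inference from the descriptions above rather than a documented wiring.

```python
# Component names and descriptions are copied from the lists above.
COMPONENTS = {
    "control": {
        "Portal": "Hosts the web app",
        "API": "Endpoint for web app requests",
        "Redis": "Persists Workload and configuration data",
        "Thanos": "Receives and aggregates metrics from all Workload clusters",
        "Relay Server": "Manages the secure connections from Workload clusters",
        "Authentication Server": "Manages user sessions",
        "Identity Provider Connector": "Manages user authentication",
    },
    "workload": {
        "Workload Manager": "Directly manages GPU workloads running in Kubernetes",
        "Agent": "Collects GPU metrics",
        "Prometheus": "Forwards metrics to the Control cluster",
        "Relay Client": "Establishes a secure connection to the Control cluster",
    },
}

# Assumed pairings: (workload-side component, control-side component, purpose).
CROSS_CLUSTER_LINKS = [
    ("Prometheus", "Thanos", "metrics"),
    ("Relay Client", "Relay Server", "secure connection"),
]


def cluster_of(component: str) -> str:
    """Return which cluster type hosts the named component."""
    for cluster, components in COMPONENTS.items():
        if component in components:
            return cluster
    raise KeyError(component)


def links_start_in_workload_cluster() -> bool:
    """Check that every assumed link runs from a Workload cluster to the Control cluster."""
    return all(
        cluster_of(src) == "workload" and cluster_of(dst) == "control"
        for src, dst, _purpose in CROSS_CLUSTER_LINKS
    )


if __name__ == "__main__":
    for src, dst, purpose in CROSS_CLUSTER_LINKS:
        print(f"{src} ({cluster_of(src)}) -> {dst} ({cluster_of(dst)}): {purpose}")
```

Under these assumptions, both links originate in the Workload cluster, which fits the statement that Workload clusters do not require ingress.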