Architecture Overview

Neurox is architected so that the Control plane components and the Workload management components can be deployed to separate Kubernetes clusters when infrastructure or networking constraints require it.

Control plane cluster

Where GPU workloads are configured and managed
Communicates with all of its joined Workload clusters
Requires ingress and persistent disk
Hosts the endpoints for web access to the Neurox Control Portal

Workload management cluster

Where GPU workloads run
Communicates with its Control plane cluster
Does not require ingress
Does not require persistent disk
Directly interacts with the Kubernetes API
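The infrastructure requirements above amount to a small per-role checklist. The following is a minimal sketch, not part of Neurox, that encodes that checklist so a candidate cluster can be checked before deployment. The Role, ClusterInfo, and missing_requirements names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    """The two cluster roles described above."""
    CONTROL_PLANE = "control-plane"
    WORKLOAD = "workload"


@dataclass(frozen=True)
class ClusterInfo:
    """Hypothetical description of a candidate Kubernetes cluster."""
    name: str
    has_ingress: bool
    has_persistent_disk: bool


# Infrastructure each role needs, taken from the lists above.
# Workload management clusters need neither ingress nor persistent disk.
REQUIREMENTS = {
    Role.CONTROL_PLANE: {"ingress": True, "persistent_disk": True},
    Role.WORKLOAD: {"ingress": False, "persistent_disk": False},
}


def missing_requirements(cluster: ClusterInfo, role: Role) -> list[str]:
    """Return the infrastructure the cluster lacks for the given role."""
    needs = REQUIREMENTS[role]
    missing = []
    if needs["ingress"] and not cluster.has_ingress:
        missing.append("ingress")
    if needs["persistent_disk"] and not cluster.has_persistent_disk:
        missing.append("persistent disk")
    return missing


if __name__ == "__main__":
    edge = ClusterInfo("gpu-edge", has_ingress=False, has_persistent_disk=False)
    print(missing_requirements(edge, Role.WORKLOAD))       # []
    print(missing_requirements(edge, Role.CONTROL_PLANE))  # ['ingress', 'persistent disk']
```

A GPU cluster with no ingress and no persistent storage qualifies as a Workload management cluster but not as a Control plane cluster, which is why the two roles can be split across clusters.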

Components

These are the components of each type of cluster.

Control plane cluster

Portal: Hosts the web app
API: Endpoint for web app requests
Redis: Persists Workload and configuration data
Thanos: Receives and aggregates metrics from all Workload clusters
Relay Server: Manages the secure connections from Workload clusters
Authentication Server: Manages user sessions
Identity Provider Connector: Manages user authentication
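Thanos is what lets the Control plane cluster present one view of metrics coming from many Workload clusters. As a toy model only (it does not use Thanos, PromQL, or Prometheus data formats), the sketch below assumes each forwarded sample carries a label naming its source cluster, similar in spirit to the external labels Prometheus can attach when forwarding, and averages GPU utilization per cluster. The Sample type, its fields, and aggregate_by_cluster are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One GPU metric sample forwarded by a Workload cluster (hypothetical shape)."""
    cluster: str        # label identifying the source Workload cluster
    gpu: str            # GPU identifier within that cluster
    utilization: float  # percent, 0 to 100


def aggregate_by_cluster(samples: list[Sample]) -> dict[str, float]:
    """Average GPU utilization per Workload cluster across all received samples."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for s in samples:
        totals[s.cluster] += s.utilization
        counts[s.cluster] += 1
    return {cluster: totals[cluster] / counts[cluster] for cluster in totals}


if __name__ == "__main__":
    received = [
        Sample("east", "gpu0", 80.0),
        Sample("east", "gpu1", 40.0),
        Sample("west", "gpu0", 90.0),
    ]
    print(aggregate_by_cluster(received))  # {'east': 60.0, 'west': 90.0}
```

Because every sample keeps its cluster label, the Control plane can both aggregate across clusters and drill down into a single one.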

Workload management cluster

Workload Manager: Directly manages GPU workloads running in Kubernetes
Agent: Collects GPU metrics
Prometheus: Forwards metrics to the Control plane cluster
Relay Client: Establishes a secure connection to the Control plane cluster
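The Relay Server and Relay Client explain why a Workload management cluster does not require ingress: the client dials out to the Control plane, and the Control plane then sends requests back over that same connection. The sketch below illustrates this general reverse-connection pattern with plain asyncio TCP on localhost. It is not the Neurox relay protocol; TLS is omitted, and the "list-workloads" message and the reply format are hypothetical stand-ins.

```python
import asyncio

HOST = "127.0.0.1"


async def main() -> str:
    """Run a toy relay server and client in one process; return the client's reply."""
    reply: asyncio.Future = asyncio.get_running_loop().create_future()

    async def handle_client(reader: asyncio.StreamReader,
                            writer: asyncio.StreamWriter) -> None:
        # Control plane side: the Workload cluster dialed in, so the server
        # pushes a request back over the connection the client opened.
        writer.write(b"list-workloads\n")
        await writer.drain()
        reply.set_result((await reader.readline()).decode().strip())
        writer.close()
        await writer.wait_closed()

    # Relay Server: the only side that must accept inbound connections.
    server = await asyncio.start_server(handle_client, HOST, 0)
    port = server.sockets[0].getsockname()[1]

    # Relay Client: makes an outbound connection only; it never listens.
    reader, writer = await asyncio.open_connection(HOST, port)
    request = (await reader.readline()).decode().strip()
    # Hypothetical handler standing in for the Workload Manager.
    writer.write(f"ok: {request}\n".encode())
    await writer.drain()

    result = await reply
    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return result


if __name__ == "__main__":
    print(asyncio.run(main()))  # ok: list-workloads
```

Only the Control plane needs a reachable endpoint; the Workload cluster needs nothing more than outbound network access to it.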

