Step 1: Create a GCP VPC that is configured to allow traffic between your cluster nodes as follows:
Inbound on the following ports:
8800 (Server Manager access)
8008 (internal health checks)
Dash Enterprise automatically configures additional firewall rules for the cluster when it is restarted in High Availability mode.
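The VPC and firewall rule from Step 1 can be sketched with the gcloud CLI as follows. This is a minimal sketch, not an official procedure. The project ID, network name, and rule name are hypothetical. TCP as the protocol and the 10.128.0.0/9 source range (the internal range used by auto-mode subnets) are assumptions to adjust to your network. The small `run` helper only prints each command unless you set APPLY=1, so you can review the commands before executing them.

```shell
#!/bin/sh
set -eu

# Hypothetical defaults; override via environment variables.
PROJECT_ID="${PROJECT_ID:-my-project}"
NETWORK="${NETWORK:-dash-vpc}"
# Assumption: auto-mode subnets, whose internal addresses fall in 10.128.0.0/9.
INTERNAL_RANGE="${INTERNAL_RANGE:-10.128.0.0/9}"

# Dry-run helper: print each command, and only execute it when APPLY=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

create_vpc() {
  run gcloud compute networks create "$NETWORK" \
    --project="$PROJECT_ID" --subnet-mode=auto
  # Allow Server Manager (8800) and internal health checks (8008) between nodes.
  run gcloud compute firewall-rules create "${NETWORK}-allow-dash-internal" \
    --project="$PROJECT_ID" \
    --network="$NETWORK" \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:8800,tcp:8008 \
    --source-ranges="$INTERNAL_RANGE"
}

create_vpc
```

Run it once to review the printed commands, then again with `APPLY=1` to create the resources.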
Step 2: Create an IAM service account (guide) with the following roles:
Kubernetes Engine Admin
Kubernetes Engine Developer
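A possible gcloud version of Step 2, assuming the hypothetical service account name dash-enterprise and project ID my-project. Kubernetes Engine Admin and Kubernetes Engine Developer correspond to the predefined roles roles/container.admin and roles/container.developer. As in the other sketches here, commands are only printed unless APPLY=1 is set.

```shell
#!/bin/sh
set -eu

# Hypothetical names; override via environment variables.
PROJECT_ID="${PROJECT_ID:-my-project}"
SA_NAME="${SA_NAME:-dash-enterprise}"
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

# Dry-run helper: print each command, and only execute it when APPLY=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

create_service_account() {
  run gcloud iam service-accounts create "$SA_NAME" \
    --project="$PROJECT_ID" --display-name="Dash Enterprise"
  # Kubernetes Engine Admin and Kubernetes Engine Developer.
  for role in roles/container.admin roles/container.developer; do
    run gcloud projects add-iam-policy-binding "$PROJECT_ID" \
      --member="serviceAccount:${SA_EMAIL}" \
      --role="$role"
  done
}

create_service_account
```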
Step 3: Create a Google Compute Engine instance (guide) and install Dash Enterprise on it by following our instructions in Dash Enterprise Single Server: On-Premise Installation on Own Server.
Ensure that you set these values when you create the instance:
The VPC you created in Step 1
The IAM service account you created in Step 2
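A sketch of creating the Step 3 instance on the VPC and with the service account from the previous steps. The instance name, zone, and cloud-platform access scope are assumptions. Machine type and boot image are left at gcloud defaults here, so set them to match the Single Server installation guide. Installing Dash Enterprise itself is not shown.

```shell
#!/bin/sh
set -eu

# Hypothetical defaults; override via environment variables.
PROJECT_ID="${PROJECT_ID:-my-project}"
ZONE="${ZONE:-us-central1-c}"
NETWORK="${NETWORK:-dash-vpc}"
SA_EMAIL="${SA_EMAIL:-dash-enterprise@${PROJECT_ID}.iam.gserviceaccount.com}"
INSTANCE="${INSTANCE:-dash-enterprise-server}"

# Dry-run helper: print each command, and only execute it when APPLY=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

create_server() {
  # Machine type and image use gcloud defaults; pick them per the install guide.
  run gcloud compute instances create "$INSTANCE" \
    --project="$PROJECT_ID" \
    --zone="$ZONE" \
    --network="$NETWORK" \
    --service-account="$SA_EMAIL" \
    --scopes=cloud-platform
}

create_server
```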
Step 4: Create a single Cloud SQL instance (guide) running PostgreSQL 11, with a private IP address on your VPC.
When creating your instance, under Configuration Options > Backups, recovery, and high availability, set Availability to High Availability.
Connect to the Postgres instance and create two databases:
Grant all privileges on both databases to the postgres user.
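Step 4 could be scripted roughly as below. Several assumptions are involved. A private IP requires private services access on the VPC, set up here with a hypothetical reserved range name. The instance name, region, and 2 vCPU / 7680 MB size are illustrative. DB_ONE and DB_TWO are hypothetical placeholders standing in for the two required database names. The GRANT statements must run from a host inside the VPC (for example, the Step 3 instance) because the instance has no public IP.

```shell
#!/bin/sh
set -eu

# Hypothetical defaults; override via environment variables.
PROJECT_ID="${PROJECT_ID:-my-project}"
REGION="${REGION:-us-central1}"
NETWORK="${NETWORK:-dash-vpc}"
SQL_INSTANCE="${SQL_INSTANCE:-dash-postgres}"
# Hypothetical placeholders for the two database names Dash Enterprise needs.
DB_ONE="${DB_ONE:-database-one}"
DB_TWO="${DB_TWO:-database-two}"
# Private IP of the instance; after creation, find it with:
#   gcloud sql instances describe "$SQL_INSTANCE"
SQL_HOST="${SQL_HOST:-SQL_PRIVATE_IP}"

# Dry-run helper: print each command, and only execute it when APPLY=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

create_postgres() {
  # One-time private services access setup so Cloud SQL can get a private IP.
  run gcloud compute addresses create "google-managed-services-${NETWORK}" \
    --project="$PROJECT_ID" --global --purpose=VPC_PEERING \
    --prefix-length=16 --network="$NETWORK"
  run gcloud services vpc-peering connect --project="$PROJECT_ID" \
    --service=servicenetworking.googleapis.com \
    --ranges="google-managed-services-${NETWORK}" --network="$NETWORK"

  # PostgreSQL 11, regional availability (high availability), private IP only.
  run gcloud sql instances create "$SQL_INSTANCE" \
    --project="$PROJECT_ID" \
    --database-version=POSTGRES_11 \
    --region="$REGION" \
    --availability-type=regional \
    --cpu=2 --memory=7680MB \
    --network="projects/${PROJECT_ID}/global/networks/${NETWORK}" \
    --no-assign-ip

  for db in "$DB_ONE" "$DB_TWO"; do
    run gcloud sql databases create "$db" \
      --project="$PROJECT_ID" --instance="$SQL_INSTANCE"
    # Must run from inside the VPC; psql prompts for the postgres password.
    run psql "host=${SQL_HOST} user=postgres dbname=postgres" \
      -c "GRANT ALL PRIVILEGES ON DATABASE \"${db}\" TO postgres;"
  done
}

create_postgres
```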
Step 5: Create a Cloud Memorystore Standard Redis instance (guide)
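A minimal sketch of Step 5 with gcloud. The instance name, region, and 1 GB size are hypothetical; `--tier=standard` selects the Standard tier, and the instance is attached to the Step 1 VPC so the cluster can reach it. Commands are printed only, unless APPLY=1.

```shell
#!/bin/sh
set -eu

# Hypothetical defaults; override via environment variables.
PROJECT_ID="${PROJECT_ID:-my-project}"
REGION="${REGION:-us-central1}"
NETWORK="${NETWORK:-dash-vpc}"
REDIS_INSTANCE="${REDIS_INSTANCE:-dash-redis}"
REDIS_SIZE_GB="${REDIS_SIZE_GB:-1}"

# Dry-run helper: print each command, and only execute it when APPLY=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

create_redis() {
  run gcloud services enable redis.googleapis.com --project="$PROJECT_ID"
  # Standard tier: replicated instance with automatic failover.
  run gcloud redis instances create "$REDIS_INSTANCE" \
    --project="$PROJECT_ID" \
    --region="$REGION" \
    --tier=standard \
    --size="$REDIS_SIZE_GB" \
    --network="projects/${PROJECT_ID}/global/networks/${NETWORK}"
}

create_redis
```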
Step 6: Decide whether you want to use the NVIDIA RAPIDS GPU in your node pool
If yes, follow the instructions in Configuring a node pool to use the NVIDIA RAPIDS GPU before proceeding
If no, proceed to Step 7
Step 7: Create a Google Kubernetes Engine cluster (guide)
Master version: choose the latest version that appears on the supported Kubernetes versions list.
Create one node pool with at least 4 nodes using the n1-standard-4 machine type
In Node Pool > More options, set Service account to the IAM service account you created in Step 2
Network: select your VPC from Step 1
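The Step 7 settings map onto a single `gcloud container clusters create` call, sketched below under assumptions: the cluster name and zone are hypothetical, and CLUSTER_VERSION defaults to `latest`, which is GKE's newest version and may be newer than the supported Kubernetes versions list allows, so override it with a supported version. `get-server-config` lists the versions GKE offers in the zone.

```shell
#!/bin/sh
set -eu

# Hypothetical defaults; override via environment variables.
PROJECT_ID="${PROJECT_ID:-my-project}"
ZONE="${ZONE:-us-central1-c}"
NETWORK="${NETWORK:-dash-vpc}"
CLUSTER="${CLUSTER:-dash-cluster}"
SA_EMAIL="${SA_EMAIL:-dash-enterprise@${PROJECT_ID}.iam.gserviceaccount.com}"
# "latest" may exceed the supported versions; set an explicit supported one.
CLUSTER_VERSION="${CLUSTER_VERSION:-latest}"

# Dry-run helper: print each command, and only execute it when APPLY=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

create_cluster() {
  # Show the Kubernetes versions GKE currently offers in this zone.
  run gcloud container get-server-config --project="$PROJECT_ID" --zone="$ZONE"
  # One node pool of 4 x n1-standard-4 nodes using the Step 2 service account.
  run gcloud container clusters create "$CLUSTER" \
    --project="$PROJECT_ID" \
    --zone="$ZONE" \
    --cluster-version="$CLUSTER_VERSION" \
    --num-nodes=4 \
    --machine-type=n1-standard-4 \
    --service-account="$SA_EMAIL" \
    --network="$NETWORK"
  # Point kubectl at the new cluster.
  run gcloud container clusters get-credentials "$CLUSTER" \
    --project="$PROJECT_ID" --zone="$ZONE"
}

create_cluster
```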
Step 8: Create a Google Container Registry repository (guide) for your Dash Enterprise app images
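Container Registry has no separate "create repository" command: a repository under gcr.io/PROJECT_ID comes into existence the first time an image is pushed to it. The sketch below therefore only enables the API and registers gcloud as a Docker credential helper; the project ID and the IMAGE:TAG naming are hypothetical.

```shell
#!/bin/sh
set -eu

# Hypothetical default; override via environment variable.
PROJECT_ID="${PROJECT_ID:-my-project}"
IMAGE_PREFIX="gcr.io/${PROJECT_ID}"

# Dry-run helper: print each command, and only execute it when APPLY=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

setup_registry() {
  run gcloud services enable containerregistry.googleapis.com --project="$PROJECT_ID"
  # Let docker push/pull to gcr.io hosts using gcloud credentials.
  run gcloud auth configure-docker --quiet
  printf 'Push app images as %s/IMAGE:TAG\n' "$IMAGE_PREFIX"
}

setup_registry
```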
On GKE, you cannot add GPUs to an existing node pool, but you can add a new node pool with GPUs to an existing cluster. Before you can assign GPU resources to a cluster, you must request GPU quota (see documentation). To create a GPU node pool, run:
gcloud container node-pools create POOL-NAME \
  --accelerator type=GPU-TYPE,count=NUMBER \
  --zone COMPUTE-ZONE \
  --cluster CLUSTER-NAME \
  --num-nodes 3 \
  --min-nodes 0 \
  --max-nodes 5 \
  --enable-autoscaling
Replace the following and adjust the node size as appropriate:
POOL-NAME: the name you choose for the node pool.
GPU-TYPE: the GPU type.
As of this writing, GPU types are
NUMBER: the number of GPUs to attach to each node in the node pool.
COMPUTE-ZONE: the compute zone in which to create the node pool, such as us-central1-c.
The cluster must already be running in this zone.
CLUSTER-NAME: the name of the cluster in which to create the node pool.
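Before filling in GPU-TYPE and COMPUTE-ZONE, it can help to check which GPU types the zone offers and where the existing cluster runs. A minimal sketch, assuming the hypothetical names used elsewhere (my-project, us-central1-c, dash-cluster); commands are printed only, unless APPLY=1.

```shell
#!/bin/sh
set -eu

# Hypothetical defaults; override via environment variables.
PROJECT_ID="${PROJECT_ID:-my-project}"
ZONE="${ZONE:-us-central1-c}"
CLUSTER="${CLUSTER:-dash-cluster}"

# Dry-run helper: print each command, and only execute it when APPLY=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

check_gpu_options() {
  # GPU types available in the zone (candidate values for GPU-TYPE).
  run gcloud compute accelerator-types list \
    --project="$PROJECT_ID" --filter="zone:${ZONE}"
  # Location of the existing cluster; it must match COMPUTE-ZONE.
  run gcloud container clusters list --project="$PROJECT_ID" \
    --filter="name=${CLUSTER}" --format="value(location)"
}

check_gpu_options
```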
For more examples of creating GPU node pools, see the GKE documentation.
After the GPU node pool has been created, install the NVIDIA GPU device drivers (guide). Dash Enterprise Kubernetes 4.1 supports CUDA 10.1, which requires a minimum driver version of 418.39. The driver installation process can take up to 10 minutes.
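One way to install the drivers is sketched below. It assumes that the GPU nodes use GKE's default Container-Optimized OS (COS) image, for which NVIDIA's driver installer DaemonSet is published in the GoogleCloudPlatform/container-engine-accelerators repository (Ubuntu nodes use a different manifest). The pod label used to watch progress is taken from that manifest and may change. The driver version it installs depends on the GKE/COS version, so confirm that it meets the 418.39 minimum.

```shell
#!/bin/sh
set -eu

# Hypothetical defaults; override via environment variables.
PROJECT_ID="${PROJECT_ID:-my-project}"
ZONE="${ZONE:-us-central1-c}"
CLUSTER="${CLUSTER:-dash-cluster}"
# Assumption: COS node image.
DRIVER_MANIFEST="https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml"

# Dry-run helper: print each command, and only execute it when APPLY=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

install_gpu_drivers() {
  run gcloud container clusters get-credentials "$CLUSTER" \
    --project="$PROJECT_ID" --zone="$ZONE"
  # DaemonSet that installs the NVIDIA driver on each GPU node.
  run kubectl apply -f "$DRIVER_MANIFEST"
  # Watch the installer pods; installation can take up to 10 minutes.
  run kubectl get pods --namespace=kube-system -l k8s-app=nvidia-driver-installer
}

install_gpu_drivers
```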
If the cluster already has a non-GPU node pool, the GPU nodes are automatically tainted (nvidia.com/gpu=present:NoSchedule) so that non-GPU workloads are not scheduled on them. This taint is not applied retroactively: if the cluster is created with GPU nodes only and non-GPU node pools are added afterwards, the existing GPU nodes stay untainted.
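To see which GPU nodes actually carry the taint, a sketch like the one below lists them by the cloud.google.com/gke-accelerator label that GKE sets on GPU nodes, along with their taint keys. Nodes without nvidia.com/gpu in the TAINTS column are open to non-GPU workloads. As elsewhere, the command is only printed unless APPLY=1.

```shell
#!/bin/sh
set -eu

# Dry-run helper: print each command, and only execute it when APPLY=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

list_gpu_node_taints() {
  # Select nodes that have the GPU accelerator label and show their taint keys.
  run kubectl get nodes -l cloud.google.com/gke-accelerator \
    -o "custom-columns=NAME:.metadata.name,TAINTS:.spec.taints[*].key"
}

list_gpu_node_taints
```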
Once this configuration is complete, return to Step 7 of the GCP Kubernetes setup and continue from there.