Google Cloud Platform resource prerequisites

  1. Create a GCP VPC that is configured to allow traffic between your cluster nodes as follows:

    1. Inbound on the following ports:

      • 80

      • 443

      • 8800 (Server Manager access)

      • 8008 (internal health checks)

      • 5432 (Postgres)

      • 6379 (Redis)

    2. Dash Enterprise will automatically configure additional firewall rules for the cluster when restarted in High Availability mode

  2. Create an IAM service account (guide) with the following Roles:

    1. Kubernetes Engine Admin

    2. Kubernetes Engine Developer

    3. Storage Admin

  3. Create a Google Compute Engine instance (guide) with Dash Enterprise installed on it according to our instructions in Dash Enterprise Single Server: On-Premise Installation on Own Server. This instance will act as the Replicated Management Node.

    1. Ensure that you set these values at the time of creation:

      1. The VPC you intend to use from Step 1

      2. The IAM service account from Step 2

  4. Create a single Cloud SQL (guide) instance using Postgres 11, with a Private IP associated with your VPC

    1. When creating your instance, under Configuration Options > Backups, recovery, and high availability, set Availability to High Availability

    2. Connect to the Postgres instance and create two databases:

      1. dashauth

      2. dash_deployment_server

    3. Grant all privileges to the postgres user for both databases

  5. Create a Cloud Memorystore Standard Redis instance (guide)

  6. Decide whether you want to use the NVIDIA RAPIDS GPU in your node pool

    1. If yes, follow the instructions in Configuring a node pool to use the NVIDIA RAPIDS GPU before proceeding

    2. If no, proceed to Step 7

  7. Create a Google Kubernetes Engine cluster (guide)

    1. Master version: choose the latest supported version from supported Kubernetes versions.

    2. Create one node pool with at least 4 nodes using the n1-standard-4 machine type

    3. In Node Pool > More options, select the IAM service account you created in Step 2 as the Service account

    4. Network: select your VPC from Step 1

  8. Create a Google Container Registry repository (guide) for your Dash Enterprise app images
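As a rough illustration of Steps 1 and 4, the sketch below prints (rather than runs) a `gcloud compute firewall-rules create` command that opens the inbound ports listed in Step 1, and the SQL statements that create the two databases and grant privileges to the postgres user. The network name, rule name, and source range are hypothetical placeholders, not values from this guide; review the printed output and adapt it before running anything.

```shell
#!/bin/sh
# Sketch only. NETWORK, RULE_NAME and SOURCE_RANGE are hypothetical
# placeholders; substitute the values for your own VPC.
NETWORK="dash-vpc"
RULE_NAME="dash-cluster-ingress"
SOURCE_RANGE="10.0.0.0/8"

# Inbound ports from Step 1.
PORTS="80 443 8800 8008 5432 6379"

# Turn "80 443 ..." into "tcp:80,tcp:443,..." for the --allow flag.
allow_list() {
  out=""
  for p in "$@"; do
    out="${out:+$out,}tcp:$p"
  done
  printf '%s\n' "$out"
}

# Print the firewall command for review instead of executing it.
firewall_cmd() {
  printf 'gcloud compute firewall-rules create %s --network=%s --direction=INGRESS --allow=%s --source-ranges=%s\n' \
    "$RULE_NAME" "$NETWORK" "$(allow_list $PORTS)" "$SOURCE_RANGE"
}

# Print the SQL for Step 4: create both databases and grant privileges.
db_sql() {
  for db in dashauth dash_deployment_server; do
    printf 'CREATE DATABASE %s;\n' "$db"
    printf 'GRANT ALL PRIVILEGES ON DATABASE %s TO postgres;\n' "$db"
  done
}

firewall_cmd
db_sql
```

Once reviewed, the SQL could be piped to a client connected to the instance's private IP, for example `db_sql | psql -h PRIVATE-IP -U postgres`, where PRIVATE-IP is a placeholder for your Cloud SQL address.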

Configuring a node pool to use the NVIDIA RAPIDS GPU

On GKE, GPUs cannot be added to an existing Kubernetes node pool, but a new node pool with GPUs can be added to an existing cluster. Before you can assign GPU resources to a cluster, you must request GPU quota (see documentation). To create a GPU node pool, run:

gcloud container node-pools create POOL-NAME \
  --accelerator type=GPU-TYPE,count=NUMBER \
  --zone COMPUTE-ZONE \
  --cluster CLUSTER-NAME \
  --num-nodes 3 \
  --enable-autoscaling \
  --min-nodes 0 \
  --max-nodes 5

Replace the following and adjust the node size as appropriate:

  1. POOL-NAME: the name you choose for the node pool.

  2. GPU-TYPE: the GPU type

    1. As of this writing, GPU types are nvidia-tesla-k80, nvidia-tesla-p100, nvidia-tesla-p4, nvidia-tesla-v100, or nvidia-tesla-t4

  3. NUMBER: the number of GPUs to attach to nodes in the node pool.

  4. COMPUTE-ZONE: the compute zone in which to create the node pool, such as us-central1-c

    1. The cluster must already be running in this zone

  5. CLUSTER-NAME: the name of the cluster in which to create the node pool
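To illustrate how the placeholders above might be filled in, the following sketch checks a chosen GPU type against the types listed above and prints the resulting command for review. All values (pool name, zone, cluster name, GPU count) are hypothetical examples. Note that `--min-nodes` and `--max-nodes` only take effect when `--enable-autoscaling` is also passed, so the sketch includes it.

```shell
#!/bin/sh
# Hypothetical example values; replace them with your own.
POOL_NAME="gpu-pool"
GPU_TYPE="nvidia-tesla-t4"
GPU_COUNT=1
COMPUTE_ZONE="us-central1-c"
CLUSTER_NAME="dash-cluster"

# Succeed only if the argument is one of the GPU types listed above.
is_known_gpu_type() {
  case "$1" in
    nvidia-tesla-k80|nvidia-tesla-p100|nvidia-tesla-p4|nvidia-tesla-v100|nvidia-tesla-t4)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Print the node pool command for review rather than executing it.
node_pool_cmd() {
  if ! is_known_gpu_type "$GPU_TYPE"; then
    echo "unknown GPU type: $GPU_TYPE" >&2
    return 1
  fi
  printf 'gcloud container node-pools create %s --accelerator type=%s,count=%s --zone %s --cluster %s --num-nodes 3 --enable-autoscaling --min-nodes 0 --max-nodes 5\n' \
    "$POOL_NAME" "$GPU_TYPE" "$GPU_COUNT" "$COMPUTE_ZONE" "$CLUSTER_NAME"
}

node_pool_cmd
```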

For more examples of creating GPU node pools, consult the GKE documentation.

After the GPU node pool has been created, install the NVIDIA GPU device drivers (guide). Dash Enterprise Kubernetes 4.1 supports CUDA 10.1, which requires a minimum driver version of 418.39. The driver installation process can take up to 10 minutes.
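One way to confirm the minimum driver requirement is a simple version comparison. The sketch below is an assumption-laden illustration, not part of the official installation procedure: it relies on `sort -V` (version sort, available in GNU coreutils), and the version strings passed at the end are made-up examples.

```shell
#!/bin/sh
# Minimum driver version required by CUDA 10.1, as stated above.
MIN_DRIVER="418.39"

# Succeeds when version $1 is greater than or equal to version $2.
# Uses `sort -V`: if $2 sorts first (or ties), then $1 >= $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report whether a given driver version meets the minimum.
check_driver() {
  if version_ge "$1" "$MIN_DRIVER"; then
    echo "driver $1: OK for CUDA 10.1"
  else
    echo "driver $1: too old, need >= $MIN_DRIVER"
  fi
}

# On a GPU node, the installed version could be read with, for example:
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader
# The versions below are illustrative inputs only.
check_driver "418.67"
check_driver "410.104"
```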

If the cluster already has a non-GPU node pool, GKE automatically taints the nodes of a newly added GPU node pool so that non-GPU workloads are not scheduled on them. The taint is not applied retroactively: if the cluster is created with GPU nodes only and non-GPU node pools are added afterwards, the existing GPU nodes remain untainted.

Once this configuration is complete, return to Step 7 of the GCP Kubernetes setup and continue from there.
