Troubleshooting Specific Issues

Many specific Dash Enterprise issues that we have encountered in the past are documented in the General Troubleshooting and Problem Solving page, so we suggest checking there first. Issues that require more advanced knowledge to troubleshoot are documented here.

Dash Enterprise contains a series of “preflight checks” that run at the start of the installation process. These are useful for diagnosing common server issues, and can be rerun at any time from the link in the Support tab of the Dash Enterprise Server Manager (Replicated UI).

Debugging Dash App Pushes

Examining the output produced during git push is the best way to debug issues encountered when pushing Dash Apps (to deploy or update them). To make this output much more verbose, two tracing options can be enabled:

Buildpack Tracing

When buildpack tracing is enabled, all commands run by the buildpack used to build the Dash App will be printed as part of the git push output. This option can be enabled and disabled individually for specific Dash Apps.

To enable buildpack tracing using the Dash App Manager, visit an app’s Settings tab and add an environment variable called BUILDPACK_XTRACE with a value of 1.

To disable buildpack tracing using the Dash App Manager, click the red "trash bin" button next to the BUILDPACK_XTRACE environment variable in the app’s Settings tab.

Administrators who can run Docker commands on Dash Enterprise’s server can enable buildpack tracing via docker exec. This method is not supported and may change in future Dash Enterprise releases.

To enable buildpack tracing via docker exec:

sudo docker exec dash dokku config:set APPNAME BUILDPACK_XTRACE=1

To disable buildpack tracing via docker exec:

sudo docker exec dash dokku config:unset APPNAME BUILDPACK_XTRACE

(Replace APPNAME with the application name in the examples above.)

Dokku Tracing

When Dokku tracing is enabled, all commands run internally by Dokku will be printed as part of the git push output. This option is enabled globally on a Dash Enterprise server and affects all app pushes until it is disabled.

To enable Dokku tracing, run this command on the Dash Enterprise server:

echo export DOKKU_TRACE=1 | sudo tee /plotly/dash/dokku/.dokkurc/DOKKU_TRACE

To disable Dokku tracing, run:

sudo rm -f /plotly/dash/dokku/.dokkurc/DOKKU_TRACE

(If you have configured a Plotly Data Directory other than /plotly in your Dash Enterprise Server Manager settings, replace /plotly above as needed.)
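If your Plotly Data Directory is not /plotly, the path substitution can be scripted. The following is a minimal sketch: the function names and the PLOTLY_DATA_DIR variable are hypothetical names chosen for illustration and are not part of Dash Enterprise.

```shell
# Hypothetical helpers (not part of Dash Enterprise).
# PLOTLY_DATA_DIR is an illustrative variable name; the default is /plotly.

# Print the DOKKU_TRACE file path for a data directory (argument or default).
dokku_trace_file() {
  printf '%s/dash/dokku/.dokkurc/DOKKU_TRACE\n' "${1:-${PLOTLY_DATA_DIR:-/plotly}}"
}

# Enable Dokku tracing for all app pushes on this server.
enable_dokku_trace() {
  echo 'export DOKKU_TRACE=1' | sudo tee "$(dokku_trace_file "$1")"
}

# Disable Dokku tracing again.
disable_dokku_trace() {
  sudo rm -f "$(dokku_trace_file "$1")"
}

# Usage on the server (the directory /data/plotly is a made-up example):
#   enable_dokku_trace /data/plotly
#   disable_dokku_trace /data/plotly
```

Because Dokku tracing is global, it is worth disabling it again as soon as the failing push has been captured.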

Troubleshooting “Application Error” from a Dash App

If you receive an “Application Error” page that is returned with a 502 or 504 status when you visit a Dash App, it’s likely that no app containers are running.

You can troubleshoot this issue by visiting the app’s overview page in the Dash App Manager, where the app’s Status will show as "Stopped". To see why, check the app’s Application Logs and Failure Logs for errors. The app’s Failure Logs are likely to show the error(s) encountered as a stack trace, which will help the app developer fix the issue.

You can also check the logs and status manually using ssh or Docker commands; see the Checking Dash App Status and Logs Manually section.
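To confirm which status code an app is returning before digging into logs, a small helper like the one below may be useful. It is a sketch only: the function name and the example URL are hypothetical, and the messages simply restate the guidance on this page.

```shell
# Hypothetical helper: map an HTTP status code to the troubleshooting path
# described on this page.
classify_app_status() {
  case "$1" in
    502|504) echo "likely no app containers running" ;;
    500)     echo "app error while serving a request" ;;
    *)       echo "not covered by this page" ;;
  esac
}

# Usage (https://dash.example.com/my-app/ is a made-up URL):
#   code=$(curl -s -o /dev/null -w '%{http_code}' https://dash.example.com/my-app/)
#   classify_app_status "$code"
```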

Troubleshooting 500 Errors from Dash Apps

If a Dash App encounters an error while serving an HTTP request, a 500 error will be returned to the user’s browser. In these cases the Dash App will usually start to render, but some content will be missing, either initially or when the user interacts with the app. The failing request will be visible in the browser’s Developer Tools.

To diagnose 500 errors, check the app logs using the Dash App Manager or manually as explained in the Checking Dash App Status and Logs Manually section. A stack trace for the error(s) should be shown, which will help the app developer fix the issue.

The last part of the Request URL shown in the browser’s Developer Tools will be logged alongside the error, so searching the logs for this text can help pinpoint the error.
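Searching a saved copy of the logs for the last part of the Request URL can be sketched as follows. The function names, the example URL, and the log file name are hypothetical; the sketch assumes the logs have already been saved to a plain text file.

```shell
# Hypothetical helper: print the last path segment of a URL, ignoring any
# query string and a single trailing slash.
last_url_segment() {
  seg_url="${1%%[?]*}"     # drop the query string, if any
  seg_url="${seg_url%/}"   # drop one trailing slash, if any
  printf '%s\n' "${seg_url##*/}"
}

# Hypothetical helper: list numbered log lines that mention that segment.
# $1 is the Request URL from the browser, $2 is a saved log file.
search_app_log() {
  grep -n -F -- "$(last_url_segment "$1")" "$2"
}

# Usage (URL and file name are made-up examples):
#   search_app_log 'https://dash.example.com/my-app/_dash-update-component' app.log
```

The stack trace usually sits close to the matching lines, so the reported line numbers are a good place to start reading.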

Networking Errors during Initial Installation of Replicated

If networking errors are shown in the status area of the Dash Enterprise Server Manager (Replicated UI on port 8800) during the initial installation of Replicated on a server, the issue is likely caused by the system’s iptables settings.

Example error: “Error while initializing daemon: Failed to initialize one or more components. Most often, this indicates an issue with networking on this server. Firewalls or problems with the container network are some common causes.”

For more information on iptables, see this tutorial, as well as some information from Replicated (intended for vendors such as Plotly but potentially useful to advanced users).

Troubleshooting steps:

  1. Check the system's iptables and look for networking restrictions (typically DROPs configured within the FORWARD or INPUT chains) using iptables -nL.

  2. If iptables shows networking restrictions, determine their source. Typically they come either from firewalld, in which case that service should be reconfigured or disabled, or from Docker being configured with the ICC: false setting, in which case Docker should be reconfigured.

  3. If no such service / configuration exists or the service / configuration has already been disabled, it's possible that the iptables chain with the DROP needs to be flushed manually.

  4. Check iptables -nL again to be sure no more undesired rules exist.
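The check in steps 1 and 4 can be narrowed to the relevant rules with a small filter. This is a minimal sketch with a hypothetical function name; it only looks at rules whose target is DROP in the INPUT and FORWARD chains and does not inspect chain policies or user-defined chains.

```shell
# Hypothetical helper: read `iptables -nL` output on stdin and print the DROP
# rules found in the INPUT and FORWARD chains, prefixed with the chain name.
list_drop_rules() {
  awk '
    $1 == "Chain" { chain = $2; next }   # remember which chain we are in
    (chain == "INPUT" || chain == "FORWARD") && $1 == "DROP" {
      print chain ": " $0
    }
  '
}

# Usage on the server:
#   sudo iptables -nL | list_drop_rules
```

Empty output means no DROP rules were found in those two chains; it does not rule out restrictions applied elsewhere.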

Slowness on AWS due to Burst Balance Depletion

If Dash Enterprise performance is slow and your server is running on Amazon Web Services (EC2), the issue may be due to a depleted burst balance. (EBS disks, the usual type of storage used for AWS EC2 virtual machines, allow a limited number of IO operations per second.) You can check the burst balance for a given disk in CloudWatch, as in the following example:

Here the burst balance was depleted for a short period on December 4th and for a longer period starting on December 6th. After troubleshooting the issue on December 10th, the burst balance was increased and performance returned to normal.

To increase burst balance, either switch to an EBS volume with provisioned IOPS or increase the EBS volume size (normally IOPS are allocated proportionally to the disk size).

For more information on this subject, see the Amazon EBS Volumes documentation.
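As a rough worked example of how IOPS scale with disk size, the sketch below assumes a General Purpose SSD (gp2) volume, for which AWS documents a baseline of 3 IOPS per GiB (minimum 100, maximum 16,000) and bursting up to 3,000 IOPS from a bucket of 5.4 million I/O credits. These figures are an assumption about the volume type rather than anything specific to Dash Enterprise, other volume types behave differently, and the function names are hypothetical.

```shell
# Assumption: gp2 volume. Baseline is 3 IOPS per GiB, clamped to 100..16000.
gp2_baseline_iops() {
  iops=$(( $1 * 3 ))
  if [ "$iops" -lt 100 ]; then iops=100; fi
  if [ "$iops" -gt 16000 ]; then iops=16000; fi
  echo "$iops"
}

# Assumption: a full bucket of 5,400,000 credits drained at a sustained
# 3,000 IOPS. Prints the approximate number of seconds until depletion.
gp2_burst_seconds() {
  base=$(gp2_baseline_iops "$1")
  if [ "$base" -ge 3000 ]; then
    echo "no burst"            # baseline already meets the burst rate
  else
    echo $(( 5400000 / (3000 - base) ))
  fi
}

# Usage: size in GiB as the only argument.
#   gp2_baseline_iops 100
#   gp2_burst_seconds 100
```

Under these assumptions, growing a volume raises the baseline and lengthens the time a sustained burst can last, which is why increasing the EBS volume size helps.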

“Error starting userland proxy: bind: address already in use” in Server Manager status and replicated-operator logs

This error indicates that a port needed by a service in a Dash Enterprise container is being used by another process on the Dash Enterprise server.

In this case an error is normally shown in the status area of the Server Manager.

Additionally, the replicated-operator logs may show the same error:

ERRO 2019-03-01T15:44:06+00:00 [replicated-operator] docker.go:178 API error (500): driver failed programming external connectivity on endpoint redis (aa02b61d08291ed2d2cc6a7c5d3082bc16eb657222e5ad3f8748c9956db4b9ce): Error starting userland proxy: listen tcp 172.17.0.1:6379: bind: address already in use

In the case shown above, a port needed by the Redis container (port 6379) was already in use. Running the sudo netstat -lnpt command on the Dash Enterprise server revealed that another (non-Dash Enterprise) installation of Redis was already listening on port 6379. Stopping and disabling that service and restarting Dash Enterprise fixed the issue.

If port 80 or 443 is shown in the message, it’s likely that a web server is running on the Dash Enterprise server. Again the netstat command can be used to figure out what process is using the port, and therefore what service started the process. Once this has been determined, stop and disable the service and restart Dash Enterprise.

Note that running other services (web servers, proxies, databases, etc.) on the same server as Dash Enterprise is not supported and may result in issues like this one.
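To pick the conflicting listener out of the netstat output, a filter along these lines may help. It is a minimal sketch with a hypothetical function name, and it assumes the usual column layout of netstat -lnpt (local address in the fourth column, state in the sixth, PID/program in the seventh).

```shell
# Hypothetical helper: read `netstat -lnpt` output on stdin and print the local
# address and PID/program of every listener on the given TCP port.
listeners_on_port() {
  awk -v port="$1" '
    $6 == "LISTEN" {
      n = split($4, parts, ":")      # handles 0.0.0.0:80 as well as :::80
      if (parts[n] == port) print $4, $7
    }
  '
}

# Usage on the server:
#   sudo netstat -lnpt | listeners_on_port 6379
```

The program name in the output indicates which service to stop and disable before restarting Dash Enterprise.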

Test Page Shown When Visiting Dash Enterprise

Instead of the expected Dash App, Portal, or Dash App Manager, a test page may be shown when visiting Dash Enterprise. Usually this means that a different web server is running on your Dash Enterprise server. Running other services (web servers, proxies, databases, etc.) on the same server as Dash Enterprise is not supported, so any extra services should be stopped and disabled.

Example test pages include:

This issue is usually accompanied by other errors such as the “Error starting userland proxy” error shown above, in which case the steps above can be followed to find the extra service that needs to be stopped and disabled.

“Last update error: no nodes available” in Server Manager

During an upgrade, the message “Last update error: no nodes available” may be seen in the Release History area of the Server Manager, or on the update page.

With this issue, the replicated-operator logs show an error of the form:

INFO 2019-02-05T22:23:02+00:00 [replicated-operator] heartbeat.go:46 Operator heartbeat failed: context deadline exceeded
WARN 2019-02-05T22:23:02+00:00 [replicated-operator] heartbeat.go:50 Operator heartbeat monitor timeout after 2m0s, disconnecting

This is caused by a bug in Replicated, which Replicated is working to address: the operator is supposed to reconnect automatically but fails to do so.

As a workaround, restart the replicated and replicated-operator services:

sudo service replicated restart
sudo service replicated-operator restart

After this you should be able to upgrade as usual.

SSL/TLS Certificate Issues

Possible symptoms of SSL/TLS certificate issues include:

  • The Dash Enterprise Server Manager reports an error of the form “Unable to validate TLS/SSL Key” or “Unable to validate TLS/SSL Certificate” on startup.

  • The HAProxy container logs show errors of the form:

    [ALERT] 289/224708 (4599) : parsing [/etc/haproxy/haproxy-ssl.cfg:2] : 'bind *:443' : inconsistencies between private key and certificate loaded from PEM file '/etc/plotly_ssl/dash_ssl_cert_and_key.pem'.

To help diagnose SSL certificate issues, use the openssl command from your Dash Enterprise server’s command line to examine your SSL certificates and keys. Certificates and keys can be found in /plotly/ssl (assuming you have configured /plotly as your Plotly Data Directory in Dash Enterprise’s settings). Certificates and keys normally exist in pairs; the certificate should have a .crt extension and the key should have a .key extension. One pair may exist on your server:

  • dash is used for the Dash Enterprise server.

To view information on a certificate, run:

sudo openssl x509 -noout -text -in /plotly/ssl/dash.crt

Check that the subject CN (or a Subject Alternative Name) matches your server’s hostname, and that the certificate has not expired.

To verify that a certificate matches a key, run the following commands and check that the output matches:

sudo openssl pkey -in /plotly/ssl/dash.key -pubout -outform pem | sha256sum
sudo openssl x509 -in /plotly/ssl/dash.crt -pubkey -noout -outform pem | sha256sum

(Note that these commands assume your Plotly Data Directory is /plotly and check the dash pair. Replace the values as appropriate to use other directories or to check other pairs.)
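The comparison above can be wrapped in a small script so that the result is reported as an exit status rather than checked by eye. This is a sketch under stated assumptions: the function names are hypothetical, the public keys are compared directly instead of through sha256sum, and the expiry check relies on the -checkend option of openssl x509.

```shell
# Hypothetical helper: exit 0 if the certificate ($1) and key ($2) share the
# same public key, 1 if they differ, 2 if either file cannot be read or parsed.
cert_matches_key() {
  cert_pub=$(openssl x509 -in "$1" -pubkey -noout -outform pem 2>/dev/null) || return 2
  key_pub=$(openssl pkey -in "$2" -pubout -outform pem 2>/dev/null) || return 2
  [ -n "$cert_pub" ] && [ "$cert_pub" = "$key_pub" ]
}

# Hypothetical helper: exit 0 if the certificate ($1) has not yet expired.
cert_not_expired() {
  openssl x509 -in "$1" -noout -checkend 0 >/dev/null 2>&1
}

# Usage, from a shell that can read the files (for example a root shell),
# assuming the Plotly Data Directory is /plotly:
#   cert_matches_key /plotly/ssl/dash.crt /plotly/ssl/dash.key && echo "pair matches"
#   cert_not_expired /plotly/ssl/dash.crt && echo "certificate is still valid"
```

Returning a distinct status for unreadable files avoids reporting a false match when both commands fail and produce empty output.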

“Unable to remove filesystem” Error When Removing Containers

If you attempt to remove a container using docker rm -f while troubleshooting an issue with that container (e.g. docker rm -f replicated-ui), you may receive an error of the form:

Error response from daemon: Unable to remove filesystem for 0bfafa146431771f6024dcb9775ef47f170edb2f1852f71916ba44209ca6120a: remove /app/docker/containers/0bfafa146431771f6024dcb9775ef47f170edb2f152f71916ba44209ca6120a/shm: device or resource busy

This issue may be caused by a system process accessing the container’s filesystem. This condition can be diagnosed and resolved by following these steps.

SAML2 Debugging: Helpful Chrome Plugin

To aid with SAML2 debugging, use this Google Chrome plugin, either by installing it in your own browser or by recommending it to the customer.

It can probably do more, but we haven’t checked it thoroughly yet.

Troubleshooting Memory Issues

Memory issues can be difficult to reproduce in a test environment. We have improved the logging generated by the uwsgi application server used by the dashauth component (which provides the authentication server) in order to help pinpoint the source of memory leaks.

Generally, Plotly’s support and engineering teams will need to be involved to diagnose and fix any memory leaks in dashauth, since such leaks represent a bug either in our code or in a third-party library that we use. These instructions are intended to help customers gather data to help us with this process, especially customers who can’t send us logs directly.