Troubleshooting Specific Issues
Many specific Dash Enterprise issues that we have encountered in the past are documented in the General Troubleshooting and Problem Solving page, so we suggest checking there first. Issues that require more advanced knowledge to troubleshoot are documented here.
Dash Enterprise contains a series of “preflight checks” that run at the start of the installation process. These are useful for diagnosing common server issues, and can be rerun at any time from the link in the Support tab of the Dash Enterprise Server Manager (Replicated UI).
Examining the output produced during git push is the best way to debug issues encountered when pushing Dash Apps (to deploy or update them). To make this output much more verbose, two tracing options can be enabled:
When buildpack tracing is enabled, all commands run by the buildpack used to build the Dash App will be printed as part of the git push output. This option can be enabled and disabled individually for specific Dash Apps.
To enable buildpack tracing using the Dash App Manager, visit an app’s Settings tab and add an environment variable called BUILDPACK_XTRACE with a value of 1.
To disable buildpack tracing using the Dash App Manager, click the red "trash bin" button next to the BUILDPACK_XTRACE environment variable in the app’s Settings tab.
Administrators who can run Docker commands on Dash Enterprise’s server can enable buildpack tracing via docker exec. This method is not supported and may change in future Dash Enterprise releases.
To enable buildpack tracing via docker exec:
sudo docker exec dash dokku config:set APPNAME BUILDPACK_XTRACE=1
To disable buildpack tracing via docker exec:
sudo docker exec dash dokku config:unset APPNAME BUILDPACK_XTRACE
(Replace APPNAME with the application name in the examples above.)
When Dokku tracing is enabled, all commands run internally by Dokku will be printed as part of the git push output. This option is enabled globally on a Dash Enterprise server and affects all app pushes until it is disabled.
To enable Dokku tracing, run this command on the Dash Enterprise server:
echo export DOKKU_TRACE=1 | sudo tee /plotly/dash/dokku/.dokkurc/DOKKU_TRACE
To disable Dokku tracing, run:
sudo rm -f /plotly/dash/dokku/.dokkurc/DOKKU_TRACE
(If you have configured a Plotly Data Directory other than /plotly in your Dash Enterprise Server Manager settings, replace /plotly above as needed.)
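For servers with a non-default Plotly Data Directory, the two commands above can be wrapped so that the path is built in one place. The following is a minimal sketch rather than a supported tool: the function names are hypothetical, and it assumes the functions are run as root (instead of going through sudo tee) and that the .dokkurc directory already exists.

```shell
# Sketch: toggle Dokku tracing for a configurable Plotly Data Directory.
# The function names are illustrative and not part of Dash Enterprise.

# Print the trace file path for a data directory (default: /plotly).
dokku_trace_file() {
    printf '%s/dash/dokku/.dokkurc/DOKKU_TRACE\n' "${1:-/plotly}"
}

# Create the trace file, which enables tracing for all app pushes.
enable_dokku_trace() {
    echo 'export DOKKU_TRACE=1' > "$(dokku_trace_file "${1:-}")"
}

# Remove the trace file, which disables tracing again.
disable_dokku_trace() {
    rm -f "$(dokku_trace_file "${1:-}")"
}

# Example (run as root), assuming a data directory of /data/plotly:
#   enable_dokku_trace /data/plotly
#   disable_dokku_trace /data/plotly
```

Because tracing is global, it is worth disabling it again as soon as the failing push has been captured.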
If you receive an “Application Error” page that is returned with 502 or 504 status when you visit a Dash App, it’s likely that no app containers are running.
You can troubleshoot this issue by visiting the app’s overview page in the Dash App Manager - the app’s Status will show as "Stopped". To see why, check the app’s Application Logs and Failure Logs for errors. The app’s Failure Logs are likely to show the error(s) encountered as a stack trace, which will help the app developer fix the issue.
If a Dash App encounters an error while serving an HTTP request, a 500 error will be returned to the user’s browser. In these cases the Dash App will usually start to render, but some content will be missing, either initially or when the user interacts with the app. The failing request will be visible in the browser’s Developer Tools.
To diagnose 500 errors, check the app logs using the Dash App Manager or manually as explained in the Checking Dash App Status and Logs Manually section. A stack trace for the error(s) should be shown, which will help the app developer fix the issue.
The last part of the Request URL shown in the browser’s Developer Tools will be logged alongside the error, so searching the logs for this text can help pinpoint the error.
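As a rough illustration of that search, the sketch below assumes the app logs have been saved to a file (app.log is a hypothetical name) and that the last part of the Request URL is _dash-update-component, the path Dash uses for callback requests. The function name is also an assumption. It prints each matching line with its line number, plus a few surrounding lines, since the stack trace is normally logged next to the request.

```shell
# Print log lines that mention a request path, with surrounding context.
# Usage: find_request_errors LOG_FILE REQUEST_PATH [CONTEXT_LINES]
# CONTEXT_LINES defaults to 20 lines before and after each match.
find_request_errors() {
    # -F treats the path as a fixed string, -n adds line numbers,
    # -C prints context lines around every match.
    grep -n -F -C "${3:-20}" -- "$2" "$1"
}

# Example:
#   find_request_errors app.log _dash-update-component
```

In the output, matching lines are prefixed with "N:" and context lines with "N-", where N is the line number in the log file.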
If networking errors are shown in the status area of the Dash Enterprise Server Manager (Replicated UI on port 8800) during the initial installation of Replicated on a server, the issue is likely caused by the system’s iptables settings.
Example error: “Error while initializing daemon: Failed to initialize one or more components. Most often, this indicates an issue with networking on this server. Firewalls or problems with the container network are some common causes.”
- 1. Check the system's iptables and look for networking restrictions (typically DROPs configured within a chain).
- 2. If networking restrictions are reflected in the iptables, determine the source of the iptables restriction. Typically these come from firewalld (in which case that service should be reconfigured or disabled), or when Docker is configured with the ICC: false setting, in which case Docker should be reconfigured.
- 3. If no such service / configuration exists or the service / configuration has already been disabled, it's possible that the iptables chain with the DROP needs to be flushed manually. Then run iptables -nL again to be sure no more undesired rules exist.
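When the iptables -nL listing is long, a small filter can make the DROP entries easier to spot. The sketch below is only an aid for reading the listing: the function name is hypothetical, and it assumes the usual iptables -nL layout in which each chain starts with a "Chain NAME (policy ...)" line. Not every DROP it reports is necessarily a problem, so each hit still needs to be judged against the steps above.

```shell
# Read `iptables -nL` output on stdin and print each DROP found,
# either as a chain's default policy or as a rule within a chain.
list_drops() {
    awk '
        /^Chain / {
            chain = $2                       # remember the current chain
            if ($0 ~ /policy DROP/) print chain ": default policy DROP"
            next
        }
        $1 == "DROP" { print chain ": " $0 } # a rule whose target is DROP
    '
}

# Example (requires root):
#   sudo iptables -nL | list_drops
```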
If Dash Enterprise performance is slow and your server is running on Amazon Web Services (EC2), the issue may be due to a depleted burst balance. (EBS disks, the usual type of storage used for AWS EC2 virtual machines, allow a limited number of IO operations per second.) You can check the burst balance for a given disk in CloudWatch, as in the following example:
Here the burst balance was depleted for a short period on December 4th and for a longer period starting on December 6th. After troubleshooting the issue on December 10th, the burst balance was increased and performance returned to normal.
To increase burst balance, either switch to an EBS volume with provisioned IOPS or increase the EBS volume size (normally IOPS are allocated proportionally to the disk size).
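To give a feel for how volume size translates into IOPS, the sketch below computes the baseline for one specific case. It assumes the volume is a General Purpose SSD of type gp2, where the baseline is 3 IOPS per GiB with a minimum of 100 and a maximum of 16000; other volume types are allocated differently, so check the type of your own volume first. The function name is hypothetical.

```shell
# Baseline IOPS for a gp2 volume of the given size in GiB (assumption:
# gp2 rules of 3 IOPS per GiB, floor of 100, ceiling of 16000).
gp2_baseline_iops() {
    iops=$(( $1 * 3 ))
    if [ "$iops" -lt 100 ]; then
        iops=100
    fi
    if [ "$iops" -gt 16000 ]; then
        iops=16000
    fi
    echo "$iops"
}

# Example: compare a 100 GiB volume with a 500 GiB volume.
#   gp2_baseline_iops 100   # prints 300
#   gp2_baseline_iops 500   # prints 1500
```

Under these assumptions, growing a volume from 100 GiB to 500 GiB raises the baseline fivefold, so the burst balance drains more slowly under the same load.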
This error indicates that a port needed by a service in a Dash Enterprise container is being used by another process on the Dash Enterprise server.
In this case an error is normally shown in the status area of the Server Manager:
Additionally, the replicated-operator logs may show the same error:
ERRO 2019-03-01T15:44:06+00:00 [replicated-operator] docker.go:178 API error (500): driver failed programming external connectivity on endpoint redis (aa02b61d08291ed2d2cc6a7c5d3082bc16eb657222e5ad3f8748c9956db4b9ce): Error starting userland proxy: listen tcp 172.17.0.1:6379: bind: address already in use
In the case shown above, a port needed by the Redis container (port 6379) was already in use. Running the sudo netstat -lnpt command on the Dash Enterprise server revealed that another (non-Dash Enterprise) installation of Redis was already listening on port 6379. Stopping and disabling the service and restarting Dash Enterprise fixed the issue.
If port 80 or 443 is shown in the message, it’s likely that a web server is running on the Dash Enterprise server. Again the netstat command can be used to figure out what process is using the port, and therefore what service started the process. Once this has been determined, stop and disable the service and restart Dash Enterprise.
Note that running other services (web servers, proxies, databases, etc.) on the same server as Dash Enterprise is not supported and may result in issues like this one.
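The netstat lookup described above can be narrowed to a single port with a short filter. This is a sketch under assumptions: the function name is hypothetical, and it relies on the usual netstat -lnpt column layout, where the local address is the fourth column, the state is the sixth, and PID/Program name is the seventh.

```shell
# Read `netstat -lnpt` output on stdin and print the PID/program name
# of every listener bound to the given TCP port.
# Usage: sudo netstat -lnpt | port_owner PORT
port_owner() {
    awk -v port="$1" '
        # Match listening sockets whose local address ends in ":PORT".
        $6 == "LISTEN" && $4 ~ (":" port "$") { print $7 }
    '
}

# Example, for the Redis conflict shown in the log above:
#   sudo netstat -lnpt | port_owner 6379
```

The PID in the output identifies the process, which in turn points to the service that needs to be stopped and disabled.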
Instead of the expected Dash App, Portal, or Dash App Manager, a test page may be shown when visiting Dash Enterprise. Usually this means that a different web server is running on your Dash Enterprise server. Running other services (web servers, proxies, databases, etc.) on the same server as Dash Enterprise is not supported, so any extra services should be stopped and disabled.
Example test pages include:
This issue is usually accompanied by other errors such as the “Error starting userland proxy” error shown above, in which case the steps above can be followed to find the extra service that needs to be stopped and disabled.
During an upgrade, the message “Last update error: no nodes available” may be seen in the Release History area of the Server Manager, or on the update page.
With this issue, the replicated-operator logs show an error of the form:
INFO 2019-02-05T22:23:02+00:00 [replicated-operator] heartbeat.go:46 Operator heartbeat failed: context deadline exceeded
WARN 2019-02-05T22:23:02+00:00 [replicated-operator] heartbeat.go:50 Operator heartbeat monitor timeout after 2m0s, disconnecting
This is caused by a bug in Replicated that its developers are working on addressing (the operator is supposed to reconnect automatically but fails to do so).
As a workaround, restart the replicated and replicated-operator services:
sudo service replicated restart
sudo service replicated-operator restart
After this you should be able to upgrade as usual.
Possible symptoms of SSL/TLS certificate issues include:
- The Dash Enterprise Server Manager reports an error of the form “Unable to validate TLS/SSL Key” or “Unable to validate TLS/SSL Certificate” on startup.
- The HAProxy container logs show errors of the form:
[ALERT] 289/224708 (4599) : parsing [/etc/haproxy/haproxy-ssl.cfg:2] : 'bind *:443' : inconsistencies between private key and certificate loaded from PEM file '/etc/plotly_ssl/dash_ssl_cert_and_key.pem'.
To help diagnose SSL certificate issues, use the openssl command from your Dash Enterprise server’s command line to examine your SSL certificates and keys. Certificates and keys can be found in /plotly/ssl (assuming you have configured /plotly as your Plotly Data Directory in Dash Enterprise’s settings). Certificates and keys normally exist in pairs; the certificate should have a .crt extension and the key should have a .key extension. One pair may exist on your server:
- dash is used for the Dash Enterprise server.
To view information on a certificate, run:
sudo openssl x509 -noout -text -in /plotly/ssl/dash.crt
Check that the subject CN (or one of the Subject Alternative Names) matches your server’s hostname, and that the certificate has not expired.
To verify that a certificate matches a key, run the following commands and check that the output matches:
sudo openssl pkey -in /plotly/ssl/dash.key -pubout -outform pem | sha256sum
sudo openssl x509 -in /plotly/ssl/dash.crt -pubkey -noout -outform pem | sha256sum
(Note that these commands assume your Plotly Data Directory is /plotly and check the dash pair. Replace the values as appropriate to use other directories or to check other pairs.)
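The comparison above can also be scripted so that the result is an exit status rather than two hashes to compare by eye. The following is a minimal sketch, not a supported tool: the function name is hypothetical, and it compares the extracted public keys directly instead of piping them through sha256sum. Files under /plotly/ssl may only be readable by root, so the function may need to be run from a root shell.

```shell
# Succeed (exit status 0) when the certificate and key share a public key.
# Usage: cert_matches_key CERT_FILE KEY_FILE
cert_matches_key() {
    # Extract the public key from the private key; fail if it is unreadable.
    key_pub=$(openssl pkey -in "$2" -pubout -outform pem 2>/dev/null) || return 1
    # Extract the public key from the certificate; fail if it is unreadable.
    crt_pub=$(openssl x509 -in "$1" -pubkey -noout -outform pem 2>/dev/null) || return 1
    # Both must be non-empty and identical.
    [ -n "$key_pub" ] && [ "$key_pub" = "$crt_pub" ]
}

# Example (as root), assuming the default data directory:
#   if cert_matches_key /plotly/ssl/dash.crt /plotly/ssl/dash.key; then
#       echo "certificate and key match"
#   else
#       echo "certificate and key do NOT match"
#   fi
```

Returning early when openssl fails avoids the trap of two empty outputs comparing as equal, which would otherwise report a match for missing or unreadable files.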
If you attempt to remove a container using docker rm -f while troubleshooting an issue with that container (e.g. docker rm -f replicated-ui), you may receive an error of the form:
Error response from daemon: Unable to remove filesystem for 0bfafa146431771f6024dcb9775ef47f170edb2f1852f71916ba44209ca6120a: remove /app/docker/containers/0bfafa146431771f6024dcb9775ef47f170edb2f152f71916ba44209ca6120a/shm: device or resource busy
It can probably do more, but we haven’t checked it thoroughly yet.
Memory issues can be difficult to reproduce in a test environment. We have improved the logging generated by the uwsgi application server used by the dashauth component (which provides the authentication server) in order to help pinpoint the source of memory leaks.
Generally Plotly’s support and engineering teams will need to be involved to diagnose and fix any memory leaks in dashauth, since they represent a bug either in our code or in a third-party library that we use. These instructions are intended to help customers gather data to help us with this process, especially customers who can’t send us logs directly.