This guide walks through an Apache DolphinScheduler cluster deployment from start to finish, and also serves as a reference for later operations such as upgrades, adding nodes, and removing nodes.
Note: DolphinScheduler does not depend on Hadoop, Hive, Spark, etc., but if your tasks require them, corresponding environment support is needed.
Upload the binary package and extract it to a directory of your choice.
Give the extracted directory a name that distinguishes it from the installation directory, for example by adding a suffix. For example:
tar -xvf apache-dolphinscheduler-3.1.7-bin.tar.gz
mv apache-dolphinscheduler-3.1.7-bin dolphinscheduler-3.1.7-origin
The '-origin' suffix indicates the original extracted binary package. When there are configuration changes later, you can modify the files in this directory and then re-execute the installation script.
Create a deployment user and configure passwordless sudo for it. For example:
# Create user (requires root login)
useradd dolphinscheduler
# Set password (passwd --stdin is specific to RHEL/CentOS-family systems)
echo "dolphinscheduler" | passwd --stdin dolphinscheduler
# Configure sudo passwordless access
sed -i '$a dolphinscheduler ALL=(ALL) NOPASSWD: ALL' /etc/sudoers
# Comment out the "Defaults requiretty" line if it is present
sed -i 's/^Defaults[[:space:]]\{1,\}requiretty/#&/' /etc/sudoers
# Modify directory permissions to grant the deployment user access to the extracted (and renamed) binary package directory
chown -R dolphinscheduler:dolphinscheduler dolphinscheduler-3.1.7-origin
Note:
SSH passwordless login is required for resource transfer between different machines. Follow these steps to configure it:
su dolphinscheduler
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# Execute the following command; otherwise, passwordless login will fail
chmod 600 ~/.ssh/authorized_keys
Note: After configuration, run ssh localhost to check that you can log in without a password.
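The steps above only authorize the key on the local machine. Since the text states that passwordless SSH is needed between machines, the public key also has to reach the other hosts. The following is a minimal sketch, assuming the standard OpenSSH `ssh-copy-id` tool is installed; the host list `ds01,ds02,ds03` and the `HOSTS` and `DRY_RUN` variables are hypothetical names introduced for illustration. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: push the deployment user's public key to every cluster host.
# HOSTS is a hypothetical comma-separated host list; replace it with your own.
# DRY_RUN=1 (the default here) prints the commands instead of running them.

copy_key_to_hosts() {
    hosts_csv=$1
    for host in $(printf '%s' "$hosts_csv" | tr ',' ' '); do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ssh-copy-id -i $HOME/.ssh/id_rsa.pub dolphinscheduler@$host"
        else
            # Prompts once for the remote password of the dolphinscheduler user.
            ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "dolphinscheduler@$host"
        fi
    done
}

copy_key_to_hosts "${HOSTS:-ds01,ds02,ds03}"
```

Run it as the dolphinscheduler user with `DRY_RUN=0` once the printed commands look right, then confirm with `ssh <host>` that no password is requested.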
Simply start ZooKeeper in the cluster.
All the following operations should be executed under the dolphinscheduler user.
After preparing the basic environment, modify the configuration files to match your machines. The configuration files are in the bin/env directory: install_env.sh and dolphinscheduler_env.sh.
The install_env.sh file defines which machines DolphinScheduler is installed on and which services run on each machine. You can find this file in the bin/env/ directory; modify it according to the comments below.
# ---------------------------------------------------------
# INSTALL MACHINE
# ---------------------------------------------------------
# A comma separated list of machine hostname or IP would be installed DolphinScheduler,
# including master, worker, api, alert. If you want to deploy in pseudo-distributed
# mode, just write a pseudo-distributed hostname
# Example for hostnames: ips="ds1,ds2,ds3,ds4,ds5", Example for IPs: ips="192.168.8.1,192.168.8.2,192.168.8.3,192.168.8.4,192.168.8.5"
# Configure the machines where DolphinScheduler will be installed.
ips=${ips:-"ds01,ds02,ds03,hadoop02,hadoop03,hadoop04,hadoop05,hadoop06,hadoop07,hadoop08"}
# Port of SSH protocol, default value is 22. For now we only support same port in all `ips` machine
# modify it if you use different ssh port
sshPort=${sshPort:-"22"}
# A comma separated list of machine hostname or IP would be installed Master server, it
# must be a subset of configuration `ips`.
# Example for hostnames: masters="ds1,ds2", Example for IPs: masters="192.168.8.1,192.168.8.2"
# Configure the machines where the Master server will be installed.
masters=${masters:-"ds01,ds02,ds03,hadoop04,hadoop05,hadoop06,hadoop07,hadoop08"}
# A comma separated list of machine <hostname>:<workerGroup> or <IP>:<workerGroup>.All hostname or IP must be a
# subset of configuration `ips`, And workerGroup have default value as `default`, but we recommend you declare behind the hosts
# Example for hostnames: workers="ds1:default,ds2:default,ds3:default", Example for IPs: workers="192.168.8.1:default,192.168.8.2:default,192.168.8.3:default"
# To configure which machines the Worker role will be installed on, you need to specify a comma-separated list of machine hostnames or IP addresses along with their corresponding worker groups in the `workers` variable. By default, all workers are placed in the `default` worker group. Additional worker groups can be configured individually through the DolphinScheduler interface.
workers=${workers:-"ds01:default,ds02:default,ds03:default,hadoop02:default,hadoop03:default,hadoop04:default,hadoop05:default,hadoop06:default,hadoop07:default,hadoop08:default"}
# A comma separated list of machine hostname or IP would be installed Alert server, it
# must be a subset of configuration `ips`.
# Example for hostname: alertServer="ds3", Example for IP: alertServer="192.168.8.3"
# To configure which machine the Alert role will be installed on, specify a single machine
alertServer=${alertServer:-"hadoop03"}
# A comma separated list of machine hostname or IP would be installed API server, it
# must be a subset of configuration `ips`.
# Example for hostname: apiServers="ds1", Example for IP: apiServers="192.168.8.1"
# To configure which machine the API role will be installed on, specify a single machine
apiServers=${apiServers:-"hadoop04"}
# The directory to install DolphinScheduler for all machine we config above. It will automatically be created by `install.sh` script if not exists.
# Do not set this configuration same as the current path (pwd). Do not add quotes to it if you using related path.
# Installation path configuration: It will be installed on all machines in the Dolphin cluster. Make sure to differentiate it from the directory where the binary package is extracted. It's preferable to include the version number for easier upgrade operations later.
installPath=${installPath:-"/opt/dolphinscheduler-3.1.5"}
# The user to deploy DolphinScheduler for all machine we config above. For now user must create by yourself before running `install.sh`
# script. The user needs to have sudo privileges and permissions to operate hdfs. If hdfs is enabled then the root directory needs
# to be created by this user
# Deployment user: Use the user created above for deployment.
deployUser=${deployUser:-"dolphinscheduler"}
# The root of zookeeper, for now DolphinScheduler default registry server is zookeeper.
# Configure the name registered to the ZooKeeper znode. If multiple DolphinScheduler clusters are configured, different names need to be configured.
zkRoot=${zkRoot:-"/dolphinscheduler"}
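The comments above require the `masters`, `workers`, `alertServer`, and `apiServers` lists to be subsets of `ips`. With long host lists this is easy to get wrong, so a small check before running the installer can help. The following is a minimal sketch, not part of DolphinScheduler itself; the function names `in_list` and `check_subset` and the example host values are hypothetical.

```shell
#!/bin/sh
# Sketch: check that every host in a role list also appears in `ips`.

# Succeeds if $2 is an element of the comma-separated list $1.
in_list() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints each host of role list $2 that is missing from $1 and
# returns non-zero if any host is missing.
# Worker entries of the form host:group are reduced to the host part.
check_subset() {
    all_ips=$1
    role_list=$2
    missing=0
    for entry in $(printf '%s' "$role_list" | tr ',' ' '); do
        host=${entry%%:*}
        if ! in_list "$all_ips" "$host"; then
            echo "not in ips: $host"
            missing=1
        fi
    done
    return "$missing"
}

# Example values (hypothetical hosts).
ips="ds01,ds02,ds03"
check_subset "$ips" "ds01,ds02" && echo "masters ok"
check_subset "$ips" "ds01:default,ds04:default" || echo "workers need fixing"
```

To check a real configuration, source bin/env/install_env.sh first and pass `$ips` together with `$masters`, `$workers`, `$alertServer`, and `$apiServers` in turn.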
You can find the dolphinscheduler_env.sh file in the bin/env/ directory. It configures the environment settings used by the services. Modify the corresponding configurations according to the following instructions:
# JDK path, must be modified
export JAVA_HOME=${JAVA_HOME:-/usr/java/jdk1.8.0_202}
# Database type, supports mysql, postgresql
export DATABASE=${DATABASE:-mysql}
export SPRING_PROFILES_ACTIVE=${DATABASE}
# Connection URL: change the hostname below; the last parameter sets the server time zone to UTC+8 (Asia/Shanghai)
export SPRING_DATASOURCE_URL="jdbc:mysql://hostname:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8&useSSL=false&serverTimezone=Asia/Shanghai"
export SPRING_DATASOURCE_USERNAME=dolphinscheduler
# If the password contains special characters, enclose it in single quotes
export SPRING_DATASOURCE_PASSWORD='xxxxxxxxxxxxx'
export SPRING_CACHE_TYPE=${SPRING_CACHE_TYPE:-none}
# Time zone used by the JVM of each role. The default is UTC; to fully support UTC+8, set it to GMT+8
export SPRING_JACKSON_TIME_ZONE=${SPRING_JACKSON_TIME_ZONE:-GMT+8}
export MASTER_FETCH_COMMAND_NUM=${MASTER_FETCH_COMMAND_NUM:-10}
export REGISTRY_TYPE=${REGISTRY_TYPE:-zookeeper}
# Configure the zookeeper address used
export REGISTRY_ZOOKEEPER_CONNECT_STRING=${REGISTRY_ZOOKEEPER_CONNECT_STRING:-hadoop01:2181,hadoop02:2181,hadoop03:2181}
# Configure some environment variables used according to your needs, install all required components by yourself
export HADOOP_HOME=${HADOOP_HOME:-/opt/cloudera/parcels/CDH/lib/hadoop}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop/conf}
export SPARK_HOME1=${SPARK_HOME1:-/opt/soft/spark1}
export SPARK_HOME2=${SPARK_HOME2:-/opt/spark-3.3.2}
export PYTHON_HOME=${PYTHON_HOME:-/opt/python-3.9.16}
export HIVE_HOME=${HIVE_HOME:-/opt/cloudera/parcels/CDH/lib/hive}
export FLINK_HOME=${FLINK_HOME:-/opt/flink-1.15.3}
export DATAX_HOME=${DATAX_HOME:-/opt/datax}
export SEATUNNEL_HOME=${SEATUNNEL_HOME:-/opt/seatunnel-2.1.3}
export CHUNJUN_HOME=${CHUNJUN_HOME:-/opt/soft/chunjun}
export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PY
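Because each `*_HOME` value above falls back to a default path, a typo or a missing installation only shows up when a task fails. The following is a minimal sketch of a pre-flight check, assuming you pass it the variables your tasks actually need; the function name `check_dirs` is hypothetical and the two example variables are taken from the defaults shown above.

```shell
#!/bin/sh
# Sketch: report NAME=path pairs whose path is not an existing directory.
# Intended as a quick check of the *_HOME values set in dolphinscheduler_env.sh.

check_dirs() {
    status=0
    for pair in "$@"; do
        name=${pair%%=*}   # text before the first '='
        path=${pair#*=}    # text after the first '='
        if [ ! -d "$path" ]; then
            echo "missing: $name -> $path"
            status=1
        fi
    done
    return "$status"
}

# Example call using two of the defaults shown above.
check_dirs \
    "JAVA_HOME=${JAVA_HOME:-/usr/java/jdk1.8.0_202}" \
    "HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop/conf}" \
    || echo "fix the paths listed above in bin/env/dolphinscheduler_env.sh"
```

Running the check on every worker machine is the more useful variant, since tasks execute there rather than on the machine you install from.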
Download the hdfs-site.xml and core-site.xml files from your Hadoop cluster and place them in the api-server/conf/ and worker-server/conf/ directories. If you run a native Apache Hadoop cluster, take these files from the conf directory of the respective component. For CDH, you can download them directly from the CDH interface.
Modify the common.properties file in the api-server/conf/ and worker-server/conf/ directories. This file mainly configures parameters related to resource uploads, such as storing DolphinScheduler's resources on HDFS. Follow the comments below to make the necessary modifications:
# Local path, mainly used to store temporary files during task execution. Ensure that the user has read and write permissions for this directory. Generally, keep the default. If you encounter permission errors during task execution indicating insufficient permissions for files in this directory, simply change the directory permissions to 777.
data.basedir.path=/tmp/dolphinscheduler
# Resource view suffixes
#resource.view.suffixs=txt,log,sh,bat,conf,cfg,py,java,sql,xml,hql,properties,json,yml,yaml,ini,js
# Location to save resources, possible values: HDFS, S3, OSS, NONE
resource.storage.type=HDFS
# Base path for resource uploads, must start with /dolphinscheduler, ensure that the user has read and write permissions for this directory
resource.storage.upload.base.path=/dolphinscheduler
# The AWS access key. if resource.storage.type=S3 or use EMR-Task, This configuration is required
resource.aws.access.key.id=minioadmin
# The AWS secret access key. if resource.storage.type=S3 or use EMR-Task, This configuration is required
resource.aws.secret.access.key=minioadmin
# The AWS Region to use. if resource.storage.type=S3 or use EMR-Task, This configuration is required
resource.aws.region=cn-north-1
# The name of the bucket. You need to create them by yourself. Otherwise, the system cannot start. All buckets in Amazon S3 share a single namespace; ensure the bucket is given a unique name.
resource.aws.s3.bucket.name=dolphinscheduler
# You need to set this parameter when private cloud s3. If S3 uses public cloud, you only need to set resource.aws.region or set to the endpoint of a public cloud such as S3.cn-north-1.amazonaws.com.cn
resource.aws.s3.endpoint=http://localhost:9000
# alibaba cloud access key id, required if you set resource.storage.type=OSS
resource.alibaba.cloud.access.key.id=<your-access-key-id>
# alibaba cloud access key secret, required if you set resource.storage.type=OSS
resource.alibaba.cloud.access.key.secret=<your-access-key-secret>
# alibaba cloud region, required if you set resource.storage.type=OSS
resource.alibaba.cloud.region=cn-hangzhou
# oss bucket name, required if you set resource.storage.type=OSS
resource.alibaba.cloud.oss.bucket.name=dolphinscheduler
# oss bucket endpoint, required if you set resource.storage.type=OSS
resource.alibaba.cloud.oss.endpoint=https://oss-cn-hangzhou.aliyuncs.com
# if resource.storage.type=HDFS, the user must have the permission to create directories under the HDFS root path
resource.hdfs.root.user=hdfs
# if resource.storage.type=S3, the value like: s3a://dolphinscheduler; if resource.storage.type=HDFS and namenode HA is enabled, you need to copy core-site.xml and hdfs-site.xml to conf dir
#
resource.hdfs.fs.defaultFS=hdfs://bigdata:8020
# whether to startup kerberos
hadoop.security.authentication.startup.state=false
# java.security.krb5.conf path
java.security.krb5.conf.path=/opt/krb5.conf
# login user from keytab username
login.user.keytab.username=hdfs-mycluster@ESZ.COM
# login user from keytab path
login.user.keytab.path=/opt/hdfs.headless.keytab
# kerberos expire time, the unit is hour
kerberos.expire.time=2
# resourcemanager port, the default value is 8088 if not specified
resource.manager.httpaddress.port=8088
# if resourcemanager HA is enabled, please set the HA IPs; if resourcemanager is single, keep this value empty
yarn.resourcemanager.ha.rm.ids=hadoop02,hadoop03
# if resourcemanager HA is enabled or not use resourcemanager, please keep the default value; If resourcemanager is single, you only need to replace ds1 to actual resourcemanager hostname
yarn.application.status.address=http://ds1:%s/ws/v1/cluster/apps/%s
# job history status url when application number threshold is reached(default 10000, maybe it was set to 1000)
yarn.job.history.status.address=http://hadoop02:19888/ws/v1/history/mapreduce/jobs/%s
# datasource encryption enable
datasource.encryption.enable=false
# datasource encryption salt
datasource.encryption.salt=!@#$%^&*
# data quality option
data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
#data-quality.error.output.path=/tmp/data-quality-error-data
# Whether hive SQL is executed in the same session
support.hive.oneSession=false
# use sudo or not, if set true, executing user is tenant user and deploy user needs sudo permissions; if set false, executing user is the deploy user and doesn't need sudo permissions
sudo.enable=true
setTaskDirToTenant.enable=false
# network interface preferred like eth0, default: empty
#dolphin.scheduler.network.interface.preferred=
# network IP gets priority, default: inner outer
#dolphin.scheduler.network.priority.strategy=default
# system env path
#dolphinscheduler.env.path=dolphinscheduler_env.sh
# development state
development.state=false
# rpc port
alert.rpc.port=50052
# set path of conda.sh
conda.path=/opt/anaconda3/etc/profile.d/conda.sh
# Task resource limit state
task.resource.limit.state=false
# mlflow task plugin preset repository
ml.mlflow.preset_repository=https://github.com/apache/dolphinscheduler-mlflow
# mlflow task plugin preset repository version
ml.mlflow.preset_repository_version="main"
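Since the same properties have to be changed in both the api-server and worker-server copies, scripting the edit keeps the two files in sync. The following is a minimal sketch, assuming the files are named common.properties and use plain `key=value` lines as shown above; the helper name `set_prop` is hypothetical. Values containing backslashes would need extra care because awk interprets escape sequences in `-v` assignments.

```shell
#!/bin/sh
# Sketch: set key=value in a properties file, replacing an existing
# uncommented line for that key or appending the line if none exists.

set_prop() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp)
    awk -v k="$key" -v v="$value" '
        BEGIN { found = 0 }
        index($0, k "=") == 1 { print k "=" v; found = 1; next }
        { print }
        END { if (!found) print k "=" v }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Apply the same setting to both copies, if they exist in the current directory.
for f in api-server/conf/common.properties worker-server/conf/common.properties; do
    if [ -f "$f" ]; then
        set_prop "$f" resource.storage.type HDFS
    fi
done
```

Commented-out lines such as `#resource.view.suffixs=...` are left untouched, because only lines that start with the exact key are replaced.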
You need to modify the conf/application.yaml file for all roles: master-server/conf/application.yaml, worker-server/conf/application.yaml, api-server/conf/application.yaml, and alert-server/conf/application.yaml. The main change is the time zone setting:
spring:
  banner:
    charset: UTF-8
  jackson:
    # Set the time zone to GMT+8, modify only this section
    time-zone: GMT+8
    date-format: "yyyy-MM-dd HH:mm:ss"
The two files service.57a50399.js and service.57a50399.js.gz are located in the api-server/ui/assets/ and ui/assets/ directories.
Navigate to each of these directories, open the files with vim, search for 15e3, and change it to 15e5. This value is the page response timeout in milliseconds: the default 15e3 is 15 seconds, and 15e5 is 1500 seconds. The longer timeout prevents page timeout errors when uploading large files.
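The same change can be scripted instead of made by hand in vim. The following is a minimal sketch and a swapped-in approach rather than the procedure described above: it edits the .js file with sed and regenerates the .js.gz copy from it instead of editing the compressed file directly. The function name `patch_timeout` is hypothetical, and the glob `service.*.js` is an assumption that covers a different hash in the file name. Note that it replaces every occurrence of 15e3 in the file, so inspect the matches with `grep -o '.\{20\}15e3.\{20\}'` first if you want to be sure only the timeout is affected.

```shell
#!/bin/sh
# Sketch: replace 15e3 (15 s) with 15e5 (1500 s) in one JS bundle
# and rebuild its .gz copy from the patched file.

patch_timeout() {
    js_file=$1
    cp "$js_file" "$js_file.bak"                      # keep a backup of the original
    sed 's/15e3/15e5/g' "$js_file.bak" > "$js_file"   # patch all occurrences
    gzip -c "$js_file" > "$js_file.gz"                # regenerate the compressed copy
}

# Patch the bundle in both directories, if present in the current directory.
for dir in api-server/ui/assets ui/assets; do
    for f in "$dir"/service.*.js; do
        if [ -f "$f" ]; then
            patch_timeout "$f"
        fi
    done
done
```

Run it from the extracted package directory before installing, so that the patched files are the ones distributed to the cluster.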
To initialize the database, follow these steps:
Driver Configuration:
Copy the MySQL driver (8.x) to the libs directory of each DolphinScheduler role, including:
api-server/libs
alert-server/libs
master-server/libs
worker-server/libs
tools/libs
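Copying the driver into the five directories listed above can be done in one loop. The following is a minimal sketch, assuming it is run from the extracted package directory; the function name `copy_driver` and the `DRIVER_JAR` variable are hypothetical, and the jar path is whatever MySQL 8.x connector file you downloaded.

```shell
#!/bin/sh
# Sketch: copy one JDBC driver jar into the libs directory of every role.

copy_driver() {
    jar=$1
    base=$2
    for role in api-server alert-server master-server worker-server tools; do
        cp "$jar" "$base/$role/libs/"
    done
}

# DRIVER_JAR is a hypothetical variable holding the path to your driver jar.
if [ -f "${DRIVER_JAR:-}" ]; then
    copy_driver "$DRIVER_JAR" .
fi
```

For example, `DRIVER_JAR=/path/to/your-mysql-driver.jar sh copy_driver.sh`, where both the jar path and the script name are placeholders.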
Database User:
Log in to MySQL with the root user.
Execute the following SQL commands (both MySQL 5 and MySQL 8 are supported):
create database `dolphinscheduler` character set utf8mb4 collate utf8mb4_general_ci;
create user 'dolphinscheduler'@'%' IDENTIFIED WITH mysql_native_password by 'your_password';
grant ALL PRIVILEGES ON dolphinscheduler.* to 'dolphinscheduler'@'%';
flush privileges;
Execute Database Upgrade Script: Run the following command to initialize the database schema:
bash tools/bin/upgrade-schema.sh
Then run the installation script:
bash ./bin/install.sh
This script uses scp to transfer all local files to the machines configured in the configuration files above. It then stops the corresponding roles on each machine and starts them again.
After the first installation, all roles will be started automatically. There's no need to start any roles separately. If any roles are not started, you can check the corresponding logs on the respective machines to identify the specific issues.
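Before digging through logs, it can help to see which roles are actually running on a machine. The following is a minimal sketch that reads `jps` output and prints the expected names that are absent. The function name `missing_roles` is hypothetical, and the Java process names `MasterServer`, `WorkerServer`, `ApiApplicationServer`, and `AlertServer` are assumptions about how the roles appear in `jps`; adjust them if your version reports different names.

```shell
#!/bin/sh
# Sketch: read `jps` output on stdin and print every expected
# process name (given as arguments) that does not appear in it.

missing_roles() {
    jps_output=$(cat)
    for name in "$@"; do
        if ! printf '%s\n' "$jps_output" | grep -qw "$name"; then
            echo "$name"
        fi
    done
}

# On a real machine, pass only the roles configured for that host in install_env.sh.
if command -v jps >/dev/null 2>&1; then
    jps | missing_roles MasterServer WorkerServer ApiApplicationServer AlertServer
fi
```

An empty result means all listed roles were found; any printed name points to the role whose log should be checked first.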
Stop all services:
bash ./bin/stop-all.sh
Start all services:
bash ./bin/start-all.sh
Start/Stop Master:
bash ./bin/dolphinscheduler-daemon.sh start master-server
bash ./bin/dolphinscheduler-daemon.sh stop master-server
Start/Stop Worker:
bash ./bin/dolphinscheduler-daemon.sh start worker-server
bash ./bin/dolphinscheduler-daemon.sh stop worker-server
Start/Stop Api:
bash ./bin/dolphinscheduler-daemon.sh start api-server
bash ./bin/dolphinscheduler-daemon.sh stop api-server
Start/Stop Alert:
bash ./bin/dolphinscheduler-daemon.sh start alert-server
bash ./bin/dolphinscheduler-daemon.sh stop alert-server
Note: Run these scripts as the user who installed DolphinScheduler to avoid permission issues.
Each service has a dolphinscheduler_env.sh file in its <service>/conf/ directory, which makes it possible to run services as independent microservices. You can configure <service>/conf/dolphinscheduler_env.sh for a service and then start it with its own environment variables using the <service>/bin/start.sh command. However, if you start the service with bin/dolphinscheduler-daemon.sh start <service>, the script first overwrites <service>/conf/dolphinscheduler_env.sh with bin/env/dolphinscheduler_env.sh and then starts the service. This lets users maintain the configuration in a single place.
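Because the daemon script replaces the per-service file, edits made only in a service's conf directory are lost on the next daemon start. The following is a minimal sketch that flags such services before that happens, assuming it is run from the installation directory; the function name `env_would_change` is hypothetical.

```shell
#!/bin/sh
# Sketch: detect services whose conf/dolphinscheduler_env.sh differs from
# bin/env/dolphinscheduler_env.sh and would therefore be overwritten.

# Succeeds if the per-service env file differs from the shared one.
env_would_change() {
    base=$1
    service=$2
    ! cmp -s "$base/bin/env/dolphinscheduler_env.sh" \
             "$base/$service/conf/dolphinscheduler_env.sh"
}

for service in master-server worker-server api-server alert-server; do
    if [ -f "./$service/conf/dolphinscheduler_env.sh" ] && env_would_change . "$service"; then
        echo "$service: per-service env file differs and would be replaced by the daemon script"
    fi
done
```

If a service is reported, either move its changes into bin/env/dolphinscheduler_env.sh or start that service with <service>/bin/start.sh instead.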
Refer to the steps above and follow these operations:
New node:
- Install and configure the JDK.
- Create a new Linux user for DolphinScheduler and configure passwordless login and permissions.
Disadvantages of this method: If DolphinScheduler runs many minute-level tasks, or real-time tasks such as Flink or Spark jobs, stopping and restarting all roles takes some time. During this period, tasks may terminate abnormally or may not be scheduled on time because the whole cluster is restarting. However, DolphinScheduler implements automatic fault tolerance and disaster recovery, so this operation is feasible. Afterwards, check that all tasks are running normally.
Refer to the steps above and follow these operations:
./dolphinscheduler-daemon.sh start master-server
./dolphinscheduler-daemon.sh start worker-server
Stop all roles on the machine to be removed using the bin/dolphinscheduler-daemon.sh script. The stop command is:
./dolphinscheduler-daemon.sh stop worker-server
Log in to the DolphinScheduler interface and observe in the "Monitor Center" whether the roles stopped on the machine have disappeared.
On the machine where you originally extracted the binary installation package and installed DolphinScheduler, edit bin/env/install_env.sh. In this configuration file, remove the machines corresponding to the offline roles.
Follow the steps above step by step. Operations that have already been performed do not need to be repeated. Below are some specific operation steps: