Installation reference
https://my.oschina.net/hackenhu/blog/4987911
I. Installing and deploying components
1. Download the installation package from the official site: https://pingcap.com/download-cn/community/
After copying the offline package to the cluster's control machine, install the TiUP component with:
tar xzvf tidb-community-server-${version}-linux-amd64.tar.gz && \
sh tidb-community-server-${version}-linux-amd64/local_install.sh && \
source /home/tidb/.bash_profile
2. Install TiUP online
1. curl --proto '=https' --tlsv1.2 -sSf https://tiup-mirrors.pingcap.com/install.sh | sh
2. source ~/.bashrc
which tiup
3. Log in again as the tidb user
3. Set up passwordless SSH login
3.1 Create an ordinary user
# create the user
#adduser tidb
# set its password
#passwd tidb
tidb2021
ssh-keygen -t rsa
cat /home/tidb/.ssh/admin_rsa.pub >> /home/tidb/.ssh/authorized_keys
chmod 600 /home/tidb/.ssh/authorized_keys
Note: the node must also be able to SSH to itself without a password.
Remote hosts:
scp -r tidb01_rsa.pub tidb@cnhuam0dp02:/home/tidb/.ssh/
cat /home/tidb/.ssh/tidb01_rsa.pub >> /home/tidb/.ssh/authorized_keys
4. Create the deploy and data directories
mkdir /home/tidb/tidb-deploy /home/tidb/tidb-data
mkdir /home/tidb/tidb-data
5. Unpack the installer
tar -zxvf tidb-community-server-v5.0.2-linux-amd64.tar.gz
sh tidb-community-server-v5.0.2-linux-amd64/local_install.sh
source /home/tidb/.bash_profile
6. Edit the deployment topology file
vi topology.yaml
global:
  user: "tidb"
  ssh_port: 22
  deploy_dir: "/home/tidb/tidb-deploy"
  data_dir: "/home/tidb/tidb-data"
monitored:
  node_exporter_port: 9100
  blackbox_exporter_port: 9115
server_configs:
  tidb:
    log.slow-threshold: 300
  tikv:
    readpool.storage.use-unified-pool: false
    readpool.coprocessor.use-unified-pool: true
  pd:
    replication.enable-placement-rules: true
    replication.location-labels: ["host"]
  tiflash:
    logger.level: "info"
pd_servers:
  - host: admin.dt.com
tidb_servers:
  - host: admin.dt.com
tikv_servers:
  - host: master.dt.com
    port: 20160
    status_port: 20180
    config:
      server.labels: { host: "logic-host-1" }
  - host: master.dt.com
    port: 20161
    status_port: 20181
    config:
      server.labels: { host: "logic-host-2" }
  - host: master.dt.com
    port: 20162
    status_port: 20182
    config:
      server.labels: { host: "logic-host-3" }
tiflash_servers:
  - host: admin.dt.com
monitoring_servers:
  - host: admin.dt.com
grafana_servers:
  - host: master.dt.com
7. Start the deployment
Download and install TiUP
curl --proto '=https' --tlsv1.2 -sSf https://tiup-mirrors.pingcap.com/install.sh | sh
curl --proto '=https' --tlsv1.2 -sSf http://172.26.160.70:53/install.sh | sh
source .bash_profile
tiup install tikv:v6.1.0
Grant sudo to the tidb user
vi /etc/sudoers
Add: tidb ALL = NOPASSWD: ALL
Save with :wq!
Check: tiup cluster check ./topology.yaml --user tidb [-p] [-i /home/tidb/.ssh/id_rsa]
Deploy: tiup cluster deploy tidb-jabil v6.0.0 ./topology.yaml --user tidb
tiup cluster start tidb-jabil --init
Start: tiup cluster start tidb-jabil
8. Start/stop commands (cd /home/tidb first)
tiup cluster start tidb-jabil
tiup cluster restart tidb-jabil
tiup cluster stop tidb-jabil
9. Verify the installation
TiDB listens on port 4000 and PD's admin port is 2379. Tools such as Navicat connect to the database on port 4000; the default password is empty.
Dashboard: http://admin.dt.com:2379/dashboard
10. Data migration web UI (tidb-lightning server mode)
/home/tidb/tidb-community-toolkit-v5.0.1-linux-amd64/bin/tidb-lightning --server-mode --status-addr :8289
11. Add TiCDC (a scale-out.yaml sketch follows this item)
tiup cluster scale-out tidb-test scale-out.yaml
List clusters
tiup cluster list
Show cluster components
tiup cluster display tidb-jabil
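A minimal scale-out.yaml sketch for the TiCDC scale-out above; the host, port, and directories are illustrative assumptions, not taken from these notes:
cdc_servers:
  - host: admin.dt.com
    port: 8300
    deploy_dir: /home/tidb/tidb-deploy/cdc-8300
    data_dir: /home/tidb/tidb-data/cdc-8300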
12. Uninstall TiDB
1. tiup cluster list   # list clusters
2. tiup cluster destroy tidb-jabil   # destroy the cluster
13. Set the user password
set password for 'root'@'%' = password('jabil@2022');
flush privileges;
14. CDC task operations
cd /home/tidb
Replicate from TiDB to MySQL (a sketch of the referenced changefeed config follows this item)
tiup ctl:v5.0.2 cdc changefeed create --pd=http://admin.dt.com:2379 --sink-uri="mysql://root:root@admin.dt.com:3306/test?max-txn-row=500" --config /home/tidb/moveDataConf/test_to_mysql.toml --changefeed-id="test-to-202-mysql"
tiup ctl:v5.0.2 cdc changefeed query -s --pd=http://admin.dt.com:2379 --changefeed-id="test-to-201-mysql"
List replication tasks
tiup ctl:v5.0.2 cdc changefeed list --pd=http://admin.dt.com:2379
List captures
tiup ctl:v5.0.2 cdc capture list --pd=http://admin.dt.com:2379
Pause / remove a task
tiup ctl:v5.0.2 cdc changefeed pause --pd=http://admin.dt.com:2379 --changefeed-id "test-to-201-mysql"
tiup ctl:v5.0.2 cdc changefeed pause --pd=http://admin.dt.com:2379 --changefeed-id test-to-201-mysql
tiup ctl:v5.0.2 cdc cli changefeed remove --pd=http://admin.dt.com:2379 --changefeed-id test-to-201-mysql
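A minimal sketch of the changefeed config referenced above (/home/tidb/moveDataConf/test_to_mysql.toml); only the file path comes from these notes, the filter rule is an illustrative assumption:
# test_to_mysql.toml
case-sensitive = true
[filter]
# replicate only tables under the test schema (assumed)
rules = ['test.*']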
15. Running a cross-database join script against TiDB
export MYSQL_PWD=root && mysql -uroot -h admin.dt.com -P4000 < 1.sql
Contents of 1.sql: select * from bdweb_db.bi_users b,test.bi_users t where t.u_id=b.u_id
16. Data migration uses DM; install DM
tiup install dm dmctl
List DM clusters: tiup dm list
tiup uninstall dm-test
Edit the initial topology file topology-dm.yaml; get the version number with: tiup list dm-master
tiup dm deploy dm-test v2.0.3 /home/tidb/topology-dm.yaml --user tidb [-p] [-i /home/tidb/.ssh/id_rsa]
Start: tiup dm start dm-test
Show DM roles: tiup dm display dm-test
Destroy the DM deployment: tiup dm destroy dm-test
tiup cluster reload
Run the data migration commands (sketches of the referenced YAML files follow this item)
1. Load the data source
tiup dmctl --master-addr 192.168.0.20:8261 operate-source create source-mysql.yaml
2. Check the task
tiup dmctl --master-addr 192.168.0.20:8261 check-task test-task.yaml
3. Start the task
tiup dmctl --master-addr 192.168.0.20:8261 start-task test-task.yaml
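Minimal sketches of the three DM files referenced above (topology-dm.yaml, source-mysql.yaml, test-task.yaml); hosts, names, and credentials are illustrative assumptions:
# topology-dm.yaml
master_servers:
  - host: 192.168.0.20
worker_servers:
  - host: 192.168.0.21
# source-mysql.yaml (the upstream MySQL)
source-id: "mysql-01"
from:
  host: "192.168.0.10"
  port: 3306
  user: "root"
  password: "******"
# test-task.yaml (full migration task into TiDB)
name: "test-task"
task-mode: "all"
target-database:
  host: "192.168.0.20"
  port: 4000
  user: "root"
  password: ""
mysql-instances:
  - source-id: "mysql-01"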
17. Monitoring
http://slave.test.com:3001/
admin/123456
18. Running SQL scripts against TiDB
mysql -h master.test.com -P 4000 -uroot -proot test < /opt/mysql/sql/del_resu.sql >> /opt/log/sql.out
mysql -h master.test.com -P 4000 -uroot -proot test -e " select dt,max(pv) from pvuv_sink GROUP BY dt order by dt desc limit 10 " >> /opt/log/sql.out
19. Data node (store) operations
tiup ctl:v6.0.0 pd -u admin.test.com:2379 -i
Delete a node (store)
store delete 1
20. Start a single role
tiup cluster start tidb-jabil -R tikv
tiup cluster start tidb-jabil -R pd
tiup cluster stop tidb-jabil -R cdc
21. Reload nodes
tiup dm reload
22. Finding logs
cat /home/tidb/tidb-bin/pd-2379/log/pd.log | grep "init cluster id"
cat /home/tidb/tidb-deploy/pd-2379/log/pd.log | grep "init cluster id"
23. Disable auto-start on boot
tiup cluster disable tidb-jabil
24. Offline deployment of v6.1.0
Mind directory permissions
chmod -R 777 /df_bigdata_cluster/software/tidb
chmod -R 777 /df_bigdata_cluster/data/tidb
Mirror package 1 (server)
tar xzvf tidb-community-server-v6.1.0-linux-amd64.tar.gz && \
sh tidb-community-server-v6.1.0-linux-amd64/local_install.sh && \
source /home/tidb/.bash_profile
Mirror package 2 (toolkit)
tar xf tidb-community-toolkit-v6.1.0-linux-amd64.tar.gz
ls -ld tidb-community-server-v6.1.0-linux-amd64 tidb-community-toolkit-v6.1.0-linux-amd64
cd tidb-community-server-v6.1.0-linux-amd64/
cp -rp keys ~/.tiup/
tiup mirror merge ../tidb-community-toolkit-v6.1.0-linux-amd64
Verify
tiup cluster check ./topology.yaml --user tidb
Deploy (takes about 5 minutes)
tiup cluster deploy tidb-jabil v6.1.0 ./topology.yaml --user tidb
tiup cluster deploy tidb-JBDP v6.1.0 ./topology.yaml --user tidb
Start
First-time initialization
tiup cluster start tidb-JBDP --init
tiup cluster start tidb-JBDP
Production cluster commands
tiup cluster restart tidb-JBDP
tiup cluster display tidb-JBDP
tiup cluster stop tidb-JBDP
tiup cluster destroy tidb-JBDP   # destroy the cluster
Restart
tiup cluster restart tidb-JBDP
Start
tiup cluster start tidb-JBDP
mkdir /data6/tidb
mkdir /data6/tidb/data
chown -R tidb:tidb /data6/tidb
mkdir /data8/tidb
mkdir /data8/tidb/data
chown -R tidb:tidb /data8/tidb
Started cluster `tidb-JBDP` successfully
The root password of TiDB database has been changed.
The new password is: 'r8=%UT1#2D0W-G7YJ4'.
Copy and record it to somewhere safe, it is only displayed once, and will not be stored.
The generated password can NOT be get and shown again.
25. Firewall configuration
Ports that need to be open
grafana 3001
pd 2379/2380
prometheus 9090/12020
tidb 4000/10080
tiflash 8889/8123/3930/20170/20292/8234
tikv 20160/20180
#pd
firewall-cmd --permanent --add-port=2379/tcp
firewall-cmd --permanent --add-port=2380/tcp
#tidb
firewall-cmd --permanent --add-port=4000/tcp
firewall-cmd --permanent --add-port=10080/tcp
#tikv
firewall-cmd --permanent --add-port=20160/tcp
firewall-cmd --permanent --add-port=20180/tcp
#tiflash
firewall-cmd --permanent --add-port=8889/tcp
firewall-cmd --permanent --add-port=8123/tcp
firewall-cmd --permanent --add-port=3930/tcp
firewall-cmd --permanent --add-port=20170/tcp
firewall-cmd --permanent --add-port=20292/tcp
firewall-cmd --permanent --add-port=8234/tcp
#grafana
firewall-cmd --permanent --add-port=3001/tcp
#prometheus
firewall-cmd --permanent --add-port=9090/tcp
firewall-cmd --permanent --add-port=12020/tcp
firewall-cmd --reload
firewall-cmd --list-port
Stop the firewall first:
service firewalld stop
service firewalld start
26. Dumpling, the data export/backup tool
Reference: https://www.modb.pro/db/331066
Get and install it
tiup install dumpling
Run
tiup dumpling ...
tiup dumpling \
-u root \
-p jabil@2022 \
-P 4000 \
-h cnhuam0testg84 \
--filetype csv \
--sql 'select * from test.aps_rt_sink' \
-t 8 \
-o /df_bigdata_cluster/data/tidb/test \
-r 200000 \
-F 256MiB
tiup dumpling \
-u root \
-p jabil@2022 \
-P 4000 \
-h cnhuam0testg84 \
--filetype sql \
-t 8 \
-o /df_bigdata_cluster/data/tidb/test \
-r 200000 \
--filter "test.*" \
-F 256MiB
27. Installing tidb-lightning and Dumpling
tiup install tidb-lightning
chmod +x tidb-lightning
tiup tidb-lightning -h
Data import (a sketch of the -config file follows this item): nohup tiup tidb-lightning -config eam-lightning.toml > nohup.out 2>&1 &
nohup /data13/tidb/deploy/tidb-community-toolkit-v6.1.0-linux-amd64/tidb-lightning -config /data13/tidb/lighting/import-conf/whirlpool-lightning.toml > /data13/tidb/tmp/nohup.out &
scp -r /data13/tidb/deploy/duming-data/esss tidb@10.114.19.66:/df_bigdata_cluster/software/tidb/tmp-data
scp -r /data13/tidb/duming-data/sbs tidb@10.114.19.66:/df_bigdata_cluster/software/tidb/tmp-data
#!/bin/bash
tiup install dumpling
Data export:
/home/tidb/.tiup/bin/tiup dumpling -u root -p jabil@2022 -P 4000 -h cnhuam0testg84 --filetype sql -t 8 -o /df_bigdata_cluster/data/tidb/bakData/whirlpool -r 200000 --filter "whirlpool_db.*" -F 256MiB >>/df_bigdata_cluster/data/tidb/log/test.out
/home/tidb/.tiup/bin/tiup dumpling -u esss -p esss#20230817 -P 4001 -h cnhuam0dp06 --filetype sql -t 8 -o /data13/tidb/deploy/duming-data/esss -r 200000 --filter "esss_db.*" -F 256MiB >>/data13/tidb/logs/dump_data.out
/home/tidb/.tiup/bin/tiup dumpling -u eam_prod -p eam#20240125 -P 4001 -h cnhuam0dp06 --filetype sql -t 8 -o /data13/tidb/deploy/duming-data/eam -r 200000 --filter "eam_prod_db.*" -F 256MiB >>/data13/tidb/logs/dump_data.out
/home/tidb/.tiup/bin/tiup dumpling -u sbs -p sbs@20230303 -P 4001 -h cnhuam0dp06 --filetype sql -t 8 -o /data13/tidb/duming-data/sbs -r 200000 --filter "sbs_db.*" -F 256MiB >>/data13/tidb/logs/dump_data.out
/home/tidb/.tiup/bin/tiup dumpling -u valeo -p vdb#20230608 -P 4001 -h cnhuam0dp06 -o /data13/tidb/duming-data/valeo/ods_aoi_image -r 200000 --filetype csv --sql 'select * from valeo_db.ods_spi_detail_camx_history where create_time>date_sub(curdate(),interval 47 day) and create_time< date_sub(curdate(),interval 18 day)' -F 100MiB --output-filename-template 'ods_spi_detail_camx.{{.Index}}'
/home/tidb/.tiup/bin/tiup dumpling -u valeo -p vdb#20230608 -P 4001 -h cnhuam0dp06 -o /data11/apps/spark/duming-data/valeo/ods_aoi_image2 -r 200000 --filetype csv --sql 'select * from valeo_db.ods_aoi_image where create_time>date_sub(curdate(),interval 55 day) and create_time< date_sub(curdate(),interval 50 day)' -F 100MiB --output-filename-template 'ods_aoi_image.{{.Index}}'
scp -r tidb@10.114.19.66:/df_bigdata_cluster/data/tidb/bakData/whirlpool /data13/tidb/lighting/dumper-data/whirlpool
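A minimal sketch of a tidb-lightning -config file such as whirlpool-lightning.toml above; the log path, sort directory, and TiDB credentials are illustrative assumptions, the data directory is the one copied by the scp command above:
# whirlpool-lightning.toml
[lightning]
level = "info"
file = "/data13/tidb/tmp/tidb-lightning.log"
[tikv-importer]
# the local backend sorts KV pairs on local disk before ingesting them into TiKV
backend = "local"
sorted-kv-dir = "/data13/tidb/tmp/sorted-kv"
[mydumper]
# directory holding the files produced by dumpling
data-source-dir = "/data13/tidb/lighting/dumper-data/whirlpool"
[tidb]
host = "cnhuam0dp06"
port = 4001
user = "root"
password = "******"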
28. Online scale-out
tiup cluster scale-out tidb-JBDP scale-out-tidb.yaml -uroot -p
Contents of scale-out-tidb.yaml:
tidb_servers:
  - host: 10.0.1.5
    ssh_port: 22
    port: 4000
    status_port: 10080
    deploy_dir: /data/deploy/install/deploy/tidb-4000
    log_dir: /data/deploy/install/log/tidb-4000
Check cluster status
tiup cluster display tidb-JBDP
Reference: https://blog.csdn.net/weixin_42241611/article/details/126785954
29. Raise the system limit on open files (no reboot required)
Edit /etc/security/limits.conf (the limits configured for tidb must not exceed the system-wide limits, and the system-wide limits must be high enough, otherwise the settings have no effect) and add:
# process/thread count
* soft nproc 131072
* hard nproc 131072
# open file handles
* soft nofile 1000000
* hard nofile 1000000
# memory locking
* soft memlock unlimited
* hard memlock unlimited
II. Operations issues
1. Error: Failed to initialize TiDB environment on remote host 'admin.dt.com' (task.env_init.failed)
caused by: Failed to create '~/.ssh' directory for user 'tidb'
caused by: Failed to execute command over SSH for 'tidb@admin.dt.com:22'
caused by: Process exited with status 1
Fix: apply this on every machine
chmod 470 /etc/sudoers
vi /etc/sudoers
Add: tidb ALL = NOPASSWD: ALL
chmod 440 /etc/sudoers
2. Starting component `ctl`: /home/tidb/.tiup/components/ctl/v5.0.1/ctl v5.0.1
Error: ctl need an explicit version, please run with `tiup ctl:<cluster-version>`
Error: run `/home/tidb/.tiup/components/ctl/v5.0.1/ctl` (wd:/home/tidb/.tiup/data/SZzAnSw) failed: exit status 1
Fix: run ctl with an explicit version, e.g. tiup ctl:v5.0.2 cdc ...
3. [CDC:ErrMySQLConnectionError]Error 1298: Unknown or incorrect time zone: 'Asia/Shanghai'
Fix: load the time zone tables into the downstream MySQL:
mysql_tzinfo_to_sql /usr/share/zoneinfo | mysql -u root mysql -p
On Windows, set the MySQL time zone in my.ini (in the MySQL installation root):
[mysqld]
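The original note stops at the [mysqld] header; a likely entry (an assumption, not recorded in the notes) is:
default-time-zone = '+08:00'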
4.Error: failed to start cdc: failed to start: master.dt.com cdc-8300.service, please check the instance's log(/home/tidb/tidb-deploy/cdc-8300/log) for more detail.: timed out waiting for port 8300 to be started after 2m0s
Verbose debug logs has been written to /home/tidb/.tiup/logs/tiup-cluster-debug-2021-06-13-17-01-07.log.
Error: run `/home/tidb/.tiup/components/cluster/v1.5.0/tiup-cluster` (wd:/home/tidb/.tiup/data/SaCg628) failed: exit status 1
Fix: reboot the machine
Regions with no leader:
tiup pd-ctl -u http://admin.test.com:2379 -d region --jq '.regions[]|select(has("leader")|not)|{id: .id,peer_stores: [.peers[].store_id]}'
Regions with missing replicas:
pd-ctl -u http://admin.test.com:2379 region --jq=".regions[] | {id: .id, peer_stores: [.peers[].store_id] | select(length != 2)}" // max-replicas is 2 in this environment
5. A wrong path causes: is not a valid SemVer string
6. Initialization reports: Failed to create new system user 'tidb' on remote host
The passwordless remote account is set up incorrectly.
7. DM initialization reports: fail to validate embed etcd config, RawCause: expected IP in URL for binding
Fix: the config file must use IP addresses, not hostnames.
8. Starting the cluster reports: Error: failed to start tidb: failed to start: admin.test.com tidb-4000.service, please check the instance's log
Fix: starting it again usually works.
9. Starting the TiDB cluster reports: failed to start tikv: failed to start:
Fix: restart the shared network on the host machine.
10. TiKV keeps reporting that PD is unreachable: retrying of unary invoker failed
Fix: reload the nodes with tiup dm reload; as a last resort, destroy and redeploy the cluster, but back up the data first.
11. failed to start grafana: failed to start: slave.test.com grafana-3001.service, please check the instance's
Fix: usually a network problem; wait and retry.
12. PD error: ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 4-DEADLINE_EXCEEDED, message: \"Deadline Exceeded\", details: [] }))"] [endpoints=admin.test.com:2379]
[2022/06/28 14:26:57.683 +08:00] [INFO] [util.rs:547] ["connecting to PD endpoint"] [endpoints=master.test.com:2379]
TiKV error: [ERROR] [etcdutil.go:71] ["failed to get cluster from remote"] [error="[PD:etcd:ErrEtcdGetCluster]could not retrieve cluster information from the given URLs: could not retrieve cluster information from the given URLs"]
Fix: do not use domain names during installation; use plain machine hostnames.
13. Mirror error: -bash: tsh: command not found
Fix: copy the mirror command from the official site and run it again.
ssh_stderr: bash: /tmp/tiup/bin/insight: Permission denied
cat /etc/fstab
umount /tmp
mount /dev/mapper/root_vg-tmp /tmp
14. Pre-install check fails on cpu-governor: CPU frequency governor is powersave, should use performance
Fix: run cpupower frequency-set --governor performance
15. Pre-install check fails: numactl not usable, bash: numactl: command not found
Fix: run yum -y install numactl.x86_64
16. 10.114.26.111 network Fail: network speed of eno6 is 100MB, too low; needs 1GB or more
17. The component `dumpling` not found (may be deleted from repository); skipped
Fix: the tidb-community-toolkit-v6 package was not merged into tiup; follow the Mirror package 2 step: tiup mirror merge ../tidb-community-toolkit-v6.1.0-linux-amd64
18. Error 1055 when creating tables / using GROUP BY
Check: show variables like '%sql_mode%'
set global sql_mode ='STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION';
Original: global sql_mode ='ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION';
TiDB must be restarted afterwards.
19. Fixing TiDB Error 1105: Out Of Memory Quota
server_configs:
  tidb:
    mem-quota-query: 10737418240  # increase the limit
The default value is 1 GB.
20. Error: Transaction is too large
Fix:
Add this parameter under the tidb section, e.g. raise performance.txn-total-size-limit from 10737418240 to 34359738368.
txn-total-size-limit
Size limit for a single TiDB transaction
Default value: 104857600
Unit: bytes
The total size of all key-value records in a single transaction must not exceed this limit. The maximum value of this option is 10737418240 (10 GB).
Note: if binlog is consumed downstream via Kafka (e.g. an arbiter cluster), this value must not exceed 1073741824 (1 GB), which is Kafka's per-message limit; larger values make Kafka report errors.
21. SSH error after restart: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods re "errorVerbose": "ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey],
Fix: the target machine's authorized_keys is missing the key. When the cluster is first deployed, tiup generates its own key pair (separate from the normal SSH key); its public key must be copied into the target machines' authorized_keys.
22. Deleting a large amount of data reports: ERROR 1105 (HY000) at line 1: Out Of Memory Quota![conn_id=676956115087820587]
Fix: adjust the parameter
server_configs:
  tidb:
    mem-quota-query: 4294967296  # increase the limit
23. Backend connection error: MySQLNonTransientConnectionException: Could not create connection to database server
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.]
2023-12-25 17:48:00.209 INFO 6897 --- [o-8089-exec-221] com.jabil.datax.utils.DBUtil : failed connection URL: jdbc:mysql://cnhuam0dp06:4001?serverTimezone=GMT%2B8&useUnicode=true&characterEncoding=UTF-8, user: apiWS, password: api@2022
2023-12-25 17:48:02.209 ERROR 6897 --- [o-8089-exec-221] com.jabil.datax.utils.RetryUtil : Exception when calling callable, about to retry (attempt 2), planned wait [2000]ms, actual wait [2000]ms, exception msg: [Code:[MYSQLErrCode-02],
Description: [The database service's IP address or port is wrong; check them, or ask the DBA to confirm that the IP and port recorded match the database's current settings].
Fix:
Queries were not closing their connections and releasing resources.
// close resources after the query finishes: DBUtil.closeDBResources(null, null, conn);
24. ERROR 8242 (HY000): 'Drop Table' is unsupported on cache tables (cached tables cannot be dropped directly)
Fix: convert it back to an ordinary table first, then drop it: ALTER TABLE dwd_indicator NOCACHE
25.tidb lightning encountered error: [Lightning:PreCheck:ErrSystemRequirementNotMet]system requirement not met: the maximum number of open file descriptors is too small, got 1024, expect greater or equal to 12590: operation not permitted
Fix: run ulimit -HSn 1000000 on every machine
26. Data import reports: Cluster doesn't have too many empty regions
Fix:
27. TiUP cannot issue commands to the cluster; reports: cluster/module.(*WaitFor).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster
Fix: check whether the control host's config cache (/home/tidb/.tiup/storage/cluster/clusters/tidb-JBDP/config-cache) and the target hosts' script directories (e.g. /data13/tidb/deploy/tidb-4000/scripts)
actually exist.
28. Connecting to the database on port 4000 fails with: rpc error: code = Canceled desc = grpc: the client connection is closing
Fix: port 4000 is occupied by another process; kill that process and tidb-server starts again automatically.
29. After scaling in TiKV, a node is no longer in the cluster but its tikv process is still running; the store must be removed: find the corresponding store id and delete it
Fix: /home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 store delete 1
30.[FATAL] [server.rs:950] ["failed to start node: StoreTombstone(\"store is tombstone\")"]
Fix: delete the corresponding store id
31. If taking a store offline fails, set its weight to 0 first, then delete it
Fix:
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 store weight 1 0 0
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 store weight 183060412 0 0
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 store weight 183062372 0 0
Delete:
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 store remove-tombstone 1
curl -X POST http://10.114.26.111:2379/pd/api/v1/store/1/state?state=Up
32. A store stays Down forever and cannot be removed with store delete; run:
Fix: /home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 unsafe remove-failed-stores store_id
33. On every TiDB restart, instances that are already tombstoned get started again and need to be pruned
Fix: tiup cluster prune tidb-JBDP
34. If starting tidb times out and the interruption leaves other services not started properly, e.g.: timed out waiting for port 4000 to be started after 2m0s
Fix: run tiup cluster start tidb-JBDP again
35. The health check fails: the required component `NgMonitoring` is not running in the cluster, so some features are unavailable.
Fix: tiup cluster start tidb-JBDP
III. TiDB operations commands
1. 2021/6/7
https://asktug.com/t/topic/92780
2. Issue search
https://search.asktug.com
3. Import / export statements
To import, create the table schemas first
tar -zxvf tidb-lightning-${version}-linux-amd64.tar.gz
tidb-lightning -d "mysql://user:password@IP:port/DBName" -t "tidb://user:password@IP:port/DBName"
mysql -h 10.114.17.165 -P 4002 -uroot -pjabil@2022
use sbs_db;
source /data13/tidb/duming-data/sbs_db-20240528.sql;
Copy the exported data
scp -r tidb@cnhuam0dp01:/data13/tidb/duming-data/sbs_db /df_bigdata_cluster/data/tidb/bakData/lightData
4. Import into the test environment
nohup /df_bigdata_cluster/software/tidb/tidb-community-server-v6.1.0-linux-amd64/tidb-lightning -config /df_bigdata_cluster/data/tidb/bakData/lightData/sbs_db/sbs-lightning.toml > /df_bigdata_cluster/data/tidb/bakData/lightData/logs/nohup.out &
nohup /df_bigdata_cluster/software/tidb/tidb-community-server-v6.1.0-linux-amd64/tidb-lightning -config /df_bigdata_cluster/data/tidb/bakData/lightData/conf/sbs-lightning.toml > /df_bigdata_cluster/data/tidb/bakData/lightData/logs/nohup.out &
5. Export table schemas
mysqldump -h cnhuam0dp09 -P 4001 -d sbs_db -uroot -pjabil#202210 >/data13/tidb/duming-data/sbs_db-20240528.sql
mysqldump -h cnhuam0dp09 -P 4001 -d valeo_db -uroot -pjabil#202210 >/data13/tidb/duming-data/valeo_db-20240528.sql
mysqldump -h cnhuam0dp09 -P 4001 -d bmwcss_db -uroot -pjabil#202210 >/data13/tidb/duming-data/bmwcss_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d dfengsys_prod_db -uroot -pjabil#202210 >/data13/tidb/duming-data/dfengsys_prod_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d eam_db -uroot -pjabil#202210 >/data13/tidb/duming-data/eam_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d eam_prod_db -uroot -pjabil#202210 >/data13/tidb/duming-data/eam_prod_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d ehs_hcas_db -uroot -pjabil#202210 >/data13/tidb/duming-data/ehs_hcas_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d esss_db -uroot -pjabil#202210 >/data13/tidb/duming-data/esss_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d etask_db -uroot -pjabil#202210 >/data13/tidb/duming-data/etask_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d hcss_db -uroot -pjabil#202210 >/data13/tidb/duming-data/hcss_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d ic_sap_db -uroot -pjabil#202210 >/data13/tidb/duming-data/ic_pro_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d ic_sap_db -uroot -pjabil#202210 >/data13/tidb/duming-data/ic_sap_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d ie_ucs_db -uroot -pjabil#202210 >/data13/tidb/duming-data/ie_ucs_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d iot_db -uroot -pjabil#202210 >/data13/tidb/duming-data/iot_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d mat_fuc_db -uroot -pjabil#202210 >/data13/tidb/duming-data/mat_fuc_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d pub_db -uroot -pjabil#202210 >/data13/tidb/duming-data/pub_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d wef_db -uroot -pjabil#202210 >/data13/tidb/duming-data/wef_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d wic_db -uroot -pjabil#202210 >/data13/tidb/duming-data/wic_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d soea_db -uroot -pjabil#202210 >/data13/tidb/duming-data/soea_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d screw_db -uroot -pjabil#202210 >/data13/tidb/duming-data/screw_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d scada_db -uroot -pjabil#202210 >/data13/tidb/duming-data/scada_db-20240531.sql
Back up the SQL files to HDFS
hdfs dfs -put *.sql /user/tidb/dbCreateSQL
Export table data
/home/tidb/.tiup/bin/tiup dumpling -u $userName -p $passwd -P 4001 -h cnhuam0dp09 -o /data13/tidb/duming-data/$dbName/$exportTabName/$partYearMonth -r 200000 --filetype csv --sql "select * from $tidb_tab where $fiterKeyName>=date_sub(curdate(),interval $fromDays day) and $fiterKeyName<= date_sub(curdate(),interval $toDays day)" -F 200MiB --output-filename-template "$4.{{.Index}}" >>/data13/tidb/logs/hue_dumpling.out
Export table data
/home/tidb/.tiup/bin/tiup dumpling -u root -p jabil#202210 -P 4001 -h cnhuam0dp09 --filetype sql -t 8 -o /data13/tidb/duming-data/sbs_db -r 200000 --filter "sbs_db.*" -F 256MiB >>/data13/tidb/logs/dump_data.out
6. Back up a database
Export the data
/home/tidb/.tiup/bin/tiup dumpling -u hcss -p hcss -P 4001 -h cnhuam0dp06 --filetype sql -t 8 -o /data13/tidb/dbBak/hcss_20231128 -r 200000 --filter "hcss_db.*" -F 256MiB >>/data13/tidb/logs/hcss.out
zip -r /data13/tidb/dbBak/hcss_20231128.zip /data13/tidb/dbBak
7. Force-start services by role
tiup cluster start tidb-JBDP -R tidb
tiup cluster restart tidb-JBDP -R tiflash
tiup cluster restart tidb-JBDP -R node_exporter
tiup cluster restart tidb-JBDP -R prometheus
tiup cluster start tidb-JBDP -R tikv
tiup cluster stop tidb-JBDP -R tikv
tiup cluster start tidb-JBDP -R prometheus
tiup cluster start tidb-JBDP -R tiflash
tiup cluster start tidb-JBDP -R grafana
tiup cluster start tidb-JBDP -R NgMonitoring
tiup cluster display tidb-JBDP
Force-start a specific node
tiup cluster start tidb-JBDP
tiup cluster start tidb-JBDP --node 10.114.26.104:20163
tiup cluster start tidb-JBDP --node 10.114.26.113:9100
tiup cluster restart tidb-JBDP -R node_exporter
IV. Optimization and tuning
1. At the table level, enable TiFlash columnar replicas; columnar storage speeds up analytical queries
Enable it only for large tables, for example:
ALTER TABLE soea_db.ods_product_bay_data SET TIFLASH REPLICA 1;
ALTER TABLE qmp_db.InternalAuditPlanReport SET TIFLASH REPLICA 1;
ALTER TABLE tb_equ_moniter_record_2022 SET TIFLASH REPLICA 1;
ALTER TABLE screw_db.dwd_tighten_basic SET TIFLASH REPLICA 1;
ALTER TABLE soea_db.ods_product_bay_data SET TIFLASH REPLICA 1;
Disable:
ALTER TABLE screw_db.dwd_tighten_basic SET TIFLASH REPLICA 0;
Check:
SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = 'soea_db' and TABLE_NAME = 'ods_product_bay_data';
Enable TiFlash:
ALTER TABLE tab SET TIFLASH REPLICA 1; # enable data replication to TiFlash
set @@session.tidb_allow_mpp=1; # enable MPP execution
set @@session.tidb_enforce_mpp=1;
2. When deleting a large amount of data fails, delete in batches (a loop sketch follows):
Delete from t where <filter condition> limit 5000
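A sketch of running that batched delete in a loop until no rows remain; the host, credentials, table, and filter are illustrative assumptions:
while :; do
  n=$(mysql -h admin.dt.com -P 4000 -uroot -proot -N -e "DELETE FROM test.t WHERE create_time < '2022-01-01' LIMIT 5000; SELECT ROW_COUNT();")
  [ "$n" -eq 0 ] && break
done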
3. Configuration parameter tuning
1. Data sync error: Transaction is too large, size: 104857819 With statement:
tiup cluster edit-config tidb-JBDP
tiup cluster show-config tidb-JBDP
tiup cluster show-config tidb-jabil
tiup cluster edit-config tidb-jabil
# change the config parameter:
performance.txn-total-size-limit: 10737418240  ## single-transaction size limit, e.g. for one-off bulk data imports
Reload to take effect:
tiup cluster reload tidb-JBDP -R tidb
tiup cluster stop tidb-JBDP -R tidb
2. # no more than 45% of the machine's memory
storage.block-cache.capacity: 32GB
# reload to take effect:
tiup cluster reload tidb-JBDP -R tikv
3. Configure temporary spill storage under the tidb section:
tmp-storage-path: /data13/tidb/tmpStorage
performance.txn-total-size-limit: 10737418240
4. Log retention days
log.file.max-days: 30
4. Handling the function output length limit:
Increase the maximum string length that GROUP_CONCAT can return.
By default MySQL limits the GROUP_CONCAT result to 1024 bytes.
SET GLOBAL group_concat_max_len = 102400;
select * from mysql.GLOBAL_VARIABLES where variable_name = 'group_concat_max_len';
5. Allow ssh-rsa public keys and reload the SSH configuration used by TiUP
echo PubkeyAcceptedKeyTypes=+ssh-rsa >> /etc/ssh/sshd_config
systemctl restart sshd
Reload SSH
tiup cluster reload --ssh system tidb-JBDP
6. Clean up logs
tiup cluster clean tidb-jabil --log
tiup cluster stop tidb-jabil
tiup cluster start tidb-jabil
tiup cluster clean tidb-JBDP --log
7. If a start fails once, run tiup cluster start tidb-JBDP again until it succeeds.
8. Failures creating or dropping tables are often caused by too many accumulated logs; clean the logs regularly:
tiup cluster clean tidb-JBDP --log
Then restart:
tiup cluster start tidb-JBDP
tiup cluster reload tidb-JBDP --role tikv --nodes 10.114.26.112
systemctl start tikv-server
tiup cluster reload tidb-JBDP
9. Check session/process status
SHOW PROCESSLIST
SELECT * FROM information_schema.processlist WHERE COMMAND='Sleep' ORDER BY db
Fewer than 50 sleeping connections is fairly healthy; otherwise errors such as communications link failure appear
10. A server shows very high IO and the TiKV log reports: Error starting HTTP server" err="accept tcp [::]:9115: accept4: too many open files in system
Fix: edit /etc/security/limits.conf (the tidb limits must not exceed the system-wide limits, and the system-wide limits must be high enough, otherwise it has no effect) and add:
# process/thread count
* soft nproc 131072
* hard nproc 131072
# open file handles
* soft nofile 1000000
* hard nofile 1000000
# memory locking
* soft memlock unlimited
* hard memlock unlimited
11. Check system IO usage
yum install -y sysstat
View IO: iostat
Real-time (e.g. while Dumpling is running):
iostat -d -m 2
Config file:
vi /home/tidb/.tiup/storage/cluster/clusters/tidb-JBDP/meta.yaml
Run: tiup cluster reload tidb-JBDP
/data13/tidb/deploy/tikv-20161/bin/tikv-server > /data11/tidb/deploy/tikv-20161/tikv.log 2>&1 &
/data13/tidb/deploy/tikv-20162/bin/tikv-server > /data12/tidb/deploy/tikv-20162/tikv.log 2>&1 &
/data13/tidb/deploy/tikv-20163/bin/tikv-server > /data13/tidb/deploy/tikv-20163/tikv.log 2>&1 &
12. Manually starting TiDB services
tidb
112
/data13/tidb/deploy/tidb-4000/bin/tidb-server >/data13/tidb/deploy/tidb-4000/log/tidb.log 2>&1 &
105
/data13/tidb/deploy/tidb-4000/bin/tidb-server >/data13/tidb/deploy/tidb-4000/log/tidb.log 2>&1 &
/data13/tidb/deploy/tikv-server-20162 start
/data13/tidb/deploy/tikv-20162/bin/tikv-server start
/data13/tidb/deploy/tikv-20163/bin/tikv-server stop
systemctl list-unit-files --type=service --state=disabled
systemctl is-enabled tikv-20163
systemctl disable tikv-20163
13. Cluster data loss
https://zhuanlan.zhihu.com/p/616686144
tiup ctl:v6.1.0 pd -u https://10.114.26.111:2379 --cacert=/home/tidb/.tiup/storage/cluster/clusters/tidb-JBDP/tls/ca.crt --key=/home/tidb/.tiup/storage/cluster/clusters/tidb-JBDP/tls/client.pem --cert=/home/tidb/.tiup/storage/cluster/clusters/tidb-JBDP/tls/client.crt -i
pd -u https://10.114.26.111:2379 --cacert=/home/tidb/.tiup/storage/cluster/clusters/tidb-JBDP/tls/ca.crt --key=/home/tidb/.tiup/storage/cluster/clusters/tidb-JBDP/tls/client.pem --cert=/home/tidb/.tiup/storage/cluster/clusters/tidb-JBDP/tls/client.crt -i » unsafe
https://www.cnblogs.com/vansky/p/9415551.html
14. Detailed TiKV store operations
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 store 5
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 member leader show
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 store delete 2
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 store remove-tombstone 183060412
Add an evict-leader scheduler to move leaders off a store that is being taken offline
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 scheduler add evict-leader-scheduler 183060412
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 store delete 183060412 183062372
Cancel:
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 store cancel-delete 1
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 store delete
17. Remove a region
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 operator remove-region 185889957
18. View configuration parameters
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 config show
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 unsafe remove-failed-stores show
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 unsafe remove-failed-stores 1
Full details:
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 config show all
19. Store and region operations
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 unsafe remove-failed-stores -s 183060412 -r 185889957
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 operator add remove-peer 185889957 183060412
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 operator add remove-peer 4743 183060412
20. View the regions stored on a store
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 region --jq='.regions[]|select(has("leader")|not)|{id: .id, peer_stores: [.peers[].store_id]}'
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 store state_name 1 tombstone
View region details:
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 region 159869
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 config show|grep limit > limit.config
21. Set region-schedule-limit
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 config set region-schedule-limit 0
pd-ctl config set region-schedule-limit 0
pd-ctl config set region-schedule-limit 0
22. Queries using jq filters
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 region --jq='.regions[] | {id: .id, peer_stores: [.peers[].store_id] | select(length as $total | map(if .==(1) then . else empty end) | length>=$total-length)}' >>tt3.json
Find regions with too few replicas
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 region --jq='.regions[] | {id: .id, peer_stores: [.peers[].store_id] | select(length as $total | map(if .==(183062372,183060412,1,14,8,5,2,183,184,16,9,15,17,179394555,179397408,179393803,179395662) then . else empty end) | length>=$total-length)}' >> tt5.json
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 region --jq=".regions[] | {id: .id, peer_stores: [.peers[].store_id] | select(length as $total | map(if .==(1,14,8,5,2,183,184,16,9,15,17,179394555,179397408,179393803,179395662) then . else empty end) | length>=$total-length)}"
Find regions with no leader
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 region --jq '.regions[] | select(.leader.store_id == null) | {id: .id, peer_stores: [.peers[].store_id], leader_store: .leader.store_id}' > noleader.json
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 region store 1
23. Remove regions held by decommissioned stores (run on every TiKV host). Use this only when TiDB cannot start; data will be lost.
On the control host:
/home/tidb/.tiup/components/ctl/v6.1.0/tikv-ctl --data-dir /data11/tidb/data/tikv-20161 unsafe-recover remove-fail-stores -s 183060412,183062372 --all-regions
/home/tidb/.tiup/components/ctl/v6.1.0/tikv-ctl --data-dir /data12/tidb/data/tikv-20162 unsafe-recover remove-fail-stores -s 183060412,183062372 --all-regions
/home/tidb/.tiup/components/ctl/v6.1.0/tikv-ctl --data-dir /data13/tidb/data/tikv-20163 unsafe-recover remove-fail-stores -s 1 --all-regions
Copy the tool to the other hosts:
scp -r /home/tidb/.tiup/components/ctl/v6.1.0/tikv-ctl tidb@cnhuam0dp03:/home/tidb/tikv-ctl
/home/tidb/tikv-ctl --data-dir /data11/tidb/data/tikv-20161 unsafe-recover remove-fail-stores -s 183060412,183062372 --all-regions
/home/tidb/tikv-ctl --data-dir /data12/tidb/data/tikv-20162 unsafe-recover remove-fail-stores -s 183060412,183062372 --all-regions
/home/tidb/tikv-ctl --data-dir /data13/tidb/data/tikv-20163 unsafe-recover remove-fail-stores -s 183060412,183062372 --all-regions
Stop TiKV:
curl http://10.114.26.111:10080/regions/23209844
tiup cluster prune tidb-JBDP
/home/tidb/.tiup/components/ctl/v6.1.0/tikv-ctl --data-dir /df_bigdata_cluster/data/tidb/tidb-data/tikv-20160/ bad-regions
set region-schedule-limit 0
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 config set max-replicas 1
tiup cluster start tidb-JBDP -N 10.114.26.112:20161
ps -ef | grep tikv-server | grep -v grep | awk '{print $2}' | xargs kill -9
ps -ef | grep pd-server | grep -v grep | awk '{print $2}' | xargs kill -9
tiup mirror clone tiup-custom-mirror-6.1.0 --tiup mirror-6.1.0
tiup mirror grant $USER
cp -rp /home/tidb/tidb-community-server-6.1.0-linux-amd64/keys /home/tidb/.tiup/
tiup mirror merge /home/tidb/tidb-community-server-6.1.0-linux-amd64
tiup mirror clone /home/tidb/tidb-community-server-6.1.0-linux-amd64 --full
tiup mirror clone tiup-custom-mirror-v6.1.0 --PCC 1.0.1 --os=linux --arch=amd64
tiup mirror merge /home/tidb/tiup-custom-mirror-v6.1.0
tiup install PCC
17. Pre-install check baseline and tuning
Node Check Result Message
---- ----- ------ -------
10.114.26.113 os-version Pass OS is CentOS Linux 7 (Core) 7.9.2009
10.114.26.113 cpu-governor Fail CPU frequency governor is powersave, should use performance
10.114.26.113 sysctl Fail net.ipv4.tcp_syncookies = 1, should be 0
10.114.26.113 sysctl Fail vm.swappiness = 1, should be 0
10.114.26.113 selinux Pass SELinux is disabled
10.114.26.113 thp Fail THP is enabled, please disable it for best performance
10.114.26.113 cpu-cores Pass number of CPU cores / threads: 64
10.114.26.113 memory Pass memory size is 262144MB
10.114.26.113 network Pass network speed of bond0 is 2000MB
10.114.26.113 network Pass network speed of eno5 is 1000MB
10.114.26.113 network Pass network speed of eno6 is 1000MB
10.114.26.113 network Pass network speed of eno7 is 1000MB
10.114.26.113 network Pass network speed of eno8 is 1000MB
10.114.26.113 command Pass numactl: policy: default
10.114.26.113 timezone Pass time zone is the same as the first PD machine: Asia/Shanghai
10.114.26.113 disk Warn mount point /data11 does not have 'noatime' option set
10.114.26.113 service Fail service firewalld is running but should be stopped
10.114.26.113 swap Warn swap is enabled, please disable it for best performance
Check: cat /proc/sys/fs/file-nr
vi /etc/sysctl.conf
fs.file-max=1000000
vm.swappiness=0
net.ipv4.tcp_syncookies=0
sysctl -p
18. Scaling TiDB out and in
View
Check:
tiup cluster check tidb-JBDP scale-out.yml --cluster --user tidb
Scale out
Note: check that the paths in the scale-out file are owned by the tidb user and that the ports are not already in use
tiup cluster scale-out tidb-JBDP scale-out-20161.yml
tiup cluster scale-out tidb-JBDP scale-out-20162.yml
Scale in
tiup cluster start tidb-JBDP --node 10.114.26.104:20163
tiup cluster stop tidb-JBDP --node 10.114.26.104:20163
tiup cluster scale-in tidb-JBDP --node 10.114.26.103:20163
tiup cluster scale-in tidb-JBDP --node 10.114.26.112:20162 --force
tiup cluster scale-in tidb-JBDP --node 10.114.26.112:20161 --force
tiup cluster scale-in --force -N
tiup cluster scale-in prod-cluster --node 192.168.4.18:20160 --force
Note: decommissioning a node can take more than 12 hours.
tiup cluster restart tidb-JBDP --node 10.114.26.112:20161
Repair TiKV
tiup cluster start tidb-JBDP -N 10.114.26.112:20162
curl --proto '=https' --tlsv1.2 -sSf https://tiup-mirrors.pingcap.com/install.sh | sh
tiup ctl:v1.14.1 pd -u http://10.114.26.111:2379 store
tiup cluster force-leader tidb-JBDP 10.114.26.112:20161
tiup cluster force-leader tidb-JBDP --node
Start
tiup cluster restart tidb-JBDP --node 10.114.26.112:20161
19. Check the TiKV process limits: cat /proc/49897/limits
TiDB optimization measures
1. For large tables (over 1,000,000 rows), create a time-based index and include a time filter in queries to avoid full-table scans;
2. When joining tables, apply filter conditions first, then join;
3. Replace UNION with UNION ALL and avoid IN and LIKE where possible;
4. Avoid JOINs where possible;
5. For heavy batch computation and analytics, load the data into Spark SQL and do the batch computation in Spark.
20. Detecting a stuck TiDB and queued/waiting DDL
Check current DDL jobs:
admin show ddl jobs;
Cancel a job:
ADMIN CANCEL DDL JOBS 1887486;
21. Set the maximum number of connections
SET GLOBAL max_connections = 350;
show variables like '%max_connections%'
Note: keep this in line with HAProxy's maxconn 1024 (maximum connections); that 1024 is roughly 3 * max_connections
Note: the HAProxy maxconn values (there may be more than one) must not exceed 2048, otherwise TiDB cannot keep up (a minimal HAProxy sketch follows)
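A minimal HAProxy listen-section sketch matching the note above; the bind port and backend servers are illustrative assumptions:
listen tidb-cluster
    bind 0.0.0.0:3390
    mode tcp
    balance leastconn
    maxconn 1024
    server tidb-1 10.114.26.111:4000 check
    server tidb-2 10.114.26.112:4000 check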
22. Set the default per-SQL memory limit
#SET GLOBAL tidb_mem_quota_query ='10737418240';
show variables like '%tidb_mem_quota_query%';
https://my.oschina.net/hackenhu/blog/4987911
一.安装部署组件
1.官网下载安装包: https://pingcap.com/download-cn/community/
将离线包发送到目标集群的中控机后,执行以下命令安装 TiUP 组件:
tar xzvf tidb-community-server-${version}-linux-amd64.tar.gz && \
sh tidb-community-server-${version}-linux-amd64/local_install.sh && \
source /home/tidb/.bash_profile
2.在线安装tiup
1.curl --proto '=https' --tlsv1.2 -sSf https://tiup-mirrors.pingcap.com/install.sh | sh
2.source ~/.bashrc
which tiup
3.重新登录tidb用户
3.创建无密码登录
2.1创建普通用户
#创建用户
#adduser tidb
#设置密码
#passwd tidb
tidb2021
ssh-keygen -t rsa
cat /home/tidb/.ssh/admin_rsa.pub >> /home/tidb/.ssh/authorized_keys
chmod 600 /home/tidb/.ssh/authorized_keys
注意:自身无密码登录
远程
scp -r tidb01_rsa.pub tidb@cnhuam0dp02:/home/tidb/.ssh/
cat /home/tidb/.ssh/tidb01_rsa.pub >> /home/tidb/.ssh/authorized_keys
4.创建部署和数据目录--
mkdir /home/tidb/tidb-deploy /home/tidb/tidb-data
mkdir /home/tidb/tidb-data
5.解压安装程序
tar -zxvf tidb-community-server-v5.0.2-linux-amd64.tar.gz
sh tidb-community-server-v5.0.2-linux-amd64/local_install.sh
source /home/tidb/.bash_profile
6.编辑部署配置文件
vi topology.yaml
global:
user: "tidb"
ssh_port: 22
deploy_dir: "/home/tidb/tidb-deploy"
data_dir: "/home/tidb/tidb-data"
monitored:
node_exporter_port: 9100
blackbox_exporter_port: 9115
server_configs:
tidb:
log.slow-threshold: 300
tikv:
readpool.storage.use-unified-pool: false
readpool.coprocessor.use-unified-pool: true
pd:
replication.enable-placement-rules: true
replication.location-labels: ["host"]
tiflash:
logger.level: "info"
pd_servers:
- host: admin.dt.com
tidb_servers:
- host: admin.dt.com
tikv_servers:
- host: master.dt.com
port: 20160
status_port: 20180
config:
server.labels: { host: "logic-host-1" }
- host: master.dt.com
port: 20161
status_port: 20181
config:
server.labels: { host: "logic-host-2" }
- host: master.dt.com
port: 20162
status_port: 20182
config:
server.labels: { host: "logic-host-3" }
tiflash_servers:
- host: admin.dt.com
monitoring_servers:
- host: admin.dt.com
grafana_servers:
- host: master.dt.com
7.开始部署
下载安装tiup
curl --proto '=https' --tlsv1.2 -sSf https://tiup-mirrors.pingcap.com/install.sh | sh
curl --proto '=https' --tlsv1.2 -sSf http:172.26.160.70:53/install.sh | sh
source .bash_profile
tiup install tikv:v6.1.0
tidb获取sudo权限
vi /etc/sudoers
增加:tidb ALL = NOPASSWD: ALL
输入:wq! 保存
检测: tiup cluster check ./topology.yaml --user tidb [-p] [-i /home/tidb/.ssh/id_rsa]
执行部署命令: tiup cluster deploy tidb-jabil v6.0.0 ./topology.yaml --user tidb
tiup cluster start tidb-jabil --init
启动:tiup cluster start tidb-jabil
tiup cluster check ./topology.yaml --user tidb [-p] [-i /home/tidb/.ssh/id_rsa]
tiup cluster deploy tidb-jabil v6.0.0 ./topology.yaml --user tidb
tiup cluster start tidb-jabil
tiup cluster start tidb-jabil --init
8. 启动/停止 命令 cd /home/tidb
tiup cluster start tidb-jabil
tiup cluster restart tidb-jabil
tiup cluster stop tidb-jabil
9.检查安装成功
可以看到tidb的端口号是4000,pd运维端口是2379。我们通过Navicat这种工具连接数据库是使用4000端口,默认密码为空。
控制台:http://admin.dt.com:2379/dashboard
10.数据迁移界面 模式
/home/tidb/tidb-community-toolkit-v5.0.1-linux-amd64/bin/tidb-lightning --server-mode --status-addr :8289
11.增加TiCDC
tiup cluster scale-out tidb-test scale-out.yaml
查看集群列表
tiup cluster list
查看组件
tiup cluster display tidb-jabil
12.卸载tidb
1、tiup cluster list 查看集群列表
2、tiup cluster destroy tidb-jabil 来进行销毁
13. 配置用户密码
set password for 'root'@'%' = password('jabil@2022');
flush privileges;
14.cdc任务操作
cd /home/tidb
tidb同步到mysql
tiup ctl:v5.0.2 cdc changefeed create --pd=http://admin.dt.com:2379 --sink-uri="mysql://root:root@admin.dt.com:3306/test?max-txn-row=500" --config /home/tidb/moveDataConf/test_to_mysql.toml --changefeed-id="test-to-202-mysql"
tiup ctl:v5.0.2 cdc changefeed query -s --pd=http://admin.dt.com:2379 --changefeed-id="test-to-201-mysql"
查询同步任务
tiup ctl:v5.0.2 cdc changefeed list --pd=http://admin.dt.com:2379
查询 capture 列表
tiup ctl:v5.0.2 cdc capture list --pd=http://admin.dt.com:2379
停止任务
tiup ctl:v5.0.2 cdc changefeed pause --pd=http://admin.dt.com:2379 --changefeed-id "test-to-201-mysql"
tiup ctl:v5.0.2 cdc changefeed pause --pd=http://admin.dt.com:2379 --changefeed-id test-to-201-mysql
tiup ctl:v5.0.2 cdc cli changefeed remove --pd=http://admin.dt.com:2379 --changefeed-id test-to-201-mysql
15. TIDB执行跨库关联脚本
export MYSQL_PWD=root && mysql -uroot -h admin.dt.com -P4000 < 1.sql
1.sql内容:select * from bdweb_db.bi_users b,test.bi_users t where t.u_id=b.u_id
16.数据迁移需要用DM,安装DM
tiup install dm dmctl
查看DM tiup dm list
tiup uninstall dm-test
编辑初始化配置文件topology-dm.yaml , 获取版本号:tiup list dm-master
tiup dm deploy dm-test v2.0.3 /home/tidb/topology-dm.yaml --user tidb [-p] [-i /home/tidb/.ssh/id_rsa]
启动 tiup dm start dm-test
查看dm角色 tiup dm display dm-test
删除部署dm tiup dm destroy dm-test
tiup cluster reload
执行数据迁移命令
1.加载数据源
tiup dmctl --master-addr 192.168.0.20:8261 operate-source create source-mysql.yaml
2.核对加载任务
tiup dmctl --master-addr 192.168.0.20:8261 check-task test-task.yaml
3.启动任务
tiup dmctl --master-addr 192.168.0.20:8261 start-task test-task.yaml
17.监控
http://slave.test.com:3001/
admin/123456
18.执行tidb脚本命令
mysql -h master.test.com -P 4000 -uroot -proot test < /opt/mysql/sql/del_resu.sql >> /opt/log/sql.out
mysql -h master.test.com -P 4000 -uroot -proot test -e " select dt,max(pv) from pvuv_sink GROUP BY dt order by dt desc limit 10 " >> /opt/log/sql.out
19.数据节点操作
tiup ctl:v6.0.0 pd -u admin.test.com:2379 -i
删除某个节点
store delete 1
20. 单独启动某个角色
tiup cluster start tidb-jabil -R tikv
tiup cluster start tidb-jabil -R pd
tiup cluster stop tidb-jabil -R cdc
21.重启节点
tiup dm reload
22.查找日志
cat /home/tidb/tidb-bin/pd-2379/log/pd.log | grep "init cluster id"
cat /home/tidb/tidb-deploy/pd-2379/log/pd.log | grep "init cluster id"
23.禁止开机启动
tiup cluster disable tidb-jabil
24.离线部署v6.1.0
注意权限
chmod -R 777 /df_bigdata_cluster/software/tidb
chmod -R 777 /df_bigdata_cluster/data/tidb
镜像一
tar xzvf tidb-community-server-v6.1.0-linux-amd64.tar.gz && \
sh tidb-community-server-v6.1.0-linux-amd64/local_install.sh && \
source /home/tidb/.bash_profile
镜像二
tar xf tidb-community-toolkit-v6.1.0-linux-amd64.tar.gz
ls -ld tidb-community-server-v6.1.0-linux-amd64 tidb-community-toolkit-v6.1.0-linux-amd64
cd tidb-community-server-v6.1.0-linux-amd64/
cp -rp keys ~/.tiup/
tiup mirror merge ../tidb-community-toolkit-v6.1.0-linux-amd64
验证
tiup cluster check ./topology.yaml --user tidb
部署--需要5分钟左右
tiup cluster deploy tidb-jabil v6.1.0 ./topology.yaml --user tidb
tiup cluster deploy tidb-JBDP v6.1.0 ./topology.yaml --user tidb
启动
第一次初始化
tiup cluster start tidb-JBDP --init
tiup cluster start tidb-JBDP
生产集群命令
tiup cluster restart tidb-JBDP
tiup cluster display tidb-JBDP
tiup cluster stop tidb-JBDP
tiup cluster destroy tidb-JBDP 来进行销毁
重启
tiup cluster restart tidb-JBDP
开始启动
tiup cluster start tidb-JBDP
mkdir /data6/tidb
mkdir /data6/tidb/data
chown -R tidb:tidb /data6/tidb
mkdir /data8/tidb
mkdir /data8/tidb/data
chown -R tidb:tidb /data8/tidb
Started cluster `tidb-JBDP` successfully
The root password of TiDB database has been changed.
The new password is: 'r8=%UT1#2D0W-G7YJ4'.
Copy and record it to somewhere safe, it is only displayed once, and will not be stored.
The generated password can NOT be get and shown again.
25.防火墙设计
需要开启端口
grafana 3001
pd 2379/2380
prometheus 9090/12020
tidb 4000/10080
tiflash 8889/8123/3930/20170/20292/8234
tikv 20160/20180
#pd
firewall-cmd --permanent --add-port=2379/tcp
firewall-cmd --permanent --add-port=2380/tcp
#tidb
firewall-cmd --permanent --add-port=4000/tcp
firewall-cmd --permanent --add-port=10080/tcp
#tikv
firewall-cmd --permanent --add-port=20160/tcp
firewall-cmd --permanent --add-port=20180/tcp
#tiflash
firewall-cmd --permanent --add-port=8889/tcp
firewall-cmd --permanent --add-port=8123/tcp
firewall-cmd --permanent --add-port=3930/tcp
firewall-cmd --permanent --add-port=20170/tcp
firewall-cmd --permanent --add-port=20292/tcp
firewall-cmd --permanent --add-port=8234/tcp
#grafana
firewall-cmd --permanent --add-port=3001/tcp
#prometheus
firewall-cmd --permanent --add-port=9090/tcp
firewall-cmd --permanent --add-port=12020/tcp
firewall-cmd --reload
firewall-cmd --list-port
先停防火墙
service firewalld stop
service firewalld start
26.数据导出备份工具Dumpling
参考:https://www.modb.pro/db/331066
获取 安装
tiup install dumpling
运行
tiup dumpling ...
tiup dumpling \
-u root \
-p jabil@2022 \
-P 4000 \
-h cnhuam0testg84 \
--filetype csv \
--sql 'select * from test.aps_rt_sink' \
-t 8 \
-o /df_bigdata_cluster/data/tidb/test \
-r 200000 \
-F 256MiB
tiup dumpling \
-u root \
-p jabil@2022 \
-P 4000 \
-h cnhuam0testg84 \
--filetype sql \
-t 8 \
-o /df_bigdata_cluster/data/tidb/test \
-r 200000 \
--filter "test.*" \
-F 256MiB
27.tidb-lightning和dumpling安装
tiup install tidb-lightning
chmod +x tidb-lightning
tiup tidb-lightning -h
数据导入 nohup tiup tidb-lightning -config eam-lightning.toml > nohup.out 2>&1 &
nohup /data13/tidb/deploy/tidb-community-toolkit-v6.1.0-linux-amd64/tidb-lightning -config /data13/tidb/lighting/import-conf/whirlpool-lightning.toml > /data13/tidb/tmp/nohup.out &
scp -r /data13/tidb/deploy/duming-data/esss tidb@10.114.19.66:/df_bigdata_cluster/software/tidb/tmp-data
scp -r /data13/tidb/duming-data/sbs tidb@10.114.19.66:/df_bigdata_cluster/software/tidb/tmp-data
#!/bin/bash
tiup install tidb-dumpling
数据导出:
/home/tidb/.tiup/bin/tiup dumpling -u root -p jabil@2022 -P 4000 -h cnhuam0testg84 --filetype sql -t 8 -o /df_bigdata_cluster/data/tidb/bakData/whirlpool -r 200000 --filter "whirlpool_db.*" -F 256MiB >>/df_bigdata_cluster/data/tidb/log/test.out
/home/tidb/.tiup/bin/tiup dumpling -u esss -p esss#20230817 -P 4001 -h cnhuam0dp06 --filetype sql -t 8 -o /data13/tidb/deploy/duming-data/esss -r 200000 --filter "esss_db.*" -F 256MiB >>/data13/tidb/logs/dump_data.out
/home/tidb/.tiup/bin/tiup dumpling -u eam_prod -p eam#20240125 -P 4001 -h cnhuam0dp06 --filetype sql -t 8 -o /data13/tidb/deploy/duming-data/eam -r 200000 --filter "eam_prod_db.*" -F 256MiB >>/data13/tidb/logs/dump_data.out
/home/tidb/.tiup/bin/tiup dumpling -u sbs -p sbs@20230303 -P 4001 -h cnhuam0dp06 --filetype sql -t 8 -o /data13/tidb/duming-data/sbs -r 200000 --filter "sbs_db.*" -F 256MiB >>/data13/tidb/logs/dump_data.out
/home/tidb/.tiup/bin/tiup dumpling -u valeo -p vdb#20230608 -P 4001 -h cnhuam0dp06 -o /data13/tidb/duming-data/valeo/ods_aoi_image -r 200000 --filetype csv --sql 'select * from valeo_db.ods_spi_detail_camx_history where create_time>date_sub(curdate(),interval 47 day) and create_time< date_sub(curdate(),interval 18 day)' -F 100MiB --output-filename-template 'ods_spi_detail_camx.{{.Index}}'
/home/tidb/.tiup/bin/tiup dumpling -u valeo -p vdb#20230608 -P 4001 -h cnhuam0dp06 -o /data11/apps/spark/duming-data/valeo/ods_aoi_image2 -r 200000 --filetype csv --sql 'select * from valeo_db.ods_aoi_image where create_time>date_sub(curdate(),interval 55 day) and create_time< date_sub(curdate(),interval 50 day)' -F 100MiB --output-filename-template 'ods_aoi_image.{{.Index}}'
scp -r tidb@10.114.19.66:/df_bigdata_cluster/data/tidb/bakData/whirlpool /data13/tidb/lighting/dumper-data/whirlpool
28.在线扩容
tiup cluster scale-out tidb-JBDP scale-out-tidb.yaml -uroot -p
scale-out-tidb.yaml内容:
tidb_servers:
- host: 10.0.1.5
ssh_port: 22
port: 4000
status_port: 10080
deploy_dir: /data/deploy/install/deploy/tidb-4000
log_dir: /data/deploy/install/log/tidb-4000
检查集群状态
tiup cluster display tidb-JBDP
参考:https://blog.csdn.net/weixin_42241611/article/details/126785954
29.设置系统最大打开文件数(不需要重启系统)
vi /etc/security/limits.conf, tidb配置数一定小于等于系统配置数,系统配置数一定要高否则没用添加:
# 进程线程数
* soft nproc 131072
* hard nproc 131072
# 文件句柄数
* soft nofile 1000000
* hard nofile 1000000
# 内存锁定交换
* soft memlock unlimited
* hard memlock unlimited
二.运维问题
1.报错:Error: Failed to initialize TiDB environment on remote host 'admin.dt.com' (task.env_init.failed)
caused by: Failed to create '~/.ssh' directory for user 'tidb'
caused by: Failed to execute command over SSH for 'tidb@admin.dt.com:22'
caused by: Process exited with status 1
解决办法:每台机器都加
chmod 470 /etc/sudoers
vi /etc/sudoers
添加:tidb ALL = NOPASSWD: ALL
chmod 440 /etc/sudoers
2.Starting component `ctl`: /home/tidb/.tiup/components/ctl/v5.0.1/ctl v5.0.1
Error: ctl need an explicit version, please run with `tiup ctl:<cluster-version>`
Error: run `/home/tidb/.tiup/components/ctl/v5.0.1/ctl` (wd:/home/tidb/.tiup/data/SZzAnSw) failed: exit status 1
解决办法: 运行
mysql_tzinfo_to_sql /usr/share/zoneinfo | mysql -u root mysql -p
3.[CDC:ErrMySQLConnectionError]Error 1298: Unknown or incorrect time zone: 'Asia/Shanghai'
windows下mysql时区设置
修改的my.ini配置文件,在MySQL的安装根目录下
[mysqld]
4.Error: failed to start cdc: failed to start: master.dt.com cdc-8300.service, please check the instance's log(/home/tidb/tidb-deploy/cdc-8300/log) for more detail.: timed out waiting for port 8300 to be started after 2m0s
Verbose debug logs has been written to /home/tidb/.tiup/logs/tiup-cluster-debug-2021-06-13-17-01-07.log.
Error: run `/home/tidb/.tiup/components/cluster/v1.5.0/tiup-cluster` (wd:/home/tidb/.tiup/data/SaCg628) failed: exit status 1
解决办法: 重启机器
没有 leader 的 region
tiup pd-ctl -u http://admin.test.com:2379 -d region --jq '.regions[]|select(has("leader")|not)|{id: .id,peer_stores: [.peers[].store_id]}'
副本丢失的 region
pd-ctl -u http://admin.test.com:2379 region --jq=".regions[] | {id: .id, peer_stores: [.peers[].store_id] | select(length != 2)}" // 当前环境中 max-replicas 参数为 2
5.路径不对报:is not a valid SemVer string
6.初始化报:Failed to create new system user 'tidb' on remote host
远程无密码账户不对
7. 启动dm初始化报fail to validate embed etcd config, RawCause: expected IP in URL for binding
解决办法:配置文件不能用机器名只能是IP地址
8.启动集群报错:Error: failed to start tidb: failed to start: admin.test.com tidb-4000.service, please check the instance's log
解决办法:再次启动就ok
9.启动TIDB集群报:failed to start tikv: failed to start:
解决办法:宿主机重启共享网络
10.启动 TiKV 一直报 PD 节点不通,retrying of unary invoker failed
解决办法: 重启节点 tiup dm reload ,最坏的办法销毁集群,重新部署,但先做好数据备份。
11. failed to start grafana: failed to start: slave.test.com grafana-3001.service, please check the instance's
解决办法:一般是网络问题,等待
12.PD报错:["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 4-DEADLINE_EXCEEDED, message: \"Deadline Exceeded\", details: [] }))"] [endpoints=admin.test.com:2379]
[2022/06/28 14:26:57.683 +08:00] [INFO] [util.rs:547] ["connecting to PD endpoint"] [endpoints=master.test.com:2379]
tikv报错:[ERROR] [etcdutil.go:71] ["failed to get cluster from remote"] [error="[PD:etcd:ErrEtcdGetCluster]could not retrieve cluster information from the given URLs: could not retrieve cluster information from the given URLs"]
解决办法:安装时不能用域名,只能改机器名
13.镜像报错:-bash: tsh: command not found
解决办法:复制官网的镜像语句再执行
ssh_stderr: bash: /tmp/tiup/bin/insight: Permission denied
cat /etc/fstab
umount /tmp
mount /dev/mapper/root_vg-tmp /tmp
14. 安装前检测cpu-governor Fail :CPU frequency governor is powersave, should use performance
解决办法:执行 cpupower frequency-set --governor performance
15. 安装前检测 Fail numactl not usable, bash: numactl: command not found
解决办法:执行 yum -y install numactl.x86_64
16.10.114.26.111 network Fail network speed of eno6 is 100MB too low, needs 1GB or more
17.The component `dumpling` not found (may be deleted from repository); skipped
解决办法:没有把tidb-community-toolkit-v6加入到tiup, 按照镜像2操作: tiup mirror merge ../tidb-community-toolkit-v6.1.0-linux-amd64
18. 创建表1055 ,group 报错
查看:show variables like '%sql_mode%'
set global sql_mode ='STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION';
原 global sql_mode ='ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION';
需要重启 tidb
19.TIDB-Error 1105: Out Of Memory Quota问题解决
server_configs:
tidb:
mem-quota-query: 10737418240# 修改大小,
这个配置值默认为 1GB,
20.报错:Transaction is too large(交易量太大)
修改方法:
performance.txn-total-size-limit: 10737418240 -->34359738368在tidb下加此参数
txn-total-size-limit
TiDB 单个事务大小限制
默认值:104857600
单位:Byte
单个事务中,所有 key-value 记录的总大小不能超过该限制。该配置项的最大值不超过 10737418240(表示 10GB)。
注意,如果使用了以 Kafka 为下游消费者的 binlog,如:arbiter 集群,该配置项的值不能超过 1073741824(表示 1GB),因为这是 Kafka 的处理单条消息的最大限制,超过该限制 Kafka 将会报错。
21.重启SSH报错:ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods re "errorVerbose": "ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey],
解决办法:目标机器authorized_keys缺少密钥对,开始部署集群时,tiup就自己生成了一对密钥对与普通SSH不同 ,需要拷贝到目标机的authorized_keys;
22.tidb删除大量数据报错:ERROR 1105 (HY000) at line 1: Out Of Memory Quota![conn_id=676956115087820587]
解决办法:修改参数
server_configs:
tidb:
mem-quota-query: 4294967296 # 修改大小
23.tidb后台链接报错:MySQLNonTransientConnectionException: Could not create connection to database server
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.]
2023-12-25 17:48:00.209 INFO 6897 --- [o-8089-exec-221] com.jabil.datax.utils.DBUtil : 异常连接URL:jdbc:mysql://cnhuam0dp06:4001?serverTimezone=GMT%2B8&useUnicode=true&characterEncoding=UTF-8,用户名:apiWS,密码:api@2022
2023-12-25 17:48:02.209 ERROR 6897 --- [o-8089-exec-221] com.jabil.datax.utils.RetryUtil : Exception when calling callable, 即将尝试执行第2次重试.本次重试计划等待[2000]ms,实际等待[2000]ms, 异常Msg:[Code:[MYSQLErrCode-02],
Description:[数据库服务的IP地址或者Port错误,请检查填写的IP地址和Port或者联系DBA确认IP地址和Port是否正确。如果是同步中心用户请联系DBA确认idb上录入的IP和PORT信息和数据库的当前实际信息是一致的].
解决方案是:
查询数据库没有关闭链接,释放资源
//执行完查询关闭资源 DBUtil.closeDBResources(null, null, conn);
24. ERROR 8242 (HY000): 'Drop Table' is unsupported on cache tables 缓存表不能删除
解决方案是: 先转化普通表再删除执行:ALTER TABLE dwd_indicator NOCACHE
25.tidb lightning encountered error: [Lightning:PreCheck:ErrSystemRequirementNotMet]system requirement not met: the maximum number of open file descriptors is too small, got 1024, expect greater or equal to 12590: operation not permitted
解决方案是: 每台执行 ulimit -HSn 1000000
26.导入数据报:Cluster doesn't have too many empty regions
解决方案是:
27. tidbit tiup无法下达指令 报:cluster/module.(*WaitFor).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster
解决方案是:检测主机配置文件:/home/tidb/.tiup/storage/cluster/clusters/tidb-JBDP/config-cache 和从机:/data13/tidb/deploy/tidb-4000/scripts 等
对应目录是否存在
28.链接数据库4000失败,报:rpc error: code = Canceled desc = grpc: the client connection is closing
解决方案是:tidb端口4000被占用,杀手进程后自动启动
29.tikv缩容后,节点不在集群中,但对应的tikv仍然在启动状态,需要删除节点,找到对应的store id 执行删除
解决方案是: /home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 store delete 1
30.[FATAL] [server.rs:950] ["failed to start node: StoreTombstone(\"store is tombstone\")"]
解决方案是:删除对应storeid
31. 下线失败使用 ,先weight为0,然后删除
解决方案是:
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 store weight 1 0 0
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 store weight 183060412 0 0
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 store weight 183062372 0 0
删除
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 store remove-tombstone 1
curl -X POST http://10.114.26.111:2379 /pd/api/v1/store/1/state?state=Up
32.发现store 永远处于Down无法用store delete 删除,执行如下命令
解决方案是:/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 unsafe remove-failed-stores store_id
33.每次重启tidb时已经tombstone的实例又被启动,需要删除
解决方案是: tiup cluster prune tidb-JBDP
34.如果启动tidb超时,而中断其他服务不能正常启动,例如:timed out waiting for port 4000 to be started after 2m0s
解决方案是: tiup cluster start tidb-JBDP
35.系统健康检查失败集群中未启动必要组件 `NgMonitoring`,部分功能将不可用。
解决方案是: tiup cluster start tidb-JBDP
三.TIDB运维命令
1.2021/6/7
https://asktug.com/t/topic/92780
2.问题搜索
https://search.asktug.com
3.导入导出语句
导入表结构先创建表
tar -zxvf tidb-lightning-${version}-linux-amd64.tar.gz
tidb-lightning -d "mysql://user:password@IP:port/DBName" -t "tidb://user:password@IP:port/DBName
mysql -h 10.114.17.165 -P 4002 -uroot -pjabil@2022
use sbs_db;
source /data13/tidb/duming-data/sbs_db-20240528.sql;
Copy the exported data over:
scp -r tidb@cnhuam0dp01:/data13/tidb/duming-data/sbs_db /df_bigdata_cluster/data/tidb/bakData/lightData
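A non-interactive alternative to the use/source steps above, assuming the same test-environment host, port and credentials shown in this item:
mysql -h 10.114.17.165 -P 4002 -uroot -pjabil@2022 sbs_db < /data13/tidb/duming-data/sbs_db-20240528.sql   # create the tables from the schema dump in one shot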
4. Import into the test environment
nohup /df_bigdata_cluster/software/tidb/tidb-community-server-v6.1.0-linux-amd64/tidb-lightning -config /df_bigdata_cluster/data/tidb/bakData/lightData/sbs_db/sbs-lightning.toml > /df_bigdata_cluster/data/tidb/bakData/lightData/logs/nohup.out &
nohup /df_bigdata_cluster/software/tidb/tidb-community-server-v6.1.0-linux-amd64/tidb-lightning -config /df_bigdata_cluster/data/tidb/bakData/lightData/conf/sbs-lightning.toml > /df_bigdata_cluster/data/tidb/bakData/lightData/logs/nohup.out &
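The sbs-lightning.toml itself is not recorded in these notes; the sketch below shows what such a config typically contains, with host, password, status-port and pd-addr as placeholder assumptions to adjust before use:
cat > /df_bigdata_cluster/data/tidb/bakData/lightData/conf/sbs-lightning.toml <<'EOF'
[lightning]
level = "info"
file = "tidb-lightning.log"

[tikv-importer]
# "tidb" backend writes through the SQL layer and is simplest; "local" is faster but needs a sorted-kv-dir
backend = "tidb"

[mydumper]
# directory holding the exported data files copied in item 3
data-source-dir = "/df_bigdata_cluster/data/tidb/bakData/lightData/sbs_db"

[tidb]
host = "127.0.0.1"          # placeholder: target tidb-server host
port = 4000                 # placeholder: target tidb-server port
user = "root"
password = "xxxx"           # placeholder
status-port = 10080         # placeholder: tidb status port
pd-addr = "127.0.0.1:2379"  # placeholder: PD address of the target cluster
EOF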
5. Export table structures
mysqldump -h cnhuam0dp09 -P 4001 -d sbs_db -uroot -pjabil#202210 >/data13/tidb/duming-data/sbs_db-20240528.sql
mysqldump -h cnhuam0dp09 -P 4001 -d valeo_db -uroot -pjabil#202210 >/data13/tidb/duming-data/valeo_db-20240528.sql
mysqldump -h cnhuam0dp09 -P 4001 -d bmwcss_db -uroot -pjabil#202210 >/data13/tidb/duming-data/bmwcss_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d dfengsys_prod_db -uroot -pjabil#202210 >/data13/tidb/duming-data/dfengsys_prod_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d eam_db -uroot -pjabil#202210 >/data13/tidb/duming-data/eam_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d eam_prod_db -uroot -pjabil#202210 >/data13/tidb/duming-data/eam_prod_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d ehs_hcas_db -uroot -pjabil#202210 >/data13/tidb/duming-data/ehs_hcas_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d esss_db -uroot -pjabil#202210 >/data13/tidb/duming-data/esss_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d etask_db -uroot -pjabil#202210 >/data13/tidb/duming-data/etask_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d hcss_db -uroot -pjabil#202210 >/data13/tidb/duming-data/hcss_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d ic_sap_db -uroot -pjabil#202210 >/data13/tidb/duming-data/ic_pro_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d ic_sap_db -uroot -pjabil#202210 >/data13/tidb/duming-data/ic_sap_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d ie_ucs_db -uroot -pjabil#202210 >/data13/tidb/duming-data/ie_ucs_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d iot_db -uroot -pjabil#202210 >/data13/tidb/duming-data/iot_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d mat_fuc_db -uroot -pjabil#202210 >/data13/tidb/duming-data/mat_fuc_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d pub_db -uroot -pjabil#202210 >/data13/tidb/duming-data/pub_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d wef_db -uroot -pjabil#202210 >/data13/tidb/duming-data/wef_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d wic_db -uroot -pjabil#202210 >/data13/tidb/duming-data/wic_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d soea_db -uroot -pjabil#202210 >/data13/tidb/duming-data/soea_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d screw_db -uroot -pjabil#202210 >/data13/tidb/duming-data/screw_db-20240531.sql
mysqldump -h cnhuam0dp09 -P 4001 -d scada_db -uroot -pjabil#202210 >/data13/tidb/duming-data/scada_db-20240531.sql
Upload the SQL files to HDFS as a backup:
hdfs dfs -put *.sql /user/tidb/dbCreateSQL
Export table data (parameterized script):
/home/tidb/.tiup/bin/tiup dumpling -u $userName -p $passwd -P 4001 -h cnhuam0dp09 -o /data13/tidb/duming-data/$dbName/$exportTabName/$partYearMonth -r 200000 --filetype csv --sql "select * from $tidb_tab where $fiterKeyName>=date_sub(curdate(),interval $fromDays day) and $fiterKeyName<= date_sub(curdate(),interval $toDays day)" -F 200MiB --output-filename-template "$4.{{.Index}}" >>/data13/tidb/logs/hue_dumpling.out
Export table data (whole database):
/home/tidb/.tiup/bin/tiup dumpling -u root -p jabil#202210 -P 4001 -h cnhuam0dp09 --filetype sql -t 8 -o /data13/tidb/duming-data/sbs_db -r 200000 --filter "sbs_db.*" -F 256MiB >>/data13/tidb/logs/dump_data.out
6. Back up a database
Export the data:
/home/tidb/.tiup/bin/tiup dumpling -u hcss -p hcss -P 4001 -h cnhuam0dp06 --filetype sql -t 8 -o /data13/tidb/dbBak/hcss_20231128 -r 200000 --filter "hcss_db.*" -F 256MiB >>/data13/tidb/logs/hcss.out
zip -r /data13/tidb/dbBak/hcss_20231128.zip /data13/tidb/dbBak
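A minimal sketch of wrapping the two steps above into a dated backup script (same credentials and paths as this item; the zip covers only the dated directory rather than the whole dbBak directory):
BAK_DATE=$(date +%Y%m%d)
BAK_DIR=/data13/tidb/dbBak/hcss_${BAK_DATE}
/home/tidb/.tiup/bin/tiup dumpling -u hcss -p hcss -P 4001 -h cnhuam0dp06 --filetype sql -t 8 -o ${BAK_DIR} -r 200000 --filter "hcss_db.*" -F 256MiB >> /data13/tidb/logs/hcss.out 2>&1
zip -r ${BAK_DIR}.zip ${BAK_DIR}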
7. Force start/stop services by role
tiup cluster start tidb-JBDP -R tidb
tiup cluster restart tidb-JBDP -R tiflash
tiup cluster restart tidb-JBDP -R node_exporter
tiup cluster restart tidb-JBDP -R prometheus
tiup cluster start tidb-JBDP -R tikv
tiup cluster stop tidb-JBDP -R tikv
tiup cluster start tidb-JBDP -R prometheus
tiup cluster start tidb-JBDP -R tiflash
tiup cluster start tidb-JBDP -R grafana
tiup cluster start tidb-JBDP -R NgMonitoring
tiup cluster display tidb-JBDP
Force-start a specific node by ID:
tiup cluster start tidb-JBDP
tiup cluster start tidb-JBDP --node 10.114.26.104:20163
tiup cluster start tidb-JBDP --node 10.114.26.113:9100
tiup cluster restart tidb-JBDP -R node_exporter
四. Tuning and optimization
1. Table level: enable a TiFlash columnar replica; columnar storage makes statistical/analytic queries much faster.
Only enable it for large tables, for example:
ALTER TABLE soea_db.ods_product_bay_data SET TIFLASH REPLICA 1;
ALTER TABLE qmp_db.InternalAuditPlanReport SET TIFLASH REPLICA 1;
ALTER TABLE tb_equ_moniter_record_2022 SET TIFLASH REPLICA 1;
ALTER TABLE screw_db.dwd_tighten_basic SET TIFLASH REPLICA 1;
ALTER TABLE soea_db.ods_product_bay_data SET TIFLASH REPLICA 1;
Disable:
ALTER TABLE screw_db.dwd_tighten_basic SET TIFLASH REPLICA 0;
Check:
SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = 'soea_db' and TABLE_NAME = 'ods_product_bay_data';
Enable TiFlash MPP for the session:
ALTER TABLE tab SET TIFLASH REPLICA 1; # enable data replication to TiFlash
set @@session.tidb_allow_mpp=1; # allow MPP execution
set @@session.tidb_enforce_mpp=1; # force MPP execution
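To check whether a replica has finished syncing, the AVAILABLE and PROGRESS columns of the same system table can be queried; the connection details below are the test-environment ones from the import section and are an assumption:
mysql -h 10.114.17.165 -P 4002 -uroot -pjabil@2022 -e "SELECT TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS FROM information_schema.tiflash_replica;"   # AVAILABLE=1 / PROGRESS=1 means fully synced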
2. When deleting a large batch of data fails, delete in chunks instead (see the loop sketch below):
DELETE FROM t WHERE <filter condition> LIMIT 5000;
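A minimal bash sketch of the chunked delete, assuming a hypothetical table test.t with a time filter and the test-environment connection used earlier; adjust all of these before use:
while true; do
  AFFECTED=$(mysql -h 10.114.17.165 -P 4002 -uroot -pjabil@2022 -N -e "DELETE FROM test.t WHERE create_time < '2023-01-01' LIMIT 5000; SELECT ROW_COUNT();")
  # stop once a batch deletes nothing
  [ "${AFFECTED:-0}" -eq 0 ] && break
done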
3. Config parameter tuning
1. Data sync error: Transaction is too large, size: 104857819 With statement:
tiup cluster edit-config tidb-JBDP
tiup cluster show-config tidb-JBDP
tiup cluster show-config tidb-jabil
tiup cluster edit-config tidb-jabil
# modify the config parameter:
performance.txn-total-size-limit: 10737418240 ## single-transaction size limit, e.g. the data size of a consistent import
Reload to take effect:
tiup cluster reload tidb-JBDP -R tidb
tiup cluster stop tidb-JBDP -R tidb
2. # must not exceed 45% of the machine's memory
storage.block-cache.capacity: 32GB
# reload to take effect:
tiup cluster reload tidb-JBDP -R tikv
3. Configure the temporary computation storage location under tidb:
tmp-storage-path: /data13/tidb/tmpStorage
performance.txn-total-size-limit: 10737418240
4. Log retention days
log.file.max-days: 30
4. Handle the function result length limit:
Increase the maximum string length GROUP_CONCAT can return.
By default the result of GROUP_CONCAT is limited to 1024 bytes (the MySQL-compatible default).
SET GLOBAL group_concat_max_len = 102400;
select * from mysql.GLOBAL_VARIABLES where variable_name = 'group_concat_max_len';
5. If SSH operations fail because newer OpenSSH versions reject ssh-rsa keys, allow the key type and restart sshd:
echo PubkeyAcceptedKeyTypes=+ssh-rsa >> /etc/ssh/sshd_config
systemctl restart sshd
Reload the cluster over system SSH:
tiup cluster reload --ssh system tidb-JBDP
6. Clean up logs
tiup cluster clean tidb-jabil --log
tiup cluster stop tidb-jabil
tiup cluster start tidb-jabil
tiup cluster clean tidb-JBDP --log
7. If a start attempt fails, re-run tiup cluster start tidb-JBDP repeatedly until it succeeds.
8. Failures to create or drop tables are very likely caused by excessive log accumulation; clean the logs (and do so periodically):
tiup cluster clean tidb-JBDP --log
Then restart:
tiup cluster start tidb-JBDP
tiup cluster reload tidb-JBDP --role tikv --nodes 10.114.26.112
systemctl start tikv-server
tiup cluster reload tidb-JBDP
9. Check session/process status
SHOW PROCESSLIST
SELECT * FROM information_schema.processlist WHERE COMMAND='Sleep' ORDER BY db
Fewer than ~50 sleeping connections is healthy; otherwise errors such as "communications link failure" start to appear (see the kill-statement sketch below).
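A hedged sketch for cleaning up long-idle sessions: generate KILL TIDB statements from the processlist (the 600-second threshold and connection details are assumptions; without global kill enabled, each KILL TIDB has to be issued on the tidb-server that owns the connection):
mysql -h 10.114.17.165 -P 4002 -uroot -pjabil@2022 -N -e "SELECT CONCAT('KILL TIDB ', ID, ';') FROM information_schema.processlist WHERE COMMAND='Sleep' AND TIME > 600;" > kill_sleep.sql   # review kill_sleep.sql before running it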
10. One server shows very high IO and the tikv log reports: Error starting HTTP server" err="accept tcp [::]:9115: accept4: too many open files in system
Fix: edit /etc/security/limits.conf. The tidb limits must be less than or equal to the system-wide limits, and the system-wide limits must be high enough or the change has no effect. Add:
# process/thread count
* soft nproc 131072
* hard nproc 131072
# open file handles
* soft nofile 1000000
* hard nofile 1000000
# memory lock
* soft memlock unlimited
* hard memlock unlimited
11. Check system IO usage (see the pidstat sketch below)
yum install -y sysstat
Check IO: iostat
Real-time:
iostat -d -m 2
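To narrow high IO down to a process, pidstat from the sysstat package installed above can be used (interval and sample count are arbitrary):
pidstat -d 2 5    # per-process read/write KB/s, 5 samples every 2 seconds
iostat -x 2       # extended per-device stats (%util, await) at the same interval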
Config file:
vi /home/tidb/.tiup/storage/cluster/clusters/tidb-JBDP/meta.yaml
Then run: tiup cluster reload tidb-JBDP
/data13/tidb/deploy/tikv-20161/bin/tikv-server > /data11/tidb/deploy/tikv-20161/tikv.log 2>&1 &
/data13/tidb/deploy/tikv-20162/bin/tikv-server > /data12/tidb/deploy/tikv-20162/tikv.log 2>&1 &
/data13/tidb/deploy/tikv-20163/bin/tikv-server > /data13/tidb/deploy/tikv-20163/tikv.log 2>&1 &
12. Manually start tidb services (as the tidb user)
Node 112:
/data13/tidb/deploy/tidb-4000/bin/tidb-server >/data13/tidb/deploy/tidb-4000/log/tidb.log 2>&1 &
Node 105:
/data13/tidb/deploy/tidb-4000/bin/tidb-server >/data13/tidb/deploy/tidb-4000/log/tidb.log 2>&1 &
/data13/tidb/deploy/tikv-server-20162 start
/data13/tidb/deploy/tikv-20162/bin/tikv-server start
/data13/tidb/deploy/tikv-20163/bin/tikv-server stop
systemctl list-unit-files --type=service --state=disabled
systemctl is-enabled tikv-20163
systemctl disable tikv-20163
13. Cluster data loss (unsafe recovery references)
https://zhuanlan.zhihu.com/p/616686144
tiup ctl:v6.1.0 pd -u https://10.114.26.111:2379 --cacert=/home/tidb/.tiup/storage/cluster/clusters/tidb-JBDP/tls/ca.crt --key=/home/tidb/.tiup/storage/cluster/clusters/tidb-JBDP/tls/client.pem --cert=/home/tidb/.tiup/storage/cluster/clusters/tidb-JBDP/tls/client.crt -i
pd -u https://10.114.26.111:2379 --cacert=/home/tidb/.tiup/storage/cluster/clusters/tidb-JBDP/tls/ca.crt --key=/home/tidb/.tiup/storage/cluster/clusters/tidb-JBDP/tls/client.pem --cert=/home/tidb/.tiup/storage/cluster/clusters/tidb-JBDP/tls/client.crt -i » unsafe
https://www.cnblogs.com/vansky/p/9415551.html
14. Detailed tikv store (instance) operations
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 store 5
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 member leader show
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 store delete 2
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 store remove-tombstone 183060412
Add an evict-leader scheduler to drive leaders off the store that is being taken offline:
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 scheduler add evict-leader-scheduler 183060412
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 store delete 183060412 183062372
Cancel the deletion:
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 store cancel-delete 1
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 store delete
17. Remove a region (operator)
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 operator remove-region 185889957
18. View tidb/PD configuration parameters
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 config show
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 unsafe remove-failed-stores show
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 unsafe remove-failed-stores 1
Full details:
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 config show all
19. Store and region operations
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 unsafe remove-failed-stores -s 183060412 -r 185889957
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 operator add remove-peer 185889957 183060412
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 operator add remove-peer 4743 183060412
20. View the regions stored on a given store
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 region --jq='.regions[]|select(has("leader")|not)|{id: .id, peer_stores: [.peers[].store_id]}'
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 store state_name 1 tombstone
View region details:
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 region 159869
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 config show|grep limit > limit.config
21. Set region-schedule-limit
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 config set region-schedule-limit 0
pd-ctl config set region-schedule-limit 0
22. Queries using jq scripts
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 region --jq='.regions[] | {id: .id, peer_stores: [.peers[].store_id] | select(length as $total | map(if .==(1) then . else empty end) | length>=$total-length)}' >>tt3.json
Find regions with insufficient replicas:
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 region --jq='.regions[] | {id: .id, peer_stores: [.peers[].store_id] | select(length as $total | map(if .==(183062372,183060412,1,14,8,5,2,183,184,16,9,15,17,179394555,179397408,179393803,179395662) then . else empty end) | length>=$total-length)}' >> tt5.json
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 region --jq=".regions[] | {id: .id, peer_stores: [.peers[].store_id] | select(length as $total | map(if .==(1,14,8,5,2,183,184,16,9,15,17,179394555,179397408,179393803,179395662) then . else empty end) | length>=$total-length)}"
Find regions without a leader:
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 region --jq '.regions[] | select(.leader.store_id == null) | {id: .id, peer_stores: [.peers[].store_id], leader_store: .leader.store_id}' > noleader.json
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 region store 1
23. Remove the regions located on the decommissioned stores (run on every tikv node). Only use this when tidb cannot start; it will lose data.
On the local host:
/home/tidb/.tiup/components/ctl/v6.1.0/tikv-ctl --data-dir /data11/tidb/data/tikv-20161 unsafe-recover remove-fail-stores -s 183060412,183062372 --all-regions
/home/tidb/.tiup/components/ctl/v6.1.0/tikv-ctl --data-dir /data12/tidb/data/tikv-20162 unsafe-recover remove-fail-stores -s 183060412,183062372 --all-regions
/home/tidb/.tiup/components/ctl/v6.1.0/tikv-ctl --data-dir /data13/tidb/data/tikv-20163 unsafe-recover remove-fail-stores -s 1 --all-regions
Copy the tool to the other nodes:
scp -r /home/tidb/.tiup/components/ctl/v6.1.0/tikv-ctl tidb@cnhuam0dp03:/home/tidb/tikv-ctl
/home/tidb/tikv-ctl --data-dir /data11/tidb/data/tikv-20161 unsafe-recover remove-fail-stores -s 183060412,183062372 --all-regions
/home/tidb/tikv-ctl --data-dir /data12/tidb/data/tikv-20162 unsafe-recover remove-fail-stores -s 183060412,183062372 --all-regions
/home/tidb/tikv-ctl --data-dir /data13/tidb/data/tikv-20163 unsafe-recover remove-fail-stores -s 183060412,183062372 --all-regions
Stop tikv:
curl http://10.114.26.111:10080/regions/23209844
tiup cluster prune tidb-JBDP
/home/tidb/.tiup/components/ctl/v6.1.0/tikv-ctl --data-dir /df_bigdata_cluster/data/tidb/tidb-data/tikv-20160/ bad-regions
set region-schedule-limit 0
/home/tidb/.tiup/components/ctl/v6.1.0/pd-ctl -u http://10.114.26.111:2379 config set max-replicas 1
tiup cluster start tidb-JBDP -N 10.114.26.112:20161
ps -ef | grep tikv-server | grep -v grep | awk '{print $2}' | xargs kill -9
ps -ef | grep pd-server | grep -v grep | awk '{print $2}' | xargs kill -9
tiup mirror clone tiup-custom-mirror-6.1.0 --tiup mirror-6.1.0
tiup mirror grant $USER
cp -rp /home/tidb/tidb-community-server-6.1.0-linux-amd64/keys /home/tidb/.tiup/
tiup mirror merge /home/tidb/tidb-community-server-6.1.0-linux-amd64
tiup mirror clone /home/tidb/tidb-community-server-6.1.0-linux-amd64 --full
tiup mirror clone tiup-custom-mirror-v6.1.0 --PCC 1.0.1 --os=linux --arch=amd64
tiup mirror merge /home/tidb/tiup-custom-mirror-v6.1.0
tiup install PCC
17. tidb installation check baseline (tiup cluster check output) and tuning
Node Check Result Message
---- ----- ------ -------
10.114.26.113 os-version Pass OS is CentOS Linux 7 (Core) 7.9.2009
10.114.26.113 cpu-governor Fail CPU frequency governor is powersave, should use performance
10.114.26.113 sysctl Fail net.ipv4.tcp_syncookies = 1, should be 0
10.114.26.113 sysctl Fail vm.swappiness = 1, should be 0
10.114.26.113 selinux Pass SELinux is disabled
10.114.26.113 thp Fail THP is enabled, please disable it for best performance
10.114.26.113 cpu-cores Pass number of CPU cores / threads: 64
10.114.26.113 memory Pass memory size is 262144MB
10.114.26.113 network Pass network speed of bond0 is 2000MB
10.114.26.113 network Pass network speed of eno5 is 1000MB
10.114.26.113 network Pass network speed of eno6 is 1000MB
10.114.26.113 network Pass network speed of eno7 is 1000MB
10.114.26.113 network Pass network speed of eno8 is 1000MB
10.114.26.113 command Pass numactl: policy: default
10.114.26.113 timezone Pass time zone is the same as the first PD machine: Asia/Shanghai
10.114.26.113 disk Warn mount point /data11 does not have 'noatime' option set
10.114.26.113 service Fail service firewalld is running but should be stopped
10.114.26.113 swap Warn swap is enabled, please disable it for best performance
Check: cat /proc/sys/fs/file-nr
vi /etc/sysctl.conf
fs.file-max=1000000
vm.swappiness=0
net.ipv4.tcp_syncookies=0
sysctl -p
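A hedged sketch of fixing the remaining Fail/Warn items from the check output above (run as root on each node; cpupower requires the kernel-tools package, and tiup cluster check --apply can reportedly repair several of these items automatically):
echo never > /sys/kernel/mm/transparent_hugepage/enabled   # disable THP
echo never > /sys/kernel/mm/transparent_hugepage/defrag
cpupower frequency-set --governor performance              # cpu-governor: powersave -> performance
swapoff -a                                                 # disable swap (also comment the swap entry in /etc/fstab)
systemctl stop firewalld && systemctl disable firewalld    # stop the firewalld service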
18. tidb scale-out and scale-in
Check before scaling:
tiup cluster check tidb-JBDP scale-out.yml --cluster --user tidb
Scale out
Note: check that the paths in the scale-out file are owned by the tidb user and that the ports are not already in use.
tiup cluster scale-out tidb-JBDP scale-out-20161.yml
tiup cluster scale-out tidb-JBDP scale-out-20162.yml
Scale in
tiup cluster start tidb-JBDP --node 10.114.26.104:20163
tiup cluster stop tidb-JBDP --node 10.114.26.104:20163
tiup cluster scale-in tidb-JBDP --node 10.114.26.103:20163
tiup cluster scale-in tidb-JBDP --node 10.114.26.112:20162 --force
tiup cluster scale-in tidb-JBDP --node 10.114.26.112:20161 --force
tiup cluster scale-in --force -N
tiup cluster scale-in prod-cluster --node 192.168.4.18:20160 --force
Note: taking a node offline can take more than 12 hours.
tiup cluster restart tidb-JBDP --node 10.114.26.112:20161
Repair tikv:
tiup cluster start tidb-JBDP -N 10.114.26.112:20162
curl --proto '=https' --tlsv1.2 -sSf https://tiup-mirrors.pingcap.com/install.sh | sh
tiup ctl:v1.14.1 pd -u http://10.114.26.111:2379 store
tiup cluster force-leader tidb-JBDP 10.114.26.112:20161
tiup cluster force-leader tidb-JBDP --node
Start:
tiup cluster restart tidb-JBDP --node 10.114.26.112:20161
19. Check the tikv process limits: cat /proc/49897/limits
tidb optimization measures
1. For large tables (over 1 million rows), create an index on the time column and always query with a time filter to avoid full-table scans;
2. When joining tables, apply filter conditions before the join whenever possible;
3. Replace UNION with UNION ALL; avoid IN and LIKE where possible;
4. Avoid JOINs where possible;
5. For heavy batch computation and analysis, export the data to Spark SQL and do the batch computation with Spark.
20. Detect tidb hangs and queued DDL jobs
Check the current DDL jobs:
admin show ddl jobs;
Cancel a job:
ADMIN CANCEL DDL JOBS 1887486;
21. Set the maximum number of connections
SET GLOBAL max_connections = 350;
show variables like '%max_connections%'
Note: keep this consistent with haproxy's maxconn (e.g. maxconn 1024, which is roughly 3 * max_connections).
Note: the haproxy maxconn values (there may be several) must not exceed 2048 in total, otherwise tidb cannot handle the load.
22. Set the default maximum memory per SQL statement
#SET GLOBAL tidb_mem_quota_query ='10737418240';
show variables like '%tidb_mem_quota_query%';