- 浏览: 243519 次
最新评论
ERROR: invalid page header in block 27073 of relation base/21078/45300926
- 博客分类:
- postgresql
突然断网,检查后通知我们UPS断电,db所在主机重启
1、连上后,发现pg主从不同步,主不向从传日志,从报错:
FATAL: could not connect to the primary server: could not connect to server:
发现从先启动成功,而主是后启动的,因此我们将从再次重启
service postgresql restart
开始正常传日志
2、过了一会,研发反应部分表的使用出现问题,主再次不传输日志,且无sender进程。
查看主库日志,发现报错:
ERROR: invalid page header in block 27073 of relation base/21078/45300926
查看从库日志,发现类似报错
ERROR:insert_t_black:invalid page head in block 27073 of relation base/21078/45300926
21078是tm_samples库的id号
通过查询:
select * from pg_class where relfilenode='45300926';—t_black
应该是停电的时候,正好在进行数据的写入等操作,造成了数据损坏。
此时,我们进行了如下操作:
set zero_damaged_pages = on; --当这个参数为on的时候,会忽略所有数据有损坏的页面
vacuum full t_black;
tm_samples=# select count(*) from t_black;
WARNING: invalid page header in block 27069 of relation base/21078/45300926; zeroing out page
WARNING: invalid page header in block 27070 of relation base/21078/45300926; zeroing out page
WARNING: invalid page header in block 27071 of relation base/21078/45300926; zeroing out page
WARNING: invalid page header in block 27072 of relation base/21078/45300926; zeroing out page
WARNING: invalid page header in block 27073 of relation base/21078/45300926; zeroing out page
count
---------
3066302
(1 row)
reindex table t_black;
再次select count(*) from t_black; --正常
再重启pg主从的服务,发现主从开始同步,后台没有再报刚刚那个错误
3、研发测试验证,发现插入部分hash值的时候,依然会刚刚那个错误
insert into t_black (sample_hash, sample_crc32, sample_sha1, virusname, avl_m, software_english,file_size, conditions_type,
sample_type, description,record_time, getname_time, developers, last_record_time, last_changename_time, source_id, ss_id, cps,level,keyhash,
sensitive_strings, behavior_info, relation_info,version,ratio,"AVL_Embed", "AVL_Adware", program_name, versionname, record_time_int ) SELECT sample_hash, sample_crc32, sample_sha1, virusname, avl_m, software_english,file_size, conditions_type,
sample_type, description,record_time, getname_time, developers, last_record_time, last_changename_time, source_id, ss_id, cps,level,keyhash,
sensitive_strings, behavior_info, relation_info,version,ratio,"AVL_Embed", "AVL_Adware", program_name, versionname, record_time_int
FROM t_white where sample_hash ='1E0CB07CDC71B2F994F5D3EB51050E3A';
程序报错:DatabaseError: invalid page header in block 27073 of relation base/21078/45300926
只插入部分字段就没问题,但完整插入就不行。
这时,我们想到可能是文件系统损坏了,当有数据往刚刚那个坏块里面写的时候,就会触发报错。
同时,我们查到一共有4张表出现了坏块,分别是:
t_black
t_white
t_batch_sample
t_derivative
我们决定先对所有的表做一次vacuum,然后进行磁盘的检查------后面的操作都是娟姐做的,待她回来后补充,但她反应检查后依然不行,最后的方法是磁盘格式化,然后用从库的备份进行了覆盖恢复。
网页上有一些介绍的方法:
1)根据错误提示 ERROR: invalid page header in block 1 of relation base/34780/34781 我们可以找到相应的文件, 文件的路径为: 数据目录/base/34780/34781,只要用工具手动把上面提示的坏块清除即可。
在Linux下面可以用dd工具把相应的页面清除:
$dd if=/dev/zero of=/home/postgres/data/base/34780/42995 bs=8192 seek=1 count=1 conv=notrunc
2)数据库暂时恢复使用了,但这是临时的,如果是磁盘文件系统故障,不久还是会重现这个问题,临了再修复了一下
1.reboot进入单用户模式
2.umount出现数据库异常的磁盘
3.fsck -v -t -p /dev/sda1
4.reboot
因为把有坏块的主库留了个备份,想用这个备份做一些实验:
a.尝试找到更为直接修复的方法
b.换到其他硬盘好的地方去,看它到底是db本身的文件坏了,还是啥原因。
======================实验=======================
将/data和pg_xlog压缩,传到已经安装好pg的12.250上面。替换到原来的data目录,目录保留
-rw-r--r-- 1 root root 39966776421 Apr 1 17:57 bk_data20150331.tar.gz
-rw-r--r-- 1 root root 5475189513 Apr 1 18:18 bk_pg_xlog20150331.tar.gz
[root@pgtina 9.1]# ll
total 39030084
drwxr-xr-x. 2 postgres postgres 4096 Apr 1 14:34 bin
drwx------. 13 postgres postgres 4096 Apr 1 16:22 bk_data ---原来的data目录
-rw-r--r--. 1 root root 39966776421 Apr 1 18:59 bk_data20150331.tar.gz
drwx------. 14 postgres postgres 4096 Apr 2 14:36 data ---解压后的data目录
drwxr-xr-x. 6 postgres postgres 4096 Apr 1 14:34 include
drwxr-xr-x. 3 postgres postgres 4096 Apr 1 14:34 lib
drwxr-xr-x. 6 postgres postgres 4096 Apr 1 14:34 share
[root@pgtina 9.1]# pwd
/home/pgsql/9.1 ---跟原来的库安装路径保持一致。
[postgres@pgtina ~]$ pg_ctl -D /home/pgsql/9.1/data start --直接启动报错:参数设置太大,需要调小
server starting
[postgres@pgtina ~]$ 2015-04-02 14:26:31 CST--- :FATAL: could not create shared memory segment: Cannot allocate memory
2015-04-02 14:26:31 CST---ETAIL: Failed system call was shmget(key=5432001, size=8836481024, 03600).
2015-04-02 14:26:31 CST--- :HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded available memory or swap space, or exceeded your kernel's SHMALL parameter. You can either reduce the request size or reconfigure the kernel with larger SHMALL. To reduce the request size (currently 8836481024 bytes), reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections.
The PostgreSQL documentation con
[postgres@pgtina data]$ vi postgresql.conf
[postgres@pgtina data]$ pg_ctl -D /home/pgsql/9.1/data start
server starting
[postgres@pgtina data]$ 2015-04-02 06:31:48 GMT--- :FATAL: WAL archival (archive_mode=on) requires wal_level "archive" or "hot_standby" --开了归档模式,改成off
^C
[postgres@pgtina data]$ vi postgresql.conf
[postgres@pgtina data]$ pg_ctl -D /home/pgsql/9.1/data start
server starting
[postgres@pgtina data]$ 2015-04-02 06:32:12 GMT--- :FATAL: WAL streaming (max_wal_senders > 0) requires wal_level "archive" or "hot_standby" ---改为0,不需要传输日志。
[postgres@pgtina pg_log]$ more postgresql-2015-04-02_143234.log ---一定要有流日志才能进行启动,因此再讲pg_xlog也解压
2015-04-02 14:32:34 CST--- :LOG: database system was shut down at 2015-03-30 22:26:53 CST
2015-04-02 14:32:34 CST--- :FATAL: required WAL directory "pg_xlog" does not exist
2015-04-02 14:32:34 CST--- :LOG: startup process (PID 11993) exited with exit code 1
2015-04-02 14:32:34 CST--- :LOG: aborting startup due to startup process failure
[postgres@pgtina pg_log]$ cd ..
[postgres@pgtina pg_log]$ more postgresql-2015-04-02_143335.log
2015-04-02 14:33:35 CST--- :LOG: database system was shut down at 2015-03-30 22:26:53 CST
2015-04-02 14:33:35 CST--- :LOG: creating missing WAL directory "pg_xlog/archive_status"
2015-04-02 14:33:35 CST--- :LOG: could not open tablespace directory "pg_tblspc/30139248/PG_9.1_201105231": No such file or directory
2015-04-02 14:33:35 CST--- :LOG: could not open file "pg_xlog/0000000100000B6400000027" (log file 2916, segment 39): No such file or directory
2015-04-02 14:33:35 CST--- :LOG: invalid primary checkpoint record
2015-04-02 14:33:35 CST--- :LOG: could not open file "pg_xlog/0000000100000B6400000025" (log file 2916, segment 37): No such file or directory
2015-04-02 14:33:35 CST--- :LOG: invalid secondary checkpoint record
2015-04-02 14:33:35 CST---ANIC: could not locate a valid checkpoint record
2015-04-02 14:33:35 CST--- :LOG: startup process (PID 12008) was terminated by signal 6: Aborted
2015-04-02 14:33:35 CST--- :LOG: aborting startup due to startup process failure
再次启动,虽有一个报错,但可以正常连接打开数据库了
[root@pgtina pg_log]# more postgresql-2015-04-02_151734.log
2015-04-02 15:17:34 CST--- :LOG: database system was shut down at 2015-03-30 22:26:53 CST
2015-04-02 15:17:34 CST--- :LOG: could not open tablespace directory "pg_tblspc/30139248/PG_9.1_201105231": No such file or direc
tory
2015-04-02 15:17:34 CST--- :LOG: database system is ready to accept connections
2015-04-02 15:17:34 CST--- :LOG: autovacuum launcher started
有一个人为创建的表空间,指向了其他位置:
CREATE TABLESPACE tm_data
OWNER postgres
LOCATION '/home/pgsql_data';
目录拷贝过来之后
修改postgres用户为简单密码tina
测试之前那个insert能否成功:
insert into t_black (sample_hash, sample_crc32, sample_sha1, virusname, avl_m, software_english,file_size, conditions_type,
sample_type, description,record_time, getname_time, developers, last_record_time, last_changename_time, source_id, ss_id, cps,level,keyhash,
sensitive_strings, behavior_info, relation_info,version,ratio,"AVL_Embed", "AVL_Adware", program_name, versionname, record_time_int ) SELECT sample_hash, sample_crc32, sample_sha1, virusname, avl_m, software_english,file_size, conditions_type,
sample_type, description,record_time, getname_time, developers, last_record_time, last_changename_time, source_id, ss_id, cps,level,keyhash,
sensitive_strings, behavior_info, relation_info,version,ratio,"AVL_Embed", "AVL_Adware", program_name, versionname, record_time_int
FROM t_white where sample_hash ='1E0CB07CDC71B2F994F5D3EB51050E3A';
执行依然报错:
ERROR: invalid page header in block 27073 of relation base/21078/45300926
********** 错误 **********
ERROR: invalid page header in block 27073 of relation base/21078/45300926
SQL 状态: XX001
再次像之前那样操作:
dd if=/dev/zero of=/home/postgres/data/base/34780/42995 bs=8192 seek=1 count=1 conv=notrunc
set zero_damaged_pages=on;
vacuum full t_black20150301; ---坏块就在这个表里面
WARNING: invalid page header in block 27069 of relation base/21078/45300926; zeroing out page
WARNING: invalid page header in block 27070 of relation base/21078/45300926; zeroing out page
WARNING: invalid page header in block 27071 of relation base/21078/45300926; zeroing out page
WARNING: invalid page header in block 27072 of relation base/21078/45300926; zeroing out page
WARNING: invalid page header in block 27073 of relation base/21078/45300926; zeroing out page
查询成功但无结果,耗时: 43486 毫秒(ms)。
tm_samples=# set zero_damaged_pages=on;
SET
tm_samples=# vacuum full t_black20150301; ---原来vacuum full t_black无效,需要具体到某一个分区表。
VACUUM
tm_samples=# select count(*) from t_black;
count
---------
3054944
(1 row)
再来执行insert试试:
insert into t_black (sample_hash, sample_crc32, sample_sha1, virusname, avl_m, software_english,file_size, conditions_type,
sample_type, description,record_time, getname_time, developers, last_record_time, last_changename_time, source_id, ss_id, cps,level,keyhash,
sensitive_strings, behavior_info, relation_info,version,ratio,"AVL_Embed", "AVL_Adware", program_name, versionname, record_time_int ) SELECT sample_hash, sample_crc32, sample_sha1, virusname, avl_m, software_english,file_size, conditions_type,
sample_type, description,record_time, getname_time, developers, last_record_time, last_changename_time, source_id, ss_id, cps,level,keyhash,
sensitive_strings, behavior_info, relation_info,version,ratio,"AVL_Embed", "AVL_Adware", program_name, versionname, record_time_int
FROM t_white where sample_hash ='1E0CB07CDC71B2F994F5D3EB51050E3A';
插入无报错,成功!
下面再来试试剩下的3个表。
tm_samples=# select count(*) from t_white; --白表正常
count
----------
25197129
(1 row)
tm_samples=# select count(*) from t_batch_sample; --这个表也正常,并都没有做vacuum操作。
count
----------
30878635
(1 row)
tm_samples=# select count(*) from t_derivative;
count
----------
70888497
(1 row)
----将有坏块的数据库迁移到磁盘正常的地方恢复,数据库只需要做一下vacuum操作,再次写入就没问题
==========================================================
补充:
1、如何对一个磁盘进行格式化?
mkfs命令
使用方式 : mkfs [-V] [-t fstype] [fs-options] filesys [blocks] [-L Lable]
说明 : 建立 linux 档案系统在特定的 partition 上
参数 :
device : 预备检查的硬盘 partition,例如:/dev/sda1
-V : 详细显示模式
-t : 给定档案系统的型式,Linux 的预设值为 ext2
-c : 在制做档案系统前,检查该partition 是否有坏轨
-l bad_blocks_file : 将有坏轨的block资料加到 bad_blocks_file 里面
block : 给定 block 的大小
-L:建立lable
[root@localhost beinan]# mkfs -t 文件系统 存储设备
注:
这里的文件系统是要指定的,比如 ext3 ;reiserfs ;ext2 ;fat32 ;msdos 等... ...
设备比如是一个硬盘的分区,软盘,光驱等.. ... 在格式化分区之前,您得懂得如何查看硬盘分区情况,并有针对性的格式化;
比如用 fdisk -l 来查看;
您也可以把分区格式化成其它的文件系统;比如我们把 /dev/sda6格式化为ext3 、ext2、reiserfs、fat32、msdos 文件系统,命令格式如下;
[root@localhost beinan]# mkfs -t ext3 /dev/sda6
[root@localhost beinan]# mkfs -t ext2 /dev/sda6
[root@localhost beinan]# mkfs -t reiserfs /dev/sda6
[root@localhost beinan]# mkfs -t fat32 /dev/sda6
[root@localhost beinan]# mkfs -t msdos /dev/sda6
2、添加分区的自动挂载
[root@pg home]# cat /etc/fstab
UUID=f3b4f67a-7e8a-477f-8dca-a1d683aa8a57 / ext4 defaults 1 1
UUID=9345bbc6-c984-4e42-9149-145649b6b753 swap swap defaults 0 0
UUID=e514fd36-4424-4984-952c-2661665f47c2 /home/pgsql ext4 defaults,noatime,nodiratime 0 0
UUID=2bba011f-7bb2-43cd-ba83-6dcf0b459a33 /opt/db_backup ext4 defaults,noatime,nodiratime 0 0
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
UUID在哪里看呢?
ls -l /dev/disk/by-uuid #查看sdc1对应的uuid值
使用df -T也可以查看到。
[root@pg home]# df -T -h
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda2 ext4 104G 37G 62G 38% /
tmpfs tmpfs 16G 0 16G 0% /dev/shm
/dev/sdc1 ext4 917G 529G 342G 61% /opt/db_backup
/dev/sdb ext4 939G 147G 745G 17% /home/pgsql
cat /etc/fstab
df -T -h
mount
3、有多余的表空间,且没有该表空间的目录依然可以正常启动数据库,如何将该表空间进行删除?
could not open tablespace directory "pg_tblspc/30139248/PG_9.1_201105231": No such file or directory 经查看就是表空间tm_data
postgres=# \db
List of tablespaces
Name | Owner | Location
------------+----------+------------------
pg_default | postgres |
pg_global | postgres |
tm_data | postgres | /home/pgsql_data
(3 rows)
postgres=# drop tablespace tm_data;
ERROR: tablespace "tm_data" is not empty --非空的表空间不能直接删除。
select * from pg_tables where tablespace='tm_data';--查到有4张表在这个表空间下
"public";"t_mobile_virus_url_event20150301";"postgres";"tm_data";f;f;f
"public";"t_mobile_virus_url_event20150201";"postgres";"tm_data";f;f;f
"public";"t_mobile_virus_url_event";"postgres";"tm_data";t;t;f
"public";"t_mobile_virus_url_event20150401";"postgres";"tm_data";t;f;f
先删除表,再删除表空间即可。
4、如何查看数据库的ID和表的ID?
select relfilenode from pg_class where relname='t_black'; --查看表的id
select datid,datname from pg_stat_activity where datname='tm_samples'; --查看库的id 暂时没找到更直接的办法
1、连上后,发现pg主从不同步,主不向从传日志,从报错:
FATAL: could not connect to the primary server: could not connect to server:
发现从先启动成功,而主是后启动的,因此我们将从再次重启
service postgresql restart
开始正常传日志
2、过了一会,研发反应部分表的使用出现问题,主再次不传输日志,且无sender进程。
查看主库日志,发现报错:
ERROR: invalid page header in block 27073 of relation base/21078/45300926
查看从库日志,发现类似报错
ERROR:insert_t_black:invalid page head in block 27073 of relation base/21078/45300926
21078是tm_samples库的id号
通过查询:
select * from pg_class where relfilenode='45300926';—t_black
应该是停电的时候,正好在进行数据的写入等操作,造成了数据损坏。
此时,我们进行了如下操作:
set zero_damaged_pages = on; --当这个参数为on的时候,会忽略所有数据有损坏的页面
vacuum full t_black;
tm_samples=# select count(*) from t_black;
WARNING: invalid page header in block 27069 of relation base/21078/45300926; zeroing out page
WARNING: invalid page header in block 27070 of relation base/21078/45300926; zeroing out page
WARNING: invalid page header in block 27071 of relation base/21078/45300926; zeroing out page
WARNING: invalid page header in block 27072 of relation base/21078/45300926; zeroing out page
WARNING: invalid page header in block 27073 of relation base/21078/45300926; zeroing out page
count
---------
3066302
(1 row)
reindex table t_black;
再次select count(*) from t_black; --正常
再重启pg主从的服务,发现主从开始同步,后台没有再报刚刚那个错误
3、研发测试验证,发现插入部分hash值的时候,依然会刚刚那个错误
insert into t_black (sample_hash, sample_crc32, sample_sha1, virusname, avl_m, software_english,file_size, conditions_type,
sample_type, description,record_time, getname_time, developers, last_record_time, last_changename_time, source_id, ss_id, cps,level,keyhash,
sensitive_strings, behavior_info, relation_info,version,ratio,"AVL_Embed", "AVL_Adware", program_name, versionname, record_time_int ) SELECT sample_hash, sample_crc32, sample_sha1, virusname, avl_m, software_english,file_size, conditions_type,
sample_type, description,record_time, getname_time, developers, last_record_time, last_changename_time, source_id, ss_id, cps,level,keyhash,
sensitive_strings, behavior_info, relation_info,version,ratio,"AVL_Embed", "AVL_Adware", program_name, versionname, record_time_int
FROM t_white where sample_hash ='1E0CB07CDC71B2F994F5D3EB51050E3A';
程序报错:DatabaseError: invalid page header in block 27073 of relation base/21078/45300926
只插入部分字段就没问题,但完整插入就不行。
这时,我们想到可能是文件系统损坏了,当有数据往刚刚那个坏块里面写的时候,就会触发报错。
同时,我们查到一共有4张表出现了坏块,分别是:
t_black
t_white
t_batch_sample
t_derivative
我们决定先对所有的表做一次vacuum,然后进行磁盘的检查------后面的操作都是娟姐做的,待她回来后补充,但她反应检查后依然不行,最后的方法是磁盘格式化,然后用从库的备份进行了覆盖恢复。
网页上有一些介绍的方法:
1)根据错误提示 ERROR: invalid page header in block 1 of relation base/34780/34781 我们可以找到相应的文件, 文件的路径为: 数据目录/base/34780/34781,只要用工具手动把上面提示的坏块清除即可。
在Linux下面可以用dd工具把相应的页面清除:
$dd if=/dev/zero of=/home/postgres/data/base/34780/42995 bs=8192 seek=1 count=1 conv=notrunc
2)数据库暂时恢复使用了,但这是临时的,如果是磁盘文件系统故障,不久还是会重现这个问题,临了再修复了一下
1.reboot进入单用户模式
2.umount出现数据库异常的磁盘
3.fsck -v -t -p /dev/sda1
4.reboot
因为把有坏块的主库留了个备份,想用这个备份做一些实验:
a.尝试找到更为直接修复的方法
b.换到其他硬盘好的地方去,看它到底是db本身的文件坏了,还是啥原因。
======================实验=======================
将/data和pg_xlog压缩,传到已经安装好pg的12.250上面。替换到原来的data目录,目录保留
-rw-r--r-- 1 root root 39966776421 Apr 1 17:57 bk_data20150331.tar.gz
-rw-r--r-- 1 root root 5475189513 Apr 1 18:18 bk_pg_xlog20150331.tar.gz
[root@pgtina 9.1]# ll
total 39030084
drwxr-xr-x. 2 postgres postgres 4096 Apr 1 14:34 bin
drwx------. 13 postgres postgres 4096 Apr 1 16:22 bk_data ---原来的data目录
-rw-r--r--. 1 root root 39966776421 Apr 1 18:59 bk_data20150331.tar.gz
drwx------. 14 postgres postgres 4096 Apr 2 14:36 data ---解压后的data目录
drwxr-xr-x. 6 postgres postgres 4096 Apr 1 14:34 include
drwxr-xr-x. 3 postgres postgres 4096 Apr 1 14:34 lib
drwxr-xr-x. 6 postgres postgres 4096 Apr 1 14:34 share
[root@pgtina 9.1]# pwd
/home/pgsql/9.1 ---跟原来的库安装路径保持一致。
[postgres@pgtina ~]$ pg_ctl -D /home/pgsql/9.1/data start --直接启动报错:参数设置太大,需要调小
server starting
[postgres@pgtina ~]$ 2015-04-02 14:26:31 CST--- :FATAL: could not create shared memory segment: Cannot allocate memory
2015-04-02 14:26:31 CST---ETAIL: Failed system call was shmget(key=5432001, size=8836481024, 03600).
2015-04-02 14:26:31 CST--- :HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded available memory or swap space, or exceeded your kernel's SHMALL parameter. You can either reduce the request size or reconfigure the kernel with larger SHMALL. To reduce the request size (currently 8836481024 bytes), reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections.
The PostgreSQL documentation con
[postgres@pgtina data]$ vi postgresql.conf
[postgres@pgtina data]$ pg_ctl -D /home/pgsql/9.1/data start
server starting
[postgres@pgtina data]$ 2015-04-02 06:31:48 GMT--- :FATAL: WAL archival (archive_mode=on) requires wal_level "archive" or "hot_standby" --开了归档模式,改成off
^C
[postgres@pgtina data]$ vi postgresql.conf
[postgres@pgtina data]$ pg_ctl -D /home/pgsql/9.1/data start
server starting
[postgres@pgtina data]$ 2015-04-02 06:32:12 GMT--- :FATAL: WAL streaming (max_wal_senders > 0) requires wal_level "archive" or "hot_standby" ---改为0,不需要传输日志。
[postgres@pgtina pg_log]$ more postgresql-2015-04-02_143234.log ---一定要有流日志才能进行启动,因此再讲pg_xlog也解压
2015-04-02 14:32:34 CST--- :LOG: database system was shut down at 2015-03-30 22:26:53 CST
2015-04-02 14:32:34 CST--- :FATAL: required WAL directory "pg_xlog" does not exist
2015-04-02 14:32:34 CST--- :LOG: startup process (PID 11993) exited with exit code 1
2015-04-02 14:32:34 CST--- :LOG: aborting startup due to startup process failure
[postgres@pgtina pg_log]$ cd ..
[postgres@pgtina pg_log]$ more postgresql-2015-04-02_143335.log
2015-04-02 14:33:35 CST--- :LOG: database system was shut down at 2015-03-30 22:26:53 CST
2015-04-02 14:33:35 CST--- :LOG: creating missing WAL directory "pg_xlog/archive_status"
2015-04-02 14:33:35 CST--- :LOG: could not open tablespace directory "pg_tblspc/30139248/PG_9.1_201105231": No such file or directory
2015-04-02 14:33:35 CST--- :LOG: could not open file "pg_xlog/0000000100000B6400000027" (log file 2916, segment 39): No such file or directory
2015-04-02 14:33:35 CST--- :LOG: invalid primary checkpoint record
2015-04-02 14:33:35 CST--- :LOG: could not open file "pg_xlog/0000000100000B6400000025" (log file 2916, segment 37): No such file or directory
2015-04-02 14:33:35 CST--- :LOG: invalid secondary checkpoint record
2015-04-02 14:33:35 CST---ANIC: could not locate a valid checkpoint record
2015-04-02 14:33:35 CST--- :LOG: startup process (PID 12008) was terminated by signal 6: Aborted
2015-04-02 14:33:35 CST--- :LOG: aborting startup due to startup process failure
再次启动,虽有一个报错,但可以正常连接打开数据库了
[root@pgtina pg_log]# more postgresql-2015-04-02_151734.log
2015-04-02 15:17:34 CST--- :LOG: database system was shut down at 2015-03-30 22:26:53 CST
2015-04-02 15:17:34 CST--- :LOG: could not open tablespace directory "pg_tblspc/30139248/PG_9.1_201105231": No such file or direc
tory
2015-04-02 15:17:34 CST--- :LOG: database system is ready to accept connections
2015-04-02 15:17:34 CST--- :LOG: autovacuum launcher started
有一个人为创建的表空间,指向了其他位置:
CREATE TABLESPACE tm_data
OWNER postgres
LOCATION '/home/pgsql_data';
目录拷贝过来之后
修改postgres用户为简单密码tina
测试之前那个insert能否成功:
insert into t_black (sample_hash, sample_crc32, sample_sha1, virusname, avl_m, software_english,file_size, conditions_type,
sample_type, description,record_time, getname_time, developers, last_record_time, last_changename_time, source_id, ss_id, cps,level,keyhash,
sensitive_strings, behavior_info, relation_info,version,ratio,"AVL_Embed", "AVL_Adware", program_name, versionname, record_time_int ) SELECT sample_hash, sample_crc32, sample_sha1, virusname, avl_m, software_english,file_size, conditions_type,
sample_type, description,record_time, getname_time, developers, last_record_time, last_changename_time, source_id, ss_id, cps,level,keyhash,
sensitive_strings, behavior_info, relation_info,version,ratio,"AVL_Embed", "AVL_Adware", program_name, versionname, record_time_int
FROM t_white where sample_hash ='1E0CB07CDC71B2F994F5D3EB51050E3A';
执行依然报错:
ERROR: invalid page header in block 27073 of relation base/21078/45300926
********** 错误 **********
ERROR: invalid page header in block 27073 of relation base/21078/45300926
SQL 状态: XX001
再次像之前那样操作:
dd if=/dev/zero of=/home/postgres/data/base/34780/42995 bs=8192 seek=1 count=1 conv=notrunc
set zero_damaged_pages=on;
vacuum full t_black20150301; ---坏块就在这个表里面
WARNING: invalid page header in block 27069 of relation base/21078/45300926; zeroing out page
WARNING: invalid page header in block 27070 of relation base/21078/45300926; zeroing out page
WARNING: invalid page header in block 27071 of relation base/21078/45300926; zeroing out page
WARNING: invalid page header in block 27072 of relation base/21078/45300926; zeroing out page
WARNING: invalid page header in block 27073 of relation base/21078/45300926; zeroing out page
查询成功但无结果,耗时: 43486 毫秒(ms)。
tm_samples=# set zero_damaged_pages=on;
SET
tm_samples=# vacuum full t_black20150301; ---原来vacuum full t_black无效,需要具体到某一个分区表。
VACUUM
tm_samples=# select count(*) from t_black;
count
---------
3054944
(1 row)
再来执行insert试试:
insert into t_black (sample_hash, sample_crc32, sample_sha1, virusname, avl_m, software_english,file_size, conditions_type,
sample_type, description,record_time, getname_time, developers, last_record_time, last_changename_time, source_id, ss_id, cps,level,keyhash,
sensitive_strings, behavior_info, relation_info,version,ratio,"AVL_Embed", "AVL_Adware", program_name, versionname, record_time_int ) SELECT sample_hash, sample_crc32, sample_sha1, virusname, avl_m, software_english,file_size, conditions_type,
sample_type, description,record_time, getname_time, developers, last_record_time, last_changename_time, source_id, ss_id, cps,level,keyhash,
sensitive_strings, behavior_info, relation_info,version,ratio,"AVL_Embed", "AVL_Adware", program_name, versionname, record_time_int
FROM t_white where sample_hash ='1E0CB07CDC71B2F994F5D3EB51050E3A';
插入无报错,成功!
下面再来试试剩下的3个表。
tm_samples=# select count(*) from t_white; --白表正常
count
----------
25197129
(1 row)
tm_samples=# select count(*) from t_batch_sample; --这个表也正常,并都没有做vacuum操作。
count
----------
30878635
(1 row)
tm_samples=# select count(*) from t_derivative;
count
----------
70888497
(1 row)
----将有坏块的数据库迁移到磁盘正常的地方恢复,数据库只需要做一下vacuum操作,再次写入就没问题
==========================================================
补充:
1、如何对一个磁盘进行格式化?
mkfs命令
使用方式 : mkfs [-V] [-t fstype] [fs-options] filesys [blocks] [-L Lable]
说明 : 建立 linux 档案系统在特定的 partition 上
参数 :
device : 预备检查的硬盘 partition,例如:/dev/sda1
-V : 详细显示模式
-t : 给定档案系统的型式,Linux 的预设值为 ext2
-c : 在制做档案系统前,检查该partition 是否有坏轨
-l bad_blocks_file : 将有坏轨的block资料加到 bad_blocks_file 里面
block : 给定 block 的大小
-L:建立lable
[root@localhost beinan]# mkfs -t 文件系统 存储设备
注:
这里的文件系统是要指定的,比如 ext3 ;reiserfs ;ext2 ;fat32 ;msdos 等... ...
设备比如是一个硬盘的分区,软盘,光驱等.. ... 在格式化分区之前,您得懂得如何查看硬盘分区情况,并有针对性的格式化;
比如用 fdisk -l 来查看;
您也可以把分区格式化成其它的文件系统;比如我们把 /dev/sda6格式化为ext3 、ext2、reiserfs、fat32、msdos 文件系统,命令格式如下;
[root@localhost beinan]# mkfs -t ext3 /dev/sda6
[root@localhost beinan]# mkfs -t ext2 /dev/sda6
[root@localhost beinan]# mkfs -t reiserfs /dev/sda6
[root@localhost beinan]# mkfs -t fat32 /dev/sda6
[root@localhost beinan]# mkfs -t msdos /dev/sda6
2、添加分区的自动挂载
[root@pg home]# cat /etc/fstab
UUID=f3b4f67a-7e8a-477f-8dca-a1d683aa8a57 / ext4 defaults 1 1
UUID=9345bbc6-c984-4e42-9149-145649b6b753 swap swap defaults 0 0
UUID=e514fd36-4424-4984-952c-2661665f47c2 /home/pgsql ext4 defaults,noatime,nodiratime 0 0
UUID=2bba011f-7bb2-43cd-ba83-6dcf0b459a33 /opt/db_backup ext4 defaults,noatime,nodiratime 0 0
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
UUID在哪里看呢?
ls -l /dev/disk/by-uuid #查看sdc1对应的uuid值
使用df -T也可以查看到。
[root@pg home]# df -T -h
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda2 ext4 104G 37G 62G 38% /
tmpfs tmpfs 16G 0 16G 0% /dev/shm
/dev/sdc1 ext4 917G 529G 342G 61% /opt/db_backup
/dev/sdb ext4 939G 147G 745G 17% /home/pgsql
cat /etc/fstab
df -T -h
mount
3、有多余的表空间,且没有该表空间的目录依然可以正常启动数据库,如何将该表空间进行删除?
could not open tablespace directory "pg_tblspc/30139248/PG_9.1_201105231": No such file or directory 经查看就是表空间tm_data
postgres=# \db
List of tablespaces
Name | Owner | Location
------------+----------+------------------
pg_default | postgres |
pg_global | postgres |
tm_data | postgres | /home/pgsql_data
(3 rows)
postgres=# drop tablespace tm_data;
ERROR: tablespace "tm_data" is not empty --非空的表空间不能直接删除。
select * from pg_tables where tablespace='tm_data';--查到有4张表在这个表空间下
"public";"t_mobile_virus_url_event20150301";"postgres";"tm_data";f;f;f
"public";"t_mobile_virus_url_event20150201";"postgres";"tm_data";f;f;f
"public";"t_mobile_virus_url_event";"postgres";"tm_data";t;t;f
"public";"t_mobile_virus_url_event20150401";"postgres";"tm_data";t;f;f
先删除表,再删除表空间即可。
4、如何查看数据库的ID和表的ID?
select relfilenode from pg_class where relname='t_black'; --查看表的id
select datid,datname from pg_stat_activity where datname='tm_samples'; --查看库的id 暂时没找到更直接的办法
发表评论
-
pg 锁
2016-01-14 16:26 0pg 锁 ... -
postgresql 的三类日志
2016-01-14 15:59 18527一、PostgreSQL有3种日志: 1)pg_log(数据 ... -
pg存储过程--创建分区表
2016-01-13 15:46 01)将普通表改成按时间字段分区表 调用select fun_c ... -
pg常用自制shell脚本-tina
2016-01-13 15:30 49391)小型监控: 1.在pg库主机上部署,每5分钟执行一次,插入 ... -
postgresql 时间类型和相关函数
2016-01-13 10:41 5457今天来好好学习一下postgresql涉及时间的字段类型和一些 ... -
pg 表空间
2016-01-07 16:28 3124一、说明 在数据库运维工作中,经常会有数据目录使用率较高 ... -
pg 定期vacuum和reindex
2016-01-07 14:56 8615定期vacuum和reindex: 一 ... -
pg 序列
2016-01-06 16:58 1621一、简介 一个序列对象通常用于为行或者表生成唯一的标识符。 ... -
pg 简单备份和恢复
2016-01-06 15:53 3768pg的备份和恢复 pg_dump ... -
pg_cancel_backend()和pg_terminate_backend()
2016-01-05 17:42 3555pg_cancel_backend()和pg_terminat ... -
canceling statement due to conflict with recovery
2016-01-05 17:12 1681报错: canceling statement due to ... -
postgresql dblink 使用
2015-12-31 14:33 2042dblink的使用 pg的跨库查询工具 select dbli ... -
root用户不能使用psql或者pg_dump等pg命令
2015-12-24 14:40 7030root用户不能使用psql或者pg_dump等pg命令 [ ... -
postgresql新建库2个常见报错
2015-12-22 16:43 6263今天使用pg建库发现两个报错: ERROR: new c ... -
安装postgresql 9.1.1
2015-12-22 16:25 642安装postgresql 9.1.1 ---版本自选,步骤相同 ... -
pgbadger监控安装和使用
2015-12-21 10:01 2033pgbadger监控安装和使用 https://github ... -
oracle,postgresql,mysql一些使用上的区别记录
2015-12-16 11:38 01.限制行数: select * from ta where ... -
postgresql存储过程实例:已审核证书存入临时表
2015-12-14 16:44 652存储过程实例: 需求: 思路:建立存储过程 代码逻辑: 1 ... -
pg 函数sfa_tmp_sleep()执行越来越慢-sql分析
2015-12-11 09:48 678pg 函数sfa_tmp_sleep()执行越来越慢 ... -
pgpool 主从流复制模式下的安装使用
2015-12-11 09:50 4127pgpool-II 是一个位于 PostgreSQL 服务器和 ...
相关推荐
在Python开发过程中,遇到"error: invalid command ‘egg-info‘"这样的报错通常是由于安装或配置Setuptools库时出现了问题。Setuptools是Python项目构建、安装和管理的工具,而`egg-info`命令用于生成项目的元数据...
Generic syntax highlighter syntaxerror: invalid syntax syntaxerror: invalid syntax syntaxerror: invalid syntax syntaxerror: invalid syntax syntaxerror: invalid syntax
syntaxerror: invalid syntax syntaxerror: invalid syntax syntaxerror: invalid syntax syntaxerror: invalid syntax syntaxerror: invalid syntax
python——pip install xxx报错SyntaxError: invalid syntax 在安装好python后,进入python运行环境后,因为我要用pip安装开发Web App需要的第三方库,执行pip install aiohttp,发现会报错SyntaxError: invalid ...
在ROS(Robot Operating System)开发过程中,遇到`rlexception: invalid roslaunch xml syntax: no element found: line 1, column 0`这样的错误信息时,表明当前的`.launch`文件存在XML语法错误。具体来说,可能是...
invalid LOC header (bad signature)问题解决,快速查找与删除有问题的jar包
xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun 看了下Git命令能否正常使用 命令行git –help ...
关于pip install xxx报错SyntaxError:invalid syntax的解决方法 声明:1.以下均以pip install requests举例; 2.Windows系统; 首先,看自己是否在python环境中运行了pip,若是,请打开“开始”菜单,输入cmd,找到...
NULL 博文链接:https://wilian.iteye.com/blog/1992365
10. **`"upstream sent invalid header while reading response header from upstream"`**:后端服务器发送的响应头部无效。 11. **`"client intended to send too large body"`**:客户端发送的数据体超过了配置的...
再和前台对接的时候一直报错:invalid url domain 在网上查找了很多资料,有说端口的问题,也有其它各种原因的,一大堆,最后我发现以上的说法都不准确,这是我配置的地址(错误的地址) 把配置地址的http://去掉...
针对SparkR安装出现的Invalid or corrupt jarfile sbt/sbt-launch-0.13.5.jar替代
@ERROR: invalid uid nobody rsync error: error starting client-server protocol (code 5) at main.c(1506) [Receiver=3.0.7] 解决办法:在rsyncd.conf文件中添加下面两行即可解决问题 uid =
主要介绍了docker镜像无法删除 Error:No such image:xxxxxx解决,文中通过示例代码介绍的非常详细,对大家的学习或者工作具有一定的参考学习价值,需要的朋友们下面随着小编来一起学习学习吧
STM32F10x系列是意法半导体(STMicroelectronics)推出的基于ARM Cortex-M3内核的微控制器,广泛应用于各种嵌入式系统设计。`STM32F10x_startup`指的是针对该系列芯片的启动文件,通常以`.s`格式存在,它是微控制器...
在使用PostgreSQL数据库时,可能会遇到一个常见的错误提示“ERROR: invalid escape string”。这个错误通常出现在执行SQL查询时,尤其是在涉及转义字符的场景下。本文档将从问题描述、环境设定、错误分析、解决方法...
8. 错误信息:Error: Fitter requires that more entities of type logic cell be placed in a region than are available in the region 解决方法:检查所分配的 Logiclock region 面积是否足够,增加 Logiclock ...