`
jjxliu306
  • 浏览: 153743 次
  • 性别: Icon_minigender_1
  • 来自: 西安
社区版块
存档分类
最新评论

Postgres pg_multixact信息丢失极限拉起数据库

 
阅读更多

本测试模拟pg_multixact,pg_xlog全部信息丢失的情况下,数据库完全起不来的情况下拉起数据库。本人血淋淋的教训,建议还是做好备库和增量备份。

 

  一、初始化一个新的实例、创建数据库等。(因为当前系统用户并非postgres,因此我加了-U 来默认数据库超管用户postgres)

 

[lyf@centos250 pg9.6]$ initdb -D pg_db_test --encoding=UTF-8 --locale=C -U postgres 
The files belonging to this database system will be owned by user "lyf".
This user must also own the server process.

….信息忽略
Success. You can now start the database server using:

    pg_ctl -D pg_db_test -l logfile start

 修改配置端口即可,防止和其他已有实例冲突。如果本机只有一个pg实例,此步骤忽略。

listen_addresses = '*'          # (change requires restart)
port = 5678                             # (change requires restart)

 

 启动服务器

[lyf@centos250 pg9.6]$ pg_ctl start -D pg_db_test/ -l pg_db_test/db.log
server starting

 

Psql进入数据库,创建测试库和测试数据

[lyf@centos250 pg9.6]$ psql -U postgres -p 5678
psql (9.6.0)
Type "help" for help.

postgres=# create database dbtest;
CREATE DATABASE
postgres=# \c dbtest;
You are now connected to database "dbtest" as user "postgres".
dbtest=# create table t_test(id int , name text);
CREATE TABLE
dbtest=# insert into t_test select generate_series(1,10000) , md5(random()::text);
INSERT 0 10000
dbtest=# insert into t_test select generate_series(1,10000) , md5(random()::text);
INSERT 0 10000
dbtest=# insert into t_test select generate_series(1,10000) , md5(random()::text);
INSERT 0 10000
dbtest=# insert into t_test select generate_series(1,10000) , md5(random()::text);
INSERT 0 10000
-- 此处没有checkpoint的话,删除了xlog之后很可能数据已经丢了,感兴趣的可以试试
dbtest=# checkpoint ; 
CHECKPOINT
dbtest=# \q

 

 

二、停库,删除pg_multixact/members, pg_multixact/offsets 下的文件。

   先停库,做好clogxlogmultixact的备份

   

[lyf@centos250 pg9.6]$ pg_ctl stop -m fast -D pg_db_test/ 
waiting for server to shut down.... done
server stopped
[lyf@centos250 pg9.6]$ cd pg_db_test/
 [lyf@centos250 pg_db_test]$ zip -r xlog.zip pg_xlog
  adding: pg_xlog/ (stored 0%)
  adding: pg_xlog/000000010000000000000001 (deflated 83%)
  adding: pg_xlog/archive_status/ (stored 0%)
[lyf@centos250 pg_db_test]$ zip -r multixact.zip pg_multixact
  adding: pg_multixact/ (stored 0%)
  adding: pg_multixact/members/ (stored 0%)
  adding: pg_multixact/members/0000 (deflated 100%)
  adding: pg_multixact/offsets/ (stored 0%)
  adding: pg_multixact/offsets/0000 (deflated 100%)

 

删除文件

[lyf@centos250 pg_db_test]$ rm pg_xlog/* -r
[lyf@centos250 pg_db_test]$ rm pg_multixact/members/*
[lyf@centos250 pg_db_test]$ rm pg_multixact/offsets/*
[lyf@centos250 pg_db_test]$

 

三、 尝试启动数据库

[lyf@centos250 pg9.6]$ pg_ctl start -D pg_db_test/ -l pg_db_test/db.log
server starting
[lyf@centos250 pg9.6]$ tail -f pg_db_test/db.log 
LOG:  creating missing WAL directory "pg_xlog/archive_status"
LOG:  invalid primary checkpoint record
LOG:  invalid secondary checkpoint record
PANIC:  could not locate a valid checkpoint record
LOG:  startup process (PID 16658) was terminated by signal 6: Aborted
LOG:  aborting startup due to startup process failure
LOG:  database system is shut down

 此时,数据库已经起不来了

 

四、尝试通过resetxlog恢复,参考手册的配置:https://www.postgresql.org/docs/9.6/static/app-pgresetxlog.html 。这里我们重点需要-O-x-m三个参数

 

-O mxoff

 

此时pg_multixact/members 文件夹下文件为空,因此值为 to_hex((0 + 1) * 52352) = 0xcc80, 注意此参数9.2以前为65536 , 9.3以后改为52352

 

-x xid

[lyf@centos250 pg9.6]$ ls pg_db_test/pg_clog/
0000

 此时pg_clog下的文件夹文件最大0000,因此值为: 0x100000

 

 

-m mxid,mxid

 

此时pg_multixact/ offsets文件夹下文件为空, 因此给值: 0x10000,0x10000

 

[lyf@centos250 pg9.6]$ pg_resetxlog -m 0x10000,0x10000 -x 0x100000 -O 0xcc80 -f pg_db_test/
Transaction log reset

 观察此时的文件状态中xlog已经重置了wal

 

 五、再次重启数据库测试

[lyf@centos250 pg9.6]$ pg_ctl start -D pg_db_test/ -l pg_db_test/db.log
server starting
[lyf@centos250 pg9.6]$ tail -f pg_db_test/db.log 
LOG:  database system was shut down at 2017-09-13 15:00:49 CST
LOG:  MultiXact member wraparound protections are now enabled
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started

 此时数据库已经启动,psql进入查看数据。

[lyf@centos250 pg9.6]$ psql -U postgres -p 5678
psql (9.6.0)
Type "help" for help.

postgres=# \l+
                                                               List of databases
   Name    |  Owner   | Encoding | Collate | Ctype |   Access privileges   |  Size   | Tablespace |                Description                 
-----------+----------+----------+---------+-------+-----------------------+---------+------------+--------------------------------------------
 dbtest    | postgres | UTF8     | C       | C     |                       | 9767 kB | pg_default | 
 postgres  | postgres | UTF8     | C       | C     |                       | 7063 kB | pg_default | default administrative connection database
 template0 | postgres | UTF8     | C       | C     | =c/postgres          +| 6953 kB | pg_default | unmodifiable empty database
           |          |          |         |       | postgres=CTc/postgres |         |            | 
 template1 | postgres | UTF8     | C       | C     | =c/postgres          +| 6953 kB | pg_default | default template for new databases
           |          |          |         |       | postgres=CTc/postgres |         |            | 
(4 rows)

postgres=# \c dbtest
You are now connected to database "dbtest" as user "postgres".
dbtest=# \d
         List of relations
 Schema |  Name  | Type  |  Owner   
--------+--------+-------+----------
 public | t_test | table | postgres
(1 row)

dbtest=# select count(1) from t_test ;
 count 
-------
 40000
(1 row)

dbtest=# create table t_test2(id int , name text);
CREATE TABLE
dbtest=# insert into t_test2 values(1 , '1');
INSERT 0 1
dbtest=#

 

 

到此,数据库恢复正常。

 

1
0
分享到:
评论

相关推荐

Global site tag (gtag.js) - Google Analytics