自斟自饮——6. 唯一性索引

liangguanhui

浏览: 114072 次
性别:

最近访客更多访客>>

yuefei008

jyjsjd

jing344986203

xintingandzhouyang

博主相关

博客

微博

相册

留言

关于我

文章分类

社区版块

存档分类

博客分类：

自斟自饮

Informix MySQL Oracle SQL PostgreSQL

Unique Index，唯一性索引，相信each of us都知道是怎么一回事。相对于普通的index，唯一性索引多了一个检查唯一性的checking。但，你对这个Unique Index是不是真的已经非常了解了？是吗？

1. 假设有以下表结构：

create temp table tmp_one (tid integer, tname char(12));
create temp table tmp_two (tid integer, tno integer, tname char(12));

create unique index tmp_one_idx on tmp_one(tid);
create unique index tmp_two_idx on tmp_two(tid, tno);

你觉得下面的代码执行结果是怎么样？

-- first
insert into tmp_one values (1, 'a');
insert into tmp_one values (2, 'b');
insert into tmp_one values (2, 'c');
insert into tmp_one values (null, 'd');
insert into tmp_one values (null, 'e');

-- second
insert into tmp_two values (1, 1, 'a');
insert into tmp_two values (2, 1, 'b');
insert into tmp_two values (2, 1, 'c');
insert into tmp_two values (null, 1, 'd');
insert into tmp_two values (null, 1, 'e');
insert into tmp_two values (null, null, 'f');
insert into tmp_two values (null, null, 'g');

-- finally query
select * from tmp_one order by tid;
select * from tmp_two order by tid, tno;

你觉得这个结果在Informix、MySQL、Oracle、PostgreSQL是否会一样呢？你觉得为什么会有这样的不同？

2. 有以下一个business表

create table business (
    tid     integer,
    tkind   char(1),
    tname   char(12)
);

现在有这样一个需求，对于所有tkind = ‘Y’的数据，tid都必须是唯一的，其余的tkind != ‘Y’ 或者null的数据不受这个限制。请问你会如何建这个“部分唯一性”的索引？

3. 假若存在一个表，

create table thing (
    tid       integer,
    handled   char(1),     --这个标志位表示是否已经被day-end处理过，Y或者N
    data      char(300)
);
create unique index thing_idx on thing(tid);

假若thing表有1000w数据，handled列都是“Y”，然后每天会有一些front-end插入一些handled = ‘N’的记录（数量大概几百条的样子），接着每天晚上会有一个day-end把这些handled = ‘N’的数据取出来，处理完后，最后会把这些记录的handled update成“Y”。

--伪代码
for row in (select * from thing where handled = 'N' for update)
begin
    -- 这里我省略了处理这条row的代码
    update thing set handled = 'Y' where in current row
end

你会怎样改善这个过程？

1. NULL在索引的问题

这个NULL的问题在不同的数据库表现很不一样，informix把它看作一个普通的值；但在Oracle、MySQL、PostgreSQL中的情况完全相反——NULL不起作用。不过要留意，这里的“不起作用”还是有异同的。

简单来说，好像按照SQL标准（不知道是SQL92还是SQL99了），不应该对null进行索引，in other word，索引中不应该包含null，（注意，虽然我给出的例子是unique index，但普通索引也一样成立的，只不过unique index比较容易看到实验效果）

先列出Informix的结果：

下面是MySQL的结果：

然后是pgsql的结果：

最后是oracle的结果：

从上面的结果我们可以得到很多很多的信息。

大家可以看到，在这一方面pgsql竟然“背叛”了oralce，跟MySQL相同。

我不知道大家有没有留意，反正我是在写这篇文章的时候才留意到，这四个数据库对于NULL排序是有不同的。

在ASC升序的情况下，
Informix、MySQL是把NULL放在前面；pgsql和oracle把NULL放在最后。
相信在DESC降序时，情况会是倒过来的。

这个测试说明了：

不同的数据有不少关键地方是不同的，平时要多加笔；
对于unique index，如果业务允许，尽量增加not null的约束。

虽然几个数据库对NULL在索引上的表现有所不同，但Informix似乎是“独占鳌头”了，尽管我不怎么想说Informix烂

2.NULL在索引中的妙用

对于“部分唯一性”的约束，可能有人建议trigger。但实际情况下会有很多很多很多的问题。

首先是性能的问题，trigger绝对是性能杀手的重要一员，一个不留神就会中招。

其次是隔离级别的问题，不同隔离级别下看到的数据是不一样的，例如对于informix，如果别人的数据还没有commit，我们在read uncommitted下是可以得到的，但在其余的隔离级别是看不到的。这种“多样性”会导致很多判断上的分歧，令我们的系统存在不少“隐患”。

最后，我有点怀疑是否真的能够实现。因为数据库存在并发，我们当然需要对数据加锁，但一般数据库都没有类似MySQL（InnoDB）的“间隙锁”，似乎很难对不存在的数据进行加锁，即使是MySQL（InnoDB），间隙锁也必须是在repeatable read和serializable才有效，有一定的局限性。

实际上我们可以利用NULL不索引这个特性建立“部分唯一性”索引（或者叫有选择性的索引、有条件的索引）。

首先我们建一个function（这里以oracle为例）。可能很多人都不知道现在的数据库都可以create function的，实际上function跟procedure存储过程很相似，只是function必须有返回值。这个函数很简单，对于tkind = Y，返回实际的id，否则统统返回null。

create function test_fn1 (f_id in integer, f_kind in char(1)) return integer
deterministic
as
begin
    if f_kind = 'Y' then
        return f_id
    end if
    return null
end;
/

然后对表business建一个function index（函数索引）。要了解更多关于函数索引可以google一下。

create unique index busi_idx1 on business (test_fn1(tid, tkind));

这个时候已经可以实现部分唯一性约束了。因为这个函数只会对tkind = ‘Y’的数据返回真正的tid，其余的tkind都是返回null，所以我们的索引也就只对tkind = Y起作用了。我们知道null是不进入索引的。

注意：由于informix对null也会收录到index，所以这个方法对于Informix无效。如果你问我infomix下怎么搞，我answer：不知道。

3. 附加练习题

这一道附加题我就不打算细说了，看完上面估计大家都应该知道怎么做。