浏览 2899 次
锁定老帖子 主题:Cardinality对表Join的影响
精华帖 (0) :: 良好帖 (0) :: 新手帖 (0) :: 隐藏帖 (0)
|
|
---|---|
作者 | 正文 |
发表时间:2010-05-21
首先创建三个测试表: CREATE TABLE customer ( -- Unique ID customer_id integer NOT NULL, CONSTRAINT PK_customer PRIMARY KEY(customer_id) ); CREATE TABLE my_order ( -- Unique ID order_id integer NOT NULL, product_id integer NOT NULL, customer_id integer NOT NULL, CONSTRAINT PK_order PRIMARY KEY(product_id, customer_id ), CONSTRAINT FK_order2customer_id FOREIGN KEY (customer_id) REFERENCES customer(customer_id), CONSTRAINT UQ_order_id UNIQUE(order_id) ); CREATE TABLE delivery ( order_id int NOT NULL, time datetime NOT NULL, CONSTRAINT PK_order PRIMARY KEY(order_id, time), CONSTRAINT FK_order FOREIGN KEY (order_id) REFERENCES my_order(order_id) ); 然后加入数据: set @N=0; insert into customer select @N:=@N+1 from mysql.help_topic LIMIT 1000; set @N=0; insert into my_order select @N:=@N+1, @N, 1 from mysql.help_topic a, mysql.help_topic b LIMIT 100000; set @I=1; insert into my_order select @N:=@N+1, @N, @I:=@I+1 from mysql.help_topic a, mysql.help_topic b LIMIT 600; set @N=0; insert into delivery select @N:=@N+1, '2010-05-10 15:22:02' from mysql.help_topic a, mysql.help_topic b LIMIT 600; 注意在my_order表中大多数记录的customer_id的值是1。 执行ANALYZE TABLE来更新Cardinality: ANALYZE TABLE my_order; ANALYZE TABLE delivery; 下面我们来执行一个Query: mysql> SELECT count(*) FROM my_order a JOIN my_order b ON a.customer_id = b.customer_id and a.product_id = 123 JOIN delivery ON b.order_id = delivery.order_id AND delivery.time = '2010-05-10 15:22:02'; +----------+ | count(*) | +----------+ | 600 | +----------+ 1 row in set (0.51 sec) 然后用Explain分析一下: mysql> explain SELECT count(*) FROM my_order a JOIN my_order b ON a.customer_id = b.customer_id and a.product_id = 123 JOIN delivery ON b.order_id = delivery.order_id AND delivery.time = '2010-05-10 15:22:02'; +----+-------------+----------+--------+----------------------------------+----------------------+---------+------------------------------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------+--------+----------------------------------+----------------------+---------+------------------------------+------+-------------+ | 1 | SIMPLE | a | ref | PRIMARY,FK_order2customer_id | PRIMARY | 4 | const | 1 | Using index | | 1 | SIMPLE | b | ref | UQ_order_id,FK_order2customer_id | FK_order2customer_id | 4 | oliver_test.a.customer_id | 167 | | | 1 | SIMPLE | delivery | eq_ref | PRIMARY | PRIMARY | 12 | oliver_test.b.order_id,const | 1 | Using index | +----+-------------+----------+--------+----------------------------------+----------------------+---------+------------------------------+------+-------------+ 3 rows in set (0.00 sec) 从explain的结果来看,MySQL选择先把my_order自己做Join然后再去Join表delivery。第一个Join需要访问167行数据。 但实际情况是如何呢? mysql> SELECT count(*) FROM my_order a JOIN my_order b ON a.customer_id = b.customer_id and a.product_id = 123 ; +----------+ | count(*) | +----------+ | 100000 | +----------+ 1 row in set (0.06 sec) 如果我们只执行第一个Join,实际将访问100000条记录,远远高于167,相反如果我们先做第二个Join, 我们只需要访问600行: mysql> select count(*) from delivery JOIN my_order ON my_order.order_id = delivery.order_id AND delivery.time = '2010-05-10 15:22:02'; +----------+ | count(*) | +----------+ | 600 | +----------+ 1 row in set (0.00 sec) 为什么MySQL会选择错误的顺序来执行呢?首先察看一下my_order表的索引: mysql> show index from my_order; +----------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | +----------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+ | my_order | 0 | PRIMARY | 1 | product_id | A | 100600 | NULL | NULL | | BTREE | | | my_order | 0 | PRIMARY | 2 | customer_id | A | 100600 | NULL | NULL | | BTREE | | | my_order | 0 | UQ_order_id | 1 | order_id | A | 100600 | NULL | NULL | | BTREE | | | my_order | 1 | FK_order2customer_id | 1 | customer_id | A | 602 | NULL | NULL | | BTREE | | +----------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+ 4 rows in set (0.00 sec) 我们可以看到对于索引FK_order2customer_id,Cardinality的值是602,MySQL在估算第一个Join的时候,假设索引是平均分布的,用总行数(100600)除以Cardinality,所以得到167,由于167小于第二个Join需要访问的行数600,所以选择先执行第一个Join。 如何解决这个问题? 修改Query中Join的顺序并用STRAIGHT_JOIN强制MySQL按照这个顺序执行,下面是新的Query: mysql> SELECT count(*) FROM delivery STRAIGHT_JOIN my_order a ON a.order_id = delivery.order_id AND delivery.time = '2010-05-10 15:22:02' JOIN my_order b ON a.customer_id = b.customer_id and b.product_id = 123 ; +----------+ | count(*) | +----------+ | 600 | +----------+ 1 row in set (0.00 sec) 你可以看到查询时间从0.51秒变成0秒。 用Explain察看新的执行顺序: mysql> explain SELECT count(*) FROM delivery STRAIGHT_JOIN my_order a ON a.order_id = delivery.order_id AND delivery.time = '2010-05-10 15:22:02' JOIN my_order b ON a.customer_id = b.customer_id and b.product_id = 123 ; +----+-------------+----------+--------+----------------------------------+-------------+---------+-------------------------------+------+---------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------+--------+----------------------------------+-------------+---------+-------------------------------+------+---------------------------------------------+ | 1 | SIMPLE | b | ref | PRIMARY,FK_order2customer_id | PRIMARY | 4 | const | 1 | Using index | | 1 | SIMPLE | delivery | index | PRIMARY | PRIMARY | 12 | NULL | 600 | Using where; Using index; Using join buffer | | 1 | SIMPLE | a | eq_ref | UQ_order_id,FK_order2customer_id | UQ_order_id | 4 | oliver_test.delivery.order_id | 1 | Using where | +----+-------------+----------+--------+----------------------------------+-------------+---------+-------------------------------+------+---------------------------------------------+ 3 rows in set (0.00 sec) 另外很重要的一点是这个问题在开始阶段通常不会注意,因为数据量少不会影响性能,但随着数据不断增加,一旦达到一定数量,就会突然出现并影响系统的性能。 声明:ITeye文章版权属于作者,受法律保护。没有作者书面许可不得转载。
推荐链接
|
|
返回顶楼 | |