`
myreligion
  • 浏览: 206033 次
  • 性别: Icon_minigender_1
  • 来自: 北京
社区版块
存档分类
最新评论

When the subselect runs faster (zt)

阅读更多

A few weeks ago, we had a query optimization request from one of our customer.

The query was very simple like:

CODE:
  1. SELECT * FROM `table` WHERE (col1='A'||col1='B') ORDER BY id DESC LIMIT 20 OFFSET 0

 

This column in the table is looks like this:

CODE:
  1. `col1` enum('A','B','C','CD','DE','F','G','HI') default NULL

 

The table have 549252 rows and of course, there is an index on the col1. MySQL estimated the cardinality of that index as 87, though what was of course misleading as index cardinality in this case can't be over 9, as there is only 8(+ NULL) different possible values for this column.

CODE:
  1. +----+-------------+-------+-------+---------------+------+---------+------+--------+-----------------------------+
  2. | id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows   | Extra                       |
  3. +----+-------------+-------+-------+---------------+------+---------+------+--------+-----------------------------+
  4. |  1 | SIMPLE      | table  | range | col1         | col1 | 2       | NULL | 549252 | Using where; Using filesort |
  5. +----+-------------+-------+-------+---------------+------+---------+------+--------+-----------------------------+

 

This query took more than 5 minutes (the rows are large and table does not fit in cache well)

When you want to run this query mysql first will try to find each row where col1 is A or B using index. Then its going to order by the ID using file sort and then send first 20 rows ignoring the rest.

In this case MySQL has 2 indexes where one is usable to find rows, while other is usable to return them in the right order. MySQL can chose only one of them to execute the query - use index to find rows. This is sensible strategy if there is no LIMIT, however it is poor chose if there is one - it is often a lot faster to retrieve rows in order checking WHERE clause for them until required number of rows were returned. Especially in the cases when WHERE clause is not very selective.

So I tried this:

CODE:
  1. select * from table where id in (SELECT id FROM `table` WHERE (col1='A'||col1='B')) ORDER BY id DESC LIMIT 20 OFFSET 0;

 

In this case we forcing MySQL to do retrieve rows in sorted order and checking if it matches our original WHERE clause with subselects. It looks scary if we look at EXPLAIN but in reality the dependent subquery is only executed enough times to produce 20 rows in result set.

CODE:
  1. +----+--------------------+-------+-----------------+---------------+---------+---------+------+--------+-------------+
  2. | id | select_type        | table | type            | possible_keys | key     | key_len | ref  | rows   | Extra       |
  3. +----+--------------------+-------+-----------------+---------------+---------+---------+------+--------+-------------+
  4. |  1 | PRIMARY            | table  | index           | NULL          | PRIMARY | 4       | NULL | 765105 | Using where |
  5. |  2 | DEPENDENT SUBQUERY | table  | unique_subquery | PRIMARY,col1  | PRIMARY | 4       | func |      1 | Using where |
  6. +----+--------------------+-------+-----------------+---------------+---------+---------+------+--------+-------------+

 

The result is a lot better result time:

CODE:
  1. (20 rows in set (0.01 sec))

 

So by rewriting query using subqueries we actually improved it performance 100 times. So subqueries are
not always slowing things down.

Even though proving subqueries are not always slow this way is not the most optimal. We do not really need separate subselect to make MySQL check WHERE clause while scanning table in index order. We can just use FORCE INDEX hint to override MySQL index choice:

CODE:
  1. mysql> explain select * from table FORCE INDEX(PRIMARY) where (col1='A'||col1='B') order by id desc limit 20 offset 0;
  2. +----+-------------+-------+-------+---------------+---------+---------+------+--------+-------------+
  3. | id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows   | Extra       |
  4. +----+-------------+-------+-------+---------------+---------+---------+------+--------+-------------+
  5. |  1 | SIMPLE      | table  | index | NULL          | PRIMARY | 4       | NULL | 549117 | Using where |
  6. +----+-------------+-------+-------+---------------+---------+---------+------+--------+-------------+
  7.  
  8. mysql> select * from table FORCE INDEX(PRIMARY) where (col1='A'||col1='B') order by id desc limit 20 offset 0;
  9. ...
  10. 20 rows in set (0.00 sec)

 

This approach works well if WHERE clause is not very selective, otherwise MySQL may need to scan very many rows to find enough matching rows. You can use another trick Peter
wrote. about couple of years ago.

 

分享到:
评论

相关推荐

    精通sql结构化查询语句

    以SQL Server为工具,讲解SQL语言的应用,提供了近500个曲型应用,读者可以随查随用,深入讲解SQL语言的各种查询语句,详细介绍数据库设计及管理,详细讲解存储过程、解发器和游标等知识,讲解了SQL语言在高级语言中...

    Hibernate Fetch 的作用

    在深入探讨Hibernate Fetch的作用之前,我们首先需要理解Hibernate框架本身以及其在持久化领域的重要性。Hibernate是一个开源的对象关系映射(ORM)框架,用于Java应用,它提供了将数据库记录映射到对象的方法,使得...

    Hibernate关联关系hbm.xml中的相关属性

    5. `fetch`: 控制集合的加载策略,如`select`(默认)或`subselect`。 6. `order-by`: 对集合元素进行排序的SQL表达式。 最后,`<many-to-many>`标签用于定义多对多的关系,例如学生可以选择多个课程,课程也可以被...

    hibernate配置参数详解

    ### Hibernate配置参数详解 #### 一、数据库连接配置 在Hibernate框架中,为了与数据库进行交互,必须正确地配置数据库连接参数。以下是一些关键的配置项及其含义: 1. **`hibernate.connection.driver_class`**:...

    Visual C++ 编程资源大全(英文源码 表单)

    (400KB)<END><br>78,AdvancedPrev.zip A simple class that helps provide faster Print Preview within MFC Doc/View applications(38KB)<END><br>79,mditab.zip A dockable bar containing a tabbed list ...

    Hibernate Reference Documentation3.1

    Using subselect fetching 19.1.7. Using lazy property fetching 19.2. The Second Level Cache 19.2.1. Cache mappings 19.2.2. Strategy: read only 19.2.3. Strategy: read/write 19.2.4. Strategy: nonstrict ...

    hibernate子查询

    然而,通过深入研究Hibernate的文档和配置,我们可以找到一种利用`subselect`属性在hbm(Hibernate Mapping)文件中定义临时对象的方法。 在Hibernate的`class`元素中,`subselect`属性允许我们指定一个SQL表达式,...

    How to solve the hibernate N+1 problem?

    5. **使用`@Fetch(FetchMode.JOIN)`或`@Fetch(FetchMode.SUBSELECT)`**:前者强制使用JOIN来获取关联数据,后者在父对象集合被初始化时执行子查询。 6. **使用`@NamedEntityGraph`**:JPA 2.1引入的新特性,允许...

    mondrian学习资料

    - **Subselect与Subcube**:在MDX中,子集和子立方体(Subcube)用于从整个数据立方体中提取部分数据。Subselect可以用于定义一个独立的查询区域,而Subcube则是在主查询基础上切出的一个特定的数据区域。 5. **...

    day36 06-Hibernate抓取策略:set集合上的抓取策略

    为了解决这个问题,Hibernate提供了多种抓取策略,如Eager Loading(急加载)、Batch Fetching(批量加载)和Subselect Fetching(子查询加载)等。 1. Eager Loading:急加载是一种在加载主对象时同时加载其关联...

    Hibernate参考文档

    使用子查询抓取(Using subselect fetching) 19.1.7. 使用延迟属性抓取(Using lazy property fetching) 19.2. 二级缓存(The Second Level Cache) 19.2.1. 缓存映射(Cache mappings) 19.2.2. 策略:只读...

    Hibernate 中文 html 帮助文档

    使用子查询抓取(Using subselect fetching) 19.1.7. 使用延迟属性抓取(Using lazy property fetching) 19.2. 二级缓存(The Second Level Cache) 19.2.1. 缓存映射(Cache mappings) 19.2.2. 策略:只读...

    NHibernate参考文档 2.0.0 chm

    16.1.6. 使用子查询抓取(Using subselect fetching) 16.2. 二级缓存(The Second Level Cache) 16.2.1. 缓存映射(Cache mappings) 16.2.2. 策略:只读缓存(Strategy: read only) 16.2.3. 策略:读/写缓存...

    NHibernate中文帮组文档(2008.11月更新)

    16.1.6. 使用子查询抓取(Using subselect fetching) 16.2. 二级缓存(The Second Level Cache) 16.2.1. 缓存映射(Cache mappings) 16.2.2. 策略:只读缓存(Strategy: read only) 16.2.3. 策略:读/写缓存...

    最全Hibernate 参考文档

    19.1.5. 使用子查询抓取(Using subselect fetching) 19.1.6. 使用延迟属性抓取(Using lazy property fetching) 19.2. 二级缓存(The Second Level Cache) 19.2.1. 缓存映射(Cache mappings) 19.2.2. 策略...

    Hibernate3+中文参考文档

    19.1.5. 使用子查询抓取(Using subselect fetching) 19.1.6. 使用延迟属性抓取(Using lazy property fetching) 19.2. 二级缓存(The Second Level Cache) 19.2.1. 缓存映射(Cache mappings) 19.2.2. 策略...

    hibernate 框架详解

    目录 前言 1.... 2.... 1. 在Tomcat中快速上手 ... 1.1.... 1.2.... 1.3.... 1.4.... 1.5.... 2.1.... 2.2.... 2.2.1.... 2.2.2.... 2.2.3.... 2.2.4.... 2.2.5.... 2.2.6.... 2.3.... 2.3.1.... 2.3.2.... 2.3.3.... 2.3.4.... 2.3.5.... 2.3.6.... 2.4.... 3.1.... 3.2....

Global site tag (gtag.js) - Google Analytics