`

优化groupby 和distinct

阅读更多

Optimizing GROUP BY and DISTINCT

MySQL optimizes these two kinds of queries similarly in many cases, and in fact con-

verts between them as needed internally during the optimization process. Both types

of queries benefit from indexes, as usual, and that’s the single most important way to

optimize them.

MySQL has two kinds of GROUP BY strategies when it can’t use an index: it can use a

temporary table or a filesort to perform the grouping. Either one can be more efficient

244 | Chapter 6: Query Performance Optimization

for any given query. You can force the optimizer to choose one method or the other

with the SQL_BIG_RESULT and SQL_SMALL_RESULT optimizer hints, as discussed earlier in

this chapter.

If you need to group a join by a value that comes from a lookup table, it’s usually more

efficient to group by the lookup table’s identifier than by the value. For example, the

following query isn’t as efficient as it could be:

mysql> SELECT actor.first_name, actor.last_name, COUNT(*)

-> FROM sakila.film_actor

->

INNER JOIN sakila.actor USING(actor_id)

-> GROUP BY actor.first_name, actor.last_name;

The query is more efficiently written as follows:

mysql> SELECT actor.first_name, actor.last_name, COUNT(*)

-> FROM sakila.film_actor

->

INNER JOIN sakila.actor USING(actor_id)

-> GROUP BY film_actor.actor_id;

Grouping by actor.actor_id could be even more efficient than grouping by film_

actor.actor_id. You should test on your specific data to see.

This query takes advantage of the fact that the actor’s first and last name are dependent

on the actor_id, so it will return the same results, but it’s not always the case that you

can blithely select nongrouped columns and get the same result. You might even have

the server’s SQL_MODE configured to disallow it. You can use MIN() or MAX() to work

around this when you know the values within the group are distinct because they de-

pend on the grouped-by column, or if you don’t care which value you get:

mysql> SELECT MIN(actor.first_name), MAX(actor.last_name), ...;

Purists will argue that you’re grouping by the wrong thing, and they’re right. A spurious

MIN() or MAX() is a sign that the query isn’t structured correctly. However, sometimes

your only concern will be making MySQL execute the query as quickly as possible. The

purists will be satisfied with the following way of writing the query:

mysql> SELECT actor.first_name, actor.last_name, c.cnt

-> FROM sakila.actor

->

INNER JOIN (

->

SELECT actor_id, COUNT(*) AS cnt

->

FROM sakila.film_actor

->

GROUP BY actor_id

->

) AS c USING(actor_id) ;

But the cost of creating and filling the temporary table required for the subquery may

be high compared to the cost of fudging pure relational theory a little bit. Remember,

the temporary table created by the subquery has no indexes.17

It’s generally a bad idea to select nongrouped columns in a grouped query, because the

results will be nondeterministic and could easily change if you change an index or the

17. This is another limitation that’s fixed in MariaDB, by the way.

Optimizing Specific Types of Queries | 245

optimizer decides to use a different strategy. Most such queries we see are accidents

(because the server doesn’t complain), or are the result of laziness rather than being

designed that way for optimization purposes. It’s better to be explicit. In fact, we suggest

that you set the server’s SQL_MODE configuration variable to include ONLY_FULL

_GROUP_BY so it produces an error instead of letting you write a bad query.

MySQL automatically orders grouped queries by the columns in the GROUP BY clause,

unless you specify an ORDER BY clause explicitly. If you don’t care about the order and

you see this causing a filesort, you can use ORDER BY NULL to skip the automatic sort.

You can also add an optional DESC or ASC keyword right after the GROUP BY clause to

order the results in the desired direction by the clause’s columns.

 

分享到:
评论

相关推荐

    MySQL中索引优化distinct语句及distinct的多字段操作

    MySQL通常使用GROUPBY(本质上是排序动作)完成DISTINCT操作,如果DISTINCT操作和ORDERBY操作组合使用,通常会用到临时表.这样会影响性能. 在一些情况下,MySQL可以使用索引优化DISTINCT操作,但需要活学活用.本文涉及一个...

    分析MySQL中优化distinct的技巧

    MySQL的优化器将`DISTINCT`操作转换为`GROUP BY`,使得查询在利用索引分组后,仅扫描一次所需的`nick`值。在新的执行计划中,`Using index for group-by`表明查询利用索引完成了分组操作,从而提高了效率。 通过...

    oracle中使用group by优化distinct

    为了解决这个问题,我们可以考虑使用`GROUP BY`来优化`DISTINCT`。 首先,理解`DISTINCT`与`GROUP BY`的区别。`DISTINCT`用于返回不重复的行,而`GROUP BY`则用于对数据进行分组并计算每个组的聚合函数(如COUNT、...

    【DISTINCT】优化之MySQL官方文档翻译

    这种等价性意味着适用于`GROUP BY`查询的优化方法同样适用于带有`DISTINCT`子句的查询。因此,对于`DISTINCT`查询的更多优化细节,可以参考[MySQL官方文档]...

    数据库面试题索引sql优化

    7. **优化GROUP BY和DISTINCT:** - 对于GROUP BY和DISTINCT操作,如果可能的话,考虑先过滤记录再进行分组或去重,以减少处理的数据量。 8. **使用物化视图:** - 物化视图可以在数据库级别预先计算复杂查询的...

    101个MySQL的调节和优化方法

    24. **优化GROUP BY和DISTINCT**:合理设计查询结构,减少聚合操作的复杂性。 25. **使用子查询替代JOIN**:在某些情况下,子查询可以更高效地完成相同任务。 26. **使用全文索引优化搜索**:对于复杂的文本搜索需求...

    Mysql中distinct与group by的去重方面的区别

    在MySQL数据库中,`DISTINCT` 和 `GROUP BY` 是两个非常重要的SQL关键字,它们都可以用来处理数据的去重问题,但在实际应用中,两者的使用场景和效果有所差异。 首先,`DISTINCT` 关键字的主要作用是去除查询结果中...

    高性能MySQL(第3版).part2

    6.7.4优化GROUPBY和DISTINCT239 6.7.5优化LIMIT分页241 6.7.6优化SQL_CALC_FOUND_ROWS243 6.7.7优化UNION查询243 6.7.8静态查询分析244 6.7.9使用用户自定义变量244 6.8案例学习251 6.8.1使用MySQL构建一个...

    MySQL中distinct与group by之间的性能进行比较

    在MySQL数据库中,`DISTINCT` 和 `GROUP BY` 是两个用于数据去重的语句,但它们在功能和性能上有显著的区别。这次的测试主要比较了这两种方法在不同条件下的执行效率,尤其是在数据量较大的情况下。 首先,`...

    MySQL优化GROUP BY方案

    在处理大数据量时,优化GROUP BY语句至关重要,因为它直接影响到查询性能。本篇文章将深入探讨MySQL优化GROUP BY的策略。 ### 一、松散索引扫描(Loose Index Scan) 松散索引扫描是一种优化策略,适用于以下特定...

    使用GROUP BY的时候如何统计记录条数 COUNT(*) DISTINCT

    如果我们错误地使用了GROUP BY email, passwords和COUNT(*),则会得到每个组合的记录数,包括那些重复的记录。这并不是我们统计不同组合的目标结果。 在PHP中,我们通常使用mysql_num_rows()函数来获取结果集中记录...

    MySQL中distinct语句的基本原理及其与group by的比较

    在MySQL数据库中,`DISTINCT`和`GROUP BY`都是用于数据去重和分组的SQL语句,但它们有着不同的应用场景和优化策略。本文将深入探讨`DISTINCT`语句的基本原理,并与`GROUP BY`进行比较。 首先,`DISTINCT`关键字用于...

    MySQL中distinct与group by语句的一些比较及用法讲解

    在MySQL数据库中,`DISTINCT` 和 `GROUP BY` 是两种用于数据去重和分组统计的SQL语句,它们虽然都可以帮助我们处理重复数据,但有着不同的应用场景和执行机制。 1. `DISTINCT` 关键字: - `DISTINCT` 主要用于去除...

    MySQL数据库优化SQL篇PPT课件.pptx

    从执行计划、SELECT语句、IN和EXIST语句、LIMIT语句、RAND函数、Order by、Group by、Distinct和Count等方面对MySQL数据库优化进行了详细的讲解。 一、执行计划 执行计划是MySQL数据库优化的重要步骤。执行计划...

    使用Distinct查询.rar

    `DISTINCT`通常与`GROUP BY`和聚合函数(如`COUNT`, `SUM`, `AVG`, `MIN`, `MAX`)一起使用,用于计算每个分组的唯一值。例如,计算每个部门的员工数量: ```sql SELECT DISTINCT department, COUNT(*) AS employee...

    distinct 多列问题结合group by的解决方法

    然而,当`DISTINCT`和`GROUP BY`结合使用时,它们可以实现更复杂的数据筛选。 例如,考虑你给出的表格`table1`,包含以下字段:`id`, `RegName`, `PositionSN`, `PersonSN`。如果你想要获取不重复的`RegName`, `...

Global site tag (gtag.js) - Google Analytics