`

优化groupby 和distinct

阅读更多

Optimizing GROUP BY and DISTINCT

MySQL optimizes these two kinds of queries similarly in many cases, and in fact con-

verts between them as needed internally during the optimization process. Both types

of queries benefit from indexes, as usual, and that’s the single most important way to

optimize them.

MySQL has two kinds of GROUP BY strategies when it can’t use an index: it can use a

temporary table or a filesort to perform the grouping. Either one can be more efficient

244 | Chapter 6: Query Performance Optimization

for any given query. You can force the optimizer to choose one method or the other

with the SQL_BIG_RESULT and SQL_SMALL_RESULT optimizer hints, as discussed earlier in

this chapter.

If you need to group a join by a value that comes from a lookup table, it’s usually more

efficient to group by the lookup table’s identifier than by the value. For example, the

following query isn’t as efficient as it could be:

mysql> SELECT actor.first_name, actor.last_name, COUNT(*)

-> FROM sakila.film_actor

->

INNER JOIN sakila.actor USING(actor_id)

-> GROUP BY actor.first_name, actor.last_name;

The query is more efficiently written as follows:

mysql> SELECT actor.first_name, actor.last_name, COUNT(*)

-> FROM sakila.film_actor

->

INNER JOIN sakila.actor USING(actor_id)

-> GROUP BY film_actor.actor_id;

Grouping by actor.actor_id could be even more efficient than grouping by film_

actor.actor_id. You should test on your specific data to see.

This query takes advantage of the fact that the actor’s first and last name are dependent

on the actor_id, so it will return the same results, but it’s not always the case that you

can blithely select nongrouped columns and get the same result. You might even have

the server’s SQL_MODE configured to disallow it. You can use MIN() or MAX() to work

around this when you know the values within the group are distinct because they de-

pend on the grouped-by column, or if you don’t care which value you get:

mysql> SELECT MIN(actor.first_name), MAX(actor.last_name), ...;

Purists will argue that you’re grouping by the wrong thing, and they’re right. A spurious

MIN() or MAX() is a sign that the query isn’t structured correctly. However, sometimes

your only concern will be making MySQL execute the query as quickly as possible. The

purists will be satisfied with the following way of writing the query:

mysql> SELECT actor.first_name, actor.last_name, c.cnt

-> FROM sakila.actor

->

INNER JOIN (

->

SELECT actor_id, COUNT(*) AS cnt

->

FROM sakila.film_actor

->

GROUP BY actor_id

->

) AS c USING(actor_id) ;

But the cost of creating and filling the temporary table required for the subquery may

be high compared to the cost of fudging pure relational theory a little bit. Remember,

the temporary table created by the subquery has no indexes.17

It’s generally a bad idea to select nongrouped columns in a grouped query, because the

results will be nondeterministic and could easily change if you change an index or the

17. This is another limitation that’s fixed in MariaDB, by the way.

Optimizing Specific Types of Queries | 245

optimizer decides to use a different strategy. Most such queries we see are accidents

(because the server doesn’t complain), or are the result of laziness rather than being

designed that way for optimization purposes. It’s better to be explicit. In fact, we suggest

that you set the server’s SQL_MODE configuration variable to include ONLY_FULL

_GROUP_BY so it produces an error instead of letting you write a bad query.

MySQL automatically orders grouped queries by the columns in the GROUP BY clause,

unless you specify an ORDER BY clause explicitly. If you don’t care about the order and

you see this causing a filesort, you can use ORDER BY NULL to skip the automatic sort.

You can also add an optional DESC or ASC keyword right after the GROUP BY clause to

order the results in the desired direction by the clause’s columns.

 

分享到:
评论

相关推荐

    MySQL中索引优化distinct语句及distinct的多字段操作

    MySQL通常使用GROUPBY(本质上是排序动作)完成DISTINCT操作,如果DISTINCT操作和ORDERBY操作组合使用,通常会用到临时表.这样会影响性能. 在一些情况下,MySQL可以使用索引优化DISTINCT操作,但需要活学活用.本文涉及一个...

    oracle中使用group by优化distinct

    主要介绍了oracle中使用group by优化distinct的相关资料,需要的朋友可以参考下

    高性能MySQL(第3版).part2

    6.7.4优化GROUPBY和DISTINCT239 6.7.5优化LIMIT分页241 6.7.6优化SQL_CALC_FOUND_ROWS243 6.7.7优化UNION查询243 6.7.8静态查询分析244 6.7.9使用用户自定义变量244 6.8案例学习251 6.8.1使用MySQL构建一个...

    MySQL中distinct语句的基本原理及其与group by的比较

    DISTINCT 实际上和 GROUP BY 操作的实现非常相似,只不过是在 GROUP BY 之后的每组中只取出一条记录而已。所以,DISTINCT 的实现和 GROUP BY 的实现也基本差不多,没有太大的区别。同样可以通过松散索引扫描或者是...

    MySQL中distinct与group by语句的一些比较及用法讲解

    在数据表中记录了用户验证时使用的书目,现在想取出所有书目,用DISTINCT和group by都取到了我想要的结果,但我发现返回结果排列不同,distinct会按数据存放顺序一条条显示,而group by会做个排序(一般是ASC)。...

    解析mysql中:单表distinct、多表group by查询去除重复记录

    单表的唯一查询用:distinct多表的唯一查询用:group bydistinct 查询多表时,left join 还有效,全连接无效,在使用mysql时,有时需要查询出某个字段不重复的记录,虽然mysql提供有distinct这个关键字来过滤掉多余...

    mysql查询优化.

    1、理解MYSQL的Query Optimizer 2、Query语句优化基本思路和原则 3、充分利用Explain 和 Profiling 4、合理设计并利用索引 5、order by、group by 和 DISTINCT优化 6、小结

    MySQL SQL优化 .docx

    order by, group by,distinct … 都是消耗 CPU 的大户(这些操作基本上都是 CPU 处理内存中的数据比较运算)。当我们的 IO 优化做到一定阶段之后,降低 CPU 计算也就成为了我们 SQL 优化的重要目标

    MySQL中Distinct和Group By语句的基本使用教程

    MySQL Distinct 去掉查询结果重复记录 DISTINCT 使用 DISTINCT 关键字可以去掉查询中某个字段的重复记录。 语法: SELECT DISTINCT(column) FROM tb_name 例子: 假定 user 表有如下记录: uid username 1 小李...

    MySQL数据库查询优化

    从理论看,GROUP BY、ORDER BY、LIMIT、DISTINCT等怎么被优化? MySQL中:GROUP BY是怎么优化的?ORDER BY是怎么被优化?LIMIT是怎么被优化?DISTINCT是怎么被优化? 非SPJ优化与索引的关系。 预计时间1小时 第10课...

    MySQL DISTINCT 的基本实现原理详解

    DISTINCT 实际上和 GROUP BY 操作的实现非常相似,只不过是在 GROUP BY 之后的每组中只取出一条记录而已。所以,DISTINCT 的实现和 GROUP BY 的实现也基本差不多,没有太大的区别。同样可以通过松散索引扫描或者是...

    MySQL中distinct与group by之间的性能进行比较

    最近在网上看到了一些测试,感觉不是很准确,今天亲自测试了一番。得出了结论,测试过程在个人计算机上,可能不够全面,仅供参考。 测试过程: 准备一张测试表  CREATE TABLE `test_test` ( `id` int(11) NOT ...

    SQL优化手册

    2、group by vs distinct 案例:select count() from (select name from student group by name )student_temp 这个sql作用是统计学生表不重复姓名的总数 优化方案1:select count(distinct name) from student 优化...

    揭秘SQL优化技巧 改善数据库性能

    order by, group by,distinct … 都是消耗 CPU 的大户(这些操作基本上都是 CPU 处理内存中的数据比较运算)。当我们的 IO 优化做到一定阶段之后,降低 CPU 计算也就成为了我们 SQL 优化的重要目标 优化方法

    PLSQL程序优化和性能分析方法

    2.4.12 group by优化 13 2.4.13 尽量避免用order by 14 2.4.14 用Where子句替换HAVING子句 14 2.4.15 使用表的别名(Alias) 14 2.4.16 删除重复记录 14 2.4.17 COMMIT使用 15 2.4.18 减少多表关联 15 2.4.19 批量数据...

    MySQL经典面试题(含答案).zip

    group by 和 distinct 的区别? Blob和text有什么区别? 常见的存储引擎有哪些? myisam和innodb的区别? bin log/redo log/undo log是什么? bin log和redo log有什么区别? 说一下数据库的三大范式? 什么是存储...

    MySQL中distinct语句去查询重复记录及相关的性能讨论

    在 MySQL 查询中,可能会包含重复值。...SELECT *, COUNT(DISTINCT nowamagic) FROM table GROUP BY nowamagic 这个用法,MySQL的版本不能太低。 在编写查询之前,我们甚至应该对过滤条件进行排序,真正高效的条

    大数据企业级调优的完整过程:9.1 Fetch抓取;9.2 本地模式;9.3 表的优化;9.4 数据倾斜;9.5 并行执行

    9.3.4 Group By;9.3.5 Count(Distinct) 去重统计;9.3.6 笛卡尔积;9.3.7 行列过滤;9.3.8 动态分区调整;9.3.9 分桶;9.3.10 分区);9.4 数据倾斜(9.4.1 合理设置Map数;9.4.2 小文件进行合并;9.4.3 复杂文件...

Global site tag (gtag.js) - Google Analytics