Optimizing GROUP BY and DISTINCT

abacus


Blog categories: mysql, mysql performance

Optimizing GROUP BY and DISTINCT

MySQL optimizes these two kinds of queries similarly in many cases, and in fact converts between them as needed internally during the optimization process. Both types of queries benefit from indexes, as usual, and that’s the single most important way to optimize them.

MySQL has two kinds of GROUP BY strategies when it can’t use an index: it can use a temporary table or a filesort to perform the grouping. Either one can be more efficient for any given query. You can force the optimizer to choose one method or the other with the SQL_BIG_RESULT and SQL_SMALL_RESULT optimizer hints, as discussed earlier in this chapter.
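As a rough illustration of where the two hints go, the sketch below applies each one to a grouped count on the sakila sample database used in the rest of this section. The choice of table and column is an assumption made for illustration: first_name is picked because the stock sakila schema does not index it, so the grouping cannot be served by an index.

```sql
-- Hint that the grouped result is large: the optimizer should prefer
-- sorting (a filesort) over a keyed temporary table.
SELECT SQL_BIG_RESULT first_name, COUNT(*) AS cnt
FROM sakila.actor
GROUP BY first_name;

-- Hint that the grouped result is small: the optimizer should prefer
-- an in-memory temporary table over sorting.
SELECT SQL_SMALL_RESULT first_name, COUNT(*) AS cnt
FROM sakila.actor
GROUP BY first_name;
```

Both statements return the same rows; only the grouping strategy may differ, which you can check by prefixing each with EXPLAIN.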

If you need to group a join by a value that comes from a lookup table, it’s usually more efficient to group by the lookup table’s identifier than by the value. For example, the following query isn’t as efficient as it could be:

mysql> SELECT actor.first_name, actor.last_name, COUNT(*)
    -> FROM sakila.film_actor
    ->    INNER JOIN sakila.actor USING(actor_id)
    -> GROUP BY actor.first_name, actor.last_name;

The query is more efficiently written as follows:

mysql> SELECT actor.first_name, actor.last_name, COUNT(*)
    -> FROM sakila.film_actor
    ->    INNER JOIN sakila.actor USING(actor_id)
    -> GROUP BY film_actor.actor_id;

Grouping by actor.actor_id could be even more efficient than grouping by film_actor.actor_id. You should test on your specific data to see.
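One way to run that test is to compare the execution plans of the three variants with EXPLAIN. This is only a sketch against the sakila sample database; the plans you see depend on your server version, indexes, and data.

```sql
-- Variant 1: group by the looked-up name columns.
EXPLAIN SELECT actor.first_name, actor.last_name, COUNT(*)
FROM sakila.film_actor
   INNER JOIN sakila.actor USING(actor_id)
GROUP BY actor.first_name, actor.last_name;

-- Variant 2: group by the identifier on the film_actor side.
EXPLAIN SELECT actor.first_name, actor.last_name, COUNT(*)
FROM sakila.film_actor
   INNER JOIN sakila.actor USING(actor_id)
GROUP BY film_actor.actor_id;

-- Variant 3: group by the identifier on the actor side.
EXPLAIN SELECT actor.first_name, actor.last_name, COUNT(*)
FROM sakila.film_actor
   INNER JOIN sakila.actor USING(actor_id)
GROUP BY actor.actor_id;
```

In the Extra column, "Using temporary" and "Using filesort" indicate that the grouping could not be done from an index alone. Timing the queries on realistic data remains the more reliable comparison.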

The rewritten query takes advantage of the fact that the actor’s first and last name are dependent on the actor_id, so it will return the same results, but it’s not always the case that you can blithely select nongrouped columns and get the same result. You might even have the server’s SQL_MODE configured to disallow it. You can use MIN() or MAX() to work around this when you know that every row within a group has the same value because it depends on the grouped-by column, or if you don’t care which value you get:

mysql> SELECT MIN(actor.first_name), MAX(actor.last_name), ...;
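The listing above elides the rest of the statement. One plausible complete form, assuming it keeps the same join and grouping as the earlier sakila example (this is an assumption, not the original author's text), might look like this:

```sql
-- first_name and last_name are the same for every row in a group,
-- because both depend on actor_id, so MIN() and MAX() simply return
-- that single value while keeping the query valid under strict modes.
SELECT MIN(actor.first_name) AS first_name,
       MAX(actor.last_name)  AS last_name,
       COUNT(*)              AS cnt
FROM sakila.film_actor
   INNER JOIN sakila.actor USING(actor_id)
GROUP BY film_actor.actor_id;
```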

Purists will argue that you’re grouping by the wrong thing, and they’re right. A spurious MIN() or MAX() is a sign that the query isn’t structured correctly. However, sometimes your only concern will be making MySQL execute the query as quickly as possible. The purists will be satisfied with the following way of writing the query:

mysql> SELECT actor.first_name, actor.last_name, c.cnt
    -> FROM sakila.actor
    ->    INNER JOIN (
    ->       SELECT actor_id, COUNT(*) AS cnt
    ->       FROM sakila.film_actor
    ->       GROUP BY actor_id
    ->    ) AS c USING(actor_id);

But the cost of creating and filling the temporary table required for the subquery may be high compared to the cost of fudging pure relational theory a little bit. Remember, the temporary table created by the subquery has no indexes.[17]

It’s generally a bad idea to select nongrouped columns in a grouped query, because the results will be nondeterministic and could easily change if you change an index or the optimizer decides to use a different strategy. Most such queries we see are accidents (because the server doesn’t complain), or are the result of laziness rather than being designed that way for optimization purposes. It’s better to be explicit. In fact, we suggest that you set the server’s SQL_MODE configuration variable to include ONLY_FULL_GROUP_BY so it produces an error instead of letting you write a bad query.

[17] This is another limitation that’s fixed in MariaDB, by the way.
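A minimal sketch of enabling the mode for the current session and seeing its effect follows. The sample query is an assumption chosen for illustration: in the sakila schema film_id is not determined by actor_id, so it is a genuinely nongrouped column.

```sql
-- Append ONLY_FULL_GROUP_BY to whatever modes the session already has.
-- NULLIF + CONCAT_WS avoids a leading comma when sql_mode is empty.
SET SESSION sql_mode =
    CONCAT_WS(',', NULLIF(@@SESSION.sql_mode, ''), 'ONLY_FULL_GROUP_BY');

-- With the mode enabled, the server should reject this query rather
-- than silently return an arbitrary film_id for each actor.
SELECT actor_id, film_id, COUNT(*)
FROM sakila.film_actor
GROUP BY actor_id;
```

To make the setting permanent you would set sql_mode in the server configuration instead of per session.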

In MySQL 5.7 and earlier, the server automatically orders grouped queries by the columns in the GROUP BY clause, unless you specify an ORDER BY clause explicitly. If you don’t care about the order and you see this causing a filesort, you can use ORDER BY NULL to skip the automatic sort. You can also add an optional DESC or ASC keyword right after the GROUP BY clause to order the results in the desired direction by the clause’s columns. Both the implicit sort and the GROUP BY ASC/DESC syntax were removed in MySQL 8.0, where you must use an explicit ORDER BY to get sorted output.
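To see whether the automatic sort is costing you a filesort, you can compare plans with and without ORDER BY NULL. The sketch below assumes the sakila sample database and a server that still sorts grouped results implicitly (MySQL 5.7 or earlier); the rating column is chosen on the assumption that it has no index in the stock schema.

```sql
-- Grouping on an unindexed column: on older servers the Extra column
-- may show "Using temporary; Using filesort" because of the implicit sort.
EXPLAIN SELECT rating, COUNT(*) AS cnt
FROM sakila.film
GROUP BY rating;

-- Same grouping, but explicitly asking for no particular order, which
-- lets the server skip the sort step.
EXPLAIN SELECT rating, COUNT(*) AS cnt
FROM sakila.film
GROUP BY rating
ORDER BY NULL;
```

If the first plan already lacks "Using filesort", the server is not sorting and ORDER BY NULL changes nothing.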


2014-04-21 13:43
Category: Database