这几天在做一个和oracle相关的项目的时候,遇到一个问题,我需要一次性查出一个表里每一个分组的最新的一条记录,于是想到了group by,处理后,可以获得每个分组最新的记录的时间,但是下一步卡住了,后来在网上查到了一个相关的解决方案,作为技术备份,先记录在这里。
原始链接:http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/
PS:因为我用的oracle数据库,这个是mysql数据库,所以在第一个解决方法
select f.type, f.variety, f.price from ( select type, min(price) as minprice from fruits group by type ) as x inner join fruits as f on f.type = x.type and f.price = x.minprice;
在oracle下的代码要去掉里面的as和inner便可以执行通过了。
How to select the first/least/max row per group in SQL
Here are some common SQL problems, all of which have related solutions: how do I find the most recent log entry for each program? How do I find the most popular item from each category? How do I find the top score for each player? In general, these types of “select the extreme from each group” queries can be solved with the same techniques. I’ll explain how to do that in this article, including the harder problem of selecting the top N entries, not just the top 1.
This topic is related to numbering rows, which I just wrote about (see my articles about MySQL-specific and generic techniques to assign a number to each row in a group). Therefore I’ll use nearly the same table and data as I used in those articles, with the addition of a price column:
[code="java"]+--------+------------+-------+
| type | variety | price |
+--------+------------+-------+
| apple | gala | 2.79 |
| apple | fuji | 0.24 |
| apple | limbertwig | 2.87 |
| orange | valencia | 3.59 |
| orange | navel | 9.36 |
| pear | bradford | 6.05 |
| pear | bartlett | 2.14 |
| cherry | bing | 2.55 |
| cherry | chelan | 6.33 |
+--------+------------+-------+
Selecting the one maximum row from each group
Let’s say I want to select the most recent log entry for each program, or the most recent changes in an audit table, or something of the sort. This question comes up over and over on IRC channels and mailing lists. I’ll re-phrase the question in terms of fruits. I want to select the cheapest fruit from each type. Here’s the desired result:
+--------+----------+-------+ | type | variety | price | +--------+----------+-------+ | apple | fuji | 0.24 | | orange | valencia | 3.59 | | pear | bartlett | 2.14 | | cherry | bing | 2.55 | +--------+----------+-------+
There are a few common solutions to this problem. All involve two steps: finding the desired value of price, and then selecting the rest of the row based on that.
One common solution is a so-called self-join. Step one is to group the fruits by type (apple, cherry etc) and choose the minimum price:
select type, min(price) as minprice from fruits group by type; +--------+----------+ | type | minprice | +--------+----------+ | apple | 0.24 | | cherry | 2.55 | | orange | 3.59 | | pear | 2.14 | +--------+----------+
Step two is to select the rest of the row by joining these results back to the same table. Since the first query is grouped, it needs to be put into a subquery so it can be joined against the non-grouped table:
select f.type, f.variety, f.price from ( select type, min(price) as minprice from fruits group by type ) as x inner join fruits as f on f.type = x.type and f.price = x.minprice; +--------+----------+-------+ | type | variety | price | +--------+----------+-------+ | apple | fuji | 0.24 | | cherry | bing | 2.55 | | orange | valencia | 3.59 | | pear | bartlett | 2.14 | +--------+----------+-------+
Another common way to do this is with a correlated subquery. This can be much less efficient, depending on how good your system’s query optimizer is. You might find it clearer, though.
select type, variety, price from fruits where price = (select min(price) from fruits as f where f.type = fruits.type); +--------+----------+-------+ | type | variety | price | +--------+----------+-------+ | apple | fuji | 0.24 | | orange | valencia | 3.59 | | pear | bartlett | 2.14 | | cherry | bing | 2.55 | +--------+----------+-------+
Both queries are logically equivalent, though they may not perform the same.
Select the top N rows from each group
This is a slightly harder problem to solve. Finding a single row from each group is easy with SQL’s aggregate functions (MIN(), MAX(), and so on). Finding the first several from each group is not possible with that method because aggregate functions only return a single value. Still, it’s possible to do.
Let’s say I want to select the two cheapest fruits from each type. Here’s a first try:
select type, variety, price from fruits where price = (select min(price) from fruits as f where f.type = fruits.type) or price = (select min(price) from fruits as f where f.type = fruits.type and price > (select min(price) from fruits as f2 where f2.type = fruits.type)); +--------+----------+-------+ | type | variety | price | +--------+----------+-------+ | apple | gala | 2.79 | | apple | fuji | 0.24 | | orange | valencia | 3.59 | | orange | navel | 9.36 | | pear | bradford | 6.05 | | pear | bartlett | 2.14 | | cherry | bing | 2.55 | | cherry | chelan | 6.33 | +--------+----------+-------+
Yuck! That can be written as a self-join, but it’s just as bad (I leave it as an exercise for the reader). This gets worse as you go to higher numbers (top 3, top 4…). There are other ways to phrase the statement, but they all boil down to the same thing, and they’re all pretty unwieldy and inefficient.
Yuck! That can be written as a self-join, but it’s just as bad (I leave it as an exercise for the reader). This gets worse as you go to higher numbers (top 3, top 4…). There are other ways to phrase the statement, but they all boil down to the same thing, and they’re all pretty unwieldy and inefficient.
There’s a better way: select the variety from each type where the variety is no more than the second-cheapest of that type.
select type, variety, price from fruits where ( select count(*) from fruits as f where f.type = fruits.type and f.price < fruits.price ) <= 2;
This is elegant, and lets you vary N without rewriting your query (a very good thing!), but it’s functionally the same as the previous query. Both are essentially a quadratic algorithm relative to the number of varieties in each type. And again, some query optimizers may not do well with this and make it quadratic with respect to the number of rows in the table overall (especially if no useful index is defined), and the server might get clobbered. Are there better ways? Can it be done with one pass through the data, instead of the many passes required by a correlated subquery? You know it can, or I wouldn’t be writing this, now would I?
Use UNION
If there’s an index on (type, price), and there are many more records to eliminate than to include in each group, a more efficient single-pass method (especially for MySQL, but also for some other RDBMSs) is to break the queries out separately and put a limit on each, then UNION them all back together. Here’s the syntax you need for MySQL:
(select * from fruits where type = 'apple' order by price limit 2) union all (select * from fruits where type = 'orange' order by price limit 2) union all (select * from fruits where type = 'pear' order by price limit 2) union all (select * from fruits where type = 'cherry' order by price limit 2)
Peter Zaitsev has written in detail about this technique, so I won’t go into it too much more here. If it suits your purposes, it can be a very good solution.
One note: use UNION ALL, not just UNION. It prevents the server sorting the results to eliminate duplicates before returning them. In this case there will be no duplicates, so I’m telling the server to skip that (useless, expensive) step.
Do it with user variables on MySQL
The UNION trick is an especially good idea when the results are a small fraction of the rows in the table and there is an index that can be used for sorting the rows. Another linear-time technique, which might be a good option in cases where you are selecting most of the rows from the table anyway, is user variables. This is MySQL-specific. Please refer to my previous post on how to number rows in MySQL for the gory details of why this works:
set @num := 0, @type := ''; select type, variety, price from ( select type, variety, price, @num := if(@type = type, @num + 1, 1) as row_number, @type := type as dummy from fruits order by type, price ) as x where x.row_number <= 2;
This isn’t one pass through the table, by the way. The subquery is implemented as a temporary table behind the scenes, so filling it with data is one pass; then selecting every row from it and applying the WHERE clause is another. However, twice through is still O(n) with respect to the table size. That’s a lot better than correlated subqueries, which are O(n2) with respect to the group size — even moderate group sizes cause bad performance (say there are five varieties of each fruit. That’s on the order of 25 passes through the table, all told).
One-pass technique on MySQL… maybe?
If you want to leave your queries up the the query optimizer’s whims, you can try this technique, which builds no temporary tables and makes just one pass through:
set @num := 0, @type := ''; select type, variety, price, @num := if(@type = type, @num + 1, 1) as row_number, @type := type as dummy from fruits group by type, price, variety having row_number <= 2;
This theoretically ought to work if MySQL orders by the GROUP BY criteria, which it sometimes does for efficiency and to produce the expected results. Does it work? Here’s what it returns on MySQL 5.0.27 on Windows:
+--------+----------+-------+------------+--------+
| type | variety | price | row_number | dummy |
+--------+----------+-------+------------+--------+
| apple | gala | 2.79 | 1 | apple |
| apple | fuji | 0.24 | 3 | apple |
| orange | valencia | 3.59 | 1 | orange |
| orange | navel | 9.36 | 3 | orange |
| pear | bradford | 6.05 | 1 | pear |
| pear | bartlett | 2.14 | 3 | pear |
| cherry | bing | 2.55 | 1 | cherry |
| cherry | chelan | 6.33 | 3 | cherry |
+--------+----------+-------+------------+--------+
Look closely… it’s returning rows one and three from each group, and they’re not numbered in order of increasing price? Huh? But the HAVING clause says the row_number should be no greater than 2! Here’s what it returns on version 5.0.24a on Ubuntu:
+--------+------------+-------+------------+--------+
| type | variety | price | row_number | dummy |
+--------+------------+-------+------------+--------+
| apple | fuji | 0.24 | 1 | apple |
| apple | gala | 2.79 | 1 | apple |
| apple | limbertwig | 2.87 | 1 | apple |
| cherry | bing | 2.55 | 1 | cherry |
| cherry | chelan | 6.33 | 1 | cherry |
| orange | valencia | 3.59 | 1 | orange |
| orange | navel | 9.36 | 1 | orange |
| pear | bartlett | 2.14 | 1 | pear |
| pear | bradford | 6.05 | 1 | pear |
+--------+------------+-------+------------+--------+
Look, this time everything is numbered 1 and every row is returned. Wonky. This is exactly what the MySQL manual page on user variables warns about.
This technique is pretty much non-deterministic, because it relies on things that you and I don’t get to control directly, such as which indexes MySQL decides to use for grouping. However, if you need to use it — and I know there are some folks out there who do, because I’ve consulted for them — you can still tweak it. We’re getting into the realm of really bastardizing SQL, but the results above came from a table without indexes other than the primary key on (type, variety). What happens if I add an index MySQL can use for grouping?
alter table fruits add key(type, price);
Nothing changes, and EXPLAIN shows why: the query doesn’t use the index I just added. Why? Because the grouping is on three columns, and the index is only on two. In fact, the query is using a temp table and filesort anyway, so this is still not achieving the once-through goal. I can force it to use the index:
set @num := 0, @type := ''; select type, variety, price, @num := if(@type = type, @num + 1, 1) as row_number, @type := type as dummy from fruits force index(type) group by type, price, variety having row_number <= 2;
Let’s see if that works:
+--------+----------+-------+------------+--------+
| type | variety | price | row_number | dummy |
+--------+----------+-------+------------+--------+
| apple | fuji | 0.24 | 1 | apple |
| apple | gala | 2.79 | 2 | apple |
| cherry | bing | 2.55 | 1 | cherry |
| cherry | chelan | 6.33 | 2 | cherry |
| orange | valencia | 3.59 | 1 | orange |
| orange | navel | 9.36 | 2 | orange |
| pear | bartlett | 2.14 | 1 | pear |
| pear | bradford | 6.05 | 2 | pear |
+--------+----------+-------+------------+--------+
Ah, now we’re cooking! It did what I wanted, without a filesort or temporary table. Another way to do this, by the way, is to take variety out of the GROUP BY so it uses the index on its own. Because this selects a non-grouped column from a grouped query, this only works if you are running with ONLY_FULL_GROUP_BY mode turned off, which I hope you are not doing without good reason.
Other methods
Be sure to check the comments for user-contributed methods. There are some really novel approaches. I always learn so much from your comments… thank you!
Conclusion
Well, that’s it. I’ve shown you several ways of solving the common “get the extreme row from each group” query, and then moved on to how you can get the top N rows from each group in various ways. Then I dove into MySQL-specific techniques which some (including myself, depending on my mood) would regard as mildly foolish to utterly stupid. But if you need the last bit of speed out of your server, you sometimes have to know when to break the rules. And for those who think this is just MySQL foolishness, it’s not; I’ve seen people desperately do these types of things on other platforms too, such as SQL Server. There are hacks and tweaks on every platform, and people who need to use them.
相关推荐
VB中SQL数据备份/还原。不错的一个程序,里面包含了数据库的备份/还原。
sql server 提取汉字/数字/字母的方法 里面是Sql语句,以及测试代码等,一看就会,而且是现成的代码
此文档中详细的记载了,SQL Server分组排序取数据的实现,希望可以帮到下载的朋友们!
sql自定义函数,用于合并字符串列,可以在分组的情况下对varchar类型的字段合并
该程序通过程序代码实现SQL SERVER 2000数据库的完全备份/恢复和数据库表备份/恢复,经过测试在备份恢复时需要关闭备份的数据库,负责备份/恢复数据库不能成功!希望大家喜欢
SQL SERVER 通用数据库恢复/备份组件 免费使用 支持 目前所有SQL SERVER版本 附带详细的调用例子 为你的程序添加数据库恢复/备份功能。 还可以做单独的数据库安装程序。赶快下载吧! 2002-08-01 更新: ...
获取SQL_Server元数据的几种方法获取SQL_Server元数据的几种方法获取SQL_Server元数据的几种方法获取SQL_Server元数据的几种方法
如何通过Excel来获取SQLServer数据 方便对数据库中的某些数据取数!
本系统所实现功能既能保证近一段时间的数据总在数据库中,又能保证系统管理员可随时恢复备份数据,供用户查询。本文介绍了一种特殊的数据磁带备份与恢复的具体实现方法。需要指出的是,数据库备份操作是数据库管理...
用一条SQL语句将数据表中某列更新到另一个数据表里
SQL中获取数据库中所有表的所有列名的方法 可以查询数据库中的某个列明,也可查询数据库中所有的列名
昨天遇到一个SQL Server的问题:需要写一个储存过程来处理几个表中的数据,最后问题出在我想将一个表的一个列的多行内容拼接成一行,比如表中有两列数据 : 类别 名称 AAA 企业1 AAA 企业2 AAA 企业3 ...
SQL数据备份恢复SQL数据备份恢复SQL数据备份恢复
由于是不断备份数据,可能导致硬盘空间不够,所以,你可以将"自动删除N天的备份数据"勾上, 同时设置好N值,然后点击保存,再点击"伺服运行", 5>.此款是免费软件,可任意传播. 6>.软件作者承诺软件质量,但不承担相应法律...
SQL提供了大量用于分类、分组和总计的子句及运算符。本文介绍了何时进行分类、何时分组、何时及如何进行总计。
SQL语句分组获取记录的第一条数据的方法,使用Northwind 数据库为例子
软件名称:通用Sql备份恢复工具 执行程序:iNethink_SqlTool.exe 程序版本:V1.0.0.0 Md5值:E5117643E64EE0708FC3ABED4A2BE30C Sha1值:86377C5E7F60B8921B16ED6CBB3C886299FF15BB 操作系统支持:(32位/64位)win ...
Sql数据自动备份和还原工具 Sql数据自动备份和还原工具 Sql数据自动备份和还原工具
SQL Server 数据备份与恢复组件 一个很好用的。NET组件
DB2 SQL 通过函数(CONCAT/POSSTR/LOCATE)实现行转列,列转行 可以按照标点把多列转换为一行,多行转换为一列