Common queries for MySQL 5
Extending Chapter 9
Cascading aggregates
When you have parent-child-grandchild tables, eg companies, users, actions, and your query requirement is for per-parent aggregates from the child table and per-child aggregates from the grandchild table, then cascading joins yield spuriously multiplied counts, and correlated subqueries fail because the second correlated subquery cannot find a visible joining column. One solution is to use derived tables. Assuming ...
CREATE TABLE companies (id int, name char(10));
CREATE TABLE users (id INT,companyid INT);
CREATE TABLE actions (id INT, userid INT, date DATE);
then...
- Join companies & users once to establish a derived company-user table.
- Join them a second time, this time aggregating on users.id to generate user counts per company.
- Join the first derived table to the actions table, aggregating on actions.id to report actions per user per company:
SELECT cu1.cid, cu1.cname, cu1.uid, cu2.uCnt, ua.aCnt
FROM (
SELECT c.id AS cid, c.name AS cname, u1.id AS uid
FROM companies c
INNER JOIN users u1 ON u1.companyid=c.id
) AS cu1
INNER JOIN (
SELECT c.id AS cid, COUNT(u2.id) AS uCnt
FROM companies c
INNER JOIN users u2 ON u2.companyid=c.id
GROUP BY c.id
) AS cu2 ON cu1.cid=cu2.cid
INNER JOIN (
SELECT u3.id AS uid, COUNT(a.id) AS aCnt
FROM users u3
INNER JOIN actions a ON a.userid=u3.id
GROUP BY u3.id
) AS ua ON ua.uid=cu1.uid;
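To see the multiplication and the fix side by side, here is a minimal sketch in Python, using the standard sqlite3 module in place of a MySQL server. The sample rows are invented for illustration, and the final query selects cu1.uid so that each output row identifies its user; otherwise it follows the derived-table approach described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies (id INT, name CHAR(10));
    CREATE TABLE users (id INT, companyid INT);
    CREATE TABLE actions (id INT, userid INT, date DATE);
    -- Invented sample data: Acme has users 1 and 2, Bolt has user 3.
    INSERT INTO companies VALUES (1, 'Acme'), (2, 'Bolt');
    INSERT INTO users VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO actions VALUES
        (1, 1, '2009-01-01'), (2, 1, '2009-01-02'), (3, 2, '2009-01-03'),
        (4, 3, '2009-01-04'), (5, 3, '2009-01-05'), (6, 3, '2009-01-06');
""")

# Cascading joins: each user row repeats once per action, so user counts inflate.
naive = conn.execute("""
    SELECT c.id, COUNT(u.id) AS uCnt, COUNT(a.id) AS aCnt
    FROM companies c
    INNER JOIN users u ON u.companyid = c.id
    INNER JOIN actions a ON a.userid = u.id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()

# Derived tables: each aggregate is computed on its own join.
derived = conn.execute("""
    SELECT cu1.cid, cu1.cname, cu1.uid, cu2.uCnt, ua.aCnt
    FROM (
        SELECT c.id AS cid, c.name AS cname, u1.id AS uid
        FROM companies c
        INNER JOIN users u1 ON u1.companyid = c.id
    ) AS cu1
    INNER JOIN (
        SELECT c.id AS cid, COUNT(u2.id) AS uCnt
        FROM companies c
        INNER JOIN users u2 ON u2.companyid = c.id
        GROUP BY c.id
    ) AS cu2 ON cu1.cid = cu2.cid
    INNER JOIN (
        SELECT u3.id AS uid, COUNT(a.id) AS aCnt
        FROM users u3
        INNER JOIN actions a ON a.userid = u3.id
        GROUP BY u3.id
    ) AS ua ON ua.uid = cu1.uid
    ORDER BY cu1.cid, cu1.uid
""").fetchall()

print(naive)    # user counts here are too high: Acme has 2 users, Bolt has 1
print(derived)  # one row per user, with the per-company user count alongside
```

In this toy data the cascading join reports three users for each company, while the derived-table query reports two for Acme and one for Bolt.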
Cross-aggregates
Given the table authorbook(authid INT, bookid INT), what query finds the books that have authors with more than one book in the table?
Even one level of recursion can induce a mild trance. Escape the trance by taking the problem one step at a time. First write the query that finds the authors with multiple books. Then join an outer query to that on authid, and have the outer query select bookid:
SELECT a1.bookid
FROM authorbook a1
INNER JOIN (
SELECT authid,count(bookid)
FROM authorbook a2
GROUP BY authid
HAVING COUNT(bookid)>1
) AS a3 ON a1.authid=a3.authid;
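The two steps can be tried on a few rows. This is a small sketch using Python's sqlite3 module as a stand-in for MySQL; the four sample rows are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authorbook (authid INT, bookid INT)")
# Invented data: author 1 wrote books 10 and 11, author 2 only book 11,
# author 3 only book 12.
conn.executemany("INSERT INTO authorbook VALUES (?, ?)",
                 [(1, 10), (1, 11), (2, 11), (3, 12)])

# Step 1 (inner query): authors with more than one book.
# Step 2 (outer query): the books of those authors.
rows = conn.execute("""
    SELECT a1.bookid
    FROM authorbook a1
    INNER JOIN (
        SELECT authid
        FROM authorbook
        GROUP BY authid
        HAVING COUNT(bookid) > 1
    ) AS a3 ON a1.authid = a3.authid
    ORDER BY a1.bookid
""").fetchall()
books = [r[0] for r in rows]
print(books)
```

Only author 1 qualifies, so the outer query returns that author's two books. A book shared by two qualifying authors would appear once per author; add DISTINCT if that matters.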
Group by datetime period
To group rows by a time period whose length in minutes divides evenly into 60, use this formula:
GROUP BY ((60/periodLen) * HOUR( thistime ) + FLOOR( MINUTE( thistime ) / periodLen ))
where thistime is the TIME column and periodLen is the period length. So to group by 15-min periods ...
SELECT ...
GROUP BY ( 4 * HOUR( thistime ) + FLOOR( MINUTE( thistime ) / 15 ))
...
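As a worked illustration of the formula, the sketch below computes the same bucket number in Python. The function name period_bucket and the sample times are assumptions introduced for the example.

```python
from datetime import time


def period_bucket(t: time, period_len: int) -> int:
    """Bucket number within the day: (60/periodLen)*HOUR + FLOOR(MINUTE/periodLen)."""
    if 60 % period_len != 0:
        raise ValueError("period_len must divide evenly into 60")
    return (60 // period_len) * t.hour + t.minute // period_len


samples = [(0, 0), (0, 14), (0, 15), (9, 44), (9, 45)]
buckets = [period_bucket(time(h, m), 15) for h, m in samples]
print(buckets)
```

Times 00:00 and 00:14 share bucket 0, 00:15 starts bucket 1, and 09:44 and 09:45 fall on either side of a quarter-hour boundary (buckets 38 and 39).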
When the desired grouping period is a value returned by a MySQL date-time function, matters become simpler: just group by the desired value. Thus to group by weeks, write ..
SELECT ...
GROUP BY WEEK( datecol)
...
If there is no MySQL date-time function that returns the desired grouping period, you will need to write your own stored function.
Per-group aggregate lists of specified size
Find the values of a table column c1 for which there are a specified number of listed values in another column c2.
To get an overview of the values of c2 for each value of c1:
SELECT
c1,
GROUP_CONCAT(c2 ORDER BY c2) AS 'C2 values'
FROM tbl
GROUP BY c1;
To retrieve a list of c1 values for which there exist specific values in another column c2, you need an IN clause specifying the c2 values and a HAVING clause specifying the required number of items in the list ...
SELECT c1
FROM tbl
WHERE c2 IN (1,2,3,4)
GROUP BY c1
HAVING COUNT(c2)=4;
This is easy to generalise to multiple column expressions, and a HAVING clause specifying any number of items from the IN list.
To list c1 values that have every one of the listed c2 values, however many times each occurs, add DISTINCT to the count:
SELECT c1
FROM tbl
WHERE c2 IN (1,2,3,4)
GROUP BY c1
HAVING COUNT(DISTINCT c2)=4;
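The difference between the two HAVING clauses shows up as soon as a group contains a repeated c2 value. A minimal sketch with Python's sqlite3 module standing in for MySQL; the table name tbl, the helper c1_where and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (c1 INT, c2 INT)")
# Invented data: group 1 has each value once, group 2 has four rows but lacks 4,
# group 3 has all four values plus a repeated 1.
data = {1: [1, 2, 3, 4], 2: [1, 1, 2, 3], 3: [1, 1, 2, 3, 4]}
conn.executemany("INSERT INTO tbl VALUES (?, ?)",
                 [(c1, c2) for c1, vals in data.items() for c2 in vals])


def c1_where(having: str) -> list:
    """Run the IN-list query with the given HAVING condition."""
    sql = ("SELECT c1 FROM tbl WHERE c2 IN (1,2,3,4) "
           f"GROUP BY c1 HAVING {having} ORDER BY c1")
    return [row[0] for row in conn.execute(sql)]


plain = c1_where("COUNT(c2) = 4")              # four matching rows
distinct = c1_where("COUNT(DISTINCT c2) = 4")  # all four values present
both = c1_where("COUNT(c2) = 4 AND COUNT(DISTINCT c2) = 4")  # each exactly once
print(plain, distinct, both)
```

Combining the two conditions is one way to require exactly one instance of each listed value.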
Per-group aggregates
This is the simplest grouping query pattern. For column foo, display the first (smallest), last (largest) or average value of column bar.
SELECT foo, MIN(bar) AS bar
FROM tbl
GROUP BY foo;
To return the highest value, and order top to bottom by that value:
SELECT foo, MAX(bar) AS maxbar
FROM tbl
GROUP BY foo
ORDER BY maxbar DESC;
Ditto for AVG(), COUNT() etc. It is easily extended for multiple grouping column expressions.
Per-group aggregates across multiple joins
Given a parent table and two child tables, a query which sums values in both child tables, grouping on a parent table column, returns sums which are too large. In this recent example from the MySQL General Discussion list, where each child table has two rows per parent, the sums are exactly twice as large as they should be:
CREATE TABLE packageItem (
packageItemID INT,
packageItemName CHAR(20),
packageItemPrice DECIMAL(10,2)
);
INSERT INTO packageItem VALUES(1,'Delta Hotel',100.00);
CREATE TABLE packageCredit (
packageCreditID INT,
packageCreditItemID INT,
packageItemType CHAR(10),
packageCreditAmount DECIMAL(10,2)
);
INSERT INTO packageCredit VALUES
(1,1,'Deposit',25.00),
(2,1,'Balance',92.00);
CREATE TABLE packageItemTax (
packageItemTaxID INT,
packageItemTaxItemID INT,
packageItemTaxName CHAR(5),
packageItemTaxAmount DECIMAL(10,2)
);
INSERT INTO packageItemTax VALUES
(1,1,'GST',7.00),
(2,1,'HST',10.00);
The query ...
SELECT
packageItemID
, packageItemName
, packageItemPrice
, SUM(packageItemTaxAmount) as Tax
, SUM(packageCreditAmount) as Credit
FROM packageItem
LEFT JOIN packageCredit ON packageItemID=packageCreditItemID
LEFT JOIN packageItemTax ON packageItemTaxItemID=packageItemID
GROUP BY packageItemID
ORDER BY packageItemID
returns ...
+---------------+-----------------+------------------+-------+--------+
| packageItemID | packageItemName | packageItemPrice | Tax   | Credit |
+---------------+-----------------+------------------+-------+--------+
|             1 | Delta Hotel     |           100.00 | 34.00 | 234.00 |
+---------------+-----------------+------------------+-------+--------+
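With three rows per parent in each child table, the sums would be tripled. Why? Because the query aggregates across each join: every credit row is paired with every tax row for the same item, so each sum is multiplied by the number of matching rows in the other child table. How then to get the correct results? With correlated subqueries: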
SELECT
packageItemID,
SUM(packageItemPrice),
( SELECT SUM(c.packageCreditAmount)
FROM packageCredit c
WHERE c.packageCreditItemID = packageItemID
) AS CreditSum,
( SELECT SUM(t.packageItemTaxAmount)
FROM packageItemTax t
WHERE t.packageItemTaxItemID = packageItemID
) AS TaxSum
FROM packageItem
GROUP BY packageItemID;
+---------------+-----------------------+-----------+--------+
| packageItemID | SUM(packageItemPrice) | CreditSum | TaxSum |
+---------------+-----------------------+-----------+--------+
|             1 |                100.00 |    117.00 |  17.00 |
+---------------+-----------------------+-----------+--------+
If subqueries are unavailable or too slow, replace them with temp tables.
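The doubling is easy to reproduce. The sketch below loads the same three tables into an in-memory SQLite database through Python's sqlite3 module (a stand-in for MySQL, so column types are only approximate) and compares the joined sums with the correlated-subquery sums. The alias p for packageItem is introduced only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE packageItem (packageItemID INT, packageItemName CHAR(20),
                              packageItemPrice DECIMAL(10,2));
    INSERT INTO packageItem VALUES (1, 'Delta Hotel', 100.00);
    CREATE TABLE packageCredit (packageCreditID INT, packageCreditItemID INT,
                                packageItemType CHAR(10), packageCreditAmount DECIMAL(10,2));
    INSERT INTO packageCredit VALUES (1, 1, 'Deposit', 25.00), (2, 1, 'Balance', 92.00);
    CREATE TABLE packageItemTax (packageItemTaxID INT, packageItemTaxItemID INT,
                                 packageItemTaxName CHAR(5), packageItemTaxAmount DECIMAL(10,2));
    INSERT INTO packageItemTax VALUES (1, 1, 'GST', 7.00), (2, 1, 'HST', 10.00);
""")

JOINS = """
    FROM packageItem
    LEFT JOIN packageCredit ON packageItemID = packageCreditItemID
    LEFT JOIN packageItemTax ON packageItemTaxItemID = packageItemID
"""

# Two credit rows x two tax rows = four joined rows for the single item.
joined_rows = conn.execute("SELECT COUNT(*)" + JOINS).fetchone()[0]

# Summing over the joined rows counts every amount twice.
inflated = conn.execute(
    "SELECT SUM(packageItemTaxAmount), SUM(packageCreditAmount)"
    + JOINS + " GROUP BY packageItemID").fetchone()

# Correlated subqueries sum each child table on its own.
correct = conn.execute("""
    SELECT
      (SELECT SUM(c.packageCreditAmount) FROM packageCredit c
        WHERE c.packageCreditItemID = p.packageItemID) AS CreditSum,
      (SELECT SUM(t.packageItemTaxAmount) FROM packageItemTax t
        WHERE t.packageItemTaxItemID = p.packageItemID) AS TaxSum
    FROM packageItem p
""").fetchone()

print(joined_rows, inflated, correct)
```

The joined query sees four rows for the one item, which is why both sums come out doubled.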
Show only one child row per parent row
Given tables parent(id int not null primary key, etc...) and child (id int not null primary key, pid int not null references parent (id), etc...), how do we write a query that retrieves only one child row per pid even when the child table has multiple matching rows? MySQL permits use of GROUP BY even when the SELECT list specifies no aggregate function, provided the ONLY_FULL_GROUP_BY SQL mode is off, so this will work, though which child row is returned for each parent is arbitrary:
select p.id, c.id
from parent p
join child c on p.id=c.pid
group by p.id;
Skip repeating values
You want to report all unique values of a column and skip all rows repeating any of these values.
SELECT col, ...
FROM foo
GROUP BY col
Within-group aggregates
You have a products table with columns item, supplier, price. Multiple suppliers offer various prices for the same item. You need to find the supplier with the lowest price for each item.
Your first thought may be to GROUP BY item, but that is not guaranteed to return the correct supplier value for each minimum item price. Grouping by both item and supplier will return more information than you want. Nor can you write WHERE price=MIN(...) because the query engine will evaluate the WHERE clause before it knows the MIN value.
This is the problem of aggregating within aggregates. It is sometimes called the 'groupwise aggregates' problem, but the term 'groupwise' is ambiguous at best, so we think better names for it are subaggregates, inner aggregates, or within-group aggregates.
It's easy to show that the within-group aggregates problem is a form of the problem of returning values from non-grouping columns in an aggregate query. Suppose you write ...
SELECT item,supplier,MIN(price)
FROM products
GROUP BY item;
Will this tell you which supplier offers the minimum price per item? No. Unless there is exactly one supplier per item, the supplier value returned will be arbitrary.
The simplest and often best-performing solution to the within-group aggregates problem is an outer self exclusion join...
SELECT p1.item,p1.supplier,p1.price
FROM products AS p1
LEFT JOIN products AS p2 ON p1.item = p2.item AND p1.price > p2.price
WHERE p2.item IS NULL;
...because in the resultset built by joining on left item=right item and left price larger than right price, the left-sided rows for which there is no smaller right-sided price are precisely the per-item rows with the smallest prices.
You can also accomplish this by building a table of aggregated minimum prices. Before MySQL 4.1, it has to be a temporary table:
CREATE TEMPORARY TABLE tmp (
item INT,
minprice DECIMAL DEFAULT 0.0
);
LOCK TABLES products READ;
INSERT INTO tmp
SELECT item, MIN(price)
FROM products
GROUP BY item;
to which you then join the products table:
SELECT products.item, supplier, products.price
FROM products
JOIN tmp ON products.item = tmp.item
WHERE products.price=tmp.minprice;
UNLOCK TABLES;
DROP TABLE tmp;
From MySQL 4.1 on, the temporary table can be a correlated subquery. This is the most intuitively obvious syntax for the problem. Often it's also the slowest solution:
SELECT item, supplier, price
FROM products AS p1
WHERE price = (
SELECT MIN(p2.price)
FROM products AS p2
WHERE p1.item = p2.item
);
Another solution, sometimes the fastest of all, is to move the aggregating subquery from the WHERE clause to the FROM clause:
SELECT p.item, p.supplier, p.price
FROM products AS p
JOIN (
SELECT item, MIN(price) AS minprice
FROM products
GROUP BY item
) AS pm ON p.item = pm.item AND p.price = pm.minprice;
Try all solutions to find which is fastest for your version of the problem.
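A quick way to compare the approaches is to run them on the same rows and check that they agree. The sketch below does that for the exclusion join and the FROM-clause subquery, using Python's sqlite3 module as a stand-in for MySQL. The sample rows are invented, and item 2 deliberately has a tie at the lowest price.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (item INT, supplier CHAR(10), price INT)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)", [
    (1, 'A', 10), (1, 'B', 8), (1, 'C', 12),
    (2, 'A', 5), (2, 'B', 5), (2, 'C', 7),
])

# Exclusion join: keep rows that have no cheaper row for the same item.
exclusion = conn.execute("""
    SELECT p1.item, p1.supplier, p1.price
    FROM products AS p1
    LEFT JOIN products AS p2 ON p1.item = p2.item AND p1.price > p2.price
    WHERE p2.item IS NULL
    ORDER BY p1.item, p1.supplier
""").fetchall()

# Derived table of per-item minimum prices, joined back to products.
derived = conn.execute("""
    SELECT p.item, p.supplier, p.price
    FROM products AS p
    JOIN (
        SELECT item, MIN(price) AS minprice
        FROM products
        GROUP BY item
    ) AS pm ON p.item = pm.item AND p.price = pm.minprice
    ORDER BY p.item, p.supplier
""").fetchall()

print(exclusion)
print(derived)
```

Both queries return every supplier tied at the minimum, so item 2 yields two rows.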
To find more than one value per group, you might think the LIMIT clause would work, but LIMIT is limited in subqueries. See Within-group quotas.
Within-group quotas (Top N per group)
A table has multiple rows per key value, and you need to retrieve, say, the first or earliest two rows per key.
If the groups are fairly small, this can be done efficiently with a self-join and counts. For example the following table (based on a tip by Rudy Limeback) has three small data groups:
DROP TABLE IF EXISTS test;
CREATE TABLE test (
id INT,
entrydate DATE
);
INSERT INTO test VALUES
( 1, '2007-5-01' ),
( 1, '2007-5-02' ),
( 1, '2007-5-03' ),
( 1, '2007-5-04' ),
( 1, '2007-5-05' ),
( 1, '2007-5-06' ),
( 2, '2007-6-01' ),
( 2, '2007-6-02' ),
( 2, '2007-6-03' ),
( 2, '2007-6-04' ),
( 3, '2007-7-01' ),
( 3, '2007-7-02' ),
( 3, '2007-7-03' );
The first two rows per ID are the rows which, for a given ID, have two or fewer rows with the same or earlier dates, counting the row itself. If we use an inequality join with the COUNT(*) function to find the earlier rows per ID ...
SELECT t1.id, t1.entrydate, COUNT(*) AS earlier
FROM test AS t1
JOIN test AS t2 ON t1.id=t2.id AND t1.entrydate >= t2.entrydate
GROUP BY t1.id, t1.entrydate
+------+------------+---------+
| id   | entrydate  | earlier |
+------+------------+---------+
|    1 | 2007-05-01 |       1 |
|    1 | 2007-05-02 |       2 |
|    1 | 2007-05-03 |       3 |
|    1 | 2007-05-04 |       4 |
|    1 | 2007-05-05 |       5 |
|    1 | 2007-05-06 |       6 |
|    2 | 2007-06-01 |       1 |
|    2 | 2007-06-02 |       2 |
|    2 | 2007-06-03 |       3 |
|    2 | 2007-06-04 |       4 |
|    3 | 2007-07-01 |       1 |
|    3 | 2007-07-02 |       2 |
|    3 | 2007-07-03 |       3 |
+------+------------+---------+
... then we get our result immediately by removing rows where the 'earlier' count exceeds 2:
SELECT t1.id, t1.entrydate, count(*) AS earlier
FROM test AS t1
JOIN test AS t2 ON t1.id=t2.id AND t1.entrydate >= t2.entrydate
GROUP BY t1.id, t1.entrydate
HAVING earlier <= 2;
+------+------------+---------+
| id   | entrydate  | earlier |
+------+------------+---------+
|    1 | 2007-05-01 |       1 |
|    1 | 2007-05-02 |       2 |
|    2 | 2007-06-01 |       1 |
|    2 | 2007-06-02 |       2 |
|    3 | 2007-07-01 |       1 |
|    3 | 2007-07-02 |       2 |
+------+------------+---------+
This works beautifully with smallish aggregates. But the query algorithm compares every within-group row to every other within-group row. As the size N of a group increases, execution time increases in proportion to N*N. If the query takes one minute for groups of 1,000, it will take 16 minutes for groups of 4,000, and more than four hours for groups of 16,000. The solution does not scale.
What to do? Forget GROUP BY! Manually assemble the desired query results in a temporary table from simple indexed queries, in this case, two rows per ID:
DROP TEMPORARY TABLE IF EXISTS earliers;
CREATE TEMPORARY TABLE earliers( id INT, entrydate DATE);
INSERT INTO earliers
SELECT id,entrydate FROM test WHERE id=1 ORDER BY entrydate LIMIT 2;
INSERT INTO earliers
SELECT id,entrydate FROM test WHERE id=2 ORDER BY entrydate LIMIT 2;
INSERT INTO earliers
SELECT id,entrydate FROM test WHERE id=3 ORDER BY entrydate LIMIT 2;
You need one INSERT statement per grouping value. To print the result, just query the earliers table:
SELECT * FROM earliers
ORDER BY id, entrydate;
+------+------------+
| id   | entrydate  |
+------+------------+
|    1 | 2007-05-01 |
|    1 | 2007-05-02 |
|    2 | 2007-06-01 |
|    2 | 2007-06-02 |
|    3 | 2007-07-01 |
|    3 | 2007-07-02 |
+------+------------+
DROP TEMPORARY TABLE earliers;
Most useful reports run again and again. If that's the case for yours, automate it in a stored procedure: using a cursor and a prepared statement, auto-generate an INSERT statement for every grouping value, and return the result:
DROP PROCEDURE IF EXISTS listearliers;
DELIMITER |
CREATE PROCEDURE listearliers()
BEGIN
DECLARE curdone, vid INT DEFAULT 0;
DECLARE idcur CURSOR FOR SELECT DISTINCT id FROM test;
DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET curdone = 1;
DROP TEMPORARY TABLE IF EXISTS earliers;
CREATE TEMPORARY TABLE earliers( id INT, entrydate DATE);
SET @sql = 'INSERT INTO earliers SELECT id,entrydate FROM test WHERE id=? ORDER BY entrydate LIMIT 2';
OPEN idcur;
REPEAT
FETCH idcur INTO vid;
IF NOT curdone THEN
BEGIN
SET @vid = vid;
PREPARE stmt FROM @sql;
EXECUTE stmt USING @vid;
DROP PREPARE stmt;
END;
END IF;
UNTIL curdone END REPEAT;
CLOSE idcur;
SELECT * FROM earliers ORDER BY id,entrydate;
DROP TEMPORARY TABLE earliers;
END;
|
DELIMITER ;
CALL listearliers();
Average the top 50% of values per group
Each row of a games table records one game score for a team:
DROP TABLE IF EXISTS games;
CREATE TABLE games(id INT, teamID INT, score INT);
INSERT INTO games VALUES
(1,1,3),(2,1,4),(3,1,5),(4,1,6),(5,2,6),
(6,2,7),(7,2,8),(8,2,7),(9,2,6),(10,2,7);
How would we write a query that returns the average of the top 50% of scores per team?
The per-team median value is its middle value--lower than the highest 50% and higher than the lowest 50% of values for that team--so a shortcut is to query the team medians, then aggregate on a join that selects per-team scores above the medians.
How to find per-team medians? If a resultset has an odd number of rows, at least one row has the true median score. If it has an even number of rows, the median score is an average of two central values. The following query adapts Joe Celko's formula in "SQL for Smarties" averaging "low" and "high" medians:
DROP TABLE IF EXISTS medians;
CREATE TABLE medians
SELECT P1.teamid, AVG(P1.score) AS median
FROM games AS P1, games AS P2
WHERE P1.teamid=P2.teamid
GROUP BY P1.teamid
HAVING (
SUM(CASE WHEN P2.score <= P1.score THEN 1 ELSE 0 END) >= ((COUNT(*) + 1) / 2)
AND
SUM(CASE WHEN P2.score >= P1.score THEN 1 ELSE 0 END) >= (COUNT(*)/2 + 1)
)
OR (
SUM(CASE WHEN P2.score >= P1.score THEN 1 ELSE 0 END) >= ((COUNT(*) + 1) / 2)
AND
SUM(CASE WHEN P2.score <= P1.score THEN 1 ELSE 0 END) >= (COUNT(*)/2 + 1)
);
+--------+--------+
| teamid | median |
+--------+--------+
|      1 | 4.5000 |
|      2 | 6.8333 |
+--------+--------+
Now join games to medians accepting only top-half values:
SELECT g.teamid, AVG(g.score) AS Top50Avg
FROM games g
JOIN medians m ON g.teamid = m.teamid AND g.score >= m.median
GROUP BY g.teamid
ORDER BY Top50Avg DESC;
+--------+----------+
| teamid | Top50Avg |
+--------+----------+
|      2 |   7.2500 |
|      1 |   5.5000 |
+--------+----------+
DROP TABLE medians;
Yes, all the logic can be moved into one query:
SELECT g.teamid, AVG(g.score) AS Top50Avg
FROM games g
JOIN (
SELECT P1.teamid, AVG(P1.score) AS median
FROM games AS P1, games AS P2
WHERE P1.teamid=P2.teamid
GROUP BY P1.teamid
HAVING (
SUM(CASE WHEN P2.score <= P1.score THEN 1 ELSE 0 END) >= ((COUNT(*) + 1) / 2)
AND
SUM(CASE WHEN P2.score >= P1.score THEN 1 ELSE 0 END) >= (COUNT(*)/2 + 1)
)
OR (
SUM(CASE WHEN P2.score >= P1.score THEN 1 ELSE 0 END) >= ((COUNT(*) + 1) / 2)
AND
SUM(CASE WHEN P2.score <= P1.score THEN 1 ELSE 0 END) >= (COUNT(*)/2 + 1)
)
) AS m ON g.teamid = m.teamid AND g.score >= m.median
GROUP BY g.teamid
ORDER BY Top50Avg DESC;
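One point is worth checking before relying on the medians query. It groups on teamid alone, so AVG(P1.score) is taken over every joined row. The 6.8333 shown for team 2 equals that team's mean (41/6), whereas the middle value of its scores 6,6,7,7,7,8 is 7. For this data the top-half averages come out the same either way. The following is a minimal sketch, not the formula from the book: it groups on (teamID, score) first, keeps the middle score or scores, and averages them. It runs through Python's sqlite3 module as a stand-in for MySQL, and the alias middle is introduced only for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (id INT, teamID INT, score INT)")
conn.executemany("INSERT INTO games VALUES (?, ?, ?)", [
    (1, 1, 3), (2, 1, 4), (3, 1, 5), (4, 1, 6), (5, 2, 6),
    (6, 2, 7), (7, 2, 8), (8, 2, 7), (9, 2, 6), (10, 2, 7),
])

# Plain per-team means, for comparison with the medians below.
means = conn.execute(
    "SELECT teamID, AVG(score) FROM games GROUP BY teamID ORDER BY teamID"
).fetchall()

# A score is a middle value when at least half of the team's scores are <= it
# and at least half are >= it. Averaging the distinct middle values gives the median.
medians = conn.execute("""
    SELECT teamID, AVG(score) AS median
    FROM (
        SELECT p1.teamID, p1.score
        FROM games AS p1
        JOIN games AS p2 ON p1.teamID = p2.teamID
        GROUP BY p1.teamID, p1.score
        HAVING SUM(CASE WHEN p2.score <= p1.score THEN 1 ELSE 0 END) >= COUNT(*) / 2.0
           AND SUM(CASE WHEN p2.score >= p1.score THEN 1 ELSE 0 END) >= COUNT(*) / 2.0
    ) AS middle
    GROUP BY teamID
    ORDER BY teamID
""").fetchall()

print(means)
print(medians)
```

Team 1's mean and median coincide at 4.5, which is why the difference only shows for team 2.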
Averages from bands of values
To count and average scores in bands of 10:
SELECT 10 * FLOOR( score / 10 ) AS Bottom,
10 * FLOOR( score / 10 ) + 9 AS Top,
Count( score ),
Avg( score )
FROM scores
GROUP BY 10 * FLOOR( score / 10 );
Count unique values of one column
SELECT col_name, COUNT(*) AS frequency
FROM tbl_name
GROUP by col_name
ORDER BY frequency DESC;
Median
Statistically, the median is the middle value--the value that is smaller than that found in half of all remaining rows, and larger than that found in the other half:
SELECT l1.hours AS Median
FROM BulbLife AS l1, BulbLife AS l2
GROUP BY l1.hours
HAVING SUM(CASE WHEN l2.hours <= l1.hours THEN 1 ELSE 0 END) >= (COUNT(*)+1) / 2
AND SUM(CASE WHEN l2.hours >= l1.hours THEN 1 ELSE 0 END) >= (COUNT(*)/2) + 1;
Mode
Statistically, the mode is the most frequently occurring value. Given tables parent(id int) and child(pid int, cid int), where child.pid references parent.id as a foreign key, what query finds the parent.id most often represented in the child table, that is, the modal value of child.pid?
SELECT pid, COUNT(*) AS frequency
FROM child
GROUP BY pid
ORDER BY frequency DESC
LIMIT 1;
Rank order
Without MSSQL's RANK() function, how do we display rank order in a MySQL query, for example from a table like this?
CREATE TABLE votes( name CHAR(10), votes INT );
INSERT INTO votes VALUES
('Smith',10),('Jones',15),('White',20),('Black',40),('Green',50),('Brown',20);
The query is a two-step:
1. Join the table to itself on the value to be ranked, handling ties
2. Group and order the result of the self-join on rank:
SELECT v1.name, v1.votes, COUNT(v2.votes) AS Rank
FROM votes v1
JOIN votes v2 ON v1.votes < v2.votes OR (v1.votes=v2.votes and v1.name = v2.name)
GROUP BY v1.name, v1.votes
ORDER BY v1.votes DESC, v1.name DESC;
+-------+-------+------+
| name  | votes | Rank |
+-------+-------+------+
| Green |    50 |    1 |
| Black |    40 |    2 |
| White |    20 |    3 |
| Brown |    20 |    3 |
| Jones |    15 |    5 |
| Smith |    10 |    6 |
+-------+-------+------+
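The rank the self-join computes is one plus the number of rows with strictly more votes: the row matches itself once, plus once for every higher-scoring row. A minimal Python sketch of that rule on the same data, offered only to make the tie handling visible:

```python
votes = [("Smith", 10), ("Jones", 15), ("White", 20),
         ("Black", 40), ("Green", 50), ("Brown", 20)]

# Rank = 1 (the row itself) + count of rows with strictly more votes.
ranked = [
    (name, v, 1 + sum(1 for _, other in votes if other > v))
    for name, v in votes
]

# Same ordering as ORDER BY votes DESC, name DESC.
ranked.sort(key=lambda row: (row[1], row[0]), reverse=True)

for row in ranked:
    print(row)
```

White and Brown tie at rank 3, and the next rank is 5, not 4, because four rows sit above Jones.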
Backslashes in data
Backslashes multiply weirdly:
SELECT 'a\b' RLIKE 'a\b';
returns 1, as does...
SELECT 'a\\b' RLIKE 'a\\\\b';
because in a pair of backslashes, the second is not escaped by the first, so to compare two literals you double each backslash in the RLIKE argument. But if you are querying a table for such a string from the MySQL client, this doubling happens twice--once in the client, and once in the database--so to find a column value matching 'a\\b', you need to write...
SELECT `desc` FROM xxx WHERE `desc` RLIKE 'aa\\\\\\\\bb';
That's eight backslashes to match two!
Compare data in two tables
This query UNIONs queries for matching column names from two tables, and keeps just those rows which occur once in the union. Those are the rows unique to one table or the other. Customise your column list { id, col1, col2, col3 ...} as desired.
SELECT
MIN(TableName) as TableName, id, col1, col2, col3, ...
FROM (
SELECT 'Table a' as TableName, a.id, a.col1, a.col2, a.col3, ...
FROM a
UNION ALL
SELECT 'Table b' as TableName, b.id, b.col1, b.col2, b.col3, ...
FROM b
) AS tmp
GROUP BY id, col1, col2, col3, ...
HAVING COUNT(*) = 1
ORDER BY ID;
Age in years
Computing age in years is a natural job for a stored function:
CREATE FUNCTION Age( dob DATE, today DATE ) RETURNS INTEGER
DETERMINISTIC
BEGIN
RETURN DATE_FORMAT(FROM_DAYS(TO_DAYS(today) - TO_DAYS(dob)), '%Y') + 0;
END;
Appointments available
Given a clinic of physicians, patients and appointments, how to find an available appointment time for a given physician?
This is a variant of the [Not] Exists query pattern. Though we can write it with subqueries, performance will be crisper with a join. But finding data that is not there requires a join to data which is there. So in addition to tables for appointments, doctors and patients, we need a table of all possible appointment datetimes. Here's a schema illustrating the idea ...
CREATE TABLE a_dt ( -- POSSIBLE APPOINTMENT DATES AND TIMES
d DATE,
t TIME
);
CREATE TABLE a_drs ( -- DOCTORS
did INT -- doctor id
);
CREATE TABLE a_pts ( -- PATIENTS
pid INT
);
CREATE TABLE a_appts ( -- APPOINTMENTS
aid INT, -- appt id
did INT, -- doctor id
pid INT, -- patient id
d DATE,
t TIME
);
Now we can apply the [Not] Exists query pattern. To find free appointment datetimes for a given doctor in a given datetime range, we left join possible appointments to existing appointments on date and time and doctor, add Where conditions for desired appointment datetimes, and finally add a Where condition that the appointment slot be null, i.e. free...
SELECT d.did, a.d, a.t
FROM a_dt AS a
JOIN a_drs AS d ON d.did = 1 -- the doctor whose free slots we want
LEFT JOIN a_appts AS ap
ON ap.d = a.d
AND ap.t = a.t
AND ap.did = d.did
WHERE a.d BETWEEN desired_start_date AND desired_end_date
AND a.t BETWEEN desired_start_time AND desired_end_time
AND ap.aid IS NULL; -- no appointment booked for this doctor in this slot
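Here is a small sketch of the left-join-and-test-for-NULL pattern on toy data, using Python's sqlite3 module as a stand-in for MySQL. The slots, doctors and appointments are invented, dates and times are stored as text, and the join order (slots paired with the chosen doctor, then left-joined to appointments) is one possible arrangement, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a_dt (d TEXT, t TEXT);       -- possible appointment slots
    CREATE TABLE a_drs (did INT);             -- doctors
    CREATE TABLE a_appts (aid INT, did INT, pid INT, d TEXT, t TEXT);
    INSERT INTO a_dt VALUES
        ('2009-03-02', '09:00'), ('2009-03-02', '10:00'), ('2009-03-02', '11:00');
    INSERT INTO a_drs VALUES (1), (2);
    -- Doctor 1 is booked at 09:00, doctor 2 at 10:00.
    INSERT INTO a_appts VALUES
        (1, 1, 7, '2009-03-02', '09:00'),
        (2, 2, 8, '2009-03-02', '10:00');
""")

free = conn.execute("""
    SELECT dr.did, s.d, s.t
    FROM a_dt AS s
    JOIN a_drs AS dr ON dr.did = ?
    LEFT JOIN a_appts AS ap
           ON ap.d = s.d AND ap.t = s.t AND ap.did = dr.did
    WHERE s.d BETWEEN ? AND ?
      AND s.t BETWEEN ? AND ?
      AND ap.aid IS NULL
    ORDER BY s.d, s.t
""", (1, '2009-03-02', '2009-03-02', '09:00', '11:00')).fetchall()

print(free)
```

Doctor 1's 10:00 slot is reported as free even though doctor 2 is booked then, because the doctor id is part of the join condition.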
Count business days between two dates
Given a table named dates with date columns d1,d2, if the two dates are in the same year, the solution is simply the date difference in days minus two days for each week of difference between the two week numbers:
SELECT d1, d2, DATEDIFF(d2, d1) - (WEEK(d2) - WEEK(d1)) * 2 AS BizDays
FROM dates
ORDER BY d1, d2;
For dates that span different years, week numbers won't work. The answer is the number of raw days, minus twice the number of whole weeks (because there are two weekend days/week), minus the number of weekend days in any remainder part-week. This algorithm works when the start and stop dates are themselves business days (but needs refinement to work when passed weekend dates--anybody want to try?):
SET @d1='2007-1-1';
SET @d2='2007-3-31';
SET @dow1 = DAYOFWEEK(@d1);
SET @dow2 = DAYOFWEEK(@d2);
SET @days = DATEDIFF(@d2,@d1);
SET @wknddays = 2 * FLOOR( @days / 7 ) +
IF( @dow1 = 1 AND @dow2 > 1, 1,
IF( @dow1 = 7 AND @dow2 = 1, 1,
IF( @dow1 > 1 AND @dow1 > @dow2, 2,
IF( @dow1 < 7 AND @dow2 = 7, 1, 0 )
)
)
);
SELECT FLOOR(@days-@wkndDays) AS BizDays;
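The same arithmetic can be ported to Python and checked against a brute-force count. This is a sketch under the stated restriction that both dates are business days. The names mysql_dayofweek, biz_day_diff and count_weekdays are introduced for the example, and the brute-force helper counts weekdays after d1 up to and including d2.

```python
from datetime import date, timedelta


def mysql_dayofweek(d: date) -> int:
    """Mimic MySQL DAYOFWEEK() numbering: 1 = Sunday ... 7 = Saturday."""
    return (d.weekday() + 1) % 7 + 1


def biz_day_diff(d1: date, d2: date) -> int:
    """Raw days minus weekend days, following the nested IF() logic above."""
    dow1, dow2 = mysql_dayofweek(d1), mysql_dayofweek(d2)
    days = (d2 - d1).days
    if dow1 == 1 and dow2 > 1:
        extra = 1
    elif dow1 == 7 and dow2 == 1:
        extra = 1
    elif dow1 > 1 and dow1 > dow2:
        extra = 2
    elif dow1 < 7 and dow2 == 7:
        extra = 1
    else:
        extra = 0
    return days - (2 * (days // 7) + extra)


def count_weekdays(d1: date, d2: date) -> int:
    """Brute force: Monday-to-Friday dates after d1, up to and including d2."""
    span = (d2 - d1).days
    return sum(1 for n in range(1, span + 1)
               if (d1 + timedelta(days=n)).weekday() < 5)


result = biz_day_diff(date(2007, 1, 1), date(2007, 3, 31))
print(result)
```

Comparing biz_day_diff with count_weekdays over a range of weekday start and stop dates is a cheap way to confirm the remainder-week cases before wrapping the logic in a stored function.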
The algorithm is easily encapsulated in a function:
DROP FUNCTION IF EXISTS BizDayDiff;
DELIMITER |
CREATE FUNCTION BizDayDiff( d1 DATE, d2 DATE )
RETURNS INT
DETERMINISTIC
BEGIN
DECLARE dow1, dow2, days, wknddays INT;
SET dow1 = DAYOFWEEK(d1);
SET dow2 =