`

MySQL: Data Join

 
阅读更多

 

Introduction

The purpose of this article is to show the power of MySQL JOIN operations, Nested MySQL Queries(intermediate or temporary result set table) and Aggregate functions(like GROUP BY). One can refer to link1 or link2 for basic understanding of MySQL join operations.

In order to explain various MySQL JOIN operations, I have chosen a problem domain of an idea management system. Let me first briefly describe this problem domain.

  • System users: In this idea management system, there are set of of registered users who belongs to different departments. Basic requirement of this system is to keep track of all the ideas submitted by registered users.
  • Idea: A single idea can originates from single or multiple users, so an idea can have single or multiple innovators accordingly. However, only one innovator can submit the idea and other can be listed as idea innovators.
  • User role: User can have multiple role like normal user or admin user. Admin user can change the status of submitted idea from pending status(0) to accepted(1) or rejected(2).
  • Idea status update: On status change, all the associated innovators receive some points as reward.
  • Idea events generation: Based on idea progress, some events can be generated which are then fed into system as idea events by admin user.
  • Generation of consolidated reports: Main objective of this idea management system is to process the user & idea information to show consolidated information like list of recent ideas, list of top users or top groups based on points to motivate users to submit more ideas.
In this article, I will explain, how to get this consolidated information in a single complex MySQL query using MySQL JOIN and Aggregate operators. I would suggest, first attempt the problems mentioned in this article yourself before looking into the actual MySQL queries in order to get better understanding.

This system contains following tables which we will be referring through out this article to solve the problems. For sample data, download the database dump and import it into your MySQL environment(see using the code section).

Let me first start with brief description of these tables.

user_info:
 id   full_name    dep_id    points  
jack JACK D. 2 2
jackson M.S. Jackson 3
alice Alice W 2 1
bob Bob S. 2 2

This table list information of all registered users.

Attributes:

  • id :- Auto generated id for user
  • full_name :- Full name of user
  • dep_id :- Department id of user which is a foreign key referring to 'id' field in dept_info table
  • points :- Total points received by user through idea submission 

 

 

user_idea:
 idea_id   user_id   title  innovators  idea_categories    status   description
1 jack Video Annotations jack 1;2 1 Video Annotations Description
2 jack Optimize waterfall model jack;jackson 3 0 Optimize waterfall model Description
3 jackson Automation jackson 1 1 Automation Description
4 jackson Design Patterns jackson 1 0 Design Patterns Description
5 alice Identify Video Objects alice;jack 2 1 Identify Video Objects Description
6 bob Tin Can LMS bob 1 1 Tin Can LMS Description
7 bob Text Summarization bob 2;3 0 Text Summarization Description

This table contains details of all the ideas submitted by registered users present in user_info table.

Attributes:

  • idea_id :- Auto generated id for submitted idea
  • user_id :- id of user who submitted the idea. It is a foreign key referring to 'id' field of user_info table
  • title :- title of idea
  • innovators :- Semicolon concatenated list string of all the user 'id's who invented this idea. For example, innovators ="1;2" means, user with id=1(i.e. jack) and user with id =2(i.e. jackson) are innovators of this idea. 
  • idea categories :- Semicolon concatenated list string of category 'id's (referring to category_id in idea_categories table) of all idea categories in which idea can belong. For example, idea_categories ="2;3" means idea belong to category with category_id =2 (i.e. Video) and category with category_id =3(i.e. Language Analysis).
  • status :- acceptance status of idea (e.g. 0 implies accepted, 1 implies accepted & 2 implies rejected)
  • description :- description of idea
  • idea_events:
     idea_id    event_id    events 
    Ideation Phase
    Implementaion Phase
    Discussion Phase

    It contains event information associated with idea present in user_idea table.An idea can have zero or multiple idea events.

    Attributes:

    • idea_id :- Foreign key referring to 'idea_id' field of user_idea table
    • event_id :- Auto generated id for event
    • events :- Event description string

     

     

  • idea_categories:
    category_id    category_name 
    Project Lifecycle 
    Video
    Language Analysis

    This table contain information of all registered idea categories in which any submitted idea can belong.

    Attributes:

    • category_id :- Auto generated id for category
    • category_name :- Category Name

     

     

  • dept_info:
     id   name 
    Other
    Development
    Manager

    This table list all the user departments.

    Attributes:

    • id :- Auto generated id for department
    • name :- Department Name

    Problem 1: Getting list of ideas with user information 

    Task: In this problem, we want to retrieve all the ideas with attributes (idea_id, title, status and innovators) which are present in user_idea table along with idea user information with attributes (user id and user full name) present in user_info table. In order to get desired result, we have to join user_idea and user_info tables.

    Concepts: Inner Join, Table aliases 

    Relevant Tables:user_info and user_idea

    Expected Result:

     idea_id   user_id    full_name   title  innovators   status
    jack  JACK D.  Video Annotations jack 1
    jack  JACK D.  Optimize waterfall model  jack;jackson 0
    jackson M.S. Jackson Automation jackson 1
    4 jackson M.S. Jackson Design Patterns jackson 0
    5 alice Alice W Identify Video Objects alice;jack 1
    6 bob Bob S. Tin Can LMS bob 1
    7 bob Bob S. Another Idea1 bob 0

    Solution:

    Here is the appropriate query :
SELECT UI.idea_id, UI.user_id, UInfo.full_name, UI.title, UI.innovators, UI.status
FROM user_idea AS UI 
INNER JOIN user_info AS UInfo ON UI.user_id = UInfo.id

 

 

Note: Here, we have used table aliases using "AS" to refer table fields more easily in a convenient way.

This was a simple example of Inner Join usage. Here, records from both user_idea and user_info table are merged based on common user_id.

Before moving to complex queries, let's try to understand SQL 'LIKE' operator which is primarily used to test if one string is a part of another string.

Problem 2: Fetch all accepted ideas for a specific user

Task: Here, we want to get list of all accepted ideas(i.e. idea with status = 1) for a particular user who is one of the innovators of those ideas.

Concepts: Use of Like operator to check if a given string is part of list represented as string where list elements are joined using some separator(e.g ';').

Relevant Tables: user_idea

Expected Result:

 idea_id    user_id    title  innovators  
1 jack Video Annotations jack
5 alice Identify Video Objects alice;jack

Solution:

Approach 1:
SELECT UI.idea_id, UI.user_id,UI.title, UI.innovators
FROM user_idea AS UI 
WHERE UI.status=1 AND UI.innovators like '%jack%';

 

Result:

 idea_id   user_id   title  innovators 
1 jack Video Annotations jack
3 jackson Automation jackson
5 alice Identify Video Objects alice;jack

Issues: If you examine this query carefully, you can realize that it might fetch wrong results if one user_id is substring of another. For e.g. above query will return idea 3 having innovators "jackson"(as jackson contain jack) which is not desired. This approach might be suited in situations where each user id is distinct and doesn't contain other user id as substring.

Approach 2: Append list separator(';') to the start and end of innovators string before using LIKE operator. It's kind of tricky way of matching user_id without using multiple'OR' operator in 'LIKE" statements to handle cases where innovator present at the start, middle or end of the innovators string.

Case 1: Appending list separator:

SELECT UI.idea_id, UI.user_id,UI.title, UI.innovators
FROM user_idea AS UI  
WHERE UI.status=1 AND CONCAT(';',UI.innovators,';') like '%;jack;%';

Alternatively, we could use following approaches to get the same result but, I prefer the preceding approach as it is more concise and faster.

Case 2: Using Multiple 'OR'

SELECT UI.idea_id, UI.user_id,UI.title, UI.innovators
FROM user_idea AS UI 
WHERE UI.status=1 AND 
( 
UI.innovators = 'jack'		OR	/* e.g. innovators = 'jack' ; only one user is present in the innovators list*/
UI.innovators like 'jack;%' 	OR 	/* user id is present at the start. e.g. innovators = "jack;bob" */
UI.innovators like '%;jack;%'	OR 	/* user id is present in the middle. e.g. innovators = "alice;jack;bob" */
UI.innovators like '%;jack'   		/* user id is present at the end. e.g. innovators = "alice;jack" */
)

Case 3: Using Regular expression (REGEXP or RLIKE)

SELECT UI.idea_id, UI.user_id,UI.title, UI.innovators
FROM user_idea AS UI 
WHERE UI.status=1 AND UI.innovators REGEXP '^jack$|^jack;|;jack;|;jack$';

Case 4: Create function in the database incase this operation is used frequently

//Create Function isMember
DELIMITER $$
CREATE FUNCTION IsMember(inList Text, inMember varchar(10))
RETURNS BOOLEAN
DETERMINISTIC
BEGIN
  DECLARE exp varchar(50) DEFAULT '';
  SET exp = CONCAT('^',inMember,'$','|^',inMember,';|;','inMember',';|;',inMember,'$');
RETURN inList REGEXP exp;
END
$$
DELIMITER ;

//Using Function
SELECT UI.idea_id, UI.user_id,UI.title, UI.innovators
FROM user_idea AS UI 
WHERE UI.status=1 AND IsMember(UI.innovators,'jack');

Performance Analysis: In General 'LIKE' is more optimized than 'REGEXP'(function in this case) for large dataset. Below is the approximate execution time for queries against set of approx 70K user_ideas.

System Details:

Processor: (Intel Core i5 CPU M520 @2.4GHz 2.4GHz), OS: (Windows 7, 4GB RAM), MySQL Version: (5.5.24 for Win32 (x86)).

Query Execution Time:

  • Case1 (CONCAT with LIKE): ~0.28sec
  • Case2 (Multiple LIKE with OR): ~0.29sec
  • Case3 (REGEXP): ~0.9sec
  • Case4 (FUNCTION): ~8.6sec

 

Result:

 idea_id    user_id    title   innovators  
1 jack Video Annotations jack
5 alice Identify Video Objects alice;jack

 

 

Problem 3: Fetch idea details with idea categories names and idea events information

Task: This problem is little bit complex as it involves fetching concatenated data(e.g. idea events) from one table(e.g idea events table) and merging it with row fetched from other table(e.g. idea table)

Concepts: LEFT JOIN, INNER JOIN, Nested SQL query, Table aliases and Aggregate Functions like GROUP BY, GROUP_CONCAT

 

Relevant Tables: user_idea, idea_events, idea_categories

Expected Result:

 idea_id    title   events   categories
Video Annotations Ideation Phase;Implementation Phase Project Lifecycle;Video
2 Optimize waterfall model NULL Language Analysis
3 Automation Discussion Phase Project Lifecycle
4 Design Patterns NULL Project Lifecycle
5 Identify Video Objects NULL Video
6 Tin Can LMS NULL Project Lifecycle
7 Text Summarization NULL Video;Language Analysis

Solution:

We can divide our problem in two parts. First, we will fetch all the events associated with each idea and merge into single concatenated field(named as 'events') using some separator(say ';'). Second, we will join the result of this query with idea_categories table to add idea category information.

Note: Idea events can be null. i.e. some idea may not have any associated event.

Step 1: Query to fetching Idea Events (concatenated using separator ';' )

Approach 1:

 

SELECT UI.idea_id, UI.title, GROUP_CONCAT(IE.events SEPARATOR ';') as events 
	FROM user_idea AS UI 
	INNER JOIN idea_events AS IE ON UI.idea_id = IE.idea_id      
	GROUP BY UI.idea_id	

 

Note: GROUP BY and GROUP_CONCAT aggregate operators are used to club all events specific to single idea.

Result:

 idea_id   title   events
1 Video Annotations Ideation Phase;Implementation Phase
3 Automation Discussion Phase

Issues: Ideas with no associated event are not present in results as INNER JOIN is used. We could resolve this problem in approach2  using LEFT JOIN.

Approach 2:

 

	SELECT UI.idea_id, UI.title, GROUP_CONCAT(IE.events SEPARATOR ';') as events 
	FROM user_idea UI 
	LEFT JOIN idea_events IE ON UI.idea_id = IE.idea_id      
	GROUP BY UI.idea_id

 

Result:

 idea_id   title   events
1 Video Annoatations Ideation Phase;Implementation Phase
2 Optimize waterfall model NULL
3 Automation Discussion Phase
4 Design Patterns NULL 
Identify Video Objects NULL
6 Tin Can LMS NULL
7 Text Summarization  NULL 

Note:

  • We have used LEFT JOIN instead of INNER JOIN as we want all records of ideas even if there is no corresponding event.
  • We are using GROUP_CONCAT and GROUP_BY to club events per idea in a single entry. Without GROUP_BY & GROUP_CONCAT operators, we would get multiple entries per idea as shown below (see first two row of result):
	SELECT UI.idea_id, UI.title, IE.events
	FROM user_idea UI 
	LEFT JOIN idea_events IE ON UI.idea_id = IE.idea_id;			

 

Result(Without GROUP_BY):

 idea_id   title   events
1 Video Annoatations Ideation Phase
1 Video Annoatations Implementation Phase
2 Optimize waterfall model  NULL 
... ... ...

However, if we use GROUP_BY without using GROUP_CONCAT, we won't get expected result. We will get one row per idea as we are grouping by idea_id with single event information that corresponds to that idea(missing other events) as we are not clubbing events using GROUP_CONCAT (see event column of row 1 in result. Only 'Ideation Phase' is coming and not the 'Implementation Phase'). Key rule is, one should use aggregate operators like GROUP_CONCAT, AVG, SUM, COUNT, MAX, MIN etc. whenever using GROUP_BY.

SELECT UI.idea_id, UI.title, IE.events  
	FROM user_idea UI 
	LEFT JOIN idea_events IE ON UI.idea_id = IE.idea_id
	GROUP BY UI.idea_id;

 

Result(Without GROUP_CONCAT):

 idea_id    title   events 
Video Annoatations  Ideation Phase
2 Optimize waterfall model NULL
3 Automation Discussion Phase
... ... ...

 

 

 

Step 2: Fetch Pillars Information by joining idea_categories tables with the results of query in Step1.  

We will use SQL query in step 1 as inner or nested query for fetching category information.
SELECT AUI.idea_id, AUI.title, AUI.events,
       GROUP_CONCAT(IC.category_name SEPARATOR ';') as categories  
FROM (       
	SELECT UI.idea_id, UI.title, GROUP_CONCAT(IE.events SEPARATOR ';') as events ,
	       CONCAT(';',UI.idea_categories,';') as temp_categories
 	FROM user_idea UI 
	LEFT JOIN idea_events IE ON UI.idea_id = IE.idea_id      
	GROUP BY UI.idea_id
) AS AUI
INNER JOIN idea_categories IC
ON AUI.temp_categories LIKE CONCAT('%;',IC.category_id,';%') 
GROUP BY AUI.idea_id;

 

Result:

 idea_id   title   events   categories
1 Video Annotations Ideation Phase;Implementation Phase Project Lifecycle;Video
2 Optimize waterfall model NULL Language Analysis
3 Automation Discussion Phase Project Lifecycle
4 Design Patterns NULL Project Lifecycle
Identify Video Objects  NULL  Video
Tin Can LMS  NULL  Project Lifecycle
Text Summarization  NULL  Video;Language Analysis

 

Note: 

  •      We have used nested query result set aliased as AUI to append category information. In general, nested query of following pattern are very useful when you want to peform join operation on the result set of some temporary or intermediate sql query.
	SELECT T1.field1,..., NT.field1,... 
	FROM ( 
		SELECT T2.field1, ..
		FROM T2
		WHERE ...
	) AS NT /* Nested Query Result */
	INNER JOIN T1 ON T1.someField = NT.someField && ....
	WHERE ...
  • We have applied techniques mentioned in Problem 2(Approach2-Case1) to get category names (this time we have used same technique in SELECT statement). However, we could use the same in conditional clause also). In nested query, we are appending list separator(';') to the user_idea.categories field and aliasing it as 'temp_categories' which will be used in conditional clause of outer query. The result set of nested query will have 4 columns i.e. 'idea_id', 'title', 'events' (concatenated event string) and 'temp_categories'. Now, this result set is used as temporary table aliased as 'AUI' which is then joined with idea_categories table using LIKE condition on temp_categories. GROUP_BY & GROUP_CONCAT are then used in outer SELECT query to get concatenated string of idea category names.
  • Alternate way to get the same result could be, get idea events & idea categories separately and use inner join or intersection.

Problem 4: Get Top Innovator Groups based on some heuiristic like group point average.

Task: In this problem, we want to fetch leading or top groups information with attributes like total number of innovators in a leading group, total number of ideas submitted by innnovators in that group and total group points. Also, sort the result based on group point average.

Concepts: COUNT, SUM, GROUP BY, ORDER BY,DESC, Nested SQL query, JOIN

 

Relevant Tables: user_info, user_idea, dept_info 

 

Expected Result:

 dept_id   dept_name   totalInnovators    totalIdeas    totalPoints  
3 Manager 1 2 3
2 Development 3 5 5

Solution:

We will divide this task also into two parts as done for Problem 3. First, we will join the user_info with dep_info to get user department or group information and join it with idea table to get total idea count submitted by each user. Second, we will join the result of previous step with dept_info to get total number of innovators, total ideas, total points per group using aggregate operators sorted by group point average

Step 1: Get user Department infomation joining user_pre_info and dept_info and count user ideas

SELECT UPI.id as user_id, UPI.dept_id, DI.name as dept_name, UPI.points, 
                       COUNT(UI.idea_id) AS totalIdeas 
FROM user_info UPI 	
INNER JOIN dept_info AS DI ON UPI.dept_id=DI.id 
LEFT JOIN user_idea AS UI ON UPI.id = UI.user_id 
GROUP BY UPI.id

Result:

 user_id   dept_id    dept_name   points   totalIdeas 
alice 2 Development 1 1
bob 2 Development 2 2
jack 2 Development 2 2
jackson 3 Manager 3 2

Note:

  • We have used COUNT operator to get idea count per user
  • We have used GROUP BY operator to group based on user_id

Step 2: Count number of users, total ideas in a group and sort result by point average w.r.t number of users in a group

We will join the result of SQL query in step 1 with dept_info table to get the desired result.

SELECT UGI.dept_id, UGI.dept_name, COUNT(UGI.id) AS totalInnovators, 
SUM(UGI.totalIdeas) AS totalIdeas, SUM(UGI.points) AS totalPoints 
FROM ( 
        SELECT UPI.id, UPI.dept_id, DI.name as dept_name, UPI.points,                                     COUNT(UI.idea_id) AS totalIdeas 
	FROM user_info AS UPI 
	INNER JOIN dept_info AS DI ON UPI.dept_id=DI.id 
	LEFT JOIN user_idea AS UI ON UPI.id = UI.user_id 
	GROUP BY UPI.id
) AS UGI 
GROUP BY UGI.dept_id HAVING (totalPoints>0 AND totalInnovators>0) 
ORDER BY SUM(UGI.points)/COUNT(UGI.id) DESC LIMIT 5

Result:

 dept_id   dept_name   totalInnovators   totalIdeas   totalPoints 
3 Manager 1 2 3
2 Development 3 5 5

Note:

  • we have grouped the result by department id to get total points per group.
  • We have sorted the result by group point average using ORDER BY.
 
 
 
 
 
 

 

 

 

 

 

 

 

Reference:

http://www.codeproject.com/Articles/684973/Understanding-Complex-MySQL-JOIN

http://dev.mysql.com/doc/refman/5.6/en/join.html

http://en.wikipedia.org/wiki/Join_%28SQL%29

分享到:
评论

相关推荐

    mysql.data版本集合

    MySQL是一个广泛使用的开源关系型数据库管理系统,其`.data`文件是与MySQL数据库相关的文件,通常包含数据库中的数据或日志信息。在这个“mysql.data版本集合”中,我们可能找到不同版本的MySQL数据库的数据文件,...

    mysql.data.dll +教程

    MySQL.data.dll是MySQL数据库连接Unity游戏引擎的一个关键组件,它是一个动态链接库(DLL)文件,主要用于在Unity项目中实现与MySQL服务器的数据交互。DLL文件是Windows操作系统中的一个程序库,包含了可被多个应用...

    MySQL++ v3.1.0教程

    mysql::Blob blob(binary_data); mysql::Query q = conn.query(); q (id, data) VALUES (1, '" )"; // 检索数据 mysql::Query q2 = conn.query(); q2 << "SELECT data FROM images WHERE id = 1"; mysql::Result r...

    mysql_data_adapter_v2.7.0.pdf

    4. **示例使用**: 文档提供了如何使用mysql_fdw的示例,包括如何创建外部表、导入外部模式、实现JOIN操作的推下以及聚合函数的推下等。 **配置** 配置mysql_fdw涉及创建外部服务器定义、外部表,并指定要访问的...

    深入分析MySQL Sending data查询慢问题

    通过一个实例给大家分享了MySQL Sending data表查询慢问题解决办法。 最近在代码优化中,发现了一条sql语句非常的慢,于是就用各种方法进行排查,最后终于找到了原因。 一、事故现场 SELECT og.goods_barcode, og....

    Mysql笔记markdown

    - `JOIN`操作:INNER JOIN、LEFT JOIN、RIGHT JOIN、FULL JOIN。 7. **存储引擎** - InnoDB:支持事务、行级锁定,适合高并发环境。 - MyISAM:读取速度快,不支持事务,适合读多写少的场景。 - 其他引擎:如...

    net-join-mysql.rar_C#连接Mysql_MYSQL

    1. **引入MySQL .NET Connector**:首先,你需要在项目中引用MySQL数据提供者,如`MySql.Data.MySqlClient`,这通常通过NuGet包管理器进行安装。 2. **创建连接字符串**:连接字符串包含所有数据库连接信息。例如:...

    mysql:MySQL的

    MySQL是一种广泛使用的开源关系型数据库管理系统(RDBMS),它基于结构化查询语言(SQL)。MySQL因其高效、可靠和易于管理的特点,在Web应用程序开发中备受青睐。本篇将深入探讨MySQL的相关知识点,包括其基本概念、...

    MySql.Data.dll 6.4.4 for .net 4.0

    MySQL.Data.dll是MySQL数据库连接器的一个重要组成部分,用于.NET Framework 4.0环境下的应用程序与MySQL服务器进行交互。这个版本是6.4.4,它提供了丰富的功能,使得开发人员能够在C#、VB.NET或其他支持.NET的语言...

    Mysql官方示例employees数据库employees_mysql.zip

    2. **SQL查询**:实践如何使用SELECT语句进行数据检索,包括JOIN操作来合并多个表的数据,WHERE子句进行条件筛选,GROUP BY和HAVING子句进行数据分组,以及ORDER BY和LIMIT子句进行排序和限制结果集。 3. **数据库...

    RockyLinux9.0 yum安装 mysql8.0

    chown -R mysql.mysql /usr/local/mysql/ /appdata/data_mysql ``` **三、初始化与配置** 进入`mysql`的`bin`目录,执行初始化命令,记住生成的临时root密码: ```bash cd /usr/local/mysql/bin ./mysqld --...

    MySql 5.1 参考手册.chm

    7.2.9. MySQL如何优化LEFT JOIN和RIGHT JOIN 7.2.10. MySQL如何优化嵌套Join 7.2.11. MySQL如何简化外部联合 7.2.12. MySQL如何优化ORDER BY 7.2.13. MySQL如何优化GROUP BY 7.2.14. MySQL如何优化LIMIT 7.2.15. ...

    mysql安装.docx

    bin/mysqld --basedir=/usr/local/mysql --datadir=/data/mysql/ --user=mysql --initialize ``` - 此命令会输出一系列日志信息,其中包含临时生成的root用户的密码,需记录下来用于后续登录MySQL。 ##### 7. ...

    kettle(ETL工具)mysql 5.7 数据库的驱动包

    这包括但不限于表输入、表输出、SQL步骤、JOIN步骤等,这些都极大地丰富了数据处理的可能性。 使用Kettle进行ETL工作时,你还可以利用其强大的数据转换能力,例如数据清洗、数据类型转换、数据聚合、过滤、排序以及...

    mysqlhelp帮助文档

    MySQL提供mysqldump工具进行数据备份,而恢复可以通过LOAD DATA INFILE或恢复备份文件来实现。 日志系统,如二进制日志(binlog)和慢查询日志,用于追踪数据库更改和优化性能。 复制是MySQL的一个重要特性,通过...

    DBI、Data-ShowTable、DBD-mysql

    这里我们关注的是三个关键组件:DBI、Data-ShowTable和DBD-mysql,它们都是Perl与MySQL数据库交互的重要工具。 首先,DBI(Database Independent Interface)是Perl的一个模块,提供了一个统一的接口,使得Perl程序...

    MYSQL教程MYSQL教程MYSQL教程

    - 查询数据:利用SELECT语句进行各种复杂查询,如JOIN、WHERE、GROUP BY、HAVING等子句。 - 更新数据:使用UPDATE语句修改已有记录。 - 删除数据:用DELETE语句删除单个或多个记录,或使用TRUNCATE TABLE清空整个...

Global site tag (gtag.js) - Google Analytics