Tables or result sets sometimes contain duplicate records. Sometime
it is allowed but sometime it is required to stop duplicate records.
Sometime it is required to identify duplicate records and remove them
from the table. This chapter will describe how to prevent duplicate
records occurring in a table and how to remove already existing
duplicate records.
Preventing Duplicates from Occurring in a Table:
You can use a PRIMARY KEY
or UNIQUE
Index on a table
with appropriate fields to stop duplicate records. Lets take one
example, The following table contains no such index or primary key, so
it would allow duplicate records for first_name and last_name
CREATE TABLE person_tbl
(
first_name CHAR(20),
last_name CHAR(20),
sex CHAR(10)
);
|
To prevent multiple records with the same first and last name values
from being created in this table, add a PRIMARY KEY to its definition.
When you do this, it's also necessary to declare the indexed columns to
be NOT NULL, because a PRIMARY KEY does not allow NULL values:
CREATE TABLE person_tbl
(
first_name CHAR(20) NOT NULL,
last_name CHAR(20) NOT NULL,
sex CHAR(10)
PRIMARY KEY (last_name, first_name)
);
|
The presence of a unique index in a table normally causes an error
to occur if you insert a record into the table that duplicates an
existing record in the column or columns that define the index.
Use INSERT IGNORE
rather than INSERT
. If a record
doesn't duplicate an existing record, MySQL inserts it as usual. If the
record is a duplicate, the IGNORE keyword tells MySQL to discard it
silently without generating an error.
Following example does not error out and same time it will not insert duplicate records.
mysql> INSERT IGNORE INTO person_tbl (last_name, first_name)
-> VALUES( 'Jay', 'Thomas');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT IGNORE INTO person_tbl (last_name, first_name)
-> VALUES( 'Jay', 'Thomas');
Query OK, 0 rows affected (0.00 sec)
|
Use REPLACE
rather than INSERT. If the record is new, it's
inserted just as with INSERT. If it's a duplicate, the new record
replaces the old one:
mysql> REPLACE INTO person_tbl (last_name, first_name)
-> VALUES( 'Ajay', 'Kumar');
Query OK, 1 row affected (0.00 sec)
mysql> REPLACE INTO person_tbl (last_name, first_name)
-> VALUES( 'Ajay', 'Kumar');
Query OK, 2 rows affected (0.00 sec)
|
INSERT IGNORE and REPLACE should be chosen according to the
duplicate-handling behavior you want to effect. INSERT IGNORE keeps the
first of a set of duplicated records and discards the rest. REPLACE
keeps the last of a set of duplicates and erase out any earlier ones.
Another way to enforce uniqueness is to add a UNIQUE index rather than a PRIMARY KEY to a table.
CREATE TABLE person_tbl
(
first_name CHAR(20) NOT NULL,
last_name CHAR(20) NOT NULL,
sex CHAR(10)
UNIQUE (last_name, first_name)
);
|
Counting and Identifying Duplicates:
Following is the query to count duplicate records with first_name and last_name in a table.
mysql> SELECT COUNT(*) as repetitions, last_name, first_name
-> FROM person_tbl
-> GROUP BY last_name, first_name
-> HAVING repetitions > 1;
|
This query will return a list of all the duplicate records in
person_tbl table. In general, to identify sets of values that are
duplicated, do the following:
-
Determine which columns contain the values that may be duplicated.
-
List those columns in the column selection list, along with COUNT(*).
-
List the columns in the GROUP BY clause as well.
-
Add a HAVING clause that eliminates unique values by requiring group counts to be greater than one.
Eliminating Duplicates from a Query Result:
You can use DISTINCT
along with SELECT statement to find out unique records available in a table.
mysql> SELECT DISTINCT last_name, first_name
-> FROM person_tbl
-> ORDER BY last_name;
|
An alternative to DISTINCT is to add a GROUP BY clause that names
the columns you're selecting. This has the effect of removing
duplicates and selecting only the unique combinations of values in the
specified columns:
mysql> SELECT last_name, first_name
-> FROM person_tbl
-> GROUP BY (last_name, first_name);
|
Removing Duplicates Using Table Replacement:
If you have duplicate records in a table and you want to remove all
the duplicate records from that table then here is the procedure.
mysql> CREATE TABLE tmp SELECT last_name, first_name, sex
-> FROM person_tbl;
-> GROUP BY (last_name, first_name);
mysql> DROP TABLE person_tbl;
mysql> ALTER TABLE tmp RENAME TO person_tbl;
|
An easy way of removing duplicate records from a table is that add
an INDEX or PRIMAY KEY to that table. Even if this table is already
available you can use this technique to remove duplicate records and
you will be safe in future as well.
mysql> ALTER IGNORE TABLE person_tbl
-> ADD PRIMARY KEY (last_name, first_name);
分享到:
相关推荐
"Delete Duplicates for Outlook Express 5.8.1" 是一款专为Outlook Express设计的软件,用于帮助用户查找并删除邮箱中的重复邮件。Outlook Express是微软在早期推出的一款电子邮件客户端,它允许用户接收、发送电子...
Records: 3 Duplicates: 0 Warnings: 0 mysql> SELECT MATCH(c) AGAINST('上海') FROM t; ---------------------------- | MATCH(c) AGAINST('上海') | ---------------------------- | 0 | | ...
Records: 0 Duplicates: 0 Warnings: 0 ``` - 查看修改后的表结构: ```sql mysql> describe student; +-------+----------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra...
Python 重复数据处理(df.drop_duplicates方法)Python源码Python 重复数据处理(df.drop_duplicates方法)Python源码Python 重复数据处理(df.drop_duplicates方法)Python源码Python 重复数据处理(df.drop_...
Delete Duplicates for Outlook Express and Windows Mail v3.8.0.1注册版程序
《Informatica案例研究:RemoveDuplicates》 在信息技术领域,数据清洗和去重是至关重要的步骤,尤其是在大数据处理中。Informatica作为一个强大的数据集成工具,提供了RemoveDuplicates功能,以确保数据的准确性和...
在本次分享中,我们将详细探讨Pandas库中的`drop_duplicates`方法,以及它的应用场景和重要参数。 首先,`drop_duplicates`方法是专为`DataFrame`格式的数据设计的,它允许我们从数据集中移除那些重复的行。该方法...
Remove_duplicates.cpp
这个压缩包文件"Python代码源码-实操案例-框架案例-重复数据处理(df.drop duplicates方法).zip"显然包含了一些示例代码,用于解释如何使用Pandas的`df.drop_duplicates()`方法来处理重复的数据。`df.drop_duplicates...
"LeetCode Remove Duplicates from Sorted Array解决方案" 本文将详细介绍 LeetCode 中的 Remove Duplicates from Sorted Array 解决方案,包括问题描述、解决方案和关键知识点。 问题描述: 给定一个排序的数组 ...
"remove_duplicates"功能专注于解决一个特定的问题:从数组中去除重复的条目。这个功能对于那些需要保持数组元素唯一性的场景非常有用,比如在数据存储、数据分析或者前端开发中避免显示重复的数据。 在JavaScript...
`pandas.DataFrame.drop_duplicates` 是一个非常实用的功能,用于处理数据框 (`DataFrame`) 中的重复行。这个函数在数据预处理阶段尤其重要,因为它可以帮助确保分析的数据是唯一的,避免因重复记录导致的错误统计。...
“Delete Duplicates for Outlook Express and Windows Mail v3.8.0.1”是一款专门针对Outlook Express和Windows Mail设计的去重工具。该软件的主要功能在于帮助用户清理邮箱中重复的邮件,从而有效提高邮箱空间的...
而“FindDuplicates”是 Calibre 的一款插件,专门用于帮助用户查找和处理重复的书籍条目,确保用户的电子书库保持整洁有序。这款插件通过比较元数据、文件内容或标题来识别可能的重复项,并提供了合并标题的功能,...
Pandas之drop_duplicates:去除重复项 方法 DataFrame.drop_duplicates(subset=None, keep='first', inplace=False) 参数 这个drop_duplicate方法是对DataFrame格式的数据,去除特定列下面的重复行。返回...
`drop_duplicates()`函数是DataFrame中用于去重的关键方法,本文将详细介绍如何使用这个函数来实现这两种需求。 首先,让我们了解如何创建一个包含重复值的DataFrame。DataFrame可以由字典、列表、数组或其他数据...
I have two seperate directory structures `dirA` and `dirB` that have possible duplicates between them. I want to delete all duplicates from `dirB` 在这种情况下, finddupes_in_dir.py是最佳解决方案。 您...
python python_leetcode题解之083_Remove_Duplicates_from_Sorted_List