from MySQL Performance Blog by
Alexey Kovyrin Really
often in customers' application we can see a huge tables with
varchar/char fields, with small sets of possible values. These are
"state", "gender", "status", "weapon_type", etc, etc. Frequently we
suggest to change such fields to use ENUM column type, but is it really
necessary (from performance standpoint)? In this post I'd like to
present a small benchmark which shows MySQL performance when you use 3
different approaches: ENUM, VARCHAR and tinyint (+joined table) columns.
In practice you can also often use 4th variant which is not
comparable directly, which is using integer value and having value
mapping done on application level.
So, first of all, a few words about our data set we've used for this benchmark. We have 4 tables:
1) Table with ENUM:
SQL:
-
CREATE TABLE cities_enum (
-
id int(10) UNSIGNED NOT NULL AUTO_INCREMENT,
-
state enum('Alabama','Alaska','Arizona','Arkansas','California','Colorado','Connecticut','Delaware','District of Columbia','Florida','Georgia','Hawaii','Idaho','Illinois','Indiana','Iowa','Kansas','Kentucky','Louisiana','Maine','Maryland','Massachusetts','Michigan','Minnesota','Mississippi','Missouri') NOT NULL,
-
city varchar(255) NOT NULL,
-
PRIMARY KEY (id),
-
KEY state (state)
-
) ENGINE=MyISAM;
2) Table with VARCHAR:
SQL:
-
CREATE TABLE cities_varchar (
-
id int(10) UNSIGNED NOT NULL AUTO_INCREMENT,
-
state varchar(50) NOT NULL,
-
city varchar(255) NOT NULL,
-
PRIMARY KEY (id),
-
KEY state (state)
-
) ENGINE=MyISAM;
3) Table with INT:
SQL:
-
CREATE TABLE cities_join (
-
id int(10) UNSIGNED NOT NULL AUTO_INCREMENT,
-
state_id tinyint(3) UNSIGNED NOT NULL,
-
city varchar(255) NOT NULL,
-
PRIMARY KEY (id),
-
KEY state_id (state_id)
-
) ENGINE=MyISAM;
4) Dictionary table for cities_join:
SQL:
-
CREATE TABLE IF NOT EXISTS `states` (
-
`id` tinyint(3) NOT NULL AUTO_INCREMENT,
-
`name` char(40) NOT NULL,
-
PRIMARY KEY (`id`),
-
UNIQUE KEY `name` (`name`)
-
) ENGINE=MyISAM;
All cities_* tables have 1,5M records each and records are
distributed among 29 different states (just happens to be data we had
available for tests)
Two important notes about this table before we get to results - this
is rather small table which fits in memory in all cases (and dictionary
table does too). Second - the rows are relatively short in this table
so changing state from VARCHAR to ENUM or TINYINT affects row size
significantly. In many cases size difference will be significantly less.
All tests are runned 1000 times and the result time is average from those 1000 runs.
So, our first benchmark is simple: we need to get 5 names of cities,
located in Minnesota and, to make things slower, we'll take those
records starting from record #10000 making MySQL to discard first 10000
records.
1) Results for ENUM:
SQL:
-
SELECT SQL_NO_CACHE city FROM cities_enum WHERE state='Minnesota' LIMIT 10000,5;
-
Result time(mean): 0.082196
2) Results for VARCHAR:
SQL:
-
SELECT SQL_NO_CACHE city FROM cities_varchar WHERE state='Minnesota' LIMIT 10000,5;
-
Result time(mean): 0.085637
3) Results for INT + join:
SQL:
-
SELECT SQL_NO_CACHE c.city FROM cities_join c JOIN states s ON (s.id = c.state_id) WHERE s.name='Minnesota' LIMIT 10000,5;
-
Result time(mean): 0.083277
So, as you can see, all three approaches are close with ENUM being fastest and VARCHAR few percent slower.
This may look counterintuitive because table is significantly
smaller with ENUM or TINYINT but in fact it is quite expected - This is
MyISAM table which is accessed via index, which means to retrieve each
row MySQL will have to perform OS system call to read the row, at this
point there is not much difference if 20 or 30 bytes are being read.
For Full Table Scan operation difference often would be larger.
It is also interesting to note performance of Innodb tables in this
case: for VARCHAR it takes about 0.022 per query which makes it about 4
times faster than for MyISAM. This is great example of the case when
Innodb is much faster than MyISAM for Read load.
The other surprise could be almost zero cost of the join, which we
always claimed to be quite expensive. Indeed there is no cost of the
join in this case because there is really no join:
SQL:
-
mysql> EXPLAIN SELECT SQL_NO_CACHE c.city FROM cities_join c JOIN states s ON (s.id = c.state_id) WHERE s.name='Minnesota' LIMIT 10000,5 \G
-
*************************** 1. row ***************************
-
id: 1
-
select_type: SIMPLE
-
TABLE: s
-
type: const
-
possible_keys: PRIMARY,name
-
KEY: name
-
key_len: 40
-
ref: const
-
rows: 1
-
Extra:
-
*************************** 2. row ***************************
-
id: 1
-
select_type: SIMPLE
-
TABLE: c
-
type: ref
-
possible_keys: state
-
KEY: state
-
key_len: 1
-
ref: const
-
rows: 225690
-
Extra:
-
2 rows IN SET (0.10 sec)
Because we refer state by name, which is unique,it is pre-read and
query executed basically on single table querying state by ID.
Next test was a result of my curiosity. I've tried to order results by states.
1) Results for ENUM:
SQL:
-
SELECT SQL_NO_CACHE city FROM cities_enum ORDER BY state LIMIT 10000, 5;
-
Result time(mean): 0.077549
2) Results for VARCHAR:
SQL:
-
SELECT SQL_NO_CACHE city FROM cities_varchar ORDER BY state LIMIT 10000, 5;
-
Result time(mean): 0.0854793
3)
SQL:
-
SELECT SQL_NO_CACHE c.city FROM cities_join c JOIN states s ON (s.id = c.state_id) ORDER BY s.name LIMIT 10000,5;
-
Result time(mean): 26.0854793
As you can see, ENUM and VARCHAR show close performance, while join performance degraded dramatically.
Here is why:
SQL:
-
mysql> EXPLAIN SELECT SQL_NO_CACHE c.city FROM cities_join c JOIN states s ON (s.id = c.state_id) ORDER BY s.name LIMIT 10000,5\G
-
*************************** 1. row ***************************
-
id: 1
-
select_type: SIMPLE
-
TABLE: c
-
type: ALL
-
possible_keys: state
-
KEY: NULL
-
key_len: NULL
-
ref: NULL
-
rows: 1439943
-
Extra: USING TEMPORARY; USING filesort
-
*************************** 2. row ***************************
-
id: 1
-
select_type: SIMPLE
-
TABLE: s
-
type: eq_ref
-
possible_keys: PRIMARY
-
KEY: PRIMARY
-
key_len: 1
-
ref: test.c.state_id
-
rows: 1
-
Extra:
-
2 rows IN SET (0.00 sec)
Because we're sorting by name we have to perform the join for each
row to retrieve it. This also means sort can't be done by index and
extra sort pass (filesort) is required, which also makes MySQL to store
Join result in temporary table to do the sort, all together makes
things quite miserable. Note this might not be best execution plan to
pick in this case but this is other story.
To avoid part of this problem we of course arrange state ids in the
alphabetical order and do sort by state_id, though join cost still
could be significant.
And the last test - selecting city and name in arbitrary order, skipping first 10000 rows to make query times longer.
1) Results for ENUM:
SQL:
-
SELECT SQL_NO_CACHE city, state FROM cities_enum LIMIT 10000, 5;
-
Result time(mean): 0.003125
2) Results for VARCHAR:
SQL:
-
SELECT SQL_NO_CACHE city, state FROM cities_varchar LIMIT 10000, 5;
-
Result time(mean): 0.003283
3)
SQL:
-
SELECT SQL_NO_CACHE c.city, s.name FROM cities_join c JOIN states s ON (s.id = c.state_id) LIMIT 10000,5;
-
Result time(mean): 0.004170
As you can see, ENUM and VARCHAR results are almost the same, but join query performance is 30% lower.
Also note the times themselves - traversing about same amount of rows
full table scan performs about 25 times better than accessing rows via
index (for the case when data fits in memory!)
So, if you have an application and you need to have some table field
with a small set of possible values, I'd still suggest you to use ENUM,
but now we can see that performance hit may not be as large as you
expect. Though again a lot depends on your data and queries.
分享到:
相关推荐
+ INT:i + UINT:ui + DINT:di + UDINT:udi + LINT:li + ULINT:uli + REAL:r + LREAL:f + STRING:str + WSTRING:ws + STRUCT:st + UNION:u + TIME:t + TIME_OF_DAY:tod + DATE_AND_TIME...
+ int[]:class TestInt{static void Main(){int[] array = {1,2,3,4,5};for(int i in array){Console.WriteLine(i);}}} * 委托: + delegate double MatchAction(double num); class DelegateTest{static double...
Java中的枚举(enum)是自JDK 1.5版本引入的一种强大的类型,它用于定义一组相关的常量。枚举在Java中被设计为一种特殊的类,它们默认继承自`java.lang.Enum`抽象类,并且是单继承的,这意味着它们无法再继承其他类...
这种方法不仅节省存储空间,而且在可读性、可维护性和可扩展性上优于int和enum。例如,用'Y'和'N'代替1和0来表示布尔值,可以提供更好的语义理解。此外,如果需要添加新的枚举值,char类型允许在不更改原有数据结构...
+ 字节 7:(前五个 bit)设备硬件版本号(后三个 bit)物理信号类型(Enum) + 字节 8:设备标志 + 字节 9-11:设备 ID 号 Hart 命令 1: 读主变量(PV)以浮点类型返回主变量的值。 * 请求:无 * 响应: + ...
枚举字符串C++ 枚举可与字符串相互转换。...int main (){// get enum sizestd::cout << " enum size: " << EnumManager>:: size () << std xss=removed xss=removed>:: toString (e);std::co
Java Enum和String及int的相互转化示例 Java Enum和String及int的相互转化示例是Java编程语言中的一种重要技术,通过本文,我们将详细介绍Java Enum和String及int的相互转化示例,并提供详细的示例代码,帮助大家更...
- **cat_id**(`int(10)`):商品所属分类的ID。 - **type_id**(`int(10)`):商品所属类型的ID。 - **brand_id**(`mediumint(8) unsigned`):商品品牌的ID。 - **image_default**(`longtext`):商品默认展示的...
选项Swift软件包,用于更强大的Enum类型。 目录介绍特征安装用法设置一个MappedValueRepresentable枚举使用MappedValueCollectionRepresented 使用MappedEnum类型的可编码枚举在ESet中使用OptionSet中的EnumSet 将...
2. `IntEnum`: 这是`Enum`的一个子类,它的成员默认是整数值。这意味着`IntEnum`的成员可以进行数学运算,同时保持枚举的特性。 3. `Flag`: 用于创建位标志枚举,即可以将多个枚举成员组合在一起的枚举。这对于表示...
在IT行业中,枚举(Enum)是一种非常常见且实用的数据类型,它用于定义一组相关的命名常量。在C++编程语言中,枚举提供了一种更安全、更易读的方式来代替整数常量。本文将深入探讨“魔术枚举”(Magic Enum),这是...
- `int k = i + ++i * i++`:`i`现在是3,`++i`使`i`变成4,然后计算`4 * 4++`,`4 * 4`的结果是16,此时`i`自增变为5,所以`k`的最终值是`4 + 16`,即20。 然而,代码中的`k`打印出来是11,这表明在`k`的计算过程...
wise_enum 因为反思使你明智,而不是聪明 wise_enum是C ++ wise_enum的独立智能枚举库。 它支持C ++中的智能枚举类所期望的所有标准功能: ... // Equivalent to enum class MoreColor : int64_t {BLUE, BLACK = 1
static_enum::get_enumerators创建具有所有枚举值(按值排序)的std::array<Enum> static_enum::enum_cast可以像static_cast一样用于将枚举转换为字符串或从枚举创建字符串 static_enum::to_string从枚举变量获取...
enum_simple是标准库 Enum 的一个小编辑,并且与它完美兼容。 fastenum是一种简化的实现,速度明显更快,但与标准 Enum 不兼容。 安装使用 pip install git+https://github.com/ze-phyr-us/fastenum import enum_...
`magic_enum`是一个开源库,专门用于C++中枚举类型的静态反射,它提供了一种在编译时处理枚举值的方法,而无需使用宏或额外的样板代码。这个库在C++17标准及更高版本中运行良好,是纯头文件库,这意味着只需要包含一...
可以用来解决VC不包含stdint.h头文件问题,网上下载的很多工程项目经常使用的两个头文件inttypes.h stdint.h,将头文件放到(以VS2008为例):C:\Program Files\Microsoft Visual Studio 9.0\VC\include(装在C盘时...
OpenNI2 SDK for ROS2奥比中光深度摄像头驱动安装
public enum Singleton2 { INSTANCE; } ``` - 静态内部类方式:延迟加载,线程安全。 ```java public class Singleton3 { private static class SingletonHolder { public static final Singleton3 ...
### Java中的Enum使用与分析 #### 一、概述 在Java编程语言中,`Enum`是一种特殊的数据类型,用于表示一组固定的常量值。通过枚举(`enum`),开发者可以更加直观地定义一系列预定义的值,并为这些值提供类型安全...