`

Enum VS Varchar VS Int + Joined: What is Faster

阅读更多

Enum Fields VS Varchar VS Int + Joined table: What is Faster?

Really often in customers' application we can see a huge tables with varchar/char fields, with small sets of possible values. These are "state", "gender", "status", "weapon_type", etc, etc. Frequently we suggest to change such fields to use ENUM column type, but is it really necessary (from performance standpoint)? In this post I'd like to present a small benchmark which shows MySQL performance when you use 3 different approaches: ENUM, VARCHAR and tinyint (+joined table) columns.

In practice you can also often use 4th variant which is not comparable directly, which is using integer value and having value mapping done on application level.

So, first of all, a few words about our data set we've used for this benchmark. We have 4 tables:
1) Table with ENUM:

SQL:
  1. CREATE TABLE cities_enum (
  2.   id int(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  3.   state enum('Alabama','Alaska','Arizona','Arkansas','California','Colorado','Connecticut','Delaware','District of Columbia','Florida','Georgia','Hawaii','Idaho','Illinois','Indiana','Iowa','Kansas','Kentucky','Louisiana','Maine','Maryland','Massachusetts','Michigan','Minnesota','Mississippi','Missouri') NOT NULL,
  4.   city varchar(255) NOT NULL,
  5.   PRIMARY KEY  (id),
  6.   KEY state (state)
  7. ) ENGINE=MyISAM;

2) Table with VARCHAR:

SQL:
  1. CREATE TABLE cities_varchar (
  2.   id int(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  3.   state varchar(50) NOT NULL,
  4.   city varchar(255) NOT NULL,
  5.   PRIMARY KEY  (id),
  6.   KEY state (state)
  7. ) ENGINE=MyISAM;

3) Table with INT:

SQL:
  1. CREATE TABLE cities_join (
  2.   id int(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  3.   state_id tinyint(3) UNSIGNED NOT NULL,
  4.   city varchar(255) NOT NULL,
  5.   PRIMARY KEY  (id),
  6.   KEY state_id (state_id)
  7. ) ENGINE=MyISAM;

4) Dictionary table for cities_join:

SQL:
  1. CREATE TABLE IF NOT EXISTS `states` (
  2.   `id` tinyint(3) NOT NULL AUTO_INCREMENT,
  3.   `name` char(40) NOT NULL,
  4.   PRIMARY KEY  (`id`),
  5.   UNIQUE KEY `name` (`name`)
  6. ) ENGINE=MyISAM;

All cities_* tables have 1,5M records each and records are distributed among 29 different states (just happens to be data we had available for tests)

Two important notes about this table before we get to results - this is rather small table which fits in memory in all cases (and dictionary table does too). Second - the rows are relatively short in this table so changing state from VARCHAR to ENUM or TINYINT affects row size significantly. In many cases size difference will be significantly less.

All tests are runned 1000 times and the result time is average from those 1000 runs.

So, our first benchmark is simple: we need to get 5 names of cities, located in Minnesota and, to make things slower, we'll take those records starting from record #10000 making MySQL to discard first 10000 records.

1) Results for ENUM:

SQL:
  1. SELECT SQL_NO_CACHE city FROM cities_enum WHERE state='Minnesota' LIMIT 10000,5;
  2. Result time(mean): 0.082196

2) Results for VARCHAR:

SQL:
  1. SELECT SQL_NO_CACHE city FROM cities_varchar WHERE state='Minnesota' LIMIT 10000,5;
  2. Result time(mean): 0.085637

3) Results for INT + join:

SQL:
  1. SELECT SQL_NO_CACHE c.city FROM cities_join c JOIN states s ON (s.id = c.state_id) WHERE s.name='Minnesota' LIMIT 10000,5;
  2. Result time(mean): 0.083277

So, as you can see, all three approaches are close with ENUM being fastest and VARCHAR few percent slower.

This may look counterintuitive because table is significantly smaller with ENUM or TINYINT but in fact it is quite expected - This is MyISAM table which is accessed via index, which means to retrieve each row MySQL will have to perform OS system call to read the row, at this point there is not much difference if 20 or 30 bytes are being read. For Full Table Scan operation difference often would be larger.

It is also interesting to note performance of Innodb tables in this case: for VARCHAR it takes about 0.022 per query which makes it about 4 times faster than for MyISAM. This is great example of the case when Innodb is much faster than MyISAM for Read load.

The other surprise could be almost zero cost of the join, which we always claimed to be quite expensive. Indeed there is no cost of the join in this case because there is really no join:

SQL:
  1. mysql> EXPLAIN SELECT SQL_NO_CACHE c.city FROM cities_join c JOIN states s ON (s.id = c.state_id) WHERE s.name='Minnesota' LIMIT 10000,5 \G
  2. *************************** 1. row ***************************
  3.            id: 1
  4.   select_type: SIMPLE
  5.         TABLE: s
  6.          type: const
  7. possible_keys: PRIMARY,name
  8.           KEY: name
  9.       key_len: 40
  10.           ref: const
  11.          rows: 1
  12.         Extra:
  13. *************************** 2. row ***************************
  14.            id: 1
  15.   select_type: SIMPLE
  16.         TABLE: c
  17.          type: ref
  18. possible_keys: state
  19.           KEY: state
  20.       key_len: 1
  21.           ref: const
  22.          rows: 225690
  23.         Extra:
  24. 2 rows IN SET (0.10 sec)

Because we refer state by name, which is unique,it is pre-read and query executed basically on single table querying state by ID.

Next test was a result of my curiosity. I've tried to order results by states.

1) Results for ENUM:

SQL:
  1. SELECT SQL_NO_CACHE city FROM cities_enum ORDER BY state LIMIT 10000, 5;
  2. Result time(mean): 0.077549

2) Results for VARCHAR:

SQL:
  1. SELECT SQL_NO_CACHE city FROM cities_varchar ORDER BY state LIMIT 10000, 5;
  2. Result time(mean): 0.0854793

3)

SQL:
  1. SELECT SQL_NO_CACHE c.city FROM cities_join c JOIN states s ON (s.id = c.state_id) ORDER BY s.name LIMIT 10000,5;
  2. Result time(mean): 26.0854793

As you can see, ENUM and VARCHAR show close performance, while join performance degraded dramatically.

Here is why:

SQL:
  1. mysql> EXPLAIN SELECT SQL_NO_CACHE c.city FROM cities_join c JOIN states s ON (s.id = c.state_id) ORDER BY s.name LIMIT 10000,5\G
  2. *************************** 1. row ***************************
  3.            id: 1
  4.   select_type: SIMPLE
  5.         TABLE: c
  6.          type: ALL
  7. possible_keys: state
  8.           KEY: NULL
  9.       key_len: NULL
  10.           ref: NULL
  11.          rows: 1439943
  12.         Extra: USING TEMPORARY; USING filesort
  13. *************************** 2. row ***************************
  14.            id: 1
  15.   select_type: SIMPLE
  16.         TABLE: s
  17.          type: eq_ref
  18. possible_keys: PRIMARY
  19.           KEY: PRIMARY
  20.       key_len: 1
  21.           ref: test.c.state_id
  22.          rows: 1
  23.         Extra:
  24. 2 rows IN SET (0.00 sec)

Because we're sorting by name we have to perform the join for each row to retrieve it. This also means sort can't be done by index and extra sort pass (filesort) is required, which also makes MySQL to store Join result in temporary table to do the sort, all together makes things quite miserable. Note this might not be best execution plan to pick in this case but this is other story.

To avoid part of this problem we of course arrange state ids in the alphabetical order and do sort by state_id, though join cost still could be significant.

And the last test - selecting city and name in arbitrary order, skipping first 10000 rows to make query times longer.

1) Results for ENUM:

SQL:
  1. SELECT SQL_NO_CACHE city, state FROM cities_enum LIMIT 10000, 5;
  2. Result time(mean): 0.003125

2) Results for VARCHAR:

SQL:
  1. SELECT SQL_NO_CACHE city, state FROM cities_varchar LIMIT 10000, 5;
  2. Result time(mean): 0.003283

3)

SQL:
  1. SELECT SQL_NO_CACHE c.city, s.name FROM cities_join c JOIN states s ON (s.id = c.state_id) LIMIT 10000,5;
  2. Result time(mean): 0.004170

As you can see, ENUM and VARCHAR results are almost the same, but join query performance is 30% lower.
Also note the times themselves - traversing about same amount of rows full table scan performs about 25 times better than accessing rows via index (for the case when data fits in memory!)

So, if you have an application and you need to have some table field with a small set of possible values, I'd still suggest you to use ENUM, but now we can see that performance hit may not be as large as you expect. Though again a lot depends on your data and queries.

分享到:
评论

相关推荐

    汇川PLCopen库开发指导

    + INT:i + UINT:ui + DINT:di + UDINT:udi + LINT:li + ULINT:uli + REAL:r + LREAL:f + STRING:str + WSTRING:ws + STRUCT:st + UNION:u + TIME:t + TIME_OF_DAY:tod + DATE_AND_TIME...

    C#中引用类型和值类型

    + int[]:class TestInt{static void Main(){int[] array = {1,2,3,4,5};for(int i in array){Console.WriteLine(i);}}} * 委托: + delegate double MatchAction(double num); class DelegateTest{static double...

    javaenum(枚举)使用详解+总结.pdf

    Java中的枚举(enum)是自JDK 1.5版本引入的一种强大的类型,它用于定义一组相关的常量。枚举在Java中被设计为一种特殊的类,它们默认继承自`java.lang.Enum`抽象类,并且是单继承的,这意味着它们无法再继承其他类...

    对比MySQL中int、char以及varchar的性能

    这种方法不仅节省存储空间,而且在可读性、可维护性和可扩展性上优于int和enum。例如,用'Y'和'N'代替1和0来表示布尔值,可以提供更好的语义理解。此外,如果需要添加新的枚举值,char类型允许在不更改原有数据结构...

    (中文)HART命令一览.doc

    + 字节 7:(前五个 bit)设备硬件版本号(后三个 bit)物理信号类型(Enum) + 字节 8:设备标志 + 字节 9-11:设备 ID 号 Hart 命令 1: 读主变量(PV)以浮点类型返回主变量的值。 * 请求:无 * 响应: + ...

    enum_strings:c ++ enum可转换为和从字符串

    枚举字符串C++ 枚举可与字符串相互转换。...int main (){// get enum sizestd::cout << " enum size: " << EnumManager>:: size () << std xss=removed xss=removed>:: toString (e);std::co

    Java Enum和String及int的相互转化示例

    Java Enum和String及int的相互转化示例 Java Enum和String及int的相互转化示例是Java编程语言中的一种重要技术,通过本文,我们将详细介绍Java Enum和String及int的相互转化示例,并提供详细的示例代码,帮助大家更...

    shopex 商城数据表整体结构

    - **cat_id**(`int(10)`):商品所属分类的ID。 - **type_id**(`int(10)`):商品所属类型的ID。 - **brand_id**(`mediumint(8) unsigned`):商品品牌的ID。 - **image_default**(`longtext`):商品默认展示的...

    Options:有时在某些情况下,您想在OptionSet中使用Enum或希望Enum由Raw的Int类型支持,但同时也具有String标签

    选项Swift软件包,用于更强大的Enum类型。 目录介绍特征安装用法设置一个MappedValueRepresentable枚举使用MappedValueCollectionRepresented 使用MappedEnum类型的可编码枚举在ESet中使用OptionSet中的EnumSet 将...

    Python库 | enum-0.4.7.tar.gz

    2. `IntEnum`: 这是`Enum`的一个子类,它的成员默认是整数值。这意味着`IntEnum`的成员可以进行数学运算,同时保持枚举的特性。 3. `Flag`: 用于创建位标志枚举,即可以将多个枚举成员组合在一起的枚举。这对于表示...

    article-018-magic-enum:魔术枚举的示例代码

    在IT行业中,枚举(Enum)是一种非常常见且实用的数据类型,它用于定义一组相关的命名常量。在C++编程语言中,枚举提供了一种更安全、更易读的方式来代替整数常量。本文将深入探讨“魔术枚举”(Magic Enum),这是...

    Java se 面试题.docx

    - `int k = i + ++i * i++`:`i`现在是3,`++i`使`i`变成4,然后计算`4 * 4++`,`4 * 4`的结果是16,此时`i`自增变为5,所以`k`的最终值是`4 + 16`,即20。 然而,代码中的`k`打印出来是11,这表明在`k`的计算过程...

    wise_enum:C ++的反射枚举实现

    wise_enum 因为反思使你明智,而不是聪明 wise_enum是C ++ wise_enum的独立智能枚举库。 它支持C ++中的智能枚举类所期望的所有标准功能: ... // Equivalent to enum class MoreColor : int64_t {BLUE, BLACK = 1

    static_enum:C ++ static_enum

    static_enum::get_enumerators创建具有所有枚举值(按值排序)的std::array<Enum> static_enum::enum_cast可以像static_cast一样用于将枚举转换为字符串或从枚举创建字符串 static_enum::to_string从枚举变量获取...

    fastenum:Python 3.4 的 Enum 实现的更快版本

    enum_simple是标准库 Enum 的一个小编辑,并且与它完美兼容。 fastenum是一种简化的实现,速度明显更快,但与标准 Enum 不兼容。 安装使用 pip install git+https://github.com/ze-phyr-us/fastenum import enum_...

    magic_enum:现代C ++的枚举(从字符串到字符串,从迭代到迭代)的静态反射,可与任何枚举类型一起使用,而无需任何宏或样板代码

    `magic_enum`是一个开源库,专门用于C++中枚举类型的静态反射,它提供了一种在编译时处理枚举值的方法,而无需使用宏或额外的样板代码。这个库在C++17标准及更高版本中运行良好,是纯头文件库,这意味着只需要包含一...

    stdint.h和inttypes.h头文件

    可以用来解决VC不包含stdint.h头文件问题,网上下载的很多工程项目经常使用的两个头文件inttypes.h stdint.h,将头文件放到(以VS2008为例):C:\Program Files\Microsoft Visual Studio 9.0\VC\include(装在C盘时...

    magic-enum-0.8.0

    OpenNI2 SDK for ROS2奥比中光深度摄像头驱动安装

    JavaSE 面试题 (2).docx

    public enum Singleton2 { INSTANCE; } ``` - 静态内部类方式:延迟加载,线程安全。 ```java public class Singleton3 { private static class SingletonHolder { public static final Singleton3 ...

    Java中的Enum的使用与分析

    ### Java中的Enum使用与分析 #### 一、概述 在Java编程语言中,`Enum`是一种特殊的数据类型,用于表示一组固定的常量值。通过枚举(`enum`),开发者可以更加直观地定义一系列预定义的值,并为这些值提供类型安全...

Global site tag (gtag.js) - Google Analytics