`
Hooopo
  • 浏览: 335207 次
  • 性别: Icon_minigender_1
  • 来自: 北京
社区版块
存档分类
最新评论

Using indexes in rails: Index your associations

阅读更多


Many rails developers are great at building applications but have limited experience in database design. As a consequence, projects often have half-baked indexing strategies, and as a result suffer bad performance.

To try and improve this I’ve planned a series of posts on indexes, targetted at rails developers. In this first post I’ll introduce indexes and how to index your associations, then I’ll write about choosing additional indexes to improve query performance, and finally how to avoid redundant and duplicate indexes.

A brief overview of database indexes

Wikipedia states that ‘a database index is a data structure that improves the speed of operations on a database table’. Unfortunately, this improvement comes at a cost.

For every index on a table, there is a penalty both when inserting and updating rows. Indexes also take up space on disk and in memory, which can affect the efficiency of queries. Finally, having too many indexes can cause databases to choose between them poorly, actually harming performance rather than improving it.

So while indexing is important, we shouldn’t just throw indexes at our slow queries: we need to choose carefully how to index our data.

Indexing simple associations

By far the most common performance problem I’ve encountered in rails projects is a lack of indexes on foreign keys. There’s no real excuse for this - not indexing foreign keys can cripple your app.

Take the following schema:

create_table users do |table|
  table.string :login
end

create_table conversations do |table|
  table.string  :subject, :null => false
  table.integer :user_id, :null => false
end

We can use this to map a one-to-many relationship between users and conversations, where user_id as the foreign key.

Here are the models to do that:

class User < ActiveRecord::Base
  has_many :conversations
end

class Conversation < ActiveRecord::Base
  belongs_to :user
end

With these models, to find all conversations for a particular user we’d use user.conversations, which in turns uses sql like this:

SELECT * FROM conversations WHERE user_id = 41;

I can run this query on a test database which I’ve randomly populated with 1,000,000 rows, to see how long it takes. Note, I’ve cut out the actual results as they are unimportant:

mysql> SELECT * FROM conversations WHERE user_id = 41;
12 rows in set (1.42 sec)

mysql> EXPLAIN SELECT * FROM conversations WHERE user_id = 41;
+-------------+------+---------------+---------+-------+---------+-------------+
| select_type | type | key           | key_len | ref   | rows    | Extra       |
+-------------+------+---------------+---------+-------+---------+-------------+
| SIMPLE      | ALL  | NULL          | NULL    | NULL  | 1001111 | Using where | 
+-------------+------+---------------+---------+-------+---------+-------------+
1 row in set (0.00 sec)

Although the query is simple, it took 1.42 seconds. The key column show the key or index that MySQL decided to use, in this case NULL as there are no indexes. The rows column is also relevant. It shows that MySQL will need to look at around 1,000,000 rows; that’s a lot of data being loaded and compared.

What a difference just an index makes

If we then add an index on user_id:

add_index :conversations, :user_id, :name => 'user_id_ix'

And do the same select:

mysql> SELECT * FROM conversations WHERE user_id = 41;
12 rows in set (0.01 sec)

mysql> EXPLAIN SELECT * FROM conversations WHERE user_id = 41;
+-------------+------+---------------+---------+-------+---------+-------------+
| select_type | type | key           | key_len | ref   | rows    | Extra       |
+-------------+------+---------------+---------+-------+---------+-------------+
| SIMPLE      | ref  | used_id_ix    | 5       | const |  108    | Using where | 
+-------------+------+---------------+---------+-------+---------+-------------+
1 row in set (0.00 sec)

The difference is remarkable. From over 1.4 seconds to about 1 hundredth. Unless you have a cast-iron reason not to, index your foreign keys.

Indexing polymorphic associations

So for simple associations, we can add an index on the foreign_key column. For polymorphic associations the foreign key is made up of two columns, one for the id and one for the type. Let’s add another association to our models to illustrate this.

add_column :conversations, :subject_id, :integer
add_column :conversations, :subject_type, :string

create_table :artworks do |table|
  table.string :title
end

class Artwork < ActiveRecord::Base
  has_one :conversation, :as => :subject
end

class Conversation < ActiveRecord::Base
  belongs_to :subject, :polymorphic => true
end

Here we’ve added an association between Artwork and Conversation, where an artwork can be the subject of a conversation. From an artwork, we can find the related conversation (if any) withartwork.conversation which will use the following SQL:

SELECT * FROM conversations WHERE subject_id = 196 and subject_type = 'Artwork';

Again the query takes around 1.4 seconds without any indexes. Now though, we have a choice on what to index. We can index either subject_type on its own, subject_id on its own, or both together.

Let’s try each in turn, and measure the performance.

First, an index on just subject_type:

  
mysql> SELECT * FROM conversations WHERE subject_id = 196 and subject_type = 'Artwork';
12 rows in set (0.31 sec)

mysql> EXPLAIN SELECT * FROM conversations WHERE subject_id = 196 and subject_type = 'Artwork'
+-------------+------+---------------+---------+-------+---------+-------------+
| select_type | type | key           | key_len | ref   | rows    | Extra       |
+-------------+------+---------------+---------+-------+---------+-------------+
| SIMPLE      | ref  | sub_type_ix   | 5       | const | 89511   | Using where | 
+-------------+------+---------------+---------+-------+---------+-------------+
1 row in set (0.00 sec)

An index on just subject_id:

  
mysql> SELECT * FROM conversations WHERE subject_id = 196 and subject_type = 'Artwork';
12 rows in set (0.01 sec)

mysql> EXPLAIN SELECT * FROM conversations WHERE subject_id = 196 and subject_type = 'Artwork'
+-------------+------+---------------+---------+-------+---------+-------------+
| select_type | type | key           | key_len | ref   | rows    | Extra       |
+-------------+------+---------------+---------+-------+---------+-------------+
| SIMPLE      | ref  | sub_id_ix     | 5       | const | 204     | Using where | 
+-------------+------+---------------+---------+-------+---------+-------------+
1 row in set (0.00 sec)

An index on subject_id, subject_type:

mysql> SELECT * FROM conversations WHERE subject_id = 196 and subject_type = 'Artwork';
12 rows in set (0.01 sec)

mysql> EXPLAIN SELECT * FROM conversations WHERE subject_id = 196 and subject_type = 'Artwork'
+-------------+------+--------------------+---------+-------+---------+-------------+
| select_type | type | key                | key_len | ref   | rows    | Extra       |
+-------------+------+--------------------+---------+-------+---------+-------------+
| SIMPLE      | ref  | sub_id_and_type_ix | 5       | const | 5       | Using where | 
+-------------+------+--------------------+---------+-------+---------+-------------+
1 row in set (0.00 sec)

So subject_type compared ~90,000 rows in 0.31 seconds, subject_id compared ~200 rows in 0.01 seconds and subject_id, subject_type compared 4 rows also in 0.01 seconds. We should add an index tosubject_id, subject_type as so:

add_index :conversations, [:subject_id, :subject_type]

Wrapping up

This should give a basic overview of indexes and the performance improvements they can give. Hopefully I’ve shown that foreign_keys should always be indexed, and how to index them. The next article (which I hope to publish later this week) will explain more about how to reason about indexes, and how to identify additional indexes (beyond those on foreign keys) to add.

via:http://tomafro.net/2009/08/using-indexes-in-rails-index-your-associations

http://tomafro.net/2009/08/using-indexes-in-rails-choosing-additional-indexes
分享到:
评论

相关推荐

    windows下python 3.9 Numpy scipy和matlabplot的安装教程详解

    学习python过程中想使用python的matlabplot绘图功能,遇到了一大批问题,然后一路过关斩将,最终安装成功,实为不易,发帖留念。 1 首先打开cmd win+r 2 pip安装 pip3 install –user numpy scipy matplotlib ...

    解决pip install psycopg2出错问题

    pip install --index-url=https://mirrors.aliyun.com/pypi/simple/ psycopg2 ``` 7. **Python虚拟环境**: 在Python项目中使用虚拟环境(如`venv`或`virtualenv`)可以避免全局安装冲突和依赖问题。创建并激活...

    动易6.6数据库结构文档

    - **Indexes**: - 主键索引:AdminID **2. PE_Advertisement** - **Columns**: - AdID:广告ID,主键 - Title:广告标题 - Description:广告描述 - ImgUrl:图片URL - LinkUrl:链接地址 - AdZoneID:...

    Rails.Angular.Postgres.and.Bootstrap.2nd.Edition

    See how creating an advanced Postgres index for a case-insensitive search speeds up your back end - enabling you to create a dynamic user experience using Angular 2. Create reusable components that ...

    Expert.Oracle.Indexing.and.Access.Paths.2nd.epub

    Recognize when queries aren’t using indexes as you intend. Manage your indexing for maximum performance. Confidently use the In Memory column store feature as an alternate access path to improve ...

    mongodb-indexes-app:简单的NodeJS应用程序,以学习如何在MongoDB中使用索引

    userSchema.index({ name: 1, email: 1 }, { unique: true }); ``` 这里,我们在`name`和`email`字段上创建了升序索引,并且`email`字段的索引设置为唯一。`1`表示升序,`-1`表示降序。 一旦模型定义好,我们可以...

    hashing-indexes-tikz:LaTeX软件包,用于使用TikZ绘制可扩展的哈希索引和线性哈希索引

    哈希索引是一种高效的数据检索方法,而`hashing-indexes-tikz`是一个专门针对LaTeX用户设计的软件包,旨在帮助用户使用TikZ(一个强大的LaTeX图形库)来绘制出清晰、专业的可扩展哈希索引和线性哈希索引的示意图。...

    clean-indexes-es:Lambda函数从您的ElasticSearch Service清除旧索引或维护许多索引

    zip your_script_zip_name.zip cleanup.py 安装要求以正确启动该功能。 路径是必要的。 pip3 install -r requirements.txt -t python/lib/python3.8/site-packages/ 压缩包含所有库的文件夹。 zip -r your_layer_...

    Major-World-Stock-Market-Indexes-data:在这里,我从url下载了html页面的文本,并对其进行了处理以获取当前的世界股票市场指数值

    for data in index_data: name = data.find('span', class_='index-name').text value = float(data.find('span', class_='index-value-num').text.replace(',', '')) date = data.find('span', class_='index-...

    rails3 和 thinking_sphinx安装

    Rails3 是 Ruby on Rails 框架的第三个主要版本,它在灵活性、性能和社区支持方面都有显著提升。Thinking Sphinx 是一个为 Rails 应用程序提供全文搜索功能的插件,它将数据库中的数据与 Sphinx 搜索引擎紧密结合,...

    b+树,learning index

    ## A C++11 implementation of the B-Tree part of "The Case for Learned Index Structures" A research **proof of concept** that implements the B-Tree section of [The Case for Learned Index Structures]...

    java相关软件

    - **修改表添加主键并创建唯一索引**:`ALTER TABLE table_name ADD CONSTRAINT pk_column PRIMARY KEY (column_name) USING INDEX;` ### 索引维护 为了保持索引的有效性和性能,还需要定期对索引进行维护: - **...

    Programming in C++ for Engineering and Science

    Programming in C++ for Engineering and Science By 作者: Larry Nyhoff ISBN-10 书号: 1439825343 ISBN-13 书号: 9781439825341 Edition 版本: 1 Release Finelybook 出版日期: 2012-08-03 pages 页数: (744 ) $...

    .A.Practical.Guide.to.TPM.2.0

    A Practical Guide to TPM 2.0: Using the Trusted Platform Module in the New Age of Security is a straight-forward primer for developers. It shows security and TPM concepts, demonstrating their use in ...

    微软内部资料-SQL性能优化5

    As we’ll see, the clustered key is duplicated in every nonclustered index row, so keeping your clustered key small will allow you to have more index fit per page in all your indexes. Note The ...

    A Practical Guide to TPM 2.0

    Chapter 11: NV Indexes Chapter 12: Platform Configuration Registers Chapter 13: Authorizations and Sessions Chapter 14: Extended Authorization (EA) Policies Chapter 15: Key Management Chapter 16: ...

    Apache Hadoop---Phoenix.docx

    - **Functional Indexes**:允许基于任何表达式创建索引,查询时直接返回表达式结果。 - **Global Indexes**:适用于读取密集型应用,但写入时可能造成较高的性能开销。 - **Local Indexes**:更适合写操作频繁的...

Global site tag (gtag.js) - Google Analytics