How PreparedStatement Works

geeksun



Databases have a tough job. They accept SQL queries from many clients concurrently and execute the queries as efficiently as possible against the data. Processing statements can be an expensive operation, but databases are now written in such a way that this overhead is minimized. However, these optimizations need assistance from application developers if we are to capitalize on them. This article shows how the correct use of PreparedStatements can significantly help a database perform these optimizations.

How does a database execute a statement?

Obviously, don't expect a lot of detail here; we'll only examine the aspects important to this article. When a database receives a statement, the database engine first parses the statement and looks for syntax errors. Once the statement is parsed, the database needs to figure out the most efficient way to execute it. This can be computationally quite expensive. The database checks which indexes, if any, can help, or whether it should do a full read of all rows in a table. Databases use statistics on the data to figure out the best way. Once the query plan is created, it can be executed by the database engine.

It takes CPU power to do the access plan generation. Ideally, if we send the same statement to the database twice, then we'd like the database to reuse the access plan for the first statement. This uses less CPU than if it regenerated the plan a second time.

Statement Caches

Databases are tuned to avoid this repeated work, and most include some kind of statement cache. This cache uses the statement itself as a key, and the access plan is stored in the cache with the corresponding statement. This allows the database engine to reuse the plans for statements that have been executed previously. For example, if we send the database a statement such as "select a,b from t where c = 2", then the computed access plan is cached. If we send the same statement later, the database can reuse the previous access plan, thus saving CPU power.
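As a rough illustration of the idea, here is a toy model of such a cache in Java. It is only a sketch: the class name PlanCacheSketch, the planFor method and the "plan#N" strings are made up for this example, and a real database engine is far more sophisticated about what it stores and when it evicts.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a database statement cache: SQL text -> access plan. */
public class PlanCacheSketch {
    private final Map<String, String> plans = new HashMap<>();
    private int compilations = 0;

    /** Returns the cached plan for this exact SQL text, compiling one on a miss. */
    public String planFor(String sql) {
        return plans.computeIfAbsent(sql, text -> {
            compilations++;                 // stands in for the expensive optimizer work
            return "plan#" + compilations;  // hypothetical plan representation
        });
    }

    /** How many times a plan had to be computed from scratch. */
    public int compilations() {
        return compilations;
    }

    public static void main(String[] args) {
        PlanCacheSketch cache = new PlanCacheSketch();
        cache.planFor("select a,b from t where c = 2");
        cache.planFor("select a,b from t where c = 2"); // same text: plan reused
        System.out.println(cache.compilations());       // prints 1
    }
}
```

The second call finds the plan under the same key, so the optimizer stand-in runs only once.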

Note, however, that the entire statement is the key. For example, if we later sent the statement "select a,b from t where c = 3", it would not find an access plan. This is because the text "c = 3" differs from the "c = 2" in the cached statement. So, for example:

for (int i = 0; i < 1000; ++i)
{
        // The value is concatenated into the SQL, so every iteration sends a different statement text.
        PreparedStatement ps = conn.prepareStatement("select a,b from t where c = " + i);
        ResultSet rs = ps.executeQuery();
        rs.close();
        ps.close();
}

Here the cache won't be used. Each iteration of the loop sends a different SQL statement to the database. A new access plan is computed for each iteration and we're basically throwing CPU cycles away using this approach. However, look at the next snippet:

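// Prepared once; the statement text is identical for every iteration.
PreparedStatement ps = conn.prepareStatement("select a,b from t where c = ?");
for (int i = 0; i < 1000; ++i)
{
        ps.setInt(1, i);  // bind the value for the first '?' marker
        ResultSet rs = ps.executeQuery();
        rs.close();
}
ps.close();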

Here it will be much more efficient. The statement sent to the database is parameterized using the '?' marker in the SQL. This means every iteration sends the same statement to the database, with different parameters for the "c = ?" part. This allows the database to reuse the access plan for the statement and makes the program execute more efficiently inside the database. This basically lets your application run faster or makes more CPU available to other users of the database.
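The difference between the two loops can be made concrete without a database by simply counting the distinct statement texts each one produces. The sketch below assumes, as this article does, a cache keyed on the exact statement text; some databases can normalize literals themselves, so treat the numbers as an illustration of the principle rather than a measurement. The class and method names are invented for the example.

```java
import java.util.HashSet;
import java.util.Set;

/** Counts how many distinct SQL texts (cache keys) each coding style produces. */
public class DistinctStatementTexts {

    /** Value concatenated into the SQL: one new statement text per value. */
    static int concatenated(int iterations) {
        Set<String> texts = new HashSet<>();
        for (int i = 0; i < iterations; ++i) {
            texts.add("select a,b from t where c = " + i);
        }
        return texts.size();
    }

    /** Value bound through a '?' marker: the statement text never changes. */
    static int parameterized(int iterations) {
        Set<String> texts = new HashSet<>();
        for (int i = 0; i < iterations; ++i) {
            texts.add("select a,b from t where c = ?"); // the value would travel via setInt
        }
        return texts.size();
    }

    public static void main(String[] args) {
        System.out.println(concatenated(1000));  // prints 1000
        System.out.println(parameterized(1000)); // prints 1
    }
}
```

A thousand keys means up to a thousand plan computations; one key means one.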

PreparedStatements and J2EE servers

Things can get more complicated when we use a J2EE server. Normally, a prepared statement is associated with a single database connection. When the connection is closed, the PreparedStatement is discarded. A fat client application would typically get a database connection and then hold it for its lifetime. It would also create all of its prepared statements either eagerly or lazily. Eagerly means that they are all created at once when the application starts. Lazily means that they are created as they are used. An eager approach gives a delay when the application starts, but once it has started it performs optimally. A lazy approach gives a fast start, but as the application runs, each prepared statement is created when it is first used. This gives uneven performance until all statements are prepared, but the application eventually settles and runs as fast as the eager application. Which is best depends on whether you need a fast start or even performance.
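A minimal sketch of the two strategies follows. StatementRegistry is a hypothetical helper, not part of JDBC, and the prepare function stands in for a call such as conn.prepareStatement(sql) on the long-lived connection (a real version would also have to deal with SQLException). The handle type is generic so the sketch runs without a database.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Holds prepared handles for one long-lived connection, created eagerly or lazily. */
public class StatementRegistry<S> {
    private final Map<String, S> prepared = new HashMap<>();
    private final Function<String, S> prepare; // stand-in for the driver's prepare call

    public StatementRegistry(Function<String, S> prepare) {
        this.prepare = prepare;
    }

    /** Eager: pay the whole preparation cost up front, at application start. */
    public void prepareAll(List<String> sqlTexts) {
        for (String sql : sqlTexts) {
            get(sql);
        }
    }

    /** Lazy: prepare on first use, then keep returning the same handle. */
    public S get(String sql) {
        return prepared.computeIfAbsent(sql, prepare);
    }

    public int size() {
        return prepared.size();
    }

    public static void main(String[] args) {
        int[] prepareCalls = {0};
        StatementRegistry<String> lazy = new StatementRegistry<>(sql -> {
            prepareCalls[0]++;
            return "handle:" + sql; // fake handle, for illustration only
        });
        lazy.get("select a,b from t where c = ?");
        lazy.get("select a,b from t where c = ?");
        System.out.println(prepareCalls[0]); // prints 1
    }
}
```

Calling prepareAll at startup gives the eager behavior; skipping it and calling get on demand gives the lazy one. Either way each statement is prepared once per connection lifetime.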

The problem with a J2EE application is that it can't work like this. It only keeps a connection for the duration of the request. This means that it must create the prepared statements every time the request is executed. This is not as efficient as the fat client approach where the prepared statements are created once, rather than on every request. J2EE vendors have noticed this and designed connection pooling to avoid this performance disadvantage.

When the J2EE server gives your application a connection, it isn't giving you the actual connection; you're getting a wrapper. You can verify this by looking at the name of the class of the connection you are given. It won't be a database JDBC connection; it'll be a class created by your application server. Normally, if you call close on a connection, the JDBC driver closes the connection. In a J2EE application, we want the connection to be returned to the pool when close is called. We do this by making a proxy JDBC connection class that looks like a real connection and holds a reference to the actual connection. When we invoke most methods on the connection, the proxy forwards the call to the real connection. But when we call close, instead of calling close on the real connection, the proxy simply returns the connection to the connection pool and then marks itself as invalid, so that if the application uses it again we'll get an exception.
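The wrapper described above can be sketched with a JDK dynamic proxy over the java.sql.Connection interface. This is an illustration under assumptions, not how any particular application server is implemented: Pool, wrap, fakePhysical and demo are names invented here, and the "physical" connection is a fake that only records method names so that the example runs without a database or driver.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class PooledConnectionSketch {
    /** Idle physical connections waiting to be handed out again (hypothetical pool). */
    static final class Pool {
        final Deque<Connection> idle = new ArrayDeque<>();
    }

    /** Wraps a physical connection in a proxy whose close() returns it to the pool. */
    static Connection wrap(Connection physical, Pool pool) {
        InvocationHandler handler = new InvocationHandler() {
            private boolean valid = true;

            @Override
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                if (method.getDeclaringClass() == Object.class) {
                    return method.invoke(this, args);     // toString/equals/hashCode
                }
                if (method.getName().equals("close")) {
                    if (valid) {
                        valid = false;                    // this wrapper is now dead
                        pool.idle.push(physical);         // the real connection stays open
                    }
                    return null;
                }
                if (!valid) {
                    throw new SQLException("connection was already returned to the pool");
                }
                try {
                    return method.invoke(physical, args); // forward everything else
                } catch (InvocationTargetException e) {
                    throw e.getCause();                   // surface the driver's own exception
                }
            }
        };
        return (Connection) Proxy.newProxyInstance(
                PooledConnectionSketch.class.getClassLoader(),
                new Class<?>[] {Connection.class}, handler);
    }

    /** Fake physical connection: records method names; only safe for void methods. */
    static Connection fakePhysical(List<String> log) {
        return (Connection) Proxy.newProxyInstance(
                PooledConnectionSketch.class.getClassLoader(),
                new Class<?>[] {Connection.class},
                (proxy, method, args) -> { log.add(method.getName()); return null; });
    }

    /** Runs the scenario from the text and summarizes what happened. */
    static String demo() {
        List<String> calls = new ArrayList<>();
        Pool pool = new Pool();
        Connection c = wrap(fakePhysical(calls), pool);
        String outcome;
        try {
            c.commit();   // forwarded to the physical connection
            c.close();    // intercepted: physical connection goes back to the pool
            c.commit();   // wrapper is invalid now, so this throws
            outcome = "accepted";
        } catch (SQLException e) {
            outcome = "rejected";
        }
        return calls + " " + pool.idle.size() + " " + outcome;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [commit] 1 rejected
    }
}
```

Note that close never reaches the fake physical connection: only commit is recorded, the connection sits in the pool, and the stale wrapper refuses further work.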

Wrapping is very useful, as it also helps J2EE application server implementers add support for prepared statements in a sensible way. When an application calls Connection.prepareStatement, the driver returns a PreparedStatement object. The application keeps that handle while it has the connection and closes it before it closes the connection when the request finishes. However, after the connection is returned to the pool and later reused by the same or another application, we ideally want the same PreparedStatement to be returned to the application.

J2EE PreparedStatement Cache

The J2EE PreparedStatement cache is implemented using a cache inside the J2EE server's connection pool manager. The J2EE server keeps a list of prepared statements for each database connection in the pool. When an application calls prepareStatement on a connection, the application server checks whether that statement was previously prepared. If it was, the PreparedStatement object will be in the cache and will be returned to the application. If not, the call is passed to the JDBC driver and the query/PreparedStatement pair is added to that connection's cache.

We need a cache per connection because that's the way JDBC drivers work. Any PreparedStatements returned are specific to that connection.

If we want to take advantage of this cache, the same rules apply as before. We need to use parameterized queries so that they will match ones already prepared in the cache. Most application servers will allow you to tune the size of this prepared statement cache.
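A per-connection cache with a tunable size could look roughly like the sketch below. It assumes a least-recently-used eviction policy, which is a common choice but not something this article specifies, and ConnectionStatementCache, driverPrepare and demoDriverCalls are hypothetical names. The driver call is modeled as a plain function and the handle type is generic so the example runs without a database.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Size-bounded, least-recently-used cache of prepared handles for ONE pooled connection. */
public class ConnectionStatementCache<S> {
    private final Map<String, S> cache;
    private final Function<String, S> driverPrepare; // stand-in for the JDBC driver call

    public ConnectionStatementCache(int maxSize, Function<String, S> driverPrepare) {
        this.driverPrepare = driverPrepare;
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.cache = new LinkedHashMap<String, S>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, S> eldest) {
                return size() > maxSize; // a real server would also close the evicted statement
            }
        };
    }

    /** Returns the cached handle for this SQL text, or asks the driver on a miss. */
    public S prepareStatement(String sql) {
        S hit = cache.get(sql);
        if (hit != null) {
            return hit;
        }
        S fresh = driverPrepare.apply(sql);
        cache.put(sql, fresh);
        return fresh;
    }

    public int size() {
        return cache.size();
    }

    /** Cache of size 2, requests A, B, A, C, B: returns how often the driver was called. */
    static int demoDriverCalls() {
        int[] driverCalls = {0};
        ConnectionStatementCache<String> c = new ConnectionStatementCache<>(2, sql -> {
            driverCalls[0]++;
            return "handle:" + sql;
        });
        c.prepareStatement("A");
        c.prepareStatement("B");
        c.prepareStatement("A"); // hit
        c.prepareStatement("C"); // evicts B, the least recently used
        c.prepareStatement("B"); // miss again
        return driverCalls[0];
    }

    public static void main(String[] args) {
        System.out.println(demoDriverCalls()); // prints 4
    }
}
```

If the cache is smaller than the working set of statements, entries keep getting evicted and re-prepared, which is why the size setting is worth tuning.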

Summary

In conclusion, we should use parameterized queries with prepared statements. This reduces the load on the database by allowing it to reuse access plans that were already prepared. This cache is database-wide so if you can arrange for all your applications to use similar parameterized SQL, you will improve the efficiency of this caching scheme as an application can take advantage of prepared statements used by another application. This is an advantage of an application server because logic that accesses the database should be centralized in a data access layer (either an OR-mapper, entity beans or straight JDBC).

Finally, the correct use of prepared statements also lets you take advantage of the prepared statement cache in the application server. This improves the performance of your application as the application can reduce the number of calls to the JDBC driver by reusing a previous prepared statement call. This makes it competitive with fat clients efficiency-wise and removes the disadvantage of not being able to keep a dedicated connection.

If you use parameterized prepared statements, you improve the efficiency of the database and your application server hosted code. Both of these improvements will allow your application to improve its performance.


2009-12-13 13:01