In networking, one of the most expensive things you can do is make a server roundtrip. In JDBC, each database call is a roundtrip. If you are doing inserts, updates, or deletes, you can use executeBatch() to cut down the number of trips to the server. Unfortunately, there is no such built-in mechanism for select queries.
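For reference, a minimal sketch of what DML batching looks like with addBatch()/executeBatch(). The users table, its columns, and the usersToInsert collection are illustrative assumptions, not part of the original example:

// Minimal sketch of DML batching; table, columns, and usersToInsert are illustrative.
PreparedStatement insert = conn.prepareStatement(
    "insert into users (id, name) values (?, ?)");
for (User user : usersToInsert) {
    insert.setInt(1, user.getId());
    insert.setString(2, user.getName());
    insert.addBatch();                    // queue the insert on the client
}
int[] counts = insert.executeBatch();     // send all queued inserts together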
Suppose you want to get the names for a given list of ids. Logically, we would want to do something like:
PreparedStatement stmt = conn.prepareStatement(
    "select id, name from users where id in (?)");
stmt.setString(1, "1,2,3");
However, this will not work. JDBC only lets you bind a single literal value to each "?". You cannot substitute text that is meant to become part of the SQL statement itself. This is necessary because if the SQL itself could change, the driver could not precompile the statement. It also has the nice side effect of preventing SQL injection attacks.
Instead, you have four options for the implementation:
- Run the query separately for each id
- Build one query to do everything
- Use a stored procedure
- Select batching
Option 1: Run the query separately for each id
If you have 100 ids, this option results in 100 database calls.
PreparedStatement stmt = conn.prepareStatement(
    "select id, name from users where id = ?");
for (int i = 0; i < ids.length; i++) {
    stmt.setInt(1, ids[i]);   // or whatever values you are trying to query by
    // execute statement and get result
}
This option is the easiest to code, but results in the slowest performance.
Option 2: Build one query to do everything
At runtime, you use a loop to build a statement like this:
PreparedStatement stmt = conn.prepareStatement(
    "select id, name from users where id in (?, ?, ?)");
stmt.setInt(1, ids[0]);
stmt.setInt(2, ids[1]);
stmt.setInt(3, ids[2]);
This solution is the next easiest to code and solves the server roundtrip problem. Unfortunately, the prepared statement must be recompiled each time you call it with a different number of parameters, because statement caching matches on the exact SQL text. So if I call it with ten ids, then three ids, then 100 ids, there will be three prepared statements in the cache. Aside from the added time to recompile the prepared statement, other queries may be bumped from the pool of cached statements, causing them to be recompiled again too. Finally, this solution can take longer for very large queries if memory is exceeded and paging to disk occurs.
Another variant of this approach is to hard code the values in the statement:
PreparedStatement stmt = conn.prepareStatement(
    "select id, name from users where id in (1, 2, 3)");
This approach is even worse, as there is almost no chance of the statement being reused. At least with the "?", the statement will be reused for queries with the same number of parameters.
Still another variant of this approach is to send multiple statements:
PreparedStatement stmt = conn.prepareStatement(
    "select id, name from users where id in (?); " +
    "select id, name from users where id in (?); " +
    "select id, name from users where id in (?)");
stmt.setInt(1, ids[0]);
stmt.setInt(2, ids[1]);
stmt.setInt(3, ids[2]);
This approach has the advantage that the database sees the same statement text each time, so it does not need to figure out the execution plan again. However, from the driver's point of view the SQL still differs with the number of ids, so the prepared statement must still be prepared and stored in the cache for each variation. Also, note that not all databases support sending multiple statements separated by semicolons.
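If your driver does allow multiple statements, reading the results looks roughly like this sketch (the exact behavior of multi-statement execution is driver-specific):

boolean hasResultSet = stmt.execute();     // true if the first result is a ResultSet
while (hasResultSet) {
    ResultSet rs = stmt.getResultSet();
    while (rs.next()) {
        int id = rs.getInt("id");
        String name = rs.getString("name");
        // collect the row
    }
    hasResultSet = stmt.getMoreResults();  // advance to the next ResultSet, if any
}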
Option 3: Use a stored procedure
A stored procedure executes on the database server. Therefore, it can make many queries without any network overhead. The stored procedure can collect all the results and return them in one trip. This has the advantage of being the fastest solution. However, it ties you to one database and requires you to split logic between the application server and database server. If your architecture already uses stored procedures, this is typically the best solution.
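As a rough sketch, calling such a procedure from JDBC might look like the following. The procedure name get_user_names, its parameter, and how it returns results are hypothetical and depend entirely on your database:

// Hypothetical procedure taking a comma-separated id list and returning ids and names.
CallableStatement call = conn.prepareCall("{call get_user_names(?)}");
call.setString(1, "1,2,3");
ResultSet rs = call.executeQuery();
while (rs.next()) {
    int id = rs.getInt("id");
    String name = rs.getString("name");
    // collect the row
}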
Option 4: Select batching
Select batching is a compromise between options one and two. The idea is to predetermine a few batch sizes (numbers of parameters per query) and build a query for each of those sizes. Since there is a finite number of queries involved, you get the advantages of prepared statements (precompilation and not bumping other prepared statements from the cache). Batching multiple values into the same query retains the benefit of minimizing server roundtrips. Finally, you avoid the memory problems of very large queries by capping the batch size. This option is the best approach if you have a performance-critical query that needs batching and you are not using stored procedures. Now, we walk through an example and note the important things to consider.
Example
public static final int SINGLE_BATCH = 1;
public static final int SMALL_BATCH = 4;
public static final int MEDIUM_BATCH = 11;
public static final int LARGE_BATCH = 51;
The first thing to do is pick how many batch sizes you want and what they should be. [Note that in real code, these values should be in a property file, not hard coded. That way, you can experiment and change the batch sizes without recompiling.] Regardless of the actual batch sizes, you need a single batch: a batch of size 1. This lets you handle requests for just one value, as well as the "left over" values at the end of a big query. For the batch sizes, relatively prime numbers work well. In other words, the sizes should not cleanly divide into each other or be cleanly divisible by the same number. This maximizes the number of requests served with the fewest server roundtrips. The number of batch sizes and the exact sizes vary based on your configuration. Note that the largest batch size should not be too big, or you will run into memory issues. Also note that the smallest batch size should be very small; you will use it a lot, since most queries eventually get down to this point.
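As an aside, a minimal sketch of reading the sizes from a property file; the file name batch.properties and the keys are made up for illustration, and the snippet assumes java.util.Properties and java.io imports:

// Illustrative only: file name and keys are not a standard.
Properties props = new Properties();
InputStream in = new FileInputStream("batch.properties");
try {
    props.load(in);
} finally {
    in.close();
}
int smallBatch  = Integer.parseInt(props.getProperty("batch.size.small", "4"));
int mediumBatch = Integer.parseInt(props.getProperty("batch.size.medium", "11"));
int largeBatch  = Integer.parseInt(props.getProperty("batch.size.large", "51"));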
while ( totalNumberOfValuesLeftToBatch > 0 ) {
Everything that follows is done repeatedly until we run out of values.
int batchSize = SINGLE_BATCH;
if (totalNumberOfValuesLeftToBatch >= LARGE_BATCH) {
    batchSize = LARGE_BATCH;
} else if (totalNumberOfValuesLeftToBatch >= MEDIUM_BATCH) {
    batchSize = MEDIUM_BATCH;
} else if (totalNumberOfValuesLeftToBatch >= SMALL_BATCH) {
    batchSize = SMALL_BATCH;
}
totalNumberOfValuesLeftToBatch -= batchSize;
The idea here is to find the largest batch size that does not exceed the number of values left to query. If none of the batches fit, we fall back to the single batch size of 1. As an example of how this works, suppose I do a query for 75 values. First I do a query for 51 values, leaving 24 to query. Then I do a query for 11 values, leaving 13. Since there are still at least 11 values left, I do another query for 11, leaving only 2. That is less than my smallest batch size of 4, so I do two single queries of 1 value each. In total I have made 5 roundtrips and used three prepared statements in the cache, a significant improvement over the 75 roundtrips of querying each value individually! Finally, we need to keep track of how many values are left to query.
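To double-check that arithmetic, this small standalone driver loop prints the batch plan for 75 values using the same selection logic and constants as above:

int valuesLeft = 75;
while (valuesLeft > 0) {
    int batchSize = SINGLE_BATCH;
    if (valuesLeft >= LARGE_BATCH) {
        batchSize = LARGE_BATCH;
    } else if (valuesLeft >= MEDIUM_BATCH) {
        batchSize = MEDIUM_BATCH;
    } else if (valuesLeft >= SMALL_BATCH) {
        batchSize = SMALL_BATCH;
    }
    valuesLeft -= batchSize;
    System.out.println("batch of " + batchSize + ", " + valuesLeft + " left");
}
// prints batches of 51, 11, 11, 1, 1 - five roundtrips in total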
StringBuilder inClause = new StringBuilder();
boolean firstValue = true;
for (int i = 0; i < batchSize; i++) {
    if (firstValue) {
        firstValue = false;
    } else {
        inClause.append(',');
    }
    inClause.append('?');
}
PreparedStatement stmt = conn.prepareStatement(
    "select id, name from users where id in (" + inClause.toString() + ')');
Now we build the actual prepared statement. Since we always build the query the same way, the driver will see identical SQL on later executions with the same batch size. [Note: If you are not on Java 5 yet, replace StringBuilder with StringBuffer so this compiles.] It is important to select the id we are querying on as well, so that we can later determine which name goes with which id.
for (int i = 0; i < batchSize; i++) {
    stmt.setInt(i + 1, ids[i]);   // JDBC parameters are 1-based; use whatever values you are querying by
}
Set the proper number of values to query by. It is fine to include other search criteria in the query; just put those parameters before or after the in clause parameters and keep track of the current parameter index.
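For instance, a sketch with one extra criterion placed before the in clause; the status column and its value are hypothetical additions for illustration:

// Hypothetical extra criterion (status) placed before the in clause parameters.
PreparedStatement stmt = conn.prepareStatement(
    "select id, name from users where status = ? and id in ("
        + inClause.toString() + ')');
stmt.setString(1, "ACTIVE");              // parameter 1 is the extra criterion
for (int i = 0; i < batchSize; i++) {
    stmt.setInt(i + 2, ids[i]);           // in clause parameters now start at index 2
}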
From this point, you just execute the query and read the results normally. Even on the first try, you should see a performance improvement over options 1 and 2. It will take some tweaking to find the batch sizes that work best for your situation.
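Putting the pieces together, here is one way the whole technique might look as a single method. The method and variable names, the Map return type, and the omitted error handling are my own choices for the sketch, not the only way to do it; it assumes java.sql and java.util imports and the batch size constants defined earlier:

// One possible consolidated sketch of select batching; names are illustrative.
public Map<Integer, String> findNames(Connection conn, List<Integer> ids) throws SQLException {
    Map<Integer, String> names = new HashMap<Integer, String>();
    int processed = 0;
    int totalNumberOfValuesLeftToBatch = ids.size();
    while (totalNumberOfValuesLeftToBatch > 0) {
        int batchSize = SINGLE_BATCH;
        if (totalNumberOfValuesLeftToBatch >= LARGE_BATCH) {
            batchSize = LARGE_BATCH;
        } else if (totalNumberOfValuesLeftToBatch >= MEDIUM_BATCH) {
            batchSize = MEDIUM_BATCH;
        } else if (totalNumberOfValuesLeftToBatch >= SMALL_BATCH) {
            batchSize = SMALL_BATCH;
        }
        totalNumberOfValuesLeftToBatch -= batchSize;

        // build "?,?,?" with batchSize placeholders
        StringBuilder inClause = new StringBuilder();
        for (int i = 0; i < batchSize; i++) {
            if (i > 0) {
                inClause.append(',');
            }
            inClause.append('?');
        }
        PreparedStatement stmt = conn.prepareStatement(
            "select id, name from users where id in (" + inClause.toString() + ')');
        for (int i = 0; i < batchSize; i++) {
            stmt.setInt(i + 1, ids.get(processed + i));
        }
        ResultSet rs = stmt.executeQuery();
        while (rs.next()) {
            names.put(rs.getInt("id"), rs.getString("name"));
        }
        rs.close();
        stmt.close();
        processed += batchSize;
    }
    return names;
}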
Disclaimer: As the famous quote goes, "premature optimization is the root of all evil." Select batching should be used to solve a measured performance problem. This is especially important here, since you need the original numbers to confirm that you are actually getting a benefit.