A question I asked myself recently seems like one that should have a quick and painless answer: when should you send the commit command to Solr (or Lucene)? Despite the simplicity of the question, the answer is not clear-cut, at least in my opinion.
To answer it, you have to look at the different ways data gets indexed and at how quickly the data must become available on the slave servers. Looking at the typical implementations I have had the pleasure of working with, we can distinguish the following categories:
Data can be made available only after a total index update
This is the simplest situation, both in theory and in practice. We send the commit command only once, when there are no more documents left to index.
The data may be available in batches, without waiting for a full update of the index
Here we have three possibilities:
- If it does not matter whether the data becomes available in batches, we can send the commit command once, after sending the last document.
- If we want to publish the data in batches, our application can send a commit command periodically, for example after every few batches of documents.
- If we do not want to send commit commands from the indexing application, we can have Solr do it for us by configuring the autocommit mechanism.
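The second option above can be sketched as follows. This is only a minimal illustration in Python, not a full indexing client: it plans the update requests without sending them. The function name `plan_update_requests`, the core name `mycore` and the host and port are assumptions made for the example; `commit=true` is the standard request parameter of the Solr update handler.

```python
from urllib.parse import urlencode

# Assumed location of the update handler; "mycore" is a hypothetical core name.
DEFAULT_UPDATE_URL = "http://localhost:8983/solr/mycore/update"


def plan_update_requests(docs, batch_size, commit_every, base_url=DEFAULT_UPDATE_URL):
    """Yield (url, batch) pairs for indexing `docs` in batches.

    commit=true is added to every `commit_every`-th batch and always to
    the last batch, so nothing is left uncommitted at the end.
    """
    batches = [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]
    for index, batch in enumerate(batches, start=1):
        is_last = index == len(batches)
        params = {"commit": "true"} if (is_last or index % commit_every == 0) else {}
        url = base_url + ("?" + urlencode(params) if params else "")
        yield url, batch


if __name__ == "__main__":
    documents = [{"id": str(i)} for i in range(10)]
    for url, batch in plan_update_requests(documents, batch_size=2, commit_every=2):
        print(url, [doc["id"] for doc in batch])
```

Setting `commit_every` larger than the number of batches reduces this to the first option: a single commit with the last batch.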
Data must be indexed as fast as possible
If the data has to be indexed as fast as possible, send the commit command only once, after all the data has been sent. A commit is quite expensive in terms of performance, so in this case it should be issued only at the end of the indexing process.
It is important that the data should be published as soon as possible
This is probably the most difficult of the described cases. It all depends on how quickly we want the data to be available on the slave servers. For example, in a CMS, when a user saves an edited page we want the updated content to be available right away, so we need a commit after every document and fast replication. When you add items to an online store, on the other hand, you can usually tolerate some delay in commit and replication. Examples like these could be multiplied indefinitely. Whichever case applies, remember to set up your warming queries properly, so that Solr is prepared for the usual query load after each commit.
Anyone interested in very frequent index updates should follow the NRT (near real time) work in Lucene and Solr.
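The difference between the two examples can be expressed as a per-type freshness policy. The sketch below is only an illustration: the content types, the delay values, the function names and the core name `mycore` are all assumptions. It also relies on Solr's `commitWithin` update parameter (a delay in milliseconds), which is not discussed above and may not be available in older Solr versions.

```python
from urllib.parse import urlencode

# Hypothetical freshness requirements per content type, in milliseconds.
# 0 means "commit together with this update"; the values are illustrative.
MAX_DELAY_MS = {"cms_page": 0, "store_item": 60_000}


def update_params(content_type):
    """Return update-request parameters for one document of the given type."""
    delay = MAX_DELAY_MS[content_type]
    if delay == 0:
        # The content must be visible right away: commit with the update.
        return {"commit": "true"}
    # Some delay is acceptable: let Solr commit within the given time.
    return {"commitWithin": str(delay)}


def update_url(content_type, base_url="http://localhost:8983/solr/mycore/update"):
    """Build the update URL (assumed host and core) for the given content type."""
    return base_url + "?" + urlencode(update_params(content_type))


if __name__ == "__main__":
    print(update_url("cms_page"))
    print(update_url("store_item"))
```

Keeping the policy in one table makes it easy to see, and to tune, how much commit load each content type generates.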
Optimization
It is also worth remembering to optimize the index. If we send the commit command only once, at the end of indexing, it is worth considering sending optimize instead of commit. Our slaves will then get an optimized version of the index along with the newest data. Note, however, that optimizing the index takes longer than a commit.
Dangers
It is also worth remembering that postponing the commit indefinitely risks losing data that has not yet been physically written to the index files. Nothing happens to the data if Solr is shut down properly, but if the machine fails we can lose everything we have indexed since the last commit.
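One way to limit that exposure from the indexing application is to cap how many documents, and how much time, may accumulate between commits. The class below is a minimal sketch of such bookkeeping; `CommitTracker` and its thresholds are hypothetical names invented for this example, and the class does not talk to Solr itself.

```python
class CommitTracker:
    """Tracks uncommitted documents and says when a commit is due."""

    def __init__(self, max_pending_docs, max_pending_seconds):
        self.max_pending_docs = max_pending_docs
        self.max_pending_seconds = max_pending_seconds
        self.pending = 0               # documents sent since the last commit
        self.first_pending_at = None   # time the oldest uncommitted document was sent

    def record_add(self, now):
        """Call after sending one document; `now` is a timestamp in seconds."""
        if self.pending == 0:
            self.first_pending_at = now
        self.pending += 1

    def should_commit(self, now):
        """True when too many documents, or too old a document, is uncommitted."""
        if self.pending == 0:
            return False
        too_many = self.pending >= self.max_pending_docs
        too_old = now - self.first_pending_at >= self.max_pending_seconds
        return too_many or too_old

    def record_commit(self):
        """Call after a successful commit."""
        self.pending = 0
        self.first_pending_at = None


if __name__ == "__main__":
    import time

    tracker = CommitTracker(max_pending_docs=1000, max_pending_seconds=300)
    tracker.record_add(time.time())
    print(tracker.should_commit(time.time()))
```

The thresholds are a trade-off: lower values reduce what can be lost in a crash, but they also mean more of the expensive commit operations.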
To sum up
As you can see, there is no single answer to the question of when to send the commit command, because it depends on the situation and on individual needs. Note, however, that the operations Lucene/Solr performs after receiving the commit command are costly in terms of system resources. Do not send this command too frequently; otherwise Lucene/Solr may spend most of its time processing commits instead of indexing data.