When to commit


The question I asked myself recently seems to be one of those that should have a quick and painless answer: when should you send the commit command to Solr (or Lucene)? Despite the simplicity of the question, the answer is not clear, at least in my opinion.

To answer it, you have to look at several different variants of data indexing and at how quickly you want the data to be available on the slave servers. Looking at the typical implementations I have had the pleasure of working with, we can distinguish the following categories:

Data can be made available only after a total index update

This is the simplest situation, both in theory and in practice. We send the commit command only when we run out of documents to index.
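
A minimal sketch of this "commit once at the end" pattern, assuming documents are sent to Solr as XML update messages (`<add>` and `<commit/>`). The function names and the `post` callable are hypothetical: `post` stands in for whatever delivers a message to the update handler, so the sketch does not assume any particular client library.

```python
from xml.sax.saxutils import escape, quoteattr


def add_message(doc):
    """Build a Solr XML <add> message for one document (a dict of field -> value)."""
    fields = "".join(
        f"<field name={quoteattr(name)}>{escape(str(value))}</field>"
        for name, value in doc.items()
    )
    return f"<add><doc>{fields}</doc></add>"


def index_then_commit(docs, post):
    """Send every document, then a single <commit/> at the very end.

    `post` is a hypothetical callable that delivers one XML message to the
    Solr update handler (for example via HTTP POST).
    """
    sent = 0
    for doc in docs:
        post(add_message(doc))
        sent += 1
    if sent:
        post("<commit/>")  # one commit, only after the last document
    return sent
```

Passing `post` in as a parameter keeps the commit policy separate from the transport, so the same function can be exercised with a plain list's `append` method before it is wired to a real server.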

The data may be available in batches, without waiting for a full update of the index

Here we have three possibilities:

  1. If it does not matter whether the data is made available in batches or not, we can send the commit command after sending the last document.
  2. If we want to publish data in batches, our application can send a commit command from time to time.
  3. If we do not want to send commit commands from the indexing application, we can tell Solr to do it for us by setting up the autocommit mechanism.
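
The second option, committing "from time to time", could be sketched as a commit after every N documents. This is only an illustration under stated assumptions: the function name, the `commit_every` parameter and the `post` callable (something that delivers one XML update message to Solr) are hypothetical, and the input is assumed to be a sequence of ready-made `<add>` messages.

```python
def send_with_periodic_commits(add_messages, post, commit_every=1000):
    """Send update messages, issuing <commit/> after every `commit_every` of them.

    Returns the number of commits that were sent.
    """
    if commit_every < 1:
        raise ValueError("commit_every must be at least 1")
    pending = 0  # messages sent since the last commit
    commits = 0
    for message in add_messages:
        post(message)
        pending += 1
        if pending == commit_every:
            post("<commit/>")  # publish this batch
            commits += 1
            pending = 0
    if pending:
        post("<commit/>")  # publish the final, partial batch
        commits += 1
    return commits
```

With a very large `commit_every` this degenerates into the first option (a single commit at the end), and with `commit_every=1` it commits after every document. The third option, autocommit, is configured on the Solr side rather than in the indexing application, so it has no counterpart in this sketch.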

Data must be indexed as fast as possible

If your data must be indexed as fast as possible, the commit command should be sent only after all the data has been sent. Commit is quite expensive in terms of performance, so in this case it should be used only at the end of the indexing process.

It is important that the data should be published as soon as possible

This is probably the most difficult of the described cases. It all depends on how quickly we want the data to be available on the slave servers. For example, in a CMS, when a user saves an edited page, we want the updated content to be available right away; in that case we need a commit after every document and fast replication. When you add items to an online store, you can allow some delay before commit and replication. Such examples could be multiplied indefinitely. But remember to set up your warming queries properly to prepare Solr for the usual query load.
Those interested in very frequent index updates should follow what is happening in Lucene and Solr around NRT (near real time) search.

Optimization

It is also worth remembering to optimize the index. If we send the commit command only once, at the end of indexing, it is worth considering whether to send optimize instead of commit. Our slaves will then get an optimized version of the index along with the newest data. Note, however, that optimizing the index takes longer than a commit.
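
As a small illustration of that choice, the last message of an indexing run can be either `<commit/>` or `<optimize/>` in Solr's XML update format. The helper below is a hypothetical sketch; `post` is again an assumed callable that delivers one message to the update handler.

```python
def finish_indexing(post, optimize=False):
    """Send the message that ends an indexing run and return it.

    With optimize=True the run ends with <optimize/> instead of <commit/>,
    trading a longer final step for an optimized index.
    """
    message = "<optimize/>" if optimize else "<commit/>"
    post(message)
    return message
```

Only one of the two messages is sent, following the text above, which suggests optimize as a replacement for the final commit rather than an addition to it.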

Dangers

It is also worth remembering that postponing the commit indefinitely carries the risk of losing data that has not yet been physically written to the index files. Of course, nothing happens to the data if Solr is shut down properly, but in case of a machine failure we can lose the data we have indexed since the last commit.

To sum up

As you can see, there is no single answer to the question of when to send the commit command, because it depends on the situation and on individual needs. Note, however, that the actions Lucene/Solr performs after receiving a commit command are costly in terms of system resources. Do not send this command too frequently; otherwise, instead of indexing data, Lucene/Solr may spend most of its time processing commits.

 

Reposted from: http://solr.pl/en/2011/06/27/when-to-commit/
