`
BlackWing
  • 浏览: 200570 次
  • 性别: Icon_minigender_1
  • 来自: 广州
社区版块
存档分类
最新评论

Solr主从索引复制

    博客分类:
  • solr
阅读更多
摘自官网:
How does the slave replicate?

The master is totally unaware of the slaves. The slave continuously keeps polling the master (depending on the 'pollInterval' parameter) to check the current index version the master. If the slave finds out that the master has a newer version of the index it initiates a replication process. The steps are as follows,Slave issues a filelist command to get the list of the files. This command returns the names of the files as well as some metadata (size,lastmodified,alias if any)
The slave checks with its own index if it has any of those files in the local index. It then proceeds to download the missing files (The command name is 'filecontent' ). This uses a custom format (akin to the HTTP chunked encoding) to download the full content or a part of each file. If the connection breaks in between , the download resumes from the point it failed. At any point, it tries 5 times before giving up a replication altogether.
The files are downloaded into a temp dir. So if the slave or master crashes in between it does not corrupt anything. It just aborts the current replication.
After the download completes, all the new files are 'mov'ed to the slave's live index directory and the files' timestamps will match the timestamps in the master.
A 'commit' command is issued on the slave by the Slave's ReplicationHandler and the new index is loaded.

How are configuration files replicated?

The files that are to be replicated have to be mentioned explicitly in using the 'confFiles' parameter.
Only files in the 'conf' dir of the solr instance are replicated.
The files are replicated only along with a fresh index. That means even if a file is changed in the master the file is replicated only after there is a new commit/optimize on the master.
Unlike the index files, where the timestamp is good enough to figure out if they are identical, conf files are compared against their checksum. The schema.xml files (on master and slave) are same if their checksums match.
Conf files are also downloaded to a temp dir before they are 'mov'ed to the original files. The old files are renamed and kept in the same directory. ReplicationHandler does not automatically clean up these old files.
If a replication involved downloading of at least one conf file a core reload is issued instead of a 'commit' command.
What if I add documents to the slave or if slave index gets corrupted?

If docs are added to the slave, then the slave is not in sync with the master anymore. But, it does not do anything to keep it in sync with master until the master has a newer index. When a commit happens on the master, the index version of the master will become different from that of the slave. The slave fetches the list of files and finds that some of the files (same name) are there in the local index with a different size/timestamp. This means that the master and slave have incompatible indexes. Slave then copies all the files from master (there may be scope to optimize this, but this is a rare case and may not be worth it) to a new index dir and and asks the core to load the fresh index from the new directory.
分享到:
评论

相关推荐

Global site tag (gtag.js) - Google Analytics