http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg12709.html
A clever way to do this is to take advantage of Lucene's index file structure. Indexes are directories of files. As the index changes through additions and deletions, most files in the index stay the same. So you can efficiently synchronize multiple copies of an index by copying only the files that change.
The way I did this for Technorati was to:
1. On the index master, periodically checkpoint the index. Every minute or so the IndexWriter is closed and a 'cp -lr index index.DATE' command is executed from Java, where DATE is the current date and time. This efficiently makes a copy of the index when it is in a consistent state by constructing a tree of hard links. If Lucene re-writes any files (e.g., the segments file) a new inode is created and the copy is unchanged.
2. From a crontab on each search slave, periodically poll for new checkpoints. When a new index.DATE is found, use 'cp -lr index index.DATE' to prepare a copy, then use 'rsync -W --delete master:index.DATE index.DATE' to get the incremental index changes. Then atomically install the updated index with a symbolic link (ln -fsn index.DATE index).
3. In Java on the slave, re-open 'index' when its version changes. This is best done in a separate thread that periodically checks the index version. When it changes, the new version is opened and a few typical queries are performed on it to pre-load Lucene's caches. Then, in a synchronized block, the Searcher variable used in production is updated.
4. In a crontab on the master, periodically remove the oldest checkpoint indexes.
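The hard-link checkpoint in step 1 can be illustrated in plain Java. This is only a sketch of the idea, not the code used at Technorati: the class name HardLinkCheckpoint and both method names are made up for illustration, the index is assumed to be a flat directory of files, and a "re-write" is assumed to happen by writing a new file and renaming it over the old name, which gives the file a new inode.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: a rough Java analogue of 'cp -lr index index.DATE'. */
class HardLinkCheckpoint {

    /** Creates the checkpoint directory and hard-links every regular file of the index into it. */
    static void checkpoint(Path index, Path checkpoint) throws IOException {
        Files.createDirectory(checkpoint);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(index)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    // Both names now refer to the same inode; no data is copied.
                    Files.createLink(checkpoint.resolve(file.getFileName()), file);
                }
            }
        }
    }

    /**
     * Assumed re-write pattern: write the new content to a temporary file,
     * then rename it over the old name. The old inode stays reachable
     * through the checkpoint's hard link, so the checkpoint is unchanged.
     */
    static void rewrite(Path file, String content) throws IOException {
        Path tmp = file.resolveSibling(file.getFileName() + ".new");
        Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Note that a file modified in place (same inode) would change in the checkpoint too; the scheme depends on files being replaced rather than overwritten. Hard links also require the index and its checkpoints to live on the same filesystem.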
Technorati's Lucene index is updated this way every minute. A mergeFactor of 2 is used on the master in order to minimize the number of segments in production. The master has a hot spare.
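The reload logic of step 3 can be sketched without depending on any particular Lucene version. Everything named here is hypothetical: SearcherReloader, and the three callbacks that stand in for "read the index version", "open a searcher on 'index'" and "run a few typical queries". Closing the old searcher once in-flight queries finish is left out.

```java
import java.util.function.Consumer;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical holder that swaps in a freshly opened, warmed searcher. */
class SearcherReloader<S> {
    private final LongSupplier currentVersion; // assumed: reads the version of 'index'
    private final Supplier<S> opener;          // assumed: opens a searcher on 'index'
    private final Consumer<S> warmer;          // assumed: runs a few typical queries

    private S searcher;          // guarded by 'this'
    private long openedVersion;  // touched only by the polling thread

    SearcherReloader(LongSupplier currentVersion, Supplier<S> opener, Consumer<S> warmer) {
        this.currentVersion = currentVersion;
        this.opener = opener;
        this.warmer = warmer;
        this.openedVersion = currentVersion.getAsLong();
        this.searcher = opener.get();
    }

    /** Called by query threads to obtain the searcher currently in production. */
    synchronized S get() {
        return searcher;
    }

    /** One polling step, meant to run on a single background thread. Returns true if a swap happened. */
    boolean pollOnce() {
        long version = currentVersion.getAsLong();
        if (version == openedVersion) {
            return false;
        }
        S fresh = opener.get();
        warmer.accept(fresh);      // warm up outside the lock so queries are not blocked
        synchronized (this) {
            searcher = fresh;      // the only step done under the lock
        }
        openedVersion = version;
        return true;
    }
}
```

A background thread could call pollOnce() at a fixed delay, for example through a ScheduledExecutorService. Because the new searcher is opened and warmed before the synchronized swap, query threads keep using the old one until the new one is ready.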