日期:2014-05-16  浏览次数:20473 次

加速 lucene 的搜索速度 ImproveSearchingSpeed

* Be sure you really need to speed things up.

Many of the ideas here are simple to try, but others will necessarily add some complexity to your application. So be sure your searching speed is indeed too slow and the slowness is indeed within Lucene.

* 请确认你真的需要更快的搜索速度

这里的很多想法都非常容易尝试,但也有一些会给你的程序带来额外的复杂度。所以请确认你的搜索速度真的慢到不能忍受,并且慢的原因的确是因为lucene。

*Make sure you are using the latest version of Lucene.

* 请确认你在使用Lucene的最新版本

*Use a local filesystem.

Remote filesystems are typically quite a bit slower for searching. If the index must be remote, try to mount the remote filesystem as a "readonly" mount. In some cases this could improve performance.

* 尽量使用本地文件系统

远程文件系统一般来说都会降低搜索速度。如果索引必须分布在远程服务器,可以尝试将远程文件系统设置为只读。在某些情况下,这样可以提高性能。

* Get faster hardware, especially a faster IO system.

Flash-based Solid State Drives works very well for Lucene searches. As seek-times for SSD's are about 100 times faster than traditional platter-based harddrives, the usual penalty for seeking is virtually eliminated. This means that SSD-equipped machines need less RAM for file caching and that searchers require less warm-up time before they respond quickly.

* 使用更快的硬件设备,特别是更快的IO设备

Lucene 搜索可以很好的工作在基于闪存的固态硬盘上。固态硬盘的寻道时间大概比传统的以磁盘为基础的硬盘快100倍。这意味着,配备固态硬盘的机器用于文件缓存的内存将变少,搜索需要较少的热身时间,能够更加迅速作出反应。

* Open the IndexReader with readOnly=true.

This makes a big difference when multiple threads are sharing the same reader, as it removes certain sources of thread contention.

* 以只读方式打开 IndexReader

在多个线程共享同一个 reader 的环境下,这样做会带来很大的改善,因为它减少了部分锁争用。

*On non-Windows platform, using NIOFSDirectory instead of FSDirectory.

This also removes sources of contention when accessing the underlying files. Unfortunately, due to a longstanding bug on Windows in Sun's JRE (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6265734 -- feel particularly free to go vote for it), NIOFSDirectory gets poor performance on Windows.

* 在非 windows 平台上,使用 NIOFSDirectory 替代 FSDirectory

这样做同样可以减少部分底层文件访问时的锁争用。不幸的是,因为 windows 上 Sun 的 JRE 的一个 bug (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6265734 ), NIOFSDirectory 在 windows 上性能更差。

* Add RAM to your hardware and/or increase the heap size for the JVM.

For a large index, searching can use alot of RAM. If you don't have enough RAM or your JVM is not running with a large enough HEAP size then the JVM can hit swapping and thrashing at which point everything will run slowly.

* 加大你的机器内存容量,给Java虚拟机分配更多的内存

索引越大,在搜索时需要使用更多的内存。如果你的机器没有足够大的内存或者你的Java虚拟机没有设置足够大的堆空间,频繁的页面文件交换和虚拟内存的使用将使你的硬盘处于超负荷状态运行,此时,一切的程序都将运行的很慢。

*Use one instance of IndexSearcher.

Share a single IndexSearcher across queries and across threads in your application.

* 在程序中使用一个唯一的IndexSearch实例

在程序的查询中共享一个IndexSearch实例,它是多线程安全的。

*When measuring performance, disregard the first query.

The first query to a searcher pays the price of initializing caches (especially when sorting by fields) and thus will skew your results (assuming you re-use the searcher for many queries). On the other hand, if you re-run the same query again and again, results won't be realistic either, because the operating system will use its cache to speed up IO operations. On Linux (kernel 2.6.16 and later) you can clean the disk cache using sync ; echo 3 > /proc/sys/vm/drop_caches. See http://linux-mm.org/Drop_Caches for details.

* 当测试搜索速度时,忽略第一次查询时间

第一次搜索将花费部分时间在缓存上(特别在按某个字段排序的情况下),从而可能使你的测试结果不太准确(假设你在多个查询中复用一个 IndexSearch实例)。另一方面来说,如果你一次又一次的重复同一个查询,所得的测试结果也是不准确的。因为操作系统将利用其高速缓存加速IO操 作。在Linux上,你可以使用如下命令清除磁盘高速缓存: echo 3 > /proc/sys/vm/drop_caches.

* Re-open the IndexSearcher only when necessar