What is Apache Lucene?
Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform. It is an open source project available for free download.
Having applied to Google Summer of Code 2016 (GSoC), I decided to install Apache Lucene 6.0.0 and learn its basic concepts before the final decision on 25 April.
Download Apache Lucene 6.0.0
First, I downloaded the latest Lucene distribution (6.0.0) and extracted it to my GSoC working directory, ~/Documents/gsoc.
Include JARs for the demo
Assume that we're at the root of the Apache Lucene installation. Four JARs need to be included for the demo:
- the Lucene JAR
./core/lucene-core-6.0.0.jar
- the queryparser JAR
./queryparser/lucene-queryparser-6.0.0.jar
- the common analysis JAR
./analysis/common/lucene-analyzers-common-6.0.0.jar
- the Lucene demo JAR
./demo/lucene-demo-6.0.0.jar
Use the Linux shell's export command to add these JARs to the Java classpath.
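A minimal sketch of that export step, assuming a POSIX shell opened at the Lucene root so that the relative paths listed above resolve. Note that it overwrites any CLASSPATH already set, and that the value only lasts for the current shell session:

```shell
# Run from the root of the extracted Lucene 6.0.0 directory.
# Build the classpath one JAR at a time, separated by ':'.
# Assumption: any previous CLASSPATH value can be discarded.
CLASSPATH="./core/lucene-core-6.0.0.jar"
CLASSPATH="$CLASSPATH:./queryparser/lucene-queryparser-6.0.0.jar"
CLASSPATH="$CLASSPATH:./analysis/common/lucene-analyzers-common-6.0.0.jar"
CLASSPATH="$CLASSPATH:./demo/lucene-demo-6.0.0.jar"
export CLASSPATH

# Print one entry per line to double-check the result.
echo "$CLASSPATH" | tr ':' '\n'
```

Because the paths are relative, the java commands in the rest of this post need to be run from that same directory.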
Indexing Files
Once that's done, I can build an index. From the Lucene root directory, type the following command to build an index for the docs folder. Note that the official tutorial suggests using the src folder, but that folder is not available in my Apache Lucene 6.0.0 installation (I'm using lucene-6.0.0.tgz). If you're in the same situation, use another folder, such as docs:
java org.apache.lucene.demo.IndexFiles -docs docs
This produces a subdirectory called index, which contains an index of all the files in the docs folder.
Search results
We can now search the index from the command line.
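The Lucene demo ships a SearchFiles class alongside IndexFiles. As a sketch, assuming the classpath has been exported as described earlier and the index directory sits in the current directory, a non-interactive search could be run like this (QUERY is only an illustrative variable name, and string is used as the example keyword):

```shell
# Run from the Lucene root, after IndexFiles has created ./index.
QUERY="string"   # example keyword; replace with your own search term

if command -v java >/dev/null 2>&1 && [ -d index ]; then
  # -index selects the index directory, -query runs a single search and exits.
  java org.apache.lucene.demo.SearchFiles -index index -query "$QUERY"
else
  echo "java or ./index not found; build the index first" >&2
fi
```

Leaving out -query starts an interactive prompt that asks for a query instead.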
Here are the search results for the keywords huangmincong and string:
Tomorrow, I'll learn more about how Lucene works, especially IndexFiles, the Analyzer, the Directory, and the IndexWriter.