Sphinx: Set Up a Full-Text Indexer
Software version | 2.0.4 |
Operating System | Debian 7 |
Website | Sphinx Search Website |
Introduction
Sphinx is an open source full text search server, designed from the ground up with performance, relevance (aka search quality), and integration simplicity in mind. It’s written in C++ and works on Linux (RedHat, Ubuntu, etc), Windows, MacOS, Solaris, FreeBSD, and a few other systems.
Sphinx lets you either batch index and search data stored in an SQL database, NoSQL storage, or just files quickly and easily, or index and search data on the fly, working with Sphinx pretty much as with a database server. A variety of text processing features enable fine-tuning Sphinx for your particular application requirements, and a number of relevance functions ensure you can tweak search quality as well. Searching via SphinxAPI is as simple as 3 lines of code, and querying via SphinxQL is even simpler, with search queries expressed in good old SQL.
Sphinx clusters scale up to tens of billions of documents and hundreds of millions of search queries per day, powering top websites such as Craigslist, Living Social, MetaCafe and Groupon. A complete list of known users is available on the Sphinx Powered-by page. Last but not least, it’s open-sourced under GPLv2, and the community edition is free to use.
Installation
Installing Sphinx on Debian is simple.
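A minimal sketch of this step, assuming the package is named `sphinxsearch` in the Debian repositories and that you run it as root (the function name `install_sphinx` is only illustrative):

```shell
# Assumption: the Debian package is called "sphinxsearch".
SPHINX_PKG=sphinxsearch

# Refresh the package lists, then install Sphinx. Needs root privileges.
install_sphinx() {
    apt-get update && apt-get install -y "$SPHINX_PKG"
}

# Run the installation when you are ready:
# install_sphinx
```

The function is only defined here, so nothing is installed until you call it yourself.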
Configuration
Before indexing anything, you need to set up the Sphinx configuration for your application.
MediaWiki
Install plugin
First, download the MediaWiki plugin, which ships with a Sphinx configuration file, and deploy it.
You will also need the PHP client API. The hardest part is finding the one that matches your Sphinx version. I ran into several problems on that point, so I strongly suggest taking the time to pick the one that corresponds to your version: http://code.google.com/p/sphinxsearch/source/browse/#svn%2Ftags.
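To know which client API tag to look for, it helps to read the installed Sphinx version first. A small sketch, assuming Sphinx was installed from the Debian `sphinxsearch` package; the helper names `upstream_version` and `installed_sphinx_version` are made up for this example:

```shell
# Strip the Debian-specific parts of a package version string:
# an optional "epoch:" prefix and the trailing "-revision" suffix.
upstream_version() {
    v=${1#*:}                 # drop "epoch:" if present
    printf '%s\n' "${v%-*}"   # drop the trailing "-revision"
}

# Upstream version of the installed package, as reported by dpkg
# (prints an empty line if the package is not installed).
installed_sphinx_version() {
    upstream_version "$(dpkg-query -W -f='${Version}' sphinxsearch 2>/dev/null)"
}

# installed_sphinx_version
```

For a package version string such as `2.0.4-1.1`, `upstream_version` yields `2.0.4`, which is the part to compare against the tag names.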
For Debian 7 and the Sphinx version it ships, take the API from the tag that matches that version.
Optionally, you can also download the Sphinx logo and display it next to the MediaWiki logo.
Configure
Edit the configuration file (/etc/sphinxsearch/sphinx.conf) and fill in the details of your MediaWiki database.
Other things you need to know:
- Adapt the table names in the SQL queries if you use a prefix (‘wiki_’ in this example)
- I’ve also changed all paths to match Debian’s
- All highlighted lines are important; I’ve added a comment on each one that needs further explanation
Now you should index the wiki (see the Indexation section).
Then configure the MediaWiki plugin by adding the required lines to your LocalSettings.php.
You’ll then be able to search with Sphinx :-)
Sphinx default
To let the daemon start at boot, simply edit /etc/default/sphinxsearch and change ‘no’ to ‘yes’.
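A sketch of that edit with `sed`, assuming the Debian file contains a line reading exactly `START=no` (check yours first; `enable_at_boot` is an illustrative name):

```shell
# Replace a line that is exactly START=no with START=yes, in place.
# A copy of the original file is kept next to it with a .bak suffix.
enable_at_boot() {
    sed -i.bak 's/^START=no$/START=yes/' "$1"
}

# As root:
# enable_at_boot /etc/default/sphinxsearch
```

Passing the file as an argument lets you try the function on a copy before touching the real file.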
Indexation
Index
Once your application is configured, you need to build the index a first time so that Sphinx has something to search.
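A minimal sketch of the first indexing run, assuming the configuration path used in this article and the `indexer` binary from the Sphinx package. The `index_all` wrapper and the `INDEXER` override are illustrative, not part of Sphinx:

```shell
SPHINX_CONF=/etc/sphinxsearch/sphinx.conf

# Build every index declared in the configuration file.
# Extra arguments are passed through, e.g. --rotate once searchd is running.
# INDEXER can be overridden to preview the command line (INDEXER=echo).
index_all() {
    "${INDEXER:-indexer}" --config "$SPHINX_CONF" --all "$@"
}

# First run, with the daemon still stopped:
# index_all
```

Once the daemon is running, `index_all --rotate` rebuilds the indexes and asks searchd to switch to the new files.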
If this first run works (no configuration problem), start the daemon (you must have enabled it beforehand, see “Sphinx default”).
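A minimal sketch, assuming the init script installed by the Debian package and the port 9312 mentioned in the Debug section; `start_sphinx` and `has_listener` are made-up helper names:

```shell
# Start searchd through the init script of the Debian package (as root).
start_sphinx() {
    service sphinxsearch start
}

# Succeed if a netstat-style listing read from stdin shows a TCP listener
# on the given port.
has_listener() {
    grep -q ":$1 .*LISTEN"
}

# start_sphinx
# netstat -ltn | has_listener 9312 && echo "searchd is listening"
```

Checking the listening port right after the start is a quick way to confirm that the daemon really came up.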
Test your indexation
There are several ways to test your index, but be aware that the search binary is buggy: if it crashes, that does not necessarily mean there is a problem with your index. The simplest test is to run a query with it.
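A sketch of such a test, assuming the `search` command-line tool shipped with this Sphinx version and the configuration path used in this article. The `test_search` wrapper and the `SEARCH` override are illustrative:

```shell
SPHINX_CONF=/etc/sphinxsearch/sphinx.conf

# Query the indexes for the given words with the command-line search tool.
# SEARCH can be overridden to preview the command line (SEARCH=echo).
test_search() {
    "${SEARCH:-search}" --config "$SPHINX_CONF" "$@"
}

# Look for the word "test":
# test_search test
```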
In my case the search returned results :-): the word “test” was found 20 times.
Incremental updates
We need to set up incremental updates. Use a shorter interval if you need more frequent indexing. For my own usage, once an hour is really enough. The MediaWiki example goes in /etc/cron.d/sphinxsearch.
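A sketch of an hourly entry, written out by a small shell function. The hourly schedule comes from the text above; running as `root`, using `--all` instead of specific MediaWiki index names, and the `write_sphinx_cron` helper are assumptions for this example:

```shell
# Write a cron.d entry that re-indexes everything at the top of each hour.
# --rotate makes the running searchd switch to the fresh index files;
# --quiet limits the output to errors.
write_sphinx_cron() {
    cat > "$1" <<'EOF'
# m h dom mon dow user command
0 * * * * root indexer --quiet --config /etc/sphinxsearch/sphinx.conf --rotate --all
EOF
}

# As root:
# write_sphinx_cron /etc/cron.d/sphinxsearch
```

If you only want to refresh some indexes every hour, replace `--all` with their names as declared in your sphinx.conf.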
Debug
What if you don’t see any results, or you want to be sure that Sphinx receives search requests? There is a console mode.
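A minimal sketch of a console session, assuming the Debian service name `sphinxsearch` and the configuration path used in this article; `debug_console` is an illustrative name:

```shell
SPHINX_CONF=/etc/sphinxsearch/sphinx.conf

# Stop the background daemon, then run searchd in the foreground so that
# its output goes to the terminal. Stop it with Ctrl+C and start the
# service again afterwards.
debug_console() {
    service sphinxsearch stop
    searchd --config "$SPHINX_CONF" --console
}

# As root:
# debug_console
```

While it runs, launch a search from your application in another terminal or browser and watch what shows up.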
I see my test search here :-). All is good! If you don’t see anything, the problem is probably in the application API or a mismatched configuration. You can also use tcpdump to check whether network connections arrive on port 9312.
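A sketch of the tcpdump check, assuming you want to watch every interface; `watch_sphinx_port` is a made-up helper name:

```shell
# Show TCP traffic to or from the Sphinx port on every interface (needs root).
# -n skips name resolution so addresses are printed as plain numbers.
# The port defaults to 9312 and can be given as first argument.
watch_sphinx_port() {
    tcpdump -n -i any tcp port "${1:-9312}"
}

# As root, then run a search from the application:
# watch_sphinx_port
```

If nothing is printed while you search, the request never reaches Sphinx and the problem is on the application side.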
FAQ
I don’t see any search result on MediaWiki, why?
You most likely have a problem with your PHP API. Pick another version that matches your Sphinx release, and see the Debug section to find out what is wrong.
Last updated 02 Aug 2013, 14:58 CEST.