Brought to you by molecularsciences.org.
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License.
This publication may not be redistributed without this notice.

Sphinx Search

Sphinx is a free standalone full-text search engine. It is fast, efficient, and easily integrates with SQL databases and programming major programming languages. It is ideal for use with MySQL, PostgreSQL, PHP, Python, Perl and Ruby. Although Sphinx works well with Java but Lucene is better option for Java.

Following is a summary of features (copied from http://www.sphinxsearch.com)

Download and Install Sphinx

Sphinx can be downloaded from http://www.sphinxsearch.com/. Installing Sphinx is pretty straightforward. Following are the steps. --prefix defines the installation directory.

$ cd /home/me/src
$ wget http://sphinxsearch.com/downloads/sphinx-0.9.8.tar.gz
$ tar -xzvf sphinx-0.9.8.tar.gz
$ cd sphinx-0.9.8
$ mkdir /usr/local/sphinx
$ ./configure --prefix /usr/local/sphinx --with-mysql
$ make
$ make install

Troubleshooting
If you get an error like:

sphinx.h:54:19: error: mysql.h: No such file or directory

It is because you do not have mysql-devel installed on your system. To fix this you need to install mysql-devel and mysql-libs, if you don't already have it installed.

If you are using Fedora, use:

$ yum install mysql-devel

If you are using Ubuntu:

$ apt-get install mysql-devel

Sample Database

The next step is to configure and test Sphinx. Here we create a sample database to walk through configuration and testing.

mysql> create table phonebook (
    -> id int(10) not null auto_increment primary key,
		-> name varchar(15) not null,
		-> phone varchar(20) not null
		-> );

mysql> describe phonebook;
+-------+-------------+------+-----+---------+----------------+
| Field | Type        | Null | Key | Default | Extra          |
+-------+-------------+------+-----+---------+----------------+
| id    | int(10)     | NO   | PRI | NULL    | auto_increment |
| name  | varchar(15) | NO   |     |         |                |
| phone | varchar(20) | NO   |     |         |                |
+-------+-------------+------+-----+---------+----------------+

mysql> insert into phonebook VALUES (NULL,'John','212-123-0987');
mysql> insert into phonebook VALUES (NULL,'Jake','718-123-0987');
mysql> insert into phonebook VALUES (NULL,'Kate','987-123-2322');
mysql> insert into phonebook VALUES (NULL,'Khan','987-893-2322');
mysql> insert into phonebook VALUES (NULL,'Mike','829-893-2322');

mysql> select * from phonebook;
+----+------+--------------+
| id | name | phone        |
+----+------+--------------+
|  1 | John | 212-123-0987 |
|  2 | Jake | 718-123-0987 |
|  3 | Kate | 987-123-2322 |
|  4 | Khan | 987-893-2322 |
|  5 | Mike | 829-893-2322 |
+----+------+--------------+

Note that Sphinx requires a unique integer indentifier (primary key) for each row

Configuring and Testing Sphinx

Once you have installed Sphinx, you need to configure it so that it can access your database, be accessible from your scripts. Configuration files are stored in the etc directory of the installion directory.

$ cd /usr/local/sphinx/etc

A default copy of sphinx.conf file we need to edit is called sphinx.conf.dist. Copy this file to sphinx.conf and start editing.

$ cp sphinx.conf.dist sphinx.conf
$ sphinx.conf contains the config info for sphinx
$ vi sphinx.conf

Specifically, we need to specify the data source, index or indices, indexer settings, and search daemon (searchd) settings.

Here is a sample sphinx.conf file

  1 # define your data source here.
  2 source phonesource
  3 {
  4   type      = mysql
  5   sql_host  = localhost
  6   sql_user  = type_your_database_username_here
  7   sql_pass  = type_your_password_here
  8   sql_db    = type_your_database_name_here
  9   sql_port  = 3306  # optional, default is 3306
 10
 11   # define the primary fetch query. You can define up
 12   # to 32 full-text fields but the first field must be
 13   # unique unsigned positve integer.
 14   sql_query = select id, name, phone from dg_phonebook;
 15   
 16   # display information on each selected id
 17   # only used for search CLI
 18   sql_query_info = SELECT name, phone FROM phonebook WHERE id=$id;
 19 }
 20
 21 # define an index
 22 index phoneindex
 23 {
 24   source = phonesource
 25   path = /usr/local/sphinx/var/data/phone
 26   morphology = none
 27
 28   # for stemming
 29   min_word_len = 3 
 30   min_prefix_len = 0
 31   min_infix_len = 3
 32   enable_star = 1 
 33   # * means any
 34 }
 35
 36 # indexer settings
 37 indexer
 38 {
 39   recommended 256M to 1024M     
 40   mem_limit = 1024M
 41 }
 42
 43 # search daemon settings
 44 searchd
 45 {
 46   port             = 3312
 47   log              = /usr/local/sphinx/var/log/searchd.log
 48   query_log        = /usr/local/sphinx/var/log/query.log
 49   read_timeout     = 5
 50   max_children     = 30
 51   pid_file         = /usr/local/sphinx/var/log/searchd.pid
 52   max_matches      = 1000
 53   seamless_rotate  = 1
 54   preopen_indexes  = 0
 55   unlink_old       = 1
 56 }

Once we have configure sphinx.conf, we need to run the indexer.

$ /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf --all

Finally we test

$ /usr/local/sphinx/bin/search John
$ /usr/local/sphinx/bin/search J*
$ /usr/local/sphinx/bin/search *e

Error: sql_fetch_row: Lost connection to MySQL server during query

Error
$ /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf --all --rotate
Sphinx 0.9.8-rc2 (r1234)
Copyright (c) 2001-2008, Andrew Aksyonoff

using config file '/usr/local/sphinx/etc/sphinx.conf'...
indexing index 'orthodb'...
ERROR: index 'orthodb': sql_fetch_row: Lost connection to MySQL server during query.
total 2087306 docs, 18785754 bytes
total 2462.060 sec, 7630.10 bytes/sec, 847.79 docs/sec

Solution
To solve this problem you have two choices.
1. Reduce memory limit in sphinx.conf
In your sphinx.conf file, set mem_limit = 256M

2. increase wait_timeout in my.cnf
Log into you mysql using the root password.

use mysql;
show variables like '%wait_timeout%';
update variables set wait_timeout = 30000;