Sphinx is a free standalone full-text search engine. It is fast, efficient, and easily integrates with SQL databases and programming major programming languages. It is ideal for use with MySQL, PostgreSQL, PHP, Python, Perl and Ruby. Although Sphinx works well with Java but Lucene is better option for Java.
Following is a summary of features (copied from http://www.sphinxsearch.com)
Sphinx can be downloaded from http://www.sphinxsearch.com/. Installing Sphinx is pretty straightforward. Following are the steps. --prefix defines the installation directory.
$ cd /home/me/src $ wget http://sphinxsearch.com/downloads/sphinx-0.9.8.tar.gz $ tar -xzvf sphinx-0.9.8.tar.gz $ cd sphinx-0.9.8 $ mkdir /usr/local/sphinx $ ./configure --prefix /usr/local/sphinx --with-mysql $ make $ make install
Troubleshooting
If you get an error like:
sphinx.h:54:19: error: mysql.h: No such file or directory
It is because you do not have mysql-devel installed on your system. To fix this you need to install mysql-devel and mysql-libs, if you don't already have it installed.
If you are using Fedora, use:
$ yum install mysql-devel
If you are using Ubuntu:
$ apt-get install mysql-devel
The next step is to configure and test Sphinx. Here we create a sample database to walk through configuration and testing.
mysql> create table phonebook (
-> id int(10) not null auto_increment primary key,
-> name varchar(15) not null,
-> phone varchar(20) not null
-> );
mysql> describe phonebook;
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | int(10) | NO | PRI | NULL | auto_increment |
| name | varchar(15) | NO | | | |
| phone | varchar(20) | NO | | | |
+-------+-------------+------+-----+---------+----------------+
mysql> insert into phonebook VALUES (NULL,'John','212-123-0987');
mysql> insert into phonebook VALUES (NULL,'Jake','718-123-0987');
mysql> insert into phonebook VALUES (NULL,'Kate','987-123-2322');
mysql> insert into phonebook VALUES (NULL,'Khan','987-893-2322');
mysql> insert into phonebook VALUES (NULL,'Mike','829-893-2322');
mysql> select * from phonebook;
+----+------+--------------+
| id | name | phone |
+----+------+--------------+
| 1 | John | 212-123-0987 |
| 2 | Jake | 718-123-0987 |
| 3 | Kate | 987-123-2322 |
| 4 | Khan | 987-893-2322 |
| 5 | Mike | 829-893-2322 |
+----+------+--------------+
Note that Sphinx requires a unique integer indentifier (primary key) for each row
Once you have installed Sphinx, you need to configure it so that it can access your database, be accessible from your scripts. Configuration files are stored in the etc directory of the installion directory.
$ cd /usr/local/sphinx/etc
A default copy of sphinx.conf file we need to edit is called sphinx.conf.dist. Copy this file to sphinx.conf and start editing.
$ cp sphinx.conf.dist sphinx.conf $ sphinx.conf contains the config info for sphinx $ vi sphinx.conf
Specifically, we need to specify the data source, index or indices, indexer settings, and search daemon (searchd) settings.
Here is a sample sphinx.conf file
1 # define your data source here.
2 source phonesource
3 {
4 type = mysql
5 sql_host = localhost
6 sql_user = type_your_database_username_here
7 sql_pass = type_your_password_here
8 sql_db = type_your_database_name_here
9 sql_port = 3306 # optional, default is 3306
10
11 # define the primary fetch query. You can define up
12 # to 32 full-text fields but the first field must be
13 # unique unsigned positve integer.
14 sql_query = select id, name, phone from dg_phonebook;
15
16 # display information on each selected id
17 # only used for search CLI
18 sql_query_info = SELECT name, phone FROM phonebook WHERE id=$id;
19 }
20
21 # define an index
22 index phoneindex
23 {
24 source = phonesource
25 path = /usr/local/sphinx/var/data/phone
26 morphology = none
27
28 # for stemming
29 min_word_len = 3
30 min_prefix_len = 0
31 min_infix_len = 3
32 enable_star = 1
33 # * means any
34 }
35
36 # indexer settings
37 indexer
38 {
39 recommended 256M to 1024M
40 mem_limit = 1024M
41 }
42
43 # search daemon settings
44 searchd
45 {
46 port = 3312
47 log = /usr/local/sphinx/var/log/searchd.log
48 query_log = /usr/local/sphinx/var/log/query.log
49 read_timeout = 5
50 max_children = 30
51 pid_file = /usr/local/sphinx/var/log/searchd.pid
52 max_matches = 1000
53 seamless_rotate = 1
54 preopen_indexes = 0
55 unlink_old = 1
56 }
Once we have configure sphinx.conf, we need to run the indexer.
$ /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf --all
Finally we test
$ /usr/local/sphinx/bin/search John $ /usr/local/sphinx/bin/search J* $ /usr/local/sphinx/bin/search *e
Error
$ /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf --all --rotate
Sphinx 0.9.8-rc2 (r1234)
Copyright (c) 2001-2008, Andrew Aksyonoff
using config file '/usr/local/sphinx/etc/sphinx.conf'...
indexing index 'orthodb'...
ERROR: index 'orthodb': sql_fetch_row: Lost connection to MySQL server during query.
total 2087306 docs, 18785754 bytes
total 2462.060 sec, 7630.10 bytes/sec, 847.79 docs/sec
Solution
To solve this problem you have two choices.
1. Reduce memory limit in sphinx.conf
In your sphinx.conf file, set mem_limit = 256M
2. increase wait_timeout in my.cnf
Log into you mysql using the root password.
use mysql; show variables like '%wait_timeout%'; update variables set wait_timeout = 30000;