Solr: Indexing SQL databases made easier!
Update
Part two is now available here!
At the beginning of this year Christopher Vig wrote a great post about indexing an SQL database to the internet’s current search engine du jour, Elasticsearch. This first post in a two part series will show that Apache Solr is a robust and versatile alternative that makes indexing an SQL database just as easy. The second will go deeper into how to make leverage Solr’s features to create a great backend for a people search solution.
Solr ships with a configuration driven contrib called the DataImportHandler. It provides a way to index structured data into Solr in both full and incremental delta imports. We will cover a simple use case of the tool i.e. indexing a database containing personnel data to form the basis of a people search solution. You can also easily extend the DataImportHandler tool via various APIs to pre-process data and handle more complex use cases.
For now, let’s stick with basic indexing of an SQL database.
Setting up our environment
Before we get started, there are a few requirements:
- Java 1.7 or greater
- For this demo we’ll be using a MySQL database
- A copy of the sample employees database
- The MySQL jdbc driver
With that out of the way, let’s get Solr up and running and ready for database indexing:
- Download Solr and extract it to a directory of your choice.
- Open solr-4.9.0/example/solr/collection1/conf/solrconfig.xml in a text editor and add the following within the config tags:
- In the same directory, open schema.xml and add this this line
- Create a lib subdir in solr-4.9.0/solr/collection1/ and extract the MySQL jdbc driver jar into it. It’s the file called mysql-connector-java-{version}-bin.jar
- To start Solr, open a terminal and navigate to the example subdir in your extracted Solr directory and run
java -jar start.jar
When started this way, Solr runs by default on port 8983. If you need to change this, edit solr-4.9.0/example/etc/jetty.xml and restart Solr.
Navigate to http://localhost:8983/solr and you should see the Solr admin GUI splash page. From here, use the Core Selector dropdown button to select the default core and then click on the Dataimport option. Expanding the Configuration section should show an XML response with a stacktrace with a message along the lines of Can't find resource 'db-data-config.xml' in classpath
. This is normal as we haven’t actually created this file yet, which stores the configs for connecting to our target database.
We’ll come back to that file later but let’s make our demo database now. If you haven’t already downloaded the sample employees database and installed MySQL, now would be a good time!
Setting up our database
Assuming your MySQL server is installed and running, access the MySQL terminal and create the empty employees database: create database employees;
Exit the MySQL terminal and import the employees.sql into your empty database, ensuring that you carry out the following command from the same directory as the employees.sql file itself: mysql -u root -p employees < employees.sql
You can test this was successful by logging into the MySql server and querying the database, as shown here on the right.
Having successfully created and populated your employee database, we can now create that missing db-data-config.xml file.
Indexing our database
In your Solr conf directory, which contains the schema.xml and solrconfig.xml we previously modified, create a new file called db-data-config.xml.
Its contents should look like the example below. Make sure to replace the user and password values with yours and feel free to modify or remove the limit parameter. There’s approximately 30’000 entries in the employees table in total
We’re now going to make use of Solr’s REST-like HTTP API with a couple of commands worth saving. I prefer to use the Postman app on Chrome and have created a public collection of HTTP requests, which you can import into Postman’s Collections view using this url: https://www.getpostman.com/collections/9e95b8130556209ed643
For those of you not using Chrome, here are the commands you will need: First let’s reload the core so that Solr is
aware of the new db-data-config.xml file we have created.
Next, we index our database with the HTTP request or from within the Solr Admin GUI on the DataImport page.
Here we have carried out a full index of our database using the full-import command parameter. To only retrieve changes since the last import, we would use delta-import instead.
We can confirm that our database import was successful by querying our index with the “Retrieve all” and “Georgi query” requests.
Finally, to schedule reindexing you can use a simple cronjob. This one, for example, will run everyday at 23:00 and retrieve all changes since the previous indexing operation:
Conclusion
So far we have successfully
- Setup a database with content
- Indexed the database into our Solr index
- Setup basic scheduled delta reindexing
In the next part of this two part series we will look at how to process our indexed data. Specifically, with a view to making a good people search solution. We will implement several features such as phonetic search, spellcheck and basic query completion. In the meantime, let’s carry on the conversation in the comments below!
You’ve made getting started with aa demo project using solr and MySql very easy here. When will the part 2 of this be coming up. I will love to following your steps in implementing this to the point of been able to create searches, facet search, handling indexing issues etc. Thanks
Hi Dan! I’m glad you found the post useful. Is there something in particular that you would like to see me cover? I also want to revisit the topics I covered in the post to see how well the instructions hold up with Solr 5 before writing the next part :)
I’m interested in learning how you’d do phonetic search, spellcheck and basic query completion.
[…] Configuring the data import handler and connecting it to a database […]
hello seb,i got lost!don’t you have to set the path where your database(employees.sql) exist,in order for the connector to retrieve the data and be seen on solr.please help me
Hi Mikaelo! What specifically are you having difficulty with? Have you exported the employees data from the employees.sql file into your local database? The command to export the employees data into the database (mysql -u root -p employees < employees.sql) assumes that you're in the same directory as the employees.sql file you downloaded.
i am trying to do something different.i have already created a database called a_q using mysql server.is there any way i can index it on solr??sorry for the misleading previous comment
No problem :) You probably need to do the following in the db-data-config.xml file:
Assuming you haven’t changed the mysql default port, replace url=”jdbc:mysql://localhost:3306/employees” with url=”jdbc:mysql://localhost:3306/a_q”
Replace the username and password with the appropriate values
Change the query value to yours, making sure to include an id column like I did with
emp_no as 'id'
otherwise Solr won’t index the documents