<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Search Nuggets &#187; indexing</title>
	<atom:link href="http://blog.comperiosearch.com/blog/tag/indexing/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.comperiosearch.com</link>
	<description>A blog about Search as THE solution</description>
	<lastBuildDate>Mon, 13 Jun 2016 08:59:45 +0000</lastBuildDate>
	<language>en-US</language>
		<sy:updatePeriod>hourly</sy:updatePeriod>
		<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=3.9.40</generator>
	<item>
		<title>Solr: Indexing SQL databases made easier!</title>
		<link>http://blog.comperiosearch.com/blog/2014/08/28/indexing-database-using-solr/</link>
		<comments>http://blog.comperiosearch.com/blog/2014/08/28/indexing-database-using-solr/#comments</comments>
		<pubDate>Thu, 28 Aug 2014 12:05:17 +0000</pubDate>
		<dc:creator><![CDATA[Seb Muller]]></dc:creator>
				<category><![CDATA[English]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[indexing]]></category>
		<category><![CDATA[jdbc]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[people search]]></category>

		<guid isPermaLink="false">http://blog.comperiosearch.com/?p=2848</guid>
		<description><![CDATA[Update Part two is now available here! At the beginning of this year Christopher Vig wrote a great post about indexing an SQL database to the internet&#8217;s current search engine du jour, Elasticsearch. This first post in a two part series will show that Apache Solr is a robust and versatile alternative that makes indexing [...]]]></description>
				<content:encoded><![CDATA[<h3>Update</h3>
<p>Part two is now available <a href="http://blog.comperiosearch.com/blog/2015/04/14/solr-indexing-index-sql-databases-made-easier-part-2/">here!</a></p>
<hr />
<p>At the beginning of this year <a href="http://blog.comperiosearch.com/blog/author/cvig/">Christopher Vig</a> wrote a <a href="http://blog.comperiosearch.com/blog/2014/01/30/elasticsearch-indexing-sql-databases-the-easy-way/">great post</a> about indexing an SQL database into the internet&#8217;s current search engine du jour, <a href="http://www.elasticsearch.org/">Elasticsearch</a>. This first post in a two-part series will show that <a href="http://lucene.apache.org/solr/">Apache Solr</a> is a robust and versatile alternative that makes indexing an SQL database just as easy. The second will go deeper into how to leverage Solr&#8217;s features to create a great backend for a people search solution.</p>
<p>Solr ships with a configuration-driven contrib module called the <a href="http://wiki.apache.org/solr/DataImportHandler">DataImportHandler</a>. It provides a way to index structured data into Solr through both full imports and incremental delta imports. We will cover a simple use case of the tool: indexing a database containing personnel data to form the basis of a people search solution. You can also extend the DataImportHandler via various <a href="http://wiki.apache.org/solr/DataImportHandler#Extending_the_tool_with_APIs">APIs</a> to pre-process data and handle more complex use cases.</p>
<p>For now, let&#8217;s stick with basic indexing of an SQL database.</p>
<h2>Setting up our environment</h2>
<p>Before we get started, there are a few requirements:</p>
<ol>
<li>Java 1.7 or greater</li>
<li>For this demo we&#8217;ll be using a <a href="http://dev.mysql.com/downloads/mysql/">MySQL</a> database</li>
<li>A copy of the <a href="https://launchpad.net/test-db/employees-db-1/1.0.6/+download/employees_db-full-1.0.6.tar.bz2">sample employees database</a></li>
<li>The MySQL <a href="http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.32.tar.gz">jdbc driver</a></li>
</ol>
<p>With that out of the way, let&#8217;s get Solr up and running and ready for database indexing:</p>
<ol>
<li>Download <a href="https://lucene.apache.org/solr/downloads.html">Solr</a> and extract it to a directory of your choice.</li>
<li>Open solr-4.9.0/example/solr/collection1/conf/solrconfig.xml in a text editor and add the following within the config tags:  <script src="https://gist.github.com/dd7cef212fd7f6a415b5.js?file=DataImportHandler"></script></li>
<li>In the same directory, open schema.xml and add this line:  <script src="https://gist.github.com/5bbc8c6e1a5b617b5d16.js?file=names"></script></li>
<li>Create a lib subdirectory in solr-4.9.0/example/solr/collection1/ and copy the MySQL JDBC driver jar into it. It&#8217;s the file called mysql-connector-java-{version}-bin.jar in the driver archive.</li>
<li>To start Solr, open a terminal and navigate to the example subdir in your extracted Solr directory and run <code>java -jar start.jar</code></li>
</ol>
<p>When started this way, Solr runs by default on port 8983. If you need to change this, edit solr-4.9.0/example/etc/jetty.xml and restart Solr.</p>
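<p>Steps 4 and 5 above can be sketched as two small shell functions. This is only an illustration: the <code>SOLR_DIR</code> and <code>DRIVER_DIR</code> locations are assumptions about where you extracted the archives, and the core is assumed to live under example/solr/collection1, next to the conf directory edited earlier.</p>
<pre><code># Hypothetical locations; adjust them to wherever you extracted the archives.
SOLR_DIR="${SOLR_DIR:-$HOME/solr-4.9.0}"
DRIVER_DIR="${DRIVER_DIR:-$HOME/mysql-connector-java-5.1.32}"

# Create the lib directory of the core and copy the JDBC driver jar into it.
install_driver() {
  lib_dir="$SOLR_DIR/example/solr/collection1/lib"
  mkdir -p "$lib_dir" || return 1
  cp "$DRIVER_DIR"/mysql-connector-java-*-bin.jar "$lib_dir"/
}

# Start Solr with the bundled Jetty from the example directory.
start_solr() {
  cd "$SOLR_DIR/example" || return 1
  java -jar start.jar
}

# Usage:
#   install_driver
#   start_solr
</code></pre>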
<p>Navigate to <a href="http://localhost:8983/solr">http://localhost:8983/solr</a> and you should see the Solr admin GUI splash page. From here, use the Core Selector dropdown to select the default core and then click on the Dataimport option. Expanding the Configuration section should show an XML response containing a stack trace with a message along the lines of <code>Can't find resource 'db-data-config.xml' in classpath</code>. This is normal: we haven&#8217;t yet created this file, which stores the configuration for connecting to our target database.</p>
<p>We&#8217;ll come back to that file later but let&#8217;s make our demo database now. If you haven&#8217;t already downloaded the sample employees database and installed MySQL, now would be a good time!</p>
<h2>Setting up our database</h2>
<p>Assuming your MySQL server is installed <a href="http://blog.comperiosearch.com/wp-content/uploads/2014/12/createdatabase.png"><img class="alignright size-full wp-image-2900" src="http://blog.comperiosearch.com/wp-content/uploads/2014/12/createdatabase-300x226.png" alt="Prepare indexing database" width="300" height="226" /></a>and running, access the MySQL terminal and create the empty employees database: <code>create database employees;</code></p>
<p>Exit the MySQL terminal and import the employees.sql into your empty database, ensuring that you carry out the following command from the same directory as the employees.sql file itself: <code>mysql -u root -p employees &lt; employees.sql</code></p>
<p>You can verify that the import succeeded by logging <a href="http://blog.comperiosearch.com/wp-content/uploads/2014/08/testdatabase.png"><img class="alignright size-medium wp-image-2900" src="http://blog.comperiosearch.com/wp-content/uploads/2014/08/testdatabase-276x300.png" alt="Verify indexing database" width="276" height="300" /></a>into the MySQL server and querying the database, as shown here on the right.</p>
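<p>If you would rather check from the shell than from the MySQL terminal, something along these lines should work. It is a sketch that assumes the <code>root</code> user and the <code>employees</code> database name used above; the function names are made up for this example.</p>
<pre><code># Count the rows in the employees table; mysql prompts for the password.
count_employees() {
  mysql -u root -p employees -e "SELECT COUNT(*) FROM employees;"
}

# Show a few sample rows to confirm the columns look as expected.
sample_employees() {
  mysql -u root -p employees -e "SELECT emp_no, first_name, last_name FROM employees LIMIT 5;"
}

# Usage:
#   count_employees
#   sample_employees
</code></pre>
<p>After a complete import the count should be in the region of 300,000 rows.</p>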
<p>Having successfully created and populated your employee database, we can now create that missing db-data-config.xml file.</p>
<h2>Indexing our database</h2>
<p>In your Solr conf directory, which contains the schema.xml and solrconfig.xml we previously modified, create a new file called db-data-config.xml.</p>
<p>Its contents should look like the example below. Make sure to replace the user and password values with yours, and feel free to modify or remove the limit parameter. There are approximately 300,000 entries in the employees table in total. <script src="https://gist.github.com/03935f1384e150504363.js?file=db-data-config"></script></p>
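<p>As a rough sketch of the general shape of such a file, the shell function below writes a minimal DataImportHandler configuration. Several details are assumptions rather than the exact setup used in this post: the MySQL server is assumed to listen on localhost:3306, the target field names <code>first_name_s</code> and <code>last_name_s</code> rely on the <code>*_s</code> dynamic fields in the stock example schema, and <code>LIMIT 1000</code> is an arbitrary value. The function name <code>write_dih_config</code> is made up for this example.</p>
<pre><code># Write a minimal db-data-config.xml.
# $1: Solr conf directory, $2: MySQL user, $3: MySQL password
write_dih_config() {
  cat &gt; "$1/db-data-config.xml" &lt;&lt;EOF
&lt;dataConfig&gt;
  &lt;dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/employees"
              user="$2"
              password="$3"/&gt;
  &lt;document&gt;
    &lt;!-- LIMIT keeps a first test import small; remove it to index every row --&gt;
    &lt;entity name="employee"
            query="SELECT emp_no, first_name, last_name FROM employees LIMIT 1000"&gt;
      &lt;field column="emp_no" name="id"/&gt;
      &lt;field column="first_name" name="first_name_s"/&gt;
      &lt;field column="last_name" name="last_name_s"/&gt;
    &lt;/entity&gt;
  &lt;/document&gt;
&lt;/dataConfig&gt;
EOF
}

# Usage (the path is hypothetical):
#   write_dih_config "$HOME/solr-4.9.0/example/solr/collection1/conf" root secret
</code></pre>
<p>Note that the user and password are interpolated as-is, so values containing XML special characters would need escaping first.</p>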
<p>We&#8217;re now going to make use of Solr&#8217;s REST-like HTTP API with a couple of commands worth saving. I prefer to use the <a href="https://chrome.google.com/webstore/detail/postman-rest-client/fdmmgilgnpjigdojojpjoooidkmcomcm">Postman app</a> on Chrome and have created a public collection of HTTP requests, which you can import into Postman&#8217;s Collections view using this url: <a href="https://www.getpostman.com/collections/9e95b8130556209ed643">https://www.getpostman.com/collections/9e95b8130556209ed643</a></p>
<p>For those of you not using Chrome, here are the commands you will need:<script src="https://gist.github.com/05a2a1dd01a6c5a4517b.js?file=solr-http"></script> First let&#8217;s reload the core so that Solr is <a href="http://blog.comperiosearch.com/wp-content/uploads/2014/08/reloadcore.png"><img class="alignright size-medium wp-image-2921" src="http://blog.comperiosearch.com/wp-content/uploads/2014/08/reloadcore-300x181.png" alt="Reload Solr core" width="300" height="181" /></a><br />
aware of the new db-data-config.xml file we have created.<br />
Next, we index our database with the <a href="http://blog.comperiosearch.com/wp-content/uploads/2014/08/indexdb.png"><img class="alignright size-medium wp-image-2923" src="http://blog.comperiosearch.com/wp-content/uploads/2014/08/indexdb-300x181.png" alt="Index database to Solr" width="300" height="181" /></a>HTTP request or from within the Solr Admin GUI on the DataImport page.</p>
<p>Here we have carried out a full index of our database using the full-import command parameter. To only retrieve changes since the last import, we would use delta-import instead.</p>
<p>We can confirm that our database import was successful by querying our index with the &#8220;Retrieve all&#8221; and &#8220;Georgi query&#8221; requests.</p>
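<p>For command-line users, the reload, import and query requests can be sketched with curl. The assumptions here are that the core is the default <code>collection1</code>, that the DataImportHandler is registered at <code>/dataimport</code> in solrconfig.xml, and that a bare <code>Georgi</code> query hits a default search field that includes the name fields. The function names are made up for this example.</p>
<pre><code># Base URL and core name; collection1 is the default core in the Solr example.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"
CORE="${CORE:-collection1}"

# Reload the core so Solr picks up db-data-config.xml (CoreAdmin RELOAD action).
reload_core() {
  curl -s -G "$SOLR_URL/admin/cores" -d action=RELOAD -d "core=$CORE"
}

# Run a DataImportHandler command such as full-import, delta-import or status.
# Assumes the handler is registered at /dataimport in solrconfig.xml.
dih() {
  curl -s -G "$SOLR_URL/$CORE/dataimport" -d "command=$1"
}

# Query the index; --data-urlencode URL-encodes the query string for us.
search() {
  curl -s -G "$SOLR_URL/$CORE/select" --data-urlencode "q=$1" -d wt=json
}

# Usage:
#   reload_core
#   dih full-import
#   dih status
#   search '*:*'
#   search 'Georgi'
</code></pre>
<p>The import runs in the background, so <code>dih status</code> is a convenient way to check on its progress before querying.</p>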
<p>Finally, to schedule reindexing you can use a simple cron job. This one, for example, will run every day at 23:00 and retrieve all changes since the previous indexing operation:<script src="https://gist.github.com/47f6df5a306e4cd51617.js?file=delta"></script></p>
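<p>As a rough sketch of such a schedule, a crontab entry might look like the one below. The core name <code>collection1</code> and the <code>/dataimport</code> handler path are assumptions. Keep in mind that <code>delta-import</code> only picks up changes if the entity in db-data-config.xml defines a <code>deltaQuery</code> and a <code>deltaImportQuery</code>, which in turn need a column recording when each row was last modified.</p>
<pre><code># m  h  dom mon dow  command
# Every day at 23:00, ask the DataImportHandler for a delta import.
0 23 * * * curl -s -o /dev/null "http://localhost:8983/solr/collection1/dataimport?command=delta-import"
</code></pre>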
<h2>Conclusion</h2>
<p>So far we have successfully:</p>
<ul>
<li>Set up a database with content</li>
<li>Indexed the database into our Solr index</li>
<li>Set up basic scheduled delta reindexing</li>
</ul>
<p>In the next part of this two-part series we will look at how to process our indexed data, specifically with a view to building a good people search solution. We will implement several features such as phonetic search, spellcheck and basic query completion. In the meantime, let&#8217;s carry on the conversation in the comments below!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.comperiosearch.com/blog/2014/08/28/indexing-database-using-solr/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
	</channel>
</rss>
