<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Search Nuggets &#187; people search</title>
	<atom:link href="http://blog.comperiosearch.com/blog/tag/people-search/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.comperiosearch.com</link>
	<description>A blog about Search as THE solution</description>
	<lastBuildDate>Mon, 13 Jun 2016 08:59:45 +0000</lastBuildDate>
	<language>en-US</language>
		<sy:updatePeriod>hourly</sy:updatePeriod>
		<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=3.9.40</generator>
	<item>
		<title>Solr: Indexing SQL databases made easier!</title>
		<link>http://blog.comperiosearch.com/blog/2014/08/28/indexing-database-using-solr/</link>
		<comments>http://blog.comperiosearch.com/blog/2014/08/28/indexing-database-using-solr/#comments</comments>
		<pubDate>Thu, 28 Aug 2014 12:05:17 +0000</pubDate>
		<dc:creator><![CDATA[Seb Muller]]></dc:creator>
				<category><![CDATA[English]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[indexing]]></category>
		<category><![CDATA[jdbc]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[people search]]></category>

		<guid isPermaLink="false">http://blog.comperiosearch.com/?p=2848</guid>
		<description><![CDATA[Update Part two is now available here! At the beginning of this year Christopher Vig wrote a great post about indexing an SQL database to the internet&#8217;s current search engine du jour, Elasticsearch. This first post in a two part series will show that Apache Solr is a robust and versatile alternative that makes indexing [...]]]></description>
				<content:encoded><![CDATA[<h3>Update</h3>
<p>Part two is now available <a href="http://blog.comperiosearch.com/blog/2015/04/14/solr-indexing-index-sql-databases-made-easier-part-2/">here!</a></p>
<hr />
<p>At the beginning of this year <a href="http://blog.comperiosearch.com/blog/author/cvig/">Christopher Vig</a> wrote a <a href="http://blog.comperiosearch.com/blog/2014/01/30/elasticsearch-indexing-sql-databases-the-easy-way/">great post</a> about indexing an SQL database to the internet&#8217;s current search engine du jour, <a href="http://www.elasticsearch.org/">Elasticsearch</a>. This first post in a two-part series will show that <a href="http://lucene.apache.org/solr/">Apache Solr</a> is a robust and versatile alternative that makes indexing an SQL database just as easy. The second will go deeper into how to leverage Solr&#8217;s features to create a great backend for a people search solution.</p>
<p>Solr ships with a configuration-driven contrib called the <a href="http://wiki.apache.org/solr/DataImportHandler">DataImportHandler</a>. It provides a way to index structured data into Solr via both full and incremental (delta) imports. We will cover a simple use case of the tool, i.e. indexing a database containing personnel data to form the basis of a people search solution. You can also easily extend the DataImportHandler tool via various <a href="http://wiki.apache.org/solr/DataImportHandler#Extending_the_tool_with_APIs">APIs</a> to pre-process data and handle more complex use cases.</p>
<p>For now, let&#8217;s stick with basic indexing of an SQL database.</p>
<h2>Setting up our environment</h2>
<p>Before we get started, there are a few requirements:</p>
<ol>
<li>Java 1.7 or greater</li>
<li>For this demo we&#8217;ll be using a <a href="http://dev.mysql.com/downloads/mysql/">MySQL</a> database</li>
<li>A copy of the <a href="https://launchpad.net/test-db/employees-db-1/1.0.6/+download/employees_db-full-1.0.6.tar.bz2">sample employees database</a></li>
<li>The MySQL <a href="http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.32.tar.gz">jdbc driver</a></li>
</ol>
<p>With that out of the way, let&#8217;s get Solr up and running and ready for database indexing:</p>
<ol>
<li>Download <a href="https://lucene.apache.org/solr/downloads.html">Solr</a> and extract it to a directory of your choice.</li>
<li>Open solr-4.9.0/example/solr/collection1/conf/solrconfig.xml in a text editor and add the following within the config tags:  <script src="https://gist.github.com/dd7cef212fd7f6a415b5.js?file=DataImportHandler"></script></li>
<li>In the same directory, open schema.xml and add this line: <script src="https://gist.github.com/5bbc8c6e1a5b617b5d16.js?file=names"></script></li>
<li>Create a lib subdir in solr-4.9.0/example/solr/collection1/ and extract the MySQL jdbc driver jar into it. It&#8217;s the file called mysql-connector-java-{version}-bin.jar</li>
<li>To start Solr, open a terminal and navigate to the example subdir in your extracted Solr directory and run <code>java -jar start.jar</code></li>
</ol>
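<p>In case the embedded gists don&#8217;t load, the solrconfig.xml addition boils down to loading the DataImportHandler contrib jars and registering the handler endpoint. This is a sketch, not the gist&#8217;s exact contents &#8212; the jar path is relative to the core&#8217;s instanceDir as in the stock Solr 4.9 example, and the handler name and config filename match what the rest of this post assumes:</p>

```xml
<!-- solrconfig.xml: load the DataImportHandler contrib jars from the dist directory -->
<lib dir="../../../dist/" regex="solr-dataimporthandler-.*\.jar" />

<!-- Register the handler and point it at the data config file we create later -->
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>
```

<p>The schema.xml line from step 3 presumably declares the fields we will import from the database, along the lines of <code>&lt;field name="first_name" type="text_general" indexed="true" stored="true" /&gt;</code> &#8212; the exact field names and types depend on the gist.</p>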
<p>When started this way, Solr runs by default on port 8983. If you need to change this, edit solr-4.9.0/example/etc/jetty.xml and restart Solr.</p>
<p>Navigate to <a href="http://localhost:8983/solr">http://localhost:8983/solr</a> and you should see the Solr admin GUI splash page. From here, use the Core Selector dropdown to select the default core and then click on the Dataimport option. Expanding the Configuration section should show an XML response containing a stack trace with a message along the lines of <code>Can't find resource 'db-data-config.xml' in classpath</code>. This is normal, as we haven&#8217;t yet created this file, which stores the configuration for connecting to our target database.</p>
<p>We&#8217;ll come back to that file later but let&#8217;s make our demo database now. If you haven&#8217;t already downloaded the sample employees database and installed MySQL, now would be a good time!</p>
<h2>Setting up our database</h2>
<p>Assuming your MySQL server is installed <a href="http://blog.comperiosearch.com/wp-content/uploads/2014/12/createdatabase.png"><img class="alignright size-full wp-image-2900" src="http://blog.comperiosearch.com/wp-content/uploads/2014/12/createdatabase-300x226.png" alt="Prepare indexing database" width="300" height="226" /></a>and running, access the MySQL terminal and create the empty employees database: <code>create database employees;</code></p>
<p>Exit the MySQL terminal and import the employees.sql into your empty database, ensuring that you carry out the following command from the same directory as the employees.sql file itself: <code>mysql -u root -p employees &lt; employees.sql</code></p>
<p>You can test this was successful by logging <a href="http://blog.comperiosearch.com/wp-content/uploads/2014/08/testdatabase.png"><img class="alignright size-medium wp-image-2900" src="http://blog.comperiosearch.com/wp-content/uploads/2014/08/testdatabase-276x300.png" alt="Verify indexing database" width="276" height="300" /></a>into the MySQL server and querying the database, as shown here on the right.</p>
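<p>As a text alternative to the screenshot, a quick sanity check from the MySQL prompt could look like this (table names as defined by the sample employees database):</p>

```sql
USE employees;
SHOW TABLES;                                          -- departments, employees, salaries, titles, ...
SELECT COUNT(*) FROM employees;                       -- total rows in the employees table
SELECT first_name, last_name FROM employees LIMIT 5;  -- a few sample rows
```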
<p>Having successfully created and populated your employee database, we can now create that missing db-data-config.xml file.</p>
<h2>Indexing our database</h2>
<p>In your Solr conf directory, which contains the schema.xml and solrconfig.xml we previously modified, create a new file called db-data-config.xml.</p>
<p>Its contents should look like the example below. Make sure to replace the user and password values with your own, and feel free to modify or remove the limit parameter; there are roughly 300,000 entries in the employees table in total. <script src="https://gist.github.com/03935f1384e150504363.js?file=db-data-config"></script></p>
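<p>If the gist doesn&#8217;t render, a db-data-config.xml for this setup can be sketched roughly as follows. The connection URL, credentials, field list and limit are placeholders to adapt; the column-to-field mapping must match the fields declared in your schema.xml:</p>

```xml
<dataConfig>
  <!-- JDBC connection to the local employees database; replace user/password with your own -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/employees"
              user="root"
              password="secret"/>
  <document>
    <!-- One Solr document per employee row; drop the LIMIT to index everything -->
    <entity name="employee"
            query="SELECT emp_no, first_name, last_name, gender, hire_date
                   FROM employees LIMIT 10000">
      <field column="emp_no"     name="id"/>
      <field column="first_name" name="first_name"/>
      <field column="last_name"  name="last_name"/>
    </entity>
  </document>
</dataConfig>
```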
<p>We&#8217;re now going to make use of Solr&#8217;s REST-like HTTP API with a couple of commands worth saving. I prefer to use the <a href="https://chrome.google.com/webstore/detail/postman-rest-client/fdmmgilgnpjigdojojpjoooidkmcomcm">Postman app</a> on Chrome and have created a public collection of HTTP requests, which you can import into Postman&#8217;s Collections view using this url: <a href="https://www.getpostman.com/collections/9e95b8130556209ed643">https://www.getpostman.com/collections/9e95b8130556209ed643</a></p>
<p>For those of you not using Chrome, here are the commands you will need:<script src="https://gist.github.com/05a2a1dd01a6c5a4517b.js?file=solr-http"></script> First let&#8217;s reload the core so that Solr is <a href="http://blog.comperiosearch.com/wp-content/uploads/2014/08/reloadcore.png"><img class="alignright size-medium wp-image-2921" src="http://blog.comperiosearch.com/wp-content/uploads/2014/08/reloadcore-300x181.png" alt="Reload Solr core" width="300" height="181" /></a><br />
aware of the new db-data-config.xml file we have created.<br />
Next, we index our database with the <a href="http://blog.comperiosearch.com/wp-content/uploads/2014/08/indexdb.png"><img class="alignright size-medium wp-image-2923" src="http://blog.comperiosearch.com/wp-content/uploads/2014/08/indexdb-300x181.png" alt="Index database to Solr" width="300" height="181" /></a>HTTP request or from within the Solr Admin GUI on the DataImport page.</p>
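<p>For reference, these requests are plain HTTP GETs, so you can issue them with curl instead of Postman. The core name collection1 matches the default example setup; the exact requests in the gist may differ slightly:</p>

```shell
# Reload the core so Solr picks up the new db-data-config.xml
curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=collection1"

# Kick off a full import of the database
curl "http://localhost:8983/solr/collection1/dataimport?command=full-import"

# Optionally poll the import status while it runs
curl "http://localhost:8983/solr/collection1/dataimport?command=status"
```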
<p>Here we have carried out a full index of our database using the full-import command parameter. To only retrieve changes since the last import, we would use delta-import instead.</p>
<p>We can confirm that our database import was successful by querying our index with the &#8220;Retrieve all&#8221; and &#8220;Georgi query&#8221; requests.</p>
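<p>Those two verification requests are ordinary select queries. Assuming the first_name field from our data config, they can be reproduced like so (the exact queries in the Postman collection may differ):</p>

```shell
# "Retrieve all": match every document in the index
curl "http://localhost:8983/solr/collection1/select?q=*:*&wt=json&rows=3"

# "Georgi query": search for employees by first name
curl "http://localhost:8983/solr/collection1/select?q=first_name:Georgi&wt=json"
```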
<p>Finally, to schedule reindexing you can use a simple cronjob. This one, for example, will run every day at 23:00 and retrieve all changes since the previous indexing operation:<script src="https://gist.github.com/47f6df5a306e4cd51617.js?file=delta"></script></p>
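<p>Such a crontab entry could look like the sketch below (assuming curl is available and Solr runs on the default port). Note that delta-import also requires delta queries to be defined in db-data-config.xml so the handler knows how to find changed rows:</p>

```shell
# m h dom mon dow  command
# Runs every day at 23:00 and triggers an incremental delta import
0 23 * * * curl -s "http://localhost:8983/solr/collection1/dataimport?command=delta-import" > /dev/null
```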
<h2>Conclusion</h2>
<p>So far we have successfully</p>
<ul>
<li>Set up a database with content</li>
<li>Indexed the database into our Solr index</li>
<li>Set up basic scheduled delta reindexing</li>
</ul>
<p>In the next part of this two-part series we will look at how to process our indexed data, specifically with a view to making a good people search solution. We will implement several features such as phonetic search, spellcheck and basic query completion. In the meantime, let&#8217;s carry on the conversation in the comments below!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.comperiosearch.com/blog/2014/08/28/indexing-database-using-solr/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Et bedre personsøk</title>
		<link>http://blog.comperiosearch.com/blog/2013/09/23/et-bedre-personsok/</link>
		<comments>http://blog.comperiosearch.com/blog/2013/09/23/et-bedre-personsok/#comments</comments>
		<pubDate>Mon, 23 Sep 2013 08:25:05 +0000</pubDate>
		<dc:creator><![CDATA[Johannes Hoff Holmedahl]]></dc:creator>
				<category><![CDATA[Norwegian]]></category>
		<category><![CDATA[User Experience]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[design patterns]]></category>
		<category><![CDATA[findability]]></category>
		<category><![CDATA[internal search]]></category>
		<category><![CDATA[people search]]></category>
		<category><![CDATA[user experience]]></category>
		<category><![CDATA[ux]]></category>

		<guid isPermaLink="false">http://blog.comperiosearch.com/?p=1707</guid>
		<description><![CDATA[Jeg har laget mange internsøk de siste årene. Og i brukerintervjuene vi har gjennomført i forprosjektene, har vi funnet at de aller fleste leter etter mennesker: Enten etter et telefonnummer, eller etter eksperten på et fagområde. Målet vårt er ofte å lage en &#8220;intern Google&#8221; for kundene våre. Det betyr at vi må forstå hva [...]]]></description>
				<content:encoded><![CDATA[<p><strong>Jeg har laget mange internsøk de siste årene. Og i brukerintervjuene vi har gjennomført i forprosjektene, har vi funnet at de aller fleste leter etter mennesker: Enten etter et telefonnummer, eller etter eksperten på et fagområde.</strong></p>
<p>Målet vårt er ofte å lage en &#8220;intern Google&#8221; for kundene våre. Det betyr at vi må forstå hva brukerne leter etter.</p>
<p><em>Når du søker på &#8220;365&#8243;:</em> Leter du da etter en kollega med internnummer som slutter på 365? Leter du etter personen med ansattnummer 365? Eller leter du rett og slett etter eksperten på produktet deres som heter &#8220;365&#8243;?</p>
<p>Jeg er en stor tilhenger av ett stort søk for bedriften din! Og jeg vet at jeg har flinke kolleger som kjenner veien frem til god relevans og en smart søkemotor.</p>
<p>Men: <strong>Min teori er at et godt søk blir enda bedre, jo mer vi vet om hva du egentlig leter etter.</strong></p>
<p>Vi har to ganske tydelige retninger for personsøk: Enten søker du etter kontaktinfo eller så leter du etter områdeeksperten.</p>
<p><img src="http://blog.comperiosearch.com/wp-content/uploads/2013/09/peoplesearch_figure.png" alt="To directions of people search" style="max-width:100%" /></p>
<p>Lek at vi skal lage et internsøk for Willy Wonka. <strong>Målet vårt er å la Oompa Loompaene bruke mest mulig tid på å lage  fantastisk godteri</strong>, og minst mulig tid på å lete etter oppskrifter, eksperter og telefonnummer.</p>
<p>I tillegg til å ha laget et stort &#8220;internt Google&#8221;, har vi også laget to app&#8217;er for Loompaene. En telefonbok og et ekspertsøk.</p>
<p>Først telefonboken:</p>
<p><img src="http://blog.comperiosearch.com/wp-content/uploads/2013/09/willywonka_phonebook-628x1024.jpg" alt="Willy Wonka Phonebook" height="400px" style="max-width:100%;max-height:400px" /></p>
<p>Willy Wonka PhoneBook er akkurat det det høres ut som. En app på datamaskinen, nettbrettet eller smarttelefonen som lar Loompaene søke internt etter et telefonnummer eller en Lync-kontakt. I eksempelet over husker ikke brukeren fornavnet på kollegaen, men vet at han heter Loompa til etternavn og at han jobber i sjokoladeavdelingen. <em>(Og slapp av! Selv om du ikke ser forskjell klarer en Oompa Loompa å skille ansiktene fra hverandre.)</em></p>
<p>Dette løser et problem vi ikke bare finner på sjokoladefabrikker, men hos de fleste av våre kunder; å finne telefonnummeret. Og på denne måten har vi enda større sjanse for å gi dem rett svar på topp i resultatlisten.</p>
<p>Fordi brukerhistorien er så enkel som &#8220;jeg vil finne kontaktinfo til kollegaen min&#8221;, kan vi også <strong>strippe løsningen for sortering og fasetter</strong> – og ende opp med et enda enklere brukergrensesnitt.</p>
<p>Når en Oompa Loompa på den annen side vil finne eksperten på <em>sorbet </em>kan han derimot starte app&#8217;en &#8220;Willy Wonka Expert Search&#8221;:</p>
<p><img class="alignnone  wp-image-1701" src="http://blog.comperiosearch.com/wp-content/uploads/2013/09/willywonka_expertsearch-1024x803.jpg" alt="Willy Wonka Expert Search" width="600" /></p>
<p>Ekspertsøket søker gjennom alle dokumenter som ligger på filserveren til Willy Wonka, og søker gjennom alle diskusjoner, statusoppdateringer, grupper og diskusjoner på Willy Wonkas &#8220;interne Facebook&#8221;. Gjennom å <strong>finne hvem som har skrevet mest om fagområdet <em>sorbet</em></strong>, kan vi mest sannsynlig også vise frem at &#8220;Roger Loompa&#8221; er eksperten på området.</p>
<p>Når vi i tillegg ser at Roger på iskremlaboratoriet har nevnt sorbet i sin siste statusoppdatering &#8230; kan vi vel si at vi har en vinner.</p>
<p>–</p>
<p>Målet er ikke å lage en haug med app&#8217;er. Langt i fra! Målet er derimot å gi brukeren rett svar på spørsmålet sitt – hver eneste gang.</p>
<p>Å lage løsninger som forteller oss mer om hva brukeren leter etter, gjerne allerede før de har gjort et søk, øker sannsynligheten for å gi det rette svaret med en gang.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.comperiosearch.com/blog/2013/09/23/et-bedre-personsok/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Better people search</title>
		<link>http://blog.comperiosearch.com/blog/2013/09/23/better-people-search/</link>
		<comments>http://blog.comperiosearch.com/blog/2013/09/23/better-people-search/#comments</comments>
		<pubDate>Mon, 23 Sep 2013 07:57:56 +0000</pubDate>
		<dc:creator><![CDATA[Johannes Hoff Holmedahl]]></dc:creator>
				<category><![CDATA[English]]></category>
		<category><![CDATA[User Experience]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[design patterns]]></category>
		<category><![CDATA[findability]]></category>
		<category><![CDATA[internal search]]></category>
		<category><![CDATA[people search]]></category>
		<category><![CDATA[user experience]]></category>
		<category><![CDATA[ux]]></category>

		<guid isPermaLink="false">http://blog.comperiosearch.com/?p=1699</guid>
		<description><![CDATA[For the last year I have made a few internal searches. Almost every user interview I attend, we find that people are looking for people. Either a colleague’s phone number or email, or an expert on a specific field area. Our goal is often to make “an internal Google” for our customers. That indicates that [...]]]></description>
				<content:encoded><![CDATA[<p><strong>For the last year I have made a few internal searches. Almost every user interview I attend, we find that people are looking for people. Either a colleague’s phone number or email, or an expert on a specific field area.</strong></p>
<p>Our goal is often to make “an internal Google” for our customers. That means we must understand what you are looking for.</p>
<p><em>When you search for “365”:</em> Are you looking for the person with the internal phone number 365? Are you looking for a colleague with the employee number 365? Or are you actually looking for the expert on your product, named “365”?</p>
<p>I am all for building one big search for your company! And I know I have skilled colleagues who can work their magic to resolve these kinds of ambiguities.</p>
<p>But: <strong>My theory is that a great search gets even greater when we know more about what the users are looking for</strong>.</p>
<p>These are the two directions of people search:</p>
<p><img src="http://blog.comperiosearch.com/wp-content/uploads/2013/09/peoplesearch_figure.png" alt="Two directions of people search" style="max-width:100%" /></p>
<p>Let’s pretend we make a search solution for Willy Wonka. <strong>Our goal is to let the Oompa Loompas spend as much time as possible making candy</strong>, and as little time as possible searching for recipes and phone numbers.</p>
<p>In addition to having one great search where they find everything, we also made two apps for the Loompas. One PhoneBook and one ExpertSearch.</p>
<p><img src="http://blog.comperiosearch.com/wp-content/uploads/2013/09/willywonka_phonebook-628x1024.jpg" alt="Willy Wonka Phonebook" height="400px" style="max-width:100%;max-height:400px" /></p>
<p>The PhoneBook is exactly that: an internal app on the Oompa Loompas’ desktop, smartboard or smartphone, where they can search for the phone number of that Loompa in the chocolate department whose name they can’t remember. <em>(Chill! They can tell their colleagues apart even if you think they all look the same.)</em></p>
<p>This solves a common user need that we find not only in candy factories, but at most of our customers: finding the phone number. And done this way, we have a much greater chance of giving them the right answer.</p>
<p>Because we take away all other user stories than “I want to find my colleague’s contact information”, we can also <strong>take away almost all navigation and refiners</strong> – and end up with a much cleaner interface.</p>
<p>When an Oompa Loompa on the other hand needs help refining the <em>sorbet</em> in the ice cream room, he can start the ExpertSearch app on his desktop, smartboard or smartphone:</p>
<p><img class="alignnone  wp-image-1701" src="http://blog.comperiosearch.com/wp-content/uploads/2013/09/willywonka_expertsearch-1024x803.jpg" alt="Willy Wonka Expert Search" width="600" /></p>
<p>The ExpertSearch searches within all documents written in the Willy Wonka Factory, and goes through all the discussions, social updates and communities on their internal collaboration system. By <strong>finding who has written and talked the most about <em>sorbet</em></strong>, we can most likely end up suggesting “Roger Loompa” as the subject matter expert.</p>
<p>When we see that Roger in the Ice Cream Lab also mentions <em>sorbet</em> in his latest MySite status, I guess we have a winner.</p>
<p>–</p>
<p>Making a lot of apps isn’t the goal in itself. Not at all! The goal is to give the users the right answer at the top of the search result list, every time.</p>
<p>But making solutions that give the user the opportunity to tell us more about what kind of answer he is looking for, even before he makes the search, increases the likelihood of giving the correct answer right away.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.comperiosearch.com/blog/2013/09/23/better-people-search/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
