<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Search Nuggets &#187; search-index</title>
	<atom:link href="http://blog.comperiosearch.com/blog/tag/search-index/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.comperiosearch.com</link>
	<description>A blog about Search as THE solution</description>
	<lastBuildDate>Mon, 13 Jun 2016 08:59:45 +0000</lastBuildDate>
	<language>en-US</language>
		<sy:updatePeriod>hourly</sy:updatePeriod>
		<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=3.9.40</generator>
	<item>
		<title>Idea: Your life searchable through Norch &#8211; NOde seaRCH, IFTTT and Google Drive</title>
		<link>http://blog.comperiosearch.com/blog/2014/11/26/idea-your-life-searchable-norch-node-search-ifttt-google-drive/</link>
		<comments>http://blog.comperiosearch.com/blog/2014/11/26/idea-your-life-searchable-norch-node-search-ifttt-google-drive/#comments</comments>
		<pubDate>Wed, 26 Nov 2014 14:33:08 +0000</pubDate>
		<dc:creator><![CDATA[Espen Klem]]></dc:creator>
				<category><![CDATA[English]]></category>
		<category><![CDATA[User Experience]]></category>
		<category><![CDATA[crawl]]></category>
		<category><![CDATA[Document Processing]]></category>
		<category><![CDATA[Elasticsearch]]></category>
		<category><![CDATA[Google Drive]]></category>
		<category><![CDATA[IFTTT]]></category>
		<category><![CDATA[Index]]></category>
		<category><![CDATA[Json]]></category>
		<category><![CDATA[Life Index]]></category>
		<category><![CDATA[Lifeindex]]></category>
		<category><![CDATA[node]]></category>
		<category><![CDATA[Node Search]]></category>
		<category><![CDATA[node.js]]></category>
		<category><![CDATA[nodejs]]></category>
		<category><![CDATA[norch]]></category>
		<category><![CDATA[Personal Search Engine]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[search engine]]></category>
		<category><![CDATA[search-index]]></category>
		<category><![CDATA[sharepoint]]></category>
		<category><![CDATA[Small Data]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://blog.comperiosearch.com/?p=3069</guid>
		<description><![CDATA[First some disclaimers: This has been posted earlier on lab.klemespen.com. Even though some of these ideas are not what you&#8217;d normally implement in a business environment, some of the concepts can obviously be transferred over to businesses trying to provide an efficient workplace for its employees. Norch is developed by Fergus McDowall, an employee of [...]]]></description>
				<content:encoded><![CDATA[<p><strong>First some disclaimers</strong>:</p>
<ul>
<li>This has been posted earlier on <a href="http://lab.klemespen.com/2014/11/25/idea-your-life-searchable-with-norch-node-search-ifttt-and-google-drive-spreadsheets/">lab.klemespen.com</a>.</li>
<li>Even though some of these ideas are not what you&#8217;d normally implement in a business environment, some of the concepts can obviously be transferred over to businesses trying to provide an efficient workplace for its employees.</li>
<li><a href="https://github.com/fergiemcdowall/norch">Norch</a> is developed by <a href="http://blog.comperiosearch.com/blog/author/fmcdowall/">Fergus McDowall</a>, an employee of Comperio.</li>
</ul>
<p>What if you could index your whole life and make this lifeindex available through search? What would that look like, and how could it help you? Refinding information is obviously one of the use cases for this type of search. I&#8217;m guessing there are a lot more, and I&#8217;m curious to figure out what they are.</p>
<h2>Actions and reactions instead of web pages</h2>
<p>I&#8217;ve had the lifeindex idea for a while now. Originally the idea was to index everything I browsed. Given where <a href="https://github.com/fergiemcdowall/norch">Norch</a> is today, it would take a while before I was anywhere close to achieving that goal. <a href="http://codepen.io/nickmoreton/blog/using-ifttt-and-google-drive-to-create-a-json-api">Then I thought of IFTTT</a>, and saw it as a &#8216;next best thing&#8217;. But then it hit me that now I&#8217;m indexing actions, and that&#8217;s way better than pages. What I&#8217;m still missing from most sources are the reactions to my actions. If I ask a question, I also want to crawl and index the answer. If I make a statement, I want the critique indexed.<span id="more-3069"></span></p>
<p>IFTTT and similar services (like Zapier) are quite limited in their choice of triggers. I&#8217;m not sure whether this is down to choices made by those services or to limitations in the sites they crawl/pull information from.</p>
<p>A quick fix for this, and a generally good idea for search engines, would be to switch from a preview of your content to the actual content in the form of an embed-view. Here&#8217;s an example:</p>
<blockquote class="twitter-tweet" data-width="500"><p lang="en" dir="ltr">Will embed-view of your content replace the preview-pane in modern <a href="https://twitter.com/hashtag/search?src=hash&amp;ref_src=twsrc%5Etfw">#search</a>  <a href="https://twitter.com/hashtag/engine?src=hash&amp;ref_src=twsrc%5Etfw">#engine</a> solutions? Why preview when you can have the real deal?</p>
<p>&mdash; Espen Klem (@eklem) <a href="https://twitter.com/eklem/status/536866049078333440?ref_src=twsrc%5Etfw">November 24, 2014</a></p></blockquote>
<p><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<h2>Technology: Hello IFTTT, Google SpreadSheet and Norch</h2>
<p>IFTTT is triggered by my actions, and stores some data to a series of spreadsheets on Google Drive. <a href="http://jsonformatter.curiousconcept.com/#https://spreadsheets.google.com/feeds/list/1B-OFzKIMVNk_3xMX_jBToGGyxSKv6FoyFYTHpGEy5O0/od6/public/values?alt=json">These spreadsheets can deliver JSON</a>. After a little document processing, these JSON files can be fed to the <a href="https://github.com/fergiemcdowall/norch#norch-indexer">Norch-indexer</a>.</p>
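<p>That little bit of document processing can be sketched like this (a minimal sketch: it assumes the old Google Spreadsheets &#8220;list feed&#8221; JSON shape, where each cell sits under a <code>gsx$&lt;column&gt;</code> key with its value in <code>$t</code>; the sample entry is made up):</p>

```javascript
// Flatten a Google Spreadsheets "list feed" response into plain row objects.
// Assumption: the gsx$<column>/$t shape of the legacy list feed.
function rowsFromFeed(feed) {
  return feed.feed.entry.map(function (entry) {
    var row = {};
    Object.keys(entry).forEach(function (key) {
      if (key.indexOf('gsx$') === 0) {
        row[key.slice(4)] = entry[key].$t; // strip the gsx$ prefix
      }
    });
    return row;
  });
}

// Illustrative entry, not real IFTTT data:
var sample = { feed: { entry: [
  { 'gsx$date': { $t: '2014-11-24' }, 'gsx$text': { $t: 'Hello' } }
] } };

console.log(rowsFromFeed(sample)); // [ { date: '2014-11-24', text: 'Hello' } ]
```

<p>The resulting rows can then be trimmed into whatever shape the Norch-indexer wants.</p>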
<h2>Why hasn&#8217;t this idea popped up earlier?</h2>
<p>Search engines used to be hardware-guzzling technology. With Norch, the &#8220;NOde seaRCH&#8221; engine, that has changed. Elasticsearch and Solr are easy and small compared to, say, SharePoint Search, but they still need a lot of hardware. Norch can run on a Raspberry Pi, and soon it will be able to run in your browser. Maybe data sets closer to <a href="http://en.wikipedia.org/wiki/Small_data">small data</a> are more interesting than <a href="http://en.wikipedia.org/wiki/Big_data">big data</a>?</p>
<p><a href="http://youtu.be/ijLtk5TgvZg"><img src="http://blog.comperiosearch.com/wp-content/uploads/2014/11/Screen-Shot-2014-11-26-at-16.42.27-300x180.png" alt="Video: Norch running on a Raspberry Pi" width="300" height="180" class="alignnone size-medium wp-image-3075" />Norch running on a Raspberry Pi</a></p>
<h2>Why use a search engine?</h2>
<p>It&#8217;s cheap and quick. I&#8217;m not a developer, and I&#8217;ll still be able to glue all these sources together. Search engines are often a good choice when you have multiple sources. IFTTT and Google SpreadSheet make it even easier, normalising the input and delivering it as JSON.</p>
<h2>How far in the process have I come?</h2>
<p><a href="https://testlab3.files.wordpress.com/2014/11/15140752323_1f69685449_o.png"><img class="alignnone size-full wp-image-118" src="https://testlab3.files.wordpress.com/2014/11/15140752323_1f69685449_o.png" alt="Illustration: Setting up sources in IFTTT." width="660" height="469" /></a></p>
<p>So far, I&#8217;ve set up a lot of triggers/sources at IFTTT.com:</p>
<ul>
<li>Instagram: When posting or liking both photos and videos.</li>
<li>Flickr: When posting an image, creating a set or linking a photo.</li>
<li>Google Calendar: When adding something to one of my calendars.</li>
<li>Facebook: When I post a link, am tagged, or post a status message.</li>
<li>Twitter: When I tweet, retweet, reply or if somebody mentions me.</li>
<li>Youtube: When I post or like a video.</li>
<li>GitHub: When I create an issue, get assigned to an issue, or any issue I take part in is closed.</li>
<li>WordPress: When there are new posts or comments on posts.</li>
<li>Android location tracking: When I enter and exit certain areas.</li>
<li>Android phone log: Placed, received and missed calls.</li>
<li>Gmail: Starred emails.</li>
</ul>
<p><a href="https://testlab3.files.wordpress.com/2014/11/screen-shot-2014-11-24-at-13-27-57.png"><img class="alignnone size-full wp-image-127" src="https://testlab3.files.wordpress.com/2014/11/screen-shot-2014-11-24-at-13-27-57.png" alt="Screen Shot 2014-11-24 at 13.27.57" width="660" height="572" /></a></p>
<p><a href="https://testlab3.files.wordpress.com/2014/11/screen-shot-2014-11-24-at-13-31-46.png"><img class="alignnone size-full wp-image-128" src="https://testlab3.files.wordpress.com/2014/11/screen-shot-2014-11-24-at-13-31-46.png" alt="Screen Shot 2014-11-24 at 13.31.46" width="660" height="194" /></a></p>
<p>And I&#8217;ve gotten a good chunk of data. Indexing my SMSes felt a bit creepy, so I stopped doing that. And storing email just sounded too excessive, but I think starred emails would suit the purpose of the project.</p>
<p>Those Google Drive documents are giving me JSON. Not JSON that I can feed directly to the Norch-indexer, though; it needs a little trimming.</p>
<h2>Issues discovered so far</h2>
<h3>Manual work</h3>
<p>This search solution needs a lot of manual setup. Every trigger needs to be set up manually, and every time a new trigger is triggered, I get a new spreadsheet that needs a title row added. Otherwise the JSON variables will look funny, since the first row is used for variable names.</p>
<p>The spreadsheets accept only 2000 rows. After that a new file is created, and I either need to delete content, rename the file or reconfigure some things.</p>
<h3>Level of maturity</h3>
<p><a href="https://testlab3.files.wordpress.com/2014/11/screen-shot-2014-11-24-at-13-41-34.png"><img class="alignnone size-full wp-image-129" src="https://testlab3.files.wordpress.com/2014/11/screen-shot-2014-11-24-at-13-41-34.png" alt="Screen Shot 2014-11-24 at 13.41.34" width="660" height="664" /></a></p>
<p>IFTTT is a really nice service, and they treat their users well. But, for now, it&#8217;s not something you can trust fully.</p>
<h3>Cleaning up duplicates and obsolete stuff</h3>
<p>I have no way of removing stuff from the index automatically at this point. If I delete something I&#8217;ve added/written/created, it will not be reflected in the index.</p>
<h3>Missing sources</h3>
<p>Books I buy, music I listen to, movies and TV series I watch. Or: Amazon, Spotify, Netflix and HBO. Apart from that, there are no Norwegian services available through IFTTT.</p>
<h3>History</h3>
<p>The crawling is triggered by my actions. That leaves me without history. New contacts on LinkedIn, for example, are meaningless when I don&#8217;t get to index the existing ones.</p>
<h2>Next steps</h2>
<h3>JSON clean-up</h3>
<p>I need to build a document processing step. <a href="https://github.com/fergiemcdowall/norch-document-processor">Norch-document-processor</a> would be nice if it handled JSON in addition to HTML. <a href="https://github.com/fergiemcdowall/norch-document-processor/issues/6">Not yet, but maybe in the future</a>? Anyway, there&#8217;s just a small amount of JSON clean-up to do before I get my data indexed.</p>
<p>When this step is done, a first version can be demoed.</p>
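<p>The clean-up step I have in mind might look something like this (a sketch under assumptions: the field names and the id scheme are my own, since all I know so far is that the JSON needs a little trimming):</p>

```javascript
// Trim flattened spreadsheet rows into documents for the indexer.
// Assumption: rows have 'date' and 'text' columns; the id scheme is made up.
function toNorchDocs(rows, source) {
  return rows
    .filter(function (row) { return row.date && row.text; }) // drop incomplete rows
    .map(function (row, i) {
      return {
        id: source + '-' + i,   // a per-source document id
        source: source,         // e.g. 'twitter', 'gmail'
        date: row.date,
        body: row.text
      };
    });
}

var docs = toNorchDocs(
  [{ date: '2014-11-24', text: 'Hello' }, { text: 'row without a date' }],
  'twitter'
);
console.log(docs.length); // 1
```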
<h3>UX and front-end code</h3>
<p>To show the full potential, I need some interaction design for the idea. For now the sketches are all in my head, and they need to be converted to HTML, CSS and an Angular view.</p>
<h3>Embed codes</h3>
<p>Figure out how to embed Instagram, Flickr, Facebook and LinkedIn-posts, Google Maps, federated phonebook search etc.</p>
<h3>OAUTH configuration</h3>
<p>Set up <a href="https://github.com/ciaranj/node-oauth">OAUTH NPM package</a> to access non-public spreadsheets on Google Drive. Then I can add some of the less open information I have stored.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.comperiosearch.com/blog/2014/11/26/idea-your-life-searchable-norch-node-search-ifttt-google-drive/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Elasticsearch: Indexing SQL databases. The easy way.</title>
		<link>http://blog.comperiosearch.com/blog/2014/01/30/elasticsearch-indexing-sql-databases-the-easy-way/</link>
		<comments>http://blog.comperiosearch.com/blog/2014/01/30/elasticsearch-indexing-sql-databases-the-easy-way/#comments</comments>
		<pubDate>Wed, 29 Jan 2014 23:42:24 +0000</pubDate>
		<dc:creator><![CDATA[Christoffer Vig]]></dc:creator>
				<category><![CDATA[English]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[connector]]></category>
		<category><![CDATA[elastic search]]></category>
		<category><![CDATA[Elasticsearch]]></category>
		<category><![CDATA[etl]]></category>
		<category><![CDATA[howto]]></category>
		<category><![CDATA[jdbc]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[search-index]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://blog.comperiosearch.com/?p=1895</guid>
		<description><![CDATA[Elasticsearch is a great search engine, flexible, fast and fun. So how can I get started with it? This post will go through how to get contents from a SQL database into Elasticsearch. Rivers are deprecated since Elasticsearch version 1.5. Read this official statement https://www.elastic.co/blog/deprecating_rivers. However, river-jdbc lives on as elasticsearch JDBC importer. Some day this post [...]]]></description>
				<content:encoded><![CDATA[<p><a title="Elasticsearch" href="http://www.elasticsearch.org">Elasticsearch </a>is a great search engine, flexible, fast and fun. So how can I get started with it? This post will go through how to get contents from a SQL database into Elasticsearch.</p>
<p><span id="more-1895"></span><span style="color: #ff0000;"><strong>Rivers are deprecated since Elasticsearch version 1.5. Read this official statement <a href="https://www.elastic.co/blog/deprecating_rivers"><span style="color: #ff0000;">https://www.elastic.co/blog/deprecating_rivers</span></a>. However, river-jdbc lives on as <a href="https://github.com/jprante/elasticsearch-jdbc">elasticsearch JDBC importer</a>. Some day this post will be updated with instructions for using JDBC importer mode. </strong></span></p>
<p>Elasticsearch has a set of pluggable services called rivers. A river runs inside an Elasticsearch node, and imports content into the index. There are rivers for twitter, redis, files, and of course, SQL databases. The <a title="river-jdbc" href="https://github.com/jprante/elasticsearch-river-jdbc">river-jdbc plugin</a> connects to SQL databases using JDBC adapters. In this post we will use PostgreSQL, since it is freely available, and populate it with some contents that also are freely available.</p>
<p>So let’s get started:</p>
<ol>
<li>Download and install <a title="Elasticsearch download" href="http://www.elasticsearch.org/download/">Elasticsearch</a></li>
<li>Start elasticsearch by running <em>bin/elasticsearch </em>from the installation folder</li>
<li>Install the river-jdbc plugin for Elasticsearch version 1.00RC<br />
<pre class="crayon-plain-tag">./bin/plugin -install river-jdbc -url http://bit.ly/1dKqNJy</pre>
</li>
<li>Download <a href="http://jdbc.postgresql.org/download.html">the PostgreSQL JDBC jar file</a> and copy into the <em>plugins/river-jdbc</em> folder. You should probably <a title="http://jdbc.postgresql.org/download/postgresql-9.3-1100.jdbc41.jar" href="http://jdbc.postgresql.org/download/postgresql-9.3-1100.jdbc41.jar">get the latest version which is for JDBC 41</a></li>
<li>Install PostgreSQL <a href="http://www.postgresql.org/download/">http://www.postgresql.org/download/</a></li>
<li>Import the booktown database. Download the <a title="http://www.commandprompt.com/ppbook/booktown.sql" href="http://www.commandprompt.com/ppbook/booktown.sql">sql file from booktown database</a></li>
<li>Restart Elasticsearch</li>
<li>Start PostgreSQL</li>
</ol>
<p>By this time you should have Elasticsearch and PostgreSQL running, and river-jdbc ready to use.</p>
<p>Now we need to put some contents into the database, using psql, the PostgreSQL command line tool.</p><pre class="crayon-plain-tag">psql -U postgres -f booktown.sql</pre><p>To execute commands to Elasticsearch we will use an online service which functions as a mixture of <a href="https://gist.github.com/">Gist</a>, the code snippet sharing service and <a href="https://chrome.google.com/webstore/detail/sense/doinijnbnggojdlcjifpdckfokbbfpbo">Sense</a>, a Google Chrome plugin developer console for Elasticsearch. The service is hosted by <a title="http://qbox.io" href="http://qbox.io">http://qbox.io</a>, who provide hosted Elasticsearch services.</p>
<p>Check that everything was correctly installed by opening a browser to <a href="http://sense.qbox.io/gist/8361346733fceefd7f364f0ae1ebe7efa856779e">http://sense.qbox.io/gist/8361346733fceefd7f364f0ae1ebe7efa856779e</a></p>
<p>Select the top most line in the left-hand pane, press CTRL+Enter on your keyboard. You may also click on the little triangle that appears to the right, if you are more of a mouse click kind of person.</p>
<p>You should now see a status message, showing the version of Elasticsearch, node name and such.</p>
<p>Now let’s stop beating around the bush and create a river for our database:</p><pre class="crayon-plain-tag">curl -XPUT &quot;http://localhost:9200/_river/mybooks/_meta&quot; -d'
{
&quot;type&quot;: &quot;jdbc&quot;,
&quot;jdbc&quot;: {
&quot;driver&quot;: &quot;org.postgresql.Driver&quot;,
&quot;url&quot;: &quot;jdbc:postgresql://localhost:5432/booktown&quot;,
&quot;user&quot;: &quot;postgres&quot;,
&quot;password&quot;: &quot;postgres&quot;,
&quot;index&quot;: &quot;booktown&quot;,
&quot;type&quot;: &quot;books&quot;,
&quot;sql&quot;: &quot;select * from authors&quot;
}
}'</pre><p>This will create a “one-shot” river that connects to PostgreSQL on Elasticsearch startup, and pulls the contents from the authors table into the booktown index. The index parameter controls which index the data will be put into, and the type parameter decides the type in the Elasticsearch index. To verify that the river was correctly uploaded, execute</p><pre class="crayon-plain-tag">GET /_river/mybooks/_meta</pre><p>Restart Elasticsearch, and watch the log for status messages from river-jdbc. Connection problems, SQL errors or other issues should appear in the log. If everything went OK, you should see something like &#8230;SimpleRiverMouth] bulk [1] success [19 items]</p>
<p>The time has come to check out what we got.</p><pre class="crayon-plain-tag">GET /booktown/_search</pre><p>You should now see all the contents from the authors table. The number of items reported under &#8220;hits&#8221; -&gt; &#8220;total&#8221; is the same as what we just saw in the log: 19.</p>
<p><a href="http://blog.comperiosearch.com/wp-content/uploads/2014/01/hitsSampleBookTown1.png"><img class="alignleft size-full wp-image-1921" src="http://blog.comperiosearch.com/wp-content/uploads/2014/01/hitsSampleBookTown1.png" alt="" width="371" height="431" /></a><br />
But looking more closely at the data, we can see that the _id field has been auto-assigned with some random values. This means that the next time we run the river, all the contents will be re-added.</p>
<p>Luckily, river-jdbc supports some <a title="Labeled columns" href="https://github.com/jprante/elasticsearch-river-jdbc/wiki/Labeled-columns">specially labeled fields</a> that let us control how the contents should be indexed.</p>
<p>Reading up on the docs, we change the SQL definition in our river to</p><pre class="crayon-plain-tag">select id as _id, first_name,
 last_name from authors</pre><p>We need to start afresh and scrap the index we just created:</p><pre class="crayon-plain-tag">DELETE /booktown</pre><p>Restart Elasticsearch. Now you should see a meaningful id in your data.</p>
<p>At this time we could start toying around with queries, mappings and analyzers. But, that&#8217;s not much fun with this little content. We need to join in some tables and get some more interesting data. We can join in the books table, and get all the books for all authors.</p><pre class="crayon-plain-tag">SELECT authors.id as _id, authors.last_name, authors.first_name,
books.id, books.title, books.subject_id 
FROM public.authors left join public.books on books.author_id = authors.id</pre><p>Delete the index, restart Elasticsearch and examine the data. Now you see that we only get one book per author. Executing the SQL statement in pgadmin returns 22 rows, while in Elasticsearch we get 19. This is on account of the _id field: every time a record is indexed with the same _id as an existing one, the new document overwrites the old.</p>
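<p>The overwriting behaviour is easy to see in miniature (illustrative data, not the real booktown rows):</p>

```javascript
// Indexing rows keyed on _id: a later row with the same _id replaces the
// earlier one, which is how 22 SQL rows can end up as 19 documents.
var rows = [
  { _id: 1, last_name: 'Alcott', title: 'Little Women' },
  { _id: 1, last_name: 'Alcott', title: 'Good Wives' }, // same author, second book
  { _id: 2, last_name: 'Tolkien', title: 'The Hobbit' }
];

var index = new Map();
rows.forEach(function (row) {
  index.set(row._id, row); // same _id: previous document is overwritten
});

console.log(index.size); // 2 documents from 3 rows
```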
<p>River-jdbc supports <a href="https://github.com/jprante/elasticsearch-river-jdbc/wiki/Structured-Objects">Structured objects</a>, which allows us to create arbitrarily structured JSON documents simply by using SQL aliases. The _id column is used for identity, structured objects will be appended to existing data. This is perhaps best shown by an example:</p><pre class="crayon-plain-tag">SELECT authors.id as _id, authors.last_name, authors.first_name,&nbsp;
books.id as \&quot;Books.id\&quot;, books.title as \&quot;Books.title\&quot;, 
 books.subject_id as \&quot;Books.subject_id\&quot; 
FROM public.authors left join public.books on books.author_id = authors.id order by authors.id</pre><p>Again, delete the index, restart Elasticsearch, wait a few seconds before you search, and you will find structured data in the search results.</p>
<p><a href="http://blog.comperiosearch.com/wp-content/uploads/2014/01/hitsSampleBookTownWithTwoBooks.png"><img class="alignleft size-full wp-image-1918" src="http://blog.comperiosearch.com/wp-content/uploads/2014/01/hitsSampleBookTownWithTwoBooks.png" alt="" width="259" height="385" /></a></p>
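<p>The folding of dotted aliases can be sketched roughly like this (made-up rows; river-jdbc&#8217;s actual internals differ):</p>

```javascript
// Fold rows with dotted aliases like "Books.title" into one document per
// _id, with a nested Books array. A simplified illustration of the idea.
var rows = [
  { _id: 1, last_name: 'Alcott', 'Books.id': 7, 'Books.title': 'Little Women' },
  { _id: 1, last_name: 'Alcott', 'Books.id': 8, 'Books.title': 'Good Wives' }
];

var docs = new Map();
rows.forEach(function (row) {
  var doc = docs.get(row._id) ||
    { _id: row._id, last_name: row.last_name, Books: [] };
  doc.Books.push({ id: row['Books.id'], title: row['Books.title'] });
  docs.set(row._id, doc);
});

console.log(docs.size); // 1 author document, with a two-element Books array
```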
<p>Now we have seen that it is quite easy to get data into Elasticsearch using river-jdbc. We have also seen how it can handle updates. That gets us quite far. Unfortunately, it doesn&#8217;t handle deletions. If a record is deleted from the database, it will not automatically be deleted from the index. There have been some attempts to create support for it, but in the latest release it has been completely dropped.</p>
<p>This is due to the river plugin system having some serious problems; it will perhaps be deprecated some time after the 1.0 release, or at least no longer actively promoted as &#8220;the way&#8221; (<a href="http://www.linkedin.com/groups/Official-guide-writing-ElasticSearch-rivers-3393294.S.268274223">see the &#8220;semi-official statement&#8221; at the LinkedIn Elasticsearch group</a>). While it is extremely easy to use rivers to get data, there are a lot of problems in having a data integration process running in the same space as Elasticsearch itself. Architecturally, it is perhaps more correct to leave the search engine to itself, and build integration systems on the side.</p>
<p>Among the recommended alternatives are:</p>
<ul>
<li>Use an ETL tool like <a href="http://www.talend.com/">Talend</a></li>
<li>Create your own script</li>
<li>Edit the source application to send updates to Elasticsearch</li>
</ul>
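<p>The &#8220;create your own script&#8221; option can be quite small. Here is a sketch (the index and type names are just the ones from this walkthrough; the helper is my own) of building a payload for Elasticsearch&#8217;s bulk API, which takes newline-delimited JSON with an action line before each document:</p>

```javascript
// Build a body for POST http://localhost:9200/_bulk: one action line plus
// one document line per record, newline-delimited, with a trailing newline.
function toBulkBody(docs, index, type) {
  return docs.map(function (doc) {
    return JSON.stringify({ index: { _index: index, _type: type, _id: doc._id } }) +
      '\n' + JSON.stringify(doc);
  }).join('\n') + '\n';
}

var body = toBulkBody(
  [{ _id: 1, last_name: 'Alcott', first_name: 'Louisa May' }],
  'booktown',
  'books'
);
// Send with e.g.: curl -XPOST http://localhost:9200/_bulk --data-binary @body.json
```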
<p>Jörg Prante, who is the man behind river-jdbc, recently started creating a replacement called <a href="https://github.com/jprante/elasticsearch-gatherer">Gatherer</a>.<br />
It is a gathering framework plugin for fetching and indexing data to Elasticsearch, with scalable components.</p>
<p>Anyway, we have data in our index! Rivers may have their problems when used on a large scale, but you would be hard pressed to find anything easier to get started with. Getting data into the index easily is essential when exploring ideas and concepts, creating POCs or just fooling around.</p>
<p>This post has run out of space, but perhaps we can look at some interesting queries next time?</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.comperiosearch.com/blog/2014/01/30/elasticsearch-indexing-sql-databases-the-easy-way/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>Norch- a search engine for node.js</title>
		<link>http://blog.comperiosearch.com/blog/2013/07/05/norch-a-search-engine-for-node-js/</link>
		<comments>http://blog.comperiosearch.com/blog/2013/07/05/norch-a-search-engine-for-node-js/#comments</comments>
		<pubDate>Fri, 05 Jul 2013 13:24:02 +0000</pubDate>
		<dc:creator><![CDATA[Fergus McDowall]]></dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[forage]]></category>
		<category><![CDATA[Javascript]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[node]]></category>
		<category><![CDATA[node.js]]></category>
		<category><![CDATA[norch]]></category>
		<category><![CDATA[search-index]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://blog.comperiosearch.com/?p=1495</guid>
		<description><![CDATA[***** UPDATE 10th Sept 2013: Norch is now known as Forage- read about this change here ***** Norch is a search engine written for Node.js. Norch uses the Node search-index module which is in turn written using the super fast levelDB library that Google open-sourced in 2011. The aim of Norch is to make a [...]]]></description>
				<content:encoded><![CDATA[<p>*****<br />
<strong>UPDATE 10th Sept 2013:</strong> Norch is now known as <strong>Forage</strong>- <a href="http://blog.comperiosearch.com/blog/2013/08/26/norch-is-changing-its-name-to-forage/" title="Norch is changing its name to Forage">read about this change here</a><br />
*****</p>
<p><a href="http://fergiemcdowall.github.io/norch/">Norch</a> is a search engine written for Node.js. Norch uses the <a href="https://github.com/fergiemcdowall/search-index">Node search-index module</a> which is in turn written using the super fast levelDB library that Google open-sourced in 2011.</p>
<p>The aim of Norch is to make a simple, fast search server that requires minimal configuration to set up. Norch sacrifices complex functionality for a limited, robust feature set that can be used to set up a freetext search engine for most enterprise scenarios.</p>
<p>Currently Norch features:</p>
<ul>
<li>Full text search</li>
<li>Stopword removal</li>
<li>Faceting</li>
<li>Filtering</li>
<li>Relevance weighting (tf-idf)</li>
<li>Field weighting</li>
<li>Paging (offset and resultset length)</li>
</ul>
<div>&nbsp;</div>
<div>Norch can index any data that is marked up in the appropriate JSON format.</div>
<div>&nbsp;</div>
<div><a href="https://github.com/fergiemcdowall/norch/releases/v0.2.1">Download the first release of Norch (0.2.1) here</a></div>
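<p>As a rough illustration, a batch of documents for Norch might look like this (the field names here are made up; see the Norch and search-index READMEs for the exact format expected):</p>

```javascript
// Hypothetical batch of documents to index (field names are illustrative,
// not Norch's documented schema).
var batch = [
  {
    id: '1',
    title: 'Norch- a search engine for node.js',
    body: 'Norch is a simple, fast search server for Node.js.',
    tags: ['node', 'search']
  }
];

// Norch consumes JSON, so the batch serialises cleanly:
var payload = JSON.stringify(batch);
```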
]]></content:encoded>
			<wfw:commentRss>http://blog.comperiosearch.com/blog/2013/07/05/norch-a-search-engine-for-node-js/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
