<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Search Nuggets &#187; bitbucket</title>
	<atom:link href="http://blog.comperiosearch.com/blog/tag/bitbucket/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.comperiosearch.com</link>
	<description>A blog about Search as THE solution</description>
	<lastBuildDate>Mon, 13 Jun 2016 08:59:45 +0000</lastBuildDate>
	<language>en-US</language>
		<sy:updatePeriod>hourly</sy:updatePeriod>
		<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=3.9.40</generator>
	<item>
		<title>Bitbucket to Elasticsearch Connector</title>
		<link>http://blog.comperiosearch.com/blog/2014/09/18/bitbucket-elasticsearch-connector/</link>
		<comments>http://blog.comperiosearch.com/blog/2014/09/18/bitbucket-elasticsearch-connector/#comments</comments>
		<pubDate>Thu, 18 Sep 2014 11:46:16 +0000</pubDate>
		<dc:creator><![CDATA[Murhaf Fares]]></dc:creator>
				<category><![CDATA[English]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[bitbucket]]></category>
		<category><![CDATA[Elasticsearch]]></category>
		<category><![CDATA[search]]></category>

		<guid isPermaLink="false">http://blog.comperiosearch.com/?p=2989</guid>
		<description><![CDATA[&#8220;Ability to search source code? (BB-39)&#8221; is an issue created in July 2011 on Bitbucket and its status is still new. If you have used Bitbucket before, you would have certainly noticed that there is no way to search in a repository&#8217;s source code. Now what if you had more than 200 repositories (as is [...]]]></description>
				<content:encoded><![CDATA[<p><img class="alignright wp-image-3002 size-full" src="http://blog.comperiosearch.com/wp-content/uploads/2014/09/bitbucket-logo-a3719e03.png" alt="bitbucket-logo-a3719e03" width="248" height="248" /><br />
<em>&#8220;Ability to search source code? (BB-39)&#8221;</em> is an <a href="https://bitbucket.org/site/master/issue/2874/ability-to-search-source-code-bb-39" target="_blank">issue created in July 2011</a> on Bitbucket, and its status is still &#8216;new&#8217;. If you have used Bitbucket, you will certainly have noticed that there is no way to search a repository&#8217;s source code. Now what if you had more than 200 repositories (as is the case for Comperio) and wanted to find examples of how a particular function is used? There are two options: either clone all the repos to your local machine and do some &#8216;grep&#8217; magic, or use our connector to index Bitbucket content in elasticsearch and search happily ever after.</p>
<p>In this blog post, we introduce an <a href="https://github.com/comperiosearch/bitbucket-elasticsearch-connector" target="_blank">open-source and free connector</a> that indexes content from Bitbucket in elasticsearch. The connector is written in Python and has two main modes: <em>index</em>, which indexes everything from your Bitbucket account in elasticsearch, and <em>update</em>, which updates your elasticsearch index based on the commits made since the last time you ran the connector (there are three types of git update: add, change and delete).<br />
The connector creates an elasticsearch index (based on the configuration provided in <a href="https://github.com/comperiosearch/bitbucket-elasticsearch-connector/blob/master/elasticsearch.conf" target="_blank">elasticsearch.conf</a>) that holds two types of documents, namely &#8216;file&#8217; and &#8216;repo&#8217;. We only provide a <a href="https://github.com/comperiosearch/bitbucket-elasticsearch-connector/blob/master/file_mapping.json" target="_blank">mapping file</a> for documents of type &#8216;file&#8217;, but you can create one for repos as well. For more information on the connector and how to use it, please see the <a href="https://github.com/comperiosearch/bitbucket-elasticsearch-connector" target="_blank">project&#8217;s page</a> on GitHub.</p>
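<p>To make the <em>update</em> mode more concrete, here is a minimal sketch of how the three git update types could be turned into index operations. This is not the connector's actual code: the names <em>doc_id</em> and <em>plan_update</em>, the ID scheme and the shape of the action dictionaries are all assumptions made for illustration.</p>

```python
import hashlib


def doc_id(repo, branch, path):
    """Hypothetical document ID: one document per repo, branch and path."""
    key = "{}:{}:{}".format(repo, branch, path)
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


def plan_update(repo, branch, changes):
    """Turn (change_type, path) pairs into index or delete actions.

    'add' and 'change' both (re)index the file, 'delete' removes it.
    """
    actions = []
    for change_type, path in changes:
        if change_type in ("add", "change"):
            op = "index"
        elif change_type == "delete":
            op = "delete"
        else:
            raise ValueError("unknown change type: {}".format(change_type))
        actions.append({
            "op": op,
            "type": "file",  # the 'file' document type described above
            "id": doc_id(repo, branch, path),
            "path": path,
        })
    return actions


if __name__ == "__main__":
    # Assumed input: the changes collected from commits since the last run.
    sample = [("add", "a.py"), ("change", "b.py"), ("delete", "c.py")]
    for action in plan_update("demo-repo", "master", sample):
        print(action["op"], action["path"])
```

<p>Treating add and change as the same operation works because indexing a document under an existing ID overwrites the previous version.</p>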
<p><strong>Bitbucket REST APIs</strong><br />
If you check the source code of the connector, you will see that we use two versions of the Bitbucket REST API (<a href="https://confluence.atlassian.com/display/BITBUCKET/Version+1" target="_blank">version 1.0</a> and <a href="https://confluence.atlassian.com/display/BITBUCKET/Version+2" target="_blank">version 2.0</a>). We do so because not everything supported by version 1.0 is supported by version 2.0 and vice versa; for example, branches are retrievable in API version 1.0 but not in 2.0.</p>
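<p>As a rough illustration of mixing the two API versions, the sketch below only builds request URLs. The helper names are hypothetical, and the endpoint paths are assumptions based on how the two versions were documented when this post was written, so please check them against the current Bitbucket documentation before relying on them.</p>

```python
API_ROOT = "https://api.bitbucket.org"


def branches_url_v1(owner, repo_slug):
    """URL for listing a repository's branches through API version 1.0 (assumed path)."""
    return "{}/1.0/repositories/{}/{}/branches".format(API_ROOT, owner, repo_slug)


def repositories_url_v2(owner):
    """URL for listing an account's repositories through API version 2.0 (assumed path)."""
    return "{}/2.0/repositories/{}".format(API_ROOT, owner)


if __name__ == "__main__":
    # 'myteam' and 'myrepo' are placeholder names.
    print(repositories_url_v2("myteam"))
    print(branches_url_v1("myteam", "myrepo"))
```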
<p><strong>Field collapsing for duplicates from different branches</strong><br />
If a repo has more than one branch, the connector indexes the files in all branches as separate documents. This means that whenever you search for something, the same matching file from different branches shows up as separate hits. To avoid this, we created an ID called <em>collapse_id</em>, which allows us to collapse hits of the same file from different branches, using queries similar to the following:<br />
<script src="https://gist.github.com/d26cc5a1c0b570de85b8.js?file=collapsing-with-top-hits-agg.json"></script><br />
See another example of field collapsing using the top hits aggregation <a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html#_field_collapse_example" target="_blank">on elasticsearch.org</a>.</p>
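<p>For readers who cannot load the embedded gist, here is a minimal Python sketch of such a request body. It is not a copy of the gist: the function name <em>build_collapsed_query</em>, the aggregation names and the searched field <em>content</em> are assumptions; only <em>collapse_id</em> comes from the connector.</p>

```python
import json


def build_collapsed_query(text, text_field="content", max_files=10):
    """Build a search body that groups matching documents by collapse_id.

    A terms aggregation creates one bucket per collapse_id, and a top_hits
    sub-aggregation keeps a single representative document in each bucket.
    """
    return {
        "size": 0,  # skip the flat hit list; results are read from the buckets
        "query": {"match": {text_field: text}},
        "aggs": {
            "files": {
                "terms": {"field": "collapse_id", "size": max_files},
                "aggs": {
                    "top_file": {"top_hits": {"size": 1}},
                },
            }
        },
    }


if __name__ == "__main__":
    # 'parseDate' is a placeholder search term.
    print(json.dumps(build_collapsed_query("parseDate"), indent=2))
```

<p>In this sketch the buckets come back in the default order of the terms aggregation (by document count). To rank them by relevance instead, you can add a sub-aggregation on the score and order by it, as in the elasticsearch.org example linked above.</p>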
]]></content:encoded>
			<wfw:commentRss>http://blog.comperiosearch.com/blog/2014/09/18/bitbucket-elasticsearch-connector/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
