<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Search Nuggets &#187; pipeline</title>
	<atom:link href="http://blog.comperiosearch.com/blog/tag/pipeline/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.comperiosearch.com</link>
	<description>A blog about Search as THE solution</description>
	<lastBuildDate>Mon, 13 Jun 2016 08:59:45 +0000</lastBuildDate>
	<language>en-US</language>
		<sy:updatePeriod>hourly</sy:updatePeriod>
		<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=3.9.40</generator>
	<item>
		<title>Solr As A Document Processing Pipeline</title>
		<link>http://blog.comperiosearch.com/blog/2015/01/16/custom-solr-update-request-processors/</link>
		<comments>http://blog.comperiosearch.com/blog/2015/01/16/custom-solr-update-request-processors/#comments</comments>
		<pubDate>Fri, 16 Jan 2015 10:40:48 +0000</pubDate>
		<dc:creator><![CDATA[Seb Muller]]></dc:creator>
				<category><![CDATA[English]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[content enrichment]]></category>
		<category><![CDATA[Document Processing]]></category>
		<category><![CDATA[pipeline]]></category>
		<category><![CDATA[update request processor]]></category>

		<guid isPermaLink="false">http://blog.comperiosearch.com/?p=3050</guid>
		<description><![CDATA[Recently on a project I got an interesting request. Content owners wanted to enrich new documents submitted to the search index with content from documents already present in the index. We use Solr as the search backend for this particular customer so I started thinking about how to achieve this with Solr. A bit of [...]]]></description>
				<content:encoded><![CDATA[<p>Recently on a project I got an interesting request. Content owners wanted to enrich new documents submitted to the search index with content from documents already present in the index. We use Solr as the search backend for this particular customer so I started thinking about how to achieve this with Solr.</p>
<h2>A bit of Solr background</h2>
<p>Solr ships with all the tools and features necessary for an advanced search solution. These include the oft-overlooked update request processors. They operate at the document level, i.e. before individual fields are tokenised, and allow you to clean, modify and/or enrich incoming documents. Processing options include language identification, duplicate detection and HTML markup handling. Chain several of them together and you have a true document processing pipeline.</p>
<p>The Solr wiki includes a <a title="Update Request Processors" href="https://wiki.apache.org/solr/UpdateRequestProcessor#Full_list_of_UpdateRequestProcessor_Factories">brief entry</a> on the topic with an example of a custom processor that conditionally adds the field &#8220;cat&#8221; with value &#8220;popular&#8221;. The full list of UpdateRequestProcessor factories is available via the <a href="http://www.solr-start.com/info/update-request-processors/">Solr Start project</a>.</p>
<h2>Back to the initial request</h2>
<p>Certain incoming documents would contain a field, topicRef for example, with a reference to one or more documents already present in the index. The referenced documents could either contain a subsequent reference or content that we wanted to add to the incoming document. <a href="http://blog.comperiosearch.com/wp-content/uploads/2014/10/docProcess.png"><img class="size-medium wp-image-3054 alignright" src="http://blog.comperiosearch.com/wp-content/uploads/2014/10/docProcess-220x300.png" alt="document pipeline" width="220" height="300" /></a></p>
<p>I needed a mechanism to retrieve any referenced documents, traverse a tree of subsequently referenced documents if necessary, and then map the eventual leaf documents&#8217; specified content fields to additional new fields in the incoming document.</p>
<p>I created a recursive document enrichment processor to do just that!</p>
<p>Its settings allow for multiple potential field retrievals and mappings, local and foreign key field definitions and the option to retrieve content from a remote Solr index.</p>
<script src="https://gist.github.com/fcd5b45cd42a40b97daa.js?file=RecursiveMergeExistingDocFactory"></script>
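<p>To make the traversal concrete, here is a minimal Python sketch of the idea. It is not the processor's actual code: a plain in-memory dictionary stands in for the Solr index, and the names resolve_leaves and enrich, as well as the summary and topic_summary fields, are hypothetical and chosen only for illustration. Only topicRef comes from the description above.</p>

```python
def resolve_leaves(doc_id, index, ref_field="topicRef", seen=None):
    """Follow ref_field links starting at doc_id and return the leaf documents."""
    seen = set() if seen is None else seen
    if doc_id in seen or doc_id not in index:
        # Skip cycles and references to documents that are not in the index.
        return []
    seen.add(doc_id)
    doc = index[doc_id]
    refs = doc.get(ref_field, [])
    if not refs:
        # No further references: this document is a leaf.
        return [doc]
    leaves = []
    for ref in refs:
        leaves.extend(resolve_leaves(ref, index, ref_field, seen))
    return leaves


def enrich(incoming, index, mappings, ref_field="topicRef"):
    """Return a copy of incoming with mapped leaf fields added as new fields."""
    enriched = dict(incoming)
    seen = set()
    for ref in incoming.get(ref_field, []):
        for leaf in resolve_leaves(ref, index, ref_field, seen):
            for source_field, target_field in mappings.items():
                if source_field in leaf:
                    enriched.setdefault(target_field, []).append(leaf[source_field])
    return enriched


# Hypothetical data: doc1 references t1, which in turn references the leaf t2.
index = {
    "t1": {"id": "t1", "topicRef": ["t2"]},
    "t2": {"id": "t2", "summary": "Solr basics"},
}
incoming = {"id": "doc1", "topicRef": ["t1", "missing"]}
result = enrich(incoming, index, {"summary": "topic_summary"})
print(result["topic_summary"])
```

<p>The reference to "missing" is silently skipped because that document is not in the stand-in index, and the shared seen set keeps a circular chain of references from recursing forever.</p>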
<p>A minor drawback of the current iteration of the processor is its heavy reliance on referenced documents already existing in the index: if a referenced document is not present, the processor skips over it. To ensure documents are fully enriched, especially when the referenced documents are part of the same indexing batch, incoming documents must be reindexed unless the document indexing order is defined explicitly.</p>
<p>In addition, when a referenced document is updated, content owners expect this to have an impact on the content of the parent document and therefore a user&#8217;s search experience. This is currently not the case as parent documents are unaware of their child documents beyond the indexing process.</p>
<p>I&#8217;m now thoroughly enjoying tackling these issues and working on the next iteration of this RecursiveMergeExistingDoc processor!</p>
<h2>Update &#8211; 06/02/15</h2>
<p>The source code is now available on <a href="https://github.com/sebnmuller/SolrDocumentEnricher">GitHub</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.comperiosearch.com/blog/2015/01/16/custom-solr-update-request-processors/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>How To: Debug and log FAST Search pipeline extensibility stages in Visual Studio</title>
		<link>http://blog.comperiosearch.com/blog/2010/12/22/how-to-debug-and-log-fast-search-pipeline-extensibility-stages-in-visual-studio/</link>
		<comments>http://blog.comperiosearch.com/blog/2010/12/22/how-to-debug-and-log-fast-search-pipeline-extensibility-stages-in-visual-studio/#comments</comments>
		<pubDate>Wed, 22 Dec 2010 21:55:00 +0000</pubDate>
		<dc:creator><![CDATA[Mikael Svenson]]></dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[extensibility]]></category>
		<category><![CDATA[fs4sp]]></category>
		<category><![CDATA[pipeline]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[sharepoint]]></category>

		<guid isPermaLink="false">http://nuggets.comperiosearch.com/2010/12/how-to-debug-and-log-fast-search-pipeline-extensibility-stages-in-visual-studio/</guid>
		<description><![CDATA[One of the most powerful features of FS4SP is the ability to do work on the indexed data before it’s made searchable. This can include extracting location names from the documents being indexed or enriching the data from external sources by adding financial data to a customer's CRM record based on a lookup key. Only [...]]]></description>
				<content:encoded><![CDATA[<p>One of the most powerful features of FS4SP is the ability to do work on the indexed data before it’s made searchable. This can include extracting location names from the documents being indexed or enriching the data from external sources by adding financial data to a customer's CRM record based on a lookup key. Only your imagination limits the possibilities.</p>
<p>As the extensibility demo code seems to be missing from MSDN, I decided to create a stage that counts the number of words in the crawled document. There is a <a href="http://msdn.microsoft.com/en-us/library/ff795815.aspx" target="_blank">special crawled property set</a> that exposes three fields: “<strong>body</strong>”, the extracted text of the crawled item; “<strong>data</strong>”, the binary content of the source document in base64 encoding; and “<strong>url</strong>”, the link used when displaying results. My stage will use the body field.</p>
<p>First I created a new property set for the crawled property I will emit from my program. I could have used one of the existing ones, but I find it easier to have my custom properties in a separate location. I name the property set “mAdcOW” and assign it an arbitrary guid. You can get a GUID in PowerShell with the following command:</p>
<p><span style="font-family: 'Courier New';">[guid]::NewGuid()</span></p>
<p>The PowerShell command to create a new property set/category with my chosen guid looks like this:</p>
<p><span style="font-family: 'Courier New';">New-FASTSearchMetadataCategory -Name "mAdcOW" -Propset FA585F53-2679-48d9-976D-9CE62E7E19B7</span></p>
<p>The GUID is important as it is later used in the pipeline extensibility configuration. By default, the property set will add newly discovered properties as they are seen during the crawl. This saves us the work of manually creating the crawled properties we are going to be using.</p>
<p>For maintainability I create my own folder for my module below the FASTSearch root, named C:\FASTSearch\pipelinemodules. Check the %FASTSEARCH% environment variable for your actual FS4SP location.</p>
<p>Now over to the actual pipeline stage. In Visual Studio create a new “Console Application”. I give it the name “WordCount”.</p>
<p><a href="http://blog.comperiosearch.com/wp-content/uploads/2010/12/newproject.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="newproject" src="http://blog.comperiosearch.com/wp-content/uploads/2010/12/newproject_thumb.png" border="0" alt="newproject" width="504" height="282" /></a></p>
<p>In Program.cs I have the following code:</p>
<div id="codeSnippetWrapper" style="text-align: left; line-height: 12pt; background-color: #f4f4f4; margin: 20px 0px 10px; width: 97.5%; font-family: 'Courier New', courier, monospace; direction: ltr; max-height: 200px; font-size: 8pt; overflow: auto; cursor: text; border: silver 1px solid; padding: 4px;">
<pre class="crayon-plain-tag">private static int Main(string[] args)
{
#if DEBUG
    Thread.Sleep(1000 * 90);
#endif
    try
    {
        Logger.WriteLogFile(args[0], &quot;input&quot;);
        WordCount wc = new WordCount();
        wc.DoProcessing(args[0], args[1]);
        Logger.WriteLogFile(args[1], &quot;output&quot;);
    }
    catch (Exception e)
    {
        // This will end up in the crawl log, since exit code != 0
        Console.WriteLine(&quot;Failed: &quot; + e.Message + &quot;/&quot; + e.StackTrace);
        return 1;
    }
    return 0;
}</pre>
</div>
<p>Take notice of the #if DEBUG block. The pause gives you time to attach the Visual Studio debugger. I did try to use</p><pre class="crayon-plain-tag">System.Diagnostics.Debugger.Break()</pre><p>but the context in which the pipeline stage runs is not allowed to invoke the debugger.</p>
<p>You might also note the Logger.WriteLogFile calls in the Main function. This is something I got from an <a href="https://blogs.msdn.com/b/thomsven/archive/2010/09/23/debugging-and-tracing-fast-search-pipeline-extensibility-stages.aspx" target="_blank">MSDN blog entry</a>, and which I restructured slightly. I also added a configuration key to turn logging on and off, and a key for specifying the folder name of the log files. An important piece of information from the blog entry is that you only have write access to the <span style="font-family: 'Courier New';">C:\Users\username\AppData\LocalLow</span> folder. Instead of hard-coding the folder name, I added code which uses the Win32 API to get the correct folder name in case it resides on a drive or folder other than “Users”.</p>
<p>DoProcessing takes two arguments: the input file to read and the output file to write. The document processor pipeline passes these in, which is how custom stages work: they read an XML file with the data to process and write out a new one with the new or modified data.</p>
<p>The code which counts the words uses the XDocument class and LINQ to XML for reading and writing the input and output data. At the top you see a declaration for the GUID I used for my property set, and a GUID for the special crawled property set with the body property. These are the same as in the pipelineextensibility.xml configuration file. In short, we select what was specified in the configuration file.</p>
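<p>The Logger class itself is not shown in this post. As a rough, language-neutral illustration of the idea (copy each input and output file into a log folder, controlled by an on/off switch and a folder setting), here is a minimal Python sketch. The function name write_log_file, the enabled flag and the file naming scheme are assumptions made for this example, not the actual C# implementation.</p>

```python
import os
import shutil
import tempfile
import time


def write_log_file(source_path, label, log_dir, enabled=True):
    """Copy source_path into log_dir under a timestamped name.

    Returns the path of the copy, or None when logging is switched off.
    """
    if not enabled:
        return None
    os.makedirs(log_dir, exist_ok=True)
    # Timestamp plus process id keeps parallel stage instances apart.
    name = "{}-{}-{}.xml".format(time.strftime("%Y%m%d-%H%M%S"), os.getpid(), label)
    target = os.path.join(log_dir, name)
    shutil.copyfile(source_path, target)
    return target


# Demonstration in a temporary folder instead of the real LocalLow folder.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "input.xml")
    with open(sample, "w", encoding="utf-8") as handle:
        handle.write("placeholder item data")
    log_dir = os.path.join(tmp, "logs")
    copied = write_log_file(sample, "input", log_dir)
    skipped = write_log_file(sample, "output", log_dir, enabled=False)
    copy_exists = os.path.exists(copied)
    logged_names = os.listdir(log_dir)
    print(logged_names, skipped)
```

<p>In the real stage the enabled flag and the folder would come from the two configuration keys mentioned above.</p>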
<div id="codeSnippetWrapper" style="text-align: left; line-height: 12pt; background-color: #f4f4f4; margin: 20px 0px 10px; width: 97.5%; font-family: 'Courier New', courier, monospace; direction: ltr; max-height: 200px; font-size: 8pt; overflow: auto; cursor: text; border: silver 1px solid; padding: 4px;">
<pre class="crayon-plain-tag">using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

internal class WordCount
{
    // this propset contains url/body/data - http://msdn.microsoft.com/en-us/library/ff795815.aspx
    private static readonly Guid CrawledCategoryFAST = new Guid(&quot;11280615-f653-448f-8ed8-2915008789f2&quot;);
    private static readonly Guid CrawledCategorymAdcOW = new Guid(&quot;fa585f53-2679-48d9-976d-9ce62e7e19b7&quot;);
    private static readonly Regex WordSplit = new Regex(@&quot;\s+&quot;, RegexOptions.Compiled);

    // Actual processing
    public void DoProcessing(string inputFile, string outputFile)
    {
        XDocument inputDoc = XDocument.Load(inputFile);

        // Fetch the body property from the input item
        var res = from cp in inputDoc.Descendants(&quot;CrawledProperty&quot;)
                  where new Guid(cp.Attribute(&quot;propertySet&quot;).Value).Equals(CrawledCategoryFAST) &amp;&amp;
                        cp.Attribute(&quot;propertyName&quot;).Value == &quot;body&quot; &amp;&amp;
                        cp.Attribute(&quot;varType&quot;).Value == &quot;31&quot;
                  select cp.Value;

        // Count the number of words separated by white space
        int wordCount = res.Sum(s =&gt; WordSplit.Split(s).Length);

        // Create the output item
        XElement outputElement = new XElement(&quot;Document&quot;);
        if (res.Count() &gt; 0 &amp;&amp; res.First().Length &gt; 0)
        {
            outputElement.Add(
                new XElement(&quot;CrawledProperty&quot;,
                             new XAttribute(&quot;propertySet&quot;, CrawledCategorymAdcOW),
                             new XAttribute(&quot;propertyName&quot;, &quot;wordcount&quot;),
                             new XAttribute(&quot;varType&quot;, 20), wordCount) // 20 = integer
                );
        }
        outputElement.Save(outputFile);
    }
}</pre>
</div>
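<p>For readers who want to experiment with the same read, count and write flow outside Visual Studio, the following Python sketch mirrors the stage's logic. It assumes the input file is a Document element containing CrawledProperty elements with propertySet, propertyName and varType attributes, which is what the C# code above reads and writes. The function names are made up for this example. One small difference: Python's split() ignores leading and trailing whitespace, whereas the \s+ regular expression split in the C# version counts the resulting empty tokens.</p>

```python
import os
import tempfile
import xml.etree.ElementTree as ET

FAST_PROPSET = "11280615-f653-448f-8ed8-2915008789f2"    # special set with url/body/data
CUSTOM_PROPSET = "fa585f53-2679-48d9-976d-9ce62e7e19b7"  # the mAdcOW property set


def count_words(text):
    # split() with no arguments splits on runs of whitespace.
    return len(text.split())


def process(input_file, output_file):
    """Read the body property from input_file and write a wordcount property."""
    root = ET.parse(input_file).getroot()
    bodies = [
        cp.text or ""
        for cp in root.iter("CrawledProperty")
        if cp.get("propertySet", "").strip("{}").lower() == FAST_PROPSET
        and cp.get("propertyName") == "body"
        and cp.get("varType") == "31"
    ]
    output = ET.Element("Document")
    if bodies and bodies[0]:
        prop = ET.SubElement(output, "CrawledProperty", {
            "propertySet": CUSTOM_PROPSET,
            "propertyName": "wordcount",
            "varType": "20",  # 20 = integer
        })
        prop.text = str(sum(count_words(body) for body in bodies))
    ET.ElementTree(output).write(output_file, encoding="utf-8", xml_declaration=True)


# Build a small input file, run the stage logic and read the result back.
with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, "input.xml")
    out_path = os.path.join(tmp, "output.xml")
    doc = ET.Element("Document")
    body = ET.SubElement(doc, "CrawledProperty", {
        "propertySet": FAST_PROPSET, "varType": "31", "propertyName": "body",
    })
    body.text = "one two  three"
    ET.ElementTree(doc).write(in_path, encoding="utf-8", xml_declaration=True)
    process(in_path, out_path)
    written = ET.parse(out_path).getroot().find("CrawledProperty")
    print(written.get("propertyName"), written.text)  # prints: wordcount 3
```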
<p>After compiling a debug build of the program I copy it over to the folder previously created, C:\FASTSearch\pipelinemodules.</p>
<p>By default, an FS4SP installation has four document processors running.</p>
<p><span style="font-family: 'Courier New';">nctrl status</span></p>
<p><span style="font-family: 'Courier New'; font-size: xx-small;">Document Processor              procserver_1             11644  Running</span><br />
<span style="font-family: 'Courier New'; font-size: xx-small;">Document Processor              procserver_2              8224  Running</span><br />
<span style="font-family: 'Courier New'; font-size: xx-small;">Document Processor              procserver_3              5452  Running</span><br />
<span style="font-family: 'Courier New'; font-size: xx-small;">Document Processor              procserver_4              5920  Running</span></p>
<p>This means it will process 4 items in parallel. In order to ease debugging we turn off all but one.</p>
<p><span style="font-family: 'Courier New';">nctrl stop procserver_2 procserver_3 procserver_4</span></p>
<p>(Remember to start them once you are done testing if this is a shared or production environment. Replace “stop” with “start” in the above command.)</p>
<p>Next I modify C:\FASTSearch\etc\pipelineextensibility.xml and add my word count stage.</p>
<div id="codeSnippetWrapper" style="text-align: left; line-height: 12pt; background-color: #f4f4f4; margin: 20px 0px 10px; width: 97.5%; font-family: 'Courier New', courier, monospace; direction: ltr; max-height: 200px; font-size: 8pt; overflow: auto; cursor: text; border: silver 1px solid; padding: 4px;">
<pre class="crayon-plain-tag">&lt;PipelineExtensibility&gt;
  &lt;Run command=&quot;C:\FASTSearch\pipelinemodules\WordCount.exe %(input)s %(output)s&quot;&gt;
    &lt;Input&gt;
      &lt;CrawledProperty propertySet=&quot;11280615-f653-448f-8ed8-2915008789f2&quot; varType=&quot;31&quot; propertyName=&quot;body&quot;/&gt;
      &lt;!-- Included for debugging/traceability purposes --&gt;
      &lt;CrawledProperty propertySet=&quot;11280615-f653-448f-8ed8-2915008789f2&quot; varType=&quot;31&quot; propertyName=&quot;url&quot;/&gt;
    &lt;/Input&gt;
    &lt;Output&gt;
      &lt;CrawledProperty propertySet=&quot;fa585f53-2679-48d9-976d-9ce62e7e19b7&quot; varType=&quot;20&quot; propertyName=&quot;wordcount&quot;/&gt;
    &lt;/Output&gt;
  &lt;/Run&gt;
&lt;/PipelineExtensibility&gt;</pre>
</div>
<p>After saving the file I reset the document processors in order to read the updated configuration.</p>
<p><span style="font-family: 'Courier New';">psctrl reset</span></p>
<p>I have now deployed a new pipeline stage ready for testing. On the FAST Content SSA in SharePoint Administration I start a new full crawl for my test source.</p>
<p>Start Windows Task Manager, check “Show processes from all users”, and wait for an instance of the program to appear.</p>
<p><a href="http://blog.comperiosearch.com/wp-content/uploads/2010/12/process.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="process" src="http://blog.comperiosearch.com/wp-content/uploads/2010/12/process_thumb.png" border="0" alt="process" width="434" height="223" /></a></p>
<p>Switch back to Visual Studio and set a break point in the code below the sleep statement.</p>
<p><a href="http://blog.comperiosearch.com/wp-content/uploads/2010/12/main-debug.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="main-debug" src="http://blog.comperiosearch.com/wp-content/uploads/2010/12/main-debug_thumb.png" border="0" alt="main-debug" width="504" height="356" /></a></p>
<p>Go to the “Debug” menu and choose “Attach to Process”</p>
<p><a href="http://blog.comperiosearch.com/wp-content/uploads/2010/12/attach_menu.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="attach_menu" src="http://blog.comperiosearch.com/wp-content/uploads/2010/12/attach_menu_thumb.png" border="0" alt="attach_menu" width="493" height="183" /></a></p>
<p>Locate the process and click “Attach”. You might have to check “Show processes from all users” here as well for it to be displayed.</p>
<p><a href="http://blog.comperiosearch.com/wp-content/uploads/2010/12/attach_process.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="attach_process" src="http://blog.comperiosearch.com/wp-content/uploads/2010/12/attach_process_thumb.png" border="0" alt="attach_process" width="504" height="343" /></a></p>
<p>Once the sleep statement completes you should be able to step through the code like you normally would in Visual Studio.</p>
<p>If logging is enabled in the configuration file you will see files appearing in the logging folder</p>
<p><a href="http://blog.comperiosearch.com/wp-content/uploads/2010/12/logfiles.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="logfiles" src="http://blog.comperiosearch.com/wp-content/uploads/2010/12/logfiles_thumb.png" border="0" alt="logfiles" width="442" height="210" /></a></p>
<p>where the input files have the url and body fields going in, and the output the wordcount field going out, as specified in the configuration file.</p>
<p>My crawled property “wordcount” has also been added during the crawl.</p>
<p><a href="http://blog.comperiosearch.com/wp-content/uploads/2010/12/image7.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="image" src="http://blog.comperiosearch.com/wp-content/uploads/2010/12/image_thumb7.png" border="0" alt="image" width="504" height="132" /></a></p>
<p>I create a new managed property which can be used in the search result page, and map the crawled property to it. This can also be done in the Admin UI instead of with PowerShell.</p>
<div id="codeSnippetWrapper" style="text-align: left; line-height: 12pt; background-color: #f4f4f4; margin: 20px 0px 10px; width: 97.5%; font-family: 'Courier New', courier, monospace; direction: ltr; max-height: 200px; font-size: 8pt; overflow: auto; cursor: text; border: silver 1px solid; padding: 4px;">
<pre class="crayon-plain-tag">$managedproperty = New-FASTSearchMetadataManagedProperty -Name wordcount -Type 2 -Description &quot;Number of words&quot;
$wordcount = Get-FASTSearchMetadataCrawledProperty -Name wordcount
New-FASTSearchMetadataCrawledPropertyMapping -ManagedProperty $managedproperty -CrawledProperty $wordcount</pre>
</div>
<p>The operation shows up in Central Admin</p>
<p><a href="http://blog.comperiosearch.com/wp-content/uploads/2010/12/image8.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="image" src="http://blog.comperiosearch.com/wp-content/uploads/2010/12/image_thumb8.png" border="0" alt="image" width="504" height="193" /></a></p>
<p>and the result xml when executing a search now shows the newly added wordcount property. Remember to add the column to the “Fetched properties” list in the Search Core Result web part.</p>
<p><a href="http://blog.comperiosearch.com/wp-content/uploads/2010/12/image9.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="image" src="http://blog.comperiosearch.com/wp-content/uploads/2010/12/image_thumb9.png" border="0" alt="image" width="383" height="131" /></a></p>
<p>The Visual Studio project for the pipeline stage as well as the pipelineextensibility.xml can be downloaded from my SkyDrive.</p>
<p>(This post is cross-posted from <a href="http://techmikael.blogspot.com/2010/12/how-to-debug-and-log-fast-search.html" target="_blank">Tech and Me</a>)</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.comperiosearch.com/blog/2010/12/22/how-to-debug-and-log-fast-search-pipeline-extensibility-stages-in-visual-studio/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
