Learning about nctrl, and disabling the FAST Search Web crawler
In FAST Search for SharePoint (FS4SP), there are two methods for crawling web sites. Either use the built-in SharePoint crawler on your FAST Content SSA, or use the far more advanced FAST Search Web crawler. For most small web crawls, it’s not necessary to use the latter, but instead much easier to set up a new web Content Source, and use the standard crawl functionality.
But even if you do, you may have noticed that the FAST Search Web crawler is still turned on behind the scenes (on a default single-node installation). The Node Controller, i.e. the “nctrl” tool that handles FS4SP internal processes, is set to automatically start the Web crawler and its associated processes by default. In this blog post, we’ll circumvent this in order to gain some insight into the inner workings of the Node Controller.
Run a “nctrl status”, and you will see all running FS4SP-processes:
The processes in red are all related to the crawler:
crawler
The main component of the FAST Search Web crawler.
browserengine
A component which emulates a real web browser, and makes sure the crawler “see” what a regular user see while browsing the web.
webanalyzer, fdmworker, walinkstorereceiver, walookupdb0
Components of the Web Analyzer, which calculates a link graph of the crawled content. This data can be used for ranking purposes (compare to Google’s PageRank algorithm). Please note that the Web Analyzer is also used for relevance calculations in SharePoint crawls – and not only for crawled external web content.
If you don’t plan to use the FAST Search Web Crawler, and don’t care for the output from the Web Analyzer module, you can shave off something like 50-200 MB of RAM usage by stopping these processes. Although it’s an easy operation, don’t do this unless you understand the steps involved, the consequences, how to roll back the changes, and most importantly: that you now tread into unsupported and undocumented waters. Also, even though I and a few other people have tested this without running into problems, it’s impossible to guarantee that your particular configuration is unaffected. But at the very least, please read on to gain some insight into how the Node Controller works.
If you still want to disable the FAST Search Web Crawler, stop all crawler-related processes by running the following command in a FS4SP shell:
1 |
nctrl stop browserengine crawler fdmworker walinkstorerreceiver walookupdb0 webanalyzer |
Running “nctrl status” will now show you this:
As the crawler processes are now in the “User suspended” mode, they will remain suspended even after a full restart of FS4SP.
If you like, you can go even further and clean up the output of “nctrl status” by removing the now disabled processes from the list altogether. To do this, open up the Node Controller’s configuration file, i.e. %FASTSEARCH%\etc\NodeConf.xml. At the top, you will see something like:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
<startorder> <proc>nameservice</proc> <proc>configserver</proc> <proc>contentdistributor</proc> <proc>indexer</proc> <proc>search-1</proc> <proc>topfdispatch</proc> <proc>samworker</proc> <proc>samadmin</proc> <proc>qrserver</proc> <proc>qrproxy</proc> <proc>indexingdispatcher</proc> <proc>browserengine</proc> <proc>fdmworker</proc> <proc>walookupdb0</proc> <proc>walinkstorerreceiver</proc> <proc>webanalyzer</proc> <proc>sprel</proc> <proc>spelltuner</proc> </startorder> |
Comment out all crawler-related processes, like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
<startorder> <proc>nameservice</proc> <proc>configserver</proc> <proc>contentdistributor</proc> <proc>indexer</proc> <proc>search-1</proc> <proc>topfdispatch</proc> <proc>samworker</proc> <proc>samadmin</proc> <proc>qrserver</proc> <proc>qrproxy</proc> <proc>indexingdispatcher</proc> <!-- <proc>browserengine</proc> <proc>fdmworker</proc> <proc>walookupdb0</proc> <proc>walinkstorerreceiver</proc> <proc>webanalyzer</proc> --> <proc>sprel</proc> <proc>spelltuner</proc> </startorder> |
Save the file, and then issue the command “nctrl reloadcfg” to make the Node Controller pick up the changes. All processes you commented out are not stripped from the “nctrl status” output. But the “crawler” process remains, indeed it was not listed under the <startorder> tag that you just edited. This is because the crawler is administered through the “nctrl” tool itself, and not through its configuration file. Run “nctrl remove crawler” to get rid of it as well.
Your status output should now look like the below. Nice and clean!
Of course, turning off functionality like this is an advanced operation – not recommended to most users. However, even though you don’t want to do this yourself, at least you’ve now gained some insight into how FS4SP’s Node Controller works. With a little imagination and extrapolating you can now even add your own custom applications to the Node Controller, and make sure they are started alongside FS4SP. Let me know if you have any questions, or run into problems.
Update:
If you’re serious about getting rid of the FAST Search Web crawler, Mikael from Tech and Me kindly pointed out an alternative strategy for disabling it: change the deployment.xml accordingly, and run the Set-FASTSearchConfiguration cmdlet. This will however require you to shut down FS4SP (nctrl stop), which is not necessary using the above procedure. But it’s arguably a cleaner approach for getting rid of the crawler. TechNet has more information about deployment reconfigurations in general. Please note that this will overwrite your %FASTSEARCH%\etc\NodeConf.xml – editing it manually is unsupported after all.
[...] from: http://blog.comperiosearch.com/blog/2011/07/12/learning-about-nctrl-disabling-fast-search-web-crawle… Share this:TwitterFacebookLike this:Like Loading… This entry was posted in Uncategorized and [...]