How to “spy” the data in a custom pipeline extensibility stage with FS4SP
In the old FAST, a much-used stage during development was the “Spy” stage. It dumps a log file of all current attributes, and the values assigned to them, at that point in the content processing pipeline.
Fortunately for us, this stage still exists in FS4SP, and it might help you when testing and debugging your crawling.
To enable the spy stage, first stop the FAST configserver:
nctrl stop configserver
Second, open up %FASTSEARCH%\etc\pipelineconfig.xml
Typically you want to add your spy stage before or after the custom extensibility stage. In the example below I have added it before.
After the edit, save the file and start the configserver again:
nctrl start configserver
If you watch the %FASTSEARCH%\var\log folder during indexing, you will see a file named spy.txt appear, which contains all the fields currently available to you.
The file is overwritten for each processed document, so it only ever contains information from the latest one. Even so, if you index using only one document processor, it is a valuable tool during development for checking that your custom stage receives the data you expect.
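Since spy.txt only keeps the latest document, it can be handy to copy each version aside while a crawl is running. Below is a minimal sketch of that idea in Python. It is not part of FS4SP; the function names, the snapshot folder and the polling interval are all my own assumptions, and it simply polls the file and saves a numbered copy whenever the contents change.

```python
import hashlib
import time
from pathlib import Path


def snapshot_if_changed(spy_file, out_dir, last_digest, counter):
    """Save a numbered copy of spy_file if its contents changed.

    Returns the (possibly updated) digest and snapshot counter.
    All names here are hypothetical helpers, not FS4SP APIs.
    """
    try:
        data = Path(spy_file).read_bytes()
    except OSError:
        # File missing or locked by the document processor: try again later.
        return last_digest, counter

    digest = hashlib.sha1(data).hexdigest()
    if digest == last_digest:
        # Same contents as the last poll, nothing new to save.
        return last_digest, counter

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    counter += 1
    (out_dir / f"spy_{counter:05d}.txt").write_bytes(data)
    return digest, counter


def watch(spy_file, out_dir, interval=0.5, max_polls=None):
    """Poll spy_file every `interval` seconds; stop after max_polls if given."""
    digest, counter, polls = None, 0, 0
    while max_polls is None or polls < max_polls:
        digest, counter = snapshot_if_changed(spy_file, out_dir, digest, counter)
        polls += 1
        time.sleep(interval)
    return counter
```

To use it, you would call something like watch(r"C:\FASTSearch\var\log\spy.txt", "spy_snapshots") with your own install path before starting the crawl. Keep in mind that polling can miss documents that are processed faster than the interval, can catch a half-written file, and treats two documents with identical spy output as one, so it is a debugging aid rather than a complete log.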
This is certainly a pipeline stage that FAST engineers have used more than once when creating search solutions. Can you follow the same approach for custom stages?
You could in theory create custom Python stages, but there is no documentation on the new interfaces for properties, so I guess you would have to do a lot of trial and error.