Previously I blogged about the QR Server, my first port of call whenever I need to debug FAST. At that stage I ask the question “What does FAST know about this document?” and the QR Server is the best place to start looking.
Sometimes, the information in the QR Server isn’t enough to figure out what’s going wrong. If that’s the case, my next question is usually “What is FAST told about this document?”
As part of this process it is critical to understand the difference between Crawled & Managed Properties.
A Crawled Property is something that is fed into the FAST document processing engine. FAST may or may not decide to index and keep what it finds. If the information is not ignored, it will end up in the Full Text index, a Managed Property, or both. In short: Crawled Properties → what FAST is told; Managed Properties → what FAST knows and has decided to keep.
While the QR Server is for managed properties, you use a tool called FFDDumper to inspect crawled properties.
The debugging process for using the FFDDumper goes like this:
- Enable the FFDDumper & reset your document processors
- Mark an individual document to be re-crawled
- Start an incremental crawl
- Inspect the generated FFD files
- Turn off the FFDDumper and reset your document processors
Enabling the FFDDump
There is a file in your FAST installation folder called optionalprocessing.xml (found under %FASTSEARCH%\etc\config_data\DocumentProcessor\). This file is for configuring optional processing pipelines (duh!).
Find the element that looks like the one below and set it to yes.
<processor name="FFDDumper" active="yes" />
THIS WILL SLOW DOWN YOUR SERVER AND GOBBLE DISK SPACE! Use this for temporary debugging only, and really, only use it for incremental crawls or else the sheer volume of data will make it very difficult to find what you want.
Reset your processing pipelines from the FAST PowerShell console with a
> psctrl reset
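If you find yourself flipping this switch often, the edit to optionalprocessing.xml can be scripted. The following is a minimal Python sketch, not part of FAST: the function names are my own, and it assumes the element is written as shown above, with straight quotes and with the name attribute appearing before active. It uses a text substitution rather than an XML parser so the rest of the file is left untouched. The UTF-8 encoding is also an assumption, so check your own file first.

```python
import re
from pathlib import Path

# Matches the FFDDumper processor element and captures the text on either
# side of the yes/no value. Assumes name="..." comes before active="...".
_FFDDUMPER = re.compile(
    r'(<processor\b[^>]*\bname\s*=\s*"FFDDumper"[^>]*\bactive\s*=\s*")'
    r'(?:yes|no)(")',
    re.IGNORECASE,
)


def set_ffddumper_active(xml_text, active):
    """Return xml_text with the FFDDumper active flag set to yes or no."""
    value = "yes" if active else "no"
    new_text, count = _FFDDUMPER.subn(
        lambda m: m.group(1) + value + m.group(2), xml_text
    )
    if count != 1:
        # Refuse to guess if the element is missing or appears more than once.
        raise ValueError(f"expected one FFDDumper element, found {count}")
    return new_text


def toggle_ffddumper_file(path, active, encoding="utf-8"):
    """Rewrite the file at path in place (hypothetical helper; back up first)."""
    p = Path(path)
    updated = set_ffddumper_active(p.read_text(encoding=encoding), active)
    p.write_text(updated, encoding=encoding)
```

You would call toggle_ffddumper_file with the full path to optionalprocessing.xml and True or False. The script only edits the file, so you still need to run psctrl reset afterwards for the change to take effect.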
Marking a document for re-crawl
In SharePoint navigate to your FAST Content Service Application and mark a single document for a re-crawl. Alternatively, make a small change to a document (make sure the document is published).
Kick off an incremental crawl by going to the context menu of your Content Source.
Inspecting the generated FFD Files
If you navigate to %FASTSEARCH%\data\ffd you’ll find one or more folders. Go to the newest folder and look inside. There should be a few .ffd files, one of which will contain the ssic of your document (you can get this from the QR server).
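Hunting through the .ffd files by hand gets tedious, so here is a rough Python sketch of the same lookup. It is my own helper, not a FAST tool: it picks the most recently modified subfolder and returns the .ffd files that contain your document's identifier. It assumes the identifier is stored in the files as plain UTF-8 bytes, which you should confirm against a real dump, and it reads each file fully into memory, so it is only suitable for the small dumps an incremental crawl produces.

```python
from pathlib import Path


def find_ffd_files(ffd_root, doc_id):
    """Return .ffd files in the newest subfolder of ffd_root containing doc_id."""
    folders = [p for p in Path(ffd_root).iterdir() if p.is_dir()]
    if not folders:
        return []
    # "Newest" here means the most recently modified folder.
    newest = max(folders, key=lambda p: p.stat().st_mtime)
    # Assumption: the identifier appears in the dump as UTF-8 encoded bytes.
    needle = doc_id.encode("utf-8")
    return sorted(
        f for f in newest.glob("*.ffd") if needle in f.read_bytes()
    )
```

Pass it the expanded %FASTSEARCH%\data\ffd path (for example via os.path.expandvars) and the identifier you copied from the QR Server. An empty result usually means the document was not picked up by the crawl, or that the identifier is encoded differently from what this sketch assumes.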
Each line written by the FFDDumper has the following format:
- Length of property identifier & name
- Property identifier (may be a GUID)
- Property name
- Variant Type
- Length of the property value (often preceded by an ‘s’)
- The property value
Only lines with GUIDs are available to be mapped into Managed Properties. Lines without GUIDs are internal to FAST and are not accessible. If you look at the screenshot below, the unfortunate implication is that you can’t write your own pipeline extension to run against the raw crawled data.
Reset the FFDDump
Once you’re done, reset the FFDDumper to its regular configuration:
<processor name="FFDDumper" active="no" />
Finally, run a psctrl reset to load the updated settings from optionalprocessing.xml.