Windows Search - Architecture

Architecture

Windows Search is implemented as a Windows Service. The search service implements the Windows Search configuration and query APIs and also controls all indexing and query components. The most important component of Windows Search is the Indexer, which crawls the file system on initial setup and then listens for file system notifications to pick up changed files, in order to create and maintain the index of data. It achieves this using three processes:

  1. SearchIndexer.exe, which hosts the indexes and the list of URIs that require indexing, and exposes the external configuration and query APIs that other applications use to leverage the Windows Search features.
  2. SearchProtocolHost.exe, which hosts the protocol handlers. It runs with the least privileges required for the protocol handler. For example, when accessing the file system, it runs with the credentials of the system account, but when accessing network shares, it runs with the credentials of the user.
  3. SearchFilterHost.exe, which hosts the IFilters and property handlers that extract metadata and textual content. It is a low-integrity process, which means that it does not have permission to change system settings. So even if it encounters files with malicious content that manage to take over the process, they will not be able to change any system settings.

The search service consists of several components, including the Gatherer, the Merger, the Backoff Controller, and the Query Processor, among others. The Gatherer retrieves the list of URIs that need to be crawled and invokes the appropriate protocol handler to access the store that hosts each URI, then the appropriate property handler to extract metadata and the appropriate IFilter to extract the document text.

Different indices are created during different runs; it is the job of the Merger to periodically merge them. While indexing, the indices are generally maintained in memory and then flushed to disk after a merge to reduce disk I/O. The metadata is stored in the property store, which is a database maintained by the ESE database engine. The text is tokenized and the tokens are stored in a custom database built using inverted indices. Apart from the indices and the property store, another persistent data structure is maintained: the Gather Queue, a prioritized queue of URIs that need indexing.

The Backoff Controller mentioned above monitors the available system resources and controls the rate at which the indexer runs. It has three states:

  1. Running: In this state, the indexer runs without any restrictions. The indexer runs in this state only when there is no contention for resources.
  2. Throttled: In this state, the crawling of URIs and the extraction of text and metadata are deliberately throttled, so that the number of operations per minute is kept under tight control. The indexer is in this state when there is contention for resources, for example, when other applications are running. Throttling ensures that other applications are not starved of the resources they might need.
  3. Backed off: In this state, no indexing is done. Only the Gather Queues are kept active so that items do not go unindexed. This state is activated on extreme resource shortage (less than 5 MB of RAM or 200 MB of disk space), or if indexing is configured to be disabled when the computer is on battery power, or if the indexer is manually paused by the user.
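
The three states above can be read as a simple decision rule. The following is a minimal illustrative sketch in Python, not the actual Windows Search implementation: the names `IndexerState` and `choose_state` and all parameters are hypothetical, and the order in which the conditions are checked is an assumption. Only the 5 MB and 200 MB thresholds come from the description above.

```python
from enum import Enum


class IndexerState(Enum):
    RUNNING = "running"
    THROTTLED = "throttled"
    BACKED_OFF = "backed off"


# Thresholds taken from the description of the "Backed off" state.
MIN_FREE_RAM_MB = 5
MIN_FREE_DISK_MB = 200


def choose_state(free_ram_mb, free_disk_mb, resource_contention,
                 on_battery=False, pause_on_battery=False, user_paused=False):
    """Pick an indexer state from a snapshot of system conditions (hypothetical)."""
    # Backed off: manual pause, battery policy, or extreme resource shortage.
    if user_paused or (on_battery and pause_on_battery):
        return IndexerState.BACKED_OFF
    if free_ram_mb < MIN_FREE_RAM_MB or free_disk_mb < MIN_FREE_DISK_MB:
        return IndexerState.BACKED_OFF
    # Throttled: other work is competing for resources.
    if resource_contention:
        return IndexerState.THROTTLED
    # Running: no contention, so index without restrictions.
    return IndexerState.RUNNING


if __name__ == "__main__":
    print(choose_state(free_ram_mb=2048, free_disk_mb=50_000,
                       resource_contention=True))
```

How contention is actually measured is not described in the text, so the sketch takes it as a plain boolean input.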

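The inverted indices, the Merger, and the Gather Queue described earlier can also be illustrated with a small sketch. This is a simplified Python model under stated assumptions, not the on-disk format or API used by Windows Search: `build_index`, `merge_indices`, and `GatherQueue` are hypothetical names, tokenization is plain whitespace splitting, and a lower number is assumed to mean higher priority.

```python
import heapq
from collections import defaultdict


def build_index(docs):
    """Build an inverted index (token -> set of URIs) for one indexing run."""
    index = defaultdict(set)
    for uri, text in docs.items():
        for token in text.lower().split():
            index[token].add(uri)
    return index


def merge_indices(*indices):
    """Combine the indices from several runs into one, as a Merger might."""
    merged = defaultdict(set)
    for index in indices:
        for token, uris in index.items():
            merged[token] |= uris
    return merged


class GatherQueue:
    """Prioritized queue of URIs awaiting indexing (lower number = sooner)."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker keeps insertion order for equal priorities

    def push(self, uri, priority):
        heapq.heappush(self._heap, (priority, self._counter, uri))
        self._counter += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    queue = GatherQueue()
    queue.push("file:///C:/docs/b.txt", priority=2)
    queue.push("file:///C:/docs/a.txt", priority=1)
    print(queue.pop())  # the priority-1 URI is returned first

    run1 = build_index({"file:///C:/docs/a.txt": "hello world"})
    run2 = build_index({"file:///C:/docs/b.txt": "Hello again"})
    merged = merge_indices(run1, run2)
    print(sorted(merged["hello"]))
```

Keeping each run's index separate and merging later mirrors the idea in the text of holding indices in memory and writing to disk after a merge, although the sketch itself never touches the disk.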