PDF-Indexing

This extension converts the textual content of PDF documents into a nodeset, so that the content of a PDF document can, for example, be output on the website. Probably the most frequent application is indexing PDF contents for the regular full-text search of a website.

To make the text extraction methods for .pdf files available, the module must be configured in the “web.config” of the Render Engine as follows:

Namespace: http://www.getit.de/2008/indexing/pdf

Name	Parameter	Return type	Description
parsePdf	xlink:xlink [fromDataSource:boolean]	nodeset	Returns the textual content of a PDF file as a nodeset.
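
A call to parsePdf from a stylesheet might look roughly like the following sketch. Only the namespace URI and the function name come from the table above. The prefix "pdf", the "document" element, the "pdfText" wrapper and passing the xlink:href attribute as the argument are assumptions made for illustration, and the optional fromDataSource parameter is left out because its semantics are not described here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:pdf="http://www.getit.de/2008/indexing/pdf"
    exclude-result-prefixes="xlink pdf">

  <!-- Assumption: a hypothetical "document" element carries an
       xlink:href attribute that references the PDF file. -->
  <xsl:template match="document">
    <!-- "pdfText" is a hypothetical wrapper element for the result. -->
    <pdfText>
      <!-- parsePdf returns a nodeset; copy it into the output so it can
           be rendered or picked up by the full-text index.
           Assumption: the attribute node is accepted as the xlink argument. -->
      <xsl:copy-of select="pdf:parsePdf(@xlink:href)"/>
    </pdfText>
  </xsl:template>

</xsl:stylesheet>
```

The sketch only resolves once the module has been registered in the Render Engine's “web.config”; without that registration the pdf: function is not available to the stylesheet.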
