PDF-Indexing

This extension converts the textual content of PDF documents into a nodeset, so that the content of a PDF document can, for example, be output on the website. Probably the most frequent application is indexing PDF content for a website's regular full-text search.

To make the methods for text extraction from .pdf files available, register the module in the “web.config” of the Render Engine as follows:

<module type="Onion.RenderEngine.CommonModules.PDFIndexing.Module, Onion.RenderEngine.CommonModules.PDFIndexing" />
Namespace: http://www.getit.de/2008/indexing/pdf
Name      Parameter                              Return type  Description
parsePdf  xlink:xlink, [fromDataSource:boolean]  nodeset      Returns the textual content of a PDF file as a nodeset.
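
A call to parsePdf from a stylesheet might look roughly like the following sketch. It rests on several assumptions that the reference above does not confirm: the prefix "pdf" is chosen freely, the parameter name "pdfLink" is hypothetical, and how a value of type xlink is obtained depends on the surrounding transformation. The brackets around fromDataSource presumably mark it as optional, so it is omitted here. Only the namespace URI and the function name are taken from the reference.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:pdf="http://www.getit.de/2008/indexing/pdf"
    exclude-result-prefixes="pdf">

  <!-- Hypothetical parameter: an xlink pointing to the PDF document.
       Where this value comes from is an assumption of this sketch. -->
  <xsl:param name="pdfLink" />

  <xsl:template match="/">
    <!-- parsePdf returns a nodeset with the textual content of the PDF. -->
    <xsl:variable name="content" select="pdf:parsePdf($pdfLink)" />

    <!-- "pdfText" is an illustrative wrapper element, not part of the extension.
         xsl:copy-of emits the returned nodeset unchanged, which helps when
         inspecting its structure before using it for indexing or output. -->
    <pdfText>
      <xsl:copy-of select="$content" />
    </pdfText>
  </xsl:template>

</xsl:stylesheet>
```

The extension function is only resolved if the module is registered in the “web.config” as described above. Because the structure of the returned nodeset is not described here, xsl:copy-of is used instead of xsl:value-of, which in XSLT 1.0 would output only the string value of the first node.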