pierluca.sangiorgi at gmail
Jun 11, 2012, 11:42 AM
Post #1 of 3
Is Solr the right choice for my PDF indexing purpose?
Hi, I'm new to Solr and I want to know if it's the right choice
for my problem.
I need to index PDF documents stored on the filesystem and run queries over them.
So I used Solr with SolrJ and the ExtractingRequestHandler, and it all works,
but I'm not interested in indexing the PDF metadata, only the text
content of the document.
I saw that the content is indexed entirely in a single field
("attr_content" in my case), but what I want is to index the individual
pieces of information found inside that content as separate fields.
For example: I have a PDF document that contains an invoice. I need to
extract and index the information about the recipient, price, items
sold, item descriptions, and so on.
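To make the goal concrete, this is roughly the kind of pre-processing I imagine would be needed on the extracted text. It is only a sketch under assumptions: the class name, the "Recipient:" and "Total:" labels, and the field names "recipient" and "price" are made up for illustration, and real invoices may not follow such a regular layout.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InvoiceFieldSketch {
    // Hypothetical label patterns; a real invoice layout would need its own rules.
    private static final Pattern RECIPIENT =
            Pattern.compile("(?m)^Recipient:\\s*(.+)$");
    private static final Pattern TOTAL =
            Pattern.compile("(?m)^Total:\\s*([0-9]+(?:\\.[0-9]+)?)\\s*$");

    // Pulls named values out of the plain text extracted from a PDF.
    // Fields whose label is not found are simply left out of the map.
    static Map<String, String> extract(String text) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = RECIPIENT.matcher(text);
        if (m.find()) {
            fields.put("recipient", m.group(1).trim());
        }
        m = TOTAL.matcher(text);
        if (m.find()) {
            fields.put("price", m.group(1));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample text standing in for the extracted PDF content.
        String text = "Invoice 42\nRecipient: Mario Rossi\nTotal: 19.90\n";
        System.out.println(extract(text));
    }
}
```

Each entry of the resulting map would then become its own field in the Solr document, instead of everything ending up in "attr_content".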
Is Solr the right choice for this purpose, or do I need to use another
framework in addition, before posting the documents to Solr?
Thanks in advance