Right, being a bit of a Perl hacker, I really fancy getting some more options into the indexed search, basically so I can do this:
+"men behaving badly" -Gary +Martin Brit*
That would get all documents containing "men behaving badly" and "Martin", but exclude those which have the word "Gary" in them. It would also match any words which start with "Brit" (only 3 or 4 letters more, mind you).
Anyway, I'll have to start out small, so I'd like to get +keyword, -keyword and keyword* working first.
What I need is a bit of assistance in hacking this into search.pm. A complete rewrite is just a little out of my league, but as a hack it could be pretty cool. I'm thinking the best way to approach this is to return some extra data from sub _tokenize: all the words which we don't want, and those which have a star after them. That means parsing the string a little differently, to categorise the words with a "-" in front of them and those with a * at the end. In the appropriate subs (currently _get_by_intersection and _get_by_union) we then process those terms a little further.
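To make the parsing idea a bit more concrete, here is a rough sketch of how the query string could be split up. It is not the real _tokenize from search.pm; the sub name tokenize_query, the three bucket names and the text/prefix keys are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the real _tokenize from search.pm.
# Splits a query into required (+), excluded (-) and optional terms,
# and flags any term followed by "*" as a prefix match.
sub tokenize_query {
    my ($query) = @_;
    my %terms = ( required => [], excluded => [], optional => [] );

    # Each token: optional +/- sign, then a quoted phrase or a bare word,
    # then an optional trailing star.
    while ( $query =~ /([+-]?)(?:"([^"]*)"|([^\s"*]+))(\*?)/g ) {
        my ( $sign, $phrase, $word, $star ) = ( $1, $2, $3, $4 );
        my $text = defined $phrase ? $phrase : $word;
        next unless length $text;    # skip empty quoted phrases

        my $bucket = $sign eq '+' ? 'required'
                   : $sign eq '-' ? 'excluded'
                   :                'optional';
        push @{ $terms{$bucket} },
            { text => $text, prefix => ( $star eq '*' ? 1 : 0 ) };
    }
    return \%terms;
}

my $terms = tokenize_query('+"men behaving badly" -Gary +Martin Brit*');
for my $bucket (qw(required excluded optional)) {
    my @shown = map { $_->{text} . ( $_->{prefix} ? '*' : '' ) }
        @{ $terms->{$bucket} };
    print "$bucket: ", join( ', ', @shown ), "\n";
}
```

For the example query this prints "men behaving badly" and "Martin" as required, "Gary" as excluded and "Brit*" as optional. Returning one structure with three buckets is only one way to do it; the existing subs might be easier to adapt if the extra lists come back as separate return values.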
For the "-" operator we need to add NOT LIKE % $variable % to the query, and if the term has a * on the end, it gets NOT LIKE % $variable % % instead. Obviously it would be the appropriate negation if it's exact matching we're going for.
So anyway, those are some initial ideas. I'll have a hack away to see if I can get it working, and it would be great if others could take a look and maybe comment on the above.
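As a rough sketch of the negation step, something like the following could build the extra conditions. Everything here is an assumption: the sub name exclusion_clause, the column name word, the $exact flag and the text/prefix hash keys are invented for the example, and I don't know how search.pm actually lays out its queries. It uses ? placeholders so the terms can be passed as bind values rather than interpolated into the SQL.

```perl
use strict;
use warnings;

# Hypothetical sketch: the column name and the exact/substring switch are
# assumptions, not taken from search.pm.
sub exclusion_clause {
    my ( $column, $exact, @excluded ) = @_;
    my ( @sql, @bind );
    for my $term (@excluded) {
        my $text = $term->{text};
        my $pattern
            = !$exact          ? "%$text%"    # substring match
            : $term->{prefix}  ? "$text%"     # exact mode, trailing *
            :                    $text;       # exact mode, whole word
        push @sql,  "$column NOT LIKE ?";
        push @bind, $pattern;
    }
    # SQL fragment first, then the bind values in matching order.
    return ( join( ' AND ', @sql ), @bind );
}

my ( $where, @bind ) = exclusion_clause(
    'word', 1,
    { text => 'Gary', prefix => 0 },
    { text => 'Brit', prefix => 1 },
);
print "$where\n";
print join( ', ', @bind ), "\n";
```

In exact mode this gives "word NOT LIKE ? AND word NOT LIKE ?" with the bind values Gary and Brit%. Depending on how the index table is laid out, the exclusion may need to be applied per document rather than per row, so treat this as a starting point only.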
cheers,
tom