
marvin at rectangular
Jul 26, 2008, 1:12 AM
Post #2 of 2
On Jul 25, 2008, at 7:25 PM, hao chen wrote:

> After indexing some html files (4.7G), I got a _1.cfs file that is
> 8.4G. Is this normal?

Probably. There are the index files used for lookup/scoring, the stored fields returned when retrieving hits, and the data used by the highlighter, which basically duplicates what's in the index files.

If you don't care about highlighting/excerpting and you just want to fetch titles, set "stored" and "vectorized" to 0 for everything but the "title" field and you'll cut down significantly on disk usage.

> I only modified the directory of the sample invindex.plx file for my
> indexing

I strongly recommend using a real HTML parser rather than the cheesy regex tag stripper in the sample app. It's only there because it's easy to grok at a glance.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

_______________________________________________
KinoSearch mailing list
KinoSearch [at] rectangular
http://www.rectangular.com/mailman/listinfo/kinosearch
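To illustrate the parser-versus-regex point, here is a minimal sketch of pulling the title and visible text out of a page with a real parser. It uses Python's standard-library `html.parser` rather than Perl, and it is not the code from the sample app; the names `TextExtractor` and `html_to_text` are made up for this example. The things a regex tag stripper typically gets wrong, and which this handles, are entities, and markup-like text inside `<script>` and `<style>` blocks.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect <title> text and visible body text."""

    SKIP = {"script", "style"}  # element content that should never be indexed

    def __init__(self):
        # convert_charrefs=True turns &amp;, &lt;, etc. into real characters
        super().__init__(convert_charrefs=True)
        self.title_parts = []
        self.body_parts = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return  # inside <script> or <style>: drop it
        if self._in_title:
            self.title_parts.append(data)
        else:
            self.body_parts.append(data)


def html_to_text(html):
    """Return (title, body_text) with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.title_parts).split())
    body = " ".join(" ".join(parser.body_parts).split())
    return title, body
```

The two returned strings would map onto separate "title" and body fields at indexing time, which also fits the advice above about storing only the title.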