Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Lucene: Java-User

lots of .cfs (compound files) in the index directory

 

 

Lucene java-user RSS feed   Index | Next | Previous | View Threaded


bogguard at gmail

Mar 15, 2012, 4:14 AM

Post #1 of 11 (735 views)
Permalink
lots of .cfs (compound files) in the index directory

Hi,

We migrated our project to Lucene 3.5 and used the TieredMergePolicy with
all its defaults.
Our project contains a relatively small index (+-90MB) and gets small
writes every few seconds.
The behaviour we see is that when we reindex everything works great and
performant,
but when we keep the index running for some days we see a lot (+-3000) of
very small .cfs files (between 1KB and 10KB) are being generated and the
performance drops significantly.
Reïndexing solves the problem, but this shouldn't be the solution.

Is there a way to change the behaviour of the .cfs files. I noticed you can
fully disable it, but I read that this isn't adviced.

Kind regards,
Bogguard


lucene at mikemccandless

Mar 15, 2012, 5:21 AM

Post #2 of 11 (724 views)
Permalink
Re: lots of .cfs (compound files) in the index directory [In reply to]

Hmm, that's odd...

Can you set IndexWriter's infoStream and then capture the output while
doing the small writes every few seconds and post back?

If you run CheckIndex on the index does it also report ~3000 segments?

Mike McCandless

http://blog.mikemccandless.com

On Thu, Mar 15, 2012 at 7:14 AM, Tim Bogaert <bogguard [at] gmail> wrote:
> Hi,
>
> We migrated our project to Lucene 3.5 and used the TieredMergePolicy with
> all its defaults.
> Our project contains a relatively small index (+-90MB) and gets small
> writes every few seconds.
> The behaviour we see is that when we reindex everything works great and
> performant,
> but when we keep the index running for some days we see a lot (+-3000) of
> very small .cfs files (between 1KB and 10KB) are being generated and the
> performance drops significantly.
> Reïndexing solves the problem, but this shouldn't be the solution.
>
> Is there a way to change the behaviour of the .cfs files. I noticed you can
> fully disable it, but I read that this isn't adviced.
>
> Kind regards,
> Bogguard

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


bogguard at gmail

Mar 15, 2012, 7:30 AM

Post #3 of 11 (733 views)
Permalink
Re: lots of .cfs (compound files) in the index directory [In reply to]

Hi Mike,

Segment format: FORMAT_3_1 [Lucene 3.1+]
NumBadSegments: 0
NumSegments: 5111
Clean?: true
TotLoseDocCount: 0
Partial: false


I reindexed and let the feed come in again...The logs of the result are
below (I just copied a few) after 10 minutes :
Segment format: FORMAT_3_1 [Lucene 3.1+]
NumBadSegments: 0
NumSegments: 11
Clean?: true
TotLoseDocCount: 0
Partial: false

What I noticed in the log was that _1a.cfs is getting deleted, but it seems
to be still in the directory :

ls -al
-rw-r--r-- 1 web web 1444172 Mar 15 15:13 _17.fdt
-rw-r--r-- 1 web web 88804 Mar 15 15:13 _17.fdx
-rw-r--r-- 1 web web 142 Mar 15 15:13 _17.fnm
-rw-r--r-- 1 web web 4352933 Mar 15 15:13 _17.frq
-rw-r--r-- 1 web web 177604 Mar 15 15:13 _17.nrm
-rw-r--r-- 1 web web 5931105 Mar 15 15:13 _17.prx
-rw-r--r-- 1 web web 18159 Mar 15 15:13 _17.tii
-rw-r--r-- 1 web web 1421338 Mar 15 15:13 _17.tis
-rw-r--r-- 1 web web 36 Mar 15 15:14 _17_1.del
-rw-r--r-- 1 web web 379400 Mar 15 15:13 _18.cfs
-rw-r--r-- 1 web web 424376 Mar 15 15:13 _19.cfs
-rw-r--r-- 1 web web 463177 Mar 15 15:13 _1a.cfs
-rw-r--r-- 1 web web 108875 Mar 15 15:13 _1b.cfs
-rw-r--r-- 1 web web 4481 Mar 15 15:14 _1c.cfs
-rw-r--r-- 1 web web 4465 Mar 15 15:15 _1d.cfs
-rw-r--r-- 1 web web 4534 Mar 15 15:17 _1e.cfs
-rw-r--r-- 1 web web 3252 Mar 15 15:18 _1f.cfs
-rw-r--r-- 1 web web 3314 Mar 15 15:21 _1g.cfs
-rw-r--r-- 1 web web 2340 Mar 15 15:25 _1h.cfs
-rw-r--r-- 1 web web 3902 Mar 15 15:27 _1i.cfs
-rw-r--r-- 1 web web 1687 Mar 15 15:28 _1j.cfs
-rw-r--r-- 1 web web 20 Mar 15 15:28 segments.gen
-rw-r--r-- 1 web web 2735 Mar 15 15:28 segments_1h


Here is (part of) the log :

50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
seg=_h(3.5):c1 size=0.007 MB [merging] [floored]
IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
seg=_b(3.5):c1 size=0.007 MB [merging] [floored]
IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
seg=_g(3.5):c1 size=0.006 MB [merging] [floored]
IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
seg=_c(3.5):c1 size=0.006 MB [merging] [floored]
IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
seg=_11(3.5):c1 size=0.005 MB [merging] [floored]
IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
seg=_12(3.5):c1 size=0.005 MB [merging] [floored]
IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
seg=_6(3.5):c1 size=0.005 MB [merging] [floored]
IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
allowedSegmentCount=1 vs count=15 (eligible count=5) tooBigCount=0
IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: merge time 46
msec for 10 docs
IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: now flush at close
waitForMerges=true
IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: CMS: merge
thread: done
IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: now trigger flush
reason=explicit flush
IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: CMS: set
priority of merge thread Lucene Merge Thread #3 to 6
IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: start flush:
applyAllDeletes=true
IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: index before flush
_0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _16(3.5):c10 _6(3.5):c1
_18(3.5):c10 _17(3.5):c10 _b(3.5):c1 _c(3.5):c1 _d(3.5):c1 _g(3.5):c1
_h(3.5):c1 _11(3.5):c1 _12(3.5):c1
IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: DW: flush: no docs;
skipping
IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: apply all deletes
during flush
BD 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: applyDeletes: no
deletes; skipping
BD 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: prune
sis=org.apache.lucene.index.SegmentInfos [at] 1cad1e minGen=0 packetCount=0
IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: clearFlushPending
IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: CMS: now merge
IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: CMS: index:
_0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _16(3.5):c10 _6(3.5):c1
_18(3.5):c10 _17(3.5):c10 _b(3.5):c1 _c(3.5):c1 _d(3.5):c1 _g(3.5):c1
_h(3.5):c1 _11(3.5):c1 _12(3.5):c1
IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: CMS: no more
merges pending; now return
IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: waitForMerges
IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: merge store
matchedCount=10 vs 10
IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: merged
segment size=0.076 MB vs estimate=0.103 MB
IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: commitMerge:
_2(3.5):C23 _3(3.5):c7 _d(3.5):c1 _h(3.5):c1 _b(3.5):c1 _g(3.5):c1
_c(3.5):c1 _11(3.5):c1 _12(3.5):c1 _6(3.5):c1 into _19 index=_0(3.5):C73
_1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _16(3.5):c10 _6(3.5):c1 _18(3.5):c10
_17(3.5):c10 _b(3.5):c1 _c(3.5):c1 _d(3.5):c1 _g(3.5):c1 _h(3.5):c1
_11(3.5):c1 _12(3.5):c1
IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]:
commitMergeDeletes _2(3.5):C23 _3(3.5):c7 _d(3.5):c1 _h(3.5):c1 _b(3.5):c1
_g(3.5):c1 _c(3.5):c1 _11(3.5):c1 _12(3.5):c1 _6(3.5):c1 into _19
IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: after commit:
_0(3.5):C73 _1(3.5):C70 _19(3.5):C38 _16(3.5):c10 _18(3.5):c10 _17(3.5):c10
IFD [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: now checkpoint
"segments_15" [6 segments ; isCommit = false]
IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: merge time 61
msec for 38 docs
IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: waitForMerges done
IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: CMS: merge
thread: done
IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: now call final
commit()
IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit: start
IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit: enter lock
IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit: already
prepared
IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit:
pendingCommit != null
IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit: wrote
segments file "segments_16"
IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: now checkpoint
"segments_16" [42 segments ; isCommit = true]
IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: deleteCommits: now
decRef commit "segments_15"
IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "segments_15"
IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit: done
IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: at close:
_0(3.5):C73 _1(3.5):C70 _19(3.5):C38 _16(3.5):c10 _18(3.5):c10 _17(3.5):c10
IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.tii"
IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.tis"
IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_18.cfs"
IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.nrm"
IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.fnm"
IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.fdt"
IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_17.cfs"
IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.prx"
IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_16.cfs"
IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.fdx"
IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.frq"
Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: applyDeletes: no deletes;
skipping
BD 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: prune
sis=org.apache.lucene.index.SegmentInfos [at] 59d9f minGen=0 packetCount=0
IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: clearFlushPending
IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: CMS: now merge
IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: CMS: index:
_0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _1a(3.5):C16 _17(3.5):c10 _19(3.5):c10
_18(3.5):c10
IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: CMS: no more
merges pending; now return
IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: waitForMerges
IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: waitForMerges done
IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: now call final
commit()
IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit: start
IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit: enter lock
IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit: already
prepared
IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit:
pendingCommit != null
IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit: wrote
segments file "segments_17"
IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: now checkpoint
"segments_17" [43 segments ; isCommit = true]
IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: deleteCommits: now
decRef commit "segments_16"
IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "segments_16"
IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit: done
IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: at close:
_0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _1a(3.5):C16 _17(3.5):c10 _19(3.5):c10
_18(3.5):c10
IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.fdx"
IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.tis"
IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.frq"
IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.tii"
IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.fnm"
IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.fdt"
IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_18.cfs"
IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.nrm"
IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_19.cfs"
IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_17.cfs"
IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.prx"

IW 53 [Thu Mar 15 15:25:38 MET 2012; Lucene Merge Thread #3]: CMS: merge
thread: done
IW 53 [Thu Mar 15 15:25:38 MET 2012; Lucene Merge Thread #0]: CMS: set
priority of merge thread Lucene Merge Thread #3 to 6
IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: now flush at close
waitForMerges=true
IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: now trigger flush
reason=explicit flush
IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: start flush:
applyAllDeletes=true
IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: index before flush
_0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _18(3.5):c10 _1b(3.5):c10
_1a(3.5):c10 _19(3.5):c10
IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: DW: flush: no docs;
skipping
IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: apply all deletes
during flush
BD 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: applyDeletes: no
deletes; skipping
BD 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: prune
sis=org.apache.lucene.index.SegmentInfos [at] e91b1 minGen=0 packetCount=0
IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: clearFlushPending
IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: CMS: now merge
IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: CMS: index:
_0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _18(3.5):c10 _1b(3.5):c10
_1a(3.5):c10 _19(3.5):c10
IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: CMS: no more
merges pending; now return
IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: waitForMerges
IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: waitForMerges done
IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: now call final
commit()
IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit: start
IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit: enter lock
IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit: already
prepared
IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit:
pendingCommit != null
IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit: wrote
segments file "segments_18"
IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: now checkpoint
"segments_18" [44 segments ; isCommit = true]
IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: deleteCommits: now
decRef commit "segments_17"
IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: delete "segments_17"
IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit: done
IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: at close:
_0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _18(3.5):c10 _1b(3.5):c10
_1a(3.5):c10 _19(3.5):c10
IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: delete "_1b.cfs"
IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: delete "_18.cfs"
IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: delete "_19.cfs"
IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: delete "_1a.cfs"

Kind regards,
Bogguard

On Thu, Mar 15, 2012 at 1:21 PM, Michael McCandless <
lucene [at] mikemccandless> wrote:

> Hmm, that's odd...
>
> Can you set IndexWriter's infoStream and then capture the output while
> doing the small writes every few seconds and post back?
>
> If you run CheckIndex on the index does it also report ~3000 segments?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Thu, Mar 15, 2012 at 7:14 AM, Tim Bogaert <bogguard [at] gmail> wrote:
> > Hi,
> >
> > We migrated our project to Lucene 3.5 and used the TieredMergePolicy with
> > all its defaults.
> > Our project contains a relatively small index (+-90MB) and gets small
> > writes every few seconds.
> > The behaviour we see is that when we reindex everything works great and
> > performant,
> > but when we keep the index running for some days we see a lot (+-3000) of
> > very small .cfs files (between 1KB and 10KB) are being generated and the
> > performance drops significantly.
> > Reïndexing solves the problem, but this shouldn't be the solution.
> >
> > Is there a way to change the behaviour of the .cfs files. I noticed you
> can
> > fully disable it, but I read that this isn't adviced.
> >
> > Kind regards,
> > Bogguard
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
> For additional commands, e-mail: java-user-help [at] lucene
>
>


lucene at mikemccandless

Mar 15, 2012, 8:04 AM

Post #4 of 11 (724 views)
Permalink
Re: lots of .cfs (compound files) in the index directory [In reply to]

Hmm, which OS/filesystem is the index on? Which Directory impl are
you using...?

Indeed CheckIndex sees the 5111 segments, so something seems amiss.

The log was edited right? EG I can't see the IW open log entries...
and IW 51 is missing (it goes straight from 50 to 52). Can you post
the full log? Also, can you attach it (instead of inlining it), maybe
gzip'd, so it doesn't get line-wrapped?

Finally, can you run things a big longer, so we can see the
pathological behavior emerge?

The only odd thing I see in the log is that IW seems to be attempting
to delete files (eg _1a.cfs) that are still "live", which is odd.

That the delete did not succeed could just be that the file is open
from a reader (do you have a reader open on the index?)... though, I
would have expected to see "unable to remove file..." in the
InfoStream output.

Mike McCandless

http://blog.mikemccandless.com


On Thu, Mar 15, 2012 at 10:30 AM, Tim Bogaert <bogguard [at] gmail> wrote:
> Hi Mike,
>
> Segment format: FORMAT_3_1 [Lucene 3.1+]
> NumBadSegments: 0
> NumSegments: 5111
> Clean?: true
> TotLoseDocCount: 0
> Partial: false
>
>
> I reindexed and let the feed come in again...The logs of the result are
> below (I just copied a few) after 10 minutes :
> Segment format: FORMAT_3_1 [Lucene 3.1+]
> NumBadSegments: 0
> NumSegments: 11
> Clean?: true
> TotLoseDocCount: 0
> Partial: false
>
> What I noticed in the log was that _1a.cfs is getting deleted, but it seems
> to be still in the directory :
>
> ls -al
> -rw-r--r--   1 web      web      1444172 Mar 15 15:13 _17.fdt
> -rw-r--r--   1 web      web        88804 Mar 15 15:13 _17.fdx
> -rw-r--r--   1 web      web          142 Mar 15 15:13 _17.fnm
> -rw-r--r--   1 web      web      4352933 Mar 15 15:13 _17.frq
> -rw-r--r--   1 web      web       177604 Mar 15 15:13 _17.nrm
> -rw-r--r--   1 web      web      5931105 Mar 15 15:13 _17.prx
> -rw-r--r--   1 web      web        18159 Mar 15 15:13 _17.tii
> -rw-r--r--   1 web      web      1421338 Mar 15 15:13 _17.tis
> -rw-r--r--   1 web      web           36 Mar 15 15:14 _17_1.del
> -rw-r--r--   1 web      web       379400 Mar 15 15:13 _18.cfs
> -rw-r--r--   1 web      web       424376 Mar 15 15:13 _19.cfs
> -rw-r--r--   1 web      web       463177 Mar 15 15:13 _1a.cfs
> -rw-r--r--   1 web      web       108875 Mar 15 15:13 _1b.cfs
> -rw-r--r--   1 web      web         4481 Mar 15 15:14 _1c.cfs
> -rw-r--r--   1 web      web         4465 Mar 15 15:15 _1d.cfs
> -rw-r--r--   1 web      web         4534 Mar 15 15:17 _1e.cfs
> -rw-r--r--   1 web      web         3252 Mar 15 15:18 _1f.cfs
> -rw-r--r--   1 web      web         3314 Mar 15 15:21 _1g.cfs
> -rw-r--r--   1 web      web         2340 Mar 15 15:25 _1h.cfs
> -rw-r--r--   1 web      web         3902 Mar 15 15:27 _1i.cfs
> -rw-r--r--   1 web      web         1687 Mar 15 15:28 _1j.cfs
> -rw-r--r--   1 web      web           20 Mar 15 15:28 segments.gen
> -rw-r--r--   1 web      web         2735 Mar 15 15:28 segments_1h
>
>
> Here is (part of) the log :
>
> 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> seg=_h(3.5):c1 size=0.007 MB [merging] [floored]
> IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> seg=_b(3.5):c1 size=0.007 MB [merging] [floored]
> IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> seg=_g(3.5):c1 size=0.006 MB [merging] [floored]
> IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> seg=_c(3.5):c1 size=0.006 MB [merging] [floored]
> IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> seg=_11(3.5):c1 size=0.005 MB [merging] [floored]
> IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> seg=_12(3.5):c1 size=0.005 MB [merging] [floored]
> IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> seg=_6(3.5):c1 size=0.005 MB [merging] [floored]
> IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> allowedSegmentCount=1 vs count=15 (eligible count=5) tooBigCount=0
> IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: merge time 46
> msec for 10 docs
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: now flush at close
> waitForMerges=true
> IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: CMS:   merge
> thread: done
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: now trigger flush
> reason=explicit flush
> IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: CMS: set
> priority of merge thread Lucene Merge Thread #3 to 6
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]:   start flush:
> applyAllDeletes=true
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]:   index before flush
> _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _16(3.5):c10 _6(3.5):c1
> _18(3.5):c10 _17(3.5):c10 _b(3.5):c1 _c(3.5):c1 _d(3.5):c1 _g(3.5):c1
> _h(3.5):c1 _11(3.5):c1 _12(3.5):c1
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: DW: flush: no docs;
> skipping
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: apply all deletes
> during flush
> BD 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: applyDeletes: no
> deletes; skipping
> BD 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: prune
> sis=org.apache.lucene.index.SegmentInfos [at] 1cad1e minGen=0 packetCount=0
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: clearFlushPending
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: CMS: now merge
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: CMS:   index:
> _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _16(3.5):c10 _6(3.5):c1
> _18(3.5):c10 _17(3.5):c10 _b(3.5):c1 _c(3.5):c1 _d(3.5):c1 _g(3.5):c1
> _h(3.5):c1 _11(3.5):c1 _12(3.5):c1
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: CMS:   no more
> merges pending; now return
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: waitForMerges
> IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: merge store
> matchedCount=10 vs 10
> IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: merged
> segment size=0.076 MB vs estimate=0.103 MB
> IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: commitMerge:
> _2(3.5):C23 _3(3.5):c7 _d(3.5):c1 _h(3.5):c1 _b(3.5):c1 _g(3.5):c1
> _c(3.5):c1 _11(3.5):c1 _12(3.5):c1 _6(3.5):c1 into _19 index=_0(3.5):C73
> _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _16(3.5):c10 _6(3.5):c1 _18(3.5):c10
> _17(3.5):c10 _b(3.5):c1 _c(3.5):c1 _d(3.5):c1 _g(3.5):c1 _h(3.5):c1
> _11(3.5):c1 _12(3.5):c1
> IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]:
> commitMergeDeletes _2(3.5):C23 _3(3.5):c7 _d(3.5):c1 _h(3.5):c1 _b(3.5):c1
> _g(3.5):c1 _c(3.5):c1 _11(3.5):c1 _12(3.5):c1 _6(3.5):c1 into _19
> IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: after commit:
> _0(3.5):C73 _1(3.5):C70 _19(3.5):C38 _16(3.5):c10 _18(3.5):c10 _17(3.5):c10
> IFD [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: now checkpoint
> "segments_15" [6 segments ; isCommit = false]
> IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: merge time 61
> msec for 38 docs
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: waitForMerges done
> IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: CMS:   merge
> thread: done
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: now call final
> commit()
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit: start
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit: enter lock
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit: already
> prepared
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit:
> pendingCommit != null
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit: wrote
> segments file "segments_16"
> IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: now checkpoint
> "segments_16" [42 segments ; isCommit = true]
> IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: deleteCommits: now
> decRef commit "segments_15"
> IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "segments_15"
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit: done
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: at close:
> _0(3.5):C73 _1(3.5):C70 _19(3.5):C38 _16(3.5):c10 _18(3.5):c10 _17(3.5):c10
> IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.tii"
> IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.tis"
> IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_18.cfs"
> IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.nrm"
> IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.fnm"
> IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.fdt"
> IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_17.cfs"
> IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.prx"
> IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_16.cfs"
> IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.fdx"
> IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.frq"
> Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: applyDeletes: no deletes;
> skipping
> BD 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: prune
> sis=org.apache.lucene.index.SegmentInfos [at] 59d9f minGen=0 packetCount=0
> IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: clearFlushPending
> IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: CMS: now merge
> IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: CMS:   index:
> _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _1a(3.5):C16 _17(3.5):c10 _19(3.5):c10
> _18(3.5):c10
> IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: CMS:   no more
> merges pending; now return
> IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: waitForMerges
> IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: waitForMerges done
> IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: now call final
> commit()
> IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit: start
> IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit: enter lock
> IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit: already
> prepared
> IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit:
> pendingCommit != null
> IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit: wrote
> segments file "segments_17"
> IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: now checkpoint
> "segments_17" [43 segments ; isCommit = true]
> IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: deleteCommits: now
> decRef commit "segments_16"
> IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "segments_16"
> IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit: done
> IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: at close:
> _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _1a(3.5):C16 _17(3.5):c10 _19(3.5):c10
> _18(3.5):c10
> IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.fdx"
> IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.tis"
> IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.frq"
> IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.tii"
> IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.fnm"
> IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.fdt"
> IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_18.cfs"
> IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.nrm"
> IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_19.cfs"
> IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_17.cfs"
> IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.prx"
>
> IW 53 [Thu Mar 15 15:25:38 MET 2012; Lucene Merge Thread #3]: CMS:   merge
> thread: done
> IW 53 [Thu Mar 15 15:25:38 MET 2012; Lucene Merge Thread #0]: CMS: set
> priority of merge thread Lucene Merge Thread #3 to 6
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: now flush at close
> waitForMerges=true
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: now trigger flush
> reason=explicit flush
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]:   start flush:
> applyAllDeletes=true
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]:   index before flush
> _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _18(3.5):c10 _1b(3.5):c10
> _1a(3.5):c10 _19(3.5):c10
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: DW: flush: no docs;
> skipping
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: apply all deletes
> during flush
> BD 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: applyDeletes: no
> deletes; skipping
> BD 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: prune
> sis=org.apache.lucene.index.SegmentInfos [at] e91b1 minGen=0 packetCount=0
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: clearFlushPending
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: CMS: now merge
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: CMS:   index:
> _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _18(3.5):c10 _1b(3.5):c10
> _1a(3.5):c10 _19(3.5):c10
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: CMS:   no more
> merges pending; now return
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: waitForMerges
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: waitForMerges done
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: now call final
> commit()
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit: start
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit: enter lock
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit: already
> prepared
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit:
> pendingCommit != null
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit: wrote
> segments file "segments_18"
> IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: now checkpoint
> "segments_18" [44 segments ; isCommit = true]
> IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: deleteCommits: now
> decRef commit "segments_17"
> IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: delete "segments_17"
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit: done
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: at close:
> _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _18(3.5):c10 _1b(3.5):c10
> _1a(3.5):c10 _19(3.5):c10
> IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: delete "_1b.cfs"
> IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: delete "_18.cfs"
> IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: delete "_19.cfs"
> IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: delete "_1a.cfs"
>
> Kind regards,
> Bogguard
>
> On Thu, Mar 15, 2012 at 1:21 PM, Michael McCandless <
> lucene [at] mikemccandless> wrote:
>
>> Hmm, that's odd...
>>
>> Can you set IndexWriter's infoStream and then capture the output while
>> doing the small writes every few seconds and post back?
>>
>> If you run CheckIndex on the index does it also report ~3000 segments?
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Thu, Mar 15, 2012 at 7:14 AM, Tim Bogaert <bogguard [at] gmail> wrote:
>> > Hi,
>> >
>> > We migrated our project to Lucene 3.5 and used the TieredMergePolicy with
>> > all its defaults.
>> > Our project contains a relatively small index (+-90MB) and gets small
>> > writes every few seconds.
>> > The behaviour we see is that when we reindex everything works great and
>> > performant,
>> > but when we keep the index running for some days we see a lot (+-3000) of
>> > very small .cfs files (between 1KB and 10KB) are being generated and the
>> > performance drops significantly.
>> > Reïndexing solves the problem, but this shouldn't be the solution.
>> >
>> > Is there a way to change the behaviour of the .cfs files. I noticed you
>> can
>> > fully disable it, but I read that this isn't adviced.
>> >
>> > Kind regards,
>> > Bogguard
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
>> For additional commands, e-mail: java-user-help [at] lucene
>>
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


bogguard at gmail

Mar 15, 2012, 8:14 AM

Post #5 of 11 (720 views)
Permalink
Re: lots of .cfs (compound files) in the index directory [In reply to]

Operating system is Solaris. Directory is a NAS.
Directory implementation is SimpleFSDirectory.
I sent you the full log.

Thanks,
Tim

On Thu, Mar 15, 2012 at 4:04 PM, Michael McCandless <
lucene [at] mikemccandless> wrote:

> Hmm, which OS/filesystem is the index on? Which Directory impl are
> you using...?
>
> Indeed CheckIndex sees the 5111 segments, so something seems amiss.
>
> The log was edited right? EG I can't see the IW open log entries...
> and IW 51 is missing (it goes straight from 50 to 52). Can you post
> the full log? Also, can you attach it (instead of inlining it), maybe
> gzip'd, so it doesn't get line-wrapped?
>
> Finally, can you run things a big longer, so we can see the
> pathological behavior emerge?
>
> The only odd thing I see in the log is that IW seems to be attempting
> to delete files (eg _1a.cfs) that are still "live", which is odd.
>
> That the delete did not succeed could just be that the file is open
> from a reader (do you have a reader open on the index?)... though, I
> would have expected to see "unable to remove file..." in the
> InfoStream output.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Thu, Mar 15, 2012 at 10:30 AM, Tim Bogaert <bogguard [at] gmail> wrote:
> > Hi Mike,
> >
> > Segment format: FORMAT_3_1 [Lucene 3.1+]
> > NumBadSegments: 0
> > NumSegments: 5111
> > Clean?: true
> > TotLoseDocCount: 0
> > Partial: false
> >
> >
> > I reindexed and let the feed come in again...The logs of the result are
> > below (I just copied a few) after 10 minutes :
> > Segment format: FORMAT_3_1 [Lucene 3.1+]
> > NumBadSegments: 0
> > NumSegments: 11
> > Clean?: true
> > TotLoseDocCount: 0
> > Partial: false
> >
> > What I noticed in the log was that _1a.cfs is getting deleted, but it
> seems
> > to be still in the directory :
> >
> > ls -al
> > -rw-r--r-- 1 web web 1444172 Mar 15 15:13 _17.fdt
> > -rw-r--r-- 1 web web 88804 Mar 15 15:13 _17.fdx
> > -rw-r--r-- 1 web web 142 Mar 15 15:13 _17.fnm
> > -rw-r--r-- 1 web web 4352933 Mar 15 15:13 _17.frq
> > -rw-r--r-- 1 web web 177604 Mar 15 15:13 _17.nrm
> > -rw-r--r-- 1 web web 5931105 Mar 15 15:13 _17.prx
> > -rw-r--r-- 1 web web 18159 Mar 15 15:13 _17.tii
> > -rw-r--r-- 1 web web 1421338 Mar 15 15:13 _17.tis
> > -rw-r--r-- 1 web web 36 Mar 15 15:14 _17_1.del
> > -rw-r--r-- 1 web web 379400 Mar 15 15:13 _18.cfs
> > -rw-r--r-- 1 web web 424376 Mar 15 15:13 _19.cfs
> > -rw-r--r-- 1 web web 463177 Mar 15 15:13 _1a.cfs
> > -rw-r--r-- 1 web web 108875 Mar 15 15:13 _1b.cfs
> > -rw-r--r-- 1 web web 4481 Mar 15 15:14 _1c.cfs
> > -rw-r--r-- 1 web web 4465 Mar 15 15:15 _1d.cfs
> > -rw-r--r-- 1 web web 4534 Mar 15 15:17 _1e.cfs
> > -rw-r--r-- 1 web web 3252 Mar 15 15:18 _1f.cfs
> > -rw-r--r-- 1 web web 3314 Mar 15 15:21 _1g.cfs
> > -rw-r--r-- 1 web web 2340 Mar 15 15:25 _1h.cfs
> > -rw-r--r-- 1 web web 3902 Mar 15 15:27 _1i.cfs
> > -rw-r--r-- 1 web web 1687 Mar 15 15:28 _1j.cfs
> > -rw-r--r-- 1 web web 20 Mar 15 15:28 segments.gen
> > -rw-r--r-- 1 web web 2735 Mar 15 15:28 segments_1h
> >
> >
> > Here is (part of) the log :
> >
> > 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> > seg=_h(3.5):c1 size=0.007 MB [merging] [floored]
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> > seg=_b(3.5):c1 size=0.007 MB [merging] [floored]
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> > seg=_g(3.5):c1 size=0.006 MB [merging] [floored]
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> > seg=_c(3.5):c1 size=0.006 MB [merging] [floored]
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> > seg=_11(3.5):c1 size=0.005 MB [merging] [floored]
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> > seg=_12(3.5):c1 size=0.005 MB [merging] [floored]
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> > seg=_6(3.5):c1 size=0.005 MB [merging] [floored]
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> > allowedSegmentCount=1 vs count=15 (eligible count=5) tooBigCount=0
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: merge time
> 46
> > msec for 10 docs
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: now flush at close
> > waitForMerges=true
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: CMS:
> merge
> > thread: done
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: now trigger flush
> > reason=explicit flush
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: CMS: set
> > priority of merge thread Lucene Merge Thread #3 to 6
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: start flush:
> > applyAllDeletes=true
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: index before
> flush
> > _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _16(3.5):c10 _6(3.5):c1
> > _18(3.5):c10 _17(3.5):c10 _b(3.5):c1 _c(3.5):c1 _d(3.5):c1 _g(3.5):c1
> > _h(3.5):c1 _11(3.5):c1 _12(3.5):c1
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: DW: flush: no
> docs;
> > skipping
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: apply all deletes
> > during flush
> > BD 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: applyDeletes: no
> > deletes; skipping
> > BD 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: prune
> > sis=org.apache.lucene.index.SegmentInfos [at] 1cad1e minGen=0 packetCount=0
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: clearFlushPending
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: CMS: now merge
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: CMS: index:
> > _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _16(3.5):c10 _6(3.5):c1
> > _18(3.5):c10 _17(3.5):c10 _b(3.5):c1 _c(3.5):c1 _d(3.5):c1 _g(3.5):c1
> > _h(3.5):c1 _11(3.5):c1 _12(3.5):c1
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: CMS: no more
> > merges pending; now return
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: waitForMerges
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: merge store
> > matchedCount=10 vs 10
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: merged
> > segment size=0.076 MB vs estimate=0.103 MB
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]:
> commitMerge:
> > _2(3.5):C23 _3(3.5):c7 _d(3.5):c1 _h(3.5):c1 _b(3.5):c1 _g(3.5):c1
> > _c(3.5):c1 _11(3.5):c1 _12(3.5):c1 _6(3.5):c1 into _19 index=_0(3.5):C73
> > _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _16(3.5):c10 _6(3.5):c1 _18(3.5):c10
> > _17(3.5):c10 _b(3.5):c1 _c(3.5):c1 _d(3.5):c1 _g(3.5):c1 _h(3.5):c1
> > _11(3.5):c1 _12(3.5):c1
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]:
> > commitMergeDeletes _2(3.5):C23 _3(3.5):c7 _d(3.5):c1 _h(3.5):c1
> _b(3.5):c1
> > _g(3.5):c1 _c(3.5):c1 _11(3.5):c1 _12(3.5):c1 _6(3.5):c1 into _19
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: after
> commit:
> > _0(3.5):C73 _1(3.5):C70 _19(3.5):C38 _16(3.5):c10 _18(3.5):c10
> _17(3.5):c10
> > IFD [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: now
> checkpoint
> > "segments_15" [6 segments ; isCommit = false]
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: merge time
> 61
> > msec for 38 docs
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: waitForMerges done
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: CMS:
> merge
> > thread: done
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: now call final
> > commit()
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit: start
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit: enter lock
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit: already
> > prepared
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit:
> > pendingCommit != null
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit: wrote
> > segments file "segments_16"
> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: now checkpoint
> > "segments_16" [42 segments ; isCommit = true]
> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: deleteCommits: now
> > decRef commit "segments_15"
> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "segments_15"
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit: done
> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: at close:
> > _0(3.5):C73 _1(3.5):C70 _19(3.5):C38 _16(3.5):c10 _18(3.5):c10
> _17(3.5):c10
> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.tii"
> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.tis"
> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_18.cfs"
> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.nrm"
> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.fnm"
> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.fdt"
> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_17.cfs"
> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.prx"
> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_16.cfs"
> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.fdx"
> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.frq"
> > Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: applyDeletes: no deletes;
> > skipping
> > BD 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: prune
> > sis=org.apache.lucene.index.SegmentInfos [at] 59d9f minGen=0 packetCount=0
> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: clearFlushPending
> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: CMS: now merge
> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: CMS: index:
> > _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _1a(3.5):C16 _17(3.5):c10
> _19(3.5):c10
> > _18(3.5):c10
> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: CMS: no more
> > merges pending; now return
> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: waitForMerges
> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: waitForMerges done
> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: now call final
> > commit()
> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit: start
> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit: enter lock
> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit: already
> > prepared
> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit:
> > pendingCommit != null
> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit: wrote
> > segments file "segments_17"
> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: now checkpoint
> > "segments_17" [43 segments ; isCommit = true]
> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: deleteCommits: now
> > decRef commit "segments_16"
> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "segments_16"
> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit: done
> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: at close:
> > _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _1a(3.5):C16 _17(3.5):c10
> _19(3.5):c10
> > _18(3.5):c10
> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.fdx"
> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.tis"
> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.frq"
> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.tii"
> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.fnm"
> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.fdt"
> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_18.cfs"
> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.nrm"
> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_19.cfs"
> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_17.cfs"
> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.prx"
> >
> > IW 53 [Thu Mar 15 15:25:38 MET 2012; Lucene Merge Thread #3]: CMS:
> merge
> > thread: done
> > IW 53 [Thu Mar 15 15:25:38 MET 2012; Lucene Merge Thread #0]: CMS: set
> > priority of merge thread Lucene Merge Thread #3 to 6
> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: now flush at close
> > waitForMerges=true
> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: now trigger flush
> > reason=explicit flush
> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: start flush:
> > applyAllDeletes=true
> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: index before
> flush
> > _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _18(3.5):c10 _1b(3.5):c10
> > _1a(3.5):c10 _19(3.5):c10
> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: DW: flush: no
> docs;
> > skipping
> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: apply all deletes
> > during flush
> > BD 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: applyDeletes: no
> > deletes; skipping
> > BD 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: prune
> > sis=org.apache.lucene.index.SegmentInfos [at] e91b1 minGen=0 packetCount=0
> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: clearFlushPending
> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: CMS: now merge
> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: CMS: index:
> > _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _18(3.5):c10 _1b(3.5):c10
> > _1a(3.5):c10 _19(3.5):c10
> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: CMS: no more
> > merges pending; now return
> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: waitForMerges
> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: waitForMerges done
> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: now call final
> > commit()
> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit: start
> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit: enter lock
> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit: already
> > prepared
> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit:
> > pendingCommit != null
> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit: wrote
> > segments file "segments_18"
> > IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: now checkpoint
> > "segments_18" [44 segments ; isCommit = true]
> > IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: deleteCommits: now
> > decRef commit "segments_17"
> > IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: delete "segments_17"
> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit: done
> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: at close:
> > _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _18(3.5):c10 _1b(3.5):c10
> > _1a(3.5):c10 _19(3.5):c10
> > IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: delete "_1b.cfs"
> > IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: delete "_18.cfs"
> > IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: delete "_19.cfs"
> > IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: delete "_1a.cfs"
> >
> > Kind regards,
> > Bogguard
> >
> > On Thu, Mar 15, 2012 at 1:21 PM, Michael McCandless <
> > lucene [at] mikemccandless> wrote:
> >
> >> Hmm, that's odd...
> >>
> >> Can you set IndexWriter's infoStream and then capture the output while
> >> doing the small writes every few seconds and post back?
> >>
> >> If you run CheckIndex on the index does it also report ~3000 segments?
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >>
> >> On Thu, Mar 15, 2012 at 7:14 AM, Tim Bogaert <bogguard [at] gmail>
> wrote:
> >> > Hi,
> >> >
> >> > We migrated our project to Lucene 3.5 and used the TieredMergePolicy
> with
> >> > all its defaults.
> >> > Our project contains a relatively small index (+-90MB) and gets small
> >> > writes every few seconds.
> >> > The behaviour we see is that when we reindex everything works great
> and
> >> > performant,
> >> > but when we keep the index running for some days we see a lot
> (+-3000) of
> >> > very small .cfs files (between 1KB and 10KB) are being generated and
> the
> >> > performance drops significantly.
> >> > Reïndexing solves the problem, but this shouldn't be the solution.
> >> >
> >> > Is there a way to change the behaviour of the .cfs files. I noticed
> you
> >> can
> >> > fully disable it, but I read that this isn't adviced.
> >> >
> >> > Kind regards,
> >> > Bogguard
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
> >> For additional commands, e-mail: java-user-help [at] lucene
> >>
> >>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
> For additional commands, e-mail: java-user-help [at] lucene
>
>


lucene at mikemccandless

Mar 15, 2012, 8:20 AM

Post #6 of 11 (729 views)
Permalink
Re: lots of .cfs (compound files) in the index directory [In reply to]

Hmm are you calling IW.prepareCommit() before calling IW.close()?

Can you try removing the call the IW.prepareCommit() and see if the
segments are then contained...?

Mike McCandless

http://blog.mikemccandless.com

On Thu, Mar 15, 2012 at 11:14 AM, Tim Bogaert <bogguard [at] gmail> wrote:
> Operating system is Solaris. Directory is a NAS.
> Directory implementation is SimpleFSDirectory.
> I sent you the full log.
>
> Thanks,
> Tim
>
> On Thu, Mar 15, 2012 at 4:04 PM, Michael McCandless <
> lucene [at] mikemccandless> wrote:
>
>> Hmm, which OS/filesystem is the index on?  Which Directory impl are
>> you using...?
>>
>> Indeed CheckIndex sees the 5111 segments, so something seems amiss.
>>
>> The log was edited right?  EG I can't see the IW open log entries...
>> and IW 51 is missing (it goes straight from 50 to 52).  Can you post
>> the full log?  Also, can you attach it (instead of inlining it), maybe
>> gzip'd, so it doesn't get line-wrapped?
>>
>> Finally, can you run things a big longer, so we can see the
>> pathological behavior emerge?
>>
>> The only odd thing I see in the log is that IW seems to be attempting
>> to delete files (eg _1a.cfs) that are still "live", which is odd.
>>
>> That the delete did not succeed could just be that the file is open
>> from a reader (do you have a reader open on the index?)... though, I
>> would have expected to see "unable to remove file..." in the
>> InfoStream output.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Thu, Mar 15, 2012 at 10:30 AM, Tim Bogaert <bogguard [at] gmail> wrote:
>> > Hi Mike,
>> >
>> > Segment format: FORMAT_3_1 [Lucene 3.1+]
>> > NumBadSegments: 0
>> > NumSegments: 5111
>> > Clean?: true
>> > TotLoseDocCount: 0
>> > Partial: false
>> >
>> >
>> > I reindexed and let the feed come in again...The logs of the result are
>> > below (I just copied a few) after 10 minutes :
>> > Segment format: FORMAT_3_1 [Lucene 3.1+]
>> > NumBadSegments: 0
>> > NumSegments: 11
>> > Clean?: true
>> > TotLoseDocCount: 0
>> > Partial: false
>> >
>> > What I noticed in the log was that _1a.cfs is getting deleted, but it
>> seems
>> > to be still in the directory :
>> >
>> > ls -al
>> > -rw-r--r--   1 web      web      1444172 Mar 15 15:13 _17.fdt
>> > -rw-r--r--   1 web      web        88804 Mar 15 15:13 _17.fdx
>> > -rw-r--r--   1 web      web          142 Mar 15 15:13 _17.fnm
>> > -rw-r--r--   1 web      web      4352933 Mar 15 15:13 _17.frq
>> > -rw-r--r--   1 web      web       177604 Mar 15 15:13 _17.nrm
>> > -rw-r--r--   1 web      web      5931105 Mar 15 15:13 _17.prx
>> > -rw-r--r--   1 web      web        18159 Mar 15 15:13 _17.tii
>> > -rw-r--r--   1 web      web      1421338 Mar 15 15:13 _17.tis
>> > -rw-r--r--   1 web      web           36 Mar 15 15:14 _17_1.del
>> > -rw-r--r--   1 web      web       379400 Mar 15 15:13 _18.cfs
>> > -rw-r--r--   1 web      web       424376 Mar 15 15:13 _19.cfs
>> > -rw-r--r--   1 web      web       463177 Mar 15 15:13 _1a.cfs
>> > -rw-r--r--   1 web      web       108875 Mar 15 15:13 _1b.cfs
>> > -rw-r--r--   1 web      web         4481 Mar 15 15:14 _1c.cfs
>> > -rw-r--r--   1 web      web         4465 Mar 15 15:15 _1d.cfs
>> > -rw-r--r--   1 web      web         4534 Mar 15 15:17 _1e.cfs
>> > -rw-r--r--   1 web      web         3252 Mar 15 15:18 _1f.cfs
>> > -rw-r--r--   1 web      web         3314 Mar 15 15:21 _1g.cfs
>> > -rw-r--r--   1 web      web         2340 Mar 15 15:25 _1h.cfs
>> > -rw-r--r--   1 web      web         3902 Mar 15 15:27 _1i.cfs
>> > -rw-r--r--   1 web      web         1687 Mar 15 15:28 _1j.cfs
>> > -rw-r--r--   1 web      web           20 Mar 15 15:28 segments.gen
>> > -rw-r--r--   1 web      web         2735 Mar 15 15:28 segments_1h
>> >
>> >
>> > Here is (part of) the log :
>> >
>> > 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
>> > seg=_h(3.5):c1 size=0.007 MB [merging] [floored]
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
>> > seg=_b(3.5):c1 size=0.007 MB [merging] [floored]
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
>> > seg=_g(3.5):c1 size=0.006 MB [merging] [floored]
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
>> > seg=_c(3.5):c1 size=0.006 MB [merging] [floored]
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
>> > seg=_11(3.5):c1 size=0.005 MB [merging] [floored]
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
>> > seg=_12(3.5):c1 size=0.005 MB [merging] [floored]
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
>> > seg=_6(3.5):c1 size=0.005 MB [merging] [floored]
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
>> > allowedSegmentCount=1 vs count=15 (eligible count=5) tooBigCount=0
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: merge time
>> 46
>> > msec for 10 docs
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: now flush at close
>> > waitForMerges=true
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: CMS:
>> merge
>> > thread: done
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: now trigger flush
>> > reason=explicit flush
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: CMS: set
>> > priority of merge thread Lucene Merge Thread #3 to 6
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]:   start flush:
>> > applyAllDeletes=true
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]:   index before
>> flush
>> > _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _16(3.5):c10 _6(3.5):c1
>> > _18(3.5):c10 _17(3.5):c10 _b(3.5):c1 _c(3.5):c1 _d(3.5):c1 _g(3.5):c1
>> > _h(3.5):c1 _11(3.5):c1 _12(3.5):c1
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: DW: flush: no
>> docs;
>> > skipping
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: apply all deletes
>> > during flush
>> > BD 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: applyDeletes: no
>> > deletes; skipping
>> > BD 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: prune
>> > sis=org.apache.lucene.index.SegmentInfos [at] 1cad1e minGen=0 packetCount=0
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: clearFlushPending
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: CMS: now merge
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: CMS:   index:
>> > _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _16(3.5):c10 _6(3.5):c1
>> > _18(3.5):c10 _17(3.5):c10 _b(3.5):c1 _c(3.5):c1 _d(3.5):c1 _g(3.5):c1
>> > _h(3.5):c1 _11(3.5):c1 _12(3.5):c1
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: CMS:   no more
>> > merges pending; now return
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: waitForMerges
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: merge store
>> > matchedCount=10 vs 10
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: merged
>> > segment size=0.076 MB vs estimate=0.103 MB
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]:
>> commitMerge:
>> > _2(3.5):C23 _3(3.5):c7 _d(3.5):c1 _h(3.5):c1 _b(3.5):c1 _g(3.5):c1
>> > _c(3.5):c1 _11(3.5):c1 _12(3.5):c1 _6(3.5):c1 into _19 index=_0(3.5):C73
>> > _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _16(3.5):c10 _6(3.5):c1 _18(3.5):c10
>> > _17(3.5):c10 _b(3.5):c1 _c(3.5):c1 _d(3.5):c1 _g(3.5):c1 _h(3.5):c1
>> > _11(3.5):c1 _12(3.5):c1
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]:
>> > commitMergeDeletes _2(3.5):C23 _3(3.5):c7 _d(3.5):c1 _h(3.5):c1
>> _b(3.5):c1
>> > _g(3.5):c1 _c(3.5):c1 _11(3.5):c1 _12(3.5):c1 _6(3.5):c1 into _19
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: after
>> commit:
>> > _0(3.5):C73 _1(3.5):C70 _19(3.5):C38 _16(3.5):c10 _18(3.5):c10
>> _17(3.5):c10
>> > IFD [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: now
>> checkpoint
>> > "segments_15" [6 segments ; isCommit = false]
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: merge time
>> 61
>> > msec for 38 docs
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: waitForMerges done
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: CMS:
>> merge
>> > thread: done
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: now call final
>> > commit()
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit: start
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit: enter lock
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit: already
>> > prepared
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit:
>> > pendingCommit != null
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit: wrote
>> > segments file "segments_16"
>> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: now checkpoint
>> > "segments_16" [42 segments ; isCommit = true]
>> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: deleteCommits: now
>> > decRef commit "segments_15"
>> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "segments_15"
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit: done
>> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: at close:
>> > _0(3.5):C73 _1(3.5):C70 _19(3.5):C38 _16(3.5):c10 _18(3.5):c10
>> _17(3.5):c10
>> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.tii"
>> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.tis"
>> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_18.cfs"
>> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.nrm"
>> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.fnm"
>> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.fdt"
>> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_17.cfs"
>> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.prx"
>> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_16.cfs"
>> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.fdx"
>> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.frq"
>> > Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: applyDeletes: no deletes;
>> > skipping
>> > BD 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: prune
>> > sis=org.apache.lucene.index.SegmentInfos [at] 59d9f minGen=0 packetCount=0
>> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: clearFlushPending
>> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: CMS: now merge
>> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: CMS:   index:
>> > _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _1a(3.5):C16 _17(3.5):c10
>> _19(3.5):c10
>> > _18(3.5):c10
>> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: CMS:   no more
>> > merges pending; now return
>> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: waitForMerges
>> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: waitForMerges done
>> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: now call final
>> > commit()
>> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit: start
>> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit: enter lock
>> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit: already
>> > prepared
>> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit:
>> > pendingCommit != null
>> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit: wrote
>> > segments file "segments_17"
>> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: now checkpoint
>> > "segments_17" [43 segments ; isCommit = true]
>> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: deleteCommits: now
>> > decRef commit "segments_16"
>> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "segments_16"
>> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit: done
>> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: at close:
>> > _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _1a(3.5):C16 _17(3.5):c10
>> _19(3.5):c10
>> > _18(3.5):c10
>> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.fdx"
>> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.tis"
>> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.frq"
>> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.tii"
>> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.fnm"
>> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.fdt"
>> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_18.cfs"
>> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.nrm"
>> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_19.cfs"
>> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_17.cfs"
>> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.prx"
>> >
>> > IW 53 [Thu Mar 15 15:25:38 MET 2012; Lucene Merge Thread #3]: CMS:
>> merge
>> > thread: done
>> > IW 53 [Thu Mar 15 15:25:38 MET 2012; Lucene Merge Thread #0]: CMS: set
>> > priority of merge thread Lucene Merge Thread #3 to 6
>> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: now flush at close
>> > waitForMerges=true
>> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: now trigger flush
>> > reason=explicit flush
>> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]:   start flush:
>> > applyAllDeletes=true
>> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]:   index before
>> flush
>> > _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _18(3.5):c10 _1b(3.5):c10
>> > _1a(3.5):c10 _19(3.5):c10
>> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: DW: flush: no
>> docs;
>> > skipping
>> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: apply all deletes
>> > during flush
>> > BD 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: applyDeletes: no
>> > deletes; skipping
>> > BD 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: prune
>> > sis=org.apache.lucene.index.SegmentInfos [at] e91b1 minGen=0 packetCount=0
>> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: clearFlushPending
>> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: CMS: now merge
>> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: CMS:   index:
>> > _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _18(3.5):c10 _1b(3.5):c10
>> > _1a(3.5):c10 _19(3.5):c10
>> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: CMS:   no more
>> > merges pending; now return
>> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: waitForMerges
>> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: waitForMerges done
>> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: now call final
>> > commit()
>> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit: start
>> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit: enter lock
>> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit: already
>> > prepared
>> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit:
>> > pendingCommit != null
>> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit: wrote
>> > segments file "segments_18"
>> > IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: now checkpoint
>> > "segments_18" [44 segments ; isCommit = true]
>> > IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: deleteCommits: now
>> > decRef commit "segments_17"
>> > IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: delete "segments_17"
>> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit: done
>> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: at close:
>> > _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _18(3.5):c10 _1b(3.5):c10
>> > _1a(3.5):c10 _19(3.5):c10
>> > IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: delete "_1b.cfs"
>> > IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: delete "_18.cfs"
>> > IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: delete "_19.cfs"
>> > IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: delete "_1a.cfs"
>> >
>> > Kind regards,
>> > Bogguard
>> >
>> > On Thu, Mar 15, 2012 at 1:21 PM, Michael McCandless <
>> > lucene [at] mikemccandless> wrote:
>> >
>> >> Hmm, that's odd...
>> >>
>> >> Can you set IndexWriter's infoStream and then capture the output while
>> >> doing the small writes every few seconds and post back?
>> >>
>> >> If you run CheckIndex on the index does it also report ~3000 segments?
>> >>
>> >> Mike McCandless
>> >>
>> >> http://blog.mikemccandless.com
>> >>
>> >> On Thu, Mar 15, 2012 at 7:14 AM, Tim Bogaert <bogguard [at] gmail>
>> wrote:
>> >> > Hi,
>> >> >
>> >> > We migrated our project to Lucene 3.5 and used the TieredMergePolicy
>> with
>> >> > all its defaults.
>> >> > Our project contains a relatively small index (+-90MB) and gets small
>> >> > writes every few seconds.
>> >> > The behaviour we see is that when we reindex everything works great
>> and
>> >> > performant,
>> >> > but when we keep the index running for some days we see a lot
>> (+-3000) of
>> >> > very small .cfs files (between 1KB and 10KB) are being generated and
>> the
>> >> > performance drops significantly.
>> >> > Reïndexing solves the problem, but this shouldn't be the solution.
>> >> >
>> >> > Is there a way to change the behaviour of the .cfs files. I noticed
>> you
>> >> can
>> >> > fully disable it, but I read that this isn't adviced.
>> >> >
>> >> > Kind regards,
>> >> > Bogguard
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
>> >> For additional commands, e-mail: java-user-help [at] lucene
>> >>
>> >>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
>> For additional commands, e-mail: java-user-help [at] lucene
>>
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


bogguard at gmail

Mar 15, 2012, 9:02 AM

Post #7 of 11 (723 views)
Permalink
Re: lots of .cfs (compound files) in the index directory [In reply to]

Mike,

while removing the prepareCommit we noticed we didn't actually called the
IW.commit() method before the IW.close().
Altough the documentation says the close() method commits all the changes
we tried to add the commit() method before the IW.close() and this seems to
fix our problem.
Is this a bug or just a misunderstanding of the documentation and you
always have to call the commit() method first?

Thanks for your help,
Tim


On Thu, Mar 15, 2012 at 4:20 PM, Michael McCandless <
lucene [at] mikemccandless> wrote:

> Hmm are you calling IW.prepareCommit() before calling IW.close()?
>
> Can you try removing the call the IW.prepareCommit() and see if the
> segments are then contained...?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Thu, Mar 15, 2012 at 11:14 AM, Tim Bogaert <bogguard [at] gmail> wrote:
> > Operating system is Solaris. Directory is a NAS.
> > Directory implementation is SimpleFSDirectory.
> > I sent you the full log.
> >
> > Thanks,
> > Tim
> >
> > On Thu, Mar 15, 2012 at 4:04 PM, Michael McCandless <
> > lucene [at] mikemccandless> wrote:
> >
> >> Hmm, which OS/filesystem is the index on? Which Directory impl are
> >> you using...?
> >>
> >> Indeed CheckIndex sees the 5111 segments, so something seems amiss.
> >>
> >> The log was edited right? EG I can't see the IW open log entries...
> >> and IW 51 is missing (it goes straight from 50 to 52). Can you post
> >> the full log? Also, can you attach it (instead of inlining it), maybe
> >> gzip'd, so it doesn't get line-wrapped?
> >>
> >> Finally, can you run things a big longer, so we can see the
> >> pathological behavior emerge?
> >>
> >> The only odd thing I see in the log is that IW seems to be attempting
> >> to delete files (eg _1a.cfs) that are still "live", which is odd.
> >>
> >> That the delete did not succeed could just be that the file is open
> >> from a reader (do you have a reader open on the index?)... though, I
> >> would have expected to see "unable to remove file..." in the
> >> InfoStream output.
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >>
> >>
> >> On Thu, Mar 15, 2012 at 10:30 AM, Tim Bogaert <bogguard [at] gmail>
> wrote:
> >> > Hi Mike,
> >> >
> >> > Segment format: FORMAT_3_1 [Lucene 3.1+]
> >> > NumBadSegments: 0
> >> > NumSegments: 5111
> >> > Clean?: true
> >> > TotLoseDocCount: 0
> >> > Partial: false
> >> >
> >> >
> >> > I reindexed and let the feed come in again...The logs of the result
> are
> >> > below (I just copied a few) after 10 minutes :
> >> > Segment format: FORMAT_3_1 [Lucene 3.1+]
> >> > NumBadSegments: 0
> >> > NumSegments: 11
> >> > Clean?: true
> >> > TotLoseDocCount: 0
> >> > Partial: false
> >> >
> >> > What I noticed in the log was that _1a.cfs is getting deleted, but it
> >> seems
> >> > to be still in the directory :
> >> >
> >> > ls -al
> >> > -rw-r--r-- 1 web web 1444172 Mar 15 15:13 _17.fdt
> >> > -rw-r--r-- 1 web web 88804 Mar 15 15:13 _17.fdx
> >> > -rw-r--r-- 1 web web 142 Mar 15 15:13 _17.fnm
> >> > -rw-r--r-- 1 web web 4352933 Mar 15 15:13 _17.frq
> >> > -rw-r--r-- 1 web web 177604 Mar 15 15:13 _17.nrm
> >> > -rw-r--r-- 1 web web 5931105 Mar 15 15:13 _17.prx
> >> > -rw-r--r-- 1 web web 18159 Mar 15 15:13 _17.tii
> >> > -rw-r--r-- 1 web web 1421338 Mar 15 15:13 _17.tis
> >> > -rw-r--r-- 1 web web 36 Mar 15 15:14 _17_1.del
> >> > -rw-r--r-- 1 web web 379400 Mar 15 15:13 _18.cfs
> >> > -rw-r--r-- 1 web web 424376 Mar 15 15:13 _19.cfs
> >> > -rw-r--r-- 1 web web 463177 Mar 15 15:13 _1a.cfs
> >> > -rw-r--r-- 1 web web 108875 Mar 15 15:13 _1b.cfs
> >> > -rw-r--r-- 1 web web 4481 Mar 15 15:14 _1c.cfs
> >> > -rw-r--r-- 1 web web 4465 Mar 15 15:15 _1d.cfs
> >> > -rw-r--r-- 1 web web 4534 Mar 15 15:17 _1e.cfs
> >> > -rw-r--r-- 1 web web 3252 Mar 15 15:18 _1f.cfs
> >> > -rw-r--r-- 1 web web 3314 Mar 15 15:21 _1g.cfs
> >> > -rw-r--r-- 1 web web 2340 Mar 15 15:25 _1h.cfs
> >> > -rw-r--r-- 1 web web 3902 Mar 15 15:27 _1i.cfs
> >> > -rw-r--r-- 1 web web 1687 Mar 15 15:28 _1j.cfs
> >> > -rw-r--r-- 1 web web 20 Mar 15 15:28 segments.gen
> >> > -rw-r--r-- 1 web web 2735 Mar 15 15:28 segments_1h
> >> >
> >> >
> >> > Here is (part of) the log :
> >> >
> >> > 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> >> > seg=_h(3.5):c1 size=0.007 MB [merging] [floored]
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> >> > seg=_b(3.5):c1 size=0.007 MB [merging] [floored]
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> >> > seg=_g(3.5):c1 size=0.006 MB [merging] [floored]
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> >> > seg=_c(3.5):c1 size=0.006 MB [merging] [floored]
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> >> > seg=_11(3.5):c1 size=0.005 MB [merging] [floored]
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> >> > seg=_12(3.5):c1 size=0.005 MB [merging] [floored]
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> >> > seg=_6(3.5):c1 size=0.005 MB [merging] [floored]
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> >> > allowedSegmentCount=1 vs count=15 (eligible count=5) tooBigCount=0
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: merge
> time
> >> 46
> >> > msec for 10 docs
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: now flush at
> close
> >> > waitForMerges=true
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: CMS:
> >> merge
> >> > thread: done
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: now trigger
> flush
> >> > reason=explicit flush
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: CMS: set
> >> > priority of merge thread Lucene Merge Thread #3 to 6
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: start flush:
> >> > applyAllDeletes=true
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: index before
> >> flush
> >> > _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _16(3.5):c10 _6(3.5):c1
> >> > _18(3.5):c10 _17(3.5):c10 _b(3.5):c1 _c(3.5):c1 _d(3.5):c1 _g(3.5):c1
> >> > _h(3.5):c1 _11(3.5):c1 _12(3.5):c1
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: DW: flush: no
> >> docs;
> >> > skipping
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: apply all
> deletes
> >> > during flush
> >> > BD 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: applyDeletes:
> no
> >> > deletes; skipping
> >> > BD 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: prune
> >> > sis=org.apache.lucene.index.SegmentInfos [at] 1cad1e minGen=0
> packetCount=0
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]:
> clearFlushPending
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: CMS: now merge
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: CMS: index:
> >> > _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _16(3.5):c10 _6(3.5):c1
> >> > _18(3.5):c10 _17(3.5):c10 _b(3.5):c1 _c(3.5):c1 _d(3.5):c1 _g(3.5):c1
> >> > _h(3.5):c1 _11(3.5):c1 _12(3.5):c1
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: CMS: no more
> >> > merges pending; now return
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: waitForMerges
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: merge
> store
> >> > matchedCount=10 vs 10
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: merged
> >> > segment size=0.076 MB vs estimate=0.103 MB
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]:
> >> commitMerge:
> >> > _2(3.5):C23 _3(3.5):c7 _d(3.5):c1 _h(3.5):c1 _b(3.5):c1 _g(3.5):c1
> >> > _c(3.5):c1 _11(3.5):c1 _12(3.5):c1 _6(3.5):c1 into _19
> index=_0(3.5):C73
> >> > _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _16(3.5):c10 _6(3.5):c1
> _18(3.5):c10
> >> > _17(3.5):c10 _b(3.5):c1 _c(3.5):c1 _d(3.5):c1 _g(3.5):c1 _h(3.5):c1
> >> > _11(3.5):c1 _12(3.5):c1
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]:
> >> > commitMergeDeletes _2(3.5):C23 _3(3.5):c7 _d(3.5):c1 _h(3.5):c1
> >> _b(3.5):c1
> >> > _g(3.5):c1 _c(3.5):c1 _11(3.5):c1 _12(3.5):c1 _6(3.5):c1 into _19
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: after
> >> commit:
> >> > _0(3.5):C73 _1(3.5):C70 _19(3.5):C38 _16(3.5):c10 _18(3.5):c10
> >> _17(3.5):c10
> >> > IFD [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: now
> >> checkpoint
> >> > "segments_15" [6 segments ; isCommit = false]
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: merge
> time
> >> 61
> >> > msec for 38 docs
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: waitForMerges
> done
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: CMS:
> >> merge
> >> > thread: done
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: now call final
> >> > commit()
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit: start
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit: enter
> lock
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit: already
> >> > prepared
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit:
> >> > pendingCommit != null
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit: wrote
> >> > segments file "segments_16"
> >> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: now checkpoint
> >> > "segments_16" [42 segments ; isCommit = true]
> >> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: deleteCommits:
> now
> >> > decRef commit "segments_15"
> >> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete
> "segments_15"
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit: done
> >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: at close:
> >> > _0(3.5):C73 _1(3.5):C70 _19(3.5):C38 _16(3.5):c10 _18(3.5):c10
> >> _17(3.5):c10
> >> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.tii"
> >> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.tis"
> >> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_18.cfs"
> >> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.nrm"
> >> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.fnm"
> >> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.fdt"
> >> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_17.cfs"
> >> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.prx"
> >> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_16.cfs"
> >> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.fdx"
> >> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.frq"
> >> > Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: applyDeletes: no
> deletes;
> >> > skipping
> >> > BD 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: prune
> >> > sis=org.apache.lucene.index.SegmentInfos [at] 59d9f minGen=0
> packetCount=0
> >> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]:
> clearFlushPending
> >> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: CMS: now merge
> >> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: CMS: index:
> >> > _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _1a(3.5):C16 _17(3.5):c10
> >> _19(3.5):c10
> >> > _18(3.5):c10
> >> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: CMS: no more
> >> > merges pending; now return
> >> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: waitForMerges
> >> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: waitForMerges
> done
> >> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: now call final
> >> > commit()
> >> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit: start
> >> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit: enter
> lock
> >> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit: already
> >> > prepared
> >> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit:
> >> > pendingCommit != null
> >> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit: wrote
> >> > segments file "segments_17"
> >> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: now checkpoint
> >> > "segments_17" [43 segments ; isCommit = true]
> >> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: deleteCommits:
> now
> >> > decRef commit "segments_16"
> >> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete
> "segments_16"
> >> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit: done
> >> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: at close:
> >> > _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _1a(3.5):C16 _17(3.5):c10
> >> _19(3.5):c10
> >> > _18(3.5):c10
> >> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.fdx"
> >> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.tis"
> >> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.frq"
> >> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.tii"
> >> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.fnm"
> >> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.fdt"
> >> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_18.cfs"
> >> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.nrm"
> >> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_19.cfs"
> >> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_17.cfs"
> >> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.prx"
> >> >
> >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; Lucene Merge Thread #3]: CMS:
> >> merge
> >> > thread: done
> >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; Lucene Merge Thread #0]: CMS: set
> >> > priority of merge thread Lucene Merge Thread #3 to 6
> >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: now flush at
> close
> >> > waitForMerges=true
> >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: now trigger
> flush
> >> > reason=explicit flush
> >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: start flush:
> >> > applyAllDeletes=true
> >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: index before
> >> flush
> >> > _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _18(3.5):c10
> _1b(3.5):c10
> >> > _1a(3.5):c10 _19(3.5):c10
> >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: DW: flush: no
> >> docs;
> >> > skipping
> >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: apply all
> deletes
> >> > during flush
> >> > BD 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: applyDeletes:
> no
> >> > deletes; skipping
> >> > BD 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: prune
> >> > sis=org.apache.lucene.index.SegmentInfos [at] e91b1 minGen=0
> packetCount=0
> >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]:
> clearFlushPending
> >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: CMS: now merge
> >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: CMS: index:
> >> > _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _18(3.5):c10
> _1b(3.5):c10
> >> > _1a(3.5):c10 _19(3.5):c10
> >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: CMS: no more
> >> > merges pending; now return
> >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: waitForMerges
> >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: waitForMerges
> done
> >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: now call final
> >> > commit()
> >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit: start
> >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit: enter
> lock
> >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit: already
> >> > prepared
> >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit:
> >> > pendingCommit != null
> >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit: wrote
> >> > segments file "segments_18"
> >> > IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: now checkpoint
> >> > "segments_18" [44 segments ; isCommit = true]
> >> > IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: deleteCommits:
> now
> >> > decRef commit "segments_17"
> >> > IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: delete
> "segments_17"
> >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit: done
> >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: at close:
> >> > _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _18(3.5):c10
> _1b(3.5):c10
> >> > _1a(3.5):c10 _19(3.5):c10
> >> > IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: delete "_1b.cfs"
> >> > IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: delete "_18.cfs"
> >> > IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: delete "_19.cfs"
> >> > IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: delete "_1a.cfs"
> >> >
> >> > Kind regards,
> >> > Bogguard
> >> >
> >> > On Thu, Mar 15, 2012 at 1:21 PM, Michael McCandless <
> >> > lucene [at] mikemccandless> wrote:
> >> >
> >> >> Hmm, that's odd...
> >> >>
> >> >> Can you set IndexWriter's infoStream and then capture the output
> while
> >> >> doing the small writes every few seconds and post back?
> >> >>
> >> >> If you run CheckIndex on the index does it also report ~3000
> segments?
> >> >>
> >> >> Mike McCandless
> >> >>
> >> >> http://blog.mikemccandless.com
> >> >>
> >> >> On Thu, Mar 15, 2012 at 7:14 AM, Tim Bogaert <bogguard [at] gmail>
> >> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > We migrated our project to Lucene 3.5 and used the
> TieredMergePolicy
> >> with
> >> >> > all its defaults.
> >> >> > Our project contains a relatively small index (+-90MB) and gets
> small
> >> >> > writes every few seconds.
> >> >> > The behaviour we see is that when we reindex everything works great
> >> and
> >> >> > performant,
> >> >> > but when we keep the index running for some days we see a lot
> >> (+-3000) of
> >> >> > very small .cfs files (between 1KB and 10KB) are being generated
> and
> >> the
> >> >> > performance drops significantly.
> >> >> > Reïndexing solves the problem, but this shouldn't be the solution.
> >> >> >
> >> >> > Is there a way to change the behaviour of the .cfs files. I noticed
> >> you
> >> >> can
> >> >> > fully disable it, but I read that this isn't adviced.
> >> >> >
> >> >> > Kind regards,
> >> >> > Bogguard
> >> >>
> >> >> ---------------------------------------------------------------------
> >> >> To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
> >> >> For additional commands, e-mail: java-user-help [at] lucene
> >> >>
> >> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
> >> For additional commands, e-mail: java-user-help [at] lucene
> >>
> >>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
> For additional commands, e-mail: java-user-help [at] lucene
>
>


uwe at thetaphi

Mar 15, 2012, 9:26 AM

Post #8 of 11 (733 views)
Permalink
RE: lots of .cfs (compound files) in the index directory [In reply to]

Hi,

It will always commit before closing! But the result depends how you call
close... but depending on the boolean parameter, the background merges can
be interrupted, so it closes as soon as possible, leaving pending merges
undone.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe [at] thetaphi


> -----Original Message-----
> From: Tim Bogaert [mailto:bogguard [at] gmail]
> Sent: Thursday, March 15, 2012 5:02 PM
> To: java-user [at] lucene
> Subject: Re: lots of .cfs (compound files) in the index directory
>
> Mike,
>
> while removing the prepareCommit we noticed we didn't actually called the
> IW.commit() method before the IW.close().
> Altough the documentation says the close() method commits all the changes
we
> tried to add the commit() method before the IW.close() and this seems to
fix
> our problem.
> Is this a bug or just a misunderstanding of the documentation and you
always
> have to call the commit() method first?
>
> Thanks for your help,
> Tim
>
>
> On Thu, Mar 15, 2012 at 4:20 PM, Michael McCandless <
> lucene [at] mikemccandless> wrote:
>
> > Hmm are you calling IW.prepareCommit() before calling IW.close()?
> >
> > Can you try removing the call the IW.prepareCommit() and see if the
> > segments are then contained...?
> >
> > Mike McCandless
> >
> > http://blog.mikemccandless.com
> >
> > On Thu, Mar 15, 2012 at 11:14 AM, Tim Bogaert <bogguard [at] gmail>
> wrote:
> > > Operating system is Solaris. Directory is a NAS.
> > > Directory implementation is SimpleFSDirectory.
> > > I sent you the full log.
> > >
> > > Thanks,
> > > Tim
> > >
> > > On Thu, Mar 15, 2012 at 4:04 PM, Michael McCandless <
> > > lucene [at] mikemccandless> wrote:
> > >
> > >> Hmm, which OS/filesystem is the index on? Which Directory impl are
> > >> you using...?
> > >>
> > >> Indeed CheckIndex sees the 5111 segments, so something seems amiss.
> > >>
> > >> The log was edited right? EG I can't see the IW open log entries...
> > >> and IW 51 is missing (it goes straight from 50 to 52). Can you
> > >> post the full log? Also, can you attach it (instead of inlining
> > >> it), maybe gzip'd, so it doesn't get line-wrapped?
> > >>
> > >> Finally, can you run things a big longer, so we can see the
> > >> pathological behavior emerge?
> > >>
> > >> The only odd thing I see in the log is that IW seems to be
> > >> attempting to delete files (eg _1a.cfs) that are still "live", which
is odd.
> > >>
> > >> That the delete did not succeed could just be that the file is open
> > >> from a reader (do you have a reader open on the index?)... though,
> > >> I would have expected to see "unable to remove file..." in the
> > >> InfoStream output.
> > >>
> > >> Mike McCandless
> > >>
> > >> http://blog.mikemccandless.com
> > >>
> > >>
> > >> On Thu, Mar 15, 2012 at 10:30 AM, Tim Bogaert <bogguard [at] gmail>
> > wrote:
> > >> > Hi Mike,
> > >> >
> > >> > Segment format: FORMAT_3_1 [Lucene 3.1+]
> > >> > NumBadSegments: 0
> > >> > NumSegments: 5111
> > >> > Clean?: true
> > >> > TotLoseDocCount: 0
> > >> > Partial: false
> > >> >
> > >> >
> > >> > I reindexed and let the feed come in again...The logs of the
> > >> > result
> > are
> > >> > below (I just copied a few) after 10 minutes :
> > >> > Segment format: FORMAT_3_1 [Lucene 3.1+]
> > >> > NumBadSegments: 0
> > >> > NumSegments: 11
> > >> > Clean?: true
> > >> > TotLoseDocCount: 0
> > >> > Partial: false
> > >> >
> > >> > What I noticed in the log was that _1a.cfs is getting deleted,
> > >> > but it
> > >> seems
> > >> > to be still in the directory :
> > >> >
> > >> > ls -al
> > >> > -rw-r--r-- 1 web web 1444172 Mar 15 15:13 _17.fdt
> > >> > -rw-r--r-- 1 web web 88804 Mar 15 15:13 _17.fdx
> > >> > -rw-r--r-- 1 web web 142 Mar 15 15:13 _17.fnm
> > >> > -rw-r--r-- 1 web web 4352933 Mar 15 15:13 _17.frq
> > >> > -rw-r--r-- 1 web web 177604 Mar 15 15:13 _17.nrm
> > >> > -rw-r--r-- 1 web web 5931105 Mar 15 15:13 _17.prx
> > >> > -rw-r--r-- 1 web web 18159 Mar 15 15:13 _17.tii
> > >> > -rw-r--r-- 1 web web 1421338 Mar 15 15:13 _17.tis
> > >> > -rw-r--r-- 1 web web 36 Mar 15 15:14 _17_1.del
> > >> > -rw-r--r-- 1 web web 379400 Mar 15 15:13 _18.cfs
> > >> > -rw-r--r-- 1 web web 424376 Mar 15 15:13 _19.cfs
> > >> > -rw-r--r-- 1 web web 463177 Mar 15 15:13 _1a.cfs
> > >> > -rw-r--r-- 1 web web 108875 Mar 15 15:13 _1b.cfs
> > >> > -rw-r--r-- 1 web web 4481 Mar 15 15:14 _1c.cfs
> > >> > -rw-r--r-- 1 web web 4465 Mar 15 15:15 _1d.cfs
> > >> > -rw-r--r-- 1 web web 4534 Mar 15 15:17 _1e.cfs
> > >> > -rw-r--r-- 1 web web 3252 Mar 15 15:18 _1f.cfs
> > >> > -rw-r--r-- 1 web web 3314 Mar 15 15:21 _1g.cfs
> > >> > -rw-r--r-- 1 web web 2340 Mar 15 15:25 _1h.cfs
> > >> > -rw-r--r-- 1 web web 3902 Mar 15 15:27 _1i.cfs
> > >> > -rw-r--r-- 1 web web 1687 Mar 15 15:28 _1j.cfs
> > >> > -rw-r--r-- 1 web web 20 Mar 15 15:28 segments.gen
> > >> > -rw-r--r-- 1 web web 2735 Mar 15 15:28 segments_1h
> > >> >
> > >> >
> > >> > Here is (part of) the log :
> > >> >
> > >> > 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> > >> > seg=_h(3.5):c1 size=0.007 MB [merging] [floored] IW 50 [.Thu Mar
> > >> > 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> > >> > seg=_b(3.5):c1 size=0.007 MB [merging] [floored] IW 50 [.Thu Mar
> > >> > 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> > >> > seg=_g(3.5):c1 size=0.006 MB [merging] [floored] IW 50 [.Thu Mar
> > >> > 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> > >> > seg=_c(3.5):c1 size=0.006 MB [merging] [floored] IW 50 [.Thu Mar
> > >> > 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> > >> > seg=_11(3.5):c1 size=0.005 MB [merging] [floored] IW 50 [.Thu Mar
> > >> > 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> > >> > seg=_12(3.5):c1 size=0.005 MB [merging] [floored] IW 50 [.Thu Mar
> > >> > 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> > >> > seg=_6(3.5):c1 size=0.005 MB [merging] [floored] IW 50 [.Thu Mar
> > >> > 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> > >> > allowedSegmentCount=1 vs count=15 (eligible count=5)
> > >> > tooBigCount=0 IW 50 [.Thu Mar 15 15:25:23 MET 2012; Lucene Merge
> > >> > Thread #2]: merge
> > time
> > >> 46
> > >> > msec for 10 docs
> > >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: now flush
> > >> > at
> > close
> > >> > waitForMerges=true
> > >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: CMS:
> > >> merge
> > >> > thread: done
> > >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: now
> > >> > trigger
> > flush
> > >> > reason=explicit flush
> > >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]:
> > >> > CMS: set priority of merge thread Lucene Merge Thread #3 to 6
> > >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: start
flush:
> > >> > applyAllDeletes=true
> > >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: index
before
> > >> flush
> > >> > _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _16(3.5):c10
> > >> > _6(3.5):c1
> > >> > _18(3.5):c10 _17(3.5):c10 _b(3.5):c1 _c(3.5):c1 _d(3.5):c1
> > >> > _g(3.5):c1
> > >> > _h(3.5):c1 _11(3.5):c1 _12(3.5):c1 IW 50 [.Thu Mar 15 15:25:23 MET
> > >> > 2012; pool-2-thread-1]: DW: flush: no
> > >> docs;
> > >> > skipping
> > >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: apply all
> > deletes
> > >> > during flush
> > >> > BD 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]:
applyDeletes:
> > no
> > >> > deletes; skipping
> > >> > BD 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: prune
> > >> > sis=org.apache.lucene.index.SegmentInfos [at] 1cad1e minGen=0
> > packetCount=0
> > >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]:
> > clearFlushPending
> > >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: CMS: now
> merge
> > >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: CMS:
index:
> > >> > _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _16(3.5):c10
> > >> > _6(3.5):c1
> > >> > _18(3.5):c10 _17(3.5):c10 _b(3.5):c1 _c(3.5):c1 _d(3.5):c1
> > >> > _g(3.5):c1
> > >> > _h(3.5):c1 _11(3.5):c1 _12(3.5):c1
> > >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: CMS: no
> more
> > >> > merges pending; now return
> > >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]:
> > >> > waitForMerges IW 50 [.Thu Mar 15 15:25:23 MET 2012; Lucene Merge
> > >> > Thread #3]: merge
> > store
> > >> > matchedCount=10 vs 10
> > >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]:
> > >> > merged segment size=0.076 MB vs estimate=0.103 MB IW 50 [.Thu Mar
> > >> > 15 15:25:23 MET 2012; Lucene Merge Thread #3]:
> > >> commitMerge:
> > >> > _2(3.5):C23 _3(3.5):c7 _d(3.5):c1 _h(3.5):c1 _b(3.5):c1
> > >> > _g(3.5):c1
> > >> > _c(3.5):c1 _11(3.5):c1 _12(3.5):c1 _6(3.5):c1 into _19
> > index=_0(3.5):C73
> > >> > _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _16(3.5):c10 _6(3.5):c1
> > _18(3.5):c10
> > >> > _17(3.5):c10 _b(3.5):c1 _c(3.5):c1 _d(3.5):c1 _g(3.5):c1
> > >> > _h(3.5):c1
> > >> > _11(3.5):c1 _12(3.5):c1
> > >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]:
> > >> > commitMergeDeletes _2(3.5):C23 _3(3.5):c7 _d(3.5):c1 _h(3.5):c1
> > >> _b(3.5):c1
> > >> > _g(3.5):c1 _c(3.5):c1 _11(3.5):c1 _12(3.5):c1 _6(3.5):c1 into _19
> > >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]:
> > >> > after
> > >> commit:
> > >> > _0(3.5):C73 _1(3.5):C70 _19(3.5):C38 _16(3.5):c10 _18(3.5):c10
> > >> _17(3.5):c10
> > >> > IFD [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: now
> > >> checkpoint
> > >> > "segments_15" [6 segments ; isCommit = false] IW 50 [.Thu Mar 15
> > >> > 15:25:23 MET 2012; Lucene Merge Thread #3]: merge
> > time
> > >> 61
> > >> > msec for 38 docs
> > >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]:
> > >> > waitForMerges
> > done
> > >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: CMS:
> > >> merge
> > >> > thread: done
> > >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: now call
> > >> > final
> > >> > commit()
> > >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit:
> > >> > start IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]:
> > >> > commit: enter
> > lock
> > >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit:
> > >> > already prepared IW 50 [.Thu Mar 15 15:25:23 MET 2012;
> > >> > pool-2-thread-1]: commit:
> > >> > pendingCommit != null
> > >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit:
> > >> > wrote segments file "segments_16"
> > >> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: now
> > >> > checkpoint "segments_16" [42 segments ; isCommit = true] IFD [.Thu
> > >> > Mar 15 15:25:23 MET 2012; pool-2-thread-1]: deleteCommits:
> > now
> > >> > decRef commit "segments_15"
> > >> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete
> > "segments_15"
> > >> > IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit:
> > >> > done IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: at
close:
> > >> > _0(3.5):C73 _1(3.5):C70 _19(3.5):C38 _16(3.5):c10 _18(3.5):c10
> > >> _17(3.5):c10
> > >> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete
"_19.tii"
> > >> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete
"_19.tis"
> > >> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete
"_18.cfs"
> > >> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete
"_19.nrm"
> > >> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete
"_19.fnm"
> > >> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete
"_19.fdt"
> > >> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete
"_17.cfs"
> > >> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete
"_19.prx"
> > >> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete
"_16.cfs"
> > >> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete
"_19.fdx"
> > >> > IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete
"_19.frq"
> > >> > Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: applyDeletes: no
> > deletes;
> > >> > skipping
> > >> > BD 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: prune
> > >> > sis=org.apache.lucene.index.SegmentInfos [at] 59d9f minGen=0
> > packetCount=0
> > >> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]:
> > clearFlushPending
> > >> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: CMS: now
> merge
> > >> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: CMS:
index:
> > >> > _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _1a(3.5):C16 _17(3.5):c10
> > >> _19(3.5):c10
> > >> > _18(3.5):c10
> > >> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: CMS: no
> more
> > >> > merges pending; now return
> > >> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]:
> > >> > waitForMerges IW 52 [.Thu Mar 15 15:25:27 MET 2012;
> > >> > pool-2-thread-1]: waitForMerges
> > done
> > >> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: now call
> > >> > final
> > >> > commit()
> > >> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit:
> > >> > start IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]:
> > >> > commit: enter
> > lock
> > >> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit:
> > >> > already prepared IW 52 [.Thu Mar 15 15:25:27 MET 2012;
> > >> > pool-2-thread-1]: commit:
> > >> > pendingCommit != null
> > >> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit:
> > >> > wrote segments file "segments_17"
> > >> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: now
> > >> > checkpoint "segments_17" [43 segments ; isCommit = true] IFD [.Thu
> > >> > Mar 15 15:25:27 MET 2012; pool-2-thread-1]: deleteCommits:
> > now
> > >> > decRef commit "segments_16"
> > >> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete
> > "segments_16"
> > >> > IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit:
> > >> > done IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: at
close:
> > >> > _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _1a(3.5):C16 _17(3.5):c10
> > >> _19(3.5):c10
> > >> > _18(3.5):c10
> > >> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete
"_1a.fdx"
> > >> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete
"_1a.tis"
> > >> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete
"_1a.frq"
> > >> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete
"_1a.tii"
> > >> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete
"_1a.fnm"
> > >> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete
"_1a.fdt"
> > >> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete
"_18.cfs"
> > >> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete
"_1a.nrm"
> > >> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete
"_19.cfs"
> > >> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete
"_17.cfs"
> > >> > IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete
"_1a.prx"
> > >> >
> > >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; Lucene Merge Thread #3]: CMS:
> > >> merge
> > >> > thread: done
> > >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; Lucene Merge Thread #0]:
> > >> > CMS: set priority of merge thread Lucene Merge Thread #3 to 6 IW
> > >> > 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: now flush at
> > close
> > >> > waitForMerges=true
> > >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: now
> > >> > trigger
> > flush
> > >> > reason=explicit flush
> > >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: start
flush:
> > >> > applyAllDeletes=true
> > >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: index
before
> > >> flush
> > >> > _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _18(3.5):c10
> > _1b(3.5):c10
> > >> > _1a(3.5):c10 _19(3.5):c10
> > >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: DW: flush:
> > >> > no
> > >> docs;
> > >> > skipping
> > >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: apply all
> > deletes
> > >> > during flush
> > >> > BD 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]:
applyDeletes:
> > no
> > >> > deletes; skipping
> > >> > BD 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: prune
> > >> > sis=org.apache.lucene.index.SegmentInfos [at] e91b1 minGen=0
> > packetCount=0
> > >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]:
> > clearFlushPending
> > >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: CMS: now
> merge
> > >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: CMS:
index:
> > >> > _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _18(3.5):c10
> > _1b(3.5):c10
> > >> > _1a(3.5):c10 _19(3.5):c10
> > >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: CMS: no
> more
> > >> > merges pending; now return
> > >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]:
> > >> > waitForMerges IW 53 [.Thu Mar 15 15:25:38 MET 2012;
> > >> > pool-2-thread-1]: waitForMerges
> > done
> > >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: now call
> > >> > final
> > >> > commit()
> > >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit:
> > >> > start IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]:
> > >> > commit: enter
> > lock
> > >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit:
> > >> > already prepared IW 53 [.Thu Mar 15 15:25:38 MET 2012;
> > >> > pool-2-thread-1]: commit:
> > >> > pendingCommit != null
> > >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit:
> > >> > wrote segments file "segments_18"
> > >> > IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: now
> > >> > checkpoint "segments_18" [44 segments ; isCommit = true] IFD [.Thu
> > >> > Mar 15 15:25:38 MET 2012; pool-2-thread-1]: deleteCommits:
> > now
> > >> > decRef commit "segments_17"
> > >> > IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: delete
> > "segments_17"
> > >> > IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit:
> > >> > done IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: at
close:
> > >> > _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _18(3.5):c10
> > _1b(3.5):c10
> > >> > _1a(3.5):c10 _19(3.5):c10
> > >> > IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: delete
"_1b.cfs"
> > >> > IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: delete
"_18.cfs"
> > >> > IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: delete
"_19.cfs"
> > >> > IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: delete
"_1a.cfs"
> > >> >
> > >> > Kind regards,
> > >> > Bogguard
> > >> >
> > >> > On Thu, Mar 15, 2012 at 1:21 PM, Michael McCandless <
> > >> > lucene [at] mikemccandless> wrote:
> > >> >
> > >> >> Hmm, that's odd...
> > >> >>
> > >> >> Can you set IndexWriter's infoStream and then capture the output
> > while
> > >> >> doing the small writes every few seconds and post back?
> > >> >>
> > >> >> If you run CheckIndex on the index does it also report ~3000
> > segments?
> > >> >>
> > >> >> Mike McCandless
> > >> >>
> > >> >> http://blog.mikemccandless.com
> > >> >>
> > >> >> On Thu, Mar 15, 2012 at 7:14 AM, Tim Bogaert
> > >> >> <bogguard [at] gmail>
> > >> wrote:
> > >> >> > Hi,
> > >> >> >
> > >> >> > We migrated our project to Lucene 3.5 and used the
> > TieredMergePolicy
> > >> with
> > >> >> > all its defaults.
> > >> >> > Our project contains a relatively small index (+-90MB) and
> > >> >> > gets
> > small
> > >> >> > writes every few seconds.
> > >> >> > The behaviour we see is that when we reindex everything works
> > >> >> > great
> > >> and
> > >> >> > performant,
> > >> >> > but when we keep the index running for some days we see a lot
> > >> (+-3000) of
> > >> >> > very small .cfs files (between 1KB and 10KB) are being
> > >> >> > generated
> > and
> > >> the
> > >> >> > performance drops significantly.
> > >> >> > Reïndexing solves the problem, but this shouldn't be the
solution.
> > >> >> >
> > >> >> > Is there a way to change the behaviour of the .cfs files. I
> > >> >> > noticed
> > >> you
> > >> >> can
> > >> >> > fully disable it, but I read that this isn't adviced.
> > >> >> >
> > >> >> > Kind regards,
> > >> >> > Bogguard
> > >> >>
> > >> >> ----------------------------------------------------------------
> > >> >> ----- To unsubscribe, e-mail:
> > >> >> java-user-unsubscribe [at] lucene
> > >> >> For additional commands, e-mail:
> > >> >> java-user-help [at] lucene
> > >> >>
> > >> >>
> > >>
> > >> -------------------------------------------------------------------
> > >> -- To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
> > >> For additional commands, e-mail: java-user-help [at] lucene
> > >>
> > >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
> > For additional commands, e-mail: java-user-help [at] lucene
> >
> >


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


lucene at mikemccandless

Mar 15, 2012, 9:28 AM

Post #9 of 11 (734 views)
Permalink
Re: lots of .cfs (compound files) in the index directory [In reply to]

On Thu, Mar 15, 2012 at 12:02 PM, Tim Bogaert <bogguard [at] gmail> wrote:

> while removing the prepareCommit we noticed we didn't actually called the
> IW.commit() method before the IW.close().
> Altough the documentation says the close() method commits all the changes
> we tried to add the commit() method before the IW.close() and this seems to
> fix our problem.

OK, thanks for bringing closure.

> Is this a bug or just a misunderstanding of the documentation and you
> always have to call the commit() method first?

You're actually always supposed to call commit() (or, rollback())
after prepareCommit.

But it's a nasty trap that close() silently drops changes.

I've opened https://issues.apache.org/jira/browse/LUCENE-3872 to fix
that trap...

Thanks for raising this!

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


uwe at thetaphi

Mar 15, 2012, 9:33 AM

Post #10 of 11 (731 views)
Permalink
RE: lots of .cfs (compound files) in the index directory [In reply to]

Close calls and always did call commit in 3.x?

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe [at] thetaphi


> -----Original Message-----
> From: Michael McCandless [mailto:lucene [at] mikemccandless]
> Sent: Thursday, March 15, 2012 5:29 PM
> To: java-user [at] lucene
> Subject: Re: lots of .cfs (compound files) in the index directory
>
> On Thu, Mar 15, 2012 at 12:02 PM, Tim Bogaert <bogguard [at] gmail>
> wrote:
>
> > while removing the prepareCommit we noticed we didn't actually called
> > the
> > IW.commit() method before the IW.close().
> > Altough the documentation says the close() method commits all the
> > changes we tried to add the commit() method before the IW.close() and
> > this seems to fix our problem.
>
> OK, thanks for bringing closure.
>
> > Is this a bug or just a misunderstanding of the documentation and you
> > always have to call the commit() method first?
>
> You're actually always supposed to call commit() (or, rollback()) after
> prepareCommit.
>
> But it's a nasty trap that close() silently drops changes.
>
> I've opened https://issues.apache.org/jira/browse/LUCENE-3872 to fix that
> trap...
>
> Thanks for raising this!
>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
> For additional commands, e-mail: java-user-help [at] lucene


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


lucene at mikemccandless

Mar 15, 2012, 10:06 AM

Post #11 of 11 (719 views)
Permalink
Re: lots of .cfs (compound files) in the index directory [In reply to]

On Thu, Mar 15, 2012 at 12:33 PM, Uwe Schindler <uwe [at] thetaphi> wrote:

> Close calls and always did call commit in 3.x?

Right, it does.

But in the case when prepareCommit was called... it then only commits
the changes as of that prepareCommit and *not* any changes done after
that and before close. That's the bug/trap.

Mike McCandless

http://blog.mikemccandless.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene

Lucene java-user RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.