tomdzk at gmail
Aug 8, 2005, 12:46 PM
Post #2 of 2
On 8/8/05, Erik Hatcher <erik [at] ehatchersolutions> wrote:
> Thomas - have you had a look at PyLucene and how they do the gcj/SWIG
> wizardry? What kinds of issues did you encounter with gcj? Perhaps
> Andi Vajda from PyLucene could offer some advice?
> I'd rather see the gcj/SWIG approach moving forward so that SWIG
> Lucene doesn't lag behind Java Lucene where all the innovation happens.
Yep, I tried to compile PyLucene on my Mac, but it failed because of
the Python version that comes with Mac OS 10.4 (which is 2.3). To be
fair to PyLucene, I only tried for a couple of hours as I don't really
have an interest in Python, I actually only wanted to see how they use
But aside from that, I tried the PyLucene way first for a whole week.
First the issue of getting to run gcj on Mac OS X which ain't easy at
all - I had to install darwinports with a fresh gcc. Getting gcj to
run over Lucene is easy, works out of the box. But linking ruby with
swig-wrapped gcj-compiled lucene is not, all I got is a gcj internal
compiler error (with both gcc/gcj 3.4.3 and 4.0.1). This bug is in the
gcc bug list marked as a regression.
On Windows I had a similar amount of trouble using both MingW and
cygwin; I wasn't able to compile & link the stuff against ruby.
So to summarize, while there is definitely a strong argument for using
gcj to create other-language bindings from the Java-version, there are
a few issues that IMO make a strong case for CLucene:
* at best gcj is difficult to use; but on Windows & MacOS it is quite
involved and difficult. For me it was nearly impossible as I'm no
* it prevents or at least makes it extremely difficult to create
certain bindings such as COM and C# (perhaps except mono) as MingW is
not easily combined with VisualC++ AFAIK. And I don't think that there
is any chance of debugging such a combination when a problem arises.
* the amount of work necessary to swig-wrap the gcj-compiled Lucene to
a given target language is immense - just have a look at the swig file
of PyLucene and the Makefile to make the magic happen; I think this
must be a nightmare to maintain. I cannot really tell what amount of
work would be necessary for CLucene but since it is a straight C++
library and built with swig in mind, I would be surprised if it is not
a lot less
So from a technical point of view, it is my opinion that a pure C++
version is easier to maintain and evolve right now. I also think that
most of the innovation in Lucene is not Java-specific so while it
would be duplicated implementation work, the algorithms are the same
(or near enough). Also, a pure C++ version of Lucene gives it more
momentum IMO in both the Linux world (mbox_lucene or something similar
comes to mind) and the Microsoft world (.Net etc.)
> As for Lucene4C versus CLucene and moving CLucene to Apache - I'll
> let the c-dev [at] lucen list discuss it. I'm happy to have CLucene at
> Apache too, though it seems simpler for us to only house a single
> implementation in C. The gcj version would be ideal in my mind, but
> I'm also not skilled in gcj (and haven't touched C in decades,
> practically) - so it certainly is up to the actual coders where to go
> with it.
I don't know whether it is a "Lucene4C vs. CLucene" anyway. From what
I understand Lucene4C tries to create a simpler API for Lucene, and
while they are building on top of a gcj-compiled version of Java
Lucene, that is likely not a requirement (I don't think that they want
to expose any of the gcj-generated classes).
Besides, CLucene is quite far so from a practical point of view it
would make sense to use /maintain it. Being the practical guy that I
am, I think that any issues between Lucene4C, PyLucene, CLucene can be
worked out if the developers work together. After all, for all I know
it might even be possible to use a mixture of the Lucene4C API (for
plain C) and the CLucene API (for C++) in front of a gcj-compiled Java
Lucene, and all SWIG wrappers could then be build on top of this API.
At lest technically this is possible and perhaps even feasible.