Mailing List Archive: Lucene: Java-User

Question on ElisionFilter with d'

 

 


yamo93 at gmail

Jul 25, 2012, 6:01 AM

Post #1 of 7 (226 views)
Question on ElisionFilter with d'

Hello,

I'm using ElisionFilter to index French text.
The filter works but ignores the letter d followed by an apostrophe
(example: d'une).

Is this expected behaviour, or is it a bug?

Regards,
Yann.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


ian.lea at gmail

Jul 25, 2012, 7:36 AM

Post #2 of 7 (222 views)
Re: Question on ElisionFilter with d' [In reply to]

I bet it's expected. From http://en.wikipedia.org/wiki/Elision_(French)

In written French, elision (both phonetic and orthographic) is
obligatory for the following words:
...

the preposition de
...
Le père d'Albert vient d'arriver.



So surely the removal of d' is correct.


--
Ian.




yamo93 at gmail

Jul 25, 2012, 7:56 AM

Post #3 of 7 (221 views)
Re: Question on ElisionFilter with d' [In reply to]

Thanks for replying,

The problem is that the filter doesn't remove d' (or c' either).
Shall I open an issue in JIRA?



ian.lea at gmail

Jul 25, 2012, 8:05 AM

Post #4 of 7 (220 views)
Re: Question on ElisionFilter with d' [In reply to]

Ah, OK. I thought you were saying it was removing d' when you thought
it shouldn't. Sounds like a bug to me but I don't know enough about
it to express a strong opinion.


--
Ian.




jack at basetechnology

Jul 25, 2012, 6:52 PM

Post #5 of 7 (214 views)
Re: Question on ElisionFilter with d' [In reply to]

The filter should work (remove the letter and apostrophe).

Could you supply an exact code fragment that shows the literal term, the
code invoking the filter, and the exact literal output?

And, which release of Lucene?

-- Jack Krupansky



yamo93 at gmail

Jul 26, 2012, 1:10 AM

Post #6 of 7 (213 views)
Re: Question on ElisionFilter with d' [In reply to]

Hi,

Sorry, I forgot the most important detail: I use Lucene 3.6.

Here is my code: tokenStream = new ElisionFilter(Version.LUCENE_36,
tokenStream);

I looked at the source code of ElisionFilter, and DEFAULT_ARTICLES
doesn't contain "d" or "c", which would be needed to handle terms like
"d'une" or "c'est".

A possible workaround would be to call this constructor:
ElisionFilter(Version matchVersion, TokenStream input, Set<?> articles).

But I don't understand why "d" and "c" are not present in the default
articles.

Yann.



rcmuir at gmail

Jul 26, 2012, 6:55 AM

Post #7 of 7 (208 views)
Re: Question on ElisionFilter with d' [In reply to]

On Thu, Jul 26, 2012 at 4:10 AM, yamo93 <yamo93 [at] gmail> wrote:
> A possible workaround would be to call this constructor
> ElisionFilter(Version matchVersion, TokenStream input, Set<?> articles).
>

That's the way: just supply the list you want.

> But i don't understand why this "d" and "c" are not present in default
> articles.
>

It's just that, historically, that was the default list in the file.
This list should really be removed from the filter and moved to
FrenchAnalyzer, as this filter is not only used for French:
https://issues.apache.org/jira/browse/LUCENE-3884
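
To make the behaviour discussed in this thread concrete, here is a
minimal plain-Java sketch of the idea. It strips a leading article and
apostrophe only when the article is in the supplied set. It does not
use Lucene and is not the ElisionFilter source. The class name, the
method names and the contents of ASSUMED_DEFAULTS are assumptions made
for illustration; the only fact taken from the thread is that the
default list lacks "d" and "c".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Plain-Java illustration of elision stripping; not Lucene's actual code. */
final class ElisionSketch {

    // ASSUMPTION: an illustrative stand-in for the filter's default list.
    // Per the thread, the real default contains neither "d" nor "c".
    static final Set<String> ASSUMED_DEFAULTS =
            new HashSet<String>(Arrays.asList("l", "m", "t", "qu", "n", "s", "j"));

    /** Returns a new set holding the assumed defaults plus the given extras. */
    static Set<String> withExtraArticles(String... extras) {
        Set<String> articles = new HashSet<String>(ASSUMED_DEFAULTS);
        articles.addAll(Arrays.asList(extras));
        return articles;
    }

    /** Drops "article + apostrophe" from the start of the token if the article is listed. */
    static String stripElision(String token, Set<String> articles) {
        int apostrophe = indexOfApostrophe(token);
        if (apostrophe <= 0) {
            return token; // no apostrophe, or nothing before it
        }
        String prefix = token.substring(0, apostrophe).toLowerCase(Locale.ROOT);
        return articles.contains(prefix) ? token.substring(apostrophe + 1) : token;
    }

    /** Finds the first ASCII or typographic apostrophe, or -1 if there is none. */
    private static int indexOfApostrophe(String token) {
        for (int i = 0; i < token.length(); i++) {
            char ch = token.charAt(i);
            if (ch == '\'' || ch == '\u2019') {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Set<String> extended = withExtraArticles("d", "c");
        System.out.println(stripElision("d'une", ASSUMED_DEFAULTS));   // d'une (unchanged)
        System.out.println(stripElision("d'une", extended));           // une
        System.out.println(stripElision("c'est", extended));           // est
        System.out.println(stripElision("l'avion", ASSUMED_DEFAULTS)); // avion
    }
}
```

With Lucene 3.6 itself, a set built along these lines would be passed
as the third argument of the ElisionFilter(Version, TokenStream,
Set<?>) constructor quoted above.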

--
lucidimagination.com
