
Mailing List Archive: Wikipedia: Wikitech

Re: GSoC 2012: Proposal-Wikipedia Corpus Tools (Oren Bochman) (Amir E. Aharoni)(Gregory Varnum)

 

 



karthikprasad008 at gmail

Apr 4, 2012, 7:31 AM

Post #1 of 2

Dear Sirs,
I am grateful for your valuable feedback and suggestions.

I have updated my proposal based on the inputs given by you. The split-up
of the deliverables on the ideas page indeed helped me understand the
requirements more clearly.

The link to my updated proposal is
https://www.mediawiki.org/wiki/User:Karthikprasad/gsoc2012proposal

I request that you and everyone else kindly look through my proposal once
again and suggest changes or additions.
I am very excited about this project and about working with you; truth be
told, 23rd April seems ages away.

Thanking you,
Yours sincerely,
Karthik


> Date: Wed, 4 Apr 2012 11:49:41 +0200
> From: "Oren Bochman" <orenbochman [at] gmail>
> To: "'Wikimedia developers'" <wikitech-l [at] lists>
> Subject: Re: [Wikitech-l] GSoC 2012: Proposal-Wikipedia Corpus Tools
> Message-ID: <007f01cd1248$42ee6f40$c8cb4dc0$@com>
> Content-Type: text/plain; charset="utf-8"
>
> You do understand correctly!
>
> The main idea for the NLP components, taking the POS tagger as an example:
>
> 1. a fallback system that does unsupervised POS tagging.
> 2. the ability to plug in an existing POS tagger as these become
> available for specific languages.
>
> As supervisor, I would recommend working with 3 languages:
> English, Hebrew, and the GSOC native language.
>
> If we could get QA from other native speakers we would incorporate them
> into the workflow.
>
> I think that by using a deletion/reversion based heuristic we may also be
> able to make a spam corpus to boost the accuracy of the corpuses.
>
>
> Operation Manager
> E-mail: oren [at] romai-horizon
> Mobil: +36 30 866 6706
>
>
>
> Római Horizon Kft.
> H-1039 Budapest
> Királyok útja 291. D. ép. fszt. 2.
> Tel: +36 1 492 1492
> Fax: +36 1 266 5529
>
> -----Original Message-----
> From: wikitech-l-bounces [at] lists [mailto:
> wikitech-l-bounces [at] lists] On Behalf Of Amir E. Aharoni
> Sent: Tuesday, April 03, 2012 10:19 PM
> To: Wikimedia developers
> Subject: Re: [Wikitech-l] GSoC 2012: Proposal-Wikipedia Corpus Tools
>
> 2012/4/3 karthik prasad <karthikprasad008 [at] gmail>:
> > Hello,
> > I am a GSoC aspirant and have compiled a proposal for one of the
> > project ideas - Wikipedia Corpus Tools. [Mentor : Oren Bochman] I
> > would sincerely appreciate it if you could kindly go through it and
> > suggest corrections/additions so that I can settle on a coherent
> > proposal.
> >
> > Link to my proposal :
> > https://www.mediawiki.org/wiki/User:Karthikprasad/gsoc2012proposal
>
> Nice, but why only English?
>
> If i understand the proposal correctly, this project is supposed to be
> able to work with almost any language with very little effort.
>
> --
> Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
> http://aharoni.wordpress.com
> “We're living in pieces, I want to live in peace.” – T. Moore
>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l [at] lists
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
>
>
>
> ------------------------------
>
>
> Date: Wed, 4 Apr 2012 12:58:11 +0300
> From: "Amir E. Aharoni" <amir.aharoni [at] mail>
> To: Wikimedia developers <wikitech-l [at] lists>
> Subject: Re: [Wikitech-l] GSoC 2012: Proposal-Wikipedia Corpus Tools
> Message-ID: <CACtNa8tS-PifzJS1JsF02k3qW_-7=UK-wDQnVSfLGLufhxnmNw [at] mail>
> Content-Type: text/plain; charset=UTF-8
>
> 2012/4/4 Oren Bochman <orenbochman [at] gmail>:
> > You do understand correctly!
> >
> > The main idea about NLP components is with POS tagger as an example:
>
> Just to make sure, POS = part of speech, isn't it?
>
> It's one of the most confusing TLAs in computing :)
>
> > If we could get QA from other native speakers we would incorporate them
> > into the workflow.
>
> Good. As long as there is a way to plug other languages and a way for
> speakers of other languages to contribute QA, i'm very happy.
>
> --
> Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
> http://aharoni.wordpress.com
> “We're living in pieces, I want to live in peace.” – T. Moore
>


Date: Wed, 4 Apr 2012 00:28:29 -0400
From: Gregory Varnum <gregory.varnum [at] gmail>
To: Wikimedia developers <wikitech-l [at] lists>
Subject: Re: [Wikitech-l] GSoC 2012: Proposal-Wikipedia Corpus Tools
Message-ID: <AC4C429F-A839-4911-BE9B-C8928AA2DD8C [at] gmail>
Content-Type: text/plain; charset=utf-8

Whoops - I meant that email to be directed to Karthik - although Amir
you're welcome to read it as well. :)

-greg


On Apr 3, 2012, at 11:24 PM, Gregory Varnum <gregory.varnum [at] gmail>
wrote:

> Amir,
>
> Thank you for your GSOC proposal! :)
>
> Between now and Google's submission deadline on April 6th, you are
> invited to further modify your proposal. The GSOC page on MW.org
> (https://www.mediawiki.org/wiki/GSOC) and our IRC rooms
> (https://www.mediawiki.org/wiki/MediaWiki_on_IRC) are available to help.
>
> Looking over your proposal, I think you've got good background
> information on yourself. However, I think you should flesh out more
> details on the proposed project. Without more familiarity with corpora (and
> with no links to find that info), it's hard for everyone to weigh in
> equally or to make sure your project gets the full consideration you'd like.
>
> -greg aka varnent


gregory.varnum at gmail

Apr 4, 2012, 2:52 PM

Post #2 of 2

This looks much more in-depth and helpful. I think your best next step is to, if you haven't already, connect with potential mentors and indicate who those folks are within your proposal.

-Greg
___________
Sent from my iPad. Apologies for any typos. A more detailed response may be sent later.

