
gregory.varnum at gmail
Apr 4, 2012, 2:52 PM
Post #2 of 2
Re: GSoC 2012: Proposal-Wikipedia Corpus Tools (Oren Bochman) (Amir E. Aharoni)(Gregory Varnum)
This looks much more in-depth and helpful. I think your best next step, if you haven't already, is to connect with potential mentors and name them in your proposal.

-Greg

___________
Sent from my iPad. Apologies for any typos. A more detailed response may be sent later.

On Apr 4, 2012, at 10:31 AM, karthik prasad <karthikprasad008 [at] gmail> wrote:

> Dear Sirs,
> I am grateful for your valuable feedback and suggestions.
>
> I have updated my proposal based on the inputs given by you. The split-up
> of the deliverables on the ideas page indeed helped me understand the
> requirements more clearly.
>
> The link to my updated proposal is
> https://www.mediawiki.org/wiki/User:Karthikprasad/gsoc2012proposal
>
> I request you and everyone to kindly skim through my proposal once again
> and suggest changes/additions.
> I am very excited about this project and working with you; and truth be
> told, 23rd April seems like ages ahead.
>
> Thanking you,
> Yours sincerely,
> Karthik
>
>
>> Date: Wed, 4 Apr 2012 11:49:41 +0200
>> From: "Oren Bochman" <orenbochman [at] gmail>
>> To: "'Wikimedia developers'" <wikitech-l [at] lists>
>> Subject: Re: [Wikitech-l] GSoC 2012: Proposal-Wikipedia Corpus Tools
>> Message-ID: <007f01cd1248$42ee6f40$c8cb4dc0$@com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> You do understand correctly!
>>
>> The main idea about NLP components is with POS tagger as an example:
>>
>> 1. a fallback system that does unsupervised POS tagging.
>> 2. the ability to plug in an existing POS tagger as these become
>> available for specific languages.
>>
>> As supervisor, I would recommend working with 3 languages:
>> English, Hebrew, and the GSOC native language.
>>
>> If we could get QA from other native speakers we would incorporate them
>> into the workflow.
>>
>> I think that by using a deletion/reversion based heuristic we may also be
>> able to make a spam corpus to boost the accuracy of the corpuses.
>>
>>
>> Operation Manager
>> E-mail: oren [at] romai-horizon
>> Mobile: +36 30 866 6706
>>
>>
>>
>> Római Horizon Kft.
>> H-1039 Budapest
>> Királyok útja 291. D. ép. fszt. 2.
>> Tel: +36 1 492 1492
>> Fax: +36 1 266 5529
>>
>> -----Original Message-----
>> From: wikitech-l-bounces [at] lists
>> [mailto:wikitech-l-bounces [at] lists] On Behalf Of Amir E. Aharoni
>> Sent: Tuesday, April 03, 2012 10:19 PM
>> To: Wikimedia developers
>> Subject: Re: [Wikitech-l] GSoC 2012: Proposal-Wikipedia Corpus Tools
>>
>> 2012/4/3 karthik prasad <karthikprasad008 [at] gmail>:
>>> Hello,
>>> I am a GSoC aspirant and have compiled a proposal for one of the
>>> project ideas - Wikipedia Corpus Tools. [Mentor : Oren Bochman] I
>>> would sincerely appreciate if you could kindly go through it and
>>> suggest corrections/additions so that I can settle on a coherent
>>> proposal.
>>>
>>> Link to my proposal :
>>> https://www.mediawiki.org/wiki/User:Karthikprasad/gsoc2012proposal
>>
>> Nice, but why only English?
>>
>> If I understand the proposal correctly, this project is supposed to be
>> able to work with almost any language with very little effort.
>>
>> --
>> Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
>> http://aharoni.wordpress.com
>> “We're living in pieces,
>> I want to live in peace.” – T. Moore
>>
>> _______________________________________________
>> Wikitech-l mailing list
>> Wikitech-l [at] lists
>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>
>>
>> ------------------------------
>>
>>
>> Date: Wed, 4 Apr 2012 12:58:11 +0300
>> From: "Amir E. Aharoni" <amir.aharoni [at] mail>
>> To: Wikimedia developers <wikitech-l [at] lists>
>> Subject: Re: [Wikitech-l] GSoC 2012: Proposal-Wikipedia Corpus Tools
>> Message-ID: <CACtNa8tS-PifzJS1JsF02k3qW_-7=UK-wDQnVSfLGLufhxnmNw [at] mail>
>> Content-Type: text/plain; charset=UTF-8
>>
>> 2012/4/4 Oren Bochman <orenbochman [at] gmail>:
>>> You do understand correctly!
>>>
>>> The main idea about NLP components is with POS tagger as an example:
>>
>> Just to make sure, POS = part of speech, isn't it?
>>
>> It's one of the most confusing TLAs in computing :)
>>
>>> If we could get QA from other native speakers we would incorporate them
>>> into the workflow.
>>
>> Good. As long as there is a way to plug in other languages and a way for
>> speakers of other languages to contribute QA, I'm very happy.
>>
>> --
>> Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
>> http://aharoni.wordpress.com
>> “We're living in pieces,
>> I want to live in peace.” – T. Moore
>>
>
>
> Date: Wed, 4 Apr 2012 00:28:29 -0400
> From: Gregory Varnum <gregory.varnum [at] gmail>
> To: Wikimedia developers <wikitech-l [at] lists>
> Subject: Re: [Wikitech-l] GSoC 2012: Proposal-Wikipedia Corpus Tools
> Message-ID: <AC4C429F-A839-4911-BE9B-C8928AA2DD8C [at] gmail>
> Content-Type: text/plain; charset=utf-8
>
> Whoops - I meant that email to be directed to Karthik - although Amir
> you're welcome to read it as well. :)
>
> -greg
>
>
> On Apr 3, 2012, at 11:24 PM, Gregory Varnum <gregory.varnum [at] gmail>
> wrote:
>
>> Amir,
>>
>> Thank you for your GSOC proposal! :)
>>
>> Between now and Google's submission deadline on April 6th - you are
>> invited to further modify your proposals. The GSOC page on MW.org -
>> https://www.mediawiki.org/wiki/GSOC - and our IRC rooms -
>> https://www.mediawiki.org/wiki/MediaWiki_on_IRC
>>
>> Looking over your proposal - I think you've got good background
>> information on yourself. However, I think you should flesh out more
>> details on the proposed project. Without more familiarity with corpora (and
>> with no links to find that info) - it's hard for everyone to weigh in
>> equally or to make sure your project gets the full consideration you'd like.
>>
>> -greg aka varnent
>>
>>
>> On Apr 3, 2012, at 4:18 PM, Amir E. Aharoni <amir.aharoni [at] mail>
>> wrote:
>>
>>> 2012/4/3 karthik prasad <karthikprasad008 [at] gmail>:
>>>> Hello,
>>>> I am a GSoC aspirant and have compiled a proposal for one of the project
>>>> ideas - Wikipedia Corpus Tools. [Mentor : Oren Bochman]
>>>> I would sincerely appreciate if you could kindly go through it and
>>>> suggest corrections/additions so that I can settle on a coherent proposal.
>>>>
>>>> Link to my proposal :
>>>> https://www.mediawiki.org/wiki/User:Karthikprasad/gsoc2012proposal
>>>
>>> Nice, but why only English?
>>>
>>> If I understand the proposal correctly, this project is supposed to be
>>> able to work with almost any language with very little effort.
>>>
>>> --
>>> Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
>>> http://aharoni.wordpress.com
>>> “We're living in pieces,
>>> I want to live in peace.” – T. Moore