
Mailing List Archive: Wikipedia: Foundation

Re: [Wikisource-l] Open Library, Wikisource, and cleaning and translating OCR of Classics

 

 



nemowiki at gmail

Aug 10, 2009, 11:16 PM

Post #1 of 9 (997 views)

Samuel Klein, 11/08/2009 07:00:
> Let's take a practical example. A classics professor I know (Greg
> Crane, copied here) has scans of primary source materials, some with
> approximate or hand-polished OCR, waiting to be uploaded and converted
> into a useful online resource for editors, translators, and
> classicists around the world.
>
> Where should he and his students post that material?

Slovene Wikisource did something similar:
http://meta.wikimedia.org/wiki/Slovene_student_projects_in_Wikipedia_and_Wikisource#Wikisource

Nemo

_______________________________________________
foundation-l mailing list
foundation-l [at] lists
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l


lars at aronsson

Aug 11, 2009, 6:16 PM

Post #2 of 9 (934 views)

Samuel Klein wrote:

> I think we agree on what needs to happen. The only thing I am
> not sure of is where you would like to see the work take place.

I'm not so sure we agree. I think we're talking about two
different things.

This thread started out with a discussion of why it is so hard to
start new projects within the Wikimedia Foundation. My stance is
that projects like OpenStreetMap.org and OpenLibrary.org are doing
fine as they are, and there is no need to duplicate their effort
within the WMF. The example you gave was this:

> >> >> *A wiki for book metadata, with an entry for every published
> >> >> work, statistics about its use and siblings, and discussion
> >> >> about its usefulness as a citation (a collaboration with
> >> >> OpenLibrary, merging WikiCite ideas)

To me, that sounds exactly like what OpenLibrary already does (or
could be doing in the near future), so why even set up a new project
that would collaborate with it? Later you added:

> >> I could see this happening on Wikisource.

That's when I asked why this couldn't be done inside OpenLibrary.

I added:

> > (Plus you would have to motivate why a copy of OpenLibrary should
> > go into the English Wikisource and not the German or French one.)

You replied:

> I don't understand what you mean -- English source materials and
> metadata go on en:ws, German on de:ws, &c. How is this different from
> what happens today?

I was talking about the metadata for all books ever published,
including the Swedish translations of Mark Twain's works, which
are part of Mark Twain's bibliography, of the translator's
bibliography, of American literature, and of Swedish language
literature. In OpenLibrary all of these are contained in one
project. In Wikisource, they are split in one section for English
and another section for Swedish. That division makes sense for
the contents of the book, but not for the book metadata.

Now you write:

> However, the project I have in mind for OCR cleaning and
> translation needs to

That is a change of subject. That sounds just like what Wikisource
(or PGDP.net) is about. OCR cleaning is one thing, but it is an
entirely different thing to set up "a wiki for book metadata, with
an entry for every published work". So which of these two project
ideas are we talking about?

Every book ever published means more than 10 million records.
(It probably means more than 100 million records.) OCR cleaning
attracts hundreds or a few thousand volunteers, which is
sufficient to take on thousands of books, but not millions.

Google scanned millions of books already, but I haven't heard of
any plans for cleaning all that OCR text.

> Let's take a practical example. A classics professor I know
> (Greg Crane, copied here) has scans of primary source materials,
> some with approximate or hand-polished OCR, waiting to be
> uploaded and converted into a useful online resource for
> editors, translators, and classicists around the world.
>
> Where should he and his students post that material?

On Wikisource. What's stopping them?



--
Lars Aronsson (lars [at] aronsson)
Aronsson Datateknik - http://aronsson.se



yann at forget-me

Aug 18, 2009, 12:45 PM

Post #3 of 9 (877 views)

Hello,

Lars Aronsson wrote:
> Yann Forget wrote:
>
>> This discussion is very interesting. I would like to make a summary, so
>> that we can go further.
>>
>> 1. A database of all books ever published is one of the things
>> still missing.
>
> No, no, no, this is *not* missing. This is exactly the scope of
> OpenLibrary. Just as Wikipedia is not yet a complete encyclopedia,
> or OpenStreetMap is not yet a complete map of the world, some
> books are still missing from OpenLibrary's database, but it is a
> project aiming to compile a database of every book ever published.

At least Wikipedia can say that it has the most complete encyclopedia,
and OpenStreetMap the most complete free maps that ever existed. AFAIK
OpenLibrary is very far from having anything comprehensive, though I am
curious to see the figures. As I already said, the first steps would be
to import existing databases, and Wikimedians are very good at this job.

>> Personally I don't find OL very practical. Maybe I am too
>> used to MediaWiki. ;oD
>
> And therefore, you would not try to improve OpenLibrary, but
> rather start an entirely new project based on MediaWiki? I'm
> afraid that this ("not invented here") is a common sentiment, and
> a major reason that we will get nowhere.

You are wrong here. I was delighted to see a project like OL and I
inserted a few books and authors, but I have not been convinced. On
books and authors, Wikimedia projects already have much more data than
OL, and a lot of basic functionalities are missing there: tagging two
entries as identical (redirect), multilingualism, links between related
entries (interwiki), etc.

I don't really care who would host this "Universal Library", as long as
it is freely available with a powerful search engine, and no restriction
on reuse. What I say is that MediaWiki is really much better than
anything else for any massive online cooperative work. The most
important point for such a project is building a community. OpenLibrary
has certainly done a good job, but I don't see _a community_. The tools
and the social environment available on Wikimedia projects are missing.
I believe the social environment is a consequence both of the software
and the leadership. Once the community exists it may be self-sustaining
if other conditions are met. OL lacks good software like MediaWiki and a
leader like Jimbo.

Yann
--
http://www.non-violence.org/ | Collaborative site on non-violence
http://www.forget-me.net/ | Alternatives on the Net
http://fr.wikisource.org/ | Free library
http://wikilivres.info | Free documents



lars at aronsson

Aug 20, 2009, 6:53 PM

Post #4 of 9 (870 views)

Yann Forget wrote:

> As I already said, the first steps would be to import existing
> databases, and Wikimedians are very good at this job.

Do you have a bibliographic database (library catalog) of French
literature that you can upload? How many records? Convincing
libraries to donate copies of their catalogs has been a bottleneck
for OpenLibrary.





yann at forget-me

Aug 21, 2009, 4:33 AM

Post #5 of 9 (858 views)

Lars Aronsson wrote:
> Yann Forget wrote:
>
>> As I already said, the first steps would be to import existing
>> databases, and Wikimedians are very good at this job.
>
> Do you have a bibliographic database (library catalog) of French
> literature that you can upload? How many records? Convincing
> libraries to donate copies of their catalogs has been a bottleneck
> for OpenLibrary.

No, I don't have such a database. There is a copyright on databases in
Europe, which makes things complicated.

Probably we need to start with libraries which are already collaborating
with open content projects. There was a GLAM-wiki meeting in Australia
recently: there might be a possibility with an Australian library?

But even before that, if we could extract the data from Wikimedia
projects, we could create a basic working frame. I have been collecting
such data on Wikisource and Wikibooks, but the lack of a structured
system is a bottleneck.

Examples:
1. Comprehensive bibliography of Gandhi in French
http://fr.wikibooks.org/wiki/Bibliographie_de_Gandhi

2. French translations of Russian authors:
http://fr.wikisource.org/wiki/Discussion_Auteur:L%C3%A9on_Tolsto%C3%AF
http://fr.wikisource.org/wiki/Discussion_Auteur:F%C3%A9dor_Mikha%C3%AFlovitch_Dosto%C3%AFevski

Regards,

Yann



joshuagay at gmail

Aug 21, 2009, 5:52 AM

Post #6 of 9 (869 views)

David Strauss did a quick implementation (basically a demo) of an
OpenLibrary extension for MediaWiki. With very little code, he was
able to search the OL (via AJAX), and when the user selected a given
result, it populated a Citation template. What was nice is that when no
results came up for a given search, there was an "add to open library"
button that brought you to the OL site to add your bibliographic
information.
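
To make the last step concrete (turning a selected search result into a
Citation template), here is a minimal Python sketch. It is not the code
from the demo, which is not shown in this thread. The function name
`citation_template` and the record keys (`author`, `title`, `publisher`,
`year`, `isbn`) are assumptions chosen for illustration; a real OL search
result would first have to be mapped onto them.

```python
def citation_template(record):
    """Render a {{Citation}} template call from a book record.

    `record` is a plain dict. Its keys are hypothetical and only mirror
    common Citation parameters; they are not an OpenLibrary schema.
    """
    order = ("author", "title", "publisher", "year", "isbn")
    parts = []
    for key in order:
        value = record.get(key)
        if value:
            # A literal "|" would end the parameter early in wikitext,
            # so replace it with the {{!}} magic word.
            safe = str(value).replace("|", "{{!}}")
            parts.append(f" | {key} = {safe}")
    return "{{Citation" + "".join(parts) + " }}"


example = {
    "author": "Mark Twain",
    "title": "Adventures of Huckleberry Finn",
    "year": 1884,
}
print(citation_template(example))
```

Missing fields are simply skipped, so a sparse record still yields a
usable template that an editor can complete by hand.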

I think it would be easy to build upon this work and one could do a really
powerful MW extension (and maybe some new templates, etc) that would allow
people to contribute to both MW and OL simultaneously.

I think that the OL should continue to do what it is trying to do. I also think
people should be able to quickly and easily create new and important
Wikimedia projects, especially when people are passionate about doing so. And I
think that when different projects on the Internet have a lot of overlap in what
they are trying to do, and share a similar philosophy and ethics, they
should have their machines play nice with each other and make sharing
(reading and writing) data between them easy.

-Josh


On Fri, Aug 21, 2009 at 7:33 AM, Yann Forget <yann [at] forget-me> wrote:

> Lars Aronsson wrote:
> > Yann Forget wrote:
> >
> >> As I already said, the first steps would be to import existing
> >> databases, and Wikimedians are very good at this job.
> >
> > Do you have a bibliographic database (library catalog) of French
> > literature that you can upload? How many records? Convincing
> > libraries to donate copies of their catalogs has been a bottleneck
> > for OpenLibrary.
>
> No, I don't have such a database. There is a copyright on databases in
> Europe, which makes things complicated.
>
> Probably we need to start with libraries which are already collaborating
> with open content projects. There was a GLAM-wiki meeting in Australia
> recently: there might be a possibility with an Australian library?
>
> But even before that, if we could extract the data from Wikimedia
> projects, we could create a basic working frame. I have been collecting
> such data on Wikisource and Wikibooks, but the lack of a structured
> system is a bottleneck.
>
> Examples:
> 1. Comprehensive bibliography of Gandhi in French
> http://fr.wikibooks.org/wiki/Bibliographie_de_Gandhi
>
> 2. French translations of Russian authors:
> http://fr.wikisource.org/wiki/Discussion_Auteur:L%C3%A9on_Tolsto%C3%AF
>
> http://fr.wikisource.org/wiki/Discussion_Auteur:F%C3%A9dor_Mikha%C3%AFlovitch_Dosto%C3%AFevski
>
> Regards,
>
> Yann


yann at forget-me

Aug 21, 2009, 7:09 AM

Post #7 of 9 (857 views)

Joshua Gay wrote:
> David Strauss did a quick implementation (basically a demo) of an
> OpenLibrary extension for MediaWiki. With very little code, he was
> able to search the OL (via AJAX), and when the user selected a given
> result, it populated a Citation template. What was nice is that when no
> results came up for a given search, there was an "add to open library"
> button that brought you to the OL site to add your bibliographic
> information.

Interesting, I didn't know that. Is this demo available somewhere?

Yann




phoebe.wiki at gmail

Aug 21, 2009, 8:20 AM

Post #8 of 9 (868 views)

On Fri, Aug 21, 2009 at 5:52 AM, Joshua Gay<joshuagay [at] gmail> wrote:
>
> I think it would be easy to build upon this work and one could do a really
> powerful MW extension (and maybe some new templates, etc) that would allow
> people to contribute to both MW and OL simultaneously.
>
> I think that the OL should continue to do what it is trying to do. I also think
> people should be able to quickly and easily create new and important
> Wikimedia projects, especially when people are passionate about doing so. And, I
> think when different projects on the Internet have a lot of overlap in what
> they are trying to do, and share similar philosophy and ethics,  that they
> should have their machines play nice with each other and make sharing
> (reading and writing) data between them easy.
>
> -Josh

I was going to say basically this, and then Josh said it better :)
There's no special reason to reinvent the wheel; as DGG mentioned
there are several very difficult aspects of building a big
bibliographic database (cataloging standards, getting the data in the
first place, theoretical relationships between works) that the OL
folks have tackled with some success; and there is value in having a
project that focuses just on this hard problem. SJ is right that
Wikimedian expertise lies in making large wikis functional and
multilingual, and augmenting data; but that doesn't mean such a
project has to be a *Wikimedia* project. I think cooperation between
the projects would be better. Interlinking into Wikip/media would
raise OL's profile substantially, and would mean that WP had access to
some sort of canonical catalog data; a win for everyone.
-- Phoebe



joshuagay at gmail

Aug 21, 2009, 2:10 PM

Post #9 of 9 (855 views)

>
> Interesting, I didn't know that. Is this demo available somewhere?


Here is a demo of it up and running:
http://ol.fkbuild.com/w/index.php/Main_Page

Click edit, then click the OL button on the toolbar and enter a
search term.

Also, I think someone I shared this with had trouble getting it to work with
IE -- I've only ever tried it on Firefox.

-Josh