
Mailing List Archive: Wikipedia: Foundation

Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

 

 



dgerard at gmail

Jun 20, 2009, 8:41 AM

Post #1 of 36 (1963 views)
Permalink
Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

http://blogs.law.harvard.edu/infolaw/2009/06/19/using-wikisource-as-an-alternative-open-access-repository-for-legal-scholarship/

Interesting. How well does this fit with what Wikisource does?


- d.

_______________________________________________
foundation-l mailing list
foundation-l [at] lists
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l


meta.sj at gmail

Jun 20, 2009, 9:10 AM

Post #2 of 36 (1915 views)
Permalink
Re: Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship [In reply to]

There is a wealth of work done all the time by primary source
researchers and publishers, which could be improved on by having
wikisource entries, translations, &c.

Related question: how well would large numbers of public
domain texts, with page scans and the best available OCR [and
translations of same], fit with what Wikisource does now? This is
clearly a wiki project that needs to happen: OCR even at its best
misses rare meaning-bearing words. If not Wikisource, where should
this work take place?

SJ



removed at example

Jun 20, 2009, 9:29 AM

Post #3 of 36 (1924 views)
Permalink
Re: Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship [In reply to]

This has reminded me to complain about Google Books. Google has the world's
best OCR (by virtue of having the largest OCR'able dataset) and also has a
mission to scan in all the public domain books they can get their hands on.
They recently updated their interface to, as they put it, "make it easier to
find our plain text versions of public domain books. If a book is available
in full view, you can click the 'Plain text' button in the toolbar."
Unfortunately the only way I've found to download the full text of a public
domain book from Google is to flip through the book a page at a time,
copying the text to your clipboard.
There are roughly 2-3 million public domain books in Google Books.




Platonides at gmail

Jun 20, 2009, 11:02 AM

Post #4 of 36 (1923 views)
Permalink
Re: Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship [In reply to]

Brian wrote:
> Unfortunately the only way I've found to download the full text of a public
> domain book from Google is to flip through the book a page at a time,
> copying the text to your clipboard.
> There are roughly 2-3 million public domain books in Google Books.

That's easy to fix :)




removed at example

Jun 20, 2009, 11:04 AM

Post #5 of 36 (1915 views)
Permalink
Re: Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship [In reply to]

Not likely. I've been banned from Google's regular search at least a dozen
times during semi-frenetic search sprees in which I was identified as a bot.
There is no doubt that if you try to automate it you will be quickly shot
down.



wikimail at inbox

Jun 20, 2009, 11:26 AM

Post #6 of 36 (1919 views)
Permalink
Re: Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship [In reply to]

Easier than scanning, though :)



alex.public.account+WikimediaMailingList at gmail

Jun 20, 2009, 11:34 AM

Post #7 of 36 (1915 views)
Permalink
Re: Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship [In reply to]

So the bot just has to run at human speeds to avoid getting banned; it
still won't get tired or make unpredictable mistakes. And you can run it
from different IPs to parallelize.

--Falcorian



removed at example

Jun 20, 2009, 11:47 AM

Post #8 of 36 (1916 views)
Permalink
Re: Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship [In reply to]

That is against the law. It violates Google's ToS.

I'm mostly complaining that Google is being Very Evil. There is nothing we
can do about it except complain to them. Which I don't know how to do - they
apparently believe that the plain text versions of their books are akin to
their intellectual property and are unwilling to give them away.



geo.plrd at yahoo

Jun 20, 2009, 12:22 PM

Post #9 of 36 (1917 views)
Permalink
Re: Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship [In reply to]

For some reason, I am reminded of a Supreme Court case about the information in telephone directories. Maybe because of the insanity of trying to put public domain material under copyright.






geo.plrd at yahoo

Jun 20, 2009, 12:27 PM

Post #10 of 36 (1925 views)
Permalink
Re: Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship [In reply to]

For Supreme Court cases, would it be possible to have a bot pull the audio decisions from Oyez, and convert them into text?






parkerhiggins at gmail

Jun 20, 2009, 12:27 PM

Post #11 of 36 (1931 views)
Permalink
Re: Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship [In reply to]

Except Google isn't asserting any kind of copyright control over these
books; they're just not making it convenient to download them in your
preferred format. Maybe not The Right Thing, but not as boneheaded as suing
a party who reprints public domain material, as was the case in Feist v.
Rural (the Supreme Court case you mention).



Platonides at gmail

Jun 20, 2009, 12:29 PM

Post #12 of 36 (1921 views)
Permalink
Re: Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship [In reply to]

Brian wrote:
> That is against the law. It violates Google's ToS.
>
> I'm mostly complaining that Google is being Very Evil. There is nothing we
> can do about it except complain to them. Which I don't know how to do - they
> apparently believe that the plain text versions of their books are akin to
> their intellectual property and are unwilling to give them away.

Where does it forbid them?
The most related part is section 5.
I understand that doing queries at bot rate may be against #5.3 but I
don't see anything against this.
Unlike searches, the book OCR result will be cached, so this shouldn't
inconvenience them (and they don't place ads there!).

I'd wikify the html instead of just moving to plain text, though.




wikimail at inbox

Jun 20, 2009, 12:39 PM

Post #13 of 36 (1915 views)
Permalink
Re: Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship [In reply to]

Wow, what's Wikipedia's policy about using a bot to scrape everything?



removed at example

Jun 20, 2009, 12:58 PM

Post #14 of 36 (1913 views)
Permalink
Re: Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship [In reply to]

On Sat, Jun 20, 2009 at 1:29 PM, Platonides <Platonides [at] gmail> wrote:

> Where does it forbid them?


5.3 You agree not to access (or attempt to access) any of the Services by
any means other than through the interface that is provided by Google,
unless you have been specifically allowed to do so in a separate agreement
with Google. You specifically agree not to access (or attempt to access) any
of the Services through any automated means (including use of scripts or web
crawlers) and shall ensure that you comply with the instructions set out in
any robots.txt file present on the Services.
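
As a rough illustration of the robots.txt part of that clause, here is a
minimal Python sketch of how a script could check a robots.txt file before
requesting a URL. The robots.txt content, the bot name and the example.org
URLs are made up for illustration; they are not Google's actual rules. The
sketch only uses the standard library's urllib.robotparser, and it says
nothing about whether the rest of 5.3 permits automated access at all.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration only;
# it is NOT the file actually served by any Google service.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /books
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With the sample rules above, may_fetch(SAMPLE_ROBOTS_TXT, "example-bot",
"http://example.org/books/abc") is refused, while the same call for
"http://example.org/about" is allowed.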


stephen.bain at gmail

Jun 20, 2009, 6:55 PM

Post #15 of 36 (1902 views)
Permalink
Re: Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship [In reply to]

On Sun, Jun 21, 2009 at 5:27 AM, Parker Higgins<parkerhiggins [at] gmail> wrote:
> Except google isn't asserting any kind of copyright control over these
> books, they're just not making it convenient to download them in your
> preferred format.  Maybe not The Right Thing, but not as boneheaded as suing
> a party who reprints public domain material, as was the case in Feist v.
> Rural (the supreme court case you mention.)

They want people to use their service. Fair enough, given that the
scanning and OCRing happened on their dime.

--
Stephen Bain
stephen.bain [at] gmail



Platonides at gmail

Jun 20, 2009, 7:00 PM

Post #16 of 36 (1908 views)
Permalink
Re: Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship [In reply to]

Brian wrote:
>> Where does it forbid them?
>
>
> 5.3 You agree not to access (or attempt to access) any of the Services by
> any means other than through the interface that is provided by Google,
> unless you have been specifically allowed to do so in a separate agreement
> with Google. You specifically agree not to access (or attempt to access) any
> of the Services through any automated means (including use of scripts or web
> crawlers) and shall ensure that you comply with the instructions set out in
> any robots.txt file present on the Services.

Uh?
That's not the TOS I am reading:
> 5.3 You agree not to access (or attempt to access) any of the Services by any means other than through the interface that is provided by Google, unless you have been specifically allowed to do so in a separate agreement with Google.
>
> 5.4 You agree that you will not engage in any activity that interferes with or disrupts the Services (or the servers and networks which are connected to the Services).

The second part is missing.
It seems that the US has different terms than the rest of us.

-http://www.google.com/accounts/TOS




smolensk at eunet

Jun 21, 2009, 12:11 AM

Post #17 of 36 (1896 views)
Permalink
Re: Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship [In reply to]

On Saturday 20 June 2009 18:29:24 Brian wrote:
> This has reminded me to complain about Google Books. Google has the world's
> best OCR (in virtue of having the largest OCR'able dataset) and also has a
> mission to scan in all the public domain books they can get their hand on.
> They recently updated their interface to, as they put it, "make it easier
> to find our plain text versions of public domain books. If a book is
> available in full view, you can click the 'Plain text' button in the
> toolbar." Unfortunately the only way I've found to download the full text
> of a public domain book from Google is to flip through the book a page at a
> time, copying the text to your clipboard.

Often, these books are available in the Million Books Project too.



wikimail at inbox

Jun 21, 2009, 4:33 AM

Post #18 of 36 (1871 views)
Permalink
Re: Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship [In reply to]

On Sun, Jun 21, 2009 at 7:17 AM, Anthony <wikimail [at] inbox> wrote:

> (*) Personally, I'm of the opinion that merely accessing a website is not
> sufficient to bind a websurfer to a TOS, and that at most a TOS which you do
> not have to even click "agree" to is a unilateral contract which can only
> impose promises upon the offeror, though this is not a legal opinion but
> merely my opinion of what the law should be.
>

You know what, after further thought I'm going to withdraw that. First of
all, I think Google does require you to click agree before you can access
the service we're talking about. But more importantly, I'm going to cast
doubt on my previously held opinion of whether or not a TOS should be able
to bind someone who didn't click on anything. If I leave a bunch of apples
on the table at work and put next to it a sign that says "Apples: $.25
each"... I don't know, I'll have to think about it.


jayvdb at gmail

Jun 21, 2009, 4:54 AM

Post #19 of 36 (1869 views)
Permalink
Re: Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship [In reply to]

On Sun, Jun 21, 2009 at 9:17 PM, Anthony <wikimail [at] inbox> wrote:
>
> On Sun, Jun 21, 2009 at 1:51 AM, Ray Saintonge <saintonge [at] telus> wrote:
>
> > Stephen Bain wrote:
> > > On Sun, Jun 21, 2009 at 5:27 AM, Parker Higgins<parkerhiggins [at] gmail>
> > wrote:
> > >
> > >> Except google isn't asserting any kind of copyright control over these
> > >> books, they're just not making it convenient to download them in your
> > >> preferred format.  Maybe not The Right Thing, but not as boneheaded as
> > suing
> > >> a party who reprints public domain material, as was the case in Feist v.
> > >> Rural (the supreme court case you mention.)
> > >>
> > > They want people to use their service. Fair enough, given that the
> > > scanning and OCRing happened on their dime.
> > >
> > >
> > How does that give them any special rights?  There are no database
> > protection laws in the US, and sweat-of-the-brow has been rejected as a
> > basis for new copyrights.
>
>
> You're right, it doesn't give them any *special* rights.  They have the same
> rights as any other computer owner.  Specifically, they have the right to
> choose who uses their computers, and how they use them.  Whether or not a
> terms of service is legally binding is really not the issue. (*)  The issue
> is whether or not they have a duty to make it *convenient* for you to
> download the data.  Of course they don't.  Why should they be required to
> help you put them out of business?  That kind of twisted logic might make
> sense in the non-profit world (although I still haven't seen the WMF step up
> to the plate and make it easy for people to make a full history fork, or
> even to download all the images), but Google is not a non-profit
> organization.  Google would be Evil if it *didn't* protect itself against
> this, as it'd be breaking a promise to its shareholders.
>
> (*) Personally, I'm of the opinion that merely accessing a website is not
> sufficient to bind a websurfer to a TOS, and that at most a TOS which you do
> not have to even click "agree" to is a unilateral contract which can only
> impose promises upon the offeror, though this is not a legal opinion but
> merely my opinion of what the law should be.

Whether Google is good or evil is off-topic, and irrelevant to boot.

There are nearly _750,000_ books from Google that are available on
archive.org, available in DJVU format with OCR.

http://www.archive.org/details/googlebooks

Microsoft donated many texts directly to IA, but that approach only
netted 440,000 books.

http://www.archive.org/details/msn_books

See here for more of the collections:
http://www.archive.org/details/texts

Also worth noting, Project Gutenberg has digitised fewer than 30,000
books since 1971. Distributed Proofreaders has done 15,000 of those
since 2000, so throughput is picking up. But there are more than
enough to keep everyone busy for a very long time.

--
John Vandenberg



wikimail at inbox

Jun 21, 2009, 5:07 AM

Post #20 of 36 (1871 views)
Permalink
Re: Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship [In reply to]

On Sun, Jun 21, 2009 at 7:54 AM, John Vandenberg <jayvdb [at] gmail> wrote:

> Whether Google is good or evil is off-topic, and irrelevant to boot.
>

Whether or not they have a right to exclude bots isn't.

> Also worth noting, Project Gutenberg has digitised less than 30,000
> books since 1971. Distributed Proofreaders has done 15,000 of those
> since 2000, so throughput is picking up. But, there are more than
> enough too keep everyone busy for a very long time.


The interesting thing is, even if you don't use a bot, it's still faster to
copy/paste from Google manually than it is to get the book and scan it in
yourself (assuming you don't want to destroy the original, anyway).

If you're going to make a project out of OCRing books that Google has already
OCRed, I don't see any point in reinventing the scanning or first-pass
OCRing part.


jayvdb at gmail

Jun 21, 2009, 5:35 AM

Post #21 of 36 (1879 views)
Re: Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship [In reply to]

On Sun, Jun 21, 2009 at 10:07 PM, Anthony <wikimail [at] inbox> wrote:
>
> On Sun, Jun 21, 2009 at 7:54 AM, John Vandenberg <jayvdb [at] gmail> wrote:
>
> > Whether Google is good or evil is off-topic, and irrelevant to boot.
> >
>
> Whether or not they have a right to exclude bots isn't.

Actually, it is. This mailing list is about the Wikimedia Foundation
and its projects, and this thread is about Wikisource. Anyone who has
done significant amounts of Wikisource work will tell you that they
don't consider the Google Books click-through license to be a problem
that needs discussing at this level.

Do you think that 750,000 Google Books were manually converted to
DJVU, and copied over to Internet Archive?

Is there a book that you seek that isn't available at Internet Archive?

I wrote a GreaseMonkey user script to scrape the text from Google
Books; it is now broken and unmaintained because I no longer need to
take text from Google Books, as the vast majority of the texts I want
are now on Internet Archive, and that is a more productive workflow.

> Also worth noting, Project Gutenberg has digitised fewer than 30,000
> > books since 1971. Distributed Proofreaders has done 15,000 of those
> > since 2000, so throughput is picking up. But there are more than
> > enough to keep everyone busy for a very long time.
>
>
> The interesting thing is, even if you don't use a bot, it's still faster to
> copy/paste from Google manually than it is to get the book and scan it in
> yourself (assuming you don't want to destroy the original, anyway).

No, it is quicker to download the DJVU file from Internet Archive,
upload it to Wikisource, set up a transcription project, fix the
OCR text there, and then copy and paste it wherever you like.

It takes about 10 minutes unless there is some copyright concern.

> If you're going to make a project out of OCRing books that Google has already
> OCRed, I don't see any point in reinventing the scanning or first-pass
> OCRing part.

I suggest you take a look at a few of the DJVU files provided by
Internet Archive. Then you can point out real faults that you see.

--
John Vandenberg



wikimail at inbox

Jun 21, 2009, 7:23 AM

Post #22 of 36 (1872 views)
Re: Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship [In reply to]

On Sun, Jun 21, 2009 at 8:35 AM, John Vandenberg <jayvdb [at] gmail> wrote:

> I suggest you take a look at a few of the DJVU files provided by
> Internet Archive. Then you can point out real faults that you see.


I will. My apologies for misunderstanding your email.


wikimail at inbox

Jun 21, 2009, 7:55 AM

Post #23 of 36 (1867 views)
Re: Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship [In reply to]

On Sun, Jun 21, 2009 at 10:23 AM, Anthony <wikimail [at] inbox> wrote:

> On Sun, Jun 21, 2009 at 8:35 AM, John Vandenberg <jayvdb [at] gmail> wrote:
>
>> I suggest you take a look at a few of the DJVU files provided by
>> Internet Archive. Then you can point out real faults that you see.
>
>
> I will. My apologies for misunderstanding your email.
>

Okay, http://www.archive.org/details/catholicencyclo16herbgoog happened to
be the first book I randomly picked from Google Book Search. There's no
text version.

And the text versions I find of other editions seem to be much, much worse
than the Google OCR results.


wikimail at inbox

Jun 21, 2009, 8:20 AM

Post #24 of 36 (1864 views)
Re: Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship [In reply to]

On Sun, Jun 21, 2009 at 10:55 AM, Anthony <wikimail [at] inbox> wrote:

> On Sun, Jun 21, 2009 at 10:23 AM, Anthony <wikimail [at] inbox> wrote:
>
>> On Sun, Jun 21, 2009 at 8:35 AM, John Vandenberg <jayvdb [at] gmail> wrote:
>>
>>> I suggest you take a look at a few of the DJVU files provided by
>>> Internet Archive. Then you can point out real faults that you see.
>>
>>
>> I will. My apologies for misunderstanding your email.
>>
>
> Okay, http://www.archive.org/details/catholicencyclo16herbgoog happened to
> be the first book I randomly picked from Google Book Search. There's no
> text version.
>
> And the text versions I find of other editions seem to be much, much worse
> than the Google OCR results.
>

http://books.google.com/books?id=TZ0UAAAAYAAJ strike two, not even there.
http://books.google.com/books?id=PYAaAAAAYAAJ strike three
http://www.archive.org/details/happinessessays00hiltgoog finally...let's
compare the OCR:

"Great numbers of thoughtful people are just now much perplexed to know what
to make of the faffs of life, and are looking about them for some reasonable
interpretation of the modern world. They cannot abandon the work of the
world, but they are conscious that they have not learned the art of work."

"Greaf numbers of thoughtful people are just now much perplexed to know what
to make of thefaSls of life^ and are looking about them for some reasonable
interpretation of the modem world. They cannot abandon the work of the
worlds but they are conscious that they have not learned the art of work."
---
"Few people, however, really know how to work, and even in an age when
oftener perhaps than ever before we hear of "work" and "workers" one cannot
observe that the art of work makes much positive progress. On the contrary,
the general inclination seems to be to work as little as possible, or to
work for a short time in order to pass the remainder of one's life in rest."

"Few people, however, really know how to work, and even in an age when
oftener perhaps than ever before we hear of" work " and " workers " one
cannotobserve that the art of work makes much positive progress. On the
contrary, the general inclination seems to be to work as little as possible,
or to work for a short time in order to pass the remainder of one's life in
rest. "
---
I guess that's acceptable. The Catholic Encyclopedia results were much
worse, though. Maybe it was a font thing, but I'm not quite interested
enough to bother doing a more in-depth study right now.
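
For anyone who does want a slightly more systematic comparison, the
side-by-side reading above could be approximated with a short script. This
is only a rough sketch, assuming Python and its standard difflib module; the
function names are made up for illustration and are not part of any
Wikisource or Internet Archive tooling, and the sample strings are just the
first ten words of each version of the first passage quoted above.

```python
import difflib


def ocr_similarity(reference: str, candidate: str) -> float:
    """Return a 0..1 similarity ratio between two transcriptions.

    The texts are compared word by word, so a single wrong letter
    counts as one wrong word.
    """
    matcher = difflib.SequenceMatcher(None, reference.split(), candidate.split())
    return matcher.ratio()


def differing_words(reference: str, candidate: str) -> list:
    """List (reference_words, candidate_words) pairs where the texts disagree."""
    ref_words = reference.split()
    cand_words = candidate.split()
    matcher = difflib.SequenceMatcher(None, ref_words, cand_words)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((" ".join(ref_words[i1:i2]), " ".join(cand_words[j1:j2])))
    return diffs


# Illustrative samples: the opening words of the two OCR versions quoted above.
first_version = "Great numbers of thoughtful people are just now much perplexed"
second_version = "Greaf numbers of thoughtful people are just now much perplexed"

if __name__ == "__main__":
    print(ocr_similarity(first_version, second_version))
    for ref, cand in differing_words(first_version, second_version):
        print(repr(ref), "->", repr(cand))
```

On these two samples the only disagreement reported is "Great" versus
"Greaf". Comparing by word rather than by character is a deliberate
simplification: it is easy to read, but it treats a one-letter slip the same
as a completely garbled word.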


jayvdb at gmail

Jun 21, 2009, 5:18 PM

Post #25 of 36 (1846 views)
Re: Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship [In reply to]

On Sun, Jun 21, 2009 at 1:41 AM, David Gerard <dgerard [at] gmail> wrote:
>
> http://blogs.law.harvard.edu/infolaw/2009/06/19/using-wikisource-as-an-alternative-open-access-repository-for-legal-scholarship/
>
> Interesting. How well does this fit with what Wikisource does?

Here are seven articles from PLoS One.

http://en.wikisource.org/wiki/Category:Plosone

We have other published material that has been released under CC licenses:

http://en.wikisource.org/wiki/Unhappy_Thought

And books under various licenses:

http://en.wikisource.org/wiki/Bulgarian_Policies_on_the_Republic_of_Macedonia
http://en.wikisource.org/wiki/A_Short_History_of_Russian_%22Fantastica%22
http://en.wikisource.org/wiki/Free_as_in_Freedom

--
John Vandenberg

