
Mailing List Archive: Wikipedia: Foundation

Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

 

 



Platonides at gmail

Jun 22, 2009, 4:23 PM

Post #26 of 36 (911 views)
Re: Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship [In reply to]

Anthony wrote:
> On Sun, Jun 21, 2009 at 7:54 AM, John Vandenberg <jayvdb [at] gmail> wrote:
>
>> Whether Google is good or evil is off-topic, and irrelevant to boot.
>>
>
> Whether or not they have a right to exclude bots isn't.
>
>> Also worth noting, Project Gutenberg has digitised fewer than 30,000
>> books since 1971. Distributed Proofreaders has done 15,000 of those
>> since 2000, so throughput is picking up. But there are more than
>> enough to keep everyone busy for a very long time.
>
>
> The interesting thing is, even if you don't use a bot, it's still faster to
> copy/paste from Google manually than it is to get the book and scan it in
> yourself (assuming you don't want to destroy the original, anyway).
>
> If you're going to make a project out of OCRing books that Google has already
> OCRed, I don't see any point in reinventing the scanning or first pass
> OCRing part.

IMHO the interesting bit would be to make a Google Books browser
that prefills the wiki editor.


_______________________________________________
foundation-l mailing list
foundation-l [at] lists
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l


swatjester at gmail

Jun 22, 2009, 5:01 PM

Post #27 of 36 (925 views)
Re: Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship [In reply to]

The statute supports that as well, providing a private right of action
and civil remedy. It's not entirely that cut and dried (there are
certain restrictions that must be met) but yeah, it appears that in
some cases TOS violations can be illegal.

-Dan
On Jun 22, 2009, at 7:49 PM, Mark Wagner wrote:

> On Sat, Jun 20, 2009 at 14:35, Ray Saintonge<saintonge [at] telus> wrote:
>> Brian wrote:
>>> That is against the law. It violates Google's ToS.
>>>
>>> I'm mostly complaining that Google is being Very Evil. There is nothing we
>>> can do about it except complain to them. Which I don't know how to do - they
>>> apparently believe that the plain text versions of their books are akin to
>>> their intellectual property and are unwilling to give them away.
>>>
>>>
>> How is violating Google's ToS against the law?
>
> The verdict in _United States v. Lori Drew_ appears to set a precedent
> that violating a site's Terms of Service is a violation of the
> Computer Fraud and Abuse Act. It's not a very strong precedent, but
> it's still there.
>
> --
> Mark
> [[en:User:Carnildo]]
>


Platonides at gmail

Jun 22, 2009, 6:15 PM

Post #28 of 36 (917 views)
Re: Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship [In reply to]

Anthony wrote:
> (although I still haven't seen the WMF step up
> to the plate and make it easy for people to make a full history fork, or
> even to download all the images)

You'll find full history dumps of almost all wikis at
http://download.wikimedia.org/

Although not trivial, downloading all images is in fact quite easy. You
can find scripts to do that already made. You can also ask Brion to
rsync3 them.
But do you have enough space to dedicate?
How many wikis do you want to mirror? Just commons is more than 3 TB...

That's the reason so few people were interested in the images when the
image dump was available.




grinapo at gmail

Jun 23, 2009, 2:26 AM

Post #29 of 36 (914 views)
Re: Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship [In reply to]

On Tue, Jun 23, 2009 at 03:15, Platonides<Platonides [at] gmail> wrote:
> Although not trivial, downloading all images is in fact quite easy. You
> can find scripts to do that already made. You can also ask Brion to
> rsync3 them.
> But do you have enough space to dedicate?
> How many wikis do you want to mirror? Just commons is more than 3 TB...

Well, disks are cheap nowadays. If it's really just a question of
asking, I may be interested, for example.

The more complex question is the parameters of such usage, meaning
what I can do with the images after I've got them. This is the main
reason behind not publishing them in the first place: the images
themselves don't indicate any particular license.

Now that I've written this, it would be possible (not sure if feasible,
though) to publish CC-BY-SA pictures with author info in the comment
of the image itself. Most image formats support sizeable comment
blocks, and standardised templates make it possible to select media by
license, and get author/copyright info to put into the file.
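
A minimal sketch of that idea for PNG, assuming the standard `tEXt` chunk
serves as the comment block. The helper names (`add_text_chunk`,
`read_text_chunks`, `minimal_png`) and the sample author and license strings
are made up for illustration; this is not an existing Commons tool.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialise one PNG chunk: length, type, data, CRC over type+data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def add_text_chunk(png: bytes, keyword: str, text: str) -> bytes:
    """Return a copy of `png` with a tEXt chunk inserted right after IHDR."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    ihdr_len = struct.unpack(">I", png[8:12])[0]
    ihdr_end = 8 + 12 + ihdr_len  # signature + (length, type, CRC) + data
    # tEXt payload is: Latin-1 keyword, a NUL separator, Latin-1 text.
    data = keyword.encode("latin-1") + b"\x00" + text.encode("latin-1")
    return png[:ihdr_end] + _chunk(b"tEXt", data) + png[ihdr_end:]


def read_text_chunks(png: bytes) -> dict:
    """Collect every tEXt keyword/text pair found in the file."""
    found = {}
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        length = struct.unpack(">I", png[pos:pos + 4])[0]
        chunk_type = png[pos + 4:pos + 8]
        body = png[pos + 8:pos + 8 + length]
        if chunk_type == b"tEXt":
            key, _, value = body.partition(b"\x00")
            found[key.decode("latin-1")] = value.decode("latin-1")
        pos += 12 + length  # skip length, type, data and CRC
    return found


def minimal_png() -> bytes:
    """Build a 1x1 8-bit grayscale PNG, used here only as sample input."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte + one black pixel
    return (PNG_SIGNATURE + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", idat) + _chunk(b"IEND", b""))
```

Calling `add_text_chunk(png, "Author", ...)` and then
`add_text_chunk(..., "Copyright", ...)` would leave both fields readable via
`read_text_chunks`. "Author" and "Copyright" are among the keywords the PNG
format predefines; JPEG and other formats would need their own comment
mechanism.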

> That's the reason so few people were interested in the images when the
> image dump was available.

People are interested, generally, but not in mirroring the whole shebang. :-)

grin



meta.sj at gmail

Jun 23, 2009, 5:19 AM

Post #30 of 36 (912 views)
Re: Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship [In reply to]

Yes, but my understanding is that while Google provided part of the MBP data
and scans, its continued updates to OCR since then are not being shared. I
would be glad to learn this was not the case...

samuel klein. sj [at] laptop +1 617 529 4266

On Jun 21, 2009 3:14 AM, "Nikola Smolenski" <smolensk [at] eunet> wrote:

On Saturday 20 June 2009 18:29:24 Brian wrote:

> This has reminded me to complain about Google Books. Google has the
> world's best OCR (in virtue ...

Often, these books are available in the Million Books Project too.



wikimail at inbox

Jun 23, 2009, 5:59 AM

Post #31 of 36 (917 views)
Re: Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship [In reply to]

On Mon, Jun 22, 2009 at 9:15 PM, Platonides <Platonides [at] gmail> wrote:

> Anthony wrote:
> > (although I still haven't seen the WMF step up
> > to the plate and make it easy for people to make a full history fork, or
> > even to download all the images)
>
> You'll find full history dumps of almost all wikis at
> http://download.wikimedia.org/


Key word being "almost".

> Although not trivial, downloading all images is in fact quite easy.


Yep. All I need is permission.


> But do you have enough space to dedicate?


Not at the moment. No sense in buying the drives when I don't have
permission to fill them up.


> How many wikis do you want to mirror? Just commons is more than 3 TB...


Commons and En.wikipedia would probably be good for starters.

The main thing I want is permission to scrape en.wikipedia, though. (Not
really scraping, as I'd probably use the API and Special:Export. Basically
I just would like someone official to tell me how *fast* I'm allowed to use
the API and Special:Export. Special:Export especially, because I could
easily overwhelm the servers using that, due to a bug in the script.)
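
A minimal sketch of client-side throttling for that kind of polite fetching.
The two-second interval is purely a placeholder (no official limit is given in
this thread), and `Throttle` and `export_url` are hypothetical helper names;
only the `Special:Export/<title>` URL form comes from the site itself.

```python
import time
from urllib.parse import quote


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds; an assumed value, not a WMF rule
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        """Block until at least `min_interval` has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def export_url(title, base="https://en.wikipedia.org/wiki/Special:Export/"):
    """Build the Special:Export URL for a single page title."""
    return base + quote(title.replace(" ", "_"), safe="")
```

A fetch loop would call `throttle.wait()` before each request to
`export_url(title)`; the clock and sleep functions are injectable only so the
pacing logic can be checked without real delays.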

> That's the reason so few people were interested in the images when the
> image dump was available.


I downloaded it. It was well under 1 TB at the time.


removed at example

Jun 23, 2009, 10:09 AM

Post #32 of 36 (909 views)
Re: Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship [In reply to]

2009/6/23 Samuel Klein <meta.sj [at] gmail>

> Yes, but my understanding is that while Google provided part of the MBP data
> and scans, its continued updates to OCR since then are not being shared. I
> would be glad to learn this was not the case...
>

The dataset you need to train an OCR system to be as good as theirs is the
raw images and the plain text. They aren't making it easy to get either of
those things :( They have presumably improved the software in other ways as
well.

WTF GOOG?


wikimail at inbox

Jun 23, 2009, 12:52 PM

Post #33 of 36 (910 views)
Re: Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship [In reply to]

On Tue, Jun 23, 2009 at 1:09 PM, Brian <Brian.Mingus [at] colorado> wrote:

> 2009/6/23 Samuel Klein <meta.sj [at] gmail>
>
> > Yes, but my understanding is that while Google provided part of the MBP
> > data and scans, its continued updates to OCR since then are not being
> > shared. I would be glad to learn this was not the case...
> >
>
> The dataset you need to train an OCR system to be as good as theirs is the
> raw images and the plain text. They aren't making it easy to get either of
> those things :( They have presumably improved the software in other ways as
> well..
>
> WTF GOOG?


It's almost like they're trying to run a business or something.


wikimail at inbox

Jun 23, 2009, 12:58 PM

Post #34 of 36 (910 views)
Re: Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship [In reply to]

On Tue, Jun 23, 2009 at 2:24 PM, Brian <Brian.Mingus [at] colorado> wrote:

> Ok Shakespeare. But in plain English you appear to be saying that
> corporations are inherently greedy and have a tendency to be evil. Sure,
> but we expect more out of GOOG. This is not MSFT we are talking about.


Of course they're inherently greedy. That's the whole purpose of a
for-profit corporation - to make as much money as possible for its
shareholders. As for "tendency to be evil", I think that rests on your
definition of "evil".


wikimail at inbox

Jun 23, 2009, 1:10 PM

Post #35 of 36 (908 views)
Re: Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship [In reply to]

On Tue, Jun 23, 2009 at 3:58 PM, Anthony <wikimail [at] inbox> wrote:

> On Tue, Jun 23, 2009 at 2:24 PM, Brian <Brian.Mingus [at] colorado> wrote:
>
>> Ok Shakespeare. But in plain English you appear to be saying that
>> corporations are inherently greedy and have a tendency to be evil. Sure,
>> but we expect more out of GOOG. This is not MSFT we are talking about.
>
>
> Of course they're inherently greedy. That's the whole purpose of a
> for-profit corporation - to make as much money as possible for its
> shareholders.
>

I guess even a non-profit is inherently greedy; it's just greedy for
something other than money. The WMF is greedy for the spread of free
knowledge.

But this is off-topic. Let's take it to another list or something.


jayvdb at gmail

Jun 23, 2009, 3:33 PM

Post #36 of 36 (912 views)
Re: Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship [In reply to]

On Wed, Jun 24, 2009 at 6:10 AM, Anthony <wikimail [at] inbox> wrote:
>
> On Tue, Jun 23, 2009 at 3:58 PM, Anthony <wikimail [at] inbox> wrote:
>
> > On Tue, Jun 23, 2009 at 2:24 PM, Brian <Brian.Mingus [at] colorado> wrote:
> >
> >> Ok Shakespeare. But in plain English you appear to be saying that
> >> corporations are inherently greedy and have a tendency to be evil. Sure,
> >> but we expect more out of GOOG. This is not MSFT we are talking about.
> >
> >
> > Of course they're inherently greedy.  That's the whole purpose of a
> > for-profit corporation - to make as much money as possible for its
> > shareholders.
> >
>
> I guess even a non-profit is inherently greedy, it's just greedy for
> something other than money.  The WMF is greedy for the spread of free
> knowledge.
>
> But this is off-topic.  Let's take it to another list or something.

off-topic?? ... surely you jest!!

I think about _three_ of the 50+ emails in this thread have been on
the topic of open access journal articles on Wikisource.

--
John Vandenberg
