
Mailing List Archive: Wikipedia: Wikitech

Re: [MediaWiki-CVS] SVN: [37443] trunk/phase3

 

 



Simetrical+wikilist at gmail

Jul 9, 2008, 3:06 PM


On Wed, Jul 9, 2008 at 5:11 PM, <vasilievvv [at] svn> wrote:
> Log Message:
> -----------
> * Forbid files with * and ? to be uploaded under Windows (it caused internal errors since such characters are illegal there)

It seems like it would be a better idea to be consistent across
platforms here. Otherwise you're just going to cause trouble for
portability; for instance, Windows users would be unable to easily use
an image dump from Wikimedia, or other Unix-based MediaWiki
installations.

However, we don't *really* have to use the same name in the filesystem
as we use as a title. This seems to me like it would be better
implemented by mangling the filename somehow. The invalid Windows/DOS
characters are supposedly:

? [ ] / \ = + < > : ; " ,

Of those, I think the following are currently legal in image names
(before your commit):

? \ = + : ; " ,

Each of these could be replaced in the filesystem by some character,
or combination of characters, that Windows will accept but that is
invalid in image names anyway. For instance, you could replace them with
{question} {backslash} {equals} {plus} {colon} {semicolon} {quote}
{comma}; these will work correctly because {} are illegal in page
titles but legal in Windows filenames. (But they could send filenames
over the file length limit, so more creative substitutes might be a
better idea.) This way the rules for image titles remain unchanged,
which is nice because a lot of those characters are quite handy to
have in titles.
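
A minimal sketch of the brace-token mangling described above, written in Python for brevity rather than MediaWiki's PHP. The token names come from the message; the function names `mangle` and `unmangle` are hypothetical, and the sketch assumes, as the message does, that { and } never occur in titles.

```python
# Hypothetical token table mirroring the {question} {backslash} ... examples
# in the message. This is an illustration, not MediaWiki behaviour.
TOKENS = {
    "?": "{question}",
    "\\": "{backslash}",
    "=": "{equals}",
    "+": "{plus}",
    ":": "{colon}",
    ";": "{semicolon}",
    '"': "{quote}",
    ",": "{comma}",
}


def mangle(title: str) -> str:
    """Map an image title to a filesystem name without the listed characters."""
    if "{" in title or "}" in title:
        # Assumption from the message: braces are illegal in page titles.
        raise ValueError("braces are assumed to be illegal in titles")
    return "".join(TOKENS.get(ch, ch) for ch in title)


def unmangle(filename: str) -> str:
    """Invert mangle(); unambiguous because titles never contain braces."""
    for char, token in TOKENS.items():
        filename = filename.replace(token, char)
    return filename
```

As the message notes, long tokens can push a name over the filename length limit, so a real scheme would probably want shorter substitutes.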

(Googled sources actually conflict as to the exact list of prohibited
characters. Some say * is prohibited, some don't mention it. Same
for |. ^ is apparently supposed to be illegal in FAT, according to
one source, and there are other restrictions, like no trailing space
or period, and a list of reserved names like "com1" and "nul".
Probably it varies across different versions, but it's a lot bigger
than just ? and *, anyway.)

> * Forbid files to be moved to invalid filenames

This might be more cleanly implemented by making invalid filenames
invalid titles in the Image namespace. That would make things
somewhat simpler by keeping things in more expected places. It also
makes sense to prohibit image pages from existing when it's not
possible for an image of that title to exist. (But projects will need
to be checked for pages that will become invalid under this scheme, of
course, perhaps using a maintenance script.)

> +/**
> + * Checks filename for validity
> + * @param mixed $title Filename or title to check
> + */
> +function wfIsValidFileName( $name ) {

Surely this shouldn't be a global function, but a static method of
something? Or even a non-static method of something.

> + elseif( wfIsWindows() && ( in_string( '*', $name ) || in_string( '?', $name ) ) )
> + return false;
> . . .
> + if( wfIsWindows() )
> + $filtered = preg_replace ( "/[*?]/", '-', $filtered );

Magic constants here. You have a list of blacklisted characters
scattered across multiple files, which is bad: the copies could become
inconsistent over time.
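
One way to avoid the duplication is to define the character list once and derive both the validity check and the filter from it. The sketch below is in Python rather than PHP, and every name in it is hypothetical; it only mirrors the two quoted code paths (reject on `*` or `?`, or replace them with `-`).

```python
import re

# Assumed single definition of the blacklist. The commit under review only
# handles * and ?, so that is all this sketch lists.
WINDOWS_ILLEGAL_CHARS = "*?"
_ILLEGAL_RE = re.compile("[" + re.escape(WINDOWS_ILLEGAL_CHARS) + "]")


def is_valid_file_name(name: str, on_windows: bool) -> bool:
    """Validity check driven by the shared constant."""
    return not (on_windows and _ILLEGAL_RE.search(name))


def strip_illegal_chars(name: str, on_windows: bool) -> str:
    """Filter driven by the same constant, replacing each bad character with '-'."""
    return _ILLEGAL_RE.sub("-", name) if on_windows else name
```

Extending the list then means changing one string, and the two code paths cannot drift apart.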

_______________________________________________
Wikitech-l mailing list
Wikitech-l [at] lists
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


brion at wikimedia

Jul 9, 2008, 9:39 PM


Simetrical wrote:
> However, we don't *really* have to use the same name in the filesystem
> as we use as a title. This seems to me like it would be better
> implemented by mangling the filename somehow. The invalid Windows/DOS
> characters are supposedly:
>
> ? [ ] / \ = + < > : ; " ,
>
> Of those, I think the following are currently legal in image names
> (before your commit):
>
> ? \ = + : ; " ,

'\' would be excluded by wfBaseName(), and ':' is explicitly stripped in
UploadForm::internalProcessUpload().

The others may have been allowed previously, though at least ?, ;, and "
seem unwise. :)

My recommendation is to ditch the use of raw filesystem filenames --
which already fail on Windows due to the weird charset encoding system
breaking any non-ASCII characters -- and allow media files to have any
name in the database, while they're stored with a nice clean content
hash on the filesystem (when a filesystem is used at all as backend).

This has been planned for a long time, but implementation has gotten
stalled while other things get done. (Though we already store deleted
files in this way, and it works pretty well.)
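
A rough sketch of the content-hash idea, in Python for illustration. The choice of SHA-1, the two-level directory layout, and the function name `storage_path` are all assumptions made for this example; the message does not specify any of them.

```python
import hashlib


def storage_path(content: bytes, extension: str) -> str:
    """Derive a filesystem path from the file's bytes, not from its title.

    Assumed layout: <first hex digit>/<first two hex digits>/<digest>.<ext>
    """
    digest = hashlib.sha1(content).hexdigest()
    return f"{digest[0]}/{digest[:2]}/{digest}.{extension}"
```

Because the path contains only hex digits, a dot, and the extension, the title stored in the database can use any characters without the filesystem ever seeing them.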

-- brion



Platonides at gmail

Jul 10, 2008, 7:34 AM


Simetrical wrote:
> However, we don't *really* have to use the same name in the filesystem
> as we use as a title. This seems to me like it would be better
> implemented by mangling the filename somehow. The invalid Windows/DOS
> characters are supposedly:
>
> ? [ ] / \ = + < > : ; " ,

[]=+;, are legal on Windows.

> Of those, I think the following are currently legal in image names
> (before your commit):
>
> ? \ = + : ; " ,
>


> (Googled sources actually conflict as to the exact list of prohibited
> characters. Some say * is prohibited, some don't mention it.
It is prohibited because it is a wildcard.

> Same for |.
It is prohibited because it is the pipe character.

> ^ is apparently supposed to be illegal in FAT, according to
> one source,
It is an escape character for the Windows shell, but legal in FAT. Perhaps
only legal in VFAT?

> and there are other restrictions, like no trailing space
> or period,
Hmm, right, although that is not really applicable to images, which will
have an extension appended.

> and a list of reserved names like "com1" and "nul".
Strangely, not only are com1 and nul prohibited, but also nul.png or
com1.jpg.
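
A small Python sketch of a check for this case. The thread only names "com1" and "nul"; the fuller list below (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9) is taken from Microsoft's file-naming documentation, not from this thread, and the function name is hypothetical.

```python
# Reserved DOS device names per Microsoft's naming documentation
# (an addition for illustration; the thread itself only mentions com1 and nul).
RESERVED_NAMES = (
    {"con", "prn", "aux", "nul"}
    | {f"com{i}" for i in range(1, 10)}
    | {f"lpt{i}" for i in range(1, 10)}
)


def is_reserved_windows_name(filename: str) -> bool:
    """True if the part before the first dot is a reserved device name."""
    stem = filename.split(".", 1)[0]
    return stem.rstrip(" ").lower() in RESERVED_NAMES
```

The comparison is case-insensitive and ignores everything after the first dot, which is why a name such as Com1.png is caught while column.png is not.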


> Probably it varies across different versions, but it's a lot bigger
> than just ? and *, anyway.)
>
>> * Forbid files to be moved to invalid filenames

Instead of checking the filename against a list of bad characters, why
not try to actually do it, and abort the rename if it can't be done?
That way no special case will be missing, and it won't be that frequent
anyway. You only need to avoid the slashes / \ (and : if wgUploadDir can
be "")
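
A minimal sketch of the "just try it" approach, in Python for illustration. The function name `try_rename` and its parameters are hypothetical; the only up-front check is for path separators, and everything else is left to the operating system.

```python
import os


def try_rename(directory: str, old_name: str, new_name: str) -> bool:
    """Attempt the rename and report failure instead of pre-validating."""
    # Separators must be rejected up front: they would not fail, they would
    # silently move the file into another directory.
    if any(sep in new_name for sep in ("/", "\\")):
        return False
    try:
        os.rename(
            os.path.join(directory, old_name),
            os.path.join(directory, new_name),
        )
    except OSError:
        # The filesystem refused the name (or the source is missing): abort.
        return False
    return True
```

One caveat with this sketch: on Unix, `os.rename` replaces an existing destination file without complaint, so a real implementation would need a separate existence check.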




bryan.tongminh at gmail

Jul 10, 2008, 7:45 AM


On Thu, Jul 10, 2008 at 4:34 PM, Platonides <Platonides [at] gmail> wrote:
> Simetrical wrote:
>> However, we don't *really* have to use the same name in the filesystem
>> as we use as a title. This seems to me like it would be better
>> implemented by mangling the filename somehow. The invalid Windows/DOS
>> characters are supposedly:
>>
>> ? [ ] / \ = + < > : ; " ,
>
> []=+;, are legal on windows.
>
>> Of those, I think the following are currently legal in image names
>> (before your commit):
>>
>> ? \ = + : ; " ,
>>
>
>
>> (Googled sources actually conflict as to the exact list of prohibited
>> characters. Some say * is prohibited, some don't mention it.
> It is for being a wildcard.
>
>> Same for |.
> It is for being the pipe character.
>
Regardless of whether it is allowed or not, it sounds like a bad idea to me
to allow ?*| in filenames.

>> and there are other restrictions, like no trailing space
>> or period,
> hmm, right. Although not really applicable for images which will have an
> extension appended.
>
But it still is something which wfStripIllegalFilenameChars should catch.

> > and a list of reserved names like "com1" and "nul".
> Strangely, not only are com1 and nul prohibited, but also nul.png or
> com1.jpg
>
Uh... ok... so basically MediaWiki installations having com1.png files
are not platform compatible?



Simetrical+wikilist at gmail

Jul 10, 2008, 10:39 AM


On Thu, Jul 10, 2008 at 10:45 AM, Bryan Tong Minh
<bryan.tongminh [at] gmail> wrote:
> Regardless of it being allowed or not, it sound like a bad idea to me
> to allow ?*| in filenames.

By that logic, shouldn't we ban ();!$ and so on for being shell
characters as well? If people want to move files around manually, and
want to use a command line instead of a GUI or a (non-shell) script,
they can be careful with their escaping. But as Brion says, the plan
is to eventually move to hash-based filenames anyway.

> Uh... ok... so basically MediaWiki installations having com1.png files
> are not platform compatible?

Apparently, yeah . . . Unix's "everything but / or null" seems a lot
more convenient here. Although it's kind of a pain when you get some
unprintable binary gibberish for a filename by mistake. :)



mrzmanwiki at gmail

Jul 10, 2008, 5:51 PM


Bryan Tong Minh wrote:
> Uh... ok... so basically MediaWiki installations having com1.png files
> are not platform compatible?
>

Testing on my own wiki (running on Windows), trying to upload a file
with a correct local name, but having MediaWiki change the name to
Com1.png on upload causes an internal error:
Could not rename file "public/e/ee/Com1.png" to
"public/archive/e/ee/20080711003730!Com1.png".

Uploading to a server not running Windows works fine; the only thing
you can't do is save the image from the site to a Windows PC without
changing the name before saving.
<http://test.wikipedia.org/wiki/Image:Com1.png>

--
Alex (w:en:User:Mr.Z-man)



bryan.tongminh at gmail

Jul 13, 2008, 10:38 AM


On Thu, Jul 10, 2008 at 12:06 AM, Simetrical
<Simetrical+wikilist [at] gmail> wrote:
> However, we don't *really* have to use the same name in the filesystem
> as we use as a title. This seems to me like it would be better
> implemented by mangling the filename somehow. The invalid Windows/DOS
> characters are supposedly:
>
> ? [ ] / \ = + < > : ; " ,
>
> Of those, I think the following are currently legal in image names
> (before your commit):
>
> ? \ = + : ; " ,
>

I was just reading the FAT specification and +,;=[] are valid
characters for FAT drivers that support LFN (Long File Names) which is
basically everything starting from Windows 95.

To be exact, under FAT a file name is allowed to contain any letters,
digits or characters with code point above 127. The following
characters are also allowed: $%'-_@~`!(){}^#& For Windows 95 and above,
the characters +,;=[] mentioned above are allowed as well.

I don't know about NTFS though.

Bryan



s.mazeland at xs4all

Jul 13, 2008, 11:04 AM


On NTFS, the POSIX namespace allows any UTF-16 code unit (case-sensitive)
except U+0000 (NUL) and / (slash). The Win32 namespace allows any UTF-16
code unit (case-insensitive) except U+0000 (NUL), / (slash), \ (backslash),
: (colon), * (asterisk), ? (question mark), " (quote), < (less than), and
both > (greater than) and | (pipe) [1,2].
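
The two character sets above translate directly into a lookup. This is a Python sketch for illustration only; the constant and function names are hypothetical, and it covers characters only, not reserved names or trailing dots and spaces.

```python
# Forbidden characters exactly as listed in the message above.
POSIX_FORBIDDEN = set("\x00/")
WIN32_FORBIDDEN = set('\x00/\\:*?"<>|')


def forbidden_chars(name: str, namespace: str = "win32") -> list:
    """Return the characters in name that the given namespace rejects."""
    forbidden = WIN32_FORBIDDEN if namespace == "win32" else POSIX_FORBIDDEN
    return sorted(set(name) & forbidden)
```

For example, `forbidden_chars("a?b*.png")` reports both `*` and `?`, while the same name passes in the "posix" namespace, and a name made of `[]=+;,` passes in both.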

Cheers! Siebrand

[1] http://en.wikipedia.org/wiki/NTFS
[2] http://data.linux-ntfs.org/ntfsdoc.html.gz

-----Original message-----
From: wikitech-l-bounces [at] lists
[mailto:wikitech-l-bounces [at] lists] On behalf of Bryan Tong Minh
Sent: Sunday, July 13, 2008 19:38
To: Wikimedia developers
Subject: Re: [Wikitech-l] [MediaWiki-CVS] SVN: [37443] trunk/phase3

I was just reading the FAT specification and +,;=[] are valid characters for
FAT drivers that support LFN (Long File Names) which is basically everything
starting from Windows 95.

To be exact, under FAT a file name is allowed to contain any letters, digits
or characters with code point above 127. Also the following characters are
allowed: $%'-_@~`!(){}^#& For Windows 95 and above the characters mentioned
above are allowed as well.

I don't know about NTFS though.

