
Mailing List Archive: ModPerl: ModPerl

Any success with storing photos in a database?

 

 



mark at summersault

Sep 29, 2008, 8:21 PM

Post #1 of 34
Any success with storing photos in a database?

This question isn't so much a mod_perl question, as it is a question
about building high performance websites with Perl.

We have a large, busy, database application that relates to millions of
photos, which we also need to store and display. We've been keeping the
meta data about the photos in PostgreSQL, and the files on the file
system. (Implemented much like CGI::Uploader ).

This has worked great in terms of performance, but with so much data to
manage, over time we have run into data inconsistency issues between the
file system and the database.

So, I'm asking if anyone has had experience successfully storing photos
(or other files) directly in a database? That would solve the consistency
issue, but may create a performance issue. Perhaps the right kind of
caching layer could solve that.

Mark

--
. . . . . . . . . . . . . . . . . . . . . . . . . . .
Mark Stosberg Principal Developer
mark [at] summersault Summersault, LLC
765-939-9301 ext 202 database driven websites
. . . . . http://www.summersault.com/ . . . . . . . .


aw at ice-sa

Sep 29, 2008, 1:16 PM

Post #2 of 34
Re: Any success with storing photos in a database?

Mark Stosberg wrote:
> This question isn't so much a mod_perl question, as it is a question
> about building high performance websites with Perl.
>
> We have a large, busy, database application that relates to millions of
> photos, which we also need to store and display. We've been keeping the
> meta data about the photos in PostgreSQL, and the files on the file
> system. (Implemented much like CGI::Uploader ).
>
> This has worked great in terms of performance, but with so much data to
> manage, over time we have run into data inconsistency issues between the
> file system and the database.
>
> So, I'm asking if anyone has had experience successfully storing photos
> (or other files) directly in a database? That would solve the consistency
> issue, but may create a performance issue. Perhaps the right kind of
> caching layer could solve that.
>
I am curious about your application, because we have something similar
and similar volumes, not with photos but with documents in general.

The following is an opinion piece.

We have also looked at various data organisations over time.
Regarding storing large objects directly in a database, the one issue is
always that (because of the object sizes), it makes any operation on the
rows of such a database or table very heavy. Imagine having to dump or
reload a table that contains 500,000 "blobs" of 2-3 MB each.
(Don't know about PostgreSQL, but many db systems require a dump and a
reload when you change a table structure).
Or simply take a backup of that table. And you cannot make an "rsync"
of a database as easily as of a filesystem.
It also means that any intermediate buffers (which are often used to
improve retrieval of "nearby" rows) fill up quickly, with only a few rows
in them. (On the other hand, if you just keep thumbnails of a couple of KB,
I guess it would not matter much.)

Another issue is that it happens that databases get screwed up, and the
likelihood probably increases as you push them to their limits (for
example with very large row sizes). Resolving some inconsistencies
between database rows and files on disk may be no fun, but resolving
inconsistencies within a database may be even less so.

One point I am curious about is what kind of file structure you use to
store the millions of images on the filesystem. I can't imagine that
you really put them all in one flat directory?
And are you storing the real paths directly in the database?

To get back to your issue of inconsistency : maybe the best strategy is
just to check for such inconsistencies as early as possible ? For
example, when you add or delete objects, write this information
somewhere in a daily "transactions" file, which is then analysed at
night by some job which checks that everything is really where it is
supposed to be, and lets you know when not.
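
A minimal sketch of such a nightly check, written in Python purely for illustration. It assumes the job already has the list of photo IDs the database believes exist, and that files are named `<id>.jpg` somewhere under a root directory. The function name `find_inconsistencies` and the demo data are hypothetical, not something described in this thread.

```python
import os
import tempfile


def find_inconsistencies(db_ids, root):
    """Compare photo IDs known to the database with files found under root.

    Returns (missing_files, orphan_files): IDs that have a row but no file,
    and file stems found on disk that have no row.
    """
    on_disk = set()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            stem, ext = os.path.splitext(name)
            if ext == ".jpg":
                on_disk.add(stem)
    expected = {str(i) for i in db_ids}
    missing_files = sorted(expected - on_disk)
    orphan_files = sorted(on_disk - expected)
    return missing_files, orphan_files


def demo():
    """Build a throwaway tree with one missing file and one orphan file."""
    with tempfile.TemporaryDirectory() as root:
        for photo_id in ("1", "2", "99"):
            with open(os.path.join(root, photo_id + ".jpg"), "wb") as fh:
                fh.write(b"fake image bytes")
        # The "database" knows about 1, 2 and 3; the disk has 1, 2 and 99.
        return find_inconsistencies([1, 2, 3], root)
```

Walking the whole tree is the slow, exhaustive variant. A job driven by the daily "transactions" file would only need to check the IDs touched that day.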

Regarding efficiency : when you think about it, a modern filesystem is
about the fastest, most efficient in space and most reliable database
system that one can think of, with the lowest overhead, as long as you
know the exact path of an object, and as long as all the directories in
the path are kept to a reasonable size (important). It has its inherent
buffering at various levels, optimised to access files. It has a whole
bunch of utilities to manipulate it; it is shareable, but can be locked
when you need it. It is portable.
It does have one drawback: it has a single "key" to access an
object (the path). But for that, you have your database system.

Oh, and I've thought of another advantage, in an Apache/web context : to
send the content of a file to a browser, you can take advantage of the
sendfile() call, which is very efficient. Now if your file is a blob in
a row of a database, you have to read it yourself in memory, and send
it, don't you ?
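
To illustrate the sendfile() point, here is a small Python sketch. The standard library's `socket.sendfile()` uses `os.sendfile()` where the platform supports it and falls back to ordinary `send()` calls otherwise. The helper name `send_file_over_socket` and the local socket pair used in the demo are assumptions made for the example, not part of any setup described here.

```python
import os
import socket
import tempfile


def send_file_over_socket(path, sock):
    """Send a whole file through a connected stream socket.

    Returns the number of bytes sent. The file must be opened in binary mode.
    """
    with open(path, "rb") as fh:
        return sock.sendfile(fh)


def demo():
    """Send a small temporary file through a local socket pair."""
    payload = b"not really a JPEG" * 100
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    sender, receiver = socket.socketpair()
    try:
        sent = send_file_over_socket(path, sender)
        sender.close()  # signals end-of-stream to the receiver
        chunks = []
        while True:
            chunk = receiver.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
        return sent, b"".join(chunks) == payload
    finally:
        sender.close()
        receiver.close()
        os.unlink(path)
```

With a blob in a database row there is no file descriptor to hand over, so the application has to read the bytes into its own memory first and then write them out.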

I've just re-convinced myself not to change our storage structure.


davidnicol at gmail

Sep 29, 2008, 1:36 PM

Post #3 of 34
Re: Any success with storing photos in a database?

On Mon, Sep 29, 2008 at 3:16 PM, André Warnier <aw [at] ice-sa> wrote:

> Oh, and I've thought of another advantage, in an Apache/web context : to
> send the content of a file to a browser, you can take advantage of the
> sendfile() call, which is very efficient. Now if your file is a blob in a
> row of a database, you have to read it yourself in memory, and send it,
> don't you ?

SQLite has an API for opening handles to blobs, but you would have to
write something in C to work with it.


geekout at gmail

Sep 29, 2008, 1:42 PM

Post #4 of 34
Re: Any success with storing photos in a database?

I'm pretty sure that the consensus is to never actually store the files in
the database. What actual inconsistencies are you seeing that you are
trying to fix?

On Mon, Sep 29, 2008 at 1:00 PM, Mark Stosberg <mark [at] summersault> wrote:

>
> This question isn't so much a mod_perl question, as it is a question
> about building high performance websites with Perl.
>
> We have a large, busy, database application that relates to millions of
> photos, which we also need to store and display. We've been keeping the
> meta data about the photos in PostgreSQL, and the files on the file
> system. (Implemented much like CGI::Uploader ).
>
> This has worked great in terms of performance, but with so much data to
> manage, over time we have run into data inconsistency issues between the
> file system and the database.
>
> So, I'm asking if anyone has had experience successfully storing photos
> (or other files) directly in a database? That would solve the consistency
> issue, but may create a performance issue. Perhaps the right kind of
> caching layer could solve that.
>
> Mark


--
~Tyler


frank at wiles

Sep 29, 2008, 1:49 PM

Post #5 of 34
Re: Any success with storing photos in a database?

On Mon, 29 Sep 2008 15:00:41 -0400
Mark Stosberg <mark [at] summersault> wrote:

> This question isn't so much a mod_perl question, as it is a question
> about building high performance websites with Perl.
>
> We have a large, busy, database application that relates to millions
> of photos, which we also need to store and display. We've been
> keeping the meta data about the photos in PostgreSQL, and the files
> on the file system. (Implemented much like CGI::Uploader ).
>
> This has worked great in terms of performance, but with so much data
> to manage, over time we have run into data inconsistency issues
> between the file system and the database.
>
> So, I'm asking if anyone has had experience successfully storing
> photos (or other files) directly in a database? That would solve the
> consistency issue, but may create a performance issue. Perhaps the
> right kind of caching layer could solve that.

Actually you're already doing it correctly. Andre already mentioned
many of the pitfalls of trying to store large binary data in a database,
so I won't rehash them again.

The only issue you seem to be having is the inconsistency. That issue
is going to be much easier to solve than trying to scale by putting the
photos in the database.

Usually people just make sure inserts/updates to the photo table are
done in a transaction, and depending on whether that transaction succeeds
or fails, the code does the appropriate write or delete on the file system.
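
A minimal sketch of that pattern, in Python with the standard `sqlite3` module standing in for PostgreSQL. The `photos` table, the `store_photo` helper, and the `fail` switch that simulates a database error are all hypothetical. The idea is to write the file first, do the insert inside a transaction, and remove the file again if the transaction does not commit.

```python
import os
import sqlite3
import tempfile


def store_photo(conn, root, photo_id, data, fail=False):
    """Write the file, then insert the row; undo the file if the DB work fails."""
    path = os.path.join(root, "%d.jpg" % photo_id)
    with open(path, "wb") as fh:
        fh.write(data)
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO photos (id, size) VALUES (?, ?)",
                (photo_id, len(data)),
            )
            if fail:
                raise RuntimeError("simulated failure inside the transaction")
    except Exception:
        os.remove(path)  # compensating action on the file system
        raise
    return path


def demo():
    """One successful store and one failed store; both sides stay in step."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY, size INTEGER)")
    with tempfile.TemporaryDirectory() as root:
        store_photo(conn, root, 1, b"abc")
        try:
            store_photo(conn, root, 2, b"defg", fail=True)
        except RuntimeError:
            pass
        rows = conn.execute("SELECT id FROM photos ORDER BY id").fetchall()
        files = sorted(os.listdir(root))
    conn.close()
    return rows, files
```

This does not make the two stores atomic. A crash between the commit and the cleanup can still leave a stray file, which is why a periodic consistency check remains useful.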

But since you're using PostgreSQL ( my favorite database and a large
part of my consulting practice ) you could even go so far as to write a
few pl/perl stored procedures to handle keeping the file system in sync
with the database.

-------------------------------------------------------
Frank Wiles, Revolution Systems, LLC.
Personal : frank [at] wiles http://www.wiles.org
Work : frank [at] revsys http://www.revsys.com


frank at wiles

Sep 29, 2008, 1:52 PM

Post #6 of 34
Re: Any success with storing photos in a database?

On Mon, 29 Sep 2008 22:16:12 +0200
André Warnier <aw [at] ice-sa> wrote:

> We have also looked at various data organisations over time.
> Regarding storing large objects directly in a database, the one issue
> is always that (because of the object sizes), it makes any operation
> on the rows of such a database or table very heavy. Imagine having
> to dump or reload a table that contains 500,000 "blobs" of 2-3 MB
> each. (Don't know about PostgreSQL, but many db systems require a
> dump and a reload when you change a table structure).

FYI PostgreSQL doesn't require a dump/reload when altering the table
structure.

> Oh, and I've thought of another advantage, in an Apache/web context :
> to send the content of a file to a browser, you can take advantage of
> the sendfile() call, which is very efficient. Now if your file is a
> blob in a row of a database, you have to read it yourself in memory,
> and send it, don't you ?

That is one advantage, but I'm going to take it a step further and say
by having the files outside of the database you don't even need Apache,
but should in fact use a lighter weight web server like tux, nginx, or
lighttpd for serving up static media like that.



perrin at elem

Sep 29, 2008, 2:04 PM

Post #7 of 34
Re: Any success with storing photos in a database?

On Mon, Sep 29, 2008 at 3:00 PM, Mark Stosberg <mark [at] summersault> wrote:
> We have a large, busy, database application that relates to millions of
> photos, which we also need to store and display.

Have you read Cal Henderson's book about how Flickr works? It's a bit
extreme, but interesting. A smaller version of the "many photos"
problem is LiveJournal, who use their custom file storage API
(MogileFS) and serve the files with their own web server (perlbal).
Definitely another extreme solution that should probably be considered
a last resort after trying to make easier stuff work, but it sounds
like the easier stuff is not working for you.

- Perrin


cosimo at streppone

Sep 29, 2008, 2:04 PM

Post #8 of 34
Re: Any success with storing photos in a database?

On 29 September 2008 at 21:00:41, Mark Stosberg
<mark [at] summersault> wrote:

> This question isn't so much a mod_perl question, as it is a question
> about building high performance websites with Perl.
>
> We have a large, busy, database application that relates to millions of
> photos, which we also need to store and display. We've been keeping the
> meta data about the photos in PostgreSQL, and the files on the file
> system. (Implemented much like CGI::Uploader ).

We have:

a) ~150,000 avatar tiny pictures (50x50);
b) ~300,000 user photos (320x240 originals), but
also available in 4 more sizes;
c) tens of millions of album pictures in original
and thumbnail sizes;

We're using MySQL 5.0 with MyISAM storage engine. Yes.
Until recently, a) & b) were stored into a MySQL blob field. Yes.
Did you hear me screaming? :-)

Problems I found when I started working here:

- our mod_perl backends were serving 20-40% of picture requests,
which is completely insane;

- our picture-serving code was fetching the picture from the database,
possibly scaling it on the fly (!), saving it in memcached
and $r->print()ing it out down the wire.
That's completely insane. The scaling even disabled caching.

- when you update a picture's metadata (this is MySQL), you _LOCK_
the _ENTIRE_ table with hundreds of thousands of images.

- fetching from a blob field in MySQL is expensive.

Now instead:

- avatars were our first experiment. They are stored as static resources
with a hierarchical and balanced filesystem structure
(using digests and splitting them up);

We completely removed the caching layer from our mod_perl code,
because caching happens directly in the browser for static
resources;

We managed to move away 500,000 req/day from mod_perl to static
HTTP servers.

- User photos are coming. We wrote a nice application layer that
can upload a single resource to many pools of static servers and
in different sizes with automatic thumbnailing. The filesystem path
scheme can be defined in the resource's Perl class, but it is basically
the same digest + splitting, as in:

http://static.myapp.com/pool1/a1/b2c/d3f4g5h6.../123456_m.jpg

We are mass-exporting pictures from the database blobs to our
filesystems via DAV. Using DAV is not the most efficient way but
allows you to attach arbitrary metadata to the filesystem.

We now use that to resolve inconsistencies and "sync" the
metadata in the database.
This should go live with next release. I'll let you know :)

I'm not sure what to do for album pictures. They are already out
of the database, thank god, but the "design" guys now want 3-4
thumbnails even for those pics. Suggestions?

> This has worked great in terms of performance, but with so much data to
> manage, over time we have run into data inconsistency issues between the
> file system and the database.

Can you explain the issues you found?
I'd really like to know, so I'm prepared. :-)

> So, I'm asking if anyone has had experience successfully storing photos
> (or other files) directly in a database? That would solve the consistency
> issue, but may create a performance issue.

Yes, performance issues.

> Perhaps the right kind of caching layer could solve that.

I'm not sure. If you throw caching into the "pics-in-the-db" mess,
IMHO you only make the situation worse.

--
Cosimo


js5 at sanger

Sep 29, 2008, 2:45 PM

Post #9 of 34
Re: Any success with storing photos in a database?

There are good reasons to store images (especially small ones) in
databases (with careful management of headers in your mod_perl code).

Some of you have missed inherent problems with file systems, even
balanced hierarchical trees, in a shared server environment, which
can lead to gross inefficiencies. In your cases you may not be doing
multiple deletes, but in the examples I work with it is not the
creation and storage which breaks the file system, but the
requirement to clear out old files before filling up the file system.


--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.


cosimo at streppone

Sep 29, 2008, 2:55 PM

Post #10 of 34
Re: Any success with storing photos in a database?

On 29 September 2008 at 23:45:05, James Smith
<js5 [at] sanger> wrote:

> There are good reasons to store images (especially small ones) in
> databases (and with careful management of headers in your mod_perl).
>
> Some of you have missed inherent problems with the file systems
> even balanced hierarchical tree - ones in a shared server
> environment which can lead to gross inefficiencies - in your cases
> you may not be doing multiple deletes - but in the examples I work
> with it is not the creation and storage which breaks the file
> system, but the requirement to clear out old files before filling
> up the file system.

If you have "proper" metadata, you can go and delete your files.
In our case, we chose to hash our paths by basically user-id,
so every file owned by a user is in the same folder and
can be deleted without any problems.

Maybe I didn't get your point.

--
Cosimo


aw at ice-sa

Sep 29, 2008, 3:07 PM

Post #11 of 34
Re: Any success with storing photos in a database?

James Smith wrote:
>
>
> There are good reasons to store images (especially small ones) in
> databases (and with careful management of headers in your mod_perl).
>
> Some of you have missed inherent problems with the file systems
> even balanced hierarchical tree - ones in a shared server
> environment which can lead to gross inefficiencies - in your cases
> you may not be doing multiple deletes - but in the examples I work
> with it is not the creation and storage which breaks the file
> system, but the requirement to clear out old files before filling
> up the file system.
>
>

Finally, someone who does not agree with the general line.
That makes it more interesting.

As for the size, the OP indicated that his objects were pictures (photos
I believe), which tend to be in the multi-MB range, and growing as
cameras get more pixels. So are the documents we handle, since there is a
notable tendency towards inflation of office document sizes, what with
embedded pictures and graphics and such.

I guess indeed it also depends on whether the stored images/documents
are transient, or stay forever. In our case they stay forever, because
they are part of a kind of archive. Once loaded, a document is never
deleted.
That is why we do not just store the original documents in a purely
hierarchical file structure, but have developed a system based on a
notion of "logical volumes", which can be moved without affecting the
link between the database records and the stored objects.
This being said, the organisation in question is still on top of
classical filesystems.
In recent years, the universal support for LVM under Unix/Linux has much
simplified the question of space management, physical location etc.
Disk storage per volume unit is also getting ever larger and cheaper, so
it does not cause us big concern.
In the extreme case, if a certain document collection is not needed anymore,
the corresponding records can be moved to an archive table or database,
but the documents themselves stay.


js5 at sanger

Sep 29, 2008, 3:09 PM

Post #12 of 34
Re: Any success with storing photos in a database?

On Mon, 29 Sep 2008, Cosimo Streppone wrote:

> On 29 September 2008 at 23:45:05, James Smith <js5 [at] sanger>
> wrote:
>
>> There are good reasons to store images (especially small ones) in databases
>> (and with careful management of headers in your mod_perl).
>>
>> Some of you have missed inherent problems with the file systems
>> even balanced hierarchical tree - ones in a shared server
>> environment which can lead to gross inefficiencies - in your cases
>> you may not be doing multiple deletes - but in the examples I work
>> with it is not the creation and storage which breaks the file
>> system, but the requirement to clear out old files before filling
>> up the file system.
>
> If you have "proper" metadata, you can go and delete your files.
> In our case, we chose to hash our paths by basically user-id,
> so every file owned by a user is in the same folder and
> can be deleted without any problems.
>

We have "proper" metadata. Deleting files is an expensive operation on
all file systems, and if you have a large number of files to delete,
the overhead can become excessive. We produce and delete anywhere up to
and including half a million files per day, and the deletion is the
crippling stage on a journalled file system.

> Maybe I didn't get your point.
>
> --
> Cosimo




cosimo at streppone

Sep 29, 2008, 3:24 PM

Post #13 of 34
Re: Any success with storing photos in a database?

On 30 September 2008 at 00:09:52, James Smith
<js5 [at] sanger> wrote:

> On Mon, 29 Sep 2008, Cosimo Streppone wrote:
>
>> On 29 September 2008 at 23:45:05, James Smith
>> <js5 [at] sanger> wrote:
>>
>>> There are good reasons to store images (especially small ones) in
>>> databases (and with careful management of headers in your mod_perl).

>> If you have "proper" metadata, you can go and delete your files.
>
> We have "proper" meta data

Yes, sorry. I was thinking about our case.

> we produce and delete anywhere up to and including 1/2
> million files per day - and
> the deletion is the crippling stage on a journalled file system

I see. Again, our case is very different though.
99.99% is create/add/modify and we almost never delete.

What filesystem/os do you use?

--
Cosimo


js5 at sanger

Sep 29, 2008, 4:11 PM

Post #14 of 34
Re: Any success with storing photos in a database?

On Tue, 30 Sep 2008, Cosimo Streppone wrote:

> On 30 September 2008 at 00:09:52, James Smith <js5 [at] sanger>
> wrote:
>
>> On Mon, 29 Sep 2008, Cosimo Streppone wrote:
>>
>>> On 29 September 2008 at 23:45:05, James Smith
>>> <js5 [at] sanger> wrote:
>>>
>>>> There are good reasons to store images (especially small ones) in
>>>> databases (and with careful management of headers in your mod_perl).
>
>>> If you have "proper" metadata, you can go and delete your files.
>>
>> We have "proper" meta data
>
> Yes, sorry. I was thinking about our case.
>
>> we produce and delete anywhere up to and including 1/2
>> million files per day - and
>> the deletion is the crippling stage on a journalled file system
>
> I see. Again, our case is very different though.
> 99.99% is create/add/modify and we almost never delete.
>
> What filesystem/os do you use?

It has to be a shared filesystem, so at the moment GPFS/Red Hat; we are
moving over to a memcached system to store and serve the temporary images;
otherwise we would be stuck with NFS or Lustre, both of which fail
quite badly with small files. All this is backed by fibre-attached
SAN storage.

(they really are temporary and can be easily restored)

James

>
> --
> Cosimo




himanshu.garg at gmail

Sep 29, 2008, 6:48 PM

Post #15 of 34
Re: Any success with storing photos in a database?

2008/9/30 Frank Wiles <frank [at] wiles>:
> On Mon, 29 Sep 2008 22:16:12 +0200
> André Warnier <aw [at] ice-sa> wrote:
>
>> We have also looked at various data organisations over time.
>> Regarding storing large objects directly in a database, the one issue
>> is always that (because of the object sizes), it makes any operation
>> on the rows of such a database or table very heavy. Imagine having
>> to dump or reload a table that contains 500,000 "blobs" of 2-3 MB
>> each. (Don't know about PostgreSQL, but many db systems require a
>> dump and a reload when you change a table structure).
>
> FYI PostgreSQL doesn't require a dump/reload when altering the table
> structure.
>
>> Oh, and I've thought of another advantage, in an Apache/web context :
>> to send the content of a file to a browser, you can take advantage of
>> the sendfile() call, which is very efficient. Now if your file is a
>> blob in a row of a database, you have to read it yourself in memory,
>> and send it, don't you ?
>
> That is one advantage, but I'm going to take it a step further and say
> by having the files outside of the database you don't even need Apache,
> but should in fact use a lighter weight web server like tux, nginx, or
> lighttpd for serving up static media like that.

A newbie question in this insightful thread. Can serving static files
and cookie authentication go together? If yes, any hints? Second, at
what sizes should one start moving to files? E.g. how about plain text
blog entries containing no more than, say, 5 KBytes?

Thanks,
Himanshu



pangj at laposte

Sep 30, 2008, 5:11 AM

Post #16 of 34
Re: Any success with storing photos in a database?

> Message of 29/09/08 23:05
> From: "Perrin Harkins"
> To: "Mark Stosberg"
> Cc: modperl [at] perl
> Subject: Re: Any success with storing photos in a database?
>
>
> On Mon, Sep 29, 2008 at 3:00 PM, Mark Stosberg wrote:
> > We have a large, busy, database application that relates to millions of
> > photos, which we also need to store and display.
>
> Have you read Cal Henderson's book about how Flickr works? It's a bit
> extreme, but interesting. A smaller version of the "many photos"
> problem is LiveJournal, who use their custom file storage API
> (MogileFS) and serve the files with their own web server (perlbal).

I have used MogileFS for storing photos; it's a good application. See:
http://www.danga.com/mogilefs/
AFAIK, Perlbal is a reverse proxy in front of the web server, not a real web server.


Regards,
Jeff.



mpeters at plusthree

Sep 30, 2008, 6:11 AM

Post #17 of 34
Re: Any success with storing photos in a database?

Himanshu wrote:

> A newbie question in the insightful thread. Can serving static files
> and cookie authentication go together?

Yes. You can do this with something like mod_auth_tkt in a proxy web server (very light) and have a
backend mod_perl (or php, or Java, etc) server set the cookie after someone has logged in. Then you
just configure your auth in your proxy for which groups have access to which files.
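
As a rough illustration of the idea (a backend issues a signed cookie after login, and a lightweight front end only has to verify it), here is a Python sketch. This is not mod_auth_tkt's actual cookie format. The `user|expires|signature` layout, the `SECRET` value, and the function names are assumptions made for the example.

```python
import hashlib
import hmac
import time

SECRET = b"change-me"  # hypothetical secret shared by backend and front end


def _sign(payload, secret):
    """HMAC-SHA256 signature of the payload, as a hex string."""
    return hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def make_ticket(user, expires, secret=SECRET):
    """Backend side: build the cookie value after a successful login."""
    payload = "%s|%d" % (user, expires)
    return payload + "|" + _sign(payload, secret)


def check_ticket(ticket, now=None, secret=SECRET):
    """Front-end side: return the user name if the ticket is authentic
    and unexpired, otherwise None."""
    now = time.time() if now is None else now
    try:
        user, expires, sig = ticket.rsplit("|", 2)
        expires = int(expires)
    except ValueError:
        return None  # malformed ticket
    expected = _sign("%s|%d" % (user, expires), secret)
    if not hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8")):
        return None  # tampered or signed with a different secret
    if now > expires:
        return None  # expired
    return user
```

Because verification needs only the shared secret, the server handing out the static files never has to talk to the application or the database.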

> Second what
> are the sizes at which one should start moving to files. e.g. how
> about plain text blog entries containing no more than say 5 KBytes.

It really depends on your database. In MySQL, using BLOBs won't really cause
problems unless you use queries that involve full table scans or queries that actually pull or
manipulate the BLOBs a lot. So if you have the proper indexes and only pull the BLOBs rarely, then
you should be OK for most uses.

But as far as blogs go, you really shouldn't be pulling the text of the blog out of the DB for every
request. Since it changes very rarely, especially compared to the number of times it's read,
storing it in a "published" state on the filesystem is usually best. And if you still need to add
dynamic bits to the page at run time, then instead of publishing a .html file, you can publish a
template (for Template Toolkit, HTML::Template, etc).

--
Michael Peters
Plus Three, LP


perrin at elem

Sep 30, 2008, 9:40 AM

Post #18 of 34
Re: Any success with storing photos in a database?

On Tue, Sep 30, 2008 at 8:11 AM, Jeff Pang <pangj [at] laposte> wrote:
> AFAIK, Perlbal is reverse proxy before the webserver, not a real web server.

I don't use it, but it can do auth and serve images. If you read the
presentations about LiveJournal's backend, they explain this.

- Perrin


davidnicol at gmail

Sep 30, 2008, 10:11 AM

Post #19 of 34
Re: Any success with storing photos in a database?

On Tue, Sep 30, 2008 at 11:40 AM, Perrin Harkins <perrin [at] elem> wrote:
> I don't use it, but it can do auth and serve images.

as can Apache itself, with appropriate access control. The two steps
(this should not be news to anyone here) are checking the auth then
something like "exec cat $filename" for which modperl is kind of
heavy.

Best to let Apache serve the images from a static directory, after
devising some kind of dynamic .htaccess scheme driven by the smarter
pieces of your system.


mark at summersault

Sep 30, 2008, 10:20 AM

Post #20 of 34
Re: Any success with storing photos in a database?

> Usually people just make sure inserts/updates to the photo table is
> done in a transaction and if that transaction succeeds or fails, it
> does the appropriate write/delete on the file system.

I could perhaps be better about "rolling back" file system actions
if a DB transaction fails.

I'll be looking more into the cause today. Initial findings suggest that
there was an import issue with legacy data long ago, and a significant
issue may not even remain in the current code.

> But since you're using PostgreSQL ( my favorite database and a large
> part of my consulting practice ) you could even go so far as to write a
> few pl/perl stored procedures to handle keeping the file system in sync
> with the database.

Ah, interesting idea. I'll keep that in mind. I'll have to deal with the
hoop that the image storage server is currently not accessible from the
master PostgreSQL server.

Mark



mark at summersault

Sep 30, 2008, 10:20 AM

Post #21 of 34
Re: Any success with storing photos in a database?

> One point I am curious about, is what kind of file structure you use to
> store the millions of images on the filesystem. I can't imagine that
> you do it really into one flat directory ?

Thanks for the response.

We use the 'md5' scheme in CGI::Uploader. From the docs:
"[We] will create three levels of directories based on the first three letters
of the ID's md5 sum. The result may look like this:
2/0/2/123.jpg"

> And are you storing the real paths directly in the database ?

No, we generate the path based on the unique ID.
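
A minimal sketch of how a path can be derived from the ID under the scheme quoted above, in Python for illustration. This is not CGI::Uploader's own code; the function name `md5_path` and its parameters are hypothetical. It takes the first three hex characters of the ID's MD5 digest as three directory levels.

```python
import hashlib


def md5_path(file_id, extension, levels=3):
    """Return a relative path such as '2/0/2/123.jpg' for the given ID.

    Each of the first `levels` hex characters of md5(ID) becomes one
    directory level, which spreads files evenly over 16**levels directories.
    """
    digest = hashlib.md5(str(file_id).encode("utf-8")).hexdigest()
    parts = list(digest[:levels]) + ["%s.%s" % (file_id, extension)]
    return "/".join(parts)


# md5("123") is 202cb962ac59075b964b07152d234b70, so:
# md5_path(123, "jpg") -> "2/0/2/123.jpg"
```

Because the path is a pure function of the ID, nothing about the location needs to be stored in the database, and three levels give 4096 leaf directories.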

Your points were all interesting and steer me back in the direction of trying to better
understand and eliminate how inconsistency is creeping into our system.

Mark



mark at summersault

Sep 30, 2008, 10:20 AM

Post #22 of 34
Re: Any success with storing photos in a database?

On Mon, 29 Sep 2008 17:04:29 -0400
"Perrin Harkins" <perrin [at] elem> wrote:

> On Mon, Sep 29, 2008 at 3:00 PM, Mark Stosberg <mark [at] summersault> wrote:
> > We have a large, busy, database application that relates to millions of
> > photos, which we also need to store and display.
>
> Have you read Cal Henderson's book about how Flickr works?

It looks familiar, but I don't think I've read it. I've bookmarked it
now. Thanks for the suggestion, Perrin!

If I find anything "interesting" about our inconsistency, I'll report back.

I was just reviewing the code flow of 'store_upload()' in CGI::Uploader:

http://search.cpan.org/src/MARKSTOS/CGI-Uploader-2.15/lib/CGI/Uploader.pm

The key part I see is for the 'update' case where we:

1. Delete old generated files (like Thumbnails)
2. Run the DB update
3. Regenerate the thumbnails.

If the DB update failed, we couldn't "undelete" the thumbnails. That design could perhaps be improved.
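
One possible reordering, sketched in Python with SQLite so that it runs without a server: write the new thumbnail to a temporary file first, run the DB update, and only then rename the new file over the old one. The `photos` table, the `update_photo` name and its parameters are assumptions for the example, not part of CGI::Uploader.

```python
import os
import sqlite3
import tempfile


def update_photo(conn, photo_id, new_title, thumb_path, new_thumb_bytes):
    """Replace a thumbnail without deleting the old one up front.

    Hypothetical flow: stage the new file, run the DB update, then swap
    the staged file into place. If the update fails, the old thumbnail
    is still on disk and the staged file is discarded.
    """
    directory = os.path.dirname(thumb_path)
    fd, staged = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(new_thumb_bytes)
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE photos SET title = ? WHERE id = ?",
                (new_title, photo_id),
            )
        # Rename is atomic when both paths are on the same filesystem.
        os.replace(staged, thumb_path)
    except Exception:
        if os.path.exists(staged):
            os.remove(staged)
        raise
```

This narrows the window for inconsistency rather than closing it: a crash between the commit and the rename would still leave the old thumbnail next to updated meta-data, although nothing would be missing.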

Also, I see that this code doesn't use transactions. I think that's because it expects that the outer application code may wrap this work in a larger action.

With current DBD::Pg and PostgreSQL versions, I could add some "savepoints" here, to make some parts
of this more transactional (like inserting a photo and all of its thumbnails in one transaction).

The savepoint code would be specific to using the PostgreSQL driver, as I'm not aware that other DBs support the same syntax.
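
A minimal sketch of the savepoint idea, in Python against SQLite so it runs without a server. SAVEPOINT, ROLLBACK TO SAVEPOINT and RELEASE SAVEPOINT are standard SQL statements that SQLite also accepts; in DBD::Pg the corresponding calls are `pg_savepoint`, `pg_rollback_to` and `pg_release`. The table layout and the function name are assumptions for the example.

```python
import sqlite3


def insert_photo_with_thumbnails(conn, photo_name, thumbnail_names):
    """Insert a photo and its thumbnails as one unit inside an outer transaction.

    Returns True if everything was inserted, False if the savepoint was
    rolled back. The caller owns BEGIN and COMMIT.
    """
    conn.execute("SAVEPOINT photo_upload")
    try:
        cursor = conn.execute("INSERT INTO photos (name) VALUES (?)", (photo_name,))
        photo_id = cursor.lastrowid
        for name in thumbnail_names:
            conn.execute(
                "INSERT INTO thumbnails (photo_id, name) VALUES (?, ?)",
                (photo_id, name),
            )
    except sqlite3.Error:
        # Undo only the work done since the savepoint; the outer
        # transaction stays open and usable.
        conn.execute("ROLLBACK TO SAVEPOINT photo_upload")
        conn.execute("RELEASE SAVEPOINT photo_upload")
        return False
    conn.execute("RELEASE SAVEPOINT photo_upload")
    return True


# isolation_level=None means we issue BEGIN and COMMIT ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE thumbnails (photo_id INTEGER NOT NULL, name TEXT NOT NULL UNIQUE)"
)

conn.execute("BEGIN")
ok_first = insert_photo_with_thumbnails(
    conn, "cat.jpg", ["cat_small.jpg", "cat_medium.jpg"]
)
# The duplicate thumbnail name violates UNIQUE, so this whole photo is undone.
ok_second = insert_photo_with_thumbnails(
    conn, "dog.jpg", ["dog_small.jpg", "dog_small.jpg"]
)
conn.execute("COMMIT")

photo_count = conn.execute("SELECT COUNT(*) FROM photos").fetchone()[0]
thumbnail_count = conn.execute("SELECT COUNT(*) FROM thumbnails").fetchone()[0]
print(ok_first, ok_second, photo_count, thumbnail_count)
```

Because the function uses a savepoint instead of its own BEGIN and COMMIT, it still works when the outer application wraps it in a larger transaction.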

Other ideas for improvements here are welcome.

Mark

--
. . . . . . . . . . . . . . . . . . . . . . . . . . .
Mark Stosberg Principal Developer
mark [at] summersault Summersault, LLC
765-939-9301 ext 202 database driven websites
. . . . . http://www.summersault.com/ . . . . . . . .


perrin at elem

Sep 30, 2008, 10:24 AM

Post #23 of 34 (2187 views)
Permalink
Re: Any success with storing photos in a database? [In reply to]

On Tue, Sep 30, 2008 at 1:11 PM, David Nicol <davidnicol [at] gmail> wrote:
> On Tue, Sep 30, 2008 at 11:40 AM, Perrin Harkins <perrin [at] elem> wrote:
>> I don't use it, but it can do auth and serve images.
>
> as can Apache itself, with appropriate access control. The two steps
> (this should not be news to anyone here) are checking the auth then
> something like "exec cat $filename" for which modperl is kind of
> heavy.

Perlbal does have some unique features. It can check the auth for an
image with your mod_perl backend and then serve the image itself.
It's built on an event loop model and it has some direct ties to
MogileFS. There's a good feature list here:
http://www.danga.com/perlbal/

I would typically stick with apache and mod_auth_tkt, but for people
who already use MogileFS this must be a pretty attractive setup.

- Perrin


perrin at elem

Oct 15, 2008, 9:41 AM

Post #24 of 34 (1966 views)
Permalink
Re: Any success with storing photos in a database? (prevents double-submits) [In reply to]

On Wed, Oct 15, 2008 at 12:31 PM, Mark Stosberg <mark [at] summersault> wrote:
> We had a "double submit" bug that allowed a form to be submitted twice when we
> weren't fully prepared for that. We are still researching the best practices to
> address this a general case. One approach we are considering is change the
> submit action on forms with JavaScript, so it disables the submit button, and
> then actually submit the form, preventing one kind of double-submission. It
> seems like I don't see this approach happening in the wild much, though. I
> suspect there is a better solution.

JavaScript is okay, but can be a problem when people hit back
expecting to use the form again and the button is still disabled.
Another approach is a unique ID in the form that you track in the
user's session (i.e. this ID was seen before). If the problem is
large uploads with no feedback until they finish, you can use one of
the upload progress tools.
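
A minimal sketch of the unique-ID approach, in Python with a plain dict standing in for the user's session; the function names and the `seen_form_ids` key are made up for the example.

```python
import uuid


def new_form_id():
    """Return a fresh ID to embed in a hidden field when rendering the form."""
    return uuid.uuid4().hex


def is_first_submission(session, form_id):
    """Return True the first time a form ID is seen in this session,
    False for any repeat of the same ID."""
    seen = session.setdefault("seen_form_ids", set())
    if form_id in seen:
        return False
    seen.add(form_id)
    return True


session = {}  # stand-in for a real per-user session store
form_id = new_form_id()
first = is_first_submission(session, form_id)
second = is_first_submission(session, form_id)
print(first, second)
```

With a real session store, the check and the insert need to happen atomically (for example via a unique constraint on the ID), otherwise two near-simultaneous submits can both pass the check.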

- Perrin


mark at summersault

Oct 15, 2008, 10:20 AM

Post #25 of 34 (1976 views)
Permalink
Re: Any success with storing photos in a database? (prevents double-submits) [In reply to]

On Tue, 30 Sep 2008 10:06:26 -0400
Mark Stosberg <mark [at] summersault> wrote:

> On Mon, 29 Sep 2008 17:04:29 -0400
> "Perrin Harkins" <perrin [at] elem> wrote:
>
> > On Mon, Sep 29, 2008 at 3:00 PM, Mark Stosberg <mark [at] summersault> wrote:
> > > We have a large, busy, database application that relates to millions of
> > > photos, which we also need to store and display.
>
> If I find anything "interesting" about our inconsistency, I'll report back.

I said I'd report back if I found out where the inconsistency was appearing, between files
on the file system and meta-data in the database.

So far, I've found the following:

We had a "double submit" bug that allowed a form to be submitted twice when we
weren't fully prepared for that. We are still researching the best practices to
address this as a general case. One approach we are considering is to change the
submit action on forms with JavaScript, so that it disables the submit button and
then actually submits the form, preventing one kind of double-submission. I
don't see this approach in the wild much, though, so I suspect there is a
better solution.

Beyond this, I think we've found the remaining inconsistencies to be happening
at extremely low rates, to the point where it might not be worth completely
tracking down the final issue.

In summary, I think the majority advice found on this list holds true: storing
image files on the file system and meta-data in the database is a good way to
go.

Mark

--
. . . . . . . . . . . . . . . . . . . . . . . . . . .
Mark Stosberg Principal Developer
mark [at] summersault Summersault, LLC
765-939-9301 ext 202 database driven websites
. . . . . http://www.summersault.com/ . . . . . . . .
