Mailing List Archive: Catalyst: Users

Large requests with JSON?



moseley at hank

Feb 5, 2010, 12:54 PM

Post #1 of 8 (1864 views)
Large requests with JSON?

As you might have picked up I'm working on a REST API that uses JSON in the
request. I need to also allow large file uploads.

HTTP::Body::OctetStream will chunk the request body and send to a temp file,
but Catalyst::Action::Deserialize::JSON will load the temp file into memory.
Obviously, want to limit that.

AFAIK, there's no way to stream parse JSON (so that only part is in memory
at any given time). What would be the recommended serialization for
uploaded files -- just use multipart/form-data for the uploads?

BTW -- I don't see any code in HTTP::Body to limit body size. Doesn't that
seem like a pretty easy DoS for Catalyst apps? I do set a request size
limit in the web server, but if I need to allow 1/2GB uploads or so then
could kill the machine pretty easily, no?
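
To make the body-size concern concrete, here is a minimal sketch (in Python, not Perl, and not how HTTP::Body is actually implemented) of spooling a request body to a temp file in chunks while enforcing a hard cap. The names spool_body, BodyTooLarge, max_bytes and chunk_size are hypothetical, chosen only for illustration.

```python
import io
import tempfile


class BodyTooLarge(Exception):
    """Raised when the incoming body exceeds the configured cap."""


def spool_body(stream, max_bytes, chunk_size=64 * 1024):
    """Copy `stream` to a temp file chunk by chunk, aborting past `max_bytes`.

    Only one chunk is held in memory at a time.
    """
    out = tempfile.TemporaryFile()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > max_bytes:
            out.close()  # discard what was spooled so far
            raise BodyTooLarge(f"body exceeds {max_bytes} bytes")
        out.write(chunk)
    out.seek(0)
    return out, total


# Usage with an in-memory stand-in for the request body:
spooled, size = spool_body(io.BytesIO(b"x" * 1000), max_bytes=2000)
spooled.close()
```

The cap here only protects the application process; a limit at the web server is still what rejects an oversized Content-Length before any data is read.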



--
Bill Moseley
moseley [at] hank


bobtfish at bobtfish

Feb 5, 2010, 8:56 PM

Post #2 of 8 (1818 views)
Re: Large requests with JSON? [In reply to]

On 5 Feb 2010, at 20:54, Bill Moseley wrote:
> AFAIK, there's no way to stream parse JSON (so that only part is in
> memory at any given time). What would be the recommended
> serialization for uploaded files -- just use multipart/form-data for
> the uploads?

Don't?

Why not just do a PUT request with all the data as unmangled binary?

> BTW -- I don't see any code in HTTP::Body to limit body size.
> Doesn't that seem like a pretty easy DoS for Catalyst apps? I do
> set a request size limit in the web server, but if I need to allow
> 1/2GB uploads or so then could kill the machine pretty easily, no?

Well, you set it at the web server.. That stops both overlarge content-
length requests, and when the body exceeds the specified content length.

But yes, you have to provision temp file space for n files in flight x
max file size...

(I have an HTTP::Body subclass I use to stream stuff directly into
mogilefs rather than getting a temp file - code on request)...

Cheers
t0m


_______________________________________________
List: Catalyst [at] lists
Listinfo: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
Searchable archive: http://www.mail-archive.com/catalyst [at] lists/
Dev site: http://dev.catalyst.perl.org/


moseley at hank

Feb 6, 2010, 8:28 AM

Post #3 of 8 (1800 views)
Re: Large requests with JSON? [In reply to]

On Fri, Feb 5, 2010 at 8:56 PM, Tomas Doran <bobtfish [at] bobtfish> wrote:

>
> On 5 Feb 2010, at 20:54, Bill Moseley wrote:
>
>> AFAIK, there's no way to stream parse JSON (so that only part is in memory
>> at any given time). What would be the recommended serialization for
>> uploaded files -- just use multipart/form-data for the uploads?
>>
>
> Don't?


> Why not just do a PUT request with all the data as unmangled binary?


As in don't provide a way to upload meta data along with the file (name,
date, description, author, title, reference id) like the web upload allows
with multipart/form-data? Or invent some new serialization where the meta
data is embedded in the upload? Or do a POST with the file, then flag the
new upload as incomplete until a PUT is done to set associated meta data?

The API is supposed to offer much of the same functionality as the web
interface. JSON is somewhat nice because, well, customers have requested
it, and also that it lends itself to more complex (not flat) data
representations. Of course, urlencoded doesn't have to be flat -- we have
some YUI-based AJAX code that sends json in $c->req->params->{data}. But I
digress.

The 'multipart/form-data' is nice because if the client is well behaved
uploads are chunked to disk. XML can also do this, too (I have an
HTTP::Body subclass for XML-RPC that chunks base64 elements to disk).



>> BTW -- I don't see any code in HTTP::Body to limit body size. Doesn't
>> that seem like a pretty easy DoS for Catalyst apps? I do set a request size
>> limit in the web server, but if I need to allow 1/2GB uploads or so then
>> could kill the machine pretty easily, no?
>>
>
> Well, you set it at the web server.. That stops both overlarge
> content-length requests, and when the body exceeds the specified content
> length.
>

Yes, for example in Apache LimitRequestBody can be set and if you send a
content-length header larger than that value the request is rejected right
away. And, IIRC, Apache will just discard any data over what is
specified in the content-length header (i.e. Catalyst won't see any data
past the content length from Apache).



> But yes, you have to provision temp file space for n files in flight x max
> file size...
>

You are making an assumption that the request body actually makes it to a
temp file.

Imagine you allow uploads of CD ISO files, so say 700MB. So, you set the
webserver's limit to that. Normally, when someone uploads, you expect
HTTP::Body to see OctetStream or form-data posts, which end up buffering to
disk.

Now, if someone sets their content type to Urlencoded then HTTP::Body just
gathers up that 700MB in memory. MaxClients is 50, so do the math.
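
Spelled out, the worst case looks like this. The sketch assumes every one of the 50 clients sends a full 700MB Urlencoded body at the same moment, and treats MB as MiB for the conversion; the variable names are only for illustration.

```python
MAX_BODY_MB = 700   # web server body limit from the example above
MAX_CLIENTS = 50    # Apache MaxClients from the example above

# Every client holds a full-size body in memory at the same time.
worst_case_mb = MAX_BODY_MB * MAX_CLIENTS
worst_case_gib = worst_case_mb / 1024  # assumption: MB treated as MiB

print(f"{worst_case_mb} MB (about {worst_case_gib:.1f} GiB)")
# prints: 35000 MB (about 34.2 GiB)
```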

Granted someone would have to work very hard to get enough data at once all
to the same web server, and if an attacker is that determined they could
find other equally damaging attacks. And a good load balancer can monitor
memory and disk space on the web servers and stop sending requests to a
server low on resources.


Most applications don't have this problem since uploading that large of a
file is likely rare. Well, that assumes that everyone is using something in
front of Catalyst that limits upload size (like Apache's LimitRequestBody).

It's unusual to have a very large valid Urlencoded (or non-upload form-data)
body in a normal request (that's a lot of radio buttons and text to type!)
so, would it not be wise for HTTP::Body to limit the size of
$self->{buffer} to something sane? I suppose it could flush to disk after
getting too big, but that doesn't really help because some serializations
require reading the entire thing into memory to parse.



--
Bill Moseley
moseley [at] hank


pagaltzis at gmx

Feb 6, 2010, 11:29 AM

Post #4 of 8 (1802 views)
Re: Large requests with JSON? [In reply to]

* Bill Moseley <moseley [at] hank> [2010-02-06 17:30]:
> As in don't provide a way to upload meta data along with the
> file (name, date, description, author, title, reference id)
> like the web upload allows with multipart/form-data? Or invent
> some new serialization where the meta data is embedded in the
> upload?

Neither, depending on your metadata. The things you did mention
could quite well be sent as request headers. No need to put
another envelope inside the HTTP request envelope.

Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>



moseley at hank

Feb 6, 2010, 2:36 PM

Post #5 of 8 (1795 views)
Re: Re: Large requests with JSON? [In reply to]

On Sat, Feb 6, 2010 at 11:29 AM, Aristotle Pagaltzis <pagaltzis [at] gmx> wrote:

> * Bill Moseley <moseley [at] hank> [2010-02-06 17:30]:
> > As in don't provide a way to upload meta data along with the
> > file (name, date, description, author, title, reference id)
> > like the web upload allows with multipart/form-data? Or invent
> > some new serialization where the meta data is embedded in the
> > upload?
>
> Neither, depending on your metadata. The things you did mention
> could quite well be sent as request headers. No need to put
> another envelope inside the HTTP request envelope.


Could you be more specific? For example API request to

1) create a new user in account #1234 with name, email, etc.
2) create a user but also provide a photo when creating the user
3) upload a document for the user and the document must include an
associated collection of meta data (e.g. filename, timestamp, author etc.).
The uploaded document must include this meta data before it can be accepted.





--
Bill Moseley
moseley [at] hank


pagaltzis at gmx

Feb 9, 2010, 2:36 AM

Post #6 of 8 (1664 views)
Re: Large requests with JSON? [In reply to]

* Bill Moseley <moseley [at] hank> [2010-02-06 23:35]:
> 1) create a new user in account #1234 with name, email, etc.

This is just a normal form POST.

> 2) create a user but also provide a photo when creating the user

I might separate this out into two requests – whatever the POST
request returns would contain a link to which the client can PUT
the photo.

> 3) upload a document for the user and the document must include
> an associated collection of meta data (e.g. filename,
> timestamp, author etc.). The uploaded document must include
> this meta data before it can be accepted.

That sounds like the case I was thinking about: just do a PUT
request with X-MyApp-Filename, X-MyApp-Timestamp etc headers.
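
A minimal client-side sketch of that kind of request, using Python's standard urllib. The URL, the file name, the timestamp value and the helper name build_upload_request are all hypothetical; only the X-MyApp-* header names come from the suggestion above. The request is built but not sent.

```python
import urllib.request


def build_upload_request(url, payload, filename, timestamp):
    """Build a PUT whose body is the raw file and whose metadata is in headers."""
    headers = {
        "Content-Type": "application/octet-stream",
        "X-MyApp-Filename": filename,
        "X-MyApp-Timestamp": timestamp,
    }
    return urllib.request.Request(url, data=payload, headers=headers, method="PUT")


# Hypothetical endpoint and values, for illustration only.
req = build_upload_request(
    "http://api.example.com/users/42/documents/report",
    b"raw file bytes go here",
    "report.pdf",
    "2010-02-09T02:36:00Z",
)
# urllib.request.urlopen(req) would send it; omitted so the sketch has no side effects.
```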

(Another option, which is better in some ways I think, would be
the two-request approach as above, though that would be more
complicated. Ie. the client POSTs the metadata, the server files
the data away temporarily and returns a link to which the client
can PUT the file, and only once that request has succeeded does
the server store both metadata and file in their proper place.)
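
The two-request flow can be sketched as a small in-memory model. This is only an illustration of the bookkeeping, assuming a hypothetical /uploads/<token> link format and a hypothetical PendingUploads class; a real server would stream the file to disk and expire stale entries.

```python
import uuid


class PendingUploads:
    """Tracks metadata that is waiting for its file to arrive."""

    def __init__(self):
        self._pending = {}  # token -> metadata filed away temporarily
        self.stored = {}    # token -> (metadata, file content), the "proper place"

    def post_metadata(self, metadata):
        """Step 1: client POSTs metadata; server returns a link to PUT the file to."""
        token = uuid.uuid4().hex
        self._pending[token] = dict(metadata)
        return f"/uploads/{token}"

    def put_file(self, link, content):
        """Step 2: client PUTs the file; only now are both stored together."""
        token = link.rsplit("/", 1)[-1]
        metadata = self._pending.pop(token)  # KeyError if unknown or already used
        self.stored[token] = (metadata, content)
        return token


uploads = PendingUploads()
link = uploads.post_metadata({"filename": "report.pdf", "author": "Bill"})
token = uploads.put_file(link, b"raw file bytes")
```

Entries left in the pending table are the incomplete uploads that would need periodic cleanup.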

Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>



moseley at hank

Feb 9, 2010, 7:12 AM

Post #7 of 8 (1649 views)
Re: Re: Large requests with JSON? [In reply to]

On Tue, Feb 9, 2010 at 2:36 AM, Aristotle Pagaltzis <pagaltzis [at] gmx> wrote:

> > 3) upload a document for the user and the document must include
> > an associated collection of meta data (e.g. filename,
> > timestamp, author etc.). The uploaded document must include
> > this meta data before it can be accepted.
>
> That sounds like the case I was thinking about: just do a PUT
> request with X-MyApp-Filename, X-MyApp-Timestamp etc headers.
>

Of course, I left out the ability to upload multiple files at once. Doing
that with headers could get ugly. (X-MyApp-Filename-01,
X-MyApp-Filename-02, ...) Of course, could just not provide that
multiple-file upload ability to API users and limit it to web users. That
would work ok.

With XML-RPC we just have multiple <upload> struct elements that are
containers for the meta data and the base64 file contents.



> (Another option, which is better in some ways I think, would be
> the two-request approach as above, though that would be more
> complicated. Ie. the client POSTs the metadata, the server files
> the data away temporarily and returns a link to which the client
> can PUT the file, and only once that request has succeeded does
> the server store both metadata and file in their proper place.)
>

That's a bit of redesign of the application for a two-phase upload. Seems a
shame to have to add new database tables and cron jobs to clean up
incomplete uploads just because of my choice of serialization. I agree
that's probably the cleanest design, though. From past experience I can
assume some customers will have trouble adding request headers for the
libraries they are using.

form-data is a possible serialization, but it's a flat serialization so also
need to have fields like filename_01, title_01, filename_02, title_02 to
handle multiple uploads at once. (Plus, the app already handles that
form-data). I'm not sure how much meta data can be associated with an
upload in form-data (other than filename, content-disposition, and
content-type), or if the libraries clients use to create a request can be
that creative.

XML-RPC is ugly but nicely handles multiple uploads with associated meta
data for each, and can be stream parsed so that the base64 file data is
chunked to a temp file and not stored in memory.

JSON provides the nice nested structures but, IIUC, has to be in-memory to
parse. I hate those "out of memory!" messages, so it would be very nice to
not have the file uploads in JSON.

Not pretty at all, but maybe using form-data with a JSON-encoded "meta"
field that has a list of uploads with associated meta-data including a
field_name with each upload that associates it with a field that contained
the uploaded file. Most client libraries have a way to send form-data, so
that would be easy for customers to implement.

None of those are great options.
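
A rough sketch of the form-data plus JSON "meta" idea, seen from the server after the multipart body has already been parsed (so the file contents never pass through the JSON parser). The shape of the meta document, the "uploads" key and the function name pair_uploads_with_meta are assumptions for illustration; only "meta" and field_name come from the description above.

```python
import json


def pair_uploads_with_meta(fields, files):
    """Match each entry in the JSON "meta" field to its uploaded file.

    `fields` maps form field names to text values; `files` maps form field
    names to uploaded-file handles (here, temp file paths).
    """
    meta = json.loads(fields["meta"])
    paired = []
    for entry in meta["uploads"]:
        name = entry["field_name"]
        if name not in files:
            raise ValueError(f"no uploaded file for field {name!r}")
        paired.append((entry, files[name]))
    return paired


# Hypothetical parsed request:
fields = {
    "meta": json.dumps({
        "uploads": [
            {"field_name": "file_01", "title": "Q1 report", "author": "Bill"},
            {"field_name": "file_02", "title": "Q2 report", "author": "Bill"},
        ]
    })
}
files = {"file_01": "/tmp/upload-a", "file_02": "/tmp/upload-b"}
pairs = pair_uploads_with_meta(fields, files)
```

Only the small meta field is parsed as JSON in memory; the uploads themselves stay in whatever temp files the multipart parser wrote.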


--
Bill Moseley
moseley [at] hank


pagaltzis at gmx

Feb 9, 2010, 11:27 AM

Post #8 of 8 (1639 views)
Re: Large requests with JSON? [In reply to]

* Bill Moseley <moseley [at] hank> [2010-02-09 16:10]:
> On Tue, Feb 9, 2010 at 2:36 AM, Aristotle Pagaltzis <pagaltzis [at] gmx> wrote:
> > That sounds like the case I was thinking about: just do a PUT
> > request with X-MyApp-Filename, X-MyApp-Timestamp etc headers.
>
> Of course, I left out the ability to upload multiple files at
> once. Doing that with headers could get ugly.
> (X-MyApp-Filename-01, X-MyApp-Filename-02, ...) Of course,
> could just not provide that multiple-file upload ability to API
> users and limit it to web users. That would work ok.

I would seriously just not provide multiple uploads via the API.
For the browser UI they’re a necessity because it’s so awkward to
upload files one at a time, but the API is a completely different
category. This falls under “batching”, and all the HTTP sages
will tell you “don’t do that”. It makes both the server and the
client more complicated without any discernible upsides. (In
fact, if you do pipelining, then separate PUT requests are
actually more efficient in terms of roundtrips and overhead.)

> From past experience I can assume some customers will have
> trouble adding request headers for the libraries they are
> using.

That would be a problem, yes. (Damn people treating HTTP as
a transport protocol… *mutter*)

> form-data is a possible serialization, but it's a flat
> serialization so also need to have fields like filename_01,
> title_01, filename_02, title_02 to handle multiple uploads at
> once. (Plus, the app already handles that form-data).

Just don’t do batch uploads in the API.

> XML-RPC

Yuck.

> JSON provides the nice nested structures but, IIUC, has to be
> in-memory to parse.

Not in principle, although it may well be that there isn’t any
library that implements a streaming parser yet.

> Not pretty at all, but maybe using form-data with
> a JSON-encoded "meta" field that has a list of uploads with
> associated meta-data including a field_name with each upload
> that associates it with a field that contained the uploaded
> file. Most client libraries have a way to send form-data, so
> that would be easy for customers to implement.
>
> None of those are great options.

Actually, that sounds like a decent option if you really need
a nested data structure and can’t use headers. (I’d still not
do batch uploads, though.)

Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>

