
Mailing List Archive: ModPerl

mod_perl and Transfer-Encoding: chunked

 

 



moseley at hank

Jul 2, 2013, 8:20 AM

Post #1 of 15
mod_perl and Transfer-Encoding: chunked

For requests that are chunked (Transfer-Encoding: chunked and no
Content-Length header) calling $r->read returns *unchunked* data from the
socket.

That's indeed handy. Is that mod_perl doing that un-chunking or is it
Apache?

But, it leads to some questions.

First, if $r->read reads unchunked data then why is there a
Transfer-Encoding header saying that the content is chunked? Shouldn't
that header be removed? How does one know if the content is chunked or
not, otherwise?
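
For reference, the check I'd have to do looks something like this (a
minimal sketch, assuming the mod_perl 2 API):

    # Detect how the request body is delimited.
    my $te = $r->headers_in->get('Transfer-Encoding') || '';
    my $is_chunked = $te =~ /\bchunked\b/i;
    my $length     = $r->headers_in->get('Content-Length'); # undef if chunked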

Second, if there's no Content-Length header then how does one know how much
data to read using $r->read?

One answer is until $r->read returns zero bytes, of course. But, is
that guaranteed to always be the case, even for, say, pipelined requests?
My guess is yes because whatever is de-chunking the request knows to stop
after reading the last chunk, trailer and empty line. Can
anyone elaborate on how Apache/mod_perl is doing this?


Perhaps I'm approaching this incorrectly, but this is all a bit untidy.

I'm using Catalyst and Catalyst needs a Content-Length. So, I have a Plack
Middleware component that creates a temporary file writing the buffer from
$r->read( my $buffer, 64 * 1024 ) until that returns zero bytes. I pass
this file handle on to Catalyst.
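
A rough sketch of that middleware loop (the File::Temp usage here is
illustrative, not the actual component):

    use File::Temp ();

    # Spool the already de-chunked body to disk and count the bytes so
    # a Content-Length can be handed to Catalyst.
    my $fh  = File::Temp->new;
    my $len = 0;
    while ( ( my $got = $r->read( my $buffer, 64 * 1024 ) ) > 0 ) {
        print {$fh} $buffer;
        $len += $got;
    }
    seek $fh, 0, 0;    # rewind before passing the handle on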

Then, for some content-types, Catalyst (via HTTP::Body) writes the body to *
another* temp file. I don't know how Apache/mod_perl does its
de-chunking, but I can call $r->read with a huge buffer length and Apache
returns that. So, maybe Apache is buffering to disk, too.

In other words, for each tiny chunked JSON POST or PUT I'm creating two (or
three?) temp files which doesn't seem ideal.


--
Bill Moseley
moseley [at] hank


jschueler at eloquency

Jul 3, 2013, 11:34 AM

Post #2 of 15
Re: mod_perl and Transfer-Encoding: chunked

I played around with chunking recently in the context of media streaming:
the client is only requesting a "chunk" of data. "Chunking" is how media
players perform a "seek". It was originally implemented for FTP
transfers: e.g., to transfer a large file in (say) 10K chunks. In the
case that you describe below, if no Content-Length is specified, that
indicates "send the remainder".

From what I know, a "chunk" request header is used this way to specify the
server response. It does not reflect anything about the data included in
the body of the request. So first, I would ask if you're confused about
this request information.

Hypothetically, some browsers might try to upload large files in small
chunks and the "chunk" header might reflect a push transfer. I don't know
if "chunk" is ever used for this purpose. But it would require the
following characteristics:

1. The browser would need to originally inquire if the server is
capable of this type of request.
2. Each chunk of data will arrive in a separate and independent HTTP
request. Not necessarily in the order they were sent.
3. Two or more requests may be handled by separate processes
simultaneously that can't be written into a single destination.
4. Somehow the server needs to request a resend if a chunk is missing.
Solving this problem requires an imaginative use of HTTP.

Sounds messy. But might be appropriate for 100M+ sized uploads. This
*may* reflect your situation. Can you please confirm?

For a single process, the incoming content-length is unnecessary. Buffered
I/O automatically knows when transmission is complete. The read()
argument is the buffer size, not the content length. Whether you spool
the buffer to disk or simply enlarge the buffer should be determined by
your hardware capabilities. This is standard I/O behavior that has nothing
to do with HTTP chunking. Without a "Content-Length" header, after looping
your read() operation, determine the length of the aggregate data and pass
that to Catalyst.
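
As a sketch, assuming $r->read behaves as described, that loop might look
like:

    my $body = '';
    while ( $r->read( my $buffer, 8192 ) ) {
        $body .= $buffer;                  # aggregate the whole body
    }
    my $content_length = length $body;     # hand this to Catalyst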

But if you're confident that the complete request spans several smaller
(chunked) HTTP requests, you'll need to address all the problems I've
described above, plus the problem of re-assembling the whole thing for
Catalyst. I don't know anything about Plack, maybe it can perform all
this required magic.

Otherwise, if the whole purpose of the Plack temporary file is to pass a
file handle, you can pass a buffer as a file handle. That used to require
IO::String, but now the functionality is built into the core.
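
For example, the core in-memory handle (Perl 5.8+) can stand in for
IO::String like so ($accumulated_buffer is a placeholder name):

    my $body = $accumulated_buffer;        # whatever read() built up
    open my $fh, '<', \$body
        or die "in-memory open failed: $!";
    # $fh now reads from $body like an ordinary file handle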

By your last paragraph, I'm really lost. Since you're already passing the
request as a file handle, I'm guessing that Catalyst creates the
temporary file for the *response* body. Can you please clarify? Also,
what do you mean by "de-chunking"? Is that the same thing as
re-assembling?

Wish I could give a better answer. Let me know if this helps.

-Jim


On Tue, 2 Jul 2013, Bill Moseley wrote:

> [...]

moseley at hank

Jul 3, 2013, 11:44 AM

Post #3 of 15
Re: mod_perl and Transfer-Encoding: chunked

Hi Jim,

This is the Transfer-Encoding: chunked I was writing about:

http://tools.ietf.org/html/rfc2616#section-3.6.1



On Wed, Jul 3, 2013 at 11:34 AM, Jim Schueler <jschueler [at] eloquency> wrote:

> [...]

--
Bill Moseley
moseley [at] hank


joe_schaefer at yahoo

Jul 3, 2013, 11:53 AM

Post #4 of 15
Re: mod_perl and Transfer-Encoding: chunked

When you read from the input filter chain as $r->read does, the HTTP input filter automatically handles the protocol and passes the de-chunked data up to the caller. It does not spool the stream at all.

You'd have to look at how mod_perl implements read() to see if it loops its ap_get_brigade calls on the input filter chain to fill the passed buffer to the desired length or not. But under no circumstances should you have to deal with chunked data directly.
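
Either way, a caller can stay agnostic by looping itself; a defensive
sketch (read_exactly is an illustrative name, not mod_perl API):

    sub read_exactly {
        my ( $r, $want ) = @_;
        my $data = '';
        while ( $want > 0 ) {
            my $got = $r->read( my $buf, $want ) or last;  # 0 = end of body
            $data .= $buf;
            $want -= $got;
        }
        return $data;  # may be shorter than $want if the body ended early
    }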

HTH

Sent from my iPhone

On Jul 3, 2013, at 2:44 PM, Bill Moseley <moseley [at] hank> wrote:

> [...]

jschueler at eloquency

Jul 3, 2013, 1:26 PM

Post #5 of 15
Re: mod_perl and Transfer-Encoding: chunked

Thanks for the prompt response, but this is your question, not mine. I
hardly need an RTFM for my trouble.

I drew my conclusions using a packet sniffer. And as far-fetched as my
answer may seem, it's more plausible than your theory that Apache or
mod_perl is decoding a raw socket stream.

The crux of your question seems to be how the request content gets
magically re-assembled. I don't think it was ever disassembled in the
first place. But if you don't like my answer, and you don't want to
ignore it either, then please restate the question. I can't find any
definition for unchunked, and Wiktionary's definition of de-chunk says to
"break apart a chunk", that is (counter-intuitively) chunk a chunk.


> Second, if there's no Content-Length header then how
> does one know how much
> data to read using $r->read?   
>
> One answer is until $r->read returns zero bytes, of
> course.  But, is
> that guaranteed to always be the case, even for,
> say, pipelined requests?  
> My guess is yes because whatever is de-chunking the

read() is blocking. So it never returns 0, even in a pipelined request (if
no data is available, it simply waits). I don't wish to discuss the
merits here, but there is no technical imperative for a Content-Length
field in the request header.

-Jim


On Wed, 3 Jul 2013, Bill Moseley wrote:

> [...]

jschueler at eloquency

Jul 3, 2013, 1:31 PM

Post #6 of 15
Re: mod_perl and Transfer-Encoding: chunked

In light of Joe Schaefer's response, I appear to be outgunned. So, if
nothing else, can someone please clarify whether "de-chunked" means
re-assembled?

-Jim

On Wed, 3 Jul 2013, Jim Schueler wrote:

> [...]

trawick at gmail

Jul 3, 2013, 1:41 PM

Post #7 of 15
Re: mod_perl and Transfer-Encoding: chunked

On Wed, Jul 3, 2013 at 4:31 PM, Jim Schueler <jschueler [at] eloquency> wrote:

> In light of Joe Schaefer's response, I appear to be outgunned. So, if
> nothing else, can someone please clarify whether "de-chunked" means
> re-assembled?


Yes, where re-assembled means converted back to the original data stream
without any transfer encoding applied.



--
Born in Roswell... married an alien...
http://emptyhammock.com/


joe_schaefer at yahoo

Jul 3, 2013, 1:42 PM

Post #8 of 15
Re: mod_perl and Transfer-Encoding: chunked

De-chunked means the filter strips out the lines containing metadata about the next block of raw data. The metadata is just the length of the next block of data. Think of a chunked stream as having partial Content-Length headers embedded in the data stream.
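
For illustration, per RFC 2616 the wire format looks like this (CRLFs
shown literally; a zero-size chunk ends the body):

    1A\r\n                           <- next chunk is 0x1A = 26 bytes
    {"name":"chunked example"}\r\n
    0\r\n                            <- zero-size chunk: end of body
    \r\n                             <- end of the (optional) trailer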

The HTTP filter embedded in httpd takes care of the metadata so you don't have to parse the stream yourself. $r->read will always provide only the raw data in a blocking call, until the stream is finished, in which case it should return 0 or an error code. Check the mod_perl docs, or better the source, to see if the semantics are more like Perl's sysread or more like read.

Sent from my iPhone

On Jul 3, 2013, at 4:31 PM, Jim Schueler <jschueler [at] eloquency> wrote:

> [...]

margol at beamartyr

Jul 4, 2013, 1:37 AM

Post #9 of 15
Re: mod_perl and Transfer-Encoding: chunked

On 03/07/2013 21:53, Joseph Schaefer wrote:
> When you read from the input filter chain as $r->read does, the HTTP
> input filter automatically handles the protocol and passes the de-chunked
> data up to the caller. It does not spool the stream at all.
>
> You'd have to look at how mod_perl implements read() to see if it loops
> its ap_get_brigade calls on the input filter chain to fill the passed
> buffer to the desired length or not. But under no circumstances should
> you have to deal with chunked data directly.

I'm pretty sure that it's not even a mod_perl thing. IIRC, httpd itself
sticks a chunk/de-chunk filter near the respective ends of the filter
chain. So if you can't find the code in mod_perl land, you might want
to check the httpd source.


margol at beamartyr

Jul 4, 2013, 1:40 AM

Post #10 of 15
Re: mod_perl and Transfer-Encoding: chunked

On 03/07/2013 23:26, Jim Schueler wrote:

>
>> Second, if there's no Content-Length header then how
>> does one know how much
>> data to read using $r->read?
>>
>> One answer is until $r->read returns zero bytes, of
>> course. But, is
>> that guaranteed to always be the case, even for,
>> say, pipelined requests?
>> My guess is yes because whatever is de-chunking the
>
> read() is blocking. So it never returns 0, even in a pipelined request
> (if no data is available, it simply waits). I don't wish to discuss the
> merits here, but there is no technical imperative for a Content-Length
> field in the request header.
>
> -Jim

Probably. If you, for some reason, were doing the chunking work
yourself, each chunk begins with a size line saying how many bytes of
data follow it, so you'd know what size read to do.
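
If you did have to de-chunk by hand, the logic would be roughly this
(a sketch only, assuming $sock is a handle on the raw stream):

    my $body = '';
    while ( defined( my $size_line = <$sock> ) ) {
        $size_line =~ s/\r?\n\z//;
        $size_line =~ s/;.*//;          # drop any chunk extensions
        my $size = hex $size_line;
        last if $size == 0;             # zero-size chunk ends the body
        read $sock, my $chunk, $size;
        $body .= $chunk;
        scalar <$sock>;                 # eat the CRLF after the chunk data
    }
    # a real implementation would also read trailers and handle short reads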


margol at beamartyr

Jul 4, 2013, 1:41 AM

Post #11 of 15 (242 views)
Permalink
Re: mod_perl and Transfer-Encoding: chunked [In reply to]

On 03/07/2013 23:42, Joseph Schaefer wrote:
> Dechunked means it strips out the lines containing metadata about the next block of raw data. The metadata is just the length of the next block of data. Imagine a chunked stream is like having partial content length headers embedded in the data stream.
>
> The http filter embedded in httpd takes care of the metadata so you don't have to parse the stream yourself. $r->read will always provide only the raw data in a blocking call, until the stream is finished in which case it should return 0 or an error code. Check the mod perl docs, or better the source, to see if the semantics are more like perl's sysread or more like read.
>

Yep. That makes sense to me too - it's just not what I read in your
previous email, but maybe I read it wrong :)



aw at ice-sa

Jul 4, 2013, 4:06 AM

Post #12 of 15 (242 views)
Permalink
Re: mod_perl and Transfer-Encoding: chunked [In reply to]

Not disregarding the other answers to your questions, but I believe one
aspect may have been neglected here.

Bill Moseley wrote:
> For requests that are chunked (Transfer-Encoding: chunked and no
> Content-Length header) calling $r->read returns *unchunked* data from the
> socket.
>
> That's indeed handy. Is that mod_perl doing that un-chunking or is it
> Apache?
>
> But, it leads to some questions.
>
> First, if $r->read reads unchunked data then why is there a
> Transfer-Encoding header saying that the content is chunked? Shouldn't
> that header be removed? How does one know if the content is chunked or
> not, otherwise?

The real question is: does one need to know?

The transfer-coding is something that even an intermediate HTTP proxy may
be allowed to change, for reasons to do with transport of the request along a section of
the network path.
It should be entirely transparent to the application receiving the data.

>
> Second, if there's no Content-Length header then how does one know how much
> data to read using $r->read?
>
> One answer is until $r->read returns zero bytes, of course.

Indeed. That means that the end of *this* request body has been encountered.
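
For example, a minimal read loop along those lines (a sketch only; the
64k buffer size is arbitrary):

    # Keep reading the (already de-chunked) body of *this* request
    # until $r->read signals the end by returning 0 bytes.
    my $length = 0;
    while (my $read = $r->read(my $buffer, 64 * 1024)) {
        $length += $read;
        # ... process or spool $buffer here ...
    }
    # $length now holds the effective length of the body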

> But, is
> that guaranteed to always be the case, even for, say, pipelined requests?

It should be, because $r concerns the present request being processed.
If there is another request pipelined onto that same connection, it is a separate request
and a different $r.

> My guess is yes because whatever is de-chunking the request knows to stop
> after reading the last chunk, trailer and empty line. Can
> anyone elaborate on how Apache/mod_perl is doing this?
>

I can't really, but it should be done by something at some fairly low level. It should be
the *first* thing which happens to the request body, before any request-level body access
is allowed.
(Similarly, at the response level, "chunking" a response body should be the last thing
happening before the response is put on the wire.)

>
> Perhaps I'm approaching this incorrectly, but this is all a bit untidy.
>
> I'm using Catalyst and Catalyst needs a Content-Length.

I would posit then that Catalyst is wrong (or not compatible with HTTP 1.1 in that respect).

> So, I have a Plack
> Middleware component that creates a temporary file writing the buffer from
> $r->read( my $buffer, 64 * 1024 ) until that returns zero bytes. I pass
> this file handle onto Catalyst.
>

So what you wrote then is a patch to Catalyst.

> Then, for some content-types, Catalyst (via HTTP::Body) writes the body to *
> another* temp file. I don't know how Apache/mod_perl does its
> de-chunking, but I can call $r->read with a huge buffer length and Apache
> returns that. So, maybe Apache is buffering to disk, too.
>
> In other words, for each tiny chunked JSON POST or PUT I'm creating two (or
> three?) temp files which doesn't seem ideal.
>
>

I realise that my comments above don't really help you in your specific predicament, but I
just felt that it was good to put things back in their place, particularly that at the $r
(request) level, you should not have to know if the request came in chunked or not.
And that if a client sends a request with a chunked body, you are not necessarily getting
it so on the server on which the application runs. And vice-versa.


moseley at hank

Jul 4, 2013, 8:48 AM

Post #13 of 15 (242 views)
Permalink
Re: mod_perl and Transfer-Encoding: chunked [In reply to]

André, thanks for the response:

On Thu, Jul 4, 2013 at 4:06 AM, André Warnier <aw [at] ice-sa> wrote:

>
> Bill Moseley wrote:
>
>>
>> First, if $r->read reads unchunked data then why is there a
>> Transfer-Encoding header saying that the content is chunked? Shouldn't
>> that header be removed?
>>
>
Looking at the RFC again, the answer appears to be yes. Look at the last
line of the decoding example in
http://tools.ietf.org/html/rfc2616#section-19.4.6

A process for decoding the "chunked" transfer-coding (section 3.6,
http://tools.ietf.org/html/rfc2616#section-3.6) can be represented in
pseudo-code as:

    length := 0
    read chunk-size, chunk-extension (if any) and CRLF
    while (chunk-size > 0) {
        read chunk-data and CRLF
        append chunk-data to entity-body
        length := length + chunk-size
        read chunk-size and CRLF
    }
    read entity-header
    while (entity-header not empty) {
        append entity-header to existing header fields
        read entity-header
    }
    Content-Length := length
    Remove "chunked" from Transfer-Encoding


Apache/mod_perl is doing the first part but not updating the headers.
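
To make the pseudo-code concrete, here is a rough Perl rendering of the
same loop, reading from a generic filehandle. It is only an illustration
of the algorithm (no size limits, minimal error handling), not how
httpd's HTTP_IN filter actually implements it:

    use strict;
    use warnings;

    # De-chunk a request body read from $fh; returns the entity-body.
    sub dechunk {
        my ($fh) = @_;
        local $/ = "\r\n";
        my $body = '';
        while (defined(my $line = <$fh>)) {
            $line =~ s/\r\n\z//;
            $line =~ s/;.*//;            # drop any chunk-extension
            my $size = hex $line;        # chunk-size is hexadecimal
            last if $size == 0;          # last-chunk reached
            read($fh, my $chunk, $size) == $size
                or die "short read in chunk";
            $body .= $chunk;
            my $crlf = <$fh>;            # discard CRLF after chunk-data
        }
        # consume trailer headers up to the terminating empty line
        while (defined(my $trailer = <$fh>)) {
            last if $trailer eq "\r\n";
        }
        return $body;
    }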

There's more on Content-Length and Transfer-Encoding here:
http://tools.ietf.org/html/rfc2616#section-4.4



>> How does one know if the content is chunked or not, otherwise?
>
> The real question is: does one need to know?
>

Perhaps. That's an interesting question. Applications probably don't
need to care. They should receive the body -- so for mod_perl that means
reading data using $r->read until there's no more to read and then the app
should never need to look at the Transfer-Encoding header -- or
Content-Length header for that matter by that reasoning.

It's a bit less clear if you think about Plack. It sits between web
servers and applications. What should, say, a Plack Middleware component
see in the body if the headers say Transfer-Encoding: chunked? The
decoding probably should happen in the server
(https://github.com/plack/Plack/issues/404#issuecomment-18124054),
but the headers would need to indicate that by removing the
Transfer-Encoding header and adding in the Content-Length.
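
For what it's worth, the middleware I described is roughly the following
sketch. The package name is made up, and whether Transfer-Encoding shows
up in the PSGI env as HTTP_TRANSFER_ENCODING depends on the server:

    package My::Middleware::DechunkInput;    # hypothetical name
    use strict;
    use warnings;
    use parent 'Plack::Middleware';
    use File::Temp ();

    # Spool a body with no Content-Length to a temp file so downstream
    # code (e.g. Catalyst) sees a plain body plus a Content-Length.
    sub call {
        my ($self, $env) = @_;

        if (!defined $env->{CONTENT_LENGTH}
            && ($env->{HTTP_TRANSFER_ENCODING} // '') =~ /\bchunked\b/i) {

            my $fh     = File::Temp->new;
            my $input  = $env->{'psgi.input'};
            my $length = 0;
            while (my $read = $input->read(my $buffer, 64 * 1024)) {
                print {$fh} $buffer;
                $length += $read;
            }
            seek $fh, 0, 0;

            $env->{'psgi.input'}   = $fh;
            $env->{CONTENT_LENGTH} = $length;
            delete $env->{HTTP_TRANSFER_ENCODING};
        }

        $self->app->($env);
    }

    1;

That keeps exactly one temp file at this layer; it does nothing about
HTTP::Body creating its own, of course.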


>> Perhaps I'm approaching this incorrectly, but this is all a bit untidy.
>>
>> I'm using Catalyst and Catalyst needs a Content-Length.
>>
>
> I would posit then that Catalyst is wrong (or not compatible with HTTP 1.1
> in that respect).


But, Catalyst is a web application (framework) and from your point above it
should not care about the encoding and just read the input stream by
calling ->read(). Really, if you think about Plack, Catalyst should never
make exceptions based on $ENV{MOD_PERL}.

So, the separation of concerns between the web server and the app is not
very clean.



>> So, I have a Plack
>
>> Middleware component that creates a temporary file writing the buffer from
>> $r->read( my $buffer, 64 * 1024 ) until that returns zero bytes. I pass
>> this file handle onto Catalyst.
>>
>>
> So what you wrote then is a patch to Catalyst.
>

No, the Middleware component should be usable for any application. And
likewise, for any web server. That's the point of Plack.

Obviously, there are differences between web servers and maybe we need code
that understands when running under mod_perl that the Transfer-Encoding:
chunked header should be ignored, but if that code must live in Catalyst
then that's really breaking the separation that Plack provides.

I think the sane thing here is if Apache/mod_perl didn't provide a header
saying the body is chunked, when it isn't. Otherwise, code (Plack, web
apps) that receives a set of headers and a handle to read from doesn't
really have any choice but to believe what it is told.



--
Bill Moseley
moseley [at] hank


aw at ice-sa

Jul 4, 2013, 9:48 AM

Post #14 of 15 (242 views)
Permalink
Re: mod_perl and Transfer-Encoding: chunked [In reply to]

Bill Moseley wrote:
[...]
> No, the Middleware component should be usable for any application. And
> likewise, for any web server. That's the point of Plack.
[...]
> I think the sane thing here is if Apache/mod_perl didn't provide a header
> saying the body is chunked, when it isn't. Otherwise, code (Plack, web
> apps) that receives a set of headers and a handle to read from doesn't
> really have any choice but to believe what it is told.

I can see your point, but to me it depends at which level this add-on code "lives".
I do not know Plack or Catalyst, and do not know at which level each of them is supposed
to "live".
But to me, if the code lives at the "web-app" level, at that point it should just consider
the request body as one piece or stream, without intervening "chunk headers".
(And it should treat the Transfer-Encoding header as informational only, maybe to
know that it should not expect a Content-Length header, and that it can only know the body
length by reading it.)

It is different in the case of a mod_perl "connection filter". That one really sees the
stream of bytes coming from the browser: request line, headers, body chunked or not, etc.
(And it should see several requests pipelined on the same connection, one after the other,
as one stream of bytes, without any particular break between them other than what it can
figure out itself.)

But even a "request filter" (which comes before a web-app) should see the request body as
already "de-chunked" (re-assembled).

See here for example :
http://perl.apache.org/docs/2.0/user/handlers/filters.html#HTTP_Request_Versus_Connection_Filters

which I got to starting from here :
http://perl.apache.org/docs/2.0/user/handlers/protocols.html
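
As an illustration of that last point, here is a trivial request-level
input filter (the name is invented; a sketch only, along the lines of
the streaming filter examples in those docs) that just counts the body
bytes passing through. It will see them already de-chunked:

    package MyApp::CountBodyFilter;   # invented name
    use strict;
    use warnings;
    use base qw(Apache2::Filter);
    use Apache2::Const -compile => qw(OK);

    sub handler : FilterRequestHandler {
        my $f = shift;
        my $total = $f->ctx || 0;
        # $f->read returns de-chunked body data here; the chunk
        # metadata was already consumed at a lower level.
        while ($f->read(my $buffer, 8192)) {
            $total += length $buffer;
            $f->print($buffer);       # pass the data through unchanged
        }
        $f->ctx($total);
        return Apache2::Const::OK;
    }

    1;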


torsten.foertsch at gmx

Jul 5, 2013, 5:51 AM

Post #15 of 15 (226 views)
Permalink
Re: mod_perl and Transfer-Encoding: chunked [In reply to]

On 04/07/13 17:48, Bill Moseley wrote:
> Applications probably don't
> need to care. They should receive the body -- so for mod_perl that means
> reading data using $r->read until there's no more to read and then the app
> should never need to look at the Transfer-Encoding header -- or
> Content-Length header for that matter by that reasoning.

Modperl simply makes (most of) the httpd interface available to perl. I
see no reason to remove the TE header in modperl. If httpd decides to do
that, so be it. Modperl must not.

$r->read is simply a convenience layer on top of bucket brigades. If you
don't like $r->read, fetch the buckets from the input filter chain.

But I see that the documentation of $r->read could be improved.

Note, it's also safer to use bucket brigades directly unless you know
your input filters quite well. Have a look at the XXX comment in
modperl_request_read in modperl_io_apache.c. Normally, httpd tries to
read its input in 8k chunks. So, providing a buffer of 10k to $r->read
should be enough. Though, you don't have to provide the actual space.
$r->read expands the provided buffer as necessary. Only make sure not to
pass a length parameter that is too small.

Actually, I think we should either ignore the length parameter and pass
the flattened brigade to the caller or we should introduce some kind of
buffering to remove that XXX bug.

I'd prefer the former.

Opinions?

Torsten
