Mailing List Archive: Varnish: Dev

[PATCH] Normalizing the Host: header

 

 



nils.goroll at uplex

Feb 17, 2011, 4:55 AM

Post #1 of 20 (2259 views)
[PATCH] Normalizing the Host: header

Hi,

we were discussing this at VUG3: comparisons on the Host: header should be case
insensitive. Reflecting on this, I think that normalizing the Host: header in
Varnish would actually be the better idea and should avoid common errors.

Nils

--

** * * UPLEX - Nils Goroll Systemoptimierung

Schwanenwik 24
22087 Hamburg

tel +49 40 28805731
mob +49 170 2723133
fax +49 40 42949753

http://uplex.de/
Attachments: 0001-Normalize-the-Host-header-according-to-the-recommen.patch (5.05 KB)
  signature.asc (0.25 KB)


phk at phk

Feb 17, 2011, 9:32 AM

Post #2 of 20 (2230 views)
Re: [PATCH] Normalizing the Host: header [In reply to]

In message <4D5D1AC9.3000507 [at] uplex>, Nils Goroll writes:

>we were discussing this at VUG3: comparisons on the Host: header should be case
>insensitive. Reflecting on this, I think that normalizing the Host: header in
>Varnish would actually be the better idea and should avoid common errors.

What a great idea for a VMOD :-)

But this is exactly the kind of needless text-processing we should avoid
if we want to be the fastest cache on the planet: a regexp or a strncasecmp()
is not measurably slower than their case-sensitive parallels and if you
don't need to inspect the host-header at all, case-folding it is pure
wasted effort.
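
For illustration, a minimal C sketch of the kind of comparison meant here:
the Host value is compared case-insensitively at the point of use, and the
header itself is never rewritten. The helper name host_equals() is made up
for this example and is not part of Varnish; strcasecmp() is the POSIX
function from <strings.h>.

```c
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>   /* strcasecmp() (POSIX) */

/*
 * Hypothetical helper, not Varnish code: compare a Host header value
 * against an expected name, ignoring ASCII case. The header itself is
 * left untouched, so nothing is spent on requests that never look at it.
 */
static bool
host_equals(const char *host, const char *expected)
{
	if (host == NULL || expected == NULL)
		return (false);
	return (strcasecmp(host, expected) == 0);
}
```

A call such as host_equals("WWW.Example.COM", "www.example.com") is
expected to return true, while a request that never inspects the header
never pays for any folding.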

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk [at] FreeBSD | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

_______________________________________________
varnish-dev mailing list
varnish-dev [at] varnish-cache
http://www.varnish-cache.org/lists/mailman/listinfo/varnish-dev


nils.goroll at uplex

Feb 17, 2011, 10:03 AM

Post #3 of 20 (2231 views)
Re: [PATCH] Normalizing the Host: header [In reply to]

Hi phk,

> But this is exactly the kind of needless text-processing we should avoid

In general: Absolutely, yes.

In this case, normalizing once should pay off wherever the host header needs to
be checked at least once.

> a regexp or a strncasecmp()

We don't have strncasecmp() in VCL at this point, so we need to compare
performance of pcre_exec() and strcmp().

Besides this, my main motivation for this suggestion was to avoid wrong host
header comparisons for all of those who have not spotted the right place in the
docs.

If this suggestion is not found useful, I think we should at least fix all the
wrong examples for host header comparison in the docs.

Thanks, Nils



phk at phk

Feb 17, 2011, 10:47 AM

Post #4 of 20 (2232 views)
Re: [PATCH] Normalizing the Host: header [In reply to]

In message <4D5D6308.7000304 [at] uplex>, Nils Goroll writes:

>> a regexp or a strncasecmp()
>
>We don't have strncasecmp() in VCL at this point, so we need to compare
>performance of pcre_exec() and strcmp().

Maybe the right solution is to make sure we do have it.

>If this suggestion is not found useful, I think we should at least fix all
>the wrong examples for host header comparison in the docs.

It is not that it is not useful, it's just not the right way to fix it.

And yes, the docs should be correct.



perbu at varnish-software

Feb 17, 2011, 11:43 AM

Post #5 of 20 (2227 views)
Re: [PATCH] Normalizing the Host: header [In reply to]

On Thu, Feb 17, 2011 at 7:47 PM, Poul-Henning Kamp <phk [at] phk> wrote:
>
> It is not that it is not useful, it's just not the right way to fix it.
>
> And yes, the docs should be correct.

I've looked through "man vcl" and I can't spot any errors. Any hints on
where to look for these errors?


--
Per Buer, Varnish Software
Phone: +47 21 98 92 61 / Mobile: +47 958 39 117 / Skype: per.buer
Varnish makes websites fly!
Want to learn more about Varnish? http://www.varnish-software.com/whitepapers



nils.goroll at uplex

Feb 17, 2011, 11:50 AM

Post #6 of 20 (2227 views)
Re: [PATCH] Normalizing the Host: header [In reply to]

> I've looked through "man vcl" and I can't spot any errors. Any hints on
> where to look for these errors?

There are many examples for matches on the host header like

  if (req.http.host ~ "^(www.)?example.com$") {
    set req.backend = www;
  }

To accept mixed case host headers, examples for matches on http.host should all
use the (?i) option setting.
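
To make the difference concrete, here is a small C sketch. It is only an
approximation of what VCL does: Varnish matches with PCRE, where "(?i)" at
the start of the pattern switches on case-insensitive matching, whereas this
sketch uses the POSIX <regex.h> functions with REG_ICASE as a stand-in. The
function name host_matches() and its ignore_case parameter are invented for
the example.

```c
#include <regex.h>     /* POSIX regcomp(), regexec(), regfree() */
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not Varnish code. Varnish matches with PCRE, where
 * "(?i)" at the start of the pattern turns on case-insensitive matching;
 * POSIX regex has no inline option, so REG_ICASE plays that role here.
 */
static bool
host_matches(const char *host, bool ignore_case)
{
	/* Same pattern as the docs example, with the dots escaped. */
	const char *pattern = "^(www\\.)?example\\.com$";
	regex_t re;
	int flags = REG_EXTENDED | REG_NOSUB;
	bool matched;

	if (ignore_case)
		flags |= REG_ICASE;
	if (regcomp(&re, pattern, flags) != 0)
		return (false);
	matched = (regexec(&re, host, 0, NULL, 0) == 0);
	regfree(&re);
	return (matched);
}
```

With ignore_case set to false, a request for "WWW.Example.COM" falls through
the match, which is the behaviour of the current examples in the docs.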

Nils



nils.goroll at uplex

Feb 17, 2011, 11:52 AM

Post #7 of 20 (2231 views)
Re: [PATCH] Normalizing the Host: header [In reply to]

Hope to have caught them all:

haggis:~/Devel/varnish-git$ find . -name \*.rst | xargs egrep 'req.http.host *~' | grep -v '\(\?i\)'
./varnish-cache/doc/sphinx/faq/general.rst: if (req.http.host ~ "^(www.)?example.com") {
./varnish-cache/doc/sphinx/faq/general.rst: if (req.http.host ~ "^(www.)?example.com") {
./varnish-cache/doc/sphinx/tutorial/increasing_your_hitrate.rst: if (req.http.host ~ "^(www.)?varnish-?software.com") {
./varnish-cache/doc/sphinx/reference/vcl.rst: if (req.http.host ~ "^(www.)?example.com$") {
./varnish-cache/doc/sphinx/reference/vcl.rst: if (req.http.host ~ "example.com") {
./varnish-cache/doc/sphinx/reference/vcl.rst: } elsif (req.http.host ~ "example.org") {
./varnish-cache/doc/sphinx/reference/vcl.rst: if (req.http.host ~ "^(www.)?example.com$") {
./varnish-cache/doc/sphinx/reference/vcl.rst: if (req.http.host ~ "^(www.)?example.com$") {
./varnish-cache/doc/sphinx/reference/vcl.rst: } elsif (req.http.host ~ "^images.example.com$") {
./varnish-cache/doc/sphinx/reference/varnishd.rst: req.http.host ~ "^(www\.)example.com$" && obj.set-cookie ~ "USERID=1663"



perbu at varnish-software

Feb 17, 2011, 11:53 AM

Post #8 of 20 (2236 views)
Re: [PATCH] Normalizing the Host: header [In reply to]

On Thu, Feb 17, 2011 at 8:50 PM, Nils Goroll <nils.goroll [at] uplex> wrote:
>
>> I've looked through "man vcl" and I can't spot any errors. Any hints on
>> where to look for these errors?
>
> There are many examples for matches on the host header like
>
>  if (req.http.host ~ "^(www.)?example.com$") {
>    set req.backend = www;
>  }
>
> To accept mixed case host headers, examples for matches on http.host should all
> use the (?i) option setting.

Right. Of course. Silly me. I'll get to work.



perbu at varnish-software

Feb 17, 2011, 12:04 PM

Post #9 of 20 (2230 views)
Re: [PATCH] Normalizing the Host: header [In reply to]

On Thu, Feb 17, 2011 at 8:53 PM, Per Buer <perbu [at] varnish-software> wrote:
> On Thu, Feb 17, 2011 at 8:50 PM, Nils Goroll <nils.goroll [at] uplex> wrote:

>> To accept mixed case host headers, examples for matches on http.host should all
>> use the (?i) option setting.
>
> Right. Of course. Silly me. I'll get to work.

Do you think this bears relevance for any other header treatment or
just the Host: header?



nils.goroll at uplex

Feb 17, 2011, 12:55 PM

Post #10 of 20 (2230 views)
Re: [PATCH] Normalizing the Host: header [In reply to]

> Do you think this bears relevance for any other header treatment or
> just the Host: header?

Anything which can contain a host name would apply.

Location comes to mind, but that is a response header which only needs to
get matched, and it will most likely be generated by a backend that the Varnish
admin has under control.

Of the request headers, Referer is probably matched sometimes.

Nils

P.S.: I just noticed that Facebook is actually among the clients sending
non-normalized Host headers (assuming I can trust the UA).



nils.goroll at uplex

Feb 18, 2011, 2:07 AM

Post #11 of 20 (2229 views)
Re: [PATCH] Normalizing the Host: header [In reply to]

> It is not that it is not useful, it's just not the right way to fix it.

Shouldn't we normalize the host header anyway to maximize cache hit rates?

Nils



phk at phk

Feb 18, 2011, 2:26 AM

Post #12 of 20 (2225 views)
Re: [PATCH] Normalizing the Host: header [In reply to]

In message <4D5E44FF.5060302 [at] uplex>, Nils Goroll writes:

>Shouldn't we normalize the host header anyway to maximize cache hit rates ?

Good question.

List consensus ?



phk at phk

Feb 18, 2011, 2:27 AM

Post #13 of 20 (2236 views)
Re: [PATCH] Normalizing the Host: header [In reply to]

In message <4D5E44FF.5060302 [at] uplex>, Nils Goroll writes:

>Shouldn't we normalize the host header anyway to maximize cache hit rates ?

Relevant question in this context: Do we know how to ?

Remember that DNS names can contain ideograms in Asia these days...



tfheen at varnish-software

Feb 18, 2011, 4:32 AM

Post #14 of 20 (2236 views)
Re: [PATCH] Normalizing the Host: header [In reply to]

]] "Poul-Henning Kamp"

| In message <4D5E44FF.5060302 [at] uplex>, Nils Goroll writes:
|
| >Shouldn't we normalize the host header anyway to maximize cache hit rates ?
|
| Relevant question in this context: Do we know how to ?

Normalise usually means rewriting from a list of various legal names to
the canonical name, so in the general case: no.

| Remember that DNS names can contain ideograms in Asia these days...

They're still just ASCII under the hood, escaped by xn-- and the crazy
IDN scheme.

--
Tollef Fog Heen
Varnish Software
t: +47 21 98 92 64



allan_wind at lifeintegrity

Feb 19, 2011, 9:09 AM

Post #15 of 20 (2194 views)
Re: [PATCH] Normalizing the Host: header [In reply to]

On 2011-02-18 10:26:20, Poul-Henning Kamp wrote:
> In message <4D5E44FF.5060302 [at] uplex>, Nils Goroll writes:
>
> >Shouldn't we normalize the host header anyway to maximize cache hit rates ?
>
> Good question.
>
> List consensus ?

Makes sense to me.

Is there a reasonable use case for non-normalized host headers
(without the ignore case option on regex)?


/Allan
--
Allan Wind
Life Integrity, LLC
<http://lifeintegrity.com>



slink at schokola

Feb 21, 2011, 8:09 AM

Post #16 of 20 (2189 views)
Re: [PATCH] Normalizing the Host: header [In reply to]

On 02/18/11 01:32 PM, Tollef Fog Heen wrote:
> | >Shouldn't we normalize the host header anyway to maximize cache hit rates ?
> | Relevant question in this context: Do we know how to ?
>
> Normalise usually means rewriting from a list of various legal names to
> the canonical name, so in the general case: no.

The normalization you are referring to is site-specific, yes.

The generic normalization according to RFC 3986 implies case folding: upper
case for the hex digits of percent-encodings (toupper()) and lower case for all
other characters (tolower()).

This would help maximize cache efficiency whenever Host headers are not
normalized to some (set of) site-specific const value(s).
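
As a rough sketch of the generic folding described above (this is not the
code from the attached patch, and the function name normalize_host() is made
up for illustration), the whole operation can be done in place in a single
pass:

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper, not the code from the attached patch: fold a host
 * string in place. Hex digits of a well-formed "%XX" triplet are
 * upper-cased, every other character is lower-cased. Only ASCII is
 * affected as long as the "C" locale is in effect.
 */
static void
normalize_host(char *host)
{
	unsigned char *p = (unsigned char *)host;

	if (p == NULL)
		return;
	while (*p != '\0') {
		if (p[0] == '%' && isxdigit(p[1]) && isxdigit(p[2])) {
			p[1] = (unsigned char)toupper(p[1]);
			p[2] = (unsigned char)toupper(p[2]);
			p += 3;
		} else {
			*p = (unsigned char)tolower(*p);
			p++;
		}
	}
}
```

Because the string never grows or shrinks, no extra workspace is needed. A
stray "%" that is not followed by two hex digits is treated as an ordinary
character in this sketch.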

Nils



slink at schokola

Mar 11, 2011, 11:38 AM

Post #17 of 20 (2114 views)
Re: [PATCH] Normalizing the Host: header [In reply to]

Hi phk,

can we come to a conclusion of this discussion?

My two cents:

- where the host header does not undergo site-specific normalization, case
folding it once will

  - avoid multiplication of cache content for case variations of the host
  header

  - avoid case-insensitive matching (so matches using the existing
  case-sensitive ~ operator would be correct)

  - make many existing, currently wrong VCL examples correct

- case-folding once in-place does not use any additional session space
as case-folding in VCL would

- it's cheaper than any alternative in a vmod/VCL can be

- host header normalization is a recommendation in RFC 3986
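
The cache-multiplication point can be shown with a toy cache key. This is
only a sketch under loud assumptions: 32-bit FNV-1a stands in for whatever
hash Varnish really uses, and cache_key() with its fold_host flag is a
made-up name, not an existing function.

```c
#include <ctype.h>
#include <stdint.h>

/*
 * Toy cache key, not Varnish's hashing code: 32-bit FNV-1a over the host
 * followed by the URL. With fold_host set, the host bytes are lower-cased
 * before hashing; the URL is always hashed as-is.
 */
static uint32_t
cache_key(const char *host, const char *url, int fold_host)
{
	uint32_t h = 2166136261u;	/* FNV offset basis */
	const unsigned char *p;

	for (p = (const unsigned char *)host; *p != '\0'; p++) {
		h ^= (uint32_t)(fold_host ? tolower(*p) : *p);
		h *= 16777619u;		/* FNV prime */
	}
	for (p = (const unsigned char *)url; *p != '\0'; p++) {
		h ^= (uint32_t)*p;
		h *= 16777619u;
	}
	return (h);
}
```

Without folding, "Example.com" and "example.com" produce different keys and
so would be stored as two separate objects; with folding they share one key.
The URL part stays case-sensitive in either case.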

Nils



phk at phk

Mar 11, 2011, 2:39 PM

Post #18 of 20 (2109 views)
Re: [PATCH] Normalizing the Host: header [In reply to]

In message <4D7A7A4F.3030109 [at] schokola>, Nils Goroll writes:

>can we come to a conclusion of this discussion?

What bothers me is the "magic" aspect of this kind of stuff.

Where does the magic intuitively begin and end ?

Should we also case fold anything compared to or matched to the
host header ?

What about host part of Location headers ?

Where does the magic end ?

I very much prefer to make these things explicit and consistent,
so that people see them happen and know where they happen.

... but I also don't want to clutter up default.vcl with "mandatory stuff".

>- case-folding once in-place does not use any additional session space
> as case-folding in VCL would

In my mind, this is probably the best argument for doing it,
I am just not sure it convinces me.

The next issue, which comes right behind, is: should we also
normalize URLs by reducing pointless %xx and other escapes ?
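
For the sake of discussion, here is one possible reading of "reducing
pointless escapes", sketched in C: decode only those %XX escapes that stand
for RFC 3986 unreserved characters, and leave every other escape alone apart
from upper-casing its hex digits. Decoding reserved characters such as "/"
would change what the URL means, so the sketch does not touch them. Both
function names are invented for this example.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit; the caller has already checked isxdigit(). */
static int
hex_value(unsigned char c)
{
	if (isdigit(c))
		return (c - '0');
	return (toupper(c) - 'A' + 10);
}

/*
 * Hypothetical helper, not Varnish code: rewrite a URL in place so that
 * "%XX" escapes of unreserved characters (ALPHA / DIGIT / "-" / "." /
 * "_" / "~") become the plain character. Other escapes are kept, with
 * their hex digits upper-cased. The result is never longer than the input.
 */
static void
reduce_escapes(char *url)
{
	unsigned char *in = (unsigned char *)url;
	unsigned char *out = in;
	int c;

	if (in == NULL)
		return;
	while (*in != '\0') {
		if (in[0] == '%' && isxdigit(in[1]) && isxdigit(in[2])) {
			c = hex_value(in[1]) * 16 + hex_value(in[2]);
			if (c < 128 && (isalnum(c) || c == '-' ||
			    c == '.' || c == '_' || c == '~')) {
				*out++ = (unsigned char)c;
			} else {
				/* Read both digits before writing: out may alias in. */
				unsigned char hi = (unsigned char)toupper(in[1]);
				unsigned char lo = (unsigned char)toupper(in[2]);
				*out++ = '%';
				*out++ = hi;
				*out++ = lo;
			}
			in += 3;
		} else {
			*out++ = *in++;
		}
	}
	*out = '\0';
}
```

Whether this belongs in the core, in a VMOD or in VCL is exactly the open
question; the sketch only shows that the operation itself is small and needs
no extra buffer.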

Are there any other kinds of request normalization we should do ?

And should we also normalize backend responses ?



aotto at mosso

Mar 11, 2011, 3:05 PM

Post #19 of 20 (2121 views)
Re: [PATCH] Normalizing the Host: header [In reply to]

Some advice in-line:

On Mar 11, 2011, at 2:39 PM, Poul-Henning Kamp wrote:

> In message <4D7A7A4F.3030109 [at] schokola>, Nils Goroll writes:
>
>> can we come to a conclusion of this discussion?
>
> What bothers me is the "magic" aspect of this kind of stuff.
>
> Where does the magic intuitively begin and end ?
>
> Should we also case fold anything compared to or matched to the
> host header ?
>
> What about host part of Location headers ?
>
> Where does the magic end ?

In this case, I offer the advice that all host related headers should be case folded, because DNS naming is case insensitive. So essentially anywhere Varnish handles a hostname for any comparison, it should follow the same rules.

> I very much prefer to make these things explicit and consistent,
> so that people see them happen and know where they happen.
>
> ... but I also don't want to clutter up default.vcl with "mandatory stuff".

That's why it's a good idea to build this in as default behavior. If it costs any meaningful performance penalty to do it, then consider using a run-time configuration variable.

>> - case-folding once in-place does not use any additional session space
>> as case-folding in VCL would
>
> In my mind, this is probably the best argument for doing it,
> I am just not sure it convinces me.
>
> Next issue which comes right behind is: Should we also
> normalize URL's by reducing pointless %xx and other escapes ?

Yes, escapes should also be reduced to the lowest common denominator.

> Are there any other kinds of request normalization we should do ?
>
> And should we also normalize backend responses ?

Yes, in the same way, as those may be used for comparisons also.

Adrian


phk at phk

Mar 14, 2011, 9:08 AM

Post #20 of 20 (2090 views)
Re: [PATCH] Normalizing the Host: header [In reply to]

In message <10ABE5A2-05CF-47CF-8FA7-FC5A0ECB6C91 [at] mosso>, Adrian Otto writes:

>In this case, I offer the advice that all host related headers should be
>case folded, because DNS naming is case insensitive. So essentially
>anywhere Varnish handles a hostname for any comparison, it should follow
>the same rules.

Yeah, and that is where the trouble starts: short of guessing, we have
no way of knowing which strings are hostnames and which are not.
(think X-My-Secret and cookies...)

Rather than venture into guessing, my attitude so far has been to
take a hands-off approach and force people to think about this
themselves.

Obviously that is not particularly practical either.

What I'm looking for is the sensible middle ground...
