Mailing List Archive: Varnish: Dev

libvmod-dns (super alpha)

 

 



kenshaw at gmail

Apr 1, 2013, 4:21 AM

libvmod-dns (super alpha)

Hi,

I spent a bit of time today developing a DNS module for Varnish.

It is available here:

https://github.com/kenshaw/libvmod-dns/

The motivation for this is to cut off bots that spoof the User-Agent
string (i.e., claim to be Googlebot/Bingbot/etc.) by doing a reverse and
then a forward DNS lookup on the client.ip/X-Forwarded-For address and
matching the resulting domain against a regex.

The logic is meant to work something like this:

sub vcl_recv {
    # do a DNS check on "good" crawlers
    if (req.http.user-agent ~ "(?i)(googlebot|bingbot|slurp|teoma)") {
        # do a reverse lookup on the client.ip (X-Forwarded-For) and
        # check that it's in the allowed domains
        set req.http.X-Crawler-DNS-Reverse =
            dns.rresolve(req.http.X-Forwarded-For);

        # check that the RDNS points to an allowed domain -- 403 error
        # if it doesn't
        if (req.http.X-Crawler-DNS-Reverse !~
            "(?i)\.(googlebot\.com|search\.msn\.com|crawl\.yahoo\.net|ask\.com)$") {
            error 403 "Forbidden";
        }

        # do a forward lookup on the reverse DNS result
        set req.http.X-Crawler-DNS-Forward =
            dns.resolve(req.http.X-Crawler-DNS-Reverse);

        # if it doesn't match the client.ip/X-Forwarded-For, then the
        # user-agent is fake
        if (req.http.X-Crawler-DNS-Forward != req.http.X-Forwarded-For) {
            error 403 "Forbidden";
        }
    }
}
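For anyone who wants to experiment with the same reverse-then-forward
verification outside of Varnish, here is a minimal standalone sketch in
Python (not part of the VMOD; the function name and injectable resolver
parameters are my own illustration, defaulting to the standard-library
resolvers):

```python
import re
import socket

# Regex over the allowed crawler domains, mirroring the VCL above.
ALLOWED = re.compile(
    r"(?i)\.(googlebot\.com|search\.msn\.com|crawl\.yahoo\.net|ask\.com)$")

def verify_crawler(ip, allowed=ALLOWED,
                   reverse=lambda ip: socket.gethostbyaddr(ip)[0],
                   forward=lambda host: socket.gethostbyname(host)):
    """Reverse-then-forward DNS check: the IP must reverse-resolve to an
    allowed domain, and that hostname must forward-resolve back to the
    same IP (a spoofed PTR record fails the second step)."""
    try:
        host = reverse(ip)          # PTR lookup
    except OSError:
        return False
    if not allowed.search(host):
        return False                # RDNS outside the allowed domains
    try:
        return forward(host) == ip  # forward-confirm the hostname
    except OSError:
        return False
```

The resolver callables are injectable mainly so the matching logic can
be exercised without live DNS; in real use the defaults do the actual
lookups.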

While this is not being used in production (yet), I plan to do so later
this week against a production system receiving ~10,000+ requests/sec. I
will report back afterwards.

I realize the code currently has issues (memory, documentation, etc.),
which will be fixed in the near future.

I also realize there are better ways to head malicious bots off at the
pass via DNS, etc. (which we are doing as well). The biggest issue for
my purposes is that it is difficult, if not impossible, to identify all
such traffic otherwise. Additionally, it is nice to be able to monitor
the actual traffic coming through rather than dropping it entirely at
the edge.

Any input or comments on what I've written so far would be greatly
appreciated! Thanks!

-Ken


phk at phk

Apr 2, 2013, 2:36 AM

Re: libvmod-dns (super alpha)

In message <CAAyX=LFoLivEXXBvdG33PJP_jprzW+wUVRwZWgcw2qXPK6XbGw [at] mail>
, Kenneth Shaw writes:

>I spent a bit of time today developing a DNS module for Varnish.

I think that is a really good idea, and once we have it figured out
and are happy with it, it should probably become a standard VMOD
in the varnish releases.

I don't have much concrete feedback though, I hope others can weigh
in with that...

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk [at] FreeBSD | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

_______________________________________________
varnish-dev mailing list
varnish-dev [at] varnish-cache
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-dev


gonzalo.paniagua at acquia

Apr 2, 2013, 6:49 AM

Re: libvmod-dns (super alpha)

On Mon, Apr 1, 2013 at 7:21 AM, Kenneth Shaw <kenshaw [at] gmail> wrote:
[...]
>
> Any input/comments against what I've written so far would be gladly
> appreciated! Thanks!

Great work.

My only comment is that AFAIR, using getaddrinfo() from multiple
threads makes all of them go through a single lock to access a socket.
You might want to consider an alternative resolver library.
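To illustrate the kind of workload this affects, here is a small Python
sketch (my own illustration, not from the VMOD) that fans resolver calls
out over a thread pool, the way Varnish worker threads would each call
into the resolver concurrently:

```python
import socket
from concurrent.futures import ThreadPoolExecutor

def lookup(host):
    # Each getaddrinfo() call may serialize on a resolver-internal lock
    # in some libc implementations -- the contention described above.
    try:
        return socket.getaddrinfo(host, None)[0][4][0]
    except OSError:
        return None

def lookup_all(hosts, workers=8):
    # Fan the lookups out across a thread pool; with a serializing
    # resolver, adding workers here yields little extra throughput.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lookup, hosts))
```

Timing this against a batch of distinct hostnames is a quick way to see
whether a given platform's resolver actually serializes.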

-Gonzalo



kenshaw at gmail

Apr 8, 2013, 7:40 AM

Re: libvmod-dns (super alpha)

Hi All,

This has been successfully deployed in production, and the code (as-is)
is handling "many thousands" of connections per second from fake and
legitimate bots advertising themselves as Googlebot/Bingbot/etc., with
no apparent problems. The configuration we've deployed is essentially
the same as the one provided here (and in the code base).

Anyway, if anyone else ends up finding libvmod-dns helpful, please
consider it "emailware" -- i.e., drop me an email and let me know (off
the record, of course) how you're making use of it. I'm curious more
than anything!



-Ken



kenshaw at gmail

Jun 24, 2013, 1:54 AM

Re: libvmod-dns (super alpha)

I updated the tests on libvmod-dns -- 'make check' should now work as
expected.


-Ken


