Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Python: Dev

Hash collision security issue (now public)

 

 

First page Previous page 1 2 3 4 5 6 7 Next page Last page  View All Python dev RSS feed   Index | Next | Previous | View Threaded


fuzzyman at voidspace

Dec 28, 2011, 5:28 PM

Post #1 of 151 (7384 views)
Permalink
Hash collision security issue (now public)

Hello all,

A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages Python included:

http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf

Although it's a security issue I'm posting it here because it is now public and seems important.

The issue they report can cause (for example) handling an http post to consume horrible amounts of cpu. For Python the figures they quoted:

reasonable-sized attack strings only for 32 bits Plone has max. POST size of 1 MB
7 minutes of CPU usage for a 1 MB request
~20 kbits/s → keep one Core Duo core busy

This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue).

The same vulnerability was reported against various languages and web frameworks, and is already fixed in some of them.

Their recommended fix is to randomize the hash function.

All the best,

Michael


--
http://www.voidspace.org.uk/


May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing
http://www.sqlite.org/different.html





_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


jnoller at gmail

Dec 28, 2011, 5:37 PM

Post #2 of 151 (7283 views)
Permalink
Re: Hash collision security issue (now public) [In reply to]

On Wednesday, December 28, 2011 at 8:28 PM, Michael Foord wrote:

> Hello all,
>
> A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages Python included:
>
> http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf
>
> Although it's a security issue I'm posting it here because it is now public and seems important.
>
> The issue they report can cause (for example) handling an http post to consume horrible amounts of cpu. For Python the figures they quoted:
>
> reasonable-sized attack strings only for 32 bits Plone has max. POST size of 1 MB
> 7 minutes of CPU usage for a 1 MB request
> ~20 kbits/s → keep one Core Duo core busy
>
> This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue).
>
> The same vulnerability was reported against various languages and web frameworks, and is already fixed in some of them.
>
> Their recommended fix is to randomize the hash function.
>
> All the best,
>
> Michael
>
Back up link for the PDF:
http://dl.dropbox.com/u/1374/2007_28C3_Effective_DoS_on_web_application_platforms.pdf

Ocert disclosure:
http://www.ocert.org/advisories/ocert-2011-003.html

jesse


_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


jnoller at gmail

Dec 28, 2011, 5:48 PM

Post #3 of 151 (7290 views)
Permalink
Re: Hash collision security issue (now public) [In reply to]

On Wednesday, December 28, 2011 at 8:37 PM, Jesse Noller wrote:

>
>
> On Wednesday, December 28, 2011 at 8:28 PM, Michael Foord wrote:
>
> > Hello all,
> >
> > A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages Python included:
> >
> > http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf
> >
> > Although it's a security issue I'm posting it here because it is now public and seems important.
> >
> > The issue they report can cause (for example) handling an http post to consume horrible amounts of cpu. For Python the figures they quoted:
> >
> > reasonable-sized attack strings only for 32 bits Plone has max. POST size of 1 MB
> > 7 minutes of CPU usage for a 1 MB request
> > ~20 kbits/s → keep one Core Duo core busy
> >
> > This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue).
> >
> > The same vulnerability was reported against various languages and web frameworks, and is already fixed in some of them.
> >
> > Their recommended fix is to randomize the hash function.
> >
> > All the best,
> >
> > Michael
>
> Back up link for the PDF:
> http://dl.dropbox.com/u/1374/2007_28C3_Effective_DoS_on_web_application_platforms.pdf
>
> Ocert disclosure:
> http://www.ocert.org/advisories/ocert-2011-003.html

And more analysis/information:

http://cryptanalysis.eu/blog/2011/12/28/effective-dos-attacks-against-web-application-plattforms-hashdos/


_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


ericsnowcurrently at gmail

Dec 28, 2011, 5:49 PM

Post #4 of 151 (7284 views)
Permalink
Re: Hash collision security issue (now public) [In reply to]

On Wed, Dec 28, 2011 at 6:28 PM, Michael Foord
<fuzzyman [at] voidspace> wrote:
> Hello all,
>
> A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages Python included:
>
>         http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf
>
> Although it's a security issue I'm posting it here because it is now public and seems important.
>
> The issue they report can cause (for example) handling an http post to consume horrible amounts of cpu. For Python the figures they quoted:
>
>        reasonable-sized attack strings only for 32 bits Plone has max. POST size of 1 MB
>        7 minutes of CPU usage for a 1 MB request
>        ~20 kbits/s → keep one Core Duo core busy
>
> This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue).
>
> The same vulnerability was reported against various languages and web frameworks, and is already fixed in some of them.
>
> Their recommended fix is to randomize the hash function.

Ironically, this morning I ran across a discussion from about 8 years
ago on basically the same thing:

http://mail.python.org/pipermail/python-dev/2003-May/035874.html

From what I read in the thread, it didn't seem like anyone here was
too worried about it. Does this new research change anything?

-eric
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


alex.gaynor at gmail

Dec 28, 2011, 5:51 PM

Post #5 of 151 (7290 views)
Permalink
Re: Hash collision security issue (now public) [In reply to]

A few thoughts on this:

a) This is not a new issue, I'm curious what the new interest is in it.

b) Whatever the solution to this is, it is *not* CPython specific, any decision
should be reflected in the Python language spec IMO, if CPython has the semantic
that dicts aren't vulnerable to hash collision then users *will* rely on this
and another implementation having a different (valid) behavior opens up users to
security issues.

c) I'm not convinced a randomized hash is appropriate for the default dict, for
a number of reasons: it's a performance hit on every dict operations, using a
per-process seed means you can't compile the hash of an obj at Python's compile
time, a per-dict seed inhibits a bunch of other optimizations. These may not be
relevant to CPython, but they are to PyPy and probably the invoke-dynamic work
on Jython (pursuant to point b).

Therefore I think these should be considered application issues, since request
limiting is difficult and error prone, I'd recommend the Python stdlib including
a non-hash based map (such as a binary tree).

Alex

_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


raymond.hettinger at gmail

Dec 28, 2011, 6:09 PM

Post #6 of 151 (7296 views)
Permalink
Re: Hash collision security issue (now public) [In reply to]

FWIW, Uncle Timmy considers the non-randomized hashes to be a virtue.
It is believed that they give us better-than-random results for commonly
encountered datasets. A change to randomized hashes would have a
negative performance impact on those cases.

Also, randomizing the hash wreaks havoc on doctests, book examples
not matching actual dict reprs, and on efforts by users to optimize
the insertion order into dicts with frequent lookups.


Raymond





On Dec 28, 2011, at 5:28 PM, Michael Foord wrote:

> Hello all,
>
> A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages Python included:
>
> http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf
>
> Although it's a security issue I'm posting it here because it is now public and seems important.
>
> The issue they report can cause (for example) handling an http post to consume horrible amounts of cpu. For Python the figures they quoted:
>
> reasonable-sized attack strings only for 32 bits Plone has max. POST size of 1 MB
> 7 minutes of CPU usage for a 1 MB request
> ~20 kbits/s → keep one Core Duo core busy
>
> This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue).
>
> The same vulnerability was reported against various languages and web frameworks, and is already fixed in some of them.
>
> Their recommended fix is to randomize the hash function.
>
> All the best,
>
> Michael
>
>
> --
> http://www.voidspace.org.uk/
>
>
> May you do good and not evil
> May you find forgiveness for yourself and forgive others
> May you share freely, never taking more than you give.
> -- the sqlite blessing
> http://www.sqlite.org/different.html
>
>
>
>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev [at] python
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/raymond.hettinger%40gmail.com

_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


lists at cheimes

Dec 28, 2011, 6:55 PM

Post #7 of 151 (7282 views)
Permalink
Re: Hash collision security issue (now public) [In reply to]

Am 29.12.2011 03:09, schrieb Raymond Hettinger:
> FWIW, Uncle Timmy considers the non-randomized hashes to be a virtue.
> It is believed that they give us better-than-random results for commonly
> encountered datasets. A change to randomized hashes would have a
> negative performance impact on those cases.
>
> Also, randomizing the hash wreaks havoc on doctests, book examples
> not matching actual dict reprs, and on efforts by users to optimize
> the insertion order into dicts with frequent lookups.

My five cents on the topic:

I totally concur with Raymound. He, Tim and all the others did a
fantastic job with the dict implementation and optimization. We
shouldn't overreact and mess with the current hashing and dict code just
because a well-known and old attack vector pops up again. The dict code
is far too crucial for Python's overall performance. However the issue
should be documented in our docs.

I've been dealing with web stuff and security for almost a decade. I've
seen far worse attack vectors. This one can easily be solved with a
couple of lines of Python code. For example Application developers can
limit the maximum amount of POST parameters to a sensible amount and
limit the length of each key, too. The issue less severe on platforms
with 64bit hashes, so it won't affect most people. I think only 32bit
Unix and Windows in general (32bit long) are in trouble.

CPython could aid developers with a special subclass of dict. The
crucial lookup function is already overwrite-able per dict instance and
on subclasses of dict through PyDictObj's struct member PyDictEntry
*(*ma_lookup)(PyDictObject *mp, PyObject *key, long hash). For example
specialized subclass could limit the seach for a free slot to n
recursions or choose to ignore the hash argument and calculate its own
hash of the key.

Christian
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


lists at cheimes

Dec 28, 2011, 7:04 PM

Post #8 of 151 (7274 views)
Permalink
Re: Hash collision security issue (now public) [In reply to]

Am 29.12.2011 02:37, schrieb Jesse Noller:
> Back up link for the PDF:
> http://dl.dropbox.com/u/1374/2007_28C3_Effective_DoS_on_web_application_platforms.pdf
>
> Ocert disclosure:
> http://www.ocert.org/advisories/ocert-2011-003.html

>From http://www.nruns.com/_downloads/advisory28122011.pdf

---
Python uses a hash function which is very similar to DJBX33X, which can
be broken using a
meet-in-the-middle attack. It operates on register size and is thus
different for 64 and 32 bit
machines. While generating multi-collisions efficiently is also possible
for the 64 bit version
of the function, the resulting colliding strings are too large to be
relevant for anything more
than an academic attack.

Plone as the most prominent Python web framework accepts 1 MB of POST
data, which it
parses in about 7 minutes of CPU time in the worst case.
This gives an attacker with about 20 kbit/s the possibility to keep one
Core Duo core
constantly busy. If the attacker is in the position to have a Gigabit
line available, he can keep
about 50.000 Core Duo cores busy.
---

If I remember correctly CPython uses the long for its hash function so
64bit Windows uses a 32bit hash.
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


brian at python

Dec 28, 2011, 7:41 PM

Post #9 of 151 (7278 views)
Permalink
Re: Hash collision security issue (now public) [In reply to]

On Wed, Dec 28, 2011 at 19:51, Alex Gaynor <alex.gaynor [at] gmail> wrote:
> A few thoughts on this:
>
> a) This is not a new issue, I'm curious what the new interest is in it.

Well they (the presenters of the report) had to be accepted to that
conference for *something*, otherwise we wouldn't know they exist.
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


solipsis at pitrou

Dec 29, 2011, 2:32 AM

Post #10 of 151 (7260 views)
Permalink
Re: Hash collision security issue (now public) [In reply to]

On Thu, 29 Dec 2011 04:04:17 +0100
Christian Heimes <lists [at] cheimes> wrote:
> Am 29.12.2011 02:37, schrieb Jesse Noller:
> > Back up link for the PDF:
> > http://dl.dropbox.com/u/1374/2007_28C3_Effective_DoS_on_web_application_platforms.pdf
> >
> > Ocert disclosure:
> > http://www.ocert.org/advisories/ocert-2011-003.html
>
> >From http://www.nruns.com/_downloads/advisory28122011.pdf
>
> ---
> Python uses a hash function which is very similar to DJBX33X, which can
> be broken using a
> meet-in-the-middle attack. It operates on register size and is thus
> different for 64 and 32 bit
> machines. While generating multi-collisions efficiently is also possible
> for the 64 bit version
> of the function, the resulting colliding strings are too large to be
> relevant for anything more
> than an academic attack.
>
> Plone as the most prominent Python web framework accepts 1 MB of POST
> data, which it
> parses in about 7 minutes of CPU time in the worst case.
> This gives an attacker with about 20 kbit/s the possibility to keep one
> Core Duo core
> constantly busy. If the attacker is in the position to have a Gigabit
> line available, he can keep
> about 50.000 Core Duo cores busy.
> ---
>
> If I remember correctly CPython uses the long for its hash function so
> 64bit Windows uses a 32bit hash.

Not anymore, Py_hash_t is currently aligned with Py_ssize_t.

Regards

Antoine.


_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


solipsis at pitrou

Dec 29, 2011, 3:10 AM

Post #11 of 151 (7251 views)
Permalink
Re: Hash collision security issue (now public) [In reply to]

On Thu, 29 Dec 2011 03:55:22 +0100
Christian Heimes <lists [at] cheimes> wrote:
>
> I've been dealing with web stuff and security for almost a decade. I've
> seen far worse attack vectors. This one can easily be solved with a
> couple of lines of Python code. For example Application developers can
> limit the maximum amount of POST parameters to a sensible amount and
> limit the length of each key, too.

Shouldn't the setting be implemented by frameworks?

> CPython could aid developers with a special subclass of dict. The
> crucial lookup function is already overwrite-able per dict instance and
> on subclasses of dict through PyDictObj's struct member PyDictEntry
> *(*ma_lookup)(PyDictObject *mp, PyObject *key, long hash). For example
> specialized subclass could limit the seach for a free slot to n
> recursions or choose to ignore the hash argument and calculate its own
> hash of the key.

Or, rather, the specialized subclass could implement hash randomization.

Regards

Antoine.


_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


mark at hotpy

Dec 29, 2011, 3:13 AM

Post #12 of 151 (7260 views)
Permalink
Re: Hash collision security issue (now public) [In reply to]

Michael Foord wrote:
> Hello all,
>
> A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages Python included:
>
> http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf
>
> Although it's a security issue I'm posting it here because it is now public and seems important.
>
> The issue they report can cause (for example) handling an http post to consume horrible amounts of cpu. For Python the figures they quoted:
>
> reasonable-sized attack strings only for 32 bits Plone has max. POST size of 1 MB
> 7 minutes of CPU usage for a 1 MB request
> ~20 kbits/s → keep one Core Duo core busy
>
> This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue).
>
> The same vulnerability was reported against various languages and web frameworks, and is already fixed in some of them.
>
> Their recommended fix is to randomize the hash function.
>

The attack relies on being able to predict the hash value for a given
string. Randomising the string hash function is quite straightforward.
There is no need to change the dictionary code.

A possible (*untested*) patch is attached. I'll leave it for those more
familiar with unicodeobject.c to do properly.

Cheers,
Mark
Attachments: hash.patch (1.33 KB)


mark at hotpy

Dec 29, 2011, 3:25 AM

Post #13 of 151 (7250 views)
Permalink
Re: Hash collision security issue (now public) [In reply to]

Raymond Hettinger wrote:
> FWIW, Uncle Timmy considers the non-randomized hashes to be a virtue.
> It is believed that they give us better-than-random results for commonly
> encountered datasets. A change to randomized hashes would have a
> negative performance impact on those cases.

Tim Peter's analysis applies mainly to ints which would be unchanged.

A change to the hash function for strings would make no difference to
the performance of the dict, as the ordering of the hash values is
already quite different from the ordering of the strings for any string
of more than 3 characters.

>
> Also, randomizing the hash wreaks havoc on doctests, book examples
> not matching actual dict reprs, and on efforts by users to optimize
> the insertion order into dicts with frequent lookups.

The docs clearly state that the ordering of iteration over dicts is
arbitrary. Perhaps changing it once in a while might be a good thing :)


Cheers,
Mark.

>
>
> Raymond
>
>
>
>
>
> On Dec 28, 2011, at 5:28 PM, Michael Foord wrote:
>
>> Hello all,
>>
>> A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages Python included:
>>
>> http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf
>>
>> Although it's a security issue I'm posting it here because it is now public and seems important.
>>
>> The issue they report can cause (for example) handling an http post to consume horrible amounts of cpu. For Python the figures they quoted:
>>
>> reasonable-sized attack strings only for 32 bits Plone has max. POST size of 1 MB
>> 7 minutes of CPU usage for a 1 MB request
>> ~20 kbits/s → keep one Core Duo core busy
>>
>> This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue).
>>
>> The same vulnerability was reported against various languages and web frameworks, and is already fixed in some of them.
>>
>> Their recommended fix is to randomize the hash function.
>>
>> All the best,
>>
>> Michael
>>
>>
>> --
>> http://www.voidspace.org.uk/
>>
>>
>> May you do good and not evil
>> May you find forgiveness for yourself and forgive others
>> May you share freely, never taking more than you give.
>> -- the sqlite blessing
>> http://www.sqlite.org/different.html
>>
>>
>>
>>
>>
>> _______________________________________________
>> Python-Dev mailing list
>> Python-Dev [at] python
>> http://mail.python.org/mailman/listinfo/python-dev
>> Unsubscribe: http://mail.python.org/mailman/options/python-dev/raymond.hettinger%40gmail.com
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev [at] python
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/mark%40hotpy.org

_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


armin.ronacher at active-4

Dec 29, 2011, 3:29 AM

Post #14 of 151 (7253 views)
Permalink
Re: Hash collision security issue (now public) [In reply to]

Hi,

Just some extra thoughts about the whole topic in the light of web
applications (since this was hinted in the talk) running on Python:

Yes, you can limit the number of maximum allowed parameters for post
data but really there are so many places where data is parsed into
hashing containers that it's quite a worthless task. Here a very
brief list of things usually parsed into a dict or set and where it
happens:

- URL parameters and url encoded form data
Generally this happens somewhere in a framework but typically also
in utility libraries that deal with URLs. For instance the
stdlib's cgi.parse_qs or urllib.parse.parse_qs on Python 3 do
just that and that code is used left and right.

Even if a framework would start limiting it's own URL parsing there
is still a lot of code that does not do that the stdlib does that
as well.

With form data it's worse because you have multipart headers that
need parsing and that is usually abstracted away so far from the
user that they do not do that. Many frameworks just use the cgi
module's parsing functions which also just directly feed into a
dictionary.

- HTTP headers.
There is zero a WSGI framework can do about that since the headers
are parsed into a dictionary by the WSGI server.

- Incoming JSON data.
Again outside of what the framework can do for the most part.
simplejson can be modified to stop parsing with the hook stuff
but nobody does that and since users invoke simplejson's parsing
routines themselves most webapps would still be vulnerable even
if all frameworks would fix the problem.

- Hidden dict parameters.
Things like the parameter part of content-type or the
content-disposition headers are generally also just parsed into a
dictionary. Likewise many frameworks parse things into set headers
(for instance incoming etags). The cookie header is usually parsed
into a dictionary as well.

The issue is nothing new and at least my current POV on this topic was
that your server should be guarded and shoot handlers of requests going
rogue. Dictionaries are not the only thing that has a worst case
performance that could be triggered by user input.

That said. Considering that there are so many different places where
things are probably close to arbitrarily long that is parsed into a
dictionary or other hashing structure it's hard for a web application
developer or framework to protect itself against.

In case the watchdog is not a viable solution as I had assumed it was, I
think it's more reasonable to indeed consider adding a flag to Python
that allows randomization of hashes optionally before startup.

However as it was said earlier, the attack is a lot more complex to
carry out on a 64bit environment that it's probably (as it stands right
now!) safe to ignore.

The main problem there however is not that it's a new attack but that
some dickheads could now make prebaked attacks against websites to
disrupt them that might cause some negative publicity. In general
though there are so many more ways to DDOS a website than this that I
would rate the whole issue very low.


Regards,
Armin

_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


solipsis at pitrou

Dec 29, 2011, 3:42 AM

Post #15 of 151 (7250 views)
Permalink
Re: Hash collision security issue (now public) [In reply to]

On Thu, 29 Dec 2011 11:25:03 +0000
Mark Shannon <mark [at] hotpy> wrote:
> >
> > Also, randomizing the hash wreaks havoc on doctests, book examples
> > not matching actual dict reprs, and on efforts by users to optimize
> > the insertion order into dicts with frequent lookups.
>
> The docs clearly state that the ordering of iteration over dicts is
> arbitrary. Perhaps changing it once in a while might be a good thing :)

We already change it once in a while.
http://twistedmatrix.com/trac/ticket/5352
;)

Regards

Antoine.


_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


mal at egenix

Dec 29, 2011, 4:49 AM

Post #16 of 151 (7237 views)
Permalink
Re: Hash collision security issue (now public) [In reply to]

Mark Shannon wrote:
> Michael Foord wrote:
>> Hello all,
>>
>> A paper (well, presentation) has been published highlighting security problems with the hashing
>> algorithm (exploiting collisions) in many programming languages Python included:
>>
>>
>> http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf
>>
>>
>> Although it's a security issue I'm posting it here because it is now public and seems important.
>>
>> The issue they report can cause (for example) handling an http post to consume horrible amounts of
>> cpu. For Python the figures they quoted:
>>
>> reasonable-sized attack strings only for 32 bits Plone has max. POST size of 1 MB
>> 7 minutes of CPU usage for a 1 MB request
>> ~20 kbits/s → keep one Core Duo core busy
>>
>> This was apparently reported to the security list, but hasn't been responded to beyond an
>> acknowledgement on November 24th (the original report didn't make it onto the security list
>> because it was held in a moderation queue).
>> The same vulnerability was reported against various languages and web frameworks, and is already
>> fixed in some of them.
>>
>> Their recommended fix is to randomize the hash function.
>>
>
> The attack relies on being able to predict the hash value for a given string. Randomising the string
> hash function is quite straightforward.
> There is no need to change the dictionary code.
>
> A possible (*untested*) patch is attached. I'll leave it for those more familiar with
> unicodeobject.c to do properly.

The paper mentions that several web frameworks work around this by
limiting the number of parameters per GET/POST/HEAD request.

This sounds like a better alternative than randomizing the hash
function of strings.

Uncontrollable randomization has issues when you work with
multi-process setups, since the processes would each use different
hash values for identical strings. Putting the base_hash value
under application control could be done to solve this problem,
making sure that all processes use the same random base value.

BTW: Since your randomization trick uses the current time, it would
also be rather easy to tune an attack to find the currently
used base_hash. To make this safe, you'd have to use a more
random source for initializing the base_hash.

Note that the same hash collision attack can be used for
other key types as well, e.g. integers (where it's very easy
to find hash collisions), so this kind of randomization
would have to be applied to other basic types too.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Dec 29 2011)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::


eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


armin.ronacher at active-4

Dec 29, 2011, 4:57 AM

Post #17 of 151 (7243 views)
Permalink
Re: Hash collision security issue (now public) [In reply to]

Hi,

Something I should add to this now that I thought about it a bit more:

Assuming this should be fixed on a language level the solution would
probably be to salt hashes. The most common hash to salt here is the
PyUnicode hash for obvious reasons.

- Option a: Compiled in Salt
+ Easy to implement
- Breaks unittests most likely (those were broken in the first place
but that's still a very annoying change to make)
- Might cause problems with interoperability of Pythons compiled with
different hash salts
- You're not really solving the problem because each linux
distribution (besides Gentoo I guess) would have just one salt
compiled in and that would be popular enough to have the same
issue.

- Option b: Environment variable for the salt
+ Easy-ish to implement
+ Easy to synchronize over different machines
- initialization for base types happens early and unpredictive which
makes it hard for embedded Python interpreters (think mod_wsgi and
other things) to specify the salt

- Option c: Random salt at runtime
+ Easy to implement
- impossible to synchronize
- breaks unittests in the same way as a compiled in salt would do

Where to add the salt to? Unicode strings and bytestrings (byte
objects) I guess since those are the most common offenders. Sometimes
tuples are keys of dictionaries but in that case a contributing factor
to the hash is the string in the tuple anyways.

Also related: since this is a security related issue, would this be
something that goes into Python 2? Does that affect how a fix would
look like?


Regards,
Armin
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


lists at cheimes

Dec 29, 2011, 5:04 AM

Post #18 of 151 (7233 views)
Permalink
Re: Hash collision security issue (now public) [In reply to]

Am 29.12.2011 12:10, schrieb Antoine Pitrou:
>> I've been dealing with web stuff and security for almost a decade. I've
>> seen far worse attack vectors. This one can easily be solved with a
>> couple of lines of Python code. For example Application developers can
>> limit the maximum amount of POST parameters to a sensible amount and
>> limit the length of each key, too.
>
> Shouldn't the setting be implemented by frameworks?

Web framework like Django or CherryPy can be considered an application
from the CPython core's point of view. ;)
You are right. The term "framework" is a better word.

>> CPython could aid developers with a special subclass of dict. The
>> crucial lookup function is already overwrite-able per dict instance and
>> on subclasses of dict through PyDictObj's struct member PyDictEntry
>> *(*ma_lookup)(PyDictObject *mp, PyObject *key, long hash). For example
>> specialized subclass could limit the seach for a free slot to n
>> recursions or choose to ignore the hash argument and calculate its own
>> hash of the key.
>
> Or, rather, the specialized subclass could implement hash randomization.

Yeah! I was thinking about the same when I wrote "calculate its own
hash" but I was too sloppy to carry on my argument. Please take 3am as
my excuse.
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


hs at ox

Dec 29, 2011, 5:11 AM

Post #19 of 151 (7240 views)
Permalink
Re: Hash collision security issue (now public) [In reply to]

Hi,

how about

Option d: Host based salt
+ Easy-ish to implement – how about basing it on the hostname for example?
+ transparent for all processes on the same host
- probably unit test breakage

In fact, we could use host based as default with the option to specify own which would solve the sync problems.

That said, I agree with Armin that fixing this in the frameworks isn't an option.

Regards,
Hynek


Am Donnerstag, 29. Dezember 2011 um 13:57 schrieb Armin Ronacher:

> Hi,
>
> Something I should add to this now that I thought about it a bit more:
>
> Assuming this should be fixed on a language level the solution would
> probably be to salt hashes. The most common hash to salt here is the
> PyUnicode hash for obvious reasons.
>
> - Option a: Compiled in Salt
> + Easy to implement
> - Breaks unittests most likely (those were broken in the first place
> but that's still a very annoying change to make)
> - Might cause problems with interoperability of Pythons compiled with
> different hash salts
> - You're not really solving the problem because each linux
> distribution (besides Gentoo I guess) would have just one salt
> compiled in and that would be popular enough to have the same
> issue.
>
> - Option b: Environment variable for the salt
> + Easy-ish to implement
> + Easy to synchronize over different machines
> - initialization for base types happens early and unpredictive which
> makes it hard for embedded Python interpreters (think mod_wsgi and
> other things) to specify the salt
>
> - Option c: Random salt at runtime
> + Easy to implement
> - impossible to synchronize
> - breaks unittests in the same way as a compiled in salt would do
>
> Where to add the salt to? Unicode strings and bytestrings (byte
> objects) I guess since those are the most common offenders. Sometimes
> tuples are keys of dictionaries but in that case a contributing factor
> to the hash is the string in the tuple anyways.
>
> Also related: since this is a security related issue, would this be
> something that goes into Python 2? Does that affect how a fix would
> look like?
>
>
> Regards,
> Armin
> _______________________________________________
> Python-Dev mailing list
> Python-Dev [at] python (mailto:Python-Dev [at] python)
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/hs%40ox.cx



_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


lists at cheimes

Dec 29, 2011, 5:19 AM

Post #20 of 151 (7240 views)
Permalink
Re: Hash collision security issue (now public) [In reply to]

Am 29.12.2011 12:13, schrieb Mark Shannon:
> The attack relies on being able to predict the hash value for a given
> string. Randomising the string hash function is quite straightforward.
> There is no need to change the dictionary code.
>
> A possible (*untested*) patch is attached. I'll leave it for those more
> familiar with unicodeobject.c to do properly.

I'm worried that hash randomization of str is going to break 3rd party
software that rely on a stable hash across multiple Python instances.
Persistence layers like ZODB and cross interpreter communication
channels used by multiprocessing may (!) rely on the fact that the hash
of a string is fixed.

Perhaps the dict code is a better place for randomization. The code in
lookdict() and lookdict_unicode() could add a value to the hash. My
approach is less intrusive and also closes the attack vector for all
possible objects including str, byte, int and so on. I like also Armin's
idea of an optional hash randomization.

Christian
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


solipsis at pitrou

Dec 29, 2011, 5:21 AM

Post #21 of 151 (7236 views)
Permalink
Re: Hash collision security issue (now public) [In reply to]

On Thu, 29 Dec 2011 13:57:07 +0100
Armin Ronacher <armin.ronacher [at] active-4> wrote:
>
> - Option c: Random salt at runtime
> + Easy to implement
> - impossible to synchronize
> - breaks unittests in the same way as a compiled in salt would do

This option would have my preference. I don't think hash() was ever
meant to be "synchronizable". Already using a 32-bit Python will give
you different results from a 64-bit Python.

As for breaking unittests, these tests were broken in the first place.
hash() does change from time to time.

> Where to add the salt to? Unicode strings and bytestrings (byte
> objects) I guess since those are the most common offenders. Sometimes
> tuples are keys of dictionaries but in that case a contributing factor
> to the hash is the string in the tuple anyways.

Or it could be a process-wide constant for all dicts. If the constant
is additive as proposed by Mark the impact should be negligible.
(but the randomness must be good enough)

Regards

Antoine.


_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


fdrake at acm

Dec 29, 2011, 5:30 AM

Post #22 of 151 (7240 views)
Permalink
Re: Hash collision security issue (now public) [In reply to]

On Thu, Dec 29, 2011 at 8:19 AM, Christian Heimes <lists [at] cheimes> wrote:
> Persistence layers like ZODB and cross interpreter communication
> channels used by multiprocessing may (!) rely on the fact that the hash
> of a string is fixed.

ZODB does not rely on a fixed hash function for strings; for any application
to rely on a stable hash would cause problems when updating Python versions.


-Fred

--
Fred L. Drake, Jr. <fdrake at acm.org>
"A person who won't read has no advantage over one who can't read."
--Samuel Langhorne Clemens
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


lists at cheimes

Dec 29, 2011, 5:32 AM

Post #23 of 151 (7230 views)
Permalink
Re: Hash collision security issue (now public) [In reply to]

Am 29.12.2011 13:57, schrieb Armin Ronacher:
> Hi,
>
> Something I should add to this now that I thought about it a bit more:
>
> Assuming this should be fixed on a language level the solution would
> probably be to salt hashes. The most common hash to salt here is the
> PyUnicode hash for obvious reasons.
>
> - Option a: Compiled in Salt
> + Easy to implement
> - Breaks unittests most likely (those were broken in the first place
> but that's still a very annoying change to make)
> - Might cause problems with interoperability of Pythons compiled with
> different hash salts
> - You're not really solving the problem because each linux
> distribution (besides Gentoo I guess) would have just one salt
> compiled in and that would be popular enough to have the same
> issue.
>
> - Option b: Environment variable for the salt
> + Easy-ish to implement
> + Easy to synchronize over different machines
> - initialization for base types happens early and unpredictive which
> makes it hard for embedded Python interpreters (think mod_wsgi and
> other things) to specify the salt
>
> - Option c: Random salt at runtime
> + Easy to implement
> - impossible to synchronize
> - breaks unittests in the same way as a compiled in salt would do

- Option d: Don't change __hash__ but only use randomized hash for
PyDictEntry lookup
+ Easy to implement
- breaks only software to relies on a fixed order of dict keys
- breaks only a few to no unit tests

IMHO we don't have to alter the outcome of hash("some string"), hash(1)
and all other related types. We just need to reduce the change the an
attacker can produce collisions in the dict (and set?) code that looks
up the slot (PyDictEntry). How about adding the random value in
Object/dictobject.c:lookdict() and lookdict_str() (Python 2.x) /
lookdict_unicode() (Python 3.x)? With this approach the hash of all our
objects stay the same and just the dict code needs to be altered. The
approach has also the benefit that all possible objects gain a
randomized hash.

> Also related: since this is a security related issue, would this be
> something that goes into Python 2? Does that affect how a fix would
> look like?

IMHO it does affect the fix. A changed and randomized hash function may
break software that relies on a stable hash() function.

Christian
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


lists at cheimes

Dec 29, 2011, 5:34 AM

Post #24 of 151 (7226 views)
Permalink
Re: Hash collision security issue (now public) [In reply to]

Am 29.12.2011 11:32, schrieb Antoine Pitrou:
>> If I remember correctly CPython uses the long for its hash function so
>> 64bit Windows uses a 32bit hash.
>
> Not anymore, Py_hash_t is currently aligned with Py_ssize_t.

Thanks for the update. Python 2.x still uses long and several large
frameworks like Zope/Plone require 2.x.

Christian
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


solipsis at pitrou

Dec 29, 2011, 6:14 AM

Post #25 of 151 (7232 views)
Permalink
Re: Hash collision security issue (now public) [In reply to]

On Thu, 29 Dec 2011 14:32:21 +0100
Christian Heimes <lists [at] cheimes> wrote:
> Am 29.12.2011 13:57, schrieb Armin Ronacher:
> > Hi,
> >
> > Something I should add to this now that I thought about it a bit more:
> >
> > Assuming this should be fixed on a language level the solution would
> > probably be to salt hashes. The most common hash to salt here is the
> > PyUnicode hash for obvious reasons.
> >
> > - Option a: Compiled in Salt
> > + Easy to implement
> > - Breaks unittests most likely (those were broken in the first place
> > but that's still a very annoying change to make)
> > - Might cause problems with interoperability of Pythons compiled with
> > different hash salts
> > - You're not really solving the problem because each linux
> > distribution (besides Gentoo I guess) would have just one salt
> > compiled in and that would be popular enough to have the same
> > issue.
> >
> > - Option b: Environment variable for the salt
> > + Easy-ish to implement
> > + Easy to synchronize over different machines
> > - initialization for base types happens early and unpredictive which
> > makes it hard for embedded Python interpreters (think mod_wsgi and
> > other things) to specify the salt
> >
> > - Option c: Random salt at runtime
> > + Easy to implement
> > - impossible to synchronize
> > - breaks unittests in the same way as a compiled in salt would do
>
> - Option d: Don't change __hash__ but only use randomized hash for
> PyDictEntry lookup
> + Easy to implement
> - breaks only software to relies on a fixed order of dict keys
> - breaks only a few to no unit tests
>
> IMHO we don't have to alter the outcome of hash("some string"), hash(1)
> and all other related types. We just need to reduce the change the an
> attacker can produce collisions in the dict (and set?) code that looks
> up the slot (PyDictEntry). How about adding the random value in
> Object/dictobject.c:lookdict() and lookdict_str() (Python 2.x) /
> lookdict_unicode() (Python 3.x)? With this approach the hash of all our
> objects stay the same and just the dict code needs to be altered. The
> approach has also the benefit that all possible objects gain a
> randomized hash.

I basically agree with your proposal. The only downside is that custom
hashed containers (such as _pickle.c's memotable) don't
automatically benefit. That said, I think it would be difficult to
craft an attack against the aforementioned memotable (you would have
to basically choose the addresses of pickled objects).

Regards

Antoine.


_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com

First page Previous page 1 2 3 4 5 6 7 Next page Last page  View All Python dev RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.