
Mailing List Archive: Python: Dev

Static analysis of CPython using coccinelle/spatch

 

 



dmalcolm at redhat

Nov 16, 2009, 12:27 PM

Post #1 of 5
Static analysis of CPython using coccinelle/spatch

Has anyone else looked at using Coccinelle/spatch[1] on CPython source
code?

It's a GPL-licensed tool for matching semantic patterns in C source
code. It's been used on the Linux kernel for detecting and fixing
problems, and for autogenerating patches when refactoring
(http://coccinelle.lip6.fr/impact_linux.php). Although it's implemented
in OCaml, it is scriptable using Python.

I've been experimenting with using it on CPython code, both on the core
implementation, and on C extension modules.

As a test, I've written a validator for the mini-language used by
PyArg_ParseTuple and its variants. My code examines the types of the
variables passed as varargs, and attempts to check that they are
correct, according to the rules documented at
http://docs.python.org/c-api/arg.html (and implemented in
Python/getargs.c).

It can detect this old error (fixed in svn r34931):
buggy.c:12:socket_htons:Mismatching type of argument 1 in ""i:htons"":
expected "int *" but got "unsigned long *"

Similarly, it finds the deliberate error in xxmodule.c:
xxmodule.c:207:xx_roj:unknown format char in "O#:roj": '#'
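
To illustrate the kind of check described above, here is a minimal sketch in Python. It is not the code from validate.py: the EXPECTED_TYPES table, the check_format and normalize functions, and the message for a wrong argument count are all hypothetical, and only a small subset of the format codes is covered. The two mismatch messages are modelled on the output shown above.

```python
import re

# Hypothetical subset of PyArg_ParseTuple format codes, mapped to the C type
# of the pointer each code expects to receive (Python 2 C-API rules).
EXPECTED_TYPES = {
    "h": "short *",
    "i": "int *",
    "l": "long *",
    "k": "unsigned long *",
    "f": "float *",
    "d": "double *",
    "s": "const char * *",
    "z": "const char * *",
    "O": "PyObject * *",
}


def normalize(c_type):
    """Drop whitespace so that "int *" and "int*" compare equal."""
    return "".join(c_type.split())


def check_format(fmt, arg_types):
    """Return a list of problem messages for one PyArg_ParseTuple call."""
    # Anything after ':' (function name) or ';' (error message) is not a code.
    codes = re.split(r"[:;]", fmt, maxsplit=1)[0]
    expected = []
    previous = None
    for char in codes:
        if char == "|":
            continue  # marks the start of the optional arguments
        if char == "#" and previous in ("s", "z"):
            # Length output for s# / z#; assumes PY_SSIZE_T_CLEAN is not set.
            expected.append("int *")
        elif char in EXPECTED_TYPES:
            expected.append(EXPECTED_TYPES[char])
        else:
            return ["unknown format char in \"%s\": '%s'" % (fmt, char)]
        previous = char
    if len(expected) != len(arg_types):
        return ['"%s" expects %d arguments but got %d'
                % (fmt, len(expected), len(arg_types))]
    problems = []
    for index, (want, got) in enumerate(zip(expected, arg_types), start=1):
        if normalize(want) != normalize(got):
            problems.append(
                'Mismatching type of argument %d in "%s": '
                'expected "%s" but got "%s"' % (index, fmt, want, got)
            )
    return problems


# The two call sites discussed above, with argument types written by hand.
for message in check_format("i:htons", ["unsigned long *"]):
    print(message)
for message in check_format("O#:roj", ["PyObject * *", "long *"]):
    print(message)
```

A strict string comparison like this is one plausible source of false positives: a real checker would also need to accept typedefs, const qualifiers and the like.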

(Unfortunately, when run on the full source tree, it reports numerous
other messages, and as far as I can tell, those are all false
positives.)

You can see the code here:
http://fedorapeople.org/gitweb?p=dmalcolm/public_git/check-cpython.git;a=tree
and download using anonymous git in this manner:
git clone git://fedorapeople.org/home/fedora/dmalcolm/public_git/check-cpython.git

The .cocci file detects invocations of PyArg_ParseTuple and determines
the types of the arguments. At each matching call site it invokes
Python code, passing the type information to validate.py's
validate_types.

(I suspect it's possible to use spatch to detect reference counting
antipatterns; I've also attempted 2to3 refactoring of C code using
semantic patches, but so far macros tend to get in the way).

Alternatively, are there any other non-proprietary static analysis tools
for CPython?

Thoughts?
Dave

[1] http://coccinelle.lip6.fr/

_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


brett at python

Nov 17, 2009, 1:03 PM

Post #2 of 5
Re: Static analysis of CPython using coccinelle/spatch [In reply to]

On Mon, Nov 16, 2009 at 12:27, David Malcolm <dmalcolm [at] redhat> wrote:
> Has anyone else looked at using Coccinelle/spatch[1] on CPython source
> code?

Not that has been mentioned on the list before.

>
> It's a GPL-licensed tool for matching semantic patterns in C source
> code. It's been used on the Linux kernel for detecting and fixing
> problems, and for autogenerating patches when refactoring
> (http://coccinelle.lip6.fr/impact_linux.php).  Although it's implemented
> in OCaml, it is scriptable using Python.
>
> I've been experimenting with using it on CPython code, both on the core
> implementation, and on C extension modules.
>
> As a test, I've written a validator for the mini-language used by
> PyArg_ParseTuple and its variants.  My code examines the types of the
> variables passed as varargs, and attempts to check that they are
> correct, according to the rules here
> http://docs.python.org/c-api/arg.html (and in Python/getargs.c)
>
> It can detect this old error (fixed in svn r34931):
> buggy.c:12:socket_htons:Mismatching type of argument 1 in ""i:htons"":
> expected "int *" but got "unsigned long *"
>
> Similarly, it finds the deliberate error in xxmodule.c:
> xxmodule.c:207:xx_roj:unknown format char in "O#:roj": '#'
>
> (Unfortunately, when run on the full source tree, I see numerous
> messages, and as far as I can tell, the others are false positives)
>
> You can see the code here:
> http://fedorapeople.org/gitweb?p=dmalcolm/public_git/check-cpython.git;a=tree
> and download using anonymous git in this manner:
> git clone git://fedorapeople.org/home/fedora/dmalcolm/public_git/check-cpython.git
>
> The .cocci file detects invocations of PyArg_ParseTuple and determines
> the types of the arguments.  At each matching call site it invokes
> python code, passing the type information to validate.py's
> validate_types.
>
> (I suspect it's possible to use spatch to detect reference counting
> antipatterns; I've also attempted 2to3 refactoring of c code using
> semantic patches, but so far macros tend to get in the way).
>
> Alternatively, are there any other non-proprietary static analysis tools
> for CPython?

Specific to CPython? No. But I had a chance to run practically every
major commercial static analysis tool over the code base back in 2006.
We also occasionally run valgrind over the code. But thanks to how we
have structured the code and taken performance shortcuts, static
analysis tools easily get tripped up by CPython (as you have
discovered).

>
> Thoughts?

Running the tool over the code base and reporting the found bugs would
be appreciated.

-Brett


> Dave
>
> [1] http://coccinelle.lip6.fr/


amk at amk

Nov 17, 2009, 1:41 PM

Post #3 of 5
Re: Static analysis of CPython using coccinelle/spatch [In reply to]

On Mon, Nov 16, 2009 at 03:27:53PM -0500, David Malcolm wrote:
> Has anyone else looked at using Coccinelle/spatch[1] on CPython source
> code?

For an excellent explanation of Coccinelle, see
<http://lwn.net/Articles/315686/>.

--amk


tjreedy at udel

Nov 17, 2009, 4:45 PM

Post #4 of 5
Re: Static analysis of CPython using coccinelle/spatch [In reply to]

A.M. Kuchling wrote:
> On Mon, Nov 16, 2009 at 03:27:53PM -0500, David Malcolm wrote:
>> Has anyone else looked at using Coccinelle/spatch[1] on CPython source
>> code?
>
> For an excellent explanation of Coccinelle, see
> <http://lwn.net/Articles/315686/>.

For those who have not looked, Coccinelle means ladybug (a bug-eating
bug ;-) in French. Its principal use is to take C code and an SmPL file
of high-level patch descriptions (fixers, in 2to3 talk) and produce a
standard diff file. I wonder if this could be used to help people
migrate C extensions to 3.1, by developing an SmPL file with the
changes dictated by API changes. This is similar to its motivating
application to Linux. From

http://coccinelle.lip6.fr/

"Coccinelle is a program matching and transformation engine which
provides the language SmPL (Semantic Patch Language) for specifying
desired matches and transformations in C code. Coccinelle was initially
targeted towards performing collateral evolutions in Linux. Such
evolutions comprise the changes that are needed in client code in
response to evolutions in library APIs, and may include modifications
such as renaming a function, adding a function argument whose value is
somehow context-dependent, and reorganizing a data structure. "

As I understand it, the problem with C extensions and 3.1 is the current
lack of a "collateral evolution" tool like 2to3 for Python code.
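
As a rough picture of the "C code in, standard diff out" workflow, here is a toy sketch in Python. It is not SmPL and not how Coccinelle works: it does a purely textual rename, so unlike a semantic patch it would also rewrite matches inside comments and string literals. The function names rename_calls and make_patch, the sample C source, and the choice of PyString_FromString -> PyUnicode_FromString as the rename are illustrative assumptions only; the right replacement in a real port depends on whether bytes or text is intended.

```python
import difflib
import re


def rename_calls(source, old_name, new_name):
    """Replace whole-word occurrences of old_name (textual, not semantic)."""
    pattern = re.compile(r"\b%s\b" % re.escape(old_name))
    return pattern.sub(new_name, source)


def make_patch(source, old_name, new_name, filename="example.c"):
    """Return a unified diff describing the rename, or "" if nothing changed."""
    updated = rename_calls(source, old_name, new_name)
    diff = difflib.unified_diff(
        source.splitlines(keepends=True),
        updated.splitlines(keepends=True),
        fromfile="a/" + filename,
        tofile="b/" + filename,
    )
    return "".join(diff)


# Hypothetical extension-module fragment using a Python 2 only API.
C_SOURCE = """\
static PyObject *
greet(PyObject *self, PyObject *args)
{
    return PyString_FromString("hello");
}
"""

print(make_patch(C_SOURCE, "PyString_FromString", "PyUnicode_FromString"))
```

A real semantic patch would additionally be able to express context-dependent changes, such as adding an argument whose value depends on the call site, which a plain rename cannot.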

Terry Jan Reedy






dmalcolm at redhat

Nov 18, 2009, 11:09 AM

Post #5 of 5
Re: Static analysis of CPython using coccinelle/spatch [In reply to]

On Tue, 2009-11-17 at 13:03 -0800, Brett Cannon wrote:
> On Mon, Nov 16, 2009 at 12:27, David Malcolm <dmalcolm [at] redhat> wrote:
> > Has anyone else looked at using Coccinelle/spatch[1] on CPython source
> > code?
[snip]

> Running the tool over the code base and reporting the found bugs would
> be appreciated.

Discounting the false positives, the only issue it finds in Python
itself (trunk) is the deliberate mistake in Modules/xxmodule.c.

I also ran it on a random sample of extension modules and found some
real bugs (only reported downstream so far, within Fedora's bug
tracker):
- DBus python bindings assume in one place that "unsigned long" is
  32 bits wide: https://bugzilla.redhat.com/show_bug.cgi?id=538225
- MySQL-python assumes in one place that sizeof(int) == sizeof(long):
  https://bugzilla.redhat.com/show_bug.cgi?id=538234
- rpm.ps.append() uses unrecognized 'N' format specifier:
  https://bugzilla.redhat.com/show_bug.cgi?id=538218
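
For readers wondering why the int/long assumptions matter, the following sketch uses ctypes to mimic what happens when a format code that stores an int is handed the address of a long. It is an illustration under assumptions, not code from any of the modules above; the function name store_int_through_long_pointer is made up for this example. The parser writes only sizeof(int) bytes, so on platforms where long is wider the remaining bytes of the variable keep whatever they held before.

```python
import ctypes


def store_int_through_long_pointer():
    """Write an int-sized zero into a long that starts with all bits set."""
    target = ctypes.c_long(-1)   # the caller's variable, declared as long
    parsed = ctypes.c_int(0)     # the value an "i" format code would store
    # Only sizeof(int) bytes are copied to the address that was passed in.
    ctypes.memmove(ctypes.byref(target), ctypes.byref(parsed),
                   ctypes.sizeof(parsed))
    return target.value


print("sizeof(int)  =", ctypes.sizeof(ctypes.c_int))
print("sizeof(long) =", ctypes.sizeof(ctypes.c_long))
print("long after storing int 0:", store_int_through_long_pointer())
```

Where int and long have the same size the result is 0 and the bug stays hidden; where long is wider (typical 64-bit Linux) the result is a nonzero value, which is why such code can appear to work on 32-bit builds.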

