Dumping a C program's AST with the Psyche-C Compiler Frontend

Psyche-C is a C compiler frontend that is designed for the implementation of static analysis tools. It provides a clean separation between syntax and semantics (with algorithmic- and heuristic-based syntax disambiguation), tolerance against #include failures—which is supported by the type inference of struct, union, enum, and typedef —, and API inspired by that of the Roslyn .NET compiler.

**The cnippet driver**

Psyche-C’s main use is through a C++ library. Specifically, by any tool that wishes to implement static analysis on C programs. But Psyche-C may as well be used as an ordinary C parser via the cnippet driver. In this little article, I will provide you with a glimpse of the AST (Abstract Syntax Tree) that Psyche-C produces as the result of parsing a C program. But first, a quick illustration on how to invoke cnippet with the cnip command.

void f()
{
    int a
}

The AST of a C program

As the result of parsing a C program, Psyche-C produces an AST. This AST resembles that of the Clang (LLVM) frontend. Yet, a node in Psyche-C's AST does not always have a correspondent in Clang's AST. For instance, consider the program below and its Psyche-C produced AST.

int v, f(int d);

int f(int p)
{
    if (p == v)
        return(p - 1);
    return 0;
}

Now, consider the Clang produced AST for that same C program.

What's the difference between the two?

Psyche-C's AST is a bit closer to the grammar of C. For instance, you can't see declarators in Clang's AST, as they are absorbed by their containing declarations.

To better understand the statement above, consider the following declaration.

int v, f(int d);

In Clang's AST, v is represented by a VarDecl and f by a FunctionDecl; both these nodes inherit from DeclaratorDecl (i.e., a declaration with a declarator). However, in pedantic terms, that representation isn't quite correct, given that only a single declaration — starting at int and ending at ; — exists in that snippet.

In Psyche-C’s AST, a single VariableAndOrFunctionDeclaration is created (like in Clang, this node inherits from DeclaratorDeclaration) with two child nodes: an IdentifierDeclarator, designating object v of type int, and a FunctionDeclarator, designating function f whose return type is int. This is a finer-grained representation of the program syntax and one that enables an accurate rewrite of an AST—a characteristic of prime importance for static analysis tools that perform source code refactoring.

Abstraction and concreteness

Toward statically analyzing a program, the AST produced by a parser should abstract the program syntax at the “right” level. On one side, an AST should express, as much as possible, the meaning of a syntax and ignore grammar technicalities; on another side, an AST should expose any grammar technicality that affects, as least as possible, the meaning of a syntax. Producing an AST at a sweet spot between abstraction and concreteness, with precise conformance to the C Standard, is a primary goal of Psyche-C’s parser.

Give Psyche-C a try… I'd love to hear your feedback. Thanks!