Dumping a C program's AST with the Psyche-C Compiler Frontend

Written by ltcmelo | Published 2022/03/08
Tech Story Tags: c | c-programming | parsing | compiler | ast | syntax-analysis | c-language | hackernoon-top-story

TLDRPsyche-C is a C compiler frontend that is designed for the implementation of static analysis tools. It provides clean separation between syntax and semantics (with algorithmic- and heuristic-based syntax disambiguation) It provides tolerance against.#include failures, tolerance against `#include` failures, and API inspired by that of the [Roslyn.NET] frontend. But it may as well be used as an ordinary C. parser via the *cnippet* driver. In this little article, I will provide you with a glimpse of the AST that Psyche C produces as the result of parsing a C program.via the TL;DR App

Psyche-C is a C compiler frontend that is designed for the implementation of static analysis tools. It provides a clean separation between syntax and semantics (with algorithmic- and heuristic-based syntax disambiguation), tolerance against #include failures—which is supported by the type inference of structunionenum, and typedef —, and API inspired by that of the Roslyn .NET compiler.

The cnippet driver

Psyche-C’s main use is through a C++ library. Specifically, by any tool that wishes to implement static analysis on C programs. But Psyche-C may as well be used as an ordinary C parser via the cnippet driver. In this little article, I will provide you with a glimpse of the AST (Abstract Syntax Tree) that Psyche-C produces as the result of parsing a C program. But first, a quick illustration on how to invoke cnippet with the cnip command.

void f()
{
    int a
}

The AST of a C program

As the result of parsing a C program, Psyche-C produces an AST. This AST resembles that of the Clang (LLVM) frontend. Yet, a node in Psyche-C's AST does not always have a correspondent in Clang's AST. For instance, consider the program below and its Psyche-C produced AST.

int v, f(int d);

int f(int p)
{
    if (p == v)
        return(p - 1);
    return 0;
}

Now, consider the Clang produced AST for that same C program.

What's the difference between the two?

Psyche-C's AST is a bit closer to the grammar of C. For instance, you can't see declarators in Clang's AST, as they are absorbed by their containing declarations.

To better understand the statement above, consider the following declaration.

int v, f(int d);

In Clang's AST, v is represented by a VarDecl and f by a FunctionDecl; both these nodes inherit from DeclaratorDecl (i.e., a declaration with a declarator). However, in pedantic terms, that representation isn't quite correct, given that only a single declaration — starting at int and ending at ; — exists in that snippet.

In Psyche-C’s AST, a single VariableAndOrFunctionDeclaration is created (like in Clang, this node inherits from DeclaratorDeclaration) with two child nodes: an IdentifierDeclarator, designating object v of type int, and a FunctionDeclarator, designating function f whose return type is int. This is a finer-grained representation of the program syntax and one that enables an accurate rewrite of an AST—a characteristic of prime importance for static analysis tools that perform source code refactoring.

Abstraction and concreteness

Toward statically analyzing a program, the AST produced by a parser should abstract the program syntax at the “right” level. On one side, an AST should express, as much as possible, the meaning of a syntax and ignore grammar technicalities; on another side, an AST should expose any grammar technicality that affects, as least as possible, the meaning of a syntax. Producing an AST at a sweet spot between abstraction and concreteness, with precise conformance to the C Standard, is a primary goal of Psyche-C’s parser.

Give Psyche-C a try… I'd love to hear your feedback. Thanks!


Written by ltcmelo | http://ltcmelo.com
Published by HackerNoon on 2022/03/08