The Beehive, City Place, Gatwick, RH6 0PA, United Kingdom
+44 (0)20 801 74646

Automatic Detection of Proprietary Coding Rule Violations

Modern static code analysis tools for C++ and C provide a multitude of checkers out of the box, capable of detecting many different types of defect and violation. In addition, there are likely to be many configurable parameters that the adventurous (or instructed!) team can tinker with to bend some aspect of the tool's operation closer to their needs. Inevitably, though, situations will arise where the tool simply cannot be tuned to detect a team's particular requirements. These could include specialist situations the tool authors could not have predicted, or the team may simply have requirements that are unique to its environment. In many of these cases, CodeSonar, our static analysis tool for C and C++, does provide a solution: custom checkers.

CodeSonar provides a rich API for creating custom checkers. The API is offered in a number of languages, including C, C++, C#, Java and Python, and it lets your custom checker piggyback its analysis on the existing analysis framework. In more detail: as the analysis automatically traverses the statically valid paths of execution through your codebase (a technique known as symbolic execution), your checker gets to inspect the currently visited source code location. At that point you can extract the values and states of variables and pointers, along with other important characteristics, and use them to detect your particular proprietary issues. Once such an issue has been found, the API also provides methods for annotating the source with English commentary and for registering the issue so that it is reported in the same way as the warnings from any built-in checker.
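The division of labour described above (the framework walks the paths, and your checker inspects each location it is handed) can be illustrated with a toy model. Everything here is hypothetical: `Location`, `Warning`, `Checker`, `run_analysis` and the `timeout_ms` rule are invented for illustration and are not CodeSonar's API. The "paths" are also supplied by hand with concrete values, whereas real symbolic execution derives them and reasons about values symbolically.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy model, NOT the CodeSonar API.
// A Location is one step on one execution path, together with the
// variable values known at that point on that path.
struct Location {
    std::string file;
    int line;
    std::map<std::string, long> values;  // variable name -> known value
};

struct Warning {
    std::string file;
    int line;
    std::string message;
};

// A custom checker is simply a callback invoked at every visited location.
using Checker = std::function<void(const Location&, std::vector<Warning>&)>;

// The "framework": walk every path in order and hand each location to every
// registered checker. Warnings are collected centrally, so custom and
// built-in checks end up reported in the same way.
std::vector<Warning> run_analysis(const std::vector<std::vector<Location>>& paths,
                                  const std::vector<Checker>& checkers) {
    std::vector<Warning> warnings;
    for (const auto& path : paths)
        for (const auto& loc : path)
            for (const auto& check : checkers)
                check(loc, warnings);
    return warnings;
}

// Example proprietary rule (hypothetical): a variable named "timeout_ms"
// must never be zero on any path.
void zero_timeout_checker(const Location& loc, std::vector<Warning>& out) {
    auto it = loc.values.find("timeout_ms");
    if (it != loc.values.end() && it->second == 0)
        out.push_back({loc.file, loc.line, "timeout_ms may be zero on this path"});
}
```

Adding a further rule in this model means registering another callback; the traversal itself does not change, which is the essence of piggybacking on an existing framework.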

However, some defects in code are more stylistic in nature, closer to standards compliance checks such as many of those in the well-known MISRA C coding guidelines. These checks are concerned with avoiding or mandating particular language constructs, as in the self-explanatory rule 16.4, “Every switch statement shall have a default label”. In these cases, the deep capabilities of CodeSonar’s symbolic execution analysis are not required to detect the violation. All that is needed is to pick out the presence (or absence) of keywords or other source code tokens, and to issue a warning highlighting any deviation from the expected rule. For this kind of check, the checker author can rely on the Abstract Syntax Tree (AST). The AST is the source code rearranged into a grammatically equivalent hierarchical form: the code is organised into tree nodes, and various properties of the code are stored as attributes on those nodes. Virtually every compiler builds an AST internally, but whereas most compilers keep it hidden from developers, CodeSonar exposes its AST to checker authors. The purpose of the AST is to present the source code in a form that subsequent analysis can traverse and analyse far more easily than unstructured, untokenised source files.
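To show how a purely syntactic rule such as 16.4 reduces to a tree search, here is a toy check over a hypothetical, much-simplified AST. The `AstNode` type, its `kind` strings and both functions are assumptions made for illustration, not CodeSonar's node types or API. Because a `default` label belongs to the innermost enclosing `switch`, the search deliberately does not descend into nested `switch` statements when deciding whether the outer one has a default.

```cpp
#include <string>
#include <vector>

// Hypothetical, heavily simplified AST node: NOT CodeSonar's AST.
struct AstNode {
    std::string kind;  // e.g. "switch", "case", "default", "compound"
    int line = 0;
    std::vector<AstNode> children;
};

// Does this subtree contain a default label belonging to *this* switch?
// Nested switch statements own their own labels, so we do not descend into them.
bool has_own_default(const AstNode& n) {
    for (const auto& c : n.children) {
        if (c.kind == "default") return true;
        if (c.kind != "switch" && has_own_default(c)) return true;
    }
    return false;
}

// Collect the line number of every switch statement lacking a default label.
void find_switch_without_default(const AstNode& n, std::vector<int>& lines) {
    if (n.kind == "switch" && !has_own_default(n))
        lines.push_back(n.line);
    for (const auto& c : n.children)
        find_switch_without_default(c, lines);
}
```

No path or value information is involved: the rule is decided entirely by the shape of the tree, which is why this style of check does not need symbolic execution.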

As an example of such a checker, let’s consider a rule that forbids declaring C++ class member variables as public. We will use the C++ version of the API, as it offers several advantages over the C-based API (easier debugging and simpler memory management, to name two). Our custom checker simply needs to look at each class definition in the codebase and issue a warning each time a member variable is declared with public access. So the first thing we need to do is understand the AST hierarchy well enough to locate the offending variables. Fortunately, CodeSonar provides a routine to dump the AST for the currently visited location. Before we show that, here is a simple example program that contains two violations of this custom rule:
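For readers without the accompanying listing, a program of the kind described might look like the following. This is a hypothetical reconstruction: the class and member names follow the AST discussion below, but which members are public (and the `getB` accessor) are assumptions.

```cpp
// Hypothetical reconstruction: two classes, each containing one member
// variable that violates the "no public member variables" rule.
class classA {
public:
    int intA = 0;   // violation: public member variable
    int getB() const { return intB; }  // public member *function*: allowed
private:
    int intB = 0;   // fine: private
};

class classB {
protected:
    int intC = 0;   // fine: protected
public:
    int intD = 0;   // violation: public member variable
};
```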

Yes, this is a simple example, but as with many simple bugs, once the constituent parts are distributed through longer, more densely packed code, possibly across multiple files, they become much harder to detect manually.

With the above in mind, here is an abbreviated snapshot of the corresponding AST (the full AST and the example code can be downloaded at the end of this post):

The actual AST dump contains much more detail, with many more nodes, children and attributes (it is very revealing to see how much the compiler augments the original source code), but this is enough to see clearly how the AST matches up with our example source code. At the root of the AST for the source file we show the first two children, source-file and file-scope. Source-file holds general file information such as name and directory. File-scope contains what we really care about: as the name implies, this subtree contains the entities declared at, wait for it… file scope! I only show the types child here; there are several other child nodes beneath file-scope.

Following the AST logic, the types child (called “types:(cc:ast-list)”) presents the detailed set of types declared at file scope. Again, the AST contains more types children than you might expect, but I’m showing only the relevant parts. Following the next “+” after “types:(cc:ast-list)”, the first type is a class called type-info, another example of something the compiler injects into the source on our behalf. The next two children of types are our two classes, “classA” and “classB”. In each case, we can see a further set of children detailing the class members (intA, intB, intC and intD). It is at this level that we also get the all-important attributes (again, reduced for clarity), including the access modifier applied to the member variable they belong to.
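To make that hierarchy concrete, here is a toy C++ model of it. The `Node` struct, the node names and the `find_path` helper are hypothetical illustrations, not CodeSonar's AST representation or API; the point is only that reaching a member's access attribute is a matter of following child links from file-scope, through types, to the class and then the member.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical toy model of the hierarchy described above. Node names loosely
// follow the dump, but this is NOT CodeSonar's AST API.
struct Node {
    std::string name;                          // e.g. "file-scope", "classA", "intA"
    std::map<std::string, std::string> attrs;  // e.g. {"access", "public"}
    std::vector<Node> children;
};

// Follow a sequence of child names starting at `root`.
// Returns nullptr as soon as any step along the path is missing.
const Node* find_path(const Node& root, const std::vector<std::string>& path) {
    const Node* cur = &root;
    for (const auto& step : path) {
        const Node* next = nullptr;
        for (const auto& c : cur->children) {
            if (c.name == step) { next = &c; break; }
        }
        if (next == nullptr) return nullptr;
        cur = next;
    }
    return cur;
}
```

For example, `find_path(root, {"file-scope", "types", "classA", "intA"})` walks the same route described in the text and yields the member node whose `attrs` hold its access modifier.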

With this understanding of the AST in place, we can now write our custom checker. As mentioned in a previous blog post (“Writing Custom Code Checkers in CodeSonar”), some boilerplate code is necessary, which I’ll skip here and step straight into the interesting part. For ease of explanation, I have annotated the code below with explanatory comments:
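The checker's logic can also be sketched independently of CodeSonar's API. The following toy version walks a hypothetical in-memory tree and reports every member variable whose access attribute is `public`; `Node`, `Warning` and `check_public_members` are invented for illustration and are not CodeSonar's C++ API, and the `"class"` and `"field"` kind strings are assumptions.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical toy AST, NOT the CodeSonar API: each node has a kind, a name,
// string attributes (such as "access") and children.
struct Node {
    std::string kind;  // "file-scope", "types", "class", "field", "method", ...
    std::string name;
    std::map<std::string, std::string> attrs;
    std::vector<Node> children;
};

struct Warning {
    std::string cls;     // class containing the offending member
    std::string member;  // offending member variable
};

// Visit every class definition and report each member *variable* ("field")
// whose access attribute is "public". Member functions are ignored, and
// nested classes are reached by the recursive walk over all children.
void check_public_members(const Node& n, std::vector<Warning>& out) {
    if (n.kind == "class") {
        for (const auto& m : n.children) {
            auto it = m.attrs.find("access");
            if (m.kind == "field" && it != m.attrs.end() && it->second == "public")
                out.push_back({n.name, m.name});
        }
    }
    for (const auto& c : n.children)
        check_public_members(c, out);
}
```

Warnings come out in traversal order. In a real checker, the `push_back` is where the API's warning-registration call would go, so that the finding is reported like any built-in warning.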

After compiling the checker and configuring CodeSonar to use it, CodeSonar will report occurrences of this new warning exactly as it would for any out-of-the-box checker.

And in terms of the actual warning detail, it would appear as follows:

A text file containing the example program code and the AST from this post is available for download.
