Ticket #2553 (new action item)

Opened 6 years ago

Last modified 6 years ago

Select a new Style Checker for C++

Reported by: robyn Owned by: robyn
Priority: normal Milestone:
Component: TCT Keywords:
Cc: gpdf@…, ktl@…, mjuric@…, rplante@…, rhl@…, rowen@…, smm@… Blocked By:
Blocking: Project: LSST
Version Number:
How to repeat:

not applicable

Description

DM has been without a style checker since the Parasoft contract was allowed to lapse.

DMTF-1283 task is "Evaluate and Select Coding Standards Tools". This ticket will report on a recent software review and ask the TCT to comment on the proposed selection.

Change History

comment:1 Changed 6 years ago by robyn

  • Cc gpdf@…, ktl@…, mjuric@…, rplante@…, rhl@…, rowen@…, smm@… added

The possibilities for C++ Style Checkers were to be narrowed to those which fit the constraints for: zero cost, public domain software; user-defined rule specification; modes to turn on/off selected rules; efficiency in setup and execution; and comprehensible error reports. The two tools selected for detailed consideration: Google's "cpplint" and Steve Bickerton's "style.py", were compared against the DM C++ Coding Standards and tested using the DM code.

Of particular note: "style.py" was written explicitly for the DM C++ coding style standard as a developer desktop alternative to Parasoft's enterprise tool, "cppcheck", which provided standards checking, static code analysis and common coding error detection. "Cppcheck" was abandoned in 2011 due to its continuing problems handling DM's complex template specification and use.

In both tools a generic rule specification scenario is missing. Instead each rule is hand coded. Embedded functions provide support for token parsing; general C++ language structure recognition; and tailoring of naming rules.

Out of 123 total Rules specified in DM's style standard, Google's "cpplint" roughly mapped 44 rules. However in almost every rule there was a difference in the specification requiring hand coding. For example, global variables are checked but against a different syntax; function names are checked but against a different syntax, and so on.

Bickerton's "style.py" currently supports 50 DM Style rules out of 123 total. The missing 85 rules were annotated with the reasons they were not tested, e.g. can't test (35), too vague or not applicable (8), too tricky to test (7), no template to test failure (7), etc. The rules which were deemed "can't detect" were mostly guidance rules: write good code, use object oriented design, use 'find' in names that look-up, no 'dual-meaning' variables.

Steve's thorough review of the rules pointed out a number which need either to be made more explicit and testable or should be placed in an introductory paragraph describing the overall guidance for the rules defined in the particular subsection.

I will take these results to the TCT for discussion and final selection. In my opinion, there isn't any valid reason to choose other than "style.py" since it exactly covers the current DM Style Guide to the extent that the rules permit automation.

When we start style checking is a separate matter which should also be discussed by the TCT.

comment:2 Changed 6 years ago by robyn

From: 	Steve Bickerton <steven.bickerton@gmail.com>
Subject: 	Re: [LSST-data] TCT meeting Th 17 Jan @ noon PT
Date: 	Wed, 16 Jan 2013 11:49:16 +0900

I haven't actually been maintaining the style.py script, but it's written entirely in native python, so I don't anticipate anything being *too* broken. At the time, I think we were using gcc 4.4 so any of the new c++ features might cause trouble.

The script uses regular expressions to identify violations, but I remember that became a bit hackish when looking for style violations that occurred over multiple lines. If you're keen on using this, we'll likely want to give it a proper design review (those didn't exist when it was written) and clean it up a bit (possibly restructuring). There are likely better ways to implement what I did (open source parsers like flex/bison come to mind), and no doubt my pattern matching foo could be improved upon by a regex guru.

From: 	"Robyn Allsman" <robyn@noao.edu>
Subject: 	Re: [LSST-data] TCT meeting Th 17 Jan @ noon PT
Date: 	Wed, 16 Jan 2013 08:00:32 -0700

The beauty of your style checker is that it already works for many of the Rules. Its cost is optimal - minimal resources needed to get it up and running.

I have only used it on a couple of simple packages but I'll give afw and meas_algorithms a try to check its robustness.

A rewrite/redesign is not envisioned unless dealing with multi-line rules become a problem. In that case, I'll delve more deeply into another support library which provides the framework for multi-line analysis support. The framework described per line tokenizing and block/multi-line tokenizing as essentially separate aspects. Perhaps I should find that reference again...

comment:3 Changed 6 years ago by robyn

From: 	Jim Bosch <jbosch@astro.princeton.edu>
Sender: 	lsst-data-bounces@lsstcorp.org
Subject: 	Re: [LSST-data] TCT meeting Th 17 Jan @ noon PT
Date: 	Thu, 17 Jan 2013 11:59:59 -0500

On 01/15/2013 03:24 PM, Robyn Allsman wrote:

2) Style Checker proposal - We will lead with a discussion on whether instituting routine style checking should begin now or at later date. If we decide to start style checking now, the recommendation is to use Steve Bickerton's style.py which covers 50 of the 123 DM rules. Steve annotated each ignored rule with comments on outright impossibility, ambiguity, level of difficulty, etc. Another tool examined in detail was Google's cpplint which needs lots of work to have it conform to our style specs; it covers about 43 DM rules.

My 2 cents on this:

  • I see very little potential for gaining much value from an automated style checker. The best chance at making good use of one would require a tool that only grepped for particularly common mistakes (which might indeed be what this one is) and was only considered an optional tool to be used by code reviewers.
  • I see a potential for a great deal of time-wasting to come of this, especially if we have a parser that does not understand the full complexity of C++. We already have two non-compiler parsers that suffer from this problem (Doxygen and SWIG) that we can't get rid of, but the fact that they don't understand C++ as well as a compiler already leads to a lot of headaches. If a style checker is run in any kind of automated way I think it's almost certain we'll spend more time fending off false-positive email than fixing our code.
  • The #1 thing we could do to improve the style/standard conformance of new code is fix old code to adapt to standards we have changed/established (e.g. ::Ptr, Doxygen in headers instead of source files), and for the most part that's easy to grep for by hand (though Steve's tool may indeed be useful - I don't know enough about it to say). New developers (and old developers) write new code by looking at old code.
Note: See TracTickets for help on using tickets.