wiki:TCTCodingStandardsChanges
Last modified 9 years ago Last modified on 12/01/2009 01:23:19 PM

Proposal for Changes to the DM C++ Coding Standards

Words to Live By

In general, I'm against setting down policies that don't attempt to improve robustness or maintainability: if it truly doesn't matter, we don't need a policy. Thus, I can't vote .... until someone provides a compelling justification for how it [new rule or its modification] helps. (Ray)

References

  • C++ Coding Standards - statement of LSST DM C++ Coding Standards.
  • Coding Standards divided into 'Required' and 'Desirable' with implementation status
    • CodingStandardsRequired - rules which are required unless there is significant justification to do otherwise in the specific situation.
    • CodingStandardsDesirable - rules which are highly recommended but can be neglected for aesthetics of layout or structural consistency, etc.

New Rule 1 (namespace *::detail)

Implementation-specific globals should go in namespace *::detail

RHL

Justification

There are times that implementation-specific details need to be globally visible (i.e. can't be in the private part of a class, or be declared static or in an anon namespace in a single file).

For example, the fits i/o code in lsst::afw::image uses boost::gil internals but needs to be in a header file included by both Image.cc and Mask.cc; there are also Image traits classes.

I propose that, in keeping with the boost convention, that such global information be consigned to a *::detail namespace (in this case, lsst::afw::image::detail). We should, of course, strive to minimise the amount of such information.

Action Requested

Add this new Rule to the DM C++ Coding Standards.

Discussion

TCTCodingStandardsChanges


New Rule 2 (pre- vs post- increment/decrement)

Pre-increment and Pre-decrement should be used.

Sponsor

RHL

Justification

If you write iter++ the method is required to make a copy of iter before incrementing it, as the return value is the old value. If iter is a pointer this is cheap and probably inlined (and thus optimised away) but for complex objects it can be a significant cost. The convention for STL code is to always pre-increment, and we should follow it. (See E.g. Meyers, More Effective C++, item 6)

This is only a recommendation; there are times when you ""do"" need the old value, and in that case postfix ++ is exactly what you want.

Action Requested

Add this new Rule to the DM C++ Coding Standards.

Discussion

TCTCodingStandardsChanges


New Rule 3 (define const Ptr)

Add 'const Ptr' definition whenever defining 'Ptr'.

Sponsor

RHL

Justification

The declaration Foo::Ptr const cptr doesn't declare a (shared) pointer to const Foo, it declares a const shared pointer to mutable Foo. We need a way to refer to pointers to const objects, and this ConstPtr typedef provides it in a convenient form parallel to Ptr.

Action Requested

Add this new Rule to the DM C++ Coding Standards.

Discussion

TCTCodingStandardsChanges


Existing Rule 3-3 (named constants)

Named constants (including enumeration values) must be all uppercase using underscore to separate words.

MAX_ITERATIONS, COLOR_RED

Common practice in the C++ development community. In general, the use of such constants should be minimized. In many cases implementing the value as a method is a better choice.

int getMaxIterations() {
    return 25;
}

This form is both easier to read, and it ensures a unified interface towards class values.

Action Requested

Major clarification needed.

Discussion

TCTCodingStandardsChanges


Existing Rule 3-5 (typedefs)

A name representing a typedef must be initial letter capitalized, camel-case with no prefix of the enclosing class and no suffix of 'T' or "Type'.

typedef unsigned char Byte;
typedef unsigned long BitMask;
Byte smallMask;

This syntax is consistent with template type names and classes which are also similar in usage.

Action Requested

Allow 'typedef' alias used by a 'template typename' to end with 'T' or 'Type'

Discussion

TCTCodingStandardsChanges


Existing Rule 3-6 (namespace names)

Names representing namespaces should be all lowercase and based on component or directory name. ====

analyzer, iomanager, mainwindow

Common practice in the C++ development community.

Action Requested

Replace existing Rule with following entry.

Names representing namespaces will be camelCase starting with lowercase and based on component or directory name. The original package developer will specify in the .cc file the preferred abbreviation to use and, optionally, also use it throughout his code.

The original developer may consider using the following guideline to fabricate the name

  • remove the preliminary 'lsst'
  • concatenate the remaining fields
  • if desired to make shorter, abbreviate each field while still maintaining a relevant word
    namespace pexLog = lsst::pex::logging;
    namespace afwMath = lsst::afw::math;
    

Two options are available for using a namespace when defining symbols

  • Specify the namespace explicitly in the definition.
    lsst::foo::bar::myFunction(...) {...};
    
  • Use an abbreviation for the namespace.
    namespace fooBar lsst::foo::bar;
    fooBar::myFunction(...) {...}
    
  • Putting the definitions into a namespace block is not recommended
     namespace lsst{ 
     namespace foo{ 
     namespace bar{      // NO
        myFunction(...) {...};
     }}} // lsst::foo::bar
    

Sponsor

Russell

Justification

Most DM code already defines namespace aliases using camelCase with lowercase leading character. And, as Russell pointed out, on May 7 2008 the TCT agreed to alter this Rule.

Discussion

TCTCodingStandardsChanges


Existing Rule 3-9 (globals)

Global variables should be avoided and if used always be referred to using the "::" operator.

Example: ::mainWIndow.open(), ::applicationContxt.getName()

In general, the use of global variables should be avoided. Consider using singleton objects instead. Only use where required (i.e. reusing a framework that requires it.) See Rule 5-7.

Existing Rule 5-7 Use of global variables should be minimized. In C++ there is no reason that global variables need to be used at all. The same is true for global functions or file scope (static) variables. See also Rule 3-9.

Action Requested

Clarification of intent. Possibly delete second part of rule: "and if used always be referred to using the "::" operator".

Discussion

TCTCodingStandardsChanges


Existing Rule 3-10 (private class variables/methods)

Private class variables must have underscore prefix.

class SomeClass
{
private:
    int  _length;
}

Apart from its name and its type, the scope of a variable is its most important feature. Indicating class scope by using underscore makes it easy to distinguish class variables from local scratch variables. This is important because class variables are considered to have higher significance than method variables, and should be treated with special care by the programmer.A side effect of the underscore naming convention is that it nicely resolves the problem of finding reasonable variable names for setter methods and constructors:

void setDepth(int depth){
    _depth = depth;
}

An issue is whether the underscore should be added as a prefix or as a suffix. Both practices are commonly used. Since LSST Data Management uses both C++ and Python as implementation languages, prefixing the underscore is recommended in order to maintain conformity with Python's naming convention where variables with leading underscore are treated specially. Care must be given to avoid using a reserved name.

It should be noted that scope identification in variables has been a controversial issue for quite some time. It seems, though, that this practice now is gaining acceptance and that it is becoming more and more common as a convention in the professional development community.

Action Requested

Either require or allow '_' prefix on private class methods.

Discussion

TCTCodingStandardsChanges


Existing Rule 3-24 (boolean naming)

The prefix 'is' should be used for boolean variables and methods. ====

isSet, isVisible, isFinished, isFound, isOpen

Common practice in the C++ development community and partially enforced in Java.

Using the 'is' prefix solves a common problem of choosing bad boolean names like 'status' or 'flag' . 'isStatus' or 'isFlag' simply doesn't fit, and the programmer is forced to choose more meaningful names.

There are a few alternatives to the is prefix that fit better in some situations. These are the 'has', 'can' and 'should' prefixes:

bool hasLicense();
bool canEvaluate();
bool shouldSort();

Action Requested

Add another prefix, 'do'.

Discussion

TCTCodingStandardsChanges


Existing Rule 3-29 (enum constants)

3-29. Enumeration constants can be prefixed by a common type name. ¶

enum Color {
    RED,
    GREEN,
    BLUE
};
favoriteColor = Color::RED;

This gives additional information of where the declaration can be found, which constants belongs together, and what concept the constants represent.

Action Requested

Change enum variable name to 'color' to conform to Rule 3-2.

Discussion

TCTCodingStandardsChanges


Existing Rule 4-5 (inline functions)

Inline functions are prohibited except for simple accessors/mutators that get or set a simple attribute (see example).

All others must be approved by the DM Project Scientist, and will only be approved where there is no other way to meet performance requirements.

If used, inline functions must be very simple, with bodies of 10 lines or less.

#ifndef LSST_FOO_H
#define LSST_FOO_H

class Foo {
public:
    Foo();
    virtual ~Foo();
    int getValue() const { return value; };
    int getAnotherValue() const;

private:
    int value;
    int anotherValue;
};

#endif // LSST_FOO_H

When choosing whether to inline, think about balancing compile-time and run-time performance. Be careful to avoid requiring inclusion of additional .h files (use forward declaration if needed).

Action Requested

Allow inline block to be empty, '{}'.

Discussion

TCTCodingStandardsChanges


Existing Rule 4-6 (max line length)

File content must be kept within 110 columns. The restriction to 80 columns is no longer as much a consideration as a common dimension for editors, terminal emulators, printers and debuggers, and so on. However, even with multi-window environments and current displays it is often the useful to have multiple source windows open side by side, and limiting the number of characters facilitates this. It improves readability when unintentional line breaks are avoided when passing a file between programmers.

Action Requested

Change max line length from 110 to 120

Discussion

TCTCodingStandardsChanges


Existing Rule 5-10 (const location)

The 'const' keyword should be listed after the type name.

void f1(Widget const *v)     // NOT: void f1(const Widget *v)

This is for a mutable pointer to an immutable Widget. Stroustrup points out one advantage to this order: you can read it from right to left i.e. "v is a pointer to a const Widget". Of course this is different than:

Widget * const p

Which is an immutable pointer to a mutable Widget. Again, the right-to-left reading is pretty clear, so this and the above reinforce each other.

Action Requested

Should it be altered or left unchanged.

Discussion

TCTCodingStandardsChanges


Existing Rule 5-12 (testing floats)

Floats and doubles should never be directly test for equality.

if (value == 1.0)                //Subject to round-off error
if (fabs(value - 1.0) < epsilon) //OK

Round-off makes it difficult for two floating point numbers to be truly equal. Always use greater than or less than. A utility method like 'boolean closeEnough(value1,value2)' would be useful.

Action Requested

Clarification, possibly.

Discussion

TCTCodingStandardsChanges


Existing Rule 5-14 (loop control statement)

Only loop control statements must be included in the 'for()' construction. Use of post-increment and post-decrement is preferred but not required. Loop variables should be declared in loop scope.

// YES:
int sum = 1;
for (int i = 0; i < 100; i++)
    sum += value[i];

//NO:
for (int i = 0, int sum=0; i < 100; i++)
    sum += value[i];

Increase maintainability and readability. Make it crystal clear what controls the loop and what the loop contains.

Action Requested

Clarification:

  • Prefer use of pre-increment and pre-decrement
  • Strike "Only loop control statements must be included in the for() construction".

If you write ++iter the method is required to make a copy of iter before incrementing it, as the return value is the old value. If iter is a pointer this is cheap and probably inlined (and thus optimised away) but for complex objects it can be a significant cost. The convention for STL code is to always pre-increment, and we should follow it. (See E.g. Meyers, More Effective C++, item 6)

Discussion

TCTCodingStandardsChanges


Existing Rule: 5-30 (magic numbers)

The use of magic numbers in the code must be avoided. Numbers other than 0 and 1 should be declared as named constants. See also Rule 3-3.

If the number does not have a obvious meaning by itself, the readability is enhanced by introducing a named constant instead. A different approach is to introduce a method from which the constant can be accessed.

Action Requested

Change wording from 'must be avoided' to 'should be avoided'.

Discussion

TCTCodingStandardsChanges


Existing Rule: 6-5 (block format for class definition)

The class declarations should have the following form:

class SomeClass : public BaseClass {
public:
    ...
protected:
    ...
private:
    ...
}

This follows partly from the general block rule [ 6-4 ] below. The declarations 'public', 'protected' and 'private' are left-justified.

[ Rule 6-4 ]

Block layout should be as illustrated in example 1 below (K&R, strongly recommended) not as in example 2 or 3.

1

while (!done) {
    doSomething();
    done = moreToDo();
}

2

while (!done)
{
    doSomething();
    done = moreToDo();
}

3

while (!done)
    {
      doSomething();
      done = moreToDo();
    }

Example 3 introduces an extra indentation level which doesn't emphasize the logical structure of the code as clearly as example 1. Example 2 adds an additional line without significant increase in readability.

Action Requested

Allowing (or possibly requiring) the opening brace for functions and classes to be on the next line rather than the end of the line.

Discussion

TCTCodingStandardsChanges


Existing Rule: 6-6 (block format for functions)

The function declarations should have the following form:

void someMethod() {
    ...
}

This follows from the general block rule [ 6-4 ] above. ===

Action Requested

Allowing (or possibly requiring) the opening brace for functions and classes to be on the next line rather than the end of the line.

Discussion

TCTCodingStandardsChanges


Discussion

New Rule 1

Russell Owen

I agree. One use case is unit-testing functions that are intended for internal use. (Though I'm not sure whether such functions were intended to be covered by the term "globals").


New Rule 2

Ray

I vaguely remember reading a suggestion that pre was better when it didn't matter; however, it didn't strike me so compelling a reason at the time.

In general, I'm against setting down policies that don't attempt to improve robustness or maintainability: if it truely doesn't matter, we don't need a policy. Thus, I can't vote for this until someone provides a compelling justification for how it helps.

Russell Owen

I agree with RHL. As Josuttis says for one of his STL iterator examples: "Note that the preincrement operator (prefix ++) is used here. This is because it might have better performance than the postincrement operator. The latter involves a temporary object because it must return the old position of the iterator. For this reason, it generally is best to prefer ++pos over pos++."

It is much easier to keep track of one way of doing things than trying to have a separate rule for plain pointers vs. iterators and other complex classes, so I agree with RHL that the prefix version should always be preferred as long as the postfix is not needed (i.e. in some clever expression where it is used in the right hand side of an expression).

Mike Jarvis

If we do have this kind of rule, it seems to me that it should only ever apply to places where the return value is not used. If the return value is used, presumably the programmer understood which one was needed.

Ray Plante

Recommended wording:

When incrementing or decrementing a variable with ++ or --, the prefix operator (e.g.++i) should be preferred whenever the result of the operation is not saved.


New Rule 3

KTL

I'm having second thoughts about the Ptr typedef, let alone the ConstPtr typedef. The problem is that using Foo::Ptrs in other modules requires that the providing Foo class's Foo.h file be included, since that is where the typedef is defined. Unless the providing class is a strict pimpl-style interface, this inclusion creates an often-unnecessary dependency on the implementation details of the Foo class. Using boost::shared_ptr<Foo>, while a lot more wordy, does not require inclusion of Foo.h and thus avoids this dependency.


3-3

KTL

You should also include the equivalent "<type> const <name>(<value>)" or disallow it. I am a little reluctant to enforce this in cases where <type> is other than a numeric type, std::string, or char const*. The purpose of the all-caps is to distinguish selectors and parameter constants from other variables, but if <type> is something more complex (e.g. boost::regex), the possibility of confusion seems greater.

Bick

I have to disagree with rule defining

the usage of 'const' (3-3). Or, at least I disagree that all uses of 'const' should be considered named constants. My understanding (outlined in Meyers) is that variables should always be declared const when possible. Rule 3-3 appears to discourage their use. I tend to think of named constants much like C #define statements. I'm happy to use all-caps for any variable resembling a C #define, but for garden variety 'const' variables I'd prefer to avoid such a style convention.

Now, I'm not actually all that worried about this, and if the decision is to go ahead with all-caps, that's fine. But, I do think the wording of the rule should then be changed to remove the statement saying that "the use of such constants should be minimized". To the best of my knowledge, that contradicts the current c++ coding practices.

Bick

### Literal constant '2' is used (Bick)

The line in question is "int const xsig = 2;". I suspect parasoft has misunderstood something. If this isn't permitted, it would seem we can't use actual numbers anywhere at all.

### Constant variable's name 'wid' should be uppercase (Bick)

Should it be all-caps (WID), or first-letter caps (Wid) ? I tend to thing of all-caps as denoting #define'd preprocessor quantities.

KTL

I think the goals of this rule and 5-30 (q.v.) are:

  • Always use named constants in place of numbers.
  • Do not use #define preprocessor directives to create constants.
  • Use int const etc. instead.
  • Keep naming things as if they were traditional #defines.

The confusion comes when developers add const to ordinary variables that are not intended to be constants in the #define sense. I agree with Russell that these should probably not be all caps (or first-letter caps).


3-5

RHL & Bick

In:

template<typename PixelT> class SetPcaImageVisitor? : public afwMath::CandidateVisitor? {

typedef afwImage::Image<PixelT> ImageT; typedef afwImage::MaskedImage<PixelT> MaskedImageT;

I'm getting complaints about the ImageT typedefs. They seem reasonable to me; they are basically the same as the template argument, so calling them things like ImageT helps the reader.

It was my understanding that 'T' and 'Type' suffixes are acceptible for templates but not typedefs. If that's correct, do we want/need a different nomenclature spec to distinguish the two?

Russell Owen

I am in favor of the current rules. These seems to treat a typedef as a class, rather than a template parameter, which I find very readable. However, it does seem that at least sometimes the template parameter name is just another class, so perhaps we should allow template parameter names to be like class names (no trailing T). We could even consider requiring it, but before going that far I'd want to think carefully about the different kinds of template parameters that we use.


3-6

Russell Owen

Names such as afwMath are a standard accepted by the TCT based on my own proposal for standardizing namespace abbreviations. At the time I had no idea that it violated a C++ coding guideline (and, apparently, neither did the TCT). I suggest we simply replace the current rule with the proposal that was already accepted.


3-9

RHL

Whenever a global function is referenced, use the :: operator

Do we really want to require ::sqrt() and ::cos()?

Fergal

Don't we already have an exception for the std:: namespace?

KTL

If we #include <cmath> instead of #include <math.h> (and equivalently for cstdlib, ctime, etc.), then I believe all of the functions that RHL is concerned about are imported into std:: and the exception Fergal mentions applies. I would recommend keeping 3-9 and adding a recommendation to use the "c" versions of the legacy includes.

Mike Jarvis

I agree that cmath (et al) should be recommended over the legacy versions, but I want to caution about making them required. One case I came across was trying to find a portable way to implement tests for nan. I had trouble when using cmath, but was able to find a solution with math.h.

Russell Owen

I'm nervous about allowing the use of math.h instead of cmath to work around limitations in cmath because it seems unlikely to be sufficiently portable.

3-10

RHL

Regarding:

85:   The incorrect member function name _registry was found  LsstDm-3-4b-3

It's a private method; shouldn't I be able to use the same convention as private variables?


3-24

RHL

Regarding:

779:  Boolean type variable '_debias' shall begin with 'is'.   LsstDm-3-24a-5
bool _debias isn't an "is", it's a way of specifying behaviour.  So I
think that this rule needs softening.

[RAA] Here the output error message was wrong and I updated it after your code check. The Rule, as currently stated and implmented allows: 'is', 'can', 'has', 'should' as prefix. In this case, the private variable might be "_shouldDebias" or "_hasDebias".

Bick

It seems 'debias' represents something to be done. Would a 'do' prefix be appropriate ... ie. 'doDebias' (or '_doDebias' in this case) ?

Mike Jarvis

A 'do' prefix doesn't sound like a boolean to me. It sounds like a function. Also, it seems to me that any boolean the involves a do concept is a bit ambiguous about what it means. Is it: 'shouldDoSomething', 'canDoSomething', or maybe even 'hasDoneSomething'? So I recommend keeping the current rule and putting it in one of these forms.

If we want to make the rule include more prefixes, I think we should keep to the pattern of starting with a third person present tense verb. Not imperative.

Daniel Wang

"do" sounds like a function to me too (some C++ pedagogy agrees with this). I'd prefer using "has" or "should" in the example case. If those don't work, how about "enable" or "needs"?

Russell Owen

I like "do"; I think it makes a very imperative (meaning must do). But I would be OK with "must" instead. None of the other proposed replacements is as clear to me and most are already handled: "shouldDo" is already nicely handled by "should". "canDo" is already nicely handled by "can". "hasDone" could be more clearly expressed by "did" -- which should be on our list!

To be explicit: I propose adding "do" (with a fallback to "must") and "did" to our list.

Regarding confusing it with a function: I do not see this as a serious concern. If you know the type there is no possibility of mistake. Also our standard suggests "compute" as a prefix for a function name rather than "do".

KTL

I didn't realize initially that this rule applies to methods as well. What about methods like "exists()" or "contains()" which seem to be clear boolean state verbs? Rewriting these to start with one of the suggested verbs seems excessive. ("isPresent()"?)


3-29

Robyn

The 'enum' example conflicts with Rule 3-2.

Mike Jarvis

This example seems to be the right one, and 3-2 seems wrong to me. I think of enums as types. And types are supposed to start with a capital letter.

So it seems to me that the example in 3-2 should be changed to:

enum Account { SAVINGS, CHECKING };
Account savingsAccount;

or perhaps what was meant was an anonymous enum:

enum { SAVINGS, CHECKING } account;

4-5

RHL

I object to this one. The code in question is:

explicit IdSpan(int id, int y, int x0, int x1) : id(id), y(y), x0(x0),
x1(x1) {}

a trivial inline constructor; it's simpler to just put it inline. If I moved it, it'd move the member initialisation away from the declarations.

Mike Jarvis

I'd like to see the no-approval-needed case expanded to allow all functions that consist of at most a single (non-looping) statement. Most such functions will be accessors and mutators, but there are other possibilities as well.


4-6

KTL

I prefer 80 :-). What is the justification for why 120 is preferable to 110 or 140 or 160?

Ray

Again, what is the reason lengthening this? Lengthening the limit is presumably to benefit the author. However, since the main justification for a limit (it seems to me) is ensure readability for common methods for viewing the file, we need to favor the viewer.

As someone who also prefers 80 characters and utilizes several side-by-side windows while coding, I'm not eager to lengthen this.

(BTW, the other possible justification is to discourage lengthy lines that attempt to do too much and should be broken up for better readability.)

Bick

Bick inquired why max line length was chosen as 110. He preferred 120.

Mike Jarvis

I also prefer 80. Any chance we could vote on this?

Daniel Wang

Actually, I would prefer 78, but will accept 80. To me, longer lines enable two undesirable things: (1) lines that do too many things and (2) too many levels of indentation.

srp

I can fit a couple of windows, one on top of the other, and expand out well beyond 80. If it's the difference between having them side by side or one on top of another, I'm not sure that's a big enough justification no to have it bigger than 80, given the screen real estate we have on most systems.

Other rules we're setting forth, like understandable and sometimes lengthy method and variable names (which I'm not arguing against) will require that some code that traditionally go on one line (like for loops, among others) be split across two (more more) lines (see the example in discussion 5-14). That's especially true if you indent, even a little (see second example in 5-30). Limiting it to 80 directly impacts the readability of the code because it'll be constantly split on multiple lines.

And that's just from two things on this page. I imagine there are numerous examples in the code base.

I advocate 110.

Russell Owen

I like 120 but am willing to live with 110. I would be very sorry to see 80; c++ especially is just too verbose with its huge namespaces and long type names for 80 character lines to be efficient and readable.


5-10

Mike Jarvis

I've been working on converting my code over to the LSST standard, and there is one rule that I find really irksome, so I was wondering if it would be possible to reopen discussion on it. I know I'm a late arrival, so I don't want to do so if it's already been hashed to death. But here are my thoughts on the matter, and just let me know if this is out of line.

The rule is 5-10: The 'const' keyword should be listed after the type name.

The justification given in the standards document is pretty weak, in my opinion:

Stroustrup points out one advantage to this order: you can read it from right to left i.e. "v is a pointer to a const Widget".

This is really only an issue if you are going to do complicated things like "Widget*const* p", so p is a mutable pointer to a const pointer to a non-const Widget. But if you are writing code like this, it's probably time to introduce a typedef or two.

If you only have one *, then putting the const first is always clear. Writing "const Widget* v" is not less clear than "Widget const* v". In fact, if anything, the spot between Widget and * is the _only_ ambiguous place to put the const. Before Widget obviously applies const to the Widget, and after the * obviously applies the const to the pointer. Correctly interpreting "Widget const* v" actually requires a bit of thought, especially if there is no space between the const and the *.

The main place where I think putting the const first produces substantially more readable code is in parameter listings of functions. Having the const show up at the front of the line makes it very quick to spot which parameters are output parameters, and which are input. Consider the two versions of the following function declaration:

template<typename Function, typename Other>
void myComplicatedFunction(
        SomeUserDefineClass* theOutput,
        Other& someObjectToModify,
        typename Function::argument_type const& arg,
        SomeOtherClass const* extraOptionalInputInfo=0);

template<typename Function, typename Other>
void myComplicatedFunction(
        SomeUserDefineClass* theOutput,
        Other& someObjectToModify,
        const typename Function::argument_type& arg,
        const SomeOtherClass* extraOptionalInputInfo=0);

The second version makes it much more obvious which are the input parameters, and which are the output parameters. The const's in the first version kind of get lost in the text, and it takes a bit of time to look and find them. But in the second case, they are immediately apparent.

I know this is kind of an arbitrary choice either way, but as far as I can tell, the current LSST choice is very non-standard in the c++ coding world. When I search for "c++ style guide" in Google, none of the first 100 results agrees with this recommendation, and that includes Bjarne himself, who was given as an authority on why to go this way. Here is the section from his style guide:

Should I put "const" before or after the type?

I put it before, but that's a matter of taste. "const T" and "T const" were - and are - (both) allowed and equivalent. For example:

        const int a = 1;        // ok
        int const b = 2;        // also ok

My guess is that using the first version will confuse fewer programmers (is more idiomatic).

The Google style guide also has a discussion on the issue, which seems quite reasonable to me:

Where to put the const

Some people favor the form int const *foo to const int* foo. They argue that this is more readable because it's more consistent: it keeps the rule that const always follows the object it's describing. However, this consistency argument doesn't apply in this case, because the "don't go crazy" dictum eliminates most of the uses you'd have to be consistent with. Putting the const first is arguably more readable, since it follows English in putting the "adjective" (const) before the "noun" (int).

That said, while we encourage putting const first, we do not require it. But be consistent with the code around you!

So, do you think there is any chance we could loosen requirement on putting the const after the type, and maybe allow either version as Google does?

ROwen

My experience is that this is one of our most widely ignored coding standards. So I think this rule is ripe for review.

Personally I have mixed feelings. On the one hand, I find it easier to read complex declarations that follow our standard. On the other hand, like you, I much prefer having const first for magic numbers and input-only function arguments.

KTL

I happen to really like the consistency of right-to-left.

Having the const show up at the front of the line makes it very quick to spot which parameters are output parameters, and which are input.

Simple fix: never have output parameters [1/2 :-)].

Putting the const first is arguably more readable, since it follows English in putting the "adjective" (const) before the "noun" (int).

Perhaps post-const is more natural for those of us who speak non-English languages (which may often have post-adjectives).

So, do you think there is any chance we could loosen requirement on putting the const after the type, and maybe allow either version as Google does?

I think that would be the worst of both worlds. Your Google quote included this:

But be consistent with the code around you!

Mike Jarvis

So, do you think there is any chance we could loosen requirement on putting the const after the type, and maybe allow either version as Google does?

I think that would be the worst of both worlds. Your Google quote included this:

But be consistent with the code around you!

This is already not happening in the LSST stack. There are lots of cases of prefix const declarations scattered around the code. I'd be happy to just have each file be consistent with itself, rather than have both styles mixed together. Then in the files that I primarily work on, I could have the const first, and be happy. And you could have const second in the files that you are primarily responsible for and also be happy.

RHL

I have some sympathy with Mike on this one; it certainly grated on many years of C experience. However, I do believe in the value of consistency.

We are going to have to face this in other places. For example, the illustrious DM TCT has decided to permit function definitions in the form

type funcname()
{
     ...
}

(and similarly for classes). They are not going to mandate that we go through all our current code changing things.

So one option would be to require that each source file (or package?) choose a particular style and stick to it. Would this be an acceptable solution to all parties? Note that this would still require you to follow the local rule when working on other people's code. I would NOT be in favour of extending this latitude to e.g. non- K&R if/else blocks, but somehow function definitions (and variable definitions) seem less bothersome.

KTL

So one option would be to require that each source file (or package?) choose a particular style and stick to it. Would this be an acceptable solution to all parties?

If we're going to have coding standards at all, I really think we should go to the trouble of enforcing them everywhere. If we have two options, there should be a clear specification as to when to use each one (for the brace-on-the-next-line, it's when the previous line would b too long).

If you're saying we should drop some of our coding standards altogether, that could be a possibility, but I'm worried that switching back and forth between two (or more) styles may lead to additional bugs.

KTL

This [consistent use of const] is already not happening in the LSST stack.

Yup, and the automated checker should be able to catch these. (Eventually, if not now.)

Then in the files that I primarily work on, I could have the const first, and be happy. And you could have const second in the files that you are primarily responsible for and also be happy.

Until I have to work with your code or vice versa...

P.S. I think two of your claims contradict each other:

I happen to really like the consistency of right-to-left.

Perhaps post-const is more natural for those of us who speak non-English languages (which may often have post-adjectives).

Do you read "char const*" from right to left or left to right (with a post-adjective, but also a post-"pointer to")? :)

I read it aloud from left to right ("kair const star"), but understand it from right to left ("pointer to const character"), just like "la maison verte de ma soeur" (my sister's green house).

5-12

RHL

Usually good advice, but sometimes 0.0 is special and a test against 0.0 is reasonable, e.g.

    if (b == 0.0 && sigma2 == 0.0) {
        _sigma2 = 1.0;                  // avoid 0/0 at centre of PSF
    }

The value sigma2 isn't used (that component is multiplied by b, i.e. 0.0) but 0*NaN == NaN.

Yes, I can write fabs(b) <= 0.0 which will make parasoft happy, but it isn't really what I mean.

RAA

Then decide how you want this handled. The Standards could be updated to allow the 0.0 as a special case. Otherwise you will be notified of the error and you may take the appropriate evasive action: magic cookie or recode.

Mike Jarvis

I agree that 0.0 seems like it should be a specific exception. Another use case is setting a variable to an error value, and then testing if it has been changed:

double x = -1.0;
while (...) {
  // code that might set x to a positive value, or might break out.
}
if (x == -1.0) {  // deal with error }

There are obviously easy workarounds, such as introducing another (boolean) variable, but this kind of code seems reasonable to me, so it's at least worth considering allowing it.

Russell Owen

I think 0.0 could be allowed silently. If it is not then I would like a comment (or other bit of non-code) that would silence the warning. I'm not keen to put in weird bits of real code just to make parasoft happy.


5-14

Bick

### Variable 'i_x' used in third expression is not initialized in first expression of a 'for' statement

Here's the code in question (it seems to be complaining about 'int i_x = 0;':

       for (int i_y = 0; i_y != img.getHeight(); ++i_y) {
           int i_x = 0;
           for (ImageF::x_iterator ip = img.row_begin(i_y); ip != img.row_end(i_y); ++ip, ++i_x) {

I need to keep track of the index in this iteration, and what I've done is (I believe) a standard C/C++ approach. In fact, I know of no other way to do it.

Russell

Variable 'yRev' initialized in third expression is not tested in second expression of a 'for' statement LsstDm?-5-14-2

This is a standard paradigm for variables not used outside the loop. If we have a guideline against this, we should reconsider it.

AndyB

This

30:     Variable 'idx' initialized in first expression is neither increment
nor decrement in third expression of a 'for' statement          LsstDm-5-14-2
30:     Variable 'idx' initialized in third expression is not tested in
second expression of a 'for' statement  LsstDm-5-14-2

is an unnecessary rule.

Bick

I've been bumping into the lsst style rule on loop control variables (5-14), and I just discussed it with Robert. Apparently I missed a phonecon recently when it was discussed, and I'm curious to get some feedback on it.

There are two common violations I've encountered ...

(1) The 'end()' is assigned to a local variable:

for (foo_iter ptr = somefoo.begin(), end = somefoo.end(); ptr != end; ++ptr) { stuff }

(2) A row coordinate is being iterated over and I have to increment the pixel coordinate that's needed in the loop:

int iX = 0; for (typename image::Image<PixelT>::x_iterator ptr = img->row_begin(iY); ptr != ptr + img->getWidth(); ++ptr, ++iX) { stuff }

For case (1), I could define "foo_iter end = somefoo.end();" outside the loop, but this changes the scope of the variable. Alternatively, I could avoid assigning "end" and test "ptr != somefoo.end()". I have no strong feelings about this one; I'd just like to know what people think about it.

For case (2), the alternative is to increment the coordinate within the body of the loop. My feeling is that it's more natural to do the increment within the for() statement (ie. as shown), as the coordinate and the iterator are really representations of the same thing.

[RHL]

For case (1), I could define "foo_iter end = somefoo.end();" outside the loop, but this changes the scope of the variable. Alternatively, I could avoid assigning "end" and test "ptr != somefoo.end()". I have no strong feelings about this one; I'd just like to know what people think about it.

Not assigning to end is a significant performance hit, so we're left with 1. I'd also note that (until the advent of auto in the next C++ standard) the definition of end outside the loop requires a copy of an often-messy type.

[KTL]

SteveB wrote:

Alternatively, I could avoid assigning "end" and test "ptr != somefoo.end()".

Robert wrote:

Not assigning to end is a significant performance hit,

Not if end() is inline and implemented in an intelligent fashion (e.g. with caching). Not all classes are that smart, though.

Also, note that it is possible for the sequence to be designed such that the assigned end iterator could be invalidated while the == .end() test still works.

[Mike]

For case (1), I could define "foo_iter end = somefoo.end();" outside the loop, but this changes the scope of the variable. Alternatively, I could avoid assigning "end" and test "ptr != somefoo.end()". I have no strong feelings about this one; I'd just like to know what people think about it.

Not assigning to end is a significant performance hit, so we're left with 1.

Actually, I don't think I've ever seen a performance hit from this as long as you compile with -O2 (maybe even just -O). Almost all end() functions are simple accessor methods that are implemented inline, so it is equivalent to comparing against a variable.

Plus, caring about this kind of thing is almost always premature optimization. There are very few loops where the while test is a significant fraction of the time. (And where that loop is an important one to optimize.) And these will usually be cases where you want to go with a do...while construct instead anyway (since it usually compiles to faster code than either for or while loops).

But all that said, I agree with Steve that both of his cases seem like they should be legal, so I would advocate loosening the standard here.

[KTL]

int iX = 0; for (typename image::Image<PixelT>::x_iterator ptr = img->row_begin(iY); ptr != ptr + img->getWidth(); ++ptr, ++iX) { stuff }

This one I don't really agree with. The iX definition is

already outside the loop; why can't its increment be at the end of the loop body? I'd only feel the need to have the increment in the loop statement if the variable appeared in the condition as well. Then it would really be a ptr *and* iX loop instead of a ptr loop that happens to increment iX.

[Russell]

For case (1) I personally feel we should allow the notation. It is sometimes useful paranoia. As you say, the alternative of declaring the variable before the for loop is less satisfactory because it has excessive scope; I also find it less clear.

*In the case of MaskedImages? we saw a definite performance increase at once time -- whether it still holds for our current afw compiled with C ++ 4.2 or later, I can't say -- but I'd rather be safe than sorry. I would prefer not to have to prove that it is not needed for every kind of iterator we use and every compiler!!!

[For case 2] I agree with you. In my opinion it is good style when it is used to indicate that several related variables are incrementing in lock step (e.g. to walk across corresponding pixels of different images). Incrementing lock-step variables in different places invites future code bugs -- e.g. not increment one variable, or incrementing it in the wrong place due to code changes.

[AndyB]

I would agree that both are reasonable, understandable things to do and should be allowed in the standards.

[Gregory DF]

[For Case 2] I'm on this side, too, for exactly that reason: it illuminates the parallel stepping most clearly. (It's also a coding style that's got the weight of tradition behind it going back to the early days of C. Probably the value of that -- being recognizable -- is less nowadays when most people don't learn to code from AT&T books from the 1970s...)

This is a very narrow exception to the rule, though.

[For Case 1]

Robert wrote:

Not assigning to end is a significant performance hit,

Not if end() is inline and implemented in an intelligent fashion (e.g. with caching). Not all classes are that smart, though.

Much like Russell, I've also seen a number of instances where repeated calls to end() did produce a meaningful performance hit.

Also, note that it is possible for the sequence to be designed such that the assigned end iterator could be invalidated while the == .end() test still works.

In a non-modifying pass over the sequence? That would be a pretty unusual container. But it's possible (maybe if there is internal rebalancing or caching going on as a complex container is processed).

Caching the value of end() before a *modifying* algorithm pass over a sequence is of course not in general a good idea anyway, and probably falls under a "stamp out excess cleverness" rule even if you happen to have devised a situation in which it produces a desirable result.


5-30

KTL

Note that the examples in 5-31 and 5-32 violate the original statement, which is 5-30. There is no doubt that requiring naming of any number other than 0 or 1 is too strict. For example, indices into small, fixed length vectors/arrays such as x,y,z coordinates or regexp matches are more easily read as numbers. Expressing

nanosecs = static_cast<uint64_t>(secs * 1.0e9 + 0.5);

as

nanosecs = static_cast<uint64_t>(secs * FLOAT_BILLION + ROUND_OFFSET);

does not seem to be a win. At the same time the general principle is a good one. Without a clear line or more guidance as to when to choose which form, though, this seems like a recipe for confusion and disarray.

6-5

6-6

MikeJ

There is only one LSST Standard indenting rule that I couldn't figure out how to get right. And that is when we have an opening brace after either a list of multiple inheritances, or a constructor initialization list. Basically, vim simply doesn't handle the colon correctly.

Here is an example from the LSST code:

/// Create an %image of the specified size
template<typename PixelT>
image::DecoratedImage<PixelT>::DecoratedImage(const int width, ///<desired number of columns
                                              const int height ///<desired number of rows
                                             ) :
    lsst::daf::data::LsstBase(typeid(this)),
    _image(new Image<PixelT>(width, height))
{
    init();
}

The standard says this should be:

/// Create an %image of the specified size
template<typename PixelT>
image::DecoratedImage<PixelT>::DecoratedImage(const int width, ///<desired number of columns
                                              const int height ///<desired number of rows
                                             ) :
    lsst::daf::data::LsstBase(typeid(this)),
    _image(new Image<PixelT>(width, height)) {
    init();
}

But I haven't figured out how to get vim to indent like this. It indents both the function code and the closing } an extra 4 spaces. In addition, I think the first format is far easier to read than the second. It emphasizes where the initializations end and the function code begins. And the difference is even more pronounced with more complicated initializations.

So I am proposing that the standard allow for functions to have the opening brace on its own line, at least for constructors with an initialization list. Personally, I think it's always clearer that way, but I'm willing to conform to the current standard for other functions.

From my perusal of the current code, it seems like a lot of people agree with me, since most of the constructors that have initialization lists are written with the opening brace on its own line.

Also, the same problem happens for classes with multiple inheritances (specifically when they take more than one line). Vim has trouble getting the indent right unless the opening brace for the class is on its own line, so I'd like to allow that to be legal as well.

I think my indent function conforms to everything else in the LSST standard. I tried re-indenting a bunch of files (with gg=G), and looking at the differences. It seemed like the only other differences compared to what people had done were things for which there is no specification in the standards document. So I made some choices in these cases, which can of course be changed. (e.g. where should a closing parenthesis go when it starts a new line?)

KTL

As for your function/class opening braces always being at the

beginning of a line, I prefer that style, too. (Just as long as you don't do it for control blocks like if/while/for.) Maybe we can lobby the TCT to get the standard changed...

RHL

As for your function/class opening braces always being at the

beginning of a line, I prefer that style, too. (Just as long as you don't do it for control blocks like if/while/for.) Maybe we can lobby the TCT to get the standard changed...

I have no problem with this. I had to work quite hard in the elisp to handle this properly with esc-A

Incidently, KT, if you're still worried about editor string proliferation (which I am not), I could add code to the elisp that recognised some generic LSST string and switched on that, then all editors could use the same ID. Now, I'd propose that that string be

-*- lsst-c++ -*-

but I'm flexible.

ROwen

I agree that allowing the opening brace on a new line is sensible for functions and constructors. I find it often leads to more readable code.

I would like to also suggest that we explicitly allow (if not require) function arguments to be indented one level in from start of the the function declaration, for example:

/// Create an %image of the specified size
template<typename PixelT>
image::DecoratedImage<PixelT>::DecoratedImage(
    const int width, ///< desired number of columns
    const int height ///< desired number of rows
) :
    lsst::daf::data::LsstBase(typeid(this)),
    _image(new Image<PixelT>(width, height))
{
    init();
}

Advantages:

  • It leaves much more room for the documentation string
  • For very long function declarations there is almost no room at the right for even the variable names
  • The code is near the left margin, making it easier to read (in my opinion).

Also note now the different parts of the constructor (argument list, initializer list and code) are neatly separated by punctuation at the left margin.

I personally find this much more readable.

KTL

I would like to also suggest that we explicitly allow (if not require) function arguments to be indented one level in from start of the the function declaration, for example:

/ Create an %image of the specified size template<typename PixelT> image::DecoratedImage?<PixelT>::DecoratedImage?(

const int width, /< desired number of columns const int height /< desired number of rows

) :

' I don't disagree.