LSST Python Coding Standard

DRAFT Revision 2 of Python Coding Standards

Following is revision 2 of the LSST Python Coding Standards edited

  • to highlight the rules and
  • to clarify naming conventions.

This document needs to be discussed by the user community and sanctioned by the DM TCT.

Access Python Coding Standard Revision 1 here

If you want to comment on a rule in advance of the TCT review, please add your comments to the USER COMMENTS section. The comments will be excised when the document is ratified.

RAllsman 28 Oct 2009

Introduction

This document gives coding conventions for LSST Python code. It is a slightly modified version of Python PEP 8: Style Guide for Python Code by Guido van Rossum and Barry Warsaw. The section on Naming Conventions was extracted from the Python Style Guide for BaBar, which is also based on PEP 8. This document includes changes for consistency with the LSST C++ coding standard, plus some additions and a few changes based on the author's Python and C++ experience.

A Foolish Consistency is the Hobgoblin of Little Minds

One of Guido's key insights is that code is read much more often than it is written. The guidelines provided here are intended to improve the readability of code and make it consistent across the wide spectrum of LSST code.

A style guide is about consistency. Consistency with this style guide is important. Consistency within a project is more important. Consistency within one module or function is most important.

But most importantly: know when to be inconsistent -- sometimes the style guide just doesn't apply. When in doubt, use your best judgment. Look at other examples and decide what looks best. And don't hesitate to ask!

Two good reasons to break a particular rule:

  • When applying the rule would make the code less readable, even for someone who is used to reading code that follows the rules.
  • To be consistent with surrounding code that also breaks it (maybe for historic reasons) -- although this is also an opportunity to clean up someone else's mess.

Code Layout

Indentation

  • Use 4 spaces per indentation level.
    • This width provides a good balance between readability and excessive indentation. Using spaces instead of tabs ensures that the code can be edited with all common editors and displayed on all common displays without special configuration.
    • For an old code package that you don't wish to alter too far, you may keep its existing indentation method, with one exception: no tabs.

No Tabs

  • Existing code that mixes tabs and spaces must be converted to use 4 spaces per indentation level.
  • To check a file, invoke the Python command line interpreter with the -t or -tt option; it issues warnings (-t) or errors (-tt) about code that inconsistently mixes tabs and spaces.
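
As a complement to the interpreter options, the check for tabs can be sketched in a few lines. The helper name findTabLines is hypothetical (it is not part of any LSST package); it only reports lines whose leading whitespace contains a tab.

```python
def findTabLines(text):
    """Return the 1-based numbers of lines whose indentation contains a tab.
    """
    tabLines = []
    for lineNum, line in enumerate(text.splitlines()):
        # The indentation is whatever lstrip() would remove.
        indent = line[:len(line) - len(line.lstrip())]
        if '\t' in indent:
            tabLines.append(lineNum + 1)
    return tabLines
```

Passing the contents of a source file to this function gives the lines that need to be converted to spaces.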

Maximum Line Length

  • Limit all lines to a maximum of 110 characters.
    • There are still many devices around that are limited to 80 character lines; plus, limiting windows to 80 characters makes it possible to have several windows side-by-side. The default wrapping on such devices looks ugly.

Line Continuation

  • The preferred way of wrapping long lines is by using Python's implied line continuation inside parentheses, brackets and braces. If necessary, you can add an extra pair of parentheses around an expression, but sometimes using a backslash looks better. Make sure to indent the continued line appropriately. Some examples:
class Rectangle(Blob):
    """Documentation for Rectangle.
    """
    def __init__(self, width, height,
                 color='black', emphasis=None, highlight=0):
        if width == 0 and height == 0 and \
           color == 'red' and emphasis == 'strong' or \
           highlight > 100:
            raise ValueError("sorry, you lose")
        if width == 0 and height == 0 and (color == 'red' or
                                           emphasis is None):
            raise ValueError("I don't think so")
        Blob.__init__(self, width, height,
                      color, emphasis, highlight)

Blank Lines

Use blank lines to make your code readable. The following are recommendations:

  • Separate top-level function and class definitions with two blank lines.
  • Separate method definitions inside a class by a single blank line.
  • Do not use a blank line on either side of a doc string.
  • Use blank lines in functions, sparingly, to indicate logical sections.
  • Extra blank lines may be used (sparingly) to separate groups of related functions.
  • Blank lines may be omitted between a bunch of related one-liners (e.g. a set of dummy implementations).

Encoding

  • Always use ASCII for new Python code.
  • Do not include a coding comment (as described in PEP 263) for ASCII files.
  • Existing code using Latin-1 encoding (a.k.a. ISO-8859-1) is acceptable so long as it has a proper coding comment. All other code must be converted to ASCII or Latin-1 except for 3rd party packages used "as is".

Code Order

Within a module, items should be placed in the following order:

   1. Shebang line (#!), only for executable scripts
   2. Module-level comments
   3. Module-level docstring
   4. Imports
   5. '__all__' statement, if any
   6. Non-public module variables (names start with an underscore)
   7. Non-public module functions and classes (names start with an underscore)
   8. Public variables
   9. Public functions and classes
   10. Optional test suites

Imports

  • Each package should be imported on one line.
    • For example, this is preferred:
      import os
      import sys
      from subprocess import Popen, PIPE
      
    • Whereas this is not:
      # two packages imported on one line
      import sys, os
      # one package imported on two lines
      from subprocess import Popen
      from subprocess import PIPE
      
  • Imports should be grouped in the following order, with each group separated by a blank line:
    • standard library imports
    • related third party imports
    • local application/library specific imports
  • When importing a class from a class-containing module,
    • it's usually okay to do this:
      from myclass import MyClass
      from foo.bar.yourclass import YourClass
      
    • But if that causes local name clashes, then do this instead:
      import myclass
      import foo.bar.yourclass
      
      and use "myclass.MyClass" and "foo.bar.yourclass.YourClass"
  • Import forms parallel the namespace usage in the LSST C++ Coding Standards.
    • Good
      • "from lsst.foo.bar import myFunction" is analogous to "using lsst::foo::bar::myFunction"
      • "import lsst.foo.bar as fooBar" is analogous to "namespace fooBar = lsst::foo::bar"
    • Disallowed in both Coding Standards - except in __init__.py library initialization context.
      • "from lsst.foo.bar import *" is analogous to "using namespace lsst::foo::bar"

Whitespace in Expressions and Statements

Avoid Extraneous Whitespace

Avoid extraneous whitespace in the following situations:

  • Immediately inside parentheses, brackets or braces.
    Yes: spam(ham[1], {eggs: 2})
    No:  spam( ham[ 1 ], { eggs: 2 } )
    
  • Immediately before a comma, semicolon, or colon:
    Yes: if x == 4: print x, y; x, y = y, x
    No:  if x == 4 : print x , y ; x , y = y , x
    
  • Immediately before the open parenthesis that starts the argument list of a function call:
    Yes: spam(1)
    No:  spam (1)
    
  • Immediately before the open parenthesis that starts an indexing or slicing:
    Yes: dict['key'] = list[index]
    No:  dict ['key'] = list [index]
    
  • More than one space around an assignment (or other) operator to align it with another. Make an exception if alignment makes the data significantly clearer (e.g. complex lookup tables).
    • Thus:
      x = 1
      y = 2
      long_variable = 3
      
    • Not this:
      x             = 1
      y             = 2
      long_variable = 3
      

Binary operators

  • Always surround these binary operators with a single space on either side:
    • assignment (=),
    • augmented assignment (+=, -= etc.),
    • comparisons (==, <, >, !=, <>, <=, >=, in, not in, is, is not),
    • Booleans (and, or, not).
  • Use spaces around arithmetic operators.
    • Thus this:
      i = i + 1
      submitted += 1
      x = x * 2 - 1
      hypot2 = x * x + y * y
      c = (a + b) * (a - b)
      
    • Not this:
      i=i+1
      submitted +=1
      x = x*2 - 1
      hypot2 = x*x + y*y
      c = (a+b) * (a-b)
      

Keyword Argument and Default Parameter

Don't use spaces around the '=' sign when used to indicate a keyword argument or a default parameter value.

  • Thus this:
    def complex(real, imag=0.0):
        return magic(r=real, i=imag)
    
  • Not this:
    def complex(real, imag = 0.0):
        return magic(r = real, i = imag)
    

Multiple Statements on Line

Compound statements (multiple statements on the same line) are generally discouraged.

  • Yes:
    if foo == 'blah':
        do_blah_thing()
    do_one()
    do_two()
    do_three()
    
  • Rather not:
    if foo == 'blah': do_blah_thing()
    do_one(); do_two(); do_three()
    

While sometimes it's okay to put an if/for/while with a small body on the same line, never do this for multi-clause statements. Also avoid folding such long lines!

  • Rather not:
    if foo == 'blah': do_blah_thing()
    for x in lst: total += x
    while t < 10: t = delay()
    
  • Definitely not:
    if foo == 'blah': do_blah_thing()
    else: do_non_blah_thing()
    
    try: something()
    finally: cleanup()
    
    do_one(); do_two(); do_three(long, argument,
                               list, like, this)
    
    if foo == 'blah': one(); two(); three()
    

Comments

General Rules

  • Comments that contradict the code are worse than no comments. Always make a priority of keeping the comments up-to-date when the code changes!
  • Comments should be complete sentences. If a comment is a phrase or sentence, its first word should be capitalized, unless it is an identifier that begins with a lower case letter (never alter the case of identifiers!).
  • If a comment is short, the period at the end can be omitted. Block comments generally consist of one or more paragraphs built out of complete sentences, and each sentence should end in a period.
  • You need not use two spaces after a sentence-ending period.
  • When writing English, Strunk and White apply.

Block Comments

  • Block comments generally apply to some (or all) code that follows them, and are indented to the same level as that code. Each line of a block comment starts with a # and a single space (unless it is indented text inside the comment).
  • Paragraphs inside a block comment are separated by a line containing a single #.

Inline Comments

  • Use inline comments sparingly.
  • An inline comment is a comment on the same line as a statement. Inline comments should be separated by at least two spaces from the statement. They should start with a # and a single space.
  • Inline comments are unnecessary and in fact distracting if they state the obvious.
    • Don't do this:
      x = x + 1      # Increment x
      
    • But sometimes, this is useful:
      x = x + 1      # Compensate for border
      

Documentation Strings

Read PEP 257. This is your main resource for information on writing doc strings. Here are a few minor points and emendations:

  • Write docstrings for all public modules, functions, classes, and methods. Docstrings are not necessary for non-public methods, but you should have a comment that describes what the method does. This comment should appear after the "def" line.
  • Start the doc string with a one-line summary, a phrase ending in a period. Prescribe the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...".
  • If more information is wanted (as it usually is), include it after a blank line. This usually should include a description of the arguments, return value and important error conditions.
  • If you mention arguments or other variables, always use their correct case.
  • Delimit doc strings with """ (three double quotes). You may use u""" for unicode but it is usually preferable to stick to ASCII.

  • Doc strings should not be preceded or followed by a blank line.
  • The terminating """ should be on its own line, even for one-line doc strings (this is a minor departure from PEP 257).
      """Return a foobang
    
      Optional plotz says to frobnicate the bizbaz first.
      """
    
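A fuller doc string that follows these points might look like the sketch below. The function scaleFlux and its arguments are hypothetical, and the layout of the argument descriptions is only one reasonable choice; this standard does not prescribe it.

```python
def scaleFlux(flux, gain=1.0):
    """Return the flux multiplied by the gain.

    Arguments:
    flux -- measured flux, in counts
    gain -- multiplicative factor; must be positive (default 1.0)

    Raise ValueError if gain is not positive.
    """
    if gain <= 0:
        raise ValueError("gain must be positive")
    return flux * gain
```

Note the imperative one-line summary, the blank line before the details, the argument names in their correct case, and the closing quotes on their own line.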

Naming Conventions

The naming conventions for LSST Python and C++ source have been defined to be as similar as the respective languages allow.

General Remarks

In general, class names are CamelCase with leading uppercase; all other names in the source are camelCase with leading lowercase, except for module variables used as module global constants - which should be UPPERCASE_WITH_UNDERSCORES.

Names may be decorated with leading and/or trailing underscores in the following instances:

  • Names with double leading and trailing underscores are "magic" names (e.g. __init__, __name__, or __str__). Users should not use this form to define "user names".
  • Names with leading double underscores (but without trailing double underscores) define class-private names if they appear inside a class.
  • Single leading underscore is the weak "internal use" indicator. E.g. "from M import *" does not import names starting with an underscore. Modules that are designed for use via "from M import *" should use the __all__ mechanism to prevent exporting globals and/or the weak "internal use" indicator.
  • Single trailing underscore is used to prevent a name clash with a reserved keyword; or, better yet, choose a synonym to avoid the clash.

The names of files containing Python source should be camelCase with leading lowercase letter and terminating with .py.

Class Names

Python class names should follow the same conventions as C++ class names - they should be CamelCase with leading uppercase. Typically there should be one class in one source file. As an exception you can put one or more "hidden" classes (whose names start with underscore) along with the normal class if these hidden classes' names are never exposed to the outside world.

Exception Names

Use the class name convention since exceptions are classes. In addition, use the suffix "Error" on your exception names (if the exception actually is an error).

Method and Attribute Names

Class and object methods and attributes should be camelCase with leading lowercase. To make an attribute or method private, prefix it with a double underscore. There is no reason to prefix member variables with a single underscore in Python because they are always referenced through 'self'.

Module methods (free functions) should be camelCase with leading lowercase. Modules should not normally expose their variables directly, except for variables that are "constants". Constants should be named in UPPERCASE_WITH_UNDERSCORES. Module variables or functions that should not be exposed start with an underscore.
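
The conventions for constants, classes, exceptions, methods, attributes and free functions can be seen together in a short sketch. Every name here (MAX_RETRIES, TransferError, FileTransfer, makeTransfer) is invented for illustration.

```python
MAX_RETRIES = 3                    # module constant


class TransferError(Exception):
    """Report a failed transfer.
    """


class FileTransfer(object):
    """Track the attempts made to transfer one file.
    """
    def __init__(self, fileName):
        self.fileName = fileName
        self.__attemptCount = 0    # private: double leading underscore

    def recordAttempt(self):
        """Count one attempt; raise TransferError past MAX_RETRIES.
        """
        self.__attemptCount += 1
        if self.__attemptCount > MAX_RETRIES:
            raise TransferError("too many attempts for " + self.fileName)
        return self.__attemptCount


def makeTransfer(fileName):
    """Return a FileTransfer for the named file.
    """
    return FileTransfer(fileName)
```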

Module Names

Modules which contain class definitions should be named after the class name (one module per class). Modules containing only functions should be named in camelCase with leading lowercase.

When an extension module written in C or C++ has an accompanying Python module that provides a higher level (e.g. more object oriented) interface, the C/C++ module has a leading underscore (e.g. _socket).

Modules that are designed for use via "from M import *" should use the __all__ mechanism to prevent exporting globals and/or use the older convention of prefixing such globals with an underscore to indicate these globals are "module non-public".

Python Source File Names

The name of the file containing a module will be the camelCase-with-leading-lowercase transliteration of the module name.

The name of an executable python script will be camelCase with leading lowercase. The name of a test case should be descriptive without the need for a trailing numeral to distinguish one test case from another; the name will be camelCase with leading lowercase.

Function and method arguments

Always use 'self' for the first argument to instance methods.

Always use 'cls' for the first argument to class methods.

Names to Avoid

Never use the characters

  • 'l' (lowercase letter el),
  • 'O' (uppercase letter oh), or
  • 'I' (uppercase letter eye) as single character variable names.

In some fonts, these characters are indistinguishable from the numerals one and zero. When tempted to use 'l', use 'L' instead.

Designing for Inheritance

Always decide whether a class's methods and instance variables (collectively: "attributes") should be public or non-public. If in doubt, choose non-public; it's easier to make it public later than to make a public attribute non-public.

Public attributes are those that you expect unrelated clients of your class to use, with your commitment to avoid backward incompatible changes. Non-public attributes are those that are not intended to be used by third parties; you make no guarantees that non-public attributes won't change or even be removed.

We don't use the term "private" here, since no attribute is really private in Python (without a generally unnecessary amount of work).

Another category of attributes are those that are part of the "subclass API" (often called "protected" in other languages). Some classes are designed to be inherited from, either to extend or modify aspects of the class's behavior. When designing such a class, take care to make explicit decisions about which attributes are public, which are part of the subclass API, and which are truly only to be used by your base class.

With this in mind, here are the Pythonic guidelines:

  • Public attributes should have no leading underscores.
  • If your public attribute name collides with a reserved keyword, append a single trailing underscore to your attribute name. This is preferable to an abbreviation or corrupted spelling. (However, notwithstanding this rule, 'cls' is the preferred spelling for any variable or argument which is known to be a class, especially the first argument to a class method.)
    • Note 1: See the argument name recommendation above for class methods.
  • For simple public data attributes, it is best to expose just the attribute name, without complicated accessor/mutator methods. Keep in mind that Python provides an easy path to future enhancement, should you find that a simple data attribute needs to grow functional behavior. In that case, use properties to hide functional implementation behind simple data attribute access syntax.
    • Note 1: Properties only work on new-style classes.
    • Note 2: Try to keep the functional behavior side-effect free, although side-effects such as caching are generally fine.
    • Note 3: Avoid using properties for computationally expensive operations; the attribute notation makes the caller believe that access is (relatively) cheap.
  • If your class is intended to be subclassed, and you have attributes that you do not want subclasses to use, name them with double leading underscores and no trailing underscores. This invokes Python's name mangling algorithm, where the name of the class is mangled into the attribute name. This helps avoid attribute name collisions should subclasses inadvertently contain attributes with the same name.
    • Note 1: Note that only the simple class name is used in the mangled name, so if a subclass chooses both the same class name and attribute name, you can still get name collisions.
    • Note 2: Name mangling can make certain uses, such as debugging and getattr(), less convenient. However the name mangling algorithm is well documented and easy to perform manually.
    • Note 3: Not everyone likes name mangling. Try to balance the need to avoid accidental name clashes with potential use by advanced callers.
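
A minimal sketch of two of these guidelines, properties and name mangling, follows. The classes Circle and LabeledCircle are hypothetical; the comments show the mangled names that Python generates.

```python
class Circle(object):
    """Represent a circle by its radius.
    """
    def __init__(self, radius):
        self.radius = radius       # simple public data attribute
        self.__tag = "circle"      # stored as _Circle__tag

    @property
    def diameter(self):
        """Return twice the radius.
        """
        return 2 * self.radius


class LabeledCircle(Circle):
    """Add a label without disturbing the base class's __tag.
    """
    def __init__(self, radius, label):
        Circle.__init__(self, radius)
        self.__tag = label         # stored as _LabeledCircle__tag
```

Callers read the property with plain attribute syntax (circle.diameter), so a data attribute can later be turned into a computed one without changing client code. The two __tag attributes coexist on one instance because each is mangled with the name of the class in which it appears.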

Programming Pitfalls

Beware of writing "if x" when you mean "if x != None"

This often comes up when testing whether a variable or argument that defaults to None was set to some other value. The other value might have a type (such as a container) that could be false in a boolean context!
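
The difference shows up as soon as a caller passes an empty container. In this sketch the function names and the filters argument are hypothetical.

```python
def describeFilters(filters=None):
    """Return a short description of the filters argument.
    """
    if filters != None:
        return "given %d filter(s)" % len(filters)
    return "not given"


def describeFiltersWrong(filters=None):
    """Do the same, but wrongly treat an empty list as missing.
    """
    if filters:
        return "given %d filter(s)" % len(filters)
    return "not given"
```

Called with an empty list, the first function reports that zero filters were given, while the second claims that the argument was not given at all.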

Never use mutable object as default in arg list

Never use a mutable object as default value in a function or method argument list.

The problem is that the default value may itself change, leading to subtle bugs. This problem bites many new Python programmers, though usually only once.

  • To avoid the problem use something like the following:
    def proclist(alist=None):
        if alist == None:
            alist = []
    
    def proclist(alist=()):   # if you can tolerate a tuple; tuples are immutable
    
    
  • Rather than the more obvious but dangerously wrong:
    def proclist(alist=[]):
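
The bug is easiest to see by calling such a function twice. The sketch below uses two hypothetical functions, appendBad and appendGood, that differ only in how the default is handled.

```python
def appendBad(item, alist=[]):
    """Append item to alist; the default list is shared between calls.
    """
    alist.append(item)
    return alist


def appendGood(item, alist=None):
    """Append item to alist, creating a new list when none is given.
    """
    if alist == None:
        alist = []
    alist.append(item)
    return alist
```

The default list of appendBad is created once, when the function is defined, so the result of a second call still contains the item from the first call. appendGood builds a fresh list on every call.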
    

Object type comparisons should always use isinstance()

Object type comparisons should always use isinstance() instead of comparing types directly.

Yes: if isinstance(obj, int):

No:  if type(obj) is type(1):

When checking if an object is a string, keep in mind that it might be a unicode string too! Starting with Python 2.3, str and unicode have a common base class, basestring, so you can do:

if isinstance(obj, basestring):
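
One reason to prefer isinstance() is that it accepts instances of subclasses, which a direct type comparison rejects. A minimal sketch, with a hypothetical int subclass named ExposureId:

```python
class ExposureId(int):
    """Integer identifier for an exposure (hypothetical).
    """


def isIntByType(obj):
    """Return True only if obj is exactly an int (misses subclasses).
    """
    return type(obj) is type(1)


def isIntByIsinstance(obj):
    """Return True if obj is an int or an instance of an int subclass.
    """
    return isinstance(obj, int)
```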

Use * and ** in function calls instead of 'apply'

In old versions of Python, to call a function with an argument list and/or keyword dictionary you had to write "apply(func, args, keyargs)". Now you can write "func(*args, **keyargs)", which is faster and clearer.
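
For instance, with a hypothetical function describeImage, a tuple of positional arguments and a dictionary of keyword arguments are unpacked like this:

```python
def describeImage(name, width, height, units="pixels"):
    """Return a one-line description of an image.
    """
    return "%s: %d x %d %s" % (name, width, height, units)


args = ("raw", 2048, 4096)
keyArgs = {"units": "pixels"}

# Equivalent to describeImage("raw", 2048, 4096, units="pixels").
description = describeImage(*args, **keyArgs)
```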

Programming Recommendations

Try to make your Python code idiomatic ("pythonic"). Consider the following, slightly adapted from Tim Peters' The Zen of Python:

  Beautiful is better than ugly.
  Explicit is better than implicit.
  Simple is better than complex.
  Complex is better than complicated.
  Flat is better than nested.
  Sparse is better than dense.
  Readability counts.
  Special cases aren't special enough to break the rules.
  Although practicality beats purity.
  Errors should never pass silently.
  Unless explicitly silenced.
  In the face of ambiguity, refuse the temptation to guess.
  There should be one -- and preferably only one -- obvious way to do it.
  If the implementation is hard to explain, it's a bad idea.
  If the implementation is easy to explain, it may be a good idea.

Use Python 2.5

Write your code to run under Python 2.5. Learn the new features of Python and use them where applicable to make your code simpler and more readable. Thus:

  • Use iterators, generators (functions that produce iterators) and generator expressions (expressions that produce iterators) to iterate over large data sets efficiently. Iterators and generators were new in Python 2.2; generator expressions were added in 2.4 and generators were slightly enhanced in Python 2.5.
  • Use the with statement to simplify resource management. New in Python 2.5. For example, to be sure a file will be closed when you are done with it:
    from __future__ import with_statement
    ...
    with open('/etc/passwd', 'r') as f:
        for line in f:
            ...
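
The following sketch combines these features; the function names are hypothetical. Under Python 2.5 the module would also need the "from __future__ import with_statement" line shown above as its first statement; it is left out here so that the sketch also runs unchanged on later interpreters.

```python
def countLongLines(path, minLength):
    """Return the number of lines in the file longer than minLength.
    """
    with open(path, 'r') as f:
        # The generator expression examines one line at a time, so the
        # whole file is never held in memory.
        return sum(1 for line in f if len(line.rstrip('\n')) > minLength)


def evenSquares(limit):
    """Generate the squares of the even integers below limit.
    """
    for i in range(0, limit, 2):
        yield i * i
```

The file is closed when the with block is left, even though the function returns from inside it.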
    

Exception Handling

  • To catch all errors but let SystemExit and KeyboardInterrupt through, use:
    except Exception, e:
        ...
    
  • The exception hierarchy in Python 2.5 was improved, eliminating the need to use this:
    except (SystemExit, KeyboardInterrupt):
        raise
    except Exception, e:
        ...
    
  • When raising an exception, use "raise ValueError('message')" instead of the older, deprecated form "raise ValueError, 'message'".
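
The effect of the improved hierarchy can be checked with a small sketch. All function names are hypothetical. The handler uses sys.exc_info() instead of binding the exception in the except clause only so that the same text parses on both older and newer interpreters; in Python 2.5 code the "except Exception, e:" form above is the usual one.

```python
import sys


def runAndReport(func):
    """Call func and return a string describing the outcome.

    SystemExit and KeyboardInterrupt are not caught.
    """
    try:
        return "ok: %r" % (func(),)
    except Exception:
        err = sys.exc_info()[1]
        return "failed: %s" % (err,)


def failWithError():
    """Raise an ordinary error.
    """
    raise ValueError("bad value")


def requestExit():
    """Ask the interpreter to exit.
    """
    raise SystemExit(1)
```

runAndReport(failWithError) returns a "failed" message, whereas the SystemExit raised by requestExit passes through the handler untouched.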

String Handling

  • Use string methods instead of the string module.

String methods are always much faster and share the same API with unicode strings.

  • Use .startswith() and .endswith() instead of string slicing to check for prefixes or suffixes. startswith() and endswith() are cleaner and less error prone. For example:
    Yes: if foo.startswith('bar'):
    No:  if foo[:3] == 'bar':
    
  • Don't write string literals that rely on significant trailing whitespace. Such trailing whitespace is visually indistinguishable from its absence, and some editors (or more recently, reindent.py) will trim it.
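
A typical slicing error is a slice length that no longer matches the prefix after the prefix has been edited. In this sketch (hypothetical function names and file naming scheme) the slice takes three characters while the prefix has four:

```python
def isRawByMethod(fileName):
    """Return True if the file name starts with 'raw_'.
    """
    return fileName.startswith('raw_')


def isRawBySlice(fileName):
    """Attempt the same test with a slice; the slice length is wrong.
    """
    # Bug: the slice length (3) does not match the prefix length (4),
    # so this comparison can never succeed.
    return fileName[:3] == 'raw_'
```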

Comparisons

  • Avoid comparing with "is" and "is not" unless you really mean it. Use "is" or "is not" only for the very rare case that you need to know that two variables point to the exact same object. Usually you only care whether two objects have the same value, in which case use "==" or "!=".
  • Prefer "==" and "!=" when comparing to None. This disagrees with PEP 8, which recommends "is" and "is not", but I feel that recommendation is a mistake because it implies the wrong thing, it only works due to a design subtlety of the language, and it tempts coders to use "is" in inappropriate situations.
  • For sequences (strings, lists, tuples), use the fact that empty sequences are false.
    Yes: if not seq:
         if seq:
    
    No:  if len(seq):
         if not len(seq):
    
  • Don't compare boolean values to True or False using == (unless it matters, e.g. for tri-state logic).
    Yes:   if greeting:
    
    No:    if greeting == True:
    
    Worse: if greeting is True:
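
The distinction between value and identity, and the truth value of an empty sequence, can be sketched as follows (all names are illustrative):

```python
first = [1, 2, 3]
second = [1, 2, 3]
alias = first

sameValue = (first == second)     # True: equal contents
sameObject = (first is second)    # False: two distinct list objects
aliasIsSame = (alias is first)    # True: both names refer to one object


def describeSequence(seq):
    """Return 'empty' or the number of items in the sequence.
    """
    if not seq:
        return "empty"
    return "%d item(s)" % len(seq)
```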
    

Suggested Modules

  • Use the subprocess module to spawn processes. This supersedes and unifies os.system, os.spawn, os.popen, etc. New in Python 2.4.
  • Avoid the use of lambda. You can almost always write clearer code by using a named function or using the functools module to wrap a function.
  • Use the set type for unordered collections of objects. New in Python 2.4 (though available via the Set class of the sets module in Python 2.3).
  • Use the  optparse module for command-line scripts.
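
A brief sketch of three of these suggestions follows: functools.partial in place of a lambda, the set type, and optparse. The function names, the filter names and the --verbose option are all invented for illustration; subprocess is left out to keep the sketch free of external commands.

```python
import functools
import optparse


def scale(value, factor):
    """Return value multiplied by factor.
    """
    return value * factor


# A named partial function instead of "lambda v: scale(v, 2)".
double = functools.partial(scale, factor=2)

# A set discards duplicates and supports fast membership tests.
filterNames = set(['g', 'r', 'i', 'r', 'g'])


def parseArgs(argv):
    """Return (options, args) parsed from a list of command-line words.
    """
    parser = optparse.OptionParser(usage="usage: %prog [options] file")
    parser.add_option("-v", "--verbose", action="store_true",
                      dest="verbose", default=False,
                      help="print progress messages")
    return parser.parse_args(argv)
```

In a script, parseArgs would normally be called with sys.argv[1:].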

Document Metadata

Adapted by Russell Owen from work by Guido van Rossum, Barry Warsaw and others. Updated by R Allsman to include sections from 'Python Style Guide for BaBar' at:  http://www-spires.slac.stanford.edu/BFROOT/www/Computing/Programming/Python/PythonStyleGuide.html

To Do

  • Update to match the LSST C++ standard.
  • If we want to use doxygen for a documentation system then add a section on making doc strings compatible.

License

This document is in the public domain.

USER COMMENTS on Rule Statements