Semantics changing on whitespace?

July 03, 2008 at 10:41 AM | categories: python, oldblog

In python it's clear that the semantics of a piece of code change depending on whitespace - for example:
if True:
    print "Yay"
    print "Woo"
frag. 1
vs
if True:
    print "Yay"
print "Woo"
frag. 2
However, generally speaking this does actually mean what people intended it to mean. The common exception is where you might want to write this:
class Foo(object):
    def main(self):
         while True:
              print "Woo"
sys.stderr.write("DEBUG - UM, in loop\n")
              print "Yay"
frag. 3

Whereas of course python views the sys.stderr.write line as the end of the while, def, & class blocks. Often people do the above (in non-python languages) because they want to make it easier to find where they've inserted debug code, and lament the lack of it in python. As an aside, you can of course do the above in python, if you add an extra line in:
class Foo(object):
    def main(self):
         while True:
              print "Woo"
              \
sys.stderr.write("DEBUG - UM, in loop\n")
              print "Yay"
frag. 4
This works because the continuation marker effectively causes the next line to be part of the same logical line, which means it's logically indented even if not physically. ("sys" is still at the start of the line in the source, of course.)
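A quick way to check the continuation trick is to compile a small piece of source text in which the debug line starts at column 0. This is only a sketch, written for Python 3 (the fragments in this post use Python 2 print statements); the names `SOURCE`, `main` and `out` are made up for the illustration.

```python
# Source text in which the "debug" line starts at column 0, but is joined
# to the indented, backslash-only line directly above it.
SOURCE = (
    "def main():\n"
    "    out = []\n"
    "    for _ in range(2):\n"
    "        out.append('Woo')\n"
    "        \\\n"
    "out.append('DEBUG')\n"
    "        out.append('Yay')\n"
    "    return out\n"
)

namespace = {}
# If the continuation did not count as indentation, compile() would fail here.
exec(compile(SOURCE, "<frag4>", "exec"), namespace)
result = namespace["main"]()
print(result)
```

If the debug line really is treated as part of the loop body, 'DEBUG' shows up between 'Woo' and 'Yay' on every pass.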

However, I think that's as far as the change in semantics due to whitespace goes, *except* of course that it's also used as a delimiter between tokens. (This is kinda necessary, after they found with fortran many years back that allowing whitespace in identifiers was a rather bad idea in practice.)

Anyhow, this does tend to mean that the last line of this:
foo =10
bar = 2
X[foo/bar]

frag. 5
means the same as all of these:
X [foo/bar]
X[foo /bar]
X [foo /bar]
X[foo / bar]
X [foo / bar]
frag. 6
Whereas apparently in ruby it wouldn't - based on some recent posts. In fact, I think only 2 of them do the same thing. That's actually pretty insane (but then I'm sure people think the same about python's whitespace rules), but clearly a consequence of allowing foo bar to mean something similar (if not identical?) to foo(bar). However, it also clearly breaches the rule of least surprise. Whilst the problem with the rule of least surprise is "who is surprised", I think it's reasonable for someone looking at code to assume that the following all do the same thing:
X[foo/bar]
X [foo/bar]
X[foo /bar]
X [foo /bar]
X[foo / bar]
X [foo / bar]
frag. 7
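For python at least, that assumption about frag. 7 can be checked mechanically by parsing each spelling and comparing the resulting syntax trees. A minimal sketch for Python 3, using only the standard `ast` module (the names `VARIANTS`, `dumps` and `all_same` are made up here):

```python
import ast

# The six spellings from frag. 7, one string per spelling.
VARIANTS = [
    "X[foo/bar]",
    "X [foo/bar]",
    "X[foo /bar]",
    "X [foo /bar]",
    "X[foo / bar]",
    "X [foo / bar]",
]

# ast.dump() leaves out line/column positions by default, so the dumps are
# identical when the parsed structure is identical.
dumps = {ast.dump(ast.parse(v, mode="eval")) for v in VARIANTS}
all_same = len(dumps) == 1
print(all_same)
```

Parsing is enough here, so X, foo and bar never need to exist.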
And it's also reasonable to assume that the following are at least intended to be different:
if (1)
    printf("Yay\n");
    printf("Woo\n");
frag. 8
vs
if (1)
    printf("Yay\n");
printf("Woo\n");
frag. 9

But of course in C, they aren't. Now that's why most C programmers wouldn't do that, but it's made me wonder. C has this foible, which every C programmer knows about. Ruby has the above foible, which I'm guessing most if not all ruby programmers are aware of. But with python it's the whitespace semantics (which are actually intended to encourage good behaviour and fix the "problem" with frags 8 vs 9) that everyone knows about, and that does put people off...

That is, the biggest barrier (that I hear of) to adoption of python is the fact that frags 1 & 2 do mean different things. I'm not sure why it's a huge barrier, but it does turn out to be the single factor that turns most people off the language (in my experience...). Whilst you do have something like pindent.py, which allows frags 1 & 2 to look like this:
if True:
    print "Yay"
    print "Woo"
#end if
frag. 10 - same as frag 1 after running through pindent.py
vs
if True:
    print "Yay"
#end if
print "Woo"
frag. 11 - same as frag 2 after running through pindent.py
And whilst hell would freeze over before the addition of a keyword 'end' to python, it strikes me that being able to write:
if True:
  print "Yay"
 print "Woo"
end
frag. 12 - same as frag 1, written with 'end'
and
if True:
    print "Yay"
end
print "Woo"
frag. 13 - same as frag 2, written with 'end'
and
class Foo(object):
    def main(self):
         while True:
              print "Woo"
sys.stderr.write("DEBUG - UM, in loop\n")
              print "Yay"
         end
    end
end
frag. 14 - same as frag 3, written with 'end'
wouldn't actually be the end of the world. It would simplify things for beginners, and also simplify things when people are embedding code and copying/pasting code from webpages, from archives of lists, etc. It'd mean that web templating languages which allow python code inside templates (not a wise idea to mix code with templates really, but it does happen) would be able to use the same syntax, etc.

It would also do away with the one major criticism of python. My personal preference is that it would be available as a command line switch which defaults to off. However, as mentioned, hell would freeze over before it was added, so the question that springs to mind is "is it worth writing a pre-processor for?" I can see some benefits in doing so. For example, it would mean that python was less fussy about things like frags 12 and 14, both of which have whitespace issues python would scream about. Frag 12 has a common mistake, whereas frag 14 contains a common desire (at least for people who are used to that temporary debug style in many languages).

It'd also mean (perhaps) that a resurrection of kids' books teaching programming could use python happily without people wondering whether they've counted the dedents correctly - since they'd be able to count up end keywords.

It'd also open the door to handwriting based coding in python... (since indenting 8 "spaces" when writing doesn't make much sense - and your indentation isn't going to be perfect then either)

So the question for me is: is it worth writing? I personally suspect it is, and the preprocessor needed would be quite simple to write, either from scratch or derived from pindent.py, but I wonder what other people's opinions are. How long did whitespace sensitivity in python stop you learning it? (It put me off for 5 years.) Has it stopped you wanting to bother? Do you think such a pre-processor would be a really bad idea?
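To give a rough feel for how small such a pre-processor could be, here is a sketch written from scratch for Python 3. It is not derived from pindent.py, and the names `reindent` and `CONTINUATION_KEYWORDS` are made up. It assumes that every block opener is a line ending in ':' and that every block is closed by a line containing only 'end'. It throws away whatever indentation was there, and it does not cope with comments after the colon, multi-line strings, or expressions spread over several lines.

```python
# Keywords that close one suite and open another at the same level.
CONTINUATION_KEYWORDS = ("else", "elif", "except", "finally")


def reindent(source, indent="    "):
    """Rebuild indentation from ':' block openers and 'end' lines."""
    out = []
    depth = 0
    for raw in source.splitlines():
        line = raw.strip()          # existing indentation is ignored entirely
        if not line:
            out.append("")
            continue
        if line == "end":           # 'end' closes a block and is not emitted
            depth -= 1
            if depth < 0:
                raise SyntaxError("unmatched 'end'")
            continue
        first_word = line.split(None, 1)[0].rstrip(":")
        if first_word in CONTINUATION_KEYWORDS and line.endswith(":"):
            # e.g. 'else:' sits one level out, and the depth stays the same
            out.append(indent * (depth - 1) + line)
            continue
        out.append(indent * depth + line)
        if line.endswith(":"):      # a block opener: following lines go deeper
            depth += 1
    if depth != 0:
        raise SyntaxError("missing 'end'")
    return "\n".join(out) + "\n"


# frag. 14, with print written as a function call for Python 3.
FRAG_14 = """\
class Foo(object):
    def main(self):
         while True:
              print("Woo")
sys.stderr.write("DEBUG - UM, in loop\\n")
              print("Yay")
         end
    end
end
"""
print(reindent(FRAG_14))
```

Because the depth comes only from the ':' and 'end' lines, both the sloppy indentation of frag 12 and the column-0 debug line of frag 14 come out as ordinary, consistently indented python.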

