Yenya's World

Tue, 15 Aug 2006

Clean Code versus undef

I like to write clean, readable code. Sometimes even to the point that I am slow to write or fix code when I think the current code is too ugly, but have no immediate idea of the best way to fix it cleanly. Recently I've come across an interesting dilemma, which is probably a clash between two measures of clean code:

Rule 1:
Clean code produces no warnings. The compiler usually knows about the dangerous constructs of its language, and can warn the programmer appropriately. When the programmer really wants to do something weird, there is often a way to switch the warning off (such as using two nested pairs of parentheses when an assignment is to be used as a Boolean value inside an if (...) statement).
Rule 2:
Clean code is readable. And by readable, I also mean short. The longer the code is, the more the reader has to keep in short-term memory in order to grok it. Perl programmers usually call this principle DWIM - Do What I Mean. So symbols should be named appropriately, the meaning of return values should be obvious, and so on.

There is an interesting problem in Perl with the undef value. The Perl interpreter usually considers undef to be something volatile which should be handled with extreme care, so it warns about almost every attempt to use an undef value. Yet undef has a very clean mental interpretation: it is nothing. So it is natural to expect it to behave as a neutral element of the operation in question: the concatenation undef.$string should yield $string, length(undef) should be zero, lc(undef) should remain undef, ++$new_var should return 1 for a variable with an undefined value, etc.

So I am in a situation where I have to put if (defined $var) { ... } around various parts of the code in order to disable warnings about an uninitialized value. But the code is perfectly readable (and thus clean by Rule 2) even without it. Adding more code makes the readability by Rule 2 worse.
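A minimal sketch of the guard in question, and of the defined-or operator // which Perl 5.10 later added for exactly this pattern ($log_prefix and $message are made-up names for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $log_prefix;               # may legitimately be undef
my $message = 'disk full';

# The explicit guard the post complains about:
my $out1 = (defined $log_prefix ? $log_prefix : '') . $message;

# Since Perl 5.10, the defined-or operator shortens the same guard:
my $out2 = ($log_prefix // '') . $message;

print "$out1\n$out2\n";
```

Both lines produce the same string without an "uninitialized value" warning; the difference is purely how much code the reader has to parse.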

Moreover, even Perl itself is not consistent in handling undef: the above ++$new_var construct follows the DWIM principle and produces no warning. But something like print STDERR $log_prefix, $message, "\n"; produces a run-time warning when $log_prefix is not defined. There are definitely usages of undef where Perl should complain loudly (such as calling a method on undef, calling undef as a function, etc.). But I think string concatenation and other string operations (i.e. everything that can be solved by replacing undef with an empty string) should not fall into this category.
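The inconsistency can be demonstrated directly; this small sketch traps warnings and counts how many each construct emits:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };   # trap warnings instead of printing them

my $new_var;                           # undef
$new_var++;                            # documented exception: no warning, $new_var becomes 1

my $log_prefix;                        # undef
my $line = $log_prefix . "message";    # warns: Use of uninitialized value in concatenation

printf "counter=%d, warnings=%d\n", $new_var, scalar @warnings;
```

Running this prints counter=1, warnings=1: the increment of undef is silently DWIMmed, while the concatenation of undef triggers the run-time warning.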

So what do you think? Should Rule 1 have precedence here? And what about your language of choice? I know, LISP is unreadable by definition :-), but what about other languages?


14 replies for this story:

Vasek Stodulka wrote:

I think if I have to break one of the rules, then I definitely choose rule 2. In your example - if $log_prefix is undefined, then something is probably wrong: it was supposed to be printed, but it has not even been defined yet. In this case, setting $log_prefix to an empty string is perfectly readable and the interpreter is also completely happy. In conclusion, I think that when code gets shorter, it is not automatically more readable.

Michal Schwarz wrote: undef means uninitialized

I think that a variable containing undef quite clearly means an uninitialized variable, and it should behave strangely and produce warnings :) You can add something to undef, but why? Isn't "my $a=0; .... $a++" cleaner and more readable than "... $a++"?

Yenya wrote:

I don't think "my $a=0; ... $a++" is cleaner than "my $a; ... $a++". But then, the "my $a=0; ... $a++" case is trivial. Consider rather something like 'my $a=some_function(); ... print "returned $a\n"'. There are plenty of ways to get an undef into a variable, not just "my $variable;". Treating the undef case as a branch distinct from (say) an empty string just adds to the code bloat. There are cases when a returned undef value means an error, but in general, I think undef should not be treated in a special way.

Milan Zamazal wrote:

I agree with others here -- using an uninitialized variable is wrong. "my $a; ... $a++" is definitely dirty: "my $a;" clearly says "$a is uninitialized" to the kind source code reader, so using its value (as in "$a++") before initialization is very confusing, and there should be a warning about it at compile time (if possible) and an error signalled at run time. As for functions, they should have well-defined return values. And in introspection code you must be very cautious anyway, so the extra checking line is all right. Clean code doesn't mean the shortest possible notation but comprehensible information without redundancy.

Yenya wrote: The shortest notation.

By clean code, I definitely do not mean "shortest possible notation". But a not-so-necessary branch which just does something like "if $var is undef, take it as an empty string" adds to the code bloat. Yes, functions should have well-defined return values - consider the (cleanly designed) function read_file($filename), which returns the contents of the file as a scalar, or undef if the file cannot be opened. And imagine you want to use this function on many files (to search for something in the contents or whatever), and you do not care whether the file exists or not. You just want to search in those files you can access. So "for my $file () { print $file, "\n" if read_file($file) =~ /string/ }" is what you want. But instead you get a warning that you are trying to match something against an undef. WTF? It is perfectly OK if undef does not match. For example, in SQL, NULL values also have a clean meaning (albeit different from "a neutral element"). It is possible (and perfectly OK) to use NULL in expressions.
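A sketch of that usage, assuming a hand-rolled read_file() with the described contract (file contents as a scalar, undef on open failure); the defined() check is exactly the extra noise being objected to:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical read_file(): returns the whole file as a scalar,
# or undef if the file cannot be opened.
sub read_file {
    my ($filename) = @_;
    open my $fh, '<', $filename or return undef;
    my $contents = do { local $/; <$fh> };   # slurp the whole file
    return $contents;
}

for my $file (@ARGV) {
    my $contents = read_file($file);
    # Without the defined() check, matching undef against /string/
    # would emit "Use of uninitialized value" under warnings.
    print "$file\n" if defined $contents && $contents =~ /string/;
}
```

If undef simply failed to match, the loop body could shrink to the one-liner from the comment above.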

Milan Zamazal wrote:

A cleanly designed read_file should return something like a NULL value (which is a *value*; I don't know what its equivalent in Perl is) instead of undef (which I'd interpret as an undefined result, perhaps due to some unexpected error), and all should be right. If Perl doesn't have a NULL equivalent and uses undef instead, then it's a missing feature in the language.

Yenya wrote: undef _is_ NULL

The undef constant is an equivalent of NULL in SQL or (void *)0 in C - just a special value. undef in Perl does not mean "undefined" (as in "random" or "arbitrary"). Think of it as NULL (or LISP nil). Undef _is_ a value.

Milan Zamazal wrote:

I see. Then I think your assumptions are basically correct, as heavy coercion is one of the key Perl features. With the exception that I consider "my $a=undef;" better notation than "my $a;" if you intend to use the initial value, and that even Perl should complain when you try to use undeclared variables (I agree it should do that consistently).

Yenya wrote: New variables

The "my $a;" declaration is perfectly OK in Perl, and the variable is set to undef by definition. So "my $a=undef;" would be superfluous and thus a violation of Rule 2. Warnings/errors about undeclared variables are of course a good thing, and Perl emits them. But "my $a;" is the declaration (and the setting to undef). META: From the discussion it may seem that I favour Rule 2 over Rule 1 - that is not true, I simply do not know, and in this case (warnings when using the value undef in some string operations) the two rules seem to clash. So when you all defended Rule 1, I just wanted to emphasize the arguments in favour of Rule 2.

Milan Zamazal wrote: Assigning undef explicitly

"my $a=undef;" is not superfluous, it tells the code reader that undef is the *intended* value of the variable. For instance, in Lisp the forms (let ((foo nil)) ...) and (let (foo) ...) are equivalent too, but I use the first one to tell the reader that foo's value is nil, and the latter to tell him that the value is to be defined later (despite the compiler setting it to nil anyway). If I always used (let (foo) ...), the code reader would have to distinguish the two cases by reading other parts of the code. That contradicts your Rule 2 (the reader has to extract and hold implicit information in his memory). BTW, I think it's more important to keep the number of lines of code low than to make the lines themselves short.

Yenya wrote: undef is always intended

I think having a newly declared variable (my $var) set to undef is always intended. Perl easily allows a variable to be declared anywhere, so the most feasible approach is to declare it at its first use (such as "my $var = Some::Class->new()", "my $var = some_function()", or even "open(my $fh, $filename);" and "for my $fh (@list) {...}"). So actually seeing "my $var;" as a standalone statement is _the_ case where the author wanted the variable to be explicitly undef. I agree with you in general, however, about keeping the number of lines low (rather than the length of the lines themselves).

Adelton wrote:

What's wrong with "no warnings 'uninitialized';" in places where you really want to print out undefined values? It's been a long time since I really used undefined values knowingly and it brought some benefits.
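For reference, the pragma is lexically scoped, so it can be confined to just the statements that deliberately use undef; a sketch that counts the warnings to show the scoping:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $warn_count = 0;
$SIG{__WARN__} = sub { $warn_count++ };   # count warnings instead of printing them

my $log_prefix;                # deliberately undef
my $line;
{
    # Lexically scoped: undef may be used silently only inside this block.
    no warnings 'uninitialized';
    $line = $log_prefix . "message";      # no warning here
}
my $line2 = $log_prefix . "message";      # the warning is active again here

print "warnings: $warn_count\n";
```

Only the second concatenation warns, so the program prints warnings: 1.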

Yenya wrote: no warnings 'uninitialized'

I think either "no warnings 'uninitialized'" should be on by default, or $newvar++ should emit a warning as well. Why is ++ different from string concatenation?

Adelton wrote:

Actually, I agree with both your points. ;-)


About:

Yenya's World: Linux and beyond - Yenya's blog.


Jan "Yenya" Kasprzak
