When "unsafety" is a Good Thing
Some months ago, I read an article by someone who teaches Perl to Java programmers. In it, the author wonders a bit about some of Perl’s features which seem counterintuitive to those coming from a Java background. As you might expect, one of those features is its typing. He says:
For example: stern, protective type safety is not only missing from Perl but actually not even considered particularly desirable. Perl is relaxed. The reaction from students is, naturally: “Isn’t this a huge obstacle to getting anything done?” Obviously the answer is “No”, but even after all the Perl I’ve written I sometimes wonder why not. Is it really just the fact that it takes less time to type method signatures and variable declarations?
At the time I read this, I thought to myself, “Hey, I know the answer to that!” But I just jotted it down as a potential blog topic and let it percolate for a bit. Many folks reading this will also know the answer, but it’s an interesting topic, and I’m sure someone will get something out of it.
So, the first stumbling block in any discussion on this topic is to quibble over what we mean by “strong typing” and “weak typing.” Java (like C++ and C) is considered “strongly typed.” This is because when you declare a variable, you say “this is an integer” (for instance), and then you can’t put anything into that variable that’s not an integer. The compiler won’t let you: it’s not a runtime check (such as Moose provides), but a rule enforced at compile time, before the program ever runs. On the other hand, Perl is generally considered “weakly typed” because you don’t say whether a variable is an integer or not, and you can stick whatever you like into it: integer, string, floating-point number, boolean value, reference to another variable, whatever floats your boat.
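To make the Perl side concrete, here’s a minimal sketch (the variable name is just for illustration) of one scalar holding several kinds of values in turn. The Java equivalent, declaring `int value` and later assigning a string to it, would be rejected by the compiler.

```perl
use strict;
use warnings;

# One scalar, several kinds of values over its lifetime; no declaration changes.
my $value = 42;            # an integer
$value = "forty-two";      # a string
$value = 3.75;             # a floating-point number
$value = (5 > 3);          # the result of a comparison (true, i.e. 1)
$value = [1, 2, 3];        # a reference to an array

print ref($value), "\n";   # prints "ARRAY"
```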
But some people argue that this is too narrow a view on typing. After all, when you declare a variable in Perl (or even if you never declare it), you say “this is a scalar” or “this is an array” or “this is a hash.” And you can’t put an array into a scalar, or a hash into an array. Isn’t that strong typing ... just at a higher level? Obviously the Java/C++/C folks don’t think so, but then they have a bit of a cultural bias as to what constitutes a “type.”
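A small sketch of that “higher-level” typing (the names are made up for the example): the sigil fixes what kind of container a variable is, and putting an array where a scalar is expected doesn’t store the array. Instead you get its element count. To keep the array itself in a scalar, you store a reference to it.

```perl
use strict;
use warnings;

my @colors = ('red', 'green', 'blue');

my $count = @colors;      # array in scalar context: the element count (3), not the array
my $ref   = \@colors;     # a scalar can hold a *reference* to the array instead

print "count=$count, via ref=", scalar(@$ref), ", first=$ref->[0]\n";
# prints "count=3, via ref=3, first=red"
```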
But let’s ignore that debate for the moment. Whatever sort of typing Java and its ancestors have, let’s agree to call it “strong typing,” just to save confusion. And we’ll agree to call the sort of typing that Perl has “weak typing.” Of course, even those terms have issues. Anything that’s “strong” has to be better than something that’s “weak” ... right? So it’s sort of like we’re insulting Perl’s type system before we even look at what it actually means. But let’s avoid the semantic argument as well, if we can, and just agree not to consider “weak” as equivalent to “bad” in this case.
So, which one is better, “strong typing” or “weak typing”? The answer, of course, is neither (or both, depending on your point of view). Both have advantages and disadvantages. This is one of the reasons I think Perl 6 gets it right: its system could be considered to be “optional strong typing.” That is, if you want “strong” (Java/C++/C style) typing, you can have it. If you prefer “weak” (Perl 5 style) typing, you can have that too. I hope that’s one of those features that gets backported to Perl 5 someday. But, the point is, sometimes “strong” typing is useful, which is why Moose provides a type system. As I noted above, it doesn’t work the same way as true typing, since its checks happen at runtime, but it serves the same purpose: helping programmers catch mistakes. When you’re expecting an integer, and someone hands you a string, you’d probably like to know about it at some point before you try to divide 365 by “bmoogle.”
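A rough sketch of what a runtime check buys you, using a hypothetical helper named `divide_year_by`. This is a hand-rolled check loosely in the spirit of Moose’s `Int` constraint, not Moose’s actual API. Without the check, Perl quietly numifies “bmoogle” to 0 and you only find out when the division dies. With it, the bad argument is reported where it comes in.

```perl
use strict;
use warnings;

# Hypothetical helper: validate the argument before using it.
# (Hand-rolled for illustration; not how Moose implements type constraints.)
sub divide_year_by {
    my ($divisor) = @_;
    die "Expected an integer, got '" . ($divisor // 'undef') . "'\n"   # // needs Perl 5.10+
        unless defined $divisor && $divisor =~ /\A-?[0-9]+\z/;
    die "Divisor must not be zero\n" if $divisor == 0;
    return 365 / $divisor;
}

# Unchecked: "bmoogle" becomes 0, and the error is about the division instead.
my $input = "bmoogle";
my $unchecked_ok    = eval { no warnings 'numeric'; my $r = 365 / $input; 1 };
my $unchecked_error = $@;                   # "Illegal division by zero at ..."

# Checked: the error names the real problem.
my $checked = eval { divide_year_by($input); 1 } ? '' : $@;
print $checked;                             # prints "Expected an integer, got 'bmoogle'"

print divide_year_by(5), "\n";              # prints 73
```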
So Java and its ilk have us covered on that count: such an error would be impossible in those languages. But when is Perl’s “weak” typing better? Well, anyone who’s ever spent any time writing C code can probably tell you how annoying it is to put in all those atoi calls. And, sure, printf is nice when you need it, but what if you had to use it every single time you wanted to print anything that wasn’t a string? Trust me, it gets old fast. C++ makes it better, with its iostream system, but then Java goes backward again (although admittedly not all the way back to the horrors of C). But none of them are particularly easy. And printing (be it to screen or file) is a pretty common thing to want to do with variables. Sometimes you even want to read them back in from files, and that’s even harder. It’s much easier in Perl, which doesn’t much care whether a variable is supposed to be a string or a number. The ability to transition back and forth without having to think about it is pretty damned convenient, although anyone raised in Perl probably doesn’t even realize it, never having had to deal with the frustrations of a “strongly typed” language.
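For comparison, a minimal sketch of the Perl side: numbers read back from a file arrive as strings, get used directly in arithmetic, and go straight back into a string for output, with no atoi or format specifiers. The file is simulated with an in-memory filehandle so the example is self-contained.

```perl
use strict;
use warnings;

# Simulated data file (an in-memory filehandle stands in for a real file).
my $data = "12\n30\n7.5\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $total = 0;
while (my $line = <$fh>) {
    chomp $line;
    $total += $line;             # the string read from the file is used as a number
}
close $fh;

my $report = "Total: $total\n";  # the number is interpolated straight back into a string
print $report;                   # prints "Total: 49.5"
```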
But let’s move beyond the simple case of trying to mix strings and numbers in output. In my experience, where a “weakly typed” language really shines is in trying to interface with your database. Your typical RDBMS is pretty “strongly typed,” which can make dealing with values coming out of it fairly annoying in a “strongly typed” language. You wouldn’t think that would be the case. You’d think that a “strongly typed” language and a “strongly typed” data source would be a match made in heaven. But what it actually means is that now you have to coordinate those types exactly, or you’re going to have troubles. What if your database’s concept of an int and your language’s concept of an int aren’t the same? Perhaps one is using 32-bit integers and one 64-bit integers. More commonly, what if your software doesn’t have foreknowledge of the types of your data? This can happen with dynamic queries, or just from using RDBMS functions. Sure, you can query the database itself and get column types back, but that happens at run-time, and your types have to be all set at compile-time. Quite a dilemma.
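Here’s a sketch of what that looks like in Perl when a query’s column types aren’t known in advance. The rows below are made-up stand-ins for a dynamic query’s results. With DBI you’d typically get the column names from the statement handle’s `NAME` attribute and the rows from `fetchrow_arrayref`, and SQL NULL arrives as `undef`. The processing code never needs to know which columns are integers, decimals, or dates.

```perl
use strict;
use warnings;

# Made-up stand-in for the result of a dynamic query such as
#   SELECT dept, COUNT(*), AVG(salary), MAX(reviewed) FROM staff GROUP BY dept
# With DBI, @columns would come from $sth->{NAME} and each row from $sth->fetchrow_arrayref.
my @columns = ('dept', 'headcount', 'avg_salary', 'last_review');
my @rows = (
    ['Sales',   12, 51234.5, '2011-03-14'],
    ['Support',  4, 38000,   undef],          # SQL NULL comes back as undef
);

# Nothing below needs to know any column's type ahead of time:
# each cell is just a scalar.
my @lines;
for my $row (@rows) {
    my %record;
    @record{@columns} = @$row;                              # hash slice: name => value
    push @lines, join ', ',
        map { "$_=" . ($record{$_} // 'NULL') } @columns;   # // needs Perl 5.10+
}
print "$_\n" for @lines;
```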
Even worse if you’re trying to write a general-purpose solution such as a library. I tried that, once, in C++. I cooked up some abomination of a class built around a union. I suppose, in the end, it was pretty much how an SV (that is, the internal C representation of a Perl scalar value) works. Except mine was untested and most likely full of holes that the authors of Perl had found and fixed years earlier. When I tried again a few years later, in Perl, it only took a few hours to get the basics down, because every column in an RDBMS query is a scalar value, and that’s exactly what Perl wants in a variable.
So it’s easy to see several situations where “weak typing” is an advantage. When you want to print a variable to the console, you don’t care what type it is: you just want to see it on the screen. When you want to pull a value out of your query and slap it into an HTML report, you don’t care what type it is: you just want to see it in the browser. Yes, there are times when “strong typing” is going to save your ass by catching a bad parameter to a method call. But there are also plenty of times when all it’s going to do is be a pain in your ass. And not buy you anything. And that’s why “weak typing” is a good thing.
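A minimal sketch of the HTML-report case. The helper names are hypothetical, and a real report would more likely use a templating module. Every cell is just a scalar, so the same code renders integers, decimals, text, and NULLs. The only thing the code has to care about is escaping, not type.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration.
sub html_escape {
    my ($text) = @_;
    return '' unless defined $text;     # NULL/undef renders as an empty cell
    $text =~ s/&/&amp;/g;               # must come first
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

sub html_row {
    my @cells = @_;
    return '<tr>' . join('', map { '<td>' . html_escape($_) . '</td>' } @cells) . "</tr>\n";
}

my $row = html_row('Sales & Marketing', 12, 51234.5, undef);
print $row;
# prints <tr><td>Sales &amp; Marketing</td><td>12</td><td>51234.5</td><td></td></tr>
```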
And thus it is that lack of “type safety” isn’t nearly as unsafe as it sounds. And the time it saves you typing variable declarations really doesn’t enter into it.
I'm racking my brain trying to think of examples where I've gotten into trouble with a scalar being able to contain a number or a string, and the only real case I can think of is when trying to JSON-encode some data and having to be clear whether to encode as an int (without quotes) or as a string (with quotes). That's not a problem with Perl, but with whatever system is receiving the JSON string on the other end and expecting a particular type!
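A small sketch of the JSON case, using the core JSON::PP module. JSON::PP encodes a scalar that has been used as a string as a JSON string and a plain number as a JSON number, so a value read in as text comes out quoted unless you force it to a number. Adding 0 is the idiom the JSON::PP documentation suggests.

```perl
use strict;
use warnings;
use JSON::PP;    # core module since Perl 5.14

my $count = "42";    # e.g. read from a file or form field, so it's a string

my $as_string = encode_json({ count => $count });       # {"count":"42"}
my $as_number = encode_json({ count => $count + 0 });   # {"count":42}

print "$as_string\n$as_number\n";
```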
Otherwise, it's great being able to read in a byte stream and be able to perform numeric operations right away without having to do conversions. When we really need to check the content of our string, we have Scalar::Util::looks_like_number().
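A short sketch of that: tokens pulled out of a text stream are used as numbers directly, with `looks_like_number` from the core Scalar::Util module filtering out the ones that aren’t numeric. The input string is made up for the example.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

my $input = "10 2.5 abc 1e3 -7";    # e.g. a line read from a byte stream

my (@numbers, @rejected);
for my $token (split ' ', $input) {
    if (looks_like_number($token)) { push @numbers,  $token }
    else                           { push @rejected, $token }
}

my $sum = 0;
$sum += $_ for @numbers;            # 10 + 2.5 + 1000 - 7, no conversion calls needed
print "sum=$sum, rejected=@rejected\n";
```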
Related link: list of dynamic languages (probably mostly with weak typing): http://en.wikipedia.org/wiki/Dynamic_programming_language#Examples
@Ether,
what about trouble when "0" is false?
@Ether: You're right: it's occasionally annoying when you have to interface your Perl code with something "strongly typed" (such as SOAP). But I generally consider that an issue with the other end. :-D
@vsespb:
> what about trouble when "0" is false?
Well, 2 things about that:
1) A lot of the time we run into that, it's because people mix up "false" with "undefined," e.g., using || to supply a default when 0 is a perfectly valid value.
In those cases, using // can help (assuming you're using 5.10 or later) by allowing you to be more precise.
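To illustrate point 1 with a hypothetical `retries` setting where 0 is a legitimate value: `||` treats "0" as false and silently substitutes the default, while `//` only substitutes when the value is undefined.

```perl
use strict;
use warnings;
# Note: the // (defined-or) operator needs Perl 5.10 or later.

# Hypothetical settings helpers: 0 retries is a legitimate choice.
sub retries_with_or  { my ($n) = @_; return $n || 3 }    # 0 is false, so it becomes 3
sub retries_with_dor { my ($n) = @_; return $n // 3 }    # only undef becomes 3

print retries_with_or(0), " ", retries_with_dor(0), " ", retries_with_dor(undef), "\n";
# prints "3 0 3"
```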
2) There are some cases where you still run into trouble, such as the return from DBI's execute() or from system(). But those often come from trying to jam success/failure into a return value that actually has information in it. And that's a problem with "strongly typed" languages as well. For instance, the C/C++ spawn* calls have the same issue that Perl's system() does.
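A sketch of the system() case: it returns 0 on success, so its return value is false exactly when things went well, and the exit code has to be unpacked from `$?`. The child process here is just the running perl (`$^X`) exiting with an arbitrary status of 3.

```perl
use strict;
use warnings;

# $^X is the path of the perl running this script; the child exits with status 3.
my $rc = system($^X, '-e', 'exit 3');

my $succeeded = ($rc == 0);     # system() returns 0 (false) on success
my $exit_code = $? >> 8;        # the child's exit status lives in the high byte of $?

print "succeeded? ", ($succeeded ? "yes" : "no"), "; exit code $exit_code\n";
# prints "succeeded? no; exit code 3"

# The usual idiom reads "backwards" if you think of the return value as a boolean:
system($^X, '-e', 'exit 0') == 0
    or die "command failed with status ", $? >> 8, "\n";
```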
mjd's Strong Typing and Perl.
Treating numbers and strings as the same thing is a problem for US zip codes that have leading 0s. :)
And let’s hear it not for strong typing, or weak typing, or dynamic typing, or static typing, but Nancy typing. :-)
One thing that might be interesting is that the disadvantage of "weakly typed" language is seen as "danger" while the disadvantage of the "strongly typed" language is seen as mere "inconvenience".
I think most people can cope with inconvenience better than with danger.
So I think an important question is how do you eliminate, or at least reduce the perceived danger.
In Java, types are mostly Object or Interface ... I think Perl 6 will eventually solve our problems. Basically, this problem of type safety is caused, IMO, by lack of object usage. I cannot guarantee that what is coming in is what I expect, or at least behaves how I expect. Though I think weak typing and duck typing are an advantage in many ways, one of the biggest mistakes of Perl 5, IMO, is not having things like an actual Boolean type.
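For context on the Boolean point, a quick sketch of Perl's truthiness rules: only undef, the empty string, the string "0", and numeric zero are false, so strings that merely look like zero are true. That's also why DBI's execute() can return "0E0" for zero rows: numerically zero, but true.

```perl
use strict;
use warnings;

# Perl's false values: undef, the empty string "", the string "0", and the number 0.
# Everything else is true, including strings that merely look like zero.
my @candidates = ('0', '', '0.0', '00', '0E0', ' 0', 'false');
my %truth = map { $_ => ($_ ? 'true' : 'false') } @candidates;

printf "%-8s %s\n", "'$_'", $truth{$_} for @candidates;
```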
Heh, why don't we refer to the systems as "stiff typing" and "spongy typing"? :-)
I'm going to take the devil's advocate position here.
A lot of what type-safe languages do isn't so much about what you put into variables, but about what you do with them. They don't let you take the square root of a string, because that doesn't make sense.
Your typical RDBMS is pretty “strongly typed,” which can make dealing with values coming out of it fairly annoying in a “strongly typed” language. You wouldn’t think that would be the case. You’d think that a “strongly typed” language and a “strongly typed” data source would be a match made in heaven. But what it actually means is that now you have to coordinate those types exactly, or you’re going to have troubles. What if your database’s concept of an int and your language’s concept of an int aren’t the same?
I've seen similar problems with putting data into an RDBMS query with Perl. The problem was that the DBI driver was considering the Perl variable to be a string when it should have been an integer. This ended up throwing the DB's query optimizer into conniptions, and the solution was about as syntactically appealing as typecasting. In fact, forcing the DBI driver to consider the variable an integer was basically a manual form of typecasting.
I did a three-part article on type systems a couple of years ago, which looks into their strong and weak points, quirks and idiosyncrasies, and reaches a conclusion based on comparing them with each other (strong vs. weak, dynamic vs. static).
What I found is that it's not that easy to categorize languages by their type systems, as they adopt a mix of each system's characteristics to suit their own needs (C#, for example, makes use of duck typing although it is categorized as a static language), something that is summarized in this chunk of the conclusion:
"Strong, weak, static, dynamic: what is it all about?
Labeling a language as of a certain type is apparently not that easy since it can incorporate a mixture of type systems; furthermore such a labeling would not offer anything significant.
But knowing the underlying concepts, quirks, weaknesses and strong points of each type system helps in avoiding potential pitfalls (Quoting the Camel book "You will be miserable until you learn the difference between scalar and list context"), leads to less buggy code and allows maximum exploitation of each system's unique features as to increase productivity, flexibility and agility."
There is much more to it, so here are the links to the parts in case you're interested:
part one : Type systems demystified
part two : Weakly typed languages
part three and conclusion : Strong typing
Buddy - I think you hit the nail on the head when you point to database interaction as an area where a weakly typed language is beneficial. We have strong typing in our (relational) databases, so adding primitive typing in our middle tier violates the Don't Repeat Yourself principle.
With regard to Timm Murray's comment: in my experience it has been extremely rare that I have had to use extended DBI features to provide additional primitive type information back to the DBMS. I would not give up on weak typing in the middle tier for these rare cases.
Woohoo! Lots of great comments; thx guys!
@brian d foy:
I'm not sure I agree here ... since Perl has specifically different operators for strings and numbers, you actually have to work pretty hard to lose that leading 0 ... like a `print $zip + 0` or something. :-)
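A quick sketch of brian's zip-code case and of why it takes some effort to lose the zero: as long as the value is only used as a string, the leading 0 survives. It's numeric use that drops it. The zip code is just an example value.

```perl
use strict;
use warnings;

my $zip = "02134";                         # example zip code, read in as text

my $as_text  = "ZIP: $zip";                # "ZIP: 02134" (string use keeps the 0)
my $as_num   = $zip + 0;                   # 2134 (numeric use drops it)
my $repadded = sprintf '%05d', $as_num;    # "02134" (padding it back, if you must)

print "$as_text\n$as_num\n$repadded\n";
```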
@Aristotle:
I don't think I would call that "Nancy typing" so much as "turn-on-warnings-you-twit-typing." :-D
@Gabor Szabo:
Yes, I think this is built into the use of the word "safety" (in "type safety"), which can be just as biasing as the "strong" in "strong typing." A "strongly typed" language offers the convenience of helping you catch (some) mistakes more easily, but I'm not sure that translates to helping you avoid actual danger. And if the inconvenience in other places starts to overwhelm the convenience of catching those mistakes, then the "strong typing" isn't any better, or safer, or stronger, than the alternatives.
@xenoterracide:
Agree with you here, especially regarding Perl 6 and missing booleans.
@David Mertens:
Sounds good to me. :-D
@Timm Murray:
Like @Ross Attrill, I haven't seen this often. In fact, I would say I've never seen it when using placeholders, only with bind variables. It's one of the reasons I prefer placeholders. ;->
@nikosv:
Excellent reference; thx for posting it!
Agree totally, which is why I always put "strongly typed" and "weakly typed" in quotes. ;->
A lot of the times that type safety could save you, it doesn't. SQL injection attacks still happen, because the SQL and the thing injected into it are both the same type (strings). If you could get a compile-time error trying to concatenate an SQL string with an unescaped string, that would be a great argument in favour of strong typing.
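As a thought experiment on that idea, here's a sketch of a runtime (not compile-time) approximation in Perl. It defines a hypothetical `SQLFragment` class, built with the core overload pragma, that refuses to be concatenated with a plain string. It's purely illustrative and not part of DBI. In practice, DBI placeholders are the usual defense against injection.

```perl
use strict;
use warnings;

package SQLFragment;    # hypothetical class, for illustration only
use Scalar::Util qw(blessed);
use overload
    '.'  => \&_concat,
    '""' => sub { ${ $_[0] } };

sub new { my ($class, $sql) = @_; return bless \$sql, $class }
sub sql { return ${ $_[0] } }

# Concatenation is allowed only between two SQLFragment objects.
sub _concat {
    my ($self, $other, $swapped) = @_;
    die "refusing to concatenate SQL with an unescaped string\n"
        unless blessed($other) && $other->isa('SQLFragment');
    return SQLFragment->new($swapped ? $other->sql . $self->sql
                                     : $self->sql . $other->sql);
}

package main;

my $select = SQLFragment->new('SELECT * FROM users');
my $where  = SQLFragment->new(' WHERE name = ?');   # the value goes in a DBI placeholder
my $query  = $select . $where;                      # fine: both are SQL fragments
print "$query\n";

# Splicing user input into the SQL text is caught, but only at runtime.
my $user_input = "x' OR '1'='1";
my $blocked = eval { my $bad = $select . " WHERE name = '$user_input'"; 1 } ? 0 : 1;
print "blocked: $blocked\n";
```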
What about stenches?
The specific case I was thinking of happened under Oracle. I know Oracle is a bit of a redheaded stepchild in the Perl community, but we still have to deal with it.
The problem did happen with placeholders. DBD::Oracle makes some guesses about the placeholder types, and in this case, those guesses were causing problems. The solution was actually to convert this specific case to bind variables with an explicit type.
I do agree that it's a relatively narrow case, though.
The latest news on the subject is a paper by a team of Swedish students that compares Node.js/JavaScript with C# and, taking typing into account among other aspects, suggests which one is a better fit for a betting company's IT operation.
It refers to the Weakly typed languages article, and looks interesting in that it gives the typing debate practical value through a real-world example.
The only caveat is that it is written in Swedish and only the abstract is in English, so if anybody finds it interesting and knows Swedish, I would much appreciate a translated overview!