What do you care what type it is?

I'm about to start writing about a bunch of stuff that will definitely show my lack of a computer science background. Unlike many of my posts, this is your chance to correct me rather than for me to explain things to you. This has been sitting on my desktop for a while, so I'm cutting bait and posting what I have instead of working on it more.

I've been reading Seven Languages in Seven Weeks: A Pragmatic Guide to Learning Programming Languages, which is an enjoyable book except for the parts where he starts to talk about types and, ahem, types of programming languages. It's mostly distracting, not very useful, and probably misguided, if not outright wrong.

So, Ovid posted a clever summary of type arguments. That reminded me of the smart, educated, and quite entertaining "Strong Typing" talk that Mark Jason Dominus gave to several Perl mongers groups. It also reminded me that no one seems to mean the same thing by the same terms. Also see mjd's message in comp.lang.perl.moderated, in which he summarizes the several competing definitions of strong typing.

Wikipedia's entry on Strongly-typed Programming Languages isn't any help. Indeed, the discussion page, where mjd shows up, is better than the main article (and also demonstrates the underlying weakness of Wikipedia). The article did point me toward Types and Programming Languages (Google Books), which Amazon is already sending to me. I like the look of that book since it goes back to the math.

The math is where it's at, and although I don't have a background in Computer Science, I have a lot of experience with abstract algebra, which defines sets, groups, and so on, and what happens when they interact with each other.

I think this is why I get confused with most people's explanations. Most of the explanations I find come from people trying to explain a concept that they don't fully understand based on their limited experience. That is, more concretely, people think the C programming language means something when it comes to types. I like Real World Haskell's approach if only because it defines the term. They could have just as well said Haskell is a "blue language" because the particular word doesn't matter when you provide your own definition for it:

When we say that Haskell has a strong type system, we mean that the type system guarantees that a program cannot contain certain kinds of errors.

That provides an easy way for Haskell to compare itself to other languages. In Haskell, certain classes of errors can't occur in a valid program. In other languages, maybe those classes of errors can. The question is, does that matter to you, both personally, as a matter of beauty, and economically, as a productive use of your time?

And now comes the bit where I try to do better and will fail.

There's this big mess of terms: strong, weak, loose, static, dynamic, concrete, abstract, data, variable, and so on. I like what Richard Feynman learned about bird names from his father. Dr. Feynman says:

You can know the name of a bird in all the languages of the world, but when you're finished, you'll know absolutely nothing whatever about the bird... So let's look at the bird and see what it's doing -- that's what counts. I learned very early the difference between knowing the name of something and knowing something.

The video is more interesting:


He tells the same story to an interviewer in "Take the World from Another Point of View":


In this telling, he adds one important bit to that story:

Names don't constitute knowledge. That's caused me a certain trouble since because I refuse to learn the name for anything. ... What he forgot to tell me was knowing the names of things is useful if you want to talk to someone else.

The names of birds, however, only matter if people call the bird by the same name.

Types are just a kind of thing, and not at all like birds. It doesn't matter how we define that thing or how it works. The type is not the algebra. Forget about the terms, which (mostly) no one can agree on, and figure out what you want to know and why you want to know it. It doesn't really matter what you call it as long as you get what you want.

What can I put in this variable?

A lot of programmers immediately think of int, float, or char as types. That's fine. However, when they don't see those types, they tend to turn up their noses because they think something is type deficient. The particular sorts of types you have really have nothing to do with it. Indeed, most of those types come from architecture-dependent factors, like exposing the storage and format details at the higher levels. The people who want these sorts of types are looking to define the set of data that belongs in that type. However, that does not mean that larger sets are not also types.

Programmers typically want this so they have something that protects them from storing invalid values.
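
For a concrete (if contrived) Perl sketch: a scalar variable will happily hold anything, so if I want it to admit only a particular set of values, I have to enforce that myself. The store_small_int checker below is something I made up for illustration, not anything built into Perl:

    use strict;
    use warnings;

    # The "type" travels with the value, not the variable.
    my $thingy = 137;          # a number
    $thingy = 'Buster';        # now a string
    $thingy = [ 1, 2, 3 ];     # now an array reference

    # If I want a smaller set of allowed values, I check it myself.
    sub store_small_int {
        my( $value ) = @_;
        die "Not a small integer: $value\n"
            unless $value =~ /\A-?\d+\z/ and abs($value) < 2**15;
        return $value;
    }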

How soon do I find out about type errors?

Do I have to wait until I run the program or will the compiler tell me? Consider this Perl example:

 push $array, qw(a b c);

Is that a type error? Is it a type error in Perl 5.12? What about Perl 5.14? When does Perl find out about that error in each of those versions? Is it good or bad that it does that?
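
If you want to poke at the runtime side of that question yourself, here's a rough sketch using the explicit dereference. As far as I can tell, Perl only complains when it actually has to treat $array as an array reference:

    use strict;
    use warnings;

    my $array = [ 1, 2, 3 ];
    push @$array, qw(a b c);      # fine: $array really holds an array reference

    $array = 'just a string';
    push @$array, qw(d e f);      # compiles, but dies here at runtime
                                  # under strict refs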

Can I change the type?

Is the type fixed, or can programmers play tricks to cast or coerce the thingy to void, or Object, or whatever, from which they can then recast the thingy to whatever they want? Do you want to allow or forbid that sort of thing?
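
Perl certainly allows that sort of thing. This is a toy sketch with made-up package names, but re-blessing a reference changes its type right out from under you, and nothing stops you:

    use strict;
    use warnings;

    { package Cat; sub speak { 'Meow' } }
    { package Dog; sub speak { 'Woof' } }

    my $pet = bless {}, 'Cat';
    print ref($pet), ': ', $pet->speak, "\n";   # Cat: Meow

    bless $pet, 'Dog';                          # recast the same thingy
    print ref($pet), ': ', $pet->speak, "\n";   # Dog: Woof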

When do I know the types?

People get confused about when the compiler (or interpreter) knows what the type is. Can you check it without the compiler (as with PPI or other static analysis tools)? That really means: can you infer all of the information you need about types without actually running the program?

Mostly, people want to find errors through type-checking (hence the terms "type safety" and "type security"). The earlier you know about the types, the sooner your program can report problems. Some people don't even want the program to compile if there is a type mismatch.
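
Perl sits toward the late end of that spectrum. A quick sketch: perl -c is perfectly happy with this, and the problem only surfaces, as a warning, once the statement actually runs:

    use strict;
    use warnings;

    my $count = 'Buster';      # the compiler has no complaints
    my $total = $count + 1;    # runtime warning: isn't numeric in addition
    print "$total\n";          # prints 1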

What is the operator?

Some languages choose the operation by the types of the operands, even though you type the same literal text in the code. How does that make you feel? Would you rather have the types decide, even if that means reading through several lines of code to figure out which operation you're getting, or would you rather see the operation explicitly, so you can look at an isolated statement and know what is going on?
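
Perl is one of the languages on the explicit side: the operator, not the operand type, decides what happens, so I can read a single statement and know whether I'm getting arithmetic or string-wrangling. A quick sketch:

    use strict;
    use warnings;

    my $x = '10';
    my $y = '10.0';

    # The operator picks the comparison, not the data.
    print "numerically equal\n" if $x == $y;   # prints: both values are 10
    print "stringwise equal\n"  if $x eq $y;   # does not print: '10' ne '10.0'

    my $m = 2;
    my $n = 2;
    print $m + $n, "\n";   # 4  -- numeric addition
    print $m . $n, "\n";   # 22 -- string concatenation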

What type is the result?

I always hated this about FORTRAN. Dividing 10 by 3 gave back 3 because, in someone's mind, an integer divided by an integer had to be an integer. Why can't some other type come out? That goes back to the algebra: an operation in an algebra is closed, so it has to return another member of the underlying set.
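
Perl mostly takes the other view, though you can opt back into the FORTRAN behavior lexically. A quick sketch:

    use strict;
    use warnings;

    print 10 / 3, "\n";      # 3.3333... -- the result leaves the integers

    {
        use integer;         # integer ops within this block only
        print 10 / 3, "\n";  # 3
    }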

What ______ is _______?

There's more to this topic than I can imagine at the moment.

14 Comments

Disclaimer: I'm not a computer scientist.

I'm with you on not caring. I'm not trying 7 languages in 7 anything (hours, days, weeks, months, years), but I am interested in always writing better code, and doing so in a variety of languages.

Sometimes the difference between strong and weak types is like figuring out if you have a left-handed hammer or not. It's a tool; use it the way it was meant to be used or get another tool. If you're so inclined, you can find something wrong with every programming language... or you can learn how the one you are using handles types and deal with it. Once I've got the syntax right, I don't really care which way it is. When it comes to optimization, I think most people rely on the compilation process to figure it out for them.

I don't think I've ever seen anyone choose one language over another because of types. There is usually much more right or wrong about one of them for the job at hand. Imagine knowing Perl well enough to keep Perl::Critic happy and then trying to pick up Python. Types are your last worry.

Personally, I like weak types but can work with strong types... it's a thing. Either one will have you writing a few extra lines compared to the other, depending on your perspective. "When your only tool is a hammer, every problem looks like a nail" is a good saying, but when your nails are divided up so this type can only be used for pine and that type can only be used for oak... well, it can be confusing, but your hammer still works. No, that wasn't supposed to be profound... and I don't think the argument about which type of types is better is profound in any meaningful way either.

push $array, qw(a b c);

Is that a type error? Is it a type error in Perl 5.12? What about Perl 5.14? When does Perl find out about that error in each of those versions? Is it good or bad that it does that?

The question that really matters (is $array an array ref?) is actually the same in both versions. The only safety afforded by 5.12 is that you can’t type $array when you meant @array. But if you meant @$array – then the deref will still happen at runtime, not compile time. For the most part there is no difference.

Datatypes were created by compiler writers to make their job easier. They do not make programming easier. In fact, for strongly-typed languages, programmers often use Hungarian notation to help them keep the types straight.

If you go out into the real world and ask people, "How do you multiply a number by 10?" they will respond, "Put a zero on the end." "Put a zero on the end" is not arithmetic; it's string manipulation. People generally do not divide items into types; they just lump things together. This can lead to mistakes. For example, if you place a zero at the end of your name, is it multiplied by 10? But it can also lead to the Aha! moment that provides insight into the world.
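
In Perl you can even watch those two ideas collide, since it converts between strings and numbers on demand (a quick sketch, warnings and all):

    use strict;
    use warnings;

    my $number = 7;
    print $number * 10,  "\n";   # 70 -- arithmetic
    print $number . '0', "\n";   # 70 -- string manipulation, same answer

    my $name = 'Buster';
    print $name . '0', "\n";     # Buster0 -- a zero on the end
    print $name * 10,  "\n";     # 0, with a warning -- definitely not times ten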

7 / 3 = 2 was even true in python :)

We had this discussion before and still nobody understands it, sigh.
Perl is both.

From the user's point of view it is:
1. strongly typed on classes and its basic types: scalars, arrays, and hashes;
2. weakly typed on scalars and in certain contexts where a strongly typed language would/should fail, e.g. the automatic conversion of arrays and hashes in scalar context (see the sketch below).
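
For example, a quick sketch of that scalar-context conversion, which a stricter language would probably refuse:

    use strict;
    use warnings;

    my @animals = qw(cat dog bird);
    my $count   = @animals;           # the array becomes its element count
    print "$count\n";                 # 3

    my %config = ( verbose => 1 );
    print "have config\n" if %config; # a hash quietly becomes a truth value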

From the performance point of view it is very weak, as it carries the type dynamically in the value, so you have to do costly but flexible run-time type checks. Compile-time checks and type assertions exist (and should be added for pluggable optimizers), but the most important performance gains have been removed or never added.
Inferred i_opt (using the fast integer ops) was removed by people not understanding the concept; use integer and strict 'refs' are seldom used; and, most of all, static method calls were never added. In better languages you can have both: dynamic for flexibility and static for performance and safety. In Perl you always have to search for a string value in a package hash; at least mro is now used for the package inheritance tree, which should stay dynamic for typical Perl hacks. A dynamic language with static objects (types) would, e.g., carry a type pointer in each value; we have that, the STASH. But it could also store the compile-time CV for each method call op, or at least cache it at run time after the first lookup; class hierarchy changes would then dirty the CV. Or, when switching to strong typing, disallow such run-time changes.
Lots of type optimizations are possible.

It just means that in actual practice, people are going to think about it differently because Perl has to wait until the last moment to know if that line will work.

If you write it push @$array, 'foo', Perl still has to wait until the last moment to know if that line will work.

Wonderful post!
The responses so far tickle me so.

Even with such a well-argued and well-written article, you will fail to make sense to many.

Your argument reminds me of the difference between art and science.

An artist spends their time interpreting what they sense.

A scientist measures it.

That’s not what you said. But point taken.

About brian d foy

I'm the author of Mastering Perl, and the co-author of Learning Perl (6th Edition), Intermediate Perl, Programming Perl (4th Edition), and Effective Perl Programming (2nd Edition).