A (not so) simple matter of privacy

You may have seen Ovid's recent post on his discussions with the Perl Steering Committee about moving forward with implementing an initial subset of the Corinna proposal in the Perl core.

One of the issues that came up during those discussions was the best way to provide private methods in Corinna. The current Corinna proposal is that this would be done (like almost everything else in Corinna) via an attribute:

method do_internal :private () {...}

Thereafter, the do_internal() method can only be called from within the current class, and is never overridden by derived-class methods when it is called within its original class.

In other words, the :private method effectively prepends the following code to the start of the method:

croak "Can't call method 'do_internal'"
    if caller ne __CLASS__;

...and, in addition, causes the compiler to treat any call to that particular method from within the current class as a fully qualified method call. That is, within any class C with a private do_internal() method, any call to $self->do_internal() is treated as a call to $self->C::do_internal().

All of which means that there is no way to call a private method from anywhere except within the same class. Which, of course, is the whole point of having private methods in the first place.
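For comparison, both halves of that behaviour can be approximated by hand in today's Perl. What follows is only a sketch using ordinary blessed-hash classes, not Corinna itself; the Counter and LoudCounter classes and the _bump() method are hypothetical names invented for the illustration.

```perl
use strict;
use warnings;

package Counter {
    use Carp qw(croak);

    sub new {
        my ($class) = @_;
        return bless { count => 0 }, $class;
    }

    # Hand-rolled "private" method: reject any caller outside this package
    sub _bump {
        my ($self) = @_;
        my ($caller_package) = caller;
        croak "Can't call method '_bump'"
            if $caller_package ne __PACKAGE__;
        return ++$self->{count};
    }

    sub increment {
        my ($self) = @_;
        # Fully qualified call: method lookup starts in Counter,
        # so a derived-class _bump() cannot override it here
        return $self->Counter::_bump();
    }
}

package LoudCounter {
    our @ISA = ('Counter');

    # Same name as the base class's private method,
    # but never reached via increment()
    sub _bump { die "LoudCounter::_bump should not be called\n" }
}

my $counter = LoudCounter->new;
my $count   = $counter->increment;                     # runs Counter::_bump
my $refused = !eval { Counter::_bump($counter); 1 };   # called from main
```

Having to write both the caller check and the qualified call by hand, every time, is exactly the work the :private attribute would ask the compiler to do instead.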

But the need to automagically convert any call to a private method into a fully qualified method call is (to put it mildly) a complication for the compiler. So the members of the Perl Steering Committee Cor design team suggested that rather than having private methods, Perl should have lexical methods instead. Specifically, they suggested that instead of:

method do_internal :private () {...}

method do_external () {
    $self->do_internal();   # Call private method
    ...
}

...Perl would provide anonymous methods, which you could place in lexical variables and then call using the existing call-via-reference method call syntax:

my $do_internal = method () {...};

method do_external () {
    $self->$do_internal();   # Call lexical method
    ...
}

That neatly avoids the challenge of rewriting private methods to check their caller, and the much greater challenge of rewriting private method calls to be fully qualified. Instead, it cleverly enforces the “can be called only from the current class” requirement by making it impossible to refer to the method at all, except in the lexical scope of the $do_internal variable.

You could even consider the slightly uglier $self->$do_internal() call syntax as a win: because it means that private method calls are explicitly marked as such.
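The call-via-reference mechanism already exists in Perl, so the idea can be tried out without Corinna. Here is a minimal sketch, assuming a plain blessed-hash class and an ordinary anonymous sub standing in for the proposed anonymous method; Greeter and $format_name are made-up names.

```perl
use strict;
use warnings;

package Greeter {
    # Lexical "method": a code reference visible only inside this block
    my $format_name = sub {
        my ($self) = @_;
        return ucfirst lc $self->{name};
    };

    sub new {
        my ($class, %args) = @_;
        return bless { name => $args{name} }, $class;
    }

    sub greet {
        my ($self) = @_;
        # Call-via-reference: no symbol-table lookup, no inheritance
        return 'Hello, ' . $self->$format_name() . '!';
    }
}

my $greeter  = Greeter->new(name => 'aLICE');
my $greeting = $greeter->greet;                        # "Hello, Alice!"
my $visible  = Greeter->can('format_name') ? 1 : 0;    # 0: not a named method
```

Because the code reference never enters the package's symbol table, code outside the block has no name by which to reach it.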

The only downsides are:

  • The call-via-reference syntax is more obscure and perhaps a little more challenging for less-experienced Perl developers. That might discourage them from using this approach, even for methods that really should be private, thereby penalizing a good OO practice.

  • Using anonymous methods to implement private methods is structural rather than declarative. That is: we can’t just say what we want to have happen, we have to say how to actually implement it. In practical terms, this means that private method definitions are no longer explicitly marked as being private, and hence are far less searchable.

And that second issue is the real problem here. Because they’re structural, not declarative, lexical methods specified like this are also much easier to “enbug”. For example, if the developer mistakenly wrote something like this:

method other () {
    ...
    $do_internal++;
    ...
}

Then any call to the do_external() method will be fine so long as the other() method is never called. But after other() has been called, the next call to do_external() will throw a weird exception:

   Can't locate object method "140362266577153" via class "C"
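The same failure can be reproduced in current Perl. This is a sketch with a plain package and an anonymous sub in place of a Corinna class and anonymous method (Widget is a hypothetical name). Note that stock Perl words the message as via package rather than via class, and that the integer shown will differ from run to run.

```perl
use strict;
use warnings;

package Widget {
    my $do_internal = sub { return 'internal result' };

    sub new { return bless {}, shift }

    sub do_external {
        my ($self) = @_;
        return $self->$do_internal();
    }

    sub other {
        # The bug: numifies the code reference and adds one,
        # leaving a plain integer in $do_internal
        $do_internal++;
        return;
    }
}

my $widget = Widget->new;
my $before = $widget->do_external;    # works: 'internal result'
$widget->other;                       # silently corrupts $do_internal
my $error  = eval { $widget->do_external; 1 } ? '' : $@;
print $error;    # Can't locate object method "<integer>" via package "Widget" ...
```

Once the variable holds an integer, Perl treats its stringified value as a method name to look up, which is where the strange message comes from.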

Of course, we can get around that potential bug by making $do_internal immutable:

use Const::Fast;
const my $do_internal => method () {...};

but that’s now an extra module load, and even more infrastructure to get right.
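If the module load is the sticking point, the read-only flag can also be set without loading anything. The sketch below assumes that calling Internals::SvREADONLY directly is acceptable for illustration; it ships with perl but is an internal interface rather than a supported public API, so treat this as a demonstration of the effect, not a recommendation. Gadget is a hypothetical class name.

```perl
use strict;
use warnings;

package Gadget {
    my $do_internal = sub { return 'internal result' };

    # Assumption: relying on the internal Internals::SvREADONLY function
    # to mark the variable itself as read-only
    Internals::SvREADONLY($do_internal, 1);

    sub new { return bless {}, shift }

    sub do_external {
        my ($self) = @_;
        return $self->$do_internal();
    }

    sub other {
        $do_internal++;    # now dies instead of corrupting the variable
        return;
    }
}

my $gadget   = Gadget->new;
my $modified = eval { $gadget->other; 1 } ? 1 : 0;    # 0: the increment died
my $error    = $@;    # "Modification of a read-only value attempted ..."
my $result   = $gadget->do_external;                  # still works
```

The mistake in other() is now reported at the point where it happens, rather than at some later, unrelated call to do_external().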

(BTW, wouldn’t it be cool if the existing :const subroutine attribute could also be applied to scalar variables, in which case we’d just need:

my $do_internal :const = method () {...};

Maybe one day!)

Yet another issue with using anonymous methods as private methods is that they make it easy to subvert the intended encapsulation:

our $do_internal = method () {...};

That’s no longer a private method, because you can now call it from literally anywhere in your code:

$obj->$C::do_internal();

And, even if it were still a my variable, there’s nothing to prevent a class from explicitly exporting it:

method circumvent_privacy () { return $do_internal }

As an OO purist, I find that possibility worrying. And as someone who has reviewed a vast amount of real-world Perl OO code over the past few decades, I think that kind of expedient kludge is...inevitable.
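Both escape routes are easy to demonstrate in plain Perl. As before, this is only a sketch with anonymous subs standing in for anonymous methods; the Leaky and Locker classes are invented for the example.

```perl
use strict;
use warnings;

package Leaky {
    # Package variable: reachable from anywhere by its qualified name
    our $do_internal = sub { return 'leaky secret' };

    sub new { return bless {}, shift }
}

package Locker {
    # Lexical variable: unreachable by name...
    my $do_internal = sub { return 'locker secret' };

    sub new { return bless {}, shift }

    # ...until the class itself hands the reference out
    sub circumvent_privacy { return $do_internal }
}

my $leaky      = Leaky->new;
my $via_global = $leaky->$Leaky::do_internal();    # 'leaky secret'

my $locker     = Locker->new;
my $borrowed   = $locker->circumvent_privacy;
my $via_getter = $locker->$borrowed();             # 'locker secret'
```

In both cases the encapsulation is given away by the class author, not taken by the consumer.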

So what do we do?

The good folks of the Perl Steering Committee Cor team suggested an approach that does address some of these issues. They proposed that the syntax for private methods be:

method $do_internal () {...}

method do_external () {
    $self->$do_internal();   # Call private method
    ...
}

That is: when you want to declare a private method, you put a $ at the start of its name. This form of the method keyword would then create a (presumably immutable!) lexical $do_internal variable, initialized with a reference to an anonymous method. You could then use the reference in that variable to call the private method.

While it doesn’t solve the problem of evil developers explicitly exporting that variable, it does make the syntax declarative again, and it could solve the issue of accidentally changing the variable (assuming the variable were indeed to be created as immutable).

The only problem is that now we have the method keyword not just declaring methods, but also declaring variables.

Which, from the perspective of a language designer, is...less than ideal. And, from the perspective of someone who actually still teaches Perl, is much less than ideal.

So, how do we solve that?

Well, Raku solves it by marking private methods not with a $, but with a ! and then calling them with the ! as well (rather than Raku’s usual . method call operator):

# In Raku...
method !do_internal () {...}  # Private method

method do_external () {
    self!do_internal();       # Private method call
    ...
}

That’s a neat solution, but it probably wouldn’t work for Perl.

Nevertheless, the idea of prefix-marking private methods and private method calls in some way might work. For example, we could consider formalizing the “private methods start with an underscore” convention inside the new class blocks:

method _do_internal () {...}

method do_external () {
    $self->_do_internal();
    ...
}

In other words, we could specify that any method definition that starts with an underscore doesn’t get added to the method symbol table. Instead, it implicitly creates an immutable and invisible lexical scalar within the class. Then, within any class block, any call to a method whose name starts with an underscore actually calls the method through the associated invisible lexical scalar instead.

Or, to put it more simply: it’s precisely the Steering Committee Cor team’s proposal, but: s/\$/_/g
And with the added advantage that the lexical is not otherwise accessible within its scope, and therefore not accidentally modifiable, or maliciously exportable.

That’s by far the safest alternative I can think of, but I concede that it might still be too magical.

In which case we’d instead need to address just the immediate problem: that we’re contemplating using the method declarator to declare something that isn’t actually a method. Specifically, we’re using method to declare variables.

And the easiest solution to that problem is simply to define another declarator:

private $do_internal () {...}   # Declare private method

method do_external () {
    $self->$do_internal();      # Call private method
    ...
}

Now method once again only declares named methods, and we can easily explain that private declares a lexically scoped variable whose immutable value is a reference to the anonymous method.

We get very clear declarative labelling of private methods and equally clear labelling of calls to those private methods.

We don’t solve the “naughty folk can export that variable to circumvent privacy” problem, but I suppose we could always just spin that as a feature, rather than a bug. /s

For the moment, however, unless the Steering Committee Cor team is happy with either the “underscore enforces private” approach or the addition of a private keyword, I expect what we’ll actually have to do is to remove private methods entirely from the first stage of the implementation, and simply keep on pondering how to get them exactly right.

In the meantime, don’t let anyone ever tell you that language (re-)design is easy. :-)

26 Comments

I think I may have misdescribed things to you (I was very tired when I was writing that email). The PSC suggested rolling out Corinna in stages. It's the Cor team which finally came up with method $foo () {...} syntax.

I had actually objected to this:


    croak "Can't call method 'do_internal'"
        if caller ne __CLASS__;

That's because in my calling code, I could do this:


{
    package Class::Im::Using;
    $object->private_method;
}

Here, the class author can't control the encapsulation violation.
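A small sketch of that objection in plain Perl, assuming the hand-written caller check from the post; Vault and its secret() method are hypothetical names. The consumer only has to step into the class's own package for the duration of the call.

```perl
use strict;
use warnings;

package Vault {
    use Carp qw(croak);

    sub new { return bless {}, shift }

    # "Private" by caller check only
    sub secret {
        my ($self) = @_;
        my ($caller_package) = caller;
        croak "Can't call method 'secret'"
            if $caller_package ne __PACKAGE__;
        return 'opened';
    }
}

my $vault = Vault->new;

# An honest outside call is refused...
my $outside = eval { $vault->secret } // 'refused';

# ...but the consumer can simply re-enter the class's package
my $sneaky = do { package Vault; $vault->secret };    # 'opened'
```

The check only sees which package the calling code was compiled in, and any code is free to declare itself part of any package.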

Whereas, with the method $foo () syntax being rewritten internally as my $foo = method () {...}, we have encapsulation. You point out that the author can do this:


    method $do_internal () {...}
    method circumvent_privacy () { return $do_internal }

But here, the author chooses to violate encapsulation, not the consumer. And we have this same issue with lexical fields:


    field $seKret { 18475329104 };
    method circumvent_privacy () { return \$seKret }

But again, it's the author who makes this choice. This isn't much different from a Java developer choosing to make everything public.

I rather do like your suggestion of using a leading underscore for private, because that's actually very natural for Perl. And no matter how we set this up, we can't prevent the producer from violating encapsulation, but they could simply not use private methods and provide readers and writers for all fields. I can't think of any OO system which prevents authors from being naughty. But if it prevents consumers from being naughty, that's OK.

But if we consider a leading underscore, that needs to go in immediately because changing that behavior later will be a *very* breaking change. I think (don't quote me on this) we were currently considering the leading underscore for "trusted" (protected) methods. We will need those, so anything we do for private should probably consider the needs of trusted (not visible outside the class, except to subclasses).

I have probably lost/misunderstood a good deal of the explanation, but I'm left wondering why we can't keep the interface and the implementation separate.

I mean, why can't we have this (which IMHO sticks to the principle of least surprise):

method do_internal :private (...) {...}

and then have it implemented with whatever mechanism that is deemed better, e.g. lexical immutable thingies in a glass safebox under the sea.

I can only think of a trade-off in efficiency (both lowercase perl programmer-wise and lowercase perl-wise), because that would probably require revising how method resolution is performed today and having this resolution go through a private list first, then a public one, for lack of a way to distinguish a private method from a public one at the time of calling (which seems something that programmers would be happy to hand over to the compiler/interpreter, I guess).

I previously proposed allowing the syntax $object->&method_name to call a lexical sub (my sub method_name). This fits more with traditional perl OO, but it should be possible to extend to Cor methods.

https://github.com/Perl/perl5/issues/18621

Couldn't lexical subs be extended to just work as private methods too:

class A {
  has $x;

  my sub _print_x() { # private method
    say "x value for $self is $x";
  }

  method print_x() {
    _print_x(); # calling it, no dispatching required!
  }
}

Alternatively, using

my method {...}.

The issue here is how you call a private method on a different object of the same class, but well, the same problem exists for accessing slots.

But precisely, my point is that you don't need to specify explicitly the object when calling a private method as it must be the same one that was used to invoke the current method from where the private one is being called, right?

Actually, the same logic can be applied to public methods too as in the following code:

class A {
  method foo() { ... }

  method bar() :private { ... } # or whatever syntax you want to
                                # use to signal bar is private

  method doz() {
    foo(); # method foo can be invoked as a sub,
           # $self can be inferred from the context

    bar();
  }
}

my $a = A->new();
$a->doz(); # ok
$a->foo(); # ok
$a->bar(); # fails!

The point remaining is how you call a private method on an object of the same class... Maybe using "&" to disable the special method calling.

class B {
  method bar() :private { ... } # or whatever syntax you want to
                                # use to signal bar is private

  method doz($other) {
    bar();
    &bar($other)
  }
}


If methods can be called like subs, then every sub call now also has to check whether there's a suitable method to call instead.

IIRC, perl does subroutine look up at compile time*. For instance, that's how arguments for subroutines with prototypes are parsed in accordance with the prototype.

So, extending it to also look for methods declared on the current class scope or up in its parents, doesn't seem too complex, and the runtime penalty is zero.

Regarding whether it is better to require the explicit $self->method() syntax. Well, I can see both pros and cons. As you say, it allows one to distinguish method calls from sub calls but on the other hand, not requiring it would result in more concise code.

Also, the same argument could then be applied to slots and so, being able to access object slots as regular variables shouldn't be allowed. Actually, my experience programming in other languages where slots and methods can be referenced without stating the object (Java, C++) is that the troublesome ones are usually not the methods but the slots.

* well, it tries to do the look up at compile time and if that fails, in some cases, it tries again at runtime.

No, subs are always looked up at runtime

Well, I think this is just a semantic issue, but by lookup I was referring to the operation of traversing one or several tables containing mappings between names and the subroutines (currently, the package stash or the lexicals table) which is an expensive operation.

Perl does not store CV pointers directly on the OP tree, but the globs (GVs), so that operations like the one you describe above are possible, but accessing the CV slot of a GV is very, very cheap.

Regarding the second point about the importance of calling methods as $self->method, I still fail to see why it is important to differentiate sub/method calls **inside** a class, but it is ok to use the same syntax for lexicals/slots which are much more prone to collide.

Because sub calls and method calls are different constructs with different interfaces and behaviours

I think that is exactly the point where we disagree. I see them as different constructs but their interface and behaviors are so similar, that the same syntax can be used for both.

Also, there are several programming languages that use the same syntax for both: C++, C#, Java (if you take its static methods as subs), and Common Lisp/CLOS, which uses the same syntax for everything (methods, functions, generic functions, macros, accessors, etc.). Admittedly, all these programming languages have their issues... but I have never heard anybody say the overloaded subroutine call syntax was one of them.

I have to admit that, for me, the real issue I have with supporting $self->private_method is that I see lots of issues at the implementation level.

For instance, what happens when you have two classes A and B, where B derives from A and both implement a private method with the same name? Just try to imagine how perl could resolve the method "foo" in every case in the following code:

class A {
  sub foo() :private { ... }

  sub bar() {
    my $b = B->new();
    my $c = C->new();
    return $self->foo + $b->foo + $c->foo;
  }
}

class B extends A {
  sub foo() :private { ... }

  sub doz() {
    print "I use ".foo()." for something completely different!"
  }
}


In any case, I think that going for $self->&private_method solves all of them too!

Private methods by their very nature don’t participate in inheritance, so you should get:
    return $self->foo  # A::foo() called
            + $b->foo  # Can't call private B::foo() outside class B
            + $c->foo; # C::foo() if public, otherwise exception

Oh, no, it is not that simple! The right thing to do there is to call A::foo on $b, as $b is of class A too. Otherwise you are requiring the B programmer to know about the private methods up the class hierarchy in order to avoid reusing their names.

In order to support that, it is not enough to have a per-class table mapping method names to CVs. Now, for every call to a method having the same name as a private method, the interpreter would have to check at runtime whether the object is of the class being declared and, if so, redirect the call to the private method.

It can even be worse: Perl supports calling methods using the $self->$method syntax and that would have to play nicely with private methods too. So, for every $self->$method call, the interpreter would have to check whether a private method with the name in $method exists in order to call it or fallback to the public method.

So now every method call is more expensive and more confusing. It’s a method call, but it doesn’t always dispatch polymorphically to the bottom of the inheritance hierarchy to call the most derived method; sometimes it dispatches statically into the middle of the inheritance hierarchy to a more-ancestral method.

Yes, exactly, that's the only possible "right" behavior for private methods if for private you want to mean that they are completely invisible from outside the class where they are defined.

No, we are requiring all programmers not to attempt to call private methods on objects from outside those object’s own classes

Yes, but the issue there is that, in general, a programmer implementing a class B derived from A doesn't necessarily know which private methods are defined in A (or in its parents). So he could unintentionally add to B a method with the same name as some private method of A. Or a private method with a colliding name could be added in a later version of A.

IMO that is not a critical issue. Actually, the current practice of using a double underscore prefix to signal private methods is affected by it; people write effective code in Perl and the collisions are very rare (but also very unexpected). It is just that if you are designing a new OO extension to the language, I think you should at least try to account for it and try to avoid it.
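To make the collision concrete, here is a sketch in plain Perl contrasting the naming convention with a lexical code reference. All class and method names are hypothetical, and a single underscore prefix is used for brevity.

```perl
use strict;
use warnings;

# Convention only: the "private" helper is an ordinary named method
package Base {
    sub new     { return bless {}, shift }
    sub _helper { return 'Base helper' }
    sub report  { my ($self) = @_; return $self->_helper }
}

package Derived {
    our @ISA = ('Base');
    # The author of Derived has never heard of Base::_helper
    sub _helper { return 'Derived helper' }
}

# Lexical code reference: nothing in the symbol table to collide with
package SafeBase {
    my $helper = sub { return 'SafeBase helper' };
    sub new    { return bless {}, shift }
    sub report { my ($self) = @_; return $self->$helper() }
}

package SafeDerived {
    our @ISA = ('SafeBase');
    sub _helper { return 'SafeDerived helper' }
}

my $collided = Derived->new->report;        # 'Derived helper': hijacked
my $isolated = SafeDerived->new->report;    # 'SafeBase helper': unaffected
```

With the convention, Base's report() silently starts using the subclass's unrelated method; with the lexical reference, the subclass cannot interfere even by accident.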

That anything one might want to do via a private method would be much better done via a lexical subroutine.

And that was what I was trying to say in my initial reply. But then, the issue is that lexical subs don't have access to the object state. They only can access the object through its public interface. That's a severe limitation and, probably, most programmers would keep using the double underscore prefix convention for private methods.

... unless you give lexical subs the ability to access the object state directly. At the end, it is just a matter of scope.

For instance...

class A {
  slot $x;

  method inc() {
    $x++;
    return sub {
      --$x;
    }
  }
}

I hope I am not the only one that would expect that closure over the slot $x to work, so, why not this one below?

class A {
  slot $x;

  my sub _dec {
    --$x;
  }

  method dec() {
    _dec();
  }

}

After all, $x is in scope when _dec is declared.

I imagine that extending that per-object look-up to lexical subroutines as well would be quite a bit trickier. The implementors might not be willing — or even able — to do so.

Yes, and there are other issues. AFAIK, supporting class methods is planned (tagging them with :common) and lexical subs should be callable from those, and that creates several edge cases that could be quite difficult to support.

class A {
  field $a :common;
  field $b;
  my sub priv {
    eval '$a + 1'; # which fields are visible from this eval?
                   # depends on whether priv was called from
                   # foo or from bar!
  }
  method foo :common {
     priv();
  }
  method bar {
     priv();
  }
}

So, I think that declaring explicitly those subs as methods would simplify everything both at the conceptual and implementation levels:

my sub { ... } # no field access allowed
my method { ... } # access to all fields allowed
my method : common { ... } # access to class fields allowed

So, the issue remaining is how to call those subs/methods...

And our ongoing disagreement on the correct semantics for them leads me to conclude that the only reasonable approach left is:
...
$self->&increment_counter();  # Lexical method call syntax
...

Let me start by saying that I would be happy with that solution as I think it provides the correct semantics for private methods.

It is just that I think it is quite ugly and doesn't really blend well into the language. At least for me, "->" means method dispatching and "&" means subroutine, and more specifically stripping a sub of any special behavior. Private (lexical) methods are not dispatched, and we are calling a method, not a sub... I think the only positive point for that syntax is that it is available!

After all, it would just be a fancy way to write "increment_counter($self)". So, why do we need it?
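For reference, this is roughly what the increment_counter($self) form looks like in current Perl, where lexical subs have been stable since 5.26. It is a sketch using a blessed hash for the object state rather than Corinna fields; Tally and its methods are made-up names.

```perl
use v5.26;
use warnings;

package Tally {
    # Lexical sub: visible only inside this block, never in the symbol table
    my sub increment_counter {
        my ($self) = @_;
        return ++$self->{count};
    }

    sub new { return bless { count => 0 }, shift }

    sub record {
        my ($self) = @_;
        # Plain sub call with the invocant passed explicitly
        return increment_counter($self);
    }
}

my $tally  = Tally->new;
my $first  = $tally->record;    # 1
my $second = $tally->record;    # 2
my $hidden = Tally->can('increment_counter') ? 0 : 1;    # 1: not a method
```

This works here only because the state lives in the $self hash; with Corinna's lexical fields, a plain sub would have no way to reach that state, which is the limitation being debated.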

And at that point, if it was ok for you to have lexical subroutines with access to the object fields as in...

class A {
  field $x;
  my sub _inc { $x += 1 }
  method inc { _inc }
}

Is it not ok to just do the same with lexical methods?

class A {
  field $x;
  my method _inc { $x += 1 }
  method inc { _inc }
}

So that the keyword "method" means it has access to the object/class fields and "my"/nothing controls whether the sub/method goes into the stash becoming callable through "->".

And the only special behavior of private methods is having the $self/$class object being passed implicitly at the call point (which could be disabled using & as in "&_inc($other)").

# which fields are visible from this eval?

Isn’t the answer simply “none”? It seems a trick question. Fields aren’t visible inside a sub, whether it’s lexical or not, no?

# which fields are visible from this eval?

Isn’t the answer simply “none”?

Yes, that was the conclusion: it is not a good idea to let lexical subroutines, declared at the class top level, access the object fields.

It seems a trick question. Fields aren’t visible inside a sub, whether it’s lexical or not, no?

Well, IMO, they should be visible if the sub is declared inside a method. For instance, I would expect the following code to work:

use List::Util qw(first);

class A {
  field $x;
  method foo(@args) {
    return first { $_ > $x } @args;
  }
}

I would expect the following code to work:

Well, yes, obviously, a closure is a different kind of situation. The question would be what happens in a named sub declared inside a method – where I would expect the exact same behaviour that the “variable will not stay shared” warning cautions against currently. And I would expect the use of a lexical sub to cure the problem, the same way it currently does for regular subs.
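The difference between the two kinds of nested sub can be seen with ordinary lexical variables, no fields required. This sketch uses hypothetical sub names and turns off the closure warning category only to keep the output quiet.

```perl
use v5.26;
use warnings;
no warnings 'closure';    # silence "Variable "$x" will not stay shared"

sub outer_named {
    my ($x) = @_;
    # Named sub: compiled once, bound to the $x of the first call only
    sub inner_named { return $x }
    return inner_named();
}

sub outer_lexical {
    my ($x) = @_;
    # Lexical sub: re-bound to the current $x each time the block is entered
    my sub inner_lexical { return $x }
    return inner_lexical();
}

my @named   = (outer_named(1),   outer_named(2));      # (1, 1): stale value
my @lexical = (outer_lexical(1), outer_lexical(2));    # (1, 2)
```

The named inner sub keeps returning the value from the first call, while the lexical sub sees the fresh variable each time.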

method $do_internal () {
  ...;
}

has been supported by Zydeco for over a year; see its documentation.


About Damian Conway
