Are Restricted/Locked Hashes A Failed Experiment?

Some time back we added support to Perl for locked, or restricted, hashes (see Hash::Util and fields.pm). The basic idea is that you can set up a hash and then "lock" it, at which point any access to an unregistered key, whether a read or a write, will cause an exception.
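
A minimal sketch of how that looks with Hash::Util (the hash and key names here are just for illustration):

    use strict;
    use warnings;
    use Hash::Util qw(lock_keys);

    my %point = ( x => 1, y => 2 );
    lock_keys(%point);       # only the keys present now ('x' and 'y') are allowed

    print $point{x}, "\n";   # fine
    $point{z} = 3;           # dies: "Attempt to access disallowed key 'z' in a restricted hash"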

The basic idea was to work around Perl's lack of a true "struct"/"object" type: in many languages accessing a non-existent member is a compile time error, or, when the object is late bound, a run time exception. Unfortunately restricted hashes do not support compile time checks, so all we get is run time exceptions.

All of this comes at a cost, a cost that is imposed on every hash operation, whether performed by your code or by Perl itself. So for instance every time perl looks up a variable in a stash it has to execute checks for a feature that is *never* used internally. Similarly, whenever you populate a hash to build a report, or whatever, you are paying the cost of restricted hashes even if you aren't using them.

IMO even worse is that locked hashes are basically very unperlish. In my experience they mostly exist to provide a safety blanket for people from other languages who feel unsafe without these kinds of protections. In my work code the use of locked hashes is basically zero. Almost every time someone uses locked hashes it is not for their intended purpose of making OO "easier", but rather to make a read-only hash. However even that use case is not well served by locked hashes, as they *die* when you try to access a non-existent key, which goes against the general expectation that a read-only hash simply returns undef for keys it does not contain.
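
To illustrate the mismatch (again with made-up keys): a user of a "read-only" hash would typically expect a missing key to just come back as undef, but a locked hash throws instead:

    use Hash::Util qw(lock_hash);

    my %config = ( colour => 'red' );
    lock_hash(%config);            # restrict the key set and make the values read-only

    print $config{colour}, "\n";   # "red"
    my $v = $config{color};        # dies with a "disallowed key" error rather than returning undef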

To put this in context, some of the perl core developers have started work on making Perl's hash implementation pluggable and vtable driven. This would mean that we could use different hash implementations in different places, and people could produce efficient C level code that made Perl use, for instance, a B+tree for storing its hashes, or Robin Hood hashing, or any other data structure suitable for implementing associative arrays.

However it also means we have to put a lot of thought into what kind of API a hash exposes. If the API has to support locked hashes, then any alternative implementation has to support locked hashes, and by extension you and I and everyone using Perl will have to keep paying the price for them. On the other hand, if we do not have to provide an API for locking hashes, we can instead provide a bespoke lockable hash implementation via a CPAN module that supports sane locking primitives (and who knows, maybe even compile time exceptions).

Another piece of context to consider is that using Perl hashes as objects is really a workaround for the fact that Perl has no real "object" type. So locked hashes impose a non-trivial, wide-scope cost just to prop up a workaround. So what happens if the core hackers get their shit together and add a true object type and grammar to perl? Will we still have to pay the price of locked hashes?

So, I am arguing that we should deprecate and remove locked hashes from core, and reimplement fields.pm using a custom hash table implementation built on top of the pluggable hash API that we have started investigating.

What do you think? Do you use Hash::Util::lock_hash() or similar methods? What is your impression and experience of locked hashes and fields.pm? Useful? Useless? Good idea but flawed implementation? Something else?

Cheers,
Yves

14 Comments

I for one really like to use locked hashes for the results of DBI queries. They are a very easy maintenance-free way of preventing access to fields that were not returned from the query. Any other way would have me maintain the information in the SQL and additionally in Perl by adding accessors or an AUTOLOAD method where the resulting hashes are not really objects.
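
Roughly this pattern, I mean (a sketch; the DSN, table and column names are made up):

    use strict;
    use warnings;
    use DBI;
    use Hash::Util qw(lock_keys);

    my $dbh = DBI->connect('dbi:SQLite:dbname=example.db', '', '', { RaiseError => 1 });
    my $sth = $dbh->prepare('SELECT id, name FROM users');
    $sth->execute;

    while ( my $row = $sth->fetchrow_hashref ) {
        lock_keys(%$row);          # only the columns actually selected may be accessed
        print "$row->{name}\n";    # fine
        # print $row->{email};     # would die: 'email' was not part of the SELECT
    }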

I would be sad to see locked hashes go.

They interest me very little, but I'm not sure how much of a failure they are. fields.pm does actually support some degree of compile-time checking if you do "my Type $var", but of course no one actually does.
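
For reference, that looks something like this (the class and field names are invented for the example):

    use strict;
    use warnings;

    package Dog;
    use fields qw(name age);

    sub new {
        my Dog $self = fields::new(shift);
        $self->{name} = 'Spot';
        return $self;
    }

    package main;

    my Dog $spot = Dog->new;
    print $spot->{name}, "\n";   # fine
    # print $spot->{nmae};       # compile-time error: No such class field "nmae" ...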

What I would like to see, actually, is closer to pseudo-hashes (which were removed for being terrible) than restricted hashes. If using fields meant smaller objects, faster access, *and* compile-time safety (because restricted-field objects had a different implementation with compact storage, and the compiler was able to substitute in direct field access when the type and the key were both known, only falling back to a perfect hash or trie or something when forced to act like a hash at runtime), then that would be interesting enough that people might actually adopt it. But the thing we've got, where you're paying a programmer cost, and you're paying a runtime cost, and what it buys you isn't even all that much, well, I guess the lack of adoption speaks for itself.

I use lock_keys frequently to lock the hash I use with Getopt::Long.
I haven't seen a better way to keep code from dealing with
unsupported options. A very effective typo preventer.

Andreas, could you provide an example of that usage of lock_keys() with Getopt::Long? Since that's a library I've often used, I'd like to see how you use it and whether there are alternatives.


Thank you very much.

I would imagine locked hashes would be one specific hash implementation, and Hash::Util::lock_keys would continue to work by switching the hash from its current hash implementation to a locked hash. It would be a costly operation, but nobody who doesn’t use the functionality would ever pay anything for it, and the hash API would not have to include the concept of a locked hash nor would other hash implementations have to implement it, and yet it could still stay in core with no deprecation or disruption. Would that not work?
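
For what it's worth, tie already gives a rough Perl-level feel for a hash whose restriction logic lives entirely in its own implementation rather than in every hash (a toy sketch, not the proposed C-level API):

    package Tie::ToyLockedHash;
    use strict;
    use warnings;
    use Carp qw(croak);

    # A toy tied-hash backend that only permits a fixed set of keys.
    sub TIEHASH {
        my ( $class, %init ) = @_;
        return bless {
            data    => {%init},
            allowed => { map { $_ => 1 } keys %init },
        }, $class;
    }

    sub FETCH {
        my ( $self, $key ) = @_;
        croak "disallowed key '$key'" unless $self->{allowed}{$key};
        return $self->{data}{$key};
    }

    sub STORE {
        my ( $self, $key, $value ) = @_;
        croak "disallowed key '$key'" unless $self->{allowed}{$key};
        $self->{data}{$key} = $value;
    }

    sub EXISTS { exists $_[0]{data}{ $_[1] } }
    sub DELETE { croak "cannot delete from a locked hash" }
    # CLEAR, FIRSTKEY, NEXTKEY etc. omitted for brevity

    package main;

    tie my %h, 'Tie::ToyLockedHash', x => 1, y => 2;
    print $h{x}, "\n";   # 1
    # $h{z} = 3;         # croaks: disallowed key 'z'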

If it would, then there’s no reason to break existing code. We get all the benefits with, AFAICT, no downsides.

Should ALL hashes support locking, or would it be sufficient to have a bespoke lockable hash to use with fields.pm and Hash::Util?

Given I proposed the bespoke locked hash implementation, it’s clear what I’d answer, but I’m not up on the internals enough to really answer categorically. Do you have specific doubts about it?

We use them a bit at work. They helped me catch some typos while refactoring very messy code.

That said, I'd be fine with a replacement module that works similarly to Python's NamedTuple (that is, you specify the keys at hash creation time, instead of locking/unlocking keys at any point in time), or a proper object storage (like P6Opaque in Perl 6).

I've used restricted hashes in a few ecommerce situations where they were indispensable, exactly as important and useful as `use strict` for making sure lexical accesses aren't typos.

I don't have any opinion about whether restricted hashes should be part of all hash types -- I think I'd be fine using a special hashlike object since I haven't used them a lot. But in the cases I've needed them, they were really, really helpful.

kid51: short example

% perl -le '
    use Getopt::Long;
    use Hash::Util qw(lock_keys);
    lock_keys %Opt, qw(mrg);
    GetOptions(\%Opt, "mrg=i") or die;
    if ($Opt{mgr}) {    # deliberate typo ("mgr" instead of "mrg") to show the error
        print "INT=$Opt{mrg}";
    }
'
Attempt to access disallowed key 'mgr' in a restricted hash at -e line 6.

An example that combines with Pod::Usage:
http://repo.or.cz/cpan-testers-parsereport.git/blob/HEAD:/bin/ctgetreports

For me lock_keys is a typo checker, for (but not limited to) production code, and for (but not limited to) blessed hashes.

Locked hashes are great, but if the implementation is slowing down all hashes, then I agree they should no longer be built in, and should be moved to a module.

That said, I think it's an important enough feature that this module should be bundled with Perl, so that people relying on fields.pm (which was first released with Perl 5.005) and Hash::Util (Perl 5.8) won't need to install anything extra.

I have used Hash::Util more than once, especially in OOP projects where I was using the "default" OOP model of Perl (not Moose or anything like that).
I found it very useful, especially for avoiding typos when referring to attributes (whether accessing them or creating new instances); those bugs can be annoying to track down and fix.
On the other hand, I usually don't start out using it; when things start getting more complex and bugs start to appear, that is usually a sign that I should consider it.
It is good to have options... sometimes you need flexibility (like expanding an object "on the fly"), sometimes you need some restrictions.


About demerphq

Perl core hacker, Principal Dev and Fellow at Booking.com. You can blame me for hash randomization, and for much of the new regex syntax since 5.10.