Are Restricted/Locked Hashes A Failed Experiment?

Some time back we added support to Perl for locked, or restricted, hashes (see Hash::Util and fields.pm). The basic idea is that you can set up a hash and then "lock" it, at which point any access to an unregistered key, whether a read or a write, will cause an exception.
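
A minimal sketch of how that looks with Hash::Util (the hash and key names here are just for illustration):

    use strict;
    use warnings;
    use Hash::Util qw(lock_keys);

    my %point = ( x => 1, y => 2 );
    lock_keys(%point);       # only the keys present now ('x' and 'y') are allowed

    print $point{x}, "\n";   # fine
    $point{z} = 3;           # dies: "Attempt to access disallowed key 'z' in a restricted hash"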

The basic idea was to work around Perl's lack of a true "struct"/"object" type: in many languages accessing a non-existent member is a compile time error, or, when the object is late bound, a run time exception. Unfortunately restricted hashes do not support compile time checks, so all we get is run time exceptions.

All of this comes at a cost, a cost that is imposed on every hash operation, whether performed by your code or by Perl itself. So for instance every time perl looks up a variable in a stash it has to execute checks for a feature that is *never* used internally. Similarly, whenever you populate a hash to build a report, or whatever, you are paying the cost of restricted hashes even if you aren't using them.

IMO even worse is that locked hashes are basically very unperlish. In my experience they mostly exist to provide a safety blanket for people from other languages who feel unsafe without these kinds of protections. In my work code the use of locked hashes is basically zero. Almost every time someone uses locked hashes it is not for their intended purpose of making OO "easier", but rather to make a read-only hash. However even that use case is not well served by locked hashes, as they *die* when you try to access a non-existent key, which goes against the general expectation that a read-only hash simply returns undef for keys it does not contain.
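
To illustrate the mismatch (again with made-up keys): a user of a "read-only" hash would typically expect a missing key to just come back as undef, but a locked hash throws instead:

    use Hash::Util qw(lock_hash);

    my %config = ( colour => 'red' );
    lock_hash(%config);            # restrict the key set and make the values read-only

    print $config{colour}, "\n";   # "red"
    my $v = $config{color};        # dies with a "disallowed key" error rather than returning undef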

To put this in context, some of the perl core developers have started work on making Perl's hash implementation pluggable and vtable driven. This would mean that we could use different hash implementations in different places, and people could produce efficient C level code that made Perl use, for instance, a B+tree for storing its hashes, or Robin Hood hashing, or any other data structure suitable for implementing associative arrays.

However it also means we have to put a lot of thought into what kind of API a hash exposes. If the API has to support locked hashes, then any alternative implementation has to support locked hashes, and by extension you and I and everyone using Perl will have to keep paying the price for them. On the other hand, if we do not have to provide an API for locking hashes, we can instead provide a bespoke lockable hash implementation via a CPAN module that supports sane locking primitives (and who knows, maybe even compile time exceptions).

Another piece of context to consider is that using Perl hashes as objects is really a workaround for the fact that Perl has no real "object" type. So locked hashes impose a non-trivial, wide-scope cost just to prop up a workaround. So what happens if the core hackers get their shit together and add a true object type and grammar to perl? Will we still have to pay the price of locked hashes?

So, I am arguing that we should deprecate and remove locked hashes from core, and reimplement fields.pm using a custom hash table implementation built on top of the pluggable hash API that we have started investigating.

What do you think? Do you use Hash::Util::lock_hash() or similar methods? What is your impression and experience of locked hashes and fields.pm? Useful? Useless? Good idea but flawed implementation? Something else?

Cheers,
Yves

14 Comments

I for one really like to use locked hashes for the results of DBI queries. They are a very easy maintenance-free way of preventing access to fields that were not returned from the query. Any other way would have me maintain the information in the SQL and additionally in Perl by adding accessors or an AUTOLOAD method where the resulting hashes are not really objects.
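
Roughly this pattern, I mean (a sketch; the DSN, table and column names are made up):

    use strict;
    use warnings;
    use DBI;
    use Hash::Util qw(lock_keys);

    my $dbh = DBI->connect('dbi:SQLite:dbname=example.db', '', '', { RaiseError => 1 });
    my $sth = $dbh->prepare('SELECT id, name FROM users');
    $sth->execute;

    while ( my $row = $sth->fetchrow_hashref ) {
        lock_keys(%$row);          # only the columns actually selected may be accessed
        print "$row->{name}\n";    # fine
        # print $row->{email};     # would die: 'email' was not part of the SELECT
    }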

I would be sad to see locked hashes go.

They interest me very little, but I'm not sure how much of a failure they are. fields.pm does actually support some degree of compile-time checking if you do "my Type $var", but of course no one actually does.
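
For reference, that looks something like this (the class and field names are invented for the example):

    use strict;
    use warnings;

    package Dog;
    use fields qw(name age);

    sub new {
        my Dog $self = fields::new(shift);
        $self->{name} = 'Spot';
        return $self;
    }

    package main;

    my Dog $spot = Dog->new;
    print $spot->{name}, "\n";   # fine
    # print $spot->{nmae};       # compile-time error: No such class field "nmae" ...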

What I would like to see, actually, is closer to pseudo-hashes (which were removed for being terrible) than restricted hashes. If using fields meant smaller objects, faster access, *and* compile-time safety (because restricted-field objects had a different implementation with compact storage, and the compiler was able to substitute in direct field access when the type and the key were both known, only falling back to a perfect hash or trie or something when forced to act like a hash at runtime), then that would be interesting enough that people might actually adopt it. But the thing we've got, where you're paying a programmer cost, and you're paying a runtime cost, and what it buys you isn't even all that much, well, I guess the lack of adoption speaks for itself.

I use lock_keys frequently to lock the hash I use with Getopt::Long.
I haven't seen a better way to keep code from dealing with
unsupported options. A very effective typo preventer.

Andreas, could you provide an example of that usage of lock_keys() with Getopt::Long? Since that's a library I've often used, I'd like to see how you use it and whether there are alternatives.


Thank you very much.

I would imagine locked hashes would be one specific hash implementation, and Hash::Util::lock_keys would continue to work by switching the hash from its current hash implementation to a locked hash. It would be a costly operation, but nobody who doesn’t use the functionality would ever pay anything for it, and the hash API would not have to include the concept of a locked hash nor would other hash implementations have to implement it, and yet it could still stay in core with no deprecation or disruption. Would that not work?
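
For what it's worth, tie already gives a rough Perl-level feel for a hash whose restriction logic lives entirely in its own implementation rather than in every hash (a toy sketch, not the proposed C-level API):

    package Tie::ToyLockedHash;
    use strict;
    use warnings;
    use Carp qw(croak);

    # A toy tied-hash backend that only permits a fixed set of keys.
    sub TIEHASH {
        my ( $class, %init ) = @_;
        return bless {
            data    => {%init},
            allowed => { map { $_ => 1 } keys %init },
        }, $class;
    }

    sub FETCH {
        my ( $self, $key ) = @_;
        croak "disallowed key '$key'" unless $self->{allowed}{$key};
        return $self->{data}{$key};
    }

    sub STORE {
        my ( $self, $key, $value ) = @_;
        croak "disallowed key '$key'" unless $self->{allowed}{$key};
        $self->{data}{$key} = $value;
    }

    sub EXISTS { exists $_[0]{data}{ $_[1] } }
    sub DELETE { croak "cannot delete from a locked hash" }
    # CLEAR, FIRSTKEY, NEXTKEY etc. omitted for brevity

    package main;

    tie my %h, 'Tie::ToyLockedHash', x => 1, y => 2;
    print $h{x}, "\n";   # 1
    # $h{z} = 3;         # croaks: disallowed key 'z'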

If it would, then there’s no reason to break existing code. We get all the benefits with, AFAICT, no downsides.

Should ALL hashes support locking, or would it be sufficient to have a bespoke lockable hash to use with fields.pm and Hash::Util?

Given I proposed the bespoke locked hash implementation, it’s clear what I’d answer, but I’m not up on the internals enough to really answer categorically. Do you have specific doubts about it?

We use them a bit at work. They helped me catch some typos while refactoring very messy code.

That said, I'd be fine with a replacement module that works similarly to Python's NamedTuple (that is, you specify the keys at hash creation time, instead of locking/unlocking keys at any point in time), or a proper object storage (like P6Opaque in Perl 6).

I've used restricted hashes in a few ecommerce situations where they were indispensable, exactly as important and useful as `use strict` for making sure lexical accesses aren't typos.

I don't have any opinion about whether restricted hashes should be part of all hash types -- I think I'd be fine using a special hashlike object since I haven't used them a lot. But in the cases I've needed them, they were really, really helpful.

kid51: short example

% perl -le '
    use Getopt::Long;
    use Hash::Util qw(lock_keys);
    lock_keys %Opt, qw(mrg);
    GetOptions(\%Opt, "mrg=i") or die;
    if ($Opt{mgr}) {    # deliberate typo ("mgr" instead of "mrg") to show the error
        print "INT=$Opt{mrg}";
    }
'
Attempt to access disallowed key 'mgr' in a restricted hash at -e line 6.

An example that combines with Pod::Usage:
http://repo.or.cz/cpan-testers-parsereport.git/blob/HEAD:/bin/ctgetreports

For me lock_keys is a typo checker, for (but not limited to) production code, and for (but not limited to) blessed hashes.

Locked hashes are great, but if the implementation is slowing down all hashes, then I agree they should no longer be built in, and should be moved to a module.

That said, I think it's an important enough feature that this module should be bundled with Perl, so that people relying on fields.pm (which was first released with Perl 5.005) and Hash::Util (Perl 5.8) won't need to install anything extra.

I have used Hash::Util more than once, especially in OOP projects where I was using the "default" OOP model of Perl (not Moose or anything like that).
I found it very useful, especially for avoiding typos when referring to attributes (whether accessing them or creating new instances); those bugs can be annoying to track down and fix.
On the other hand, I usually don't start out using it; when things start getting more complex and bugs start to appear, that is usually a sign that I should consider it.
It is good to have options... sometimes you need flexibility (like expanding an object "on the fly"), sometimes you need some restrictions.


About demerphq

Perl core hacker, Principal Dev and Fellow at Booking.com. You can blame me for hash randomization, and for much of the new regex syntax since 5.10.