Validating Nested JSON Hashes with Moose

At SocialFlow, we work with Twitter's streaming APIs, which are a lot of fun. There's a ton of information that comes down the pipe, and it's nonstop.

Structurally, each message is a set of nested JSON objects. This is what a favorite looks like once decoded (summarized for readability; the actual object is over 1 kB):

{
    message => {
        event         => 'favorite',
        target_object => {
            text => '..',
            user => {
                ...
            },
        },
    },
}

The target_object points to a full tweet, and the user inside the tweet is a full user.

Looking at this layout, I immediately realized that before pulling metadata out of the stream, I'd need to validate that I'm getting everything I want. Moose is great for this: you can declare all the attributes you care about, make them required (if Twitter says they'll always be present), then pass the decoded hash into the constructor and pull the data out via accessors:

package SocialFlow::Reporting::Daemon::Twitter::StreamMessage;

use Moose;
use strict;
use warnings;


has for_user => (   
    is => 'ro',     
    required => 1,  
);                  

has message => (    
    is => 'ro',     
    required => 1,
);

......

my $obj = $json->decode( $str );
my $msg = SocialFlow::Reporting::Daemon::Twitter::StreamMessage->new($obj);
my $for_user = $msg->for_user;

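If the constructor succeeds, it's all guaranteed to be there. I don't have to test whether for_user was left out. I could even add isa => 'Int' for for_user and isa => 'HashRef' for message; that would also reject an undef value, which required => 1 on its own lets through.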
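As a minimal sketch of what those checks buy you, here is a cut-down class with type constraints added. Demo::StreamMessage is a hypothetical stand-in for the real class, and the 'Int' and 'HashRef' constraints are my assumption about what the two fields hold.

```perl
package Demo::StreamMessage {    # hypothetical stand-in for the real class
    use Moose;

    # required => 1 insists the key is passed; isa also rejects undef or the wrong type
    has for_user => ( is => 'ro', isa => 'Int',     required => 1 );
    has message  => ( is => 'ro', isa => 'HashRef', required => 1 );

    __PACKAGE__->meta->make_immutable;
}

package main;
use strict;
use warnings;

# A well-formed payload constructs normally.
my $ok = Demo::StreamMessage->new(
    { for_user => 42, message => { event => 'favorite' } } );
print "for_user: ", $ok->for_user, "\n";

# Malformed payloads die inside the constructor, before any accessor is used.
my @bad = (
    { message  => {} },                        # for_user missing
    { for_user => undef, message => {} },      # for_user present but undef
    { for_user => 42, message => 'oops' },     # message is not a hash
);
for my $payload (@bad) {
    my $msg = eval { Demo::StreamMessage->new($payload) };
    print "rejected: ", ( split /\n/, "$@" )[0], "\n" unless $msg;
}
```

Wrapping the constructor in eval is only there to show the failures side by side; in a daemon you would more likely log the error and skip the message.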

Even cooler, using Type::Tiny I can make coercions to take my nested data and create objects with it in one shot:

use Type::Tiny;
use Types::Standard qw[ Object HashRef ];

has message => (
    is       => 'ro',
    required => 1,
    isa      => Object->plus_coercions(
        HashRef,
        sub {
            # Do some heuristics to figure out what kind of message it is, then:
            return SocialFlow::Reporting::Daemon::Twitter::StreamMessage::Favorite->new($_);
        },
    ),
    coerce => 1,    # without this, Moose never runs the coercion
);

In ::Favorite, something like:

#favorited tweet                                                              
has target_object => (                                                        
    is       => 'ro',                                                         
    required => 1,                                                            
    isa      => Object->plus_coercions(                                       
        HashRef,                                                              
        sub {                                                                 
            SocialFlow::Reporting::Daemon::Twitter::StreamMessage::Tweet->new(
                $_)                                                           
        },                                                                    
    ),                                                                        
    coerce => 1,                                                              
);

and in ::Tweet, something like:

#user who tweeted                                                            
has user => (                                                                
    is       => 'ro',                                                        
    required => 1,                                                           
    isa      => Object->plus_coercions(                                      
        HashRef,                                                             
        sub {                                                                
            SocialFlow::Reporting::Daemon::Twitter::StreamMessage::User->new(
                $_)                                                          
        },                                                                   
    ),                                                                       
    coerce => 1,                                                             
);

After all the above is done, you can do:

my $obj = $json->decode( $str );                                                 
my $msg = SocialFlow::Reporting::Daemon::Twitter::StreamMessage->new($obj);
my $FAVORITE = "SocialFlow::Reporting::Daemon::Twitter::StreamMessage::Favorite";
if( $msg->message->isa($FAVORITE) ) {                                            
    my $tweet = $msg->message->target_object;                                    
    print 'New favorite for @'.$tweet->user->screen_name."\n";                   
    print 'total favorites for tweet: "'.$tweet->text . '" ';                    
    print $tweet->favorite_count."\n";                                           
}

The validations are in place so consistency is guaranteed ( assuming there are no bugs in the above semi-pseudocode ), and you've also tricked yourself into writing some nice docs about the structure of the data.

Object->plus_coercions(HashRef,sub{...

^ This means the attribute's value must be an Object; if it isn't, but happens to be a HashRef, it gets coerced into an Object via the subroutine.
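To see that rule in isolation, the type can be exercised directly, outside of any Moose attribute. This is a rough sketch: Demo::User is a hypothetical plain-Perl class standing in for ::User, and screen_name is just an example field.

```perl
use strict;
use warnings;
use Types::Standard qw[ Object HashRef ];

package Demo::User {    # hypothetical stand-in for ::User
    sub new { my ( $class, $args ) = @_; return bless {%$args}, $class }
    sub screen_name { $_[0]{screen_name} }
}

package main;

# "Must be an Object; a HashRef may be turned into one."
my $UserObject = Object->plus_coercions(
    HashRef, sub { Demo::User->new($_) },
);

# A plain hash does not pass Object, so the coercion sub runs on it.
my $from_hash = $UserObject->coerce( { screen_name => 'example' } );
print ref($from_hash), " / ", $from_hash->screen_name, "\n";

# Something that is already an object is handed back untouched.
my $existing = Demo::User->new( { screen_name => 'other' } );
my $same     = $UserObject->coerce($existing);
print "same object\n" if $same == $existing;

# Anything else has no matching coercion and still fails the check.
my $leftover = $UserObject->coerce('not a hash');
print "still invalid\n" unless $UserObject->check($leftover);
```

In a Moose attribute, coerce => 1 makes the constructor perform the coerce step before the isa check, which is why a string in place of the hash would still throw.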

Some may think this kind of coercion and validation is overkill, but to achieve the same level of validation by hand you'd need some seriously gnarly ifs everywhere. Or you could shut your eyes and just assume everything will always behave as you hope, but that's dumb. API programmers love JSON because it's flexible and very easy to change. That flexibility drops the responsibility of validation on the end programmer, so you need to write software that will tell you (preferably early and loudly) when the data is malformed.
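For comparison, here is roughly what the hand-rolled version of those checks might look like in plain Perl. It is only a sketch covering the handful of fields from the summarized favorite above; validate_favorite is a hypothetical helper, and a real message has many more fields to check.

```perl
use strict;
use warnings;

# Validate a decoded favorite by hand and return the nested user hash.
# Dies with a short reason as soon as something is missing or mis-shaped.
sub validate_favorite {
    my ($data) = @_;

    die "payload is not a hash\n" unless ref($data) eq 'HASH';

    my $message = $data->{message};
    die "message is missing or not a hash\n" unless ref($message) eq 'HASH';
    die "message.event is missing\n"         unless defined $message->{event};

    my $tweet = $message->{target_object};
    die "target_object is missing or not a hash\n" unless ref($tweet) eq 'HASH';
    die "target_object.text is missing\n"          unless defined $tweet->{text};

    my $user = $tweet->{user};
    die "target_object.user is missing or not a hash\n" unless ref($user) eq 'HASH';

    return $user;
}

# A payload with the user left out is caught at the last check.
my $payload = {
    message => {
        event         => 'favorite',
        target_object => { text => '..' },
    },
};
my $user = eval { validate_favorite($payload) };
print "rejected: $@" unless $user;
```

Every level of nesting costs another pair of checks, and none of it documents the structure the way the attribute declarations do.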

1 Comment

Sometimes I use HTML::FormHandler to wrap this type of validation/inflation. That way I get a result object (with accumulated error messages) instead of an exception being thrown. The ability to get all the validation errors at once is pretty useful.

About Samuel Kaufman

CTO / Codemonkey, Socialflow.com