Tuesday, July 28, 2009

Plumbing 101 - Fixing my leaky Intertubes

So this will be a short one, but I promised Yuval I would write it up.

My recent big refactoring project at work has largely been about taking the mad-cap collection of YAML config files, database tables, poorly inferred relationships and other sorts of random insanity that accumulates over 5 years of maintaining and extending an application, and converting this unmaintainable mess into a nice, clean, happy KiokuDB-based set of objects.

So all was going well until the other day, when I upped the number of FCGI backends and the concurrency exposed a leak in my objects that was not visible when using only the one FCGI backend for development. So the first thing I did was turn to my unit tests to see if I could detect the leak outside of my web-app using Test::Memory::Cycle. Of course I didn't find one there (that would have been too easy); it was buried deeper inside the web-app. So I then enlisted Yuval for help, since his knowledge of the Perl guts far exceeds what mine will ever be.

Unfortunately for me, this application pre-dates Catalyst so I could not use Catalyst::Model::KiokuDB, but I was able to cargo-cult the core of that module and stuff it into the homegrown web framework that this application runs on. So while this didn't solve the problem, it did allow me to watch the problem happen and gather nice statistics. I can't stress enough how important it is to get your monitoring tools set up first so you're not digging through false positives and/or useless information. Yuval and I spent a fair amount of time tweaking this until we finally found the right settings that gave us the perfect balance of information.

After this it was a lot of playing around with Devel::FindRef and Scalar::Util::weaken until I found the right settings and all my leaks were gone. One particularly evil leak was closures passed into Template::Toolkit params that referenced themselves from inside their own bodies. This resulted in the entire template object leaking, including all associated parameters (in one case this meant hundreds of objects). I realized as I was fixing this that this leak (and a few others) had existed for probably the entire 5 years this application has been running. Only now that it was leaking KiokuDB objects and causing visible issues did I actually notice.
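
To make that last kind of leak concrete, here is a minimal sketch in plain Perl of a closure that refers to itself, along with the Scalar::Util::weaken fix. The Payload class and the leaky/fixed subs are made-up names for illustration, not code from the application; Payload just stands in for the template object and its parameters.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Hypothetical stand-in for the template object and its parameters;
# it only counts how many instances are still alive.
{
    package Payload;
    my $live = 0;
    sub new     { $live++; return bless {}, shift }
    sub DESTROY { $live-- }
    sub live    { return $live }
}

sub leaky {
    my $payload = Payload->new;
    my $walk;
    $walk = sub {
        my ($n) = @_;
        my $keep = $payload;    # the closure captures $payload ...
        return $n <= 0 ? 0 : 1 + $walk->( $n - 1 );    # ... and itself via $walk
    };
    return $walk->(3);          # cycle: $walk -> closure -> $walk
}

sub fixed {
    my $payload = Payload->new;
    my $walk;
    my $strong = $walk = sub {
        my ($n) = @_;
        my $keep = $payload;
        return $n <= 0 ? 0 : 1 + $walk->( $n - 1 );
    };
    weaken($walk);              # the captured variable no longer keeps the closure alive
    return $strong->(3);        # $strong is the only strong reference and goes away on return
}

leaky();
print "after leaky: ", Payload->live, " payload(s) still alive\n";
fixed();
print "after fixed: ", Payload->live, " payload(s) still alive\n";
```

The count should stay the same after fixed() as it was after leaky(), which means the only payload left behind is the one that leaky() trapped in its cycle.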

So while I don't feel that I am now some kind of Master Plumber or anything, I do feel confident enough to fix a leaky intertube here or there. And as always I am amazed by the flexibility of Perl and the wonders that CPAN provides.  It is an odd combination of pride and shame to have finally cleaned up 5 year old leaks. 

Anyway, back to work ...

Thursday, July 16, 2009

More Thoughts on Parameterized Roles

So my last post on parameterized roles has really got me thinking. One of the first use cases I ran into for parameterized roles was MooseX::Storage. Chris Prather and I wrote it on the train into NYC one day, and since there was no such thing as MooseX::Role::Parameterized yet, we hacked it with an exported Storage subroutine which composed in multiple roles based on the parameters that were passed to it. This is actually still how MooseX::Storage works because, well, it Just Works(tm), so there is no need to change it. But I decided, as a thought experiment and a test of the Role Functor idea from my last post, to see if I could re-write MooseX::Storage in terms of it. Much to my delight it not only worked, but came out very cleanly.

Now, this is still written in MooseX::Declare inspired pseudo-code, so it is not yet a reality, but I am getting more and more convinced that this is something I really need to write. So anyway, here goes.

role COLLAPSER { requires 'pack', 'unpack' }
role FORMATTER { requires 'thaw', 'freeze' }
role IO        { requires 'load', 'store'  }

role DefaultCollapser with COLLAPSER {

    method pack {
        Collapser::Engine->new( object => $self )
                         ->collapse_object
    }

    method unpack ($class:, $data) {
        Collapser::Engine->new( class => $class )
                         ->expand_object( $data )
    }
}

role JSONFormatter [ 
        Collapser => (does => COLLAPSER) 
    ] with FORMATTER {

    # thaw: JSON string -> data structure -> object
    method thaw ($class:, $json) {
        $class->unpack( JSON::Any->decode( $json ) )
    }

    # freeze: object -> data structure -> JSON string
    method freeze {
        JSON::Any->encode( $self->pack )
    }
}

role SimpleFile [ 
        Formatter => (does => FORMATTER) 
    ] with IO {

    method load ($class:, $filename){
        my $fh   = IO::File->new( $filename, 'r' );
        my $data = do { local $/; <$fh>; };
        $class->thaw( $data );
    }

    method store ($filename) {
        my $fh = IO::File->new( $filename, 'w' );
        $fh->print( $self->freeze );
    }
}

I am obviously punting on a couple of details here to keep things simple for the example, but I think it gets the point across.  The nice part, in my opinion, is that the parameterization nicely captures the "levels" of serialization. For instance, here is what a class that does all the options would look like:

class Point 
 with SimpleFile( 
          Formatter => JSONFormatter( 
              Collapser => DefaultCollapser 
         ) 
    ) {
    has x => (is => rw, isa => Int, default => 0);
    has y => (is => rw, isa => Int, default => 0);

    method clear {
        $self->x(0);
        $self->y(0);
    }
}

And here is a class which does not do the load/store but just does the JSON freeze/thaw:

class Point
 with JSONFormatter( 
          Collapser => DefaultCollapser 
    ) {
    has x => (is => rw, isa => Int, default => 0);
    has y => (is => rw, isa => Int, default => 0);

    method clear {
        $self->x(0);
        $self->y(0);
    }
}

And here is a class which does only the simple pack/unpack:

class Point with DefaultCollapser {
    has x => (is => rw, isa => Int, default => 0);
    has y => (is => rw, isa => Int, default => 0);

    method clear {
        $self->x(0);
        $self->y(0);
    }
}
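
The three levels above can also be approximated in plain Perl today, without roles or parameterization. This is only a rough sketch under my own assumptions: PlainPoint is a hypothetical hand-written class, the core JSON::PP module stands in for JSON::Any, and the collapser is reduced to copying a hash. It mainly shows which direction each level converts in.

```perl
use strict;
use warnings;
use JSON::PP ();

{
    package PlainPoint;    # hypothetical class, not part of MooseX::Storage

    sub new {
        my ($class, %args) = @_;
        return bless { x => 0, y => 0, %args }, $class;
    }

    # Collapser level: object <-> plain data structure
    sub pack   { my ($self) = @_;         return +{ %$self } }
    sub unpack { my ($class, $data) = @_; return $class->new(%$data) }

    # Formatter level: data structure <-> JSON string
    sub freeze {
        my ($self) = @_;
        return JSON::PP->new->canonical->encode( $self->pack );
    }

    sub thaw {
        my ($class, $json) = @_;
        return $class->unpack( JSON::PP->new->decode($json) );
    }
}

my $json = PlainPoint->new( x => 1, y => 2 )->freeze;
my $copy = PlainPoint->thaw($json);
print "$json\n";
print "round trip: x=$copy->{x} y=$copy->{y}\n";
```

What this version cannot do is swap the formatter or collapser per class, which is exactly the part the parameterization is meant to capture.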

Overall I am quite happy with this, so now it is just a matter of finding the tuits to actually implement it.

Sunday, July 12, 2009

Thoughts on Parameterized Roles

I was discussing parameterized roles with Sartak and doy at YAPC::NA this year. Sartak is the author of the very cool MooseX::Role::Parameterized module, which implements pretty much unlimited parameterization abilities for roles. The sheer, unbridled flexibility embodied in that module is insane, which, of course, is both really cool and really scary at the same time. One of our discussion points was about how so much flexibility, if misused, pretty much destroys the benefits of allomorphism you get from roles. With enough parameterization the statement $object->does(SomeRole) has very little meaning anymore, since SomeRole could easily be parameterized so that two instances of it do wildly different things. One of the thoughts discussed for solving this problem was to create a stricter set of different kinds of parameters that are allowed, essentially restricting the functionality to a sane subset through which we can provide some level of guaranteed allomorphism. While we pretty much rejected that idea for MooseX::Role::Parameterized, the idea stuck in my head.

So the other day on #moose, I was discussing parameterized roles again with Sartak and doy and I mentioned how I have always seen parameterized roles as being very close to ML Functors. The ML family of languages (Standard ML, OCaml, etc.) has an extremely powerful module system which not only has modules (structure in SML) and module signatures (the "type" of the module) but also functors. Functors are best described as modules which take another module as an argument and produce a third module as a result. The book "ML for the Working Programmer" (highly recommended, it is a great book) shows the following conceptual mapping to try and help describe the ML module system.

  structure ~ value
  signature ~ type
    functor ~ function

But as the book says, this is a helpful starting point, but it fails to convey the full possibilities of the ML module system. 

So at one point in this discussion I decided to try and sketch out how Functor-esque parameterized roles might look and I came up with this (using MooseX::Declare inspired pseudo-code).

role ORDERING { requires 'compare' }

role Sortable [Ordering => (does => 'ORDERING') ] {
    sub sort {
        my ($self, @elements) = @_;
        sort { $self->compare($a, $b) } @elements;
    }
}

role StringOrder with ORDERING {
    sub compare {
        my (undef, $x, $y) = @_;
        $x cmp $y;
    }
}

role NumericOrder with ORDERING {
    sub compare {
        my (undef, $x, $y) = @_;
        $x <=> $y;
    }
}

role AlphabeticalOrder with ORDERING {
    sub compare {
        my (undef, $x, $y) = @_;
        lc($x) cmp lc($y);
    }
}

class BunchOfStrings with Sortable(StringOrder) {
    # ...
}

class BunchOfNumbers with Sortable(NumericOrder) {
    # ...
}

The first role ORDERING is just a role that requires the compare method and nothing more (an interface), which maps to the ML idea of a signature.

The second is the parameterized role Sortable which implements a sort method and expects a single role parameter Ordering which must be a role that does the ORDERING interface. This role maps to the ML idea of a Functor. If you notice the Sortable::sort method calls a compare method, which is a method of the ORDERING interface role. The idea here is that the role provided in the parameter Ordering will get composed into the Sortable role and provide the expected compare method. 

The next three roles are just examples of roles that do the ORDERING interface role. Basically one for each of the most common Perl sorting behaviors (at least the most common in my experience). These are pretty simple and straightforward, nothing special here.

After this are a few classes that show how this mechanism might get used. The Sortable(StringOrder) syntax shows the passing of the role parameter (in this case StringOrder) to the parameterized role Sortable. The result of this is a third role, which is then composed into the BunchOfStrings class.

So, while this is much more restrictive than MooseX::Role::Parameterized, it is much more flexible than simply creating a restricted subset of parameterizable bits. It also (perhaps) solves the allomorphism issue since the "name" of the Sortable(StringOrder) role is simply Sortable(StringOrder) and this clearly provides a predictable and repeatable set of functionality.
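
To get a feel for the "predictable and repeatable" part without any new syntax, here is a rough approximation in plain Perl, with no Moose and no real roles. Everything here is my own assumption for illustration: sortable() is a hypothetical function that plays the functor, the Sortable::Of:: prefix is an invented naming scheme, and the orderings are ordinary packages rather than roles.

```perl
use strict;
use warnings;

# Orderings: plain packages that provide compare(), mirroring the roles above.
{ package StringOrder;  sub compare { my (undef, $x, $y) = @_; return $x cmp $y } }
{ package NumericOrder; sub compare { my (undef, $x, $y) = @_; return $x <=> $y } }

my %generated;    # ordering name => generated package name

# Hypothetical "functor": takes an ordering package and returns the name of a
# package that has that ordering's compare() plus a sort() built on top of it.
sub sortable {
    my ($ordering) = @_;
    die "$ordering does not provide compare()\n"
        unless $ordering->can('compare');

    return $generated{$ordering} ||= do {
        my $name = "Sortable::Of::$ordering";
        no strict 'refs';
        *{"${name}::compare"} = $ordering->can('compare');
        *{"${name}::sort"}    = sub {
            my ($self, @elements) = @_;
            return sort { $self->compare( $a, $b ) } @elements;
        };
        $name;
    };
}

my @as_strings = sortable('StringOrder')->sort( 10, 9, 2 );
my @as_numbers = sortable('NumericOrder')->sort( 10, 9, 2 );
print "@as_strings\n";
print "@as_numbers\n";
```

The same argument always yields the same generated name, so checking for Sortable::Of::StringOrder tells you exactly which behavior you are getting, which is the property the role version is after.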

So anyway, I do not currently have the tuits to implement this and honestly I kind of want to let this stew for a little longer. It would not replace MooseX::Role::Parameterized but perhaps be called MooseX::Role::Functors or something and can be just another way to do it.

Why I don't like Autobox

So I was looking over Michael Schwern's perl5i module recently (after hearing about it at this year's YAPC::NA) and I noticed that it enables the autobox module. This reminded me of all the debates I have had with Matt Trout over the years about the various pros and cons of autobox. I figured this would probably make a decent blog post, so here goes.

My core objection to autobox is that it is an illusion. It works by hijacking the normal perl method resolution process, and right before perl dies with the error Can't call method "foo" on unblessed reference, it checks specific packages to see if there are available methods. This gives the illusion that these core perl types are in fact objects, when in reality they are very much not. If they were proper objects, they would always be objects instead of just objects within the lexical scope of the autobox pragma. Here is some code that illustrates what (for me) is the big abstraction leak of autobox.

my $test;
{
    use autobox;
    my $foo = [ 1, 2, 3, 4, 5 ];
    warn $foo->length;
    $test = sub {
        warn $foo->length; # succeeds ...
        $foo;
    };
}

my $x = $test->();
warn $x->length; # fails

This example shows how the lexical scoping of the autobox pragma allows the $test closure to still work correctly, but once outside of the lexical scope the value is no longer autoboxed. This just seems really backwards to me because it requires the users of your code to also enable autobox in their code to use elements from your code. The result of them (for whatever reason) not doing this is that your internal usage of a data element can greatly differ from external usage of the same element. This is an API disconnect that does not sit well with me.

In short, autoboxing is a feature of the lexical environment and not something intrinsic to the element itself. 
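
The non-autobox half of that statement is easy to check with nothing but core Perl. This small sketch (my own, not from the autobox documentation) shows that an array ref is an unblessed reference outside of any autobox scope, and traps the resulting error with eval.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);

my $foo = [ 1, 2, 3, 4, 5 ];

# With no autobox in scope the array ref is just a reference, not an object.
my $is_object = defined blessed($foo) ? 'yes' : 'no';
print "reftype: ", reftype($foo), ", blessed: $is_object\n";

# A method call on it therefore dies at runtime; eval traps the error.
my $ok    = eval { $foo->length; 1 };
my $error = $ok ? '' : $@;
print "error: $error";
```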

My second issue with autobox is that it is very shallow. In languages where the core types are proper objects (Smalltalk, Ruby, JavaScript, etc.) it is possible to subclass/extend these core types using normal OO practices. Autobox provides the illusion of normal OO, but as soon as you look any deeper than the surface the illusion starts to crumble at an alarming rate.

While it is possible to do something close to subclassing/extending with autobox code by using the following technique, it has some severe drawbacks and serious inconsistencies. 

{
    package ARRAY;
    sub length { scalar @{ $_[0] } }
    
    package MyArray;
    use base 'ARRAY';
    # do something silly here for illustration
    sub length { (shift)->SUPER::length + 1 }
}

{
    use autobox;
    my $foo = [ 1, 2, 3, 4, 5 ];
    warn $foo->length; # 5

    my $bar = bless [ 1, 2, 3, 4, 5 ] => 'MyArray';
    warn $bar->length; # 6
}

The most obvious issue is that this only works for reference types (ARRAY, HASH and CODE) since Perl only allows blessing of references. So you cannot use this with SCALAR, INTEGER, FLOAT, NUMBER, STRING and UNDEF, which leaves out more than half of the functionality of autobox.

Also, the manual blessing of the subclassed array ref seems a little odd since it differs from how the regular autoboxed array ref works. Of course you could create a MyArray::new method to hide this if you want. If you did this, then perhaps for consistency's sake you would want an ARRAY::new as well. But unless you blessed the array ref into the ARRAY package, a user of your code would need to have autoboxing enabled for ARRAY->new to return anything useful, because (as I said above) the autoboxing is not intrinsic functionality, but instead functionality of a given lexical environment.

Now, my last issue with autobox is that, if used with the wrong kind of laziness, it can expose the internals of an object, defeat encapsulation and make bad APIs. This was the original motivation behind my writing MooseX::AttributeHelpers after having written Moose::Autobox. Take this example for instance.

{
    package MyThings;
    use Moose;
    use Moose::Autobox;
    
    has 'things' => (
        is      => 'ro',
        isa     => 'ArrayRef',   
        default => sub { [] },
    );
    
    my $me = MyThings->new;
    
    $me->things->push( 1 );
}

It is very tempting to just let the autoboxing provide the API to add things to your object, but this exposes a lot of internal details to your object's consumer. If at some point you want to change how things are stored, you will have a lot of work to do. Of course this is better than if users had been doing push(@{ $me->things }, 1), because you still have the encapsulation of the autoboxed APIs. But having to write an interface to match ARRAY for whatever you change things to use is just going to get nasty after a while.

Perceptive readers will also note that $me->things->push( 1 ) will not work unless autoboxing is enabled in that particular lexical environment. Again, this places a lot of responsibility on the users of your code just to use the API you're providing.

In contrast the MooseX::AttributeHelpers (soon to be core Moose) version is much more encapsulation friendly and is much more amenable to future changes to the storage type of things.

{
    package MyThings;
    use Moose;
    use MooseX::AttributeHelpers;
    
    has 'things' => (
        traits   => [ 'Collection::Array' ],
        is       => 'ro',
        isa      => 'ArrayRef',   
        default  => sub { [] },
        provides => {
            push => 'add_thing'
        }
    );
    
    my $me = MyThings->new;
    
    $me->add_thing( 1 );
}

If you change how things are stored, you simply need to re-write the add_thing method. Everything is properly encapsulated within your object as it should be.
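
As a rough illustration of that point, here is a hand-written plain Perl class (no Moose) whose storage has been switched from an array ref to a hash, while callers keep using add_thing. MyThingsV2 and all_things are hypothetical names I made up for this sketch.

```perl
use strict;
use warnings;

{
    package MyThingsV2;    # hypothetical hand-written class, no Moose

    # Storage changed from an array ref to a hash keyed by insertion order.
    sub new { return bless { things => {}, next_id => 0 }, shift }

    # Callers still use add_thing(); only this body had to change.
    sub add_thing {
        my ($self, $thing) = @_;
        $self->{things}{ $self->{next_id}++ } = $thing;
        return $self;
    }

    # Hypothetical reader that returns the things in insertion order.
    sub all_things {
        my ($self) = @_;
        return map { $self->{things}{$_} }
               sort { $a <=> $b } keys %{ $self->{things} };
    }
}

my $me = MyThingsV2->new;
$me->add_thing(1);
$me->add_thing('two');
print join( ', ', $me->all_things ), "\n";
```

Code that had been calling $me->things->push( 1 ) would have broken on this change, whereas code calling $me->add_thing( 1 ) never finds out.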

So anyway, that's enough of my autobox ranting. I think that autobox is an extremely interesting piece of software and by no means do I think people should not use it if they are so inclined. But I think it should be used carefully and with full knowledge of its limitations and issues.