<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Burak Gürsoy</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/burak_gursoy/" />
    <link rel="self" type="application/atom+xml" href="http://blogs.perl.org/users/burak_gursoy/atom.xml" />
    <id>tag:blogs.perl.org,2009-11-03:/users/burak_gursoy//287</id>
    <updated>2010-07-18T23:21:52Z</updated>
    <subtitle>All things Perl</subtitle>
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type Pro 4.38</generator>

<entry>
    <title>Parallel programming with fork() and tail()ing logs</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/burak_gursoy/2010/07/parallel-programming-with-fork-and-tailiing-logs.html" />
    <id>tag:blogs.perl.org,2010:/users/burak_gursoy//287.759</id>

    <published>2010-07-18T22:34:06Z</published>
    <updated>2010-07-18T23:21:52Z</updated>

    <summary>I&apos;ve worked on a small freelance project recently. It was a log watcher and another tiny program watching it (watcher watcher). Basically, it&apos;s an extended `tail -f` watching over multiple logs generated by some persistent programs simultaneously and...</summary>
    <author>
        <name>Burak Gürsoy</name>
        <uri>http://twitter.com/burakgursoy</uri>
    </author>
    
        <category term="Perl" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="cpan" label="cpan" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="fork" label="fork" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="ipc" label="ipc" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="memory" label="memory" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="module" label="module" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="parallel" label="parallel" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="perl" label="perl" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="shared" label="shared" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="tail" label="tail" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/burak_gursoy/">
        <![CDATA[<p>
I've worked on a small freelance project recently. 
It was a log watcher and another tiny program watching it (a watcher watcher).
Basically, it's an extended <code>`tail -f`</code> that watches multiple logs 
generated by some persistent programs simultaneously
and sends alerts based on configuration data. So there were two main questions 
to answer before I started coding:

<ol>
<li>Use threads or use fork()?</li>
<li>Hand craft tailing code or check if there is already a module for that?</li>
</ol>
</p>]]>
        <![CDATA[<p>
Oh, there were also side issues like fetching the configuration and sending emails.
For the configuration part, the client initially thought about connecting to a 
database and <code>SELECT</code>ing the conf from an (Oracle) table. So,
we would have needed either <code>DBD::Oracle</code> or <code>DBD::ODBC</code>
for that, and apart from having to make sure that these dependencies work, they are
dependencies after all, pulled in just to get a configuration. So the client dismissed
that idea and we decided that the code would fetch the conf from an HTTP URL instead
(how they implement that side is up to them).

</p>

<p>
At this point, the problem of <code>GET()</code>ing the URL arose. The code will
run on machines with different setups and it's impossible to assume that LWP will be
pre-installed (although some distributions are known to bundle it), and making LWP a dependency 
would pull in a bunch of other modules too
(they are not using Perl otherwise; it's only needed for this project). So I thought that 
maybe I could use sockets directly, or perhaps someone had written a low-calorie user agent.
</p>

<p>
And as one can assume, CPAN already has <i>that</i> wheel re-invented as
<a href="http://search.cpan.org/dist/HTTP-Lite/" target="_blank">HTTP::Lite</a>,
maintained by <i>Adam "Tiny" Kennedy</i>
(one might expect it to be called HTTP::<i>Tiny</i>, though). As a side note: I noticed that 
<a href="http://search.cpan.org/dist/App-perlbrew/" target="_blank">perlbrew</a> is also 
using it, and I even 
<a href="http://github.com/gugod/App-perlbrew/commit/e68eb121f5e2a4dfa2a3f40fdcf50e6eda141ead"
 target="_blank">submitted a patch to add support for mirror selection</a> (that unknown 
 committer is me; I forgot to set the git environment variables).
</p>
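<p>
As a minimal sketch of the "fetch the conf over HTTP" step, assuming <code>HTTP::Lite</code> is installed:
the <code>fetch_conf()</code> helper name is hypothetical (it is not from the project), and the URL is
whatever you pass on the command line.
</p>
<pre>

    use strict;
    use warnings;
    use HTTP::Lite;

    # fetch_conf() is a hypothetical helper name, used only for this sketch
    sub fetch_conf {
        my $url  = shift;
        my $http = HTTP::Lite->new;

        # request() returns the HTTP status code, or undef on an I/O error
        my $status = $http->request( $url );
        die "Unable to fetch $url: $!\n"         if ! defined $status;
        die "Unexpected status $status for $url\n" if $status != 200;

        return $http->body;
    }

    # usage: perl fetch_conf.pl http://example.com/watcher.conf
    print fetch_conf( $ARGV[0] ) if @ARGV;

</pre>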
<p>
For sending email, I chose <a href="http://search.cpan.org/dist/MIME-Lite/" target="_blank">MIME::Lite</a>
as it's a light module, as the name suggests, and I like its API. Also, SASL authentication support was added 
to it recently, and although we didn't need it, it's easy to enable when needed.
</p>
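<p>
To give an idea of the API, here is a small sketch of building an alert mail with <code>MIME::Lite</code>.
The <code>build_alert()</code> helper, the sender address and the subject format are assumptions made up
for this example; only <code>new()</code> and <code>send()</code> come from the module.
</p>
<pre>

    use strict;
    use warnings;
    use MIME::Lite;

    # build_alert() is a hypothetical helper, not part of MIME::Lite
    sub build_alert {
        my( $to, $log, $line ) = @_;
        return MIME::Lite->new(
            From    => 'watcher@example.com',   # made-up sender address
            To      => $to,
            Subject => "Alert from $log",
            Type    => 'text/plain',
            Data    => $line,
        );
    }

    # usage: perl alert.pl admin@example.com app.log "something broke"
    # send() without arguments uses the module's default delivery method
    build_alert( @ARGV )->send if @ARGV == 3;

</pre>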

<p>
For parallel processing I decided to use <code>fork()</code> as it seemed more natural than using threads,
I hadn't really used threads before, and there was a possibility of running on non-threaded perls. Actually,
I hadn't used <code>fork()</code> in production code before either, and it seemed like a good opportunity
to test its usage extensively. The usage is quite simple as you know:
<pre>

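    foreach my $foo ( @logs ) {
        # pre-init stuff here(*)
        my $pid = fork();
        if ( ! defined $pid ) {
            # fork() returns undef on failure, so test that first
            die "Couldn't fork: $!\n";
        }
        elsif ( $pid ) {
            # parent: fork() returned the PID of the new child
            push @children, $pid;
            $counter++;
        }
        else {
            # child: fork() returned 0
            # real action happens here
            # (exit when done, so the child never continues this loop)
        }
        # other stuff here
    }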

</pre>

I collect the child PIDs so that I can wait for them below (to prevent zombies)
and also kill them if needed (and it is needed!):

<pre>

    foreach my $pid ( @children ) {
        waitpid($pid, 0);
    }

</pre>

<br>
<code>$counter</code> is there to limit the forks to a constant value to prevent fork bombs.

<pre>

        # pre-init stuff here(*)
        if ( $counter >= FORK_LIMIT ) {
            warn "A warning to inform that the program will "
                 ."discard any logs from now on\n";
            warn "The user either has to change the hard coded limit "
                  ."or create another instance\n";
            last;
        }

</pre>

For some reason, the user (or the system admin) could decide to stop the watcher program. It's easy
to kill it or hit <code>[CTRL]+C</code> but that's not a good user interface and <i>is</i> messy.
So, it's better to have a nice command like <code>"$0 -stop"</code>.
And the simplest way to implement <code>"$0 -stop"</code> is:

<pre>

    local $SIG{INT} = sub {
        # signal given by name, so no POSIX constants are needed
        kill 'KILL', @children;
        warn "All stopped.\n";
        exit 0;
    };

</pre>

Notice that just before the program <i>exit()</i>s, it runs the <code>END</code> blocks, and 
since I did an OO implementation, the <code>DESTROY</code> method of the object will also be called, 
where I can do some cleanup and save state onto the disk.

</p>
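<p>
Putting the pieces above together, a self-contained sketch could look like the following. The log names,
the limit of 3 and the <code>sleep</code> that stands in for the real tailing loop are placeholders for
illustration, not the project's actual code.
</p>
<pre>

    use strict;
    use warnings;

    use constant FORK_LIMIT => 3;                # placeholder limit

    my @logs = qw( a.log b.log c.log d.log );    # placeholder log names
    my @children;

    # parent only: stop every child on an INT signal, then exit
    $SIG{INT} = sub {
        kill 'KILL', @children;
        warn "All stopped.\n";
        exit 0;
    };

    foreach my $log ( @logs ) {
        if ( @children >= FORK_LIMIT ) {
            warn "Fork limit reached, discarding $log and the rest\n";
            last;
        }

        my $pid = fork();
        die "Couldn't fork: $!\n" if ! defined $pid;

        if ( $pid ) {
            # parent: remember the child and move on to the next log
            push @children, $pid;
            next;
        }

        # child: drop the inherited handler, do the work, never loop again
        $SIG{INT} = 'DEFAULT';
        print "[$$] watching $log\n";
        sleep 1;                                 # stand-in for the tail loop
        exit 0;
    }

    # reap every child to prevent zombies and keep the exit codes
    my %exit_code;
    foreach my $pid ( @children ) {
        waitpid $pid, 0;
        $exit_code{ $pid } = $? >> 8;
    }

    printf "%d children reaped\n", scalar keys %exit_code;

</pre>
<p>
The child resets <code>$SIG{INT}</code> because it inherits the parent's handler together with a copy of
<code>@children</code>; without the reset, a signalled child would try to kill its own siblings.
</p>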

<p>
But how will <code>$0</code> (the program) know which process to send the <code>INT</code>
signal to (where it'll be caught by <code>$SIG{INT}</code>)? 
That needs a little trick used by a lot of programs and even by your <i>cpan</i>
shell. Basically, the program creates a <i>lock file</i> when it starts and saves its PID
into it. When someone executes the <code>$0 -stop</code> command, the program does not overwrite
(re-create) the lock file, but instead reads it, sends a kill command to the other process
and then returns (exits):

<pre>

    kill 'INT', $pid_from_lock_file;

</pre>
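
The lock file handling around that line can be sketched roughly like this. The file location and the
<code>write_lock()</code> / <code>read_lock()</code> names are assumptions for the example; the real
program may do it differently:

<pre>

    use strict;
    use warnings;
    use Fcntl qw( O_WRONLY O_CREAT O_EXCL O_RDONLY );

    my $LOCK_FILE = '/tmp/watcher.pid';    # hypothetical location

    # Save our PID. O_EXCL makes this fail if the file already exists,
    # i.e. if another instance seems to be running.
    sub write_lock {
        my $file = shift;
        sysopen( my $fh, $file, O_WRONLY | O_CREAT | O_EXCL )
            or die "Already running? Can't create $file: $!\n";
        print {$fh} "$$\n";
        close $fh or die "Can't close $file: $!\n";
        return;
    }

    # Read the PID saved by the running instance.
    sub read_lock {
        my $file = shift;
        sysopen( my $fh, $file, O_RDONLY ) or die "Can't read $file: $!\n";
        my $pid = readline $fh;
        close $fh;
        $pid = '' if ! defined $pid;
        chomp $pid;
        die "Bogus PID in $file\n" if $pid !~ m{\A [0-9]+ \z}xms;
        return $pid;
    }

    if ( @ARGV and $ARGV[0] eq '-stop' ) {
        # "$0 -stop": signal the running instance, then return
        my $pid = read_lock( $LOCK_FILE );
        kill( 'INT', $pid ) or die "Can't signal process $pid: $!\n";
        exit 0;
    }

    # a normal start would call write_lock( $LOCK_FILE ) here
    # and unlink() the file during cleanup

</pre>
<br>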

Another requirement was to check automatically, every hour, whether there is a new configuration.
I wrote this as the last part of the code with an alarm handler, which proved to be easy to 
implement in the end.

</p>
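<p>
One way to do the hourly check with an alarm handler is sketched below. This is only an illustration of the
idea: <code>arm_reload_timer()</code>, <code>reload_if_due()</code> and the flag variable are invented names,
and the handler does nothing but raise a flag so that the actual reload happens in the main loop.
</p>
<pre>

    use strict;
    use warnings;

    use constant RELOAD_EVERY => 60 * 60;    # seconds; "every hour"

    my $reload_due = 0;

    # Install the handler and start the timer. The handler stays tiny:
    # it only raises a flag and re-arms the alarm.
    sub arm_reload_timer {
        my $seconds = shift;
        $SIG{ALRM} = sub {
            $reload_due = 1;
            alarm $seconds;
        };
        alarm $seconds;
        return;
    }

    # Call this from the main loop; $loader is a code ref that
    # fetches and returns the new configuration.
    sub reload_if_due {
        my $loader = shift;
        return if ! $reload_due;
        $reload_due = 0;
        return $loader->();
    }

    # usage inside the watcher:
    #     arm_reload_timer( RELOAD_EVERY );
    #     ... and in the loop:
    #     my $new_conf = reload_if_due( sub { return 'fetched conf' } );

</pre>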

<p>
The whole thing sounds easy, right? But the real implementation took a little longer 
than I anticipated, because there were some other requirements around it that made things
a little bit complex. The main logic is really simple, though. However, the problem with <i>IPC</i>
between forked processes is that each one gets its own copy of everything instead of a simple shared variable
(that also took a little while to realise). So, if you set 
some flags in the child, you can't see them in the parent or in the other siblings. 
To do that you need a shared memory area where everyone
can read and write. I decided to use <a href="http://search.cpan.org/dist/IPC-ShareLite/">IPC::ShareLite</a>.
From a "normal" programming point of view, it has a ridiculous API and only supports strings as shared
memory storage, but there are workarounds for that and the suggested one is to use the excellent
<a href="http://search.cpan.org/dist/Storable">Storable</a> module to store and fetch complex 
structures (which I did). If you are wondering why I needed shared memory: it's needed to 
implement a <i>pause()</i> functionality where the watcher stops sending emails for a defined period 
of time when a threshold is reached (and for some other stuff).

</p>
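<p>
A rough sketch of that combination follows, assuming <code>IPC::ShareLite</code> is installed. The key
value, the <code>set_state()</code> / <code>get_state()</code> helpers and the layout of the state hash are
made up for the example. The segment is created with <code>-destroy => 'no'</code>, so it stays around
after the program ends.
</p>
<pre>

    use strict;
    use warnings;
    use IPC::ShareLite qw( :lock );
    use Storable qw( freeze thaw );

    my $share = IPC::ShareLite->new(
        -key     => 1971,      # arbitrary key chosen for this sketch
        -create  => 'yes',
        -destroy => 'no',
    ) or die "Can't create the shared memory segment: $!\n";

    # The segment only holds strings, so serialise the structure first.
    sub set_state {
        my $state = shift;
        $share->lock( LOCK_EX );
        $share->store( freeze( $state ) );
        $share->unlock;
        return;
    }

    sub get_state {
        $share->lock( LOCK_SH );
        my $frozen = $share->fetch;
        $share->unlock;
        return length( $frozen ) ? thaw( $frozen ) : {};
    }

    set_state( { paused => 0 } );

    my $pid = fork();
    die "Couldn't fork: $!\n" if ! defined $pid;

    if ( $pid == 0 ) {
        # child: raise the flag; the parent will see it via shared memory
        set_state( { paused => 1, until => time + 60 } );
        exit 0;
    }

    waitpid $pid, 0;
    my $state = get_state();
    print "Emails are paused\n" if $state->{paused};

</pre>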

<p>
At this point, all I had written was the <i>wrapper</i> code. I hadn't actually written the code that tails the logs.
For tailing, it seemed simple enough to <code>open()</code> the file and then use a loop to get new entries.
But that turned out to be a bad idea once I realised that I also needed to check whether the file had been rotated and do
some other stuff. So, after losing a couple of hours, 
I ended up using <a href="http://search.cpan.org/dist/File-Tail/">File::Tail</a>.
There is really more in it than meets the eye and I strongly suggest using it instead of implementing 
your own. It takes care of pretty much everything needed to implement <code>tail -f</code> in <i>Perl</i>.
</p>
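<p>
For reference, basic usage of <code>File::Tail</code> looks roughly like this. The <code>watch()</code>
wrapper, the <code>ERROR</code> pattern and the interval values are assumptions for the example, not the
watcher's real settings.
</p>
<pre>

    use strict;
    use warnings;
    use File::Tail;

    # watch() is a hypothetical wrapper around the File::Tail object
    sub watch {
        my( $file, $pattern, $on_match ) = @_;

        my $tail = File::Tail->new(
            name        => $file,
            interval    => 1,     # initial seconds between checks
            maxinterval => 10,    # never wait longer than this
        );

        # read() blocks until a new line has been appended to the file
        while ( defined( my $line = $tail->read ) ) {
            $on_match->( $line ) if $line =~ $pattern;
        }

        return;
    }

    # usage: perl watch.pl /path/to/some.log
    watch( $ARGV[0], qr{ERROR}, sub { print "ALERT: $_[0]" } ) if @ARGV;

</pre>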

<p>
But I didn't just use <code>File::Tail</code>; I decided to improve it, as leaving such handy code
abandoned (not updated for 5 years, with open bugs) seemed like a bad idea. I've forked the code
from <a href="http://twitter.com/schwern">Michael Schwern</a>'s
<a href="http://github.com/gitpan/File-Tail">gitpan</a> project and since I don't like <i>git</i>
much, I've created a new Mercurial repository on Bitbucket at
<a href="http://bitbucket.org/burak/cpan-file-tail">bitbucket.org/burak/cpan-file-tail</a>.
I did a major refactoring, improved the unit tests, fixed Windows issues (apart from rotate detection)
and implemented <code>$/</code> support as it was on the TODO list and someone had requested it.
I haven't done any benchmarks yet to see if there are speed issues, but feel free to 
fork it or test it. I've also <a href="https://rt.cpan.org/Public/Bug/Display.html?id=58570"
>opened an RT ticket</a> to inform the original author. If I don't get a response or he rejects it,
I'll possibly fork it on CPAN under a new name.
</p>]]>
    </content>
</entry>

<entry>
    <title>PAUSE UI Enhancement with Greasemonkey</title>
    <link rel="alternate" type="text/html" href="http://blogs.perl.org/users/burak_gursoy/2010/04/pause-ui-enhancement-with-greasemonkey.html" />
    <id>tag:blogs.perl.org,2010:/users/burak_gursoy//287.441</id>

    <published>2010-04-04T23:37:33Z</published>
    <updated>2010-04-05T03:36:59Z</updated>

    <summary>Last week, Marcel Grünauer (hanekomu) tweeted about his PAUSE deletions possibly triggered by one of the recent discussions about the CPAN ecosystem (and mirroring deficiencies). PAUSE sends deleted files to the BackPAN which has everything released to CPAN. However, selecting...</summary>
    <author>
        <name>Burak Gürsoy</name>
        <uri>http://twitter.com/burakgursoy</uri>
    </author>
    
        <category term="Perl" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="backpan" label="backpan" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="cpan" label="cpan" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="delete" label="delete" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="greasemonkey" label="greasemonkey" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="javascript" label="javascript" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="pause" label="pause" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="perl" label="perl" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="userscript" label="user-script" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blogs.perl.org/users/burak_gursoy/">
        <![CDATA[<p>Last week, Marcel Grünauer (hanekomu) <a href="http://twitter.com/hanekomu/status/11529322301">tweeted about his PAUSE deletions</a>, possibly triggered by <a href="http://www.nntp.perl.org/group/perl.module-authors/2010/03/msg8448.html">one of the recent discussions</a> about the CPAN ecosystem (and mirroring deficiencies). PAUSE sends deleted files to the BackPAN, which has <em>everything</em> released to CPAN. However, selecting the files to delete is not much fun since the PAUSE UI is sadly not dynamic, and if you have <a href="http://thegestalt.org/simon/perl/wholecpan.html">lots of distributions</a> you'll have even more released files on PAUSE.</p>]]>
        <![CDATA[<p>I have a relatively small number of distributions, but after <a href="http://twitter.com/hanekomu/status/11555631275">talking with hanekomu</a>, I wondered whether it would be easy to inject some JavaScript to alter the behaviour of the PAUSE UI or add some little functionality to it. I played with Greasemonkey and spent <b>much</b> time before finally discovering that I need <em>addEventListener()</em> to do the magic. The result is my tiny user script: <a href="http://bitbucket.org/burak/gm-pause/src/">http://bitbucket.org/burak/gm-pause/src/</a>. You'll need Firefox and the Greasemonkey plugin. Then you can either download the code and install it manually or <a href="http://bitbucket.org/burak/gm-pause/raw/tip/PAUSE.user.js">just click here</a> to install the latest version in the code repository. Next time you log on to PAUSE there will be three more buttons to click (they'll be displayed on every page, but will work only in the deletions section).</p>

<p>Disclaimer: the code is new and possibly has missing parts and/or bugs. Be sure to check the deletion list and undelete files from PAUSE if necessary. And you can also fork the project to improve it.</p>]]>
    </content>
</entry>

</feed>
