On Nagios, Thunk, Shinken and wrapper included marketing

By Sawyer X on February 3, 2010 12:09 PM under Design

Nagios is probably the most famous and widely used monitoring program on the market. It's free, GPL-licensed and has nice features such as object representation of data, inheritance, plugin systems, passive testing, a built-in Perl interpreter, result caching, a pipe interface, alert delegation and so on.

The web interface of Nagios is, however, incredibly ugly. It's written in CGI the way the early CGI scripts were written. When you make a change to a server via the web interface, you get a few screens (avoiding Javascript is a benefit for some cell phones, I guess) and the old and quickly-annoying "You have done the action you wanted, please click this back link we created to go backwards" screen. You won't simply get automatically directed back to where you were before with a new message at the top in bright green saying "Action X done" or something like that. That would be too easy and Web 2.0. It uses frames (yuck!) to show the sidebar, and you don't see the content of comments on hover, only when you click through to the comment screen. It's literally a pitfall, and at least one company I worked at rewrote the entire interface in ASP and .NET (I know, I know...) by parsing the Nagios log.

For that reason, it has always been a bit difficult (though possible) selling Nagios to the enterprise when your boss isn't tech savvy, and other programs, not much better or worse (OpenNMS for instance) find their way since they have a much better user interface.

A new fork of Nagios with a PHP interface has begun, called Icinga. The point is to accept a lot of patches that were difficult to get into Nagios (which has only one actual developer - Ethan Galstad), and provide a beautiful web interface with Javascript. At least one Nagios community member sees the web interface as the entire point of the fork, and assumes there's a good chance it will be merged back into Nagios, keeping the core as it is.

Apparently, there is a Perl Catalyst-based project to revamp the Nagios web interface, called Thunk. It is available on Github and also has a demo page. It's incredibly fast and seems promising. However, there is no website for it. Currently the only face it has is the Github page which seems generic and perhaps non-welcoming for some people who consider using it. You can also view the demo, but it still has the basic dull look of Nagios' oldschool interface.

Another new project relating to Nagios is Shinken, which is a Python rewrite of Nagios. Apparently someone thought that Nagios is great, except it's written in C, which is a barrier to including other people's work. I personally disagree for a variety of reasons which I won't go into. What does seem interesting is that Shinken is very welcoming, even though (at least to me) it seems like an exercise in futility. I'm assuming it will rapidly develop a stable core userbase, and the website is to thank, IMHO.

One more issue to note: Ethan Galstad is now developing Nagios XI, an enterprise solution. The "solution" boasts a new web interface, which I suspect will be the leading selling point of it.

So:

Nagios has a terrible interface.

Nagios XI has a pretty interface (and a free iPod Touch with every purchase, according to the site).

Icinga say they will put in more patches that were rejected or ignored for Nagios (but will keep core compatibility). However, the strong point of Icinga (and where most efforts are now going) is the web interface.

Thunk attacks the interface problem directly, but doesn't seem like it will be adopted much because it doesn't even have a face.

Shinken tries to replace Nagios by rewriting the core to be in Python, but the best reason to adopt it (as with OpenNMS) is the web interface.

Now, of course there's a difference between Python, Perl, C and all that. However, in marketing, the selling value goes to the more presentable. At a system monitoring conference I would have a pretty difficult time selling anything with just a Github page. It all boils down (in marketing terms) to the interface.

Tagged as:

marketing, monitoring, nagios, sysadmin

13 Comments

aero.myid.net | February 3, 2010 5:04 PM | Reply

How about Opsview ?

From http://www.opsview.org/
Opsview, in development since 2003 is a fully integrated monitoring tool that incorporates popular Open Source software including Nagios® Core, Nagvis, Net-SNMP and RRDtool. The Catalyst web framework provides an extensible monitoring and configuration user interface.

Robert | February 3, 2010 7:38 PM | Reply

I think the Python fork of Nagios is a good idea actually.

What are the Perl options in this area? I know Python has a few on its own.

lausser.myid.net | February 4, 2010 1:05 AM | Reply

A few remarks...
The new interface is called "Thruk", not "Thunk". (Unfortunately, because to me, Thunk is easier to pronounce.)

"Nagios has a terrible interface". True. But admins are used to it, and it does what they expect it to do. At least the admins I know not only got used to it, they like it (looking at the alternatives). There are some drawbacks like "mass-clicking". I mean, you can't send a selection of hosts/services into a downtime, for example. With Thruk this will be possible.

"Nagios XI has a pretty interface". At first glance, yes. Then you'll see it's just the old interface with some fancy CSS added.

"Icinga say they will put more patches ". These can only be bugfix patches and maybe performance-tuning patches. They can't innovate, as it would break the Nagios compatibility they promised. Besides the web interface, they have the ido, which is ndo+oracle. But the core...as long as Nagios development is hibernating, Icinga cannot move forward. So it is just another interface. Let's see if it's spectacular/useful enough for the community to accept it on a broad basis.

"Thunk attacks the interface problem directly". Thruk was born out of the necessity to handle an installation of some thousand services. Fortunately it also solved another problem: distributed Nagios servers sending check results back to a central node just to have a single point of view. Now the Thruk web application plays the role of this central node (without the danger of getting killed by thousands of nsca-forks).

Shinken kicks ass like Nagios did 10 years ago (ok, 5 years ago). Look at the Nagios development during the last 3 years. Innovations? Developer's presence on the devel mailing list? Acceptance of patches? New releases? (There wasn't even one after the summer/wintertime-disaster). Instead there is a commercial version of Nagios. To me, that doesn't look like an open source project with a bright future.
Shinken on the other hand offers an average programmer the chance to actually read and understand the scripts instantly. Try this with the 10-year-old C code of Nagios.
The Shinken code is inspiring. People will play around and develop new ideas. And finally, it runs on Windows. Not an argument for the geek, but managers will like it.

sve.myopenid.com | February 4, 2010 2:15 AM | Reply

Yes, maybe I have to change that name. I misspell it sometimes too.

The reason there is no website for Thruk yet is that Thruk is not yet finished to the point where I want to release it. I want it to be at least feature-comparable with the original CGIs before releasing it. Otherwise you would have to install two web interfaces.

jaw | February 4, 2010 3:00 AM | Reply

Robert, above, asked:
What are the Perl options in this area?

I'm a big fan of Argus, which is written entirely in Perl.

Sawyer X | February 4, 2010 11:39 AM | Reply

I want to reply to a lot of stuff, so it's a bit long.

Regarding Opsview:

I haven't ever worked with Opsview, but it seems it's getting some good ground. It's existed for a while now and has a good name. I haven't heard bad things about it.

The point I was trying to make was that beauty is the key operative in many successful projects. Many other programs that aren't (IMHO) much better than Nagios (perhaps somewhat better, but I still doubt it personally) get better traction with users because they have comfortable (and by comfortable, I mean beautiful) interfaces.

Regarding Thruk:

Sorry for the misspelling. I'm dyslexic and sometimes I guess it's easier to go with what's more probable to say than with the more accurate name. Sorry.

Regarding the website, I would still recommend creating a static page that says it's still under development. I think Icinga have done the same. It's still not operative but there's already a website maintaining and growing the buzz around it. It can definitely help draw more effort into it, or more programmers to help. Just my two cents, I might be wrong on this.

Regarding Nagios vs. Shinken:

I don't think the major problem of Nagios is that it is written in C. There is no shortage of C programmers and people who want to work on Nagios. They just can't because they don't have an open environment (which is, apparently, very different from Open Source). If Nagios was on Github or Google Code or some other {de,}centralized server that accepts patches, forks, etc. we would probably see a surge of changes and improvements. Perhaps even a rewrite of some parts.

Maybe that boat has sailed by now and many prominent, capable, willing C programmers went somewhere else or decided to let go of the idea.

Still, I don't believe that C being the language is the major problem there. The problems you've mentioned (which I don't disagree with you on) are all language agnostic. They can happen with Python as well. Imagine being able to do all the changes in 10 days, instead of a month. If you still can't push them in, have no one to discuss them with, and the sole developer is incommunicado, what good will Python (or Perl, for that matter) do for you?

Regarding Shinken specifically, I think it's incorrect that it has no performance hit (something they point out). If you would provide what Nagios provides, the way it is provided, you would definitely get a performance hit. That's why a lot of hardcore heavy programs are written in C. That's one of the reasons we have Perl XS.

Perhaps, if they implement some of the things differently (and I'm sure some of the things are already optimized), they might get a few yards ahead, but if Nagios had a truly open framework, the Nagios people could do the same in C. It would take longer, but the performance would be much higher.

And thanks everyone for all the input!! :)

Robert | February 5, 2010 1:29 AM | Reply

Maybe a Perl re-write of Nagios then? You can use Inline::C and/or XS if you need the speed. I think Python has similar stuff.

Sawyer X | February 5, 2010 4:57 PM | Reply

A Perl rewrite is only valuable if it leverages the Perl community, and if it has an interface comfortable and marketable enough. The current interface of Nagios was more marketable years back when ugly was thought of as "simple" and "possible". Today it's an obstruction.

There is at least one company that I know that is rewriting Nagios in Perl using (my) POE::Component::OpenSSH. However, I don't think it will be released to the public, unfortunately. Either way, I don't think it does or will have XS pieces to speed up some stuff.

naparuba.myid.net | February 15, 2010 4:58 PM | Reply

Hi,

I'm the Shinken main dev. I don't think svn->git is the major problem of Nagios. Even the core devs do not want to touch the Nagios core. When you propose to add a process pool (less forking) or to bypass the reaping process with socket return (no more flat file reaping, big perf boost) they say: "too difficult".
Is it so difficult? In C, it is not trivial. Just look at Apache code for the process pool. The socket return? You need to make the protocol, the serialization of checks and the socket management (this part is not so difficult in fact).

Look at the Python or Perl modules for multiprocessing. In 10 lines max you have your process pool with socket return of checks.
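As a rough illustration of that point, here is a minimal Python sketch of a process pool that runs check commands and hands the results straight back to the parent process. It is written for this discussion and is not code from Shinken or Nagios. The names `run_check`, `run_checks` and `STATES` are made up for the example, and the pool returns results over the internal pipes of Python's `multiprocessing` module rather than over a hand-written socket protocol. The only Nagios-specific assumption is the usual plugin convention that the exit code carries the state.

```python
import subprocess
from multiprocessing import Pool

# Nagios plugin convention: the exit code of the plugin carries the state.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(command, timeout=10):
    """Run one plugin command; return (state, first line of its output)."""
    try:
        proc = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Plugin missing, not executable, or too slow: report UNKNOWN.
        return "UNKNOWN", str(exc)
    lines = proc.stdout.splitlines()
    return STATES.get(proc.returncode, "UNKNOWN"), (lines[0] if lines else "")


def run_checks(commands, workers=4):
    """Run all checks in a pool of worker processes and collect the results."""
    with Pool(processes=workers) as pool:
        # map() ships each result back to the parent over the pool's
        # internal pipes, so no result file is written and later reaped.
        return pool.map(run_check, commands)
```

Calling `run_checks` with a list of argument lists (one per plugin invocation) from under an `if __name__ == "__main__":` guard returns one `(state, output)` tuple per command, in the same order as the input.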

What is the problem here? Scheduling checks is a high-level problem. Why keep a low-level language to solve it? In Nagios, the big perf problem is in the architecture (flat file reaping). With high-level languages, you can build the architecture you want, and there are fewer "too difficult" things. You want to try a process pool? OK, the test needs 5 minutes. It is good or it is not. In C, it's 2 full days.

You can do it in C, but it will need a lot of work, and you will have to be better than the Python/Perl coders. If not, even the interpreted language will be faster than you, and you will just lose several days of hard work.

As for Shinken's performance: it is because we do not do things the way Nagios does that we can do better. Does it change anything for the user? No. Same configuration, same global behavior (check_command, host, service, etc). Just not the same way of getting results, but users do not care about that.

As for the interface: Shinken does not have one, and will never have a new one. It uses the Nagios ones (CGI, Ninja, Centreon). The Shinken goal is to replace the Nagios core, not the whole Nagios world (plugins, interface, etc).

Sawyer X | February 15, 2010 5:25 PM | Reply

I actually replied to this by email (since I received the comment first by email), but I'll recap the main points here:

You certainly gave me a new insight into the need of rewriting Nagios core. Perhaps I don't see the immediate necessity of it as you do, but I definitely understand better why it is so important for people.

Regarding performance, I do think that generally C has better performance (even though you shouldn't convert Perl to C[1]), but like you I personally write in a very high level language (Perl specifically) because the writing and maintenance speed and ability trade-off is much much better.

So, thanks for the corrections. I definitely see Shinken in a new light now and may even move to it at some point. :)

[1] http://www.perl.com/pub/a/2001/06/27/ctoperl.html

sve.myopenid.com | February 24, 2010 9:11 AM | Reply

Just a quick update.
There is a website about Thruk (www.thruk.org)
with a few screenshots and documentation.

Also, Thruk supports themes which the user can change on the fly.

Sawyer X | February 24, 2010 9:48 AM | Reply

Thanks!

How old is this website? I'm assuming it's brand new since I didn't find it before.

sve.myopenid.com | February 26, 2010 10:23 AM | Reply

That's true. I created it after your blog post about 2 weeks ago.
