On Nagios, Thunk, Shinken and wrapper included marketing

Nagios is probably the most famous and widely used monitoring program on the market. It's free, GPL-licensed and has nice features such as object representation of data, inheritance, a plugin system, passive testing, a built-in Perl interpreter, result caching, a pipe interface, alert delegation and so on.

The web interface of Nagios is, however, incredibly ugly. It's written as CGI, the way the early CGI scripts were written. When you make a change to a server via the web interface, you get a few screens (avoiding Javascript is a benefit for some cell phones, I guess) and the old and quickly annoying "You have done the action you wanted, please click this back link we created to go backwards" screen. You won't simply get redirected back to where you were before, with a new message at the top in bright green saying "Action X done" or something like that. That would be too easy and Web 2.0. It uses frames (yuck!) to show the sidebar, and you don't see the content of a comment on hover, only when you click through to the comment screen. It's a genuine pitfall, and at least one company I worked at rewrote the entire interface in ASP and .NET (I know, I know...) by parsing the Nagios log.

For that reason, it has always been a bit difficult (though possible) to sell Nagios to the enterprise when your boss isn't tech savvy. Other programs that are not much better or worse (OpenNMS, for instance) find their way in because they have a much better user interface.

A new fork of Nagios with a PHP interface has begun, called Icinga. The point is to accept a lot of patches that were difficult to get into Nagios (which has only one actual developer, Ethan Galstad), and to provide a beautiful web interface with Javascript. At least one Nagios community member sees the web interface as the entire point of the fork, and assumes there's a good chance it will be merged back into Nagios, keeping the core as it is.

Apparently, there is a Perl Catalyst-based project to revamp the Nagios web interface, called Thunk. It is available on Github and also has a demo page. It's incredibly fast and seems promising. However, there is no website for it. Currently the only face it has is the Github page, which seems generic and perhaps unwelcoming to people who might consider using it. You can also view the demo, but it still has the basic dull look of Nagios' old-school interface.

Another new project relating to Nagios is Shinken, which is a Python rewrite of Nagios. Apparently someone thought that Nagios is great, except that it's written in C, which is a barrier to including other people's work. I personally disagree, for a variety of reasons which I won't go into. What does seem interesting is that Shinken is very welcoming, even though (at least to me) it seems like an exercise in futility. I'm assuming it will rapidly develop a stable core userbase, and it has the website to thank for that, IMHO.

One more issue to note: Ethan Galstad is now developing Nagios XI, an enterprise solution. The "solution" boasts a new web interface, which I suspect will be the leading selling point of it.

So:


  • Nagios has a terrible interface.

  • Nagios XI has a pretty interface (and a free iPod Touch with every purchase, according to the site).

  • Icinga say they will put more patches in that were rejected or ignored for Nagios (but will keep core compatibility). However, the strong point of Icinga (and where most of the effort is now going) is the web interface.

  • Thunk attacks the interface problem directly, but doesn't seem likely to be adopted much because it doesn't even have a face.

  • Shinken tries to replace Nagios by rewriting the core to be in Python, but the best reason to adopt it (as with OpenNMS) is the web interface.

Now, of course there's a difference between Python, Perl, C and all that. However, in marketing, the selling value goes to the more presentable. At a system monitoring conference I would have a pretty difficult time selling anything with just a Github page. It all boils down (in marketing terms) to the interface.

13 Comments

How about Opsview ?

From http://www.opsview.org/
Opsview, in development since 2003 is a fully integrated monitoring tool that incorporates popular Open Source software including Nagios® Core, Nagvis, Net-SNMP and RRDtool. The Catalyst web framework provides an extensible monitoring and configuration user interface.

I think the Python fork of Nagios is a good idea actually.

What are the Perl options in this area? I know Python has a few on its own.

A few remarks...
The new interface is called "Thruk", not "Thunk". (Unfortunately, because to me, Thunk is easier to pronounce.)

"Nagios has a terrible interface". True. But admins are used to it, and it does what they expect it to do. The admins I know, at least, have not only got used to it, they like it (looking at the alternatives). There are some drawbacks, like "mass-clicking": you can't put a selection of hosts/services into a downtime, for example. With Thruk this will be possible.

"Nagios XI has a pretty interface". At first glance, yes. Then you'll see it's just the old interface with some fancy CSS added.

"Icinga say they will put more patches ". These can only be bugfix and maybe performance-tuning patches. They can't innovate, as it would break the Nagios compatibility they promised. Besides the web interface, they have the ido, which is ndo+oracle. But the core... as long as Nagios development is hibernating, Icinga cannot move forward. So it is just another interface. Let's see if it's spectacular/useful enough for the community to accept it on a broad basis.

"Thunk attacks the interface problem directly". Thruk was born out of the necessity to handle an installation of some thousand services. Fortunately it also solved another problem: distributed Nagios servers sending check results back to a central node just to have a single point of view. Now the Thruk web application plays the role of this central node (without the danger of getting killed by thousands of nsca-forks).

Shinken kicks ass like Nagios did 10 years ago (ok, 5 years ago). Look at the Nagios development during the last 3 years. Innovations? Developer's presence on the devel mailing list? Acceptance of patches? New releases? (There wasn't even one after the summer/wintertime-disaster). Instead there is a commercial version of Nagios. To me, that doesn't look like an open source project with a bright future.
Shinken, on the other hand, offers an average programmer the chance to actually read and understand the scripts instantly. Try this with the 10-year-old C code of Nagios.
The Shinken code is inspiring. People will play around and develop new ideas. And finally, it runs on Windows. Not an argument for the geek, but managers will like it.

Yes, maybe I have to change that name. I misspell it sometimes too.

The reason why there is no website for Thruk yet is that Thruk is not yet finished to the point where I want to release it. I want to be at least feature-comparable with the original CGIs before releasing Thruk. Otherwise you would have to install two web interfaces.

Robert, above, asked:
What are the Perl options in this area?

I'm a big fan of argus, which is written entirely in Perl.

Maybe a Perl re-write of Nagios then? You can use Inline::C and/or XS if you need the speed. I think Python has similar stuff.

Hi,

I'm the Shinken main dev. I don't think svn->git is the major problem of Nagios. Even the core devs do not want to touch the Nagios core. When you propose adding a process pool (fewer forks) or bypassing the reaping process with socket return (no more flat file reaping, a big perf boost), they say: "too difficult".
Is it so difficult? In C, it is not trivial. Just look at the Apache code for the process pool. The socket return? You need to write the protocol, the serialization of checks and the socket management (this part is not so difficult, in fact).


Look at the Python or Perl modules for multiprocessing. In 10 lines max you have your process pool with socket return of checks.
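
As a rough illustration of that claim (a minimal sketch, not Shinken's actual code), here is what a worker pool could look like in Python. It assumes checks are plain commands whose exit code carries the state; the run_check helper and the example commands are hypothetical stand-ins for real plugins. Results travel back to the parent through the pool's internal pipes rather than through flat files.

```python
import subprocess
import sys
from multiprocessing import Pool


def run_check(command):
    """Run one check command; return (command, exit code, first output line)."""
    proc = subprocess.run(command, capture_output=True, text=True, timeout=10)
    lines = proc.stdout.strip().splitlines()
    return command, proc.returncode, lines[0] if lines else ""


if __name__ == "__main__":
    # Hypothetical stand-ins for real plugins; any command would do here.
    checks = [
        [sys.executable, "-c", "print('OK - all good')"],
        [sys.executable, "-c", "import sys; print('CRITICAL - down'); sys.exit(2)"],
    ]
    # Four long-lived workers are reused for every check. Results come back
    # to the parent over the pool's pipes, not through result files on disk.
    with Pool(processes=4) as pool:
        for command, code, output in pool.imap_unordered(run_check, checks):
            print(code, output)
```

Each plugin is still launched as its own process here; what the pool saves is forking a fresh scheduler worker per check and reaping result files afterwards.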


What is the problem here? Scheduling checks is a high-level problem. Why keep a low-level language to solve it? In Nagios, the big perf problem is in the architecture (flat file reaping). With high-level languages, you can build the architecture you want, and there are fewer "too difficult" things. You want to try a process pool? OK, the test needs 5 minutes. It is good or not. In C, it's 2 full days.


You can do it in C, but it will need a lot of work, and you will have to be better than the Python/Perl coders. If not, even the interpreted language will be faster than you, and you will just lose several days of hard work.


For Shinken performance: it's because we do not do things like Nagios that we can do better. Does it change anything for the user? No. Same configuration, same global behavior (check_command, host, service, etc). Just not the same way of getting results, but users just do not care about this.


For the interface: Shinken does not have one, and will never have a new one. It uses the Nagios ones (CGI, Ninja, Centreon). The Shinken goal is to replace the Nagios core, not the whole Nagios world (plugins, interface, etc).

Just a quick update.
There is a website about Thruk (www.thruk.org)
with a few screenshots and documentation.

Then Thruk supports themes which the user can change on the fly.

That's true. I created it after your blog post about 2 weeks ago.

About Sawyer X

Gots to do the bloggingz