More on Splunk
We’ve had it running for a bit over a day now with logs being fed in via syslog from eight Solaris hosts plus a NetApp Filer. So there’s a bit of data in it now, enough to make it worth looking for patterns.
The first thing that leaps out is that one host is logging a lot more than any of the others. Click on the hostname to see everything that has come in from it over the past 24 hours. It’s not the most helpful view: there’s just too much to eyeball easily.
So click “Report on results” to get a list of fields you can use to build a graph view, click “process”, select “pie graph” from the “display as” dropdown, and now you can see what’s causing all those messages.
Of 1800 events, 1100 are from puppetmasterd. Click on the slice of the pie that represents puppetmasterd and you’re back to the log-entry view but now it’s filtered to only show puppetmasterd entries from this specific host.
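If you wanted to do the same per-process tally outside Splunk, a rough Python sketch might look like the following. It is not how Splunk does it internally; it just assumes classic BSD-style syslog lines ("timestamp host process[pid]: message"). The sample lines, the regex, and the function name `count_by_process` are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical syslog lines: hostnames, PIDs and messages are made up.
SAMPLE_LINES = [
    "Oct 30 10:00:01 host.example.com puppetmasterd[1234]: File served to a client",
    "Oct 30 10:00:02 host.example.com puppetmasterd[1234]: File served to a client",
    "Oct 30 10:00:03 host.example.com sshd[222]: Accepted publickey for admin",
    "Oct 30 10:00:04 other.example.com puppetd[333]: Finished catalog run",
    "this line is not syslog and gets skipped",
]

# Assumed layout: "Mon DD HH:MM:SS host process[pid]: message" (pid optional).
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]{8}\s+"
    r"(?P<host>\S+)\s+"
    r"(?P<process>[^\s\[:]+)(?:\[\d+\])?:\s*"
    r"(?P<message>.*)$"
)


def count_by_process(lines, host):
    """Count events per process name for a single host, skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("host") == host:
            counts[match.group("process")] += 1
    return counts


if __name__ == "__main__":
    # Print the busiest processes first, like reading the pie chart largest slice first.
    for process, n in count_by_process(SAMPLE_LINES, "host.example.com").most_common():
        print(process, n)
```

The `Counter` plays the role of the pie chart: the biggest slice is whatever `most_common()` lists first.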
At this point I can see that a great many of these are because the fileserv module in puppet is logging info it shouldn’t (known bug). So I want to ignore those. The current search expression is:
host="host.example.com" | where process="puppetmasterd"
To filter out all these irrelevant messages, I change it to:
host="host.example.com" NOT File | where process="puppetmasterd"
and suddenly I’m down to the 200 actual errors I care about. In this case they’re all identical and they indicate that one of the client hosts is having trouble evaluating a custom fact.
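To make the logic of that second search concrete, here is a small Python sketch of the same idea: keep events for one host and one process, and drop anything mentioning an unwanted keyword. The event dictionaries and the `search` helper are hypothetical, and the plain case-insensitive substring test is a simplification of however Splunk actually matches terms.

```python
# Hypothetical, already-parsed events; the messages are made up for illustration.
EVENTS = [
    {"host": "host.example.com", "process": "puppetmasterd", "message": "File served to a client"},
    {"host": "host.example.com", "process": "puppetmasterd", "message": "File served to a client"},
    {"host": "host.example.com", "process": "puppetmasterd", "message": "client could not evaluate a custom fact"},
    {"host": "host.example.com", "process": "sshd", "message": "Accepted publickey for admin"},
    {"host": "other.example.com", "process": "puppetmasterd", "message": "File served to a client"},
]


def search(events, host, process, exclude=None):
    """Return events for one host and process, skipping messages that contain `exclude`."""
    results = []
    for event in events:
        if event["host"] != host or event["process"] != process:
            continue
        # Simplified stand-in for "NOT File": case-insensitive substring match.
        if exclude is not None and exclude.lower() in event["message"].lower():
            continue
        results.append(event)
    return results


if __name__ == "__main__":
    everything = search(EVENTS, "host.example.com", "puppetmasterd")
    interesting = search(EVENTS, "host.example.com", "puppetmasterd", exclude="File")
    print(len(everything), "events before filtering,", len(interesting), "after")
```

The point is the same as in the search expression: the noise is still in the index, it just never reaches the results.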
Unfortunately, at this point I have to rely on already knowing which host is having trouble, because puppet’s log entries don’t actually say. Gah! But hey, we’re indexing all these logs from the clients too, yes? So…
selector | where process="puppetd"
And now I can see which client(s) are having this problem! Only one in this case, but the “host” dropdown tells you how many hosts match the search. Or you could graph it.
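The "which hosts match" view is really just a group-by on the host field. A minimal Python sketch, assuming events have already been parsed into dictionaries (the hostnames, messages, search term, and the `hosts_matching` helper are all invented for the example):

```python
from collections import Counter

# Hypothetical parsed events from several clients; all values are made up.
CLIENT_EVENTS = [
    {"host": "client1.example.com", "process": "puppetd", "message": "could not evaluate custom fact"},
    {"host": "client1.example.com", "process": "puppetd", "message": "could not evaluate custom fact"},
    {"host": "client2.example.com", "process": "puppetd", "message": "Finished catalog run"},
    {"host": "client3.example.com", "process": "cron", "message": "could not evaluate custom fact"},
]


def hosts_matching(events, term, process):
    """Count matching events per host for one process and one search term."""
    counts = Counter()
    for event in events:
        # Case-insensitive substring match, a simplification of a real search.
        if event["process"] == process and term.lower() in event["message"].lower():
            counts[event["host"]] += 1
    return counts


if __name__ == "__main__":
    matches = hosts_matching(CLIENT_EVENTS, "custom fact", "puppetd")
    print(len(matches), "host(s) affected:", dict(matches))
```

The number of keys in the result corresponds to the count the “host” dropdown shows, and the per-host totals are what you would graph.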
This is an example of a problem I already knew was there and that I’m working on, but the basic principle holds. The trick is going to be reducing the amount of noise going into the index in the first place, but when you can’t do that you can still eliminate it from the results.
I’m really looking forward to getting more stuff feeding into Splunk. It looks like it’ll be excellent for finding problems while ignoring noise.



Kind of OT, but have you had any luck getting Splunk to work on Konqueror? I’ve moved over to Gutsy and would really prefer to start using Konqueror rather than Firefox. The JavaScript seems to be generating quite a few errors, on both Splunk 2.1.1 and 3.1.2.
Haven’t tried it in Konqueror — when I found it didn’t work in Opera I figured Konq was completely out of the question and didn’t even try.