Webalizer is a web log analysis tool licensed under the GPL. It’s written in C and it’s very fast at processing access log files. Configuring Webalizer is easy, especially if your Apache web server combines all access logs into one log file, but I’ve noticed that many people have trouble configuring it for multiple websites (virtualhosts). There are a couple of ways to solve this, but I’ll explain only one which is, in my opinion, among the easiest and applicable in most scenarios. Although I’m using CentOS 6, this tutorial isn’t CentOS-specific. If you’re using another distro you’ll notice some differences (package manager, file locations), but you’ll get there in the end.
The story
Let’s imagine you have Apache serving a couple of websites on the same server. Each website has its own virtualhost and its own access log. Let’s suppose that the access logs are rotated on a daily basis and that Webalizer’s stats should be generated once a day. Oh, and let’s say that you hate the thought of having gaps in the stats because of access log rotation.
Getting things done
Right, since Webalizer is available in the CentOS base repo, you can install it with:
# yum install webalizer
By default, Webalizer’s configuration file is located in /etc/webalizer.conf and its daily cron job is located in /etc/cron.daily/00webalizer. In this case we won’t be needing the cron job, so you can delete it right away. On the other hand, we’ll need the configuration file, but just as a template. To make things easier and more organized I suggest that you create a new directory where you’ll put multiple Webalizer configuration files - one for each website/virtualhost.
# mkdir /etc/webalizer
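Next, we’ll create a configuration file for every website. Give each file a .conf extension, because the script further down only picks up /etc/webalizer/*.conf. For example:
# cp /etc/webalizer.conf /etc/webalizer/ws1.example.com.conf
# cp /etc/webalizer.conf /etc/webalizer/ws2.example.com.conf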
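If you have more than a handful of websites, the copy step can be scripted. What follows is only a minimal sketch, not part of the original setup: make_site_conf is a hypothetical helper name, the log and history paths follow the naming used in this tutorial, and putting each site's output in its own subdirectory of /var/www/stats is my assumption. It copies the template without any active per-site options and appends fresh values for the five options covered in the next step.

```shell
#!/bin/bash
# Sketch: build one Webalizer config per website from a template.
# make_site_conf is a hypothetical helper, not something Webalizer ships.

# make_site_conf TEMPLATE CONF_DIR SITE [STATS_ROOT]
make_site_conf() {
    local template="$1" conf_dir="$2" site="$3"
    local stats_root="${4:-/var/www/stats}"
    local conf="$conf_dir/$site.conf"

    # Copy the template, leaving out any active per-site options.
    # "|| true" keeps a template with nothing else in it from being an error.
    grep -Ev '^(LogFile|OutputDir|HistoryName|IncrementalName|HostName)[[:space:]]' \
        "$template" > "$conf" || true

    # Append the per-site values. Paths follow this tutorial's naming;
    # the per-site OutputDir subdirectory is an assumption.
    printf '%-16s%s\n' \
        LogFile         "/var/log/httpd/$site-access_log" \
        OutputDir       "$stats_root/$site" \
        HistoryName     "/var/lib/webalizer/$site.hist" \
        IncrementalName "/var/lib/webalizer/$site.current" \
        HostName        "$site" \
        >> "$conf"
}

# Example (run as root):
# for site in ws1.example.com ws2.example.com; do
#     make_site_conf /etc/webalizer.conf /etc/webalizer "$site"
# done
```

The helper only writes the config file. The output and history directories still have to exist before Webalizer runs.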
In the configuration files you just created you can tweak a lot of interesting settings (which graphs to generate, how many child processes Webalizer should use for DNS resolving, etc.), but to get everything into a working state you must set 5 main options:
# apache access log
LogFile /var/log/httpd/ws1.example.com-access_log
# document root for stats pages
OutputDir /var/www/stats
# history file
HistoryName /var/lib/webalizer/ws1.example.com.hist
# file for saving incremental data
IncrementalName /var/lib/webalizer/ws1.example.com.current
# website's FQDN
HostName ws1.example.com
I strongly recommend that you double-check these settings in all configuration files and make sure that they differ from one file to the next, so that stats data from different websites doesn’t get mixed up.
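The double-check can be automated. The following is a rough sketch rather than a definitive tool: find_shared_values is a hypothetical name, it assumes the configs live in one directory with a .conf extension, and it only compares the first word of each value, so paths containing spaces would need more care. It prints every option/value pair that appears in more than one file and returns a non-zero status if it finds any.

```shell
#!/bin/bash
# Sketch: report per-site options whose value is shared by several configs.
# find_shared_values is a hypothetical helper name.

# find_shared_values CONF_DIR
find_shared_values() {
    local conf_dir="$1"
    awk '
        # Only active lines match: commented-out options start with "#".
        $1 ~ /^(LogFile|OutputDir|HistoryName|IncrementalName|HostName)$/ {
            key = $1 " " $2
            count[key]++
            files[key] = files[key] " " FILENAME
        }
        END {
            for (key in count)
                if (count[key] > 1) { print key ":" files[key]; dup = 1 }
            exit dup
        }
    ' "$conf_dir"/*.conf
}

# Example:
# find_shared_values /etc/webalizer || echo "Some settings are shared!"
```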
Gimme the stats!
You’re probably wondering how we are going to generate stats if we got rid of the default cron job. Well, we’ll be using logrotate, which will execute a simple bash script for generating stats just before log rotation. This is very convenient because you don’t have to worry about the missing chunks of data that occur when logs are rotated before Webalizer gets hold of them.
If you don’t have logrotate installed, you can simply install it with:
# yum install logrotate
If you configured Apache to save each website’s access log in the /var/log/httpd/ folder, you can put the following logrotate configuration in /etc/logrotate.d/httpd:
/var/log/httpd/*log {
    daily
    missingok
    rotate 4
    compress
    delaycompress
    notifempty
    sharedscripts
    prerotate
        /root/scripts/webalizer
    endscript
    postrotate
        /sbin/service httpd reload > /dev/null 2>/dev/null || true
    endscript
}
As you can see, logs are rotated on a daily basis. Rotated logs are compressed and kept for 4 days. The prerotate block calls an external script right before, well, log rotation, and the script looks like this:
#!/bin/bash
lockfile="/tmp/webalizer.lock"
# bail out if lock file still exists
if [ -f "$lockfile" ]; then
    echo "Lock file exists! Webalizer may still be crunching numbers!"
    exit 1
fi
# write the lock file and delete it whenever the script exits
date +"%d.%m.%Y - %H:%M" > "$lockfile"
trap 'rm -f "$lockfile"' EXIT
echo "-------------------------------------"
echo "[$(date +"%d.%m.%Y - %H:%M")] Generating stats..."
echo -e "-------------------------------------\n"
# go through config files and generate stats
for i in /etc/webalizer/*.conf; do
    webalizer -c "$i"
done
echo -e "\n-------------------------------------"
echo "[$(date +"%d.%m.%Y - %H:%M")] Finished"
echo -e "-------------------------------------\n"
exit 0
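One thing to be aware of: checking for the lock file and then writing it are two separate steps, so two runs started at the same moment could both get past the check. If that worries you, here is a minimal sketch of a tighter variant. It swaps in a different technique from the script above, namely bash's noclobber option, and acquire_lock and release_lock are hypothetical helper names.

```shell
#!/bin/bash
# Sketch: create the lock file atomically using bash's noclobber option.
# acquire_lock / release_lock are hypothetical helper names.

# acquire_lock LOCKFILE
# Succeeds only if the lock file did not exist and was created by this call.
acquire_lock() {
    local lockfile="$1"
    # With noclobber set, ">" refuses to overwrite an existing file, so the
    # existence check and the creation happen in one step. The subshell
    # keeps the option from leaking into the rest of the script.
    ( set -o noclobber; date +"%d.%m.%Y - %H:%M" > "$lockfile" ) 2> /dev/null
}

# release_lock LOCKFILE
release_lock() {
    rm -f "$1"
}

# Example:
# if acquire_lock /tmp/webalizer.lock; then
#     trap 'release_lock /tmp/webalizer.lock' EXIT
#     # ... generate stats here ...
# else
#     echo "Lock file exists! Webalizer may still be crunching numbers!"
#     exit 1
# fi
```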
If you are using the above logrotate configuration, you should put the bash script in /root/scripts/webalizer and make it executable with:
# chmod 700 /root/scripts/webalizer
Right, now you’re ready to go. If you don’t want to wait for log rotation to see the results, you can always execute the bash script manually to update the stats.
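When you only want to refresh one or two websites by hand, running the whole script is more than you need. Here is a small sketch for that case, under a few assumptions: update_stats is a hypothetical helper, WEBALIZER_BIN is a made-up override variable (handy for trying things out without touching real stats), and the configs are named SITE.conf as in this tutorial. Note that it does not use the lock file, so don't run it while logrotate is at work.

```shell
#!/bin/bash
# Sketch: update stats for selected websites only.
# update_stats and WEBALIZER_BIN are hypothetical names for this example.

# update_stats CONF_DIR SITE...
# Returns non-zero if any site has no config or Webalizer fails for it.
update_stats() {
    local conf_dir="$1"; shift
    local bin="${WEBALIZER_BIN:-webalizer}"
    local site conf failed=0

    for site in "$@"; do
        conf="$conf_dir/$site.conf"
        if [ ! -f "$conf" ]; then
            echo "No config for $site ($conf)" >&2
            failed=1
            continue
        fi
        # Keep going with the other sites even if this one fails.
        "$bin" -c "$conf" || failed=1
    done
    return "$failed"
}

# Example:
# update_stats /etc/webalizer ws1.example.com ws2.example.com
```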