Monit is a daemon that monitors processes running on a Linux box, and can restart them if:
- They died for any reason (e.g., they crashed, or they were done doing whatever it is they do).
- They use too much CPU or memory.
We use it in AppScale for exactly these reasons, for nearly every daemon AppScale relies on, including the App Engine apps that users upload.
The official Monit documentation is pretty thorough, but it doesn’t show in a minimal way how to write a Monit config file that revives processes that die or take up too much memory. With that in mind, here’s how to do exactly that!
A Monit config file has four parts:
1. The command used to start your service.
2. The command used to stop your service.
3. The maximum amount of CPU or memory allowed for your service.
4. How to see if your service is running.
For (1) and (2), the command has to be fully qualified. For example, you can’t say `python /home/cgb/blah.py`; you have to say `/usr/bin/python /home/cgb/blah.py`. For (3), you can say either that the main process itself is limited to a certain amount of CPU and memory, or that the process and anything it forks are limited to a certain amount of CPU and memory. In AppScale-land, we always want the latter, so we say `totalmem` instead of `mem`. Finally, for (4), you specify a command that Monit should look for to see if the process is actually running. If your process doesn’t fork, then this can be the same as (1). If it does fork, then you can run `monit procmatch (.*)` once your process is running to see what Monit can see, and use that as what Monit calls the “match command”.
Let’s look at an example, with what we use as a monit template file in AppScale:
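As a minimal sketch (the curly-brace placeholder names are hypothetical, not necessarily the ones AppScale uses), such a template might look like this, with one line for each of the four parts:

```
# Hypothetical template: each {placeholder} is an illustrative name that
# gets filled in before Monit reads the file.
check process {name} matching "{match_command}"
  start program = "{start_command}"
  stop program = "{stop_command}"
  # totalmem counts the process plus anything it forks; mem counts only the process.
  if totalmem > {max_memory} MB then restart
```

The start and stop commands substituted in must use full paths, as described above.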
So let’s use this template to see how we monitor a service like ZooKeeper. We start ZooKeeper by running `service zookeeper start` and stop it by running `service zookeeper stop`. However, it forks into a different process to actually run ZooKeeper, so we can’t tell that it’s running by checking whether `service zookeeper start` is running. I ran `monit procmatch (.*)` and saw that `zookeeper.jar` showed up in the ZooKeeper process, so we can use that as our match command. Our Monit config file for ZooKeeper therefore looks like the following:
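Here is a sketch of that file, pieced together from the values discussed in this post rather than copied from AppScale. The name `zookeeper` after `check process` is an assumption:

```
# Sketch only: the process name "zookeeper" is assumed for illustration.
check process zookeeper matching "zookeeper.jar"
  start program = "/usr/sbin/service zookeeper start"
  stop program = "/usr/sbin/service zookeeper stop"
  # Restart if ZooKeeper and its children together exceed 250 MB.
  if totalmem > 250 MB then restart
```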
Notice that we didn’t just say `service zookeeper start`, because that doesn’t work: Monit needs the full path to the command. I ran `which service`, which on our Ubuntu Precise virtual machines returned `/usr/sbin/service`, to find out where the service command was installed. I also picked the 250 MB value somewhat arbitrarily, so adjust it as you need to for your system. Finally, I indicated `totalmem` here just in case ZooKeeper forks off other processes whose memory we want to track as well.
That’s a lightning-fast intro on using monit, and how we use monit to monitor processes in AppScale. Enjoy!