I’ve learned a bit about Java and also about Linux this past weekend. I finally found out what had been keeping me from deploying my web application: I kept getting OOM errors whenever I started a second Java process on my virtual server. Even Stack Overflow users, thorough and diligent as they are, could not provide a solution. (In the end it turned out that one of them had given me the right information; he just could not know that I was unable to apply it because of how the server was configured.)
Typically, with past projects, the problem was that Java did not reserve enough memory for the application, so we would try to give it a larger heap. You do this with the -Xmx and -Xms options, the two most commonly used VM options:
- -Xmx sets the maximum heap size, but it does not limit the amount of memory the VM may use
- -Xms sets the initial heap size
So you might want to start your VM with these Java options:
export JAVA_OPTS="-Xmx64M -Xms64M"
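To check which heap settings a JVM actually picked up, a tiny class like the one below can help. It is only a sketch: the class name HeapSettings is made up for this post, and it relies on nothing but the standard Runtime and RuntimeMXBean APIs. Note that maxMemory() tracks -Xmx but is often reported slightly lower.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Prints the heap limits this JVM actually ended up with (hypothetical helper). */
public class HeapSettings {

    /** Converts a byte count to whole mebibytes, rounding down. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Upper bound the heap may grow to; follows -Xmx.
        System.out.println("max heap:       " + toMiB(rt.maxMemory()) + " MiB");
        // Heap committed right now; starts near -Xms and grows toward the maximum.
        System.out.println("committed heap: " + toMiB(rt.totalMemory()) + " MiB");
        // The options the JVM really received, to rule out typos in the start script.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments:  " + jvmArgs);
    }
}
```

Run it as java -Xmx64M -Xms64M HeapSettings. A plain java launch ignores JAVA_OPTS; that variable is picked up by start scripts such as Tomcat's catalina.sh.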
This time, however, I had a new problem: I did not seem to have enough memory on my virtual server. I had been sure that 2GB would be enough to run a small Java web application with a few users for my first alpha test (and it is, though not as smoothly as I would like).
With -Xmx and -Xms you cannot limit the VM as a whole. When starting Tomcat, the Sun VM would immediately grab 300MB of physical memory. After some searching I found another helpful parameter; some of you might remember it from trying to keep Eclipse from devouring your development system:
I needed it because the other options only limit the heap size (the memory your programs use), while this one limits the VM itself. After adding it to my JAVA_OPTS, the VM finally grabbed only about 190MB (the sum of heap size and perm size).
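To see that the heap is only part of what the VM holds, the standard MemoryMXBean reports heap and non-heap usage separately. This is a minimal sketch with a hypothetical class name; it only reads the numbers the JVM already exposes.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

/** Contrasts heap usage (bounded by -Xmx) with the JVM's non-heap memory. */
public class HeapVsNonHeap {

    /** Formats the used and committed sizes of one memory area in MiB. */
    static String describe(String label, MemoryUsage usage) {
        return String.format("%-9s used=%d MiB committed=%d MiB",
                label, usage.getUsed() >> 20, usage.getCommitted() >> 20);
    }

    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        // Objects the application allocates; capped by -Xmx.
        System.out.println(describe("heap", mem.getHeapMemoryUsage()));
        // Class metadata (PermGen on older VMs, Metaspace on Java 8+) and the code cache.
        System.out.println(describe("non-heap", mem.getNonHeapMemoryUsage()));
        // Thread stacks, GC bookkeeping and other native allocations appear in
        // neither figure, yet they still count toward the process's footprint.
    }
}
```

Starting it with the same options as Tomcat and comparing the output with the resident size that top shows makes the gap between the -Xmx value and the real footprint visible.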
I was, however, still having trouble running a second Java process. Not even java -version would run without throwing my favorite error:
java.lang.OutOfMemoryError: Cannot create GC thread. Out of system resources.
That message should have told me. I am writing this article because searching for that phrase did not turn up a solution. It took me many hours of debugging to understand what was wrong and to finally accept the correct answer on Stack Overflow.
I was confused because top told me there was at least 1GB of memory available. (On a virtual server that is not necessarily true, I guess, but since the problem appeared consistently I am fairly sure the physical memory really was available.) Even when Java was down to less than 200MB and more than 1GB was free, I kept getting OOM errors. (By the way, in my tests OpenJDK used a lot less memory than the Sun JDK.) Eventually I managed to produce OOM errors even from simple shell commands, and that is what led me to the right solution. To me, OOM meant “out of (physical) memory”, so I never really read the actual error message: “Cannot create GC thread”. But memory is not the only resource that can run out: threads can too, and the JVM reports that as an OutOfMemoryError as well.
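The thread side of this can be reproduced from Java itself. The sketch below (class name and default cap are made up for illustration) starts idle threads up to a cap and catches the OutOfMemoryError that the JVM throws when the operating system refuses to create another thread. On Linux every Java thread is a kernel task, so each one counts against the per-user or per-container process limit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

/**
 * Starts up to 'limit' idle threads and reports how many the OS allowed.
 * Keep the limit small on a shared server: every thread is an OS task.
 */
public class ThreadProbe {

    static int startThreads(int limit) {
        CountDownLatch release = new CountDownLatch(1);
        List<Thread> started = new ArrayList<>();
        try {
            for (int i = 0; i < limit; i++) {
                Thread t = new Thread(() -> {
                    try {
                        release.await(); // park until the probe is done
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                t.setDaemon(true);
                t.start(); // needs a new native thread; may fail with OutOfMemoryError
                started.add(t);
            }
        } catch (OutOfMemoryError e) {
            // The heap is not the problem here: the OS refused another thread.
            System.out.println("stopped after " + started.size() + " threads: " + e.getMessage());
        } finally {
            release.countDown(); // let all probe threads finish
            for (Thread t : started) {
                try {
                    t.join();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return started.size();
    }

    public static void main(String[] args) {
        int limit = args.length > 0 ? Integer.parseInt(args[0]) : 20;
        int count = startThreads(limit);
        System.out.println("started " + count + " of " + limit + " threads");
    }
}
```

When it hits the wall, the error message mentions native threads rather than heap space. Be careful with the cap: on a container limited to 96 processes, a probe that grabs the remaining slots will make ssh and cron fail too until it exits.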
Most systems have a neat little command that shows you a lot about your system’s resource configuration:
ulimit -a
On my system, ulimit told me that the number of user processes (which on Linux also counts threads) was unlimited. That is a lie: the virtualization software enforces a hard limit from outside the container, and you cannot necessarily see it from inside. In my case I had access to the “Parallels Power Panel” and finally saw under Resources -> Memory -> Primary System Parameters that the value numproc had a hard limit of 96. If you have a real server, the answers on Stack Overflow should give you helpful hints on how to fix this; if you have a virtual server, you can call your hosting provider and ask whether 96 is a mistake, because it is a ridiculously low value. (I also should have noticed the Resource Alerts for numproc, but I didn’t … as I said, I learned a lot this past weekend.)
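On OpenVZ-based containers (Parallels Virtuozzo is, as far as I know, the commercial counterpart of OpenVZ) the limits shown in the panel are also exposed in /proc/user_beancounters, which usually only root can read. The sketch below assumes the usual layout of that file, where a resource name is followed by the held, maxheld, barrier, limit and failcnt columns; the class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

/** Reads the container's numproc limit from the OpenVZ beancounters file (assumed layout). */
public class NumprocCheck {

    /** Returns {held, maxheld, barrier, limit, failcnt} for numproc, or null if absent. */
    static long[] numproc(List<String> lines) {
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            for (int i = 0; i + 5 < f.length; i++) {
                if (f[i].equals("numproc")) {
                    long[] v = new long[5];
                    for (int j = 0; j < 5; j++) {
                        v[j] = Long.parseLong(f[i + 1 + j]);
                    }
                    return v;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get("/proc/user_beancounters");
        if (!Files.isReadable(file)) {
            System.out.println(file + " is not readable (no OpenVZ container, or not root)");
            return;
        }
        long[] v = numproc(Files.readAllLines(file));
        if (v == null) {
            System.out.println("no numproc entry found");
            return;
        }
        System.out.printf("numproc: held=%d maxheld=%d barrier=%d limit=%d failcnt=%d%n",
                v[0], v[1], v[2], v[3], v[4]);
        // A non-zero failcnt means process/thread creation has already been refused.
        if (v[4] > 0) {
            System.out.println("the numproc limit has been hit " + v[4] + " times");
        }
    }
}
```

A growing failcnt is the same signal as the Resource Alerts in the panel, only visible from inside the server.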
Run top and toggle the thread view with Shift+H to see how many threads each of your applications uses. Here’s a quick overview of what was going on on my system:
- named ~10 threads
- mysql ~10 threads
- apache ~10 threads
- tomcat/java ~30 threads
- misc resource eaters: cron, sshd, init, syslogd
- each SSH login, plus a subsequent sudo su, uses up more processes
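The same per-application overview can be collected from /proc, since on Linux each /proc/&lt;pid&gt;/status file has a Name: and a Threads: line. This is a rough sketch with a hypothetical class name; it only sees processes visible to the current user, and inside a container the total should roughly match what numproc counts, though I have not verified that the two are computed identically.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.AbstractMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sums the threads of all visible processes by command name, like top's thread view. */
public class ThreadCensus {

    /** Extracts "Name:" and "Threads:" from a /proc/<pid>/status file, or returns null. */
    static Map.Entry<String, Integer> parseStatus(List<String> lines) {
        String name = null;
        Integer threads = null;
        for (String line : lines) {
            if (line.startsWith("Name:")) {
                name = line.substring(5).trim();
            } else if (line.startsWith("Threads:")) {
                threads = Integer.parseInt(line.substring(8).trim());
            }
        }
        return (name == null || threads == null) ? null : new AbstractMap.SimpleEntry<>(name, threads);
    }

    public static void main(String[] args) throws IOException {
        Map<String, Integer> perName = new TreeMap<>();
        int total = 0;
        // Every numeric directory under /proc is one process.
        try (DirectoryStream<Path> procs = Files.newDirectoryStream(Paths.get("/proc"), "[0-9]*")) {
            for (Path pid : procs) {
                List<String> lines;
                try {
                    lines = Files.readAllLines(pid.resolve("status"));
                } catch (IOException ex) {
                    continue; // the process exited while we were looking
                }
                Map.Entry<String, Integer> entry = parseStatus(lines);
                if (entry == null) {
                    continue;
                }
                perName.merge(entry.getKey(), entry.getValue(), Integer::sum);
                total += entry.getValue();
            }
        }
        perName.forEach((n, t) -> System.out.printf("%-16s %4d%n", n, t));
        System.out.println("total threads: " + total);
    }
}
```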
So the more I debugged, the worse my problems got. When I am not logged in and have just started my services, my system uses about 50-60 threads doing nothing. There is no traffic to speak of, as I am not hosting anything yet.
The server cannot even run Jenkins, because Java seems to need an awful lot of threads when starting up, and Jenkins of course needs to run mvn or ant to build, which starts a second Java VM: something that is really expensive here, as I now know.
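How many threads a JVM holds before doing any real work can be checked with the standard ThreadMXBean. The sketch below (hypothetical class name) lists only the threads visible to Java code; GC workers and other native VM threads do not appear, so top’s thread view will show a higher number for the same process. HotSpot options such as -XX:+UseSerialGC (a single-threaded collector) or -XX:ParallelGCThreads=N are what I would try first to bring that number down; treat them as candidates to test, not as settings verified on this server.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Lists the Java-visible threads a freshly started JVM is already running. */
public class JvmThreads {

    /** Number of live Java threads, daemon and non-daemon. */
    static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Typical entries: main, Finalizer, Reference Handler, Signal Dispatcher, ...
        // Native VM threads such as GC workers are not Java threads and are not listed.
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            System.out.println(info.getThreadName());
        }
        System.out.println("live Java threads: " + liveThreads());
    }
}
```

Running it once with default options and once with -XX:+UseSerialGC, while watching the process in top with Shift+H, shows how much each option actually saves.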
Most of the programs above could probably be configured to use fewer threads on a small test server like this. That is what I will do for now, and then I’ll think about different hosting options.
End of story: This morning I called support and asked whether the value for numproc might be an error, since I could not find any small print that would have warned me before ordering such a crippled (but admittedly rather cheap) server. Alas, I was told that this was as it should be, that the server was meant for testing, and that I should probably upgrade. I totally understand that companies have to make money. You need a low barrier to entry to get people to sign up for something cheap, and then get them to upgrade to more expensive servers quickly. Well done in that respect! Most people will never find out what it is that is not working. However, this ridiculously low limit has cost my trust that their other virtual servers would be adequate. I am not well-versed enough in server hosting to know in advance every limit I would need to check before ordering one of the more powerful servers, and after running a few calculations, AWS might end up cheaper (ulimit -a tells me they allow 1024 threads on the micro instance that is free for the first year). Another alternative might be a root server over which I have full control.