Yenya's World

Fri, 12 Jan 2007

Graceful Reload

Yesterday we tried to solve a problem we had probably had for a month or so: we had been observing very high load spikes on our application cluster servers. There were usually only a few such spikes a day, and they usually did not occur on all servers simultaneously. I think the problem had lasted since we moved to the new system (Apache 2.2 based, native x86_64). Here is a load graph (the problem was solved around 5:30pm):

[Figure: Apache load graph]

Mirek found that during these load peaks there was an extraordinary number of Apache processes serving our title page (which is quite computationally intensive, but rarely requested on such a massive scale). So we suspected that somebody was DDoSing us. But according to the Apache status page, the clients came from the address 127.0.0.1.

I don't know of any case where our application would want to access our title page over HTTP (we do make some self-referencing requests, for example for WAP access, but none for the title page). After increasing the server log level we found that these requests had a strange User-Agent value: "internal dummy connection". A quick search for this string gave us the answer:

During a "graceful reload", the main Apache process apparently contacts its children not by sending them the SIGUSR1 signal, as in previous Apache releases, but by sending them a dummy "GET /" request, so that after serving it they can check the configuration, find out that it has changed, and terminate themselves.

So every time we changed something in our applications (which happens several times a day), there were many Apache processes trying to serve our dynamic title page to Apache itself. Because there are some other (service-only) Apache processes as well, the load spike was sometimes far bigger than an ordinary remote DDoS attack could cause. A mod_rewrite hack in the server configuration solved the problem. We now rewrite such dummy requests to /robots.txt instead of the dynamic title page:

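<Directory /documentroot>
        RewriteEngine on
        # Match only Apache's own wake-up requests, identified by their User-Agent
        RewriteCond %{HTTP_USER_AGENT} internal\ dummy\ connection
        # In per-directory context the leading path is stripped, so "/" is matched by ^$
        RewriteRule ^$ /robots.txt [L]
        ...
</Directory>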
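
The rule above relies on per-directory context, where mod_rewrite strips the directory prefix before matching. As a rough, untested sketch of how the same idea might look in the main server or a <VirtualHost> section: there the pattern is matched against the full URL path, so it becomes ^/$ rather than ^$. The extra REMOTE_ADDR condition is an assumption added here for safety, not part of the setup described in this post; it restricts the rule to loopback clients so that a remote client faking the User-Agent still gets the real title page.

```apache
# Sketch only: server or virtual host context instead of <Directory>.
RewriteEngine on
# Apache's internal wake-up requests identify themselves by this User-Agent.
RewriteCond %{HTTP_USER_AGENT} internal\ dummy\ connection
# Assumption: the dummy requests arrive over loopback (IPv4 or IPv6).
RewriteCond %{REMOTE_ADDR} ^(127\.0\.0\.1|::1)$
# In server context the URL path keeps its leading slash.
RewriteRule ^/$ /robots.txt [L]
```

Whether the loopback check is appropriate depends on which addresses the server listens on, so treat it as optional.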

If you ask me, this is a pretty lame way for a server to restart itself. The URL in the internal request is not even configurable (what would Apache do if it were not configured to listen on 127.0.0.1 at all?), and from my searches it looks like we are not the first to run into this problem.

Section: /computers

4 replies for this story:

Peter Kruty wrote: What's wrong with SIGUSR1?

Sounds really stupid. I wonder what's wrong with SIGUSR1.

Spes wrote: Re: What's wrong with SIGUSR1?

Maybe to have the same code for all systems, because not all support signals?

mutante wrote: Apache Wiki Page on Internal Dummy Connection

John Gillespie wrote:

Thanks for the info, I've been wondering what all those lines in my logs were about...


About:

Yenya's World: Linux and beyond - Yenya's blog.

Jan "Yenya" Kasprzak