[Nagiosplug-help] Tracking down pthread/check_dns problem on CentOS4 w/ 1.4.2 plugins.
John P. Rouillard
rouilj at cs.umb.edu
Tue Nov 29 08:41:06 CET 2005
In message <D91212FE-417A-4B6E-BD1C-6652FC50788B at altinity.com>,
Ton Voon writes:
>On 28 Nov 2005, at 17:26, John P. Rouillard wrote:
>> After correcting the configure script (I deleted the closing $ anchor)
>> so that the test runs, I see it calling make to run "config_test/run_tests
>> 10". If I run run_tests with an argument of 1000, I get Success=993
>> Fail=7. With "run_tests 10", I get a successful completion better than
>> 80% of the time, which leaves REDHAT_SPOPEN_ERROR undefined.
>
>Are you saying that if you run it 10 times, it is 100% successful?
If I run "run_tests 10" 10 times, on average 2 of the 10 runs fail, but
I have had 15 error-free runs in a row. I am just guessing, but it may be
load related: if I pause between runs, a failure seems less likely.
However, I never had a run of 1000 pass.
>I'm happy with increasing the number of iterations if it catches the
>problem more of the time.
While 1000 may be overkill, I am detecting the failure about 50% of the
time when running it in a while loop. The 10-iteration version detects
it less often. I didn't try 100 or 500.
However, I did a bit more testing, and the results aren't reliable. I have
had 20 runs of "run_tests 10" fail in a row and 20 pass in a row. As
the number passed to run_tests goes up, I get fewer passes, but there is
no definite way of determining whether the problem exists. E.g., running
"run_tests 500" 20 times gave the following distribution (number of runs,
then the result):
1 Success=372 Fail=128
1 Success=400 Fail=100
2 Success=496 Fail=4
1 Success=498 Fail=2
1 Success=499 Fail=1
14 Success=500 Fail=0
70% success (14 of 20 runs with no failures). For "run_tests 10", I get:
19 Success=10 Fail=0
1 Success=7 Fail=3
95% success or
2 Success=10 Fail=0
5 Success=5 Fail=5
3 Success=6 Fail=4
4 Success=7 Fail=3
6 Success=8 Fail=2
10% success or
5 Success=5 Fail=5
4 Success=6 Fail=4
4 Success=7 Fail=3
5 Success=8 Fail=2
2 Success=9 Fail=1
0% success.
For a count of 1000 I got:
5 Success=1000 Fail=0
1 Success=780 Fail=220
1 Success=986 Fail=14
1 Success=990 Fail=10
1 Success=995 Fail=5
2 Success=996 Fail=4
6 Success=997 Fail=3
3 Success=999 Fail=1
25% success or
9 Success=1000 Fail=0
1 Success=833 Fail=167
1 Success=944 Fail=56
1 Success=990 Fail=10
1 Success=996 Fail=4
1 Success=997 Fail=3
2 Success=998 Fail=2
4 Success=999 Fail=1
45% success.
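Each "% success" figure above is the share of run_tests invocations that reported Fail=0 (e.g. 19 of 20 runs gives 95%). A minimal Python sketch of that tally, assuming the results were collected as `uniq -c`-style lines like the ones above; the helper name `clean_run_fraction` is hypothetical:

```python
import re

# Assumed input format: "<count> Success=<s> Fail=<f>" per line, i.e. how
# many run_tests invocations produced each Success/Fail result.
LINE_RE = re.compile(r"^\s*(\d+)\s+Success=(\d+)\s+Fail=(\d+)\s*$")


def clean_run_fraction(tally: str) -> float:
    """Return the fraction of run_tests invocations that reported Fail=0."""
    total = 0
    clean = 0
    for line in tally.strip().splitlines():
        m = LINE_RE.match(line)
        if m is None:
            raise ValueError(f"unrecognised line: {line!r}")
        count, _success, fail = (int(g) for g in m.groups())
        total += count
        if fail == 0:
            clean += count
    return clean / total


# First "run_tests 10" batch from above.
batch = """
19 Success=10 Fail=0
1 Success=7 Fail=3
"""
print(f"{clean_run_fraction(batch):.0%}")
```

Only whole invocations with zero failures count as a success here, matching how the percentages above treat a run.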
I'm not sure whether the data is of any use, but more iterations seem to be better.
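If each iteration failed independently with the same probability p, a run of N iterations would catch at least one failure with probability 1 - (1 - p)^N, so a larger count should help. A small Python sketch of that model; p = 0.01 is a made-up value for illustration, and `detection_probability` is a hypothetical helper:

```python
def detection_probability(p_fail: float, iterations: int) -> float:
    """P(at least one failure) in `iterations` independent tries,
    each failing with probability `p_fail` (an assumed model)."""
    return 1.0 - (1.0 - p_fail) ** iterations


# Hypothetical per-iteration failure rate, not measured from the data above.
P_FAIL = 0.01
for n in (10, 100, 500, 1000):
    print(n, f"{detection_probability(P_FAIL, n):.3f}")
```

The streaks reported above (20 failing runs in a row, then 20 passing) suggest the failures are not independent, possibly load related, so this model probably overstates how reliably a given count detects the problem.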
>> Ton Voon said:
>>> Alternatively, Sascha Runschke has been working with Red Hat and it
>>> has been fixed in hotfix-kernel-2.6.9-22.12.EL, which you can
>>> probably request from them through your support contract.
>>
>> I think I am seeing this problem in a java based application as
>> well. Searching through Red Hat's Bugzilla hasn't led me to the ticket
>> for this fix, does anybody have the kernel patch or a ticket ID so I
>> can see the actual problem and try to fix/verify it, or send it to the
>> CentOS folks for inclusion in a release/patch?
>
>What is the best way to specify what the fix from Red Hat is? I will
>update the configure.in comments to reflect.
I would suggest the Bugzilla ID, assuming the bug ticket is publicly
accessible. A link to the kernel patch wouldn't hurt either.
-- rouilj
John Rouillard
===========================================================================
My employers don't acknowledge my existence much less my opinions.