[ngw] Antw: Re: GroupWise POA Memory Usage
Uwe.Barth at stadt-chemnitz.de
Thu Mar 28 13:01:35 UTC 2019
I think it would be interesting to know how big the affected GroupWise systems are and how old the users/data are (meaning how many updates the databases have had). Perhaps the problem doesn't occur on "newer" and "smaller" systems.
>>> "Anthony Harper" <aharper at psc.ac.uk> 28.03.2019 13:55 >>>
Thanks for the sanity check! I could also restart the agents weekly to control the memory usage, but the crashes also come seemingly randomly during production hours.
I know of another couple of GW sites that have the memory issue, and one with memory and crashing issues. One is moving off GW.
You'd think that, with all the cores they receive, a solution could be found. Perhaps it's worth combining our thoughts and talking to management?
>>> Hinchman Consulting <gregg at hinchmanconsulting.com> 28/03/2019 12:32 >>>
So this has been an issue for one of my customers for years now. We moved to SLES 12 and upgraded to 18 within the last 4 months, and within a week it started again.
He runs top and sets a cron to stop/start the agents once a week.
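For anyone who wants to script the memory check rather than watch top by hand, here is a minimal Python sketch. It assumes a Linux server where /proc/<pid>/status and /proc/meminfo are readable. The function names and the 80% threshold are illustrative only and do not come from GroupWise or from anyone's actual setup, and the restart itself is deliberately left out because the right command depends on how the agents are started on your server.

```python
import re


def parse_kib(text, field):
    """Return the value in kB of a 'Field:   1234 kB' line, or None if absent."""
    match = re.search(rf"^{re.escape(field)}:\s+(\d+)\s+kB", text, re.MULTILINE)
    return int(match.group(1)) if match else None


def rss_percent(status_text, meminfo_text):
    """Resident memory of one process as a percentage of total RAM."""
    rss = parse_kib(status_text, "VmRSS")
    total = parse_kib(meminfo_text, "MemTotal")
    if rss is None or not total:
        return None
    return 100.0 * rss / total


def should_restart(percent, threshold=80.0):
    """Hypothetical policy: flag the process once it passes the threshold."""
    return percent is not None and percent >= threshold


def sample(pid):
    """Read the live /proc files for one PID and return its memory percentage."""
    with open(f"/proc/{pid}/status") as status_file:
        status_text = status_file.read()
    with open("/proc/meminfo") as meminfo_file:
        meminfo_text = meminfo_file.read()
    return rss_percent(status_text, meminfo_text)
```

Calling sample() with the PID of the POA process from a cron job and logging the result would give a simple history of the growth; should_restart() could then gate whatever stop/start step you already use.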
Cores have been sent, SRs opened. End result: no fix. I am pretty sure dev cannot pin it down and fix it. It was only one of their POs, which really makes you wonder.
I can tell you I did the same move to SLES 12 and upgrade to 18 for several other customers within the last 6 months or so, and they have not had the issue, in the past or today.
This leads me to wonder if there is a feature or data in the PO that exposes it. Suffice to say, a fix is needed.
Sent from while mobile.
Gregg Hinchman, PMP
> On Mar 28, 2019, at 5:33 AM, Anthony Harper <aharper at psc.ac.uk> wrote:
> Hi Marvin,
> I also have POA crashes. What usually happens is that either the poa crashes (with a mention of libc in /var/log/messages), the oom killer steps in or I manually restart the poa out of hours. I've recently migrated GW from OES 2015 SP1 to SLES 15 (all running on top of ESX), and the memory usage issue and crashes have followed.
> I use Adrem Netcrunch to monitor our servers, and I'm tracking "% Memory Used by Processes" for the server and "% Memory Utilization" of the poa process. They track in near perfect correlation.
> I am sending cores to support (and have done for a few years). The only step I take to create the core is editing GROUPWISE_DEBUG_OPTION in the /etc/sysconfig/grpwise file to "on,fence". Do you have other steps?
>>>> "Marvin Huffaker" <mhuffaker at redjuju.com> 27/03/2019 18:14 >>>
> I've been chasing POA crashes on larger systems for a couple of years now. I haven't actually observed the behavior you're mentioning (I've only seen it after the POA crashes), but with the cores we've obtained, I'm told it's a memory corruption issue. Where/how are you monitoring the memory consumption? I could monitor my customers' problem systems for the same behavior. One of my systems is SLES 15, but it doesn't seem to matter which OS; I've seen it on SLES 11 and SLES 12 also. Are you getting cores to the support team for analysis? If so, that could help get it resolved. While it may be common, it shouldn't be considered normal. I have a list of steps to take to prepare your server for a core. It's more of a challenge on SLES 12 and 15, but you actually get better cores than on SLES 11.
>>>> "Anthony Harper" <aharper at psc.ac.uk> 3/27/2019 8:25 AM >>>
> The poa processes on my two GroupWise servers gradually increase their memory consumption until the server's oom-killer steps in and kills the poa. Is this normal behaviour?
> We're currently running 18.1.0 -132861 on SLES 15, however I've seen this behaviour for a number of years now.
> ngw mailing list
> ngw at ngwlist.com