[ngw] GroupWise POA Memory Usage
gregg at hinchmanconsulting.com
Thu Mar 28 12:32:43 UTC 2019
So this has been an issue for one of my customers for years now. We moved to SLES 12 and upgraded to GroupWise 18 within the last 4 months, and within a week it started again.
He runs top to watch it, and has a cron job to stop/start the agents once a week.
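That weekly restart is a one-line crontab entry. A sketch only: the service name "grpwise" is the stock one on SLES 12, but check what your GroupWise install actually registered:

```shell
# /etc/crontab fragment: bounce the GroupWise agents every Sunday at 03:00.
# "systemctl restart grpwise" assumes the default service name on SLES 12;
# adjust the unit name if your agents run under something else.
0 3 * * 0  root  /usr/bin/systemctl restart grpwise >/dev/null 2>&1
```

This is a workaround, not a fix; it just resets the leak's clock before the OOM killer gets involved.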
Cores have been sent and SRs opened. End result: no fix. I am pretty sure dev cannot pin it down and fix it. It affected just one of their POs only, which really makes you wonder.
I can tell you I did the same move to SLES 12 and upgrade to 18 for several other customers within the last 6 months or so, and they have not had the issue, then or now.
This leads me to wonder if there is a feature or data in the PO that exposes it. Suffice it to say, a fix is needed.
Sent while mobile.
Gregg Hinchman, PMP
> On Mar 28, 2019, at 5:33 AM, Anthony Harper <aharper at psc.ac.uk> wrote:
> Hi Marvin,
> I also have POA crashes. What usually happens is that either the POA crashes (with a mention of libc in /var/log/messages), the OOM killer steps in, or I manually restart the POA out of hours. I've recently migrated GroupWise from OES 2015 SP1 to SLES 15 (all running on top of ESX), and the memory usage issue and crashes have followed.
> I use AdRem NetCrunch to monitor our servers, and I'm tracking "% Memory Used by Processes" for the server and "% Memory Utilization" of the POA process. They track in near-perfect correlation.
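For anyone without NetCrunch, the same per-process number can be sampled straight from /proc on Linux. A minimal sketch (the VmRSS field is standard /proc/<pid>/status format; finding the POA's PID and scheduling the sampling are left to you):

```python
import re

def parse_vmrss_kb(status_text: str) -> int:
    """Extract VmRSS (resident set size, in kB) from /proc/<pid>/status text."""
    m = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if m is None:
        raise ValueError("no VmRSS line found")
    return int(m.group(1))

def rss_kb(pid: int) -> int:
    """Current resident memory of a live process, in kB."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kb(f.read())
```

Logging `rss_kb()` for the POA's PID every few minutes from cron gives the same steadily climbing curve that a monitoring product graphs.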
> I am sending cores to support (and have done for a few years). The only step I take to create the core is editing GROUPWISE_DEBUG_OPTION in /etc/sysconfig/grpwise to "on,fence" - do you have other steps?
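For reference, the edit described above amounts to one line in the sysconfig file (a sketch: the "on,fence" value is the one mentioned in this thread; everything else about the file's layout is an assumption, so check your own copy):

```shell
# /etc/sysconfig/grpwise -- turn on core/fence debugging for the agents,
# then restart the agents so the setting takes effect.
GROUPWISE_DEBUG_OPTION="on,fence"
```

On SLES 12/15 the resulting cores are often captured by systemd-coredump rather than written to the working directory; `coredumpctl list` shows them, and `coredumpctl dump` can export one to a file for attaching to an SR.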
>>>> "Marvin Huffaker" <mhuffaker at redjuju.com> 27/03/2019 18:14 >>>
> I've been chasing POA crashes on larger systems for a couple of years now. I haven't actually observed the behavior you're mentioning (I've only seen it after the POA crashes), but from the cores we've obtained, I'm told it's a memory corruption issue. Where/how are you monitoring the memory consumption? I could monitor my customers' systems for the same behavior. One of my systems is SLES 15, but it doesn't seem to matter which OS; I've seen it on SLES 11 and SLES 12 also. Are you getting cores to the support team for analysis? If so, that could help get it resolved. While it may be common, it shouldn't be considered normal. I have a list of steps to take to prepare your server for a core; it's more of a challenge on SLES 12 and 15, but you actually get better cores than on SLES 11.
>>>> "Anthony Harper" <aharper at psc.ac.uk> 3/27/2019 8:25 AM >>>
> The POA processes on my two GroupWise servers gradually increase their memory consumption until the server's oom-killer steps in and kills the POA. Is this normal behaviour?
> We're currently running 18.1.0-132861 on SLES 15; however, I've seen this behaviour for a number of years now.
> ngw mailing list
> ngw at ngwlist.com