[ngw] GroupWise POA Memory Usage

Marvin Huffaker mhuffaker at redjuju.com
Sun Mar 31 22:01:30 UTC 2019


In the cases with my customers, the crashes are usually with larger
post offices. I generally qualify a larger post office as 300-400 users
and 500+GB of storage.

One customer has 150 users on one post office with 650GB of data and
has regular crashes.
One customer has 1500 users distributed amongst 20 post offices. Their
3 biggest post offices are prone to crashing regularly; all of them are
300+GB and have 300-500 users.
One customer has 3 post offices for 1500 users. All three are evenly
distributed, with 400-500GB of data per post office. They all crash
regularly.
In addition to POA cores I also see GWADMINSERVICE cores, which are not
as critical but still annoying.

I rarely see POA crashes on smaller post offices.
The crashes have slowed down as I've been working with developers and
new FTFs have been provided; they aren't happening as frequently, but
they still happen.

When I first moved to SLES 12, I had a horrible time just configuring
GroupWise to capture a core. It took a long time to figure out since
it's not as automatic as on SLES 11, and SLES 15 has the same nuances
as SLES 12.  I was finally able to compile a list of items needed to
either try to prevent the crash or create a good core once it happens.
The list came from multiple sources and I'd still consider it a little
rough, but here it is.  I was also told that these cores provide better
info than the cores produced by default on SLES 11.  On a system where
I can expect a core, I do all of this ahead of time as a necessary
measure; I don't wait to see if it cores first and then try to do this,
I just do it.

Note:  AppArmor tends to interfere with cores, so ensure it's turned off
and disabled.  I had a customer reboot a server last week and GroupWise
crashed, but I had not disabled AppArmor, only unloaded it. So when the
server restarted, AppArmor loaded again and I failed to get a core at all.
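
On SLES 12 and 15, where AppArmor runs as a systemd service (the unit is
called apparmor on a default install, as far as I know), something like
this keeps it off across reboots instead of just unloading it for the
current session:

systemctl disable --now apparmor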


/etc/sysctl.conf
kernel.core_uses_pid = 1
kernel.core_pattern=/opt/cores/core.%p
kernel.suid_dumpable=2
 
Issue the command:

sysctl -p /etc/sysctl.conf
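
As a quick sanity check after the reload, something like this should echo
back the three values from above:

sysctl kernel.core_pattern kernel.core_uses_pid kernel.suid_dumpable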
 



/etc/sysconfig/grpwise
 
Add GROUPWISE_DEBUG_OPTION="on,fence" to /etc/sysconfig/grpwise.  (The
instructions I received were confusing, but basically you add this
directive to the referenced file. Note it's /etc/sysconfig/grpwise, not
/etc/init.d/grpwise.)

1. In the /etc/init.d/grpwise script, verify that it references the
string "$GROUPWISE_DEBUG_OPTION".
2. If it does, add this line at the end of /etc/sysconfig/grpwise:

GROUPWISE_DEBUG_OPTION="on,fence"

Note that the "GROUPWISE_" part must be all upper case.
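
Roughly the same thing from the shell, if you'd rather not use an editor
(the grep covers step 1, the echo covers step 2; double-check the file
afterward since the echo blindly appends):

grep -F '$GROUPWISE_DEBUG_OPTION' /etc/init.d/grpwise
echo 'GROUPWISE_DEBUG_OPTION="on,fence"' >> /etc/sysconfig/grpwise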


 

noelision
Create a file called /etc/ld.so.conf.d/noelision.conf. Add the
following:
/lib64/noelision
Save the file and run /sbin/ldconfig
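
Or all in one go from the shell (same end result; the ldconfig -p at the
end is just an optional check that the noelision path made it into the
cache):

echo "/lib64/noelision" > /etc/ld.so.conf.d/noelision.conf
/sbin/ldconfig
/sbin/ldconfig -p | grep noelision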


GW_MEMTST
In /etc/init.d/grpwise add the following (put it under the other export
statements):
export GW_MEMTST=on,fence
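
A quick way to confirm it ended up alongside the script's other exports
(my own check, not part of the steps I was given):

grep -n '^export' /etc/init.d/grpwise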

 

SLES12/SLES15 CORE PROCESS
The following steps should be taken to prepare for capturing a core
dump:
Disable the limit for the maximum size of a core dump file
Configure a fixed location for storing core dumps
Disable AppArmor
Enable core dumps for setuid and setgid processes
The quick step guide for this is as follows:

Run:
ulimit -c unlimited   (should already be set like this)

Run:
install -m 1777 -d /opt/cores

Run:
echo "/opt/cores/core.%e.%p" > /proc/sys/kernel/core_pattern

Run:
rcapparmor stop   (or disable AppArmor in YaST --> Services --> Disable
and Stop)
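
Once all of that is in place, a quick pre-flight check along these lines
confirms everything stuck (expected values based on the settings above):

ulimit -c                           # expect: unlimited
cat /proc/sys/kernel/core_pattern   # expect: /opt/cores/core.%e.%p (or core.%p if you used the sysctl.conf value)
ls -ld /opt/cores                   # expect: drwxrwxrwt (mode 1777)
aa-status                           # expect: no profiles loaded (if the AppArmor tools are installed)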

Hope that helps.

Marvin





Marvin Huffaker
GroupWise and OES Expert
Sophos Certified Architect
Cell: 480-797-2989
mhuffaker at redjuju.com 
https://www.redjuju.com



>>> "David Krotil" <David.Krotil at hilo.cz> 3/29/2019 1:42 AM >>>
Never had such a big POA; normally I have 350 users per POA. GWcheck
will take forever on bigger POAs....



>>> "Gregg A. Hinchman" <Gregg at HinchmanConsulting.com> 28.03.2019
22:27
>>>

In my case, the customer has 2 post offices.



One is bigger than the other. That one has heavier users and is the one
that has consistently had POA crashes.



User count for the site is over 1200.  My guess (as I do not have access
to verify today) is likely 600-700 users in 1 PO.  That is well within
most specs.  And we are talking about a data size of 300GB, so well under
some of the others I have for GW.



The GW system is old.  Honestly I cannot say just how old, but it's been
on NetWare, OES, SLES11 and now SLES12.  Issues started with GW 14 and
SLES11 back in....wait for it....2015!  It was moved from OES to SLES11
in 2015.



Not a believer in coincidences, so we definitely have a commonality
here.






Take Care.

Gregg A. Hinchman, PMP

Salesforce Certified




Gregg at HinchmanConsulting.com 

www.HinchmanConsulting.com 

765.653.8099





"Courage is doing what is right." 





>>>
From: David Krotil <David.Krotil at hilo.cz>
To: NGWList <ngw at ngwlist.com>
Date: 3/28/2019 9:00 AM
Subject: Re: [ngw] GroupWise POA Memory Usage


How many users do you have on the POA that crashes?











>>> "Anthony Harper" <aharper at psc.ac.uk> 28.03.2019 13:55 >>>

Hi Gregg,




Thanks for the sanity check!  I could also restart the agents weekly to
control the memory usage, but the crashes also come seemingly randomly
during production hours.




I know of another couple of GW sites that have the memory issue, and
one with memory and crashing issues.  One is moving off GW.




You'd think that with all the cores they receive, a solution could be
found.  Perhaps it's worth combining our thoughts and talking to
management?




Anthony

>>> Hinchman Consulting <gregg at hinchmanconsulting.com> 28/03/2019 12:32 >>>

So this has been an issue for one of my customers for years now.  We
moved to SLES12 and upgraded to 18 within the last 4 months, and within
a week it started again.




He runs top and sets a cron to stop/start the agents once a week.  




Cores have been sent, SRs open.  End result: no fix.  I am pretty sure
dev cannot lock it down and fix it. It was just one of their POs only.
Which really makes you wonder.




I can tell you I did the same move to SLES12/upgrade to 18 for several
other customers within the last 6 months or so, and they have not had
the issue, not in the past and not today.




This leads me to wonder if there is a feature or data in the PO that
exposes it.  Suffice to say, a fix is needed.




Sent while mobile.




Gregg Hinchman, PMP

Salesforce Certified

765.653.8099




> On Mar 28, 2019, at 5:33 AM, Anthony Harper <aharper at psc.ac.uk> wrote:

> 

> Hi Marvin,

> 

> I also have POA crashes.  What usually happens is that either the POA
crashes (with a mention of libc in /var/log/messages), the oom killer
steps in, or I manually restart the POA out of hours.  I've recently
migrated GW from OES 2015 SP1 to SLES 15 (all running on top of ESX),
and the memory usage issue and crashes have followed.

> 

> I use AdRem NetCrunch to monitor our servers, and I'm tracking "%
Memory Used by Processes" for the server and "% Memory Utilization" of
the POA process.  They track in near perfect correlation.

> 

> I am sending cores to support (and have done for a few years); the
only step I take to create the core is editing the /etc/sysconfig/grpwise
file's GROUPWISE_DEBUG_OPTION to "on,fence".  Do you have other steps?

> 

> Anthony

>>>> "Marvin Huffaker" <mhuffaker at redjuju.com> 27/03/2019 18:14 >>>

> I've been chasing POA crashes on larger systems for a couple of years
now.  I haven't actually observed the behavior you're mentioning (I've
only seen it after the POA crashes), but with the cores we've obtained,
I'm told it's a memory corruption issue.  Where/how are you monitoring
the memory consumption?  I could monitor my customer systems for the
same behavior on the problem systems.  One of my systems is SLES 15,
but it doesn't seem to matter which OS; I've seen it on SLES11 and
SLES 12 also.  Are you getting cores to the support team for analysis?
If so, that could help get it resolved.  While it may be common, it
shouldn't be considered normal.  I have a list of steps to take to
prepare your server for a core; it's more of a challenge on SLES 12
and 15, but you actually get better cores than on SLES 11.

> 

> 

> Marvin

>>>> "Anthony Harper" <aharper at psc.ac.uk> 3/27/2019 8:25 AM >>>

> Hi,

> 

> The POA processes on my two GroupWise servers gradually increase
their memory consumption until the server's oom-killer steps in and
kills the POA.  Is this normal behaviour?
> 
> We're currently running 18.1.0-132861 on SLES 15; however, I've seen
this behaviour for a number of years now.

> 

> Regards,

> 

> Anthony



_______________________________________________
ngw mailing list
ngw at ngwlist.com
http://ngwlist.com/mailman/listinfo/ngw


