[ngw] Android IMAP Groupwise and radical storage growth
Jeff at ScrippsCollege.edu
Thu Dec 13 19:58:33 UTC 2012
Since this is so dear to my heart, let me explain what's going on. There is also a warning at the end that should not be ignored.
Devices: Android devices running 2.0 or later (Nov 2009) that use the built-in email client in IMAP mode.
Issue: When an Android device posts a sent message to GroupWise IMAP, it then uses an obscure but valid search command to verify that the message was in fact written. GroupWise does not respond properly to the search, so the Android email app treats the response as a failure and posts the message again. Depending on the device's access to a cellular/WiFi network, its sleep settings, etc., the message can be duplicated thousands of times, and in many cases many thousands of times.
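To make the loop concrete, here is a toy model of a client that appends a sent item and then tries to confirm it with a search. The class and function names are hypothetical, and the broken server is simplified to a search that never finds the message; it is a sketch of the behavior described above, not GroupWise's or Android's actual code.

```python
"""Toy model of the append-then-verify loop (hypothetical names throughout)."""

from dataclasses import dataclass, field


@dataclass
class SentFolder:
    """Stands in for the server-side Sent folder."""
    messages: list[str] = field(default_factory=list)
    search_is_broken: bool = True  # simplification of the mis-handled search reply

    def append(self, message_id: str) -> None:
        # The append itself succeeds: a real copy is stored every time.
        self.messages.append(message_id)

    def search_message_id(self, message_id: str) -> list[int]:
        # A correct server returns the positions of matching messages.
        if self.search_is_broken:
            return []  # the client cannot confirm that the copy exists
        return [i for i, m in enumerate(self.messages) if m == message_id]


def post_sent_item(folder: SentFolder, message_id: str, max_attempts: int) -> int:
    """Append, verify, and retry whenever verification fails.

    max_attempts stands in for however many sync cycles the device gets
    through. Returns the number of attempts made.
    """
    for attempt in range(1, max_attempts + 1):
        folder.append(message_id)
        if folder.search_message_id(message_id):
            return attempt  # verified, so the client stops posting
    return max_attempts
```

With a working search the first append is confirmed and the loop stops; with the broken reply every attempt leaves one more real copy in the Sent folder.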
This issue plagued other email systems as well, including Zimbra, Mercury, and Yahoo. Those vendors quickly patched their IMAP code to fix it; David Harris (Mercury/Pegasus Mail) explained the odd yet valid way the Android phones verify the sent message, and how his code was broken. Novell has known about the issue since 2010, but has patched the IMAP code only in GW 2012.
Secondary issue: a message DB hitting 2GB, followed by all storage allocated to the post office filling up. The Android IMAP bug can trigger a larger and more critical bug in GW 8 (and GW 2012) if you're running on Linux. Sent items from phones tend to be small, so there is a high probability that they are stored in a shared message database (one of the 255 in ofmsg).
Message databases in ofmsg are still limited to 2GB, and when one hits that limit there is a documented bug. Basically, a failed write to the maxed-out message DB causes other messages destined for the same DB (whose attachments go to offiles) to fail, but each attachment is still written to offiles. The POA then tries again to store the message in the maxed-out DB, the write fails, and the attachment is duplicated again. This duplication happens at a rapid pace until all storage allocated to the post office is consumed. If you add more storage before fixing the 2GB message DB, watch it disappear.
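A rough way to see why the storage vanishes so fast is to model one stuck delivery. The sketch below assumes, as described above, that the attachment is written to offiles before the message DB write is attempted; the function name, sizes, and retry cap are hypothetical.

```python
DB_LIMIT_BYTES = 2 * 1024**3  # 2GB ceiling on a single message DB


def deliver_with_retries(db_size: int, record_size: int, attachment_size: int,
                         free_space: int, max_retries: int) -> tuple[int, int]:
    """Model one delivery; return (attempts_made, free_space_left) in bytes."""
    attempts = 0
    while attempts < max_retries and free_space >= attachment_size:
        attempts += 1
        free_space -= attachment_size  # attachment copy written to offiles first
        if db_size + record_size <= DB_LIMIT_BYTES:
            return attempts, free_space  # DB accepted the record: delivery done
        # DB is full: the write fails, this attachment copy is left behind,
        # and the POA tries again
    return attempts, free_space
```

Below the ceiling a delivery costs one attachment; at the ceiling the loop ends only when free space (or the retry cap) runs out, so added storage just gives it more room to fill.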
So here is my warning: even if your storage use appears to be growing at a normal pace, if you aren't in the habit of checking the sizes of the 255 message DBs in \ofmsg, stop, don't pass Go, and check them now. In fact, I would suggest sorting them by size, taking a snapshot of where everything is, and checking them again tomorrow. If you find any that keep growing and never shrink during a normal maintenance check, it's likely that an Android device is storing lots of messages there.
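The daily check can be scripted. This is a minimal sketch, assuming the message DBs match msg*.db in the post office's ofmsg directory and that a JSON file is an acceptable place to keep the previous snapshot; the script name and paths in the usage line are placeholders.

```python
import json
import sys
from pathlib import Path

TWO_GB = 2 * 1024**3
MB = 1024**2


def take_snapshot(ofmsg_dir: Path) -> dict[str, int]:
    """Map each message DB file name (assumed msg*.db) to its size in bytes."""
    return {p.name: p.stat().st_size for p in ofmsg_dir.glob("msg*.db")}


def growing_dbs(old: dict[str, int], new: dict[str, int]) -> list[tuple[str, int, int]]:
    """Return (name, size_now, growth) for DBs larger than last time, biggest growth first."""
    grown = [(name, size, size - old[name])
             for name, size in new.items() if name in old and size > old[name]]
    return sorted(grown, key=lambda row: row[2], reverse=True)


def main(ofmsg_dir: str, snapshot_file: str) -> None:
    new = take_snapshot(Path(ofmsg_dir))
    snap = Path(snapshot_file)
    if snap.exists():
        old = json.loads(snap.read_text())
        for name, size, growth in growing_dbs(old, new):
            print(f"{name}: {size / MB:.0f} MB (+{growth / MB:.1f} MB), "
                  f"{100 * size / TWO_GB:.0f}% of 2GB")
    else:
        # First run: show the 20 largest DBs as a baseline
        for name, size in sorted(new.items(), key=lambda kv: kv[1], reverse=True)[:20]:
            print(f"{name}: {size / MB:.0f} MB, {100 * size / TWO_GB:.0f}% of 2GB")
    snap.write_text(json.dumps(new))  # today's sizes become tomorrow's baseline


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: python check_ofmsg.py <ofmsg_dir> <snapshot.json>")
```

Run it once a day (for example from cron) with the same snapshot file; a DB that shows up as growing day after day is the one to investigate.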
If you're on Linux and one of your message DBs hits the 2GB ceiling, your GroupWise system will go down. That is what happened to my GW system: a message DB hit 2GB, and within about an hour, 330GB of storage was consumed.
Your options:
1) Upgrade to GroupWise 2012 SP1. It needs to be SP1, since the 2GB/volume-filling issue is fixed only in SP1.
1a) Pressure Novell to patch GW 8. It is still under general support and should receive patches, yet fixes for these two critical bugs were not included even in the latest hot patch (HP).
2) Turn off IMAP for everyone.
3) Find the Android devices and selectively disable IMAP for only them. This is what I've been doing. If your clients use IMAP via GWIA, use the web interface to look at the IMAP threads. You'll likely see the problematic devices running an IMAP search against the \Sent folder for a message ID. I created a new class of service (access control) that disables IMAP and moved the Android users into it. Once the Android users had deleted their email account setup, we moved them to the K-9 email app and removed the restriction.
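If you can export IMAP session activity as (user, command) pairs, spotting the looping devices comes down to counting Message-ID searches per account. The export format, sample data, and threshold are assumptions; the only fixed part is the standard IMAP search key `HEADER Message-ID` (RFC 3501).

```python
import re
from collections import Counter
from collections.abc import Iterable

# Matches a SEARCH that uses the RFC 3501 header key, e.g.
#   A12 UID SEARCH HEADER MESSAGE-ID <id@host>
MSGID_SEARCH = re.compile(r'\bSEARCH\b.*\bHEADER\s+"?MESSAGE-ID\b', re.IGNORECASE)


def suspect_accounts(records: Iterable[tuple[str, str]], threshold: int = 50) -> dict[str, int]:
    """Return {user: count} for users whose Message-ID searches reach the threshold.

    records: hypothetical (user, raw IMAP command) pairs from your own export.
    """
    counts = Counter(user for user, command in records if MSGID_SEARCH.search(command))
    return {user: n for user, n in counts.items() if n >= threshold}
```

The accounts it returns are candidates for the IMAP-disabled class of service; the threshold is a judgment call, since a healthy client may legitimately search by Message-ID now and then.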
4) Move to DataSync. This is of course problematic for EDUs, since we are at the center of the BYOD movement; depending on your FTE count and whether or not you are residential, DataSync may not scale easily to the number of devices you have.
Clean up: This is the hard part. There really aren't good tools for cleaning up a mess like this. GWCheck can do some of it, but it's very slow. I partnered with two other EDUs; both had developed in-house GW tools, which they then extended to help with exactly this situation. They too had the duplicated message issue.
I had two accounts with over 400K duplicated messages. One of them took a message DB from ~400MB to 2GB and caused our outage. The second took a 90MB message DB to over 1GB in the span of only two months. With the help of the tools, I then found over twenty other accounts with between 15K and 150K duplicated messages.
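The in-house tools mentioned above aren't public, so this is only a sketch of the core idea: group a mailbox's messages by a key and count the copies beyond the first. It assumes the message metadata has already been exported by some other means; the field names and the fallback key are hypothetical.

```python
from collections import Counter
from collections.abc import Iterable, Mapping


def duplicate_key(msg: Mapping[str, str]) -> tuple[str, ...]:
    """Prefer the Message-ID; fall back to subject + date + size if it is missing."""
    if msg.get("message_id"):
        return ("id", msg["message_id"])
    return ("meta", msg.get("subject", ""), msg.get("date", ""), msg.get("size", ""))


def extra_copies(messages: Iterable[Mapping[str, str]]) -> dict[tuple[str, ...], int]:
    """Map each duplicated key to the number of copies beyond the first."""
    counts = Counter(duplicate_key(m) for m in messages)
    return {key: n - 1 for key, n in counts.items() if n > 1}
```

Keying on the Message-ID suits the Android case, since every re-posted copy of a sent item carries the same ID.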
>>> On Thursday, December 13, 2012 at 8:50 AM, in message <3E409294-6418-4B83-A8E3-A1BDCC1646CB at gmail.com>, Danita Zanre <dzanre.ngwlist at gmail.com> wrote:
Sometimes android devices get stuck "posting" a sent item into the sent items folder, and it gets duplicated thousands of times. The message only gets sent once. This is the IMAP "post" command to show the item in the sent items folder.
The "fix" right now is to not allow android devices to do IMAP and make them do DataSync. I don't know if this is something Novell can fix in the IMAP code, or if this is an android problem in and of itself.
On Dec 13, 2012, at 9:41 AM, "Ben Knorr" <bknorr at westminstercollege.edu> wrote:
> I vaguely recall that there were some people on this list who have experienced incredible growth of data in their post offices that was eventually traced back to how certain Android devices do IMAP.
> I've got one post office that is growing at 10GB a week, and I'm adding 10GB or more each week. My linux Volume Group looks horrendous now and doling out more disks in VMware is getting interesting (I've got 6 disks on this box now).
> Any information as to how this is happening (or if I am completely off in my recollection of this Android/IMAP issue) or what is known about it and how to fix it would be greatly appreciated.
> We're on GW 8.0.3 103395 , SLES 11 x64, FWIW.
> ngw mailing list
> ngw at ngwlist.com