Friday, September 19, 2008

Foundations autonomic computing and IBM support

This is a public thank you to Robert on the Foundations Support Line in Canada and to Matt Webb at IBM in Australia for helping me with my current support issue. Matt has been researching this problem in his own time (and I've got the time-stamped emails to prove it!) and, together with long-distance support from Robert, we have almost beaten the problem.

The problem was that a customer's Foundations server suddenly stopped accepting incoming SMTP mail. The Domino SMTP task was active, and internal<->internal mail was being delivered, as was internal->external mail. That server had been running fine for a couple of months and it was a complete mystery as to what had caused the problem. In the end we told the server to run a Netscan command to automatically detect its network connections and rebuild its internet access routes.

So that's what it did.

Maybe I impress too easily, but I found that to be quite an elegant piece of autonomic computing. I wish organizing my personal finances was that easy. "OK Tax Return! It's the end of the Financial Year. Go away and inspect my accounting environment and fill in your blanks, then tell me when you're ready to be sent to the Tax Office."

The original issue isn't completely resolved yet, since the customer has switched their mail to another email server for the duration. Now we need to switch it back and run some mail-delivery tests, but I think we're on the home stretch.
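For anyone wondering what a first mail-delivery test could look like, here is a minimal sketch in Python. It is not the procedure we used on the customer's server. It only checks whether a host answers on the SMTP port with the standard 220 "service ready" greeting. The function names and the default timeout are my own choices for illustration, and any hostname you pass in is up to you.

```python
import smtplib


def is_ready(reply_code: int) -> bool:
    """Return True for 220, the SMTP 'service ready' greeting code."""
    return reply_code == 220


def accepts_smtp(host: str, port: int = 25, timeout: float = 10.0) -> bool:
    """Report whether host:port greets an incoming SMTP connection with 220.

    This is only a reachability check: it sends no mail and does not
    prove that a message would actually be delivered to a mailbox.
    """
    smtp = smtplib.SMTP(timeout=timeout)  # no host given, so nothing connects yet
    try:
        # connect() returns the server's reply code and banner text
        code, _banner = smtp.connect(host, port)
        return is_ready(code)
    except (OSError, smtplib.SMTPException):
        # refused, timed out, unreachable, or dropped before greeting
        return False
    finally:
        smtp.close()
```

Calling accepts_smtp("mail.example.com") (a placeholder hostname) from a machine outside the customer's network would show whether the server is reachable for incoming mail again. A real delivery test would still need an actual message sent end to end.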

Anyway, I want to give public recognition and thanks where it's due.

Thanks again Matt and Robert. I hope your Managers read this.

5 comments:

Bilal Jaffery said...

Blogged about it bud, thanks!

http://www.bilal.ca/great-example-of-foundations-support/

Joe Nitix said...

I'd agree that the Foundations technical support is fantastic, however I have to question your other claims. If the product is really so autonomic, would it really have required research on behalf of a support guru on his own time, then manual intervention to get the software to work correctly?

Graham Dodge said...

Hi Joe,

The research was to determine the root cause of the problem and that was work done in Matt's own time since he was already busy full time that week working at the Lotus Collaboration Summit in Sydney. Since this request was outside of the normal Lotus support channel Matt could have told me I had to wait till next week before he could look at it. Matt chose to go the extra mile and it seemed appropriate to thank him for that effort.

Once we had a handle on the problem we told the server to fix itself regarding network routes. It makes sense to me to isolate what the problem is *before* you try to fix it. Or do you think we should have immediately told the server to set absolutely *everything* back to the default settings?

So yes, it did require research to identify the problem and then manual intervention to tell the server to fix it. The autonomic part was where the server *did* fix the problem.

I think that's a better model than Microsoft's automated 'download-the-patch-then-install-it-and-reboot-your-server-whether-you-want-it-or-not' approach.

Joe Nitix said...

I don't disagree with you that Foundations is much better at fixing problems than Microsoft software ever could be. However, if the software were truly autonomic, it should have fixed itself, or perhaps have prevented the issue from ever happening. I could understand if it were a hardware problem or a problem with the network, but it seems from your description that it was an issue with how the networking on Foundations was set up.

Don't take my critiques as overly negative, as I'm a huge fan of Lotus Foundations. I think perhaps I'm just trying to balance out perceptions with reality. Yes, Foundations is fantastic at fixing its own issues, auto-detecting network configurations, recovering from crashes, etc., but it, like any other software, can have its flaws too. Rather than point at those flaws and indicate how fantastic they are as features, we can instead point to the Foundations support and dev teams and how responsive they were, which you did. Just adding some yin to your yang. :)

Graham Dodge said...

There were some hardware configuration issues we needed to address. It was *intriguing* that LFS worked perfectly for three months despite those errors, and I don't blame the software for finally spitting the dummy.

Now we've found and fixed those config errors and moved on. It's all a learning experience.