Sunday, May 20, 2012

HA - Illusion or realization

So let's say this new hyper-phobia of making the system highly available (HA) catches you. You go and hire a few professionals to make the system HA, but how do you test whether your system is truly HA or not? Well, let's get it straight the first time: IBM TIM (ITIM) is made up of Java classes and, in its simplest installation, has an LDAP server, DB2, a directory server and TDI. If you add the TAM combo adapter, that means WebSEAL, the policy server and the authorization server as well. When you talk about HA, you want to have a backup of most of these components.
A few of these are critical, mission critical even: if they fail, the entire system stops functioning. Identifying them depends purely on the business logic and on how the system functions. If your system runs the recon once daily or weekly, and most of the transactions then happen on the ITIM server itself, you might want to replicate your LDAP/Directory Server and DB2 (HADR), while TDI can be taken care of later. One has to understand that the team could make every component HA, but it is the workload that decides which ones actually need it. If you have a huge number of requests coming from WebSEAL that need to be authorized using ITIM, then you will need to make TDI HA. Similarly, multiple WebSEALs might be needed to spread heavy traffic, but only one policy and authorization server can be up at a time. If your ACLs don't change daily, this piece does not need to be HA. But if you have a TAI++ configuration set up, you can't afford for this piece to go down either.
Whatever the reasoning, let's say you have an HA setup with two LDAP servers (master-master), two directory servers, two DB2 servers (HADR) and two TDI servers. How would we start testing? What is the most basic test one can perform?
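The reasoning above, deciding which components must be redundant, boils down to one question: is anything critical running on a single server? The sketch below is only an illustration; the `Component` type, the `single_points_of_failure` helper and the sample topology are all hypothetical, and whether a piece counts as "critical" is a judgment you make from your own business logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    name: str
    instances: int   # how many servers run this component
    critical: bool   # does the business stop if this component goes down?


def single_points_of_failure(components: List[Component]) -> List[str]:
    """Return the names of critical components that have no standby."""
    return [c.name for c in components if c.critical and c.instances < 2]


# Hypothetical topology: daily recon, most work done inside ITIM itself.
topology = [
    Component("LDAP", 2, True),
    Component("Directory Server", 2, True),
    Component("DB2 (HADR)", 2, True),
    Component("TDI", 1, False),          # can be made HA later
    Component("WebSEAL", 2, True),
    Component("Policy Server", 1, True),  # critical here because of TAI++
]

print(single_points_of_failure(topology))  # ['Policy Server']
```

With TAI++ in the picture, the policy server is flagged: it is critical, and only one instance runs at a time.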
The first and most basic test is, if you can, to turn off the primary servers one component at a time and keep running transactions. By turning off one of the servers the hard way, you find out whether the system is truly HA. Once this step is complete, start alternately shutting down and restarting services while running the tests, and watch the logs.
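The drill above can be scripted so every component gets the same treatment. What follows is a minimal sketch that assumes you supply your own `stop`, `start` and `run_transaction` callables (for example, wrappers around your own shutdown scripts and a test ITIM request). All names are hypothetical. To keep the example self-contained, it is driven by a simulated environment instead of real servers.

```python
from typing import Callable, Dict, List


def failover_drill(
    components: List[str],
    stop: Callable[[str], None],
    start: Callable[[str], None],
    run_transaction: Callable[[], bool],
    transactions_per_step: int = 10,
) -> Dict[str, int]:
    """Stop each component's primary in turn and count failed transactions."""
    failures: Dict[str, int] = {}
    for name in components:
        stop(name)  # take the primary down "the hard way"
        try:
            failures[name] = sum(
                0 if run_transaction() else 1
                for _ in range(transactions_per_step)
            )
        finally:
            start(name)  # always restore it before testing the next component
    return failures


# Simulated environment: every component has a standby except TDI.
nodes = {
    "LDAP": {"primary": True, "standby": True},
    "DB2": {"primary": True, "standby": True},
    "TDI": {"primary": True, "standby": False},
}


def stop(name: str) -> None:
    nodes[name]["primary"] = False


def start(name: str) -> None:
    nodes[name]["primary"] = True


def run_transaction() -> bool:
    # A transaction succeeds only if every component has a live node.
    return all(n["primary"] or n["standby"] for n in nodes.values())


print(failover_drill(list(nodes), stop, start, run_transaction))
# {'LDAP': 0, 'DB2': 0, 'TDI': 10}
```

Any non-zero count points at a component whose failover is not working. Here that is TDI, which has no standby in the simulated setup.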
One of the most important things while setting up HA is to watch the logs; they help you determine whether each request is routed to the correct side. The LDAP should essentially be master-master, and the more entries there are, the more LDAP servers you should have. Regular backups of the LDAP servers are essential, and they will help you troubleshoot any unforeseen activity.
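A simple way to confirm routing from the logs is to count how many requests each node handled before and after you stop a primary. The sketch below assumes a hypothetical whitespace-separated log format (`<date> <time> <node> <operation> <status>`); real LDAP, DB2 and WebSphere logs look different, so the parsing line would have to be adapted to whatever your servers actually write.

```python
from collections import Counter
from typing import Iterable


def requests_per_node(lines: Iterable[str]) -> Counter:
    """Count log lines per node, assuming the hypothetical format
    <date> <time> <node> <operation> <status>; other lines are skipped."""
    counts: Counter = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) >= 5:
            counts[parts[2]] += 1
    return counts


sample = [
    "2012-05-20 10:00:01 ldap1 bind ok",
    "2012-05-20 10:00:02 ldap1 search ok",
    "2012-05-20 10:00:03 ldap2 search ok",  # primary stopped just before this
    "garbled line",
]

print(requests_per_node(sample))
```

If traffic never shows up on the standby after the primary is stopped, the request is not being routed to the other side.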
Getting the system to HA is not impossible on paper, but one should understand that ITIM is essentially an EAR file deployed on an application server, which will have memory leaks. Any custom code that is added increases the complexity, and poorly written code makes it worse. Additionally, DB2, sitting as the backbone, should be properly configured, and the steps in the Performance Tuning Guide should be strictly followed.
I will try to push out the basic HA steps as well. Understand that HA can be achieved and sustained, but it comes with hefty hardware costs, and even when the system is HA, it can still fail. There are too many pieces that can cause it to stop functioning.