Wednesday, June 20, 2012

High Availability (HA) - Aftermath


HA has its own consequences: higher server costs, higher memory costs, an additional support system, regular updates, and intensive monitoring. Even that is not enough on its own. It is practical to have an HA system in place, along with disaster recovery, on the assumption that one of the servers may fail or a hard disk may crash. Scenarios where you might want HA include:
1.       Corruption of Policy Server files / Authorization / ACLD servers
2.       Corruption of LDAP data
3.       An accidental fat-finger error at the root of one of the components, removing critical files
4.       Corruption of ITIM data files
5.       Corruption of DB2 server
6.       Crash of DB2 primary server
7.       Crash of WAS / corruption of the security.xml file
8.       Physical crash of any hard disk
9.       Natural calamity (such as fire or flood) rendering the entire data center useless
In all of the above situations, HA and HADR must ensure business continuity with little or no impact. But once we have come through such a situation and patted ourselves on the back, we face a much harder problem, one that becomes more critical as the data and customization grow: recovering after the disaster is a bigger challenge than HA itself. Several parts need attention, and two of the most critical are DB2 and LDAP. DB2 backups should be taken more frequently and pushed outside the data center, and the same applies to LDAP backups. LDAP backups should be comprehensive and should be coupled with the V3.schema file. Other files that need to be backed up are listed in the installation guide.
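As a rough illustration of pushing backups out of the data center, the Python sketch below copies a set of backup artifacts to a directory assumed to be mounted from an offsite location, refuses to proceed if anything is missing, and records a SHA-256 manifest so the copies can be verified before a restore. It does not run any DB2 or LDAP export commands itself. Apart from V3.schema, the file names in REQUIRED_ARTIFACTS and the offsite mount are hypothetical placeholders for whatever your own backup jobs produce.

```python
import hashlib
import shutil
from pathlib import Path

# Hypothetical artifact names: substitute whatever your DB2 and LDAP
# backup jobs actually produce, plus the other files the installation
# guide lists.
REQUIRED_ARTIFACTS = [
    "itimdb.db2backup",   # DB2 backup image (hypothetical name)
    "ldap_export.ldif",   # LDAP data export (hypothetical name)
    "V3.schema",          # schema file that must travel with the LDIF
]


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def push_offsite(source_dir: Path, offsite_dir: Path) -> dict[str, str]:
    """Copy the required backup artifacts offsite and verify each copy.

    Raises FileNotFoundError if any required artifact is missing, and
    RuntimeError if a copy's checksum differs from its source.
    Returns a mapping of file name to SHA-256 digest.
    """
    missing = [n for n in REQUIRED_ARTIFACTS if not (source_dir / n).is_file()]
    if missing:
        raise FileNotFoundError(f"missing backup artifacts: {missing}")

    offsite_dir.mkdir(parents=True, exist_ok=True)
    manifest: dict[str, str] = {}
    for name in REQUIRED_ARTIFACTS:
        src, dst = source_dir / name, offsite_dir / name
        shutil.copy2(src, dst)  # copy2 also preserves timestamps
        src_hash = sha256_of(src)
        if sha256_of(dst) != src_hash:
            raise RuntimeError(f"checksum mismatch after copy: {name}")
        manifest[name] = src_hash

    # Write the manifest beside the copies so a restore can re-verify them.
    lines = [f"{digest}  {name}" for name, digest in sorted(manifest.items())]
    (offsite_dir / "MANIFEST.sha256").write_text("\n".join(lines) + "\n")
    return manifest
```

The manifest uses the same "digest, two spaces, file name" layout that `sha256sum -c` reads, so, when run from the offsite directory, the copies can be checked on the restore side without this script.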
Once a few servers or an entire data center is lost, we have to make some very critical decisions. Which will be the primary data center? Which will be the primary server? Will the secondary/backup servers take over the responsibilities of the primary servers, or do we intend to build a new primary server? This is a genuinely interesting choice, and you may want to hold off on deciding until you have answered the following questions:
1.       Was the secondary server on par with the primary server?
2.       Did the failover server have restricted features compared to the primary server?
3.       Is the performance of the secondary server lower than that of the primary?
4.       Will the newly built server have a better configuration?
5.       How long will it take to build the new servers from scratch (assuming you have backups of the configuration for all the servers), so that you can be back in business quickly?
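Questions 1 to 3 boil down to comparing the secondary (or a candidate new server) against what the primary offered. A minimal Python sketch of that comparison follows; the ServerProfile fields, host names, and feature names are hypothetical, and a real comparison would cover whatever your environment actually depends on.

```python
from dataclasses import dataclass, field


@dataclass
class ServerProfile:
    """Hypothetical summary of a server's capacity and enabled features."""
    name: str
    cpu_cores: int
    memory_gb: int
    features: set[str] = field(default_factory=set)


def parity_gaps(primary: ServerProfile, candidate: ServerProfile) -> list[str]:
    """List the ways a candidate falls short of the server it would replace.

    An empty list means the candidate is at least on par on every
    dimension recorded here (question 1); feature gaps answer question 2,
    and capacity gaps are a rough proxy for question 3.
    """
    gaps: list[str] = []
    if candidate.cpu_cores < primary.cpu_cores:
        gaps.append(f"CPU: {candidate.cpu_cores} cores vs {primary.cpu_cores}")
    if candidate.memory_gb < primary.memory_gb:
        gaps.append(f"memory: {candidate.memory_gb} GB vs {primary.memory_gb} GB")
    for feature in sorted(primary.features - candidate.features):
        gaps.append(f"feature missing on {candidate.name}: {feature}")
    return gaps


if __name__ == "__main__":
    primary = ServerProfile("primary", 16, 64, {"WAS cluster", "DB2 HADR", "reporting"})
    secondary = ServerProfile("secondary", 8, 64, {"WAS cluster", "DB2 HADR"})
    for gap in parity_gaps(primary, secondary):
        print(gap)
    # prints:
    # CPU: 8 cores vs 16
    # feature missing on secondary: reporting
```

Questions 4 and 5 (better configuration, rebuild time) are left out on purpose; they depend on hardware quotes and on how complete your configuration backups are, which a script cannot know.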
Assuming you gather all the information and make the decisions on time, the next step is implementing them. As I always say, the backup process should be bottom up. If possible, maintain a copy of every server, such as a snapshot of its initial configuration; once taken, the snapshot acts as the baseline for the future. The entire LDAP directory must be handled very carefully, since it holds all the information you will need in the future. One important point when taking the LDAP backup is to make sure the passwords are hashed and sealed, and that the backup is then encrypted. The encryption key must be kept strictly private and out of anyone else's hands to ensure the security of the data; you will need it if you have to rebuild a primary server identical to the one that crashed. Once you have secured the LDAP schema, keys, and entries, make sure you back up all domains as well. For the TAM LDAP, make sure you have backed up secauthority=default and o=company,dc=com. At this point all the workflows are saved, but any customized Java code still resides in the data folder and should be backed up after each release. Data continuity can also be maintained from a working secondary server; in that case, make sure the new server is cryptographically in sync with it.
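Before relying on an LDAP export, it is worth confirming that it really contains every suffix mentioned above. The Python sketch below is one way to check that against an LDIF file. It assumes a plain-text LDIF export, handles LDIF line folding, skips base64-encoded DNs, and uses a deliberately rough DN normalization (lowercase, no spaces around commas, no handling of escaped commas), so treat it as a sanity check rather than a full LDIF parser. The sample entries are abbreviated and hypothetical.

```python
def ldif_dns(ldif_text: str) -> list[str]:
    """Extract the DN of every entry in an LDIF export.

    LDIF folds long lines by starting the continuation with a single
    space, so folded lines are joined first. Base64-encoded DNs
    ("dn::") are skipped in this sketch.
    """
    unfolded: list[str] = []
    for line in ldif_text.splitlines():
        if line.startswith(" ") and unfolded:
            unfolded[-1] += line[1:]  # continuation of the previous line
        else:
            unfolded.append(line)
    dns = []
    for line in unfolded:
        lower = line.lower()
        if lower.startswith("dn:") and not lower.startswith("dn::"):
            dns.append(line[3:].strip())
    return dns


def normalize_dn(dn: str) -> str:
    """Rough normalization: lowercase and drop spaces around RDN commas."""
    return ",".join(part.strip() for part in dn.lower().split(","))


def missing_suffixes(ldif_text: str, suffixes: list[str]) -> list[str]:
    """Return the suffixes that have no entry at or below them in the export."""
    dns = [normalize_dn(dn) for dn in ldif_dns(ldif_text)]
    missing = []
    for suffix in suffixes:
        norm = normalize_dn(suffix)
        if not any(dn == norm or dn.endswith("," + norm) for dn in dns):
            missing.append(suffix)
    return missing


if __name__ == "__main__":
    export = (
        "dn: secAuthority=Default\n"
        "objectclass: top\n"
        "\n"
        "dn: o=company,dc=com\n"
        "objectclass: top\n"
    )
    print(missing_suffixes(export, ["secauthority=default", "o=company, dc=com"]))
    # prints: []
    print(missing_suffixes("dn: o=company,dc=com\n", ["secauthority=default"]))
    # prints: ['secauthority=default']
```

Running a check like this right after every export, and again on the offsite copy, catches a truncated or partial backup long before the day you actually need it.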
Another thing you will want to save a copy of is the keystores and password stores; they will be useful in at least one of the cases described here. The certificates are used for two-way SSL. You can reuse the certificates from the secondary server, but you should keep a detailed list of all the servers and their SSL file locations, where each server is defined, and where each client is installed. A rebuilt policy server comes with its own default certificates, which have to be replaced and rolled out so that every client is updated with the new one, and you will need to run bassslcfg to establish the new connection. Make sure svrsslcfg and pdjrtecfg are run correctly the first time. If the servers were running as non-root and you need to rebuild them, make sure you have backed up the startup scripts as well. It is a small thing, but it will help you once your system is up and roaring.
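The detailed list of servers, clients, and SSL locations can be kept as structured data, which makes it easy to work out who needs attention after a rebuild. The Python sketch below is one hypothetical way to do that: the Component fields, host names, and keystore paths are illustrative assumptions, and the sketch only produces a worklist; it does not run any of the configuration commands itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One row of a hypothetical SSL inventory kept alongside the backups."""
    host: str
    role: str            # e.g. "policy server", "WebSEAL", "authorization server"
    policy_server: str   # host of the policy server this component trusts
    keystore: str        # location of its keystore and password store
    runs_as_root: bool = True


def rebuild_worklist(inventory: list[Component], rebuilt: set[str]) -> dict[str, list[str]]:
    """Work out follow-up tasks after the hosts in `rebuilt` are rebuilt.

    - Every client of a rebuilt policy server, and every rebuilt client,
      needs its SSL configuration redone against the new certificates.
    - Every rebuilt host that ran its servers as non-root needs its
      startup scripts restored from backup.
    """
    resync_ssl: list[str] = []
    restore_scripts: list[str] = []
    for c in inventory:
        is_policy_server = c.host == c.policy_server
        if not is_policy_server and (c.policy_server in rebuilt or c.host in rebuilt):
            resync_ssl.append(f"{c.host} [{c.role}] keystore={c.keystore}")
        if c.host in rebuilt and not c.runs_as_root and c.host not in restore_scripts:
            restore_scripts.append(c.host)
    return {"resync_ssl": resync_ssl, "restore_startup_scripts": restore_scripts}


if __name__ == "__main__":
    inventory = [
        Component("pdmgr01", "policy server", "pdmgr01", "/keys/pdmgr"),
        Component("web01", "WebSEAL", "pdmgr01", "/keys/web01", runs_as_root=False),
        Component("acld01", "authorization server", "pdmgr01", "/keys/acld01"),
    ]
    work = rebuild_worklist(inventory, {"pdmgr01"})
    for item in work["resync_ssl"]:
        print("re-run SSL configuration:", item)
    print("restore startup scripts on:", work["restore_startup_scripts"])
```

Keeping the inventory as data rather than in someone's head means the same list also documents which keystores and startup scripts belong in the backup set.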
