HA has its own costs: higher server and memory costs, additional support
systems, regular updates and intensive monitoring. These costs alone are not
reason enough to give up on the idea. It is practical to have an HA system in
place, along with disaster recovery, simply because there is always a
possibility of one of the servers failing or a hard disk crashing. The
scenarios where you will want to have HA include:
1. Corruption of Policy Server files / Authorization / ACLD servers
2. Corruption of LDAP data
3. Accidental fat finger as root on one of the components, removing critical files
4. Corruption of ITIM data file
5. Corruption of DB2 server
6. Crash of DB2 primary server
7. Crash of WAS / corruption of security.xml file
8. Physical crash of any hard disk
9. Natural calamity (like fire / floods) rendering the entire data center useless
In all of the above situations, HA and HADR must be able to ensure business
continuity with little or no impact. Once we have come through such a
situation and patted ourselves on the back for surviving it, we face a harder
problem, one which becomes more critical as the data and customization grow:
recovering from a disaster in a working environment is a bigger challenge than
HA itself. Several parts need attention. Two of the most critical are DB2 and
LDAP. DB2 backups should be more frequent and pushed outside the data center.
The same applies to LDAP backups. The LDAP backups should be extensive, and
should be coupled with the V3.schema file. There are other files which need to
be backed up, as indicated in the installation guide.
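To make the off-site copy trustworthy, it helps to ship a checksum manifest along with the backup files, so that a restore at the other site can confirm nothing was truncated or corrupted in transit. The following is only a minimal sketch of that idea: the directory layout and the function names are assumptions for illustration, and the DB2 and LDAP backups themselves are expected to have been produced already by the products' own tools.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(backup_dir: Path) -> dict:
    """Map each file under backup_dir (by relative path) to its size and checksum."""
    manifest = {}
    for path in sorted(backup_dir.rglob("*")):
        if path.is_file():
            rel = path.relative_to(backup_dir).as_posix()
            manifest[rel] = {"bytes": path.stat().st_size, "sha256": sha256_of(path)}
    return manifest


def save_manifest(manifest: dict, target: Path) -> None:
    """Write the manifest as JSON; keep it outside the directory it describes."""
    target.write_text(json.dumps(manifest, indent=2, sort_keys=True))


def verify_manifest(backup_dir: Path, manifest: dict) -> list:
    """Return the relative paths that are missing or whose checksum differs."""
    problems = []
    for rel, info in manifest.items():
        candidate = backup_dir / rel
        if not candidate.is_file() or sha256_of(candidate) != info["sha256"]:
            problems.append(rel)
    return problems
```

Run `build_manifest` in the primary data center right after the backup completes, and `verify_manifest` on the copy that arrives off-site; an empty list means every file made it across intact.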
Once a few servers or a whole data center is lost, some very critical
decisions have to be made. Which will be the primary data center? Which will
be the primary server? Will the secondary / backup servers take over the
responsibility of the primary servers, or do you intend to build out a new
primary server? This is an interesting choice, and you may want to hold your
decision until you have answers to the following questions:
1. Was the secondary server on par with the primary server?
2. Did the failover server have restricted features compared to the primary server?
3. Is the performance of the secondary server lower than that of the primary?
4. Will the newly built server have a better configuration?
5. How long will it take to build the new servers from scratch (assuming
you have the configuration backups for all the servers), so that you can be
back in business quickly?
Assuming you gather all the information and make the decisions in time, next
comes the implementation. As I always say, the backup process should be bottom
up. If possible, maintain a copy of all the servers, such as a snapshot of the
initial configuration. Once the snapshot is taken, it acts as the baseline for
the future. The entire LDAP should be handled very carefully, considering it
holds all the information you will need in the future. One important thing
while taking the LDAP backup is to make sure the passwords are hashed, and that
the backup is sealed and then encrypted. The encryption key should be kept
private and out of the hands of anyone who does not need it, to ensure the
security of the data. This key will be needed if you have to rebuild a primary
server identical to the one that crashed. Once you have the LDAP schema, keys
and LDAP entries, make sure you back up all the domains as well. For TAM LDAP,
make sure you have backed up secAuthority=Default and o=company,dc=com as well.
All the workflows are saved at this point, but all the customized Java code
still resides in the data folder, and should be backed up after each release.
Data continuity can also be maintained from a working secondary server. Make
sure that the new server is cryptographically in sync with the server it
replaces.
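A quick sanity check before trusting an LDAP export is to confirm that it actually contains entries under every suffix you expect. The sketch below scans an LDIF export for entry DNs and reports any required suffix that is absent. The two suffixes are the ones named above; the function names are my own, and the DN handling is deliberately simple (it does not cope with escaped commas inside a DN), so treat it as an illustration rather than a full LDIF parser.

```python
import base64

# Suffixes the export is expected to cover (taken from the text above).
REQUIRED_SUFFIXES = ["secAuthority=Default", "o=company,dc=com"]


def normalize_dn(dn: str) -> str:
    """Lower-case a DN and strip spaces around the commas between RDNs."""
    return ",".join(part.strip().lower() for part in dn.split(","))


def iter_dns(ldif_text: str):
    """Yield every entry DN in an LDIF export, unfolding continuation lines."""
    # LDIF folds long lines; a continuation line starts with a single space.
    unfolded = ldif_text.replace("\r\n", "\n").replace("\n ", "")
    for line in unfolded.split("\n"):
        if line.startswith("dn::"):
            # A double colon marks a base64-encoded value.
            yield base64.b64decode(line[4:].strip()).decode("utf-8")
        elif line.startswith("dn:"):
            yield line[3:].strip()


def missing_suffixes(ldif_text: str, required=REQUIRED_SUFFIXES) -> list:
    """Return the required suffixes that have no entry at or below them."""
    dns = [normalize_dn(dn) for dn in iter_dns(ldif_text)]
    missing = []
    for suffix in required:
        wanted = normalize_dn(suffix)
        if not any(dn == wanted or dn.endswith("," + wanted) for dn in dns):
            missing.append(suffix)
    return missing
```

Read the export with `open(path, encoding="utf-8").read()` and pass the text to `missing_suffixes`; anything it returns is a suffix that was never exported and needs another backup run before the file is encrypted and shipped.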
Another thing you want to save a copy of is the keystores and password
stores. These will be useful in at least one of the cases I am about to
describe. The certificates are used for two-way SSL. You can use the
certificates from the secondary server, but you should make a detailed list of
all the servers with their SSL locations, where all the servers are defined,
and where all the clients are installed. If the policy server is rebuilt, it
comes with its own default certificates, which need to be pushed out so that
everything is updated with the new ones, and you will need to run bassslcfg to
establish the new connection. Make sure svrsslcfg and the PDJrte configuration
are done correctly the first time. If the servers were running as non-root and
you need to rebuild them, make sure to back up the startup scripts as well. It
is a small thing, but it will help once your system is up and roaring.
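The detailed list of servers and SSL locations can be kept as a small structured inventory instead of a loose document, so that after a rebuild you can ask it which machines need the new certificate. The sketch below shows one possible shape for such an inventory. Every field name, host name and path in it is hypothetical, and it does not call any TAM utility; it only tells you where the manual reconfiguration has to happen.

```python
from dataclasses import dataclass, field


@dataclass
class SslEndpoint:
    """One server or client that holds a keystore (all fields are illustrative)."""
    host: str
    role: str                      # e.g. "policy-server", "was", "ldap"
    keystore_path: str             # where the keystore lives on that host
    trusts: list = field(default_factory=list)  # hosts whose certificates it must trust
    runs_as_root: bool = True
    startup_script: str = ""       # path of the backed-up startup script, if any


def affected_by_rebuild(inventory, rebuilt_host):
    """Hosts whose keystore trusts the rebuilt host and so needs its new certificate."""
    return sorted(e.host for e in inventory if rebuilt_host in e.trusts)


def missing_startup_scripts(inventory):
    """Non-root servers for which no startup script backup has been recorded."""
    return sorted(
        e.host for e in inventory if not e.runs_as_root and not e.startup_script
    )


# Hypothetical inventory, for illustration only.
inventory = [
    SslEndpoint("pd01", "policy-server", "/backup/pd01/keytabs"),
    SslEndpoint("was01", "was", "/backup/was01/keys", trusts=["pd01"],
                runs_as_root=False),
    SslEndpoint("authz01", "authorization-server", "/backup/authz01/keytabs",
                trusts=["pd01"], runs_as_root=False,
                startup_script="/backup/authz01/start.sh"),
]
```

With this sample data, `affected_by_rebuild(inventory, "pd01")` lists `authz01` and `was01` as the machines to revisit after a policy server rebuild, and `missing_startup_scripts(inventory)` flags `was01` as a non-root server whose startup script has not been backed up yet.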