IBM Tivoli

Wednesday, June 20, 2012

High Availability (HA) - Aftermath

HA has its own consequences: higher server and memory costs, additional support systems, regular updates, and intensive monitoring. Even that is not the end of the story. It is practical to have an HA system in place, along with disaster recovery, simply because there is always a possibility of a server failing or a hard disk crashing. The scenarios where you might want HA include:

1. Corruption of Policy Server files / Authorization / ACLD servers

2. Corruption of LDAP data

3. Accidental fat-finger mistake as root on one of the components, removing critical files

4. Corruption of the ITIM data files

5. Corruption of DB2 server

6. Crash of DB2 primary server

7. Crashing WAS / corruption of security.xml file

8. Physical crash of any hard disk

9. Natural calamity (like fire / floods) rendering the entire data center useless

In all of the above situations, HA and HADR must be able to ensure business continuity with little or no impact. Once we have come through such a situation and patted ourselves on the back for surviving it, we face a harder problem, one that becomes more critical as the data and customization grow: recovering from the disaster is a bigger challenge than HA itself. Several parts need attention, and the two most critical are DB2 and LDAP. DB2 backups should be taken more regularly and pushed outside the data center, and the same goes for LDAP backups. The LDAP backups should be extensive and should be coupled with the V3.schema file. Other files that need to be backed up are listed in the installation guide.

Once a few servers or a data center are lost, we have to make some critical decisions. Which will be the primary data center? Which will be the primary server? Will the secondary / backup servers take over the responsibility of the primary servers, or does one intend to build out a new primary server? This is an interesting choice, and you will want to base your decision on the following points:

1. Was the secondary server on par with the primary server?

2. Did the failover server have restricted features compared with the primary server?

3. Is the performance of the secondary server lower than that of the primary?

4. Will the newly built server have a better configuration?

5. How long will it take to build the new servers from scratch (assuming you have configuration backups for all the servers) so that you can be back in business quickly?

Assuming you gather all the information and make the decisions in time, next comes the implementation. As I always say, the backup process should be bottom up. Maintain a copy of all the servers if possible, such as a snapshot of the initial configuration. Once the snapshot is taken, it acts as the baseline for the future. The entire LDAP should be handled very carefully, considering it holds all the information you will need later. One important thing while taking the LDAP backup is to make sure the passwords are hashed, the backup is sealed, and the result is then encrypted. The encryption key should be kept private and out of anyone else's hands to ensure the security of the data. This key will be needed if you have to rebuild a primary server identical to the one that crashed. Once you have the LDAP schema, the keys and the LDAP entries, make sure you back up all domains as well. For the TAM LDAP, make sure you have backed up secauthority=default and o=company, dc=com as well. All the workflows are saved up to this point, but all the customized Java code still resides in the data folder and should be backed up after each release. Data continuity can be maintained from the secondary working server as well. Make sure that the new server is cryptographically in sync.
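The baseline snapshot idea can be sketched with standard tools. This is only a minimal sketch, assuming a Linux host with tar and sha256sum; the function names and any paths you pass in are hypothetical, and the encryption step is deliberately left to whatever tool your site has approved.

```shell
#!/bin/sh
# Sketch only: archive a set of directories as a baseline snapshot and
# record a checksum so the copy can be verified after it is moved off-site.
# The directories to archive are the caller's choice (assumption).

# snapshot_baseline OUT_DIR LABEL DIR...
# Writes OUT_DIR/LABEL.tar.gz plus OUT_DIR/LABEL.tar.gz.sha256 and prints
# the archive path.
snapshot_baseline() {
    snap_out=$1; snap_label=$2; shift 2
    snap_archive="$snap_out/$snap_label.tar.gz"
    tar -czf "$snap_archive" "$@" || return 1
    # Record the checksum with a relative file name so it verifies anywhere.
    ( cd "$snap_out" && sha256sum "$snap_label.tar.gz" > "$snap_label.tar.gz.sha256" ) || return 1
    echo "$snap_archive"
}

# verify_baseline ARCHIVE
# Returns 0 when the archive still matches its recorded checksum.
verify_baseline() {
    ( cd "$(dirname "$1")" && sha256sum -c "$(basename "$1").sha256" >/dev/null 2>&1 )
}
```

Running verify_baseline on the copy that lands outside the data center is a cheap way to confirm the baseline survived the transfer before you ever need it.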

Another thing you want to save and keep a copy of is the keystores and pass stores. These will be useful in at least one of the cases I am going to describe. The certificates will be used for two-way SSL. You can use the certificates from the secondary server, but one should make a detailed list of all the servers with their SSL locations, where all the servers are defined, and where all the clients are installed. A rebuilt policy server comes with its own default certificates, which will need to be replaced with the new ones, and you will need to run bassslcfg to establish the new connection. Make sure svrsslcfg and pdjrte are configured correctly the first time. If the servers were running as non-root and you need to rebuild them, make sure to back up the startup scripts as well. It is a small thing, but it will help once your system is up and roaring.
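The "detailed list" of servers, SSL locations and clients can be as simple as a text file that a script can query. The sketch below assumes a pipe-delimited format I made up for illustration (host, component, keystore path, client hosts); the function names and sample hosts are hypothetical too.

```shell
#!/bin/sh
# Hypothetical inventory format, one line per keystore:
#   host|component|keystore path|comma-separated client hosts
# Lines starting with # are comments.

# keystores_on_host FILE HOST
# Prints "component keystore" for every entry recorded for HOST.
keystores_on_host() {
    awk -F'|' -v host="$2" '!/^#/ && $1 == host { print $2, $3 }' "$1"
}

# servers_with_client FILE CLIENT
# Prints the hosts whose client list names CLIENT, i.e. the servers that
# need attention when CLIENT's certificate changes.
servers_with_client() {
    awk -F'|' -v client="$2" '!/^#/ {
        n = split($4, c, ",")
        for (i = 1; i <= n; i++) if (c[i] == client) { print $1; break }
    }' "$1"
}
```

Keeping this file with the backups means that, after a rebuild, you can ask which servers trust a given client instead of reconstructing the answer from memory.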

Tuesday, June 19, 2012

Re-Configuring TDI pdjrte with Policy server

One of the most painful things is having pdjrte installed incorrectly the first time. If everything is installed correctly and as needed at the first attempt, the journey is sweet. If not, or if you have to make any change, it is easier to unconfigure the system and then configure it again. Trying to fix one component, then a second and so on will cost you mission-critical time. To configure pdjrte against an already configured policy server, run:
[root@tdiserver sbin]# ./pdjrtecfg -action status
HPDBF0031E This Java Runtime Environment has already been configured.
[root@tdiserver sbin]# ./pdjrtecfg -action name
Access Manager Runtime for Java
[root@tdiserver sbin]# ./pdjrtecfg -action unconfig
Unconfiguration of:
Access Manager Runtime for Java
jvmHome: //V7.0/jvm/jre
is in progress. This might take several minutes.

Unconfiguration of: Access Manager Runtime for Java
completed successfully.

[root@tdiserver sbin]# ./pdjrtecfg -action config -interactive
Specify the full path of the Java Runtime Environment (JRE)
to configure for Tivoli Access Manager [//V7.0/jvm/jre]://V7.0/jvm/jre/

Enter 'full' or 'standalone' for the configuration type [full]:

Policy server host name [tdiserver]: policyserverfullname.company.com

Tivoli Access Manager policy server port number [7135]:

Enter the Access Manager policy server domain [null]:

The domain contains invalid characters.

Enter the Access Manager policy server domain [null]: null

Tivoli Common Directory logging is currently configured.
You may enable this application to use Tivoli Common Directory logging
using the currently configured directory for log files.

Do you want to use Tivoli Common Directory logging (y/n) [n]? y

Log files for this application will be created
in directory: /var/ibm/tivoli/common

Configuration of Access Manager Runtime for Java is in progress.
This might take several minutes.
Configuration of Access Manager Runtime for Java completed successfully.
[root@tdiserver sbin]#
---
This completes the pdjrte re-configuration. Doing it as one clean unconfigure-and-configure pass prevents mishaps and makes sure it works correctly.
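The decision in the transcript above (unconfigure first only if the runtime is already configured) can be sketched as a small filter over the status output. This is an illustration, not part of the product: the function names are mine, and treating the absence of message HPDBF0031E as "not configured" is an assumption.

```shell
#!/bin/sh
# jrte_already_configured
# Reads `pdjrtecfg -action status` output on stdin and succeeds if it
# contains message HPDBF0031E (seen in the transcript when the JRE is
# already configured).
jrte_already_configured() {
    grep -q 'HPDBF0031E'
}

# reconfigure_plan
# Reads the same status text and prints the pdjrtecfg actions to run,
# one per line. Nothing is executed here.
reconfigure_plan() {
    if jrte_already_configured; then
        echo "unconfig"
    fi
    echo "config"
}
```

A typical use would be `./pdjrtecfg -action status 2>&1 | reconfigure_plan`, then running the printed actions by hand so that the interactive prompts are still answered by a person.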

Monday, June 11, 2012

Adding a Chain Certificate to WebSEAL

How to add a certificate to the WebSEAL server is clearly described in the WebSEAL administration guide. The big challenge I faced was adding a chain certificate. I had no clue what the process was or how I should go about it.
To start with, chain certificates are no different from regular certificates; there are just more of them and they follow a particular pattern. There is a root certificate and a signer one. There can also be an intermediate signer, or more levels, as security demands.
The root certificate goes in first, then the intermediate signer, and then the signer. Before you start adding the certificates, make sure you have the following:
1. All root and signer certs
2. Location of the keystore and truststore, with the passwords for both
3. Permission to run the commands and add certificates to the server
4. The location of the WebSEAL pdsrv.kdb file

Many people use iKeyman or the gsk7cmd command-line tool; you can use either. Most people like to export the display and add the certificates through the GUI, but my system was responding too slowly over the display, so I used the command-line interface (my first choice anyway).

Adding chain certificates to the WebSEAL server:
Exact commands as run on the server.
--------------------
[root@servername /]# export JAVA_HOME=/opt/IBM/WebSphere/AppServer/java/jre/
[root@servername /]# export PATH=$JAVA_HOME/bin:$PATH
[root@servername /]# gsk7cmd -cert -list -db /var/pdweb/www-default/certs/pdsrv.kdb -pw
Certificates in database /var/pdweb/www-default/certs/pdsrv.kdb:
   WebSEAL-Test-Only
   RSA Secure Server Certification Authority
   Thawte Personal Basic CA
   Thawte Personal Freemail CA
   Thawte Personal Premium CA
   Thawte Premium Server CA
   Thawte Server CA
   Verisign Class 1 Public Primary Certification Authority
[root@servername /]# gsk7cmd -cert -add -db /var/pdweb/www-default/certs/pdsrv.kdb -pw -file /tmp/RootCertificate.cer -label WEbsealRootCertificate
[root@servername /]# gsk7cmd -cert -add -db /var/pdweb/www-default/certs/pdsrv.kdb -pw -file /tmp/Root/CertificateFile.cer -label websealChaincertificate
--------
One of the most critical commands here is the one that lists the existing certificates. This step lets you confirm that you are hitting the right keystore, have the correct credentials, and have the permissions needed for the operation.

This is specific to WebSEAL, but the same steps can be repeated for Tivoli Identity Manager, WebSphere Application Server, or any other server.
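Two parts of this procedure lend themselves to a small helper: checking the listing for a label before adding it, and keeping the root, intermediate, signer order straight. The sketch below is an assumption-laden illustration: the function names are mine, KDB_PASSWORD is a placeholder for the password the transcript omits, and the second function only prints commands built from the flags shown above; it runs nothing.

```shell
#!/bin/sh
# label_in_listing LABEL
# Reads `gsk7cmd -cert -list` output on stdin and succeeds if some line,
# ignoring surrounding whitespace, is exactly LABEL.
label_in_listing() {
    awk -v want="$1" '
        { line = $0; gsub(/^[ \t]+|[ \t]+$/, "", line) }
        line == want { found = 1 }
        END { exit (found ? 0 : 1) }'
}

# chain_add_commands DB FILE:LABEL...
# Prints one gsk7cmd add command per FILE:LABEL argument, in the order
# given, so pass the root first, then any intermediate, then the signer.
chain_add_commands() {
    chain_db=$1; shift
    for chain_pair in "$@"; do
        chain_file=${chain_pair%%:*}
        chain_label=${chain_pair#*:}
        printf 'gsk7cmd -cert -add -db %s -pw "$KDB_PASSWORD" -file %s -label %s\n' \
            "$chain_db" "$chain_file" "$chain_label"
    done
}
```

Printing the commands first and reviewing them before pasting is a deliberate choice here: a keystore is not something you want a script to modify unattended.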

Saturday, June 9, 2012

HA - Reports and Log monitoring

One of the most critical parts of HA is reporting. Reports are possibly the most undervalued yet most powerful tool, and they can be run from various parts of the system. Equally important, if not more so, are the logs of the various ITIM / TAM components that make up the system; they should be monitored very closely. Logs and reports generally provide a heads-up and a system health check, helping you spot possible memory leaks, performance issues, or a malicious attack on the server or the system. When we talk about making the system HA with no downtime, or only scheduled, pre-determined downtime, we need to understand that we still have to bring down each component of the system one by one, repair it (fixes / maintenance), and only then take down the other server.
What is the best time for maintenance?
If the majority of the user base is corporate, the weekend might be a good option, but this is where the reports come into play. Monitoring the request logs in WebSEAL can help you estimate the peak times and the traffic volume, which is essential for determining the maintenance window.
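Estimating the peak hour from a request log can be sketched in a few lines of awk. This assumes the log is in NCSA common log format, where the fourth whitespace-separated field looks like [11/Jun/2012:14:03:22; check your own request log format before relying on it. The function name is hypothetical.

```shell
#!/bin/sh
# requests_per_hour FILE
# Prints "HH count" lines, busiest hour first. Assumes NCSA common log
# format (assumption), so the hour is the second ':'-separated part of
# field 4.
requests_per_hour() {
    awk '{
        split($4, t, ":")              # t[2] holds the hour, e.g. "14"
        if (t[2] != "") count[t[2]]++
    }
    END { for (h in count) print h, count[h] }' "$1" | sort -k2,2nr -k1,1
}
```

The hours at the bottom of the output are the natural candidates for a maintenance window, provided the log covers a representative stretch of days.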
Reports: On Linux, reports can be built directly from the server logs in the /var/log directory, or, if the budget allows, we can install one of the major reporting tools. Tivoli Monitoring (ITCAM), Prognosis, Wiley and iiAgent are a few popular ones you might consider. ITCAM is my personal choice for its ease of configuration.
If you have installed ITIM, reports can be generated from the console home page itself (provided your admin has given you access to the reports). If you are an admin, navigate to Reports --> Data Synchronization, click Run Synchronization Now, and then refresh the synchronization status. Once the sync is done, you can generate the reports. Make sure you produce the orphan reports before anything else.
Another important aspect is WebSEAL log monitoring and LDAP monitoring (for both ITIM and TAM).
The WebSEAL log is divided into request logs, agent logs and referrer logs. Together these logs are sufficient to pinpoint an intrusion. Regular study of the logs is required to keep the system healthy. LDAP logs help keep the entries intact by making sure errors do not go unnoticed.
DB2 logs can be monitored using the TDS web interface. This is not essential on a day-to-day basis, but one should look at them frequently, just to make sure all the processes are running fine.
In the next article I will write up the technical details: where the logs are, and the common things inside them one should be looking for.

Sunday, May 20, 2012

HA - Illusion or realization

So let's say this new hyper-phobia of making the system HA catches you. You go and hire a few professionals to make the system HA; how do you test whether your system is truly HA or not? Well, let's get it straight the first time: IBM TIM is made up of Java classes and has an LDAP server, a DB2, a directory server and TDI in its simplest installation. You would be adding the TAM combo adapter on top, which means WebSEAL, a policy server and an authorization server as well. When you talk about HA, you want a backup of most of these components.
A few of these are mission critical: if they fail, the entire system stops functioning. Identifying them depends purely on the business logic and how the system is used. If your system runs the recon once daily / weekly and most of the transactions happen on the ITIM server itself, you might want to replicate your LDAP / Directory Server and DB2 (HADR), while TDI can be taken care of later. One has to understand that the team can, of course, go ahead and make all the components HA. If you have a huge number of requests coming from WebSEAL that need to be authorized using ITIM, then you will need to make TDI HA. Similarly, multiple WebSEALs might be needed to handle heavy traffic, but only one policy and authorization server can be up at a time. If your ACLs don't change daily, this piece does not need to be HA. But if you have a TAI++ configuration set up, you can't afford for this piece to go down either. Regardless, let's say you have an HA setup with two LDAP (master-master), two directory servers, two DB2 (HADR), and two TDI servers as well. How would we start testing? What is the most basic test one can perform?
The first and most basic test is, if you can, to turn off the primary servers one component at a time while you keep running transactions. By turning off one of the servers the hard way, you find out whether the system is truly HA. Once this step is complete, move on to alternately shutting down services and restarting them while running the tests and watching the logs.
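"Keep running transactions" while a component is switched off can be sketched as a probe loop that counts successes and failures. This is a minimal sketch: probe_loop is a name I made up, and the command it runs is whatever site-specific transaction you already trust (a login, a search, a scripted request).

```shell
#!/bin/sh
# probe_loop COUNT DELAY CMD [ARG...]
# Runs CMD COUNT times, DELAY seconds apart, discards its output, and
# prints a summary line "ok=N failed=M".
probe_loop() {
    probe_total=$1; probe_delay=$2; shift 2
    probe_ok=0; probe_failed=0; probe_i=0
    while [ "$probe_i" -lt "$probe_total" ]; do
        if "$@" >/dev/null 2>&1; then
            probe_ok=$((probe_ok + 1))
        else
            probe_failed=$((probe_failed + 1))
        fi
        probe_i=$((probe_i + 1))
        # Pause between probes, but not after the last one.
        if [ "$probe_i" -lt "$probe_total" ]; then
            sleep "$probe_delay"
        fi
    done
    echo "ok=$probe_ok failed=$probe_failed"
}
```

For example, `probe_loop 60 5 ./one_transaction.sh` (a hypothetical script) started just before the primary is powered off gives five minutes of evidence; a truly HA setup should report few or no failures across the switch.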
One of the most important things while testing HA is to watch the logs; they help you determine whether each request is routed to the correct side. The LDAP should essentially be master-master, and the more entries there are, the more LDAP servers there should be. Regular backup of the LDAP servers is essential and will help in troubleshooting any unforeseen activity.
Getting the system to HA is not impossible on paper, but one should understand that ITIM is essentially an EAR file deployed on an application server, and it will have memory leaks. Any custom code adds to the complexity, and poorly written code makes it worse. Additionally, DB2, sitting as the backbone, should be properly configured, and the steps in the Performance Tuning Guide should be strictly followed.
I will try to publish the basic HA steps as well. The thing to understand is that HA can be achieved and sustained, but it comes with hefty hardware costs, and even an HA system can still fail. There are too many pieces that can cause it to stop functioning.

Tuesday, April 24, 2012

HA

One of the key concepts that has come up in recent times is high availability: the desire to have a system that never goes down. There are bigger risks and greater costs associated with it. One major consideration people miss is the infrastructure. An organization building an infrastructure for very high availability needs certain things in place:

1. Highly skilled admins (Linux / Windows / Solaris / AIX)

2. High SLA for uptime of the servers

3. High SLA for problem resolution

4. High disk space for any core dumps that might be created by an application error

5. High performance and a backup system for applying fix packs

One of the major reasons HA is sometimes not pursued by organizations, or does not succeed, is the technology sitting beneath the stack. Let's say I want ITIM (Tivoli Identity Manager) to be HA. Besides making all of its components HA, like LDAP, TDS and DB2, I should also focus on the WAS or WebLogic server on which the application is deployed. A broken WAS will result in a breach of the SLA or of HA. Then, Tivoli Identity Manager is full of Java classes, which have memory leaks. The trouble increases when there is custom code in workflows or person accounts that uses Java statements and prints output to the system console. Poorly tuned application code can be one of the biggest challenges in HA.

Once these challenges are faced, we have to make sure that each component in the application is performance tuned for the work it is required to do. Tuning essentially means the “best setting for this system” and not the best setting in general. A 32-bit machine should not have a JVM of 4096 MB, regardless.

Once all these steps are performed, we need to understand how things will shape up in the real world. Here comes the biggest problem: the data. We have three challenges here: data integrity, data warehouse, and data store. The speed at which data can be accessed, modified and used should not come at the cost of data security or HA, and vice versa.

Let us see in the coming days how HA, data security and data access should be tackled.

Wednesday, March 14, 2012

Server List in WebSEAL

Log in to the WebSEAL machine:
pdadmin> login
Enter user name:
Enter password :
-----
The first thing I recommend is to list the servers available in your WebSEAL environment:
pdadmin sec_master> server list

You will get a list of servers. Find the entry that looks like:
default-webseald-server01
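Picking the WebSEAL entries out of a long server list can be sketched as a simple filter. This assumes WebSEAL instances are the entries containing "-webseald-", as in the example above; the function name is hypothetical and it only reads text you feed it, for instance the saved output of the server list command.

```shell
#!/bin/sh
# webseal_servers
# Reads `server list` output on stdin and prints the entries containing
# "-webseald-", one per line, with surrounding whitespace removed.
webseal_servers() {
    awk '/-webseald-/ { gsub(/^[ \t]+|[ \t]+$/, ""); print }'
}
```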