SiteMinder Pro-Tip: Troubleshooting Poor Authorization Performance – Part 1

One of the most difficult things to do with SiteMinder is determine the root cause of poor authorization (Az) performance. In this post, we enumerate the direct contributors to poor Az performance and provide a few tips that can assist you during troubleshooting. A second article will provide details about the indirect contributors.

Direct Contributors

These areas directly impact the performance of user authorizations.

User Store Performance

The performance of the user store can greatly impact the performance of the SiteMinder infrastructure. If the user store is slow to respond to requests it will impact the number of authentications and authorizations SiteMinder is able to process in a given amount of time.

It’s important to understand the max throughput that the user store is capable of handling, as well as the max number of connections the user store can support. For database user stores, the DB administrator should be able to answer the following questions:

  • How many user lookup queries can the database perform per minute?
  • How many connections can the userid used by SiteMinder open to the database simultaneously?

Analyze User Store Response Time

To determine how many connections are established between the policy server and the user store, and how quickly the user store responds, do the following:

  • Use netstat and grep for the user store ports that are ESTABLISHED
  • Compare the results to the average number of connections
  • Evaluate User Store Response — get any performance metrics from the user store team
  • Execute a traceroute from each of the policy servers to the user store and record results
  • Execute a ping from each of the policy servers to the user store and record results
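The netstat step above can be sketched in Python when you want a per-host count rather than a raw line count. This is a minimal sketch, not a SiteMinder utility: the function name, the sample addresses, and the use of port 389 (a typical LDAP port) are our assumptions, and it expects Linux-style `netstat -an` output that has been captured as text.

```python
from collections import Counter


def established_by_remote(netstat_text, port):
    """Count ESTABLISHED TCP connections per remote host for one remote port."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Linux TCP rows: proto, recv-q, send-q, local address, remote address, state
        if len(fields) < 6 or not fields[0].lower().startswith("tcp"):
            continue
        remote, state = fields[4], fields[5]
        host, _, remote_port = remote.rpartition(":")
        if state == "ESTABLISHED" and remote_port == str(port):
            counts[host] += 1
    return counts


# Hypothetical capture from a policy server; all addresses are made up.
sample = """\
tcp        0      0 10.0.0.5:41022      10.0.0.20:389       ESTABLISHED
tcp        0      0 10.0.0.5:41023      10.0.0.20:389       ESTABLISHED
tcp        0      0 10.0.0.5:41030      10.0.0.21:389       ESTABLISHED
tcp        0      0 10.0.0.5:41031      10.0.0.20:389       TIME_WAIT
tcp        0      0 10.0.0.5:44443      10.0.0.30:3890      ESTABLISHED
"""

for host, total in sorted(established_by_remote(sample, 389).items()):
    print(host, total)
```

Matching on the exact remote port keeps a connection to port 3890 from being counted as a connection to port 389, which a plain substring match would do.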

Side Note: When we ask customers what the normal response time is for their environment, the majority do not know. When the response times are available, SIS leverages synthetic transaction-based monitoring to collect the metrics needed for comparison.

Network Connections

An increase in network connections for the web server, policy server, or user store can greatly impact authorization performance. To gauge the possible impact, conduct an audit of existing network connections. The following command can be used on Linux to summarize the ESTABLISHED connections:

netstat -apn | grep ESTABLISHED | awk '{ print $7 }' | sort | uniq -c | sort -nr

Sample Output (LDAP Server): (# Connections, Process ID/program)

10 905/ns-slapd 
7 1132/dxserver 
4 904/ns-slapd 
4 1133/dxserver 
2 1556/NetworkWatcher 
1 25135/sshd: 
1 1037/java

This displays the number of connections, their associated process ID, and the name of the application utilizing the connection. This type of information can be extremely useful when troubleshooting a problem, but you must have the historical data to provide a complete picture.
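To illustrate why the historical data matters, here is a minimal Python sketch that compares a current connection summary with a saved baseline. The "current" figures are the sample output above. The baseline values, the function names, and the 1.5x threshold are hypothetical and should be replaced with numbers from your own environment.

```python
from collections import Counter


def connections_per_program(summary_text):
    """Total the 'count pid/program' lines produced by the netstat summary."""
    totals = Counter()
    for line in summary_text.splitlines():
        parts = line.split()
        # Skip blank lines and rows where netstat could not show a program
        if len(parts) < 2 or not parts[0].isdigit() or "/" not in parts[1]:
            continue
        program = parts[1].split("/", 1)[1].rstrip(":")
        totals[program] += int(parts[0])
    return totals


def flag_increases(baseline, current, factor=1.5):
    """Return programs whose count exceeds factor times the baseline count."""
    return {
        program: (baseline.get(program, 0), count)
        for program, count in current.items()
        if count > factor * baseline.get(program, 0)
    }


current_summary = """\
10 905/ns-slapd
7 1132/dxserver
4 904/ns-slapd
4 1133/dxserver
2 1556/NetworkWatcher
1 25135/sshd:
1 1037/java
"""

# Hypothetical baseline recorded during a normal period.
baseline = {"ns-slapd": 6, "dxserver": 10, "NetworkWatcher": 2, "sshd": 1, "java": 1}

current = connections_per_program(current_summary)
print(flag_increases(baseline, current))
```

With these made-up baseline numbers, only ns-slapd is flagged, because its 14 current connections exceed 1.5 times the baseline of 6.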

A server with too many connections in a TIME_WAIT state will cause poor performance for applications waiting for new connections to become available from the operating system. As such, it is important to determine how many connections are in this state. The following commands will provide a summary of the connections that are in this state.

Windows: netstat -an | find "TIME_WAIT"

Linux: netstat -an | grep TIME_WAIT
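The commands above list the matching lines. If a count per state is more useful, the tally can be sketched in Python as follows. This is an illustration only: the function name and sample rows are ours, and it assumes plain `netstat -an` output (no PID column), where the state is the last field of each TCP row on both Linux and Windows.

```python
from collections import Counter


def tcp_state_counts(netstat_text):
    """Tally TCP connections by state from captured `netstat -an` output."""
    states = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # TCP rows have 6 fields on Linux and 4 on Windows; state is last in both
        if len(fields) >= 4 and fields[0].lower().startswith("tcp"):
            states[fields[-1]] += 1
    return states


# Hypothetical capture; the fifth row uses the Windows column layout.
sample = """\
tcp        0      0 10.0.0.5:443        10.0.0.9:51234      TIME_WAIT
tcp        0      0 10.0.0.5:443        10.0.0.9:51240      TIME_WAIT
tcp        0      0 10.0.0.5:443        10.0.0.8:50211      ESTABLISHED
tcp        0      0 0.0.0.0:22          0.0.0.0:*           LISTEN
  TCP    10.0.0.7:443     10.0.0.9:51300     TIME_WAIT
udp        0      0 0.0.0.0:68          0.0.0.0:*
"""

for state, total in tcp_state_counts(sample).most_common():
    print(state, total)
```

Recording this tally at regular intervals gives you the trend in TIME_WAIT connections rather than a single snapshot.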

The techniques outlined above should be used for all the servers that comprise the SiteMinder infrastructure. Looking at all servers in the environment will give you the best possible understanding of where bottlenecks may exist.

Cache Thrashing (Session/Resource)

When the resource or session cache is too small, the web agent can become overburdened with managing the cache. The most recently used resources remain in the cache, so if the number of protected resources far exceeds the resource cache size, the web agent will discard the least used resource in the cache to make room. For a severely undersized cache, the resource that was discarded may be only a few minutes old and, as a result, may be accessed again by active users. This causes the resource to be added back to the cache and another resource to be dumped, thus causing cache “thrashing”.
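The effect can be demonstrated with a small simulation. The sketch below models a generic least-recently-used cache in Python. It is not the web agent's actual implementation, and the resource names, cache sizes, and round-robin request pattern are assumptions chosen to show the worst case.

```python
from collections import OrderedDict


def lru_hit_rate(requests, cache_size):
    """Replay requests through a simple LRU cache and return the hit ratio."""
    cache = OrderedDict()
    hits = 0
    for resource in requests:
        if resource in cache:
            hits += 1
            cache.move_to_end(resource)      # mark as most recently used
        else:
            cache[resource] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)    # evict the least recently used entry
    return hits / len(requests) if requests else 0.0


# 50 distinct (hypothetical) resources requested round-robin, 100 times over.
requests = [f"/app/page{i}" for i in range(50)] * 100

undersized = lru_hit_rate(requests, cache_size=40)
right_sized = lru_hit_rate(requests, cache_size=50)
print(f"cache of 40 entries: {undersized:.2%} hits")
print(f"cache of 50 entries: {right_sized:.2%} hits")
```

In this deliberately harsh pattern the 40-entry cache never scores a hit, because each resource is evicted just before it is requested again, while the 50-entry cache misses only on the first pass. Real traffic is less uniform, so the drop will usually be less extreme, but the mechanism is the same.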

In order to avoid cache thrashing, we need to determine if the resource or session caches are sized appropriately. To accomplish this, you need the following information:

  • Resource cache size
  • Resource cache utilization
  • Session cache size
  • Session cache utilization

It is important to make sure that the cache is not too small or too large. If the cache is too large, it can degrade the performance of the web server because the web agent will pre-allocate web server memory for the session and resource cache. If not properly calculated, this could result in a resource constraint.

User Load

It is important to establish the average user load before a problem occurs, so that the current load can be compared with the past or expected load.

Quantify User Load

Monitoring the session cache utilization will provide a rough estimate for the current user load, but it is not as accurate as collecting the information directly from the web servers.

Open Web Server Connections

We recommend using the netstat command to collect the number of active web server connections. The commands below can assist in this task:

Windows:  netstat -an | find ":443" | find /C "ESTABLISHED"
Linux:    netstat -an | grep ":443" | grep -c "ESTABLISHED"

Another key metric is the number of connections that have a TIME_WAIT status, as the quantity of these could indicate that the web server is waiting for the operating system to make additional connections available.

Windows:  netstat -an | find ":443" | find /C "TIME_WAIT"
Linux:    netstat -an | grep ":443" | grep -c "TIME_WAIT"
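One caveat with the substring match is that ":443" also matches ports such as 4430 and outbound connections to a remote port 443. If that distinction matters, a stricter count can be sketched in Python. The function name and sample rows are hypothetical, and the sketch assumes plain `netstat -an` output in which the state is the last field.

```python
def count_local_port(netstat_text, port, state):
    """Count TCP rows whose local address ends in :port and whose state matches."""
    suffix = f":{port}"
    total = 0
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].lower().startswith("tcp"):
            continue
        # Local address is the 4th field on Linux rows and the 2nd on Windows rows
        local = fields[3] if len(fields) >= 6 else fields[1]
        if local.endswith(suffix) and fields[-1] == state:
            total += 1
    return total


# Hypothetical capture from a web server at 10.0.0.5.
sample = """\
tcp        0      0 10.0.0.5:443        10.0.0.9:51234      ESTABLISHED
tcp        0      0 10.0.0.5:443        10.0.0.8:50211      TIME_WAIT
tcp        0      0 10.0.0.5:4430       10.0.0.8:50212      ESTABLISHED
tcp        0      0 10.0.0.5:51000      10.0.0.20:443       ESTABLISHED
"""

print("ESTABLISHED:", count_local_port(sample, 443, "ESTABLISHED"))
print("TIME_WAIT:", count_local_port(sample, 443, "TIME_WAIT"))
```

On this sample, the substring approach would report three ESTABLISHED connections, while the stricter match reports the single inbound connection on port 443.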

Wrap Up

Our next post will provide an overview of the indirect contributors to poor authorization performance.