Friday, September 22, 2006

All about the Cache

Here a thing that will makeyour reports run faster!


Introduction to Cache

Decision support queries sometimes require large amounts of database processing. If you reduce the amount of database querying then you can speed up the time to produce reports. In an Analytics system the way to reduce database accesses is to create a ‘Cache’ of data on the same machine as the Analytics engine. The Siebel Analytics Server can save the results of a query in cache files and then reuse those results later when a similar query is requested. We refer to this as ‘query cache’. Using query cache, the cost of database processing only needs to be paid once for a query, not every time the query is run.

The query cache allows the Analytics Server to satisfy many subsequent query requests without having to access the back-end databases. This reduction in communication costs can dramatically decrease query response time.

However, as updates occur on the back-end databases (due to an ETL run), the query cache entries do not reflect the latest data in the warehouse; they become ‘stale’. Therefore, Siebel Analytics administrators need either to enable a method of updating the cash on a rolling basis, or purge it of old data and refresh it to a set schedule.

Caching may be considered to have small cost in terms of disk space to store the cache, and a small number of I/O transactions on the server but the this should be easily outweighed by the improvement in query response times. Caching does require some on going management in terms of limiting the cache size and ensuring the data is refreshed, but both these tasks can be controlled automatically.

Not all queries are suitable for cache use. A simple list of all accounts would not be a good dataset to cache as the cache file would be too large.

The Plan

1. Start with Cache enabled
Caching is proven to be generally beneficial so there is nothing to be gained from not enabling some form of caching from the outset. (exception noted above)
2. Develop a cache updating method.
Analytics has various methods to manage the cache generation and deletion.
3. Monitor query code to spot potential for improvement to the cache.
Check for slow running queries on an ongoing basis.


Cache Testing

It is prudent to test your reports to ensure that they do benefit from the cached data.

1. Test the response times on queries using cache
2. Test the response times on queries not using cache



Cache Methods

This section describes the available methods of ensuring the data in the cache is kept current.
For the cache to be effective as a means or of performance enhancement it needs to be populated with relevant data quickly, relying on this data being built up as a result of users queries is not practical where the data will be updated frequently. To speed population a process known as seeding can be implemented, this involves running carefully designed queries to populate the cache with the required data. This can be done in a number of ways but the most efficient is to build this functionality into the ETL process.
One of the costs, or disadvantages of caching is the potential for data latency, or a ‘stale’ cache, this occurs when the data in the cache is not purged after the data warehouse has been updated, there are several options available to deal with this.


Possible Methods
a. No Cache
b. Polling table
c. Caching enabled at table level
d. Manual Cache Management

a. No Cache - If no cache is used then every request for data will generate a new SQL query that will be applied to the SRMW, this increase network traffic, hugely increases the demands on the server and affects productivity. The speed at which the results are returned is governed by the speed of the database and the network, and the ability of the Analytics server to compute the parameters of the query.

b. Polling table - Data can be refreshed using an Event Table. This is a table in the database which is populated with an entry recording the details of a table when that table is updated by the ETL process, the Analytics Server polls the table and purges data from the cache if a table has been updated. This is useful where incremental ETL processes are run during the day, for instance to update sales data, the frequency with which the Analytics Server checks the polling table can be set to coincide with that of the ETL so data from the more frequently updated tables is purged from the cache more often to avoid out of date results to queries. Where incremental ETL is run once a day or overnight for instance, this approach is arguably less beneficial. The frequency with which the Analytics Server polls the event table is set in the Analytics Administration tool, Tools>Utilities>Siebel Event Tables. N.B The parameters for the Event table contain table names only and cannot contain an alias, when the data for a table is purged data from an alias of that table is not, this can lead to misleading results and an alterative purging strategy must be found for the alias.

c. Table Level - In it’s simplest form, caching can be enabled table by table within the data warehouse, by default all tables have caching enabled and the persistence time, the time the data is left in the cache, is infinite. To achieve the best combination of performance improvement whilst limiting disk space used by the cache, tables that are rarely queried can be deselected from this process, performance can be further enhanced by altering the persistence time to coincide with incremental ETL processes as with method b. The two methods can also be used in conjunction with each other.
It is still important to purge the cache otherwise you can be querying yesterday’s data, even though an ETL was run overnight.

d. Manual Management – A cache manager is available from within the repository when you are connected in online mode. To access the Cache Manager select Tools\ Manage\ Cache. Note that this option is greyed out unless you have enabled caching in the NQSConfig.ini file. Manual cache management involves the purging of cache physically by a user. It is not suitable for day to day operations, but can be handy during testing.


Purging Options

Invoking ODBC Extension Functions
The following ODBC functions affect cache entries associated with the repository specified by the ODBC connection. You can call these functions using the nqcmd.exe command-line executable.

The syntax of the call will be as follows:
nqcmd -d "Analytics Web" –u administrator –p sadmin –s purge.txt
Where purge.txt contains the call (for example, call SAPurgeAllCache()).

SAPurgeCacheByQuery. Purges a cache entry that exactly matches a specified query.
The following call programmatically purges the cache entry associated with this query:
Call SAPurgeCacheByQuery(‘select lastname, firstname from employee where salary >
100000’ );

SAPurgeCacheByTable. Purges all cache entries associated with a specified physical table
name (fully qualified) for the repository to which the client has connected.
This function takes up to four parameters representing the four components (database, catalog,
schema and table name proper) of a fully qualified physical table name. For example, you might have a table with the fully qualified name of DBName.CatName.SchName.TabName. To purge the cache entries associated with this table in the physical layer of the Siebel Analytics repository, execute the following call in a script:
Call SAPurgeCacheByTable( ‘DBName’, ‘CatName’, ‘SchName’, ‘TabName’ );
Wild cards are not supported by the Siebel Analytics Server for this function. Additionally,
DBName and TabName cannot be null. If either one is null, you will receive an error message.

SAPurgeAllCache. Purges all cache entries. The following is an example of this call:
Call SAPurgeAllCache();

SAPurgeCacheByDatabase. Purges all cache entries associated with a specific physical
database name. A record is returned as a result of calling any of the ODBC procedures to purge
the cache. This function takes one parameter that represents the physical database name and
the parameter cannot be null. The following is an example of this call:
Call SAPurgeCacheByDatabase( ‘DBName’ );

Siebel Analytics Scheduler
The SA Scheduler can be used for running general purpose scripts that extend the functionality of Siebel Analytics.
The script purgeSASCache is used to periodically purge all of the cache from the Siebel Analytics Server:
/////////////////////////////////////////////////////////
// purgeSASCache.js
//
// Purges the cache on SAS.
// Parameter(0) - The user name to pass in to NQCMD.
// Parameter(1) - The password for the aforementioned user.
/////////////////////////////////////////////////////////
// The full path to nqcmd.exe
var nqCmd = "D:\\SiebelAnalytics\\Bin\\nqcmd.exe";
// The data source name
var dsn = "Analytics Web";
// The user to execute the queries
var user = Parameter(0);
// The password of the aforementioned user
var pswd = Parameter(1);
// The ODBC procedure call for purging the cache
var sqlStatement = "{call SAPurgeAllCache()};";
//////////////////////////////////////////////////////////
// Returns a string from the file name
//////////////////////////////////////////////////////////
function GetOutput(fso, fileName)
{
var outStream = fso.OpenTextFile(fileName, 1);
var output = outStream.ReadAll();

outStream.Close();
return output;
}
//////////////////////////////////////////////////////////
// Get WshShell object and run nqCmd. Capture the output
// so that we can handle erroneous conditions.
var wshShell = new ActiveXObject("WScript.Shell");
// Create a temp file to input the SQL statement.
var fso = new ActiveXObject("Scripting.FileSystemObject");
var tempFolder = fso.GetSpecialFolder(2);
var tempInFileName = fso.GetTempName();
var tempOutFileName = fso.GetTempName();
tempInFileName = tempFolder + "\\" + tempInFileName;
tempOutFileName = tempFolder + "\\" + tempOutFileName;
var tempInFile = fso.CreateTextFile(tempInFileName, true);
tempInFile.WriteLine(sqlStatement);
tempInFile.Close();
try
{
// execute
var dosCmd = nqCmd + " -d \"" + dsn + "\" -u \"" + user
+ "\" -p \"" + pswd + "\" -s \"" + tempInFileName + "\"" +
" -o \"" + tempOutFileName + "\"";
wshShell.Run(dosCmd, 0, true);
var output = GetOutput(fso, tempOutFileName);
// Remove the temp files
fso.DeleteFile(tempInFileName);
if (fso.FileExists(tempOutFileName)) {
fso.DeleteFile(tempOutFileName);
}
// Check the output for any errors
if (output.indexOf("Processed: 1 queries") == -1) {
ExitCode = -1;
Message = output;
}
else if (output.indexOf("Encountered") != -1) {
ExitCode = -2;
Message = output;
}
else {
ExitCode = 0;
}
} catch (e) {

if (fso.FileExists(tempInFileName)) {
fso.DeleteFile(tempInFileName);
}
if (fso.FileExists(tempOutFileName)) {
fso.DeleteFile(tempOutFileName);
}
throw e;
}


Options for Seeding the Cache

Seeding the cache can be automated via an ODBC call in much the same way as purging.

The syntax of the call will be as follows:
nqcmd -d "Analytics Web" –u administrator –p sadmin – file.sql

Where file sql is a script containing sql select statements to populate the cache, ideally these will be a super set of the queries issued by the requests in Siebel Answers and dashboards deemed most likely to derive maximum performance gains from using the cache.

Siebel Delivers can also be used to seed the cache automatically, the advantage of this is that iBots can trigger the specific requests that are required for caching, this should carry further performance benefits.


NQSConfig.ini

The NQSConfig.ini file located in the $SiebelAnalytics\Config folder needs to be modified in order to activate and parameterise Analytics Caching. Below are settings for caching with cache tables of an expected size of approximately 250 000 records.

Parameter Value Comments

ENABLE
YES
Turns overall Caching on or off

DATA_STORAGE_PATHS
“C:\SiebelAnalyticsData\Temp\Cache 500 MB”
For optimal performance, the directories specified should be on high performance storage systems.

METADATA_FILE
"C:\SiebelAnalytics\metadata_file.dat"
The filename needs to be a fully qualified pathname with a .dat extension.

REPLACE_ALGORITHM
LRU
Currently, the only supported algorithm is LRU (Least Recently Used).

BUFFER_POOL_SIZE
20 MB
Specifies the amount of physical memory Analytics can use to store Cache files, in. More cache in memory means better response time.

MAX_ROWS_PER_CACHE_ENTRY
0
When set to 0, there is no limit to the number of rows per cache entry.

MAX_CACHE_ENTRY_SIZE
10 MB
Maximum size of a specific cache entry on the Physical drive.

POPULATE_AGGREGATE_ROLLUP_HITS
YES
Specifies whether aggregate cache files should be created. Setting this parameter to TRUE may result in better performance, but results in more entries being added to the cache.

METADATA_BACKUP_FREQUENCY_MINUTES
1440
This creates a backup file with a .bak extension. There will only ever be one of these.


Hope this helps

2 comments:

Lost_in_Space said...

Greetings Adrian,

Have you ever utilise caching with very specific data visibility rules?

In a nutshell we have specific rules based on the users place in the hierarchy but the visi rules are defined different columns so the SQL isn't the same? I'm guessing that the cache won't hit as the SQL is different.

I think we will have to allow our tuning gurus loose on the code....

sid said...

Adrian,


Thanks!! That is very informative and helpful. Regarding the javascript method, I think it calls ActiveX objects and that is somewhat taboo in most security aware enterprise environment. Alternatively a batch script could be written and scheduled with a windows scheduler (or shell script and at tool in *nix environments).

Just some random thoughts!!!

Please keep up the good work. Can there be any good post related to web messages that OBIEE uses (i.e. how the tool uses XML and web messages to provide functionalities like write-back and "act-as" proxies)? Also some examples of using XMLA source will be highly appreciated.

Regards,
Sid

The Cowes

The Cowes
Cowes Racing

Sudoku