Friday, September 22, 2006

All about the Cache

Here's something that will make your reports run faster!


Introduction to Cache

Decision support queries sometimes require large amounts of database processing. If you reduce the amount of database querying then you can speed up the time to produce reports. In an Analytics system the way to reduce database accesses is to create a ‘Cache’ of data on the same machine as the Analytics engine. The Siebel Analytics Server can save the results of a query in cache files and then reuse those results later when a similar query is requested. We refer to this as ‘query cache’. Using query cache, the cost of database processing only needs to be paid once for a query, not every time the query is run.

The query cache allows the Analytics Server to satisfy many subsequent query requests without having to access the back-end databases. This reduction in communication costs can dramatically decrease query response time.

However, as updates occur on the back-end databases (due to an ETL run), the query cache entries no longer reflect the latest data in the warehouse; they become ‘stale’. Therefore, Siebel Analytics administrators need either to update the cache on a rolling basis, or to purge it of old data and refresh it on a set schedule.

Caching has a small cost in terms of the disk space used to store the cache and a small number of extra I/O transactions on the server, but this should be easily outweighed by the improvement in query response times. Caching does require some ongoing management, chiefly limiting the cache size and ensuring the data is refreshed, but both of these tasks can be automated.

Not all queries are suitable for cache use. A simple list of all accounts would not be a good dataset to cache as the cache file would be too large.

The Plan

1. Start with Cache enabled
Caching is generally beneficial, so there is nothing to be gained from leaving it disabled at the outset (exception noted above).
2. Develop a cache updating method.
Analytics has various methods to manage the cache generation and deletion.
3. Monitor queries to spot potential improvements to the cache.
Check for slow-running queries on an ongoing basis.


Cache Testing

It is prudent to test your reports to ensure that they do benefit from the cached data.

1. Test the response times on queries using cache
2. Test the response times on queries not using cache
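A convenient way to run the second test without first purging the cache is to disable cache hits for just that request. The sketch below uses the DISABLE_CACHE_HIT request variable, prefixed to the logical SQL (or set on the Advanced tab in Answers); I believe this variable is honoured here, but confirm it against your own documentation, and note that the subject area and columns are invented for illustration:

SET VARIABLE DISABLE_CACHE_HIT=1; SELECT Region, "Closed Revenue" FROM "Sales Analytics";

Running the same logical SQL with and without the prefix, and comparing the response times, gives a reasonable measure of what the cache is buying you.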



Cache Methods

This section describes the available methods of ensuring the data in the cache is kept current.
For the cache to be effective as a means of performance enhancement it needs to be populated with relevant data quickly; relying on this data being built up as a by-product of users' queries is not practical where the data is updated frequently. To speed up population a process known as seeding can be implemented: carefully designed queries are run to load the cache with the required data. This can be done in a number of ways, but the most efficient is to build the seeding into the ETL process.
One of the costs, or disadvantages, of caching is the potential for data latency, a ‘stale’ cache. This occurs when the data in the cache is not purged after the data warehouse has been updated. There are several options available to deal with this.


Possible Methods
a. No Cache
b. Polling table
c. Caching enabled at table level
d. Manual Cache Management

a. No Cache - If no cache is used then every request for data generates a new SQL query against the SRMW. This increases network traffic, hugely increases the demands on the server and affects productivity. The speed at which the results are returned is governed by the speed of the database and the network, and by the ability of the Analytics Server to process the query.

b. Polling table - Data can be refreshed using an Event Table: a table in the database into which an entry is written, recording the details of a warehouse table, whenever that table is updated by the ETL process. The Analytics Server polls the event table and purges from the cache the entries for any table that has been updated. This is useful where incremental ETL processes run during the day, for instance to update sales data: the frequency with which the Analytics Server checks the polling table can be set to coincide with that of the ETL, so data from the more frequently updated tables is purged from the cache more often and queries do not return out-of-date results. Where incremental ETL runs only once a day, overnight for instance, this approach is arguably less beneficial. The frequency with which the Analytics Server polls the event table is set in the Analytics Administration Tool, Tools>Utilities>Siebel Event Tables. N.B. The entries in the event table contain table names only and cannot contain an alias; when the data for a table is purged, data cached for an alias of that table is not. This can lead to misleading results, so an alternative purging strategy must be found for the alias.
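As a rough illustration, the last step of an incremental ETL run could flag the change by inserting a row into the event table, something like the sketch below. The S_NQ_EPT layout, the Oracle-style SYSDATE and the warehouse table name W_SALES_F are assumptions here; use the columns of whatever event table you have registered in your own repository.

INSERT INTO S_NQ_EPT
  (UPDATE_TYPE, UPDATE_TS, DATABASE_NAME, CATALOG_NAME, SCHEMA_NAME, TABLE_NAME, OTHER_RESERVED)
VALUES
  (1, SYSDATE, 'SiebelAnalyticsDW', NULL, 'SIEBEL_DW', 'W_SALES_F', NULL);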

c. Table Level - In its simplest form, caching can be enabled table by table within the data warehouse. By default all tables have caching enabled and the persistence time, the time the data is left in the cache, is infinite. To get the best performance improvement while limiting the disk space used by the cache, tables that are rarely queried can be deselected from this process. Performance can be further enhanced by setting the persistence time to coincide with incremental ETL processes, as with method b, and the two methods can also be used in conjunction with each other.
It is still important to purge the cache; otherwise you can find yourself querying yesterday's data even though an ETL ran overnight.

d. Manual Management - A Cache Manager is available from within the repository when you are connected in online mode. To access it select Tools>Manage>Cache. Note that this option is greyed out unless you have enabled caching in the NQSConfig.ini file. Manual cache management means a user purging the cache by hand; it is not suitable for day-to-day operations, but can be handy during testing.


Purging Options

Invoking ODBC Extension Functions
The following ODBC functions affect cache entries associated with the repository specified by the ODBC connection. You can call these functions using the nqcmd.exe command-line executable.

The syntax of the call will be as follows:
nqcmd -d "Analytics Web" -u administrator -p sadmin -s purge.txt
Where purge.txt contains the call (for example, call SAPurgeAllCache()).

SAPurgeCacheByQuery. Purges a cache entry that exactly matches a specified query.
The following call programmatically purges the cache entry associated with this query:
Call SAPurgeCacheByQuery( 'select lastname, firstname from employee where salary > 100000' );

SAPurgeCacheByTable. Purges all cache entries associated with a specified fully qualified physical table name for the repository to which the client has connected. This function takes up to four parameters representing the four components (database, catalog, schema and table name proper) of a fully qualified physical table name. For example, you might have a table with the fully qualified name DBName.CatName.SchName.TabName. To purge the cache entries associated with this table in the physical layer of the Siebel Analytics repository, execute the following call in a script:
Call SAPurgeCacheByTable( 'DBName', 'CatName', 'SchName', 'TabName' );
Wild cards are not supported by the Siebel Analytics Server for this function. Additionally, DBName and TabName cannot be null; if either one is null, you will receive an error message.

SAPurgeAllCache. Purges all cache entries. The following is an example of this call:
Call SAPurgeAllCache();

SAPurgeCacheByDatabase. Purges all cache entries associated with a specific physical database name. This function takes one parameter representing the physical database name, and the parameter cannot be null. A record is returned as a result of calling any of these ODBC procedures to purge the cache. The following is an example of this call:
Call SAPurgeCacheByDatabase( 'DBName' );

Siebel Analytics Scheduler
The SA Scheduler can be used for running general purpose scripts that extend the functionality of Siebel Analytics.
The script purgeSASCache is used to periodically purge all of the cache from the Siebel Analytics Server:
/////////////////////////////////////////////////////////
// purgeSASCache.js
//
// Purges the cache on SAS.
// Parameter(0) - The user name to pass in to NQCMD.
// Parameter(1) - The password for the aforementioned user.
/////////////////////////////////////////////////////////
// The full path to nqcmd.exe
var nqCmd = "D:\\SiebelAnalytics\\Bin\\nqcmd.exe";
// The data source name
var dsn = "Analytics Web";
// The user to execute the queries
var user = Parameter(0);
// The password of the aforementioned user
var pswd = Parameter(1);
// The ODBC procedure call for purging the cache
var sqlStatement = "{call SAPurgeAllCache()};";
//////////////////////////////////////////////////////////
// Returns a string from the file name
//////////////////////////////////////////////////////////
function GetOutput(fso, fileName)
{
    var outStream = fso.OpenTextFile(fileName, 1);
    var output = outStream.ReadAll();
    outStream.Close();
    return output;
}
//////////////////////////////////////////////////////////
// Get WshShell object and run nqCmd. Capture the output
// so that we can handle erroneous conditions.
var wshShell = new ActiveXObject("WScript.Shell");
// Create a temp file to input the SQL statement.
var fso = new ActiveXObject("Scripting.FileSystemObject");
var tempFolder = fso.GetSpecialFolder(2);
var tempInFileName = fso.GetTempName();
var tempOutFileName = fso.GetTempName();
tempInFileName = tempFolder + "\\" + tempInFileName;
tempOutFileName = tempFolder + "\\" + tempOutFileName;
var tempInFile = fso.CreateTextFile(tempInFileName, true);
tempInFile.WriteLine(sqlStatement);
tempInFile.Close();
try
{
    // Build the nqcmd command line and run it, writing its output to the temp file.
    var dosCmd = nqCmd + " -d \"" + dsn + "\" -u \"" + user
        + "\" -p \"" + pswd + "\" -s \"" + tempInFileName + "\"" +
        " -o \"" + tempOutFileName + "\"";
    wshShell.Run(dosCmd, 0, true);
    var output = GetOutput(fso, tempOutFileName);
    // Remove the temp files
    fso.DeleteFile(tempInFileName);
    if (fso.FileExists(tempOutFileName)) {
        fso.DeleteFile(tempOutFileName);
    }
    // Check the output for any errors
    if (output.indexOf("Processed: 1 queries") == -1) {
        ExitCode = -1;
        Message = output;
    }
    else if (output.indexOf("Encountered") != -1) {
        ExitCode = -2;
        Message = output;
    }
    else {
        ExitCode = 0;
    }
} catch (e) {
    // Clean up the temp files before passing the error back to the Scheduler.
    if (fso.FileExists(tempInFileName)) {
        fso.DeleteFile(tempInFileName);
    }
    if (fso.FileExists(tempOutFileName)) {
        fso.DeleteFile(tempOutFileName);
    }
    throw e;
}


Options for Seeding the Cache

Seeding the cache can be automated via an ODBC call in much the same way as purging.

The syntax of the call will be as follows:
nqcmd -d "Analytics Web" -u administrator -p sadmin -s file.sql

Where file.sql is a script containing SQL SELECT statements used to populate the cache. Ideally these will be a superset of the queries issued by the Answers requests and dashboards deemed most likely to derive the maximum performance gain from using the cache.
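For example, file.sql might contain a handful of logical SQL statements along the lines of the sketch below. The subject area and column names are invented; in practice you would lift the logical SQL from the Advanced tab of the Answers requests you most want to see cached:

SELECT Region, "Sales Rep", "Closed Revenue" FROM "Sales Analytics";
SELECT Region, "# of Opportunities" FROM "Sales Analytics";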

Siebel Delivers can also be used to seed the cache automatically. The advantage of this is that iBots can trigger the specific requests that are required for caching, which should carry further performance benefits.


NQSConfig.ini

The NQSConfig.ini file, located in the $SiebelAnalytics\Config folder, needs to be modified in order to activate and parameterise Analytics caching. Below are settings for caching with cache tables of an expected size of approximately 250,000 records.

ENABLE = YES
Turns overall caching on or off.

DATA_STORAGE_PATHS = "C:\SiebelAnalyticsData\Temp\Cache" 500 MB
For optimal performance, the directories specified should be on high performance storage systems.

METADATA_FILE = "C:\SiebelAnalytics\metadata_file.dat"
The filename needs to be a fully qualified pathname with a .dat extension.

REPLACE_ALGORITHM = LRU
Currently, the only supported algorithm is LRU (Least Recently Used).

BUFFER_POOL_SIZE = 20 MB
Specifies the amount of physical memory Analytics can use to hold cache entries in memory; more cache in memory means better response times.

MAX_ROWS_PER_CACHE_ENTRY = 0
When set to 0, there is no limit to the number of rows per cache entry.

MAX_CACHE_ENTRY_SIZE = 10 MB
The maximum size of an individual cache entry on disk.

POPULATE_AGGREGATE_ROLLUP_HITS = YES
Specifies whether aggregate cache files should be created. Setting this to YES may result in better performance, but more entries are added to the cache.

METADATA_BACKUP_FREQUENCY_MINUTES = 1440
How often the metadata file is backed up. The backup has a .bak extension and there will only ever be one of them.
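For reference, this is roughly how those settings look when collected into the [ CACHE ] section of NQSConfig.ini itself. This is a sketch assembled from the table above; compare the exact syntax, in particular the trailing semicolons, with the commented examples already in your own copy of the file:

[ CACHE ]
ENABLE = YES;
DATA_STORAGE_PATHS = "C:\SiebelAnalyticsData\Temp\Cache" 500 MB;
METADATA_FILE = "C:\SiebelAnalytics\metadata_file.dat";
REPLACE_ALGORITHM = LRU;
BUFFER_POOL_SIZE = 20 MB;
MAX_ROWS_PER_CACHE_ENTRY = 0;
MAX_CACHE_ENTRY_SIZE = 10 MB;
POPULATE_AGGREGATE_ROLLUP_HITS = YES;
METADATA_BACKUP_FREQUENCY_MINUTES = 1440;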


Hope this helps

Monday, September 18, 2006

Three sheets to the wind

No, not a description of the weekend’s activities unfortunately, not a drop passed my lips.

The sheets to which I refer are CSS or Cascading Style Sheets, PortalBanner.css, PortalContent.css and Views.css.

You probably will not be interested to know that the original form of the expression was 'three sheets in the wind', this refers to the nautical term for a rope, derived from sheet line, a rope used to control a sheet or sail, and literally means 'with the sail completely unsecured', and thus flapping about, and with the boat itself thus unsteady. So, no more grog at lunch time.

Style sheets are used to separate the content of an HTML page from the control and formatting applied to it. This is particularly useful where the content, i.e. the data, matters more than the appearance in the long term.

From a productivity point of view, creating the 'look' of your dashboards once leaves you more time to actually create the reports.

In early HTML pages the appearance of text was controlled by tags like 'FONT' and their attributes (color, face, size and so on), which were wrapped around the text to which they applied. This was fine until you had a hundred pages using three colours of text and decided to change a colour: the tag around every instance of the text had to be changed. With CSS the tag becomes a reference to the style sheet, so if you wish to change the attributes of the tag you change them once, in the style sheet, not on every page.
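A trivial, generic illustration of the difference (plain HTML, not Siebel-specific markup):

Old style, repeated around every piece of text:
<FONT color="navy" size="2">Revenue by Region</FONT>

With a style sheet the page carries only a class reference:
<span class="TitleCell">Revenue by Region</span>
and the appearance is defined once, centrally:
.TitleCell { color: navy; font-size: 10pt; }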

PortalBanner.css and PortalContent.css are the files that affect Siebel dashboards; they can be found in SiebelAnalytics\Web\App\Res\s_Siebel77\b_mozilla_4. The best way to learn what does what is to 'do a Claudio', i.e. tinker all you like and see what happens, with one word of warning however.

With this in mind you should always copy the original before you start to tinker. Likewise, if you decide to use an edited css file, save a copy to the SiebelAnalyticsData\Web\App\Res directory so that it is preserved should you ever upgrade Analytics.
Siebel's recommended practice is to copy the s_Siebel77 folder to the SiebelAnalyticsData folder and then rename it appropriately; the new name then becomes available as a style to select in Dashboard Properties once the Analytics Web service has been restarted.
Editing these files can be a useful way to enhance the appearance of a dashboard and make it more familiar to users, by applying a corporate colour scheme for instance. You should also remember that there are contractual obligations with regard to the appearance of elements such as the 'Powered by Siebel' logo.

To my mind though, the most useful of the three files is Views.css. This is the file that controls the appearance of the reports created in Answers and displayed in your dashboards. The reason I say this is that the sheer volume of reports means that altering these settings once will save you the most time; even using the copy-formatting feature will not save you as much.

The styles are defined as classes (the name preceded by a full stop) under commented-out headings, e.g. '/* Title */' followed by .TitleCell, and the attributes are then defined within the curly brackets. Names such as these are fairly intuitive; some are more obscure and you might only decipher them by tinkering. The ones I have found most useful are listed below, with a short example of overriding them after the list:


.TitleTable The border attributes control the heavy line below a report title, change the colour to suit your corporate image.

.TitleCell The title added to the report via the table properties.

.TitleNameCell This formats the title if the saved report name is used.

Also useful for the table of results itself:

.ResultsTable The outer borders of the results table.

.ResultsTable TD Individual table cells of the results table.

.ColumnHdg Column heading format.

.ColumnHdgD Column heading where the heading is a drill down.

And so on.
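To make that concrete, here is the sort of edit I mean. It is a sketch only: the selectors are the ones listed above, but the colours and fonts are made-up corporate values rather than the Siebel defaults.

/* Title */
.TitleTable { border-bottom: 2px solid #003366; }
.TitleCell { font-family: Arial, sans-serif; color: #003366; }

/* Results table */
.ColumnHdg { background-color: #E6EEF7; color: #003366; font-weight: bold; }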

I hope this has whetted your appetite. Happy tinkering.

Thursday, September 14, 2006

Do you feel insecure?

Security. Love it or loathe it? Most of the time the choice is yours: you can choose not to lock your house or car, you can choose not to fly, you can choose to keep your PIN in your phone memory under 'PIN'. What you probably can't choose is whether or not to apply it to your Analytics system, because someone else has made that decision for you and they pay your salary!

A brief scan of the popular Analytics message boards suggests that this is a part of the system a good number of people can't get their heads round, so I thought I'd jot down a few thoughts on the subject.

Within Siebel Analytics there are two basic types of security, Data Level and Object Level.
Actually, there are EIGHT layers of security in an Integrated Analytics application (can you identify them all?).

Data level security is about controlling the data, i.e. the facts, that a user can see. Typically this is because managers don't want fisticuffs in the office, so preventing everyone from seeing how much commission the sales team get is generally a good idea. Also, you may be familiar with TMI, Too Much Information: if like me you're a nosey parker then everyone else's data is far more interesting than your own, and you can't resist having a look should the opportunity present itself. Giving all your staff this opportunity can have a dramatic effect on productivity.
On a serious note in the financial markets you'll know all about 'chinese walls' and the need to keep data secret.

At the simplest level, this is how to go about administering data level security:
  • In the Analytics Administration Tool go to Manage>Security to open the Security Manager; the security objects are displayed down the left.
  • Click on Groups, then right-click in the right-hand pane and
  • Select New Security Group. The procedure is the same for adding a new user.

At this point you should already have thought about whether you are going to administer security at group level or individual level. This will be largely dictated by the number of users; again, it is probably not your decision to make, so I hope the chiefs in your organisation know what they are on about!


So, for example, you have created a group called Agents. These could be support staff or telesales, it doesn't really matter; the point is you want to limit their view of the data to that applicable to their geographical region of responsibility. In the right-hand pane double-click the Agents group, click the Permissions button and select the Filters tab. Now click the Add button to add an object (i.e. a table) whose content (data) you wish to restrict access to, highlight the table and click Select. Check that the Status is set to Enable and click the ellipsis button to open the Expression Builder. The world is now your SQL oyster and you can do pretty much what you want to restrict the view of the data available from your chosen table; like oysters, though, SQL should be chosen with care to avoid indigestion!


The choice with this type of filter is between hard-coding a value, so the filter always filters on the same value (which will seem like a bad idea if the data changes, for instance if the Regions are renamed at some time in the future), and using a variable that returns the values available to the specific user or group at the time of logging in. The latter is usually the better solution; both approaches are sketched below.
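The two approaches might look something like this in the Expression Builder. The table, column and variable names are invented, and although VALUEOF(NQ_SESSION."REGION") is the usual way to reference a session variable, check the syntax against your own repository:

Hard-coded: "SiebelAnalytics"."Dim Region"."Region Name" = 'North'
Session variable: "SiebelAnalytics"."Dim Region"."Region Name" = VALUEOF(NQ_SESSION."REGION")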

Anyway, it's late and there's so much more to this subject that I will deliver it in small bite sized chunks.

To be continued ......


PS. The eight layers...

  1. Application Log in (Application Authentication)
  2. Responsibility to a View
  3. Access to Analytics system (Analytics Authentication)
  4. Access to Analytics Components (privileges)
  5. Rights to Webcat object (e.g. Particular Dashboard, Report, Folder)
  6. Rights to Analytics Repository Object (Usually set on Presentation Layer)
  7. Data Restrictions (via Security )
  8. Database Access (normally uses shared name in connection pool)

Wednesday, September 06, 2006

DAC days

Question. How do you load 80 million records into the Data warehouse?
Answer. Slowly!

As we are now in version 7.8 of the CRM application there is lots of work around upgrading from previous versions. Normally an upgrade of the data warehouse will provide companies with an opportunity to reassess what is captured and how. The biggest dilemma is the choice between a simple upgrade and a fresh installation.

Both methods have their benefits, but for the developers the most interesting choice is to go for a new fresh installation. No need to learn how the previous guys worked their magic, and no constraints of the old data structures, ETL, etc.

Whichever method you adopt there will always be data migration. If you’re lucky there is not a big gap between versions and the data structures are similar; the system has only been used for a year and only has a few million records (and pigs might fly too!).

One project I was involved in had approximately 150 million records in the legacy CRM system. After a year of hard labour this data was converted into 80 million fresh rows and loaded into the new eBusiness application. This is where we come in. The new data warehouse was designed and built around the 7.8.4 database with no vertical licences (more on this subject another day).

The standard Informatica routines and the DAC repository are the basis of the new ETL. The ‘opportunity’* in the design phase was the fact that approximately 150 custom fields had been added to various S_ tables (more on this another day!).

So, here we are with a new blank database, a customised ETL system and 80 million records ready to load. If you were using the standard DAC you would now run the execution plan ‘Complete ETL’, then sit back and wait a few days. But what about the database? Is it big enough? Have you created the correct tablespaces, and are these reflected in the DAC? (Where, you may ask - hint: look at the indices.)
So half way through day one, and the database has crashed four times due to lack of tablespace. After various calls to the DBA you're back up and running again. Midnight of day one and a standard virus protection system kicks in; the server dies under the weight of the workload. Morning of day three and the team are bleary-eyed and living on extra-strong espresso, having spent the night coaxing the DAC back into life time and time again, this time due to centrally organised server updates (a Windows patch here, a Java update there). Day five and the testers arrive to start picking holes in the results; you know the sort of thing: table W_xxx has no data, field xx_Type has the wrong LOVs in it.
Fixes were made and the following weekend was set aside to do a full load again. Reset the warehouse; purge the run details; analyse the tables; stop the virus protection; disable the PC updates; increase the tablespace and cross your fingers. From Friday night to Sunday morning all is well.


Luckily for us the above is just a nightmare that we usually avoid due to good planning, lots of luck, and plenty of time to be prepared.


You have been warned.





*For opportunity normally substitute ‘Problem’!
