Showing posts with label Development. Show all posts
Showing posts with label Development. Show all posts

Wednesday, June 18, 2014

Easy Way to Collect QVD Stats

 

Collecting Stats Helps Debugging and Alerts You to Problems

Most organizations will use QVD Generators to build out QVD data layers suitable for QlikView’s consumption.  There is generally no need to create any visualizations in these QVD Generators except for maybe some validation tables and some QVD statistics.  Today we will focus on an easy way to gather those statistics.  The statistics I generally gather help me answer questions regarding the output QVDs.
 
·         Which QVDs were written with this application?
·         How long did it take to create this QVD?
·         How many rows were written?
·         How many fields were in each table?
Obviously, being able to easily answer these questions helps us better understand and serve the BI enterprise.  It also provides a helpful tool for debugging logical errors in our QlikView scripts.
And because we actually take this statistics table and write it to QVD as well, we can use it in later loads as needed to create a data source lineage for a particular application.  Another benefit of creating this statistics QVD(s) is that I can drop all the tables that created the QVDs while still retaining the statistics.  This leaves the application light and quick to save.
Here is an example of the front end to show you what kind of data will be collected and displayed.  Of course, the stats you collect and present are really up to you, the developer.
 
 

How to set up the collection

I use blocks of code that can be basically put into any script and adjusted as needed.  We basically have four chunks of code that get deposited into our script.
1.       Set up any needed global variables and set up the statistics table structure.
2.       Set up a sub-routine that creates the stat table.
3.       Change the start time and table name each time we go to a new QVD.
4.       Call the subroutine.
 

Set up any needed global variables and set up the statistics table structure

First, create any variables that will need to be collected in your statistics or displayed in your front end.  For this example, we are using a variable to determine my directory, the name of my environment, the source, division and the path for my statistics QVD.  These two chunks of script should be inserted towards the beginning of the script before any loading occurs.
//dev
set vDir = 'C:\Users\Aaron\Documents\QlikView Apps\Blog Apps\Statistics Collection';
set vEnv = 'Dev';
//prod
//set vDir = 'D:\QlikView Documents\Production\Sales';

//   Set variables for qvd naming
set vDivision=  'QAP';
set vSource =   'BAAN';

LET StatisticsPath = '$(vDir)\ETL_Statistics_$(vDivision)_$(vSource).qvd';
 
We also will look for an existing statistics QVD and if one is not found, we will create it.
IF FileSize('$(StatisticsPath)') > 0 THEN //  The Statistics QVD already exists.
ELSE // The Statistics QVD needs to be created

Statistics:
LOAD * INLINE [
    TableName, Division, Source, ReloadedDate, StartTime, EndTime, Duration, Records, Fields
    , ]
;

STORE Statistics INTO $(StatisticsPath)(qvd);
DROP TABLE Statistics;
END IF  
 
 

Set up a sub-routine that creates the stat table

Next we create a subroutine that appends the statistics each time we load new data into a QVD.  This script records all of the needed data from whatever load has just occurred, adds it to the existing statistics records and then saves the whole thing back to QVD.  This script gets inserted just below the prior set up stuff and will be called at various times later in the script.
sub Stats

    
//   Call this script between qvd store and drop table commands to gather statistics for the qvd. 
     //   There must be a vStartTime and a vTable variable
     LET vEndTime = NOW();
    
LET vRecords = NoOfRows('$(vTable)');
    
LET vFields  = NoOfFields('$(vTable)');
    
     Statistics:
    
LOAD * FROM $(StatisticsPath)(qvd)   WHERE TableName <> '$(vTable)';
    
JOIN (Statistics)
    
LOAD DISTINCT '$(vTable)' AS TableName, '$(vDivision)' AS Division, '$(vSource)' as Source, DATE(NOW()) AS ReloadedDate, '$(vStartTime)' AS StartTime, '$(vEndTime)' AS EndTime,
          
INTERVAL(NUM('$(vEndTime)') - NUM('$(vStartTime)'),'hh:mm:ss') AS Duration, $(vRecords) AS Records, $(vFields) as Fields
    
RESIDENT Statistics;
    
STORE Statistics INTO $(StatisticsPath)(qvd);
    
DROP TABLE Statistics;

end sub 
 
 

Change the start time and table name each time we go to a new QVD

So now we go about the business of loading data from our data sources and creating QVDs.  The basic idea is that we will need to establish some variables at the beginning of each load and then run the subroutine at the end of each load.  Using this approach we can collect statistics for every QVD that gets created.  Start by adding this script to the top of each data loading section.  Note the vTable variable could also be used within your load statements to name your table.
//   Append this script to the start of each qvd creation.
LET vStartTime = NOW();
SET vTable = 'ETL_Sales_Header'; 
 
 

Call the subroutine

Finally, we can call the statistics subroutine that will record our ending variables and append the QVD.  This simple code will run the routine we created in the second step.
//   call subroutine to gather statistics for the qvd.
call Stats;
Place this code after we store each data QVD.  One thing to note is that you should wait to drop the data QVD until after this subroutine runs or simply drop them all at the end of your script.
 

Thoughts?

There are many ways we can simplify the data we are collecting or expand the data.  Hopefully this structure is clear and economical enough for you to adjust to your needs.  The example I provided is simply what I use to complete this task and I tweak the code depending on the situation.
As always, I hope you find this valuable.  Please let me know what you think.  And if you accomplish statistics collection differently or can improve upon this, please participate in the discussion.

Tuesday, May 27, 2014

Let's Do Something Cool with Color



 
 
Script-Side ColorMix – Subcategory Color Assignments
This exercise came from something I did for my personal use, but I think the principal can be applied to business cases as well.  You can download the example here.

I have long used the ColorMix wizard in QlikView for heat map charts.  Using this function QlikView can assign colors to values in your chart based on its position in the range of all values.  This is an immediately apparent use-case but I wanted to do something a little different.

I have a regular fitness program where I mix up different types of activities throughout the week.  In addition to the main activities, I want to be able to visually separate sub activities from each other while still maintaining them as activity groups.  Here are screen shots of the Activity and the Activity SubClass broken out to show you what I mean:


Left - By Activity                                Right - By Activity SubClass

So my P90X programs will always be some shade of gray.  My Insanity programs will all be some shade of Orange, etc.  The front-side ColorMix function is great, but it would be very difficult to assign these in the front-end especially considering the grouped dimension and varying numbers of SubClass groups as time goes on.

The idea is that when only the Activities are shown, I want to explicitly assign the main colors.  But when we break these out into Activity SubClass, I want to dynamically assign a shade of that Activity color to each of the Activity SubClass values within it.  I realized quickly that the best way to do this was to utilize ColorMix in the script.

The first step is to decide what colors should be assigned to each category.  Create a spreadsheet and give each Activity an RGB value.  You must also decide what will be the range of colors that will be used to create the Activity SubClass colors.  In this case, the Activity list will need to be maintained, but the SubClass values can vary because the actual RGB values for each distinct SubClass will be assigned at reload time.
 
Activity ColorTop ColorLow Color
ActivityRGBRHGHBHRLGLBL
Insanity255151162551020255224185
P90X847571847571233231228
Hiking212250098116023025499
Running34841583484158118154207
Biking1302715913027159186112207
Elliptical255797925579792557979

 
Next you must create the script that will load these up and create the structure that will facilitate the ColorMix operation. 
Create a table with the distinct Activity SubClasses creating a key.
//  load distinct keys
TempColorMix:
load distinct ActivityKey, Activity as Act, [Activity SubClass] as ActSub, ActSubSort
resident ActivityLog
where not isnull(ActivityKey)
order by ActSubSort;
 
Create a counter row that resets for each major Activity.
//  add the subrow row numbers for each activity
//  must be done in separate step due to ordering and distinct features.
Temp2ColorMix:
load ActivityKey, Act, ActSub,
if(Act<>peek('Act'),1,peek('SubRow')+1) as SubRow
resident TempColorMix
order by ActSubSort;
drop table TempColorMix;
 
Now find the low and high value for each activity.  This will give the ColorMix function the ranges to create colors for.
//  find the max and min row number for each activity
left join (Temp2ColorMix)
load min(SubRow) as SubMin, max(SubRow) as SubMax, Act
resident Temp2ColorMix
group by Act;

Add in the actual color ranges we had set up in the spreadsheet.
//  join in the guide colors
left join (Temp2ColorMix)
LOAD Activity as Act,
    
argb(255,R,G,B) as ColorAct,
    
RH, GH, BH,
    
RL, GL, BL
FROM Data\Workouts.xlsx(ooxml, embedded labels, header is 1 lines, table is ColorMix);
 
Note that you can use a standard color dispersion or an enhanced color dispersion that results in a more distinguishable set of colors.  I put both examples in the sample script and just commented out the standard one.
//  now we can get the color for each Activity SubClass and create final table
ColorMix:
load ActivityKey, ColorAct,
   
//if(SubMax=1,ColorAct,colormix1((SubRow-SubMin)/(SubMax-SubMin),rgb(RH,GH,BH),rgb(RL,GL,BL))) as ColorSub //  standard color dispersion
    if(SubMax=1,ColorAct,colormix1((1+sign(2*(SubRow-SubMin)/(SubMax-SubMin)-1)*sqrt(fabs((2*(SubRow-SubMin)/(SubMax-SubMin)-1))))/2,rgb(RH,GH,BH),rgb(RL,GL,BL))) as ColorSub //  enhanced color dispersion (less collision in the middle spectrum)
resident Temp2ColorMix
order by Act, ActSub;
drop table Temp2ColorMix;

 

After reloading, we can focus on visualization. Create a variable expression to assign our colors.  We used $(vColorPick).
 
if(getcurrentfield(ActivityGroup)='Activity',ColorAct,
   if(count(distinct
[Activity SubClass])=1,ColorSub,rgb(154,151,170)))
This variable can be called in the Background Color attribute for the expression in any chart.  If you need help with Background Colors in Expression Attributes reference this article.  We added an else statement to take care of any values we neglected to assign in our spreadsheet.
So hopefully this has value for you.  I have posted the spreadsheet and qvw here.  Feel free to use it to learn more about this method OR obviously you could use it to keep your own workout log.

Is this valuable to you?  Then please share. 
Do you see other use cases or a better way to do this?  Then please comment.