How to extract the text of a query in Pivotal

September 12. 2014 4 Comments

Posted in:
pivotal
oracle
sqlserver

Pivotal stores queries using a binary format. Nobody knows (at least I don’t know anybody who does) what this binary structure is. And Pivotal doesn’t provide documentation nor does it provide any tool to extract the text from this queries. If you’ve used Pivotal for long enough you probably have whished to be able to get the text from those queries. A good example of when this could be helpful is when you make a modification to a field and would like to see what parts of the system could be affected by such change. You can query the metadata in the BM and see where the field is being used, but when it comes to queries there is no way to know this, unless of course you go through all the queries in the system and visually identify if the field is being used (ehmm.. I don’t want to do that).

I was recently working in a localization project of a Pivotal system. As part of the localization we decided to take some of the fields that were initially created as Combos and change them to Choices. (Note: Combos are saved as strings in the BD, while Choices are saved as integer and translated to string at run time, which make Choices the best data type for multi-language systems). Making this change is not easy in Pivotal, first because the Toolkit doesn’t allow it, the database doesn’t allow it either, and also because you need to change the data in the ED (I will explain how to do this automatically in a future post). Anyway, the fact is that after the change to Choice was made I had to make sure that all other objects in the BM where “aware” of the change. In the case of queries, you need to change the query (remove the field and then add it again) so it has the right metadata to treat the field in the query as string (Combo) or integer (Choice). I asked some of my coworkers how were they doing this in the past and the answer was…ehmmm… “go through all the queries one by one and see if the field is being used”. There’s no way I’m doing that! So I decided it was time to go deep in the binary format of a query.

I will first talk about the structure of the binary format for the query and then I will provide solutions for both SQLServer and Oracle. The concepts and the solutions presented here work for any version of Pivotal 5.x and 6.x.

Binary Structure of a Pivotal Query

Pivotal stores queries in a table called Saved_Lookups, which has a structure such as the following:

The actual query is saved in a field called SQL_Tree_Binary. If you query this field you will get something similar to this:

As expected, SQL_Tree_Binary contains binary (hexadecimal) values.

First let’s talk about the table Id. This table is present in all pivotal databases: ED, BM and CM. We will focus here in queries that are created with the Toolkit for the system. We don’t want to consider those queries created in the Toolkit for the metadata (which are stored in the same table in the BM). How do we differentiate them? Using the Id (field Saved_Lookups_Id) of the record:

Records with an Id that starts with 0x80 are system queries
Records with an Id that starts with 0x00 are metadata queries

You can do this query to identify them:

-- SQLServer
select 
  saved_lookup_name,
  case substring(saved_lookups_id,1,1)
   when 0x00 then 'metadata'
   when 0x80 then 'system'
  end type
from saved_lookups

-- Oracle
select
  saved_lookup_name,
  case utl_raw.substr(saved_lookups_id,1,1)
    when hextoraw('00') then 'metadata'
    when hextoraw('80') then 'system'
  end type
from
  saved_lookups

Now, let’s talk about the binary format. There are two kind of formats: Unicode and non-Unicode formats. The binary format started as non-Unicode (single-byte) in the early days of Pivotal, but then (I don’t know exactly when) Pivotal realized that queries needed to support international characters as well and used a Unicode (double-byte) format to save the query. Your system could have them both! There are some queries that haven’t been touched/edited since they were first created which are still in non-Unicode formats. In the latest versions of Pivotal 5.x and 6.x all new queries are created in Unicode format, and those queries that are non-Unicode are changed to Unicode when modified.

The header

The type of format (Unicode and non-Unicode) can be determined by the reading the first 5 bytes:

0x0100000000 : non-Unicode
0x01000000FF : Unicode

Knowing whether is Unicode or not is important since we will need read the binary accordingly. The binary format has a kind of “header”, which depends on the type:

Header for non-Unicode queries: 0x0100000000 (length: 5)
Header for Unicode queries: 0x01000000FFFEFF00FFFEFF (length: 11)

As you can see the header size varies depending on whether is Unicode or not.

The length

After the header we have the length of bytes we need to read to get the text of the query. Again here we have two flavors: when the length can be stored in a single byte (i.e. the length is less than 255) then the length is stored in that same byte. If the length cannot be stored in a single byte then we will find 0xFF and then two bytes for the length. This is:

0xXX: when the length is less than 255
0xFFYYZZ: when the length is greater than or equal to 255

Here XX or YYZZ is a binary representation of an integer value for the length. But, there is another caveat (why make it easy?) In the case when the length is represented with 2 bytes, those bytes are stored as little-endian, which means that to get the integer value you need to invert YYZZ to ZZYY.

So for example, after reading the header (which can be 5 or 11 bytes depending on whether is Unicode or not) we need to read 1 or 3 more bytes, depending on whether the first byte is 0xFF or not, to get the length. In case we read 1 byte we convert this to integer. If we read 3 bytes we use the second and third bytes inverted and convert that to integer. So we end up with this decision tree:

If it is Unicode, then skip 11 bytes for the header. For the length

If the 12th byte is 0xFF then read bytes 14th and 13th (inverted) and convert them to integer
If the 12th byte is not 0xFF then convert the 12th byte to integer.

If it is non-Unicode, then skip 5 bytes for the header. For the length

If the 6th byte is 0xFF then read bytes 8th and 7th (inverted) and convert them to integer
If the 6th byte is not 0xFF then convert the 7th byte to integer.

The start index

Lastly, we need to know where to start reading our text. As you might have guessed, it will depend on the above conditions. So we have the following:

If it is Unicode, then skip 11 bytes for the header. For the length

If the 12th byte is 0xFF we skip 3 bytes so our text starts at byte 15th
If the 12th byte is not 0xFF we skip 1 byte so our text starts at byte 13th

If it is non-Unicode, then skip 5 bytes for the header. For the length

If the 6th byte is 0xFF we skip 3 bytes so our text starts at byte 9th
If the 6th byte is not 0xFF we skip 1 byte so our text starts at byte 7th

Some examples:

Example 1: SQL_Tree_Binary = 0x01000000FFFEFF00FFFEFFFF7B042800….

The first 5 bytes are 0x01000000FF, so it is unicode. We skip 11 bytes to get the length. Byte 12th is 0xFF, meaning that we need bytes 13th and 14th to get the length. These bytes are 0x7B04. To convert them to integer we invert them to 0x047B. So the length is 1147. And the query starts at byte 15th.

Example 2: SQL_Tree_Binary = 0x01000000FFFEFF00FFFEFF352800….

The first 5 bytes are 0x01000000FF, so it is unicode. We skip 11 bytes to get the length. Byte 12th is not 0xFF, meaning that byte 12th has the length. We convert byte 0x35 to integer and we get that the length is 53. And the query starts at byte 13th.

Example 3: SQL_Tree_Binary = 0x0100000000FF5F052820….

The first 5 bytes are 0x0100000000, so it is not unicode. We skip 5 bytes to get the length. Byte 6th is 0xFF, meaning that we need bytes 7th and 8th to get the length. These bytes are 0x5F05. To convert them to integer we invert them to 0x055F. So the length is 1375. And the query starts at byte 9th.

Example 4: SQL_Tree_Binary = 0x0100000000F12820….

The first 5 bytes are 0x0100000000, so it is not unicode. We skip 5 bytes to get the length. Byte 6th is not 0xFF, meaning that byte 6th has the length. We convert byte 0xF1 to integer and we get that the length is 241. And the query starts at byte 7th.

This is not the whole story on the binary format of the query, but for now it is enough to get us what we want.

Getting the Query Text

Ok, now that we understand the structure is very simple to get the text (or is it?). From our analysis above we get two important pieces of information:

startIndex: where we can find the text of the query in all the binary
length: the length of bytes we need to consider to get the text

All is left is to get a substring of the binary value inside SQL_Tree_Binary field, starting at the startIndex and taking length bytes. In the case of a Unicode query we need to take length * 2 bytes (because Unicode is double-byte, remember?).

Both SQLServer and Oracle allow you to convert a binary/raw (respectively) datatype to varchar or nvarchar. I will explain how to do this later, but first let me point something to get a cleaner text:

The binary format used to represent the query is more complex than what’s explained here. The text of the query has two parts:

The first part is the text of the query as shown in the Toolkit in the query editor when we click the “Show SQL” button
The second part is a more structured form of the query, with comments for each of the clauses in the query editor.

As an example, take this query:

here is the extracted text:

( 
    (
        EXISTS 
        (
            SELECT * 
            FROM   Company__Type 
            WHERE 
                Company__Type.Company_Type  = 5 AND 
                (Company__Type.Company_Id = Company.Company_Id)
        )
    ) AND 
    (Company.Inactive  = 0)
) 

WhereDelim,..,WhereDelim 

( 
    (
        /* Company  Type(Company).Company Type = Partner */  
        EXISTS 
        (
            SELECT * 
            FROM 
            (
                SELECT DISTINCT Company__Type.Company_Id   
                FROM   Company__Type   
                WHERE Company__Type.Company_Type  = 5
            ) SubqueryView   
            WHERE SubqueryView.Company_Id=Company.Company_Id
        )
    )   AND 
    (Company.Inactive  = 0)
)

(indentation and new lines added by me). Notice how the two parts are clearly separated by a special type of delimiter with the text WhereDelim,..,WhereDelim. The second part is closer to what Pivotal will actually send to the database, and it has also comments that correspond to the lines in the query designer, so it is easier to identify. We will only take the second part as the text of our query.

Extracting the Query Text in SQLServer

So, in SQLServer we can create the following view:

create view saved_lookups_text as
select
    table_id,
    table_name,
    saved_lookups_id,
    saved_lookup_name,    
    case
      when charindex('WhereDelim',query_text) > 0 then 
        substring(query_text,len(query_text) - charindex(reverse('WhereDelim'),reverse(query_text)) + 3,4000)
      else query_text
    end query_text
from (
    select
      table_id,
      table_name,
      saved_lookups_id,
      saved_lookup_name,
      case isUnicode
        when 1 then cast(substring(sql_tree_binary, startIndex, queryLength * 2) as nvarchar(4000))
        else cast(substring(sql_tree_binary, startIndex, queryLength) as varchar(8000))
      end query_text
    from (
        select
            saved_lookup_name, 
            case 
                when substring(sql_tree_binary,1,5) = 0x0100000000 then 0
                when substring(sql_tree_binary,1,5) = 0x01000000FF then 1
            end isUnicode,
            cast(case 
                when substring(sql_tree_binary,1,5) = 0x0100000000 then 
                    case 
                        when substring(sql_tree_binary,6,1) = 0xFF then substring(sql_tree_binary,8,1) + substring(sql_tree_binary,7,1)
                        else substring(sql_tree_binary,6,1)
                    end
                when substring(sql_tree_binary,1,5) = 0x01000000FF then 
                    case 
                        when substring(sql_tree_binary,12,1) = 0xFF then substring(sql_tree_binary,14,1) + substring(sql_tree_binary,13,1)
                        else substring(sql_tree_binary,12,1)
                    end
            end as int) queryLength,
            case 
                when substring(sql_tree_binary,1,5) = 0x0100000000 then 
                    case 
                        when substring(sql_tree_binary,6,1) = 0xFF then 9
                        else 7
                    end
                when substring(sql_tree_binary,1,5) = 0x01000000FF then 
                    case 
                        when substring(sql_tree_binary,12,1) = 0xFF then 15
                        else 13
                    end
            end    startIndex,    
            s.table_id,
            t.table_name,
            s.saved_lookups_id,   
            s.sql_tree_binary
        from 
          saved_lookups s inner join 
          tables t on s.table_id = t.tables_id
        where substring(saved_lookups_id,1,1) <> 0x00
    ) x
) y

(I know the query could be better written, but for the sake of demonstration is ok)

This query will present something like this (notice the query_text field):

Notice lines 8 and 9, where, due to the lack of a reverse charindex function in SQLServer, we need to do a trick to find the last occurrence of WhereDelim text and then take the text that comes after this. Notice also in lines 19 and 20 how we cast the text to the appropriate charset based on whether the text is Unicode or not.

You can download this SQL script from here.

Extracting the Query Text in Oracle

You guessed it! In Oracle things are not as easy. There are some caveats we need to consider:

Oracle has a limitation of 2000 characters when using nvarchar2 and 4000 characters when using varchar2 data types. So if our text is longer than this we will get an error when converting things using the package UTL_RAW.
The package used to read blobs in Oracle is DBMS_LOB. This package has a convenient SUBSTR function to get pieces of the BLOB (in our case the data in the SQL_Tree_Binary field). This function changes the endianness of the data when the data is Unicode (in other words, it inverts it) and thus when converting the Unicode text using UTL_RAW this gets confused and we get funky letters instead.
You cannot combine a Unicode charset with a non-Unicode charset in the same field using CASE statements

The view in Oracle is like this:

create view saved_lookups_text as
select
  saved_lookups_id,
  saved_lookup_name,
  case when unicode = 1
    then reverse(cast(utl_raw.cast_to_nvarchar2(utl_raw.reverse(dbms_lob.substr(sql_tree_binary,case when length > 1000 then 2000 else length * 2 end,queryStartIndex))) as varchar2(4000)))
    else utl_raw.cast_to_varchar2(dbms_lob.substr(sql_tree_binary,case when length > 2000 then 2000 else length end,queryStartIndex))
  end queryText,
  table_id,
  table_name
from
(
  select
    saved_lookup_name,
    dbms_lob.substr(sql_tree_binary,5,1) queryType,
    case
      when dbms_lob.substr(sql_tree_binary,5,1) = hextoraw('0100000000') then dbms_lob.substr(sql_tree_binary,5,1)
      when dbms_lob.substr(sql_tree_binary,5,1) = hextoraw('01000000FF') then dbms_lob.substr(sql_tree_binary,11,1)
    end prefix,
    case
      when dbms_lob.substr(sql_tree_binary,5,1) = hextoraw('0100000000') then 0
      when dbms_lob.substr(sql_tree_binary,5,1) = hextoraw('01000000FF') then 1
    end unicode,
    utl_raw.cast_to_binary_integer(case
      when dbms_lob.substr(sql_tree_binary,5,1) = hextoraw('0100000000') then
           case
             when dbms_lob.substr(sql_tree_binary,1,6) = hextoraw('FF') then utl_raw.concat(dbms_lob.substr(sql_tree_binary,1,8),dbms_lob.substr(sql_tree_binary,1,7))
             else dbms_lob.substr(sql_tree_binary,1,6)
           end
      when dbms_lob.substr(sql_tree_binary,5,1) = hextoraw('01000000FF') then
           case
             when dbms_lob.substr(sql_tree_binary,1,12) = hextoraw('FF') then utl_raw.concat(dbms_lob.substr(sql_tree_binary,1,14),dbms_lob.substr(sql_tree_binary,1,13))
             else dbms_lob.substr(sql_tree_binary,1,12)
           end
    end) length,
    case
      when dbms_lob.substr(sql_tree_binary,5,1) = hextoraw('0100000000') then
           case
             when dbms_lob.substr(sql_tree_binary,1,6) = hextoraw('FF') then 9
             else 7
           end
      when dbms_lob.substr(sql_tree_binary,5,1) = hextoraw('01000000FF') then
           case
             when dbms_lob.substr(sql_tree_binary,1,12) = hextoraw('FF') then 15
             else 13
           end
    end queryStartIndex,
    s.table_id,
    t.table_name,
    s.saved_lookups_id,
    s.sql_tree_binary
  from saved_lookups s inner join tables t on s.table_id = t.tables_id
  where
    utl_raw.substr(saved_lookups_id,1,1) = hextoraw('80') -- only business queries
  order by table_id,s.saved_lookup_name
)

Notice in line 6 how I do some tricks to overcome all the problems mentioned above. This query shows both parts of the query (doesn’t uses the WhereDelim mentioned above) but it is easy to add. Also, this view doesn’t include queries which text length is more than 4000 characters if non-Unicode or 2000 characters if Unicode, which unfortunately leaves a lot of query in the dark.

In Oracle I have created a package instead that gives me what I want in the query output window. With this package you can do things such as:

-- use a procedure to get the text of the query
declare
    queryText varchar2(8000);
begin
  pk_pivotal_toolkit.getQueryText('<some query name>',queryText);
  pk_pivotal_toolkit.printText(queryText);
end;

-- use a function to get the text of the query
declare
  queryText varchar2(8000);
begin
  queryText := pk_pivotal_toolkit.getQueryText('<some query name>');
  pk_pivotal_toolkit.printText(queryText);
end;

-- this will show only the first 4000 chars of a query
select s.*,pk_pivotal_toolkit.getQueryText(saved_lookup_name) query_text from saved_lookups s
where utl_raw.substr(saved_lookups_id,1,1) = hextoraw('80')


-- see what queries use a specific field:
call pk_pivotal_toolkit.findFieldsInQuery('Company.Type, Contact.First_Name');

You can download the whole package from here.

Conclusion

Now you can know what queries make use of certain field in the database!

I hope you find this useful. If you do please let me know. Also if you know a better way of doing this let me know. And if somebody from Pivotal reads this, please give us a better way of knowing how to get the text for a query.