How to extract the text of a query in Pivotal
Pivotal stores queries using a binary format. Nobody knows (at least I don’t know anybody who does) what this binary structure is. And Pivotal doesn’t provide documentation nor does it provide any tool to extract the text from this queries. If you’ve used Pivotal for long enough you probably have whished to be able to get the text from those queries. A good example of when this could be helpful is when you make a modification to a field and would like to see what parts of the system could be affected by such change. You can query the metadata in the BM and see where the field is being used, but when it comes to queries there is no way to know this, unless of course you go through all the queries in the system and visually identify if the field is being used (ehmm.. I don’t want to do that).
I was recently working in a localization project of a Pivotal system. As part of the localization we decided to take some of the fields that were initially created as Combos and change them to Choices. (Note: Combos are saved as strings in the BD, while Choices are saved as integer and translated to string at run time, which make Choices the best data type for multi-language systems). Making this change is not easy in Pivotal, first because the Toolkit doesn’t allow it, the database doesn’t allow it either, and also because you need to change the data in the ED (I will explain how to do this automatically in a future post). Anyway, the fact is that after the change to Choice was made I had to make sure that all other objects in the BM where “aware” of the change. In the case of queries, you need to change the query (remove the field and then add it again) so it has the right metadata to treat the field in the query as string (Combo) or integer (Choice). I asked some of my coworkers how were they doing this in the past and the answer was…ehmmm… “go through all the queries one by one and see if the field is being used”. There’s no way I’m doing that! So I decided it was time to go deep in the binary format of a query.
I will first talk about the structure of the binary format for the query and then I will provide solutions for both SQLServer and Oracle. The concepts and the solutions presented here work for any version of Pivotal 5.x and 6.x.
Binary Structure of a Pivotal Query
Pivotal stores queries in a table called Saved_Lookups, which has a structure such as the following:
The actual query is saved in a field called SQL_Tree_Binary. If you query this field you will get something similar to this:
As expected, SQL_Tree_Binary contains binary (hexadecimal) values.
First let’s talk about the table Id. This table is present in all pivotal databases: ED, BM and CM. We will focus here in queries that are created with the Toolkit for the system. We don’t want to consider those queries created in the Toolkit for the metadata (which are stored in the same table in the BM). How do we differentiate them? Using the Id (field Saved_Lookups_Id) of the record:
- Records with an Id that starts with 0x80 are system queries
- Records with an Id that starts with 0x00 are metadata queries
You can do this query to identify them:
-- SQLServer select saved_lookup_name, case substring(saved_lookups_id,1,1) when 0x00 then 'metadata' when 0x80 then 'system' end type from saved_lookups -- Oracle select saved_lookup_name, case utl_raw.substr(saved_lookups_id,1,1) when hextoraw('00') then 'metadata' when hextoraw('80') then 'system' end type from saved_lookups
Now, let’s talk about the binary format. There are two kind of formats: Unicode and non-Unicode formats. The binary format started as non-Unicode (single-byte) in the early days of Pivotal, but then (I don’t know exactly when) Pivotal realized that queries needed to support international characters as well and used a Unicode (double-byte) format to save the query. Your system could have them both! There are some queries that haven’t been touched/edited since they were first created which are still in non-Unicode formats. In the latest versions of Pivotal 5.x and 6.x all new queries are created in Unicode format, and those queries that are non-Unicode are changed to Unicode when modified.
The header
The type of format (Unicode and non-Unicode) can be determined by the reading the first 5 bytes:
- 0x0100000000 : non-Unicode
- 0x01000000FF : Unicode
Knowing whether is Unicode or not is important since we will need read the binary accordingly. The binary format has a kind of “header”, which depends on the type:
- Header for non-Unicode queries: 0x0100000000 (length: 5)
- Header for Unicode queries: 0x01000000FFFEFF00FFFEFF (length: 11)
As you can see the header size varies depending on whether is Unicode or not.
The length
After the header we have the length of bytes we need to read to get the text of the query. Again here we have two flavors: when the length can be stored in a single byte (i.e. the length is less than 255) then the length is stored in that same byte. If the length cannot be stored in a single byte then we will find 0xFF and then two bytes for the length. This is:
- 0xXX: when the length is less than 255
- 0xFFYYZZ: when the length is greater than or equal to 255
Here XX or YYZZ is a binary representation of an integer value for the length. But, there is another caveat (why make it easy?) In the case when the length is represented with 2 bytes, those bytes are stored as little-endian, which means that to get the integer value you need to invert YYZZ to ZZYY.
So for example, after reading the header (which can be 5 or 11 bytes depending on whether is Unicode or not) we need to read 1 or 3 more bytes, depending on whether the first byte is 0xFF or not, to get the length. In case we read 1 byte we convert this to integer. If we read 3 bytes we use the second and third bytes inverted and convert that to integer. So we end up with this decision tree:
- If it is Unicode, then skip 11 bytes for the header. For the length
- If the 12th byte is 0xFF then read bytes 14th and 13th (inverted) and convert them to integer
- If the 12th byte is not 0xFF then convert the 12th byte to integer.
- If it is non-Unicode, then skip 5 bytes for the header. For the length
- If the 6th byte is 0xFF then read bytes 8th and 7th (inverted) and convert them to integer
- If the 6th byte is not 0xFF then convert the 7th byte to integer.
The start index
Lastly, we need to know where to start reading our text. As you might have guessed, it will depend on the above conditions. So we have the following:
- If it is Unicode, then skip 11 bytes for the header. For the length
- If the 12th byte is 0xFF we skip 3 bytes so our text starts at byte 15th
- If the 12th byte is not 0xFF we skip 1 byte so our text starts at byte 13th
- If it is non-Unicode, then skip 5 bytes for the header. For the length
- If the 6th byte is 0xFF we skip 3 bytes so our text starts at byte 9th
- If the 6th byte is not 0xFF we skip 1 byte so our text starts at byte 7th
Some examples:
Example 1: SQL_Tree_Binary = 0x01000000FFFEFF00FFFEFFFF7B042800….
The first 5 bytes are 0x01000000FF, so it is unicode. We skip 11 bytes to get the length. Byte 12th is 0xFF, meaning that we need bytes 13th and 14th to get the length. These bytes are 0x7B04. To convert them to integer we invert them to 0x047B. So the length is 1147. And the query starts at byte 15th.
Example 2: SQL_Tree_Binary = 0x01000000FFFEFF00FFFEFF352800….
The first 5 bytes are 0x01000000FF, so it is unicode. We skip 11 bytes to get the length. Byte 12th is not 0xFF, meaning that byte 12th has the length. We convert byte 0x35 to integer and we get that the length is 53. And the query starts at byte 13th.
Example 3: SQL_Tree_Binary = 0x0100000000FF5F052820….
The first 5 bytes are 0x0100000000, so it is not unicode. We skip 5 bytes to get the length. Byte 6th is 0xFF, meaning that we need bytes 7th and 8th to get the length. These bytes are 0x5F05. To convert them to integer we invert them to 0x055F. So the length is 1375. And the query starts at byte 9th.
Example 4: SQL_Tree_Binary = 0x0100000000F12820….
The first 5 bytes are 0x0100000000, so it is not unicode. We skip 5 bytes to get the length. Byte 6th is not 0xFF, meaning that byte 6th has the length. We convert byte 0xF1 to integer and we get that the length is 241. And the query starts at byte 7th.
This is not the whole story on the binary format of the query, but for now it is enough to get us what we want.
Getting the Query Text
Ok, now that we understand the structure is very simple to get the text (or is it?). From our analysis above we get two important pieces of information:
- startIndex: where we can find the text of the query in all the binary
- length: the length of bytes we need to consider to get the text
All is left is to get a substring of the binary value inside SQL_Tree_Binary field, starting at the startIndex and taking length bytes. In the case of a Unicode query we need to take length * 2 bytes (because Unicode is double-byte, remember?).
Both SQLServer and Oracle allow you to convert a binary/raw (respectively) datatype to varchar or nvarchar. I will explain how to do this later, but first let me point something to get a cleaner text:
The binary format used to represent the query is more complex than what’s explained here. The text of the query has two parts:
- The first part is the text of the query as shown in the Toolkit in the query editor when we click the “Show SQL” button
- The second part is a more structured form of the query, with comments for each of the clauses in the query editor.
As an example, take this query:
here is the extracted text:
( ( EXISTS ( SELECT * FROM Company__Type WHERE Company__Type.Company_Type = 5 AND (Company__Type.Company_Id = Company.Company_Id) ) ) AND (Company.Inactive = 0) ) WhereDelim,..,WhereDelim ( ( /* Company Type(Company).Company Type = Partner */ EXISTS ( SELECT * FROM ( SELECT DISTINCT Company__Type.Company_Id FROM Company__Type WHERE Company__Type.Company_Type = 5 ) SubqueryView WHERE SubqueryView.Company_Id=Company.Company_Id ) ) AND (Company.Inactive = 0) )
(indentation and new lines added by me). Notice how the two parts are clearly separated by a special type of delimiter with the text WhereDelim,..,WhereDelim. The second part is closer to what Pivotal will actually send to the database, and it has also comments that correspond to the lines in the query designer, so it is easier to identify. We will only take the second part as the text of our query.
Extracting the Query Text in SQLServer
So, in SQLServer we can create the following view:
create view saved_lookups_text as select table_id, table_name, saved_lookups_id, saved_lookup_name, case when charindex('WhereDelim',query_text) > 0 then substring(query_text,len(query_text) - charindex(reverse('WhereDelim'),reverse(query_text)) + 3,4000) else query_text end query_text from ( select table_id, table_name, saved_lookups_id, saved_lookup_name, case isUnicode when 1 then cast(substring(sql_tree_binary, startIndex, queryLength * 2) as nvarchar(4000)) else cast(substring(sql_tree_binary, startIndex, queryLength) as varchar(8000)) end query_text from ( select saved_lookup_name, case when substring(sql_tree_binary,1,5) = 0x0100000000 then 0 when substring(sql_tree_binary,1,5) = 0x01000000FF then 1 end isUnicode, cast(case when substring(sql_tree_binary,1,5) = 0x0100000000 then case when substring(sql_tree_binary,6,1) = 0xFF then substring(sql_tree_binary,8,1) + substring(sql_tree_binary,7,1) else substring(sql_tree_binary,6,1) end when substring(sql_tree_binary,1,5) = 0x01000000FF then case when substring(sql_tree_binary,12,1) = 0xFF then substring(sql_tree_binary,14,1) + substring(sql_tree_binary,13,1) else substring(sql_tree_binary,12,1) end end as int) queryLength, case when substring(sql_tree_binary,1,5) = 0x0100000000 then case when substring(sql_tree_binary,6,1) = 0xFF then 9 else 7 end when substring(sql_tree_binary,1,5) = 0x01000000FF then case when substring(sql_tree_binary,12,1) = 0xFF then 15 else 13 end end startIndex, s.table_id, t.table_name, s.saved_lookups_id, s.sql_tree_binary from saved_lookups s inner join tables t on s.table_id = t.tables_id where substring(saved_lookups_id,1,1) <> 0x00 ) x ) y
(I know the query could be better written, but for the sake of demonstration is ok)
This query will present something like this (notice the query_text field):
Notice lines 8 and 9, where, due to the lack of a reverse charindex function in SQLServer, we need to do a trick to find the last occurrence of WhereDelim text and then take the text that comes after this. Notice also in lines 19 and 20 how we cast the text to the appropriate charset based on whether the text is Unicode or not.
You can download this SQL script from here.
Extracting the Query Text in Oracle
You guessed it! In Oracle things are not as easy. There are some caveats we need to consider:
- Oracle has a limitation of 2000 characters when using nvarchar2 and 4000 characters when using varchar2 data types. So if our text is longer than this we will get an error when converting things using the package UTL_RAW.
- The package used to read blobs in Oracle is DBMS_LOB. This package has a convenient SUBSTR function to get pieces of the BLOB (in our case the data in the SQL_Tree_Binary field). This function changes the endianness of the data when the data is Unicode (in other words, it inverts it) and thus when converting the Unicode text using UTL_RAW this gets confused and we get funky letters instead.
- You cannot combine a Unicode charset with a non-Unicode charset in the same field using CASE statements
The view in Oracle is like this:
create view saved_lookups_text as select saved_lookups_id, saved_lookup_name, case when unicode = 1 then reverse(cast(utl_raw.cast_to_nvarchar2(utl_raw.reverse(dbms_lob.substr(sql_tree_binary,case when length > 1000 then 2000 else length * 2 end,queryStartIndex))) as varchar2(4000))) else utl_raw.cast_to_varchar2(dbms_lob.substr(sql_tree_binary,case when length > 2000 then 2000 else length end,queryStartIndex)) end queryText, table_id, table_name from ( select saved_lookup_name, dbms_lob.substr(sql_tree_binary,5,1) queryType, case when dbms_lob.substr(sql_tree_binary,5,1) = hextoraw('0100000000') then dbms_lob.substr(sql_tree_binary,5,1) when dbms_lob.substr(sql_tree_binary,5,1) = hextoraw('01000000FF') then dbms_lob.substr(sql_tree_binary,11,1) end prefix, case when dbms_lob.substr(sql_tree_binary,5,1) = hextoraw('0100000000') then 0 when dbms_lob.substr(sql_tree_binary,5,1) = hextoraw('01000000FF') then 1 end unicode, utl_raw.cast_to_binary_integer(case when dbms_lob.substr(sql_tree_binary,5,1) = hextoraw('0100000000') then case when dbms_lob.substr(sql_tree_binary,1,6) = hextoraw('FF') then utl_raw.concat(dbms_lob.substr(sql_tree_binary,1,8),dbms_lob.substr(sql_tree_binary,1,7)) else dbms_lob.substr(sql_tree_binary,1,6) end when dbms_lob.substr(sql_tree_binary,5,1) = hextoraw('01000000FF') then case when dbms_lob.substr(sql_tree_binary,1,12) = hextoraw('FF') then utl_raw.concat(dbms_lob.substr(sql_tree_binary,1,14),dbms_lob.substr(sql_tree_binary,1,13)) else dbms_lob.substr(sql_tree_binary,1,12) end end) length, case when dbms_lob.substr(sql_tree_binary,5,1) = hextoraw('0100000000') then case when dbms_lob.substr(sql_tree_binary,1,6) = hextoraw('FF') then 9 else 7 end when dbms_lob.substr(sql_tree_binary,5,1) = hextoraw('01000000FF') then case when dbms_lob.substr(sql_tree_binary,1,12) = hextoraw('FF') then 15 else 13 end end queryStartIndex, s.table_id, t.table_name, s.saved_lookups_id, s.sql_tree_binary from saved_lookups s inner join tables t on s.table_id = t.tables_id where utl_raw.substr(saved_lookups_id,1,1) = hextoraw('80') -- only business queries order by table_id,s.saved_lookup_name )
Notice in line 6 how I do some tricks to overcome all the problems mentioned above. This query shows both parts of the query (doesn’t uses the WhereDelim mentioned above) but it is easy to add. Also, this view doesn’t include queries which text length is more than 4000 characters if non-Unicode or 2000 characters if Unicode, which unfortunately leaves a lot of query in the dark.
In Oracle I have created a package instead that gives me what I want in the query output window. With this package you can do things such as:
-- use a procedure to get the text of the query declare queryText varchar2(8000); begin pk_pivotal_toolkit.getQueryText('<some query name>',queryText); pk_pivotal_toolkit.printText(queryText); end; -- use a function to get the text of the query declare queryText varchar2(8000); begin queryText := pk_pivotal_toolkit.getQueryText('<some query name>'); pk_pivotal_toolkit.printText(queryText); end; -- this will show only the first 4000 chars of a query select s.*,pk_pivotal_toolkit.getQueryText(saved_lookup_name) query_text from saved_lookups s where utl_raw.substr(saved_lookups_id,1,1) = hextoraw('80') -- see what queries use a specific field: call pk_pivotal_toolkit.findFieldsInQuery('Company.Type, Contact.First_Name');
You can download the whole package from here.
Conclusion
Now you can know what queries make use of certain field in the database!
I hope you find this useful. If you do please let me know. Also if you know a better way of doing this let me know. And if somebody from Pivotal reads this, please give us a better way of knowing how to get the text for a query.