I found this excellent article by Birgitta Hauser, Software and Database Engineer, Toolmaker Advanced Efficiency GmbH. It covers the basic concepts you will need to consider if you want to “Replace IBM i Native File Access with SQL”
Just in case it vanishes I’m going to reproduce it here: If you are considering SQL and investigating how using SQL Indexes can benefit the business then this makes some excellent coffee time reading.
SQL indexes and native I/O – no contradiction
Profit from the use of SQL indexes when native I/O is used
Native I/O is widely used in RPG and COBOL programs for getting access to the data stored in the database. Data access is mostly performed with the help of data description specifications (DDS) keyed physical and logical files. Unfortunately DDS is an outdated technology, while all future development regarding the database and data access is only integrated into SQL. For profiting from the new indexing technologies in release 6.1, native I/O does not need to be replaced with embedded SQL. SQL indexes can be used with native I/O like any keyed logical file. Moreover, the enhanced indexing technologies could first be used with native I/O while the SQL query optimizer could only partially profit from these enhancements.This article explains how SQL indexes can be used in conjunction with native I/O and how native I/O can profit from the enhanced indexing technologies.
With every release and technology refresh, powerful enhancements are included in IBM® DB2® for i. Among them, multiple enhancements for defining and modifying SQL- defined database objects, such as tables, views and indexes can be found.
Even though DDS is an outdated technology and DB2 for i enhancements primarily focus on SQL, a lot of companies refrain from defining their database objects with SQL in favor of DDS. The main reason for this hesitation is, data access in most (legacy) RPG and COBOL programs is performed in the traditional way, with the record-level access method. What these companies are not aware of is that SQL-defined database objects can be accessed with native I/O in the same way as DDS-described physical and logical files.
Logical file types
Before using SQL-defined database objects or translating existing DDS-described physical or logical file definitions into their SQL equivalents, we first have to differentiate between the various types of logical files.
DDS basically provides a single type of logical files while SQL principally includes two different object types, views and indexes.
DDS-described logical files
Traditionally DDS-described logical files are used in conjunction with native I/O for getting fast access to the data and reading the data in a predefined sequence. Those logical files include an access path, where a single or multiple independent key fields (compound key) are defined. With the help of these keys, it is possible to locate records or rows in a physical file or table and read the data in a predefined sequence.
Even though it is possible to define logical files without key information, they are rarely defined for use with native I/O.
A DDS-described logical file is not restricted to a single physical file or table, but multiple files can be joined together. A DDS-described logical file can be defined to reference DDS-described physical files or SQL tables. In joined logical files, it is even possible to combine DDS-described physical files and SQL-defined tables.
DDS-described logical files can include key definitions, but all key fields must be located within the same physical file or table. When a DDS-described logical file contains key definitions, it is referred to as a DDS-keyed logical file.
DDS-described logical files can either include all or a subset of fields or columns of the based physical files or tables. This method is often used for avoiding access to sensitive data, for example the salary or birthday. With this technique, it is even possible to access data with native I/O in SQL tables that include columns defined with data types that are not supported by DDS, RPG or COBOL, such as large objects (LOB) or XML documents.
When using logical files that include only a subset of the columns in the base table, only the predefined columns can be accessed with native I/O. If a native I/O write action is performed against those logical (un-joined) files, the excluded columns are populated with their default values. When performing an update operation, the values in the excluded columns remain untouched.
Additionally a DDS-described logical file can include SELECT / OMIT clauses, that is, rules for reducing data access at the row or record level.
When accessing data with native I/O, it is the programmer who decides which physical and logical files are needed and includes them directly in the source code. Once defined it is always used without any ifs, ands or buts. If a logical file to be accessed with native I/O is deleted, the program will crash with the next call.
When accessing the data with SQL methods, the query optimizer decides if and what access paths (defined keys) have to be used. Those access paths can be located in either DDS-keyed logical files or indexes. Even though a DDS-keyed logical file can be specified within an SQL statement, it is not recommended. Contrary to native I/O where the programmer decides and explicitly defines the logical files to be used, in SQL the decision is made by the query optimizer. Specifying a DDS-keyed logical file will not force the query optimizer to use exactly this access path. Instead, the specified DDS-keyed logical file first has to be analyzed. The query has to be rewritten, based on the physical files or tables and other information such as keywords or SELECT / OMIT clauses included in the DDS description of the logical file. The key information included in the DDS description, however, is ignored. After having rewritten the SQL statement, all available access paths located in either DDS-keyed logical files or SQL indexes or key constraints are evaluated with regard on the WHERE conditions, the JOIN information, the GROUP BY clause and the ORDER BY criteria. The access path decision is also influenced by the detail found in the Global Statistics Cache (GSC), which is managed by the SQL Query Engine (SQE) Statistics Manager.
SQL views
Using SQL views is a great concept for moving business logic into the database, masking the complexity of the SQL statements and reducing the final source code. An SQL view is based on an SQL SELECT statement and can include everything that is allowed within a SELECT statement with one exception (ORDER BY is not allowed). Within a view definition multiple physical files or tables and other views can be joined together by using any allowed join method. Scalar and / or user defined functions (UDFs) can be used for converting and modifying existing column values or for building new columns. Data can be condensed and summarized by including aggregate functions and GROUP BY clauses. With the use of UNION, INTERSECT, or EXCEPT expressions, multiple sub-SELECT statements can be merged together. Additionally common table expressions, hierarchical query clauses and nested sub-SELECT statements can be included.
Because SQL views cannot include an ORDER BY clause, they never include key information. Because of the lack of any key information, views do not have to be updated when adding, modifying, or deleting rows in the underlying tables. In this way, theoretically hundreds of views can be built over the same table without causing any performance decrease or increase in storage consumption.
SQL views can be used in composition with SQL like any physical file or table. Even though an ORDER BY clause is not allowed within the view definition, an ORDER BY clause can be added in the final SELECT statement when accessing a view. If the data has to be returned in a predefined sequence, the use of the ORDER BY clause is mandatory. Otherwise, there is no guarantee for the data to be always returned in the required sequence, because the SQL optimizer could decide to use a different access path.
SQL views are very powerful for moving business logic into the database, encapsulating complexity and even reducing source code, at least if the data access is performed using SQL.
Because of the lack of any key information, the use of SQL views with native I/O is rather restricted. .
SQL indexes
By default DDS-created physical files, SQL tables or SQL views do not include any key information. Key information can only be added to DDS physical files.
When accessing data located within SQL tables or physical files, the data is found and returned. Without additional access paths (SQL indexes, DDS-keyed logical files, or key constraints), the complete table or physical file is searched. A table scan is not a problem for small tables or physical files with a few hundred rows, but can be a time consuming process for large tables with several millions or even billions of rows.
In the SQL world, access paths for speeding up data access are stored in separate database objects, known as SQL indexes.
When using native I/O it is the programmer who specifies the logical files that have to be used without ifs and buts.
When running SQL statements, it is the query optimizer that makes the decision if and what access paths should be used. In this way, there is no reason to reference a DDS-keyed logical file or SQL index within an SQL statement. The optimizer analyzes the SQL statement, and validates the existing access paths (in DDS-keyed logical files and SQL indexes) by considering the information provided by the statistics manager about data composition. The optimizer might even decide combining multiple access paths for the same table or physical file in the same query. If the right access paths exist, the optimizer’s decisions are more flexible and precise than most decisions a programmer can make.
SQL indexes can be built based on SQL-defined tables as well as DDS-described physical files.
SQL provides two different kinds of indexes.
Binary radix tree indexes
Binary radix tree indexes are traditional access paths containing all key information. Traditional binary radix tree indexes can be compared with DDS-keyed logical files without any JOIN instruction, field or column selection, and SELECT / OMIT clauses. To provide fast data access, the key values are arranged in the form of a tree. To find the required key values, the index branches are traversed until the required leaf node(s) are found.
Figure 1 shows a small database table with a single key field containing names. It also shows the structure of a binary radix tree index which can be used for retrieving the record or row for a specific name or for returning the data ordered by the name.
On the lowest level, behind the leaf is a bitmap containing the information about which row or record matches the key values. Based on this bitmap, the requested data itself can be located and retrieved from the database.
When accessing data with SQL, binary radix tree indexes are considered (costed) if less than 15 % to 20 % of the rows in a table will be returned. Otherwise, a table scan is likely to be used.
Encoded vector indexes (EVIs)
Encoded vector indexes are special kind of indexes that can be used to get fast data access if between 20% to 70% of the rows or data in a table are being returned. But EVIs provide only restricted use for joining tables and / or ordering data.
With IBM i 6.1, both index types were enhanced. So newly generated key columns (derived indexes) and / or WHERE conditions (sparse indexes) can be included.
SQL indexes are not allowed to be specified within an SQL statement. The query optimizer can use all access paths in either DDS-keyed logical files or binary radix tree indexes, or encoded vector indexes.
When accessing data with native I/O, the programmer can even decide to use an SQL index. Binary radix tree indexes can be defined and used with native I/O similar to any DDS-keyed logical file. Encoded vector indexes are special SQL objects and cannot be accessed with native I/O.
Generating SQL indexes
SQL Indexes can be generated similar to any (SQL) database object by constructing and running the appropriate SQL commands.
New indexes (binary radix tree and EVIs) are generated with the CREATE INDEX statement.
Existing SQL indexes can be deleted with the DROP INDEX statement. Before re-creating an index, it first has to be deleted. There is no OR REPLACE option available in the CREATE INDEX statement.
Labels and longer comments can be added or modified by running a LABEL ON INDEX statement and / or a COMMENT ON INDEX statement.
Scripting tools
There are several interfaces available that allow those SQL commands to be typed in and executed immediately, for example the IBM i Navigator includes a Run SQL Script interface.
A similar Run SQL Script interface was added to the IBM i Access Client Solutions package in December, 2015 but it does not yet include all the options that are available in the based IBM i Navigator’s Run SQL Script interface.
IBM i Access Client Solutions consolidates the most commonly used tasks for managing your IBM i into one simplified location. Refer IBM i Access Client Solutions for more information.
SQL indexes can be generated with the interactive SQL (STRSQL) green screen command interface. But STRSQL is an outdated technology and should no longer be used.
The SQL scripts entered and run with one of the Run SQL Script interfaces can be stored as a *.sql file on your workstation or as a stream file located within the integrated file system (IFS). Those stored SQL scripts can be reopened and rerun with either of these interfaces.
The IBM i Navigator Run SQL Script also allows SQL scripts being stored as traditional source physical file member.
SQL scripts stored in physical file members or as stream files in the IFS can be run either with the IBM i Navigator Run an SQL scripting tool or with the Run SQL Statements (RUNSQLSTM) CL command.
IBM i Navigator – wizards
The downside of the scripting tools is that the correct syntax of the SQL statements needs to be known because there are only restricted assistance and limited prompt methods. But both IBM i Navigator versions, Windows based and browser based, include a wizard for creating SQL indexes.
These wizards support all enhancements that have been added since IBM i 6.1, including the creation of new key columns based on an SQL expression or the addition of the WHERE conditions to an index. Even the special options necessary for native I/O can be set.
Accessing data with native I/O
The traditional method for accessing data with native I/O was to use DDS-keyed logical and physical files.
DDS is an outdated technology and most future DB2 for i improvements are focused on SQL. Therefore, it would be preferred
for IBM i clients to shift from DDS to the SQL Data Definition Language (DDL) even if the data access approach continues to use native I/O.
SNIPETTY SNIP SNIP - pages of useful information and pictures. Check the attached PDF file to view all the details
Conclusion
SQL provides a lot of very powerful specifics for creating indexes, such as using scalar functions for new key columns or adding WHERE conditions for pre-filtering the rows to be accessed. These enhancements are not available in DDS, which is an outdated technology, while all future development is in SQL.
SQL indexes can be used in conjunction with native I/O similar to any keyed logical file. Using the enhanced indexing technologies, especially with derived and sparse indexes, business logic can be moved into the database and the source code can be reduced while the data access is still performed with native I/O.
However, for future development, alternatives to native I/O such as creating SQL views based on (complex) SELECT statements and accessing them with embedded SQL should be considered. That does not mean native I/O has to be completely replaced with embedded SQL. Both methods (embedded SQL and native I/O) have their strength and merits.
And now have fun in creating SQL indexes, native I/O, and embedded SQL.