Memory Use in DataFlex Web Applications

Note

This is a temporary article that will be recast in the appropriate places throughout the documentation.

Background

A recent Forum thread about the size of the memory footprint for WebApp processes pointed out that under certain circumstances the memory used seemed significant (in the hundreds of megabytes per process). As the conversation continued it was apparent that the topic of memory use needed clarification. The purpose of this document is to focus on how the number, type, and structure of tables in a web application impact memory use, which is by far the most significant factor in the total memory necessary per process.

Is the information in this document pertinent to your application?

Significant memory consumption does not impact all, or even many, web applications. There are a few simple aspects to consider in determining if you need to be sensitive to memory consumption and explore ways to reduce it:

The embedded database has limited row lengths and only uses one buffer per open table, so it is extremely unlikely you will experience significant memory consumption unless you are using an SQL backend.
If your application uses a relatively small set of tables and relatively small row sizes, the chances that your application will consume significant amounts of memory is also very low even with an SQL server. You may still find the information below about what consumes memory interesting and surprisingly useful.
Even if your web applications consume significant amounts of memory per process, it may not be an issue that needs to be addressed if high memory use is not causing problems.

So, the bottom line is that significant memory consumption per process requires analysis in your environment to move from “that’s just how it works” to “we have a problem to deal with”. That threshold may be crossed when using SQL, having tables with very large row lengths, having many tables open in the application, and loading a large enough number of processes that would all come together and exceed the memory capacity of a reasonably configured server.

To aid you in this analysis, we’ve created the AnalyzeTableMemoryUsage basic project that you can compile in any workspace. We’ll discuss this in more detail below.

If we’ve caught your attention, proceed…

What uses memory, and when?

The largest consumer of memory in an SQL environment is opening a table. For each table opened, memory is allocated for:

the data buffer (equal to the row length and statically allocated),
the find cache for performance tuning (equal to the row length × the block size and statically allocated),
and each data dictionary (DD) instance in the application (equal to the size of the data in the DD buffer at any given time, because it is dynamically allocated).

There are other factors involved, like argument size and how that impacts the amount of data that moves between the data buffer and the DD buffers, but for now we’ll stay focused on the basics:

Each table has one data buffer = row length.
Each table has a configurable find cache = block size × row length.
Note that BLOCK_SIZE has a default value of 10 (this can be set in the table.int file) or at runtime via the DF_FILE_BLOCK_SIZE attribute. The find cache can also be controlled via Find_Cache_Timeout or the DF_DRIVER_FIND_CACHE_TIMEOUT and DF_DATABASE_FIND_CACHE_TIMEOUT attributes. See the product documentation for details.
Each DD instance has a buffer that is based on the actual data in memory, with the maximum being the row length (but also limited by argument size).

Example: the Customer table in the examples installed with the Studio has a total row length of only 1,222 bytes. With the default block size of 10 rows, the expected memory footprint for opening this table in your application would be 13,422 bytes (row length × 11). If we add in three instances of data dictionaries for the customer table, that could use as much as 3,666 bytes (row length × number of DD instances), but that would only peak when rows that contained the maximum data were in memory. At that level, we’re still only using a total of 17,008 bytes of memory in this application for the Customer table. Even if we had 100 such tables, the projected memory use per process (for tables) would be 100 × 17,088 bytes — approximately 1.7 MB.

Of course, this is not a representative table example. We ran statistics against an application workspace provided by a developer and found that across the 74 tables used in the application it was common to have tables with record lengths in the 17 KB range (there were 7 of them). The default memory use for each of them would be 191 KB (row length × 11) and when totaled would be 1.3 MB. Add in our basic assumption of three DD instances for each of those tables and you could possibly have an additional 365 KB peak (but only if the rows in memory were of maximum defined length). In this particular workspace, the total combined row length for all 74 tables was 140 KB with a projected total memory footprint of 1.5 MB for table buffers and perhaps another 420 KB for DD buffers (assuming 3 DDs per table and maximum length data in all) — totaling just under 2 MB. Make the application even 4 times larger and we’re still not even in the 10 MB range of memory needed for all the tables and data.

So how can there be reports of memory consumption in the hundreds of megabytes per process?

Enter large columns.

While every application is different and, as developers, we each have our own habits, some basics often come into play:

Text columns always allocate a minimum of 16 KB.
(max) columns always allocate a minimum of 16 KB.
These are listed separately because, even though Text columns default to (max) data types, (max) data types can be mapped to ASCII columns as well.
The data column in WebAppServerProps is set to 1 MB when converted to Microsoft SQL Server in some conversions.
Developers tend to size columns on the large side “just in case” (for example, did the data area for WebAppServerProps really need to be 1 MB?).

The behaviors associated with the combination of large columns and block sizes, and then further multiplied by the number of processes in the web application, is where some developers can start to see significant memory use in web applications.

Example: change each of the 7 tables that had row lengths of 17 KB and assume that, upon moving to SQL, the text columns were changed to 1 MB. We now have added at least 77 MB to the memory footprint (per process) — and that’s just for the file and find cache buffers. This shows how very large columns can add up quickly.

How can we reduce the memory footprint?

Note: The fundamentals discussed in this section are generalized so as not to get bogged down in backend-specific details. Everything needs to be taken into the specific context of individual instances. Factors that may limit options include:

Data types available (these can change even within revisions of a particular backend).
Data type sizes in memory (especially when Unicode types come into play).
Row length limitations (often tied to the data types used).
Who owns the data (converted from DataFlex or “native” and used by other applications).
How much of a difference would a change make?

Keep in mind that some techniques discussed below cannot be used in some environments, cannot be used to the same extent, or even if used, may not result in a significant memory reduction.

Block size

We’ll start with block_size, because it multiplies the memory use of the underlying data structure for a very specific purpose: the find cache for each table. It is also the one aspect that can be tuned without any changes to the database itself. The only potential impact is application performance, and that can be easily tested. Remember that the validity of the find cache is temporary (only within the scope of the find_cache_timeout, which defaults to 10 milliseconds), so it usually comes into play in tight finding loops (reports, batch processes, filling selection lists, etc.).

The first question to ask about every table is “how often, and under what circumstances, do sequential finds take place?” While rare, there are some tables that never have sequential finds performed, so allocating a significant find cache is simply wasted memory. WebAppServerProps is one example — the only finds done on this table are FIND EQ. Of course, the minimum block size is 2, so we can’t reclaim all the memory allocated for the find cache for such tables, but since the default block size is 10, we can still cut down the use by 80%.

For the embedded database, WebAppServerProps is only 16 KB, so the default memory use is only 176 KB. But with the column change to 1 MB when converted to Microsoft SQL Server, that expands to over 11 MB just for that one table. We can reclaim 8 MB of that by adjusting the block size to its minimum of 2.

Tables defined as “extensions” (one-to-one relationships) are another example of tables that would probably only use FIND EQ operations; once a row in the “main” table is found, it uses an index value to FIND EQ for the secondary table and the extended data.

Tables that only have FIND EQ operations are rare, but there can still be advantages to tuning the block size of other tables to determine if there is any significant performance loss if the find cache is reduced. Look for tables that have large row sizes and reduce the block size to see if users notice any difference. Remember, only tight loops where the finds happen within the find cache timeout would be aided by the find cache — so the best candidates for change would be lookup tables and others that are not used heavily or rarely have reports or batch processes associated with them.

The key when examining potential block size changes is not to invest time unless a block size reduction will result in significant memory use reduction. Also remember that for tables with very large row sizes, even a minor reduction in block size (for example, going from the default of 10 down to 6 or 7) may save significant memory per process without significantly changing the performance profile. For testing, we recommend setting the block size to the minimum value (2) and then working back up as needed. Setting it to 2 will give the earliest indication of sensitivity for any particular table.

While block size adjustments can result in the largest reductions in memory use when applied to tables with large row sizes, they are also the only change that can have an associated performance impact.

One technique for tables that have very large row sizes but need higher block sizes for reports, batch processes, grids, and lists is to keep the block size low in the table.int file but set it higher dynamically when needed. You can set DF_FILE_BLOCK_SIZE outside of a Structure_Start...Structure_End operation and it will immediately change the memory allocation for the find cache and then reduce it when set back to the original value. For example:

Integer iBaseBlockSize iTempBlockSize
Move 10 to iTempBlockSize // this can be whatever value you want for desired performance
Get_Attribute DF_FILE_BLOCK_SIZE of {tableNumber} to iBaseBlockSize
// {tableNumber} is likely going to be the main DD for the operation
// all the related files will be found with Find EQs and won’t benefit from a block size adjustment.
Set_Attribute DF_FILE_BLOCK_SIZE of {tableNumber} to iTempBlockSize
// keep in mind that whatever process is responding to this request will immediately
// allocate more memory once set
// do stuff
Set_Attribute DF_FILE_BLOCK_SIZE of {tableNumber} to iBaseBlockSize
// once the operation is complete we still need to set the find cache back to its starting point
// or the memory allocated for the responding process will stay at the higher level
// the value in the table.int file is only read upon opening the table.

Note: The above technique can be very useful when performance for a particular process would benefit from a significantly increased find cache, regardless of the memory use impact.

Data types

Usually only text and binary columns come into play during this discussion because, by default, they are mapped to backend data types that store large data (for instance, the (max) data types in Microsoft SQL Server). There are cases where developers may have mapped what would typically be ASCII columns to larger backend data types, so those can come into play as well.

We’ll use Microsoft SQL Server data types in this section, but all backend databases have similar behaviors.

DataFlex has default mappings for the data types used in the embedded database to the SQL backend:

ASCII → char
These can also be mapped to varchar, nchar and nvarchar.
In DataFlex 20, the default is nchar to support Unicode data.
Text → varchar(max)
These can also be text, ntext and nvarchar(max).
In DataFlex 20, the default is nvarchar(max) to support Unicode data.
Binary → varbinary(max)
These can also be image and binary.
No change in DataFlex 20 from prior revisions.

As mentioned earlier, (max) data types will always allocate a minimum of 16 KB, even if they are for much smaller data. So, first examine tables that use a large number of text columns set to sizes that would have fit within ASCII data types.

Example: a table had over 230 columns defined as nVarChar(max), but all with a DataFlex length set to 40. The resulting buffer size was over 3.5 MB, and when combined with the default block size of 10, resulted in memory use of almost 40 MB per process. Changing those columns to nVarChar(40) would reduce the buffer dramatically to less than 28 KB and the total memory per process to around 305 KB. (This was an extreme case; there could be reasons for the data types used that are outside of your control.)

The bottom line is to build awareness of all the tables and data types used. This can be a tedious task, so we’ve created a basic project (AnalyzeTableMemoryUse.src) that can be dropped into any workspace to provide an immediate overview of every table in filelist and project the memory use impact for each one. It also looks for columns that may be unnecessarily mapped to data types that use minimum allocations and points out any tables that have very large columns (where block size may come into play).

Example output from running this against a converted WebOrder sample workspace:

Workspace: Order Entry Mobile Application
Data Path: C:\DataFlex Examples\DataFlex 19.1 Examples\WebOrderMobile\Data\
Date Analyzed: 06/02/2020

Table: MSSQLDRV:OrderSystem
Columns: 4
Block Size: 10
Total Buffer Size: 73
Expected Memory Use: 803

Table: MSSQLDRV:Vendor
Columns: 8
Block Size: 10
Total Buffer Size: 137
Expected Memory Use: 1,507

Table: MSSQLDRV:Inventory
Columns: 6
Block Size: 10
Total Buffer Size: 92
Expected Memory Use: 1,012

Table: MSSQLDRV:Customer
Columns: 15
Column 'Comments' VarChar(max), length used = 1024, recommend change to VarChar
Block Size: 10
Total Buffer Size: 32,964
Expected Memory Use: 362,604
Potential Memory Reduction: 168,949

Table: MSSQLDRV:SalesPerson
Columns: 3
Block Size: 10
Total Buffer Size: 49
Expected Memory Use: 539

Table: MSSQLDRV:OrderHeader
Columns: 9
Block Size: 10
Total Buffer Size: 96
Expected Memory Use: 1,056

Table: MSSQLDRV:OrderDetail
Columns: 6
Block Size: 10
Total Buffer Size: 60
Expected Memory Use: 660

Table: MSSQLDRV:CodeType
Columns: 3
Column 'Comment' VarChar(max), length used = 1024, recommend change to VarChar
Block Size: 10
Total Buffer Size: 16,423
Expected Memory Use: 180,653
Potential Memory Reduction: 168,949

Table: MSSQLDRV:CodeMast
Columns: 3
Block Size: 10
Total Buffer Size: 50
Expected Memory Use: 550

Table: MSSQLDRV:WebAppSession
Columns: 9
Block Size: 10
Total Buffer Size: 173
Expected Memory Use: 1,903

Table: MSSQLDRV:WebAppUser
Columns: 5
Block Size: 10
Total Buffer Size: 92
Expected Memory Use: 1,012

Table: MSSQLDRV:WebAppServerProps
Columns: 11
Block Size: 10
Total Buffer Size: 1,048,741
Expected Memory Use: 11,536,151

Total Expected Memory Use: 12,088,450
Potential Memory Reduction: 2,027,388
Tables to Examine for Further Optimization:
MSSQLDRV:WebAppServerProps / 11,536,151

Any table with very large columns will be listed for further examination so you can easily see which tables may benefit most from a reduction in block size. In fact, just setting the block size on WebAppServerProps to the minimum of 2 changes the profile to:

Table: MSSQLDRV:WebAppServerProps
Columns: 11
Block Size: 2
Total Buffer Size: 1,048,741
Expected Memory Use: 3,146,223

Total Expected Memory Use: 3,698,522
Potential Memory Reduction: 2,027,388

With one inconsequential change (no recompile required), the per-process memory consumption was reduced from ~12 MB to ~3.7 MB; changing the data type used in two other columns could reduce it by another ~2 MB.

What does the future hold?

DataFlex 2021 introduced changes as a result of Unicode use. The buffer allocations will increase because of the increased size of Unicode data, so optimizations are being made for some data types so that, where possible, they don’t allocate their full 16 KB (see the notes about text and (max) columns above).

The documentation for DataFlex 2021 covers this in detail.

For revisions after DataFlex 2021, the DataFlex team will look at how the DataFlex database drivers allocate memory for cache buffers and see if allocations can be made more dynamic.