Incremental Aggregation Overview
When using incremental aggregation, you apply captured changes in the source to aggregate calculations in a session. If the source changes only incrementally and you can capture changes, you can configure the session to process only those changes. This allows the PowerCenter Server to update your target incrementally, rather than forcing it to process the entire source and recalculate the same data each time you run the session.
For example, you might have a session using a source that receives new data every day. You can capture those incremental changes because you have added a filter condition to the mapping that removes pre-existing data from the flow of data. You then enable incremental aggregation.
When the session runs with incremental aggregation enabled for the first time on March 1, you use the entire source. This allows the PowerCenter Server to read and store the necessary aggregate data. On March 2, when you run the session again, you filter out all the records except those time-stamped March 2. The PowerCenter Server then processes only the new data and updates the target accordingly.
Consider using incremental aggregation in the following circumstances:
You can capture new source data. Use incremental aggregation when you can capture new source data each time you run the session. Use a Stored Procedure or Filter transformation to process only new data.
Incremental changes do not significantly change the target. Use incremental aggregation when the changes do not significantly change the target. If processing the incrementally changed source alters more than half the existing target, the session may not benefit from using incremental aggregation. In this case, drop the table and re-create the target with complete source data.
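The two circumstances above can be sketched in Python. This is only an illustration of the reasoning, not PowerCenter code: the record layout, the `stamp` field, the year in the dates, and both function names are assumptions made for the example.

```python
from datetime import date

def select_new_records(records, run_date):
    """Keep only records time-stamped with the run date.

    Plays the role of the filter condition in the mapping; the 'stamp'
    field name is hypothetical.
    """
    return [r for r in records if r["stamp"] == run_date]

def may_benefit_from_incremental(changed_rows, existing_rows):
    """Rule of thumb from the text: if the changes alter more than half
    of the existing target, incremental aggregation may not help."""
    if existing_rows == 0:
        return False  # nothing to update yet, the whole source must be processed
    return changed_rows / existing_rows <= 0.5

# Illustrative data only; the year is arbitrary.
source = [
    {"stamp": date(2024, 3, 1), "amount": 10},
    {"stamp": date(2024, 3, 1), "amount": 20},
    {"stamp": date(2024, 3, 2), "amount": 5},
]
new_rows = select_new_records(source, date(2024, 3, 2))
```

Here `new_rows` holds the single March 2 record, which is the only data a second run would need to process.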
Note: Do not use incremental aggregation if your mapping contains Percentile or Median functions. The PowerCenter Server uses system memory to process Percentile and Median functions in addition to the cache memory you configure in the session property sheet. As a result, the PowerCenter Server does not store incremental aggregation values for Percentile and Median functions in disk caches.
The first time you run an incremental aggregation session, the PowerCenter Server processes the entire source. At the end of the session, the PowerCenter Server stores aggregate data from that session run in two files, the index file and the data file. The PowerCenter Server creates the files in a local directory.
Each subsequent time you run the session with incremental aggregation, you use only the incremental source changes in the session. For each input record, the PowerCenter Server checks historical information in the index file for a corresponding group. If it finds a corresponding group, the PowerCenter Server performs the aggregate operation incrementally, using the aggregate data for that group, and saves the incremental change. If it does not find a corresponding group, the PowerCenter Server creates a new group and saves the record data.
When writing to the target, the PowerCenter Server applies the changes to the existing target. It saves modified aggregate data in the index and data files to be used as historical data the next time you run the session.
If the source changes significantly, and you want the PowerCenter Server to continue saving aggregate data for future incremental changes, configure the PowerCenter Server to overwrite existing aggregate data with new aggregate data. For details, see Reinitializing the Aggregate Files.
When you partition a session that uses incremental aggregation, the PowerCenter Server creates one set of cache files for each partition.
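The per-record processing described above can be sketched as follows. This is a minimal illustration, not the PowerCenter Server's implementation: a Python dictionary stands in for the index and data files, the aggregates are limited to a running sum and count, and the function name and group keys are hypothetical.

```python
def apply_increment(cache, records):
    """Fold new (group_key, value) records into stored aggregate data.

    cache maps each group key to its running {"sum", "count"} and stands
    in for the index and data files. Returns only the groups that changed,
    which are the rows that would be applied to the existing target.
    """
    changed = {}
    for key, value in records:
        group = cache.get(key)
        if group is None:
            # No corresponding group found: create a new one.
            group = {"sum": 0, "count": 0}
            cache[key] = group
        # Corresponding group found (or just created): update incrementally.
        group["sum"] += value
        group["count"] += 1
        changed[key] = dict(group)
    return changed

cache = {}
# First run: the entire source is processed and the aggregates are stored.
apply_increment(cache, [("east", 10), ("west", 5)])
# Later run: only the new records are processed.
changes = apply_increment(cache, [("east", 7), ("north", 3)])
```

After the second call, `changes` contains the updated `east` group and the new `north` group, while the stored `west` aggregate is left as it was.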
The PowerCenter Server creates new aggregate data, instead of using historical data, when you perform one of the following tasks:
Save a new version of the mapping.
Configure the session to reinitialize the aggregate cache.
Move the aggregate files without correcting the configured path or directory for the files in the session property sheet.
Change the configured path or directory for the aggregate files without moving the files to the new location.
Delete cache files.
Decrease the number of partitions.
Note: When the PowerCenter Server rebuilds incremental aggregation files, the data in the previous files is lost.
Reinitializing the Aggregate Files
If the source tables change significantly, you might want to run the session with the entire source data. To do this, you can configure the session to reinitialize the aggregate cache.
For example, you can reinitialize the aggregate cache if the source for a session changes incrementally every day and completely changes once a month. When you receive the new monthly source, you might configure the session to reinitialize the aggregate cache, truncate the existing target, and use the new source table during the session.
After you run a session that reinitializes the aggregate cache, edit the session properties to disable the Reinitialize Aggregate Cache option. If you do not clear Reinitialize Aggregate Cache, the PowerCenter Server overwrites the aggregate cache each time you run the session.
Note: When you move from Windows to UNIX, you must reinitialize the cache. Therefore, you cannot change from a Latin1 code page to an MSLatin1 code page, even though these code pages are compatible.
Moving or Deleting the Aggregate Files
Once you run an incremental aggregation session, avoid moving or modifying the index and data files that store historical aggregate information. If you do move the files into a different directory, and you want the PowerCenter Server to use the aggregate files, you must also change the path to those files in the session properties. Likewise, if you change the path to the files but do not move the files, the PowerCenter Server rebuilds the files the next time you run the session.
If you change certain session or server properties, the PowerCenter Server cannot use the incremental aggregation files, and it fails the session. To avoid session failure, delete existing incremental aggregation files when you perform any of the following tasks:
Change the PowerCenter Server data movement mode from ASCII to Unicode or from Unicode to ASCII.
Change the PowerCenter Server code page to an incompatible code page.
Change the session sort order when the PowerCenter Server runs in Unicode mode.
Change the Enable High Precision session option.
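The checklist above can be expressed as a small comparison of settings before and after a change. This is a hedged sketch for reasoning about the rules only: the dictionary keys, the function name, and the `code_pages_compatible` flag (which the caller must supply) are all hypothetical and do not correspond to any PowerCenter API.

```python
def reasons_to_delete_cache(old, new, code_pages_compatible):
    """Return which of the listed changes apply between two settings snapshots.

    old and new are plain dictionaries with hypothetical keys. A non-empty
    result means the existing aggregate files should be deleted before the
    next run.
    """
    reasons = []
    if old["data_movement_mode"] != new["data_movement_mode"]:
        reasons.append("data movement mode")
    if old["code_page"] != new["code_page"] and not code_pages_compatible:
        reasons.append("code page")
    # The sort order rule applies only when the server runs in Unicode mode.
    if new["data_movement_mode"] == "Unicode" and old["sort_order"] != new["sort_order"]:
        reasons.append("sort order")
    if old["high_precision"] != new["high_precision"]:
        reasons.append("high precision")
    return reasons

before = {"data_movement_mode": "ASCII", "code_page": "Latin1",
          "sort_order": "Binary", "high_precision": False}
after = {"data_movement_mode": "Unicode", "code_page": "Latin1",
         "sort_order": "Binary", "high_precision": True}
found = reasons_to_delete_cache(before, after, code_pages_compatible=True)
```

In this example `found` lists the data movement mode and the high precision option, the two settings that differ between the snapshots.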
Finding Index and Data Files
By default, the PowerCenter Server stores the index and data files in the directory entered in the server variable, $PMCacheDir, in the Workflow Manager. The PowerCenter Server names the index file PMAGG*.idx. The PowerCenter Server names the data file PMAGG*.dat.
If you run the session using Verbose Init mode, the PowerCenter Server writes the file names in the session log. To locate the files, look in the previous session log for the TE_7034 and TE_7035 messages that indicate the cache file name and location. The following messages show sample entries in the session log:
MAPPING> TE_7034 Aggregate Information: Index file is [D:\Informatica\InformaticaServer\Cache\PMAGG8_4_2.idx]
MAPPING> TE_7035 Aggregate Information: Data file is [D:\Informatica\InformaticaServer\Cache\PMAGG8_4_2.dat]
If you do not run the session using Verbose Init mode or use an identifiable transformation naming convention, you may have difficulty determining which files belong to each session. For more information about cache file storage and naming conventions, see Cache Files.
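If you keep session logs as text, the TE_7034 and TE_7035 entries can be pulled out with a short script. The sketch below assumes the messages follow the layout of the sample entries shown in this section; the function name is hypothetical, and real logs may contain variations that this pattern does not cover.

```python
import re

# Matches the sample layout: "TE_7034 Aggregate Information: Index file is [path]"
CACHE_FILE_MESSAGE = re.compile(
    r"TE_(7034|7035) Aggregate Information: (Index|Data) file is \[([^\]]+)\]"
)

def find_cache_files(log_text):
    """Return {"index": path, "data": path} for the entries found in the log."""
    files = {}
    for _code, kind, path in CACHE_FILE_MESSAGE.findall(log_text):
        files[kind.lower()] = path
    return files

# The two sample entries from the session log above.
sample_log = (
    r"MAPPING> TE_7034 Aggregate Information: Index file is "
    r"[D:\Informatica\InformaticaServer\Cache\PMAGG8_4_2.idx]" "\n"
    r"MAPPING> TE_7035 Aggregate Information: Data file is "
    r"[D:\Informatica\InformaticaServer\Cache\PMAGG8_4_2.dat]" "\n"
)
cache_files = find_cache_files(sample_log)
```

With the sample entries, `cache_files` maps `index` to the `.idx` path and `data` to the `.dat` path, which you can then compare against the directory configured for the session.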