



# Using an L2 Cache Module with the Contaq 82C599 PCI Chipset for the Intel 486 CPU

## Overview

Cypress Semiconductor markets the Contaq 82C599 PCI Chipset for Intel® 486-based systems. The Intel 486 CPU has an on-chip 8-Kbyte first-level (L1) cache that significantly improves system performance. The Contaq PCI chipset includes an integrated high-performance cache controller for an external second-level (L2) cache.

This application note works through the design decisions that occur when an L2 cache is designed into an Intel 486-based system built with the Contaq PCI chipset. The questions that are addressed are:

- What are the cache requirements?
- Why use a cache module?
  - discrete vs. modular designs
- Which cache module(s)?
  - selecting an L2 cache module

## L2 Cache Requirements

The L2 cache will be defined by size, speed, and type. There is also the matter of buffering the input address bits and providing chip select inputs to the data RAMs.

### Cache Size

The current market requirement for L2 cache in 486-based systems is largely 128 Kbytes with an expansion option to 256 Kbytes. A small percentage of customers request 512 Kbytes. The larger 512-Kbyte cache size is considered useful in high-performance multiprocessing applications. The Contaq PCI chipset supports cache sizes from 32 Kbytes to 1 Mbyte.

Assume a nominal cache size of 128 Kbytes with an expansion option to 256 Kbytes.

In that case, the data RAMs can be a standard 32Kx8 device (e.g., CY7C199). The 128-Kbyte cache can be built with one bank of four 32Kx8 RAMs. The 256-Kbyte expansion option can be a second bank of four more 32Kx8 RAMs. With the Contaq PCI chipset, the 256-Kbyte cache can be configured as two interleaved banks.
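The device count follows from the width of the 486 data bus. The sketch below works through that arithmetic, assuming one byte-wide RAM per byte lane of the 32-bit bus. The function name and its parameters are illustrative only and do not come from any Cypress or Contaq tool.

```python
def data_ram_banks(cache_kbytes, ram_kbytes=32, bus_bytes=4):
    """Return (banks, rams_per_bank) for a cache built from x8 SRAMs.

    Assumes one byte-wide RAM per byte lane of the 32-bit 486 data bus.
    """
    rams_per_bank = bus_bytes
    bank_kbytes = rams_per_bank * ram_kbytes
    if cache_kbytes <= 0 or cache_kbytes % bank_kbytes:
        raise ValueError("cache size must be a whole number of banks")
    return cache_kbytes // bank_kbytes, rams_per_bank


print(data_ram_banks(128))  # one bank of four 32Kx8 RAMs
print(data_ram_banks(256))  # two banks of four 32Kx8 RAMs
```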

### Cache Speed

The cache should support zero-wait-state operation at a bus frequency of 33 MHz. That requires a tag RAM with an access time ( $t_{AA}$ ) of 15 ns. The access time of the data RAMs depends on the organization. A single-bank array (128-Kbyte) should have  $t_{AA} = 20$  ns. An interleaved two bank array (256-Kbyte) can use slower data RAMs with  $t_{AA} = 25$  ns.

The Contaq PCI chipset also supports a 50-MHz clock option with one wait state (3-2-2-2 burst timing). In this mode, the tag RAMs can be slower with an access time of 20 ns. The data RAM access times are the same as noted above.

Assume two cache configurations at 33 MHz: a single-bank 128-Kbyte cache and a two-way interleaved 256-Kbyte cache. The tag RAM will have  $t_{AA} = 15$  ns in either configuration. The 128-Kbyte cache will use 20-ns data RAMs and the 256-Kbyte cache can use lower cost 25-ns data RAMs.

### Cache Type

The cache type can be either write-through or write-back. The Contaq PCI chipset supports both types of cache with an on-chip 8-bit address comparator and logic to process an optional dirty bit.

The Contaq PCI chipset has two write-back modes: 7-bit tag with one dirty bit or 8-bit tag without a dirty bit. In write-through mode, the chipset supports an 8-bit tag.

The type of cache and cache size affect the cacheable address range. With a 7-bit tag, one dirty bit, and 128 Kbytes of cache, the cacheable address range is 16 Mbytes. Increasing the cache size to 256 Kbytes doubles the cacheable address range to 32 Mbytes. With an 8-bit tag, no dirty bit, and 128 Kbytes of cache, the cacheable address range is 32 Mbytes. With 256 Kbytes of cache, the cacheable address range is 64 Mbytes.
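These four figures all follow from one relationship: each tag value selects one cache-sized window of memory, so the cacheable range is the cache size multiplied by the number of distinct tag values. The sketch below assumes a direct-mapped cache, which is implied by the single tag comparator but not stated explicitly. The function name is hypothetical.

```python
def cacheable_range_mbytes(cache_kbytes, tag_bits):
    """Cacheable address range in Mbytes, assuming a direct-mapped cache."""
    # Each of the 2**tag_bits tag values covers one cache-sized window.
    return cache_kbytes * (1 << tag_bits) // 1024


for cache_kbytes in (128, 256):
    for tag_bits in (7, 8):
        mbytes = cacheable_range_mbytes(cache_kbytes, tag_bits)
        print(f"{cache_kbytes} Kbytes, {tag_bits}-bit tag: {mbytes} Mbytes")
```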

Please note that although the system behavior is different for all three modes of operation, the external support hardware (8-bit tag RAM) is exactly the same. The tag RAM size is 8Kx8 for 128 Kbytes of cache and 16Kx8 for 256 Kbytes of cache.
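The tag RAM depths can be checked the same way. The sketch assumes one tag entry per 16-byte line (the Intel 486 cache line is four 32-bit words), which is consistent with the tag RAM being indexed from address bit 4 upward. The names are illustrative.

```python
LINE_BYTES = 16  # Intel 486 cache line: four 32-bit words


def tag_entries(cache_kbytes):
    """Number of tag RAM entries, assuming one tag per 16-byte line."""
    return cache_kbytes * 1024 // LINE_BYTES


print(tag_entries(128))  # 8K entries, so an 8Kx8 tag RAM
print(tag_entries(256))  # 16K entries, so a 16Kx8 tag RAM
```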

### Address Buffers for 128-Kbyte Cache

The single bank 128-Kbyte cache will require 15 bits of address ( $A_{16:2}$ ). The upper 13 bits ( $A_{16:4}$ ) from the 486 address bus are buffered through a pair of transparent latches (74FCT373C) to minimize the loading on the 486 address bus. The address latches are gated by the ALE signal from the CPU.

The lower two bits ( $A_{3:2}$ ) are time critical for burst accesses and require special handling. To support the different memory configurations, these address inputs are driven by the Contaq PCI chipset. TOGA<sub>2</sub> from the chipset drives cache address  $A_2$ . TOGA<sub>3</sub> from the chipset drives cache address  $A_3$ .

The write enable ( $\overline{CWE}_0$ ) and output enable ( $\overline{CRD}_0$ ) signals for bank 0 from the Contaq PCI chipset are used to drive the write enable and output enable inputs to the data RAMs.

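TOGA<sub>2</sub> drives RAM address bit  $A_0$  and TOGA<sub>3</sub> drives RAM address bit  $A_1$ . The upper 13 bits of latched address ( $LA_{16:4}$ ) are applied to the data RAM address bits  $A_{14:2}$ . The upper 13 bits of the unlatched CPU address ( $A_{16:4}$ ) are applied directly to the tag RAM address inputs.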

The loading on the CPU address bus ( $A_{16:4}$ ) is therefore limited to two loads (latch and tag RAM). The loading on the TOGA<sub>3:2</sub> outputs from the chipset is four loads (data RAMs). The ALE input from the 486 has two loads (latches).
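The address wiring for the single-bank cache can be sketched as a bit-slicing function. This is an illustration only: it assumes TOGA<sub>2</sub> and TOGA<sub>3</sub> simply follow CPU address bits 2 and 3 (in hardware the chipset sequences them during a burst), that the latched bits feed the data RAMs, and that the same 13 bits select the tag entry. The function name is hypothetical.

```python
def map_128k(addr):
    """Split a 486 byte address for the single-bank 128-Kbyte cache.

    Returns (data_ram_addr, tag_index).
    """
    la = (addr >> 4) & 0x1FFF   # A16:4, 13 bits through the latches
    toga2 = (addr >> 2) & 1     # assumption: TOGA2 follows A2
    toga3 = (addr >> 3) & 1     # assumption: TOGA3 follows A3
    # Data RAM A14:2 from the latched address, A1 from TOGA3, A0 from TOGA2.
    data_ram_addr = (la << 2) | (toga3 << 1) | toga2
    tag_index = la              # one tag entry per 16-byte line
    return data_ram_addr, tag_index


print(map_128k(0x00010))  # second line of the cache
print(map_128k(0x1FFFC))  # last 32-bit word of the cache
```

Addresses that differ only above bit 16 map to the same RAM locations, which is why the tag comparison is needed to tell them apart.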

### Address Buffers for 256-Kbyte Cache

The address requirements for the interleaved two-bank 256-Kbyte cache are somewhat different. The upper 14 bits ( $A_{17:4}$ ) from the 486 address bus are buffered through a pair of transparent latches (74FCT373C) to minimize the loading on the 486 address bus. The address latches are gated by the ALE signal from the CPU.

The lower two address bits ( $A_{3:2}$ ) are provided by the chipset as TOGA<sub>2</sub> (address bit 3 for bank 0) and TOGA<sub>3</sub> (address bit 3 for bank 1). To support the two-way interleave, the Contaq PCI chipset provides separate write enables ( $\overline{CWE}_0$  and  $\overline{CWE}_1$ ) and output enables ( $\overline{CRD}_0$  and  $\overline{CRD}_1$ ) for each bank.

The address to bank 0 of the data RAMs is thus formed by TOGA<sub>2</sub> driving RAM address bit  $A_0$  and latched address  $LA_{17:4}$  driving RAM address bits  $A_{14:1}$ . The address to bank 1 of the data RAMs is formed by TOGA<sub>3</sub> driving RAM address bit  $A_0$  and latched address  $LA_{17:4}$  driving RAM address bits  $A_{14:1}$ .

The upper 14 bits of address ( $A_{17:4}$ ) are applied directly to the tag RAM address bits  $A_{13:0}$ . The tag RAM is implemented as a 32Kx8 part, so the upper address bit  $A_{14}$  of the tag RAM is either grounded or tied to  $V_{CC}$ .

The loading on the CPU address bus ( $A_{17:4}$ ) is two loads (latch and tag RAM). The loading on the TOGA<sub>3:2</sub> outputs from the chipset is four loads (data RAMs). The ALE input from the 486 has two loads (latches).
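The interleaved wiring can be sketched in the same way. The sketch assumes that address bit 2 selects the bank (consecutive 32-bit words alternate between banks), which the text implies but does not state, and that the TOGA output for the selected bank follows address bit 3. The function name is hypothetical.

```python
def map_256k(addr):
    """Split a 486 byte address for the two-bank interleaved 256-Kbyte cache.

    Returns (bank, data_ram_addr, tag_index).
    """
    la = (addr >> 4) & 0x3FFF   # A17:4, 14 bits through the latches
    bank = (addr >> 2) & 1      # assumption: A2 selects bank 0 or bank 1
    a3 = (addr >> 3) & 1        # driven by TOGA2 (bank 0) or TOGA3 (bank 1)
    # Data RAM A14:1 from the latched address, A0 from the bank's TOGA line.
    data_ram_addr = (la << 1) | a3
    tag_index = la              # 14-bit index into the tag RAM
    return bank, data_ram_addr, tag_index


print(map_256k(0x00004))  # second word of a line lands in bank 1
print(map_256k(0x3FFFC))  # last 32-bit word of the cache
```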

### Generating Chip Selects $\overline{CS}_{3:0}$

The Contaq PCI chipset requires logic to combine the read/write signal ( $W/R$ ) and byte enables ( $\overline{BE}_{3:0}$ ) from the Intel 486 to form the chip select ( $\overline{CS}_{3:0}$ ) inputs to the cache data RAMs as shown in *Figure 1*. A write cycle ( $W/R=1$ ) selects which byte(s) are written based on the byte enables ( $\overline{BE}_{3:0}$ ). A read cycle ( $W/R=0$ ) selects all bytes for read independent of the byte enables.

**Figure 1. Chip Select Logic**

This logic is typically implemented in a PLD (e.g., P16L8) to minimize the loading on the read/write line from the processor.

For a 128-Kbyte cache, each chip select input will go to one data RAM (one load). For a 256-Kbyte cache, each chip select will go to one data RAM per bank (two loads).
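The chip select behavior described above can be modeled in a few lines. This is a behavioral sketch, not the PLD equations: it treats the byte enables and chip selects as 4-bit active-low values (a 0 bit means asserted), and the function name is illustrative.

```python
def chip_selects(w_r, be_n):
    """Return the 4-bit active-low chip selects for the data RAMs.

    w_r  -- 1 for a write cycle, 0 for a read cycle
    be_n -- 4-bit active-low byte enables from the 486 (0 bit = byte enabled)
    """
    if w_r not in (0, 1):
        raise ValueError("w_r must be 0 or 1")
    if w_r:
        # Write: select only the bytes whose byte enable is asserted.
        return be_n & 0xF
    # Read: select all four bytes regardless of the byte enables.
    return 0x0


print(format(chip_selects(1, 0b1110), "04b"))  # write to byte 0 only
print(format(chip_selects(0, 0b1110), "04b"))  # read selects all bytes
```

Per bit, this is the same as ANDing $W/R$ with each active-low byte enable.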

## Discrete vs. Modular Designs

The L2 cache design that results from the discussion so far is shown in *Figure 2*. The questions now are how much (if any) of the L2 cache will be included on the motherboard and how much (if any) of the logic will be on a module.

Cypress Semiconductor supports either discrete or module-based designs:

- A wide range of 486 L2 cache modules for most popular chipsets
- High-speed SRAMs for tag and data RAMs
- FCT logic for the address buffers
- Fast PLDs for the chip select logic

The decision of a discrete vs. module-based design is usually based on flexibility, board space, and cost.

### Flexibility

Implementing the L2 cache described in this paper as a module allows the customer to choose one of four configurations:

- No cache for lowest possible cost
- Low-cost 128-Kbyte cache
- Higher-performance 256-Kbyte cache
- Custom configuration (e.g., 512 Kbytes cache)

The modules under consideration for this application require a 112-position Burndy socket (part number CELP2X56SC3Z48). This socket is a high-quality, reliable socket that is a standard in the industry.

For contrast, a discrete implementation with the flexibility to support three of these configurations (no cache, 128 Kbytes, 256 Kbytes) would require sockets for the nine RAMs in the cache design. These sockets would tend to reduce the reliability of the design. The FCT latches and PLD would usually be soldered rather than socketed, which improves reliability at minimal cost.

In other words, a module-based design is much more flexible than an equivalent discrete design. Cache modules allow customers to tailor the cache to balance cost vs. performance tradeoffs to meet their requirements.

### Board Space

The amount of board space required by a module-based design depends on how much of the required logic is on the module and how much is on the motherboard.

The minimum space occurs when all of the logic is on the module and the motherboard only has a 112-position socket with normal clearance around the socket (usually 0.1 inch). The section on “Selecting an L2 Cache Module” shows that this will not be the case. The chip select logic (one PLD—P16L8) will also be on the motherboard.

A discrete implementation will have nine 28-pin RAMs, two 20-pin latches, and one 20-pin PLD. It may also have sockets for at least the nine RAMs.



**Figure 2. L2 Cache Design**

The amount of board space required for a discrete design is significantly larger than the amount of space required for a module connector and a PLD. Therefore, a cache module design minimizes the amount of board space required on the motherboard.

### Cost

The lowest-cost module option (no cache) requires one 112-pin socket and one 16L8 PLD. This should cost less than two 373 latches, one PLD, and nine 28-pin sockets.

A discrete 128-Kbyte cache will consist of two 373 latches, one PLD, one 8Kx8 RAM, four 32Kx8 RAMs and four 28-pin sockets. The 128-Kbyte cache module will be the same with a 112-pin socket plus a printed circuit board (substrate) minus the four 28-pin sockets. Module vendors will also add a profit margin to the cost of the module. As a result, a 128-Kbyte cache module will usually cost more than an equivalent discrete design.

For a 256-Kbyte cache, the cache module has the same components as the discrete design with the addition of a 112-pin connector, substrate, and vendor margin. The 256-Kbyte module usually will cost more than an equivalent discrete design.

## Selecting an L2 Cache Module

Cypress Semiconductor currently builds eight different 486-compatible L2 cache modules in a total of 17 configurations. The question is which module is closest to the cache design described in this paper for the Contaq PCI chipset. The criteria are:

- 128/256 Kbytes data RAM
- 8-bit tag RAM
- No dirty RAM
- Address latches gated by ALE as opposed to address buffers
- Bank write enables as opposed to write enables for each chip
- Four chip selects as opposed to bank selects

The winner is the CYM9246/CYM9247/CYM9248 family of cache modules! These modules are very close to the requirements outlined in this paper with the following design considerations:

- The chip select logic resides on the motherboard, instead of the module.
- The Contaq PCI chipset does not require a dirty RAM separate from the tag RAM.
- The TOGA<sub>3:2</sub> address outputs to the module will require a strap on the motherboard.
- The TAGOE input to the module should be grounded on the motherboard.
- The DIRTYCS and DIRTYWE module inputs should be connected to V<sub>CC</sub> on the motherboard.
- The signal naming conventions are different.

With regard to the dirty RAM, the customer has two choices:

- Tie the dirty RAM control signals inactive (V<sub>CC</sub>) on the motherboard and ignore the dirty RAM.
- Ask Cypress to ship the module without the dirty RAM at a reduced cost.



**Figure 3. Address Straps**

The TOGA<sub>3:2</sub> address outputs from the Contaq PCI chipset do not quite match the address inputs to the module and will require the strap logic shown in *Figure 3* on the motherboard.

Please refer to *Table 1* for a signal name cross reference between the Contaq PCI chipset and the CYM9246 cache module family.

**Table 1. Signal Name Cross Reference**

| Contaq PCI Chipset       | 924X Module Family                                             |
|--------------------------|----------------------------------------------------------------|
| TAGWT                    | $\overline{TAGWE}$                                             |
| TAGEN                    | $\overline{TAGCS}$                                             |
| $\overline{CWE}_{1:0}$   | WE<sub>1:0</sub>                                               |
| $\overline{CRD}_{1:0}$   | $\overline{OE}_{1:0}$                                          |
| CQ<sub>15:8</sub>        | TAG<sub>7:0</sub>                                              |
| TOGA<sub>2</sub>         | A<sub>2-0</sub> (128 KB only)<br>A<sub>3-0</sub> (256 KB only) |
| TOGA<sub>3</sub>         | A<sub>3-0</sub> (128 KB only)<br>A<sub>3-1</sub> (256 KB only) |
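The last two rows of Table 1 determine how the motherboard strap must be set for each module size. The lookup below restates those rows. The module pin names are written as plain strings taken from the table, the dictionary and function names are hypothetical, and only the two sizes listed in the table are covered.

```python
# TOGA routing per cache size, restating the last two rows of Table 1.
STRAP_TARGETS = {
    128: {"TOGA2": "A2-0", "TOGA3": "A3-0"},
    256: {"TOGA2": "A3-0", "TOGA3": "A3-1"},
}


def strap_targets(cache_kbytes):
    """Return the module address pin driven by each TOGA output."""
    try:
        return STRAP_TARGETS[cache_kbytes]
    except KeyError:
        raise ValueError("Table 1 lists only 128- and 256-Kbyte modules") from None


print(strap_targets(128))
print(strap_targets(256))
```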

## Summary

The CYM9246 family of L2 cache modules can be designed into an Intel 486 system based on the Contaq PCI chipset. By adding a 112-pin DIMM connector, a P16L8, and a two-position jumper strap to the motherboard design, the customer can offer:

- A lowest-possible-cost option with no cache
- A low-cost performance upgrade with a single bank 128-Kbyte cache module (CYM9246)
- A higher-performance upgrade with a two-way interleaved 256-Kbyte cache module (CYM9247)
- Upgrades to larger cache modules such as the CYM9248 (512-Kbyte)