Scratchpad, also known as Scratchpad RAM or Local Store in computer terminology, is a high-speed internal memory used for temporary storage of calculations, data, and other work in progress. In reference to a microprocessor ("CPU"), scratchpad refers to a special high-speed memory circuit used to hold small items of data for rapid retrieval.
It can be considered similar to an L1 cache in that it is the next-closest memory to the ALUs after the internal registers.
Scratchpads are employed to simplify caching logic and to guarantee that a unit can work without main-memory contention in a system employing multiple processors, especially in a multiprocessor system-on-chip (MPSoC) for embedded systems. They are best suited to storing temporary results (such as would be found in the CPU stack) that typically do not need to be committed to main memory; however, when fed by DMA, they can also be used in place of a cache for mirroring the state of slower main memory. The same issues of locality of reference apply to efficiency of use, although some systems allow strided DMA to access rectangular data sets. Another difference is that scratchpads are explicitly manipulated by applications.
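As a rough illustration of the explicit, application-managed transfers described above, the following C sketch models a strided DMA fetch of a rectangular tile in software. The names `scratchpad`, `SCRATCHPAD_BYTES`, `dma_get_strided`, and `sum_tile` are hypothetical, and an ordinary array with `memcpy` stands in for real on-chip memory and a real DMA engine, whose interfaces are hardware-specific.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scratchpad capacity; real sizes depend on the hardware. */
#define SCRATCHPAD_BYTES 4096

/* Stand-in for the on-chip scratchpad: here just an ordinary static array. */
static uint8_t scratchpad[SCRATCHPAD_BYTES];

/*
 * Software model of a strided DMA "get": copy a rectangle of `rows` rows,
 * each `row_bytes` wide, out of a larger buffer in main memory and pack the
 * rows contiguously into the scratchpad. `src_stride` is the distance in
 * bytes between the starts of consecutive rows in main memory.
 * Returns 0 on success, -1 if the tile does not fit in the scratchpad.
 */
static int dma_get_strided(const uint8_t *src, size_t src_stride,
                           size_t rows, size_t row_bytes)
{
    assert(row_bytes <= src_stride);

    /* Capacity check written as a division to avoid overflow. */
    if (rows != 0 && row_bytes > SCRATCHPAD_BYTES / rows)
        return -1;

    for (size_t r = 0; r < rows; ++r)
        memcpy(scratchpad + r * row_bytes, src + r * src_stride, row_bytes);
    return 0;
}

/* Example of work done purely on local data: sum the fetched tile. */
static uint32_t sum_tile(size_t rows, size_t row_bytes)
{
    uint32_t total = 0;
    for (size_t i = 0; i < rows * row_bytes; ++i)
        total += scratchpad[i];
    return total;
}
```

The application decides what is resident and when. Nothing is fetched or evicted implicitly, which is the main contrast with a cache.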
Scratchpads are not used in mainstream desktop processors, where legacy software must keep running from generation to generation even though the available on-chip memory size may change. They are suited to embedded systems, special-purpose processors and games consoles, where chips are often manufactured as MPSoC, and where software is often tuned to one hardware configuration.
Many architectures, such as PowerPC, attempt to avoid the need for cache-line locking or scratchpads through the use of cache control instructions. Marking an area of memory with "Data Cache Block Zero" (allocating a line but setting its contents to zero instead of loading from main memory) and discarding it after use with "Data Cache Block Invalidate" (signaling that main memory need not receive any updated data) can have the same benefits as a scratchpad while coexisting with the generality of a conventional cache. Generality is maintained in that these are hints, and the underlying hardware will function correctly regardless of actual cache size.
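The saving in memory traffic can be seen in a toy model. The C sketch below is not PowerPC code and does not issue any real cache instructions; it simulates a single write-back cache line and counts main-memory transfers. All names (`ToyLine`, `toy_store`, `toy_zero`, `toy_evict`, `toy_invalidate`) and the 32-byte line size are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 32 /* assumed line size, for this toy model only */

/* One write-back cache line plus counters for main-memory traffic. */
typedef struct {
    uint8_t data[LINE_BYTES];
    bool valid;
    bool dirty;
    unsigned mem_reads;  /* line fills from main memory */
    unsigned mem_writes; /* write-backs to main memory */
} ToyLine;

/* Ordinary store: on a miss, the line is first fetched from main memory. */
static void toy_store(ToyLine *l, const uint8_t *mem, unsigned off, uint8_t v)
{
    assert(off < LINE_BYTES);
    if (!l->valid) {
        memcpy(l->data, mem, LINE_BYTES);
        l->mem_reads++;
        l->valid = true;
    }
    l->data[off] = v;
    l->dirty = true;
}

/* Models "Data Cache Block Zero": allocate the line as zeros, no fetch. */
static void toy_zero(ToyLine *l)
{
    memset(l->data, 0, LINE_BYTES);
    l->valid = true;
    l->dirty = true;
}

/* Normal eviction: a dirty line is written back to main memory. */
static void toy_evict(ToyLine *l, uint8_t *mem)
{
    if (l->valid && l->dirty) {
        memcpy(mem, l->data, LINE_BYTES);
        l->mem_writes++;
    }
    l->valid = false;
    l->dirty = false;
}

/* Models "Data Cache Block Invalidate": drop the line, no write-back. */
static void toy_invalidate(ToyLine *l)
{
    l->valid = false;
    l->dirty = false;
}
```

In this model, a store followed by `toy_evict` costs one line fill and one write-back, whereas `toy_zero`, any number of stores, and then `toy_invalidate` cost neither, so the line behaves like temporary scratch space.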
Regarding interprocessor communication in a multicore setup, there are similarities between the Cell's inter-localstore DMA and a shared L2 cache setup as in the Core 2 Duo or the Xbox 360's custom PowerPC: the L2 cache allows processors to share results without those results having to be committed to main memory. This can be an advantage where the working set for an algorithm encompasses the entirety of the L2. However, when a program can be written to take advantage of inter-localstore DMA, the Cell has the benefit that each other Local Store serves as both the private workspace for a single processor and the point of sharing between processors; i.e., viewed from one processor, the other Local Stores are on a similar footing to the shared L2 in a conventional chip. The tradeoff is memory wasted in buffering and programming complexity for synchronization, though this would be similar to precached pages in a conventional chip. Domains where using this capability is effective include:
It would be possible for a conventional processor to gain similar advantages with cache-control instructions, e.g., allowing prefetching to L1 while bypassing L2, or an eviction hint that signals a transfer from L1 to L2 without committing to main memory. However, at present no systems offer this capability in a usable form, and such instructions would in effect mirror the explicit transfer of data among the cache areas used by each core.