SSE2, Streaming SIMD Extensions 2, is one of the IA-32 SIMD (Single Instruction, Multiple Data) instruction sets. SSE2 was first introduced by Intel with the initial version of the Pentium 4 in 2000. It extends the earlier SSE instruction set, and is intended to fully supplant MMX. Intel extended SSE2 to create SSE3 in 2004. SSE2 added 144 new instructions to SSE, which has 70 instructions. Rival chip-maker AMD added support for SSE2 with the introduction of their Opteron and Athlon 64 ranges of AMD64 64-bit CPUs in 2003.
SSE2 extends MMX instructions to operate on XMM registers, allowing the programmer to completely avoid the eight 64-bit MMX registers "aliased" onto the original IA-32 floating point register stack. This permits mixing integer SIMD and scalar floating point operations without the mode switching required between MMX and x87 floating point operations. However, this is overshadowed by the value of being able to perform MMX operations on the wider SSE registers.
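As a rough illustration of integer SIMD on XMM registers followed directly by scalar floating point, the sketch below uses the compiler intrinsics from `<emmintrin.h>`. The function name `add_lanes_then_scale` and its parameters are hypothetical, and the code assumes an x86 target with SSE2 enabled (the default on x86-64).

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Adds eight 16-bit lanes at once in a 128-bit XMM register, then does
   ordinary scalar floating-point work with no EMMS-style mode switch,
   because the MMX/x87 register stack is never touched. */
double add_lanes_then_scale(const int16_t a[8], const int16_t b[8],
                            int16_t out[8], double scale)
{
    __m128i va  = _mm_loadu_si128((const __m128i *)a);  /* unaligned load */
    __m128i vb  = _mm_loadu_si128((const __m128i *)b);
    __m128i sum = _mm_add_epi16(va, vb);                /* PADDW on XMM */
    _mm_storeu_si128((__m128i *)out, sum);

    /* Scalar floating point can follow immediately. */
    return out[0] * scale;
}
```

The same packed add on MMX registers would handle only four 16-bit lanes per instruction and would need an EMMS before any x87 code.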
Other SSE2 extensions include a set of cache-control instructions intended primarily to minimize cache pollution when processing indefinite streams of information, and a sophisticated complement of numeric format conversion instructions.
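The two kinds of instruction mentioned above can be sketched with intrinsics as follows. This is a minimal example, not a tuned implementation: `stream_fill` and `convert_roundtrip` are hypothetical names, the destination of `stream_fill` is assumed to be 16-byte aligned, and an x86 target with SSE2 enabled is assumed.

```c
#include <emmintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Fill a 16-byte-aligned buffer using non-temporal stores, which hint
   that the data should bypass the cache. n_vec counts 16-byte blocks. */
void stream_fill(int32_t *dst, size_t n_vec, int32_t value)
{
    __m128i v = _mm_set1_epi32(value);
    for (size_t i = 0; i < n_vec; ++i)
        _mm_stream_si128((__m128i *)dst + i, v);  /* MOVNTDQ */
    _mm_sfence();  /* order the streaming stores before later stores */
}

/* Convert two 32-bit integers to doubles, scale them by 1.5, and
   convert back to integers with truncation toward zero. */
void convert_roundtrip(const int32_t in[2], double mid[2], int32_t out[2])
{
    __m128i vi = _mm_loadl_epi64((const __m128i *)in);  /* low 64 bits */
    __m128d vd = _mm_cvtepi32_pd(vi);                   /* CVTDQ2PD */
    vd = _mm_mul_pd(vd, _mm_set1_pd(1.5));
    _mm_storeu_pd(mid, vd);
    __m128i back = _mm_cvttpd_epi32(vd);                /* CVTTPD2DQ */
    _mm_storel_epi64((__m128i *)out, back);
}
```

Streaming stores tend to pay off only for data that will not be read again soon; for small buffers ordinary stores are usually the better choice.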
AMD's implementation of SSE2 on the AMD64 (x86-64) platform includes an additional 8 registers, doubling the total number to 16 (XMM0 through XMM15). These additional registers are only visible when running in 64-bit mode. Intel adopted these additional registers as part of their support for x86-64 architecture (or in Intel's parlance, "Intel 64") in 2004.
The FPU (x87) instructions usually store intermediate results with 80 bits of precision. When legacy FPU software algorithms are ported to SSE2, certain combinations of math operations or input datasets can result in measurable numerical deviation. This is of critical importance to scientific computations when the results must be compared against results generated on a different machine architecture.
A notable problem occurs when a compiler must interpret a mathematical expression consisting of several operations (adding, subtracting, dividing, multiplying). Depending on the compiler (and optimizations) used, different intermediate results of a given mathematical expression may need to be temporarily saved, and later reloaded. This results in a truncation from 80 bits to 64 bits in the x87 FPU. Depending on when this truncation is executed, the final numerical result may end up different. The following Fortran code compiled with G95 is offered as an example.
      program hi
      real a,b,c,d
      real x,y,z
      a=.013
      b=.027
      c=.0937
      d=.79
      y=-a/b + (a/b+c)*EXP(d)
      print *,y
      z=(-a)/b + (a/b+c)*EXP(d)
      print *,z
      x=y-z
      print *,x
      end
Compiling to 387 floating point instructions and running yields:
# g95 -o hi -mfpmath=387 -fzero -ftrace=full -fsloppy-char hi.for
# ./hi
 0.78587145
 0.7858714
 5.9604645E-8
Compiling to SSE2 instructions and running yields:
# g95 -o hi -mfpmath=sse -msse2 -fzero -ftrace=full -fsloppy-char hi.for
# ./hi
 0.78587145
 0.78587145
 0.
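The effect of rounding at different points can be imitated portably in C. The sketch below is an approximation under stated assumptions, not a reproduction of the G95 run: `double` stands in for the 80-bit x87 format, the value of EXP(d) is passed in as the parameter `e` by the caller, and both function names are hypothetical.

```c
/* Every intermediate result is rounded to single precision, as when the
   expression is evaluated with scalar SSE instructions on float data. */
float eval_rounded_each_step(float a, float b, float c, float e)
{
    float q = a / b;      /* rounded to float */
    float s = q + c;      /* rounded to float */
    float p = s * e;      /* rounded to float */
    return p - q;
}

/* Intermediates stay in a wider format and are rounded once at the end.
   double stands in for the 80-bit x87 format here (an approximation). */
float eval_rounded_once(float a, float b, float c, float e)
{
    double q = (double)a / (double)b;
    double r = (q + (double)c) * (double)e - q;
    return (float)r;
}
```

Called with the values from the Fortran program, the two functions may or may not agree in the last bit. Any disagreement comes only from where the rounding happens, since the arithmetic is otherwise identical.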
SSE2 extends MMX instructions to operate on XMM registers. Therefore, it is possible to convert all existing MMX code to an SSE2 equivalent. Since an XMM register is twice as wide as an MMX register, loop counters and memory accesses may need to be changed to accommodate this.
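The loop adjustment described above might look like the following sketch, which performs a saturating byte add. The function name `add_saturate_u8` is hypothetical, and an x86 target with SSE2 enabled is assumed. Where an MMX loop would advance 8 bytes per iteration, the SSE2 loop advances 16 and leaves any remainder to a scalar tail.

```c
#include <emmintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Saturating add of two byte arrays, 16 bytes per SSE2 iteration. */
void add_saturate_u8(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                     size_t n)
{
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {          /* the step would be 8 with MMX */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(va, vb));
    }
    for (; i < n; ++i) {                    /* scalar tail for leftovers */
        unsigned s = (unsigned)a[i] + b[i];
        dst[i] = (uint8_t)(s > 255 ? 255 : s);
    }
}
```

The unaligned load and store intrinsics are used so the sketch works for any pointers; code that controls its own buffer alignment can use the aligned forms instead.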
Although one SSE2 instruction can operate on twice as much data as an MMX instruction, performance might not increase significantly. Two major reasons are: accessing SSE2 data in memory not aligned to a 16-byte boundary incurs a significant penalty, and the throughput of SSE2 instructions in most x86 implementations is usually lower than that of MMX instructions. Intel has recently addressed the first problem by adding an instruction in SSE3 to reduce the overhead of accessing unaligned data, and the last problem by widening the execution engine in their Core microarchitecture.
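The alignment requirement shows up in the intrinsics as two distinct load operations. The sketch below, with the hypothetical helpers `is_aligned16` and `sum4`, picks the aligned load only when the address permits it; an x86 target with SSE2 enabled is assumed.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Returns nonzero when p sits on a 16-byte boundary. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Sum four 32-bit lanes, choosing the aligned load when it is legal. */
int32_t sum4(const int32_t *p)
{
    __m128i v;
    if (is_aligned16(p))
        v = _mm_load_si128((const __m128i *)p);   /* MOVDQA: needs alignment */
    else
        v = _mm_loadu_si128((const __m128i *)p);  /* MOVDQU: any address */

    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, v);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

A per-call branch like this is only for illustration. In practice it is more common to allocate buffers on 16-byte boundaries up front so the aligned form can be used throughout.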
When first introduced in 2000, SSE2 was not supported by software development tools. For example, to use SSE2 in a Microsoft Developer Studio project, the programmer had to either manually write inline-assembly or import object-code from an external source. Later the Visual C++ Processor Pack added SSE2 support to Visual C++ and MASM.
The Intel C++ Compiler can automatically generate SSE4/SSSE3/SSE3/SSE2 and/or SSE code without the use of hand-coded assembly, letting programmers focus on algorithmic development instead of assembly-level implementation. Since its introduction, the Intel C++ Compiler has greatly increased adoption of SSE2 in Windows application development.
Since GCC 3, GCC can automatically generate SSE/SSE2 scalar code when the target supports those instructions. Automatic vectorization for SSE/SSE2 was added in GCC 4.
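A loop of the following shape is a typical candidate for automatic vectorization. The function name `scale_add` is hypothetical. Whether a given compiler actually emits packed SSE/SSE2 instructions for it depends on the compiler version and options; with GCC one plausible invocation is `gcc -O2 -ftree-vectorize -msse2`.

```c
#include <stddef.h>

/* A simple loop with independent iterations. `restrict` promises that
   the arrays do not overlap, which makes vectorization easier. */
void scale_add(float *restrict y, const float *restrict x, float k, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = y[i] + k * x[i];
}
```

No intrinsics or assembly appear in the source, so the same code still compiles for targets without SSE2.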
The Sun Studio Compiler Suite can also generate SSE2 instructions when the compiler flag -xvector=simd is used.
CPUs that implement SSE2 include AMD's K8-based processors (Athlon 64, Sempron, Turion 64), Intel's NetBurst-based processors (Pentium 4, Xeon, Celeron), the Intel Pentium M, Core, Core 2 and Atom, the Transmeta Efficeon, and the VIA C7 and VIA Nano.
SSE2 is an extension of the IA-32 architecture. Therefore any architecture that does not support IA-32 does not support SSE2. x86-64 CPUs all implement IA-32, by definition. All known x86-64 CPUs also implement SSE2. Since IA-32 predates SSE2, early IA-32 CPUs did not implement it. SSE2 and the other SIMD instruction sets were intended primarily to improve CPU support for realtime graphics, notably gaming. A CPU that is not marketed for this purpose or that has an alternative SIMD instruction set has no need for SSE2.
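Because support varies between IA-32 CPUs, software that must run on older processors can check for SSE2 at run time. The sketch below assumes GCC or Clang on an x86 target and uses the `__get_cpuid` helper from `<cpuid.h>`; the function name `cpu_has_sse2` is hypothetical. SSE2 is reported in bit 26 of EDX for CPUID leaf 1.

```c
#include <cpuid.h>   /* GCC/Clang helper for the CPUID instruction */

/* Returns 1 if the CPU reports SSE2, 0 otherwise. */
int cpu_has_sse2(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                 /* CPUID leaf 1 not available */
    return (int)((edx >> 26) & 1u);
}
```

On x86-64 the check is expected to succeed, since all known x86-64 CPUs implement SSE2; it is mainly useful for 32-bit builds.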
The following CPUs implemented IA-32 after SSE2 was developed, but did not implement SSE2: