Nvidia Geforce 6 Series Manual
Page 11
30.3 GPU Features
This section covers both fixed-function features and Shader Model 3.0 support (described in detail later) in GeForce 6 Series GPUs. As we describe the various pieces, we focus on the many new features that are meant to make applications shine (in terms of both visual quality and performance) on GeForce 6 Series GPUs.
30.3.1 Fixed-Function Features
Geometry Instancing
With Shader Model 3.0, the capability for sending multiple batches of geometry with one Direct3D call has been added,...
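As a rough illustration of what one instanced call buys, the Python sketch below models the expansion on the CPU: a per-vertex stream (the mesh) is replayed once per instance, while a per-instance stream (here, a 2D offset) advances once per copy. `draw_instanced` and its arguments are hypothetical. In Direct3D 9 the real mechanism is configured through stream frequency dividers (`IDirect3DDevice9::SetStreamSourceFreq`), which this model does not use.

```python
from typing import List, Tuple

Vec2 = Tuple[float, float]


def draw_instanced(mesh: List[Vec2], offsets: List[Vec2]) -> List[Vec2]:
    """Hypothetical model of one instanced draw call.

    The per-vertex stream (mesh) is replayed for every instance, while the
    per-instance stream (offsets) advances once per instance.
    """
    out: List[Vec2] = []
    for ox, oy in offsets:      # per-instance data: one entry per copy
        for vx, vy in mesh:     # per-vertex data: reused by every copy
            out.append((vx + ox, vy + oy))
    return out


triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
verts = draw_instanced(triangle, [(0.0, 0.0), (10.0, 0.0)])
print(len(verts))  # 6: two copies of a three-vertex mesh from one call
```

The point of the model is the loop structure: the application issues one call, and the repetition happens on the other side of the API boundary.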
Page 12
Z-Cull
NVIDIA GPUs since the GeForce3 have included a technology called z-cull, which allows hidden-surface removal at speeds much faster than conventional rendering. The GeForce 6 Series z-cull unit is the third generation of this technology, with increased efficiency for a wider range of cases. Also, in cases where stencil is not being updated, early stencil reject can be employed to remove rendering early when the stencil test (based on an equals comparison) fails.
Occlusion Query
Occlusion query is the ability...
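The internals of z-cull are not described here, but the general idea behind coarse hidden-surface removal can be sketched: keep a conservative depth bound per screen tile and reject a primitive's whole tile when it cannot pass the depth test anywhere in it. The Python sketch below assumes a LESS depth test (smaller z is closer), a per-tile farthest-depth bound, and a 2×2 tile; the function names are hypothetical, and this is a generic illustration rather than NVIDIA's actual z-cull algorithm.

```python
from typing import Sequence


def tile_max_depth(stored_depths: Sequence[float]) -> float:
    """Conservative summary of a screen tile: the farthest depth already stored in it."""
    return max(stored_depths)


def can_cull_tile(tile_max_z: float, prim_min_z: float) -> bool:
    """Coarse rejection for a LESS depth test (smaller z is closer).

    If the primitive's nearest depth within the tile is not closer than the
    farthest depth already stored there, every fragment in the tile would fail
    the per-pixel test, so the whole tile can be skipped before shading.
    """
    return prim_min_z >= tile_max_z


# A 2x2 tile already covered by a nearby wall.
wall = [0.30, 0.35, 0.32, 0.31]
print(can_cull_tile(tile_max_depth(wall), 0.60))  # True: hidden behind the wall
print(can_cull_tile(tile_max_depth(wall), 0.20))  # False: in front, must be shaded
```

The test is conservative by design: it may fail to reject a hidden tile, but it never rejects a tile that contains a visible fragment.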
Page 13
distance passes the test, it’s in light; if not, it’s in shadow. NVIDIA GPUs have dedicated transistors to perform four z-compares per pixel (on four neighboring z-values) per clock, and to perform bilinear filtering of the pass/fail data. This more advanced variation of percentage-closer filtering saves many shader instructions compared to GPUs that don’t have direct shadow buffer support.
High-Dynamic-Range Blending Using fp16 Surfaces, Texture Filtering, and Blending
GeForce 6 Series GPUs allow for...
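The four-compare-then-filter scheme described above can be written out in a few lines. This Python sketch assumes a shadow map stored as a list of rows, texel-space coordinates whose integer values fall on texel samples (the half-texel offset real samplers apply is ignored), and clamp-to-edge addressing; `shadow_pcf` is a hypothetical name. Note that it filters the pass/fail results, not the stored depths.

```python
import math
from typing import List


def shadow_pcf(shadow_map: List[List[float]], x: float, y: float,
               frag_depth: float) -> float:
    """Return the lit fraction in [0, 1] at texel-space position (x, y).

    Four depth compares on the neighboring texels, then bilinear filtering
    of the 0/1 pass/fail results.
    """
    h, w = len(shadow_map), len(shadow_map[0])
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    fx, fy = x - x0, y - y0

    def lit(ix: int, iy: int) -> float:
        # Clamp to the map edge, then compare: 1.0 means the fragment is
        # no farther from the light than the stored depth (in light).
        ix = min(max(ix, 0), w - 1)
        iy = min(max(iy, 0), h - 1)
        return 1.0 if frag_depth <= shadow_map[iy][ix] else 0.0

    top = lit(x0, y0) * (1 - fx) + lit(x0 + 1, y0) * fx
    bottom = lit(x0, y0 + 1) * (1 - fx) + lit(x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy


# Left column: occluder at depth 0.2. Right column: nothing (depth 1.0).
sm = [[0.2, 1.0],
      [0.2, 1.0]]
print(shadow_pcf(sm, 0.5, 0.0, 0.5))  # 0.5: halfway across the shadow edge
```

Filtering the comparison results is what produces soft edges; filtering the depths first and then comparing would still give a hard 0-or-1 answer.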
Page 14
●Dynamic flow control. Branching and looping are now part of the shader model. On the GeForce 6 Series vertex engine, branching and looping have a minimal overhead of just two cycles. Also, each vertex can take its own branches without being grouped in the way pixel shader branches are. As a result, when branches diverge, the GeForce 6 Series vertex processor still operates efficiently.
●Vertex texturing. Textures can now be fetched in a vertex program, although only nearest-neighbor filtering is supported in...
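With only nearest-neighbor filtering, a vertex program reading a texture sees a single texel's value. A typical use is displacement mapping, sketched below in Python under stated assumptions: normalized coordinates in [0, 1], clamping at the upper edge, and hypothetical helper names `fetch_nearest` and `displace`.

```python
from typing import List, Sequence, Tuple


def fetch_nearest(tex: List[List[float]], u: float, v: float) -> float:
    """Nearest-neighbor (point) fetch with normalized coordinates u, v in [0, 1]."""
    h, w = len(tex), len(tex[0])
    ix = min(int(u * w), w - 1)  # clamp so that u = 1.0 maps to the last texel
    iy = min(int(v * h), h - 1)
    return tex[iy][ix]


def displace(position: Sequence[float], normal: Sequence[float],
             uv: Tuple[float, float], height_map: List[List[float]],
             scale: float = 1.0) -> Tuple[float, ...]:
    """Move a vertex along its normal by the height read from the texture."""
    height = fetch_nearest(height_map, uv[0], uv[1]) * scale
    return tuple(p + n * height for p, n in zip(position, normal))


heights = [[0.0, 1.0],
           [2.0, 3.0]]
print(displace((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.75, 0.25), heights))  # (0.0, 0.0, 1.0)
```

Because there is no filtering, neighboring vertices that fall in the same texel get the same height, so a coarse height map produces visible steps unless the shader blends several fetches itself.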
Page 15
separate textures. So, for example, the surface normal and the diffuse and specular material properties could be written to textures, and the textures could all be used in subsequent passes when lighting the scene with multiple lights. This is illustrated in Figure 30-8.
●Dynamic flow control (branching). Shader Model 3.0 supports conditional branching and looping, allowing for more flexible shader programs.
●Indexing of attributes. With Shader Model 3.0, an index register can be used to select which...
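The multiple-render-target workflow described above (write several surface attributes in one pass, then light from them in later passes) can be modeled in toy form. In this Python sketch, dictionaries stand in for render targets, the lighting model is Blinn-Phong with a fixed viewer along +z, and all names are hypothetical; it shows the data flow, not a GPU implementation.

```python
import math
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]
VIEW_DIR: Vec3 = (0.0, 0.0, 1.0)  # assumption: fixed viewer along +z


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def normalize(v: Vec3) -> Vec3:
    length = math.sqrt(dot(v, v))
    return (v[0] / length, v[1] / length, v[2] / length)


def geometry_pass(surfaces: List[Tuple[Vec3, float, float]]) -> Dict[str, list]:
    """One pass that writes three 'render targets' at once: normal, diffuse, specular."""
    targets: Dict[str, list] = {"normal": [], "diffuse": [], "specular": []}
    for normal, diffuse, specular in surfaces:
        targets["normal"].append(normalize(normal))
        targets["diffuse"].append(diffuse)
        targets["specular"].append(specular)
    return targets


def lighting_pass(targets: Dict[str, list], light_dir: Vec3,
                  shininess: float = 16.0) -> List[float]:
    """One light's Blinn-Phong contribution per pixel, read from the stored targets."""
    l = normalize(light_dir)
    h = normalize((l[0] + VIEW_DIR[0], l[1] + VIEW_DIR[1], l[2] + VIEW_DIR[2]))
    result = []
    for n, kd, ks in zip(targets["normal"], targets["diffuse"], targets["specular"]):
        n_dot_l = max(dot(n, l), 0.0)
        spec = ks * max(dot(n, h), 0.0) ** shininess if n_dot_l > 0.0 else 0.0
        result.append(kd * n_dot_l + spec)
    return result


gbuffer = geometry_pass([((0.0, 0.0, 1.0), 0.5, 0.25)])
lights = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
# Each light is a separate pass over the same targets; contributions add up.
per_light = [lighting_pass(gbuffer, d) for d in lights]
total = [sum(values) for values in zip(*per_light)]
print(total)  # [0.75]
```

The geometry is processed once; adding a light costs one more pass over the stored targets rather than another pass over the scene.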
Page 16
●3:1 and 2:2 coissue. Each four-component-wide vector unit is capable of executing two independent instructions in parallel, as shown in Figure 30-9: either one three-wide operation on RGB and a separate operation on alpha, or one two-wide operation on red-green and a separate two-wide operation on blue-alpha. This gives the compiler more opportunity to pack scalar computations into vectors, thereby doing more work in a shorter time.
●Dual issue. Dual issue is similar to coissue, except that the...
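The 3:1 and 2:2 pairing rule amounts to checking whether two write masks fit the two halves of one split. A minimal Python sketch, assuming swizzle-style mask strings and that a narrower write still fits in its half; the function name is hypothetical, and a real compiler would also have to confirm that the two instructions are independent, which this check does not do.

```python
from typing import List, Set, Tuple

# The two ways the text says a four-wide unit can be split between two instructions.
SPLITS: List[Tuple[Set[str], Set[str]]] = [
    ({"r", "g", "b"}, {"a"}),   # 3:1 -> RGB operation + alpha operation
    ({"r", "g"}, {"b", "a"}),   # 2:2 -> red-green operation + blue-alpha operation
]


def can_coissue(mask_a: str, mask_b: str) -> bool:
    """Check whether two write masks (e.g. "rgb", "a") fit one 3:1 or 2:2 split.

    Assumption: an instruction writing fewer components than its half still fits.
    Data dependencies between the two instructions are not checked here.
    """
    a, b = set(mask_a), set(mask_b)
    if not a or not b:
        return False
    return any((a <= x and b <= y) or (b <= x and a <= y) for x, y in SPLITS)


print(can_coissue("rgb", "a"))   # True  (3:1)
print(can_coissue("rg", "ba"))   # True  (2:2)
print(can_coissue("rgba", "a"))  # False (first instruction needs the whole unit)
```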
Page 17
Fragment Processor Performance
The GeForce 6 Series fragment processor architecture has the following performance characteristics:
●Each pipeline is capable of performing a four-wide, coissue-able multiply-add (MAD) or four-term dot product (DP4), plus a four-wide, coissue-able and dual-issuable multiply instruction per clock in series, as shown in Figure 30-11. In addition, a multifunction unit that performs complex operations can replace the alpha channel MAD operation. Operations are performed at...
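Counting each multiply and each add as one floating-point operation, the per-clock capabilities above give a simple upper bound on arithmetic throughput. The sketch assumes the MAD path (8 operations, versus 7 for a DP4) plus the serial four-wide multiply; the pipeline count and clock rate passed in are illustrative inputs rather than product specifications, and the multifunction unit and texture units are ignored.

```python
def flops_per_clock_per_pipeline() -> int:
    """Count one multiply or one add as one operation (an accounting convention)."""
    mad = 4 * 2  # four-wide multiply-add: 4 multiplies + 4 adds
    mul = 4      # four-wide multiply issued in series in the same clock
    return mad + mul


def peak_gflops(pipelines: int, clock_mhz: float) -> float:
    """Upper bound only; ignores the multifunction unit, texturing, and stalls."""
    return pipelines * flops_per_clock_per_pipeline() * clock_mhz / 1000.0


print(flops_per_clock_per_pipeline())  # 12
print(peak_gflops(16, 400.0))          # 76.8 (illustrative inputs, not a product spec)
```

Real shaders rarely keep both units busy every clock, so measured throughput sits well below this ceiling.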
Page 18
Table 30-2. Overhead Incurred When Executing Flow-Control Operations in Fragment Programs
Instruction      Cost (Cycles)
If/endif         4
If/else/endif    6
Call             2
Ret              2
Loop/endloop     4
Furthermore, branching in the fragment processor is affected by the level of divergence of the branches. Because the fragment processor operates on hundreds of pixels per instruction, if a branch is taken by some fragments and not others, all fragments execute both branches, but write only to the registers on the branches each...
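Combining Table 30-2 with the divergence rule gives a rough cost model for one if/else/endif. This Python sketch treats a group of fragments as a single unit and takes the branch-body cycle counts as hypothetical inputs; it ignores other pipeline effects and is not a documented NVIDIA formula.

```python
from typing import List

# Flow-control overhead from Table 30-2, in cycles.
FLOW_CONTROL_CYCLES = {
    "if/endif": 4,
    "if/else/endif": 6,
    "call": 2,
    "ret": 2,
    "loop/endloop": 4,
}


def if_else_cycles(takes_then: List[bool], then_cycles: int, else_cycles: int) -> int:
    """Approximate cost of one if/else/endif for a group of fragments.

    Coherent groups run only one side; a divergent group runs both sides,
    with each fragment keeping only the results of the side it took.
    """
    cycles = FLOW_CONTROL_CYCLES["if/else/endif"]
    if any(takes_then):
        cycles += then_cycles
    if not all(takes_then):
        cycles += else_cycles
    return cycles


print(if_else_cycles([True, True, True, True], 10, 20))      # 16: all take "then"
print(if_else_cycles([False, False, False, False], 10, 20))  # 26: all take "else"
print(if_else_cycles([True, False, True, False], 10, 20))    # 36: divergent, both sides
```

The divergent case costs more than either coherent case, which is why branches that vary smoothly across the screen are cheaper in practice than branches that flip from pixel to pixel.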
Page 19
30.4 Performance
Table 30-3. Data Storage Formats Supported by GeForce 6 Series GPUs
Format      Description of Data in Memory                               Vertex Texture Support  Fragment Texture Support  Render Target Support
B8          One 8-bit fixed-point number                                ✗                       ✓                         ✓
A1R5G5B5    A 1-bit value and three 5-bit unsigned fixed-point numbers  ✗                       ✓                         ✓
A4R4G4B4    Four 4-bit unsigned fixed-point numbers                     ✗                       ✓                         ✗
R5G6B5      5-bit, 6-bit, and 5-bit fixed-point numbers                 ✗                       ✓                         ✓
A8R8G8B8    Four 8-bit fixed-point numbers                              ✗                       ✓                         ✓
DXT1        Compressed 4×4 pixels into 8 bytes                          ✗                       ✓                         ✗
DXT2,3,4,5  Compressed...
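The format descriptions in Table 30-3 fix each format's storage cost, which makes texture memory budgets easy to estimate. This Python sketch covers only the fully listed rows; `level_bytes` is a hypothetical helper, and rounding partial DXT1 blocks up to whole 4×4 blocks is an assumption about how edge pixels are stored.

```python
# Bits per pixel implied by the format descriptions in Table 30-3.
BITS_PER_PIXEL = {
    "B8": 8,
    "A1R5G5B5": 16,
    "A4R4G4B4": 16,
    "R5G6B5": 16,
    "A8R8G8B8": 32,
}

DXT1_BYTES_PER_BLOCK = 8  # one 4x4 block of pixels compressed into 8 bytes


def level_bytes(fmt: str, width: int, height: int) -> int:
    """Storage for a single mip level (no padding or alignment is modeled)."""
    if fmt == "DXT1":
        blocks = ((width + 3) // 4) * ((height + 3) // 4)  # partial blocks round up
        return blocks * DXT1_BYTES_PER_BLOCK
    return width * height * BITS_PER_PIXEL[fmt] // 8


print(level_bytes("A8R8G8B8", 256, 256))  # 262144
print(level_bytes("DXT1", 256, 256))      # 32768, one-eighth of A8R8G8B8
```

At 8 bytes per 16 pixels, DXT1 averages 4 bits per pixel, so it is an 8:1 saving over A8R8G8B8 and 4:1 over the 16-bit formats.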
Page 20
30.5 Achieving Optimal Performance
While graphics hardware is becoming more and more programmable, there are still some tricks to ensuring that you exploit the hardware fully to get the most performance. This section lists some common techniques that you may find helpful. A more detailed discussion of performance advice is available in the NVIDIA GPU Programming Guide, which is freely available in several languages from the NVIDIA Developer Web site...