Nvidia Geforce 6 Series Manual
Alternatively, conditional writes (that is, write if a condition code is set) can be used when branching is not performance-effective. In practice, the compiler will use the method that delivers higher performance when possible.

30.5.4 Use fp16 Intermediate Values Wherever Possible

Because GeForce 6 Series GPUs support a full-speed fp16 normalize instruction in parallel with the multiplies and adds, and because fp16 intermediate values reduce internal storage and datapath requirements, using fp16 intermediate values wherever possible can be a performance win, saving fp32 intermediate values for cases where the precision is needed.

Excessive internal storage requirements can adversely affect performance in the following way: The shader pipeline is optimized to keep hundreds of fragments in flight given a fixed amount of register space per fragment (four fp32×4 registers or eight fp16×4 registers). If the register space is exceeded, then fewer fragments can remain in flight, reducing the latency tolerance for texture fetches, and adversely affecting performance.

The GeForce 6 Series fragment processor will have the maximum number of fragments in flight when shader programs use up to four fp32×4 temporary registers (or eight fp16×4 registers). That is, at any one time, a maximum of four temporary fp32×4 (or eight fp16×4) registers are in use. This decision was based on the fact that for the overwhelming majority of analyzed shaders, four or fewer simultaneously active fp32×4 registers proved to be the sweet spot during the shaders' execution. In addition, the architecture is designed so that performance degrades slowly if more registers are used.

Similarly, the register file has enough read and write bandwidth to keep all the units busy if reading fp16×4 values, but it may run out of bandwidth to feed all units if using fp32×4 values exclusively.
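The register budget described above can be checked with a little arithmetic. The Python sketch below is a toy model, not a description of the hardware: the byte sizes follow directly from the fp32 and fp16 formats, but the function names and the rule that the fragment count falls in inverse proportion to register use beyond the budget are assumptions made purely for illustration. It models only how many fragments fit, not the resulting performance, which the text says degrades slowly.

```python
# Toy model of the per-fragment register budget (illustrative only).
FP32_BYTES = 4   # size of one fp32 component
FP16_BYTES = 2   # size of one fp16 component
COMPONENTS = 4   # each register holds a 4-component vector

# Budget stated in the text: four fp32x4 registers per fragment.
BUDGET_BYTES = 4 * COMPONENTS * FP32_BYTES  # 64 bytes


def register_bytes(fp32_regs: int, fp16_regs: int) -> int:
    """Bytes of temporary register space that one fragment needs."""
    return COMPONENTS * (fp32_regs * FP32_BYTES + fp16_regs * FP16_BYTES)


def relative_fragments_in_flight(fp32_regs: int, fp16_regs: int) -> float:
    """Fraction of the maximum fragment count that still fits.

    ASSUMPTION (hypothetical model): a fixed register pool is shared by
    all fragments, so the count shrinks in inverse proportion once a
    shader exceeds the budget.
    """
    used = register_bytes(fp32_regs, fp16_regs)
    if used <= BUDGET_BYTES:
        return 1.0
    return BUDGET_BYTES / used
```

Under this model, four fp32×4 registers and eight fp16×4 registers occupy the same 64 bytes, which is why the text treats the two limits as equivalent. A shader holding eight fp32×4 temporaries would fit half as many fragments, while the same eight temporaries stored as fp16×4 stay within the budget.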
NVIDIA's compiler technology is smart enough to substantially reduce the impact of these register storage and bandwidth limits, but fp16 intermediate values are never slower than fp32 values; because of the resource restrictions and the fp16 normalize hardware, they can often be much faster.

30.6 Conclusion

GeForce 6 Series GPUs provide the GPU programmer with unparalleled flexibility and performance in a product line that spans the entire PC market. After reading this chapter, you should have a better understanding of what GeForce 6 Series GPUs are capable of, and you should be able to use this knowledge to develop applications, either graphical or general purpose, in a more efficient way.

Excerpted from GPU Gems 2. Copyright 2005 by NVIDIA Corporation.
Copyright © NVIDIA Corporation 2004

GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation

880 full-color pages, 330 figures. Hard cover, $59.99. Available at GDC 2005 (March 7, 2005). Experts from universities and industry.

Graphics Programming: Geometric Complexity; Shading, Lighting, and Shadows; High-Quality Rendering.

GPGPU Programming: General Purpose Computation on GPUs: A Primer; Image-Oriented Computing; Simulation and Numerical Algorithms.

Sign up for e-mail notification when the book is available at: http://developer.nvidia.com/object/gpu_gems_2_notification.html

For more information, please visit: http://developer.nvidia.com/object/gpu_gems_2_home.html