GPGPU

Friday, 26 October 2012

General-purpose computing on graphics processing units (GPGPU, GPGP or less often GP²U) is the use of a graphics processing unit (GPU), which typically handles computation only for computer graphics, to perform computation in applications traditionally handled by the central processing unit (CPU).^[1]^[2]^[3] Any GPU providing a functionally complete set of operations performed on arbitrary bits can compute any computable value. Additionally, the use of multiple graphics cards in one computer, or large numbers of graphics chips, further parallelizes the already parallel nature of graphics processing.^[4]

Programmability

In principle, any Boolean function can be built up from a functionally complete set of logic operators. An early example of general-purpose computing with a GPU involved performing additions by using an early stream processor called a blitter to invoke a special sequence of logical operations on bit vectors.^[5] Such methods are seldom used today, as modern GPUs include support for more advanced mathematical operations including addition, multiplication, and often certain transcendental functions.
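
As a rough illustration of the idea, the following C sketch adds two integers using only AND, XOR and shift operations, the kind of bitwise primitives that can be applied to bit vectors. The function name `add_with_logic_ops` is invented for this example, and the sketch is not taken from the cited article.

```c
#include <stdint.h>

/* Add two 32-bit values using only AND, XOR and shift.
   Illustrative only: the name and the 32-bit width are assumptions. */
uint32_t add_with_logic_ops(uint32_t a, uint32_t b)
{
    while (b != 0) {
        uint32_t carry = (a & b) << 1; /* bit positions that produce a carry */
        a ^= b;                        /* per-bit sum, ignoring carries */
        b = carry;                     /* add the carries in the next round */
    }
    return a;                          /* wraps modulo 2^32, like ordinary unsigned addition */
}
```

Each round of the loop corresponds to one sequence of logical operations over whole bit vectors; the loop ends once no carries remain.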

The programmability of the pipelines has evolved in step with Microsoft’s DirectX specification: DirectX 8 introduced Shader Model 1.1; DirectX 8.1 introduced Pixel Shader Models 1.2, 1.3 and 1.4; and DirectX 9 defined Shader Model 2.x and 3.0. Each shader model increased the flexibility and capabilities of the programming model, and conforming hardware had to follow suit. The DirectX 10 specification introduced Shader Model 4.0, which unifies the programming specification for vertex, geometry (“Geometry Shaders” are new to DirectX 10) and fragment processing. This is a better fit for unified shader hardware, which provides a single computational pool of programmable resources.

Data types

Pre-DirectX 9 graphics cards only supported paletted or integer color types. Various formats are available, each containing a red element, a green element, and a blue element. Sometimes an additional alpha value is added, to be used for transparency. Common formats are:

8 bits per pixel – Sometimes palette mode, where each value is an index in a table with the real color value specified in one of the other formats. Sometimes three bits for red, three bits for green, and two bits for blue.
16 bits per pixel – Usually allocated as five bits for red, six bits for green, and five bits for blue.
24 bits per pixel – Eight bits for each of red, green, and blue.
32 bits per pixel – Eight bits for each of red, green, blue, and alpha.
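
The bit allocations above can be sketched in C as simple packing functions. The function names are hypothetical, and the channel order within each word (red in the high bits for 16-bit pixels, alpha in the high byte for 32-bit pixels) is an assumption made for the example; real formats vary.

```c
#include <stdint.h>

/* Pack 8-bit channels into a 16-bit 5-6-5 pixel by dropping the low bits.
   Assumed layout: red in bits 15-11, green in bits 10-5, blue in bits 4-0. */
uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Pack 8-bit channels into a 32-bit pixel.
   Assumed layout: alpha, red, green, blue from high byte to low byte. */
uint32_t pack_argb8888(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return ((uint32_t)a << 24) | ((uint32_t)r << 16) |
           ((uint32_t)g << 8)  |  (uint32_t)b;
}
```

The 16-bit packing discards the low three or two bits of each channel, which is one reason integer color formats are a poor fit for numerical work.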

For early fixed-function or limited programmability graphics (i.e. up to and including DirectX 8.1-compliant GPUs) this was sufficient because this is also the representation used in displays. This representation does have certain limitations, however. Given sufficient graphics processing power even graphics programmers would like to use better formats, such as floating point data formats, to obtain effects such as high dynamic range imaging. Many GPGPU applications require floating point accuracy, which came with graphics cards conforming to the DirectX 9 specification.

DirectX 9 Shader Model 2.x suggested the support of two precision types: full and partial precision. Full precision support could either be FP32 or FP24 (floating point 32- or 24-bit per component) or greater, while partial precision was FP16. ATI’s R300 series of GPUs supported FP24 precision only in the programmable fragment pipeline (although FP32 was supported in the vertex processors) while Nvidia’s NV30 series supported both FP16 and FP32; other vendors such as S3 Graphics and XGI supported a mixture of formats up to FP24.

Shader Model 3.0 altered the specification, increasing full precision requirements to a minimum of FP32 support in the fragment pipeline. ATI’s Shader Model 3.0 compliant R5xx generation (Radeon X1000 series) supports just FP32 throughout the pipeline, while Nvidia’s NV4x and G7x series continued to support both FP32 full precision and FP16 partial precision. Although not stipulated by Shader Model 3.0, both ATI’s and Nvidia’s Shader Model 3.0 GPUs introduced support for blendable FP16 render targets, which made High Dynamic Range Rendering easier to support.

The implementations of floating point on Nvidia GPUs are mostly IEEE compliant; however, this is not true across all vendors.^[6] This has implications for correctness which are considered important to some scientific applications. While 64-bit floating point values (double precision float) are commonly available on CPUs, these are not universally supported on GPUs; some GPU architectures sacrifice IEEE compliance while others lack double-precision altogether. There have been efforts to emulate double-precision floating point values on GPUs; however, the speed tradeoff negates any benefit to offloading the computation onto the GPU in the first place.^[7]

Most operations on the GPU operate in a vectorized fashion: one operation can be performed on up to four values at once. For instance, if one color <R1, G1, B1> is to be modulated by another color <R2, G2, B2>, the GPU can produce the resulting color <R1*R2, G1*G2, B1*B2> in one operation. This functionality is useful in graphics because almost every basic data type is a vector (either 2-, 3-, or 4-dimensional). Examples include vertices, colors, normal vectors, and texture coordinates. Many other applications can put this to good use, and because of their higher performance, vector instructions (SIMD) have long been available on CPUs.
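
The color modulation described above amounts to a component-wise multiply. A minimal C sketch follows; the `vec4` type and the `vec4_modulate` function are names chosen for this example, and the fourth (alpha) component is included only to show the four-wide case.

```c
/* A four-component vector, as used for colors (r, g, b, a). */
typedef struct {
    float r, g, b, a;
} vec4;

/* Component-wise product of two colors. On a GPU this would be a single
   vector operation; here it is written out one component at a time. */
vec4 vec4_modulate(vec4 x, vec4 y)
{
    vec4 result;
    result.r = x.r * y.r;
    result.g = x.g * y.g;
    result.b = x.b * y.b;
    result.a = x.a * y.a;
    return result;
}
```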

In 2002, James Fung et al. developed OpenVIDIA at the University of Toronto and demonstrated this work, which was later published in 2003, 2004, and 2005,^[8] in conjunction with a collaboration between the University of Toronto and Nvidia. In November 2006 Nvidia launched CUDA, an SDK and API that allows the C programming language to be used to code algorithms for execution on GeForce 8 series GPUs. OpenCL, an open standard defined by the Khronos Group,^[9] provides a cross-platform GPGPU platform that additionally supports data-parallel compute on CPUs. OpenCL is actively supported on Intel, AMD, Nvidia and ARM platforms. Compared with traditional floating-point accelerators such as the 64-bit CSX700 boards from ClearSpeed that are used in today's supercomputers, current top-end GPUs from AMD and Nvidia emphasize single-precision (32-bit) computation; double-precision (64-bit) computation executes more slowly.

GPGPU programming concepts

GPUs are designed specifically for graphics and thus are very restrictive in operations and programming. Because of their nature, GPUs are only effective for problems that can be solved using stream processing and the hardware can only be used in certain ways.

Stream processing

Main article: Stream processing

GPUs can only process independent vertices and fragments, but can process many of them in parallel. This is especially effective when the programmer wants to process many vertices or fragments in the same way. In this sense, GPUs are stream processors – processors that can operate in parallel by running one kernel on many records in a stream at once.

A stream is simply a set of records that require similar computation. Streams provide data parallelism. Kernels are the functions that are applied to each element in the stream. In GPUs, vertices and fragments are the elements in streams, and vertex and fragment shaders are the kernels to be run on them. Since GPUs process elements independently, there is no way to have shared or static data. For each element we can only read from the input, perform operations on it, and write to the output. It is permissible to have multiple inputs and multiple outputs, but never a piece of memory that is both readable and writable.

Arithmetic intensity is defined as the number of operations performed per word of memory transferred. It is important for GPGPU applications to have high arithmetic intensity; otherwise, memory access latency will limit the computational speedup.^[10]

Ideal GPGPU applications have large data sets, high parallelism, and minimal dependency between data elements.

GPU programming concepts

Computational resources

There are a variety of computational resources available on the GPU:

Programmable processors – Vertex, primitive, and fragment pipelines allow the programmer to run kernels on streams of data
Rasterizer – creates fragments and interpolates per-vertex constants such as texture coordinates and color
Texture Unit – read only memory interface
Framebuffer – write only memory interface

In fact, the programmer can substitute a write-only texture for the framebuffer as the output. This is accomplished through Render to Texture (RTT), Render-To-Backbuffer-Copy-To-Texture (RTBCTT), or the more recent stream-out.

Textures as stream

The most common form for a stream to take in GPGPU is a 2D grid because this fits naturally with the rendering model built into GPUs. Many computations naturally map into grids: matrix algebra, image processing, physically based simulation, and so on.

Since textures are used as memory, texture lookups are then used as memory reads. Certain operations can be done automatically by the GPU because of this.

Kernels

Kernels can be thought of as the body of loops. For example, if the programmer were operating on a grid on the CPU they might have code that looked like this:

// Hard per-element computation, assumed to be defined elsewhere.
float do_some_hard_work(float value);

// Input and output grids have 10000 x 10000 or 100 million elements.
void transform_10k_by_10k_grid(float in[10000][10000], float out[10000][10000])
{
  for (int x = 0; x < 10000; x++)
  {
    for (int y = 0; y < 10000; y++)
    {
      // The next line is executed 100 million times
      out[x][y] = do_some_hard_work(in[x][y]);
    }
  }
}

On the GPU, the programmer only specifies the body of the loop as the kernel and what data to loop over by invoking geometry processing.
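
This separation can be imitated on the CPU by passing the loop body as a function to a driver that owns the loops. The sketch below is only an analogy: `kernel_fn`, `square_plus_one` and `run_kernel_over_grid` are hypothetical names, and the driver stands in for the work that the GPU and its driver would perform when geometry is submitted.

```c
#include <stddef.h>

/* A kernel sees one input element and produces one output element. */
typedef float (*kernel_fn)(float);

/* Example kernel: the programmer writes only this loop body. */
float square_plus_one(float value)
{
    return value * value + 1.0f;
}

/* Stand-in for the hardware: applies the kernel to every cell of a
   width x height grid stored in row-major order. */
void run_kernel_over_grid(kernel_fn kernel, const float *in, float *out,
                          size_t width, size_t height)
{
    for (size_t i = 0; i < width * height; i++) {
        out[i] = kernel(in[i]); /* iterations are independent of one another */
    }
}
```

Because no iteration depends on another, the order in which the driver visits the cells does not matter, which is what allows a GPU to process many of them at once.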

Flow control

In sequential code it is possible to control the flow of the program using if-then-else statements and various forms of loops. Such flow control structures have only recently been added to GPUs.^[11] Conditional writes could be accomplished using a properly crafted series of arithmetic/bit operations, but looping and conditional branching were not possible.

Recent GPUs allow branching, but usually with a performance penalty. Branching should generally be avoided in inner loops, whether in CPU or GPU code, and various methods, such as static branch resolution, pre-computation, predication, loop splitting,^[12] and Z-cull^[13] can be used to achieve branching when hardware support does not exist.
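
Predication, one of the techniques named above, can be sketched in C as follows. Both candidate values are already available and a bit mask chooses between them, so the selection itself needs no branch. The function name is hypothetical and the example assumes 32-bit integer data.

```c
#include <stdint.h>

/* Return if_true when condition is nonzero, otherwise if_false,
   using only arithmetic and bitwise operations. */
uint32_t select_without_branch(int condition, uint32_t if_true, uint32_t if_false)
{
    /* 0 - 1 wraps to all ones; 0 - 0 stays all zeros. */
    uint32_t mask = (uint32_t)0 - (uint32_t)(condition != 0);
    return (if_true & mask) | (if_false & ~mask);
}
```

The cost of this approach is that both alternatives must be computed for every element, so it pays off mainly when each side is cheap.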

GPU methods

Map

The map operation simply applies the given function (the kernel) to every element in the stream. A simple example is multiplying each value in the stream by a constant (increasing the brightness of an image). The map operation is simple to implement on the GPU. The programmer generates a fragment for each pixel on screen and applies a fragment program to each one. The result stream of the same size is stored in the output buffer.
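
A CPU-side sketch of the brightness example, assuming pixel values are stored as floats in a flat array; `map_scale` is a name invented here, and the loop plays the role of the per-pixel fragment program invocations.

```c
#include <stddef.h>

/* Map: multiply every element of the input stream by a constant factor
   and write the result to an output stream of the same size. */
void map_scale(const float *in, float *out, size_t count, float factor)
{
    for (size_t i = 0; i < count; i++) {
        out[i] = in[i] * factor;
    }
}
```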

Reduce

Some computations require calculating a smaller stream (possibly a stream of only 1 element) from a larger stream. This is called a reduction of the stream. Generally a reduction can be accomplished in multiple steps. The results from the prior step are used as the input for the current step and the range over which the operation is applied is reduced until only one stream element remains.
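
The multi-step scheme can be sketched in C with a sum as the reduction operator. Each pass roughly halves the stream by adding the upper half onto the lower half. The function names are hypothetical, and the sketch works in place for brevity; a GPU version would normally write each pass to a separate output buffer, since a stream cannot be both read and written.

```c
#include <stddef.h>

/* One reduction pass: add element i + half onto element i.
   Returns the length of the reduced stream. */
size_t reduce_pass_sum(float *data, size_t count)
{
    size_t half = (count + 1) / 2;          /* odd lengths keep the middle element */
    for (size_t i = 0; i < count / 2; i++) {
        data[i] = data[i] + data[i + half]; /* pairs are independent of one another */
    }
    return half;
}

/* Repeat passes until a single element remains. */
float reduce_sum(float *data, size_t count)
{
    if (count == 0) {
        return 0.0f;
    }
    while (count > 1) {
        count = reduce_pass_sum(data, count);
    }
    return data[0];
}
```

For a stream of n elements this takes about log2(n) passes, each of which is itself a parallel operation over the remaining elements.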

Stream filtering

Stream filtering is essentially a non-uniform reduction. Filtering involves removing items from the stream based on some criteria.
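
The effect of a filter can be shown with a short sequential C sketch that keeps only the positive values of a stream. The criterion and the name `filter_positive` are chosen purely for illustration. The sketch shows what the operation computes, not how a GPU would compute it: a parallel version would first have to work out each surviving element's output position.

```c
#include <stddef.h>

/* Copy the elements of in that are greater than zero into out,
   preserving their order. Returns the number of elements kept. */
size_t filter_positive(const float *in, float *out, size_t count)
{
    size_t kept = 0;
    for (size_t i = 0; i < count; i++) {
        if (in[i] > 0.0f) {
            out[kept] = in[i];
            kept++;
        }
    }
    return kept;
}
```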

Scatter

The scatter operation is most naturally defined on the vertex processor. The vertex processor is able to adjust the position of the vertex, which allows the programmer to control where information is deposited on the grid. Other extensions are also possible, such as controlling how large an area the vertex affects.

The fragment processor cannot perform a direct scatter operation because the location of each fragment on the grid is fixed at the time of the fragment's creation and cannot be altered by the programmer. However, a logical scatter operation may sometimes be recast or implemented with an additional gather step. A scatter implementation would first emit both an output value and an output address. An immediately following gather operation uses address comparisons to see whether the output value maps to the current output slot.
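
The scatter-as-gather idea can be sketched in C as follows. Each output slot, standing in for one fragment, scans the emitted (value, address) pairs and keeps a value whose address matches its own position. The function name and the flat-array layout are assumptions made for the example.

```c
#include <stddef.h>

/* Logical scatter of values[i] to out[addresses[i]], implemented as a
   gather: every output slot compares its own index against each address.
   Slots that no address refers to are left unchanged; if several pairs
   target the same slot, the last one wins. */
void scatter_via_gather(const float *values, const size_t *addresses,
                        size_t count, float *out, size_t out_count)
{
    for (size_t slot = 0; slot < out_count; slot++) {   /* one "fragment" per slot */
        for (size_t i = 0; i < count; i++) {
            if (addresses[i] == slot) {
                out[slot] = values[i];
            }
        }
    }
}
```

This brute-force form does work proportional to the number of slots times the number of pairs, which illustrates why a scatter recast this way can be expensive.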

Gather

The fragment processor is able to read textures in a random access fashion, so it can gather information from any grid cell, or multiple grid cells, as desired.
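
As an illustration, the C sketch below computes the output for one grid cell by gathering its four direct neighbors and averaging them. The function name, the row-major layout, and the clamp-to-edge behavior at the borders are assumptions made for this example.

```c
#include <stddef.h>

/* Average of the left, right, upper and lower neighbors of cell (x, y)
   in a width x height grid stored in row-major order. Reads outside the
   grid are clamped to the nearest edge cell. */
float gather_cross_average(const float *grid, size_t width, size_t height,
                           size_t x, size_t y)
{
    size_t left  = (x > 0) ? x - 1 : 0;
    size_t right = (x + 1 < width) ? x + 1 : width - 1;
    size_t up    = (y > 0) ? y - 1 : 0;
    size_t down  = (y + 1 < height) ? y + 1 : height - 1;

    return (grid[y * width + left] + grid[y * width + right] +
            grid[up * width + x]   + grid[down * width + x]) * 0.25f;
}
```

Each output cell only reads from the input grid, so all cells can be computed independently.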

Sort

The sort operation transforms an unordered set of elements into an ordered set of elements. The most common implementation on GPUs uses sorting networks.^[13]
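
A sorting network applies a fixed sequence of compare-and-exchange steps that does not depend on the data. The C sketch below uses odd-even transposition sort, chosen here because it is one of the simplest such networks; it is not necessarily the network used by the implementations the cited survey covers. The function names are hypothetical.

```c
#include <stddef.h>

/* Put the pair (*a, *b) into ascending order. */
void compare_exchange(float *a, float *b)
{
    if (*a > *b) {
        float tmp = *a;
        *a = *b;
        *b = tmp;
    }
}

/* Odd-even transposition sort: count passes, alternating between pairs
   starting at even and odd positions. The pairs within one pass do not
   overlap, so they could all be processed in parallel. */
void odd_even_transposition_sort(float *data, size_t count)
{
    for (size_t pass = 0; pass < count; pass++) {
        for (size_t i = pass % 2; i + 1 < count; i += 2) {
            compare_exchange(&data[i], &data[i + 1]);
        }
    }
}
```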

Search

The search operation allows the programmer to find a particular element within the stream, or possibly find neighbors of a specified element. The GPU is not used to speed up the search for an individual element, but instead is used to run multiple searches in parallel.
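
The pattern of many independent searches can be sketched in C with a batch of binary searches over one sorted array. The function names are hypothetical, and the outer loop stands in for the parallel execution a GPU would provide, with one search per stream element.

```c
#include <stddef.h>

/* Binary search in an ascending array. Returns the index of key,
   or count if key is not present. */
size_t find_in_sorted(const float *sorted, size_t count, float key)
{
    size_t lo = 0;
    size_t hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (sorted[mid] < key) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return (lo < count && sorted[lo] == key) ? lo : count;
}

/* Run one independent search per key; each result depends only on its own key. */
void search_many(const float *sorted, size_t count,
                 const float *keys, size_t *results, size_t key_count)
{
    for (size_t q = 0; q < key_count; q++) {
        results[q] = find_in_sorted(sorted, count, keys[q]);
    }
}
```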

Data structures

A variety of data structures can be represented on the GPU:

Dense arrays
Sparse arrays – static or dynamic
Adaptive structures

Applications

The following are some of the areas where GPUs have been used for general purpose computing:

MATLAB acceleration using the Parallel Computing Toolbox and MATLAB Distributed Computing Server,^[14] as well as third-party packages like Jacket.
k-nearest neighbor algorithm^[15]
Computer clusters or other variations of parallel computing (using GPU cluster technology) for highly calculation-intensive tasks:
- High-performance computing clusters (HPC clusters) (often termed supercomputers)
  - including cluster technologies like Message Passing Interface, and single-system image (SSI), distributed computing, and Beowulf
- Grid computing (a form of distributed computing) (networking many heterogeneous computers to create a virtual computer architecture)
- Load-balancing clusters (sometimes termed a server farm)
Physically based simulation and physics engines (usually based on Newtonian physics models)
- Conway's Game of Life, cloth simulation, incompressible fluid flow by solution of Navier-Stokes equations
Statistical physics
- Ising model
Lattice gauge theory
Segmentation – 2D and 3D
Level-set methods
CT reconstruction
Fast Fourier transform
Tone mapping
Audio signal processing
- Audio and Sound Effects Processing, to use a GPU for DSP (digital signal processing)
- Analog signal processing
- Speech processing
Digital image processing
Video Processing^[16]
- Hardware accelerated video decoding and post-processing
  - Motion compensation (mo comp)
  - Inverse discrete cosine transform (iDCT)
  - Variable-length decoding (VLD)
  - Inverse quantization (IQ)
  - In-loop deblocking
  - Bitstream processing (CAVLC/CABAC) using special purpose hardware for this task because this is a serial task not suitable for regular GPGPU computation
  - Deinterlacing
    - Spatial-temporal de-interlacing
  - Noise reduction
  - Edge enhancement
  - Color correction
- Hardware accelerated video encoding and pre-processing
Global illumination – ray tracing, photon mapping, radiosity among others, subsurface scattering
Geometric computing – constructive solid geometry, distance fields, collision detection, transparency computation, shadow generation
Scientific computing
- Monte Carlo simulation of light propagation^[17]
- Weather forecasting
- Climate research
- Molecular modeling on GPU
- Quantum mechanical physics
- Astrophysics^[18]
Bioinformatics^[19]^[20]
Computational finance
Medical imaging
Computer vision
Digital signal processing / signal processing
Control engineering
Neural networks
Database operations^[21]
Lattice Boltzmann methods
Cryptography and cryptanalysis
- Implementation of MD6
- Implementation of AES^[22]^[23]
- Implementation of DES
- Implementation of RSA^[24]
- Implementation of ECC
- Password cracking^[25]^[26]
Electronic Design Automation^[27]^[28]^[29]
Antivirus software^[30]^[31]
Intrusion Detection^[32]^[33]
The Bitcoin peer-to-peer currency relies on a distributed computing network that performs SHA-256 calculations, for which GPGPUs have become the dominant mode of calculation

References

^ Fung et al., "Mediated Reality Using Computer Graphics Hardware for Computer Vision", Proceedings of the International Symposium on Wearable Computing 2002 (ISWC2002), Seattle, Washington, USA, 7-10 Oct 2002, pp. 83--89.
^ An EyeTap video-based featureless projective motion estimation assisted by gyroscopic tracking for wearable computer mediated reality, ACM Personal and Ubiquitous Computing published by Springer Verlag, Vol.7, Iss. 3, 2003.
^ "Computer Vision Signal Processing on Graphics Processing Units", Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2004): Montreal, Quebec, Canada, 17–21 May 2004, pp. V-93 - V-96
^ "Using Multiple Graphics Cards as a General Purpose Parallel Computer: Applications to Computer Vision", Proceedings of the 17th International Conference on Pattern Recognition (ICPR2004), Cambridge, United Kingdom, 23–26 August 2004, volume 1, pages 805-808.
^ Hull, Gerald (December 1987). "LIFE". Amazing Computing 2 (12): 81–84. http://www.archive.org/stream/amazing-computing-magazine-1987-12/Amazing_Computing_Vol_02_12_1987_Dec#page/n81/mode/2up.
^ Mapping computational concepts to GPUs: Mark Harris. Mapping computational concepts to GPUs. In ACM SIGGRAPH 2005 Courses (Los Angeles, California, 31 July – 4 August 2005). J. Fujii, Ed. SIGGRAPH '05. ACM Press, New York, NY, 50.
^ Double precision on GPUs (Proceedings of ASIM 2005): Dominik Goddeke, Robert Strzodka, and Stefan Turek. Accelerating Double Precision (FEM) Simulations with (GPUs). Proceedings of ASIM 2005 – 18th Symposium on Simulation Technique, 2005.
^ James Fung, Steve Mann, Chris Aimone, "OpenVIDIA: Parallel GPU Computer Vision", Proceedings of the ACM Multimedia 2005, Singapore, 6-11 Nov. 2005, pages 849-852
^ OpenCL at the Khronos Group
^ Asanovic, K., Bodik, R., Demmel, J., Keaveny, T., Keutzer, K., Kubiatowicz, J., Morgan, N., Patterson, D., Sen, K., Wawrzynek, J., Wessel, D., Yelick, K.: A view of the parallel computing landscape. Commun. ACM 52(10) (2009) 56–67
^ GPU Gems - Chapter 34, GPU Flow-Control Idioms
^ Future Chips. "Tutorial on removing branches", 2011
^ GPGPU survey paper: John D. Owens, David Luebke, Naga Govindaraju, Mark Harris, Jens Krüger, Aaron E. Lefohn, and Tim Purcell. "A Survey of General-Purpose Computation on Graphics Hardware". Computer Graphics Forum, volume 26, number 1, 2007, pp. 80-113.
^ "MATLAB Adds GPGPU Support". 20 September 2010. http://www.hpcwire.com/features/MATLAB-Adds-GPGPU-Support-103307084.html.
^ Fast k nearest neighbor search using GPU. In Proceedings of the CVPR Workshop on Computer Vision on GPU, Anchorage, Alaska, USA, June 2008. V. Garcia and E. Debreuve and M. Barlaud.
^ Wilson, Ron (3 September 2009). "DSP brings you a high-definition moon walk". EDN. http://www.edn.com/article/CA6685974.html. Retrieved 3 September 2009. "Lowry is reportedly using Nvidia Tesla GPUs (graphics-processing units) programmed in the company's CUDA (Compute Unified Device Architecture) to implement the algorithms. Nvidia claims that the GPUs are approximately two orders of magnitude faster than CPU computations, reducing the processing time to less than one minute per frame."
^ E. Alerstam, T. Svensson & S. Andersson-Engels, "Parallel computing with graphics processing units for high speed Monte Carlo simulation of photon migration", J. Biomedical Optics 13, 060504 (2008)
^ http://www.astro.lu.se/compugpu2010/
^ Schatz, M.C., Trapnell, C., Delcher, A.L., Varshney, A. (2007) High-throughput sequence alignment using Graphics Processing Units. BMC Bioinformatics 8:474.
^ Svetlin A. Manavski, Giorgio Valle (2008). "CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment". BMC Bioinformatics 9 (Suppl. 2): S10. http://www.biomedcentral.com/1471-2105/9/S2/S10.
^ GPU-based Sorting in PostgreSQL Naju Mancheril, School of Computer Science - Carnegie Mellon University
^ AES on SM3.0 compliant GPUs. Owen Harrison, John Waldron, AES Encryption Implementation and Analysis on Commodity Graphics Processing Units. In proceedings of CHES 2007.
^ AES and modes of operations on SM4.0 compliant GPUs. Owen Harrison, John Waldron, Practical Symmetric Key Cryptography on Modern Graphics Hardware. In proceedings of USENIX Security 2008.
^ RSA on SM4.0 compliant GPUs. Owen Harrison, John Waldron, Efficient Acceleration of Asymmetric Cryptography on Graphics Hardware. In proceedings of AfricaCrypt 2009.
^ "Teraflop Troubles: The Power of Graphics Processing Units May Threaten the World's Password Security System". Georgia Tech Research Institute. http://www.gtri.gatech.edu/casestudy/Teraflop-Troubles-Power-Graphics-Processing-Units-GPUs-Password-Security-System. Retrieved 7 November 2010.
^ "Want to deter hackers? Make your password longer". MSNBC. 19 August 2010. http://www.msnbc.msn.com/id/38771772/. Retrieved 7 November 2010.
^ Lerner, Larry (9 April 2009). "Viewpoint: Mass GPUs, not CPUs for EDA simulations". EE Times. http://www.eetimes.com/news/design/showArticle.jhtml?articleID=216500149. Retrieved 3 May 2009
^ "GPU-Accelerated Time-Domain Circuit Simulation Paper at CICC". Signal Integrity. Agilent Technologies, Inc.. 2 September 2009. http://signal-integrity.tm.agilent.com/2009/gpu-accelerated-time-domain-circuit-simulation-paper-at-cicc/?utm_source=wikipedia-gpgpu&utm_medium=wiki&utm_campaign=2009-09-gpu. Retrieved 3 September 2009.
^ "W2500 ADS Transient Convolution GT". http://www.home.agilent.com/agilent/redirector.jspx?action=ref&cname=PRODUCT&ckey=1582838&cc=US&lc=eng&cmpid=29280. "accelerates signal integrity simulations on workstations that have NVIDIA Compute Unified Device Architecture (CUDA)-based Graphics Processing Units (GPU)"
^ GrAVity: A Massively Parallel Antivirus Engine. Giorgos Vasiliadis and Sotiris Ioannidis, GrAVity: A Massively Parallel Antivirus Engine. In proceedings of RAID 2010.
^ "Kaspersky Lab utilizes NVIDIA technologies to enhance protection". 14 December 2009. http://www.kaspersky.com/news?id=207575979. "During internal testing, the Tesla S1070 demonstrated a 360-fold increase in the speed of the similarity-defining algorithm when compared to the popular Intel Core 2 Duo central processor running at a clock speed of 2.6 GHz."
^ Gnort: High Performance Network Intrusion Detection Using Graphics Processors. Giorgos Vasiliadis et al, Gnort: High Performance Network Intrusion Detection Using Graphics Processors. In proceedings of RAID 2008.
^ Regular Expression Matching on Graphics Hardware for Intrusion Detection. Giorgos Vasiliadis et al, Regular Expression Matching on Graphics Hardware for Intrusion Detection. In proceedings of RAID 2009.
^ Open HMPP

External links

openhmpp.org - New Open Standard for Many-Core
OCLTools Open Source OpenCL Compiler and Linker
GPGPU.org - General-Purpose Computation Using Graphics Hardware
GPGPU Wiki
SIGGRAPH 2005 GPGPU Course Notes
IEEE VIS 2005 GPGPU Course Notes
NVIDIA Developer Zone
AMD GPU Tools
CPU vs. GPGPU
What is GPU Computing?
Tech Report article: "ATI stakes claims on physics, GPGPU ground" by Scott Wasson
GPU accelerated Monte Carlo simulation of the 2D and 3D Ising model - porting a standard model to GPU hardware
GPGPU Computing @ Duke Statistical Science
GPGPU Programming in F# using the Microsoft Research Accelerator system.
GPGPU Review, Tobias Preis, European Physical Journal Special Topics 194, 87-119 (2011)

This page contains text from Wikipedia, the Free Encyclopedia - http://en.wikipedia.org/wiki/GPGPU

This article is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License, which means that you can copy and modify it as long as the entire work (including additions) remains under this license.
