accelerate-cuda-0.12.0.0: Accelerate backend for NVIDIA GPUs

Portability: non-portable (GHC extensions)
Stability: experimental
Maintainer: Trevor L. McDonell <tmcdonell@cse.unsw.edu.au>
Safe Haskell: None

Data.Array.Accelerate.CUDA

Description

This module implements the CUDA backend for the embedded array language Accelerate. Expressions are translated into CUDA code at runtime, then compiled and executed in parallel on the GPU.

The accelerate-cuda library is hosted at https://github.com/tmcdonell/accelerate-cuda. Comments, bug reports, and patches are always welcome.

Documentation

class (Typeable (ArrRepr a), Typeable (ArrRepr' a), Typeable a) => Arrays a

Instances

Arrays () 
(Arrays b, Arrays a) => Arrays (b, a) 
(Shape sh, Elt e) => Arrays (Array sh e) 
(Arrays c, Arrays b, Arrays a) => Arrays (c, b, a) 
(Arrays d, Arrays c, Arrays b, Arrays a) => Arrays (d, c, b, a) 
(Arrays e, Arrays d, Arrays c, Arrays b, Arrays a) => Arrays (e, d, c, b, a) 
(Arrays f, Arrays e, Arrays d, Arrays c, Arrays b, Arrays a) => Arrays (f, e, d, c, b, a) 
(Arrays g, Arrays f, Arrays e, Arrays d, Arrays c, Arrays b, Arrays a) => Arrays (g, f, e, d, c, b, a) 
(Arrays h, Arrays g, Arrays f, Arrays e, Arrays d, Arrays c, Arrays b, Arrays a) => Arrays (h, g, f, e, d, c, b, a) 
(Arrays i, Arrays h, Arrays g, Arrays f, Arrays e, Arrays d, Arrays c, Arrays b, Arrays a) => Arrays (i, h, g, f, e, d, c, b, a) 

Synchronous execution

run :: Arrays a => Acc a -> a

Compile and run a complete embedded array program using the CUDA backend. This will select the fastest device available on which to execute computations, based on compute capability and estimated maximum GFLOPS.
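As a minimal sketch of how run might be used, the following computes a dot product on the GPU. The names dotp, xs, and ys are illustrative only, and use, fold, zipWith, and fromList are assumed to be provided by Data.Array.Accelerate in the matching accelerate release.

```haskell
{-# LANGUAGE TypeOperators #-}
import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate (Acc, Vector, Scalar, Z(..), (:.)(..))
import Data.Array.Accelerate.CUDA (run)

-- Illustrative embedded program: multiply element-wise, then sum.
dotp :: Acc (Vector Float) -> Acc (Vector Float) -> Acc (Scalar Float)
dotp xs ys = A.fold (+) 0 (A.zipWith (*) xs ys)

main :: IO ()
main = do
  let xs = A.fromList (Z :. 4) [1, 2, 3, 4] :: Vector Float
      ys = A.fromList (Z :. 4) [2, 2, 2, 2] :: Vector Float
  -- 'use' embeds the host arrays into the Acc program;
  -- 'run' compiles the program and executes it on the selected device.
  print (run (dotp (A.use xs) (A.use ys)))
```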

run1 :: (Arrays a, Arrays b) => (Acc a -> Acc b) -> a -> b

Prepare and execute an embedded array program of one argument.

This function can be used to improve performance in cases where the array program is constant between invocations, because it allows us to bypass all front-end conversion stages and move directly to the execution phase. Use it whenever the same computation is applied repeatedly to different input data.

See the Crystal demo, part of the 'accelerate-examples' package, for an example.
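The pattern described above can be sketched as follows: partially apply run1 to the array function once, then reuse the resulting Haskell function for each input. The name step and the input data are hypothetical, and map, fromList, and toList are assumed to come from Data.Array.Accelerate.

```haskell
{-# LANGUAGE TypeOperators #-}
import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate (Vector, Z(..), (:.)(..))
import Data.Array.Accelerate.CUDA (run1)

-- Bind the partially applied 'run1' once, so that the prepared program
-- can be shared by every later call to 'step'.
step :: Vector Float -> Vector Float
step = run1 (A.map (\x -> 2 * x + 1))

main :: IO ()
main = do
  let inputs = [ A.fromList (Z :. 4) [k, k + 1, k + 2, k + 3]
               | k <- [0, 10, 20] ] :: [Vector Float]
  -- Each call supplies new data to the same array program.
  mapM_ (print . A.toList . step) inputs
```

Writing run1 f x afresh at every call site would likely lose this sharing, which is why the sketch names the partial application.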

stream :: (Arrays a, Arrays b) => (Acc a -> Acc b) -> [a] -> [b]

Stream a lazily read list of input arrays through the given program, collecting results as we go.
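A small sketch of stream, assuming a hypothetical list of three input vectors; each output in the result list corresponds to the input at the same position.

```haskell
{-# LANGUAGE TypeOperators #-}
import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate (Vector, Z(..), (:.)(..))
import Data.Array.Accelerate.CUDA (stream)

main :: IO ()
main = do
  let inputs  = [ A.fromList (Z :. 3) [k, k + 1, k + 2]
                | k <- [0, 10, 20] ] :: [Vector Float]
      -- The same array program is applied to every element of the list.
      outputs = stream (A.map (* 2)) inputs
  mapM_ (print . A.toList) outputs
```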

runIn :: Arrays a => Context -> Acc a -> a

As run, but execute using the specified device context rather than using the default, automatically selected device.

Contexts passed to this function may all refer to the same device, or to separate devices of differing compute capabilities.

Note that each thread has a stack of current contexts, and calling create pushes the new context on top of the stack and makes it current for the calling thread. You should call pop to make the context floating before passing it to runIn, which will make it current for the duration of evaluating the expression. See the CUDA C Programming Guide (G.1) for more information.
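The create-then-pop sequence might look like the sketch below. It assumes that Context is the type from the cuda package, and that initialise, device, create, pop, and destroy are the functions exported by Foreign.CUDA.Driver; device ordinal 0 is an arbitrary choice for illustration.

```haskell
{-# LANGUAGE TypeOperators #-}
import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate (Vector, Z(..), (:.)(..))
import Data.Array.Accelerate.CUDA (runIn)
import qualified Foreign.CUDA.Driver as CUDA  -- assumed source of Context

main :: IO ()
main = do
  CUDA.initialise []
  dev <- CUDA.device 0          -- assumption: use the first device
  ctx <- CUDA.create dev []     -- pushed onto this thread's context stack
  _   <- CUDA.pop               -- make the context floating again
  let xs = A.fromList (Z :. 4) [1, 2, 3, 4] :: Vector Float
  -- 'print' forces the result, so evaluation finishes before 'destroy'.
  print (A.toList (runIn ctx (A.map (+ 1) (A.use xs))))
  CUDA.destroy ctx
```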

run1In :: (Arrays a, Arrays b) => Context -> (Acc a -> Acc b) -> a -> b

As run1, but execute in the specified context.

streamIn :: (Arrays a, Arrays b) => Context -> (Acc a -> Acc b) -> [a] -> [b]

As stream, but execute in the specified context.

Asynchronous execution

data Async a

wait :: Async a -> IO a

Block the calling thread until the computation completes, then return the result.

poll :: Async a -> IO (Maybe a)

Test whether the asynchronous computation has already completed. If so, return the result, else Nothing.

cancel :: Async a -> IO ()

Cancel a running asynchronous computation.

runAsync :: Arrays a => Acc a -> Async a

As run, but allow the computation to continue running in a separate thread and return immediately without waiting for the result. The computation can then be managed using wait, poll, and cancel.

Note that a CUDA Context can only be active on one host thread at a time. If you want to execute multiple computations in parallel, use runAsyncIn.
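A minimal sketch of launching a computation with runAsync and then checking on it with poll and wait. Because runAsync has a pure type, the sketch assumes the computation is launched when the Async value is first forced, and uses evaluate from Control.Exception to make that point explicit; the input data is hypothetical.

```haskell
{-# LANGUAGE TypeOperators #-}
import Control.Exception (evaluate)
import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate (Vector, Z(..), (:.)(..))
import Data.Array.Accelerate.CUDA (runAsync, poll, wait)

main :: IO ()
main = do
  let xs = A.fromList (Z :. 4) [1, 2, 3, 4] :: Vector Float
  -- Assumption: forcing the Async value is what starts the computation.
  job <- evaluate (runAsync (A.map (+ 1) (A.use xs)))
  -- Non-blocking check: Nothing means the computation is still running.
  status <- poll job
  putStrLn (maybe "still running" (const "already finished") status)
  -- Blocking: returns once the result is available.
  result <- wait job
  print (A.toList result)
```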

run1Async :: (Arrays a, Arrays b) => (Acc a -> Acc b) -> a -> Async b

As run1, but the computation is executed asynchronously.

runAsyncIn :: Arrays a => Context -> Acc a -> Async a

As runIn, but execute asynchronously. Be sure not to destroy the context, or attempt to attach it to a different host thread, before all outstanding operations have completed.
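The following sketch runs two computations in separate contexts and destroys each context only after its result has been collected. It assumes Context and the functions initialise, device, create, pop, and destroy come from Foreign.CUDA.Driver in the cuda package; floatingContext is a hypothetical helper, and both contexts are created on device 0 purely for illustration.

```haskell
{-# LANGUAGE TypeOperators #-}
import Control.Exception (evaluate)
import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate (Vector, Z(..), (:.)(..))
import Data.Array.Accelerate.CUDA (runAsyncIn, wait)
import qualified Foreign.CUDA.Driver as CUDA  -- assumed source of Context

-- Hypothetical helper: create a context and pop it so that it is floating.
floatingContext :: CUDA.Device -> IO CUDA.Context
floatingContext dev = do
  ctx <- CUDA.create dev []
  _   <- CUDA.pop
  return ctx

main :: IO ()
main = do
  CUDA.initialise []
  dev  <- CUDA.device 0
  ctx1 <- floatingContext dev
  ctx2 <- floatingContext dev
  let xs = A.fromList (Z :. 4) [1, 2, 3, 4] :: Vector Float
  -- Force both Async values first so that neither waits on the other.
  a <- evaluate (runAsyncIn ctx1 (A.map (+ 1) (A.use xs)))
  b <- evaluate (runAsyncIn ctx2 (A.map (* 2) (A.use xs)))
  ra <- wait a
  rb <- wait b
  print (A.toList ra)
  print (A.toList rb)
  -- All outstanding operations have completed, so cleanup is now safe.
  CUDA.destroy ctx1
  CUDA.destroy ctx2
```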

run1AsyncIn :: (Arrays a, Arrays b) => Context -> (Acc a -> Acc b) -> a -> Async b

As run1In, but execute asynchronously.