
ChunkedArray< N, T > Class Template Reference (abstract)

Interface and base class for chunked arrays.

#include <vigra/multi_array_chunked.hxx>

Inheritance diagram for ChunkedArray< N, T >:
ChunkedArrayCompressed< N, T, Alloc > ChunkedArrayFull< N, T, Alloc > ChunkedArrayHDF5< N, T, Alloc > ChunkedArrayLazy< N, T, Alloc > ChunkedArrayTmpFile< N, T >

Public Member Functions

std::string backend () const
 Return the class that implements this ChunkedArray.
 
iterator begin ()
 Create a scan-order iterator for the entire chunked array.
 
const_iterator begin () const
 Create a read-only scan-order iterator for the entire chunked array.
 
template<unsigned int M>
MultiArrayView< N-1, T, ChunkedArrayTag > bind (difference_type_1 index) const
 Create a lower dimensional view to the chunked array.
 
MultiArrayView< N-1, T, ChunkedArrayTag > bindAt (MultiArrayIndex dim, MultiArrayIndex index) const
 Create a lower dimensional view to the chunked array.
 
MultiArrayView< N-1, T, ChunkedArrayTag > bindInner (difference_type_1 index) const
 Create a lower dimensional view to the chunked array.
 
template<int M, class Index >
MultiArrayView< N-M, T, ChunkedArrayTag > bindInner (const TinyVector< Index, M > &d) const
 Create a lower dimensional view to the chunked array.
 
MultiArrayView< N-1, T, ChunkedArrayTag > bindOuter (difference_type_1 index) const
 Create a lower dimensional view to the chunked array.
 
template<int M, class Index >
MultiArrayView< N-M, T, ChunkedArrayTag > bindOuter (const TinyVector< Index, M > &d) const
 Create a lower dimensional view to the chunked array.
 
std::size_t cacheMaxSize () const
 Get the number of chunks the cache will hold.
 
int cacheSize () const
 Number of chunks currently fitting into the cache.
 
const_iterator cbegin () const
 Create a read-only scan-order iterator for the entire chunked array.
 
const_iterator cend () const
 Create the end iterator for read-only scan-order iteration over the entire chunked array.
 
template<class U , class Stride >
void checkoutSubarray (shape_type const &start, MultiArrayView< N, U, Stride > &subarray) const
 Copy an ROI of the chunked array into an ordinary MultiArrayView.
 
chunk_iterator chunk_begin (shape_type const &start, shape_type const &stop)
 Create an iterator over all chunks intersected by the given ROI.
 
chunk_const_iterator chunk_begin (shape_type const &start, shape_type const &stop) const
 Create a read-only iterator over all chunks intersected by the given ROI.
 
chunk_const_iterator chunk_cbegin (shape_type const &start, shape_type const &stop) const
 Create a read-only iterator over all chunks intersected by the given ROI.
 
chunk_const_iterator chunk_cend (shape_type const &start, shape_type const &stop) const
 Create the end iterator for read-only iteration over all chunks intersected by the given ROI.
 
chunk_iterator chunk_end (shape_type const &start, shape_type const &stop)
 Create the end iterator for iteration over all chunks intersected by the given ROI.
 
chunk_const_iterator chunk_end (shape_type const &start, shape_type const &stop) const
 Create the end iterator for read-only iteration over all chunks intersected by the given ROI.
 
virtual shape_type chunkArrayShape () const
 Number of chunks along each coordinate direction.
 
shape_type chunkShape (shape_type const &chunk_index) const
 Find the shape of the chunk indexed by 'chunk_index'.
 
shape_type const & chunkShape () const
 Return the global chunk shape.
 
shape_type chunkStart (shape_type const &global_start) const
 Find the chunk that contains array element 'global_start'.
 
shape_type chunkStop (shape_type global_stop) const
 Find the chunk that is beyond array element 'global_stop'.
 
template<class U , class Stride >
void commitSubarray (shape_type const &start, MultiArrayView< N, U, Stride > const &subarray)
 Copy an ordinary MultiArrayView into an ROI of the chunked array.
 
const_view_type const_subarray (shape_type const &start, shape_type const &stop) const
 Create a read-only view to the specified ROI.
 
std::size_t dataBytes () const
 Bytes of main memory occupied by the array's data.
 
std::size_t dataBytesPerChunk () const
 Number of data bytes in an uncompressed chunk.
 
iterator end ()
 Create the end iterator for scan-order iteration over the entire chunked array.
 
const_iterator end () const
 Create the end iterator for read-only scan-order iteration over the entire chunked array.
 
value_type getItem (shape_type const &point) const
 Read the array element at index 'point'.
 
bool isInside (shape_type const &p) const
 Check if the given point is in the array domain.
 
template<class U , class C1 >
bool operator!= (MultiArrayView< N, U, C1 > const &rhs) const
 Check if two arrays differ in at least one element.
 
template<class U , class C1 >
bool operator== (MultiArrayView< N, U, C1 > const &rhs) const
 Check if two arrays are elementwise equal.
 
std::size_t overheadBytes () const
 Bytes of main memory needed to manage the chunked storage.
 
virtual std::size_t overheadBytesPerChunk () const =0
 Bytes of main memory needed to manage a single chunk.
 
void releaseChunks (shape_type const &start, shape_type const &stop, bool destroy=false)
 
void setCacheMaxSize (std::size_t c)
 Set the number of chunks the cache will hold.
 
void setItem (shape_type const &point, value_type const &v)
 Write the array element at index 'point'.
 
shape_type const & shape () const
 Return the shape in this array.
 
MultiArrayIndex size () const
 Return the number of elements in this array.
 
view_type subarray (shape_type const &start, shape_type const &stop)
 Create a view to the specified ROI.
 
const_view_type subarray (shape_type const &start, shape_type const &stop) const
 Create a read-only view to the specified ROI.
 

Detailed Description

template<unsigned int N, class T>
class vigra::ChunkedArray< N, T >

Interface and base class for chunked arrays.

Very big data arrays (possibly bigger than the available RAM) can only be processed in smaller pieces. To support quick access to these pieces, it is advantageous to store big arrays in chunks, i.e. as a collection of small rectangular subarrays. The class ChunkedArray encapsulates storage and handling of these chunks and provides various APIs to easily access the data.

#include <vigra/multi_array_chunked.hxx>
Namespace: vigra

Template Parameters
N: the array dimension
T: the type of the array elements

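(these are the same as in MultiArrayView). The actual way of chunk storage is determined by the derived class the program uses. The derived classes shown in the inheritance diagram above are ChunkedArrayFull, ChunkedArrayLazy, ChunkedArrayCompressed, ChunkedArrayHDF5, and ChunkedArrayTmpFile.

You must use these derived classes to construct a chunked array because ChunkedArray itself is an abstract class.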

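Chunks can be in one of several states. The states referred to in this documentation are 'active' (in memory and currently in use), 'inactive' (in memory but currently not in use), and 'asleep' (not held in the cache).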

In-memory chunks (active and inactive) are placed in a cache. If a chunk transitions from the 'asleep' to the 'active' state, it is added to the cache, and an 'inactive' chunk is removed and sent 'asleep'. If there is no 'inactive' chunk in the cache, the cache size is temporarily increased. All state transitions are thread-safe.
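The cache behaviour described above can be modelled in a few lines. This is a toy sketch, not VIGRA's implementation: the `ToyCache` class, its choice to evict the oldest inactive chunk first, and the integer chunk ids are all assumptions made for illustration, and thread safety is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>

enum class ChunkState { Asleep, Inactive, Active };

// Toy model of the chunk cache (hypothetical; single-threaded).
class ToyCache {
public:
    explicit ToyCache(std::size_t maxSize) : maxSize_(maxSize) {
        assert(maxSize > 0);
    }

    // Mark a chunk as in use; wakes it up if it was asleep.
    void activate(int id) {
        ChunkState & s = state_[id];      // unknown chunks start out Asleep
        if (s == ChunkState::Asleep) {
            cache_.push_back(id);         // the chunk enters the cache
            s = ChunkState::Active;
            shrinkToMaxSize();
        } else {
            s = ChunkState::Active;
        }
    }

    // The chunk is no longer in use but stays in memory.
    void deactivate(int id) {
        auto it = state_.find(id);
        if (it != state_.end() && it->second == ChunkState::Active)
            it->second = ChunkState::Inactive;
    }

    ChunkState state(int id) const {
        auto it = state_.find(id);
        return it == state_.end() ? ChunkState::Asleep : it->second;
    }

    std::size_t size() const { return cache_.size(); }

private:
    // Send the oldest inactive chunks asleep while the cache is too big.
    // If only active chunks remain, the cache stays temporarily oversized.
    void shrinkToMaxSize() {
        while (cache_.size() > maxSize_) {
            auto victim = std::find_if(cache_.begin(), cache_.end(),
                [this](int c) { return state_[c] == ChunkState::Inactive; });
            if (victim == cache_.end())
                break;                    // nothing can be evicted right now
            state_[*victim] = ChunkState::Asleep;
            cache_.erase(victim);
        }
    }

    std::size_t maxSize_;
    std::deque<int> cache_;               // in-memory chunks, oldest first
    std::map<int, ChunkState> state_;
};
```

In this model, activating a third chunk in a cache of maximum size 2 while the other two are still active grows the cache to 3; once chunks become inactive, later activations shrink it back.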

In order to optimize performance, the user should adjust the cache size (via setCacheMaxSize() or ChunkedArrayOptions) so that it can hold all chunks that are frequently needed (e.g. all chunks forming a row of the full array).
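To illustrate why the cache should hold a full row of chunks, the sketch below counts how often a chunk has to be loaded during a scan-order traversal of a 2D array. It assumes a simple least-recently-used eviction policy and a hypothetical function `countChunkLoads`; VIGRA's actual cache policy may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>

// Count chunk loads for a scan-order pass (x runs fastest) over a
// width x height array stored in chunkW x chunkH chunks, using an LRU
// cache that holds at most cacheSize chunks. Hypothetical helper.
inline std::size_t countChunkLoads(std::size_t width, std::size_t height,
                                   std::size_t chunkW, std::size_t chunkH,
                                   std::size_t cacheSize) {
    assert(chunkW > 0 && chunkH > 0 && cacheSize > 0);
    std::size_t chunksPerRow = (width + chunkW - 1) / chunkW;
    std::list<std::size_t> lru;   // most recently used chunk at the front
    std::size_t loads = 0;
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            std::size_t id = (y / chunkH) * chunksPerRow + x / chunkW;
            auto it = std::find(lru.begin(), lru.end(), id);
            if (it != lru.end()) {
                lru.erase(it);                    // cache hit
            } else {
                ++loads;                          // cache miss: load the chunk
                if (lru.size() == cacheSize)
                    lru.pop_back();               // evict least recently used
            }
            lru.push_front(id);
        }
    }
    return loads;
}
```

For a 256x128 array with 64x64 chunks (4 chunks per row, 8 chunks in total), a cache of 4 chunks loads each chunk once, whereas a cache of 3 chunks reloads every chunk of the row on every scan line under this policy.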

Another performance critical parameter is the chunk shape. While the system uses sensible defaults (512x512 for 2D arrays, 64x64x64 for 3D, 64x64x16x4 for 4D, and 64x64x16x4x4 for 5D), the shape may need to be adjusted via the array's constructor to match the access patterns of the algorithms to be used. For speed reasons, chunk shapes must be powers of 2.
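One way to see why power-of-2 chunk shapes help: splitting a global coordinate into a chunk index and an in-chunk offset reduces to a shift and a mask instead of a division and a modulo. The following is a minimal sketch of that general technique for one dimension; the names `ChunkCoord`, `log2Exact`, and `splitIndex` are hypothetical and not part of VIGRA.

```cpp
#include <cassert>
#include <cstddef>

// Position of an element: which chunk, and where inside that chunk.
struct ChunkCoord {
    std::size_t chunk;   // index of the chunk along this dimension
    std::size_t offset;  // offset of the element within that chunk
};

// Return log2(n) for n that is an exact power of two (hypothetical helper).
inline unsigned log2Exact(std::size_t n) {
    assert(n != 0 && (n & (n - 1)) == 0);  // must be a power of two
    unsigned bits = 0;
    while ((std::size_t(1) << bits) != n)
        ++bits;
    return bits;
}

// Split a global index using shift and mask; this is equivalent to
// division and modulo when chunkSize is a power of two.
inline ChunkCoord splitIndex(std::size_t global, std::size_t chunkSize) {
    unsigned bits = log2Exact(chunkSize);
    std::size_t mask = chunkSize - 1;
    return ChunkCoord{global >> bits, global & mask};
}
```

In a real implementation the shift and mask would be computed once per array rather than on every access.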

The data in the array can be accessed in several ways. The simplest is via calls to checkoutSubarray() and commitSubarray(): These functions copy an arbitrary subregion of a chunked array (possibly straddling many chunks) into a standard MultiArrayView for processing, and write results back into the chunked array:

ChunkedArray<3, float> & chunked_array = ...;
Shape3 roi_start(1000, 500, 500);
MultiArray<3, float> work_array(Shape3(100, 100, 100));
// copy data from region (1000,500,500)...(1100,600,600)
chunked_array.checkoutSubarray(roi_start, work_array);
... // work phase: process data in work_array as usual
// write results back into chunked_array
chunked_array.commitSubarray(roi_start, work_array);

The required chunks in chunked_array will only be active while the checkout and commit calls are executing. During the work phase, other threads can use the chunked array's cache to checkout or commit different subregions.
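The copy semantics of checkout and commit can be sketched with a toy one-dimensional chunked store. The `ToyChunked1D` class below is hypothetical and only mirrors the idea (copy a region that may straddle several chunks into a contiguous buffer, then write it back); it does not use or reproduce VIGRA's classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy 1D array stored as fixed-size chunks (hypothetical, for illustration).
// For simplicity the last chunk is always allocated at full size.
class ToyChunked1D {
public:
    ToyChunked1D(std::size_t size, std::size_t chunkSize)
        : size_(size), chunkSize_(chunkSize),
          chunks_((size + chunkSize - 1) / chunkSize,
                  std::vector<float>(chunkSize, 0.0f)) {}

    // Copy elements [start, start + out.size()) into a contiguous buffer.
    void checkout(std::size_t start, std::vector<float> & out) const {
        assert(start + out.size() <= size_);
        for (std::size_t i = 0; i < out.size(); ++i)
            out[i] = chunks_[(start + i) / chunkSize_][(start + i) % chunkSize_];
    }

    // Write a contiguous buffer back into [start, start + in.size()).
    void commit(std::size_t start, std::vector<float> const & in) {
        assert(start + in.size() <= size_);
        for (std::size_t i = 0; i < in.size(); ++i)
            chunks_[(start + i) / chunkSize_][(start + i) % chunkSize_] = in[i];
    }

private:
    std::size_t size_, chunkSize_;
    std::vector<std::vector<float>> chunks_;
};
```

A caller would check out a region into a `std::vector<float>`, modify it freely without touching the chunk storage, and commit it at the same start offset.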

Alternatively, one can work directly on the chunk storage. This is most easily achieved by means of chunk iterators:

ChunkedArray<3, float> & chunked_array = ...;
// define the ROI to be processed
Shape3 roi_start(100, 200, 300), roi_end(1000, 2000, 600);
// get a pair of chunk iterators ( = iterators over chunks)
auto chunk = chunked_array.chunk_begin(roi_start, roi_end),
     end   = chunked_array.chunk_end(roi_start, roi_end);
// iterate over the chunks in the ROI
for(; chunk != end; ++chunk)
{
    // get a view to the current chunk's data
    // Note: The view actually refers to the intersection of the
    //       current chunk with the ROI. Thus, chunks which are
    //       partially outside the ROI are appropriately trimmed.
    MultiArrayView<3, float> chunk_view = *chunk;
    ... // work phase: process data in chunk_view as usual
}

No memory is duplicated in this approach, and only the current chunk needs to be active, so that a small chunk cache is sufficient. The iteration over chunks can be distributed over several threads that process the array data in parallel. The programmer must make sure that write operations to individual elements are synchronized between threads. This is usually achieved by ensuring that the threads are responsible for non-overlapping regions of the output array.
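The trimming mentioned in the code comment above can be illustrated in one dimension: for each chunk intersected by the ROI, the visible part is the intersection of the chunk's extent with the ROI. This is a sketch under simplifying assumptions (one dimension, hypothetical `Interval` and `chunkPiecesInRoi` names), not VIGRA's chunk iterator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open interval [begin, end) along one dimension.
struct Interval {
    std::size_t begin, end;
};

// For every chunk intersected by the ROI [roiStart, roiStop), return the
// part of that chunk lying inside the ROI (hypothetical helper).
inline std::vector<Interval> chunkPiecesInRoi(std::size_t roiStart,
                                              std::size_t roiStop,
                                              std::size_t chunkSize) {
    assert(chunkSize > 0 && roiStart < roiStop);
    std::vector<Interval> pieces;
    std::size_t firstChunk = roiStart / chunkSize;
    std::size_t lastChunk = (roiStop - 1) / chunkSize;   // inclusive
    for (std::size_t c = firstChunk; c <= lastChunk; ++c) {
        std::size_t chunkBegin = c * chunkSize;
        std::size_t chunkEnd = chunkBegin + chunkSize;
        // trim the chunk's extent to the ROI
        pieces.push_back({std::max(chunkBegin, roiStart),
                          std::min(chunkEnd, roiStop)});
    }
    return pieces;
}
```

Because the returned pieces never overlap, handing different pieces to different threads is one way to obtain the non-overlapping output regions mentioned above.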

An even simpler method is direct element access via indexing. However, the chunked array has no control over the access order in this case, so it must potentially activate the present chunk upon each access. This is rather expensive and should only be used for debugging:

ChunkedArray<3, float> & chunked_array = ...;
Shape3 index(100, 200, 300);
// access data at coordinate 'index'
chunked_array.setItem(index, chunked_array.getItem(index) + 2.0);

Two additional APIs provide access in a way compatible with an ordinary MultiArrayView. These APIs should be used in functions that are supposed to work unchanged on both ordinary and chunked arrays. The first possibility is the chunked scan-order iterator:

ChunkedArray<3, float> & chunked_array = ...;
// get a pair of scan-order iterators ( = iterators over elements)
auto iter = chunked_array.begin(),
     end  = chunked_array.end();
// iterate over all array elements
for(; iter != end; ++iter)
{
    // access current element
    *iter = *iter + 2.0;
}

A new chunk must potentially be activated whenever the iterator crosses a chunk boundary. Since the overhead of the activation operation can be amortized over many within-chunk steps, the iteration (excluding the workload within the loop) takes only twice as long as the iteration over an unstrided array using an ordinary StridedScanOrderIterator.

The final possibility is the creation of a MultiArrayView that accesses an arbitrary ROI directly:

ChunkedArray<3, float> & chunked_array = ...;
// define the ROI to be processed
Shape3 roi_start(100, 200, 300), roi_end(1000, 2000, 600);
// create view for ROI
MultiArrayView<3, float, ChunkedArrayTag> view =
    chunked_array.subarray(roi_start, roi_end);
... // work phase: process view like any ordinary MultiArrayView

Similarly, a lower-dimensional view can be created with one of the bind functions. This approach has the advantage that 'view' can be passed to any function which is implemented in terms of MultiArrayViews. However, there are two disadvantages: First, data access in the view requires two steps (first find the chunk, then find the appropriate element in the chunk), which causes the chunked view to be slower than an ordinary MultiArrayView. Second, all chunks intersected by the view must remain active throughout the view's lifetime, which may require a big chunk cache and thus keeps many chunks in memory.
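A rough estimate of how much memory such a view keeps active can be obtained by counting the chunks it intersects. The sketch below does this for the ROI used above, assuming the default 64x64x64 chunk shape for 3D arrays and `float` elements; the names `Index3`, `chunksIntersected`, and `bytesPerChunk` are hypothetical, and chunks at the array border may in practice be smaller.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Index3 = std::array<std::size_t, 3>;

// Number of chunks intersected by the ROI [start, stop) (hypothetical helper).
inline std::size_t chunksIntersected(Index3 const & start, Index3 const & stop,
                                     Index3 const & chunkShape) {
    std::size_t count = 1;
    for (std::size_t d = 0; d < 3; ++d) {
        assert(chunkShape[d] > 0 && start[d] < stop[d]);
        std::size_t first = start[d] / chunkShape[d];
        std::size_t beyond = (stop[d] - 1) / chunkShape[d] + 1;
        count *= beyond - first;
    }
    return count;
}

// Bytes of element data in one full chunk.
inline std::size_t bytesPerChunk(Index3 const & chunkShape, std::size_t elemSize) {
    return chunkShape[0] * chunkShape[1] * chunkShape[2] * elemSize;
}
```

With these assumptions the ROI (100,200,300)...(1000,2000,600) touches 15 x 29 x 6 = 2610 chunks of 1 MiB each, so the view would keep roughly 2.5 GiB active.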

Member Function Documentation

std::size_t dataBytes ( ) const

Bytes of main memory occupied by the array's data.

Compressed chunks are only counted with their compressed size. Chunks swapped out to the hard drive are not counted.

shape_type chunkStop ( shape_type  global_stop) const

Find the chunk that is beyond array element 'global_stop'.

Specifically, this computes

chunkStart(global_stop - shape_type(1)) + shape_type(1)

shape_type chunkShape ( shape_type const &  chunk_index) const

Find the shape of the chunk indexed by 'chunk_index'.

This may differ from the global chunk shape because chunks at the right/lower border of the array may be smaller than usual.
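The chunk-index arithmetic described for chunkStart(), chunkStop(), and chunkShape(chunk_index) can be sketched in one dimension as follows. The free functions below are hypothetical stand-ins that only mirror the formulas in the text; they are not the VIGRA member functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Index of the chunk containing element 'globalStart'.
inline std::size_t chunkStart1D(std::size_t globalStart, std::size_t chunkSize) {
    assert(chunkSize > 0);
    return globalStart / chunkSize;
}

// Index of the chunk just beyond the exclusive bound 'globalStop',
// i.e. chunkStart(globalStop - 1) + 1.
inline std::size_t chunkStop1D(std::size_t globalStop, std::size_t chunkSize) {
    assert(globalStop > 0);
    return chunkStart1D(globalStop - 1, chunkSize) + 1;
}

// Extent of chunk 'chunkIndex'; the last chunk may be cut off by the
// array border and is then smaller than chunkSize.
inline std::size_t chunkExtent1D(std::size_t chunkIndex, std::size_t chunkSize,
                                 std::size_t arraySize) {
    std::size_t begin = chunkIndex * chunkSize;
    assert(begin < arraySize);
    return std::min(chunkSize, arraySize - begin);
}
```

For example, with a chunk size of 64 an exclusive stop of 128 yields chunk index 2, while a stop of 129 yields 3; in an array of 150 elements the third chunk holds only 22 elements.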

shape_type const& chunkShape ( ) const

Return the global chunk shape.

This is the shape of all chunks that are completely contained in the array's domain.

void releaseChunks ( shape_type const &  start,
shape_type const &  stop,
bool  destroy = false 
)

Sends all chunks asleep which are completely inside the given ROI. If destroy == true and the backend supports destruction (currently: ChunkedArrayLazy and ChunkedArrayCompressed), chunks will be deleted entirely. The chunk's contents after releaseChunks() are undefined. Currently, chunks retain their values when sent asleep, and assume the array's fill_value when deleted, but applications should not rely on this behavior.
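The phrase 'completely inside the given ROI' selects a narrower chunk range than 'intersected by the ROI'. The following one-dimensional sketch shows the difference; the helper `chunksFullyInside` is hypothetical, and for simplicity it ignores chunks that are truncated by the array border.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Half-open range [first, beyond) of chunk indices whose whole extent lies
// inside the ROI [start, stop). The range is empty (first == beyond) if no
// chunk fits entirely.
inline std::pair<std::size_t, std::size_t>
chunksFullyInside(std::size_t start, std::size_t stop, std::size_t chunkSize) {
    assert(chunkSize > 0 && start <= stop);
    std::size_t first = (start + chunkSize - 1) / chunkSize;  // round up
    std::size_t beyond = stop / chunkSize;                    // round down
    if (beyond < first)
        beyond = first;                                       // empty range
    return std::make_pair(first, beyond);
}
```

For the ROI [100, 1000) with chunk size 64, chunks 2 to 14 lie completely inside, although chunks 1 to 15 are intersected.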

void checkoutSubarray ( shape_type const &  start,
MultiArrayView< N, U, Stride > &  subarray 
) const

Copy an ROI of the chunked array into an ordinary MultiArrayView.

The ROI's lower bound is given by 'start', its upper bound (in 'beyond' sense) is 'start + subarray.shape()'. Chunks in the ROI are only activated while the read is in progress.

void commitSubarray ( shape_type const &  start,
MultiArrayView< N, U, Stride > const &  subarray 
)

Copy an ordinary MultiArrayView into an ROI of the chunked array.

The ROI's lower bound is given by 'start', its upper bound (in 'beyond' sense) is 'start + subarray.shape()'. Chunks in the ROI are only activated while the write is in progress.

view_type subarray ( shape_type const &  start,
shape_type const &  stop 
)

Create a view to the specified ROI.

The view can be used like an ordinary MultiArrayView, but is a bit slower. All chunks intersecting the view remain active throughout the view's lifetime.

const_view_type subarray ( shape_type const &  start,
shape_type const &  stop 
) const

Create a read-only view to the specified ROI.

The view can be used like an ordinary MultiArrayView, but is a bit slower. All chunks intersecting the view remain active throughout the view's lifetime.

const_view_type const_subarray ( shape_type const &  start,
shape_type const &  stop 
) const

Create a read-only view to the specified ROI.

The view can be used like an ordinary MultiArrayView, but is a bit slower. All chunks intersecting the view remain active throughout the view's lifetime.

value_type getItem ( shape_type const &  point) const

Read the array element at index 'point'.

Since the corresponding chunk must potentially be activated first, this function may be slow and should mainly be used in debugging.

void setItem ( shape_type const &  point,
value_type const &  v 
)

Write the array element at index 'point'.

Since the corresponding chunk must potentially be activated first, this function may be slow and should mainly be used in debugging.

MultiArrayView<N-1, T, ChunkedArrayTag> bindAt ( MultiArrayIndex  dim,
MultiArrayIndex  index 
) const

Create a lower dimensional view to the chunked array.

Dimension 'dim' is bound at 'index', all other dimensions remain unchanged. All chunks intersecting the view remain active throughout the view's lifetime.

MultiArrayView<N-1, T, ChunkedArrayTag> bind ( difference_type_1  index) const

Create a lower dimensional view to the chunked array.

Dimension 'M' (given as a template parameter) is bound at 'index', all other dimensions remain unchanged. All chunks intersecting the view remain active throughout the view's lifetime.

MultiArrayView<N-1, T, ChunkedArrayTag> bindOuter ( difference_type_1  index) const

Create a lower dimensional view to the chunked array.

Dimension 'N-1' is bound at 'index', all other dimensions remain unchanged. All chunks intersecting the view remain active throughout the view's lifetime.

MultiArrayView<N-M, T, ChunkedArrayTag> bindOuter ( const TinyVector< Index, M > &  d) const

Create a lower dimensional view to the chunked array.

The M rightmost dimensions are bound to the indices given in 'd'. All chunks intersecting the view remain active throughout the view's lifetime.

MultiArrayView<N-1, T, ChunkedArrayTag> bindInner ( difference_type_1  index) const

Create a lower dimensional view to the chunked array.

Dimension '0' is bound at 'index', all other dimensions remain unchanged. All chunks intersecting the view remain active throughout the view's lifetime.

MultiArrayView<N-M, T, ChunkedArrayTag> bindInner ( const TinyVector< Index, M > &  d) const

Create a lower dimensional view to the chunked array.

The M leftmost dimensions are bound to the indices given in 'd'. All chunks intersecting the view remain active throughout the view's lifetime.
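How the bind functions reduce dimensionality can be illustrated on shapes alone. The helpers below are hypothetical and operate on a plain `std::vector` of extents rather than on VIGRA types; they only show which axes survive each kind of binding.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Extents = std::vector<std::size_t>;

// Shape left after binding dimension 'dim' (as bindAt does for one axis).
inline Extents shapeAfterBindAt(Extents shape, std::size_t dim) {
    assert(dim < shape.size());
    shape.erase(shape.begin() + static_cast<std::ptrdiff_t>(dim));
    return shape;
}

// Shape left after binding the m leftmost dimensions (as in bindInner).
inline Extents shapeAfterBindInner(Extents const & shape, std::size_t m) {
    assert(m <= shape.size());
    return Extents(shape.begin() + static_cast<std::ptrdiff_t>(m), shape.end());
}

// Shape left after binding the m rightmost dimensions (as in bindOuter).
inline Extents shapeAfterBindOuter(Extents const & shape, std::size_t m) {
    assert(m <= shape.size());
    return Extents(shape.begin(), shape.end() - static_cast<std::ptrdiff_t>(m));
}
```

For a 4D shape (1000, 2000, 600, 4), binding dimension 1 leaves (1000, 600, 4), binding the two leftmost dimensions leaves (600, 4), and binding the rightmost dimension leaves (1000, 2000, 600).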

std::size_t cacheMaxSize ( ) const

Get the number of chunks the cache will hold.

If there are any inactive chunks in the cache, these will be sent asleep until the max cache size is reached. The max cache size may be temporarily overridden when more chunks need to be active simultaneously.

void setCacheMaxSize ( std::size_t  c)

Set the number of chunks the cache will hold.

This should be big enough to hold all chunks that are frequently needed and must therefore be adapted to the application's access pattern.


© Ullrich Köthe (ullrich.koethe@iwr.uni-heidelberg.de)
Heidelberg Collaboratory for Image Processing, University of Heidelberg, Germany

html generated using doxygen and Python
vigra 1.11.1 (Fri May 19 2017)