ChunkedArray< N, T > Class Template Reference (abstract)
Interface and base class for chunked arrays.
#include <vigra/multi_array_chunked.hxx>
Public Member Functions
std::string | backend () const | Return the class that implements this ChunkedArray.
iterator | begin () | Create a scan-order iterator for the entire chunked array.
const_iterator | begin () const | Create a read-only scan-order iterator for the entire chunked array.
template<unsigned int M> MultiArrayView< N-1, T, ChunkedArrayTag > | bind (difference_type_1 index) const | Create a lower dimensional view to the chunked array.
MultiArrayView< N-1, T, ChunkedArrayTag > | bindAt (MultiArrayIndex dim, MultiArrayIndex index) const | Create a lower dimensional view to the chunked array.
MultiArrayView< N-1, T, ChunkedArrayTag > | bindInner (difference_type_1 index) const | Create a lower dimensional view to the chunked array.
template<int M, class Index > MultiArrayView< N-M, T, ChunkedArrayTag > | bindInner (const TinyVector< Index, M > &d) const | Create a lower dimensional view to the chunked array.
MultiArrayView< N-1, T, ChunkedArrayTag > | bindOuter (difference_type_1 index) const | Create a lower dimensional view to the chunked array.
template<int M, class Index > MultiArrayView< N-M, T, ChunkedArrayTag > | bindOuter (const TinyVector< Index, M > &d) const | Create a lower dimensional view to the chunked array.
std::size_t | cacheMaxSize () const | Get the number of chunks the cache will hold.
int | cacheSize () const | Number of chunks currently held in the cache.
const_iterator | cbegin () const | Create a read-only scan-order iterator for the entire chunked array.
const_iterator | cend () const | Create the end iterator for read-only scan-order iteration over the entire chunked array.
template<class U , class Stride > void | checkoutSubarray (shape_type const &start, MultiArrayView< N, U, Stride > &subarray) const | Copy an ROI of the chunked array into an ordinary MultiArrayView.
chunk_iterator | chunk_begin (shape_type const &start, shape_type const &stop) | Create an iterator over all chunks intersected by the given ROI.
chunk_const_iterator | chunk_begin (shape_type const &start, shape_type const &stop) const | Create a read-only iterator over all chunks intersected by the given ROI.
chunk_const_iterator | chunk_cbegin (shape_type const &start, shape_type const &stop) const | Create a read-only iterator over all chunks intersected by the given ROI.
chunk_const_iterator | chunk_cend (shape_type const &start, shape_type const &stop) const | Create the end iterator for read-only iteration over all chunks intersected by the given ROI.
chunk_iterator | chunk_end (shape_type const &start, shape_type const &stop) | Create the end iterator for iteration over all chunks intersected by the given ROI.
chunk_const_iterator | chunk_end (shape_type const &start, shape_type const &stop) const | Create the end iterator for read-only iteration over all chunks intersected by the given ROI.
virtual shape_type | chunkArrayShape () const | Number of chunks along each coordinate direction.
shape_type | chunkShape (shape_type const &chunk_index) const | Find the shape of the chunk indexed by 'chunk_index'.
shape_type const & | chunkShape () const | Return the global chunk shape.
shape_type | chunkStart (shape_type const &global_start) const | Find the chunk that contains array element 'global_start'.
shape_type | chunkStop (shape_type global_stop) const | Find the chunk that is beyond array element 'global_stop'.
template<class U , class Stride > void | commitSubarray (shape_type const &start, MultiArrayView< N, U, Stride > const &subarray) | Copy an ordinary MultiArrayView into an ROI of the chunked array.
const_view_type | const_subarray (shape_type const &start, shape_type const &stop) const | Create a read-only view to the specified ROI.
std::size_t | dataBytes () const | Bytes of main memory occupied by the array's data.
std::size_t | dataBytesPerChunk () const | Number of data bytes in an uncompressed chunk.
iterator | end () | Create the end iterator for scan-order iteration over the entire chunked array.
const_iterator | end () const | Create the end iterator for read-only scan-order iteration over the entire chunked array.
value_type | getItem (shape_type const &point) const | Read the array element at index 'point'.
bool | isInside (shape_type const &p) const | Check if the given point is in the array domain.
template<class U , class C1 > bool | operator!= (MultiArrayView< N, U, C1 > const &rhs) const | Check if two arrays differ in at least one element.
template<class U , class C1 > bool | operator== (MultiArrayView< N, U, C1 > const &rhs) const | Check if two arrays are elementwise equal.
std::size_t | overheadBytes () const | Bytes of main memory needed to manage the chunked storage.
virtual std::size_t | overheadBytesPerChunk () const =0 | Bytes of main memory needed to manage a single chunk.
void | releaseChunks (shape_type const &start, shape_type const &stop, bool destroy=false) | Send all chunks completely inside the given ROI asleep, or delete them if 'destroy' is true and the backend supports it.
void | setCacheMaxSize (std::size_t c) | Set the number of chunks the cache will hold.
void | setItem (shape_type const &point, value_type const &v) | Write the array element at index 'point'.
shape_type const & | shape () const | Return the shape of this array.
MultiArrayIndex | size () const | Return the number of elements in this array.
view_type | subarray (shape_type const &start, shape_type const &stop) | Create a view to the specified ROI.
const_view_type | subarray (shape_type const &start, shape_type const &stop) const | Create a read-only view to the specified ROI.
Detailed Description
Interface and base class for chunked arrays.
Very big data arrays (possibly bigger than the available RAM) can only be processed in smaller pieces. To support quick access to these pieces, it is advantageous to store big arrays in chunks, i.e. as a collection of small rectangular subarrays. The class ChunkedArray encapsulates storage and handling of these chunks and provides various APIs to easily access the data.
Namespace: vigra
Template Parameters
N | the array dimension
T | the type of the array elements
Both parameters have the same meaning as in MultiArrayView. The actual chunk storage strategy is determined by the derived class the program uses:
ChunkedArrayFull: Provides the chunked array API for a standard MultiArray (i.e. there is only one chunk for the entire array).
ChunkedArrayLazy: All chunks reside in memory, but are only allocated upon first access.
ChunkedArrayCompressed: Like ChunkedArrayLazy, but temporarily unused chunks are compressed in memory to save space.
ChunkedArrayTmpFile: Chunks are stored in a memory-mapped file. Temporarily unused chunks are written to the hard drive and deleted from memory.
You must use these derived classes to construct a chunked array because ChunkedArray itself is an abstract class.
Chunks can be in one of the following states:
uninitialized: Chunks are only initialized (i.e. allocated) upon the first write access. If an uninitialized chunk is accessed in a read-only manner, the system returns a pseudo-chunk whose elements have a user-provided fill value.
asleep: The chunk is currently unused and has been compressed and/or swapped out to the hard drive.
inactive: The chunk is currently unused, but still resides in memory.
active: The chunk resides in memory and is currently in use.
locked: Chunks are briefly in this state during transitions between the other states (e.g. while loading and/or decompression is in progress).
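The 'uninitialized' state can be illustrated with a small, self-contained sketch. `ToyLazyChunks` is a hypothetical 1D stand-in written only for this illustration (it is not a VIGRA class and ignores threading and the other chunk states): a read from a chunk that was never written returns the fill value without allocating anything, while the first write allocates the chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical 1D toy illustrating lazy chunk allocation; not VIGRA's implementation.
class ToyLazyChunks
{
  public:
    ToyLazyChunks(std::size_t size, std::size_t chunk_size, float fill_value)
    : size_(size), chunk_size_(chunk_size), fill_value_(fill_value),
      chunks_((size + chunk_size - 1) / chunk_size)   // every chunk starts uninitialized (null)
    {}

    // Read access: an uninitialized chunk is not allocated, the fill value is returned.
    float getItem(std::size_t i) const
    {
        assert(i < size_);
        const auto & chunk = chunks_[i / chunk_size_];
        return chunk ? (*chunk)[i % chunk_size_] : fill_value_;
    }

    // Write access: the chunk is allocated (pre-filled with the fill value) on first write.
    void setItem(std::size_t i, float v)
    {
        assert(i < size_);
        auto & chunk = chunks_[i / chunk_size_];
        if(!chunk)
            chunk = std::make_unique<std::vector<float>>(chunk_size_, fill_value_);
        (*chunk)[i % chunk_size_] = v;
    }

    std::size_t allocatedChunks() const
    {
        std::size_t n = 0;
        for(const auto & c : chunks_)
            n += (c != nullptr);
        return n;
    }

  private:
    std::size_t size_, chunk_size_;
    float fill_value_;
    std::vector<std::unique_ptr<std::vector<float>>> chunks_;
};
```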
In-memory chunks (active and inactive) are placed in a cache. If a chunk transitions from the 'asleep' to the 'active' state, it is added to the cache, and an 'inactive' chunk is removed and sent 'asleep'. If there is no 'inactive' chunk in the cache, the cache size is temporarily increased. All state transitions are thread-safe.
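The cache policy described above can be sketched as a single-threaded toy model. `ToyChunkCache` and its `acquire()`/`release()` methods are hypothetical names; VIGRA's actual cache is thread-safe and also handles compression, swapping and the 'locked' state, all of which are omitted here. The sketch assumes that inactive chunks are sent asleep oldest first.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <map>

// Hypothetical single-threaded model of the chunk cache (not VIGRA's implementation).
// Chunks are identified by integer ids; only asleep/inactive/active are modeled.
class ToyChunkCache
{
  public:
    enum State { asleep = 0, inactive, active };

    explicit ToyChunkCache(std::size_t max_size) : max_size_(max_size) {}

    // Start using a chunk. An asleep chunk is woken up and added to the cache.
    void acquire(int id)
    {
        Entry & e = chunks_[id];   // value-initialized for new ids: state == asleep, refcount == 0
        bool was_asleep = (e.state == asleep);
        e.state = active;
        ++e.refcount;
        if(was_asleep)
        {
            cache_.push_back(id);
            sendInactiveChunksAsleep();
        }
    }

    // Stop using a chunk. It stays in the cache and becomes inactive once unused.
    void release(int id)
    {
        Entry & e = chunks_.at(id);
        assert(e.state == active && e.refcount > 0);
        if(--e.refcount == 0)
            e.state = inactive;
    }

    State state(int id) const
    {
        auto it = chunks_.find(id);
        return it == chunks_.end() ? asleep : it->second.state;
    }

    std::size_t cacheSize() const { return cache_.size(); }

  private:
    struct Entry { State state; int refcount; };

    // Evict inactive chunks, oldest first, until the cache fits its maximum size.
    // Active chunks are never evicted, so the cache may temporarily exceed max_size_.
    void sendInactiveChunksAsleep()
    {
        for(auto it = cache_.begin(); it != cache_.end() && cache_.size() > max_size_; )
        {
            Entry & e = chunks_[*it];
            if(e.state == inactive)
            {
                e.state = asleep;
                it = cache_.erase(it);
            }
            else
                ++it;
        }
    }

    std::size_t max_size_;
    std::map<int, Entry> chunks_;
    std::list<int> cache_;
};
```

In this model, acquiring a new chunk while every cached chunk is active grows the cache beyond its maximum; once chunks are released, the next wake-up shrinks it again.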
In order to optimize performance, the user should adjust the cache size (via setCacheMaxSize() or ChunkedArrayOptions) so that it can hold all chunks that are frequently needed (e.g. all chunks forming a row of the full array).
Another performance-critical parameter is the chunk shape. While the system uses sensible defaults (512x512 for 2D arrays, 64x64x64 for 3D, 64x64x16x4 for 4D, and 64x64x16x4x4 for 5D), the shape may need to be adjusted via the array's constructor to match the access patterns of the algorithms to be used. For speed reasons, the chunk extent along each dimension must be a power of 2.
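The power-of-2 requirement lets chunk lookups use bit operations instead of division. The following sketch (the struct `ToyChunkIndexing` and its member `bits` are hypothetical, not part of VIGRA's interface) splits a global coordinate into a chunk index and an offset within the chunk, assuming the chunk extent along dimension d is 2^bits[d]:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical sketch: with chunk extents of (1 << bits[d]), the chunk index is a right
// shift and the offset inside the chunk is a bit mask (no division or modulo needed).
template <std::size_t N>
struct ToyChunkIndexing
{
    std::array<long, N> bits;   // chunk extent along dimension d is (1 << bits[d])

    // Chunk grid coordinate of a global element index.
    std::array<long, N> chunkIndex(std::array<long, N> const & p) const
    {
        std::array<long, N> c{};
        for(std::size_t d = 0; d < N; ++d)
        {
            assert(p[d] >= 0);
            c[d] = p[d] >> bits[d];
        }
        return c;
    }

    // Position of the element inside its chunk.
    std::array<long, N> offsetInChunk(std::array<long, N> const & p) const
    {
        std::array<long, N> o{};
        for(std::size_t d = 0; d < N; ++d)
            o[d] = p[d] & ((1L << bits[d]) - 1);
        return o;
    }
};
```

For example, with 64x64x64 chunks (bits = 6, 6, 6), the element (130, 5, 64) lies in chunk (2, 0, 1) at offset (2, 5, 0).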
The data in the array can be accessed in several ways. The simplest is via calls to checkoutSubarray() and commitSubarray(). These functions copy an arbitrary subregion of a chunked array (possibly straddling many chunks) into a standard MultiArrayView for processing, and write results back into the chunked array.
The required chunks of the chunked array are only active while the checkout and commit calls are executing. During the work phase, other threads can use the chunked array's cache to checkout or commit different subregions.
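The following stdlib-only sketch mimics the checkout/commit pattern on a hypothetical 2D array of square chunks. `ToyChunked2D` is illustrative, not VIGRA code: its checkoutSubarray()/commitSubarray() take an explicit ROI origin and shape plus a dense row-major buffer instead of a MultiArrayView, and for brevity the checkout function is not const. Both functions visit only the chunks that intersect the ROI and copy the intersection of each chunk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2D toy, not VIGRA code: a w x h array stored as a grid of cs x cs chunks.
struct ToyChunked2D
{
    int w, h, cs;                              // array shape and chunk extent
    std::vector<std::vector<float>> chunks;    // chunk (cx, cy) is stored at cy * gridW() + cx

    ToyChunked2D(int w_, int h_, int cs_)
    : w(w_), h(h_), cs(cs_),
      chunks(std::size_t(gridW() * gridH()), std::vector<float>(std::size_t(cs_ * cs_), 0.0f))
    {}

    int gridW() const { return (w + cs - 1) / cs; }
    int gridH() const { return (h + cs - 1) / cs; }

    // Call f(x, y, element) for every element of the ROI [x0, x0+bw) x [y0, y0+bh),
    // processing one intersected chunk at a time.
    template <class F>
    void forEachInRoi(int x0, int y0, int bw, int bh, F f)
    {
        assert(x0 >= 0 && y0 >= 0 && bw >= 0 && bh >= 0 && x0 + bw <= w && y0 + bh <= h);
        for(int cy = y0 / cs; cy * cs < y0 + bh; ++cy)
            for(int cx = x0 / cs; cx * cs < x0 + bw; ++cx)
            {
                std::vector<float> & chunk = chunks[std::size_t(cy * gridW() + cx)];
                for(int y = std::max(y0, cy * cs); y < std::min(y0 + bh, (cy + 1) * cs); ++y)
                    for(int x = std::max(x0, cx * cs); x < std::min(x0 + bw, (cx + 1) * cs); ++x)
                        f(x, y, chunk[std::size_t((y - cy * cs) * cs + (x - cx * cs))]);
            }
    }

    // Copy the ROI with origin (x0, y0) and shape bw x bh into a dense row-major buffer.
    void checkoutSubarray(int x0, int y0, int bw, int bh, std::vector<float> & buf)
    {
        buf.assign(std::size_t(bw * bh), 0.0f);
        forEachInRoi(x0, y0, bw, bh, [&](int x, int y, float & v)
                     { buf[std::size_t((y - y0) * bw + (x - x0))] = v; });
    }

    // Write a dense row-major buffer back into the same ROI.
    void commitSubarray(int x0, int y0, int bw, int bh, std::vector<float> const & buf)
    {
        assert(buf.size() == std::size_t(bw * bh));
        forEachInRoi(x0, y0, bw, bh, [&](int x, int y, float & v)
                     { v = buf[std::size_t((y - y0) * bw + (x - x0))]; });
    }
};
```

With VIGRA itself, the same pattern calls checkoutSubarray(start, view) and commitSubarray(start, view) on a ChunkedArray, where the ROI shape is taken from the view's shape.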
Alternatively, one can work directly on the chunk storage. This is most easily achieved by means of chunk iterators (see chunk_begin() and chunk_end()).
No memory is duplicated in this approach, and only the current chunk needs to be active, so that a small chunk cache is sufficient. The iteration over chunks can be distributed over several threads that process the array data in parallel. The programmer must make sure that write operations to individual elements are synchronized between threads. This is usually achieved by ensuring that the threads are responsible for non-overlapping regions of the output array.
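To illustrate how work can be split into non-overlapping pieces, the sketch below enumerates the chunks of a 2D chunk grid that intersect an ROI, together with the part of each chunk that lies inside the ROI. `Box` and `chunksInRoi()` are hypothetical helpers, and clipping each chunk to the ROI is an assumption made for this illustration. Because the boxes are disjoint, each one could be handed to a separate thread without synchronizing writes.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Half-open box [x0, x1) x [y0, y1).
struct Box { int x0, y0, x1, y1; };

// Hypothetical helper: list the part of every cs x cs chunk that intersects the ROI.
// The returned boxes are disjoint and together cover the ROI exactly.
std::vector<Box> chunksInRoi(Box roi, int w, int h, int cs)
{
    assert(cs > 0);
    assert(0 <= roi.x0 && roi.x0 <= roi.x1 && roi.x1 <= w);
    assert(0 <= roi.y0 && roi.y0 <= roi.y1 && roi.y1 <= h);
    std::vector<Box> boxes;
    if(roi.x0 == roi.x1 || roi.y0 == roi.y1)
        return boxes;                                   // an empty ROI touches no chunk
    for(int cy = roi.y0 / cs; cy * cs < roi.y1; ++cy)
        for(int cx = roi.x0 / cs; cx * cs < roi.x1; ++cx)
            boxes.push_back({std::max(roi.x0, cx * cs), std::max(roi.y0, cy * cs),
                             std::min(roi.x1, (cx + 1) * cs), std::min(roi.y1, (cy + 1) * cs)});
    return boxes;
}
```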
An even simpler method is direct element access via indexing (see getItem() and setItem()). However, the chunked array has no control over the access order in this case, so it may have to activate the relevant chunk on every access. This is rather expensive and should only be used for debugging.
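A small cost model shows why the access order matters. `countActivations()` is a hypothetical helper that visits every element of a w x h array once, assumes that only the most recently used chunk stays active, and counts how often a different chunk has to be activated:

```cpp
#include <cassert>

// Hypothetical cost model: visit each element of a w x h array once (x fastest if
// x_fastest is true, otherwise y fastest), keep only the most recently used chunk
// active, and count how often a different cs x cs chunk has to be activated.
long countActivations(int w, int h, int cs, bool x_fastest)
{
    assert(w > 0 && h > 0 && cs > 0);
    long activations = 0;
    int last_cx = -1, last_cy = -1;
    int outer_size = x_fastest ? h : w, inner_size = x_fastest ? w : h;
    for(int outer = 0; outer < outer_size; ++outer)
        for(int inner = 0; inner < inner_size; ++inner)
        {
            int x = x_fastest ? inner : outer;
            int y = x_fastest ? outer : inner;
            int cx = x / cs, cy = y / cs;
            if(cx != last_cx || cy != last_cy)     // element lies in a different chunk
            {
                ++activations;
                last_cx = cx;
                last_cy = cy;
            }
        }
    return activations;
}
```

For a 256 x 64 array with 64 x 64 chunks, visiting with x fastest activates a chunk 256 times, while visiting with y fastest needs only 4 activations (one per chunk).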
Two additional APIs provide access in a way compatible with an ordinary MultiArrayView. These APIs should be used in functions that are supposed to work unchanged on both ordinary and chunked arrays. The first possibility is the chunked scan-order iterator returned by begin() and end().
A new chunk must potentially be activated whenever the iterator crosses a chunk boundary. Since the overhead of the activation operation can be amortized over many within-chunk steps, the iteration (excluding the workload within the loop) takes only twice as long as the iteration over an unstrided array using an ordinary StridedScanOrderIterator.
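The amortization argument can be seen in a simplified scan-order iterator. `ToyScanIterator` is a hypothetical 2D sketch, not VIGRA's iterator: it keeps a pointer into the current chunk, advances it with a cheap increment inside the chunk, and performs a chunk lookup only when it crosses a chunk boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2D toy, not VIGRA's iterator. The w x h array is stored as cs x cs chunks;
// chunk (cx, cy) is chunks[cy * gw + cx] with gw = ceil(w / cs), each chunk row-major.
class ToyScanIterator
{
  public:
    ToyScanIterator(std::vector<std::vector<float>> const & chunks, int w, int h, int cs)
    : chunks_(chunks), w_(w), h_(h), cs_(cs), gw_((w + cs - 1) / cs)
    {
        assert(w > 0 && h > 0 && cs > 0);
        resolve();                                  // look up the first chunk
    }

    bool atEnd() const { return y_ >= h_; }
    float operator*() const { return *p_; }
    long lookups() const { return lookups_; }

    // Advance in scan order (x fastest).
    ToyScanIterator & operator++()
    {
        if(++x_ == w_)                              // wrap to the next array row
        {
            x_ = 0;
            ++y_;
        }
        if(atEnd())
            return *this;
        if(x_ % cs_ == 0)
            resolve();                              // crossed a chunk boundary: new lookup
        else
            ++p_;                                   // cheap step within the current chunk
        return *this;
    }

  private:
    void resolve()
    {
        std::vector<float> const & chunk = chunks_[std::size_t((y_ / cs_) * gw_ + x_ / cs_)];
        p_ = chunk.data() + (y_ % cs_) * cs_ + x_ % cs_;
        ++lookups_;
    }

    std::vector<std::vector<float>> const & chunks_;
    int w_, h_, cs_, gw_;
    int x_ = 0, y_ = 0;
    float const * p_ = nullptr;
    long lookups_ = 0;
};
```

For a 10 x 7 array with 4 x 4 chunks, this iterator performs 21 lookups (3 per row) while visiting 70 elements.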
The final possibility is the creation of a MultiArrayView that accesses an arbitrary ROI directly (see subarray()).
Similarly, a lower-dimensional view can be created with one of the bind functions. This approach has the advantage that the view can be passed to any function which is implemented in terms of MultiArrayViews. However, there are two disadvantages: First, data access in the view requires two steps (first find the chunk, then find the appropriate element in the chunk), which makes the chunked view slower than an ordinary MultiArrayView. Second, all chunks intersected by the view must remain active throughout the view's lifetime, which may require a big chunk cache and thus keeps many chunks in memory.
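Since all chunks intersected by a view stay active, it helps to estimate their number before creating the view. The hypothetical helper `chunksTouched()` counts the chunks intersected by the ROI [start, stop); a view created by bindOuter() at index i corresponds to the ROI whose last dimension is restricted to [i, i+1).

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of chunks intersected by the ROI [start, stop), i.e. how many
// chunks must stay active while a view on this ROI exists.
template <std::size_t N>
long chunksTouched(std::array<long, N> const & start, std::array<long, N> const & stop,
                   std::array<long, N> const & chunk_shape)
{
    long count = 1;
    for(std::size_t d = 0; d < N; ++d)
    {
        assert(0 <= start[d] && start[d] < stop[d] && chunk_shape[d] > 0);
        long first = start[d] / chunk_shape[d];                          // floor
        long last  = (stop[d] + chunk_shape[d] - 1) / chunk_shape[d];    // ceil, exclusive
        count *= last - first;
    }
    return count;
}
```

For a 1000 x 1000 x 1000 array with 64 x 64 x 64 chunks, a slice bound along the last dimension intersects 16 x 16 = 256 chunks, all of which remain active for the view's lifetime.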
Member Function Documentation
std::size_t dataBytes () const
Bytes of main memory occupied by the array's data.
Compressed chunks are only counted with their compressed size. Chunks swapped out to the hard drive are not counted.
shape_type chunkStop (shape_type global_stop) const
Find the chunk that is beyond array element 'global_stop'.
Specifically, this computes ceil(global_stop / chunkShape()), elementwise.
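The following sketch spells out both chunk-index formulas for chunk extents that are powers of 2. The function names `toyChunkStart()`/`toyChunkStop()` and the `bits` parameter (chunk extent 2^bits[d] along dimension d) are hypothetical. Together, toyChunkStart(start) and toyChunkStop(stop) give the half-open range of chunk indices covering the ROI [start, stop).

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical sketch, assuming chunk extents of (1 << bits[d]):
// start chunk = floor(global_start / chunk_shape), stop chunk = ceil(global_stop / chunk_shape).
template <std::size_t N>
std::array<long, N> toyChunkStart(std::array<long, N> p, std::array<long, N> const & bits)
{
    for(std::size_t d = 0; d < N; ++d)
    {
        assert(p[d] >= 0);
        p[d] >>= bits[d];
    }
    return p;
}

template <std::size_t N>
std::array<long, N> toyChunkStop(std::array<long, N> p, std::array<long, N> const & bits)
{
    for(std::size_t d = 0; d < N; ++d)
    {
        assert(p[d] >= 0);
        // ceil(p / 2^b) == ((p + 2^b - 1) >> b) for non-negative p
        p[d] = (p[d] + (1L << bits[d]) - 1) >> bits[d];
    }
    return p;
}
```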
shape_type chunkShape (shape_type const & chunk_index) const
Find the shape of the chunk indexed by 'chunk_index'.
This may differ from the global chunk shape because chunks at the right/lower border of the array may be smaller than usual.
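The border rule can be written down directly. In this sketch (`toyChunkShape()` is a hypothetical helper, not a VIGRA function), the shape of a chunk is the global chunk shape, clipped so that the chunk does not extend beyond the array:

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical sketch: shape of chunk 'chunk_index', clipped at the array border.
template <std::size_t N>
std::array<long, N> toyChunkShape(std::array<long, N> const & chunk_index,
                                  std::array<long, N> const & array_shape,
                                  std::array<long, N> const & chunk_shape)
{
    std::array<long, N> s{};
    for(std::size_t d = 0; d < N; ++d)
    {
        long begin = chunk_index[d] * chunk_shape[d];
        assert(0 <= begin && begin < array_shape[d]);   // the chunk must lie inside the array
        s[d] = std::min(chunk_shape[d], array_shape[d] - begin);
    }
    return s;
}
```

For a 200 x 100 array with 64 x 64 chunks, the interior chunk (0, 0) has shape 64 x 64, while the border chunk (3, 1) has shape 8 x 36.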
shape_type const & chunkShape () const
Return the global chunk shape.
This is the shape of all chunks that are completely contained in the array's domain.
void releaseChunks (shape_type const & start, shape_type const & stop, bool destroy = false)
Send all chunks that lie completely inside the given ROI asleep. If destroy == true and the backend supports destruction (currently: ChunkedArrayLazy and ChunkedArrayCompressed), the chunks are deleted entirely instead. The chunks' contents after releaseChunks() are undefined. Currently, chunks retain their values when sent asleep and assume the array's fill_value when deleted, but applications should not rely on this behavior.
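The 'completely inside' criterion can be sketched as follows. `chunkInsideRoi()` is a hypothetical helper, and treating a border chunk as completely inside when its clipped extent lies within the ROI is an assumption about how the rule applies at the array border.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: does chunk 'chunk_index' lie completely inside the ROI [start, stop)?
// Border chunks are clipped to the array shape before the test (assumption).
template <std::size_t N>
bool chunkInsideRoi(std::array<long, N> const & chunk_index,
                    std::array<long, N> const & start,
                    std::array<long, N> const & stop,
                    std::array<long, N> const & array_shape,
                    std::array<long, N> const & chunk_shape)
{
    for(std::size_t d = 0; d < N; ++d)
    {
        assert(0 <= start[d] && start[d] <= stop[d] && stop[d] <= array_shape[d]);
        long begin = chunk_index[d] * chunk_shape[d];
        long end = std::min(begin + chunk_shape[d], array_shape[d]);
        if(begin < start[d] || end > stop[d])
            return false;                   // the chunk is only partially covered by the ROI
    }
    return true;
}
```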
template<class U , class Stride > void checkoutSubarray (shape_type const & start, MultiArrayView< N, U, Stride > & subarray) const
Copy an ROI of the chunked array into an ordinary MultiArrayView.
The ROI's lower bound is given by 'start', its upper bound (in 'beyond' sense) is 'start + subarray.shape()'. Chunks in the ROI are only activated while the read is in progress.
template<class U , class Stride > void commitSubarray (shape_type const & start, MultiArrayView< N, U, Stride > const & subarray)
Copy an ordinary MultiArrayView into an ROI of the chunked array.
The ROI's lower bound is given by 'start', its upper bound (in 'beyond' sense) is 'start + subarray.shape()'. Chunks in the ROI are only activated while the write is in progress.
view_type subarray (shape_type const & start, shape_type const & stop)
Create a view to the specified ROI.
The view can be used like an ordinary MultiArrayView, but is a bit slower. All chunks intersecting the view remain active throughout the view's lifetime.
const_view_type subarray (shape_type const & start, shape_type const & stop) const
Create a read-only view to the specified ROI.
The view can be used like an ordinary MultiArrayView, but is a bit slower. All chunks intersecting the view remain active throughout the view's lifetime.
const_view_type const_subarray (shape_type const & start, shape_type const & stop) const
Create a read-only view to the specified ROI.
The view can be used like an ordinary MultiArrayView, but is a bit slower. All chunks intersecting the view remain active throughout the view's lifetime.
value_type getItem (shape_type const & point) const
Read the array element at index 'point'.
Since the corresponding chunk must potentially be activated first, this function may be slow and should mainly be used in debugging.
void setItem (shape_type const & point, value_type const & v)
Write the array element at index 'point'.
Since the corresponding chunk must potentially be activated first, this function may be slow and should mainly be used in debugging.
MultiArrayView<N-1, T, ChunkedArrayTag> bindAt (MultiArrayIndex dim, MultiArrayIndex index) const
Create a lower dimensional view to the chunked array.
Dimension 'dim' is bound at 'index', all other dimensions remain unchanged. All chunks intersecting the view remain active throughout the view's lifetime.
template<unsigned int M> MultiArrayView<N-1, T, ChunkedArrayTag> bind (difference_type_1 index) const
Create a lower dimensional view to the chunked array.
Dimension 'M' (given as a template parameter) is bound at 'index', all other dimensions remain unchanged. All chunks intersecting the view remain active throughout the view's lifetime.
MultiArrayView<N-1, T, ChunkedArrayTag> bindOuter (difference_type_1 index) const
Create a lower dimensional view to the chunked array.
Dimension 'N-1' is bound at 'index', all other dimensions remain unchanged. All chunks intersecting the view remain active throughout the view's lifetime.
template<int M, class Index > MultiArrayView<N-M, T, ChunkedArrayTag> bindOuter (const TinyVector< Index, M > & d) const
Create a lower dimensional view to the chunked array.
The M rightmost dimensions are bound to the indices given in 'd'. All chunks intersecting the view remain active throughout the view's lifetime.
MultiArrayView<N-1, T, ChunkedArrayTag> bindInner (difference_type_1 index) const
Create a lower dimensional view to the chunked array.
Dimension '0' is bound at 'index', all other dimensions remain unchanged. All chunks intersecting the view remain active throughout the view's lifetime.
template<int M, class Index > MultiArrayView<N-M, T, ChunkedArrayTag> bindInner (const TinyVector< Index, M > & d) const
Create a lower dimensional view to the chunked array.
The M leftmost dimensions are bound to the indices given in 'd'. All chunks intersecting the view remain active throughout the view's lifetime.
std::size_t cacheMaxSize () const
Get the number of chunks the cache will hold.
If there are any inactive chunks in the cache, these will be sent asleep until the max cache size is reached. The max cache size may be temporarily overridden when more chunks need to be active simultaneously.
void setCacheMaxSize (std::size_t c)
Set the number of chunks the cache will hold.
This should be big enough to hold all chunks that are frequently needed and must therefore be adapted to the application's access pattern.
© Ullrich Köthe (ullrich.koethe@iwr.uni-heidelberg.de)
html generated using doxygen and Python