MADNESS 0.10.1
madness::SystolicMatrixAlgorithm< T > Class Template Reference
Base class for parallel algorithms that employ a systolic loop to generate all row pairs in parallel.
#include <systolic.h>
Public Member Functions | |
SystolicMatrixAlgorithm (DistributedMatrix< T > &A, int tag, int nthread=ThreadPool::size()+1) | |
A must be a column distributed matrix with an even column tile >= 2. | |
virtual | ~SystolicMatrixAlgorithm () |
virtual bool | converged (const TaskThreadEnv &env) const =0 |
Invoked simultaneously by all threads after each sweep to test for convergence. | |
virtual void | end_iteration_hook (const TaskThreadEnv &env) |
Invoked by all threads at the end of each iteration before convergence test. | |
int64_t | get_coldim () const |
Returns length of column. | |
ProcessID | get_rank () const |
Returns rank of this process in the world. | |
int64_t | get_rowdim () const |
Returns length of row. | |
World & | get_world () const |
Returns a reference to the world. | |
virtual void | kernel (int i, int j, T *rowi, T *rowj)=0 |
Threadsafe routine to apply the operation to rows i and j of the matrix. | |
void | run (World &world, const TaskThreadEnv &env) |
Invoked by the task queue to run the algorithm with multiple threads. | |
void | solve_sequential () |
Invoked by the user to run the algorithm with one thread, mostly for debugging. | |
virtual void | start_iteration_hook (const TaskThreadEnv &env) |
Invoked by all threads at the start of each iteration. | |
Public Member Functions inherited from madness::TaskInterface | |
TaskInterface (const TaskAttributes &attr) | |
Create a new task with zero dependencies and given attributes. | |
TaskInterface (int ndepend, const char *caller, const TaskAttributes attr=TaskAttributes()) | |
TaskInterface (int ndepend=0, const TaskAttributes attr=TaskAttributes()) | |
Create a new task with ndepend dependencies (default 0) and given attributes. | |
virtual | ~TaskInterface () |
World * | get_world () const |
virtual void | run (World &) |
Runs a single-threaded task ... derived classes must implement this. | |
Public Member Functions inherited from madness::PoolTaskInterface | |
PoolTaskInterface () | |
Default constructor. | |
PoolTaskInterface (const TaskAttributes &attr) | |
virtual | ~PoolTaskInterface ()=default |
Destructor. | |
void | execute () |
void | set_nthread (int nthread) |
Call this to reset the number of threads before the task is submitted. | |
Public Member Functions inherited from madness::TaskAttributes | |
TaskAttributes (const TaskAttributes &attr) | |
Copy constructor. | |
TaskAttributes (unsigned long flags=0) | |
Sets the attributes to the desired values. | |
virtual | ~TaskAttributes () |
int | get_nthread () const |
Get the number of threads. | |
bool | is_generator () const |
Test if the generator attribute is true. | |
bool | is_high_priority () const |
Test if the high priority attribute is true. | |
bool | is_stealable () const |
Test if the stealable attribute is true. | |
template<typename Archive > | |
void | serialize (Archive &ar) |
Serializes the attributes for I/O. | |
TaskAttributes & | set_generator (bool generator_hint) |
Sets the generator attribute. | |
TaskAttributes & | set_highpriority (bool hipri) |
Sets the high priority attribute. | |
void | set_nthread (int nthread) |
Set the number of threads. | |
TaskAttributes & | set_stealable (bool stealable) |
Sets the stealable attribute. | |
Public Member Functions inherited from madness::DependencyInterface | |
DependencyInterface (int ndep, const char *caller) | |
DependencyInterface (int ndep=0) | |
virtual | ~DependencyInterface () |
Destructor. | |
void | dec () |
Decrement the number of dependencies and invoke the callback if ndepend==0. | |
void | dec_debug (const char *caller) |
void | inc () |
Increment the number of dependencies. | |
void | inc_debug (const char *caller) |
Same as inc(), but keeps track of caller; calling dec_debug() will signal an error if no matching inc_debug() has been invoked. | |
int | ndep () const |
Returns the number of unsatisfied dependencies. | |
void | notify () |
Invoked by callbacks to notify of dependencies being satisfied. | |
void | notify_debug (const char *caller) |
Overload of CallbackInterface::notify_debug(), updates dec() | |
bool | probe () const |
Returns true if ndepend == 0 (no unsatisfied dependencies). | |
void | register_callback (CallbackInterface *callback) |
Registers a callback that will be executed when ndepend==0; immediately invoked if ndepend==0. | |
void | register_final_callback (CallbackInterface *callback) |
Registers the final callback to be executed when ndepend==0; immediately invoked if ndepend==0. | |
Public Member Functions inherited from madness::CallbackInterface | |
virtual | ~CallbackInterface () |
Private Member Functions | |
void | cycle () |
Cycles data around the loop ... only one thread should invoke this. | |
virtual void | get_id (std::pair< void *, unsigned short > &id) const |
Get the task id. | |
void | iteration (const int nthread) |
void | unshuffle () |
Call this after iterating to restore correct order of rows in original matrix. | |
Private Attributes | |
DistributedMatrix< T > & | A |
const int64_t | coldim |
A(coldim,rowdim) | |
std::vector< T * > | iptr |
std::vector< T * > | jptr |
Indirection for implementing the cyclic buffer (open question in the source: should these be volatile?) | |
std::vector< int64_t > | map |
Used to keep track of actual row indices. | |
const int64_t | nlocal |
No. of local pairs. | |
const int64_t | nproc |
No. of processes with rows of the matrix (not size of world) | |
const ProcessID | rank |
Rank of current process. | |
const int64_t | rowdim |
A(coldim,rowdim) | |
const int | tag |
MPI tag to be used for messages. | |
Additional Inherited Members | |
Static Public Member Functions inherited from madness::TaskAttributes | |
static TaskAttributes | generator () |
static TaskAttributes | hipri () |
static TaskAttributes | multi_threaded (int nthread) |
Static Public Attributes inherited from madness::TaskInterface | |
static bool | debug = false |
Static Public Attributes inherited from madness::TaskAttributes | |
static const unsigned long | GENERATOR = 1ul<<8 |
Mask for generator bit. | |
static const unsigned long | HIGHPRIORITY = GENERATOR<<2 |
Mask for priority bit. | |
static const unsigned long | NTHREAD = 0xff |
Mask for nthread byte. | |
static const unsigned long | STEALABLE = GENERATOR<<1 |
Mask for stealable bit. | |
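The four masks listed above can be read as a small bit layout: the low byte holds the thread count and the next three bits hold the generator, stealable and high-priority flags. The sketch below copies the mask values from the listing; the helper functions `with_nthread`, `nthread_of` and `has` are hypothetical and only illustrate how such masks are usually combined, not how MADNESS stores its attributes internally.

```cpp
#include <cassert>

// Mask values copied from the listing above.
constexpr unsigned long NTHREAD      = 0xff;            // low byte: thread count
constexpr unsigned long GENERATOR    = 1ul << 8;        // 0x100
constexpr unsigned long STEALABLE    = GENERATOR << 1;  // 0x200
constexpr unsigned long HIGHPRIORITY = GENERATOR << 2;  // 0x400

// Hypothetical helpers (not MADNESS API): pack and unpack a flags word.
constexpr unsigned long with_nthread(unsigned long flags, int nthread) {
    // Clear the low byte, then store the thread count in it.
    return (flags & ~NTHREAD) | (static_cast<unsigned long>(nthread) & NTHREAD);
}

constexpr int nthread_of(unsigned long flags) {
    return static_cast<int>(flags & NTHREAD);
}

constexpr bool has(unsigned long flags, unsigned long mask) {
    return (flags & mask) != 0;
}

// The masks do not overlap, so flags and thread count can share one word.
static_assert((NTHREAD & (GENERATOR | STEALABLE | HIGHPRIORITY)) == 0,
              "flag bits must not overlap the nthread byte");
```

For example, `with_nthread(STEALABLE | HIGHPRIORITY, 5)` yields a word whose thread count reads back as 5 while both flags stay set.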
Protected Member Functions inherited from madness::TaskInterface | |
virtual void | run (const TaskThreadEnv &env) |
Override this method to implement a multi-threaded task. | |
Protected Member Functions inherited from madness::CallbackInterface | |
virtual void | notify_debug_impl (const char *caller) |
Static Protected Member Functions inherited from madness::PoolTaskInterface | |
template<typename fnobjT > | |
static std::enable_if<!(detail::function_traits< fnobjT >::value||detail::memfunc_traits< fnobjT >::value)>::type | make_id (std::pair< void *, unsigned short > &id, const fnobjT &) |
template<typename fnT > | |
static std::enable_if< detail::function_traits< fnT >::value||detail::memfunc_traits< fnT >::value >::type | make_id (std::pair< void *, unsigned short > &id, fnT fn) |
Base class for parallel algorithms that employ a systolic loop to generate all row pairs in parallel.
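To make "a systolic loop that generates all row pairs" concrete, the following single-process sketch uses the classic round-robin (circle) schedule: rows sit in two lines, facing rows are paired, and all rows but one then shift one position around the loop. It is an illustration under stated assumptions, not the MADNESS implementation: `all_pairs_round_robin` is a hypothetical helper, no data is distributed or communicated, and the row count is assumed even (in the spirit of the even-tile requirement on the constructor).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper, not part of MADNESS: list the pairs visited by one
// sweep of a round-robin schedule over n rows (n even, n >= 2).
std::vector<std::pair<int, int>> all_pairs_round_robin(int n) {
    assert(n >= 2 && n % 2 == 0);
    const int half = n / 2;
    std::vector<int> top(half), bottom(half);
    for (int k = 0; k < half; ++k) {
        top[k] = k;
        bottom[k] = n - 1 - k;
    }

    std::vector<std::pair<int, int>> pairs;
    for (int step = 0; step < n - 1; ++step) {
        // Pair the rows that currently face each other.
        for (int k = 0; k < half; ++k) {
            int i = top[k], j = bottom[k];
            if (i > j) std::swap(i, j);
            pairs.emplace_back(i, j);
        }
        // Shift every row except top[0] one position around the loop.
        if (half > 1) {
            const int from_bottom = bottom.front();
            const int from_top = top.back();
            for (int k = half - 1; k >= 2; --k) top[k] = top[k - 1];
            top[1] = from_bottom;
            for (int k = 0; k + 1 < half; ++k) bottom[k] = bottom[k + 1];
            bottom[half - 1] = from_top;
        }
    }
    return pairs;
}
```

After n-1 steps every one of the n(n-1)/2 unordered pairs has been produced exactly once, and the pairs within a step are disjoint, which is what lets them be processed concurrently.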
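madness::SystolicMatrixAlgorithm< T >::SystolicMatrixAlgorithm (DistributedMatrix< T > &A, int tag, int nthread=ThreadPool::size()+1) [inline]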
A must be a column distributed matrix with an even column tile >= 2.
It is assumed that the main thread invokes this constructor.
[in,out] | A | The matrix on which the algorithm is performed and modified in-place |
[in] | tag | The MPI tag used for communication (obtain from world.mpi.comm().unique_tag()) |
[in] | nthread | The number of local threads to use (default is the main thread plus all threads in the pool) |
References madness::SystolicMatrixAlgorithm< T >::coldim, madness::SystolicMatrixAlgorithm< T >::iptr, madness::SystolicMatrixAlgorithm< T >::jptr, lo, MADNESS_ASSERT, madness::SystolicMatrixAlgorithm< T >::map, madness::SystolicMatrixAlgorithm< T >::nlocal, madness::SystolicMatrixAlgorithm< T >::nproc, p(), madness::SystolicMatrixAlgorithm< T >::rank, and madness::TaskAttributes::set_nthread().
virtual madness::SystolicMatrixAlgorithm< T >::~SystolicMatrixAlgorithm () [inline, virtual]
virtual bool madness::SystolicMatrixAlgorithm< T >::converged (const TaskThreadEnv &env) const [pure virtual]
Invoked simultaneously by all threads after each sweep to test for convergence.
There is a thread barrier before and after the invocation of this routine.
[in] | env | The madness thread environment in case synchronization between threads is needed during computation of the convergence condition. |
Implemented in madness::SystolicFixOrbitalOrders, madness::SystolicPMOrbitalLocalize, and TestSystolicMatrixAlgorithm< T >.
Referenced by madness::SystolicMatrixAlgorithm< T >::run().
void madness::SystolicMatrixAlgorithm< T >::cycle () [inline, private]
Cycles data around the loop ... only one thread should invoke this.
References madness::World::await(), madness::SystolicMatrixAlgorithm< T >::coldim, madness::SystolicMatrixAlgorithm< T >::iptr, madness::WorldMpiInterface::Irecv(), madness::SystolicMatrixAlgorithm< T >::jptr, MADNESS_ASSERT, madness::World::mpi, madness::SystolicMatrixAlgorithm< T >::nlocal, madness::SystolicMatrixAlgorithm< T >::nproc, madness::SystolicMatrixAlgorithm< T >::rank, madness::WorldMpiInterface::Recv(), madness::SystolicMatrixAlgorithm< T >::rowdim, madness::WorldMpiInterface::Send(), T(), madness::SystolicMatrixAlgorithm< T >::tag, and madness::TaskInterface::world.
Referenced by madness::SystolicMatrixAlgorithm< T >::iteration().
virtual void madness::SystolicMatrixAlgorithm< T >::end_iteration_hook (const TaskThreadEnv &env) [inline, virtual]
Invoked by all threads at the end of each iteration before convergence test.
There is a thread barrier before and after the invocation of this routine. Note that the converged() method is const, whereas this hook can modify the class.
[in] | env | The madness thread environment in case synchronization between threads is needed during startup. |
Reimplemented in madness::SystolicFixOrbitalOrders, and madness::SystolicPMOrbitalLocalize.
Referenced by madness::SystolicMatrixAlgorithm< T >::iteration().
int64_t madness::SystolicMatrixAlgorithm< T >::get_coldim () const [inline]
Returns length of column.
References madness::SystolicMatrixAlgorithm< T >::coldim.
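virtual void madness::SystolicMatrixAlgorithm< T >::get_id (std::pair< void *, unsigned short > &id) const [inline, private, virtual]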
Get the task id.
id | The id to set for this task |
Reimplemented from madness::PoolTaskInterface.
References madness::PoolTaskInterface::make_id().
ProcessID madness::SystolicMatrixAlgorithm< T >::get_rank () const [inline]
Returns rank of this process in the world.
References madness::SystolicMatrixAlgorithm< T >::rank.
int64_t madness::SystolicMatrixAlgorithm< T >::get_rowdim () const [inline]
Returns length of row.
References madness::SystolicMatrixAlgorithm< T >::rowdim.
World & madness::SystolicMatrixAlgorithm< T >::get_world () const [inline]
Returns a reference to the world.
Referenced by madness::SystolicFixOrbitalOrders::end_iteration_hook(), and madness::SystolicPMOrbitalLocalize::end_iteration_hook().
void madness::SystolicMatrixAlgorithm< T >::iteration (const int nthread) [inline, private]
References madness::SystolicMatrixAlgorithm< T >::coldim, madness::SystolicMatrixAlgorithm< T >::cycle(), madness::SystolicMatrixAlgorithm< T >::end_iteration_hook(), madness::SystolicMatrixAlgorithm< T >::iptr, madness::SystolicMatrixAlgorithm< T >::jptr, madness::SystolicMatrixAlgorithm< T >::kernel(), madness::SystolicMatrixAlgorithm< T >::map, madness::SystolicMatrixAlgorithm< T >::nlocal, madness::SystolicMatrixAlgorithm< T >::rank, and madness::SystolicMatrixAlgorithm< T >::start_iteration_hook().
Referenced by madness::SystolicMatrixAlgorithm< T >::run().
virtual void madness::SystolicMatrixAlgorithm< T >::kernel (int i, int j, T *rowi, T *rowj) [pure virtual]
Threadsafe routine to apply the operation to rows i and j of the matrix.
[in] | i | First row index in the matrix |
[in] | j | Second row index in the matrix |
[in,out] | rowi | Pointer to row i of the matrix (to be modified by kernel in-place) |
[in,out] | rowj | Pointer to row j of the matrix (to be modified by kernel in-place) |
Implemented in TestSystolicMatrixAlgorithm< T >.
Referenced by madness::SystolicMatrixAlgorithm< T >::iteration().
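As an illustration of the kind of work a kernel might do on a row pair, the sketch below applies a one-sided Jacobi rotation that makes the two rows mutually orthogonal in place. This is an assumption chosen for illustration, not the kernel of any MADNESS class. `orthogonalize_rows` is a hypothetical free function that takes the row length `n` explicitly, whereas a derived class would presumably obtain it from the class itself.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical kernel body (not taken from MADNESS): rotate rows i and j in
// place so that their dot product becomes zero. It reads and writes only the
// two rows it is given, so calls on disjoint row pairs do not interfere.
void orthogonalize_rows(double* rowi, double* rowj, std::size_t n) {
    assert(rowi != nullptr && rowj != nullptr);
    double aii = 0.0, ajj = 0.0, aij = 0.0;
    for (std::size_t k = 0; k < n; ++k) {
        aii += rowi[k] * rowi[k];
        ajj += rowj[k] * rowj[k];
        aij += rowi[k] * rowj[k];
    }
    if (aij == 0.0) return;  // already orthogonal, nothing to do

    // Rotation angle that zeroes the dot product of the rotated rows:
    // tan(2*theta) = 2*aij / (aii - ajj).
    const double theta = 0.5 * std::atan2(2.0 * aij, aii - ajj);
    const double c = std::cos(theta);
    const double s = std::sin(theta);
    for (std::size_t k = 0; k < n; ++k) {
        const double x = rowi[k];
        const double y = rowj[k];
        rowi[k] = c * x + s * y;
        rowj[k] = -s * x + c * y;
    }
}
```

Because the update is a plane rotation, the sum of the squared norms of the two rows is preserved, which gives a cheap sanity check when debugging a kernel of this type.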
virtual void madness::SystolicMatrixAlgorithm< T >::run (World &world, const TaskThreadEnv &env) [inline, virtual]
Invoked by the task queue to run the algorithm with multiple threads.
This is a collective call ... all processes in world should submit this task.
Reimplemented from madness::TaskInterface.
References madness::SystolicMatrixAlgorithm< T >::converged(), madness::SystolicMatrixAlgorithm< T >::iteration(), madness::TaskThreadEnv::nthread(), and madness::SystolicMatrixAlgorithm< T >::unshuffle().
Referenced by madness::SystolicMatrixAlgorithm< T >::solve_sequential().
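The references above suggest a call sequence in which each iteration is bracketed by the start and end hooks, the convergence test follows each sweep, and the row order is restored once at the end. The single-threaded sketch below mimics that sequence with stand-in classes. `SweepDriver` and `Recorder` are hypothetical, use no MADNESS types, omit the thread barriers and communication, and the exact ordering inside MADNESS may differ.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the control flow; not the MADNESS class.
class SweepDriver {
public:
    virtual ~SweepDriver() = default;

    void run() {
        do {
            start_iteration_hook();
            sweep();               // would apply kernel() to every row pair
            end_iteration_hook();  // runs before the convergence test
        } while (!converged());
        unshuffle();               // restore physical row order once, at the end
    }

protected:
    virtual void start_iteration_hook() {}
    virtual void end_iteration_hook() {}
    virtual void sweep() = 0;
    virtual bool converged() const = 0;
    virtual void unshuffle() {}
};

// Example subclass: records the order of calls and stops after three sweeps.
class Recorder : public SweepDriver {
public:
    mutable std::vector<std::string> log;  // mutable so converged() can log
    int sweeps = 0;

protected:
    void start_iteration_hook() override { log.push_back("start"); }
    void sweep() override { ++sweeps; log.push_back("sweep"); }
    void end_iteration_hook() override { log.push_back("end"); }
    bool converged() const override {
        log.push_back("converged?");
        return sweeps >= 3;
    }
    void unshuffle() override { log.push_back("unshuffle"); }
};
```

Keeping `converged()` const while the end hook is non-const mirrors the split described for this class: state updates belong in the hook, and the test itself only inspects them.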
void madness::SystolicMatrixAlgorithm< T >::solve_sequential () [inline]
Invoked by the user to run the algorithm with one thread, mostly for debugging.
This is a collective call ... all processes in world should call this routine.
References madness::SystolicMatrixAlgorithm< T >::run().
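virtual void madness::SystolicMatrixAlgorithm< T >::start_iteration_hook (const TaskThreadEnv &env) [inline, virtual]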
Invoked by all threads at the start of each iteration.
There is a thread barrier before and after the invocation of this routine.
[in] | env | The madness thread environment in case synchronization between threads is needed during startup. |
Reimplemented in madness::SystolicFixOrbitalOrders, madness::SystolicPMOrbitalLocalize, and TestSystolicMatrixAlgorithm< T >.
Referenced by madness::SystolicMatrixAlgorithm< T >::iteration().
void madness::SystolicMatrixAlgorithm< T >::unshuffle () [inline, private]
Call this after iterating to restore correct order of rows in original matrix.
At the end of each iteration the matrix rows are logically back in their correct order. However, because indirection is used to reduce data motion, the underlying data may be in a different physical order if the local column dimension is not a factor of the number of cycles. This routine restores the physical order.
Only one thread should invoke this routine.
References madness::SystolicMatrixAlgorithm< T >::coldim, madness::BaseTensor::dims(), madness::SystolicMatrixAlgorithm< T >::iptr, madness::SystolicMatrixAlgorithm< T >::jptr, madness::SystolicMatrixAlgorithm< T >::nlocal, madness::SystolicMatrixAlgorithm< T >::nproc, madness::Tensor< T >::ptr(), madness::SystolicMatrixAlgorithm< T >::rank, madness::SystolicMatrixAlgorithm< T >::rowdim, madness::BaseTensor::size(), and T().
Referenced by madness::SystolicMatrixAlgorithm< T >::run().
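The distinction between logical and physical row order can be shown with a small pointer table. In the sketch below, rotating the loop only moves pointers, and a final pass copies the rows back so that logical row k is physically row k again. `RowBuffer` is a hypothetical illustration, not MADNESS's data structure; it ignores distribution across processes and is not meant to be copied, since the pointers refer into its own storage.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration of indirection; not MADNESS code.
// ptr[k] records where logical row k currently lives inside storage.
struct RowBuffer {
    std::size_t nrow, ncol;
    std::vector<double> storage;  // nrow * ncol values, row-major
    std::vector<double*> ptr;     // logical row -> physical location

    RowBuffer(std::size_t nrow_, std::size_t ncol_)
        : nrow(nrow_), ncol(ncol_), storage(nrow_ * ncol_, 0.0), ptr(nrow_) {
        assert(nrow > 0 && ncol > 0);
        for (std::size_t k = 0; k < nrow; ++k) ptr[k] = &storage[k * ncol];
    }

    // Cyclic shift of the logical order: only pointers move, never row data.
    void rotate() {
        double* last = ptr.back();
        for (std::size_t k = nrow - 1; k > 0; --k) ptr[k] = ptr[k - 1];
        ptr[0] = last;
    }

    // Copy rows so that logical row k is physically row k again.
    void unshuffle() {
        std::vector<double> tmp(storage.size());
        for (std::size_t k = 0; k < nrow; ++k)
            for (std::size_t c = 0; c < ncol; ++c)
                tmp[k * ncol + c] = ptr[k][c];
        storage.swap(tmp);
        for (std::size_t k = 0; k < nrow; ++k) ptr[k] = &storage[k * ncol];
    }
};
```

The copy is paid once after iterating rather than on every shift, which is the trade-off the description above alludes to.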
DistributedMatrix< T > & madness::SystolicMatrixAlgorithm< T >::A [private]
const int64_t madness::SystolicMatrixAlgorithm< T >::coldim [private]
std::vector< T * > madness::SystolicMatrixAlgorithm< T >::iptr [private]
std::vector< T * > madness::SystolicMatrixAlgorithm< T >::jptr [private]
Indirection for implementing the cyclic buffer (open question in the source: should these be volatile?)
Referenced by madness::SystolicMatrixAlgorithm< T >::SystolicMatrixAlgorithm(), madness::SystolicMatrixAlgorithm< T >::cycle(), madness::SystolicMatrixAlgorithm< T >::iteration(), and madness::SystolicMatrixAlgorithm< T >::unshuffle().
std::vector< int64_t > madness::SystolicMatrixAlgorithm< T >::map [private]
Used to keep track of actual row indices.
Referenced by madness::SystolicMatrixAlgorithm< T >::SystolicMatrixAlgorithm(), and madness::SystolicMatrixAlgorithm< T >::iteration().
const int64_t madness::SystolicMatrixAlgorithm< T >::nlocal [private]
const int64_t madness::SystolicMatrixAlgorithm< T >::nproc [private]
No. of processes with rows of the matrix (not size of world)
Referenced by madness::SystolicMatrixAlgorithm< T >::SystolicMatrixAlgorithm(), madness::SystolicMatrixAlgorithm< T >::cycle(), and madness::SystolicMatrixAlgorithm< T >::unshuffle().
const ProcessID madness::SystolicMatrixAlgorithm< T >::rank [private]
Rank of current process.
Referenced by madness::SystolicMatrixAlgorithm< T >::SystolicMatrixAlgorithm(), madness::SystolicMatrixAlgorithm< T >::cycle(), madness::SystolicMatrixAlgorithm< T >::get_rank(), madness::SystolicMatrixAlgorithm< T >::iteration(), and madness::SystolicMatrixAlgorithm< T >::unshuffle().
const int64_t madness::SystolicMatrixAlgorithm< T >::rowdim [private]
const int madness::SystolicMatrixAlgorithm< T >::tag [private]
MPI tag to be used for messages.
Referenced by madness::SystolicMatrixAlgorithm< T >::cycle().