This directory, paral, contains tests which exercise parallel
features of the ABINIT package.

Copyright (C) 1998-2012 ABINIT group (XG,LSi)
This file is distributed under the terms of the
GNU General Public License, see ~abinit/COPYING
or http://www.gnu.org/copyleft/gpl.txt .
For the initials of contributors, see ~abinit/doc/developers/contributors.txt .

=============================================================================

Most of these tests are designed primarily to exercise parts of the code
quickly, NOT necessarily to give physically sensible results.
For tests of correctness, see the Tutorial directory.
For greater speed, some tests are not run to full convergence.
Also the quality parameters (especially ecut) are minimal, i.e.
the calculations are underconverged.

Tests A, B, D, E, H, I, J, and K are NOT intended to be used as a measure of the
parallelisation speed-up : they contain too much initialisation.
Tests C, F, G, M and N should be OK for some speed-up testing : they are more
realistic than the others. They represent the case of a large number
of k points. In this respect, test C could be modified to use a still finer
grid of k points (ngkpt and nkpt should be changed).
One might also test localrdwf=0 as well as localrdwf=1.

WARNING : tests F and G use outputs of test C, so test C must be run
before tests F or G can be run. One run of test C is enough
to initialize all further uses of tests F and G in the same test directory.

Test "kpt+spin" is intended for speed-up testing. Its use is
explained in the last part of this file.

==============================================================================

To run these tests :

1. Submit the 'Run' script. See the header of the Run file, for a
   description of the procedure.
   The script 'Run' will create a subdirectory with the name_of_machine and the
   date, where all the results will be placed.
   Beware : only some machine names are allowed, since for each machine
   the procedure to launch parallel execution is different !!

2. In the directory so created, you will find, for each test case that you have
   run, a log file (with the name of the test case), an output
   file, and a 'diff.xxx' file, automatically created by making
   a 'diff' with respect to the output files in the "Refs" subdirectory.
   The "Refs" subdirectory contains output files from a recent version of the code.
   There may be large differences in timing but there should only
   be minor differences in the output of physical quantities.

3. There is also a global report file, generated by the use of the
   fldiff script. Its name is fldiff.report . See the install_notes
   in the Infos directory for information about the use of this file.

**********

Test cases :

set A :  Si in diamond structure; 60 special points in core; low ecut.
set B :  Si in diamond structure; 60 special points, not in core; low ecut.
set C :  FCC Al metallic; 10 special points
set D :  Molybdenum slab (5 atoms+3 vacuum), with ixc=1. 4 k-points, in core.
            Use iprcel=45 for SCF cycle (iscf=3).
set E :  GaAs in zinc-blende structure; GS and RF calculation
            (similar to test v2 #30, except that only two q points are
             considered).
            The 5th dataset computes 3DTE.
            localrdwf=1
            Also checks parallelism for Raman calculations.
set F :  FCC Al metallic : 2 non-self-consistent calculations with 256 k-points,
         for q=Gamma and q=1/4 -1/8 1/8, from the results of set C.
set G :  FCC Al metallic : phonon RF calculation at q=1/4 -1/8 1/8 .
         Needs the output files of tests C and F.
set H :  GaAs in zinc-blende structure; GS and RF calculation
            (similar to set E, except that localrdwf=0)
set I :  Fe in FCC structure; GS and RF calculation (RF at q=0 0 0)
            Test the parallelism on both spin and k points
set J :  GaAs in zinc-blende structure; GS and RF calculation
            (similar to set E, except that mkmem,mkqmem,mk1mem=0)
set K : N2 molecule
        Test TDDFT in parallel, with nsppol=2 even if the molecule is non spin-polarized
set L : LiNbO3, parallelism over k points
            (coming from test v4#55 written by MVeithen, then modified by DHamann)
            Test parallelism of the Berry phase calculation, and finite electric field calculation.
set M : Si, Bulk, 2 atoms, one-shot GW calculation, parallelism over k points (contributed by RShaltaf).
set N : Si, Bulk, 2 atoms, one-shot GW calculation, parallelism over bands (contributed by RShaltaf).
set O : Si, Bulk, 2 atoms, qp-SC calculation, parallelism over k points
set P : Si, Bulk, 2 atoms, paral_kgb.
        Test of ground state with different occs (7 and 0),
        and also ionmov 3. Only with 0 and 4 procs.
set Q : Si, Bulk, 2 atoms, parallelism with -DMPI_IO. Test of ground state.
        Only with 4 procs, no sequential version (tests accesswf 1).
        Must be launched with sh ../../tests/Scripts/drive-parallel-tests.sh machname Q
set R : C-diamond, Bulk, 2 atoms, paral_kgb, with PAW.
        Test of ground state with different occs(7 and 0),
        and also ionmov 3. Only with 0 and 4 procs.
set T : He FCC solid in conventional cell (4 atoms).
        Test the recursion algorithm (for high-temperature calculations).
        Only with 0 and 4 procs.
set U : Si, Bulk, 2 atoms, parallelism over k-points for the KSS file creation
        parallelism over bands for GW without PPM (contributed by FBruneval)
set V : Na2, Molecule, 2 atoms, parallelism over bands for scfGW with a cut-off interaction
        (contributed by FBruneval)
set W : C-diamond, Bulk, 2 atoms , paral_kgb, with PAW.
        Test of ground state with different occs(7 and 0),
        and also ionmov 3. Only with 0 and 4 procs.
        Similar to test R, except istwfk.
        Must be launched with sh ../../tests/Scripts/drive-parallel-tests.sh machname W
set X : C-diamond, Bulk, 2 atoms, paral_kgb, with PAW.
        Test the triple parallelisation.
        We cannot check the distribution npband*npfft*npkpt=2*2*2=8 processors,
        this number being not allowed in the test procedure.
        Here, we only test the parallelisation over FFT and kpoints:
        npband*npfft*npkpt=1*2*2=4 processors.
        Other distributions are checked in test tY.in.
set Y : C-diamond, Bulk, 2 atoms, paral_kgb, with PAW.
        Test the triple parallelisation.
        We cannot check the distribution npband*npfft*npkpt=2*2*2=8 processors,
        this number being not allowed in the test procedure.
        Here, we only test the parallelisation over bands and spins:
        npband*npfft*npkpt=2*1*2=4 processors.
        In addition, we also test here various features of the bandfft-kpt parallelisation,
        in particular the bandpp, istwfk=2 and wfoptalg=14 variables.
set Z : GaAs in hypothetical wurtzite structure; GS and RF calculation
        parallelism over the perturbations (contributed by PPlaenitz)
        (not activated in the automatic testing suite at present - v4.7)
set AA : PAW Berry phase calculation of Born effective charge in AlAs by
         finite electric fields (contributed by J. Zwanziger, adapted from efield
         tutorial). Note that the number of points must be a multiple of the number
         of processors, which is not convenient.
set AB : Calculation of the electron-phonon band structure renormalisation for Diamond,
         due to the phonon at the Gamma point.
         The computation with ecut=20 Ha and elph2_imagden 0.0 gives 24.482 meV for the
         HOMO shift at Gamma, while the finite-difference of phonon frequencies
         gives 28.975 meV, in excellent agreement with frozen-phonon changes of HOMO eigenenergy.
         The difference is due to the Non-Site-Diagonal Debye-Waller contribution, that
         was explicitly obtained by a finite-difference approach.
set AC : Test the string method within parallelization over images
         Inspired by test v6#22.
         Hydrogen diatomic molecule in a cell, close to BCC
         7 images, exploring the transition path.
         Three datasets, testing each value of prtvolimg.
         Processors distribution automatically determined:
            # With 1  proc,  should be: npkpt 1, npimage 1
            # With 2  procs, should be: npkpt 1, npimage 2
            # With 4  procs, should be: npkpt 1, npimage 4
            # With 10 procs, should be: npkpt 2, npimage 5
         (from M. Torrent)
set AD : Test the parallelization over spinorial components of WF
         Bi A7 structure (2 atoms, treated as semi-conductor),
         using PAW, within LDA and spin-orbit coupling.
          - with zero magnetization      (nspden=1, nspinor=2)
          - with non-collinear magnetism (nspden=4, nspinor=2)
         (from M. Torrent)
set AE : C-diamond, Bulk, 2 atoms, paral_kgb, with PAW.
        Test of ground state with different occs(7 and 0),
        and also ionmov 3. Only with 4 procs.
        Same as test R: tests the automatic parallelisation.
set AF : Test k-point parallelization for selfconsistent DFT+DMFT calculations.
         NiO 
set AH : C-diamond, Bulk, 2 atoms, with PAW.
        Test of ground state with different occs(7 and 0),
        and also ionmov 3. Only with 4 procs.
        test the automatic parallelisation when a processor is unoccupied.
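
The processor counts quoted for sets X and Y follow from the product
npband*npfft*npkpt. The short Python sketch below only illustrates that
bookkeeping. The function names are hypothetical (they are not part of ABINIT
or of the test scripts), and the list of allowed processor counts is an
assumption read off the cases 1, 2, 4 and 10 used by this test suite.

```python
# Illustrative only: these helpers are NOT part of ABINIT.
# Assumption: the test procedure runs with 1, 2, 4 or 10 processors.
ALLOWED_PROC_COUNTS = (1, 2, 4, 10)

def grid_size(npband, npfft, npkpt):
    """Number of processors required by a band/FFT/k-point distribution."""
    return npband * npfft * npkpt

def fits_test_procedure(nproc):
    """True if nproc is one of the processor counts assumed above."""
    return nproc in ALLOWED_PROC_COUNTS

# Distributions quoted for sets X and Y, then the full triple distribution.
for name, grid in (("X", (1, 2, 2)), ("Y", (2, 1, 2)), ("full", (2, 2, 2))):
    nproc = grid_size(*grid)
    print(name, grid, nproc, fits_test_procedure(nproc))
```

Under these assumptions the 2*2*2 distribution needs 8 processors, which is
not among the allowed counts, whereas both 4-processor distributions are.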


FIXME: case 0 below has no meaning anymore
For each set, one usually has the following cases :
0 use the sequential code (abinit), for check
1 use the parallel code (abinit), with only one processing element
2  use abinit, with two threads and two processing elements
4  use abinit, with four threads and four processing elements
10 use abinit, with ten threads and ten processing elements
   (not for test D, I and O, as there are only 4 k points for test D,
    2 spins and 2 k points for test I, and either 8 or 10 k points for test O)
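
The restriction on the 10-processor case can be illustrated with a small
Python sketch. It assumes that, for k-point and spin parallelisation, the
number of useful processors is bounded by the number of (k point, spin) pairs.
The function names are hypothetical and serve only as an illustration.

```python
# Illustrative only: these helpers are NOT part of ABINIT.
# Assumption: at most one processor per (k point, spin) pair can be used.

def max_useful_procs(nkpt, nsppol=1):
    """Upper bound on processors for k-point and spin parallelisation."""
    return nkpt * nsppol

def case_applies(nproc, nkpt, nsppol=1):
    """True if a run on nproc processors stays within that bound."""
    return nproc <= max_useful_procs(nkpt, nsppol)

# Test D: 4 k points. Test I: 2 k points and 2 spins.
print("D, 10 procs:", case_applies(10, nkpt=4))
print("I, 10 procs:", case_applies(10, nkpt=2, nsppol=2))
print("I,  4 procs:", case_applies(4, nkpt=2, nsppol=2))
```

With this bound, tests D and I cannot make use of more than 4 processors,
which is why the 10-processor case is skipped for them.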

=============================================================================

To test the speed-up on more than 10 processors, one
should use the files t_kpt+spin.in and t_kpt+spin.files .
They test the parallelisation over k-points and spins.

One is advised first to execute the above-mentioned test I,
and to check whether the output file is correct by looking at the fldiff.report file,
where the automatic analysis is performed. The test "kpt+spin"
contains parameters that make the run much longer, and much
more suitable for parallelisation, while still being quite realistic.

Assuming that test I went smoothly, the kpt+spin
test should be performed as follows :
(1) Create your own directory in ~abinit/tests/paral , or
  go inside the directory created for the test I
 (likely  ~abinit/tests/paral/tmp-<name_of_machine>_<yyyymmdd>)
(2) Execute :
 cp ../Input/t_kpt+spin.in .
 cp ../Input/t_kpt+spin.files .
(3) Run your job with a command like
 /usr/local/mpi-pgi4/bin/mpirun -np 32 ../../../src/main/abinit < t_kpt+spin.files >& log
 (this being for 32 processors).
 The main output file is called t_kpt+spin.out .
 At its end, it contains an analysis of the CPU and wall clock time.
 Run sequentially, this job takes between 3000 secs (IFC compiler)
 and 4000 secs (PGI compiler) on a PC Intel 2.8 GHz bought in May 2005.
 With 182 k-points and 2 spins, it might be parallelized over 364 processors,
 but the sequential part of the job is estimated at a bit more than 1%, so that the
 speed-up should saturate below 100 .
(4) You can edit the file t_kpt+spin.in, and increase
 ngkpt, or set ndtset 2 as proposed in the file,
 then go back to step 3 and run the job.
 Increasing the number of k points (through ngkpt, e.g. ngkpt 16 16 16) will
 allow a better maximal speed-up. However, the test becomes
 less realistic.
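
The saturation estimate given in step (3) can be reproduced with a simple
model. The Python sketch below assumes Amdahl's law, which this file does not
name explicitly, and takes the sequential fraction to be exactly 1% (the text
says "a bit more than 1%", so the real speed-up would be somewhat lower).
The function name is hypothetical.

```python
# Illustrative only: a simple Amdahl's-law model, not an ABINIT tool.

def amdahl_speedup(nproc, serial_fraction):
    """Ideal speed-up on nproc processors when a fraction of the
    run time (serial_fraction) cannot be parallelised."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / nproc)

# Assumption: exactly 1% sequential part (a lower bound on the quoted estimate).
serial_fraction = 0.01

for nproc in (32, 182, 364):
    print(nproc, "processors ->", round(amdahl_speedup(nproc, serial_fraction), 1))

# Upper limit of the speed-up for an unlimited number of processors.
print("limit ->", 1.0 / serial_fraction)
```

In this model the speed-up on 364 processors stays a little under 79, and it
can never exceed 1/0.01 = 100 however many processors are used.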

You can get more information about the t_kpt+spin.in ,
t_kpt+spin.files and corresponding output files by reading the
help file at http://www.abinit.org/Infos/abinit_help.html .

==============================================================================
