This directory, paral, contains tests which exercise parallel
features of the ABINIT package.

Copyright (C) 1998-2009 ABINIT group (XG,LSi)
This file is distributed under the terms of the
GNU General Public License, see ~abinit/COPYING
or http://www.gnu.org/copyleft/gpl.txt .
For the initials of contributors, see ~abinit/doc/developers/contributors.txt .

=============================================================================

Most of these tests are designed primarily to exercise parts of the code
quickly, NOT necessarily to give physically sensible results.
For tests of correctness, see the Tutorial directory.
For greater speed, some tests are not run to full convergence, and the
quality parameters (especially ecut) are minimal, i.e. the calculations
are underconverged.

Tests A, B, D, E, H, I, J, and K are NOT intended to be used as a measure of the
parallelisation speed-up : they contain too much initialisation.
Tests C, F, G, M and N should be OK for some speed-up testing : they are more
realistic than the others, representing the case of a large number
of k points. In this respect, test C could be modified to use a still finer
grid of k points (ngkpt and nkpt should be changed).
Also, one might test localrdwf=0 as well as localrdwf=1 .

WARNING : tests F and G use outputs of test C, so test C must be run
before tests F or G. One run of test C is enough
to initialize all further uses of tests F and G in the same test directory.

Test "kpoints+spin" is intended to serve for speed-up testing. Its use is
explained in the last part of this file.

==============================================================================

To run these tests :

1. Submit the 'Run' script. See the header of the Run file for a
   description of the procedure.
   The script 'Run' will create a subdirectory, named after the machine and the
   date, where all the results will be placed.
   Beware : only some machine names are allowed, since the procedure to
   launch a parallel execution differs from machine to machine !

2. In the directory so created, you will find, for each test case that you
   have run, a log file (with the name of the test case), an output
   file, and also a 'diff.xxx' file, automatically created by making
   a 'diff' with respect to the output files of the "Refs" subdirectory,
   which contains output files from a recent version of the code.
   There may be large differences in timing, but there should only
   be minor differences in the output of physical quantities.

3. There is also a global report file, generated by the
   fldiff script. Its name is fldiff.report . See the install_notes
   in the Infos directory for information about the use of this file.
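As an illustration of what the automatic comparison in points 2 and 3 amounts
to, the following self-contained sketch mimics it by hand. The file and
directory names here are purely illustrative, NOT the ones produced by the
Run script :

```shell
#!/bin/sh
# Illustrative sketch only : the real comparison is done by the Run and
# fldiff scripts.  A throwaway example is built so the sketch runs as-is.
mkdir -p Refs results
printf 'etotal  -8.8662238960E+00\n' > Refs/tA.out
printf 'etotal  -8.8662238964E+00\n' > results/tA.out
# For each output file, store the raw diff against the reference copy.
for ref in Refs/t*.out; do
  name=$(basename "$ref")
  diff "results/$name" "$ref" > "results/diff.$name"
  echo "wrote results/diff.$name"
done
```

A raw diff flags even the harmless last-digit difference shown above; the
fldiff script applies numerical tolerances instead, which is why its
fldiff.report is the file to inspect first.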

**********

Test cases :

set A :  Si in diamond structure; 60 special points in core; low ecut.
set B :  Si in diamond structure; 60 special points, not in core; low ecut.
set C :  FCC Al metallic; 10 special points
set D :  Molybdenum slab (5 atoms+3 vacuum), with ixc=1. 4 k-points, in core.
            Use iprcel=45 for SCF cycle (iscf=3).
set E :  GaAs in zinc-blende structure; GS and RF calculation
            (similar to test v2 #30, except that only two q points are
             considered)
            localrdwf=1
set F :  FCC Al metallic : 2 non-self-consistent calculations with 256 k-points,
         for q=Gamma and q=1/4 -1/8 1/8, from the results of set C.
set G :  FCC Al metallic : phonon RF calculation at q=1/4 -1/8 1/8 .
         Needs the output files of tests C and F.
set H :  GaAs in zinc-blende structure; GS and RF calculation
            (similar to set E, except that localrdwf=0)
set I :  Fe in FCC structure; GS and RF calculation (RF at q=0 0 0)
            Test the parallelism on both spin and k points
set J :  GaAs in zinc-blende structure; GS and RF calculation
            (similar to set E, except that mkmem,mkqmem,mk1mem=0)
set K :  GaAs in hypothetical wurtzite structure; GS and RF calculation
            parallelism over the perturbations (contributed by PPlaenitz)
            (not activated in the automatic testing suite at present - v4.7)
set L : Si3N4, parallelism over G basis set (contributed by THoefler).
            (not activated in the automatic testing suite at present - v4.7)
set M : Si, Bulk, 2 atoms, GW calculation, parallelism over k points (contributed by RShaltaf).
set N : Si, Bulk, 2 atoms, GW calculation, parallelism over bands (contributed by RShaltaf).
set O : Si, Bulk, 2 atoms, qp-SC calculation, parallelism over k points
set P : Si, Bulk, 2 atoms, paral_kgb.
        Test of ground state with different occs (7 and 0),
        and also ionmov 3. Only with 0 and 4 procs.
set Q : Si, Bulk, 2 atoms, parallelism with -DMPI_IO. Test of ground state.
        Only with 4 procs, no sequential version (tests accesswf 1).
        Must be launched with sh ../../tests/Scripts/drive-parallel-tests.sh machname Q
set R : C-diamond, Bulk, 2 atoms, paral_kgb, with PAW.
        Test of ground state with different occs (7 and 0),
        and also ionmov 3. Only with 0 and 4 procs.
set T : He FCC solid in conventional cell (4 atoms).
        Test the recursion algorithm (for high-temperature calculations).
        Only with 0 and 4 procs.
set U : Si, Bulk, 2 atoms, parallelism over k-points for the KSS file creation,
        and parallelism over bands for GW without PPM (contributed by FBruneval)
set V : Na2, Molecule, 2 atoms, parallelism over bands for scfGW
        with a cut-off interaction (contributed by FBruneval)
set W : C-diamond, Bulk, 2 atoms, paral_kgb, with PAW.
        Test of ground state with different occs (7 and 0),
        and also ionmov 3. Only with 0 and 4 procs.
        Similar to test R, except istwfk.
        Must be launched with sh ../../tests/Scripts/drive-parallel-tests.sh machname W
set X : C-diamond, Bulk, 2 atoms, paral_kgb, with PAW.
        Test the triple parallelisation.
        We cannot check the distribution npband*npfft*npkpt=2*2*2=8 processors,
        since this number is not allowed by the test procedure.
        Here, we only test the parallelisation over FFT and k points :
        npband*npfft*npkpt=1*2*2=4 processors.
        Other distributions are checked in test tY.in, as an additional safeguard.
set Y : C-diamond, Bulk, 2 atoms, paral_kgb, with PAW.
        Test the triple parallelisation.
        We cannot check the distribution npband*npfft*npkpt=2*2*2=8 processors,
        since this number is not allowed by the test procedure.
        Here, we only test the parallelisation over bands and spins :
        npband*npfft*npkpt=2*1*2=4 processors.
        In addition, we also test various features of the bandfft-kpt
        parallelisation, in particular the bandpp, istwfk=2 and wfoptalg=14 variables.


For each set, one has usually the following cases :
0  use the sequential code (abinis), as a check
1  use the parallel code (abinip), with only one processing element
2  use abinip, with two threads and two processing elements
4  use abinip, with four threads and four processing elements
10 use abinip, with ten threads and ten processing elements
   (not for tests D, I and O, as there are only 4 k points for test D,
    2 spins and 2 k points for test I, and either 8 or 10 k points for test O)

=============================================================================

To test the speed-up on a number of processors larger than 10, one
should use the files t_kpt+spin.in and t_kpt+spin.files .
The parallelisation over k-points and spins can be tested.

One is advised first to execute the above-mentioned test I,
and to check whether the output file is correct (see the fldiff file,
where the automatic analysis is performed). The test "kpt+spin"
contains parameters that make the run much longer, and much
more suitable for parallelisation, while still being quite realistic.

Supposing that the test I went smoothly, then the kpt+spin
test should be performed as follows :
(1) Create your own directory in ~abinit/tests/paral , or
  go inside the directory created for test I
  (likely ~abinit/tests/paral/name_of_machine_yyyymmdd )
(2) Execute :
 cp ../Input/t_kpt+spin.in .
 cp ../Input/t_kpt+spin.files .
(3) Run your job with a command like
 /usr/local/mpi-pgi4/bin/mpirun -np 32 ../../../src/main/abinip < t_kpt+spin.files >& log
 (this being for 32 processors).
 The main output file is called t_kpt+spin.out .
 At its end, it contains an analysis of the CPU and wall clock time.
 In sequential, this job lasts between 3000 secs (IFC compiler)
 and 4000 secs (PGI compiler) on a PC Intel 2.8 GHz bought in May 2005.
 With 182 k-points and 2 spins, it might be parallelized over 364 processors,
 but the sequential part of the job is estimated at a bit more than 1%, so
 it should saturate at a speed-up below 100 .
(4) You can edit the file t_kpt+spin.in, and increase
 ngkpt, or set ndtset 2 as proposed in the file,
 then go back to step 3 and run the job.
 Increasing the number of k points (through ngkpt, e.g. ngkpt 16 16 16) will
 allow a better maximal speed-up. However, the test becomes
 less realistic.
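The saturation estimate in step (3) is just Amdahl's law : with a serial
fraction s, the speed-up on N processors is 1/(s + (1-s)/N), which can never
exceed 1/s. A small sketch of this arithmetic, taking s = 0.01 as the rough
estimate quoted above (an assumption, not a measured number) :

```shell
#!/bin/sh
# Amdahl's law : speedup(N) = 1 / (s + (1 - s)/N), bounded above by 1/s.
# s = 0.01 is the roughly estimated serial fraction of the kpt+spin test.
for N in 1 10 32 100 364; do
  awk -v n="$N" 'BEGIN { s = 0.01;
    printf "N=%3d  speedup=%5.1f\n", n, 1 / (s + (1 - s) / n) }'
done
```

Even on 364 processors the predicted speed-up stays below 1/s = 100, which is
why a denser ngkpt grid (reducing the relative weight of the serial part) is
needed for a better maximal speed-up.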

You can get more information about the t_kpt+spin.in and
t_kpt+spin.files files, and the corresponding output files, by reading the
http://www.abinit.org/Infos/abinis_help.html help file.

==============================================================================
