Ampdffs
Ampdffs
PAPER
During the embargo period (the 12 month period from the publication of the Version of Record of this article), the Accepted Manuscript is fully
protected by copyright and cannot be reused or reposted elsewhere.
As the Version of Record of this article is going to be / has been published on a subscription basis, this Accepted Manuscript will be available for
reuse under a CC BY-NC-ND 3.0 licence after the 12 month embargo period.
After the embargo period, everyone is permitted to use copy and redistribute this article for non-commercial purposes only, provided that they
adhere to all the terms of the licence https://creativecommons.org/licences/by-nc-nd/3.0
Although reasonable endeavours have been taken to obtain all necessary permissions from third parties to include their copyrighted content
within this article, their full citation and copyright line may not be present in this Accepted Manuscript version. Before using any content from this
article, please refer to the Version of Record on IOPscience once published for full citation and copyright details, as permissions may be required.
All third party content is fully copyright protected, unless specifically stated otherwise in the figure caption in the Version of Record.
1
2
3 Advanced capabilities for materials modelling with Quantum ESPRESSO
4
5
6 P. Giannozzia , O. Andreussib,i , T. Brummec , O. Bunaud , M. Buongiorno Nardellie ,
7 M. Calandrad , R. Carf , C. Cavazzonig , D. Ceresolih , M. Cococcionii , N. Colonnai , I.
8
9 Carnimeoa , A. Dal Corsoj , S. de Gironcolij , P. Delugasj , R. A. DiStasio Jr.k , A. Ferrettil ,
10
11 A. Florism , G. Fratesin , G. Fugalloo , R. Gebauerp , U. Gerstmannq , F. Giustinor , T.
12
Gornij , J. Jiak , M. Kawamuras , H.-Y. Kof , A. Kokaljt , E. Küçükbenlij , M. Lazzerid , M.
13
14 Marsiliu , N. Marzarii , F. Mauriv , N. L. Nguyenj , H.-V. Nguyenw , A. Otero-de-la-Rozax , L.
15
16 Paulattod , S. Poncér , D. Roccay,z , R. Sabatini1 , B. Santraf , M. Schlipfr , A. P. Seitsonen2,3 ,
17
18 A. Smogunov4 , I. Timrovi , T. Thonhauser5 , P. Umariu,6 , N. Vast7 , X. Wu8 , S. Baroni j
2
1
2
3 (q) Department Physik, Universität Paderborn, D-33098 Paderborn, Germany
4 (r) Department of Materials, University of Oxford,
5
6 Parks Road, Oxford OX1 3PH, United Kingdom
7 (s) The Institute for Solid State Physics, Kashiwa, Japan
8
9 (t) Department of Physical and Organic Chemistry,
10 Jožef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia
11
12 (u) Dipartimento di Fisica e Astronomia,
13 Università di Padova, via Marzolo 8, I-35131 Padova, Italy
14
(v) Dipartimento di Fisica, Università di Roma La Sapienza,
15
16 Piazzale Aldo Moro 5, I-00185 Roma, Italy
17
(w) Institute of Physics, Vietnam Academy of Science and Technology, 10 Dao Tan, Hanoi, Vietnam
18
19 (x) Department of Chemistry, University of British Columbia,
20 Okanagan, Kelowna BC V1V 1V7, Canada
21
22 (y) Université de Lorraine, CRM2 , UMR 7036, F-54506 Vandoeuvre-lès-Nancy, France
23 (z) CNRS, CRM2 , UMR 7036, F-54506 Vandoeuvre-lès-Nancy, France
24
25 (1) Orionis Biosciences, Boston (2) Institut für Chimie,
26 Universität Zurich, CH-8057 Zürich, Switzerland
27
(3) Département de Chimie, École Normale Supérieure, F-75005 Paris, France
28
29 (4) SPEC, CEA, CNRS, Université Paris-Saclay, F-91191 Gif-Sur-Yvette, France
30
(5) Department of Physics, Wake Forest University, Winston-Salem, NC 27109, USA
31
32 (6) CNR-IOM DEMOCRITOS, Istituto Officina dei Materiali, Consiglio Nazionale delle Ricerche
33 (7) Laboratoire des Solides Irradiés, École Polytechnique, CEA-DRF-IRAMIS,
34
35 CNRS UMR 7642, Université Paris-Saclay, F-91120 Palaiseau, France
36 (8) Department of Physics, Temple University, Philadelphia, PA 19122-1801, USA
37
38
39 (Dated: September 23, 2017)
40
41 Quantum ESPRESSO is an integrated suite of open-source computer codes for quan-
42
43 tum simulations of materials using state-of-the art electronic-structure techniques, based on
44 density-functional theory, density-functional perturbation theory, and many-body pertur-
45
bation theory, within the plane-wave pseudo-potential and projector-augmented-wave ap-
46
47 proaches. Quantum ESPRESSO owes its popularity to the wide variety of properties
48 and processes it allows to simulate, to its performance on an increasingly broad array of
49
50 hardware architectures, and to a community of researchers that rely on its capabilities as a
51 core open-source development platform to implement theirs ideas. In this paper we describe
52
53 recent extensions and improvements, covering new methodologies and property calculators,
54 improved parallelization, code modularization, and extended interoperability both within
55
56 the distribution and with external software.
57
58
59
60
Page 3 of 68 AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2
3
1
2
3 CONTENTS
4
5
6 I. Introduction 4
7
8 II. Theoretical, algorithmic, and methodological extensions 6
9
10 A. Advanced functionals 7
11
12 1. Advanced implementation of exact (Fock) exchange and hybrid functionals 7
13 2. Dispersion interactions 10
14
15 3. Hubbard-corrected functionals: DFT+U 15
16
17 4. Adiabatic-connection fluctuation-dissipation theory 18
18 B. Linear response and excited states without virtual orbitals 20
19
20 1. Static perturbations and vibrational spectroscopy 21
21
22 2. Dynamic perturbations: optical, electron energy loss, and magnetic spectroscopies 22
23 3. Many-body perturbation theory 27
24
25 C. Other spectroscopies 29
26
1. QE-GIPAW: Nuclear magnetic and electron paramagnetic resonance 29
27
28 2. XSpectra: L2,3 X-ray absorption edges 31
29
30 D. Other lattice-dynamical and thermal properties 32
31
1. thermo pw: Thermal properties from the quasi-harmonic approximation 32
32
33 2. thermal2: phonon-phonon interaction and thermal transport 33
34
35 3. EPW: Electron-phonon coefficients from Wannier interpolation 35
36
4. Non-perturbative approaches to vibrational spectroscopies 37
37
38 E. Multi-scale modeling 38
39
40 1. Environ: Self-Consistent Continuum Solvation embedding model 38
41
2. QM-MM 40
42
43 F. Miscellaneous feature enhancements and additions 41
44
45 1. Fully relativistic projector augmented-wave method 41
46
2. Electronic and structural properties in field-effect configuration 42
47
48 3. Cold restart of Car-Parrinello molecular dynamics 43
49
50 4. Optimized tetrahedron method 45
51 5. Wyckoff positions 46
52
53
54 III. Parallelization, modularization, interoperability and stability 47
55
56 A. New parallelization levels 47
57
58
59
60
AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2 Page 4 of 68
4
1
2
3 B. Aspects of interoperability 49
4
5 C. Input/Output and data file format 50
6 1. XML files with schema 51
7
8 2. Large-record data file format 51
9
10 D. Organization of the distribution 52
11 1. Package re-organization and modularization 53
12
13 2. Modular parallelism 53
14
15 3. Reorganization of linear-response codes 54
16 E. Quantum ESPRESSO and scripting languages 54
17
18 1. AiiDA: a python materials’ informatics infrastructure 54
19
20 2. Pwtk: a toolkit for PWscf 55
21 3. QE-emacs-modes 56
22
23 F. Continuous Integration and testing 57
24
25
IV. Outlook and conclusions 59
26
27
28 References 60
29
30
31
I. INTRODUCTION
32
33
34 Numerical simulations based on density-functional theory (DFT) [1, 2] have become a powerful
35
36 and widely used tool for the study of materials properties. Many of such simulations are based
37
38 upon the “plane-wave pseudopotential method”, often using ultrasoft pseudopotentials [3] or the
39 projector augmented wave method (PAW) [4] (in the following, all of these modern developments
40
41 will be referred to under the generic name of “pseudopotentials”). An important role in the diffu-
42
sion of DFT-based techniques has been played by the availability of robust and efficient software
43
44 implementations [5], as is the case for Quantum ESPRESSO, which is an open-source software
45
46 distribution—i.e., an integrated suite of codes—for electronic-structure calculations based on DFT
47 or many-body perturbation theory, and using plane-wave basis sets and pseudopotentials [6].
48
49 The core philosophy of Quantum ESPRESSO can be summarized in four keywords: openness,
50
51 modularity, efficiency, and innovation. The distribution is based on two core packages, PWscf and
52 CP, performing self-consistent and molecular-dynamics calculations respectively, and on additional
53
54 packages for more advanced calculations. Among these we quote in particular: PHonon, for linear-
55
56 response calculations of vibrational properties; PostProc, for data analysis and postprocessing;
57
58
59
60
Page 5 of 68 AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2
5
1
2
3 atomic, for pseudopotential generation; XSpectra, for the calculation of X-ray absorption spectra;
4
5 GIPAW, for nuclear magnetic resonance and electron paramagnetic resonance calculations.
6 In this paper we describe and document the novel or improved capabilities of Quantum
7
8 ESPRESSO up to and including version 6.2. We do not cover features already present in v.4.1
9
and described in Ref. 6, to which we refer for further details. The list of enhancements includes
10
11 theoretical and methodological extensions but also performance enhancements for current parallel
12
13 machines and modularization and extended interoperability with other software.
14
Among the theoretical and methodological extensions, we mention in particular:
15
16
17 • Fast implementations of exact (Fock) exchange for hybrid functionals [7, 42–44]; implemen-
18
tation of non-local van der Waals functionals [9] and of explicit corrections for van der Waals
19
20 interactions [10–13]; improvement and extensions of Hubbard-corrected functionals [14, 15].
21
22
• Excited-state calculations within time-dependent density-functional and many-body pertur-
23
24 bation theories.
25
26
• Relativistic extension of the PAW formalism, including spin-orbit interactions in density-
27
28 functional theory[16, 17].
29
30
31 • Continuum embedding environments (dielectric solvation models, electronic enthalpy, elec-
32 tronic surface tension, periodic boundary corrections) via the Environ module [18, 19] and
33
34 its time-dependent generalization [20].
35
36 Several new packages, implementing the calculation of new properties, have been added to Quan-
37
38 tum ESPRESSO. We quote in particular:
39
40 • turboTDDFT [21–24] and turboEELS [25, 26], for excited-state calculations within time-
41
42 dependent DFT (TDDFT), without computing virtual orbitals, also interfaced with the
43
Environ module (see above).
44
45
46 • QE-GIPAW, replacing the old GIPAW package, for nuclear magnetic resonance and electron
47
paramagnetic resonance calculations.
48
49
50 • EPW, for electron-phonon calculations using Wannier-function interpolation [27].
51
52 • GWL and SternheimerGW for quasi-particle and excited-state calculations within many-
53
54 body perturbation theory, without computing any virtual orbitals, using the Lanczos bi-
55
56 orthogonalization [28, 29] and multi-shift conjugate-gradient methods [30], respectively.
57
58
59
60
AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2 Page 6 of 68
6
1
2
3 • thermo pw, for computing thermodynamical properties in the quasi-harmonic approximation,
4
5 also featuring an advanced master-slave distributed computing scheme, applicable to generic
6 high-throughput calculations [31].
7
8
9 • d3q and thermal2, for the calculation of anharmonic 3-body interatomic force constants,
10
11 phonon-phonon interaction and thermal transport [32, 33].
12
13
14 Improved parallelization is crucial to enhance performance and to fully exploit the power of modern
15
16 parallel architectures. A careful removal of memory bottlenecks and of scalar sections of code is
17 a pre-requisite for better and extending scaling. Significant improvements have been achieved, in
18
19 particular for hybrid functionals [34, 35].
20
21 Complementary to this, a complete pseudopotential library, pslibrary, including fully-
22 relativistic pseudopotentials, has been generated [36, 37]. A curation effort [38] on all the
23
24 pseudopotential libraries available for Quantum ESPRESSO has led to the identification of
25
26 optimal pseudopotentials for efficiency or for accuracy in the calculations, the latter delivering an
27 agreement comparable to any of the best all-electron codes [5]. Finally, a significant effort has been
28
29 dedicated to modularization and to enhanced interoperability with other software. The structure
30
of the distribution has been revised, the code base has been re-organized, the format of data files
31
32 re-designed in line with modern standards. As notable examples of interoperability with other
33
34 software, we mention in particular the interfaces with the LAMMPS molecular dynamics (MD) code
35 [39] used as molecular-mechanics “engine” in the Quantum ESPRESSO implementation of the
36
37 QM-MM methodology [40], and with the i-PI MD driver [41], also featuring path-integral MD.
38
39 All advances and extensions that have not been documented elsewhere are described in the next
40
sections. For more details on new packages we refer to the respective references.
41
42 The paper is organized as follows. Sec. II contains a description of new theoretical and method-
43
44 ological developments and of new packages distributed together with Quantum ESPRESSO. Sec.
45
III contains a description of improvements of parallelization, updated information on the philos-
46
47 ophy and general organization of Quantum ESPRESSO, notably in the field of modularization
48
49 and interoperability. Sec. IV contains an outlook of future directions and our conclusions.
50
51
52
53 II. THEORETICAL, ALGORITHMIC, AND METHODOLOGICAL EXTENSIONS
54
55
56 In the following, CGS units are used, unless noted otherwise.
57
58
59
60
Page 7 of 68 AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2
7
1
2
3 A. Advanced functionals
4
5
1. Advanced implementation of exact (Fock) exchange and hybrid functionals
6
7
8 Hybrid functionals are already the de facto standard in quantum chemistry and are quickly gain-
9
10 ing popularity in the condensed-matter physics and computational materials science communities.
11
12 Hybrid functionals reduce the self-interaction error that plagues lower-rung exchange-correlation
13 functionals, thus achieving more accurate and reliable predictive capabilities. This is of particular
14
15 importance in the calculation of orbital energies, which are an essential ingredient in the treat-
16
ment of band alignment and charge transfer in heterogeneous systems, as well as the input for
17
18 higher-level electronic-structure calculations based on many-body perturbation theory. However,
19
20 the widespread use of hybrid functionals is hampered by the often prohibitive computational re-
21
quirements of the exact-exchange (Fock) contribution, especially when working with a plane-wave
22
23 basis set. The basic ingredient here is the action (V̂x φi )(r) of the Fock operator V̂x onto a (single-
24
25 particle) electronic state φi , requiring a sum over all occupied Kohn-Sham (KS) states {ψj }. For
26 spin-unpolarized systems, one has:
27
Z
28 2
X ψj∗ (r′ )φi (r′ )
29 (V̂x φi )(r) = −e ψj (r) dr′ , (1)
|r − r′ |
30 j
31 where −e is the charge of the electron. In the original algorithm [6] implemented in PWscf, self-
32
33 consistency is achieved via a double loop: in the inner one the ψ’s entering the definition of the
34
35 Fock operator in Eq. (1) are kept fixed, while the outer one cycles until the Fock operator converges
36 to within a given threshold. In the inner loop, the integrals appearing in Eq. (1):
37 Z
38 ρij (r′ )
39
vij (r) = dr′ , ρij (r) = ψi∗ (r)φj (r), (2)
|r − r′ |
40
41 are computed by solving the Poisson equation in reciprocal space using fast Fourier transforms
42
(FFT). This algorithm is straightforward but slow, requiring O (Nb Nk )2 FFTs, where Nb is the
43
44 number of electronic states (“bands” in solid-state parlance) and Nk the number of k points in
45
46 the Brillouin zone (BZ). While feasible in relatively small cells, this unfavorable scaling with the
47
system size makes calculations with hybrid functionals challenging if the unit cell contains more
48
49 than a few dozen atoms.
50
51 To enable exact-exchange calculations in the condensed phase, various ideas have been con-
52 ceived and implemented in recent Quantum ESPRESSO versions. Code improvements aimed
53
54 at either optimizing or better parallelizing the standard algorithm are described in Sec. III A. In
55
56 this section we describe two important algorithmic developments in Quantum ESPRESSO, both
57
58
59
60
AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2 Page 8 of 68
8
1
2
3 entailing a significant reduction in the computational effort: the adaptively compressed exchange
4
5 (ACE) concept [7] and a linear-scaling (O(Nb )) framework for performing hybrid-functional ab
6 initio molecular dynamics using maximally localized Wannier functions (MLWF) [42–44].
7
8 a. Adaptively compressed exchange The simple formal derivation of ACE allows for a robust
9
implementation, which applies straightforwardly both to isolated or aperiodic systems (Γ−only
10
11 sampling of the BZ, that is, k = 0) and to periodic ones (requiring sums over a grid of k points
12
13 in the BZ); to norm conserving and ultrasoft pseudopotentials or PAW; to spin-unpolarized or
14
polarized cases or to non-collinear magnetization. Furthermore, ACE is compatible with, and
15
16 takes advantage of, all available parallelization levels implemented in Quantum ESPRESSO:
17
18 over plane waves, over k points, and over bands.
19 With ACE, the action of the exchange operator is rewritten as
20
21 X
22 |V̂x φi i ≃ |ξj i(M −1 )jm hξm |φi i, (3)
23 jm
24
where |ξi i = V̂x |ψi i and Mjm = hψj |ξm i. At self-consistency, ACE becomes exact for φi ’s in
25
26 the occupied manifold of KS states. It is straightforward to implement ACE in the double-loop
27
28 structure of PWscf. The new algorithm is significantly faster while not introducing any loss of
29 accuracy at convergence. Benchmark tests on a single processor show a 3× to 4× speedup for
30
31 typical calculations in molecules, up to 6× in extended systems [45].
32
33 An additional speedup may be achieved by using a reduced FFT cutoff in the solution of Poisson
34 equations. In Eq. (1), the exact FFT algorithm requires a FFT grid containing G-vectors up to a
35
36 modulus Gmax = 2Gc , where Gc is the largest modulus of G-vectors in the plane-wave basis used
37
38 to expand ψi and φj , or, in terms of kinetic energy cutoff, up to a cutoff Ex = 4Ec , where Ec is the
39 plane-wave cutoff. The presence of a 1/G2 factor in the reciprocal space expression suggests, and
40
41 experience confirms, that this condition can be relaxed to Ex ∼ 2Ec with little loss of precision,
42
down to Ex = Ec at the price of increasing somewhat this loss [46]. The kinetic-energy cutoff for
43
44 Fock-exchange computations can be tuned by specifying the keyword ecutfock in input.
45
46 Hybrid functionals have also been extended to the case of ultrasoft pseudopotentials and to
47
PAW, following the method of Ref. 47. A large number of integrals involving augmentation charges
48
49 qlm are needed in this case, thus offsetting the advantage of a smaller plane-wave basis set. Better
50
51 performances are obtained by exploiting the localization of the qlm and computing the related
52 terms in real space, at the price of small aliasing errors.
53
54 These improvements allow to significantly speed up a calculation, or to execute it on a larger
55
56 number of processors, thus extending the reach of calculations with hybrid functionals. The bot-
57
58
59
60
Page 9 of 68 AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2
9
1
2
3 tleneck represented by the sum over bands and by the FFT in Eq. (1) is however still present:
4
5 ACE just reduces the number of such expensive calculations, but doesn’t eliminate them. In order
6 to achieve a real breakthrough, one has to get rid of delocalized bands and FFT’s, moving to a
7
8 representation of the electronic structure in terms of localized orbitals. Work along this line using
9
the selected column density matrix localization scheme [48, 49] is ongoing. In the next section we
10
11 describe a different approach, implemented in the CP code, based on maximally localized Wannier
12
13 functions (MLWF).
14
b. Ab-initio molecular dynamics using maximally localized Wannier functions The CP code
15
16 can now perform highly efficient hybrid-functional ab initio MD using MLWFs [50] {ϕi } to represent
17
18 the occupied space, instead of the canonical KS orbitals {ψi }, which are typically delocalized
19 over the entire simulation cell. The MLWF localization procedure can be written as a unitary
20 P
21 transformation, ϕi (r) = j Uij ψj (r), where Uij is computed at each MD time step by minimizing
22
23 the total spread of the orbitals via a second-order damped dynamics scheme, starting with the
24 converged Uij from the previous time step as initial guesses [51].
25
26 The natural sparsity of the exchange interaction provided by a localized representation of the
27
occupied orbitals (at least in systems with a finite band gap) is efficiently exploited during the
28
29 evaluation of exact-exchange based applications (e.g., hybrid DFT functionals). This is accom-
30
31 plished by computing each of the required pair-exchange potentials v ij (r) (corresponding to a
32
given localized pair-density ρij (r)) through the numerical solution of the Poisson equation:
33
34
35 ∇2 v ij (r) = −4πρij (r), ρij (r) = ϕ∗i (r)ϕj (r) (4)
36
37 using finite differences on the real-space grid. Discretizing the Laplacian operator (∇2 ) using a
38
39 19-point central-difference stencil (with an associated O(h6 ) accuracy in the grid spacing h), the
40
41 resulting sparse linear system of equations is solved using the conjugate-gradient technique subject
42 to the boundary conditions imposed by a multipolar expansion of v ij (r):
43
X Qlm Ylm (θ, φ) Z
44 ∗
45 v ij (r) = 4π , Qlm = drYlm (θ, φ)rl ρij (r) (5)
2l + 1 rl+1
46 lm
47
in which the Qlm are the multipoles describing ρij (r) [42–44].
48
49 Since v ij (r) only needs to be evaluated for overlapping pairs of MLWFs, the number of Poisson
50
51 equations that need to be solved is substantially decreased from O(Nb2 ) to O(Nb ). In addition,
52 v ij (r) only needs to be solved on a subset of the real-space grid (that is in general of fixed size)
53
54 that encompasses the overlap between a given pair of MLWFs. This further reduces the overall
55
56 computational effort required to evaluate exact-exchange related quantities and results in a linear-
57
58
59
60
AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2 Page 10 of 68
10
1
2
3 scaling (O(Nb )) algorithm. As such, this framework for performing exact-exchange calculations is
4
5 most efficient for non-metallic systems (i.e., systems with a finite band gap) in which the occupied
6 KS orbitals can be efficiently localized.
7
8 The MLWF representation not only yields the exact-exchange energy Exx ,
9
10 XZ
2
Exx = −e dr ρij (r)v ij (r), (6)
11
ij
12
13 at a significantly reduced computational cost, but it also provides an amenable way of computing
14 P
i
15 the exact-exchange contributions to the (MLWF) wavefunction forces, Dxx (r) = e2 j v ij (r)ϕj (r),
16
17 which serve as the central quantities in Car-Parrinello MD simulations [53]. Moreover, the exact-
18 exchange contributions to the stress tensor are readily available, thereby providing a general code
19
20 base which enables hybrid DFT based simulations in the NVE, NVT, and NPT ensembles for
21
22 simulation cells of any shape [44]. We note in passing that applications of the current implementa-
23 tion of this MLWF-based exact-exchange algorithm are limited to Γ–point calculations employing
24
25 norm-conserving pseudo-potentials.
26
The MLWF-based exact-exchange algorithm in CP employs a hybrid MPI/OpenMP paralleliza-
27
28 tion strategy that has been extensively optimized for use on large-scale massively-parallel (super-)
29
30 computer architectures. The required set of Poisson equations—each one treated as an indepen-
31 dent task—are distributed across a large number of MPI ranks/processes using a task distribution
32
33 scheme designed to minimize the communication and to balance computational workload. Per-
34
35 formance profiling demonstrates excellent scaling up to 30,720 cores (for the α-glycine molecular
36 crystal, see Fig. 1) and up to 65,536 cores (for (H2 O)256 , see Ref. 43) on Mira (BG/Q) with
37
38 extremely promising efficiency. In fact, this algorithm has already been successfully applied to the
39
40 study of long-time MD simulations of large-scale condensed-phase systems such as (H2 O)128 [43, 52].
41 For more details on the performance and implementation of this exact-exchange algorithm, we refer
42
43 the reader to Ref. 44.
44
45
46 2. Dispersion interactions
47
48
49 Dispersion, or van der Waals, interactions arise from dynamical correlations among charge fluc-
50
51 tuations occurring in widely separated regions of space. The resulting attraction is a non-local
52 correlation effect that cannot be reliably captured by any local (such as local density approxi-
53
54 mation, LDA) or semi-local (generalized gradient approximation, GGA) functional of the electron
55
56 density [54]. Such interactions can be either accounted for by a truly non-local exchange-correlation
57
58
59
60
Page 11 of 68 AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2
11
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19 FIG. 1. Strong (left) and weak (right) scaling plots on Mira (BG/Q) for hybrid-DFT simulations of the α-
20 glycine molecular crystal polymorph using the linear-scaling exact-exchange algorithm in CP. In these plots,
21
22 unit cells containing 16-64 glycine molecules (160-640 atoms, 240-960 bands) were considered as a function
23 of z, the number of MPI ranks per band (z = 0.5-2). On Mira, 30,720 cores (1920 MPI ranks × 16 OpenMP
24
25 threads/rank × 1 core/OpenMP thread) were utilized for the largest system (gly064, z = 2), retaining over
26 88% (strong scaling) and 80% (weak scaling) of the ideal efficiencies (dashed lines). Deviations from ideal
27
28 scaling are primarily due to the FFT (which scales non-linearly) required to provide the MLWFs in real
29 space.
30
31
32
33
(XC) functional, or modeled by effective interactions amongst atoms, whose parameters are either
34
35 computed from first principles or estimated semi-empirically. In Quantum ESPRESSO both
36
37 approaches are implemented. Non-local XC functionals are activated by selecting them in the
38 input dft variable, while explicit interactions are turned on with the vdw corr option. From the
39
40 latter group, DFT-D2 [10], Tkatchenko-Scheffler [11], and exchange-hole dipole moment models
41
42 [12, 13] are currently implemented (DFT-D3 [55] and the many-body dispersion (MBD) [56–58]
43 approaches are already available in a development version).
44
45
46 a. Non-local van der Waals density functionals A fully non-local correlation functional able to
47
account for van der Waals interactions for general geometries was first developed in 2004 and named
48
49 vdW-DF [59]. Its development is firmly rooted in many-body theory, where the so-called adiabatic
50
51 connection fluctuation-dissipation theorem (ACFD) [60] provides a formally exact expression for
52 the XC energy through a coupling constant integration over the response function—see Sec. II A 4.
53
54 A detailed review of the vdW-DF formalism is provided in Ref. 9. The overall XC energy given
55
56 by the ACFD theorem—as a functional of the electron density n—is then split in vdW-DF into a
57
58
59
60
AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2 Page 12 of 68
12
1
2
0 [n] and a truly non-local correlation part E nl [n], i.e.
GGA-type XC part Exc
3 c
4
5 0
Exc [n] = Exc [n] + Ecnl [n] , (7)
6
7
8 where the non-local part is responsible for the van der Waals forces. Through a second-order
9
expansion in the plasmon-response expression used to approximate the response function, the non-
10
11 local part turns into a computationally tractable form involving a universal kernel Φ(r, r′ ),
12 Z
13 nl 1
14 Ec [n] = dr dr′ n(r) Φ(r, r′ ) n(r′ ) . (8)
2
15
16 The kernel Φ(r, r′ ) depends on r and r′ only through q0 (r)|r − r′ | and q0 (r′ )|r − r′ |, where q0 (r) is
17
18 a function of n(r) and ∇n(r). As such, the kernel can be pre-calculated, tabulated, and stored in
19 some external file. To make the scheme self-consistent, the XC potential Vcnl (r) = δEcnl [n]/δn(r)
20
21 also needs to be computed [61]. The evaluation of Ecnl [n] in Eq. (8) is computationally expensive.
22
23 In addition, the evaluation of the corresponding potential Vcnl (r) requires one spatial integral for
24 each point r. A significant speedup can be achieved by writing the kernel in terms of splines [62]
25
26
27 Φ(r, r′ ) = Φ q0 (r), q0 (r′ ), |r − r′ |)
28 X
≈ Φ(qα , qβ , |r − r′ |) pα q0 (r) pβ q0 (r′ ) , (9)
29
αβ
30
31 where qα are fixed values and pα are cubic splines. Equation (8) then becomes a convolution that
32
33 can be simplified to
34 Z
1X
35 Ecnl [n] = dr dr′ θα (r) Φαβ (|r − r′ |) θβ (r′ )
36 2
αβ
37 Z
1X
38 = dk θα∗ (k) Φαβ (k) θβ (k) . (10)
39 2
αβ
40
41 Here θα (r) = n(r)pα q0 (r) and θα (k) is its Fourier transform. Accordingly, Φαβ (k) is the Fourier
42
transform of the original kernel Φαβ (r) = Φ(qα , qβ , |r−r′ |). Thus, two spatial integrals are replaced
43
44 by one integral over Fourier transformed quantities, resulting in a considerable speedup. This
45
46 approach also provides a convenient evaluation for Vcnl (r).
47
The vdW-DF functional was implemented in Quantum ESPRESSO version 4.3, following
48
49 Eq. (10). As a result, in large systems, compute times in vdW-DF calculations are only in-
50
51 significantly longer than for standard GGA functionals. The implementation uses a tabulation
52 of the Fourier transformed kernel Φαβ (k) from Eq. (10) that is computed by an auxiliary code,
53
54 generate vdW kernel table.x, and stored in the external file vdW kernel table. The file then
55
56 has to be placed either in the directory where the calculation is run or in the directory where the
57
58
59
60
Page 13 of 68 AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2
13
1
2
3 corresponding pseudopotentials reside. The formalism for vdW-DF stress was derived and imple-
4
5 mented in Ref. 63. The proper spin extension of vdW-DF, termed svdW-DF [64], was implemented
6 in Quantum ESPRESSO version 5.2.1.
7
8 Although the ACFD theorem provides guidelines for the total XC functional in Eq. (7), in
9 0 [n] is approximated by simple GGA-type functional forms. This has been used to im-
practice Exc
10
11 prove vdW-DF—and correct the often too large binding separations found in its original form—by
12
0 [n]. The naming convention for the resulting variants
optimizing the exchange contribution to Exc
13
14
is that the extension should describe the exchange functional used. In this context, the functionals
15
16 vdW-DF-C09 [65], vdW-DF-obk8 [66], vdW-DF-ob86 [67], and vdW-DF-cx [68] have been devel-
17
18 oped and implemented in Quantum ESPRESSO. While all of these variants use the same kernel
19 to evaluate Ecnl [n], advances have also been made in slightly adjusting the kernel form, which is
20
21 referred to and implemented as vdW-DF2 [69]. A corresponding variant, i.e., vdW-DF2-b86r [70],
22
23 is also implemented. Note that vdW-DF2 uses the same kernel file as vdW-DF.
24 The functional VV10 [71] is related to vdW-DF, but adheres to fewer exact constraints and
25
26 follows a very different design philosophy. It is implemented in Quantum ESPRESSO in a form
27
called rVV10 [72] and uses a different kernel and kernel file that can be generated by running the
28
29 auxiliary code generate rVV10 kernel table.x.
30
31 b. Interatomic pairwise dispersion corrections An alternative approach to accounting for dis-
32 0 a dispersion energy, E
persion forces is to add to the XC energy Exc disp , written as a damped
33
34 asymptotic pairwise expression:
35
36 1 X X C (n) fn (RIJ )
0 IJ
37
Exc = Exc + Edisp , Edisp = − n (11)
2 RIJ
n=6,8,10 I6=J
38
39 where I and J run over atoms, RIJ = |RI − RJ | is the interatomic distance between atoms I and
40 (n)
41 J, and fn (R) is a suitable damping function. The interatomic dispersion coefficients CIJ can be
42
derived from fits, as in DFT-D2 [10], or calculated non-empirically, as in the Tkatchenko-Scheffler
43
44 (TS-vdW) [11] and exchange-hole dipole moment (XDM) models [12, 13].
45 (n)
46 In XDM, the CIJ coefficients are calculated assuming that dispersion interactions arise from
47
the electrostatic attraction between the electron-plus-exchange-hole distributions on different
48
49 atoms [12, 13]. In this way, XDM retains the simplicity of a pairwise dispersion correction, like
50 (n)
51 in DFT-D2, but derives the CIJ coefficients from the electronic properties of the system under
52 study. The damping functions fn in Eq. (11) suppress the dispersion interaction at short distances,
53
54 and serve the purpose of making the link between the short-range correlation (provided by the
55
56 XC functional) and the long-range dispersion energy, as well as mitigating erroneous behavior
57
58
59
60
AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2 Page 14 of 68
14
1
2
3 from the exchange functional in the representation of intermolecular repulsion [13]. The damping
4
5 functions contain two adjustable parameters, available online [73] for a number of popular density
6 functionals. Although any functional for which damping parameters are available can be used, the
7
8 functionals showing best performance when combined with XDM appear to be B86bPBE [74, 75]
9
and PW86PBE [75, 76], thanks to their accurate modeling of Pauli repulsion [13]. Both functionals
10
11 have been implemented in Quantum ESPRESSO since version 5.0.
12
13 In the canonical XDM implementation, recently included in Quantum ESPRESSO and de-
14
scribed in detail elsewhere [77], the dispersion coefficients are calculated from the electron density,
15
16 its derivatives, and the kinetic energy density, and assigned to the different atoms in the system
17
18 using a Hirshfeld atomic partition scheme [78]. This means that XDM is effectively a meta-GGA
19 functional of the dispersion energy whose evaluation cost is small relative to the rest of the self-
20
21 consistent calculation. Despite the conceptual and computational simplicity of XDM, and because
22
23 the dispersion coefficients depend upon the atomic environment in a physically meaningful way, the
24 XDM dispersion correction offers good performance in the calculation of diverse properties, such
25
26 as lattice energies, crystal geometries, and surface adsorption energies. XDM is especially good for
27
28 modeling organic crystals and organic/inorganic interfaces. For a recent review, see Ref. 13.
29 The XDM dispersion calculation is turned on by specifying vdw corr=’xdm’ and optionally
30
31 selecting appropriate damping function parameters (with the xdm a1 and xdm a2 keywords). Be-
32
33 cause the reconstructed all-electron densities are required during self-consistency, XDM can be
34 used only in combination with a PAW approach. The XDM contribution to forces and stress is
35
36 not entirely consistent with the energies because the current implementation neglects the change
37
in the dispersion coefficients. Work is ongoing to remove this limitation, as well as to make XDM
38
39 available for Car-Parrinello MD, in future Quantum ESPRESSO releases.
40
41 In the TS-vdW approach (vdw corr=’ts-vdw’), all vdW parameters (which include the atomic
42 (6)
dipole polarizabilities, αI , vdW radii, RI0 , and interatomic CIJ dispersion coefficients) are function-
43
44 als of the electron density and computed using the Hirshfeld partitioning scheme [78] to account
45
46 for the unique chemical environment surrounding each atom. This approach is firmly based on a
47 fluctuating quantum harmonic oscillator (QHO) model and results in highly accurate CIJ coeffi-
(6)
48
49 cients with an associated error of approximately 5.5% [11]. The TS-vdW approach requires a single
50
51 empirical range-separation parameter based on the underlying XC functional and is recommended
52 in conjunction with non-empirical DFT functionals such as PBE and PBE0. For a recent review
53
54 of the TS-vdW approach and several other vdW/dispersion corrections, please see Ref. 79.
55
56 The implementation of the density-dependent TS-vdW correction in Quantum ESPRESSO is
57
58
59
60
Page 15 of 68 AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2
15
1
2
3 fully self-consistent [80] and currently available for use with norm-conserving pseudo-potentials. An
4
5 efficient linear-scaling implementation of the TS-vdW contribution to the ionic forces and stress
6 tensor allows for Born-Oppenheimer and Car-Parrinello MD simulations at the DFT+TS-vdW
7
8 level of theory; this approach has already been successfully employed in long-time MD simulations
9
of large-scale condensed-phase systems such as (H2 O)128 [43, 52]. We note in passing that the
10
11 Quantum ESPRESSO implementation of the TS-vdW correction also includes analytical deriva-
12
13 tives of the Hirshfeld weights, thereby completely reflecting the change in all TS-vdW parameters
14
during geometry/cell optimizations and MD simulations.
15
16
17
18 3. Hubbard-corrected functionals: DFT+U
19
20
Most approximate XC functionals used in modern DFT codes fail quite spectacularly on systems
21
22 with atoms whose ground-state electronic structure features partially occupied, strongly localized
23
24 orbitals (typically of the d or f kind), that suffer from strong self-interaction effects and a poor
25 description of electronic correlations. In these circumstances, DFT+U is often, although not always,
26
27 an efficient remedy. This method is based on the addition to the DFT energy functional EDFT of
28
29 a correction EU , shaped on a Hubbard model Hamiltonian: EDFT+U = EDFT + EU . The original
30 implementation in Quantum ESPRESSO, extensively described in Refs. 81 and 82, is based on
31
32 the simplest rotationally invariant formulation of EU , due to Dudarev and coworkers [83]:
33 ( )
34 1X IX Iσ
X
Iσ Iσ
EU = U nmm − nmm′ nm′ m , (12)
35 2 m,σ
I ′ m
36
37 where
38 X
39 nIσ
mm′ =
σ
fkν σ
hψkν |φIm ihφIm′ |ψkν
σ
i, (13)
40 k,ν
41 σ i is the valence electronic wave function for state kν of spin σ, f σ the corresponding occupation
|ψkν kν
42
43 number, |φIm i is the chosen Hubbard manifold of atomic orbitals, centered on atomic site I, that
44
45 may be orthogonalized or not. The presence of the Hubbard functional results in extra terms in
46 energy derivatives such as forces, stresses, elastic constants, or force-constant (dynamical) matrices.
47
48 For instance, the additional term in forces is
49
1 X ∂nIσ
50 FUIα = − U I
δ mm ′ − 2n
Iσ
mm ′
mm′
(14)
51 2 ∂R Iα
I,m,m ,σ
′
52
where RIα is the α component of position for atom I in the unit cell,
53
54 ∂nIσ X
mm′ σ σ ∂φIm σ σ ∂φIm′ σ
55 = fkν ψkν hφIm |ψkν i + hψkν |φIm i
′ ψ . (15)
∂RIα ∂RIα ∂RIα kν
56 k,ν
57
58
59
60
AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2 Page 16 of 68
16
1
2
3 a. Recent extensions of the formulation As a correction to the total energy, the Hubbard
4
5 functional naturally contributes an extra term to the total potential that enters the KS equations.
6 An alternative formulation [14] of the DFT+U method, recently introduced and implemented in
7
8 Quantum ESPRESSO for transport calculations, eliminates the need of extra terms in the po-
9
tential by incorporating the Hubbard correction directly into the (PAW) pseudopotentials through
10
11 a renormalization of the coefficients of their non-local terms.
12
13 A simple extension to the Dudarev functional, DFT+U+J0, was proposed in Ref. 15 and
14
used to capture the insulating ground state of CuO. In CuO the localization of holes on the d
15
16 states of Cu and the consequent on-set of a magnetic ground state can only be stabilized against
17
18 a competing tendency to hybridize with oxygen p states when on-site exchange interactions are
19 precisely accounted for. A simplified functional, depending upon the on-site (screened) Coulomb
20
21 interaction U and the Hund’s coupling J, can be obtained from the full second-quantization for-
22
23 mulation of the electronic interaction potential by keeping only on-site terms that describe the
24 interaction between up to two orbitals and by approximating on-site effective interactions with the
25
26 (orbital-independent) atomic averages of Coulomb and exchange terms:
27 X UI − JI h i X JI h i
28 EU +J = Tr nI σ (1 − nI σ ) + Tr nI σ nI −σ . (16)
29 2 2
I, σ I, σ
30
31 The on-site exchange coupling J I not only reduces the effective Coulomb repulsion between like-
32
33 spin electrons as in the simpler Dudarev functional (first term of the right-hand side), but also
34 contributes a second term that acts as an extra penalty for the simultaneous presence of anti-aligned
35
36 spins on the same atomic site and further stabilizes ferromagnetic ground states.
37
38
The fully rotationally invariant scheme of Liechtenstein et al. [84], generalized to non-collinear
39 magnetism and two-component spinor wave-functions, is also implemented in the current version
40
41 of Quantum ESPRESSO. The corrective energy term for each correlated atom can be quite
42
generally written as:
43
44 1 X 1 X
45 EUf ull = Uαβγδ hc†α c†β cδ cγ iDFT = Uαβγδ − Uαβδγ nαγ nβδ , (17)
2 2
46 αβγδ αβγδ
47
where the average is taken over the DFT Slater determinant, Uαβγδ are Coulomb integrals, and some
48
49 set of orthonormal spin-space atomic functions, {α}, is used to calculate the occupation matrix,
50
51 nαβ , via Eq. (13). These basis functions could be spinor wave functions of total angular momentum
52 j = l ± 1/2, originated from spherical harmonics of orbital momentum l, which is a natural choice
53
54 in the presence of spin-orbit coupling. Another choice, adopted in our implementation, is to use
55
56 the standard basis of separable atomic functions, Rl (r)Ylm (θ, φ)χ(σ), where χ(σ) are spin up/down
57
58
59
60
Page 17 of 68 AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2
17
1
2
3 projectors and the radial function, Rl (r), is an eigenfunction of the pseudo-atom. In the presence of
4
5 spin-orbit coupling, the radial function is constructed by averaging between the two radial functions
6 Rl±1/2 . These radial functions are read from the file containing the pseudopotential, in this case a
7
8 fully-relativistic one. In this conventional basis, the corrective functional takes the form:
9
10 1 X 1 X
EUf ull = Uijkl nσσ σ′ σ′
ik njl − σσ ′ σ ′ σ
Uijlk nik njl , (18)
11 2 2
ijkl,σσ
′ ijkl,σσ
′
12
13 where {ijkl} run over azimuthal quantum number m. The second term contains a spin-flip con-
14 ′
15 tribution if σ ′ 6= σ. For collinear magnetism, when nσσ σ
ij = δσσ ′ nij , the present formulation reduces
16 to the scheme [84] of Liechtenstein et al. All Coulomb integrals, Uijkl , can be parameterized by
17
18 few input parameters such as U (s-shell); U and J (p-shell); U, J and B (d-shell); U, J, E2 , and E3
19
20 (f -shell), and so on. We note that if all parameters but U are set to zero, the Dudarev functional
21 is recovered.
22
23 b. Calculation of Hubbard parameters The Hubbard corrective functional EU depends linearly
24
25 upon the effective on-site interactions, U I . Therefore, using a proper value for these interaction
26 parameters is crucial to obtain quantitatively reliable results from DFT+U calculations. The
27
28 Quantum ESPRESSO implementation of DFT+U has also been the basis to develop a method
29
for the calculation of U [81], based on linear-response theory. This method is completely ab initio
30
31 and provides values of the effective interactions that are consistent with the system and with
32
33 the ground state that the Hubbard functional aims at correcting. A comparative analysis of this
34 method with other approaches proposed in the literature to compute the Hubbard interactions
35
36 has been initiated in Ref. 82 and will be further refined in forthcoming publications by the same
37
38 authors.
39 Within linear-response theory, the Hubbard interactions are the elements of an effective in-
40
41 teraction matrix, computed as the difference between bare and screened inverse susceptibilities
42
43 [81]:
44
45 U I = χ−1
0 −χ
−1
II
. (19)
46
47
In Eq. (19) the susceptibilities χ and χ0 measure the response of atomic occupations to shifts in
48
49 the potential acting on the states of single atoms in the system. In particular, χ is defined as
50 P Iσ
J and is evaluated at self consistency, while χ has a similar definition
51 χIJ = mσ dnmm /dα 0
52 but is computed before the self-consistent re-adjustment of the Hartree and XC potentials. In the
53
54 current implementation these susceptibilities are computed from a series of self-consistent total
55
56 energy calculations (varying the strength α of the perturbing potential over a range of values)
57
58
59
60
AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2 Page 18 of 68
18
1
2
3 performed on supercells of sufficient size for the perturbations to be isolated from their periodic
4
5 replicas. While easy to implement, this approach is quite cumbersome to use, requiring multiple
6 calculations, expensive convergence tests of the resulting parameters and complex post-processing
7
8 tools.
9
These difficulties can be overcome by using density-functional perturbation theory (DFpT) to
10
11 automatize the calculation of the Hubbard parameters. The basic idea is to recast the entries of
12
13 the susceptibility matrices into sums over the BZ:
14 Nq
15 dnIσ
mm′ 1 X iq·(Rl −Rl′ ) s′ s σ
16 = e ∆q nmm′ , (20)
dαJ Nq q
17
18
where I ≡ (l, s) and J ≡ (l′ , s′ ), l and l′ label unit cells, s and s′ label atoms in the unit cell, Rl
19
′
20 and Rl′ are Bravais lattice vectors, and ∆sq nsmm
σ represent the (lattice-periodic) response of atomic
′
21
22 occupations to monochromatic perturbations constructed by modulating the shift to the potential
23 of all the periodic replica of a given atom by a wave-vector q. This quantity is evaluated within
24
25 DFpT (see Sec. II B), using linear-response routines contained in LR Modules (see Sec. III D 3).
26
27 This approach eliminates the need for supercell calculations in periodic systems (along with the
28 cubic scaling of their computational cost) and automatizes complex post-processing operations
29
30 needed to extract U from the output of calculations. The use of DFpT also offers the perspective
31
to directly evaluate inverse susceptibilities, thus avoiding the matrix inversions of Eq. (19), and to
32
33 calculate the Hubbard parameters for closed-shell systems, a notorious problem for schemes based
34
35 on perturbations to the potential. Full details about this implementation will be provided in a
36
forthcoming publication [85] and the corresponding code will be made available in one of the next
37
38 Quantum ESPRESSO releases.
39
40
41 4. Adiabatic-connection fluctuation-dissipation theory
42
43
44 In the quest for better approximations for the unknown XC energy functional in KS-DFT,
45
46 the approach based on the adiabatic connection fluctuation-dissipation (ACFD) theorem [60] has
47
received considerable interest in recent years. This is largely due to some attractive features: (i)
48
49 a formally exact expression for the XC energy in term of density linear response functions can be
50
51 derived providing a promising way for a systematic improvement of the XC functional; (ii) the
52 method treats the exchange energy exactly, thus canceling out the spurious self-interaction error
53
54 present in the Hartree energy; (iii) the correlation energy is fully non local and automatically
55
56 includes long-range van der Waals interactions (see Sec. II A 2 a).
57
58
59
60
Page 19 of 68 AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2
19
1
2
3 Within the ACFD framework a formally exact expression for the XC energy Exc of an electronic
4
5 system can be derived:
6 Z 1 Z Z ∞
h̄ ′ e2 ′ ′
7 Exc =− dλ drdr × χλ (r, r , iu)du + δ(r − r )n(r) , (21)
2π 0 |r − r′ | 0
8
9 where h̄ = h/2π and h is the Planck constant, χλ (r, r′ ; iu) is the density response function at
10
11 imaginary frequency iu of a system whose electrons interact via a scaled Coulomb interaction, i.e.,
12 λe2 /|r−r′ |, and are subject to a local potential such that the electronic density n(r) is independent
13
14 of λ, and is thus equal to the ground-state density of the fully interacting system (λ = 1). The
15
16 XC energy, Eq. (21), can be further separated into a KS exact-exchange energy Exx , Eq. (6), and
17 a correlation energy Ec . The former is routinely evaluated as in any hybrid functional calculation
18
19 (see Sec. II A 1). Using a matrix notation, the latter can be expressed in a compact form in terms
20
of the Coulomb interaction, vc = e2 /|r − r′ |, and of the density response functions:
21
Z 1 Z ∞
22 h̄
23 Ec = − dλ du tr vc [χλ (iu) − χ0 (iu)] . (22)
2π 0 0
24
25 For λ > 0, χλ can be related to the noninteracting density response function χ0 via a Dyson
26
27 equation obtained from TDDFT:
28
λ
29 χλ (iu) = χ0 (iu) + χ0 (iu) λvc + fxc (iu) χλ (iu). (23)
30
31 The exact expression of the XC kernel fxc is unknown, and in practical applications one needs
32
33 to approximate it. In the ACFDT package, the random phase approximation (RPA), obtained by
34 λ = 0, and the RPA plus exact-exchange kernel (RPAx), obtained by setting f λ = λf , are
setting fxc xc x
35
36 implemented. The evaluation of the RPA and RPAx correlation energies is based on an eigenvalue
37
38 decomposition of the non-interacting response functions and of its first-order correction in the limit
39 of vanishing electron-electron interaction [86–88]. Since only a small number of these eigenvalues
40
41 are relevant for the calculation of the correlation energy, an efficient iterative scheme can be used
42
to compute the low-lying modes of the RPA/RPAx density response functions.
43
44 The basic operation required for the eigenvalue decomposition is a number of loosely coupled
45
46 DFpT calculations for different imaginary frequencies and trial potentials. Although the global
47
scaling of the iterative approach is the same as for implementations based on the evaluation of the
48
49 full response matrices (N 4 ), the number of operation involved is 100 to 1000 times smaller [87], thus
50
51 largely reducing the global scaling pre-factor. Moreover, the calculation can be parallelized very
52 efficiently by distributing different trial potentials on different processors or groups of processors.
53
54 In addition, the local EXX and RPA-correlation potentials can be computed through an opti-
55
56 mized effective potential (OEP) scheme fully compatible with the eigenvalue decomposition strategy
57
58
59
60
AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2 Page 20 of 68
20
1
2
3 adopted for the evaluation of the EXX/RPA energy. Iterating the energy and the OEP calculations
4
5 and using an effective mixing scheme to update the KS potential, a self-consistent minimization of
6 the EXX/RPA functional can be achieved [89].
7
8
9
10 B. Linear response and excited states without virtual orbitals
11
12
13 One of the key features of modern DFT implementations is that they do not require the cal-
14
15
culation of virtual (unoccupied) orbitals. This idea, first pioneered by Car and Parrinello in their
16 landmark 1985 paper [53] and later adopted by many groups world-wide, found its way in the
17
18 computation of excited-state properties with the advent of density-functional perturbation theory
19
(DFpT) [90–93]. DFpT is designed to deal with static perturbations and its use is therefore re-
20
21 stricted to those excitations that can be described in the Born-Oppenheimer approximation, such
22
23 as lattice vibrations. The main idea underlying DFpT is to represent the linear response of KS
24 orbitals to an external perturbation as generic orbitals satisfying an orthogonality constraint with
25
26 respect to the occupied-state manifold and a self-consistent Sternheimer equation [94, 95], rather
27
28 than as linear combinations of virtual orbitals (which would require the computation of all, or a
29 large number, of them).
30
31 Substantial progress has been made over the past decade, allowing one to extend DFpT to the
32
33 dynamical regime, and thus simulate sizable portions of the optical and loss spectra of complex
34 molecular and extended systems, without making any explicit reference to their virtual states.
35
36 Although the Sternheimer approach can be easily extended to time-dependent perturbations [96–
37
38 98], its use is hampered in practice by the fact that a different Sternheimer equation has to be
39 solved for each different value of the frequency of the perturbation. When the perturbation acting
40
41 on the system vanishes, the frequency-dependent Sternheimer equation becomes a non-Hermitian
42
eigenvalue equation, whose eigenvalues are the excitation energies of the system. In the TDDFT
43
44 community, this equation is known as the Casida equation [99, 100], which is the immediate trans-
45
46 lation to the DFT parlance of the time-dependent Hartree-Fock equation [101]. This approach to
47
excited states is optimal in those cases where one is interested in a few excitations only, but can
48
49 hardly be extended to continuous spectra, such as those arising in extended systems or above the
50
51 ionization threshold of even finite ones. In those cases where extended portions of a continuous
52 spectrum is required, a new method has been developed, based on the Lanczos (bi-) orthogonaliza-
53
54 tion algorithm, and dubbed the Liouville-Lanczos approach to time-dependent density-functional
55
56 perturbation theory (TDDFpT). This method allows one to reuse intermediate products of an
57
58
59
60
Page 21 of 68 AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2
21
1
2
3 iterative process, essentially identical to that used for static perturbations, to build dynamical re-
4
5 sponse functions from which spectral properties can be computed for a whole wide spectral range
6 at once [21, 22]. A similar approach to linear optical spectroscopy was proposed later, based on
7
8 the multi-shift conjugate gradient algorithm [102], instead of Lanczos. This powerful idea has been
9
generalized to the solution of the Bethe-Salpeter equation, which is formally very similar to the
10
11 eigenvalue equations arising in TDDFpT [103–105], and to the computation of the polarization
12
13 propagator and self-energy operator appearing in the GW equations [28, 29, 106]. It is presently
14
exploited in several components of the Quantum ESPRESSO distribution, as well as in other
15
16 advanced implementations of many-body perturbation theory [106].
17
18
19
20 1. Static perturbations and vibrational spectroscopy
21
22
23 The computation of vibrational properties in extended systems is one of the traditional fields
24
25 of application of DFpT, as thoroughly described, e.g., in Ref. 93. The latest releases of Quantum
26
ESPRESSO feature the linear-response implementation of several new functionals in the van der
27
28 Waals and DFT+U families. Explicit expressions of the XC kernel, implementation details, and a
29
30 thorough benchmark are reported in Ref. 107 for the first case. As for the latter, DFpT+U has
31 been implemented for both the Dudarev [83] and DFT+U+J0 functionals [15], allowing one to
32
33 account for electronic localization effects acting selectively on specific phonon modes at arbitrary
34
35 wave-vectors, thus substantially improving the description of the vibrational properties of strongly
36 correlated systems with respect to “standard” LDA/GGA functionals. The current implementa-
37
38 tion allows for both norm-conserving and ultrasoft pseudopotentials, insulators and metals alike,
39
also including the spin-polarized case. The implementation of DFpT+U requires two main addi-
40
41 tional ingredients with respect to standard DFpT [108]. First, the dynamical matrix contains an
42
43 additional term coming from the second derivative of the Hubbard term EU with respect to the
44
atomic positions (denoted λ or µ), namely:
45
46
X δmm′ X
47 µ λ I
∆ (∂ EU ) = U − nmm′ ∆µ ∂ λ nIσ
Iσ
mm′ − U I ∆µ nIσ λ Iσ
mm′ ∂ nmm′ , (24)
48 2
Iσmm′ Iσmm
′
49
50
51 where the notations are the same as in Eq. (12). The symbols ∂ and ∆ indicate, respectively, a
52 bare derivative (leaving the KS wavefunctions unperturbed) and a total derivative (including also
53
54 linear-response contributions). Second, in order to obtain a consistent electronic density response
55
56 to the atomic displacements from the DFT+U ground state, the perturbed KS potential ∆VSCF
57
58
59
60
AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2 Page 22 of 68
22
1
2
3 in the Sternheimer equation is augmented with the Hubbard perturbed potential ∆λ VU :
4 h
5 X δmm′ i
λ I
6 ∆ VU = U − nmm′ × |∆λ φIm′ ihφIm | + |φIm′ ih∆λ φIm |
Iσ
2
7 Iσmm′
X
8 − U I ∆λ nIσ I I
mm′ |φm′ ihφm |, (25)
9 Iσmm′
10
11 where the notations are the same as in Eq. (13). The unperturbed Hamiltonian in the Sternheimer
12
13 equation is the DFT+U Hamiltonian (including the Hubbard potential VU ). More implementation
14 details will be given in a forthcoming publication [109].
15
16 Applications of DFpT+U include the calculation of the vibrational spectra of transition-metal
17
monoxides like MnO and NiO [108], investigations of properties of materials of geophysical interest
18
19 like goethite [110, 111], iron-bearing [112, 113] and aluminum-bearing bridgmanite [114]. These
20
21 results feature a significantly better agreement with experiment of the predictions of various lattice-
22
dynamical properties, including the LO-TO and magnetically-induced TO splittings, as compared
23
24 with standard LDA/GGA calculations.
25
26
27 2. Dynamic perturbations: optical, electron energy loss, and magnetic spectroscopies
28
29
30 Electronic excitations can be described in terms of the dynamical charge- and spin-density
31
32 susceptibilities, which are accessible to TDDFT [115, 116]. In the linear regime the TDDFT
33 equations can be solved using first-order perturbation theory. The time Fourier transform of the
34
35 charge-density response, ñ′ (r, ω), is determined by the projection over the unoccupied-state mani-
36 ′ (r, ω),
37 fold of the Fourier transforms of the first-order corrections to the one-electron orbitals, ψ̃kν
38 [21–24, 117]. For each band index (kν), two response orbitals xkν and ykν can be defined as
39
40 1 ′ ′∗
41 xkν (r) = Q̂ ψ̃kν (r, ω) + ψ̃−kν (r, −ω) (26)
2
42 1 ′∗
43 ykν (r) = Q̂ ψ̃kν (r, ω) − ψ̃−kν (r, −ω) , (27)
2
44
45 where Q̂ is the projector on the unoccupied-state manifold. The response orbitals xkν and ykν
46
47 can be collected in so-called batches X = {xkν } and Y = {ykν }, which uniquely determine the
48 response density matrix. In a similar way, the perturbing potential V̂ ′ can be represented by the
49
50 batch Z = {zkν } = {Q̂V̂ ′ ψkν }. Using these definitions, the linear-response equations of TDDFpT
51
52 take the simple form:
53
X 0 0 D̂
54
h̄ω − L̂ · = , L̂ = , (28)
55
Y Z D̂ + K̂ 0
56
57
58
59
60
Page 23 of 68 AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2
23
1
2
3 where the super-operators D̂ and K̂, which enter the definition of the Liouvillian super-operator,
4
5 L̂, are defined in terms of the unperturbed Hamiltonian and of the perturbed Hartree-plus-XC
6 potential [21–24, 117]. This implies that a Liouvillian build costs roughly twice as much as a single
7
8 iteration in time-independent DFpT. It is important to note that D̂ and K̂, and therefore L̂, do
9
not depend on the frequency ω. For this reason, when in Eq. (28) the vector on the right-hand
10
11 side, (0, Z)⊤ , is set to zero, a linear eigenvalue equation is obtained (Casida’s equation).
12
13 The quantum Liouville equation (28) can be seen as the equation for the response density matrix
14
operator ρ̂′ (ω), namely (h̄ω− L̂)· ρ̂′ (ω) = [V̂ ′ , ρ̂◦ ], where [·, ·] is the commutator and ρ̂◦ is the ground-
15
16 state density matrix operator. With this at hand, we can define a generalized susceptibility χAV (ω),
17
18 which characterizes the response of an arbitrary one-electron Hermitian operator  to the external
19
perturbation V̂ ′ as
20
21 h i D E
22 χAV (ω) = Tr Âρ̂′ (ω) = Â (h̄ω − L̂)−1 · [V̂ ′ , ρ̂◦ ] , (29)
23
24
25 where h·|·i denotes a scalar product in operator space. For instance, when both  and V̂ ′ are
26
one of the three Cartesian components of the dipole (position) operator, Eq. (29) gives the dipole
27
28 polarizability of the system, describing optical absorption spectroscopy [21, 22]; setting  and V̂ ′
29
30 to one of the space Fourier components of the electron charge-density operator would correspond to
31 the simulation of electron energy loss or inelastic X-ray scattering spectroscopies, giving access to
32
33 plasmon and exciton excitations in extended systems [25, 26]; two different Cartesian components of
34
35 the Fourier transform of the spin polarization would give access to spectroscopies probing magnetic
36 excitations (e.g. inelastic neutron scattering or spin-polarized electron energy loss) [118], and so on.
37
38 When dealing with macroscopic electric fields, the dipole operator in periodic boundary conditions
39
is treated using the standard DFpT prescription, as explained in Refs. [119, 120].
40
41 The Quantum ESPRESSO distribution contains several codes to solve the Casida’s equation
42
43 or to directly compute generalized susceptibilities according to Eq. (29) and by solving Eq. (28)
44
using different approaches for different pairs of Â/V̂ ′ , corresponding to different spectroscopies. In
45
46 particular, Eq. (28) can be solved iteratively using the Lanczos recursion algorithm, which allows
47
48 one to avoid computationally expensive inversion of the Liouvillian. The basic principle of how
49 matrix elements of the resolvent of an operator can be calculated using a Lanczos recursion chain
50
51 has been worked out by Haydock, Heine, and Kelly [121, 122] for the case of Hermitian operators
52
53 and diagonal matrix elements. The quantity of interst can be written as
54 D E
55 gv (ω) = v (h̄ω − L̂)−1 v . (30)
56
57
58
59
60
AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2 Page 24 of 68
24
1
2
3 A chain of vectors is defined by
4
5 |q0 i = 0 (31)
6
7 |q1 i = |vi (32)
8
9 αn = hqn |L̂qn i (33)
10
11 βn+1 |qn+1 i = (L̂ − αn ) |qn i − βn |qn−1 i , (34)
12
13 where βn+1 is given by the condition hqn+1 |qn+1 i = 1. The vectors |qn i created by this recursive
14
15 chain are orthonormal. Furthermore, the operator L̂, written in the basis of these vectors, is
16 tridiagonal. If one limits the chain to the M first vectors |q0 i, |q1 i, · · · , |qM i, then the resulting
17
18 representation of L̂ is a M × M square matrix TM which reads
19
20 α 1 β 2 0 · · · 0
21
..
22 β2 α2 β3 . . . .
23 .. ..
TM = 0 β3 . . 0 . (35)
24 .
25 . ... ...
. αM −1 βM
26
27 0 · · · 0 βM αM
28
29 Using such a truncated representation of L̂, the resolvent in Eq. (30) can be approximated as
30 D E
31 gv (ω) ≈ v (h̄ω − TM )−1 v . (36)
32
33
34 Thanks to the tridiagonal form of TM , the approximate resolvent can finally be written as a
35 continued fraction
36
37 1
38 gv (ω) ≈ . (37)
β22
39 h̄ω − α1 +
h̄ω − α2 + ...
40
41 Note that performing the recursion (31) – (34) is the computational bottleneck of this algorithm,
42
while evaluating the continued fraction in Eq. (37) is very fast. The recursion being independent of
43
44 the frequency ω, a single recursion chain yields information about any desired number of frequen-
45
46 cies, at negligible additional computational cost. It is also important to note that at any stage of
47
the recursion chain, only three vectors need to be kept in memory, namely |qn−1 i, |qn i, and |qn−1 i.
48
49 This is a considerable advantage with respect to the direct calculation of N eigenvectors of L̂ where
50
51 all N vectors need to be kept in memory in order to enforce orthogonality.
52 The Liouvillian L̂ in Eq. (28) is not a Hermitian operator. For this reason, the Lanczos algo-
53
54 rithm presented above cannot be directly applied to the calculation of the generalized susceptibil-
55
56 ity (29). There are two distinct algorithms that can be applied in the non-Hermitian case. The
57
58
59
60
Page 25 of 68 AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2
25
1
2
3 non-Hermitian Lanczos biorthogonalization algorithm [22, 23] amounts to recursively applying the
4
5 operator L̂ and it Hermitian conjugate L̂† to two previous Lanczos vectors |vn i and |wn i. In this
6 way, a pair of bi-orthogonal basis sets is created. The operator L̂ can then be represented in this ba-
7
8 sis as a tridiagonal matrix, similarly to the Hermitian case, Eq. (35). The Liouvillian L̂ of TDDFT
9
belongs to a special class of non-Hermitian operators which are called pseudo-Hermitian [24, 123].
10
11 For such operators, there exists a second recursive algorithm to compute the resolvent – pseudo-
12
13 Hermitian Lanczos algorithm – which is numerically more stable and requires only half the numbers
14
of operations per Lanczos step [24, 123]. Both algorithms have been implemented in Quantum
15
16 ESPRESSO. Because of its speed and numerical stability, the use of the pseudo-Hermitian method
17
18 is recommended.
19
This methodology has also been extended—presently only in the case of absorption spectroscopy—
20
21 to employ hybrid functionals [24, 103, 104] (see Sec. II A 1). In this case the calculation requires
22
23 the evaluation of the linear response of the non-local Fock potential, which is readily available from
24 the response density matrix, represented by the batches of response orbitals. The corresponding
25
26 hybrid-functional Liouvillian features additional terms with respect to the definition in Eq. (28),
27
28 but presents a similar structure and similar mathematical properties. Accordingly, semi-local and
29 hybrid-functional TDDFpT employ the same numerical algorithms in practical calculations.
30
31 a. Optical absorption spectroscopy The turbo lanczos.x [23, 24] and turbo davidson.x [24]
32
33 codes are designed to simulate the optical response of molecules and clusters. turbo lanczos.x
34 computes the dynamical dipole polarizability [see Eq. (29)] of finite systems over extended frequency
35
36 ranges without ever computing any eigenpairs of the Casida equation. This goal is achieved by
37
applying a recursive non-Hermitian or pseudo-Hermitian Lanczos algorithm. The two flavours
38
39 of the Lanczos algorithm implemented in turbo lanczos.x are particularly suited in those cases
40
41 where one is interested in the spectrum over a wide frequency range comprising a large number of
42
individual excitations. In turbo davidson.x a Davidson-like algorithm [124] is used to selectively
43
44 compute a few eigenvalues and eigenvectors of L̂. This is useful when very few low-lying excited
45
46 states are needed and/or when the excitation eigenvector is explicitly needed, e.g., to compute
47 ionic forces on excited potential energy surfaces, a feature that will be implemented in one of
48
49 the forthcoming releases. Both turbo lanczos.x and turbo davidson.x are interfaced with the
50
51 Environ module [18], to simulate the absorption spectra of complex molecules in solution using
52 the self-consistent continuum solvation model [20] (see Sec. II E 1).
53
54 b. Electron energy loss spectroscopy The turbo eels.x code [26] computes the response of ex-
55
56 tended systems to an incoming beam of electrons or X rays, aimed at simulating electron energy loss
57
58
59
60
AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2 Page 26 of 68
26
1
2
3 (EEL) or inelastic X-ray scattering (IXS) spectroscopies, sensitive to collective charge-fluctuation
4
5 excitations, such as plasmons. Similarly to the description of vibrational modes in a lattice by
6 the PHonon package, here the perturbation can be represented as a sum of monochromatic compo-
7
8 nents corresponding to different momenta, q, and energy transferred from the incoming electrons
9
to electrons of the sample. The quantum Liouville equation (28) in the batch representation can
10
11 be formulated for individual q components of the perturbation, which can be solved independently
12
13 [25]. The recursive Lanczos algorithm is used to solve iteratively the quantum Liouville equation,
14
much like in the case of absorption spectroscopy. The entire EEL/IXS spectrum is obtained in
15
16 an arbitrarily wide energy range (up to the core-loss region) with only one Lanczos chain. Such
17
18 a numerical algorithm allows one to compute directly the diagonal element of the charge-density
19 susceptibility, see Eq. (29), by avoiding computationally expensive matrix inversions and multipli-
20
21 cations characteristic of standard methods based on the solution of the Dyson equation [125]. The
22
23 current version of turbo eels.x allows to explicitly account for spin-orbit coupling effects [126].
24
c. Magnetic spectroscopy The response of the system to a magnetic perturbation is described
25
26 by a spin-density susceptibility matrix, see Eq. (29), labeled by the Cartesian components of
27
28 the perturbing magnetic field and magnetization response, whose poles characterize spin-wave
29 (magnon) and Stoner excitations. The methodology implemented in turbo eels.x to deal with
30
31 charge-density fluctuations has been generalized to spin-density fluctuations so as to deal with
32
33 magnetic (spin-polarized neutron and electron) spectroscopies in extended systems. In the spin-
34 polarized formulation of TDDFpT the time-dependent KS wave functions are two-component
35
σ (r, t)} (σ is the spin index), which satisfy a time-dependent Pauli-type KS equa-
spinors {ψkν
36
37 P σ∗ (r, t)ψ σ ′ (r, t). In-
38 tions and describe a time-dependent spin-charge-density, nσσ′ (r, t) = kν ψkν kν
39 stead of using the latter quantity it is convenient to change variables and use the charge density
40 P P
41 n(r, t) = σ nσσ (r, t) and the spin density m(r, t) = µB σσ′ σ σσ′ nσ′ σ (r, t), where µB is the Bohr
42
magneton and σ is the vector of Pauli matrices. In the linear-response regime, the charge- and
43
44 spin-density response n′ (r, t) and m′ (r, t) are coupled via the scalar and magnetic XC response
45
′ (r, t) and B′ (r, t), which are treated on a par with the Hartree response poten-
potentials Vxc
46 xc
47
tial VH′ (r, t), depending only on n′ (r, t), and which all enter the linear-response time-dependent
48
49 Pauli-type KS equations. The lack of time-reversal symmetry in the ground state means that the
50
51 TDDFpT equations have to be generalized to treat KS spinors at k and −k and various combi-
52 nations with the q vector. Moreover, this also implies that no rotation of batches is possible, as
53
54 in Eqs. (26) and (27), and a generalization of the Lanczos algorithm to complex arithmetics is
55
56 required. At variance with the cumbersome Dyson’s equation approach, which requires the sep-
57
58
59
60
Page 27 of 68 AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2
27
1
2
3 arate calculation and coupling of charge-charge, spin-spin, and charge-spin independent-electron
4
5 polarizabilities, in our approach the coupling between spin and charge fluctuations is naturally
6 accounted for via Lanczos chains for the spinor KS response orbitals. The current implementation
7
8 supports general non-collinear spin-density distributions, which allows us to account for spin-orbit
9
interaction and magnetic anisotropy. All the details about the present formalism will be given in
10
11 a forthcoming publication [118] and the corresponding code will be made available in one of the
12
13 next Quantum ESPRESSO releases.
14
15
16
3. Many-body perturbation theory
17
18
19 Many-body perturbation theory refers to a set of computational methods, based on quantum
20
21 field theory, that are designed to calculate electronic and optical excitations beyond standard
22
23 DFT [125]. The most popular among such methods are the GW approximation and the Bethe-
24 Salpeter equation (BSE) approach. The former is intended to calculate accurate quasiparticle
25
26 excitations, e.g., ionization energies and electron affinities in molecules, band structures in solids,
27
28 and accurate band gaps in semiconductor and insulators. The latter is employed to study optical
29 excitations by including electron-hole interactions.
30
31 In the GW method the XC potential of DFT is corrected using a many-body self-energy con-
32
33 sisting of the product of the electron Green’s function G and the screened Coulomb interaction
34 W [127, 128], which represents the lowest-order term in the diagrammatic expansion of the exact
35
36 electron self-energy. In the vast majority of GW implementations, the evaluation of G and W
37
requires the calculation of both occupied and unoccupied KS eigenstates. The convergence of the
38
39 resulting self-energy correction with respect to the number of unoccupied states is rather slow,
40
41 and in many cases it constitutes the main bottleneck in the calculations. During the past decade
42
there has been a growing interest in alternative techniques which only require the calculation of
43
44 occupied electronic states, and several computational strategies have been developed [29, 129–131].
45
46 The common denominator to all these strategies is that they rely on linear-response DFpT and
47 the Sternheimer equation, as in the PHonon package.
48
49 In Quantum ESPRESSO the GW approximation is realized based on a DFpT represen-
50
51 tation of response and self-energy operators, thus avoiding any explicit reference to unoccupied
52 states. There are two different implementations: the GWL (GW -Lanczos) package [28, 29] and the
53
54 SternheimerGW package [30]. The former focuses on efficient GW calculations for large systems
55
56 (including disordered solids, liquids, and interfaces), and also supports the calculations of optical
57
58
59
60
AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2 Page 28 of 68
28
1
2
3 spectra via the Bethe-Salpeter approach [105]. The latter focuses on high-accuracy calculations of
4
5 band structures, frequency-dependent self-energies, and quasi-particle spectral functions for crys-
6 talline solids. In addition to these, the WEST code [106], not part of the Quantum ESPRESSO
7
8 distribution, relies on Quantum ESPRESSO as an external library to perform similar tasks and
9
to achieve similar goals.
10
11 a. GWL The GWL package consists of four different codes. The pw4gww.x code reads the KS
12
13 wave-functions and charge density previously calculated by PWscf and prepares a set of data which
14
are used by code gww.x to perform the actual GW calculation. While pw4gww.x uses the plane-
15
16 wave representation of orbitals and charges and the same Quantum ESPRESSO environment as
17
18 all other linear response codes, gww.x does not rely on any specific representation of the orbitals.
19 Its parallelization strategy is based on the distribution of frequencies. Only a few basic routines,
20
21 such as the MPI drivers, are common with the rest of Quantum ESPRESSO.
22
23 GWL supports three different basis sets for representing polarisability operators: i) plane wave-
24 basis set, defined by an energy cutoff; ii) the basis set formed by the most important eigenvectors
25
26 (i.e., corresponding to the highest eigenvalues) of the actual irreducible polarisability operator at
27
28 zero frequency calculated through linear response; iii) the basis set formed by the most important
29 eigenvectors of an approximated polarisability operator. The last choice permits the control of the
30
31 balance between accuracy and dimension of the basis. The GW scheme requires the calculation of
32
33 products in real space of KS orbitals with vectors of the polarisability basis. These are represented
34 in GWL through dedicated additional basis sets of reduced dimensions.
35
36 GWL supports only the Γ−point sampling of the BZ and considers only real wave-functions. How-
37
ever, ordinary k-point sampling of the BZ can be used for the long-range part of the (symmetrized)
38
39 dielectric matrix. These terms are calculated by the head.x code. In this way reliable calculations
40
41 for extended materials can be performed using quite small simulation cells (with cell edges of the
42
order of 20 Bohr). Self-consistency is implemented in GWL, although limited to the quasi-particle
43
44 energies; the so-called vertex term, arising in the diagrammatic expansion of the self-energy, is not
45
46 yet implemented.
47
Usually ordinary GW calculations for transition elements require the explicit inclusion of semi-
48
49 core orbitals in the valence manifold, resulting in a significantly higher computational cost. To
50
51 cope with this issue, an approximate treatment of semicore orbitals has been introduced in GWL as
52 described in Ref. 132. In addition to collinear spin polarization, GWL provides a fully relativistic
53
54 non collinear implementation relying on the scalar relativistic calculation of the screened Coulomb
55
56 interactions [133].
57
58
59
60
Page 29 of 68 AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2
29
1
2
3 The bse.x code of the GWL package performs BSE calculations and permits to evaluate either
4
5 the entire frequency-dependent complex dielectric function through the Lanczos algorithm or a
6 discrete set of excited states and their energies through a conjugate gradient minimization. In
7
8 contrast to ordinary implementations, bse.x scales as N 3 instead of N 4 with respect to the system
9
size N (e.g., the number of atoms) thanks to the use of maximally localized Wannier functions
10
11 for representing the valence manifold [105]. The bse.x code, apart from reading the screened
12
13 Coulomb interaction at zero frequency from a gww.x calculation, works as a separate code and uses
14
the Quantum ESPRESSO environment. Therefore it could be easily be interfaced with other
15
16 GW codes.
17
18 b. SternheimerGW The SternheimerGW package calculates the frequency-dependent GW
19
self-energy and the corresponding quasiparticle corrections at arbitrary k-points in the BZ. This
20
21 feature enables accurate calculations of band structures and effective masses without resorting to
22
23 interpolation. The availability of the complete GW self-energy (as opposed to the quasiparticle
24
shifts) makes it possible to calculate spectral functions, for example including plasmon satellites
25
26 [134]. The spectral function can be directly compared to angle-resolved photoelectron spectroscopy
27
28 (ARPES) experiments. In SternheimerGW the screened Coulomb interaction W is evaluated for
29 wave-vectors in the irreducible BZ by exploiting crystal symmetries. Calculations of G and W for
30
31 multiple frequencies ω rely on the use of multishift linear system solvers that construct solutions
32
33 for all frequencies from the solution of a single linear system [131, 135]. This method is closely
34 related to the Lanczos approach. The convolution in the frequency domain required to obtain the
35
36 self energy from G and W can be performed either on the real axis or the imaginary axis. Padé
37
functions are employed to perform approximate analytic continuations from the imaginary to the
38
39 real frequency axis; the standard Godby-Needs plasmon pole model is also available to compare
40
41 with literature results. The stability and portability of the SternheimerGW package are verified
42
via a test-suite and a Buildbot test-farm (see Sec. III F).
43
44
45
46
47 C. Other spectroscopies
48
49 1. QE-GIPAW: Nuclear magnetic and electron paramagnetic resonance
50
51
52 The QE-GIPAW package allows for the calculation of various physical parameters measured in
53
54 nuclear magnetic resonance (NMR) and electron paramagnetic resonance (EPR) spectroscopies.
55
56 These encompass (i) NMR chemical shift tensors and magnetic susceptibility, (ii) electric field
57
58
59
60
AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2 Page 30 of 68
30
1
2
3 gradient (EFG) tensors, (iii) EPR g-tensor, and (iv) hyperfine coupling tensor.
4
5 In QE-GIPAW, the NMR and EPR parameters are obtained from the orbital linear response to
6 an external uniform magnetic field. The response depends critically upon the exact shape of the
7
8 electronic wavefunctions near the nuclei. Thus, all-electron wavefunctions are reconstructed from
9
10 the pseudo-wavefunctions in a gauge- and translationally invariant way using the gauge-including
11 projector augmented-wave (GIPAW) method [136]. The description of a uniform magnetic field
12
13 within periodic boundary conditions is achieved by the long-wavelength limit (q ≪ 1) of a sinu-
14
soidally modulated field in real space. In practice, for each k point, we calculate the first order
15
16 change of the wavefunctions at k + q, where q runs over a star of 6 points. The magnetic suscepti-
17
18 bility and the induced orbital currents are then evaluated by finite differences, in the limit of small
19
q. The induced magnetic field at the nucleus, which is the central quantity in NMR, is obtained
20
21 from the induced current via the Biot-Savart law. In QE-GIPAW, the NMR orbital chemical shifts
22
23 and magnetic susceptibility can be calculated both for insulators [34] and for metals [137] (the ad-
24 ditional contribution for metals coming from the spin-polarization of valence electrons, namely the
25
26 Knight shift, can also be computed but it is not yet ready for production at the time of writing).
27
28 Similarly to the NMR chemical shift, the EPR g-tensor is calculated as the cross product of the
29 induced current with the spin-orbit operator [138].
30
31 For the quantities defined in zero magnetic field, namely the EFG, Mössbauer and relativistic
32
33 hyperfine tensors, the usual PAW reconstruction of the wavefunctions is sufficient and these are
34 computed as described in Refs. [139, 140]. The hyperfine Fermi contact term, proportional to
35
36 the spin density evaluated at the nuclear coordinates, however requires the relaxation of the core
37
electrons in response to the magnetization of valence electrons. We implemented the core relaxation
38
39 in perturbation theory, according to Ref. 141. Basically we compute the spherically averaged PAW
40
41 spin density around each atom. Then we compute the change of the XC potential, ∆VXC , on a
42
radial grid, and compute in perturbation theory the core radial wavefunction, both for spin up and
43
44 spin down. This provides an extra contribution to the Fermi contact, in most cases opposite in
45
46 sign to and as large as that of valence electrons.
47
By combining the quadrupole coupling constants derived from EFG tensors and hyperfine split-
48
49 tings, electron nuclear double resonance (ENDOR) frequencies can be calculated. Applications
50
51 highlighting all these features of the QE-GIPAW package can be found in Ref. 142. These quantities
52 are also needed to compute NMR shifts in paramagnetic systems, like novel cathode materials for
53
54 Li batteries [143]. Previously restricted to norm-conserving pseudopotentials only, all features are
55
56 now applicable using any kind of pseudization scheme and to PAW, following the theory described
57
58
59
60
Page 31 of 68 AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2
31
1
2
3 in [144]. The use of smooth pseudopotentials allows for the calculation of chemical shifts in systems
4
5 with several hundreds of atoms [145].
6 The starting point of all QE-GIPAW calculations is a previous calculation of KS orbitals via
7
8 PWscf. Hence, much like other linear response routines, the QE-GIPAW code uses many subroutines
9
10 of PWscf and of the linear response module. As usually done in linear response methods, the
11 response of the unoccupied states is calculated using the completeness relation between occupied
12
13 and unoccupied manifolds [146]. As a result, for insulating as well as metallic systems, the linear
14
response of the system is efficiently obtained without the need to include virtual orbitals.
15
16 As an alternative to linear response method, the theory of orbital magnetization via Berry
17
18 curvature [147, 148] can be used to calculate the NMR [149] and EPR parameters [150]. Specifically,
19
it can be shown that the variation of the orbital magnetization M orb with respect to spin flip is
20
2
21 directly related to the g-tensor: gµν = ge − αS eµ · Morb (eν ), where ge = 2.002319, α is the fine
22
23 structure constant, S is the total spin, e are Cartesian unit vectors, provided that the spin-orbit
24 interaction is explicitly considered in the Hamiltonian. This converse method of calculating the
25
26 g-tensor has been implemented in an older version of QE-GIPAW. It is especially useful in critical
27
28 cases where linear response is not appropriate, e.g., systems with quasi-degenerate HOMO-LUMO
29 levels. A demonstration of this method applied to delocalized conduction band electrons can be
30
31 found in Ref. 151.
32
33 The converse method will be shortly ported into the current QE-GIPAW and we will explore the
34 possibility of computing in a converse way the Knight shift as the response to a small nuclear
35
36 magnetic dipole. The present version of the code allows for parameter-free calculations of g-
37
38 tensors, hyperfine splittings, and ENDOR frequencies also for systems with total spin S > 1/2.
39 Such triplet or even higher-spin states give rise to additional spin-spin interactions, that can be
40
41 calculated within the magnetic dipole-dipole interaction approximation. This interaction results
42
in a fine structure which can be measured in zero magnetic field. This so-called zero-field splitting
43
44 is being implemented following the methodology described in Ref. [152].
45
46
47
48 2. XSpectra: L2,3 X-ray absorption edges
49
50
51 The XSpectra code [153, 154] has been extended to the calculation of X-ray absorption spectra
52 at the L2,3 -edges [155]. The XSpectra code uses the self-consistent charge density produced by
53
54 PWscf and acts as a post processing tool [153, 154, 156]. The spectra are calculated for the L2 edge,
55
56 while the L3 edge is obtained by multiplying by two (single-particle statistical branching ratio) the
57
58
59
60
AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2 Page 32 of 68
32
1
2
3 L2 edge spectrum and by shifting it by the value of the spin-orbit splitting of the 2p1/2 core levels of
4
5 the absorbing atom. The latter can be taken either from a DFT relativistic all-electron calculation
6 on the isolated atom, or from experiments.
7
8 In practice, the L3 edge is obtained from the L2 with the spectra correction.x tool. Such
9
10 tool contains a table of experimental 2p spin-orbit splittings for all the elements. In addition to
11 computing L3 edges, spectra correction.x allows one to remove states from the spectrum below
12
13 a certain energy, and to convolute the calculated spectrum with more elaborate broadenings. These
14
operations can be applied to any edge.
15
16 To evaluate the X-ray absorption spectrum for a system containing various atoms of the same
17
18 species but in different chemical environments, one has to sum the contribution by each atom.
19
This could be the case, for example of an organic molecule containing various C atoms in in-
20
21 equivalent sites. Such individual contributions can be computed separately by XSpectra, and the
22
23 tool molecularnexafs.x allows one to perform their weighted sum taking into account the proper
24 energy reference (initial and final state effects) [157, 158]. One should in fact notice that the refer-
25
26 ence for initial state effects will depend upon the environment (e.g., the vacuum level for gas phase
27
28 molecules, or the Fermi level for molecules adsorbed on a metal).
29
30
31 D. Other lattice-dynamical and thermal properties
32
33
34 1. thermo pw: Thermal properties from the quasi-harmonic approximation
35
36
37 thermo pw [31] is a collection of codes aimed at computing various thermodynamical quantities
38
in the quasi-harmonic approximation. The key ingredient is the vibrational contribution, Fph , to
39
40 the Helmholtz free energy at temperature T :
41
42 X
h̄ωqν
43 Fph = kB T ln 2 sinh , (38)
q,ν
2kB T
44
45
46 where ωq,ν are phonon frequencies at wave-vector q, kB is the Boltzmann constant. thermo pw
47
works by calling Quantum ESPRESSO routines from PWscf and PHonon, that perform one of
48
49 the following tasks: i) compute the DFT total energy and possibly the stress for a given crystal
50
51 structure; ii) compute for the same system the electronic band structure along a specified path; iii)
52 compute for the same system phonon frequencies at specified wave-vectors. Using such quantities,
53
54 thermo pw can calculate numerically the derivatives of the free energy with respect to the external
55
56 parameters (e.g., different volumes). Several calls to such routines, with slightly different geome-
57
58
59
60
Page 33 of 68 AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2
33
1
2
3 tries, are typically needed in a run. All such tasks can be independently performed on different
4
5 groups of processors (called images).
6 When the tasks carried out by different images require approximately the same time, or when
7
8 the amount of numerical work needed to accomplish each task is easy to estimate a priori, it would
9
10 be possible to statically assign tasks to images at the beginning of the run so that images do not
11 need to communicate during the run. However, such conditions are seldom met in thermo pw and
12
13 therefore it would be impossible to obtain a good load balancing between images. thermo pw takes
14
advantage of an engine that controls these different tasks in an asynchronous way, dynamically
15
16 assigning tasks to the images at run time.
17
18 At the core of thermo pw there is a module mp asyn, based on MPI routines, that allows for
19
asynchronous communication between different images. One of the images is the “master” and
20
21 assigns tasks to the other images (the “slaves”) as soon as they communicate that they have
22
23 accomplished the previously assigned task. The master image also accomplishes some tasks but
24
once in a while, with negligible overhead, it checks if there is an image available to do some work;
25
26 if so, it assigns to it the next task to do. The code stops when the master recognizes that all
27
28 the tasks have been done and communicates this information to the slaves. The routines of this
29 communication module are quite independent of the thermo pw variables and in principle can be
30
31 used in conjunction with other codes to set up complex workflows to be executed in a massively
32
33 parallel environment. It is assumed that each processor of each image reads the same input and
34 that the only information that the image needs to synchronize with the other images is which tasks
35
36 to do. The design of thermo pw makes it easily extensible to the calculation of new properties in
37
an incremental way.
38
39
40
41 2. thermal2: phonon-phonon interaction and thermal transport
42
43
44 Phonon-phonon interaction (ph-ph) plays a role in different physical phenomena: phonon life-
45
46 time (and its inverse, the linewidth), phonon-driven thermal transport in insulators or semi-
47
metals, thermal expansion of materials. Ph-ph is possible because the harmonic Hamiltonian
48
49 of ionic motion, of which phonons are stationary states, is only approximate. At first order
50
51 in perturbation theory we have the third derivative of the total energy with respect to three
52 phonon perturbations, which we compute ab-initio. This calculation is performed by the d3q
53
54 code via the 2n + 1 theorem [32, 159]. The d3q code is an extension of the old D3 code,
55
56 which only allowed the calculation of zone-centered phonon lifetimes and of thermal expansion.
57
58
59
60
AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2 Page 34 of 68
34
1
2
3 The current version can compute the three-phonon matrix element of arbitrary wave-vectors
4
5 D(3) (q1 , q2 , q3 ) = ∂ 3 E/∂uq1 ∂uq2 ∂uq3 , where u are the phonon displacement patterns, the mo-
6 mentum conservation rule imposes q1 + q2 + q3 = 0. The current version of the code can treat any
7
8 kind of crystal geometry, metals and insulators, both local density and gradient-corrected func-
9
tionals, and multi-projector norm-conserving pseudopotentials. Ultrasoft pseudopotentials, PAW,
10
11 spin polarization and non-collinear magnetization are not implemented. Higher order derivative of
12
13 effective charges[160] are not implemented.
14
15 The ph-ph matrix elements, computed from linear response, can be transformed, via a general-
16
17 ized Fourier transform, to the real-space three-body force constants which could be computed in a
18 supercell by finite difference derivation:
19
20
21
22 X ′ ′′ ·q
D(3) (q1 , q2 , q3 ) = e−2iπ(R ·q2 +R 3)
F (3) (0, R′ , R′′ ), (39)
23
R′ ,R′′
24
25
26
27 where F 3 (0, R′ , R′′ ) = ∂ 3 E/∂τ ∂(τ ′ + R′ )∂(τ ′′ + R′′ ) is the derivative of the total energy w.r.t.
28
29 nuclear positions of ions with basis τ , τ ′ , τ ′′ from the unit cells identified by direct lattice vectors
30
0 (the origin), R′ and R′′ . The sum over R′ and R′′ runs, in principle, over all unit cells, however
31
32 the terms of the sum quickly decay as the size of the triangle 0 − R′ − R′′ increases. The real-space
33
34 finite-difference calculation, performed by some external softwares[168], has some advantages: it
35 is easier to implement and it can readily include all the capabilities of the self-consistent code; on
36
37 the other hand it is much more computationally expensive than the linear-response method we
38
39 use, its cost scaling with the cube of the supercell volume, or the 9th power of the number of side
40 units of an isotropic system. We use the real-space formalism to apply the sum rule corresponding
41
42 to translational invariance to the matrix elements. This is done with an iterative method that
43
44 alternatively enforces the sum rule on the first matrix index and restores the invariance on the
45 order of the derivations. We also use the real-space force constants to Fourier-interpolate the ph-
46
47 ph matrices on a finer grid, assuming that the contribution from triangles 0 − R′ − R′′ which we
48
have not computed is zero; it is important in this stage to consider the periodicity of the system.
49
50
51 From many-body theory we get the first-order phonon linewidth[161] (γν ) of mode ν at q, which
52 is a sum over all the possible Nq ’s final and initial states (q′ ,ν ′ ,ν ′′ ) with conservation of energy (h̄ω)
53
54 and momentum (q′′ = −q − q′ ), Bose-Einstein occupations (n(q, ν) = (exp(h̄ωq,ν /kB T ) − 1)−1 )
55
56 and an amplitude V (3) , proportional to the D(3) matrix element but renormalized with phonon
57
58
59
60
Page 35 of 68 AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2
35
1
2
3 energies and ion masses:
4
5 π X
6 γq,ν = V (3) (qν, q′ ν ′ , q′′ ν ′′ ) × (40)
h̄2 Nq q′ ,ν ′ ,ν ′′
7
8 (1 + nq′ ,ν ′ + nq′′ ,ν ′′ )δ(ωq,ν − ωq′ ,ν ′ − ωq′′ ,ν ′′ ) + 2(nq′ ,ν ′ − nq′′ ,ν ′′ )δ(ωq,ν + ωq′ ,ν ′ − ωq′′ ,ν ′′ ) .
9
10
11 This sum is computed in the thermal2 suite, which is bundled with d3q. A similar expression
12 can be written for the phonon scattering probability which appears in the Boltzmann transport
13
14 equation. In order to properly converge the integral of the Dirac delta function, we express it
15
16 as finite-width Gaussian function and use an interpolation grid. This equation can be solved
17 either exactly or in the single mode approximation (SMA) [162]. The SMA is a good tool at
18
19 temperatures comparable to or larger than the Debye temperature, but is known to be inadequate
20
at low temperatures [163, 164] or in the case of 2D materials [165–167]. The exact solution is
21
22 computed using a variational form, minimized via a preconditioned conjugate gradient algorithm,
23
24 which is guaranteed to converge, usually in less than 10 iterations [33].
25
On top of intrinsic ph-ph events, the thermal2 codes can also treat isotopic disorder and
26
27 substitution effects and finite transverse dimension using the Casimir formalism. In addition to
28
29 using our force constants from DFpT, the code supports importing 3-body force constants computed
30
via finite differences with the thirdorder.py code [168]. Parallelization is implemented with both
31
32 MPI (with great scalability up to thousand of CPUs) and OpenMP (optimal for memory reduction).
33
34
35
36 3. EPW: Electron-phonon coefficients from Wannier interpolation
37
38
The electron-phonon-Wannier (EPW) package is designed to calculate electron-phonon coupling
39
40 using an ultra-fine sampling of the BZ by means of Wannier interpolation. EPW employs the relation
41
42 between the electron-phonon matrix elements in the Bloch representation gmnν (k, q), and in the
43 Wannier representation, gijκα (R, R′ ) [169],
44
45 X X
′ †
46 gmn (k, q) = ei(k·R+q·R ) Umik+q gijκα (R, R′ ) Ujnk uκα,qν , (41)
47 R,R′ ijκα
48
49 in order to interpolate from coarse k-point and q-point grids into dense meshes. In the above
50
51 expression k and q represent the electron and phonon wave-vector, respectively, the indices m, n
52 and i, j refer to Bloch states and Wannier states, respectively, and R, R′ are direct lattice vectors.
53
54 The matrices Umik are unitary transformations and the vector uκα,qν is the displacements of the
55
56 atom κ along the Cartesian direction α for the phonon of wavevector q and branch ν. The
57
58
59
60
AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2 Page 36 of 68
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
Page 37 of 68 AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2
37
1
2
3 The code supports calculations of electron-phonon couplings in the presence of spin-orbit coupling.
4
5 The current version does not support spin-polarized calculations, ultrasoft pseudopotentials nor
6 the PAW method. As shown in Fig. 2(a), epw.x scales reasonably up to 2,000 cores using MPI.
7
8 A test farm (see Sec. III F) was set up to ensure portability of the code on many architecture and
9
compilers. Detailed information about the EPW package can be found in Ref. 27.
10
11
12
13 4. Non-perturbative approaches to vibrational spectroscopies
14
15
16 Although DFpT is in many ways the state of the art in the simulation of vibrational spec-
17
18 troscopies in extended systems, and in fact one of defining features of Quantum ESPRESSO,
19
20 it is sometimes convenient to compute lattice-dynamical properties, the response to macroscopic
21 electric fields, or combinations thereof (such as e.g., the infrared or Raman activities), using non-
22
23 perturbative methods. This is so because DFpT requires the design of dedicated codes, which have
24
to be updated and maintained separately, and which therefore not always follow the pace of the
25
26 implementation of new features, methods, and functionals (such as e.g., DFT+U, vdW-DF, hybrid
27
28 functionals, or ACBN0 [172]) in their ground-state counterparts. Such a non-perturbative approach
29 is followed in the FD package, which implements the “frozen-phonon” method for the computation of
30
31 phonons and vibrational spectra: the interatomic Force Constants (IFCs) and electronic dielectric
32
33 constant are computed as finite differences of forces and polarizations, with respect to finite atomic
34 displacements or external electric fields, respectively [173, 174]. IFC’s are computed in two steps:
35
36 first, code fd.x generates the symmetry-independent displacements in an appropriate supercell;
37
38 after the calculations for the various displacements are completed, code fd ifc.x reads the forces
39 and generates IFC’s. These are further processed in matdyn.x, where non-analytical long-ranged
40
41 dipolar terms are subtracted out from the IFCs following the recipe of Ref. 175. The calculation
42
of dielectric tensor and of the Born effective charges proceeds from the evaluation of the electronic
43
44 susceptibility following the method proposed by Umari and Pasquarello [174], where the introduc-
45
E [ψ] = E 0 [ψ] − E · (Pion + Pel [ψ]) allows to compute the
tion of a non local energy functional Etot
46
47
electronic structure for periodic systems under finite homogeneous electric fields. E 0 is the ground
48
49 state total energy in the absence of external electric fields; Pion is the usual ionic polarization,
50
51 and Pel is given as a Berry phase of the manifold of the occupied bands [176]. The high-frequency
52 dielectric tensor ǫ∞ is then computed as ǫ∞ ∗
ij = δi,j + 4πχij , while Born effective-charge tensors ZI,ij
53
54 are obtained as the polarization induced along the direction i by a unit displacement of the I-th
55
56 atom in the j direction; alternatively, as the force induced on atom I by an applied electric field,
57
58
59
60
AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2 Page 38 of 68
38
1
2
3 E.
4
5 The calculation of the Raman spectra proceeds along similar lines. Within the finite-field
6 approach, the Raman tensor is evaluated in terms of finite differences of atomic forces in the
7
(1)
8 presence of two electric fields [177]. In practice, the tensor χijIk is obtained from a set of calculations
9 (1)
10 combining finite electric fields along different Cartesian directions. χijIk is then symmetrized to
11 recover the full symmetry of the structure under study.
12
13
14
15 E. Multi-scale modeling
16
17
18 1. Environ: Self-Consistent Continuum Solvation embedding model
19
20
21 Continuum models are among the most popular multiscale approaches to treat solvation effects
22
23 in the quantum-chemistry community [178]. In this class of models, the degrees of freedom of
24 solvent molecules are effectively integrated out and their statistically-averaged effects on the solute
25
26 are mimicked by those of a continuous medium surrounding a cavity in which the solute is thought to
27
dwell. The most important interaction usually handled with continuum models is the electrostatic
28
29 one, in which the solvent is described as a dielectric continuum characterized by its experimental
30
31 dielectric permittivity.
32
Following the original work of Fattebert and Gygi [179] , a new class of continuum models was
33
34 designed, in which a smooth transition from the QM-solute region to the continuum-environment
35
36 region of space is introduced and defined in terms of the electronic density of the solute. The
37
corresponding free energy functional is optimized using a fully variational approach, leading to
38
39 a generalized Poisson equation that is solved via a multi-grid solver[179]. This approach, ideally
40
41 suited for plane-wave basis sets and tailored for MD simulations, has been featured in the Quantum
42 ESPRESSO distribution since v. 4.1. This approach was recently revised[18], by defining an
43
44 optimally smooth QM/continuum transition, reformulated in terms of iterative solvers[180] and
45
46 extended to handle in a compact and effective way non-electrostatic interactions [18]. The resulting
47 self-consistent continuum solvation (SCCS) model, based on a very limited number of physically
48
49 justified parameters, allows one to reproduce experimental solvation energies for aqueous solutions
50
51 of neutral [18] and charged[181] species with accuracies comparable to or higher than state-of-the-
52 art quantum-chemistry packages.
53
54 The SCCS model involves different embedding terms, each representing a specific interaction
55
56 with an external continuum environment and contributing to the total energy, KS potential, and
57
58
59
60
Page 39 of 68 AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2
39
1
2
3 interatomic forces of the embedded QM system. Every contribution may depend explicitly on
4
5 the ionic (rigid schemes) and/or electronic (self-consistent or soft schemes) degrees of freedom
6 of the embedded system. All the different terms are collected in the stand-alone Environ mod-
7
8 ule [182]. The present discussion refers to release 0.2 of Environ, which is compatible with Quan-
9
tum ESPRESSO starting from versions 5.1. The module requires a separate input file with
10
11 the specifications of the environment interactions to be included and of the numerical parameters
12
13 required to compute their effects. Fully parameterized and tuned SCCS environments, e.g., cor-
14
responding to water solutions for neutral and charged species, are directly available to the users.
15
16 Otherwise individual embedding terms can be switched on and tuned to the specific physical condi-
17
18 tions of the required environment. Namely, the following terms are currently featured in Environ:
19
20 • Smooth continuum dielectric, with the associated generalized Poisson problem solved via a
21
22 direct iterative approach or through a preconditioned conjugate gradient algorithm [180].
23
24 • Electronic enthalpy functional, introducing an energy term proportional to the quantum-
25
26 volume of the system and able to describe finite systems under the effect of an applied
27 external pressure[183].
28
29
30 • Electronic cavitation functional, introducing an energy term proportional to the quantum-
31 surface able to describe free energies of cavitation and other surface-related interaction
32
33 terms[184].
34
35 • Parabolic corrections for periodic boundary conditions in aperiodic and partially periodic
36
37 (slab) systems [19, 185].
38
39
• Fixed dielectric regions, allowing for the modelling of complex inhomogenous dielectric en-
40
41 vironments.
42
43
• Fixed Gaussian-smoothed distributions of charges, allowing for a simplified modelling of
44
45 countercharge distributions, e.g., in electrochemical setups.
46
47
Different packages of the Quantum ESPRESSO distribution have been interfaced with the
48
49 Environ module, including PWscf, CP, PWneb, and turboTDDFT, the latter featuring a linear-
50
51 response implementation of the SCCS model (see Sec.II B 2). Moreover, continuum environment
52 effects are fully compatible with the main features of Quantum ESPRESSO, and in particular,
53
54 with reciprocal space integration and smearing for metallic systems, with both norm-conserving
55
56 and ultrasoft pseudopotentials and PAW, with all XC functionals.
57
58
59
60
AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2 Page 40 of 68
40
1
2
3 2. QM-MM
4
5
6 QM-MM was implemented in v.5.0.2 using the method documented in Ref. 40. Such methodol-
7
8 ogy accounts for both mechanical and electrostatic coupling between the QM (quantum-mechanical)
9 and MM (molecular-mechanics) regions, but not for bonding interactions (i.e., bonds between the
10
11 QM and MM regions). In practice, we need to run two different codes, Quantum ESPRESSO
12
for the QM region and a classical force-field code for the MM region, that communicate atomic
13
14 positions, forces, electrostatic potentials.
15
16 LAMMPS [39] is the software chosen to deal with the classical (MM) degrees of freedom. This
17
18 is a well-known and well-maintained package, released under an open-source license that allows
19 redistribution together with Quantum ESPRESSO. The communications between the QM and
20
21 MM regions use a “shared memory” approach: the MM code runs on a master node, communicates
22
23 directly via the memory with the QM code, which is typically running on a massively parallel
24 machine. Such approach has some advantages: the MM part is typically much faster than the QM
25
26 one and can be run in serial execution, wasting no time on the HPC machine; there is a clear and
27
neat separation between the two codes, and very small code changes in either codes are needed. It
28
29 has however also a few drawbacks, namely: the serial computation of the MM part may become a
30
31 bottleneck if the MM region contains many atoms; direct access to memory is often restricted for
32
security reasons on HPC machines.
33
34 An alternative approach has been implemented in v.5.4. A single (parallel) executable runs
35
36 both the MM and the QM codes. The two codes exchange data and communicate via MPI.
37
38 This approach is less elegant than the previous one and requires a little bit more coding, but its
39 implementation is quite straightforward thanks also to the changes in the logic of parallelization
40
41 mentioned in Sec. III D. The coupling of the two codes has required some modifications also to the
42
qmmm library inside LAMMPS and to the related fix qmmm (a “fix” in LAMMPS is any operation
43
44 that is applied to the system during the MD run). In particular, electrostatic coupling has been
45
46 introduced, following the approach described in Ref. 186. The great advantage of this approach
47
is that its performance on HPC machines is as good as the separate performances of the QM and
48
49 MM codes. Since LAMMPS is very well parallelized, this is a significant advantage if the MM
50
51 region contains many atoms. Moreover, it can be run without restrictions on any parallel machine.
52 This new QM-MM implementation is an integral part of the Quantum ESPRESSO distribution
53
54 and will soon be included into LAMMPS as well (the “fix” is currently under testing) and it is
55
56 straightforward to compile and execute it.
57
58
59
60
Page 41 of 68 AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2
41
1
2
3 F. Miscellaneous feature enhancements and additions
4
5
1. Fully relativistic projector augmented-wave method
6
7
8 By applying the PAW formalism to the equations of relativistic spin density functional theory
9
10 [187, 188], it is possible to obtain the fully relativistic PAW equations for four-component spinor
11
12 pseudo-wavefunctions [16]. In this formalism the pseudo-wavefunctions can be written in terms of
13 large |Ψ̃A B
i,σ i and small |Ψ̃i,σ i components, both two-component spinors (the index σ runs over the
14
v
15 two spin components). The latter is of order c of the former, where v is of the order of the velocity
16
of the electron and c is the speed of light. These equations can be simplified introducing errors of
17
18 the order of the transferability error of the pseudopotential or of order 1/c2 , depending on which is
19
20 the largest. In the final equations only the large components of the pseudo-wavefunctions appear.
21
The non relativistic kinetic energy p2 /2m (m is the electron mass) acts on the large component
22
23 of the pseudo-wavefunctions |Ψ̃A
i,σ i in the mesh defined by the FFT grid and the same kinetic
24
25 energy is used to calculate the expectation values of the Hamiltonian between partial pseudo-
26 waves |ΦI,P S,A
i. The Dirac kinetic energy is used instead to calculate the expectation values of
n,σ
27
28 the Hamiltonian between all-electron partial waves |ΦI,AE
n,η i (η is a four-component index). In this
29
30 manner, relativistic effects are hidden in the coefficients of the non-local pseudopotential. The
31 equations are formally very similar to the equations of the scalar-relativistic case:
32 "
33 X p2 XZ η1 ,η2
34 δ σ1 ,σ2 + drṼLOC (r)K̃(r)ησ11,η
,σ2 − εi S
2 σ1 ,σ2
σ2
2m η1 ,η2
35 #
36 X
37 + 1 1 I,A I,A
(DI,mn − D̃I,mn )|βm,σ1 ihβn,σ2 | |Ψ̃A i,σ2 i = 0, (42)
38 I,mn
39 1
where DI,mn 1
and D̃I,mn are calculated inside the PAW spheres:
40
41 X η1 ,η2 I,η1 ,η2 I,AE
1
DI,mn = hΦI,AE
m,η1 |TD + VLOC |Φn,η2 i, (43)
42
η1 ,η2
43
X p2 σ1 ,σ2
44 1
D̃I,mn = hΦI,P S,A I,σ1 ,σ2 I,P S,A
m,σ1 | δ + ṼLOC |Φn,σ2 i
45 σ1 ,σ2
2m
46 XZ I,η1 ,η2
47 + drQ̂Imn,η1 ,η2 (r)ṼLOC (r). (44)
48 η1 ,η2 ΩI
49
Here TD is the Dirac kinetic energy:
50
51
52 TD = cα · p + (β − 14×4 )mc2 , (45)
53
η1 ,η2
54 written in terms of the 4×4 Hermitian matrices α and β and VLOC is the sum of the local, Hartree,
55
56 and XC potential (Veff ) together, in magnetic systems, with the contribution of the XC magnetic
57
58
59
60
AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2 Page 42 of 68
42
1
2
η1 ,η2
3 field: VLOC (r) = Veff (r)δ η1 ,η2 − µB Bxc (r) · (βΣ)η1 ,η2 . We refer to Ref. 16 for a detailed defini-
4
5 tion of the partial waves |ΦI,AE I,P S,A
n,η i, |Φn,σ
I,A
i and projectors |βm,σ i, of the augmentation functions
6 Q̂Imn,η1 ,η2 (r) and K̃(r)ησ11,η
,σ2 , and of the overlap matrix S
2 σ1 ,σ2 and for their rewriting in terms of
7
8 projector functions that contain only spherical harmonics. Solving these equations it is possible to
9
include spin-orbit coupling effects in electronic structure calculations. In Quantum ESPRESSO
10
11 these equations are used when input variables noncolin and lspinorb are both .TRUE. and the
12
13 PAW data sets are fully relativistic, as those available with the pslibrary project.
14
15
16 2. Electronic and structural properties in field-effect configuration
17
18
19 Since Quantum ESPRESSO v.6.0 it is possible to compute the electronic structure under
20
21 a field-effect transistor (FET) setup in periodic boundary conditions [189]. In physical FETs,
22 a voltage is applied to a gate electrode, accumulating charges at the interface between the gate
23
24 dielectric and a semiconducting system (see Fig. 3). The gate electrode is simulated with a
25
26 charged plate, henceforth referred to as the gate. Since the interaction of this charged plate with
27 its periodic image generates a spurious nonphysical electric field, a dipolar correction, equivalent to
28
29 two planes of opposite charge, is added [190], canceling out the field on the left side of the gate. In
30
31 order to prevent electrons from spilling towards the gate for large electron doping [191], a potential
32 barrier can be added to the electrostatic potential, mimicking the effect of the gate dielectric.
33
34 The setup for a system in FET configuration is shown in Fig. 3. The gate has a charge ndop A
35
36
and the system has opposite charge. Here ndop is the number of doping electrons per unit area
37 (i.e., negative for hole doping), A is the area of the unit cell parallel to the surface. In practice the
38
39 gate is represented by an external potential
40
41 z2 L
Vgate (r) = −2π ndop − |z| + L + 6 (46)
42
43
44 Here z = z − zgate with z ∈ [−L/2; L/2] measures the distance from the gate (see Fig. 3). The
45
46 dipole of the charged system plus the gate is canceled by an electric dipole generated by two planes
47
of opposite charge [190, 192, 193], placed at zdip − ddip /2 and zdip + ddip /2, in the vacuum region
48
49 next to the gate (Vdip in Fig. 3). Additionally one may include a potential barrier to avoid charge
50
51 spilling towards the gate, or as a substitute for the gate dielectric. Vb (r) is a periodic function of
52 z defined on the interval z ∈ [0, L] as equal to a constant Vb for z1 < z < z2 and zero elsewhere.
53
54 Figure 3 shows the resulting total potential (black line). The following additional variables are
55
56 needed: zgate , z1 , z2 , and V0 . In the code these variables are named zgate, block 1, block 2,
57
58
59
60
Page 43 of 68 AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2
43
1
2 Gate Barrier
3 -
-
-
++
++
++
4 -
-
-
++
++
++
- ++
5 - ++
Dipole
7
E=0
8
9
10
i
11 Vper + VH + Vgate + Vdip + Vb
12
13
14 V
15
16
17 zgate
Vgate
18 Vdip ddip
19
20 db
21 Vb
22 z1 z2
23
24 0 L
z
25
26
FIG. 3. Schematic picture of the planar averaged KS potential (without the exchange-correlation potential)
27
28 for periodically repeated, charged slabs. The uppermost panel shows a sketch of a gated system. The
29 different parts of the total KS potential are shown with different color: red – gate, Vgate , green – dipole,
30
31 Vdip , blue – potential barrier, Vb . The position of the gate is indicated by zgate . The black line shows the
32 i
sum of Vgate , Vdip , Vb , of the ionic potential Vper , and of the Hartree potential VH . The length of the unit
33
34 cell along ẑ is given by L.
35
36
and block height, respectively. The dipole corrections and the gate are activated by the options
37
38 dipfield=.true. and gate=.true.. In order to enable the potential barrier and the relaxation
39
40 of the system towards it, the new input parameters block and relaxz, respectively, have to be set
41 to .true. More details about the implementation can be found in Ref. 189.
42
43
44
45 3. Cold restart of Car-Parrinello molecular dynamics
46
47
In the standard Lagrangian formulation of ab initio molecular dynamics [53], the coefficients
48
49 of KS molecular orbitals over a given basis set (i.e.. their Fourier coefficients, in the case of
50
51 plane waves) are treated as classical degrees of freedom obeying Newton’s equations of motion that
52 derive from a suitably defined extended Lagrangian. This Lagrangian is obtained from the Born-
53
54 Oppenheimer total energy by augmenting it with a fictitious electronic kinetic-energy term and
55
56 relaxing the constraint that the molecular orbitals stay at each instant of the trajectory in their
57
58
59
60
AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2 Page 44 of 68
44
1
2
3 instantaneous KS ground state. The idea is that, by choosing a suitably small fictitious electronic
4
5 mass, the thermalization time of the electronic degrees of freedom can be made much longer than
6 the typical simulation times, so that if the system is prepared in its electronic KS ground state at
7
8 the start of the simulation, the electronic dynamics would follow almost adiabatically the nuclear
9
one all over the simulation, thus effectively mimicking a bona fide Born-Oppenheimer dynamics.
10
11 While in Car-Parrinello MD both the physical nuclear and fictitious electronic velocities are
12
13 determined by the equations of motion on a par, the question still remains as to how choose them
14
at the start of the simulation. Initial nuclear velocities are dictated by physical considerations
15
16 (e.g., thermal equilibrium) or may be taken from a previously interrupted MD run. Electronic
17
18 velocities (i.e., the time derivatives of the KS molecular orbitals), instead, are not available when
19 the simulation is started from scratch. and are not independent of the physical nuclear ones, but
20
21 are determined by the adiabatic time evolution of the system. Moreover, the projection over the
22 k .
23 occupied-state manifold of the electronic velocities, ψ̇v = P̂ ψ̇v is ill-defined because the KS ground-
24 state solution is defined modulo a unitary transformation within this manifold. This means that
25
26 the starting electronic velocities may not be simply obtained as finite differences of KS orbitals at
27
times t = 0 and t = ∆t. Here and in the following P̂ indicates the projector over the occupied-state
28
29 manifold, and Q̂ = 1 − P̂ its complement (i.e. the projector over the virtual-orbital manifold).
30 .
31 The component of the electronic velocities over the virtual-state manifold, ψ̇v⊥ = Q̂ψ̇v , is instead
32
well defined and can be formally written using standard first-order perturbation theory:
33
34 X hψc |V̇KS |ψv i
35 ψ̇v⊥ (r) = ψc (r) , (47)
ǫv − ǫc
36 c
37
38 where v and c indicate occupied (valence) and virtual (conduction) states, respectively, ǫn the
39 corresponding orbital energies, and V̇KS is the time derivative of the KS potential, VKS . V̇KS is the
40
41 linear response of VKS to the perturbation in the external potential determined by an infinitesimal
42 P
displacement of the nuclei along a MD trajectory: V̇ext (r) = R ∂vR∂R (r−R)
· Ṙ, where vR (r − R) is
43
44 the bare ionic pseudopotential of the atom at position R and Ṙ its velocity. Electronic velocities
45
46 can conveniently be initialized to the values given by Eq. (47), which are those that minimize their
47
norm and, hence, the initial electronic temperature, which is defined as the sum of the squared
48
49 norms of the electronic velocities.
50
51 While this could in principle be done using density-functional perturbation theory [90, 93], it
52 is more convenient to compute them numerically, following the procedure described below. At
53
54 t = 0 the KS molecular orbitals are initialized from a ground-state computation, performed with
55
56 whatever method is available or preferred (standard SCF calculation or global optimization, such as
57
58
59
60
Page 45 of 68 AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2
45
1
2
3 e.g., with conjugate gradients [194]). The KS molecular orbitals that would result from a perfectly
4
5 adiabatic propagation at t = ∆t are then determined from a second ground-state computation,
6 performed after half a “velocity-Verlet” MD step, i.e., at nuclear positions R(∆t) = R(0)+ Ṙ(0)∆t.
7
8 The initial velocities are then obtained from the relation:
9
10
11 ψ̇v⊥ = Ṗˆ ψv , (48)
12
13
14 which is obtained by simply differentiating the definition of occupied-state projector, P̂ ψv = ψv .
15
The right-hand side of Eq. (48) is finally easily computed by subtracting from each KS orbital at
16
17 time t = 0, its component over the occupied-state manifold at t = ∆t and dividing by ∆t.
18
19
20
21 4. Optimized tetrahedron method
22
23
24 The integration over k-points in the BZ is a crucial step in the calculation of the electronic
25
26 structure of a periodic system, affecting not only the ground state but linear response as well. This
27
is especially true for metallic systems where the integrand is discontinuous at the Fermi level. Even
28
29 more problematic is the integration of Dirac delta functions, such as those appearing in the density
30
31 of states (DOS), partial DOS and in the electron-phonon coupling constant.
32
Quantum ESPRESSO has always implemented a variety of “smearing” methods, in which
33
34 the delta function is replaced by a function of finite width (e.g., a Gaussian function, or more
35
36 sophisticated choices). It has also always implemented the linear tetrahedron method [195] with the
37
correction proposed by Blöchl [196], in which the BZ is divided into tetrahedra and the integration is
38
39 performed analytically by linear interpolation of KS eigenvalues in each tetrahedron. Such method
40
41 is however limited in its convenience and range of applicability: in fact the linear interpolation
42 systematically overestimates convex functions, thus making the convergence against the number
43
44 of k-points slow. The linear tetrahedron method was thus mostly restricted to the calculation of
45
46 DOS and partial DOS.
47
Since Quantum ESPRESSO v.6.1, the optimized tetrahedron method [197] is implemented.
48
49 Such method overcomes the drawback of the linear tetrahedron method using an interpolation that
50
51 accounts for the curvature of the interpolated function. The optimized tetrahedron method has
52 better convergence properties and an extended range of applicability: in addition to the calcula-
53
54 tion of the ground-state charge density, DOS and partial DOS, it can be used in linear-response
55
56 calculation of phonons and of the electron-phonon coupling constant.
57
58
59
60
AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2 Page 46 of 68
46
1
2
3 5. Wyckoff positions
4
5
6 In Quantum ESPRESSO the crystal geometry is traditionally specified by a Bravais lattice
7 index (called ibrav), by the crystal parameters (celldm, or a, b, c, cosab, cosac, cosbc) describing
8
9 the unit cell, and by the positions of all atoms in the unit cell, in crystal or Cartesian axis.
10
11 Since v.5.1.1, it is possible to specify the crystal geometry in crystallographic style[198], ac-
12 cording to the notations of the International Tables of Crystallography (ITA)[199]. A complete
13
14 description of the crystal structure is obtained by specifying the space-group number according to
15
16 the ITA and the positions of symmetry-inequivalent atoms only in the unit cell. The latter can
17 be provided either in the crystal axis of the conventional cell, or as Wyckoff positions: a set of
18
19 special positions, listed in the ITA for each space group, that can be fully specified by a number of
20
parameters, none to three depending upon the site symmetry. Table 1 reports a few examples of
21
22 accepted syntax. The code generates the symmetry operations for the specified space group and
23
24
25 TABLE I. Examples of valid syntax for Wyckoff positions. C is the element name, followed by the Wyckoff
26 label of the site (number of equivalent atoms followed by a letter identifying the site), followed by the
27
28 site-dependent parameters needed to fully specify the atomic positions.
29 ATOMIC POSITIONS sg
30
C 1a
31
32 C 8g x
33 C 24m x y
34
35 C 48n x y z
36 C x y z
37
38
39 applies them to inequivalent atoms, thus finding all atoms in the unit cell.
40
41 For some crystal systems there are alternate descriptions in the ITA, so additional input param-
42
eters may be needed to select the desired one. For the monoclinic system the “c-unique” orientation
43
44 is the default and bunique=.TRUE. must be specified in input if the “b-unique” orientation is de-
45
46 sired. For some space groups there are two possible choices of the origin. The origin appearing
47
first in the ITA is chosen by default, unless origin choice=2 is specified in input. Finally, for
48
49 trigonal space groups the atomic coordinates can be referred to the rhombohedral or to the hexag-
50
51 onal Bravais lattices. The default is the rhombohedral lattice, so rhombohedral=.FALSE. must be
52 specified in input to use the hexagonal lattice.
53
54 A final comment for centered Bravais lattices: in the crystallographic literature, the conventional
55
56 unit cell is usually assumed. Quantum ESPRESSO however assumes the primitive unit cell,
57
58
59
60
Page 47 of 68 AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2
47
1
2
3 having a smaller volume and a smaller number of atoms, and discards atoms outside the primitive
4
5 cell. Auxiliary code supercell.x, available in thermo pw (see Sec.II D 1), prints all atoms in the
6 conventional cell when necessary.
7
8
9
III. PARALLELIZATION, MODULARIZATION, INTEROPERABILITY AND
10
11 STABILITY
12
13
A. New parallelization levels
14
15
16 The basic modules of Quantum ESPRESSO are characterized by a hierarchy of parallelization
17
18 levels, described in Ref.6. Processors are divided into groups, labeled by a MPI communicator.
19
20 Each group of processors distributes a specific subset of computations. The growing diffusion of
21 HPC machines based on nodes with many cores (32 and more) makes however pure MPI par-
22
23 allelization not always ideal: running one MPI process per core has a high overhead, limiting
24
performances. It is often convenient to use mixed MPI-OpenMP parallelization, in which a small
25
26 number of MPI processes per node use OpenMP threads, either explicitly (i.e., with compiler di-
27
28 rectives) or implicitly (i.e., via calls to OpenMP-aware library). Explicit OpenMP parallelization,
29 originally confined to computationally intensive FFT’s, has been extended to many more parts of
30
31 the code.
32
33 One of the challenges presented by massively parallel machine is to get rid of both memory and
34 CPU time bottlenecks, caused respectively by arrays that are not distributed across processors
35
36 and by non-parallelized sections of code. It is especially important to distribute all arrays and
37
38 parallelize all computations whose size/complexity increases with the dimensions of the unit cell
39 or of the basis set. Non-parallelized computations hamper “weak” scalability, that is, parallel
40
41 performance while increasing both the system size and the amount of computational resources,
42
while non-distributed arrays may become an unavoidable RAM bottleneck with increasing problem
43
44 size. “Strong” scalability (that is, at fixed problem size and increasing number of CPUs) is even
45
46 more elusive than weak scalability in electronic-structure calculations, requiring, in addition to
47
systematic distribution of computations, to keep to the minimum the ratio between time spent
48
49 in communications and in computation, and to have a nearly perfect load balancing. In order to
50
51 achieve strong scalability, the key is to add more parallelization levels and to use algorithms that
52 permit to overlap communications and computations.
53
54 For what concerns memory, notable offenders are arrays of scalar products between KS states ψi :
55 b j i, where O
b can be either the Hamiltonian or an overlap matrix; and scalar products
56 Oij = hψi |O|ψ
57
58
59
60
AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2 Page 48 of 68
48
1
2
3 between KS states and pseudopotential projectors β, pij = hψi |βj i. The size of such arrays grows
4
5 as the square of the size of the cell. Almost all of them are now distributed across processors of
6 the “linear-algebra group”, that is, the group of processors taking care of linear-algebra operations
7
8 on matrices. The most expensive of such operations are subspace diagonalization (used in PWscf
9
in the iterative diagonalization) and iterative orthonormalization (used by CP). In both cases, a
10
11 parallel dense-matrix diagonalization on distributed matrix is needed. In addition to ScaLAPACK,
12
13 Quantum ESPRESSO can now take advantage of newer ELPA libraries (Ref. 200), leading to
14
significant performance improvements.
15
16 The array containing the plane-wave representation, ck,n (G), of KS orbitals is typically the
17
18 largest array, or one of the largest. While plane waves are already distributed across processors of
19
the “plane-wave group” as defined in Ref. 6, it is now possible to distribute KS orbitals as well.
20
21 Such a parallelization level is located between the k-point and the plane-wave parallelization levels.
22
23 The corresponding MPI communicator defines a subgroup of the “k-point group” of processors and
24 is called “band group communicator”. In the CP package, band parallelization is implemented for
25
26 almost all available calculations. Its usefulness is better appreciated in simulations of large cells
27
28 — several hundreds of atoms and more — where the number of processors required by memory
29 distribution would be too large to get good scalability from plane-wave parallelization only.
30
31 In PWscf, band parallelization is implemented for calculations using hybrid functionals. The
32
33 standard algorithm to compute Hartree-Fock exchange in a plane-wave basis set (see Sec. II A 1)
34 contains a double loop on bands that is by far the heaviest part of computation. A first form
35
36 of parallelization, described in Ref. 34, was implemented in v.5.0. In the latest version, this has
37
been superseded by parallelization of pairs of bands, Ref. 35. Such algorithm is compatible with
38
39 the “task-group” parallelization level (that is: over KS states in the calculation of V ψi products)
40
41 described in Ref. 6.
42
In addition to the above-mentioned groups, that are globally defined and in principle usable in
43
44 all routines, there are a few additional parallelization levels that are local to specific routines. Their
45
46 goal is to reduce the amount of non-parallel computations that may become significant for many-
47
atom systems. An example is the calculation of DFT+U (Sec. II A 3) terms in energy and forces,
48
49 Eqs. (12) and (14) respectively. In all these expressions, the calculation of the scalar products
50
51 between valence and atomic wave functions is in principle the most expensive step: for Nb bands
52 and Npw plane waves, O(Npw Nb ) floating-point operations are required (typically, Npw ≫ Nb ).
53
54 The calculation of these terms is however easily and effectively parallelized, using standard matrix-
55
56 matrix multiplication routines and summing over MPI processes with a mpi reduce operation on
57
58
59
60
Page 49 of 68 AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2
49
1
2
3 the plane-wave group. The sum over k-points can be parallelized on the k-point group. The
4
5 remaining sums over band indices ν and Hubbard orbitals I, m may however require a significant
6 amount of non-parallelized computation if the number of atoms with a Hubbard U term is not small.
7
8 The sum over band indices is thus parallelized by simply distributing bands over the plane-wave
9
group. This is a convenient choice because all processors of the plane-wave group are available once
10
11 the scalar products are calculated. The addition of band parallelization speeds up the computation
12
13 of such terms by a significant factor. This is especially important for Car-Parrinello dynamics,
14
requiring the calculation of forces at each time step, when a sizable number of Hubbard manifolds
15
16 is present.
17
18
19
B. Aspects of interoperability
20
21
22 One of the original goals of Quantum ESPRESSO was to assemble different pieces of rather
23
24 similar software into an integrated software suite. The choice was made to focus on the following
25
26
four aspects: input data formats, output data files, installation mechanism, and a common base of
27 code. While work on the first three aspects is basically completed, it is still ongoing on the fourth.
28
29 It was however realized that a different form of integration — interoperability, i.e., the possibility
30
to run Quantum ESPRESSO along with other software — was more useful to the community of
31
32 users than tight integration. There are several reasons for this, all rooted in new or recent trends
33
34 in computational materials science. We mention in particular the usefulness of interoperability for
35
36 1. excited-states calculations using many-body perturbation theory, at various levels of sophis-
37
38 tication: GW , TDDFT, BSE (e.g., yambo [201], SaX [202], or BerkeleyGW [203]);
39
40
2. calculations using quantum Monte Carlo methods;
41
42
43 3. configuration-space sampling, using such algorithms as nudged elastic band (NEB), ge-
44
45 netic/evolutionary algorithms, meta-dynamics;
46
47 4. inclusion of quantum effects on nuclei via path-integral Monte Carlo;
48
49
50 5. multi-scale simulations, requiring different theoretical approaches, each valid in a given range
51
of time and length scale, to be used together;
52
53
54 6. high-throughput, or “exhaustive”, calculations (e.g., AiiDA [204, 205] and AFLOWπ [206])
55
56 requiring automated submission, analysis, retrieval of a large number of jobs;
57
58
59
60
AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2 Page 50 of 68
50
1
2
3 7. “steering”, i.e., controlling the computation in real time using either a graphical user interface
4
5 (GUI) or an interface in a high-level interpreted language (e.g., python).
6
7 It is in principle possible, and done in some cases, to implement all of the above into Quan-
8
9 tum ESPRESSO, but this is not always the best practice. A better option is to use Quantum
10 ESPRESSO in conjunction with external software performing other tasks.
11
12 Cases 1 and 2 mentioned above typically use as starting step the self-consistent solution of
13
KS equations, so that what is needed is the possibility for external software to read data files
14
15 produced by the main Quantum ESPRESSO codes, notably the self-consistent code PWscf and
16
17 the molecular dynamics code CP.
18
Cases 3 and 4 typically require many self-consistent calculations at different atomic configura-
19
20 tions, so that what is needed is the possibility to use the main Quantum ESPRESSO codes as
21
22 “computational engine”, i.e., to call PWscf or CP from an external software, using atomic configu-
23 rations supplied by the calling code.
24
25 The paradigmatic case 5 is QM-MM (Sec.II E 2), requiring an exchange of data, notably:
26
27 atomic positions, forces, and some information on the electrostatic potential, between Quantum
28 ESPRESSO and the MM code – typically a classical MD code.
29
30 Case 6 requires easy access to output data from one simulation, and easy on-the-fly generation
31
of input data files as well. This is also needed for case 7, which however may also require a finer-
32
33 grained control over computations performed by Quantum ESPRESSO routines: in the most
34
35 sophisticated scenario, the GUI or python interface should be able to perform specific operations
36
“on the fly”, not just running an entire self-consistent calculation. This scenario relies upon the
37
38 existence of a set of application programming interfaces (API’s) for calls to basic computational
39
40 tasks.
41
42
43 C. Input/Output and data file format
44
45
46 On modern machines, characterized by fast CPU’s and large RAM’s, disk input/output (I/O)
47
may become a bottleneck and should be kept to a strict minimum. Since v.5.3 both PWscf and CP
48
49 do not perform by default any I/O at run time, except for the ordinary text output (printout), for
50
51 checkpointing if required or needed, and for saving data at the end of the run. The same is being
52 gradually extended to all codes. In the following, we discuss the case of the final data writing.
53
54 The original organization of output data files (or more exactly, of the output data directory)
55
56 was based on a formatted “head” file, with a XML-like syntax, containing general information on
57
58
59
60
Page 51 of 68 AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2
51
1
2
3 the run, and on binary data files containing the KS orbitals and the charge density. We consider
4
5 the basic idea of such approach still valid, but some improvements were needed. On one hand, the
6 original head file format had a number of small issues—inconsistencies, missing pieces of relevant
7
8 information—and used a non-standard syntax, lacking a XML “schema” for validation. On the
9
other hand, data files suffered from the lack of portability of Fortran binary files and had to be
10
11 transformed into text files, sometimes very large ones, in order to become usable on a different
12
13 machine.
14
15
16
1. XML files with schema
17
18
19 Since v.6.0, the “head” file is a true XML file using a consistent syntax, described by a XML
20
21 schema, that can be easily parsed with standard XML tools. It also contains complete information
22
23 on the run, including all data needed to reproduce the results, and on the correct execution and
24 exit status. This aspect is very useful for high-throughput applications, for databasing of results
25
26 and for verification and validation purposes.
27
28 The XML file contains an input section and can thus be used as input file, alternative to the still
29 existing text-based input. It supersedes the previous XML-based input, introduced several years
30
31 ago, that had a non-standard syntax, different from and incompatible with the one of the original
32
33 head file. Implementing a different input is made easy by the clear separation existing between
34 the reading and initialization phases: input data is read, stored in a separate module, copied to
35
36 internal variables.
37
38
The current XML file can be easily parsed and generated using standard XML tools and is
39 especially valuable in conjunction with GUI’s. The schema can be found at the URL:
40
41 http://www.quantum-espresso.org/ns/qes/qes-1.0.xsd.
42
43
44
2. Large-record data file format
45
46
47
Although not as I/O-bound as other kinds of calculations, electronic-structure simulations may
48
49 produce a sizable amount of data, either intermediate or needed for further processing. The largest
50
51 array typically contains the plane-wave representation of KS orbitals; other sizable arrays contain
52 the charge and spin density, either in reciprocal or in real space. In parallel execution using MPI,
53
54 large arrays are distributed across processors, so one has two possibilities: let each MPI process
55
56 write its own slice of the data (“distributed” I/O), or collect the entire array on a single processor
57
58
59
60
AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2 Page 52 of 68
52
1
2
3 before writing it (“collected” I/O). In distributed I/O, coding is straightforward and efficient,
4
5 minimizing file size and achieving some sort of I/O parallelization. A global file system, accessible
6 to all MPI processes, is needed. The data is spread into many files that are directly usable only by a
7
8 code using exactly the same distribution of arrays, that is, exactly the same kind of parallelization.
9
In collected I/O, the coding is less straightforward. In order to ensure portability, some reference
10
11 ordering, independent upon the number of processors and the details of the parallelization, must
12
13 be provided. For large simulations, memory usage and communication pattern must be carefully
14
optimized when a distributed array is collected into a large array on a single processor.
15
16 In the original I/O format, KS orbitals were saved in reciprocal space, in either distributed or
17
18 collected format. For the latter, a reproducible ordering of plane waves (including the ordering
19
within shells of plane waves with the same module), independent upon parallelization details and
20
21 machine-independent, ensures data portability. Charge and spin density were instead saved in real
22
23 space and in collected format. In the new I/O scheme, available since v.6.0, the output directory
24 is simplified, containing only the XML data file, one file per k-point with KS orbitals, one file for
25
26 the charge and spin density. Both files are in collected format and both quantities are stored in
27
28 reciprocal space. In addition to Fortran binary, it is possible to write data files in HDF5 format[207].
29 HDF5 offers the possibility to write structured record and portability across architectures, without
30
31 significant loss in performances; it has an excellent support and is the standard for I/O in other
32
fields of scientific computing. Distributed I/O is kept only for checkpointing or as a last-resort
33
34 alternative.
35
36 In spite of its advantages, such a solution has still a bottleneck in large-scale computations on
37
massively parallel machines: a single processor must read and write large files. Only in the case of
38
39 parallelization over k-points is I/O parallelized in a straightforward way. More general solutions
40
41 to implement parallel I/O using parallel extensions of HDF5 are currently under examination in
42
view of enabling Quantum ESPRESSO towards “exascale” computing (that is: towards O(1018 )
43
44 floating-point operations per second).
45
46
47
48 D. Organization of the distribution
49
50
51 Codes contained in Quantum ESPRESSO have evolved from a small set of original codes,
52 born with rather restricted goals, into a much larger distribution via continuous additions and
53
54 extensions. Such a process - presumably common to most if not all scientific software projects -
55
56 can easily lead to uncoordinated growth and to bad decisions that negatively affect maintainability.
57
58
59
60
Page 53 of 68 AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2
53
1
2
3 1. Package re-organization and modularization
4
5
6 In order to make the distribution easier to maintain, extend and debug, the distribution has
7 been split into
8
9
10 a. base distribution, containing common libraries, tools and utilities, core packages PWscf, CP,
11
12 PostProc, plus some commonly used additional packages, currently: atomic, PWgui, PWneb,
13 PHonon, XSpectra, turboTDDFT, turboEELS, GWL, EPW;
14
15
16 b. external packages such as SaX [202], yambo [201], Wannier90 [171], WanT [208, 209], that are
17 automatically downloaded and installed on demand.
18
19
20 The directory structure now explicitly reflects the structure of Quantum ESPRESSO as a “fed-
21
eration” of packages rather than a monolithic one: a common base distribution plus additional
22
23 packages, each of which fully contained into a subdirectory.
24
25 In the reorganization process, the implementation of the NEB algorithm was completely rewrit-
26
ten, following the paradigm sketched in Sec. III B. PWneb is now a separate package that implements
27
28 the NEB algorithm, using PWscf as the computational engine. The separation between the NEB
29
30 algorithm and the self-consistency algorithm is quite complete: PWneb could be adapted to work
31 in conjunction with a different computational engine with a minor effort.
32
33 The implementation of meta-dynamics has also been re-considered. Given the existence of a very
34
35 sophisticated and well-maintained package[210] Plumed for all kinds of meta-dynamics calculations,
36 the PWscf and CP packages have been adapted to work in conjunction with Plumed v.1.x, removing
37
38 the old internal meta-dynamics code. In order to activate meta-dynamics, a patching process is
39
40 needed, in which a few specific “hook” routines are modified so that they call routines from Plumed.
41
42
43 2. Modular parallelism
44
45
46 The logic of parallelism has also evolved towards a more modular approach. It is now possi-
47
ble to have all Quantum ESPRESSO routines working inside a MPI communicator, passed as
48
49 argument to an initialization routine. This allows in particular the calling code to have its own
50
51 parallelization level, invisible to Quantum ESPRESSO routines; the latter can thus perform
52 independent calculations, to be subsequently processed by the calling code. For instance: the
53
54 “image” parallelization level, used by NEB calculations, is now entirely managed by PWneb and no
55
56 longer in the called PWscf routines. Such a feature is very useful for coupling external codes to
57
58
59
60
AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2 Page 54 of 68
54
1
2
3 Quantum ESPRESSO routines. To this end, a general-purpose library for calling PWscf or CP
4
5 from external codes (either Fortran or C/C++ using the Fortran 2003 ISO C binding standard) is
6 provided in the directory COUPLE/.
7
8
9
3. Reorganization of linear-response codes
10
11
12 All linear-response codes described in Secs. II B and II A 4 share as basic computational step the
13
14 self-consistent solution of linear systems Ax = b for different perturbations b, where the operator A
15
16 is derived from the KS Hamiltonian H and the linear-response potential. Both the perturbations
17 and the methods of solution differ by subtle details, leading to a plethora of routines, customized to
18
19 solve slightly different versions of the same problem. Ideally, one should be able to solve any linear-
20
response problem by using a suitable library of existing code. To this end, a major restructuring
21
22 of linear-response codes has been started. Several routines have been unified, generalized and
23
24 extended. They have been collected into the same subdirectory, LR Modules, that will be the
25
container of “generic” linear-response routines. Linear-response-related packages now contain only
26
27 code that is specific to a given perturbation or property calculation.
28
29
30 E. Quantum ESPRESSO and scripting languages
31
32
33 A desirable feature of electronic-structure codes is the ability to be called from a high-level
34
35 interpreted scripting language. Among the various alternatives, python has emerged in the last
36
years due to its simple and powerful syntax and to the availability of numerical extensions (NumPy).
37
38 Since v.6.0, an interface between PWscf and the path integral MD driver i-PI [41] is available
39
40 and distributed together with Quantum ESPRESSO. Various implementations of an interface
41 between Quantum ESPRESSO codes and the atomic simulation environment (ASE) [211] are
42
43 also available. In the following we briefly highlight the integration of Quantum ESPRESSO with
44
45 AiiDA, the pwtk toolkit for PWscf, and the QE-emacs-modes package for user-friendly editing of
46 Quantum ESPRESSO with the Emacs editor [212].
47
48
49
1. AiiDA: a python materials’ informatics infrastructure
50
51
52 AiiDA [204] is a comprehensive python infrastructure aimed at accelerating, simplifying, and
53
54 organizing major efforts in computational science, and in particular computational materials sci-
55
56 ence, with a close integration with the Quantum ESPRESSO distribution. AiiDA is structured
57
58
59
60
Page 55 of 68 AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2
55
1
2
3 around the four pillars of the ADES model (Automation, Data, Environment, and Sharing, Ref.
4
5 204)), and provides a practical and efficient implementation of all four. In particular, it aims at
6 relieving the work of a computational scientist from the tedious and error-prone tasks of running,
7
8 overseeing, and storing hundreds or more of calculations daily (Automation pillar), while ensur-
9
ing that strict protocols are in place to store these calculations in an appropriately structured
10
11 database that preserves the provenance of all computational steps (Data pillar). This way, the
12
13 effort of a computational scientist can become focused on developing, curating, or exploiting com-
14
plex workflows (Environment pillar) that calculate in a robust manner e.g. the desired materials
15
16 properties of a given input structure, recording expertise in reproducible sequences that can be
17
18 progressively perfected, while being able to share freely both the workflows and the data generated
19 with public or private common repositories (Sharing). AiiDA is built using an agnostic structure
20
21 that allows to interface it with any given code — through plugins and a plugin repository — or
22
23 with different queuing systems, transports to remote HPC resources, and property calculators.
24 In addition, it allows to use arbitrary object-relational mappers (ORMs) as backends (currently,
25
26 Django and SQLAlchemy are supported). These ORMs map the AiiDA objects (“Codes”, “Calcu-
27
lations” and “Data”) onto python classes, and lead to the representation of calculations through
28
29 Directed Acyclic Graphs (DAGs) connecting all objects with directional arrows; this ensures both
30
31 provenance and reproducibility of a calculation. As an example, in Fig. 4 we present a simple DAG
32
representing a PWscf calculation on BaTiO3 .
33
34
35
36 2. Pwtk: a toolkit for PWscf
37
38
39 The pwtk, standing for PWscf ToolKit, is a Tcl scripting interface for PWscf set of programs
40
41 contained in the Quantum ESPRESSO distribution. It aims at providing a flexible and produc-
42
tive framework. The basic philosophy of pwtk is to lower the learning curve by using syntax that
43
44 closely matches the input syntax of Quantum ESPRESSO. Pwtk features include: (i) assignment
45
46 of default values of input variables on a project basis, (ii) reassignment of input variables on the fly,
47
(iii) stacking of input data, (iv) math-parser, (v) extensible and hierarchical configuration (global,
48
49 project-based, local), (vi) data retrieval functions (i.e., either loading the data from pre-existing
50
51 input files or retrieving the data from output files), and (vii) a few predefined higher-level tasks,
52 that consist of several seamlessly integrated calculations. Pwtk allows to easily automate large
53
54 number of calculations and to glue together different computational tasks, where output data of
55
56 preceding calculations serve as input for subsequent calculations. Pwtk and related documentation
57
58
59
60
AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2 Page 56 of 68
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
Page 57 of 68 AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2 Page 58 of 68
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
Page 59 of 68 AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2
59
1
2
3 (LAPACK, BLAS, ScaLAPACK, FFTW3, MKL, OpenBlas). More information can be found at
4
5 test-farm.quantum-espresso.org.
6 The official mirror of the development version of Quantum ESPRESSO (https://github.com/QEF/q-e)
7
8 employs a subset of the test-suite to run Travis CI. This tool rapidly identifies erroneous commits
9
10 and can be used to assist code review during a pull request.
11
12
13 IV. OUTLOOK AND CONCLUSIONS
14
15
16 This paper describes the core methodological developments and extensions of Quantum
17
18 ESPRESSO that have become available, or are about to be released, after Ref. 6 appeared.
19 The main goal of Quantum ESPRESSO to provide an efficient and extensible framework to
20
21 perform simulations with well-established approaches and to develop new methods remains firm,
22
23 and it has nurtured an ever growing community of developers and contributors.
24 Achieving such goal, however, becomes increasingly challenging. On one hand, computational
25
26 methods become ever more complex and sophisticated, making it harder not only to implement
27
28 them on a computer but also to verify the correctness of the implementation (for a much needed
29 initial effort on verification of electronic-structure codes based on DFT, see Ref. 5). On the other
30
31 hand, exploiting the current technological innovations in computer hardware can requires massive
32
changes to software and even algorithms. This is especially true for the case of “accelerated” archi-
33
34 tectures (GPUs and the like), whose exceptional performance can translate to actual calculations
35
36 only after heavy restructuring and optimization. The complexity of existing codes makes a rewrite
37
for new architectures a challenging choice, and a risky one given the fast evolution of computer
38
39 architectures.
40
41 We think that the main directions followed until now in the development of Quantum
42
ESPRESSO are still valid, not only for new methodologies, but also for adapting to new com-
43
44 puter architectures and future “exascale” machines. Namely, we will continue pushing towards code
45
46 reusability, by removing duplicated code and/or replacing it with routines performing well-defined
47 tasks, by identifying the time-intensive sections of the code for machine-dependent optimization,
48
49 by having documented APIs with a predictable behavior and with limited dependency upon global
50
51 variables, and we will continue to optimize performance and reliability. Finally, we will push to-
52 wards extended interoperability with other software, also in view of its usefulness for data exchange
53
54 and for cross-verification, or to satisfy the needs of high-throughput calculations.
55
56 Still, the investment in the development and maintenance of state-of-the-art scientific software
57
58
59
60
AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2 Page 60 of 68
60
1
2
3 has historically lagged behind compared to the investment in the applications that use such soft-
4
5 ware, and one wonders is this the correct or even forward-looking approach given the strategic
6 importance of such tools, their impact, their powerful contribution to open science, and their full
7
8 and complete availability to the entire community. In all of this, the future of materials simulations
9
appear ever more bright[214], and the usefulness and relevance of such tools to accelerating inven-
10
11 tion and discovery in science and technology is reflected in its massive uptake by the community
12
13 at large.
14
Acknowledgments. This work has been partially funded by the European Union through the
15
16 MaX Centre of Excellence (Grant No. 676598) and by the Quantum ESPRESSO Foundation.
17
18 SdG acknowledges support from the EU Centre of Excellence E-CAM (Grant No. 676531). OA,
19
MC, NC, NM, NLN, and IT acknowledge support from the SNSF National Centre of Competence
20
21 in Research MARVEL, and from the PASC Platform for Advanced Scientific Computing. TT
22
23 acknowledges support from NSF Grant No. DMR-1145968. SP, MS, and FG are supported by the
24 Leverhulme Trust (Grant RL-2012-001). MBN acknowledges support by DOD-ONR (N00014-13-
25
26 1-0635, N00014-11-1-0136, N00014-15-1-2863) and the Texas Advanced Computing Center at the
27
28 University of Texas, Austin. RD acknowledges partial support from Cornell University through
29 start-up funding and the Cornell Center for Materials Research (CCMR) with funding from the
30
31 NSF MRSEC program (DMR-1120296). This research used resources of the Argonne Leadership
32
Computing Facility at Argonne National Laboratory, which is supported by the Office of Science of
33
34 the U.S. Department of Energy under Contract No. DE-AC02-06CH11357. This research also used
35
36 resources of the National Energy Research Scientific Computing Center, which is supported by the
37
Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
38
39
40
41
42
43 [1] P. Hohenberg and W. Kohn, Phys. Rev. 136, B864 (1964).
44 [2] W. Kohn and L. J. Sham, Phys. Rev. 140, A1133 (1965).
45
[3] D. Vanderbilt, Phys. Rev. B 41, 7892 (1990).
46
47 [4] P. E. Blöchl, Phys. Rev. B 50, 17953 (1994).
48 [5] K. Lejaeghere, G. Bihlmayer, T. Björkman, P. Blaha, S. Blügel, V. Blum, D. Caliste, I. E. Castelli,
49
50 S. J. Clark, A. Dal Corso, S. de Gironcoli, T. Deutsch, J. K. Dewhurst, I. Di Marco, C. Draxl,
51 M. Dulak, O. Eriksson, J. A. Flores-Livas, K. F. Garrity, L. Genovese, P. Giannozzi, M. Giantomassi,
52
53 S. Goedecker, X. Gonze, O. Grånäs, E. K. U. Gross, A. Gulans, F. Gygi, D. R. Hamann, P. J.
54 Hasnip, N. A. W. Holzwarth, D. Iuşan, D. B. Jochym, F. Jollet, D. Jones, G. Kresse, K. Koepernik,
55
56 E. Küçükbenli, Y. O. Kvashnin, I. L. M. Locht, S. Lubeck, M. Marsman, N. Marzari, U. Nitzsche,
57
58
59
60
Page 61 of 68 AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2
61
1
2
3 L. Nordström, T. Ozaki, L. Paulatto, C. J. Pickard, W. Poelmans, M. I. J. Probert, K. Refson,
4 M. Richter, G.-M. Rignanese, S. Saha, M. Scheffler, M. Schlipf, K. Schwarz, S. Sharma, F. Tavazza,
5
6 P. Thunström, A. Tkatchenko, M. Torrent, D. Vanderbilt, M. J. van Setten, V. Van Speybroeck, J. M.
7 Wills, J. R. Yates, G.-X. Zhang, and S. Cottenier, Science 351, 10.1126/science.aad3000 (2016).
8
9 [6] P. Giannozzi, S. Baroni, N. Bonini, M. Calandra, R. Car, C. Cavazzoni, D. Ceresoli, G. L. Chiarotti,
10 M. Cococcioni, I. Dabo, A. Dal Corso, S. de Gironcoli, S. Fabris, G. Fratesi, R. Gebauer, U. Gerst-
11
12 mann, C. Gougoussis, A. Kokalj, M. Lazzeri, L. Martin-Samos, N. Marzari, F. Mauri, R. Mazzarello,
13 S. Paolini, A. Pasquarello, L. Paulatto, C. Sbraccia, S. Scandolo, G. Sclauzero, A. P. Seitsonen,
14
A. Smogunov, P. Umari, and R. M. Wentzcovitch, J. Phys.: Condens. Matter 21, 395502 (2009).
15
16 [7] L. Lin, J. Chem. Theory Comput. 12, 2242 (2016).
17
[8] J. Jia, A. Vazquez-Mayagoitia, and R. A. DiStasio Jr., in preparation.
18
19 [9] K. Berland, V. R. Cooper, K. Lee, E. Schröder, T. Thonhauser, P. Hyldgaard, and B. I. Lundqvist,
20 Reports Prog. Phys. 78, 066501 (2015).
21
22 [10] S. Grimme, J. Comput. Chem. 27, 1787 (2006).
23 [11] A. Tkatchenko and M. Scheffler, Phys. Rev. Lett. 102, 073005 (2009).
24
25 [12] A. D. Becke and E. R. Johnson, J. Chem. Phys. 127, 154108 (2007).
26 [13] E. R. Johnson, in Non-covalent Interactions in Quantum Chemistry and Physics, edited by A. Otero-
27
de-la-Roza and G. DiLabio (Elsevier, in press, 2017) pp. 215–248.
28
29 [14] G. Sclauzero and A. Dal Corso, Phys. Rev. B 87, 085108 (2013).
30
[15] B. Himmetoglu, R. M. Wentzcovitch, and M. Cococcioni, Phys. Rev. B 84, 115108 (2011).
31
32 [16] A. Dal Corso, Phys. Rev. B 82, 075116 (2010).
33 [17] A. Dal Corso, Phys. Rev. B 86, 085135 (2012).
34
35 [18] O. Andreussi, I. Dabo, and N. Marzari, J. Chem. Phys. 136, 064102 (2012).
36 [19] O. Andreussi and N. Marzari, Phys. Rev. B 90, 245101 (2014).
37
38 [20] I. Timrov, O. Andreussi, A. Biancardi, N. Marzari, and S. Baroni, J. Chem. Phys. 142, 034111 (2015).
39 [21] B. Walker, A. M. Saitta, R. Gebauer, and S. Baroni, Phys. Rev. Lett. 96, 113001 (2006).
40
[22] D. Rocca, R. Gebauer, Y. Saad, and S. Baroni, J. Chem. Phys. 128, 154105 (2008).
41
42 [23] O. B. Malcioğlu, R. Gebauer, D. Rocca, and S. Baroni, Comput. Phys. Commun. 182, 1744 (2011).
43
[24] X. Ge, S. Binnie, D. Rocca, R. Gebauer, and S. Baroni, Comput. Phys. Commun. 185, 2080 (2014).
44
45 [25] I. Timrov, N. Vast, R. Gebauer, and S. Baroni, Phys. Rev. B 88, 064301 (2013), ibid. 91, 139901(E)
46 (2015).
47
48 [26] I. Timrov, N. Vast, R. Gebauer, and S. Baroni, Comput. Phys. Commun. 196, 460 (2015).
49 [27] S. Poncé, E. Margine, C. Verdi, and F. Giustino, Comput. Phys. Commun. 209, 116 (2016).
50
51 [28] P. Umari, G. Stenuit, and S. Baroni, Phys. Rev. B 79, 201104 (2009).
52 [29] P. Umari, G. Stenuit, and S. Baroni, Phys. Rev. B 81, 115104 (2010).
53
[30] M. Schlipf, H. Lambert, N. Zibouche, and F. Giustino, “SternheimerGW,” https://github.com/
54
55 QEF/SternheimerGW (2017).
56
57
58
59
60
AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2 Page 62 of 68
62
1
2
3 [31] A. Dal Corso, https://github.com/dalcorso/thermo_pw ().
4 [32] L. Paulatto, F. Mauri, and M. Lazzeri, Phys. Rev. B 87, 214303 (2013).
5
6 [33] G. Fugallo, M. Lazzeri, L. Paulatto, and F. Mauri, Phys. Rev. B 88, 045430 (2013).
7 [34] N. Varini, D. Ceresoli, L. Martin-Samos, I. Girotto, and C. Cavazzoni, Comput. Phys. Commun. 184,
8
9 1827 (2013).
10 [35] T. Barnes, T. Kurth, P. Carrier, N. Wichmann, D. Prendergast, P. R. C. Kent, and J. Deslippe,
11
12 Comput. Phys. Commun. 241, 52 (2017).
13 [36] A. Dal Corso, http://pslibrary.quantum-espresso.org ().
14
[37] A. Dal Corso, Comp. Material Science 95, 337 (2015).
15
16 [38] I. Castelli, G. Prandini, A. Marrazzo, N. Mounet, and N. Marzari, http://materialscloud.org/
17
sssp/.
18
19 [39] S. Plimpton, J. Comput. Phys. 117, 1 (1995).
20 [40] C. Ma, L. Martin-Samos, S. Fabris, A. Laio, and S. Piccinin, Comput. Phys. Commun. 195, 191
21
22 (2015).
23 [41] M. Ceriotti, J. More, and D. E. Manolopoulos, Comput. Phys. Commun. 185, 1019 (2014).
24
25 [42] X. Wu, A. Selloni, and R. Car, Phys. Rev. B 79, 085102 (2009).
26 [43] R. A. DiStasio Jr., B. Santra, Z. Li, X. Wu, and R. Car, J. Chem. Phys. 141, 084502 (2014).
27
[44] H.-Y. Ko, J. Jia, B. Santra, X. Wu, R. Car, and R. A. DiStasio Jr., J. Chem. Theory Comput.
28
29 submitted.
30
[45] I. Carnimeo, P. Giannozzi, and S. Baroni, in preparation.
31
32 [46] M. Marsili and P. Umari, Phys. Rev. B 87, 205110 (2013).
33 [47] J. Paier, R. Hirschl, M. Marsman, and G. Kresse, J. Chem. Phys. 122, 234102 (2005).
34
35 [48] D. Anil, L. Lin, and Y. Lexing, J. Chem. Theory Comput. 11, 1463 (2015).
36 [49] D. Anil, L. Lin, and Y. Lexing, (2017), SIAM J. Sci. Comput., in press.
37
38 [50] N. Marzari and D. Vanderbilt, Phys. Rev. B 56, 12847 (1997).
39 [51] M. Sharma, Y. Wu, and R. Car, Int. J. Quantum Chem. 95, 821 (2003).
40
[52] B. Santra, R. A. DiStasio Jr., F. Martelli, and R. Car, Mol. Phys. 113, 2829 (2015).
41
42 [53] R. Car and M. Parrinello, Phys. Rev. Lett. 55, 2471 (1985).
43
[54] R. H. French, V. A. Parsegian, R. Podgornik, R. F. Rajter, A. Jagota, J. Luo, D. Asthagiri, M. K.
44
45 Chaudhury, Y.-m. Chiang, S. Granick, S. Kalinin, M. Kardar, R. Kjellander, D. C. Langreth, J. Lewis,
46 S. Lustig, D. Wesolowski, J. S. Wettlaufer, W.-Y. Ching, M. Finnis, F. Houlihan, O. A. von Lilienfeld,
47
48 C. J. van Oss, and T. Zemb, Rev. Mod. Phys. 82, 1887 (2010).
49 [55] S. Grimme, J. Antony, S. Ehrlich, and S. Krieg, J. Chem. Phys. 132, 154104 (2010).
50
51 [56] A. Tkatchenko, R. A. DiStasio Jr., R. Car, and M. Scheffler, Phys. Rev. Lett. 108, 236402 (2012).
52 [57] A. Ambrosetti, A. M. Reilly, R. A. DiStasio Jr., and A. Tkatchenko, J. Chem. Phys. 140, 18A508
53
(2014).
54
55 [58] M. A. Blood-Forsythe, T. Markovich, R. A. DiStasio Jr., R. Car, and A. Aspuru-Guzik, Chem. Sci.
56
57
58
59
60
Page 63 of 68 AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2
63
1
2
3 7, 1712 (2016).
4 [59] M. Dion, H. Rydberg, E. Schröder, D. C. Langreth, and B. I. Lundqvist, Phys. Rev. Lett. 92, 246401
5
6 (2004).
7 [60] D. C. Langreth and J. P. Perdew, Phys. Rev. B 15, 2884 (1977).
8
9 [61] T. Thonhauser, V. R. Cooper, S. Li, A. Puzder, P. Hyldgaard, and D. C. Langreth, Phys. Rev. B 76,
10 125112 (2007).
11
12 [62] G. Román-Pérez and J. M. Soler, Phys. Rev. Lett. 103, 096102 (2009).
13 [63] R. Sabatini, E. Küçükbenli, B. Kolb, T. Thonhauser, and S. de Gironcoli, J. Phys. Condens. Matter
14
24, 424209 (2012).
15
16 [64] T. Thonhauser, S. Zuluaga, C. A. Arter, K. Berland, E. Schröder, and P. Hyldgaard, Phys. Rev. Lett.
17
115, 136402 (2015).
18
19 [65] V. R. Cooper, Phys. Rev. B 81, 161104 (2010).
20 [66] J. Klimeš, D. R. Bowler, and A. Michaelides, J. Phys. Condens. Matter 22, 022201 (2010).
21
22 [67] J. Klimeš, D. R. Bowler, and A. Michaelides, Phys. Rev. B 83, 195131 (2011).
23 [68] K. Berland and P. Hyldgaard, Phys. Rev. B 89, 035412 (2014).
24
25 [69] K. Lee, E. D. Murray, L. Kong, B. I. Lundqvist, and D. C. Langreth, Phys. Rev. B 82, 081101 (2010).
26 [70] I. Hamada and M. Otani, Phys. Rev. B 82, 153412 (2010).
27
[71] O. A. Vydrov and T. Van Voorhis, J. Chem. Phys. 133, 244103 (2010).
28
29 [72] R. Sabatini, T. Gorni, and S. de Gironcoli, Phys. Rev. B 87, 041108(R) (2013).
30
[73] http://schooner.chem.dal.ca.
31
32 [74] A. Becke, J. Chem. Phys. 85, 7184 (1986).
33 [75] J. Perdew, K. Burke, and M. Ernzerhof, Phys. Rev. Lett. 77, 3865 (1996).
34
35 [76] J. Perdew and W. Yue, Phys. Rev. B 33, 8800 (1986).
36 [77] A. Otero-de-la-Roza and E. R. Johnson, J. Chem. Phys. 136, 174109 (2012).
37
38 [78] F. L. Hirshfeld, Theor. Chim. Acta 44, 129 (1977).
39 [79] J. Hermann, R. A. DiStasio Jr., and A. Tkatchenko, Chem. Rev. 117, 4714 (2017).
40
[80] N. Ferri, R. A. DiStasio Jr., A. Ambrosetti, R. Car, and A. Tkatchenko, Phys. Rev. Lett. 114, 176802
41
42 (2015).
43
[81] M. Cococcioni and S. de Gironcoli, Phys. Rev. B 71, 035105 (2005).
44
45 [82] B. Himmetoglu, A. Floris, S. de Gironcoli, and M. Cococcioni, Int. J. Quantum Chem. 114, 14 (2014).
46 [83] S. L. Dudarev, G. A. Botton, S. Y. Savrasov, C. J. Humphreys, and A. P. Sutton, Phys. Rev. B 57,
47
48 1505 (1998).
49 [84] A. I. Liechtenstein, V. I. Anisimov, and J. Zaanen, Phys. Rev. B 52, R5467 (1995).
50
51 [85] I. Timrov, M. Cococcioni, and N. Marzari, in preparation.
52 [86] H. F. Wilson, F. Gygi, and G. Galli, Phys. Rev. B 78, 113303 (2008).
53
[87] H.-V. Nguyen and S. de Gironcoli, Phys. Rev. B 79, 205114 (2009).
54
55 [88] N. Colonna, M. Hellgren, and S. de Gironcoli, Phys. Rev. B 90, 125150 (2014).
56
57
58
59
60
AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2 Page 64 of 68
64
1
2
3 [89] N. L. Nguyen, N. Colonna, and S. de Gironcoli, Phys. Rev. B 90, 045138 (2014).
4 [90] S. Baroni, P. Giannozzi, and A. Testa, Phys. Rev. Lett. 58, 1861 (1987).
5
6 [91] P. Giannozzi, S. de Gironcoli, P. Pavone, and S. Baroni, Phys. Rev. B 43, 7231 (1991).
7 [92] X. Gonze, Phys. Rev. A 52, 1096 (1995).
8
9 [93] S. Baroni, S. de Gironcoli, A. Dal Corso, and P. Giannozzi, Rev. Mod. Phys. 73, 515 (2001).
10 [94] R. M. Sternheimer, Phys. Rev. 96, 951 (1954).
11
12 [95] G. D. Mahan, Phys. Rev. A 22, 1780 (1980).
13 [96] C. Schwartz and J. Tiemann, Ann. Phys. 2, 178 (1959).
14
[97] W. Zernik, Phys. Rev. 135, A51 (1964).
15
16 [98] S. Baroni and A. Quattropani, Il Nuovo Cimento D 5, 89 (1985).
17
[99] M. Casida, in Recent Developments and Applications of Modern Density Functional Theory, edited by
18
19 J. Seminario (Elsevier, Amsterdam, 1996) p. 391.
20 [100] C. Jamorski, M. E. Casida, and D. R. Salahub, J. Chem. Phys. 104, 5134 (1996).
21
22 [101] A. D. McLachlan and M. A. Ball, Rev. Mod. Phys. 36, 844 (1964).
23 [102] H. Hübener and F. Giustino, J. Chem. Phys. 141 (2014).
24
25 [103] D. Rocca, D. Lu, and G. Galli, J. Chem. Phys. 133, 164109 (2010).
26 [104] D. Rocca, Y. Ping, R. Gebauer, and G. Galli, Phys. Rev. B 85, 045116 (2012).
27
[105] M. Marsili, E. Mosconi, F. De Angelis, and P. Umari, Phys. Rev. B 95, 075415 (2017).
28
29 [106] M. Govoni and G. Galli, J. Chem. Theory Comput. 11, 2680 (2015).
30
[107] R. Sabatini, E. Küçükbenli, C. H. Pham, and S. de Gironcoli, Phys. Rev. B 93, 235120 (2016).
31
32 [108] A. Floris, S. de Gironcoli, E. K. U. Gross, and M. Cococcioni, Phys. Rev. B 84, 161102(R) (2011).
33 [109] A. Floris, I. Timrov, N. Marzari, S. de Gironcoli, and M. Cococcioni, in preparation.
34
35 [110] M. Blanchard, E. Balan, P. Giura, K. Beneut, H. Yi, G. Morin, C. Pinilla, M. Lazzeri, and A. Floris,
36 Physics and Chemistry of Minerals 41, 289 (2014).
37
38 [111] M. Blanchard, N. Dauphas, M. Hu, M. Roskosz, E. Alp, D. Golden, C. Sio, F. Tissot, J. Zhao, L. Gao,
39 R. Morris, M. Fornace, A. Floris, M. Lazzeri, and E. Balan, Geochimica et Cosmochimica Acta 151,
40
19 (2014).
41
42 [112] G. Shukla, Z. Wu, H. Hsu, A. Floris, M. Cococcioni, and R. M. Wentzcovitch, Geophysical Research
43
Letters 42, 1741 (2015).
44
45 [113] G. Shukla and R. M. Wentzcovitch, Physics of the Earth and Planetary Interiors 260, 53 (2016).
46 [114] G. Shukla, M. Cococcioni, and R. M. Wentzcovitch, Geophysical Research Letters 43, 5661 (2016).
47
48 [115] E. Runge and E. Gross, Phys. Rev. Lett. 52, 997 (1984).
49 [116] M. A. L. Marques, N. T. Maitra, F. M. S. Nogueira, E. K. U. Gross, and A. Rubio, eds., Fundamentals
50
51 of Time-Dependent Density Functional Theory, Vol. 837 (Lecture Notes in Physics, Springer-Verlag,
52 Berlin, Heidelberg, 2012).
53
[117] S. Baroni and R. Gebauer, The Liouville-Lanczos Approach to Time-Dependent Density-Functional
54
55 (Perturbation) Theory (Ref. 116, chapter 19, p. 375-390).
56
57
58
59
60
Page 65 of 68 AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2
65
1
2
3 [118] T. Gorni, I. Timrov, and S. Baroni, in preparation.
4 [119] S. Baroni and R. Resta, Phys. Rev. B 33, 7017 (1986).
5
6 [120] J. Tobik and A. Dal Corso, J. Chem. Phys. 120, 9934 (2004).
7 [121] R. Haydock, V. Heine, and M. J. Kelly, Journal of Physics C: Solid State Physics 5, 2845 (1972).
8
9 [122] R. Haydock, V. Heine, and M. J. Kelly, Journal of Physics C: Solid State Physics 8, 2591 (1975).
10 [123] M. Grüning, A. Marini, and X. Gonze, Computational Materials Science 50, 2148 (2011).
11
12 [124] E. R. Davidson, J. Comput. Phys. 17, 87 (1975).
13 [125] G. Onida, L. Reining, and A. Rubio, Rev. Mod. Phys. 74, 601 (2002).
14
[126] I. Timrov, M. Markov, T. Gorni, M. Raynaud, O. Motornyi, R. Gebauer, S. Baroni, and N. Vast,
15
16 Phys. Rev. B 95, 094301 (2017).
17
[127] L. Hedin, Phys. Rev. 139, A796 (1965).
18
19 [128] M. S. Hybertsen and S. G. Louie, Phys. Rev. Lett. 55, 1418 (1985).
20 [129] L. Reining, G. Onida, and R. W. Godby, Phys. Rev. B 56, R4301 (1997).
21
22 [130] H. F. Wilson, D. Lu, F. Gygi, and G. Galli, Phys. Rev. B 79, 245106 (2009).
23 [131] F. Giustino, M. L. Cohen, and S. G. Louie, Phys. Rev. B 81, 115105 (2010).
24
25 [132] P. Umari and S. Fabris, J. Chem. Phys. 136, 174310 (2012).
26 [133] P. Umari, E. Mosconi, and F. De Angelis, Scientific Reports 4, 4467 EP (2014).
27
[134] F. Caruso, H. Lambert, and F. Giustino, Phys. Rev. Lett. 114, 146404 (2015).
28
29 [135] H. Lambert and F. Giustino, Phys. Rev. B 88, 075117 (2013).
30
[136] C. J. Pickard and F. Mauri, Phys. Rev. B 63, 245101 (2001).
31
32 [137] M. d’Avezac, N. Marzari, and F. Mauri, Phys. Rev. B 76, 165122 (2007).
33 [138] C. J. Pickard and F. Mauri, Phys. Rev. Lett. 88, 086403 (2002).
34
35 [139] H. M. Petrilli, P. E. Blöchl, P. Blaha, and K. Schwarz, Phys. Rev. B 57, 14690 (1998).
36 [140] J. W. Zwanziger, J. Phys.: Condens. Matter 21, 195501 (2009).
37
38 [141] M. S. Bahramy, M. H. F. Sluiter, and Y. Kawazoe, Phys. Rev. B 76, 035124 (2007).
39 [142] H. J. von Bardeleben, J. L. Cantin, H. Vrielinck, F. Callens, L. Binet, E. Rauls, and U. Gerstmann,
40
Phys. Rev. B 90, 085203 (2014).
41
42 [143] R. Pigliapochi, A. J. Pell, I. D. Seymour, C. P. Grey, D. Ceresoli, and M. Kaupp, Phys. Rev. B 95,
43
054412 (2017).
44
45 [144] J. R. Yates, C. J. Pickard, and F. Mauri, Phys. Rev. B 76, 024401 (2007).
46 [145] E. Küçükbenli, K. Sonkar, N. Sinha, and S. de Gironcoli, J. Phys. Chem. A 116, 3765 (2012).
47
48 [146] S. de Gironcoli, Phys. Rev. B 51, 6773 (1995).
49 [147] D. Xiao, J. Shi, and Q. Niu, Phys. Rev. Lett. 95, 137204 (2005).
50
51 [148] T. Thonhauser, D. Ceresoli, D. Vanderbilt, and R. Resta, Phys. Rev. Lett. 95, 137205 (2005).
52 [149] T. Thonhauser, D. Ceresoli, A. A. Mostofi, N. Marzari, R. Resta, and D. Vanderbilt, J. Chem. Phys.
53
131, 101101 (2009).
54
55 [150] D. Ceresoli, U. Gerstmann, A. P. Seitsonen, and F. Mauri, Phys. Rev. B 81, 060409 (2010).
56
57
58
59
60
AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2 Page 66 of 68
66
1
2
3 [151] B. M. George, J. Behrends, A. Schnegg, T. F. Schulze, M. Fehr, L. Korte, B. Rech, K. Lips,
4 M. Rohrmüller, E. Rauls, W. G. Schmidt, and U. Gerstmann, Phys. Rev. Lett. 110, 136803 (2013).
5
6 [152] Z. Bodrog and A. Gali, J. Phys.: Condens. Matter 26, 015305 (2014).
7 [153] C. Gougoussis, M. Calandra, A. P. Seitsonen, and F. Mauri, Phys. Rev. B 80, 075102 (2009).
8
9 [154] C. Gougoussis, M. Calandra, A. Seitsonen, C. Brouder, A. Shukla, and F. Mauri, Phys. Rev. B 79,
10 045118 (2009).
11
12 [155] O. Bunău and M. Calandra, Phys. Rev. B 87, 205105 (2013).
13 [156] M. Taillefumier, D. Cabaret, A.-M. Flank, and F. Mauri, Phys. Rev. B 66, 195107 (2002).
14
[157] G. Fratesi, V. Lanzilotto, L. Floreano, and G. P. Brivio, J. Phys. Chem. C 117, 6632 (2013).
15
16 [158] G. Fratesi, V. Lanzilotto, S. Stranges, M. Alagia, G. P. Brivio, and L. Floreano, Phys. Chem. Chem.
17
Phys. 16, 14834 (2014).
18
19 [159] M. Lazzeri and S. de Gironcoli, Phys. Rev. B 65, 245402 (2002).
20 [160] G. Deinzer, G. Birner, and D. Strauch, Phys. Rev. B 67, 144304 (2003).
21
22 [161] M. Calandra, M. Lazzeri, and F. Mauri, Physica C: Superconductivity 456, 38 (2007), Recent
23 Advances in MgB2 Research.
24
25 [162] J. Callaway, Phys. Rev. 113, 1046 (1959).
26 [163] M. Markov, J. Sjakste, G. Fugallo, L. Paulatto, M. Lazzeri, F. Mauri, and N. Vast, Phys. Rev. B 93,
27
064301 (2016).
28
29 [164] M. Markov, J. Sjakste, G. Fugallo, L. Paulatto, M. Lazzeri, F. Mauri, and N. Vast, submitted.
30
[165] A. Ward, D. A. Broido, D. A. Stewart, and G. Deinzer, Phys. Rev. B 80, 125203 (2009).
31
32 [166] G. Fugallo, A. Cepellotti, L. Paulatto, M. Lazzeri, N. Marzari, and F. Mauri, Nano Letters 14, 6109
33 (2014).
34
35 [167] A. Cepellotti, G. Fugallo, L. Paulatto, M. Lazzeri, F. Mauri, and N. Marzari, Nat. Commun. 6 (2015).
36 [168] W. Li, J. Carrete, N. A. Katcho, and N. Mingo, Comp. Phys. Commun. 185, 17471758 (2014).
37
38 [169] F. Giustino, Rev. Mod. Phys. 89, 015003 (2017).
39 [170] N. Marzari, A. A. Mostofi, J. R. Yates, I. Souza, and D. Vanderbilt, Rev. Mod. Phys. 84, 1419 (2012).
40
[171] A. A. Mostofi, J. R. Yates, G. Pizzi, Y. S. Lee, I. Souza, D. Vanderbilt, and N. Marzari, Comput.
41
42 Phys. Commun. 185, 2309 (2014).
43
[172] L. A. Agapito, S. Curtarolo, and M. Buongiorno Nardelli, Physical Review X 5, 011006 (2015).
44
45 [173] A. Calzolari and M. Buongiorno Nardelli, Scientific Reports 3 (2013).
46 [174] P. Umari and A. Pasquarello, Phys. Rev. Lett. 89, 157602 (2002).
47
48 [175] Y. Wang, S. Shang, Z.-K. Liu, and L.-Q. Chen, Phys. Rev. B 85, 224303 (2012).
49 [176] R. King-Smith and D. Vanderbilt, Phys. Rev. B 47, 1651 (1993).
50
51 [177] P. Umari, X. Gonze, and A. Pasquarello, Phys. Rev. Lett. 90, 027401 (2003).
52 [178] J. Tomasi, B. Mennucci, and R. Cammi, Chem. Rev. 105, 2999 (2005).
53
[179] J.-L. Fattebert and F. Gygi, J. Comput. Chem. 23, 662 (2002).
54
55 [180] G. Fisicaro, L. Genovese, O. Andreussi, N. Marzari, and S. Goedecker, J. Chem. Phys. 144, 014103
56
57
58
59
60
Page 67 of 68 AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2
67
1
2
3 (2016).
4 [181] C. Dupont, O. Andreussi, and N. Marzari, J. Chem. Phys. 139, 214110 (2013).
5
6 [182] O. Andreussi, I. Dabo, G. Fisicaro, S. Goedecker, I. Timrov, S. Baroni, and N. Marzari, “Environ 0.2:
7 a library for environment effect in first-principles simulations of materials,” www.quantum-environ.
8
9 org (2016).
10 [183] M. Cococcioni, F. Mauri, G. Ceder, and N. Marzari, Phys. Rev. Lett. 94, 145501 (2005).
11
12 [184] D. A. Scherlis, J. L. Fattebert, F. Gygi, M. Cococcioni, and N. Marzari, J. Chem. Phys. 124, 074103
13 (2006).
14
[185] I. Dabo, B. Kozinsky, N. E. Singh-Miller, and N. Marzari, Phys. Rev. B 77, 115139 (2008).
15
16 [186] A. Laio, J. VandeVondele, and U. Rothlisberger, J. Chem. Phys. 116, 6941 (2002).
17
[187] A. H. MacDonald and S. H. Vosko, J. Phys. C 12, 2977 (1979).
18
19 [188] A. K. Rajagopal and J. Callaway, Phys. Rev. B 7, 1912 (1973).
20 [189] T. Brumme, M. Calandra, and F. Mauri, Phys. Rev. B 89, 245406 (2014).
21
22 [190] L. Bengtsson, Phys. Rev. B 59, 12301 (1999).
23 [191] M. Topsakal and S. Ciraci, Phys. Rev. B 85, 045121 (2012).
24
25 [192] J. Neugebauer and M. Scheffler, Phys. Rev. B 46, 16067 (1992).
26 [193] B. Meyer and D. Vanderbilt, Phys. Rev. B 63, 205426 (2001).
27
[194] I. Štich, R. Car, M. Parrinello, and S. Baroni, Phys. Rev. B 39, 4997 (1989).
28
29 [195] O. Jepsen and O. K. Andersen, Solid State Commun. 9, 1763 (1971).
30
[196] P. E. Blöchl, O. Jepsen, and O. K. Andersen, Phys. Rev. B 49, 16223 (1994).
31
32 [197] M. Kawamura, Y. Gohda, and S. Tsuneyuki, Phys. Rev. B 89, 094515 (2014).
33 [198] F. Zadra and A. Dal Corso, unpublished.
34
35 [199] T. Hahn, ed., International tables for crystallography Volume A: Space-group symmetry (Springer,
36 2005).
37
38 [200] A. Marek, V. Blum, R. Johanni, V. Havu, B. Lang, T. Auckenthaler, A. Heinecke, H.-J. Bungartz,
39 and H. Lederer, J. Phys.: Condens. Matter 26, 213201 (2014).
40
[201] A. Marini, C. Hogan, M. Grüning, and D. Varsano, Comput. Phys. Commun. 180, 1392 (2009).
41
42 [202] L. Martin-Samos and G. Bussi, Comput. Phys. Commun. 180, 1416 (2009).
43
[203] J. Deslippe, G. Samsonidze, D. A. Strubbe, M. Jain, M. L. Cohen, and S. G. Louie, Comput. Phys.
44
45 Commun. 183, 1269 (2012).
46 [204] G. Pizzi, A. Cepellotti, R. Sabatini, N. Marzari, and B. Kozinsky, Computational Materials Science
47
48 111, 218 (2016).
49 [205] N. Mounet, M. Gibertini, P. Schwaller, A. Merkys, I. E. Castelli, A. Cepellotti, G. Pizzi, and
50
51 N. Marzari, (2016), arXiv:1611.05234 [cond-mat].
52 [206] A. R. Supka, T. E. Lyons, L. Liyanage, P. D’Amico, R. A. R. Al Orabi, S. Mahatara, P. Gopal, C. To-
53
her, D. Ceresoli, A. Calzolari, S. Curtarolo, M. Buongiorno Nardelli, and M. Fornari, Computational
54
55 Materials Science 136, 76 (2017).
56
57
58
59
60
AUTHOR SUBMITTED MANUSCRIPT - JPCM-109727.R2 Page 68 of 68
68
1
2
3 [207] The HDF Group, “Hierarchical data format version 5,” http://www.hdfgroup.org/HDF5 (2000-2010).
4 [208] A. Calzolari, I. Souza, N. Marzari, and M. Buongiorno Nardelli, Phys. Rev. B 69, 035108 (2004).
5
6 [209] A. Ferretti, A. Calzolari, B. Bonferroni, and R. Di Felice, J. Phys.: Condens. Matter 19, 036215
7 (2007).
8
9 [210] M. Bonomi, D. Branduardi, G. Bussi, C. Camilloni, D. Provasi, P. Raiteri, D. Donadio, F. Marinelli,
10 F. Pietrucci, R. A. Broglia, and M. Parrinello, Comput. Phys. Commun. 180, 1961 (2009).
11
12 [211] S. R. Bahn and K. W. Jacobsen, Comput. Sci. Eng. 4, 56 (2002).
13 [212] D. A. Moon, G. L. Steele Jr., and R. Stallman et al., “Emacs: The extensible, customizable display
14
editor,” https://www.gnu.org/software/emacs/ (2017).
15
16 [213] J. Spencer, “testcode,” https://github.com/jsspencer/testcode (2017).
17
[214] N. Marzari, Nature Materials 15, 381 (2016).
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60