Dynaphopy

Big data

In order to obtain well converged results in the calculation of anharmonic phonon properties, specially in phonon linewidths, usually it becomes necessary to use very long MD simulations. This requires long calculation times and imply to hadle very large files. Fortunatelly, in canonical ensemble calculations the velocity time correlation usually decreases fast with the time. That means that it becomes possible to join several independent MD trajectories to obtain a larger one with almost negligible added error. That allows to calculate many independent MD simulations at the same time and join them later to obtain a larger MD trajectory and increase the accuracy of the results.
To support this task DynaPhoPy can save the MD information into a hdf5 file to be read afterwards. Additionally I provide an independent script called concath5 which allows join several hdf5 files into one.
Note: VASP code has to be compiled with advanced features in order to start MD from a random seed and generate different trajectories each run. Please, refere to VASP documentation for further information.

Save data into hdf5 file

Saving data into hdf5 file can be usefull in order to save space in the hard drive. Since only the necessary information for dynaphopy is stored, this hdf5 file size can be around 5 times smaller than the original MD TRAJECTORY file. Besides it allows to manipulate this information as an intermediate step using concath5 or other third party software. To request to save MD trajectory into hdf5 file use -sv flag follower by the file name.
This option can be used alone or together with other flags but, in any case, all time steps in TRAJECTORY file will be stored in the hdf5 file (-n flag has no effect in the number of time steps stored).
```
$ dynaphopy input_file TRAJECTORY -sv filename.h5
```
Load data into hdf5 file

Once saved into a hdf5 file, MD data can be load to dynaphopy using -lv flag. Loading data from hdf5 is much faster than using TRAJECTORY file, if several calculation using the same data have to be done, it is highly recomended to use this option. than using TRAJECTORY file, if several calculation using the same data have to be done, it is highly recomended to use this option.
Notice that when loading data this way, TRAJECTORY must not not be set.
```
$ dynaphopy input_file -lv filename.h5
```
Manipulating data

The structure of the hdf5 file is quite simple. With a minimum knowledge of programming it can be easly manipulated by third parties software. In dynaphopy script directory a simple script for joining data is provided named concath5. It can be used as example for more complex ones. To use this script just call it followed by the name of all the hdf5 files to be joined and the last one the name of the new hdf5 file that will contain all the joined data.
```
$ concath5 source1.h5 source2.h5 source3.h5 ··· joined.h5
```
By default concath5 skips the first 20000 time steps of each source MD trajectory. This is done to ensure that the MD is totally converged in each file. Using -f flag, is possible to define the number of time steps to be skipped.
```
$ concath5 source1.h5 source2.h5 ··· joined.h5
```
Depending on the processed MD simulation the size of the resulting hdf5 file could be huge. In the lastest version, by default concath5 stores all the velocity as well as the trajectory coordinates at each timestep. To calculate the phonon related properties using DynPhoPy only the velocity is needed. So if no atomic displacement calculation is needed option --velocity_only can be used. This option makes concath5 script to not store the trajectory in the final hdf5 file saving HD space.
```
$ concath5 --velocity_only source1.h5 source2.h5 ··· joined.h5
```
Working with wave-vector projected velocity (Saving RAM memory)

The amount of physical RAM memory available may strongly limit the size of the trajectory that DyanPhopy can read. Specially for LAMMPS trajectories, whose output files can weight a few tens of Gb for long MD simulations, the memory can become easily full. This is usually no due to the trajectory size itself but the process of it that DynaPhoPy does.
To overcome this problem the trajectory can be read by pieces, projected into a wave vector, stored in the disk (hdf5 files), joined and read with DynaPhoPy again to calculate the power spectrum and the phonon properties.
Due to the dimensionality of the projected velocity is much lower than the original trajectory the, MD length that can be used is much longer. However, since the velocity is already projected, it only can be used to analyze the phonons in the projected wave vector. That means that for each wave vector to be analyzed this procedure has to be repeated.
Read data in pieces
There are to flags to control the piece of data that DynaPhoPy will read. --read_from and --read_to. Each flag accept only one integer number each and define the interval of data to be read from the output file.
```
$ dynaphopy input_file OUT.lammpstrj --read_from 10000 --read_to 20000
```
This reads the trajectory in OUT.lammpstrj file from the 10000th until 20000th time step.
To store the wave vector projected velocity into a file the -svc flag is used.
```
$ dynaphopy input_file OUT.lammpstrj -svc projected_velocity.h5
```
Combining these three flags we can divide a long MD trajectory into smaller wave projected velocity files. These files can be joint using the concath5 script in the save way shown before.
```
$ concath5 proj_vel1.h5 proj_vel2.h5 ··· proj_vel_joined.h5
```
In this case -f flag should not be since those files are part of the same trajectory and we want the files to match exactly.
Once joined, the combined proj_vel_joined.h5 file can be read by DynaPhoPy using -lv flag.
```
$ dynaphopy input_file -lv projected_velocity.h5
```
It has to be pointed that when reading proj_vel_joined.h5 file the wave vector must not to be set. Wave vector is already stored in proj_vel_joined.h5 file. Defining another wave vector using -q flag will make DynaPhoPy to stop. Keep in mind that the number of features available reading from a projected velocity file are limited to power spectrum calculations and peak analysis
Mapping RAM memory on disk (Saving RAM memory in exchange of storage space)

Alterativelly to divide the MD data into small pieces DynaPhoPy allows to map most part of the RAM memory on disk. This option allow us to work normally with dynaphopy without any pre processing of the data. To activate memory mapping use --memmap. This option requieres to define explicitily the amount of data to be loaded by options --read_from and --read_to
```
$ dynaphopy input_file TRAJECTORY --read_to 100000 --memmap
```
Memory will be mapped in up to 4 files stored by default in the working directory. These files will be removed once the execution is finished (if DynaPhoPy crashed these files may not be removed). Environment variable DYNAPHOPY_TEMPDIR can be set to define an alternative location for these temporal files.
```
export DYNAPHOPY_TEMPDIR=/scratch
```
When working with LAMMPS MD trajectories, this procedure can be done more efficently reading the atomic velocities directly from LAMMPS output file instead to calculate them from the atomic positions (check prepare MD section for more information about how to generate a LAMMPS output). This reduces the amount of RAM memmory to be used during the calculation and therefore the amount of storage space when using --memmap option.
Warning: Using memory mapping will decrease the performance of the calculation and the i/o disk access will be intensive. It is recommended to use a local SSD disk for this purpose.

MEM coeficient analysis

The correct setting of the number of coefficents in MEM algorithm to calculate the power spectrum is one of the crucial point to take in account. Depending on the number of coefficents the peak width of the peaks may significantly change, which directly affects the phonon linewidths results.
From my experience, if the MD simulations are long enough, as the number of coefficents increase the peak width become to stabilize while keeping the peaks shape smooth. Is to be expected than this stabilized width is more reliable even if the peaks become a little more uglyer.
In order to check the evolution of the power spectra peaks width with the number of coefficents dynaphopy provides option -csa which performs a scan for all peaks anf fits each one to Lorentzian function to extract width information. All this information is plotted in 3 matplotlib windows for each phonon.

$ dynaphopy input_file TRAJECTORY -csa

Additionally, it shows the optimum number of coeficients that has the best fit with a Lorentzian function and estimates the peak width value taking a average for all coefficents ponderated by the inverse of fit error.

Peak # 5
------------------------------------
Estimated width(FWHM): 0.0570242354113 THz
Position: 3.8546680966 THz
Optimum coefficients num: 336.0
Fitting Error: 2.31991499922e-07

External power spectra analysis

The calculation of the power spectra, specially using Fourier transform method, can take a lot of time (maybe few days). For this reason the calculation of these spectra may be done in a cluster computer in which may not have acces through graphical interface. This makes the fitting analysis difficult since there is no way to evaluate visually if the fitting has been correctly done. Besides, the power spectrum obtained may be noisy and a post processing may be needed to improve the fitting quality.
In dynaphopy package, fitdata script is included in scripts folder to extract the anharmonic information from a power spectrum file obtained with Dynaphopy throught -sw or -sp options. This allows to first calculate the power spectrum using Dynaphopy in a cluster computer, saving it in a file and later extract the information in a local machine. Additionally this script allows to smooth the spectrum using Local regression method before fitting it with a Lorentzian function.
During the execution it shows the smoothing result and the fitting result for each spectrum present in the spectrum file allowing to test different smoothing parameters to obtain the best results.

Simple execution

By default the script takes all the spectra present in the file and fits them to a Lorentzian function (equivalent to Dynaphopy peak analysis).
```
$ fitdata file
```
Smoothing spectra using local regression method

This method works very well removing the noise in very noisy spectra. This allows to obtain much better results in the fitting and, in many cases, it becomes indispensable in order to get a meaningful result. To do this, the visual analysis becomes crucial to compare the original noisy spectrum and the smoothed one to make sure that the essetial peak shape is kept. To request the smooth methos use -sm flag.
```
$ fitdata -sm file
```
Changing smoothing parameters

In case that the smoothing process is not succesfull, it can be ajusted via smoothing span parameter -f (Default: 0.2). Increasing its value will make the strectra smoother but may differ more from the original spectrum.
```
$ fitdata -sm -f 0.4 file
```
Combining spectra

In high symmetry points in the reciprocal space (wave vector), it may happen that two or more phonon are degenerated. In that case the projection into this wave vector and, consecutively, the power spectra should present the same properties (Width and position). In order to increase the fitting precision these equivalent spectra can be conbined into one reducing the random error. This can be done using -bi flag. Following this flag it is nedeed to introduce a string containing the number of the spectra to be combined (equivalent spectra) separated by spaces and the non equivalent spectra separated by comma like in the following example:
```
$ fitdata -bi "1 2, 3 4 5" file
```
in this example, the script will fit two phonon power spectra. The first one using the avarage of the spectra 1 and 2. The second one using the average of spectra 3, 4 and 5.

Thermal expansion contribution

Thermal expansion may have a huge contribution to the frequency shift at high temperatures. This contribution can be included in DynaPhoPy via force constants. These force constants are obtained from a lattice dynamics calculation using a unit cell correspoponding to the cell volume at a given temperature. This temperature should correspond to the temperature at which the MD simulation is calculated. The crystal symmetry of the unit cell is assumed to remain constant with the volume. If the crystal symmetry of the structure at the analyzed temperature is different respect the symmetry of the structure at 0 K temperature this approximation may be incorrect.
In order to calculate the thermal expansion contribution the quasi-harmonic approximation (QHA) may be used. DynaPhoPy package includes a script to facilitate this calculation using phonopy: qha_extract. To use this script, first it is necessary to prepare a set of phonon calculations at different volumes as described in phonopy-qha manual. Using this procedure we should obtain 3 kinds of files:
1) A FORCE_CONSTANTS file at each volume.
2) A thermal_properties.yaml file at each volume
3) A e-v.dat file.
The qha_extract is executed by:

$ qha_extract input_file -ev e-v.dat 
-fc FORCE_CONSTANTS_1 FORCE_CONSTANTS_2 FORCE_CONSTANTS_3 ...  
-tp thermal_properties_1.yaml thermal_properties_2.yaml ...
-t 800

where input_file is a usual DynaPhoPy input file (the same used to analyze the MD is OK) and -t flag indicates the temperature at which the force constants are obtained. This temperature should be the one at which the MD simulation is calculated. The output result of this script is a file named FORCE_CONSTANTS_QHA that contains the force constants at the given temperature.
The order in which the files are introduced using -fc and -tp flags matters, make sure to be consistent each other. Wildcards can be used, just make sure the filenames is properly set according to alphanumeric order:

$ qha_extract input_file -ev e-v.dat 
-fc FORCE_CONSTANTS_*  
-tp thermal_properties_*
-t 800

Optional flags:
-p Plot results of the QHA calculation using phonopy-qha
-sfc Defines custom name for the ouput FORCE_CONSTANTS_QHA file.

Once the calculation is finished, the file FORCE_CONSTANTS_QHA is used in DynaPhoPy by the flag --qha_force_constants:

$ dynaphopy input_file TRAJECTORY 
--qha_force_constants FORCE_CONSTANTS_QHA

Big data

Save data into hdf5 file

Load data into hdf5 file

Manipulating data

Working with wave-vector projected velocity (Saving RAM memory)

Mapping RAM memory on disk (Saving RAM memory in exchange of storage space)

MEM coeficient analysis

External power spectra analysis

Simple execution

Smoothing spectra using local regression method

Changing smoothing parameters

Combining spectra

Thermal expansion contribution