As part of my dissertation, under my advisor Dr. Tao Pang, I compiled a threaded version of FHI98md from the Fritz-Haber-Institute theory group. This program uses Density Functional Theory to do first-principles calculations on various types of materials. I have not compiled a demo version, but if you have registered, you may download the parallel distribution here. The distribution also contains a serial version linked against the SCSL libraries referenced below.
This version uses the Parallel SGI/Cray Scientific Library (SCSL) version 1.2.0 which can be obtained freely from
Right now the parallel version is fairly well optimized for shared memory systems. It could be optimized a bit more, but unfortunately I have to do some physics so that I can get my Ph.D. and graduate. By default, the threaded version automatically uses as many idle processors as it can find on a particular computer. One can limit the number of processors available to the job by setting the environment variable MP_SET_NUMTHREADS before starting the program. For example, to cap a job at 4 threads in a bash shell, set MP_SET_NUMTHREADS to 4 and export it.
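As a minimal sketch of the bash setting described above: the variable name MP_SET_NUMTHREADS comes from the text, while the echo line is only there to confirm the value. The name of the executable depends on how you built the distribution, so it is deliberately not shown.

```shell
# Cap the threaded job at 4 threads (bash syntax).
export MP_SET_NUMTHREADS=4

# Confirm the value that child processes will inherit.
echo "MP_SET_NUMTHREADS=$MP_SET_NUMTHREADS"

# Then start the threaded program from this same shell as you normally would.
```

Because the variable is exported, it applies to every program launched from that shell until you unset it or change it.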
With this code you will not get perfect speed up. The speed up is a measure of how much faster the program runs on multiple processors than on one. Ideally, if you use four processors instead of one, the program should run 4 times faster. The best I have seen so far is a 2.2 times speed up on four processors. The speed ups you see will depend on the specifics of the system you are running. Runs with few k-points, small cells, low energy cutoffs, and few electrons will run slower on multiple processors than they would on one. On the other hand, jobs with many k-points, large cells, high energy cutoffs, and lots of electrons will run faster on four processors than on one.
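If you want to work out the speed up for your own runs, a small sketch like the following may help. The two wall-clock times are hypothetical placeholders, chosen only so that the result matches the 2.2 times figure quoted above; they are not measured timings. The variable names are also just illustrative.

```shell
# Hypothetical wall-clock times in seconds (placeholders, not real measurements).
t_one=220     # run time on 1 processor
t_many=100    # run time on nproc processors
nproc=4

# Speed up = one-processor time divided by multi-processor time.
speedup=$(awk -v a="$t_one" -v b="$t_many" 'BEGIN { printf "%.2f", a / b }')

# Parallel efficiency = speed up divided by the number of processors (1.00 would be perfect).
efficiency=$(awk -v s="$speedup" -v n="$nproc" 'BEGIN { printf "%.2f", s / n }')

echo "speedup=$speedup efficiency=$efficiency"
```

A speed up below 1.00 means the job ran slower on multiple processors than on one, which is what you should expect for the small jobs described above.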
This page will be updated continuously as I get access to larger machines with more processors. For now, here is an old graph I did last year. I can't remember the differences between the runs, but it gives you an idea of what kinds of speed ups you should get.
I would be happy to post other timings on different computers if you would send them to me. If you do, please send me the input files as well.
One may notice that the SCSL serial version is faster than the SCSL parallel version using one thread. This is because the serial SCSL version uses some of the old netlib subroutines, which are faster than some of the SCSL subroutines.
I would really appreciate an email from anyone who decides to use this version of the code. If one wishes to reference the program in a publication, please use the reference to the original FHI98md paper:
M. Bockstedte, A. Kley, J. Neugebauer and M. Scheffler: "Density-functional theory calculations for polyatomic systems: Electronic structure, static and elastic properties and ab initio molecular dynamics", Comp. Phys. Commun. 107, 187 (1997).
If you wish to reference this parallel version, please just include a reference to my name, which is Anthony J. Zukaitis.