= Introduction =

This page illustrates the MUSCLEization of the legacy astrophysics code [http://piernik.astri.umk.pl/ PIERNIK] (which is Polish for "Gingerbread"), developed outside of the MAPPER project.

== The original code ==

 - single binary image
 - Fortran 2008
 - MPI and HDF5 dependencies
 - high scalability of the main MHD module
 - prototype Monte Carlo particle simulation module
 - coupling done via global variables

== Motivations ==

The next sections describe the steps needed to transform the existing coupling, done via shared memory, into one that exploits the MUSCLE framework. Here we briefly list the expected benefits:

 * Both the MHD and MC modules can run concurrently, exchanging data at the beginning of each time step. This introduces a new level of parallelism and thus potentially shorter walltimes. In the current code the MHD and MC simulations are called sequentially, one after another.
 * Previous tests show that the MC code is much more resource demanding, while having potential for greater scalability, than the MHD part. Using MUSCLE it is possible to assign a different amount of resources to each kernel (e.g. 12 cores for the MHD code, 120 cores for the MC one).
 * The MC code is in the process of being GPU-enabled, so one may want to run the two modules on different, heterogeneous resources (e.g. MC on a GPU cluster, the base MHD code on an Intel Nehalem cluster).

In the end we will try to verify the above hypotheses in production runs.

== Step By Step Guide ==

=== Linking with MUSCLE ===

PIERNIK uses its own build system based on Python scripts; the flags used for compilation are stored in plain configuration files like this one:
{{{
PROG = piernik
USE_GNUCPP = yes
F90 = mpif90
F90FLAGS = -ggdb -fdefault-real-8 -ffree-form -std=gnu -fimplicit-none -ffree-line-length-none
F90FLAGS += -Ofast -funroll-loops
F90FLAGS += -I/software/local/libs/hdf5/1.8.9-pre1/gnu-4.7.2-ompi/include
LDFLAGS = -Wl,--as-needed -Wl,-O1 -L/software/local/libs/hdf5/1.8.9-pre1/gnu-4.7.2-ompi/lib
}}}
In order to link with MUSCLE 2.0 we have to alter the last lines of the file:
{{{
...
LDFLAGS = -Wl,--as-needed -Wl,-O1 -L/software/local/libs/hdf5/1.8.9-pre1/gnu-4.7.2-ompi/lib -L/mnt/lustre/scratch/groups/plggmuscle/2.0/devel-debug/lib
LIBS = -lmuscle2
}}}
Then we build the code with the following commands:
{{{
# load MUSCLE
module load muscle2/devel-debug
# load PIERNIK dependencies (HDF5, OpenMPI, newest GNU compiler)
module load plgrid/libs/hdf5/1.8.9-gnu-4.7.2-ompi

./setup mc_collisions_test -c gnufast -d HDF5,MUSCLE,MHD_KERNEL
}}}
`MUSCLE` and `MHD_KERNEL` are preprocessor defines; we want to keep the MUSCLE dependency conditional, and we will use both defines later.

=== Preparation: adding MUSCLE_Init and MUSCLE_Finalize calls ===

Using the [Fortran API|MUSCLE Fortran tutorial] as a reference, it was relatively easy to add the following code to the main PIERNIK file (piernik.F90):
{{{
#ifdef MUSCLE
   call muscle_fortran_init
#endif

   call init_piernik

   ...

   call cleanup_piernik

#ifdef MUSCLE
   call MUSCLE_Finalize
#endif

...
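! Descriptive note on the helper below: it rebuilds a C-style argc/argv
! pair from the Fortran command line, concatenating the arguments into a
! single NUL-separated string, and hands them to MUSCLE_Init, which expects
! the command-line options passed by the MUSCLE bootstrap.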
#ifdef MUSCLE
   subroutine muscle_fortran_init()
      implicit none
      integer :: argc, i, prevlen, newlen
      character(len=25600) :: argv
      character(len=255) :: arg

      prevlen = 0
      argc = command_argument_count()
      do i = 0, argc
         call get_command_argument(i, arg)
         newlen = len_trim(arg)
         argv = argv(1:prevlen) // arg(1:newlen) // char(0)
         prevlen = prevlen + newlen + 1
      end do
      call MUSCLE_Init(argc, argv(1:prevlen))
   end subroutine muscle_fortran_init
#endif

end program piernik
}}}

=== First try: running the NOP kernel in the MUSCLE environment ===

MUSCLE_Init assumes that the application is started within the MUSCLE environment, so it will always fail when called directly:
{{{
$ ./piernik
(12:29:26 ) MUSCLE port not given. Starting new MUSCLE instance.
(12:29:26 ) ERROR: Could not instantiate MUSCLE: no command line arguments given.
(12:29:26 ) Program finished
}}}
First we need to prepare a simplistic CxA file which describes the simulation; we start with a single kernel and no conduits:
{{{
# configure cxa properties
cxa = Cxa.LAST

# declare kernels and their params
cxa.add_kernel('mhd', 'muscle.core.standalone.NativeKernel')
cxa.env["mhd:command"] = "./piernik"
cxa.env["mhd:dt"] = 1

# global params
cxa.env["max_timesteps"] = 4
cxa.env["cxa_path"] = File.dirname(__FILE__)

# configure connection scheme
cs = cxa.cs
}}}
Now we are ready to run the PIERNIK MHD module in MUSCLE:
{{{
$ muscle2 --main --cxa piernik.cxa.rb mhd
Running both MUSCLE2 Simulation Manager and the Simulation
=== Running MUSCLE2 Simulation Manager ===
[12:39:05 muscle] Started the connection handler, listening on 10.3.1.22:5000
=== Running MUSCLE2 Simulation ===
[12:39:06 muscle] Using directory
[12:39:06 muscle] mhd: connecting...
[12:39:06 muscle] Registered ID mhd
[12:39:06 muscle] mhd conduit entrances (out): [] mhd conduit exits (in): []
[12:39:06 muscle] mhd: executing
(12:39:06 mhd) Spawning standalone kernel: [./piernik]
[n3-1-22.local:23649] mca: base: component_find: unable to open /software/local/OpenMPI/1.6.3/ib/gnu/4.1.2/lib/openmpi/mca_mtl_psm: libpsm_infinipath.so.1: cannot open shared object file: No such file or directory (ignored)
Start of the PIERNIK code. No. of procs = 1
Warning @ 0: [units:init_units] PIERNIK will use 'cm', 'sek', 'gram' defined in problem.par
[units:init_units] cm = 1.3459000E-11 [user unit]
[units:init_units] sek = 3.1688088E-08 [user unit]
[units:init_units] gram = 1.0000000E-22 [user unit]
Starting problem : mctest :: tst
Info @ 0: Working with 2 fluid.
Info @ 0: Number of cells: 1
Info @ 0: Cell volume: 4.1016785411997372E+032
Info @ 0: Monomer mass: 4.2893211697012652E-013
Info @ 0: Temperature: 200.02221228956296
Info @ 0: Number of monomers per one representative particle: 1.3865676291650172E+030
Info @ 0: Dust density [g/cm3]: 2.9000000001269637E-013
Warning @ 0: [initfluids:sanitize_smallx_checks] adjusted smalld to 1.1895E-04
Warning @ 0: [initfluids:sanitize_smallx_checks] adjusted smallp to 1.7705E-01
Info @ 0: Timesteps: 2.0416940798311732E+038 50.000000000000000
Info @ 0: Timesteps: 50.000000000000000 50.000000000000000
Info @ 0: Timesteps: 2.0416940798311732E+038 50.000000000000000
Info @ 0: Timesteps: 50.000000000000000 50.000000000000000
[MC] nstep = 1 dt = 1.5778800002006178E+09 s t = 9.9998257934644172E+01 yr dWallClock = 0.04 s
[MC] Writing output 1 time = 9.9998257934644172E+01 yr = 3.1557600004012356E+09 s
Info @ 0: Timesteps: 2.0416940798311732E+038 50.000000000000000
Info @ 0: Timesteps: 50.000000000000000 50.000000000000000
[MC] nstep = 2 dt = 1.5778800002006178E+09 s t = 1.9999651586928834E+02 yr dWallClock = 0.04 s
[MC] Writing output 2 time = 1.9999651586928834E+02 yr = 6.3115200008024712E+09 s
Info @ 0: Timesteps: 2.0416940798311732E+038 50.000000000000000
Info @ 0: Timesteps: 50.000000000000000 50.000000000000000
[MC] nstep = 3 dt = 1.5778800002006178E+09 s t = 2.9999477380393250E+02 yr dWallClock = 0.11 s
[MC] Writing output 3 time = 2.9999477380393250E+02 yr = 9.4672800012037067E+09 s
Info @ 0: Timesteps: 2.0416940798311732E+038 50.000000000000000
Info @ 0: Timesteps: 50.000000000000000 50.000000000000000
[MC] nstep = 4 dt = 1.5778800002006178E+09 s t = 3.9999303173857669E+02 yr dWallClock = 0.17 s
[MC] Writing output 4 time = 3.9999303173857669E+02 yr = 1.2623040001604942E+10 s
Info @ 0: Timesteps: 2.0416940798311732E+038 50.000000000000000
Info @ 0: Timesteps: 50.000000000000000 50.000000000000000
[MC] nstep = 5 dt = 1.5778800002006178E+09 s t = 4.9999128967322088E+02 yr dWallClock = 0.18 s
[MC] Writing output 5 time = 4.9999128967322088E+02 yr = 1.5778800002006178E+10 s
Info @ 0: Timesteps: 2.0416940798311732E+038 50.000000000000000
Info @ 0: Timesteps: 50.000000000000000 50.000000000000000
[MC] nstep = 6 dt = 1.5778800002006178E+09 s t = 5.9998954760786501E+02 yr dWallClock = 0.77 s
[MC] Writing output 6 time = 5.9998954760786501E+02 yr = 1.8934560002407413E+10 s
Info @ 0: Simulation has reached final time t = 600.000
Finishing ..........
(12:39:08 mhd) Program finished.
(12:39:08 mhd) Command [./piernik] finished.
[12:39:08 muscle] mhd: finished
[12:39:08 muscle] All ID's have finished, quitting MUSCLE now.
[12:39:08 muscle] All local submodels have finished; exiting.
Executed in
}}}

=== Adding a second kernel: MC ===

First we need to create a separate PIERNIK build for the MC kernel:
{{{
./setup -o mc mc_collisions_test -c gnufast -d HDF5,MUSCLE,MC_KERNEL
}}}
Please note that we use `-o mc` (a suffix for the object directory) and `MC_KERNEL` instead of `MHD_KERNEL`. This will create another build in `./obj_mc/piernik`. Now we are ready to add another kernel definition in the CxA file:
{{{
cxa.add_kernel('mc', 'muscle.core.standalone.NativeKernel')
cxa.env["mc:command"] = "./piernik" # here we assume that this kernel is started in obj_mc
cxa.env["mc:dt"] = 1
}}}
Because each kernel has to run in a separate directory, we need to start two MUSCLE instances:
{{{
cd obj
time muscle2 --main --bindaddr 127.0.0.1 --bindport 1234 --cxa ../scripts/piernik.cxa.rb mhd &
cd ..

cd obj_mc
time muscle2 --manager 127.0.0.1:1234 --cxa ../scripts/piernik.cxa.rb mc &
wait
}}}
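For reference, at this stage the whole CxA file, assembled from the fragments shown so far, looks roughly as follows (the connection scheme is still empty; conduits are added in the next step):
{{{
# configure cxa properties
cxa = Cxa.LAST

# declare kernels and their params
cxa.add_kernel('mhd', 'muscle.core.standalone.NativeKernel')
cxa.env["mhd:command"] = "./piernik"
cxa.env["mhd:dt"] = 1

cxa.add_kernel('mc', 'muscle.core.standalone.NativeKernel')
cxa.env["mc:command"] = "./piernik" # started from obj_mc
cxa.env["mc:dt"] = 1

# global params
cxa.env["max_timesteps"] = 4
cxa.env["cxa_path"] = File.dirname(__FILE__)

# configure connection scheme (no conduits yet)
cs = cxa.cs
}}}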
=== Coupling ===

We now have both kernels running under MUSCLE, but to obtain scientifically relevant results we must complete the coupling: exchanging the gas density and momentum for each cell. First we have to define conduits in the CxA file:
{{{
cs.attach('mhd' => 'mc') {
  tie('rho_gas', 'rho_gas')
  tie('m_gas', 'm_gas')
}
}}}
In the original code the rho_gas and m_gas multidimensional arrays were recomputed before every MC step; in the MUSCLE variant we replace that code with MUSCLE_Receive calls:
{{{
call MUSCLE_Receive("rho_gas", rho_gas, size(rho_gas), MUSCLE_DOUBLE) ! here we assume that real is always real*8
call MUSCLE_Receive("m_gas", m_gas, size(m_gas), MUSCLE_DOUBLE)       ! here we assume that real is always real*8
}}}
As mentioned before, the arrays are multidimensional (4- and 3-dimensional, to be precise), but since both kernels are Fortran codes and Fortran multidimensional arrays have a contiguous memory layout, we can safely pass them as one-dimensional buffers. Now we need to add the corresponding `MUSCLE_Send` calls to the MHD kernel, after the gas density and momentum values are recomputed:
{{{
call MUSCLE_Send("rho_gas", rho_gas, size(rho_gas), MUSCLE_DOUBLE) ! here we assume that real is always real*8
call MUSCLE_Send("m_gas", m_gas, size(m_gas), MUSCLE_DOUBLE)       ! here we assume that real is always real*8
}}}

=== Synchronizing delta timesteps ===

Until now we have silently omitted one important aspect of the PIERNIK simulations: the delta timestep is not constant and depends on the current state of both modules, MHD and MC. In the original code the minimum of the dt,,mc,, and dt,,mhd,, values was always chosen. In order to do the same in the MUSCLE flavour, we must define two new conduits: one for sending dt,,mc,, from MC to MHD, and a second one for sending back the final dt. The conduit definitions now look as follows:
{{{
cs.attach('mhd' => 'mc') {
  tie('rho_gas', 'rho_gas')
  tie('m_gas', 'm_gas')
  tie('dt_final', 'dt_final')
}

cs.attach('mc' => 'mhd') {
  tie('dt_mc', 'dt_mc')
}
}}}
And the corresponding code:
{{{
#ifdef MHD_KERNEL
   call time_step(dt)
   call MUSCLE_Send("dt_final", dt, dt_len, MUSCLE_DOUBLE)
#else /*MC_KERNEL*/
   dt = timestep_mc()
   call MUSCLE_Send("dt_mc", dt, dt_len, MUSCLE_DOUBLE)
   call MUSCLE_Receive("dt_final", dt, dt_len, MUSCLE_DOUBLE)
#endif
}}}

=== MPI enabling ===

As mentioned, PIERNIK is an MPI code, and there is one MUSCLE caveat that must be taken into account while MUSCLEizing legacy applications: the MUSCLE send and receive routines must be called **only** by the rank-zero process. Until now we spawned only one process per kernel, so no extra effort was needed. Now we will add some additional logic exploiting MPI_Bcast, MPI_Gather and MPI_Scatter calls.
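Schematically, the pattern looks like this; it is only a minimal sketch, using the dt exchange on the MHD side as an example, and the variable names mirror those used in the snippets below but are illustrative rather than PIERNIK's actual symbols:
{{{
! Sketch only: only rank 0 talks to MUSCLE, then the result is shared with all ranks.
if (master) then
   ! rank 0 receives dt_mc from the MC kernel via MUSCLE
   call MUSCLE_Receive("dt_mc", dt_mc, dt_mc_len, MUSCLE_DOUBLE)
   ! the original rule: choose the smaller of dt_mhd and dt_mc
   dt = min(dt, dt_mc)
endif
! make the final dt known to all other ranks of this kernel
call MPI_Bcast(dt, 1, MPI_DOUBLE_PRECISION, 0, comm, ierr)
}}}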
We need to do this wherever MUSCLE send or receive operations are used:

 * timestep.F90, time_step routine:
{{{
if (master) then
   ts = set_timer("dt_mc_receive")
   call MUSCLE_Receive("dt_mc", dt_mc, dt_mc_len, MUSCLE_DOUBLE)
   ts = set_timer("dt_mc_receive")
   write (*,"('[MHD] Waiting for dt_mc took: ', f8.3, ' s dt = ', f8.4, ' dt_mc = ', f8.4)") ts, dt, dt_mc
endif

! broadcast dt_mc value
call MPI_Bcast(dt_mc, 1, MPI_DOUBLE_PRECISION, 0, comm, ierr)
}}}
 * piernik.F90, main routine:
{{{
#ifdef MHD_KERNEL
   call time_step(dt)
   if (master) then
      call MUSCLE_Send("dt_final", dt, dt_len, MUSCLE_DOUBLE)
   endif
#else /*MC_KERNEL*/
   dt = timestep_mc()
   if (master) then
      write (*,"('[MC] sending dt_mc:', es23.16 )") dt
      call MUSCLE_Send("dt_mc", dt, dt_len, MUSCLE_DOUBLE)
      call MUSCLE_Receive("dt_final", dt, dt_len, MUSCLE_DOUBLE)
   endif
   ! broadcast the final dt value
   call MPI_Bcast(dt, 1, MPI_DOUBLE_PRECISION, 0, comm, ierr)
#endif
}}}
 * piernik.F90, send_gas_state routine:
{{{
call MPI_Gather(rho_gas, size(rho_gas), MPI_DOUBLE_PRECISION, rho_gas_global, size(rho_gas), MPI_DOUBLE_PRECISION, 0, comm, ierr)
call MPI_Gather(m_gas, size(m_gas), MPI_DOUBLE_PRECISION, m_gas_global, size(m_gas), MPI_DOUBLE_PRECISION, 0, comm, ierr)

if (master) then
   call MUSCLE_Send("rho_gas", rho_gas_global, %REF(size(rho_gas_global)), MUSCLE_DOUBLE) ! here we assume that real is always real*8
   call MUSCLE_Send("m_gas", m_gas_global, %REF(size(m_gas_global)), MUSCLE_DOUBLE)       ! here we assume that real is always real*8
endif
}}}
 * mc.F90, get_gas_state routine:
{{{
if (master) then
   call MUSCLE_Receive("rho_gas", rho_gas_global, %REF(size(rho_gas_global)), MUSCLE_DOUBLE) ! here we assume that real is always real*8
   call MUSCLE_Receive("m_gas", m_gas_global, %REF(size(m_gas_global)), MUSCLE_DOUBLE)       ! here we assume that real is always real*8
endif

call MPI_Scatter(rho_gas_global, size(rho_gas), MPI_DOUBLE_PRECISION, rho_gas, size(rho_gas), MPI_DOUBLE_PRECISION, 0, comm, ierr)
call MPI_Scatter(m_gas_global, size(m_gas), MPI_DOUBLE_PRECISION, m_gas, size(m_gas), MPI_DOUBLE_PRECISION, 0, comm, ierr)
}}}

We also need to change the kernel implementations in the CxA file:
{{{
cxa.add_kernel('mhd', 'muscle.core.standalone.MPIKernel')
...
cxa.add_kernel('mc', 'muscle.core.standalone.MPIKernel')
}}}

\\
[[Documentation|<< Back to Documentation]]