Coupling MPI codes using MUSCLE

Example Application

As a "Hello World" example of coupling MPI codes via MUSCLE, we use an extremely simplistic and naive simulation of the Large Hadron Collider (LHC) experiment ;-). The application models only two accelerator rings:

  • Proton Synchrotron Booster (PSB) - the "small one",
  • Large Hadron Collider (LHC) - the "big one".

Each accelerator is modeled as a separate submodel (MUSCLE kernel) and is implemented using the "MPI Ring" code. In our quasi-simulation:

  • a single proton is inserted (at an energy of PSB:InitialEnergy) into the PSB,
  • where it is accelerated (by PSB:DeltaEnergy) whenever it passes a ring node,
  • until it reaches the energy of PSB:MaxEnergy,
  • then the proton is transmitted from the PSB into the LHC,
  • where it is accelerated further (by LHC:DeltaEnergy per ring node) until its energy reaches LHC:MaxEnergy,
  • at which point the simulation stops.
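
The steps above can be sketched in plain C, without MPI or MUSCLE. This is only an illustration: ring_accelerate, RingResult and lhc_naive_simulation are hypothetical names that do not come from mpiringlib.c, the ring is walked inside a single process, and the stop condition (energy below the maximum) is an assumption inferred from the logs shown later on this page. The numeric values are those of the LHC.cxa file.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical type for this sketch; it is not taken from mpiringlib.c. */
typedef struct {
    double energy; /* proton energy when it leaves the ring */
    int    passes; /* number of ring-node passes performed  */
} RingResult;

/*
 * Circulate a proton around a ring of `nodes` nodes. Every node pass adds
 * `delta_e`; circulation stops as soon as the energy is no longer below
 * `max_e` (assumed stop condition).
 */
RingResult ring_accelerate(const char *name, int nodes,
                           double energy, double delta_e, double max_e)
{
    RingResult r = { energy, 0 };

    /* side-effect-free precondition check */
    assert(nodes > 0 && delta_e > 0.0);

    while (r.energy < max_e) {
        int node = r.passes % nodes; /* node the proton is passing now */
        r.energy += delta_e;
        r.passes++;
        printf("%s:Energy in loop %d (node %d): %f\n",
               name, r.passes, node, r.energy);
    }
    return r;
}

/* Run the PSB -> LHC chain with the values from LHC.cxa and two nodes per ring. */
RingResult lhc_naive_simulation(void)
{
    RingResult psb = ring_accelerate("PSB", 2, 1.2, 0.1, 4.0);

    /* the proton leaves the PSB and is injected into the LHC */
    return ring_accelerate("LHC", 2, psb.energy, 0.2, 12.0);
}
```

With IEEE 754 double arithmetic this takes 28 passes in the PSB and 41 in the LHC, which matches the logs on this page. The LHC ends at about 12.2 rather than 12.0 because the accumulated energy after 40 passes (11.999999999999991 in the log) is still marginally below LHC:MaxEnergy, so one more pass is made.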

LHC Naive Simulation

The example code can be found in the src/cpp/examples/mpiring/ directory of the MUSCLE source distribution. The next two sections describe two different approaches to running such a coupled simulation via MUSCLE.

MPI Kernels as dynamic libraries

This approach follows the original MUSCLE philosophy, which relies on the Java Native Interface (or the more user-friendly Java Native Access) mechanism to integrate C/C++ codes as MUSCLE kernels.

To support MPI applications, a new method, public void executeDirectly(), was introduced in the CaController class. On processes with non-zero rank, only this method is called instead of the normal MUSCLE routines. The process with rank 0 is started in the usual way. Portals cannot be attached to slave processes (i.e. to the processes with non-zero rank). The default implementation of executeDirectly simply calls execute().

Source files

  • LHC.cxa - the Complex Automata simulation file
    cxa.env["PSB:InitialEnergy"] = 1.2
    cxa.env["PSB:DeltaEnergy"] = 0.1
    cxa.env["PSB:MaxEnergy"] = 4.0
    
    cxa.env["LHC:DeltaEnergy"] = 0.2
    cxa.env["LHC:MaxEnergy"] = 12.0
    
    # declare kernels
    cxa.add_kernel('LHC', 'examples.mpiring.LHC')
    cxa.add_kernel('PSB', 'examples.mpiring.PSB')
    
    # configure connection scheme
    cs = cxa.cs
    
    cs.attach('PSB' => 'LHC') {
            tie('pipe', 'pipe')
    }
    
  • LHC.java - a Java wrapper kernel for LHC submodel
  • PSB.java - a Java wrapper kernel for PSB submodel
  • mpiringlib.c - compiled into the libmpiring dynamically loadable library

Running

  • plumber
    $ muscle --cxa_file share/muscle/cxa/LHC.cxa.rb --main plumber --autoquit
    ...
    INFO: Listening for intra-platform commands on address:
    jicp://150.254.149.108:1099
    
  • LHC (note the --mpi switch)
    $ mpiexec -np 2 muscle --mpi --cxa_file share/muscle/cxa/LHC.cxa.rb --mainhost 150.254.149.108 --mainport 1099 LHC
    ...
    Initialized 0 node in ring LHC.
    Initialized 1 node in ring LHC.
    
    LHC: Received proton from PSB: 4.000000000000002
    LHC: Proton energy callback: 4.200000000000002
    LHC: Proton energy callback: 4.400000000000002
    LHC: Proton energy callback: 4.600000000000002
    LHC: Proton energy callback: 4.8000000000000025
    LHC: Proton energy callback: 5.000000000000003
    LHC: Proton energy callback: 5.200000000000003
    LHC: Proton energy callback: 5.400000000000003
    LHC: Proton energy callback: 5.600000000000003
    LHC: Proton energy callback: 5.800000000000003
    LHC: Proton energy callback: 6.0000000000000036
    LHC: Proton energy callback: 6.200000000000004
    LHC: Proton energy callback: 6.400000000000004
    LHC: Proton energy callback: 6.600000000000004
    LHC: Proton energy callback: 6.800000000000004
    LHC: Proton energy callback: 7.000000000000004
    LHC: Proton energy callback: 7.200000000000005
    LHC: Proton energy callback: 7.400000000000005
    LHC: Proton energy callback: 7.600000000000005
    LHC: Proton energy callback: 7.800000000000005
    LHC: Proton energy callback: 8.000000000000005
    LHC: Proton energy callback: 8.200000000000005
    LHC: Proton energy callback: 8.400000000000004
    LHC: Proton energy callback: 8.600000000000003
    LHC: Proton energy callback: 8.800000000000002
    LHC: Proton energy callback: 9.000000000000002
    LHC: Proton energy callback: 9.200000000000001
    LHC: Proton energy callback: 9.4
    LHC: Proton energy callback: 9.6
    LHC: Proton energy callback: 9.799999999999999
    LHC: Proton energy callback: 9.999999999999998
    LHC: Proton energy callback: 10.199999999999998
    LHC: Proton energy callback: 10.399999999999997
    LHC: Proton energy callback: 10.599999999999996
    LHC: Proton energy callback: 10.799999999999995
    LHC: Proton energy callback: 10.999999999999995
    LHC: Proton energy callback: 11.199999999999994
    LHC: Proton energy callback: 11.399999999999993
    LHC: Proton energy callback: 11.599999999999993
    LHC: Proton energy callback: 11.799999999999992
    LHC: Proton energy callback: 11.999999999999991
    LHC: Proton energy callback: 12.19999999999999
    LHC: Final energy: 12.19999999999999
    
  • PSB (note the --mpi switch)
    $ mpiexec -np 2 muscle --jvmflags -d32 --mpi --cxa_file share/muscle/cxa/LHC.cxa.rb --mainhost 150.254.149.108 --mainport 1099 PSB
    Initialized 0 node in ring PSB.
    Initialized 1 node in ring PSB.
    Inserting proton into Proton Synchrotron Booster (PSB). Initial energy: 1.2
    PSB: Proton energy callback: 1.3
    PSB: Proton energy callback: 1.4000000000000001
    PSB: Proton energy callback: 1.5000000000000002
    PSB: Proton energy callback: 1.6000000000000003
    PSB: Proton energy callback: 1.7000000000000004
    PSB: Proton energy callback: 1.8000000000000005
    PSB: Proton energy callback: 1.9000000000000006
    PSB: Proton energy callback: 2.0000000000000004
    PSB: Proton energy callback: 2.1000000000000005
    PSB: Proton energy callback: 2.2000000000000006
    PSB: Proton energy callback: 2.3000000000000007
    PSB: Proton energy callback: 2.400000000000001
    PSB: Proton energy callback: 2.500000000000001
    PSB: Proton energy callback: 2.600000000000001
    PSB: Proton energy callback: 2.700000000000001
    PSB: Proton energy callback: 2.800000000000001
    PSB: Proton energy callback: 2.9000000000000012
    PSB: Proton energy callback: 3.0000000000000013
    PSB: Proton energy callback: 3.1000000000000014
    PSB: Proton energy callback: 3.2000000000000015
    PSB: Proton energy callback: 3.3000000000000016
    PSB: Proton energy callback: 3.4000000000000017
    PSB: Proton energy callback: 3.5000000000000018
    PSB: Proton energy callback: 3.600000000000002
    PSB: Proton energy callback: 3.700000000000002
    PSB: Proton energy callback: 3.800000000000002
    PSB: Proton energy callback: 3.900000000000002
    PSB: Proton energy callback: 4.000000000000002
    Proton energy after PSB:  4.000000000000002. Injecting into LHC.
    ...
    

Limitations

  • MUSCLE API routines may be called ONLY by the rank 0 process. If a parameter must be available to all MPI processes, distribute it with the MPI_Bcast function, e.g. (as in the provided example):
    void Ring_Broadcast_Params(double *deltaE, double *maxE)
    {
            assert( MPI_Bcast(deltaE, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD) == MPI_SUCCESS);
            assert( MPI_Bcast(maxE, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD) == MPI_SUCCESS);
    }
    
  • A separate Java Virtual Machine is started for every MPI process, which significantly increases the memory footprint of the whole application.
  • An MPI kernel must be the sole kernel of its MUSCLE instance (for this reason, in our example we had to start the plumber kernel in a separate MUSCLE instance).
  • Many MPI implementations exploit low-level optimization techniques (such as Direct Memory Access) that may crash the Java Virtual Machine.
  • Using MPI to start many Java Virtual Machines, each of which loads a native dynamic library that later calls MPI routines, is something few people do. In case of problems you might not find any help (you have been warned! ;-).
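
One caveat about the Ring_Broadcast_Params snippet in the first item: in standard C, assert() expands to nothing when NDEBUG is defined, so a call placed inside it, such as MPI_Bcast, would not be executed at all in such a build. A minimal sketch of a check that always evaluates the call could look as follows. CHECK_CALL, fake_bcast, fake_calls and broadcast_params_checked are hypothetical names used only for illustration; fake_bcast merely stands in for MPI_Bcast so that the sketch compiles without an MPI installation.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of MUSCLE or the mpiring example): evaluate
 * `call` exactly once, even when NDEBUG is defined, and abort if the result
 * differs from `expected`.
 */
#define CHECK_CALL(call, expected)                                  \
    do {                                                            \
        int check_rc_ = (call);                                     \
        if (check_rc_ != (expected)) {                              \
            fprintf(stderr, "%s:%d: %s returned %d\n",              \
                    __FILE__, __LINE__, #call, check_rc_);          \
            abort();                                                \
        }                                                           \
    } while (0)

/* Stand-in for a collective such as MPI_Bcast; it only counts its calls. */
int fake_calls = 0;

int fake_bcast(double *value)
{
    (void)value;
    fake_calls++;
    return 0; /* plays the role of MPI_SUCCESS */
}

/* Same shape as Ring_Broadcast_Params, but the calls survive -DNDEBUG. */
void broadcast_params_checked(double *deltaE, double *maxE)
{
    /* assert() is fine here: the condition has no side effects */
    assert(deltaE != NULL && maxE != NULL);

    CHECK_CALL(fake_bcast(deltaE), 0);
    CHECK_CALL(fake_bcast(maxE), 0);
}
```

In real MPI code the same pattern would wrap the actual call, for example CHECK_CALL(MPI_Bcast(deltaE, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD), MPI_SUCCESS).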

MPI Kernels as standalone executables

The other approach is to run the MPI code as a separate process. MUSCLE provides two base kernel classes, NativeKernel and MPIKernel, that can be used to run application code as a separate process. The process can communicate with the library (and other kernels) via a new C/C++ MUSCLE API:

    ...
    muscle::env::init(&argc, &argv);

    cout << "c++: begin " << argv[0] << endl;
    cout << "Kernel Name: " << muscle::cxa::kernel_name() << endl;

    for (int time = 0; !muscle::env::will_stop(); time++) {

        // process data
        for (int i = 0; i < 5; i++) {
            dataA[i] = i;
        }

        // dump to our portals
        muscle::env::send("data", dataA, 5, MUSCLE_DOUBLE);
    }

    muscle::env::finalize();
    ...

The advantage of this approach is that the Java and C/C++ code run in separate processes.

Sources

  • LHC2.cxa - the Complex Automata simulation file
    cxa.env["PSB:InitialEnergy"] = 1.2
    cxa.env["PSB:DeltaEnergy"] = 0.1
    cxa.env["PSB:MaxEnergy"] = 4.0
    cxa.env["PSB:command"] = "PSB"
    cxa.env["PSB:mpiexec_args"] = "-np 2"
    
    cxa.env["LHC:DeltaEnergy"] = 0.2
    cxa.env["LHC:MaxEnergy"] = 12.0
    cxa.env["LHC:command"] = "LHC"
    cxa.env["LHC:mpiexec_args"] = "-np 2"
    
    # declare kernels
    cxa.add_kernel('LHC', 'examples.mpiring.LHC2')
    cxa.add_kernel('PSB', 'examples.mpiring.PSB2')
    
    # configure connection scheme
    cs = cxa.cs
    
    cs.attach('PSB' => 'LHC') {
            tie('pipe', 'pipe')
    }
    
  • LHC2.java - a Java wrapper kernel for LHC submodel (extends MPIKernel class)
  • PSB2.java - a Java wrapper kernel for PSB submodel (extends MPIKernel class)
  • mpiringlib.c, LHC.c - compiled into the LHC executable
  • mpiringlib.c, PSB.c - compiled into the PSB executable

Running

  • LHC, plumber:
    $ muscle --cxa_file share/muscle/cxa/LHC2.cxa.rb --main plumber LHC --autoquit
    - jicp://150.254.149.108:1099
    
    Initialized 0 node in ring LHC.
    Initialized 1 node in ring LHC.
    
    LHC:Received proton energy: 4.000000
    LHC:Energy in loop 1: 4.200000
    LHC:Energy in loop 2: 4.400000
    LHC:Energy in loop 3: 4.600000
    LHC:Energy in loop 4: 4.800000
    LHC:Energy in loop 5: 5.000000
    LHC:Energy in loop 6: 5.200000
    LHC:Energy in loop 7: 5.400000
    LHC:Energy in loop 8: 5.600000
    LHC:Energy in loop 9: 5.800000
    LHC:Energy in loop 10: 6.000000
    LHC:Energy in loop 11: 6.200000
    LHC:Energy in loop 12: 6.400000
    LHC:Energy in loop 13: 6.600000
    LHC:Energy in loop 14: 6.800000
    LHC:Energy in loop 15: 7.000000
    LHC:Energy in loop 16: 7.200000
    LHC:Energy in loop 17: 7.400000
    LHC:Energy in loop 18: 7.600000
    LHC:Energy in loop 19: 7.800000
    LHC:Energy in loop 20: 8.000000
    LHC:Energy in loop 21: 8.200000
    LHC:Energy in loop 22: 8.400000
    LHC:Energy in loop 23: 8.600000
    LHC:Energy in loop 24: 8.800000
    LHC:Energy in loop 25: 9.000000
    LHC:Energy in loop 26: 9.200000
    LHC:Energy in loop 27: 9.400000
    LHC:Energy in loop 28: 9.600000
    LHC:Energy in loop 29: 9.800000
    LHC:Energy in loop 30: 10.000000
    LHC:Energy in loop 31: 10.200000
    LHC:Energy in loop 32: 10.400000
    LHC:Energy in loop 33: 10.600000
    LHC:Energy in loop 34: 10.800000
    LHC:Energy in loop 35: 11.000000
    LHC:Energy in loop 36: 11.200000
    LHC:Energy in loop 37: 11.400000
    LHC:Energy in loop 38: 11.600000
    LHC:Energy in loop 39: 11.800000
    LHC:Energy in loop 40: 12.000000
    LHC:Energy in loop 41: 12.200000
    LHC:Final proton energy: 12.200000
    
  • PSB
    $ muscle --cxa_file share/muscle/cxa/LHC2.cxa.rb --mainhost 150.254.149.108 --mainport 1099 PSB
    ...
    Initialized 0 node in ring PSB.
    Initialized 1 node in ring PSB.
    PSB: Inserting proton into Proton Synchrotron Booster (PSB). Initial energy: 1.200000
    PSB:Energy in loop 1: 1.300000
    PSB:Energy in loop 2: 1.400000
    PSB:Energy in loop 3: 1.500000
    PSB:Energy in loop 4: 1.600000
    PSB:Energy in loop 5: 1.700000
    PSB:Energy in loop 6: 1.800000
    PSB:Energy in loop 7: 1.900000
    PSB:Energy in loop 8: 2.000000
    PSB:Energy in loop 9: 2.100000
    PSB:Energy in loop 10: 2.200000
    PSB:Energy in loop 11: 2.300000
    PSB:Energy in loop 12: 2.400000
    PSB:Energy in loop 13: 2.500000
    PSB:Energy in loop 14: 2.600000
    PSB:Energy in loop 15: 2.700000
    PSB:Energy in loop 16: 2.800000
    PSB:Energy in loop 17: 2.900000
    PSB:Energy in loop 18: 3.000000
    PSB:Energy in loop 19: 3.100000
    PSB:Energy in loop 20: 3.200000
    PSB:Energy in loop 21: 3.300000
    PSB:Energy in loop 22: 3.400000
    PSB:Energy in loop 23: 3.500000
    PSB:Energy in loop 24: 3.600000
    PSB:Energy in loop 25: 3.700000
    PSB:Energy in loop 26: 3.800000
    PSB:Energy in loop 27: 3.900000
    PSB:Energy in loop 28: 4.000000
    Proton energy after PSB:  4.000000. Injecting into LHC.
    

Limitations

  • MUSCLE API routines may be called ONLY by the rank 0 process. If a parameter must be available to all MPI processes, distribute it with the MPI_Bcast function, e.g. (as in the provided example):
    void Ring_Broadcast_Params(double *deltaE, double *maxE)
    {
            assert( MPI_Bcast(deltaE, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD) == MPI_SUCCESS);
            assert( MPI_Bcast(maxE, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD) == MPI_SUCCESS);
    }
    
