===================
local installation
===================

bigloop uses two root directories:

- BIGLROOT : directory used by the manager.
- BIGLDIST : target directory on the remote clients.

Both are given relative to the home directories. For example, my .bashrc file
contains the lines:

export BIGLROOT=CC/bigloop-0.2
export BIGLDIST=bigloop-0.2

I assume you have defined these variables in your .bashrc file by appending
the two appropriate lines:

export BIGLROOT=[your choice]/bigloop-0.2
export BIGLDIST=[your choice]/bigloop-0.2

Now, we install the manager.

$ cd $HOME/$BIGLROOT/..
$ wget http://langevin.univ-tln.fr/bigloop/bigloop-0.2.tar
$ tar xvf bigloop-0.2.tar
$ cd bigloop-0.2

We precompile some objects:

$ make

I assume you have a $HOME/bin directory.

$ make biglman

This compiles the bigloop manager biglman and puts a copy of the binary file
in your $HOME/bin.

$ make script

This copies useful scripts into your $HOME/bin.

At this point, the manager is installed. The main commands (in $HOME/bin)
that you will have to use are:

biglman      : the bigloop manager
biglinst.sh  : client installer
biglcmd.sh   : starts commands on the clients
biglstat.sh  : stats the clients
bigljoin.sh  : invites hosts to the loop
biglgraph.sh : makes nice pictures

===================
testing in local
===================

We first test in local mode:

$ cd $HOME/$BIGLROOT/test
$ make clean
$ make

The cluster is almost empty:

$ cat grappe.grp
#local
127.0.0.1 langevin 2 std

Look at the client source src/client.c and at bigloop.conf. Change the
address of the manager, taking one of the IPv4 addresses listed by:

$ ifconfig | grep -A 2 eth

Delete the old logs:

$ rm log/*

Open a new terminal:

$ xterm &

Start the manager in this terminal:

% biglman

and the client:

$ ./bigl.exe

You should get:

--------------------------------
using server : 10.2.73.86-31415
identificator: 1
loop from 1 to 16 by 8 (1 steps)
usually it takes a while...
RDY 1:0 0 0 0 : 10.2.73.86:53154
PID 1:0 0 0 0 : 10.2.73.86:53154
step=0/1 actifs : 1
GET 1:0 0 0 0 : 10.2.73.86:20364
>>> Sun Mar 18 17:15:01 2012 job started on 10.2.73.86
JOB 1:0 1 9 0 : 10.2.73.86:20364
step=1/1 actifs : 1
END 1:0 1 9 0 : 10.2.73.86:5018
8 sec on 10.2.73.86
step=1/1 actifs : 1
GET 1:0 0 0 0 : 10.2.73.86:38084
>>> Sun Mar 18 17:15:09 2012 job started on 10.2.73.86
JOB 1:0 9 16 0 : 10.2.73.86:38084
step=2/1 actifs : 1
END 1:0 9 16 0 : 10.2.73.86:19624
7 sec on 10.2.73.86
step=2/1 actifs : 1
GET 1:0 0 0 0 : 10.2.73.86:7632
STP 1:0 0 0 0 : 10.2.73.86:7632
step=2/1 actifs : 0
waiting for zombies...
eoj
-----------------------------------

The report is:

$ cat log/output*

==================
Another trial
==================

% rm log/*
% biglman

Now we start two clients:

$ ./bigl.exe &
$ ./bigl.exe &

When they have finished, compare the outputs.

===================
A crash test
===================

We are still in the test directory. If you look carefully at src/client.c,
you will see that a crash can occur with, of course, the value 13.

% rm log/*
% biglman -b1

The -b option initializes the score value at 1.

$ ./bigl.exe

After a few seconds, bigl.exe terminates but biglman does not: a crash
occurred somewhere. Stop biglman:

$ killall biglman

Restart it:

% biglman

Look at the terminal: on restart, biglman detects the lost job.

$ ./bigl.exe

$ biglman
using server : 10.2.73.86-31415
identificator: 1
loop from 1 to 16 by 8 (1 steps)
usually it takes a while...
looking for lost jobs
>>> Sun Mar 18 17:26:28 2012 step 9 started on 10.2.73.86 is lost
1 jobs were lost
loop restart after 17
lost job 9 selected
RDY 1:0 0 0 0 : 10.2.73.86:2283
PID 1:0 0 0 0 : 10.2.73.86:2283
step=0/1 actifs : 1
GET 1:0 0 0 0 : 10.2.73.86:21712
>>> Sun Mar 18 17:29:46 2012 job started on 10.2.73.86
JOB 1:0 9 16 0 : 10.2.73.86:21712
step=1/1 actifs : 1
END 1:0 9 16 0 : 10.2.73.86:55709
7 sec on 10.2.73.86
step=1/1 actifs : 1
GET 1:0 0 0 0 : 10.2.73.86:31133
STP 1:0 0 0 0 : 10.2.73.86:31133
step=1/1 actifs : 0
waiting for zombies...

===================
global installation
===================

Now you need a cluster (grappe) to fill in the hosts.grp file. A typical
line of a grappe file specifies how you log in to a host or to a range of
hosts.

#single host
10.2.73.58 pavle 6 std
#multi host
10.9.185.133-148 ginn 4 std

(1) This means your ssh keys allow you to log in by ssh on 10.2.73.58 using
    the login pavle, and on all the hosts in the range 10.9.185.133-148 as
    ginn.

(2) Note that bigloop uses ssh connections, so you have to install the keys
    corresponding to each login name.

IMPORTANT: declaring a host range implicitly means that the hosts share the
same account, using nfs for example.

Each time you modify hosts.grp:

$ make update

This installs the biglclient.o object file on the hosts listed in the file
hosts.grp. Later remote compilations will require biglclient.o.

At this point, you can check that everything works with the scripts:

$ biglcmd.sh
$ biglcmd.sh -c "grep ENDIAN /usr/include/bits/endian.h"

REMARK: my code assumes little-endianness :-) This bug is easy to fix; it is
left as an exercise.

====================
playing the demo
====================

Okay, I assume that the above trials succeeded. The directory demo contains
an example. See

http://langevin.univ-tln.fr/bigloop-0.2/termshot.html

$ cd demo

Compile the client:

$ make

After changing the grappe file grappe.grp, it could be a good idea to change
the value (Last) in the bigloop.conf file (20 for example).
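Before installing the clients, it can help to see which addresses a grappe
file actually names. The sketch below is not part of bigloop: expand_hosts
and list_hosts are hypothetical helper names, and it assumes that the first
field of a line is either a single IPv4 address or a range written on the
last octet (as in 10.9.185.133-148), with both bounds included. Lines
starting with # and empty lines are skipped.

```shell
#!/bin/sh
# Hypothetical helpers, NOT shipped with bigloop.
# Assumption: a range "a.b.c.x-y" covers last octets x..y inclusive.

# Print one address per line for a single host field.
expand_hosts() {
    spec=$1
    case $spec in
        *-*)
            prefix=${spec%.*}     # e.g. 10.9.185
            range=${spec##*.}     # e.g. 133-148
            first=${range%-*}
            last=${range#*-}
            i=$first
            while [ "$i" -le "$last" ]; do
                echo "$prefix.$i"
                i=$((i + 1))
            done
            ;;
        *)
            echo "$spec"
            ;;
    esac
}

# Print every address declared in the grappe file given as $1.
list_hosts() {
    while read -r hosts login rest; do
        case $hosts in
            ''|'#'*) continue ;;  # skip blank lines and comments
        esac
        expand_hosts "$hosts"
    done < "$1"
}
```

For instance, after sourcing the file, list_hosts grappe.grp prints the
addresses one per line, which can then be compared with the hosts you
expect to reach by ssh.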
Prepare the clients:

$ biglinst.sh

Delete the residual log files:

$ rm log/*

Ready?

$ nohup biglman &
$ bigljoin.sh

and wait...
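If nothing happens, one thing worth ruling out is an incomplete setup on the
manager side. The following sketch is not part of bigloop: bigl_check is a
hypothetical name, and it only tests what this document mentions, namely
that BIGLROOT and BIGLDIST are set, that $HOME/bin exists, and that the
commands installed by "make biglman" and "make script" are found on the
PATH. It returns 0 when everything is found and 1 otherwise.

```shell
#!/bin/sh
# Hypothetical pre-flight check, NOT shipped with bigloop.
bigl_check() {
    status=0

    # The two root directories must be defined (see local installation).
    for var in BIGLROOT BIGLDIST; do
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "missing: $var is not set" >&2
            status=1
        fi
    done

    # The binaries and scripts are copied into $HOME/bin.
    if [ ! -d "$HOME/bin" ]; then
        echo "missing: $HOME/bin does not exist" >&2
        status=1
    fi

    # The main commands listed in this document must be reachable.
    for cmd in biglman biglinst.sh biglcmd.sh biglstat.sh \
               bigljoin.sh biglgraph.sh; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd is not on PATH" >&2
            status=1
        fi
    done

    return "$status"
}
```

A possible use is bigl_check && nohup biglman &, so that the manager is only
started when the check passes.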