Saturday, June 14, 2025

MPI Clustering on Debian

This guide outlines the steps to set up an MPI (Message Passing Interface) cluster on Debian and Red Hat-based systems. MPI is a standard for parallel computing that lets processes on multiple nodes exchange messages and work together on a computational task.

1. Required Packages

For Debian

Install the following packages:

sudo apt install hwloc libhwloc-dev libevent-dev libpmix-dev libpmix-bin nfs-common ssh-client

For Red Hat

Install the corresponding packages:

sudo dnf install hwloc hwloc-devel hwloc-libs libevent-devel pmix-devel pmix-tools pmix nfs-utils openssh-clients

2. Install Open MPI

For Debian

Install Open MPI with the following command (add libopenmpi-dev to the list if you also need the mpicc compiler wrapper):

sudo apt install openmpi-bin openmpi-common

For Red Hat

Install the Open MPI packages:

sudo dnf install openmpi python3-openmpi openmpi-devel
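The two install paths above can be folded into one script that picks the package manager from /etc/os-release. The following is a minimal sketch, not part of the original setup: the function name pick_pkg_manager is hypothetical, and it assumes that the ID and ID_LIKE fields are enough to tell Debian-based from Red Hat-based systems.

```shell
#!/bin/sh
# pick_pkg_manager [FILE]: print "apt" or "dnf" based on an os-release file.
# The function name is illustrative; FILE defaults to /etc/os-release.
pick_pkg_manager() {
    os_release=${1:-/etc/os-release}
    # Run in a subshell so ID and ID_LIKE do not leak into the caller.
    (
        ID='' ID_LIKE=''
        . "$os_release" || exit 1
        case " $ID $ID_LIKE " in
            *" debian "*|*" ubuntu "*) echo apt ;;
            *" fedora "*|*" rhel "*|*" centos "*) echo dnf ;;
            *) echo unknown; exit 1 ;;
        esac
    )
}

# Usage: install Open MPI with whichever package manager was detected.
# case "$(pick_pkg_manager)" in
#     apt) sudo apt install openmpi-bin openmpi-common ;;
#     dnf) sudo dnf install openmpi python3-openmpi openmpi-devel ;;
# esac
```

Matching on ID_LIKE as well as ID lets derivatives such as Ubuntu or Rocky Linux fall into the right branch without listing each one.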

2.1 If Using Fedora or RHEL

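On Fedora and RHEL, the Open MPI package is set up as an environment module, so its tools are not on the PATH until the module is loaded. Reboot (or log out and back in) so that the module command is initialized, then load the Open MPI module:

sudo reboot
source /etc/profile.d/modules.sh
module load mpi/openmpi-x86_64
mpicc --version  # Confirms that the MPI compiler wrapper is on the PATH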
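Whichever distribution is used, it helps to confirm that the MPI tools are actually on the PATH before going further. A small sketch of such a check follows; the helper name missing_cmds is made up for this example.

```shell
#!/bin/sh
# missing_cmds CMD...: print each command that cannot be found on the PATH.
missing_cmds() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Usage: an empty result means both tools are available.
# missing_cmds mpicc mpirun
```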

3. Update /etc/hosts

Edit /etc/hosts on every node so that each machine can resolve the master node and all compute nodes by name:

sudo nano /etc/hosts

Add entries like the following:

192.168.0.2 master_node
192.168.0.3 worker1
192.168.0.4 worker2
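Because the same entries are needed on every node, a quick check that none were forgotten can save debugging later. The sketch below assumes the standard /etc/hosts layout (an address followed by one or more names); the function name missing_hosts is hypothetical.

```shell
#!/bin/sh
# missing_hosts FILE NAME...: print each NAME that has no entry in FILE,
# where FILE uses the /etc/hosts format (address followed by names).
missing_hosts() {
    hosts_file=$1
    shift
    for name in "$@"; do
        awk -v n="$name" '
            { sub(/#.*/, "") }                      # drop comments
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit (found ? 0 : 1) }
        ' "$hosts_file" || printf '%s\n' "$name"
    done
}

# Usage: run on every node; no output means all three names are present.
# missing_hosts /etc/hosts master_node worker1 worker2
```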

4. Add User & SSH Key

  1. Create a New User
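    Create a user for running MPI jobs. The same user name must exist on every node. The second command grants administrative rights through the sudo group on Debian; on Red Hat-based systems the equivalent group is wheel: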

    sudo adduser mpiuser
    sudo usermod -aG sudo mpiuser
    
  2. Switch to the New User
    Switch to the newly created user:

    su - mpiuser
    
  3. Generate SSH Keys
    Generate SSH keys for passwordless login:

    ssh-keygen -t rsa
    
  4. Add SSH Key to Authorized Keys
    Go to the .ssh directory, append the public key to authorized_keys, and restrict the file's permissions to the owner:

    cd ~/.ssh
    cat id_rsa.pub >> authorized_keys
    chmod 600 authorized_keys
    
  5. Copy SSH ID to Worker Nodes
    Use ssh-copy-id to copy the public key to each worker node:

    ssh-copy-id worker1
    ssh-copy-id worker2
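
Once the keys are in place, passwordless login can be verified for every worker in one pass. This is a sketch rather than a required step: the function name check_passwordless is hypothetical, and it relies on the OpenSSH option BatchMode=yes, which makes ssh fail instead of prompting for a password.

```shell
#!/bin/sh
# check_passwordless HOST...: try a key-only SSH login to each host.
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_passwordless() {
    failed=0
    for host in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
            failed=1
        fi
    done
    return "$failed"
}

# Usage (as mpiuser on the master node):
# check_passwordless worker1 worker2
```

A non-zero return status means at least one host still asks for a password or is unreachable.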
    

5. Create Hosts File for mpirun

Create a file to specify the hosts for mpirun. Each line should contain an IP address or hostname:

nano ~/hosts

Add entries like the following:

192.168.0.2
192.168.0.3
192.168.0.4

or, using the hostnames defined in /etc/hosts:

master_node
worker1
worker2
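
The hosts file can also be generated, and Open MPI additionally accepts a slots=N field on each line to say how many processes a node may run. The sketch below assumes 4 slots per node, which is an illustrative number only; the function name make_hostfile is hypothetical.

```shell
#!/bin/sh
# make_hostfile FILE SLOTS HOST...: write one "host slots=N" line per node.
make_hostfile() {
    out=$1
    slots=$2
    shift 2
    : > "$out"    # create or truncate the file
    for host in "$@"; do
        printf '%s slots=%s\n' "$host" "$slots" >> "$out"
    done
}

# Usage: three nodes with an assumed 4 slots each, then a smoke test that
# prints the hostname of the node each of the 12 processes landed on.
# make_hostfile ~/hosts 4 master_node worker1 worker2
# mpirun --hostfile ~/hosts -np 12 hostname
```

Running hostname through mpirun is a convenient first test because it needs no compiled MPI program.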

6. Set Environment Variables

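Finally, make sure every node can find the Open MPI binaries and shared libraries. LD_LIBRARY_PATH must list directories that contain shared libraries, not the path of a binary such as /usr/bin/ssh. On Debian the packaged libraries are already on the default search path, so nothing needs to be set. On Red Hat-based systems, module load mpi/openmpi-x86_64 sets both variables; to set them by hand instead:

export PATH=/usr/lib64/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/usr/lib64/openmpi/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}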
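
LD_LIBRARY_PATH is a colon-separated list of directories searched for shared libraries, so new entries are normally prepended to the existing value. When the variable is set from a shell startup file that may be sourced more than once, it is worth avoiding duplicate entries. A hedged sketch: the helper name prepend_dir is hypothetical, and /usr/lib64/openmpi/lib is the directory used by the Fedora and RHEL packages, so adjust it for other layouts.

```shell
#!/bin/sh
# prepend_dir CURRENT DIR: print CURRENT with DIR added at the front,
# unless DIR is already one of its colon-separated entries.
prepend_dir() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;
        *)        printf '%s\n' "$2${1:+:$1}" ;;
    esac
}

# Usage (Fedora/RHEL package layout assumed):
# LD_LIBRARY_PATH=$(prepend_dir "${LD_LIBRARY_PATH:-}" /usr/lib64/openmpi/lib)
# export LD_LIBRARY_PATH
```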
