openmpi 4.1.5-5: mpirun fails when installed on the remote host

Task Info (Flyspray)
Opened By Francisco J. Vazquez (Fran)
Task ID 79670
Type Bug Report
Project Arch Linux
Category Packages: Extra
Version None
OS x86_64
Opened 2023-09-12 15:52:36 UTC
Status Assigned
Assignee David Runge (dvzrv)
Assignee Levente Polyak (anthraxx)
Assignee Christian Heusel (gromit)

Details

I have two fully updated Arch systems, host1 and host2, both with openmpi 4.1.5-5 installed. Running:

$ mpirun -v -n 2 --hostfile hosts.txt bash -c 'echo $HOSTNAME'

on host1, where hosts.txt is:

host1 slots=1
host2 slots=1

fails with:


ORTE was unable to reliably start one or more daemons. This usually is caused by:

  • not finding the required libraries and/or binaries on one or more nodes. Please check your PATH and LD_LIBRARY_PATH settings, or configure OMPI with --enable-orterun-prefix-by-default

  • lack of authority to execute on one or more specified nodes. Please verify your allocation and authorities.

  • the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base). Please check with your sys admin to determine the correct location to use.

  • compilation of the orted with dynamic libraries when static are required (e.g., on Cray). Please check your configure cmd line and consider using one of the contrib/platform definitions for your system type.

  • an inability to create a connection back to mpirun due to a lack of common network interfaces and/or no route found between them. Please check network connectivity (including firewalls and network routing requirements).
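
For anyone trying to narrow this down, the first cause in the message (missing binaries or libraries on a node) can be checked from host1. This is only a rough sketch: it assumes mpirun reaches the nodes over passwordless ssh and that the remote login shell is POSIX-like. The function names hostfile_hosts and check_orted are made up for illustration and are not part of Open MPI.

```shell
#!/bin/sh
# Print the host names (first field) from an Open MPI hostfile,
# skipping blank lines and lines whose first field starts with '#'.
hostfile_hosts() {
    awk 'NF && $1 !~ /^#/ { print $1 }' "$1"
}

# For each host in the hostfile, check over non-interactive ssh that
# orted is on PATH and that ldd reports no unresolved libraries for it.
# Assumption: passwordless ssh to every host listed in the file.
check_orted() {
    for h in $(hostfile_hosts "$1"); do
        echo "== $h"
        ssh "$h" 'p=$(command -v orted) || { echo "orted not on PATH"; exit 1; }
                  echo "$p"
                  ldd "$p" | grep "not found" || echo "all libraries resolved"'
    done
}
```

Running `check_orted hosts.txt` on host1 prints, per host, the path of orted and any library that ldd cannot resolve.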


Downgrading the remote host to openmpi 4.1.5-4 solves the problem:

$ mpirun -v -n 2 --hostfile hosts.txt bash -c 'echo $HOSTNAME'
host2
host1

The local version of openmpi does not seem to influence the result.
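
Since the package release on the remote side is what seems to matter, it may help to record the installed release on every host when reproducing this. A minimal sketch, again assuming passwordless ssh; the function name openmpi_versions and the REMOTE variable are hypothetical helpers, not existing tools.

```shell
#!/bin/sh
# REMOTE is the command used to run something on a host; ssh by default.
# (Overridable so the function can be tried without real hosts.)
REMOTE=${REMOTE:-ssh}

# Print "<host>: <output of pacman -Q openmpi>" for each host given.
openmpi_versions() {
    for h in "$@"; do
        printf '%s: %s\n' "$h" "$($REMOTE "$h" pacman -Q openmpi 2>&1)"
    done
}
```

For the setup described here, `openmpi_versions host1 host2` would be the call; pacman -Q prints the package name followed by its version and release.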

The same thing happens with -n 1, even though the program is launched locally.
