Partitions crash in a loop on reboot when RoCE adapters are installed.
Without the RoCE adapters, the system boots normally.
# lscfg -vl roce0
roce0 U2C4E.001.DBJR146-P2-C2-T1-L0 PCIe2 10GbE RoCE Converged Network Adapter
PCIe2 2-Port 10GbE RoCE SFP+ Adapter:
Part Number.................00E1601
EC Level....................D77301
FRU Number..................00E1599
Serial Number...............00E1601YA50AG3CM01Y
Product Specific.(Z0).......IBM0F30001010
Product Specific.(Z1).......EC30
Manufacture ID..............335X2557740966
Product Specific.(RV)......."
ROM Level.(alterable).......000200091326
ROM Level.(ext alterable)...000200091326
$ lsmcode -rd roce0
b315506714106104.000200091326
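The lsmcode value appears to combine two dot-separated fields: an adapter identifier, which matches the fileset name devices.pciex.b315506714106104 in the IBM excerpt further down, and the ROM level that lscfg reports. A minimal sketch for splitting them, assuming that reading of the format is correct (the variable names are illustrative only):

```shell
# Sample value copied from the lsmcode output above.
microcode="b315506714106104.000200091326"

# Assumption: the text before the first dot is the adapter identifier
# and the text after it is the ROM level.
adapter_id=${microcode%%.*}
rom_level=${microcode#*.}

echo "adapter id: $adapter_id"
echo "ROM level:  $rom_level"
```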
A workaround is to restart the partition without the RoCE adapters, then delete the hba adapter definitions with the command 'rmdev -Rdl hba0', and do the same for 'hba1'.
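The cleanup part of the workaround could be scripted roughly as follows. This is only a sketch: remove_hba_defs and the RUN variable are hypothetical names, and the helper prints the commands by default so they can be reviewed before anything is deleted.

```shell
# Hypothetical helper: prints the rmdev commands by default;
# set RUN=1 to actually execute them (AIX only, as root).
remove_hba_defs() {
    for dev in "$@"; do
        if [ "${RUN:-0}" = "1" ]; then
            # -R also handles child devices, -d deletes the definition.
            rmdev -Rdl "$dev"
        else
            echo "rmdev -Rdl $dev"
        fi
    done
}

# Dry run for the two host bus adapters named in the workaround.
remove_hba_defs hba0 hba1
```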
Below is an excerpt from the IBM documentation on the subject:
AIX NIC + OFED RDMA
As of AIX® 7 with 7100-02 the PCIe2 10 GbE RoCE Adapter can be configured to run in the AIX NIC configuration. As of AIX 7 with 7100-03, the OFED RDMA functionality was also added to the AIX NIC configuration. If you do not have the network-intensive applications that benefit from RDMA, then you can use the adapter as a Network Interface Card (NIC) only.
- devices.ethernet.mlx
- Converged Ethernet Adapter main device driver (mlxentdd) to support the AIX NIC + OFED RoCE configuration.
- devices.pciex.b315506b3157265
- Packaging support for the NGP ITE Converge Ethernet Adapter ASIC2.
- devices.pciex.b3155067b3157365
- Packaging support for the NGP ITE Converge Ethernet Adapter ASIC1.
- devices.pciex.b315506714101604
- Packaging for Mellanox 2 Ports 10 GbE Converge Ethernet Adapter with the small form factor pluggable (SFP+) transceivers.
- devices.pciex.b315506714106104
- Packaging for Mellanox 2 Ports 10 GbE Converge Ethernet Adapter that supports any SFP+ transceivers.
- devices.common.IBM.ib
- ICM device driver that is required to use the AIX RoCE configuration.
- devices.pciex.b3154a63
- Mellanox 10 GbE Converge Ethernet Adapter device driver that is required to use the AIX RoCE configuration.
- ofed.core
- OFED Core Runtime Environment file set that is needed only if OFED RDMA is required.
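To see which of the filesets listed above are present on a partition, something along these lines may help. It is a sketch under assumptions: check_filesets is a hypothetical helper, and it relies on lslpp -L returning a non-zero exit status for a fileset that is not installed.

```shell
# Fileset names copied from the list above.
ROCE_FILESETS="devices.ethernet.mlx devices.pciex.b315506b3157265 \
devices.pciex.b3155067b3157365 devices.pciex.b315506714101604 \
devices.pciex.b315506714106104 devices.common.IBM.ib \
devices.pciex.b3154a63 ofed.core"

# Hypothetical helper: report each fileset as installed or missing.
# Assumption: lslpp -L exits non-zero when the fileset is not installed.
check_filesets() {
    for fs in "$@"; do
        if lslpp -L "$fs" >/dev/null 2>&1; then
            echo "$fs: installed"
        else
            echo "$fs: missing"
        fi
    done
}

# Usage on AIX: check_filesets $ROCE_FILESETS
```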
- Delete the roceX instances that are related to the PCIe2 10 GbE RoCE Adapter by entering the following command:
# rmdev -dl roce0[, roce1][, roce2,...]
- Delete the entX instances that are related to the PCIe2 10 GbE RoCE Adapter by entering the following command:
# rmdev -dl ent1[,ent2][, ent3...]
- If there are one or more converged host bus adapters (hbaX) that are related to the PCIe2 10 GbE RoCE Adapter, delete them by entering the following command:
# rmdev -dl hba0[, hba1][,hba2...]
- Run the configuration manager to incorporate the changes by entering the following command:
# cfgmgr
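The removal sequence above (roceX, then entX, then hbaX, then cfgmgr) can be sketched as a small dry-run script. The run and cleanup_roce_devices helpers and the RUN variable are hypothetical, the device names are only examples, and rmdev is called once per device here rather than with the bracketed list notation shown in the excerpt.

```shell
# Hypothetical wrapper: echo each command, and execute it only when RUN=1.
run() {
    echo "$*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# Delete each device definition in the order given, then rerun cfgmgr.
cleanup_roce_devices() {
    for dev in "$@"; do
        run rmdev -dl "$dev"
    done
    run cfgmgr
}

# Dry run: roce instances first, then ent, then hba, as in the steps above.
# Device names are examples; adjust them to what lsdev reports.
cleanup_roce_devices roce0 ent1 ent2 hba0
```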
- Stop all RDMA applications that are running on the PCIe2 10 GbE RoCE Adapter.
- Delete or redefine the roceX instances by entering one of the following commands:
- # rmdev -d -l roce0
- # rmdev -l roce0
- Change the attribute of the hba stack_type setting from aix_ib (AIX RoCE) to ofed (AIX NIC + OFED RoCE) by entering the following command:
# chdev -l hba0 -a stack_type=ofed
- Run the configuration manager tool so that the host bus adapter can configure the PCIe2 10 GbE RoCE Adapter as a NIC adapter by entering the following command:
# cfgmgr
- Verify that the adapter is now running in NIC configuration by entering the following command:
# lsdev -C -c adapter
The following example shows the results when you run the lsdev command on the adapter when it is configured in AIX NIC + OFED RoCE mode:
Figure 1. Example output of lsdev command on an adapter with the AIX NIC + OFED RoCE configuration
ent1 Available 00-00-01 PCIe2 10GbE RoCE Converged Network Adapter
ent2 Available 00-00-02 PCIe2 10GbE RoCE Converged Network Adapter
hba0 Available 00-00 PCIe2 10GbE RoCE Converged Host Bus Adapter (b315506714101604)
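The verification step can be automated by filtering the lsdev output for entX devices that carry a RoCE description. The sketch below uses a hypothetical roce_nic_state helper and feeds it the sample lines from Figure 1; it assumes the real output has the same column layout.

```shell
# Hypothetical helper: read `lsdev -C -c adapter` output on stdin and print
# the name and state of every entX device whose description mentions RoCE.
roce_nic_state() {
    awk '$1 ~ /^ent[0-9]+$/ && /RoCE/ { print $1, $2 }'
}

# Sample lines taken from Figure 1; on AIX, pipe the real command instead:
#   lsdev -C -c adapter | roce_nic_state
roce_nic_state <<'EOF'
ent1 Available 00-00-01 PCIe2 10GbE RoCE Converged Network Adapter
ent2 Available 00-00-02 PCIe2 10GbE RoCE Converged Network Adapter
hba0 Available 00-00 PCIe2 10GbE RoCE Converged Host Bus Adapter (b315506714101604)
EOF
```

If no entX line is printed, or a roceX device still shows up in the raw lsdev output, the adapter is probably not in NIC mode yet.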
- Install the package ofed.core.
- Set the RDMA mode in the ent1, ent2 devices by entering the following command:
The RDMA mode is set before the en1 or en2 interfaces are configured.
# chdev -l ent1 -a rdma=desired
# chdev -l ent2 -a rdma=desired
- You can disable the RDMA mode by entering the following command:
# chdev -l ent1 -a rdma=disabled
# chdev -l ent2 -a rdma=disabled
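Because the same chdev call is repeated per adapter and per mode, a small generator can keep the commands consistent and guarantees plain ASCII hyphens in the flags. This is a sketch only: set_rdma_mode is a hypothetical helper, it accepts just the two values shown above, and it prints the commands instead of running them.

```shell
# Hypothetical helper: print one chdev command per adapter for the given
# RDMA mode. Only the two values from the excerpt are accepted.
set_rdma_mode() {
    mode=$1
    shift
    case $mode in
        desired|disabled) ;;
        *) echo "unsupported rdma mode: $mode" >&2; return 1 ;;
    esac
    for ent in "$@"; do
        echo "chdev -l $ent -a rdma=$mode"
    done
}

# Print the commands that enable RDMA on both ports.
set_rdma_mode desired ent1 ent2
```

The printed lines can be reviewed and then run as root on the partition.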