If you're running SageMaker HyperPod clusters, you know version mismatches across nodes can kill a training run. This script checks everything that matters: NVIDIA drivers, CUDA, cuDNN, NCCL, EFA, MPI, Neuron SDK for Trainium, and the usual Python/PyTorch stack. It's a bash script you upload via the hyperpod-ssm skill and run on each node, with JSON output that makes diffing across nodes straightforward. The real value is the compatibility analysis built in, like catching when your driver doesn't support your CUDA version or when NCCL versions aren't consistent. Saves you from debugging distributed training failures that turn out to be a version mismatch on node 7.
npx -y skills add awslabs/agent-plugins --skill hyperpod-version-checker --agent claude-codeInstalls into .claude/skills of the current project.
Select a file.
mindrally/skills
giuseppe-trisciuoglio/developer-kit
syncfusion/react-ui-components-skills
supercent-io/skills-template
binjuhor/shadcn-lar