When running multi-turn reinforcement learning training on AMD MI300X GPUs, the compute node consistently crashes within two training steps. The crash is preceded by CPU utilization spiking to 100%, ...
When exporting opened files to Excel, large files (e.g., CSVs of 40 MB or more) do not export. SmoothCSV hangs, and the Tauri process can use up to 100% CPU. (It is possible that the files would eventually export, ...