My Deep Learning Rig (Minerva)

My beloved deep learning workstation

View the parts list here. Tim Dettmers’ deep learning hardware blog and GPU guide were instrumental in helping me build this rig.

My friends often ask, “Why would you possibly need 4 GPUs?” or “Why would you spend so much on a computer?”

The short answer is that it is an investment: being able to train and run machine learning models as fast as possible, locally, encourages me to experiment frequently while owning what I create.

Most modern models support distributed data parallelism, or other forms of parallelism, which allow training on multiple GPUs while receiving a speedup roughly proportional to the number of GPUs.
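
As a rough illustration of the pattern (a minimal PyTorch DDP sketch, not code from any of the workloads below), each process drives one GPU with its own model replica, and gradients are averaged across replicas during the backward pass:

# Minimal DDP sketch: launch with torchrun --nproc_per_node=4 ddp_demo.py
# (file name is a placeholder). Each process owns one GPU; gradients are
# all-reduced during backward(), keeping the replicas in sync.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("nccl")  # torchrun supplies rank/world size via env vars
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = DDP(torch.nn.Linear(128, 10).cuda(local_rank), device_ids=[local_rank])
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

for _ in range(10):  # toy loop on random data; real code would use a DistributedSampler
    x = torch.randn(32, 128, device=local_rank)
    loss = model(x).square().mean()
    opt.zero_grad()
    loss.backward()  # gradient all-reduce happens here
    opt.step()

dist.destroy_process_group()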

My system runs 2x NVIDIA RTX 3090 Ti 24GB and 2x NVIDIA RTX 3090 24GB (96GB VRAM total) with GPU clocks capped at roughly 1750-1800MHz and a 300W power limit per card (the limits keep training stable by preventing PSU safety shutoffs). Under full load, the system peaks at about 1550W total power draw and 75°C internal temperature. Minerva also features a 16-core, 32-thread AMD Threadripper PRO 7955WX (3.5GHz base, 4.5GHz boost) and 128GB of DDR5 RAM.
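
For reference, such limits can be applied with nvidia-smi; a minimal sketch (run as root; the -pl flag sets the power cap in watts, -lgc locks the GPU clock range, and the values here match the limits described above rather than anything universal):

# Cap each card at 300W and lock GPU clocks to at most 1800MHz.
import subprocess

for gpu in range(4):  # GPU indices 0-3
    subprocess.run(["nvidia-smi", "-i", str(gpu), "-pl", "300"], check=True)
    subprocess.run(["nvidia-smi", "-i", str(gpu), "-lgc", "0,1800"], check=True)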

What sort of workloads can Minerva run?

For all of the following workloads, I set up the GPUs according to my guide. Having the NVIDIA Container Toolkit installed is essential for the majority of my desired workloads.
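
Before launching anything heavy, it's worth sanity-checking that Docker can actually see the GPUs; a quick sketch (the CUDA image tag is just an example):

# Run nvidia-smi inside a throwaway CUDA container to verify GPU passthrough.
import subprocess

subprocess.run(
    ["docker", "run", "--rm", "--gpus", "all",
     "nvidia/cuda:12.4.1-base-ubuntu22.04", "nvidia-smi"],
    check=True,  # raises CalledProcessError if the toolkit isn't wired up
)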

Case Study I: NVIDIA Isaac Lab Multi-GPU Training

I ran multi-GPU training to benchmark against the Isaac Lab performance benchmarks, where a single node with 4x NVIDIA L40 GPUs and an Intel(R) Xeon(R) Platinum 8362 CPU @ 2.80GHz achieves 390,000 frames per second across environment stepping, inference, and training (with 8192 environments per GPU) in the Isaac-Repose-Cube-Shadow-Direct-v0 RL environment. In this environment, a robot hand learns to reposition a cube into a desired orientation.

Click to see the commands to run multi-GPU training inside of Docker
mkdir -p projects/ && cd projects && git clone https://github.com/isaac-sim/IsaacLab.git && cd IsaacLab
echo \
"services:
  isaac-lab-base:
    shm_size: '2gb'" > docker/shm-config.yaml
python3 docker/container.py start --files shm-config.yaml
# [INFO] Using container profile: base
# [INFO] X11 Forwarding is configured as '0' in '.container.cfg'.
# 	To enable X11 forwarding, set 'X11_FORWARDING_ENABLED=1' in '.container.cfg'.
# [INFO] Building the docker image and starting the container 'isaac-lab-base' in the background...
#  ✔ isaac-lab-base            Built                                                                                                     0.0s 
#  ✔ Container isaac-lab-base  Started                                                                                                  11.7s 
python3 docker/container.py enter
# [INFO] Using container profile: base
# [INFO] X11 Forwarding is disabled from the settings in '.container.cfg'
# [INFO] X11 forwarding is disabled. No action taken.
# [INFO] Entering the existing 'isaac-lab-base' container in a bash session...

# Option A: Training (8192 environments per GPU)
OMP_NUM_THREADS=8 python -m torch.distributed.run --nnodes=1 --nproc_per_node=4 scripts/reinforcement_learning/rl_games/train.py --task=Isaac-Repose-Cube-Shadow-Direct-v0 --headless --distributed
# Example RL Games output collected
# fps step: 611460 fps step and policy inference: 588767 fps total: 472288 epoch: 174/5000 frames: 90701824
# fps step: 592850 fps step and policy inference: 571132 fps total: 464365 epoch: 175/5000 frames: 91226112
# fps step: 616768 fps step and policy inference: 594105 fps total: 479784 epoch: 176/5000 frames: 91750400
# fps step: 602694 fps step and policy inference: 581137 fps total: 477391 epoch: 177/5000 frames: 92274688

# Option B: Training Benchmark
python scripts/benchmarks/benchmark_rlgames.py --task=Isaac-Repose-Cube-Shadow-Direct-v0 --headless

From the RL Games training output, Minerva appears to run training and simulation at around ~477,000 FPS total: ~22% faster than the benchmarked 4x L40 node, and ~181% faster than a single 4090 (which achieves 170,000 FPS), all while running at a 1750MHz clock limit. This holds even though a single 4090 is faster than a single 3090 Ti.
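
The speedup arithmetic behind those percentages:

# Quick check of the speedup claims above
minerva, l40_node, rtx_4090 = 477_000, 390_000, 170_000
print(f"vs 4x L40 node: {minerva / l40_node - 1:.0%} faster")   # ~22%
print(f"vs 1x RTX 4090: {minerva / rtx_4090 - 1:.0%} faster")   # ~181%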

On Minerva, the hand cube repose task can be solved within 20 minutes with 4 GPUs. It takes longer with one GPU.

Click to see verbose output of benchmark
root@minerva:/workspace/isaaclab# python scripts/benchmarks/benchmark_rlgames.py --task=Isaac-Repose-Cube-Shadow-Direct-v0 --headless
[INFO][AppLauncher]: Loading experience file: /workspace/isaaclab/apps/isaaclab.python.headless.kit
Loading user config located at: '/isaac-sim/kit/data/Kit/Isaac-Sim/4.5/user.config.json'
[Info] [carb] Logging to file: /isaac-sim/kit/logs/Kit/Isaac-Sim/4.5/kit_20250224_025719.log
2025-02-24 02:57:19 [0ms] [Warning] [omni.kit.app.plugin] No crash reporter present, dumps uploading isn't available.
2025-02-24 02:57:20 [436ms] [Warning] [omni.usd_config.extension] Enable omni.materialx.libs extension to use MaterialX
Authorization required, but no authorization protocol specified
2025-02-24 02:57:20 [508ms] [Warning] [omni.platforminfo.plugin] failed to open the default display.  Can't verify X Server version.
Authorization required, but no authorization protocol specified
2025-02-24 02:57:20 [612ms] [Warning] [omni.datastore] OmniHub is inaccessible
2025-02-24 02:57:20 [760ms] [Warning] [omni.isaac.dynamic_control] omni.isaac.dynamic_control is deprecated as of Isaac Sim 4.5. No action is needed from end-users.
Authorization required, but no authorization protocol specified
Authorization required, but no authorization protocol specified

|---------------------------------------------------------------------------------------------|
| Driver Version: 560.35.03     | Graphics API: Vulkan
|=============================================================================================|
| GPU | Name                             | Active | LDA | GPU Memory | Vendor-ID | LUID       |
|     |                                  |        |     |            | Device-ID | UUID       |
|     |                                  |        |     |            | Bus-ID    |            |
|---------------------------------------------------------------------------------------------|
| 0   | NVIDIA GeForce RTX 3090 Ti       | Yes: 0 |     | 24810   MB | 10de      | 0          |
|     |                                  |        |     |            | 2203      | 2b26c591.. |
|     |                                  |        |     |            | 1         |            |
|---------------------------------------------------------------------------------------------|
| 1   | NVIDIA GeForce RTX 3090          | Yes: 1 |     | 24822   MB | 10de      | 0          |
|     |                                  |        |     |            | 2204      | 49e6f8d4.. |
|     |                                  |        |     |            | 21        |            |
|---------------------------------------------------------------------------------------------|
| 2   | NVIDIA GeForce RTX 3090          | Yes: 2 |     | 24822   MB | 10de      | 0          |
|     |                                  |        |     |            | 2204      | c9400bc1.. |
|     |                                  |        |     |            | c1        |            |
|---------------------------------------------------------------------------------------------|
| 3   | NVIDIA GeForce RTX 3090 Ti       | Yes: 3 |     | 24810   MB | 10de      | 0          |
|     |                                  |        |     |            | 2203      | 080243db.. |
|     |                                  |        |     |            | e1        |            |
|=============================================================================================|
| OS: 22.04.5 LTS (Jammy Jellyfish) ubuntu, Version: 22.04.5, Kernel: 6.8.0-52-generic
| Processor: AMD Ryzen Threadripper PRO 7955WX 16-Cores
| Cores: 16 | Logical Cores: 32
|---------------------------------------------------------------------------------------------|
| Total Memory (MB): 128295 | Free Memory: 101049
| Total Page/Swap (MB): 2047 | Free Page/Swap: 0
|---------------------------------------------------------------------------------------------|
2025-02-24 02:57:24 [4,832ms] [Warning] [gpu.foundation.plugin] IOMMU is enabled.
2025-02-24 02:57:24 [4,832ms] [Warning] [gpu.foundation.plugin] Detected IOMMU is enabled. Running CUDA peer-to-peer bandwidth and latency validation.
Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
   D\D     0      1      2      3 
     0 890.82  11.31  11.28  11.34 
     1  11.26 862.89  11.31  11.35 
     2  11.31  11.32 832.00  11.30 
     3  11.27  11.37  11.29 831.12 
P2P=Enabled Latency (P2P Writes) Matrix (us)
   GPU     0      1      2      3 
     0   1.68  10.72  11.05  10.55 
     1  12.86   1.66  10.75  14.99 
     2  16.30  16.39   1.66  19.46 
     3  13.61  14.69  17.88   1.67 

   CPU     0      1      2      3 
     0   1.71   5.17   5.04   4.91 
     1   5.06   1.56   4.97   4.42 
     2   4.96   4.61   1.48   4.52 
     3   4.95   4.56   4.56   1.42 
2025-02-24 02:57:25 [5,798ms] [Warning] [gpu.foundation.plugin] CUDA peer-to-peer observed bandwidth: 11.3 GB/s.
2025-02-24 02:57:25 [5,798ms] [Warning] [gpu.foundation.plugin] CUDA peer-to-peer observed latency: 19.5 us.
2025-02-24 02:57:25 [5,798ms] [Warning] [gpu.foundation.plugin] Please verify if observed bandwidth and latency are expected.
2025-02-24 02:57:26 [6,720ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Generating formatted report = True
2025-02-24 02:57:26 [6,720ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Using metrics backend = OmniPerfKPIFile
2025-02-24 02:57:26 [6,720ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Local folder location = /tmp
2025-02-24 02:57:26 [6,720ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Starting
2025-02-24 02:57:26 [6,720ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Test mode = False
[INFO]: Parsing configuration from: isaaclab_tasks.direct.shadow_hand.shadow_hand_env_cfg:ShadowHandEnvCfg
[INFO]: Parsing configuration from: /workspace/isaaclab/source/isaaclab_tasks/isaaclab_tasks/direct/shadow_hand/agents/rl_games_ppo_cfg.yaml
[INFO] Logging experiment in directory: /workspace/isaaclab/logs/rl_games/shadow_hand
2025-02-24 02:57:26 [6,924ms] [Warning] [isaaclab.envs.direct_rl_env] Seed not set for the environment. The environment creation may not be deterministic.
[INFO]: Base environment:
	Environment device    : cuda:0
	Environment seed      : None
	Physics step-size     : 0.008333333333333333
	Rendering step-size   : 0.016666666666666666
	Environment step-size : 0.016666666666666666
[INFO]: Time taken for scene creation : 2.152462 seconds
[INFO]: Scene manager:  <class InteractiveScene>
	Number of environments: 8192
	Environment spacing   : 0.75
	Source prim name      : /World/envs/env_0
	Global prim paths     : []
	Replicate physics     : True
[INFO]: Starting the simulation. This may take a few seconds. Please wait...
2025-02-24 02:57:31 [11,745ms] [Warning] [isaaclab.assets.articulation.articulation] ImplicitActuatorCfg fingers has set both effort_limit_sim and effort_limit.Only effort_limit_sim will be used for ImplicitActuators.
2025-02-24 02:57:31 [11,745ms] [Warning] [isaaclab.assets.articulation.articulation] ImplicitActuatorCfg fingers has set both velocity_limit_sim and velocity_limit.Only velocity_limit_sim will be used for ImplicitActuators.
[INFO]: Time taken for simulation start : 6.141075 seconds
[INFO]: Completed setting up the environment...
self.seed = 42
Setting seed: 42
2025-02-24 02:57:34 [15,364ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Starting phase: sim_runtime
Started to train
Exact experiment name requested from command line: 2025-02-24_02-57-26
seq_length: 4
current training device: cuda:0
/workspace/isaaclab/_isaac_sim/kit/python/lib/python3.10/site-packages/rl_games/common/a2c_common.py:254: FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead.
  self.scaler = torch.cuda.amp.GradScaler(enabled=self.mixed_precision)
build mlp: 157
RunningMeanStd:  (1,)
RunningMeanStd:  (157,)
/workspace/isaaclab/_isaac_sim/kit/python/lib/python3.10/site-packages/rl_games/algos_torch/a2c_continuous.py:106: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(enabled=self.mixed_precision):
fps step: 58550 fps step and policy inference: 56747 fps total: 52351 epoch: 1/10 frames: 0
fps step: 184865 fps step and policy inference: 178640 fps total: 152956 epoch: 2/10 frames: 131072
fps step: 187879 fps step and policy inference: 181181 fps total: 154814 epoch: 3/10 frames: 262144
fps step: 190668 fps step and policy inference: 183906 fps total: 156801 epoch: 4/10 frames: 393216
fps step: 194337 fps step and policy inference: 187281 fps total: 159257 epoch: 5/10 frames: 524288
fps step: 197345 fps step and policy inference: 190092 fps total: 161290 epoch: 6/10 frames: 655360
fps step: 199005 fps step and policy inference: 191577 fps total: 162346 epoch: 7/10 frames: 786432
fps step: 197395 fps step and policy inference: 190023 fps total: 161233 epoch: 8/10 frames: 917504
fps step: 188600 fps step and policy inference: 181344 fps total: 154939 epoch: 9/10 frames: 1048576
fps step: 190855 fps step and policy inference: 183552 fps total: 156531 epoch: 10/10 frames: 1179648
=> saving checkpoint '/workspace/isaaclab/logs/rl_games/shadow_hand/2025-02-24_02-57-26/nn/last_shadow_hand_ep_10_rew__-19.991896_.pth'
MAX EPOCHS NUM!
2025-02-24 02:57:48 [29,315ms] [Warning] [isaacsim.benchmark.services.recorders] Detected multiple GPU types: ['NVIDIA GeForce RTX 3090 Ti', 'NVIDIA GeForce RTX 3090 Ti', 'NVIDIA GeForce RTX 3090', 'NVIDIA GeForce RTX 3090'].
2025-02-24 02:57:48 [29,315ms] [Warning] [isaacsim.benchmark.services.recorders] Only recording GPU 0 type: NVIDIA GeForce RTX 3090 Ti
/isaac-sim/exts/isaacsim.benchmark.services/isaacsim/benchmark/services/datarecorders/frametime.py:98: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  logger.warn(f"Unable to calculate frametime stats: {e}")
2025-02-24 02:57:48 [29,375ms] [WARNING] [isaacsim.benchmark.services.datarecorders.frametime] Unable to calculate frametime stats: mean requires at least one data point
2025-02-24 02:57:48 [29,376ms] [WARNING] [isaacsim.benchmark.services.datarecorders.frametime] Unable to calculate frametime stats: mean requires at least one data point
2025-02-24 02:57:48 [29,382ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Created new phase 'startup' and stored SingleMeasurement(name='App Launch Time', value=5973.722584, unit='ms', type='single')
2025-02-24 02:57:48 [29,382ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Stored SingleMeasurement(name='Python Imports Time', value=179.755971, unit='ms', type='single') for phase 'startup'
2025-02-24 02:57:48 [29,382ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Stored SingleMeasurement(name='Task Creation and Start Time', value=8385.666631, unit='ms', type='single') for phase 'startup'
2025-02-24 02:57:48 [29,382ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Stored SingleMeasurement(name='Scene Creation Time', value=2152.4621120006486, unit='ms', type='single') for phase 'startup'
2025-02-24 02:57:48 [29,382ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Stored SingleMeasurement(name='Simulation Start Time', value=6141.0754440003075, unit='ms', type='single') for phase 'startup'
2025-02-24 02:57:48 [29,382ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Stored SingleMeasurement(name='Total Start Time (Launch to Train)', value=15321.311632, unit='ms', type='single') for phase 'startup'
2025-02-24 02:57:48 [29,382ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Created new phase 'runtime' and stored DictMeasurement(name='Step Frametimes', value={'Environment only step time': [2.238626480102539, 0.7090144157409668, 0.6976408958435059, 0.6874349117279053, 0.6744587421417236, 0.6641781330108643, 0.6586358547210693, 0.6640071868896484, 0.694974422454834], 'Environment + Inference step time': [2.3097691535949707, 0.7337219715118408, 0.7234294414520264, 0.7127134799957275, 0.6998662948608398, 0.6895182132720947, 0.6841747760772705, 0.6897702217102051, 0.7227792739868164], 'Environment + Inference + Policy update time': [0.19395899772644043, 0.12320470809936523, 0.12321305274963379, 0.12320137023925781, 0.12315535545349121, 0.12313103675842285, 0.12318849563598633, 0.12316560745239258, 0.12317991256713867], 'Environment only FPS': [58550.1875, 184865.078125, 187878.890625, 190668.234375, 194336.578125, 197344.640625, 199005.265625, 197395.453125, 188599.75], 'Environment + Inference FPS': [56746.796875, 178639.875, 181181.453125, 183905.59375, 187281.484375, 190092.15625, 191576.78125, 190022.703125, 181344.4375], 'Environment + Inference + Policy update FPS': [52350.7265625, 152955.90625, 154813.828125, 156800.65625, 159257.0, 161289.75, 162345.75, 161232.84375, 154938.875]}, type='dict')
2025-02-24 02:57:48 [29,382ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Stored SingleMeasurement(name='Min Environment only step time', value=0.6586358547210693, unit='ms', type='single') for phase 'runtime'
2025-02-24 02:57:48 [29,382ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Stored SingleMeasurement(name='Max Environment only step time', value=2.238626480102539, unit='ms', type='single') for phase 'runtime'
2025-02-24 02:57:48 [29,382ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Stored SingleMeasurement(name='Mean Environment only step time', value=0.8543301158481174, unit='ms', type='single') for phase 'runtime'
2025-02-24 02:57:48 [29,382ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Stored SingleMeasurement(name='Min Environment + Inference step time', value=0.6841747760772705, unit='ms', type='single') for phase 'runtime'
2025-02-24 02:57:48 [29,382ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Stored SingleMeasurement(name='Max Environment + Inference step time', value=2.3097691535949707, unit='ms', type='single') for phase 'runtime'
2025-02-24 02:57:48 [29,382ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Stored SingleMeasurement(name='Mean Environment + Inference step time', value=0.8850825362735324, unit='ms', type='single') for phase 'runtime'
2025-02-24 02:57:48 [29,382ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Stored SingleMeasurement(name='Min Environment + Inference + Policy update time', value=0.12313103675842285, unit='ms', type='single') for phase 'runtime'
2025-02-24 02:57:48 [29,382ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Stored SingleMeasurement(name='Max Environment + Inference + Policy update time', value=0.19395899772644043, unit='ms', type='single') for phase 'runtime'
2025-02-24 02:57:48 [29,382ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Stored SingleMeasurement(name='Mean Environment + Inference + Policy update time', value=0.13104428185356987, unit='ms', type='single') for phase 'runtime'
2025-02-24 02:57:48 [29,382ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Stored SingleMeasurement(name='Min Environment only FPS', value=58550.1875, unit='ms', type='single') for phase 'runtime'
2025-02-24 02:57:48 [29,382ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Stored SingleMeasurement(name='Max Environment only FPS', value=199005.265625, unit='ms', type='single') for phase 'runtime'
2025-02-24 02:57:48 [29,382ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Stored SingleMeasurement(name='Mean Environment only FPS', value=177627.11979166666, unit='ms', type='single') for phase 'runtime'
2025-02-24 02:57:48 [29,382ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Stored SingleMeasurement(name='Min Environment + Inference FPS', value=56746.796875, unit='ms', type='single') for phase 'runtime'
2025-02-24 02:57:48 [29,382ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Stored SingleMeasurement(name='Max Environment + Inference FPS', value=191576.78125, unit='ms', type='single') for phase 'runtime'
2025-02-24 02:57:48 [29,382ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Stored SingleMeasurement(name='Mean Environment + Inference FPS', value=171199.03125, unit='ms', type='single') for phase 'runtime'
2025-02-24 02:57:48 [29,382ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Stored SingleMeasurement(name='Min Environment + Inference + Policy update FPS', value=52350.7265625, unit='ms', type='single') for phase 'runtime'
2025-02-24 02:57:48 [29,382ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Stored SingleMeasurement(name='Max Environment + Inference + Policy update FPS', value=162345.75, unit='ms', type='single') for phase 'runtime'
2025-02-24 02:57:48 [29,382ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Stored SingleMeasurement(name='Mean Environment + Inference + Policy update FPS', value=146220.59288194444, unit='ms', type='single') for phase 'runtime'
2025-02-24 02:57:48 [29,382ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Created new phase 'train' and stored ListMeasurement(name='Rewards', length=8)
2025-02-24 02:57:48 [29,383ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Stored SingleMeasurement(name='Max Rewards', value=-6.726855278015137, unit='float', type='single') for phase 'train'
2025-02-24 02:57:48 [29,383ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Stored ListMeasurement(name='Episode Lengths', length=8) for phase 'train'
2025-02-24 02:57:48 [29,383ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Stored SingleMeasurement(name='Max Episode Lengths', value=104.1601333618164, unit='float', type='single') for phase 'train'
2025-02-24 02:57:48 [29,383ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Stopping
2025-02-24 02:57:48 [29,383ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Writing metrics data.
2025-02-24 02:57:48 [29,383ms] [INFO] [isaacsim.benchmark.services.base_isaac_benchmark] Metrics type = OmniPerfKPIFile
2025-02-24 02:57:48 [29,383ms] [INFO] [isaacsim.benchmark.services.metrics.backend] 
sim_runtime Metrics:
workflow_name: benchmark_rlgames_train
task: Isaac-Repose-Cube-Shadow-Direct-v0
max_iterations: 10
phase: sim_runtime
System Memory RSS: 6.484 GB
System Memory VMS: 96.113 GB
System Memory USS: 6.466 GB
GPU Memory Tracked: 0.0 GB
GPU Memory Dedicated: 0 GB
System CPU iowait: 0.0 %
System CPU system: 2.0 %
System CPU user: 9.0 %
System CPU idle: 89.0 %
num_cpus: 32 
gpu_device_name: NVIDIA GeForce RTX 3090 Ti 
Mean App_Update Frametime: 0 ms
Stdev App_Update Frametime: 0 ms
Min App_Update Frametime: 0 ms
Max App_Update Frametime: 0 ms
Mean Physics Frametime: 18.81 ms
Stdev Physics Frametime: 0.78 ms
Min Physics Frametime: 17.52 ms
Max Physics Frametime: 20.44 ms
Mean GPU Frametime: 0 ms
Stdev GPU Frametime: 0 ms
Min GPU Frametime: 0 ms
Max GPU Frametime: 0 ms
Real Time Factor: 0.0 
Runtime: 10932.26 ms
2025-02-24 02:57:48 [29,383ms] [INFO] [isaacsim.benchmark.services.metrics.backend] 
startup Metrics:
workflow_name: benchmark_rlgames_train
task: Isaac-Repose-Cube-Shadow-Direct-v0
max_iterations: 10
phase: startup
App Launch Time: 5973.722584 ms
Python Imports Time: 179.755971 ms
Task Creation and Start Time: 8385.666631 ms
Scene Creation Time: 2152.4621120006486 ms
Simulation Start Time: 6141.0754440003075 ms
Total Start Time (Launch to Train): 15321.311632 ms
2025-02-24 02:57:48 [29,383ms] [INFO] [isaacsim.benchmark.services.metrics.backend] 
runtime Metrics:
workflow_name: benchmark_rlgames_train
task: Isaac-Repose-Cube-Shadow-Direct-v0
max_iterations: 10
phase: runtime
Min Environment only step time: 0.6586358547210693 ms
Max Environment only step time: 2.238626480102539 ms
Mean Environment only step time: 0.8543301158481174 ms
Min Environment + Inference step time: 0.6841747760772705 ms
Max Environment + Inference step time: 2.3097691535949707 ms
Mean Environment + Inference step time: 0.8850825362735324 ms
Min Environment + Inference + Policy update time: 0.12313103675842285 ms
Max Environment + Inference + Policy update time: 0.19395899772644043 ms
Mean Environment + Inference + Policy update time: 0.13104428185356987 ms
Min Environment only FPS: 58550.1875 ms
Max Environment only FPS: 199005.265625 ms
Mean Environment only FPS: 177627.11979166666 ms
Min Environment + Inference FPS: 56746.796875 ms
Max Environment + Inference FPS: 191576.78125 ms
Mean Environment + Inference FPS: 171199.03125 ms
Min Environment + Inference + Policy update FPS: 52350.7265625 ms
Max Environment + Inference + Policy update FPS: 162345.75 ms
Mean Environment + Inference + Policy update FPS: 146220.59288194444 ms
2025-02-24 02:57:48 [29,383ms] [INFO] [isaacsim.benchmark.services.metrics.backend] 
train Metrics:
workflow_name: benchmark_rlgames_train
task: Isaac-Repose-Cube-Shadow-Direct-v0
max_iterations: 10
phase: train
Max Rewards: -6.726855278015137 float
Max Episode Lengths: 104.1601333618164 float
2025-02-24 02:57:48 [29,383ms] [INFO] [isaacsim.benchmark.services.metrics.backend] Writing metrics to /tmp/kpis_benchmark_rlgames_train.json
|----------------------------------------------------|
|                   Summary Report                   |
|----------------------------------------------------|
| workflow_name: benchmark_rlgames_train             |
| task: Isaac-Repose-Cube-Shadow-Direct-v0           |
| max_iterations: 10                                 |
| num_cpus: 32                                       |
| gpu_device_name: NVIDIA GeForce RTX 3090 Ti        |
|----------------------------------------------------|
| Phase: sim_runtime                                 |
| System Memory RSS: 6.484 GB                        |
| System Memory VMS: 96.113 GB                       |
| System Memory USS: 6.466 GB                        |
| GPU Memory Tracked: 0.0 GB                         |
| Real Time Factor: 0.0                              |
| Runtime: 10932.26 ms                               |
| Frametimes (ms):    mean |  stdev |   min |   max  |
| App_Update          0.00 |   0.00 |  0.00 |  0.00  |
| Physics            18.81 |   0.78 | 17.52 | 20.44  |
| GPU                 0.00 |   0.00 |  0.00 |  0.00  |
|----------------------------------------------------|
| Phase: startup                                     |
| App Launch Time: 5973.722584 ms                    |
| Python Imports Time: 179.755971 ms                 |
| Task Creation and Start Time: 8385.666631 ms       |
| Scene Creation Time: 2152.4621120006486 ms         |
| Simulation Start Time: 6141.0754440003075 ms       |
| Total Start Time (Launch to Train): 15321.311632 ms |
|----------------------------------------------------|
| Phase: runtime                                     |
| Min Environment only step time: 0.6586358547210693 ms |
| Max Environment only step time: 2.238626480102539 ms |
| Mean Environment only step time: 0.8543301158481174 ms |
| Min Environment + Inference step time: 0.6841747760772705 ms |
| Max Environment + Inference step time: 2.3097691535949707 ms |
| Mean Environment + Inference step time: 0.8850825362735324 ms |
| Min Environment + Inference + Policy update time: 0.12313103675842285 ms |
| Max Environment + Inference + Policy update time: 0.19395899772644043 ms |
| Mean Environment + Inference + Policy update time: 0.13104428185356987 ms |
| Min Environment only FPS: 58550.1875 ms            |
| Max Environment only FPS: 199005.265625 ms         |
| Mean Environment only FPS: 177627.11979166666 ms   |
| Min Environment + Inference FPS: 56746.796875 ms   |
| Max Environment + Inference FPS: 191576.78125 ms   |
| Mean Environment + Inference FPS: 171199.03125 ms  |
| Min Environment + Inference + Policy update FPS: 52350.7265625 ms |
| Max Environment + Inference + Policy update FPS: 162345.75 ms |
| Mean Environment + Inference + Policy update FPS: 146220.59288194444 ms |
|----------------------------------------------------|
| Phase: train                                       |
| Max Rewards: -6.726855278015137 float              |
| Max Episode Lengths: 104.1601333618164 float       |
|----------------------------------------------------|
root@minerva:/workspace/isaaclab# 

Case Study II: NVIDIA Isaac Lab Hyperparameter Tuning

Minerva can run four NVIDIA Isaac Lab training runs at once, one on each GPU. This is enabled by the Ray functionality that I added to Isaac Lab. The following is an example of tuning quadrupedal gait parameters on flat terrain, with 4 similar experiments running in parallel.
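
Under the hood, the Isaac Lab integration handles the orchestration, but the core idea resembles this generic Ray Tune sketch (the training function and search space are hypothetical placeholders, not the Isaac Lab API):

# Generic Ray Tune sketch: four trials, one GPU each, run concurrently.
from ray import tune

def train_fn(config):
    # A real trainable would launch a full training run with these
    # hyperparameters; here we just return a toy objective.
    return {"reward": -abs(config["lr"] - 3e-4)}

tuner = tune.Tuner(
    tune.with_resources(train_fn, {"cpu": 8, "gpu": 1}),  # pin 1 GPU per trial
    param_space={"lr": tune.grid_search([1e-4, 3e-4, 1e-3, 3e-3])},
)
results = tuner.fit()  # with 4 free GPUs, all four trials run in parallel
print(results.get_best_result(metric="reward", mode="max").config)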

Case Study III: YOLO11 Training with Ultralytics

For the Northeastern Mars Rover team, I trained a “mallet” and “bottle” detector. About 27,000 images at 800px resolution were used to train a model with 409 layers, 20,054,550 parameters, and 20,054,534 gradients; 12 epochs took about 24 minutes.

Click to see the code snippet of running the model
from pathlib import Path

from ultralytics import YOLO

model = YOLO("yolo11m.pt")  # start from pretrained YOLO11m weights
data_yaml = str(Path(__file__).parent / "dataset/data.yaml")

model.train(
    data=data_yaml,
    epochs=100,
    imgsz=800,
    batch=100,            # global batch size, split across the 4 GPUs
    cache="disk",
    freeze=0,             # no frozen layers; fine-tune everything
    copy_paste=0.8,       # heavy augmentation settings below
    hsv_v=0.3,
    erasing=0.9,
    crop_fraction=0.8,
    translate=0.9,
    mixup=0.4,
    perspective=0.00005,
    patience=20,          # early stopping after 20 stagnant epochs
    plots=True,
    save=True,
    workers=8,
    device="0,1,2,3",     # all four GPUs via DDP
)
Click to see the output of running the model
python3 train.py # In conda environment
New https://pypi.org/project/ultralytics/8.3.78 available 😃 Update with 'pip install -U ultralytics'
Ultralytics 8.3.75 🚀 Python-3.11.11 torch-2.2.2+cu121 CUDA:0 (NVIDIA GeForce RTX 3090 Ti, 24142MiB)
                                                       CUDA:1 (NVIDIA GeForce RTX 3090 Ti, 24139MiB)
                                                       CUDA:2 (NVIDIA GeForce RTX 3090, 24154MiB)
                                                       CUDA:3 (NVIDIA GeForce RTX 3090, 24154MiB)
engine/trainer: task=detect, mode=train, model=yolo11m.pt, data=/home/garylvov/projects/urc_mallet_model_2025/dataset/data.yaml, epochs=100, time=None, patience=20, batch=100, imgsz=800, save=True, save_period=-1, cache=disk, device=0,1,2,3, workers=8, project=None, name=train37, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=0, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=True, opset=None, workspace=None, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.3, degrees=0.0, translate=0.9, scale=0.5, shear=0.0, perspective=5e-05, flipud=0.0, fliplr=0.5, bgr=0.0, mosaic=1.0, mixup=0.4, copy_paste=0.8, copy_paste_mode=flip, auto_augment=randaugment, erasing=0.9, crop_fraction=0.8, cfg=None, tracker=botsort.yaml, save_dir=runs/detect/train37
Overriding model.yaml nc=80 with nc=2

                   from  n    params  module                                       arguments                     
  0                  -1  1      1856  ultralytics.nn.modules.conv.Conv             [3, 64, 3, 2]                 
  1                  -1  1     73984  ultralytics.nn.modules.conv.Conv             [64, 128, 3, 2]               
  2                  -1  1    111872  ultralytics.nn.modules.block.C3k2            [128, 256, 1, True, 0.25]     
  3                  -1  1    590336  ultralytics.nn.modules.conv.Conv             [256, 256, 3, 2]              
  4                  -1  1    444928  ultralytics.nn.modules.block.C3k2            [256, 512, 1, True, 0.25]     
  5                  -1  1   2360320  ultralytics.nn.modules.conv.Conv             [512, 512, 3, 2]              
  6                  -1  1   1380352  ultralytics.nn.modules.block.C3k2            [512, 512, 1, True]           
  7                  -1  1   2360320  ultralytics.nn.modules.conv.Conv             [512, 512, 3, 2]              
  8                  -1  1   1380352  ultralytics.nn.modules.block.C3k2            [512, 512, 1, True]           
  9                  -1  1    656896  ultralytics.nn.modules.block.SPPF            [512, 512, 5]                 
 10                  -1  1    990976  ultralytics.nn.modules.block.C2PSA           [512, 512, 1]                 
 11                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 12             [-1, 6]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 13                  -1  1   1642496  ultralytics.nn.modules.block.C3k2            [1024, 512, 1, True]          
 14                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 15             [-1, 4]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 16                  -1  1    542720  ultralytics.nn.modules.block.C3k2            [1024, 256, 1, True]          
 17                  -1  1    590336  ultralytics.nn.modules.conv.Conv             [256, 256, 3, 2]              
 18            [-1, 13]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 19                  -1  1   1511424  ultralytics.nn.modules.block.C3k2            [768, 512, 1, True]           
 20                  -1  1   2360320  ultralytics.nn.modules.conv.Conv             [512, 512, 3, 2]              
 21            [-1, 10]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 22                  -1  1   1642496  ultralytics.nn.modules.block.C3k2            [1024, 512, 1, True]          
 23        [16, 19, 22]  1   1412566  ultralytics.nn.modules.head.Detect           [2, [256, 512, 512]]          
YOLO11m summary: 409 layers, 20,054,550 parameters, 20,054,534 gradients, 68.2 GFLOPs

Transferred 643/649 items from pretrained weights
DDP: debug command /home/garylvov/.conda/envs/rover/bin/python3 -m torch.distributed.run --nproc_per_node 4 --master_port 52275 /home/garylvov/.config/Ultralytics/DDP/_temp_ugpib82l137958647273360.py
Ultralytics 8.3.75 🚀 Python-3.11.11 torch-2.2.2+cu121 CUDA:0 (NVIDIA GeForce RTX 3090 Ti, 24142MiB)
                                                       CUDA:1 (NVIDIA GeForce RTX 3090 Ti, 24139MiB)
                                                       CUDA:2 (NVIDIA GeForce RTX 3090, 24154MiB)
                                                       CUDA:3 (NVIDIA GeForce RTX 3090, 24154MiB)
Overriding model.yaml nc=80 with nc=2
Transferred 643/649 items from pretrained weights
Freezing layer 'model.23.dfl.conv.weight'
AMP: running Automatic Mixed Precision (AMP) checks...
AMP: checks passed ✅
train: Scanning /home/garylvov/projects/urc_mallet_model_2025/dataset/train/labels.cache... 27339 images, 6966 backgrounds, 0 corrupt: 100%|██████████| 27339/27339 [00:
train: WARNING ⚠️ /home/garylvov/projects/urc_mallet_model_2025/dataset/train/images/000000278737.jpg: 1 duplicate labels removed
train: WARNING ⚠️ /home/garylvov/projects/urc_mallet_model_2025/dataset/train/images/000000301977.jpg: 1 duplicate labels removed
train: Caching images (41.2GB Disk): 100%|██████████| 27339/27339 [00:00<00:00, 112448.68it/s]
val: Scanning /home/garylvov/projects/urc_mallet_model_2025/dataset/valid/labels.cache... 575 images, 6 backgrounds, 0 corrupt: 100%|██████████| 575/575 [00:00<?, ?it/s
val: Caching images (0.7GB Disk): 100%|██████████| 575/575 [00:00<00:00, 86981.09it/s]
Plotting labels to runs/detect/train37/labels.jpg... 
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically... 
optimizer: SGD(lr=0.01, momentum=0.9) with parameter groups 106 weight(decay=0.0), 113 weight(decay=0.00078125), 112 bias(decay=0.0)
Image sizes 800 train, 800 val
Using 32 dataloader workers
Logging results to runs/detect/train37
Starting training for 100 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      1/100      22.3G       1.51      1.979      1.248          5        800: 100%|██████████| 274/274 [01:59<00:00,  2.30it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:02<00:00,  4.08it/s]
                   all        575       1042      0.796      0.735      0.789      0.498

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      2/100      22.4G      1.578      1.431      1.251         13        800: 100%|██████████| 274/274 [01:59<00:00,  2.30it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:02<00:00,  4.31it/s]
                   all        575       1042      0.742      0.669       0.72      0.409

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      3/100      22.4G      1.813        1.8      1.395         15        800: 100%|██████████| 274/274 [01:58<00:00,  2.31it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:02<00:00,  4.30it/s]
                   all        575       1042      0.457      0.406      0.408      0.198

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      4/100      22.4G      1.959      2.035      1.536          9        800: 100%|██████████| 274/274 [01:59<00:00,  2.29it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:02<00:00,  4.30it/s]
                   all        575       1042      0.714      0.517      0.554      0.321

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      5/100      22.5G      1.852      1.842      1.483         11        800: 100%|██████████| 274/274 [02:03<00:00,  2.21it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:02<00:00,  4.28it/s]
                   all        575       1042      0.725      0.584      0.628      0.336

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      6/100      22.4G       1.78      1.717      1.441         21        800: 100%|██████████| 274/274 [02:01<00:00,  2.25it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:02<00:00,  4.24it/s]
                   all        575       1042      0.737      0.621      0.647      0.386

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      7/100      24.1G      1.732       1.63      1.409         17        800: 100%|██████████| 274/274 [02:04<00:00,  2.20it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:02<00:00,  4.28it/s]
                   all        575       1042      0.814      0.685       0.72      0.431

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      8/100      22.4G      1.693       1.61      1.391         49        800: 100%|██████████| 274/274 [02:03<00:00,  2.22it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:02<00:00,  4.31it/s]
                   all        575       1042      0.835       0.68       0.73      0.457

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      9/100      22.4G      1.661      1.505      1.365         13        800: 100%|██████████| 274/274 [02:03<00:00,  2.22it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:02<00:00,  4.34it/s]
                   all        575       1042      0.808      0.722      0.747      0.459

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
     10/100      22.4G      1.636      1.535      1.359         12        800: 100%|██████████| 274/274 [02:01<00:00,  2.26it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:02<00:00,  4.33it/s]
                   all        575       1042      0.805       0.72      0.743      0.484

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
     11/100      23.3G      1.615      1.454      1.329         11        800: 100%|██████████| 274/274 [02:03<00:00,  2.23it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:02<00:00,  4.30it/s]
                   all        575       1042       0.85      0.721      0.764       0.49

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
     12/100      22.4G      1.587      1.436      1.334         16        800: 100%|██████████| 274/274 [02:05<00:00,  2.19it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:02<00:00,  4.29it/s]
                   all        575       1042       0.83      0.728      0.781      0.491

Case Study IV: Running DeepSeek-R1-Distill-Llama-70B

Running DeepSeek-R1 distillations with Ollama on NVIDIA GPUs requires the NVIDIA Container Toolkit, as shown above. Then it’s as easy as increasing the context window on some 70B-parameter distillations of DeepSeek’s models to push Minerva’s limits.

Click to see the code snippet of running Deepseek-r1:70B with a larger context window
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker exec -it ollama bash
echo \
"FROM deepseek-r1:70b
PARAMETER num_ctx 131072
PARAMETER num_predict 31072" > Modelfile
ollama create deepseek-r1-max-context-and-output-size:70b -f Modelfile
# gathering model components 
# writing manifest 
# success 
ollama run deepseek-r1-max-context-and-output-size:70b --verbose
# After a sample prompt:
# Spills over from VRAM into ~80GB of system RAM, so it's super slow
# total duration:       16.467195206s
# load duration:        12.409235ms
# prompt eval count:    10 token(s)
# prompt eval duration: 893ms
# prompt eval rate:     11.20 tokens/s
# eval count:           45 token(s)
# eval duration:        15.561s
# eval rate:            2.89 tokens/s

# Reset with Ctrl+D, pkill ollama, then redo the docker run and exec steps
# (you may have to docker rm the previous container first)
# Let's fit this whole model onto VRAM.
echo \
"FROM deepseek-r1:70b
PARAMETER num_ctx 60000
PARAMETER num_predict 30000" > Modelfile
ollama create deepseek-r1-60k-context-and-30k:70b -f Modelfile
ollama run deepseek-r1-60k-context-and-30k:70b --verbose

# After simple sample prompt:
# total duration:       2m5.635874804s
# load duration:        13.383071ms
# prompt eval count:    258 token(s)
# prompt eval duration: 509ms
# prompt eval rate:     506.88 tokens/s
# eval count:           1663 token(s)
# eval duration:        2m4.74s
# eval rate:            13.33 tokens/s

# When prompted with the contents of this post...
# total duration:       2m52.68976111s
# load duration:        13.345338ms
# prompt eval count:    16811 token(s)
# prompt eval duration: 45.742s
# prompt eval rate:     367.52 tokens/s
# eval count:           1118 token(s)
# eval duration:        2m6.854s
# eval rate:            8.81 tokens/s

So, with a 60,000-token context window, we can use ~92GB of VRAM while running deepseek-r1:70b, generating 8.81 tokens/s and processing input at 367.52 tokens/s.
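
Once the model is served this way, anything on my local network can query it through Ollama's REST API; a minimal sketch (the prompt is just an example):

# Query the served model via Ollama's REST API
# (the docker run above published it on localhost:11434).
import json
import urllib.request

payload = {
    "model": "deepseek-r1-60k-context-and-30k:70b",
    "prompt": "What tradeoffs come with a larger context window?",  # example prompt
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])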

Cost Analysis - is it worth it?

A 1x NVIDIA H100 80GB node costs roughly $2-$3/hr to rent. However, an H100 can’t do the ray tracing required for running NVIDIA Isaac Lab with cameras; a 2x NVIDIA A6000 48GB node (rentable on Lambda Labs cloud for ~$1.5/hr, 96GB VRAM total) or a 4x NVIDIA L4 24GB node (such as g6.24xlarge on AWS, rentable for ~$6/hr, 96GB VRAM total) can do ray tracing and makes for a fairer comparison. There is also the trouble of provisioning instances online; these nodes are in demand and can’t always be created without waiting.

Minerva cost roughly $8.2k, with most parts coming from eBay. Let’s assume that whatever comparable cloud solution costs, on average, $2/hr. Minerva draws around 1500W at full load, so its electricity cost, assuming an expensive $0.34/kWh (Boston rates), is ~$0.51/hr. For Minerva to make sense financially, I therefore need to run training for about 5,503 hours, or about 230 days. Of course, this doesn’t account for the time value of money (say, if I invested what I saved by renting into $SPY), so I whipped up this Python script to calculate some possible Net Present Value (NPV) outcomes.
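
The raw break-even arithmetic, before any time-value adjustment, is just the upfront cost divided by the hourly savings:

# Break-even hours, using the per-hour figures from the paragraph above
minerva_cost = 8200              # $ upfront
cloud_rate = 2.00                # assumed $/hr for comparable cloud compute
electricity_rate = 1.5 * 0.34    # 1.5kW at $0.34/kWh = $0.51/hr
hours = minerva_cost / (cloud_rate - electricity_rate)
print(f"{hours:,.0f} hours = {hours / 24:,.0f} days")  # 5,503 hours = 229 days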

Click to see the NPV estimation
import math

def calculate_pure_investment_outcome(initial_minerva_cost, cloud_cost_per_hour, 
                                   electricity_cost_per_hour, hours_per_year, 
                                   annual_return, initial_years, investment_years):
    """
    Calculate outcomes where computer usage stops after initial period and money is purely invested.
    
    Args:
        initial_minerva_cost: Initial cost of Minerva
        cloud_cost_per_hour: Hourly cost of cloud computing
        electricity_cost_per_hour: Hourly cost of electricity
        hours_per_year: Annual usage hours
        annual_return: Expected annual return rate
        initial_years: Years of computer usage
        investment_years: Years of pure investment afterwards
        
    Returns:
        list: List of tuples (year, npv, description) for key points
    """
    if annual_return >= 1:
        raise ValueError("Annual return should be expressed as a decimal")
        
    monthly_return = (1 + annual_return) ** (1/12) - 1
    monthly_hours = hours_per_year / 12
    monthly_savings = (cloud_cost_per_hour - electricity_cost_per_hour) * monthly_hours
    
    def position_at_month(month):
        """Calculate net position after given number of months of computer usage"""
        investment_value = initial_minerva_cost * (1 + monthly_return) ** month
        
        if monthly_return == 0:
            savings_value = monthly_savings * month
        else:
            savings_value = monthly_savings * ((1 + monthly_return) ** month - 1) / monthly_return
            
        return savings_value - investment_value, savings_value
    
    results = []
    
    # Initial position (year 0)
    results.append((0, -initial_minerva_cost, "Initial investment"))
    
    # Position at end of computer usage period
    total_months = initial_years * 12
    final_position, final_savings = position_at_month(total_months)
    npv_at_initial = final_position / (1 + monthly_return) ** total_months if monthly_return > 0 else final_position
    results.append((initial_years, npv_at_initial, "End of computer usage"))
    
    # Pure investment period - just let the final position grow
    for year in range(1, investment_years + 1):
        months = year * 12
        # Growth of final position for additional years
        future_value = final_position * (1 + monthly_return) ** months
        # NPV calculation should only discount back from current point in time
        npv = future_value / (1 + monthly_return) ** months
        
        if year == investment_years:  # Only include final year to keep output clean
            results.append((initial_years + year, npv, "End of investment period"))
    
    return results

def print_pure_investment_analysis(minerva_cost, cloud_cost_per_hour, electricity_cost_per_hour,
                                 usage_scenarios, returns, initial_years, investment_years):
    """Print analysis of computer usage followed by pure investment period."""
    for hours in usage_scenarios:
        print(f"\nAnalysis for {hours} hours per year:")
        print("Return (%) | Year | NPV (Dollars) | Stage")
        print("-" * 60)
        
        for ret in returns:
            try:
                results = calculate_pure_investment_outcome(
                    minerva_cost, cloud_cost_per_hour, electricity_cost_per_hour,
                    hours, ret, initial_years, investment_years
                )
                
                for year, npv, description in results:
                    print(f"{ret*100:9.1f} | {year:4d} | {npv:,.2f} | {description}")
                print()
                
            except ValueError as e:
                print(f"{ret*100:9.1f} | Error: {str(e)}")
        print()

if __name__ == "__main__":
    minerva_cost = 8200  # Minerva initial investment ($)
    cloud_cost_per_hour = 2.00  # Cloud rental cost per hour ($)
    electricity_cost_per_hour = .51  # Minerva electricity cost per hour ($)
    
    usage_scenarios = [1300, 1400, 2800]
    returns = [0.05, 0.10, 0.15, 0.20]
    
    print("\n=== Scenario: 6 years usage + 25 years investment ===")
    print_pure_investment_analysis(minerva_cost, cloud_cost_per_hour, electricity_cost_per_hour,
                                 usage_scenarios, returns, 6, 25)
Click to see the NPV results
garylvov@minerva:~$ python3 extended_value.py 

=== Scenario: 6 years usage + 25 years investment ===

Analysis for 1300 hours per year:
Return (%) | Year | NPV (Dollars) | Stage
------------------------------------------------------------
      5.0 |    0 | -8,200.00 | Initial investment
      5.0 |    6 | 1,854.94 | End of computer usage
      5.0 |   31 | 2,485.80 | End of investment period

     10.0 |    0 | -8,200.00 | Initial investment
     10.0 |    6 | 616.14 | End of computer usage
     10.0 |   31 | 1,091.54 | End of investment period

     15.0 |    0 | -8,200.00 | Initial investment
     15.0 |    6 | -378.20 | End of computer usage
     15.0 |   31 | -874.79 | End of investment period

     20.0 |    0 | -8,200.00 | Initial investment
     20.0 |    6 | -1,187.44 | End of computer usage
     20.0 |   31 | -3,545.68 | End of investment period



Analysis for 1400 hours per year:
Return (%) | Year | NPV (Dollars) | Stage
------------------------------------------------------------
      5.0 |    0 | -8,200.00 | Initial investment
      5.0 |    6 | 2,628.40 | End of computer usage
      5.0 |   31 | 3,522.30 | End of investment period

     10.0 |    0 | -8,200.00 | Initial investment
     10.0 |    6 | 1,294.31 | End of computer usage
     10.0 |   31 | 2,292.95 | End of investment period

     15.0 |    0 | -8,200.00 | Initial investment
     15.0 |    6 | 223.48 | End of computer usage
     15.0 |   31 | 516.93 | End of investment period

     20.0 |    0 | -8,200.00 | Initial investment
     20.0 |    6 | -648.01 | End of computer usage
     20.0 |   31 | -1,934.96 | End of investment period



Analysis for 2800 hours per year:
Return (%) | Year | NPV (Dollars) | Stage
------------------------------------------------------------
      5.0 |    0 | -8,200.00 | Initial investment
      5.0 |    6 | 13,456.79 | End of computer usage
      5.0 |   31 | 18,033.39 | End of investment period

     10.0 |    0 | -8,200.00 | Initial investment
     10.0 |    6 | 10,788.62 | End of computer usage
     10.0 |   31 | 19,112.69 | End of investment period

     15.0 |    0 | -8,200.00 | Initial investment
     15.0 |    6 | 8,646.96 | End of computer usage
     15.0 |   31 | 20,000.95 | End of investment period

     20.0 |    0 | -8,200.00 | Initial investment
     20.0 |    6 | 6,903.97 | End of computer usage
     20.0 |   31 | 20,615.15 | End of investment period

According to my NPV estimation, if I run training for roughly 4-6 days a month at full load on average, I could break even within 6 years. This of course bakes in assumptions, such as a 10% average market return and compute/electricity costs remaining similar, but in response I’ll reference one of my favorite quotes: “All models are wrong, but some are useful.”

An argument that my dad and some of my coworkers make is that there is no point in purchasing such a machine when I’ll likely often have the keys to a much more powerful cluster through an employer. There is some truth to this: returning to Minerva after running training on many 8x NVIDIA H100 nodes can feel like going back to a Mercedes after driving a Koenigsegg. That being said, Minerva has the advantage of always being available for my use, while allowing me to retain complete ownership over what I create.

So, to summarize: is Minerva worth it? Financially, maybe. Personally speaking, having such a great resource in my room encourages me to do more training runs more frequently - after all, I have no excuse not to train the many things I’d like to try out. I also really love having complete ownership of my creations, which Minerva allows me to retain. I hope that the work Minerva enables will have intrinsic value as well ;)

Future Upgrades

I designed Minerva to hopefully be my main personal computer for at least the next 10 years - with a motherboard that has 7 PCIe Gen 5 slots, there is a lot of room to upgrade GPUs. Currently, the limiting factor in adding more GPUs is my PSU (and my bank account), as all of its available VGA power connectors are already in use. When I can afford upgrades, I plan to get a second PSU (plugged into an outlet on a different breaker to get around the ~1600W limit of a US outlet) and then add several more GPUs. I’ve seen some really attractive listings for the AMD Radeon Instinct MI60 32GB HBM2 300W, or I may get more NVIDIA 3090 Ti 24GB cards, or maybe even NVIDIA 4080 16GB cards. If I go beyond my current 4 GPUs, I’ll definitely have to 3D print some more modifications to my case to mount them all.

Also, ideally I’d watercool the entire rig, although this would make it more difficult to transport, more complex, more expensive, and harder to maintain, so for now I’ve stuck with air cooling. If I know I won’t be moving for a long time, I’ll definitely watercool. Waterblocks would also let many more GPUs fit in the case thanks to their lower thickness (the current 4x 3-slot cards take up a lot of room that could be reclaimed) while maintaining good temperatures.

There is also the option of bifurcating my PCIe slots to fit more than 7 GPUs, which would be really cool, although past 6 GPUs I’d have to run cards with 8x PCIe lanes instead of the current 16x, which I think would still work well.