GPU Memory Differences Between Python Scripts and Jupyter Notebook, and a Question About Increasing Usage

Ahua AIGC Lab (阿华AIGC实验室)

2026-5-15

Solutions to Boost GPU Memory Usage for Python Scripts & Unattended Training on GCP Tesla K80

Hey there! Let’s tackle your problem with the Tesla K80 on GCP—super interesting observation about Jupyter vs shell script performance, and I’ve got some actionable fixes for you.

First: Why Jupyter Might Be Faster & Using More GPU Memory

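Before jumping into fixes, let’s quickly unpack the difference you’re seeing. The notebook itself doesn’t change how the GPU is used; the framework’s memory allocation strategy does. TensorFlow, by default, reserves nearly all available GPU memory up front in any process, which avoids memory fragmentation at runtime. If your shell-launched script uses less memory, it is probably running in "memory growth" mode (allocating only what’s needed at runtime, enabled via set_memory_growth or the TF_FORCE_GPU_ALLOW_GROWTH environment variable), with a memory limit, or in a different Python environment than the notebook kernel. Lower memory use does not by itself explain slower training, so also confirm that both runs use the same batch size, data pipeline, and library versions. Process priority is unlikely to be the main factor.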

1. Boost GPU Memory Usage in Your Python Script

Depending on the framework you’re using (TensorFlow/PyTorch), adjust these settings to match Jupyter’s behavior:

For TensorFlow

Force full GPU memory pre-allocation:
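Add this at the start of your script, before anything touches the GPU, to make sure memory growth is off so TensorFlow reserves GPU memory up front. This is TensorFlow’s default, so it only changes anything if something in your setup turned growth on: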

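import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    try:
        # Keep memory growth off (the default) so TensorFlow reserves GPU memory up front
        tf.config.experimental.set_memory_growth(gpus[0], False)
        # Alternatively, instead of the call above, set a fixed memory limit in MB
        # (a Tesla K80 exposes roughly 11 GB per GPU)
        # tf.config.set_logical_device_configuration(
        #     gpus[0],
        #     [tf.config.LogicalDeviceConfiguration(memory_limit=11441)]
        # )
    except RuntimeError as err:
        # Raised when the GPU was already initialized before this configuration ran
        print(err)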

Verify with nvidia-smi: Run nvidia-smi in the shell while your script is running to confirm memory usage matches Jupyter’s.
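To compare the two runs with numbers instead of eyeballing the table, the nvidia-smi query mode can be read from Python. This is a minimal sketch, assuming nvidia-smi is on the PATH; the helper names parse_gpu_memory and query_gpu_memory are made up for illustration.

```python
import subprocess

# nvidia-smi query mode: one CSV row per GPU, memory values in MiB
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' CSV rows into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (field.strip() for field in line.split(","))
        rows.append(
            {"index": int(index), "used_mib": int(used), "total_mib": int(total)}
        )
    return rows


def query_gpu_memory():
    """Run nvidia-smi and return per-GPU memory usage (needs the NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


# Example use on the GPU VM:
# for gpu in query_gpu_memory():
#     print(f"GPU {gpu['index']}: {gpu['used_mib']} / {gpu['total_mib']} MiB")
```

Running it once while the notebook trains and once while the script trains gives two readings you can compare directly.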

For PyTorch

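Allow full GPU memory usage:
PyTorch’s caching allocator grows on demand and has no switch to pre-allocate the whole GPU. set_per_process_memory_fraction only sets an upper cap (1.0 allows the whole device, which is already the default), so the practical speed-up here comes from cuDNN benchmarking:

import torch

# Cap this process at 100% of GPU 0's memory (a limit, not a pre-allocation)
torch.cuda.set_per_process_memory_fraction(1.0, device=0)
# Enable cuDNN benchmarking for faster training with fixed input sizes
torch.backends.cudnn.benchmark = True

Clear unused memory: torch.cuda.empty_cache() returns cached but unused blocks to the driver, which helps other processes sharing the GPU and makes nvidia-smi readings easier to interpret. It does not give PyTorch itself more memory, and calling it often can slow training, so use it sparingly (for example, after a validation loop).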
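When reading nvidia-smi for a PyTorch process, it helps to separate memory held by live tensors from memory the caching allocator has merely reserved. The sketch below uses torch.cuda.memory_allocated, torch.cuda.memory_reserved, and get_device_properties; the two function names are hypothetical, and the report is skipped when PyTorch or CUDA is not available.

```python
def format_memory_report(allocated_bytes, reserved_bytes, total_bytes):
    """Summarize allocator counters in MiB."""
    mib = 1024 ** 2
    return (
        f"allocated={allocated_bytes / mib:.0f} MiB, "
        f"reserved={reserved_bytes / mib:.0f} MiB, "
        f"total={total_bytes / mib:.0f} MiB"
    )


def report_gpu_memory(device=0):
    """Return a report for `device`, or None when PyTorch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return format_memory_report(
        torch.cuda.memory_allocated(device),  # bytes held by live tensors
        torch.cuda.memory_reserved(device),   # bytes held by the caching allocator
        torch.cuda.get_device_properties(device).total_memory,
    )
```

Calling print(report_gpu_memory()) once per epoch in both the notebook and the script shows whether the two runs really differ in what they allocate.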

General Tips

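Ensure GPU exclusivity: GPUs attached to a Compute Engine VM are dedicated to that VM, so the thing to check is that no other process on the VM (for example, a notebook kernel that is still running) is holding GPU memory. nvidia-smi lists the processes using each GPU.
Raise process priority: Launch your script with higher CPU priority so it gets enough CPU time to feed the GPU. Negative niceness values require root, so the command needs sudo; the script then runs as root, and sudo may pick a different interpreter, so give the full path to the Python you normally use:
```
sudo nice -n -20 /path/to/python your_training_script.py
```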

2. Unattended Training Alternatives to Jupyter

Since Jupyter’s WebSocket timeout is a pain for long runs, use these methods to keep your training running even when you’re disconnected:

Option 1: `nohup` (Simple & Quick)

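Run your script in the background with nohup, which makes it ignore the hangup signal sent when your SSH session closes, and save its output to a log file. The -u flag turns off Python’s output buffering so the log file updates as the script prints:

nohup python -u your_training_script.py > training_logs.txt 2>&1 &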

Check progress later with: tail -f training_logs.txt
Find the process ID (if you need to stop it) with: ps aux | grep your_training_script.py
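An alternative to relying on shell redirection is to have the script write its own log file. The sketch below uses the standard logging module; the logger name "training" and the helper setup_logging are illustrative choices, not part of the original answer.

```python
import logging


def setup_logging(log_path):
    """Return a logger that appends timestamped lines to log_path."""
    logger = logging.getLogger("training")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate lines if this is called twice
    handler = logging.FileHandler(log_path)  # appends, flushing after each record
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    logger.propagate = False  # keep records out of the root logger
    return logger


# Example use inside a training loop:
# log = setup_logging("training_logs.txt")
# log.info("epoch %d loss %.3f", epoch, loss)
```

Because each record is flushed as it is written, tail -f on the log file shows progress without waiting for a buffer to fill.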

Option 2: `tmux` or `screen` (Persistent Sessions)

Create a persistent terminal session that survives SSH disconnections:

Install tmux (if not already installed): sudo apt install tmux
Create a new session: tmux new -s training_session
Run your training script inside the session
Detach from the session with Ctrl+B followed by D
Reconnect later with: tmux attach -t training_session

Option 3: GCP AI Platform Jobs (Managed Cloud Training)

Submit your training as a managed job on GCP AI Platform—this lets GCP handle the infrastructure, and you don’t have to worry about keeping an SSH connection alive:

Package your script and dependencies into a Docker container or use GCP’s pre-built ML images

Submit the job via gcloud CLI:

gcloud ai-platform jobs submit training JOB_NAME \
    --region=us-central1 \
    --master-image-uri=gcr.io/cloud-ml-train/tf-gpu.2-6 \
    --scale-tier=BASIC_GPU \
    --module-name=trainer.task \
    --package-path=./trainer \
    --job-dir=gs://your-bucket/job-dir

Monitor progress via the GCP Console or gcloud ai-platform jobs describe JOB_NAME
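The command above expects a Python package named trainer with a task module as its entry point. A minimal sketch of what trainer/task.py might look like follows, assuming the service passes --job-dir through to the module as a command-line argument; the --epochs flag and its default are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the arguments the training entry point receives."""
    parser = argparse.ArgumentParser(description="Hypothetical training entry point")
    parser.add_argument(
        "--job-dir",
        default="./job-dir",
        help="Where to write checkpoints and exported models",
    )
    parser.add_argument("--epochs", type=int, default=10)
    # Ignore arguments this sketch does not know about
    args, _unknown = parser.parse_known_args(argv)
    return args


def main(argv=None):
    args = parse_args(argv)
    print(f"Training for {args.epochs} epochs, writing outputs to {args.job_dir}")
    # Model building and the training loop would go here.
    return args


if __name__ == "__main__":
    main()
```

The same module can be smoke-tested on the VM with python -m trainer.task --job-dir ./job-dir before submitting a job.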

Option 4: `systemd` Service (Long-Running, Auto-Restart)

For stable, long-running training scripts, create a systemd service to manage the process (it will auto-restart if the script crashes):

Create a service file at /etc/systemd/system/training.service:

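[Unit]
Description=ML Training Script
After=network.target

[Service]
User=your-gcp-username
WorkingDirectory=/path/to/your/script/folder
# Use absolute paths for both the interpreter and the script
ExecStart=/usr/bin/python3 /path/to/your/script/folder/your_training_script.py
# Restart only after a crash, so a run that finishes cleanly is not started again
Restart=on-failure
StandardOutput=append:/var/log/training.log
StandardError=append:/var/log/training_errors.log

[Install]
WantedBy=multi-user.target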

Reload systemd and start the service:

sudo systemctl daemon-reload
sudo systemctl start training.service

Check status with: sudo systemctl status training.service
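Because the service reruns the script from the top after a crash, an automatic restart only saves time if the script can pick up where it left off. Here is a minimal sketch of that idea, assuming progress is tracked in a small JSON file; the file layout and function names are hypothetical, and real training would also save model weights with the framework’s own checkpoint functions.

```python
import json
import os


def save_state(path, state):
    """Write state atomically so a crash mid-write cannot corrupt the file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # atomic rename over the previous checkpoint


def load_state(path):
    """Return the saved state, or a fresh one when no checkpoint exists yet."""
    if not os.path.exists(path):
        return {"epoch": 0}
    with open(path) as f:
        return json.load(f)


def train(path, total_epochs):
    """Resume from the last completed epoch recorded at path."""
    state = load_state(path)
    for epoch in range(state["epoch"], total_epochs):
        # One epoch of real training would run here.
        state["epoch"] = epoch + 1
        save_state(path, state)
    return state
```

With this pattern, a restart after a crash in epoch 7 resumes at epoch 7 instead of epoch 0.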

Final Notes

Start by making sure your script and the notebook use the same GPU memory allocation settings and the same environment; that is the most likely way to close the speed gap. Then pick an unattended training method that fits your workflow (nohup/tmux for quick runs, systemd for auto-restart, AI Platform for managed cloud training).

The question discussed here comes from Stack Exchange; it was asked by Vibhor Kalra.