DDPM model on the VITON-HD dataset
Install joliGEN
cd
git clone https://github.com/jolibrain/joliGEN.git
cd joliGEN
pip install -r requirements.txt --upgrade
More details: Install and Setup
Prepare the Dataset
Preprocess the dataset provided by VITON-HD (4.4 GB) into a joliGEN-compatible format:
cd
wget --continue https://www.dropbox.com/s/10bfat0kg4si1bu/zalando-hd-resized.zip
python3 ~/joliGEN/scripts/preprocess_viton.py --zip-file zalando-hd-resized.zip --target-dir ~/datasets/VITON-HD/ --dilate 5
This will produce two folders, trainA
and testA
, under the
~/datasets/VITON-HD
folder.
Each of these folders contains:
imgs
: the original images
mask
: the masks for the top clothes area (the orange part of the VITON-HD segmentation)
paths.txt
: the image/mask pairs used for training/testing
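Before training, it can be worth checking that every file referenced by paths.txt actually exists. The sketch below assumes paths.txt lists one space-separated image/mask pair per line, with paths relative to the split folder (this layout is an assumption based on the description above, not a documented joliGEN guarantee):

```python
import os

def check_pairs(split_dir):
    """Return the list of files referenced by paths.txt that are missing.

    Assumes one space-separated "image mask" pair per line, relative to
    split_dir (e.g. "imgs/00006_00.jpg mask/00006_00.png").
    """
    missing = []
    with open(os.path.join(split_dir, "paths.txt")) as f:
        for line in f:
            for rel in line.split():
                if not os.path.isfile(os.path.join(split_dir, rel)):
                    missing.append(rel)
    return missing
```

For example, `check_pairs(os.path.expanduser("~/datasets/VITON-HD/trainA"))` should return an empty list on a correctly preprocessed dataset.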
Train your Diffusion Model
cd ~/joliGEN
python3 train.py \
--dataroot ~/datasets/VITON-HD/ \
--checkpoints_dir ~/checkpoints/ \
--name VITON-HD \
--gpu_ids 0 \
--model_type palette \
--train_batch_size 8 \
--data_num_threads 16 \
--train_iter_size 1 \
--model_input_nc 3 \
--model_output_nc 3 \
--data_relative_paths \
--train_G_ema \
--train_optim adamw \
--data_dataset_mode self_supervised_labeled_mask \
--data_load_size 256 \
--data_crop_size 256 \
--G_netG unet_mha \
--data_online_creation_rand_mask_A \
--train_G_lr 0.0001 \
--train_n_epochs 100 \
--dataaug_no_rotate \
--dataaug_no_flip \
--output_display_freq 20000 \
--output_print_freq 500 \
--output_display_visdom_autostart
If you have multiple GPUs, you can use them by adjusting the
--gpu_ids
option. If you run out of memory, lower the
--train_batch_size
option.
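Conceptually, --model_type palette trains an inpainting diffusion model: noise is added only inside the mask, while the pixels outside the mask are kept intact as conditioning. The toy, framework-free sketch below illustrates this forward noising step; it is an illustration of the idea only, not joliGEN's actual implementation:

```python
import math
import random

def noise_masked(x0, mask, alpha_bar, rng=random.Random(0)):
    """Toy forward-diffusion step for masked inpainting.

    x0: flat list of pixel values; mask: flat list of 0/1 flags.
    Pixels inside the mask are noised to level alpha_bar
    (x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps);
    pixels outside the mask are left untouched as conditioning.
    """
    out = []
    for v, m in zip(x0, mask):
        if m:
            eps = rng.gauss(0.0, 1.0)
            out.append(math.sqrt(alpha_bar) * v + math.sqrt(1 - alpha_bar) * eps)
        else:
            out.append(v)
    return out
```

At alpha_bar close to 1 the masked region is barely noised; as alpha_bar goes to 0 it becomes pure Gaussian noise, which is the "initial noise" column shown in the training visualization below.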
Training Visualization
Open http://localhost:8097/env/VITON-HD to monitor your training.

The columns contain in this order:
original image
conditioning image (unused in this experiment)
initial noise
mask
generated image
More details: Quickstart DDPM: Train a model that adds glasses to a face
Inference
mkdir -p ~/inferences
cd ~/joliGEN/scripts
python3 gen_single_image_diffusion.py \
--model_in_file ~/checkpoints/VITON-HD/latest_net_G_A.pth \
--img_in ~/datasets/VITON-HD/testA/imgs/00006_00.jpg \
--mask_in ~/datasets/VITON-HD/testA/mask/00006_00.png \
--dir_out ~/inferences \
--nb_samples 4 \
--img_width 256 \
--img_height 256
This will produce 4 samples in the ~/inferences
folder.
In the example below, the original image and mask are followed by the 4 generated images:

More details: JoliGEN Inference
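To run inference over the whole test split rather than a single image, the command above can be looped from Python. The sketch below assumes the mask naming produced by preprocess_viton.py (imgs/NAME.jpg paired with mask/NAME.png) and invokes gen_single_image_diffusion.py with the same flags as shown above:

```python
import glob
import os
import subprocess

def infer_all(model, split_dir, out_dir, size=256, nb_samples=4,
              run=subprocess.run):
    """Run gen_single_image_diffusion.py on every image of a test split.

    Assumes imgs/NAME.jpg has a matching mask/NAME.png in split_dir.
    Returns the list of commands that were executed.
    """
    cmds = []
    for img in sorted(glob.glob(os.path.join(split_dir, "imgs", "*.jpg"))):
        name = os.path.splitext(os.path.basename(img))[0]
        mask = os.path.join(split_dir, "mask", name + ".png")
        cmd = ["python3", "gen_single_image_diffusion.py",
               "--model_in_file", model,
               "--img_in", img,
               "--mask_in", mask,
               "--dir_out", out_dir,
               "--nb_samples", str(nb_samples),
               "--img_width", str(size),
               "--img_height", str(size)]
        cmds.append(cmd)
        run(cmd, check=True)
    return cmds
```

Run it from ~/joliGEN/scripts, e.g. `infer_all(os.path.expanduser("~/checkpoints/VITON-HD/latest_net_G_A.pth"), os.path.expanduser("~/datasets/VITON-HD/testA"), os.path.expanduser("~/inferences"))`.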