# JoliGEN Options

Here are all the available options to call with `train.py`.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --checkpoints_dir | string | ./checkpoints | models are saved here |
| --dataroot | string | None | path to images (should have subfolders trainA, trainB, valA, valB, etc.) |
| --ddp_port | string | 12355 |  |
| --gpu_ids | string | 0 | gpu ids, e.g. 0; 0,1,2; 0,2; use -1 for CPU |
| --model_type | string | cut | chooses which model to use. Values: cut, cycle_gan, palette, cm, cm_gan |
| --name | string | experiment_name | name of the experiment; it decides where samples and models are stored |
| --phase | string | train | train, val, test, etc. |
| --suffix | string |  | customized suffix: opt.name = opt.name + suffix, e.g. {model}_{netG}_size{load_size} |
| --test_batch_size | int | 1 | input batch size |
| --warning_mode | flag |  | whether to display warnings |
| --with_amp | flag |  | whether to activate torch amp on forward passes |
| --with_tf32 | flag |  | whether to activate tf32 for faster computations (Ampere GPUs and beyond only) |
| --with_torch_compile | flag |  | whether to activate torch.compile for some forward and backward functions (experimental) |
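For example, a minimal launch might look like the following sketch (the dataset path and experiment name are placeholders to adapt to your setup):

```bash
# Minimal sketch: train the default CUT model on GPU 0.
# /path/to/dataset and my_experiment are placeholders.
python3 train.py \
  --dataroot /path/to/dataset \
  --checkpoints_dir ./checkpoints \
  --name my_experiment \
  --model_type cut \
  --gpu_ids 0
```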

## Discriminator

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --D_dropout | flag |  | whether to use dropout in the discriminator |
| --D_n_layers | int | 3 | only used if netD==n_layers |
| --D_ndf | int | 64 | # of discriminator filters in the first conv layer |
| --D_netDs | array | ['projected_d', 'basic'] | specify the discriminator architectures; another option, --D_n_layers, sets the number of layers in the n_layers discriminator. NB: duplicated arguments are ignored. Values: basic, n_layers, pixel, projected_d, temporal, vision_aided, depth, mask, sam |
| --D_no_antialias | flag |  | if specified, use stride=2 convs instead of antialiased downsampling (sad) |
| --D_no_antialias_up | flag |  | if specified, use [upconv(learned filter)] instead of [upconv(hard-coded [1,3,3,1] filter), conv] |
| --D_norm | string | instance | instance normalization or batch normalization for D. Values: instance, batch, none |
| --D_proj_config_segformer | string | models/configs/segformer/segformer_config_b0.json | path to segformer configuration file |
| --D_proj_interp | int | -1 | whether to force projected discriminator interpolation to a value > 224; -1 means no interpolation |
| --D_proj_network_type | string | efficientnet | projected discriminator architecture. Values: efficientnet, segformer, vitbase, vitsmall, vitsmall2, vitclip16, vitclip14, depth, dinov2_vits14, dinov2_vitb14, dinov2_vitl14, dinov2_vitg14, dinov2_vits14_reg, dinov2_vitb14_reg, dinov2_vitl14_reg, dinov2_vitg14_reg, siglip_vitb16, siglip_vitl16, siglip_vit_so400m |
| --D_proj_weight_segformer | string | models/configs/segformer/pretrain/segformer_mit-b0.pth | path to segformer weights |
| --D_spectral | flag |  | whether to use spectral norm in the discriminator |
| --D_temporal_every | int | 4 | apply the temporal discriminator every x steps |
| --D_vision_aided_backbones | string | clip+dino+swin | specify the vision-aided discriminator architectures; they are frozen, then their outputs are combined and fitted with a linear network on top. Choose from dino, clip, swin, det_coco, seg_ade and combine them with + |
| --D_weight_sam | string |  | path to sam weights for D, e.g. models/configs/sam/pretrain/sam_vit_b_01ec64.pth, or models/configs/sam/pretrain/mobile_sam.pt for MobileSAM |
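As an illustrative sketch, several discriminators can be stacked; here a projected discriminator with a DINOv2 backbone is combined with the basic one (array options such as --D_netDs are assumed to take space-separated values):

```bash
# Sketch: projected + basic discriminators with spectral norm.
python3 train.py \
  --dataroot /path/to/dataset \
  --name disc_example \
  --D_netDs projected_d basic \
  --D_proj_network_type dinov2_vits14 \
  --D_spectral \
  --D_ndf 64
```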

## Generator

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --G_attn_nb_mask_attn | int | 10 | number of attention masks in _attn model architectures |
| --G_attn_nb_mask_input | int | 1 | number of masks dedicated to the input in _attn model architectures |
| --G_backward_compatibility_twice_resnet_blocks | flag |  | if true, features go through the resnet blocks twice in resnet_attn generators. This option will be deleted; it exists for backward compatibility (old models were trained that way) |
| --G_config_segformer | string | models/configs/segformer/segformer_config_b0.json | path to segformer configuration file for G |
| --G_diff_n_timestep_test | int | 1000 | number of timesteps used for UNet mha inference (test time) |
| --G_diff_n_timestep_train | int | 2000 | number of timesteps used for UNet mha training |
| --G_dropout | flag |  | dropout for the generator |
| --G_hdit_depths | array | [2, 2, 4] | distribution of depth blocks across the HDiT stages; should have the same size as --G_hdit_widths |
| --G_hdit_patch_size | int | 4 | patch size for HDiT, e.g. 4 for 4x4 patches |
| --G_hdit_widths | array | [192, 384, 768] | width multiplier for each level of the HDiT |
| --G_lora_unet | int | 8 | lora unet rank for G |
| --G_lora_vae | int | 8 | lora vae rank for G |
| --G_nblocks | int | 9 | # of layer blocks in G, applicable to resnets |
| --G_netE | string | resnet_256 | specify the multimodal latent vector encoder. Values: resnet_128, resnet_256, resnet_512, conv_128, conv_256, conv_512 |
| --G_netG | string | mobile_resnet_attn | specify the generator architecture. Values: resnet, resnet_attn, mobile_resnet, mobile_resnet_attn, unet_256, unet_128, segformer_attn_conv, segformer_conv, ittr, unet_mha, uvit, unet_mha_ref_attn, dit, hdit, img2img_turbo, unet_vid |
| --G_ngf | int | 64 | # of generator filters in the last conv layer |
| --G_norm | string | instance | instance normalization or batch normalization for G. Values: instance, batch, none |
| --G_padding_type | string | reflect | type of padding to use in the generator. Values: reflect, replicate, zeros |
| --G_spectral | flag |  | whether to use spectral norm in the generator |
| --G_unet_mha_attn_res | array | [16] | downsampling rates at which attention takes place |
| --G_unet_mha_channel_mults | array | [1, 2, 4, 8] | channel multiplier for each level of the UNet mha |
| --G_unet_mha_group_norm_size | int | 32 |  |
| --G_unet_mha_norm_layer | string | groupnorm | Values: groupnorm, batchnorm, layernorm, instancenorm, switchablenorm |
| --G_unet_mha_num_head_channels | int | 32 | number of channels in each head of the mha architecture |
| --G_unet_mha_num_heads | int | 1 | number of heads in the mha architecture |
| --G_unet_mha_res_blocks | array | [2, 2, 2, 2] | distribution of resnet blocks across the UNet stages; should have the same size as --G_unet_mha_channel_mults |
| --G_unet_mha_vit_efficient | flag |  | if true, use efficient attention in UNet and UViT |
| --G_unet_vid_max_frame | int | 24 | max frame number for unet_vid in the PositionalEncoding |
| --G_uvit_num_transformer_blocks | int | 6 | number of transformer blocks in UViT |
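For a diffusion-style generator, the UNet mha options combine per stage, as in this hedged sketch (the values mirror the defaults; array options are assumed to take space-separated values):

```bash
# Sketch: UNet with multi-head attention, configured stage by stage.
python3 train.py \
  --dataroot /path/to/dataset \
  --name gen_example \
  --G_netG unet_mha \
  --G_unet_mha_channel_mults 1 2 4 8 \
  --G_unet_mha_res_blocks 2 2 2 2 \
  --G_unet_mha_attn_res 16 \
  --G_diff_n_timestep_train 2000 \
  --G_diff_n_timestep_test 1000
```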

## Algorithm-specific

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --alg_cm_dists_mean | array | [0.485, 0.456, 0.406] | mean for DISTS perceptual loss |
| --alg_cm_dists_std | array | [0.229, 0.224, 0.225] | std for DISTS perceptual loss |
| --alg_cm_lambda_perceptual | float | 1.0 | weight for LPIPS and DISTS perceptual losses |
| --alg_cm_num_steps | int | 1000000 | number of steps before reaching the fully discretized consistency model sampling schedule |
| --alg_cm_perceptual_loss | array | [''] | optional supervised perceptual loss. Values: , LPIPS, DISTS |
| --alg_diffusion_cond_computed_sketch_list | array | ['canny', 'hed'] | which primitives to use for the random sketch |
| --alg_diffusion_cond_embed | string |  | whether to use conditioning embeddings in the generator layers, and of what type. Values: , mask, class, mask_and_class, ref |
| --alg_diffusion_cond_embed_dim | int | 32 | dimension of the conditioning embedding |
| --alg_diffusion_cond_image_creation | string | y_t | how the conditioning image is created: from y_t (no conditioning), from the previous frame, from a computed sketch (e.g. canny), from a low-resolution image, or from a reference image (i.e. an image that is not aligned with the ground truth). Values: y_t, previous_frame, computed_sketch, low_res, ref |
| --alg_diffusion_cond_prob_use_previous_frame | float | 0.5 | probability to use the previous frame as the y conditioning |
| --alg_diffusion_cond_sam_crop_delta | flag |  | extend the crop's width and height by 2*crop_delta before computing masks |
| --alg_diffusion_cond_sam_final_canny | flag |  | whether to perform a Canny edge detection on the sam sketch to soften the edges |
| --alg_diffusion_cond_sam_max_mask_area | float | 0.99 | maximum area, as a proportion of image size, for a mask to be kept |
| --alg_diffusion_cond_sam_min_mask_area | float | 0.001 | minimum area, as a proportion of image size, for a mask to be kept |
| --alg_diffusion_cond_sam_no_output_binary_sam | flag |  | whether not to output a binary sketch before Canny |
| --alg_diffusion_cond_sam_no_sample_points_in_ellipse | flag |  | whether not to sample the points inside an ellipse, to avoid the corners of the image |
| --alg_diffusion_cond_sam_no_sobel_filter | flag |  | whether not to use a Sobel filter on each SAM mask |
| --alg_diffusion_cond_sam_points_per_side | int | 16 | number of points per image side to prompt SAM with (# of prompted points will be points_per_side**2) |
| --alg_diffusion_cond_sam_redundancy_threshold | float | 0.62 | redundancy threshold above which redundant masks are not kept |
| --alg_diffusion_cond_sam_sobel_threshold | float | 0.7 | sobel threshold in % of gradient magnitude |
| --alg_diffusion_cond_sam_use_gaussian_filter | flag |  | whether to apply a Gaussian blur to each SAM mask |
| --alg_diffusion_cond_sketch_canny_range | array | [0, 765] | range of randomized canny sketch thresholds |
| --alg_diffusion_dropout_prob | float | 0.0 | dropout probability for classifier-free guidance |
| --alg_diffusion_generate_per_class | flag |  | whether to generate samples for each class |
| --alg_diffusion_lambda_G | float | 1.0 | weight for supervised loss |
| --alg_diffusion_ref_embed_net | string | clip | embedding network to use for ref conditioning. Values: clip, imagebind |
| --alg_diffusion_super_resolution_scale | float | 2.0 | scale for super resolution |
| --alg_diffusion_task | string | inpainting | whether to perform inpainting, super resolution or pix2pix. Values: inpainting, super_resolution, pix2pix |
| --alg_diffusion_vid_canny_dropout | array | [[]] | the range of probabilities for dropping the canny conditioning for each frame |
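Putting some of these together, a sketch-conditioned inpainting setup might look like the following; this is a sketch under stated assumptions (placeholder paths, space-separated array values), not a definitive recipe:

```bash
# Sketch: diffusion inpainting conditioned on a randomized Canny/HED sketch.
python3 train.py \
  --dataroot /path/to/dataset \
  --name diff_sketch_example \
  --model_type palette \
  --alg_diffusion_task inpainting \
  --alg_diffusion_cond_image_creation computed_sketch \
  --alg_diffusion_cond_computed_sketch_list canny hed \
  --alg_diffusion_cond_sketch_canny_range 0 765 \
  --alg_diffusion_cond_embed mask
```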

## GAN model

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --alg_gan_lambda | float | 1.0 | weight for GAN loss: GAN(G(X)) |

## CUT model

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --alg_cut_HDCE_gamma | float | 1.0 |  |
| --alg_cut_HDCE_gamma_min | float | 1.0 |  |
| --alg_cut_MSE_idt | flag |  | use MSE loss for identity mapping: MSE(G(Y), Y) |
| --alg_cut_dists_mean | array | [0.485, 0.456, 0.406] | mean for DISTS perceptual loss |
| --alg_cut_dists_std | array | [0.229, 0.224, 0.225] | std for DISTS perceptual loss |
| --alg_cut_flip_equivariance | flag |  | enforce flip-equivariance as additional regularization. It is used by FastCUT, but not CUT |
| --alg_cut_lambda_MSE_idt | float | 1.0 | weight for MSE identity loss: MSE(G(X), X) |
| --alg_cut_lambda_NCE | float | 1.0 | weight for NCE loss: NCE(G(X), X) |
| --alg_cut_lambda_SRC | float | 0.0 | weight for SRC (semantic relation consistency) loss: NCE(G(X), X) |
| --alg_cut_lambda_perceptual | float | 1.0 | weight for LPIPS and DISTS perceptual losses |
| --alg_cut_lambda_supervised | float | 1.0 | weight for supervised loss |
| --alg_cut_nce_T | float | 0.07 | temperature for NCE loss |
| --alg_cut_nce_idt | flag |  | use NCE loss for identity mapping: NCE(G(Y), Y) |
| --alg_cut_nce_includes_all_negatives_from_minibatch | flag |  | (used for single-image translation) if true, include the negatives from the other samples of the minibatch when computing the contrastive loss. Please see models/patchnce.py for more details |
| --alg_cut_nce_layers | string | 0,4,8,12,16 | which layers to compute the NCE loss on |
| --alg_cut_nce_loss | string | monce | CUT contrastive loss. Values: patchnce, monce, SRC_hDCE |
| --alg_cut_netF | string | mlp_sample | how to downsample the feature map. Values: sample, mlp_sample, sample_qsattn, mlp_sample_qsattn |
| --alg_cut_netF_dropout | flag |  | whether to use dropout with F |
| --alg_cut_netF_nc | int | 256 |  |
| --alg_cut_netF_norm | string | instance | instance normalization or batch normalization for F. Values: instance, batch, none |
| --alg_cut_num_patches | int | 256 | number of patches per layer |
| --alg_cut_supervised_loss | array | [''] | supervised loss with aligned data. Values: , MSE, L1, LPIPS, DISTS |
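A hedged example of a CUT run that enables the identity NCE term on top of the defaults:

```bash
# Sketch: CUT with identity NCE loss and the default MoNCE contrastive loss.
python3 train.py \
  --dataroot /path/to/dataset \
  --name cut_example \
  --model_type cut \
  --alg_cut_nce_idt \
  --alg_cut_nce_loss monce \
  --alg_cut_nce_layers 0,4,8,12,16 \
  --alg_cut_num_patches 256
```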

## CycleGAN model

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --alg_cyclegan_lambda_A | float | 10.0 | weight for cycle loss (A -> B -> A) |
| --alg_cyclegan_lambda_B | float | 10.0 | weight for cycle loss (B -> A -> B) |
| --alg_cyclegan_lambda_identity | float | 0.5 | use identity mapping. Setting lambda_identity to a value other than 0 scales the weight of the identity mapping loss. For example, if the weight of the identity loss should be 10 times smaller than the weight of the reconstruction loss, set lambda_identity = 0.1 |
| --alg_cyclegan_rec_noise | float | 0.0 | amount of noise to add to the reconstruction |
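Following the lambda_identity note above, this sketch makes the identity loss 10 times smaller than the cycle (reconstruction) losses:

```bash
# Sketch: CycleGAN with identity weight = 0.1 * cycle weight.
python3 train.py \
  --dataroot /path/to/dataset \
  --name cyclegan_example \
  --model_type cycle_gan \
  --alg_cyclegan_lambda_A 10.0 \
  --alg_cyclegan_lambda_B 10.0 \
  --alg_cyclegan_lambda_identity 0.1
```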

## ReCUT / ReCycleGAN

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --alg_re_P_lr | float | 0.0002 | initial learning rate for P networks |
| --alg_re_adversarial_loss_p | flag |  | if true, also train the prediction model with an adversarial loss |
| --alg_re_netP | string | unet_128 | specify the P architecture. Values: resnet_9blocks, resnet_6blocks, resnet_attn, unet_256, unet_128 |
| --alg_re_no_train_P_fake_images | flag |  | if true, P won't be trained on fake image projections |
| --alg_re_nuplet_size | int | 3 | number of frames loaded |
| --alg_re_projection_threshold | float | 1.0 | threshold of the real-image projection loss below which the fake projection and fake reconstruction losses are applied |

## Diffusion model

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --alg_palette_ddim_eta | float | 0.5 | eta for ddim sampling variance |
| --alg_palette_ddim_num_steps | int | 10 | number of steps for ddim sampling |
| --alg_palette_loss | string | MSE | loss type of the denoising model. Values: L1, MSE, multiscale_L1, multiscale_MSE |
| --alg_palette_minsnr | flag |  | use min-SNR weighting |
| --alg_palette_sampling_method | string | ddpm | choose the sampling method between ddpm and ddim. Values: ddpm, ddim |
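For faster sampling, DDIM can replace the default DDPM, as in this sketch:

```bash
# Sketch: 10-step DDIM sampling for the palette diffusion model.
python3 train.py \
  --dataroot /path/to/dataset \
  --name palette_example \
  --model_type palette \
  --alg_palette_sampling_method ddim \
  --alg_palette_ddim_num_steps 10 \
  --alg_palette_ddim_eta 0.5
```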

## Datasets

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --data_crop_size | int | 256 | then crop to this size |
| --data_dataset_mode | string | unaligned | chooses how datasets are loaded. Values: unaligned, unaligned_labeled_cls, unaligned_labeled_mask, self_supervised_labeled_mask, unaligned_labeled_mask_cls, self_supervised_labeled_mask_cls, unaligned_labeled_mask_online, self_supervised_labeled_mask_online, unaligned_labeled_mask_cls_online, self_supervised_labeled_mask_cls_online, aligned, nuplet_unaligned_labeled_mask, temporal_labeled_mask_online, self_supervised_temporal_labeled_mask_online, self_supervised_temporal, single, unaligned_labeled_mask_ref, self_supervised_labeled_mask_ref, unaligned_labeled_mask_online_ref, unaligned_labeled_mask_online_prompt, self_supervised_labeled_mask_online_ref |
| --data_direction | string | AtoB | AtoB or BtoA. Values: AtoB, BtoA |
| --data_image_bits | int | 8 | bit depth of the images (e.g. 8, 12 or 16) |
| --data_inverted_mask | flag |  | whether to invert the mask, i.e. around the bbox |
| --data_load_size | int | 286 | scale images to this size |
| --data_max_dataset_size | int | 1000000000 | maximum number of samples allowed per dataset. If the dataset directory contains more than max_dataset_size, only a subset is loaded |
| --data_num_threads | int | 4 | # of threads for loading data |
| --data_online_context_pixels | int | 0 | context pixel band around the crop; unused for generation, only for the discriminator |
| --data_online_fixed_mask_size | int | -1 | if > 0, it will be used as the fixed bbox size (warning: in dataset resolution, i.e. before resizing) |
| --data_online_random_bbox | flag |  | whether to randomly sample a bbox per online crop |
| --data_online_select_category | int | -1 | category to select for bounding boxes; -1 means all boxes are selected |
| --data_online_single_bbox | flag |  | whether to only allow a single bbox per online crop |
| --data_preprocess | string | resize_and_crop | scaling and cropping of images at load time. Values: resize_and_crop, crop, scale_width, scale_width_and_crop, none |
| --data_refined_mask | flag |  | whether to use masks refined with sam |
| --data_relative_paths | flag |  | whether paths to images are relative to dataroot |
| --data_sanitize_paths | flag |  | if true, invalid image or label paths are removed before training |
| --data_serial_batches | flag |  | if true, takes images in order to make batches, otherwise takes them randomly |
| --data_temporal_frame_step | int | 30 | number of frames between successive selected frames |
| --data_temporal_num_common_char | int | -1 | how many characters (the first ones) are used to identify a video; if -1, natural sorting is used |
| --data_temporal_number_frames | int | 5 | how many successive frames to use in the temporal loader |
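As a sketch, an unaligned dataset is expected to follow the trainA/trainB layout mentioned for --dataroot, and the loader options then combine like this:

```bash
# Sketch: unaligned dataset, resized to 286 then cropped to 256.
# Assumed layout under /path/to/dataset: trainA/ trainB/ valA/ valB/
python3 train.py \
  --dataroot /path/to/dataset \
  --name data_example \
  --data_dataset_mode unaligned \
  --data_preprocess resize_and_crop \
  --data_load_size 286 \
  --data_crop_size 256 \
  --data_direction AtoB
```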

## Online created datasets

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --data_online_creation_color_mask_A | flag |  | perform the task of replacing color-filled masks by objects |
| --data_online_creation_crop_delta_A | int | 50 | crop sizes are random: allowed values are online_creation_crop_size plus or minus online_creation_crop_delta, for domain A |
| --data_online_creation_crop_delta_B | int | 50 | crop sizes are random: allowed values are online_creation_crop_size plus or minus online_creation_crop_delta, for domain B |
| --data_online_creation_crop_size_A | int | 512 | crop to this size during online creation; it needs to be greater than the bbox size, for domain A |
| --data_online_creation_crop_size_B | int | 512 | crop to this size during online creation; it needs to be greater than the bbox size, for domain B |
| --data_online_creation_load_size_A | array | [] | load to this size during online creation; format: width height, or a single size if square |
| --data_online_creation_load_size_B | array | [] | load to this size during online creation; format: width height, or a single size if square |
| --data_online_creation_mask_delta_A | array | [[]] | mask offset (in pixels) to allow the generation of a bigger object in domain B (for the semantic loss), for domain A; format: 'width (x),height (y)' for each class, or a single size if square, e.g. '125, 55 100, 100' for 2 classes |
| --data_online_creation_mask_delta_A_ratio | array | [[]] | mask offset ratio to allow the generation of a bigger object in domain B (for the semantic loss), for domain A; format: width (x),height (y) for each class, or a single size if square |
| --data_online_creation_mask_delta_B | array | [[]] | mask offset (in pixels) to allow the generation of a bigger object in domain A (for the semantic loss), for domain B; format: 'width (x),height (y)' for each class, or a single size if square, e.g. '125, 55 100, 100' for 2 classes |
| --data_online_creation_mask_delta_B_ratio | array | [[]] | mask offset ratio to allow the generation of a bigger object in domain A (for the semantic loss), for domain B; format: 'width (x),height (y)' for each class, or a single size if square |
| --data_online_creation_mask_random_offset_A | array | [0.0] | ratio of mask size randomization (only to make it bigger) to robustify the image generation in domain A; format: width (x) height (y), or a single size if square |
| --data_online_creation_mask_random_offset_B | array | [0.0] | ratio of mask size randomization (only to make it bigger) to robustify the image generation in domain B; format: width (x) height (y), or a single size if square |
| --data_online_creation_mask_square_A | flag |  | whether masks should be squared for domain A |
| --data_online_creation_mask_square_B | flag |  | whether masks should be squared for domain B |
| --data_online_creation_rand_mask_A | flag |  | perform the task of replacing noised masks by objects |
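An illustrative sketch of online crop creation around labeled bounding boxes; the crop and delta values are examples, and the mask delta is assumed to follow the 'width,height per class' format above:

```bash
# Sketch: online 512±50 crops with a 25,25 pixel mask offset for one class.
python3 train.py \
  --dataroot /path/to/dataset \
  --name online_example \
  --data_dataset_mode unaligned_labeled_mask_online \
  --data_online_creation_crop_size_A 512 \
  --data_online_creation_crop_delta_A 50 \
  --data_online_creation_crop_size_B 512 \
  --data_online_creation_crop_delta_B 50 \
  --data_online_creation_mask_delta_A 25,25
```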

## Semantic segmentation network

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --f_s_all_classes_as_one | flag |  | if true, all classes will be considered as a single one (i.e. foreground vs background) |
| --f_s_class_weights | array | [] | class weights for imbalanced semantic classes |
| --f_s_config_segformer | string | models/configs/segformer/segformer_config_b0.json | path to segformer configuration file for f_s |
| --f_s_dropout | flag |  | dropout for the semantic network |
| --f_s_net | string | vgg | specify the f_s network. Values: vgg, unet, segformer, sam |
| --f_s_nf | int | 64 | # of filters in the first conv layer of the classifier |
| --f_s_semantic_nclasses | int | 2 | number of classes of the semantic loss classifier |
| --f_s_semantic_threshold | float | 1.0 | threshold of the semantic classifier loss below which the semantic loss is applied |
| --f_s_weight_sam | string |  | path to sam weights for f_s, e.g. models/configs/sam/pretrain/sam_vit_b_01ec64.pth, or models/configs/sam/pretrain/mobile_sam.pt for MobileSAM |
| --f_s_weight_segformer | string |  | path to segformer weights for f_s, e.g. models/configs/segformer/pretrain/segformer_mit-b0.pth |

## Semantic classification network

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --cls_all_classes_as_one | flag |  | if true, all classes will be considered as a single one (i.e. foreground vs background) |
| --cls_class_weights | array | [] | class weights for imbalanced semantic classes |
| --cls_config_segformer | string | models/configs/segformer/segformer_config_b0.json | path to segformer configuration file for cls |
| --cls_dropout | flag |  | dropout for the semantic network |
| --cls_net | string | vgg | specify the cls network. Values: vgg, unet, segformer |
| --cls_nf | int | 64 | # of filters in the first conv layer of the classifier |
| --cls_semantic_nclasses | int | 2 | number of classes of the semantic loss classifier |
| --cls_semantic_threshold | float | 1.0 | threshold of the semantic classifier loss below which the semantic loss is applied |
| --cls_weight_segformer | string |  | path to segformer weights for cls, e.g. models/configs/segformer/pretrain/segformer_mit-b0.pth |

## Output

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --output_no_html | flag |  | do not save intermediate training results to [opt.checkpoints_dir]/[opt.name]/web/ |
| --output_num_images | int | 20 | number of visualized image results from the train/test set |
| --output_print_freq | int | 100 | frequency of showing training results on the console |
| --output_update_html_freq | int | 1000 | frequency of saving training results to html |
| --output_verbose | flag |  | if specified, print more debugging information |

## Visdom display

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --output_display_G_attention_masks | flag |  |  |
| --output_display_aim_port | int | 53800 | aim port of the web display |
| --output_display_aim_server | string | http://localhost | aim server of the web display |
| --output_display_diff_fake_real | flag |  | if true, x - G(x) is displayed |
| --output_display_env | string |  | visdom display environment name (default is "main") |
| --output_display_freq | int | 400 | frequency of showing training results on screen |
| --output_display_id | int | 1 | window id of the web display |
| --output_display_ncols | int | 0 | if positive, display all images in a single visdom web panel with a given number of images per row (if 0, ncols is computed automatically) |
| --output_display_networks | flag |  | set to true to display networks on port 8000 |
| --output_display_type | array | ['visdom'] | output display: visdom, aim, or no output. Values: visdom, aim, none |
| --output_display_visdom_autostart | flag |  | whether to start a visdom server automatically |
| --output_display_visdom_port | int | 8097 | visdom port of the web display |
| --output_display_visdom_server | string | http://localhost | visdom server of the web display |
| --output_display_winsize | int | 256 | display window size for both visdom and HTML |
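For instance, a hedged display setup that keeps the default visdom port and auto-starts the server:

```bash
# Sketch: visdom output on the default port, started automatically.
python3 train.py \
  --dataroot /path/to/dataset \
  --name display_example \
  --output_display_type visdom \
  --output_display_visdom_port 8097 \
  --output_display_visdom_autostart \
  --output_display_env display_example
```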

## Model

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --model_depth_network | string | DPT_Large | specify the depth prediction network architecture. Values: DPT_Large, DPT_Hybrid, MiDaS_small, DPT_BEiT_L_512, DPT_BEiT_L_384, DPT_BEiT_B_384, DPT_SwinV2_L_384, DPT_SwinV2_B_384, DPT_SwinV2_T_256, DPT_Swin_L_384, DPT_Next_ViT_L_384, DPT_LeViT_224 |
| --model_init_gain | float | 0.02 | scaling factor for normal, xavier and orthogonal initialization |
| --model_init_type | string | normal | network initialization. Values: normal, xavier, kaiming, orthogonal |
| --model_input_nc | int | 3 | # of input image channels: 3 for RGB and 1 for grayscale; more are supported |
| --model_multimodal | flag |  | multimodal model with random latent input vector |
| --model_output_nc | int | 3 | # of output image channels: 3 for RGB and 1 for grayscale |
| --model_prior_321_backwardcompatibility | flag |  | whether to load models from a previous version of JoliGEN |
| --model_type_sam | string | mobile_sam | which model to use for segment-anything mask generation. Values: sam, mobile_sam |

## Training

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --train_D_accuracy_every | int | 1000 | compute D accuracy every N iterations |
| --train_D_lr | float | 0.0001 | separate learning rate for the discriminator |
| --train_G_ema | flag |  | whether to build G via exponential moving average |
| --train_G_ema_beta | float | 0.999 | exponential decay for ema |
| --train_G_lr | float | 0.0002 | initial learning rate for the generator |
| --train_batch_size | int | 1 | input batch size |
| --train_beta1 | float | 0.9 | momentum term of adam |
| --train_beta2 | float | 0.999 | momentum term of adam |
| --train_cls_l1_regression | flag |  | if true, an l1 loss will be used to compute the regressor loss |
| --train_cls_regression | flag |  | if true, cls will be a regressor and not a classifier |
| --train_compute_D_accuracy | flag |  | whether to compute D accuracy explicitly |
| --train_compute_metrics_test | flag |  | whether to compute test metrics, e.g. FID, … |
| --train_continue | flag |  | continue training: load the latest model |
| --train_epoch | string | latest | which epoch to load; set to latest to use the latest cached model |
| --train_epoch_count | int | 1 | the starting epoch count; we save the model by <epoch_count>, <epoch_count>+<save_latest_freq>, … |
| --train_export_jit | flag |  | whether to export the model in jit format |
| --train_feat_wavelet | flag |  | if true, train in wavelet feature space (note: this may not include all discriminators when training GANs) |
| --train_gan_mode | string | lsgan | the type of GAN objective. vanilla GAN loss is the cross-entropy objective used in the original GAN paper. Values: vanilla, lsgan, wgangp, projected |
| --train_iter_size | int | 1 | backward is applied every iter_size iterations; this simulates a larger effective batch size of batch_size*iter_size |
| --train_load_iter | int | 0 | which iteration to load; if load_iter > 0, the code will load models by iter_[load_iter], otherwise by [epoch] |
| --train_lr_decay_iters | int | 50 | multiply the learning rate by a gamma every lr_decay_iters iterations |
| --train_lr_policy | string | linear | learning rate policy. Values: linear, step, multistep, plateau, cosine |
| --train_lr_steps | array | [] | number of epochs between reductions of the learning rate by gamma=0.1 |
| --train_metrics_every | int | 1000 | compute metrics every N iterations |
| --train_metrics_list | array | ['FID'] | metrics on result quality to compute. Values: FID, KID, MSID, PSNR, LPIPS, SSIM |
| --train_metrics_save_images | flag |  | whether to save the images that result from metrics computation |
| --train_mm_lambda_z | float | 0.5 | weight for random z loss |
| --train_mm_nz | int | 8 | number of latent vectors |
| --train_n_epochs | int | 100 | number of epochs with the initial learning rate |
| --train_n_epochs_decay | int | 0 | number of epochs over which the learning rate is linearly decayed to zero |
| --train_nb_img_max_fid | int | 1000000000 | maximum number of samples allowed per dataset to compute fid. If the dataset directory contains more than nb_img_max_fid, only a subset is used |
| --train_optim | string | adam | optimizer. Values: adam, radam, adamw, lion, adam8bit |
| --train_optim_eps | float | 1e-08 | epsilon for the optimizer |
| --train_optim_weight_decay | float | 0.0 | weight decay for the optimizer |
| --train_pool_size | int | 50 | the size of the image buffer that stores previously generated images |
| --train_save_by_iter | flag |  | whether to save the model by iteration |
| --train_save_epoch_freq | int | 1 | frequency of saving checkpoints at the end of epochs |
| --train_save_latest_freq | int | 5000 | frequency of saving the latest results |
| --train_semantic_cls | flag |  | if true, semantic class losses will be used |
| --train_semantic_mask | flag |  | if true, semantic mask losses will be used |
| --train_temporal_criterion | flag |  | if true, an MSE loss will be computed between successive frames |
| --train_temporal_criterion_lambda | float | 1.0 | lambda for the MSE loss computed between successive frames |
| --train_use_contrastive_loss_D | flag |  |  |
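As a sketch of how the schedule options interact: with --train_iter_size, the effective batch size is train_batch_size * train_iter_size (here 2 * 8 = 16), and the learning rate stays at its initial value for train_n_epochs epochs before decaying linearly for train_n_epochs_decay more:

```bash
# Sketch: gradient accumulation plus a linear decay schedule.
python3 train.py \
  --dataroot /path/to/dataset \
  --name train_example \
  --train_batch_size 2 \
  --train_iter_size 8 \
  --train_G_lr 0.0002 \
  --train_D_lr 0.0001 \
  --train_n_epochs 100 \
  --train_n_epochs_decay 100 \
  --train_optim adam
```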

## Semantic training

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --train_sem_cls_B | flag |  | if true, cls will be trained not only on domain A but also on domain B |
| --train_sem_cls_lambda | float | 1.0 | weight for semantic class loss |
| --train_sem_cls_pretrained | flag |  | whether to use a pretrained model; available for non-"basic" models only |
| --train_sem_cls_template | string | basic | classifier/regressor model type, from torchvision (resnet18, …); the default is a custom simple model |
| --train_sem_idt | flag |  | if true, apply the semantic loss on identity |
| --train_sem_lr_cls | float | 0.0002 | cls learning rate |
| --train_sem_lr_f_s | float | 0.0002 | f_s learning rate |
| --train_sem_mask_lambda | float | 1.0 | weight for semantic mask loss |
| --train_sem_net_output | flag |  | if true, apply the generator semantic loss on the network output for the real image rather than on the label |
| --train_sem_use_label_B | flag |  | if true, domain B has labels too |

## Semantic training with masks

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --train_mask_charbonnier_eps | float | 1e-06 | Charbonnier loss epsilon value |
| --train_mask_compute_miou | flag |  | whether to compute mIoU on semantic mask predictions |
| --train_mask_disjoint_f_s | flag |  | whether to use a disjoint f_s with the exact same structure |
| --train_mask_f_s_B | flag |  | if true, f_s will be trained not only on domain A but also on domain B |
| --train_mask_for_removal | flag |  | if true, object removal mode: domain B images with label 0, cut models only |
| --train_mask_lambda_out_mask | float | 10.0 | weight for the out-of-mask loss |
| --train_mask_loss_out_mask | string | L1 | loss for the out-of-mask content (which should not change). Values: L1, MSE, Charbonnier |
| --train_mask_miou_every | int | 1000 | compute mIoU every n iterations |
| --train_mask_no_train_f_s_A | flag |  | if true, f_s won't be trained on domain A |
| --train_mask_out_mask | flag |  | use the out-of-mask loss |
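A hedged example of semantic mask training with an out-of-mask preservation loss:

```bash
# Sketch: semantic mask losses with an L1 out-of-mask constraint,
# f_s trained on both domains.
python3 train.py \
  --dataroot /path/to/dataset \
  --name mask_example \
  --train_semantic_mask \
  --train_mask_out_mask \
  --train_mask_loss_out_mask L1 \
  --train_mask_lambda_out_mask 10.0 \
  --train_mask_f_s_B \
  --f_s_net segformer
```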

## Data augmentation

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --dataaug_APA | flag |  | if true, G will be used as augmentation during D training, adaptively to D overfitting between real and fake images |
| --dataaug_APA_every | int | 4 | how often to perform APA adjustment |
| --dataaug_APA_nimg | int | 50 | APA adjustment speed, measured in how many images it takes for p to increase/decrease by one unit |
| --dataaug_APA_p | int | 0 | initial value of the APA probability |
| --dataaug_APA_target | float | 0.6 |  |
| --dataaug_D_diffusion | flag |  | whether to apply diffusion noise augmentation to discriminator inputs; projected discriminator only |
| --dataaug_D_diffusion_every | int | 4 | how often to perform diffusion augmentation adjustment |
| --dataaug_D_label_smooth | flag |  | whether to use one-sided label smoothing with the discriminator |
| --dataaug_D_noise | float | 0.0 | amount of instance noise to add to discriminator inputs |
| --dataaug_affine | float | 0.0 | if nonzero, probability of applying random affine transforms to the images for data augmentation |
| --dataaug_affine_scale_max | float | 1.2 | if random affine is specified, max scale range value |
| --dataaug_affine_scale_min | float | 0.8 | if random affine is specified, min scale range value |
| --dataaug_affine_shear | int | 45 | if random affine is specified, shear range (0, value) |
| --dataaug_affine_translate | float | 0.2 | if random affine is specified, translation range (-value*img_size, +value*img_size) |
| --dataaug_diff_aug_policy | string |  | choose the augmentation policy: color, randaffine, randperspective. For more than one, separate them with a comma and no space (e.g. color,randaffine) |
| --dataaug_diff_aug_proba | float | 0.5 | probability of using each transformation |
| --dataaug_flip | string | horizontal | if specified, flip the images for data augmentation. Values: none, horizontal, vertical, both |
| --dataaug_imgaug | flag |  | whether to apply random image augmentation |
| --dataaug_no_rotate | flag |  | if specified, do not rotate the images for data augmentation |
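Finally, an illustrative augmentation setup combining flips, differentiable augmentations and random affine transforms (the probabilities are examples, not recommendations):

```bash
# Sketch: horizontal flips plus color/affine differentiable augmentations.
python3 train.py \
  --dataroot /path/to/dataset \
  --name aug_example \
  --dataaug_flip horizontal \
  --dataaug_diff_aug_policy color,randaffine \
  --dataaug_diff_aug_proba 0.5 \
  --dataaug_affine 0.5 \
  --dataaug_affine_translate 0.2
```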