Decoding Diffusion Models: Core Concepts and PyTorch Code

In this article, I will try to distill the essence of diffusion models to give you the basic intuition behind them, with code to train a basic diffusion model, implemented in PyTorch, at the end.

Definition:

A diffusion model is a type of generative model in machine learning that is used to generate high-quality data [such as images] starting from pure noise. The data is noised through diffusion steps that follow a Markov chain [a sequence of stochastic events in which each time step depends only on the previous time step] and is then reconstructed by learning the reverse process.

Let's step back a little to understand the core idea behind diffusion models. In the paper "Deep Unsupervised Learning using Nonequilibrium Thermodynamics" [1], the authors describe it as:

The essential idea, inspired by non-equilibrium statistical physics, is to systematically and slowly destroy structure in a data distribution through an iterative forward diffusion process. We then learn a reverse diffusion process that restores structure in data, yielding a highly flexible and tractable generative model of the data.

The diffusion process is essentially split into a forward phase and a reverse phase. Let's take the example of generating realistic, high-quality images with diffusion models.

Forward Diffusion Phase: We start with a real, high-quality image and add noise to it in steps to arrive at pure noise. Basically, we want to destroy the structure in the non-random data distribution that exists at the start.

Here, q is our forward process, x_t is the output of the forward process at time step t, and x_(t-1) is its input at time step t. N is a normal distribution with mean sqrt(1 - β_t) x_{t-1} and variance β_t I, so that q(x_t | x_{t-1}) = N(x_t; sqrt(1 - β_t) x_{t-1}, β_t I).
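As a minimal sketch of a single forward step, the snippet below draws x_t from a Gaussian with mean sqrt(1 - β_t) x_{t-1} and variance β_t I and applies it repeatedly. The function name `forward_step`, the constant β value and the all-ones stand-in image are assumptions made only for illustration.

```python
import torch

def forward_step(x_prev, beta_t):
    """Sample x_t ~ N(sqrt(1 - beta_t) * x_prev, beta_t * I)."""
    noise = torch.randn_like(x_prev)
    return (1.0 - beta_t) ** 0.5 * x_prev + beta_t ** 0.5 * noise

torch.manual_seed(0)
x = torch.ones(1, 3, 64, 64)  # stand-in "image" with obvious structure (all ones)
beta = 0.02                   # constant noise level, chosen only for illustration
for _ in range(500):
    x = forward_step(x, beta)

# After many steps the structure is gone: the mean is near 0 and the std is near 1.
print(x.mean().item(), x.std().item())
```

Each step shrinks the previous image slightly and adds a little fresh noise, which is why the original structure fades gradually rather than all at once.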

β_t [also called the schedule] controls the amount of noise added at time step t, and its value lies between 0 and 1. Depending on the type of schedule you use, you arrive at what is close to pure noise sooner or later. That is, β_1,…,β_T is a variance schedule (either learned or fixed) which, if well-behaved, ensures that x_T is almost an isotropic Gaussian at sufficiently large T.

Reverse Diffusion Phase: This is where the actual machine learning takes place. As the name suggests, in this phase we try to transform the noise back into a sample from the target distribution. That is, the model learns to denoise pure Gaussian noise into a clean image. Once the neural network has been trained, this ability can be used to generate new images out of Gaussian noise through step-by-step reverse diffusion.

Since one cannot readily estimate q(x_(t-1)|x_t), we need to learn a model p_theta to approximate the conditional probabilities for the reverse diffusion process.
We want to model the probability density of an earlier time step given the current. If we apply this reverse formula for all time steps T→0, we can trace our steps back to the original data distribution. The time step information is provided usually as positional embeddings to the model. It is worth mentioning here that the diffusion model predicts the entire noise to be removed at a given timestep to make it equivalent to the image at the start, and not just the delta between the current and previous time step. However, we only subtract part of it and move to the next step. That is how the diffusion process works.
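For reference, the reverse step can be written in the notation of Ho et al. [2]. This is a sketch of the standard DDPM parameterization rather than anything specific to this article: \alpha_t = 1 - \beta_t, \bar\alpha_t is the product of \alpha_1 through \alpha_t, \epsilon_\theta is the network's noise prediction, and \sigma_t^2 is the variance used for the step. None of these symbols is named explicitly in the text above.

```latex
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\ \mu_\theta(x_t, t),\ \sigma_t^2 I\big),
\qquad
\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}
\left( x_t - \frac{\beta_t}{\sqrt{1 - \bar\alpha_t}}\, \epsilon_\theta(x_t, t) \right)
```

The factor \beta_t / \sqrt{1 - \bar\alpha_t} in front of \epsilon_\theta is the reason only part of the predicted noise is subtracted at each step.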

Generally speaking, a diffusion model destroys the structure in training data by adding Gaussian noise, and then learns to recover the data by reversing this noising process. After training, the diffusion model can be used to generate data by simply passing randomly sampled noise through the "learned" denoising process. For a detailed mathematical explanation, see this blog [4].

Implementation:

We will use the Oxford Flowers102 dataset, which contains images of flowers in 102 categories, and build a very simple model for the purposes of this article, to understand the core idea and implementation of diffusion models.

Forward phase: Since a sum of Gaussians is also a Gaussian, the noisy version of the input image at a specific time step can be precomputed directly, even though the noise is added sequentially [2].

import math

import torch
import torch.nn as nn
import torch.nn.functional as F

def linear_beta_schedule(timesteps, start=1e-4, end=2e-2):
    """Creates a linearly increasing noise schedule."""
    return torch.linspace(start, end, timesteps)

def get_idx_from_list(vals, t, x_shape):
    """ Returns a specific index t of a passed list of values vals. """
    batch_size = t.shape[0]
    out = vals.gather(-1, t.cpu())
    return out.reshape(batch_size, *((1,) * (len(x_shape) - 1))).to(t.device)

def forward_diffusion_sample(x_0, t, device="cpu"):
    """ Takes an image and a timestep as input and returns the noisy version of it."""
    noise = torch.randn_like(x_0)
    sqrt_alphas_cumprod_t = get_idx_from_list(sqrt_alphas_cumprod, t, x_0.shape)
    sqrt_one_minus_alphas_cumprod_t = get_idx_from_list(sqrt_one_minus_alphas_cumprod, t, x_0.shape)
    # Closed-form sample: scaled clean image plus scaled Gaussian noise
    x_t = sqrt_alphas_cumprod_t.to(device) * x_0.to(device) + sqrt_one_minus_alphas_cumprod_t.to(device) * noise.to(device)
    return x_t, noise.to(device)


T = 300  # Total number of timesteps
betas = linear_beta_schedule(T)
# Precompute values for efficiency
alphas = 1. - betas
alphas_cumprod = torch.cumprod(alphas, dim=0)
alphas_cumprod_prev = F.pad(alphas_cumprod[:-1], (1, 0), value=1.0)

sqrt_recip_alphas = torch.sqrt(1. / alphas)
sqrt_alphas_cumprod = torch.sqrt(alphas_cumprod)
sqrt_one_minus_alphas_cumprod = torch.sqrt(1. - alphas_cumprod)
posterior_variance = betas * (1. - alphas_cumprod_prev) / (1. - alphas_cumprod)
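The precomputed tensors above correspond to the closed form from Ho et al. [2], x_t = sqrt(ᾱ_t) x_0 + sqrt(1 - ᾱ_t) ε, where ᾱ_t is the cumulative product of (1 - β). The standalone sketch below rebuilds the same linear schedule and prints how much of the original signal survives at a few time steps. The helper name `q_sample` and the all-ones stand-in image are assumptions for illustration.

```python
import torch

T = 300
betas = torch.linspace(1e-4, 2e-2, T)  # same linear schedule as above
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def q_sample(x_0, t, noise):
    """Closed form x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * noise for integer t."""
    abar_t = alphas_cumprod[t]
    return abar_t.sqrt() * x_0 + (1.0 - abar_t).sqrt() * noise

torch.manual_seed(0)
x_0 = torch.ones(3, 8, 8)
noise = torch.randn_like(x_0)
for t in (0, 99, 299):
    x_t = q_sample(x_0, t, noise)
    signal = alphas_cumprod[t].sqrt().item()
    print(f"t={t:3d}  signal coefficient={signal:.3f}  x_t mean={x_t.mean().item():.3f}")
```

The signal coefficient starts close to 1 and decays as t grows. With this schedule and T = 300 it is small at the last step but not exactly zero.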

Reverse Diffusion Phase: We use a simple U-Net neural network that takes a noisy image and the time step [provided as a positional embedding] and predicts the noise. The ConvBlock layer below uses the sinusoidal time step embedding, which captures temporal context, to condition the convolutional output.

class SinusoidalPositionEmbeddings(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.dim = dim

    def forward(self, t):
        half_dim = self.dim // 2
        scale = math.log(10000) / (half_dim - 1)
        freqs = torch.exp(torch.arange(half_dim, device=t.device) * -scale)
        angles = t[:, None] * freqs[None, :]
        return torch.cat([angles.sin(), angles.cos()], dim=-1)

class ConvBlock(nn.Module):
    def __init__(self, in_channels, out_channels, time_emb_dim, upsample=False):
        super().__init__()
        self.time_mlp = nn.Linear(time_emb_dim, out_channels)
        self.upsample = upsample

        self.conv1 = nn.Conv2d(in_channels * 2 if upsample else in_channels, out_channels, kernel_size=3, padding=1)
        self.transform = (
            nn.ConvTranspose2d(out_channels, out_channels, kernel_size=4, stride=2, padding=1)
            if upsample else
            nn.Conv2d(out_channels, out_channels, kernel_size=4, stride=2, padding=1)
        )
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU()

    def forward(self, x, t):
        h = self.bn1(self.relu(self.conv1(x)))
        time_emb = self.relu(self.time_mlp(t))[(..., ) + (None,) * 2]
        h = h + time_emb
        h = self.bn2(self.relu(self.conv2(h)))
        return self.transform(h)

class SimpleUNet(nn.Module):
    """Simplified U-Net for denoising diffusion models."""

    def __init__(self):
        super().__init__()
        image_channels = 3
        down_channels = (64, 128, 256, 512, 1024)
        up_channels = (1024, 512, 256, 128, 64)
        output_channels = 3
        time_emb_dim = 32

        self.time_mlp = nn.Sequential(
            SinusoidalPositionEmbeddings(time_emb_dim),
            nn.Linear(time_emb_dim, time_emb_dim),
            nn.ReLU()
        )
        self.init_conv = nn.Conv2d(image_channels, down_channels[0], kernel_size=3, padding=1)

        self.down_blocks = nn.ModuleList([
            ConvBlock(down_channels[i], down_channels[i+1], time_emb_dim)
            for i in range(len(down_channels) - 1)
        ])

        self.up_blocks = nn.ModuleList([
            ConvBlock(up_channels[i], up_channels[i+1], time_emb_dim, upsample=True)
            for i in range(len(up_channels) - 1)
        ])

        self.final_conv = nn.Conv2d(up_channels[-1], output_channels, kernel_size=1)

    def forward(self, x, t):
        t_emb = self.time_mlp(t)
        x = self.init_conv(x)
        skip_connections = []

        for block in self.down_blocks:
            x = block(x, t_emb)
            skip_connections.append(x)

        for block in self.up_blocks:
            skip_x = skip_connections.pop()
            x = torch.cat([x, skip_x], dim=1)
            x = block(x, t_emb)
        return self.final_conv(x)

model = SimpleUNet()
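To make the time embedding used by the network more concrete, here is a standalone function version of the same sinusoidal computation. The name `sinusoidal_embedding` and the example time steps are assumptions for illustration.

```python
import math
import torch

def sinusoidal_embedding(t, dim=32):
    """Map integer timesteps of shape (B,) to embeddings of shape (B, dim)."""
    half_dim = dim // 2
    scale = math.log(10000) / (half_dim - 1)
    freqs = torch.exp(torch.arange(half_dim) * -scale)
    angles = t[:, None].float() * freqs[None, :]
    return torch.cat([angles.sin(), angles.cos()], dim=-1)

t = torch.tensor([0, 1, 150, 299])
emb = sinusoidal_embedding(t)
print(emb.shape)                        # one 32-dimensional vector per timestep
print(emb[0, :16].abs().max().item())   # sine half at t = 0
print(emb[0, 16:].min().item())         # cosine half at t = 0
```

Each time step gets a distinct vector with values in [-1, 1]. This gives the network a smooth signal for how noisy its input is, which a raw integer would not.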

The training objective is a simple MSE loss that measures the difference between the actual noise and the model's prediction of that noise.

def get_loss(model, x_0, t, device):
    x_noisy, noise = forward_diffusion_sample(x_0, t, device)
    noise_pred = model(x_noisy, t)
    return F.mse_loss(noise, noise_pred)
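A single training update built around this loss might look like the sketch below. It is self-contained, so it swaps in a hypothetical `TinyDenoiser` that ignores the time step and uses random tensors in place of flower images. The optimizer, learning rate and batch size are likewise assumptions, not the article's settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

T = 300
betas = torch.linspace(1e-4, 2e-2, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

class TinyDenoiser(nn.Module):
    """Hypothetical stand-in for the U-Net; it ignores t to stay short."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, kernel_size=3, padding=1),
        )

    def forward(self, x, t):
        return self.net(x)

def training_step(model, optimizer, x_0):
    """One optimizer update on a batch of clean images x_0."""
    t = torch.randint(0, T, (x_0.shape[0],))  # random timestep per image
    noise = torch.randn_like(x_0)
    abar = alphas_cumprod[t].view(-1, 1, 1, 1)
    x_noisy = abar.sqrt() * x_0 + (1.0 - abar).sqrt() * noise
    loss = F.mse_loss(model(x_noisy, t), noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

torch.manual_seed(0)
model = TinyDenoiser()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
batch = torch.rand(8, 3, 16, 16) * 2.0 - 1.0  # random tensors standing in for images
losses = [training_step(model, optimizer, batch) for _ in range(20)]
print(losses[0], losses[-1])
```

Sampling a fresh random time step for every image in every batch lets one network learn to denoise at all noise levels.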

Finally, after training the model for 300 epochs, we can start generating ~realistic-looking images of flowers by sampling pure Gaussian noise and feeding it through the learned reverse diffusion process.
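A sampling loop for this last step could be sketched as follows, using the DDPM update from Ho et al. [2]. To keep the snippet runnable on its own, `dummy_model` is a hypothetical untrained stand-in that always predicts zero noise. The final clamp assumes the training images were scaled to [-1, 1], and the function name `sample` is illustrative.

```python
import torch
import torch.nn.functional as F

T = 300
betas = torch.linspace(1e-4, 2e-2, T)
alphas = 1.0 - betas
alphas_cumprod = torch.cumprod(alphas, dim=0)
alphas_cumprod_prev = F.pad(alphas_cumprod[:-1], (1, 0), value=1.0)
posterior_variance = betas * (1.0 - alphas_cumprod_prev) / (1.0 - alphas_cumprod)

@torch.no_grad()
def sample(model, shape):
    """Run the reverse process from pure noise down to t = 0."""
    x = torch.randn(shape)
    for i in reversed(range(T)):
        t = torch.full((shape[0],), i, dtype=torch.long)
        eps = model(x, t)  # predicted noise
        # Posterior mean: remove only a scaled part of the predicted noise.
        mean = (x - betas[i] / (1.0 - alphas_cumprod[i]).sqrt() * eps) / alphas[i].sqrt()
        if i > 0:
            x = mean + posterior_variance[i].sqrt() * torch.randn_like(x)
        else:
            x = mean  # no noise is added on the final step
    return x.clamp(-1.0, 1.0)

# Hypothetical untrained stand-in so the sketch runs; use the trained U-Net instead.
dummy_model = lambda x, t: torch.zeros_like(x)
images = sample(dummy_model, (2, 3, 16, 16))
print(images.shape)
```

With a trained network in place of `dummy_model`, each pass through the loop moves the sample slightly closer to the data distribution.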

References:

[1] Deep Unsupervised Learning using Nonequilibrium Thermodynamics, Sohl-Dickstein, J. et al.
[2] Denoising Diffusion Probabilistic Models, Ho et al. [2020]
[3] Diffusion Models Beat GANs on Image Synthesis, Dhariwal and Nichol [2021]
[4] This amazing blog, for a deeper dive into the mathematics behind diffusion models.
[5] This repository, which provides a collection of resources and papers on diffusion models.
