With the prompt redescription mechanism, our model obtains firmly aligned (prompt, latent code) pairs. We apply CLIP to measure the domain gap between the source domain and the target domain for image editing. Specifically, the CLIP text encoder extracts high-level features from source-domain sentences and target-domain sentences, and the mean difference between the two sets of features defines the domain gap $\Delta c$. We then apply the shifted target text embedding $c_{\mathrm{tgt}} = c_{\mathrm{src}} + \Delta c$ for zero-shot image translation via the diffusion inversion process. With $T$-step inversion, MirrorDiffusion obtains the latent code $z_T$ that corresponds to the input image $x_0$ under the source prompt $c_{\mathrm{src}}$.
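
As a concrete illustration of the gap computation, the sketch below averages CLIP text features over a set of sentences per domain and takes their difference. The model checkpoint, the example sentences, and the use of the pooled text embedding are illustrative assumptions, not the exact configuration of this work.

```python
# Minimal sketch of the CLIP-based domain gap, assuming Hugging Face
# `transformers` and the openai/clip-vit-large-patch14 text encoder.
import torch
from transformers import CLIPTokenizer, CLIPTextModel

device = "cuda" if torch.cuda.is_available() else "cpu"
model_name = "openai/clip-vit-large-patch14"          # assumed checkpoint
tokenizer = CLIPTokenizer.from_pretrained(model_name)
text_encoder = CLIPTextModel.from_pretrained(model_name).to(device)

@torch.no_grad()
def mean_text_embedding(sentences):
    """Encode a list of sentences and average their pooled CLIP features."""
    tokens = tokenizer(sentences, padding=True, return_tensors="pt").to(device)
    feats = text_encoder(**tokens).pooler_output       # (N, hidden_dim)
    return feats.mean(dim=0)                           # (hidden_dim,)

# Illustrative placeholder sentences for the two domains.
source_sentences = ["a photo of a cat", "a picture of a cat"]
target_sentences = ["a photo of a dog", "a picture of a dog"]

# Domain gap: mean difference between target and source text features.
delta_c = mean_text_embedding(target_sentences) - mean_text_embedding(source_sentences)
```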
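The $T$-step inversion can be realized with deterministic DDIM inversion. The sketch below shows the standard per-step update against a generic noise predictor; the `eps_model(z, t, c)` signature and the `alpha_bars` schedule are assumed interfaces, not MirrorDiffusion's exact implementation.

```python
# Hedged sketch of T-step DDIM inversion: map an image latent z_0 back
# to the noise-level latent z_T under a fixed prompt embedding c.
import torch

@torch.no_grad()
def ddim_invert(z0, c, eps_model, alpha_bars):
    """Deterministic DDIM inversion, z_0 -> z_T, conditioned on c.

    alpha_bars: cumulative noise schedule of length T + 1 (assumed input).
    eps_model:  noise-prediction network (assumed signature eps_model(z, t, c)).
    """
    z = z0
    T = len(alpha_bars) - 1
    for t in range(T):
        a_t, a_next = alpha_bars[t], alpha_bars[t + 1]
        eps = eps_model(z, t, c)                      # predicted noise at step t
        # Predict the clean latent from the current noisy latent ...
        z0_pred = (z - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        # ... then re-noise it one step further toward z_T.
        z = a_next.sqrt() * z0_pred + (1 - a_next).sqrt() * eps
    return z                                          # latent code z_T
```

In this sketch, editing then amounts to running the reverse (denoising) process from $z_T$ with the shifted embedding $c_{\mathrm{src}} + \Delta c$ in place of the source prompt.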