With the prompt redescription mechanism, our model obtains firmly aligned (prompt, latent code) pairs. We apply CLIP to measure the domain gap between the source domain and the target domain for image editing. Specifically, the CLIP text encoder extracts high-level features from source-domain sentences and target-domain sentences, and the mean difference between the two sets of features defines the domain gap $\Delta c$. We then apply the shifted target text embedding $c_{\mathrm{tgt}} = c_{\mathrm{src}} + \Delta c$ for zero-shot image translation via the diffusion inversion process. With $T$-step inversion, MirrorDiffusion obtains the latent code $z_T$ that corresponds to the input image $x_0$ under the source prompt $c_{\mathrm{src}}$.
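
As a concrete illustration of the gap computation, the sketch below averages CLIP text features over a set of sentences per domain and takes their difference. The model checkpoint, the example sentences, and the use of the pooled text embedding are illustrative assumptions, not the exact configuration of this work.

```python
# Minimal sketch of the CLIP-based domain gap, assuming Hugging Face
# `transformers` and the openai/clip-vit-large-patch14 text encoder.
import torch
from transformers import CLIPTokenizer, CLIPTextModel

device = "cuda" if torch.cuda.is_available() else "cpu"
model_name = "openai/clip-vit-large-patch14"          # assumed checkpoint
tokenizer = CLIPTokenizer.from_pretrained(model_name)
text_encoder = CLIPTextModel.from_pretrained(model_name).to(device)

@torch.no_grad()
def mean_text_embedding(sentences):
    """Encode a list of sentences and average their pooled CLIP features."""
    tokens = tokenizer(sentences, padding=True, return_tensors="pt").to(device)
    feats = text_encoder(**tokens).pooler_output       # (N, hidden_dim)
    return feats.mean(dim=0)                           # (hidden_dim,)

# Illustrative placeholder sentences for the two domains.
source_sentences = ["a photo of a cat", "a picture of a cat"]
target_sentences = ["a photo of a dog", "a picture of a dog"]

# Domain gap: mean difference between target and source text features.
delta_c = mean_text_embedding(target_sentences) - mean_text_embedding(source_sentences)
```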
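The $T$-step inversion can be realized with deterministic DDIM inversion. The sketch below shows the standard per-step update against a generic noise predictor; the `eps_model(z, t, c)` signature and the `alpha_bars` schedule are assumed interfaces, not MirrorDiffusion's exact implementation.

```python
# Hedged sketch of T-step DDIM inversion: map an image latent z_0 back
# to the noise-level latent z_T under a fixed prompt embedding c.
import torch

@torch.no_grad()
def ddim_invert(z0, c, eps_model, alpha_bars):
    """Deterministic DDIM inversion, z_0 -> z_T, conditioned on c.

    alpha_bars: cumulative noise schedule of length T + 1 (assumed input).
    eps_model:  noise-prediction network (assumed signature eps_model(z, t, c)).
    """
    z = z0
    T = len(alpha_bars) - 1
    for t in range(T):
        a_t, a_next = alpha_bars[t], alpha_bars[t + 1]
        eps = eps_model(z, t, c)                      # predicted noise at step t
        # Predict the clean latent from the current noisy latent ...
        z0_pred = (z - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        # ... then re-noise it one step further toward z_T.
        z = a_next.sqrt() * z0_pred + (1 - a_next).sqrt() * eps
    return z                                          # latent code z_T
```

In this sketch, editing then amounts to running the reverse (denoising) process from $z_T$ with the shifted embedding $c_{\mathrm{src}} + \Delta c$ in place of the source prompt.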