Overall structure of our CookAnything model, illustrated with a 3-step vegetable pancake recipe.
Figure: Overview of the AutoRecipe Framework.
Figure: Qualitative comparisons. SKD refers to StackedDiffusion, and SD3.5 refers to Stable Diffusion 3.5.
Table: Comparison with other models on RecipeGen.
| Category | Method | Step Flexibility | Joint Generation | Goal Faithfulness (↑) | Cross-Step Consistency (↓) | SF (↑) (C) | SF (↑) (G) | IA (↑) (G) | IA (↑) (H) | UB (↑) (G) | UB (↑) (H) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| UNet-based | SD1.5 [32] | ✔ | ✖ | 26.84 | 5.42 | 28.40 | 5.30 | 6.14 | 3.41 | - | - |
| | SD2.1 [34] | ✔ | ✖ | 26.88 | 7.54 | 28.51 | 6.06 | 6.81 | 3.45 | - | - |
| | SDXL [26] | ✔ | ✖ | 27.46 | 2.98 | 29.37 | 6.79 | 7.51 | 3.71 | - | - |
| | SKD [18] | ✖ | ✔ | 26.62 | 0.7 | 28.53 | 4.57 | 6.67 | 2.59 | 6.43 | 3.59 |
| DiT-based | SD3.5 [35] | ✔ | ✖ | 27.42 | 2.97 | 28.77 | 6.73 | 7.58 | 3.97 | - | - |
| | Flux.1-dev [11] | ✔ | ✖ | 26.47 | 3.47 | 28.31 | 5.31 | 5.93 | 5.71 | - | - |
| | IC-LoRA [9] | ✔ | ✔ | 26.07 | 9.03 | 26.58 | 4.03 | 5.50 | 3.91 | 5.34 | 4.45 |
| | RPF [4] | ✔ | ✔ | 27.19 | 8.73 | 25.99 | 4.45 | 7.05 | 3.02 | 3.89 | 4.24 |
| Layout-aware | GLIGEN | ✔ | ✖ | 26.99 | 2.17 | 26.72 | 5.17 | 6.16 | 5.28 | - | - |
| | A + R [25] | ✔ | ✖ | 26.31 | 2.46 | 27.63 | 4.58 | 5.46 | 5.29 | - | - |
| DiT-based | Ours (TF) | ✔ | ✔ | 30.12 | 0.17 | 29.80 | 8.52 | 9.12 | 6.92 | 9.89 | 7.66 |
| | Ours (TB) | ✔ | ✔ | 30.59 | 0.19 | 30.45 | 8.69 | 9.27 | 7.15 | 9.70 | 8.48 |