From Elements to Design

Abstract

In this work, we investigate automatic design composition from multimodal graphic elements. Although recent studies have developed various generative models for graphic design, they usually face the following limitations: they only focus on certain subtasks and are far from achieving the design composition task; they do not consider the hierarchical information of graphic designs during the generation process. To tackle these issues, we introduce the layered design principle into Large Multimodal Models (LMMs) and propose a novel approach, called LaDeCo, to accomplish this challenging task. Specifically, LaDeCo first performs layer planning for a given element set, dividing the input elements into different semantic layers according to their contents. Based on the planning results, it subsequently predicts element attributes that control the design composition in a layer-wise manner, and includes the rendered image of previously generated layers into the context. With this insightful design, LaDeCo decomposes the difficult task into smaller manageable steps, making the generation process smoother and clearer. The experimental results demonstrate the effectiveness of LaDeCo in design composition. Furthermore, we show that LaDeCo enables some interesting applications in graphic design, such as resolution adjustment, element filling, design variation, etc. In addition, it even outperforms the specialized models in some design subtasks without any task-specific training.

Gallery

How does it work?

Design Principle

A complete design can be divided into layers according to the semantics of its elements, so we perform design composition layer by layer.

Layer Planning

We formulate layer planning as an element content understanding problem and use pre-trained LMMs to solve it.
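As a rough illustration of what layer planning produces, the sketch below groups input elements by an assigned layer label. The layer names follow the panels shown under Generation Process; the `classify` callable, the `hint` field, and `toy_classifier` are hypothetical stand-ins for the LMM call, not the authors' implementation.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Layer order taken from the "Generation Process" panels on this page.
LAYER_ORDER = ["background", "underlay", "logo/image", "text", "embellishment"]


def plan_layers(
    elements: List[dict], classify: Callable[[dict], str]
) -> Dict[str, List[dict]]:
    """Group elements by the semantic layer that `classify` assigns to each one."""
    grouped = defaultdict(list)
    for element in elements:
        label = classify(element)
        if label not in LAYER_ORDER:
            raise ValueError(f"unknown layer label: {label!r}")
        grouped[label].append(element)
    # Return layers in composition order, skipping layers with no elements.
    return {name: grouped[name] for name in LAYER_ORDER if name in grouped}


def toy_classifier(element: dict) -> str:
    """Hypothetical stand-in for an LMM call; reads a hand-written 'hint' field."""
    return element["hint"]


elements = [
    {"id": 0, "hint": "text"},
    {"id": 1, "hint": "background"},
    {"id": 2, "hint": "logo/image"},
]
plan = plan_layers(elements, toy_classifier)
```

In a real system the classifier would inspect the element content itself (image pixels or text) rather than a pre-filled hint.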

Layerwise Prediction

We fine-tune Large Multimodal Models (LMMs) to predict element attributes in a layer-wise manner. After each layer is generated, the intermediate design is rendered as an image and fed back into the LMM to guide generation of the next layer.
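The feedback loop described above can be sketched as follows. This is a minimal outline under stated assumptions: `predict` and `render` are hypothetical callables standing in for the fine-tuned LMM and the renderer, and the attribute names `left` and `top` are illustrative only.

```python
from typing import Callable, Dict, List, Tuple

Attributes = Dict[str, float]


def compose(
    plan: Dict[str, List[dict]],
    predict: Callable[[List[dict], object], List[Attributes]],
    render: Callable[[List[dict]], object],
) -> Tuple[List[dict], object]:
    """Place elements one layer at a time, re-rendering after every layer."""
    placed: List[dict] = []
    canvas = render(placed)  # rendering of the empty design
    for layer_name, layer_elements in plan.items():
        # The predictor sees the current layer's elements plus the rendering
        # of everything placed so far.
        attrs = predict(layer_elements, canvas)
        for element, attr in zip(layer_elements, attrs):
            placed.append({**element, **attr, "layer": layer_name})
        canvas = render(placed)  # feed the updated design to the next layer
    return placed, canvas


def toy_render(placed: List[dict]) -> int:
    """Hypothetical renderer: returns the element count instead of an image."""
    return len(placed)


def toy_predict(layer_elements: List[dict], canvas: object) -> List[Attributes]:
    """Hypothetical predictor: offsets each layer by what is already rendered."""
    return [{"left": float(canvas), "top": float(i)} for i in range(len(layer_elements))]


plan = {"background": [{"id": 1}], "text": [{"id": 0}, {"id": 3}]}
placed, canvas = compose(plan, toy_predict, toy_render)
```

The toy predictor makes the dependency visible: elements in the second layer receive a different `left` value because the canvas they are conditioned on already contains the first layer.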

Comparisons to Prior Work

[Figure: qualitative comparison. Columns: FlexDM, GPT-4o, Ours, ground truth (GT).]

Generation Process

[Figure: layer-by-layer generation. Panels: Background, + Underlay, + Logo/Image, + Text, + Embellishment.]

Our method enables several applications in graphic design. Given the same input elements, it can compose them into diverse designs, giving users multiple options (design variations). It can also compose designs conditioned on different canvas sizes (resolution adjustment): the predicted attributes are adjusted to suit each canvas, so the final designs remain appealing across sizes. Finally, it can add new elements to an existing design to improve it (element filling).

Design Variations

Resolution Adjustment

Our approach composes the same input elements into designs with different canvas sizes. In each case the long side is 1920px and the short side is 1080px. The generated designs remain visually appealing across canvas sizes.
