GenWardrobe is an end-to-end system that understands ‘human–complex context constraint–fashion knowledge’ to generate travel fashion wardrobes. See the demo-track video for this paper on our YouTube channel: https://youtu.be/Y7po8Cq-R4o.

Figure 1. Illustration of the overall system design.
- 👤 User Query Analysis
- 🔍 Fashion Knowledge Retrieval via RAG
- 🖼️ Wardrobe Image Generation
- We leverage a RAG framework to retrieve a set of fashion knowledge from a well-curated, large-scale fashion knowledge base. The extraction principles of fashion knowledge are illustrated in Figure 2.

Figure 2. Fashion knowledge extraction from the image.
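The retrieval idea can be pictured with a minimal sketch. The system itself performs retrieval through an LLM API; the snippet below swaps in a plain keyword-overlap score purely for illustration. The entry fields (`id`, `tags`), the sample entries, and the function names are hypothetical and do not reflect the project's actual knowledge-base schema.

```python
# Illustrative only: keyword overlap stands in for the LLM-based relevance
# judgment used by the real system. All names and entries are hypothetical.

def overlap_score(query_terms, entry):
    """Count how many query terms appear among the entry's tags."""
    return len(set(query_terms) & set(entry["tags"]))


def retrieve_top_k(query_terms, knowledge_base, k=3):
    """Return the k entries with the highest overlap score.

    sorted() is stable, so entries with equal scores keep their input order.
    """
    ranked = sorted(
        knowledge_base,
        key=lambda entry: overlap_score(query_terms, entry),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical knowledge-base entries (not taken from the real database).
KNOWLEDGE_BASE = [
    {"id": "kb_001", "tags": ["beach", "casual", "summer"]},
    {"id": "kb_002", "tags": ["business", "formal", "indoor"]},
    {"id": "kb_003", "tags": ["casual", "summer", "city"]},
]

if __name__ == "__main__":
    query = ["casual", "summer", "beach"]
    for entry in retrieve_top_k(query, KNOWLEDGE_BASE, k=2):
        print(entry["id"])
```

In a real deployment the scoring function would be replaced by whatever relevance signal the retrieval backend provides; only the "score, rank, keep the top k" shape carries over.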
- Add your API_KEY in config.py.
- Start the backend with Flask to receive POST requests for generation.
- The front end lets the user upload a photo and a scene description, which it sends to the back end.
- Run app.py to execute the entire pipeline.
Preparing, coming soon...
- To showcase the end-to-end workflow of the system, we built an interactive demonstration in which the raw user input consists of a full-body photo and a travel plan text. The demonstration outputs both a pure fashion wardrobe and a visualized fashion wardrobe.

Figure 3. Illustration of the interface of the demo system.
- After the user uploads the raw input, the system first calls the Gemini 2.0 Flash API to extract relevant information, which takes approximately 1 second.
- Once the user information is obtained, the system utilizes it to retrieve the top-k most relevant JSON files from the database using the Gemini 2.0 Flash API, a process that takes around 15 seconds.
- With the top-k JSON files, the system then generates both the pure fashion wardrobe and the visualized fashion wardrobe using the Stable Diffusion 3.5 Large, Gemini 2.0 Flash, and GPT-Image-1 APIs. Each JSON file takes approximately 20 seconds to process.
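As a rough back-of-the-envelope check, the approximate timings above can be combined into an end-to-end estimate. The sketch below assumes the three stages run strictly one after another and that the JSON files are processed sequentially; that assumption is ours, and the function and constant names are hypothetical.

```python
# Approximate stage timings quoted in this README, in seconds.
ANALYSIS_S = 1              # user query analysis
RETRIEVAL_S = 15            # top-k knowledge retrieval
GENERATION_PER_FILE_S = 20  # wardrobe generation per retrieved JSON file


def estimated_latency_seconds(k):
    """Estimate total latency for k retrieved JSON files.

    Assumes fully sequential execution (an assumption, not a measured fact).
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    return ANALYSIS_S + RETRIEVAL_S + GENERATION_PER_FILE_S * k


if __name__ == "__main__":
    print(estimated_latency_seconds(3))  # 1 + 15 + 3 * 20 = 76
```

Under this assumption the generation stage dominates as k grows, so processing the JSON files in parallel would be the most obvious place to reduce waiting time.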
- To validate the professionalism and practicality of the generated recommendations, we conducted a double-blind expert evaluation experiment. Six experts in the field of fashion design were invited to anonymously evaluate two sets of outfit recommendation images: those generated by the proposed system and those produced without using it. The evaluation was conducted using a combination of pairwise comparison and anchored rating scales.
- To verify the effectiveness of the system across different task scenarios, we tested a variety of input cases, as listed below:
- General Travel Scenario
- Example: I plan to travel to Phuket in early May. Please recommend suitable outfits, preferably in a casual style.
- Formal Event Scenario
- Example: I have a business meeting in Singapore in June. Please recommend appropriate attire for a formal indoor setting.
- Leisure and Outdoor Activity Scenario
- Example: I plan to visit a park to enjoy the flowers this weekend. Please recommend some suitable outfits.
- Daily/Commuting Scenario
- Example: Please recommend an outfit suitable for daily commuting in summer in Shanghai.
- Multi-task Scenario
- Example: I will travel to Bali in July for about three days, and then attend a wedding in London. Please recommend outfits for both occasions.
- To ensure the accuracy of the evaluation, we invited experts in the fashion field to develop the corresponding evaluation scales and methods, as shown in Figure 4.

Figure 4. Evaluation metrics.
- The average score across all comparisons was +1.146 (indicating a preference between “better” and “much better”), and statistical analysis confirmed a significant advantage in both aesthetic quality and contextual relevance.
Figure 5. Evaluation results.
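To make the reported average easier to interpret, here is a minimal sketch of how pairwise comparison labels could be averaged. The text above implies that "better" maps to +1 and "much better" to +2; the symmetric negative side and the zero point are our assumptions. The labels in the example are placeholders, not data from the study.

```python
from statistics import mean

# Assumed anchored scale: only "better" (+1) and "much better" (+2) are
# implied by the README; the remaining values are a symmetric assumption.
SCALE = {
    "much worse": -2,
    "worse": -1,
    "equal": 0,
    "better": 1,
    "much better": 2,
}


def mean_preference(labels):
    """Average the numeric scores of a list of pairwise comparison labels."""
    return mean(SCALE[label] for label in labels)


if __name__ == "__main__":
    # Placeholder labels for illustration only (NOT the experts' ratings).
    placeholder_labels = ["better", "much better", "better", "equal"]
    print(mean_preference(placeholder_labels))
```

On such a scale, a mean slightly above +1 corresponds to a typical judgment a little stronger than "better".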
For questions or collaborations, please contact: [Peng Jin] [[email protected]]