https://github.com/kijai/ComfyUI-KwaiKolorsWrapper/blob/main/examples/kolors_example.json
| :--- |
| fp16 | ~13 GB |
| quant8 | ~8 GB |
| quant4 | ~4 GB |
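As a rough illustration of how to read the table above, here is a minimal sketch that picks the highest-precision option fitting a given amount of free VRAM. The function name `pick_precision` and the dictionary `VRAM_NEEDED_GB` are hypothetical and not part of ComfyUI or the wrapper. The thresholds are just the approximate figures from the table, so treat the result as a starting point only.

```python
from typing import Optional

# Approximate VRAM needs in GB, copied from the table above (rough values).
# Ordered from highest precision to lowest.
VRAM_NEEDED_GB = {"fp16": 13, "quant8": 8, "quant4": 4}


def pick_precision(free_vram_gb: float) -> Optional[str]:
    """Return the highest-precision option that fits, or None if none fits.

    Hypothetical helper for illustration; not an API of any library.
    """
    for name, needed_gb in VRAM_NEEDED_GB.items():
        if free_vram_gb >= needed_gb:
            return name
    return None


if __name__ == "__main__":
    for gb in (24, 12, 6, 2):
        print(f"{gb} GB free -> {pick_precision(gb)}")
```

In practice the rest of the pipeline also needs memory, so leaving some headroom above these figures is probably wise.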
5. Generate
The model is downloaded on the first generation.
This takes quite a while, but only the first time, so go have a meal while you wait.
Generation Result
Generation Result 1
Trying text rendering
Prompt
A lively and cheerful anime-style character holding a piece of paper with the word ‘love’ written on it. The character is smiling brightly and is surrounded by vibrant, colorful backgrounds. The atmosphere is joyful and inviting, designed to attract and captivate viewers.
As with SD3, whether text in the image comes out correctly is a matter of luck.

Generation Result 2
Since the text encoder is a high-performance one, I fed it a “long” prompt.
A vibrant outdoor scene in a lively city park during a sunny day. The park is bustling with activity and full of cheerful people of all ages. In the foreground, a group of friends are having a picnic on a colorful blanket, laughing and enjoying various snacks and drinks. Nearby, children are playing games like tag and flying kites, their faces beaming with joy.
In the background, a beautiful fountain is surrounded by blooming flowers, adding to the lively ambiance. Couples are strolling hand-in-hand along the pathways, while others are riding bicycles and skateboards. Some people are sitting on benches, reading books or chatting animatedly with friends.
The trees are lush and green, casting dappled shadows on the ground, and birds are flying above or perched on branches. The sky is a bright blue with a few fluffy white clouds, and the sun is shining brightly, casting a warm and inviting glow over the entire scene.
Street performers are entertaining the crowd with music and dance, and a small ice cream cart is attracting a line of excited children. Signs with short words like ‘Fun’, ‘Play’, and ‘Smile’ are placed around the park. The ground has colorful chalk drawings with words like ‘Joy’ and ‘Happy’ written by children.
The overall atmosphere is one of happiness, energy, and community, making everyone feel welcome and engaged.
When too much is packed into the prompt, it seems that not everything gets depicted, as expected.

Generation Result 3
I fed in the first two or three paragraphs of “I Am a Cat” as is (translated by ChatGPT).
I am a cat. I have no name yet. I have no idea where I was born. All I remember is crying “meow meow” in a damp and gloomy place. It was there that I first saw a human. Later, I learned that this human was the most vicious kind, known as a “student.” It is said that students sometimes catch and boil us to eat. However, at that time, I had no particular thoughts and didn’t find it frightening. When I was lifted up in his palm, I only felt a fluffy sensation. The first human face I ever saw was when I sat calmly in the palm and looked up. Even now, I remember thinking it was a strange thing. First, a face should be decorated with fur, but his was smooth like a kettle. Since then, I’ve met many cats, but I’ve never encountered such an odd one. Moreover, his face protruded too much in the middle, and from that hole, he occasionally blew smoke, which was quite choking. I later learned that this was something called a “tobacco” that humans smoke.
Depicting emotions and the like seems to be impossible after all; only the gloomy atmosphere, the kitten, and the tobacco (smoke) element remained.

Conclusion
The drawback is that the VRAM requirement is high because a large language model is used, but the image generation quality seems flawless. If prompt comprehension is your priority, I think it is a reasonable choice.
It is free to try, so give it a go if you are interested.








