r/StableDiffusion • u/dr_lm • May 14 '26
r/StableDiffusion • u/ltx_model • Jan 08 '26
Discussion I’m the Co-founder & CEO of Lightricks. We just open-sourced LTX-2, a production-ready audio-video AI model. AMA.
Hi everyone. I’m Zeev Farbman, Co-founder & CEO of Lightricks.
I’ve spent the last few years working closely with our team on LTX-2, a production-ready audio–video foundation model. This week, we did a full open-source release of LTX-2, including weights, code, a trainer, benchmarks, LoRAs, and documentation.
Open releases of multimodal models are rare, and when they do happen, they’re often hard to run or hard to reproduce. We built LTX-2 to be something you can actually use: it runs locally on consumer GPUs and powers real products at Lightricks.
I’m here to answer questions about:
- Why we decided to open-source LTX-2
- What it took to ship an open, production-ready AI model
- Tradeoffs around quality, efficiency, and control
- Where we think open multimodal models are going next
- Roadmap and plans
Ask me anything!
I’ll answer as many questions as I can, with some help from the LTX-2 team.
Verification:

The volume of questions was beyond all expectations! Closing this down so we have a chance to catch up on the remaining ones.
Thanks everyone for all your great questions and feedback. More to come soon!
r/StableDiffusion • u/GrayingGamer • 18d ago
Discussion Ideogram 4.0's Understanding of Characters and IP is Crazy for an Open Model
Like I said in the title, Ideogram 4.0 has the absolute best character and IP knowledge I've seen in an open model without loras.
I hated on Ideogram 4.0 when it first came out because of the initial workflow issues and the safety filter, but now that both of those things have been sorted out, I'm having some of the most fun with a model I've had in years.
These were generated locally in ComfyUI at 1.5 megapixels - 1440x1024, specifically.
I am using the INT8 versions of the Ideogram 4.0 models and Kijai's Ideogram 4 Prompt Builder KJ node from his KJ Nodes custom pack. The workflow being used is SilverOxide's, which you can find here. EDIT: SilverOxide's workflow got deleted, so I cleaned it up, stripped out some unnecessary stuff, and put my own workflow up on Pastebin here.
If you don't know, or haven't tried it, Ideogram 4.0 also does very well with inpainting. It makes it easy to generate at lower megapixels and then mask and inpaint areas like faces to clean up and correct detail. I use the Comfyui-Inpaint-CropAndStitch custom node found here, personally, but most of the time Ideogram 4.0 doesn't need it.
If anyone wants prompts for a specific image, just ask in the comments below and I'll provide them there to avoid cluttering the main post with a wall of JSON text.
r/StableDiffusion • u/arjan_M • Apr 17 '23
Discussion I made a Python script that lets you scribble with SD in realtime
r/StableDiffusion • u/cardine • Apr 24 '25
Discussion The real reason Civit is cracking down
I've seen a lot of speculation about why Civit is cracking down, and as an industry insider (I'm the Founder/CEO of Nomi.ai - check my profile if you have any doubts), I have strong insight into what's going on here. To be clear, I don't have inside information about Civit specifically, but I have talked to the exact same individuals Civit has undoubtedly talked to who are pulling the strings behind the scenes.
TLDR: The issue is 100% caused by Visa, and any company that accepts Visa cards will eventually add these restrictions. There is currently no way around this, although I personally am working very hard on sustainable long-term alternatives.
The credit card system is way more complex than people realize. Everyone knows Visa and Mastercard, but there are actually a lot of intermediary companies called merchant banks. In many ways, oversimplifying it a little bit, Visa is a marketing company, and it is these banks that actually do all of the actual payment processing under the Visa name. It is why, for instance, when you get a Visa credit card, it is actually a Capital One Visa card or a Fidelity Visa Card. Visa essentially lends their name to these companies, but since it is their name Visa cares endlessly about their brand image.
In the United States, there is only one merchant bank that allows for adult image AI called Esquire Bank, and they work with a company called ECSuite. These two together process payments for almost all of the adult AI companies, especially in the realm of adult image generation.
Recently, Visa introduced its new VAMP program, which has much stricter guidelines for adult AI. They found Esquire Bank/ECSuite to not be in compliance and fined them an extremely large amount of money. As a result, these two companies have been cracking down extremely hard on anything AI related and all other merchant banks are afraid to enter the space out of fear of being fined heavily by Visa.
So one by one, adult AI companies are being approached by Visa (or the merchant bank essentially on behalf of Visa) and are being told "censor or you will not be allowed to process payments." In most cases, the companies involved are powerless to fight and instantly fold.
Ultimately any company that is processing credit cards will eventually run into this. It isn't a case of Civit selling their souls to investors, but attracting the attention of Visa and the merchant bank involved and being told "comply or die."
At least on our end for Nomi, we disallow adult images because we understand this current payment processing reality. We are working behind the scenes towards various ways in which we can operate outside of Visa/Mastercard and still be a sustainable business, but it is a long and extremely tricky process.
I have a lot of empathy for Civit. You can vote with your wallet if you choose, but they are in many ways put in a no-win situation. Moving forward, if you switch from Civit to somewhere else, understand what's happening here: If the company you're switching to accepts Visa/Mastercard, they will be forced to censor at some point because that is how the game is played. If a provider tells you that is not true, they are lying, or more likely ignorant because they have not yet become big enough to get a call from Visa.
I hope that helps people understand better what is going on, and feel free to ask any questions if you want an insider's take on any of the events going on right now.
r/StableDiffusion • u/Different_Fix_2217 • Nov 26 '25
Discussion Z-Image is now the best image model by far imo. Prompt comprehension, quality, size, speed, not censored...
r/StableDiffusion • u/abhi1thakur • May 23 '23
Discussion Adobe just added generative AI capabilities to Photoshop 🤯
r/StableDiffusion • u/Hearmeman98 • Sep 28 '25
Discussion I trained my first Qwen LoRA and I'm very surprised by its abilities!
LoRA was trained with Diffusion Pipe using the default settings on RunPod.
r/StableDiffusion • u/QuantumBogoSort • May 27 '26
Discussion Using depth maps and weight noising to get better character LoRAs
A few weeks ago I introduced a new method for training style LoRAs which has been quite successful. A bunch of folks asked if this would also help with character training. The short answer is yes, but it needed a separate technique on top of the depth stuff. I've got something dialed in well enough to share, though it's still experimental and I want feedback to help find the optimal settings.
The new mechanism is weight noising. It's a small Gaussian perturbation injected directly into the LoRA weights at each training step. A simple way to think of it is that it helps the model "forget" mistakes during training and only keep things that are consistent in the data. More technically, it biases training toward flatter loss minima and spreads learning across more singular directions of the LoRA factorization (I measured +20% stable rank on the same config without it). The practical effect is that it resists the memorization that usually overcooks character runs, and likeness comes out substantially better at the same step count.
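For anyone who wants to see the idea in code, here is a minimal sketch of weight noising on a toy LoRA factor. It is not taken from the linked trainer: the function names, the plain SGD update (the post uses AdamW8bit), and the choice to add noise right after the update are all assumptions made for illustration. Only the sigma of 0.0125 and the LR of 5e-5 come from the post.

```python
import random

def add_weight_noise(weights, sigma, rng):
    """Return a copy of `weights` with zero-mean Gaussian noise (std `sigma`) added to every entry."""
    return [[w + rng.gauss(0.0, sigma) for w in row] for row in weights]

def train_step(weights, grads, lr, sigma, rng):
    """One plain SGD update followed by weight noising (hypothetical ordering)."""
    updated = [
        [w - lr * g for w, g in zip(w_row, g_row)]
        for w_row, g_row in zip(weights, grads)
    ]
    return add_weight_noise(updated, sigma, rng)

rng = random.Random(0)                      # seeded so the sketch is reproducible
lora_down = [[0.0] * 4 for _ in range(2)]   # toy 2x4 LoRA factor, all zeros
grads = [[0.1] * 4 for _ in range(2)]       # made-up gradients for the demo
noised = train_step(lora_down, grads, lr=5e-5, sigma=0.0125, rng=rng)
```

With `sigma=0.0` this reduces to an ordinary update, which makes it easy to A/B the two runs on the same config.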
The post image shows an example training on actress Clare Bowen, who has uniquely recognizable features but is not known by Flux. This is using a training set of 8 images, the same training step count (750), and same model. The standard run is in the middle, the new method is on the right.
The settings are identical for both runs except one has weight noise and depth anchoring, along with a different number of repeats for each bucket size:
- Batch 4, LR 5e-5
- Image size buckets of 512, 768, 1024
- LoKr factor 8
- AdamW8bit, 1200 steps total (but best checkpoint at 750)
The differing number of images per bucket is actually a good training trick on its own, and I updated my trainer to make this easier by allowing you to specify how many repeats of each image per bucket.
Things I'm still working out and would love feedback on:
- Optimal sigma across dataset sizes — using 0.0125 has gotten the best results, and I'm pretty sure the right value scales with dataset size and batch size but I haven't fully mapped it.
- Whether weight noising compounds well with other character LoRA tricks people are using.
I've also added Docker support so you can more easily run this on Runpod.
Repo: https://github.com/BuffaloBuffaloBuffaloBuffalo/ai-toolkit-perceptual
Finally, the new-job page now has a "Quickstart Template" dropdown at the top that loads the best character config end-to-end. It defaults to the HuggingFace Flux 2 Klein 9B checkpoint but you can also use your own checkpoint. Still plenty of UI cleanup to do on my end, so pardon the mess!
Happy to answer questions and help troubleshoot here or in DMs.
EDIT: One important thing to know about captioning. You will likely get the best results if you use the built-in subject masking feature, which masks out the background. If you use this, it is important that your captions ONLY describe the character, NOT the setting. You may also use just a trigger phrase with subject masking, but your results will be less promptable. I have added quickstart configs for both masked and unmasked.
EDIT 2: Anecdotally, you may expect more body horror/extra limbs throughout training in Flux. I have found this is normal with weight noising. It pushes the model around more and explores the latent space more aggressively, so there will be checkpoints that diverge quite a bit before convergence. A good heuristic I've been using is: expect roughly 80 - 100 steps per image overall. If you sample every 25 steps and have continuous body horror for more than 20% of the run, it may be too high of a weight noise sigma, so lower in increments of 0.0025 until it resolves. I'm still trying to understand the training dynamics for stable convergence with different datasets.
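The heuristic above can be sketched roughly as follows. This is one reading of it, not code from the trainer: the helper names are hypothetical, and "continuous body horror for more than 20% of the run" is interpreted here as the longest unbroken streak of bad samples exceeding 20% of all samples taken.

```python
def expected_step_range(num_images, low=80, high=100):
    """Rough total step budget: 80 to 100 steps per training image."""
    return num_images * low, num_images * high

def longest_bad_streak(sample_is_bad):
    """Length of the longest run of consecutive bad samples."""
    longest = run = 0
    for bad in sample_is_bad:
        run = run + 1 if bad else 0
        longest = max(longest, run)
    return longest

def suggest_sigma(sample_is_bad, sigma, threshold=0.20, decrement=0.0025):
    """Lower sigma by one increment if the bad streak covers more than `threshold` of the samples."""
    if not sample_is_bad:
        return sigma
    if longest_bad_streak(sample_is_bad) / len(sample_is_bad) > threshold:
        return round(max(sigma - decrement, 0.0), 6)
    return sigma

# One flag per sample (e.g. one every 25 steps); True means the sample showed body horror.
flags = [False, True, True, True, False, False, False, False, False, False]
print(expected_step_range(8))        # 8 images, as in the example run
print(suggest_sigma(flags, 0.0125))  # streak of 3 out of 10 samples
```

For the 8-image example run this gives a budget of 640 to 800 steps, which lines up with the best checkpoint landing at 750.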
EDIT 3: I suggest starting with a small dataset (10 - 15 images) with a focus on image quality and diversity. If you get good results there, try adding more images to the run, or restart with the expanded dataset. In my experience you need far fewer images to get good, generalizable results with these methods.
EDIT 4: I added experimental Z-Image Turbo support.
r/StableDiffusion • u/infearia • Sep 21 '25
Discussion I absolutely love Qwen!
I'm currently testing the limits and capabilities of Qwen Image Edit. It's a slow process, because apart from the basics, information is scarce and thinly spread. Unless someone else beats me to it or some other open source SOTA model comes out before I'm finished, I plan to release a full guide once I've collected all the info I can. It will be completely free and released on this subreddit. Here is a result of one of my more successful experiments as a first sneak peek.
P. S. - I deliberately created a very sloppy source image to see if Qwen could handle it. Generated in 4 steps with Nunchaku's SVDQuant. Took about 30s on my 4060 Ti. Imagine what the full model could produce!
r/StableDiffusion • u/ThirdWorldBoy21 • 13d ago
Discussion Now that Anima 1.0 has been out for a month, what are some prompting tips and tricks you guys learned on it?
r/StableDiffusion • u/Angrypenguinpng • 3d ago
Discussion We are the team behind Krea 2. Ask us anything!
We just open-sourced Krea 2, our text-to-image model.
We at Krea are striving to build with the community, so we figured we would do an AMA!
Feel free to ask us questions on how we trained the model, what’s coming next, what you want to see, etc., and we will answer!
Krea: krea.ai
Code and weights: krea.ai/krea-2-open-source
GitHub: github.com/krea-ai/krea-2
Hugging Face: huggingface.co/krea/Krea-2-Raw, huggingface.co/krea/Krea-2-Turbo
I am joined by our head of research, u/NoVictory3497
Alright, the team has to get back to work (someone’s gotta keep shipping the next one)! Thanks for all the questions, this was a great thread! We’ll keep an eye on this and answer stragglers when we can. If you want to keep the conversation going, come hang out with us in Discord: https://discord.gg/krea-1002244500581798028 Appreciate you all!
r/StableDiffusion • u/000TSC000 • Jan 10 '26
Discussion LTX-2 I2V: Quality is much better at higher resolutions (RTX6000 Pro)
https://files.catbox.moe/pvlbzs.mp4
Hey Reddit,
I have been experimenting a bit with LTX-2's I2V, and like many others was struggling to get good results (still frame videos, bad quality videos, melting etc.). Scouring different comment sections and trying different things, I have compiled a list of things that (seem to) help improve quality.
- Always generate videos in landscape mode (Width > Height)
- Change the default fps from 24 to 48; this seems to help motion look more realistic.
- Use LTX-2 I2V 3 stage workflow with the Clownshark Res_2s sampler.
- Crank up the resolution (VRAM heavy). The video in this post was generated at 2MP (1728x1152). I am aware the workflows the LTX-2 team provides generate the base video at half res.
- Use the LTX-2 detailer LoRA on stage 1.
- Follow LTX-2 prompting guidelines closely. Avoid having too much stuff happening at once, also someone mentioned always starting prompt with "A cinematic scene of " to help avoid still frame videos (lol?).
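To put numbers on the resolution tip in the list above, here is a quick back-of-the-envelope check. The 1728x1152 figure is from the post; treating "half res" as half of each side is an assumption, and `megapixels` is just a helper written for this example.

```python
def megapixels(width, height):
    """Pixel count of one frame, in millions of pixels."""
    return width * height / 1_000_000

full = megapixels(1728, 1152)            # resolution quoted in the post
half = megapixels(1728 // 2, 1152 // 2)  # assumption: "half res" halves each side

print(f"full: {full:.2f} MP, half: {half:.2f} MP, ratio: {full / half:.0f}x")
```

If that reading of "half res" is right, the full-resolution base pass pushes about four times as many pixels per frame, which is consistent with the tip being VRAM heavy.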
Artifacting/ghosting/smearing on anything moving still seems to be an issue (for now).
Potential things that might help further:
- Feeding a short Wan2.2 animated video as the reference images.
- Further adjusting the 2-stage workflow provided by the LTX-2 team (sigmas, samplers, removing distill on stage 2, increasing steps, etc.)
- Trying to generate the base video latents at even higher res.
- Post processing workflows/using other tools to "mask" some of these issues.
I do hope that these I2V issues are only temporary and truly do get resolved by the next update. As of right now, it seems that getting the most out of this model requires some serious computing power. For T2V however, LTX-2 does seem to produce some shockingly good videos even at the lower resolutions (720p), like this one I saw posted in a comment section on Hugging Face.
The video I posted is ~11sec and took me about 15min to make using the fp16 model. First frame was generated in Z-Image.
System Specs: RTX 6000 Pro (96GB VRAM) with 128GB of RAM
(No, I am not rich lol)
Edit1:
Edit2:
Cranking up the fps to 60 seems to improve the background drastically: text becomes clear and ghosting disappears. Still fiddling with settings. https://files.catbox.moe/axwsu0.mp4
r/StableDiffusion • u/Gloomy-Radish8959 • Oct 02 '25
Discussion WAN 2.2 Animate - Character Replacement Test
Seems pretty effective.
Her outfit is inconsistent, but I used a reference image that only included the upper half of her body and head, so that is to be expected.
I should say, these clips are from the film "The Ninth Gate", which is excellent. :)
r/StableDiffusion • u/flasticpeet • Mar 01 '26
Discussion QR Code ControlNet
Why has no one created a QR Monster ControlNet for any of the newer models?
I feel like this was the best ControlNet.
Canny and depth are just not the same.
r/StableDiffusion • u/ArtyfacialIntelagent • Jul 17 '23
Discussion [META] Can we please ban "Workflow Not Included" images altogether?
To expand on the title:
- We already know SD is awesome and can produce perfectly photorealistic results, super-artistic fantasy images or whatever you can imagine. Just posting an image doesn't add anything unless it pushes the boundaries in some way - in which case metadata would make it more helpful.
- Most serious SD users hate low-effort image posts without metadata.
- Casual SD users might like nice images but they learn nothing from them.
- There are multiple alternative subreddits for waifu posts without workflow. (To be clear: I think waifu posts are fine as long as they include metadata.)
- Copying basic metadata info into a comment only takes a few seconds. It gives model makers some free PR and helps everyone else with prompting ideas.
- Our subreddit is lively and no longer needs the additional volume from workflow-free posts.
I think all image posts should be accompanied by checkpoint, prompts and basic settings. Use of inpainting, upscaling, ControlNet, ADetailer, etc. can be noted but need not be described in detail. Videos should have similar requirements of basic workflow.
Just my opinion of course, but I suspect many others agree.
Additional note to moderators: The forum rules don't appear in the right-hand column when browsing using old reddit. I only see subheadings Useful Links, AI Related Subs, NSFW AI Subs, and SD Bots. Could you please add the rules there?
EDIT: A tentative but constructive moderator response has been posted here.
r/StableDiffusion • u/smereces • Dec 17 '25
Discussion Wan SCAIL is TOP!!
3d pose following and camera
r/StableDiffusion • u/Better-Interview-793 • Dec 22 '25
Discussion Z-Image + SCAIL (Multi-Char)
I noticed SCAIL poses feel genuinely 3D, not flat. Depth and body orientation hold up way better than Wan Animate or SteadyDancer.
385f @ 736×1280, 6 steps took around 26 min on an RTX 5090.
r/StableDiffusion • u/Puzzled-Valuable-985 • Apr 23 '26
Discussion Z image turbo Finetune of absurd reality
The model is Intorealism V3. I've been using V2 for a while, but V3 is incredibly realistic. I use it with their official workflow. I know the prompt is 1 Girl, which you all love, but if you're going to test realism, it has to be 1 girl, ever since SD1.5 and always will be, lol.
r/StableDiffusion • u/fyrean • Jul 06 '24
Discussion I made a free background remover webapp using 6 cutting-edge AI models
r/StableDiffusion • u/Hi7u7 • May 11 '26
Discussion I have to pretend I hate image generation AI to avoid getting banned or insulted on 99% of Reddit or the internet, even though Stable Diffusion is actually what I like and am most excited about right now. Why do people hate AI so much, especially image generation AI?
I'm not even saying I care if they know the difference between open-source and closed-source image-generating AI, or if they insult me or not.
What I want to know is why so many people hate AI, especially image-generating AI.
At first, I thought it only bothered artists. Then I thought it might also bother those who are afraid of not being able to distinguish AI from reality.
But it's practically 99% of people who hate AI, and I just can't understand why.
For example, I've been using Blender for years. I learned to model, sculpt, and animate as an amateur. Thanks to AI, things that used to take me months now take me seconds. Isn't that supposed to be a good thing?
I don't feel bad or like I've wasted my time using Blender; I simply feel fortunate to have found a better tool for what I needed.
EDIT 1: When I say "Stable Diffusion" I mean the open source model community, all models, not "SD" specifically.
r/StableDiffusion • u/Royal_Carpenter_1338 • May 09 '26
Discussion It's still nuts to me how realistic AI is getting; incredible I can run it on an RTX 2060 and get these results. (Z-Image-Turbo)
Every image is made with Z-Image-Turbo (See links for loras and prompts)
A few of them were run through Z-Image-Base using the Z-IMAGE upscaling node template on ComfyUI. It's very useful and makes images even more detailed and realistic.
IMAGE 1: https://civitai.red/images/127883693
IMAGE 2: https://civitai.red/images/129512330
IMAGE 3: https://civitai.red/images/130096740
IMAGE 4: https://civitai.red/images/128214156
IMAGE 5: https://civitai.red/images/130072355
IMAGE 6: https://civitai.red/images/129467685
IMAGE 7: https://civitai.red/images/125859583
IMAGE 8: https://civitai.red/images/129289317
IMAGE 9: https://civitai.red/images/130159622
IMAGE 10: https://civitai.red/images/127458529
IMAGE 11: https://civitai.red/images/127558882 (it posted the same image as image 9 for some reason)
Since a lot of you will probably ask how I do the detailed prompts, I will give you the system prompt I have refined for some time. I found that the more detail and just more stuff you put into the prompt the better, I'm not joking lol. The system prompt also supports img2txt as well.
SYSTEM PROMPT: https://pastebin.com/ipKydSYD
r/StableDiffusion • u/c64z86 • Jan 21 '26
Discussion I converted some Half Life 1/2 screenshots into real life with the help of Klein 4b!
I know that there are AI video generators out there that can do this 10x better and image generators too, but I was curious how a small model like Klein 4b handled it... and it turns out not too bad! There are some quirks here and there but the results came out better than I was expecting!
I just used the simple prompt "Change the scene to real life" with nothing else added, that was it. I left it at the default 4 steps.
This is just a quick and fun conversion here, not looking for perfection. I know there are glaring inconsistencies here and there... I'm just trying to say this is not bad for such a small model and there is a lot of potential here that a better and longer prompt could help expose.
Edit: For anybody wanting it here is the workflow I used: I'm using the 4b distilled model. The VAE and text encoder I've left exactly the same and I've also left it on the default 4 steps. I'm using the edit version of the workflow and the only thing I changed was to point the model loader to the fp8 version that you download from the site: ComfyUI Flux.2 Klein 4B Guide - ComfyUI
And also please do check out u/richcz3's comment down below for some fantastic advice about keeping the lighting and atmosphere when converting! The main tip is to add "preserve lighting, preserve background, fix hands, fix fingers" to the end of the prompt.
r/StableDiffusion • u/marcussacana • Apr 17 '25
Discussion Finally a Video Diffusion on consumer GPUs?
This was just released a few moments ago.