What can YOLO-World + EfficientSAM + Stable Diffusion do?

Detect specific objects in video in real time, segment them, and then use natural language to replace, modify, or restyle those objects.
Sound familiar? Scenes from science fiction movies are becoming reality.
This means you can replace or modify content in an image or video in real time, or even swap out a particular character in a video.
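The detect, segment, edit flow described above can be sketched as plain glue code. This is only an illustration under stated assumptions: `edit_frame`, `Detection`, the `min_score` threshold, and the three callables `detect`, `segment`, and `inpaint` are hypothetical names, not the real YOLO-World, EfficientSAM, or Stable Diffusion APIs. The stand-in functions at the bottom are fakes that only show how the stages connect.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Pixel = Tuple[int, int, int]        # RGB
Image = List[List[Pixel]]           # rows of pixels
Mask = List[List[bool]]             # True where the object is
Box = Tuple[int, int, int, int]     # x0, y0, x1, y1 (upper bounds exclusive)


@dataclass
class Detection:
    label: str
    box: Box
    score: float


def edit_frame(
    frame: Image,
    prompt: str,
    edit_prompt: str,
    detect: Callable[[Image, str], List[Detection]],   # stand-in for an open-vocabulary detector
    segment: Callable[[Image, Box], Mask],             # stand-in for a box-prompted segmenter
    inpaint: Callable[[Image, Mask, str], Image],      # stand-in for a text-guided inpainter
    min_score: float = 0.5,                            # hypothetical confidence threshold
) -> Image:
    """Detect objects matching `prompt`, segment each one, and paste edited pixels back."""
    out = [row[:] for row in frame]                    # copy so the input frame is untouched
    for det in detect(frame, prompt):
        if det.score < min_score:
            continue
        mask = segment(frame, det.box)
        edited = inpaint(frame, mask, edit_prompt)
        # Composite: keep edited pixels only inside the mask.
        for y, mask_row in enumerate(mask):
            for x, inside in enumerate(mask_row):
                if inside:
                    out[y][x] = edited[y][x]
    return out


# Fake stages, for illustration only.
def fake_detect(frame: Image, prompt: str) -> List[Detection]:
    return [Detection(label=prompt, box=(1, 1, 3, 3), score=0.9)]


def fake_segment(frame: Image, box: Box) -> Mask:
    x0, y0, x1, y1 = box
    return [[x0 <= x < x1 and y0 <= y < y1 for x in range(len(frame[0]))]
            for y in range(len(frame))]


def fake_inpaint(frame: Image, mask: Mask, edit_prompt: str) -> Image:
    return [[(0, 0, 255)] * len(frame[0]) for _ in frame]


frame = [[(255, 0, 0)] * 4 for _ in range(4)]
result = edit_frame(frame, "red dress", "blue dress",
                    fake_detect, fake_segment, fake_inpaint)
```

In a real setup each fake would be swapped for a call into the corresponding model, and the same loop would run once per video frame.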

Author: @skalskip92

Online demo: http://huggingface.co/spaces/SkalskiP/YOLO-World

Use the YOLO-World + EfficientSAM combination for zero-shot segmentation of the source clip.
Prompt: “woman walking in red dress”
Result: the woman walking in a red dress is detected in real time.

The prompt can be refined further, for example to detect only the woman's red dress.

There is also a ComfyUI implementation of YOLO-World + EfficientSAM for anyone who wants to try it.

GitHub: https://github.com/ZHO-ZHO-ZHO/ComfyUI-YoloWorld-EfficientSAM

