Detect specific objects in a video in real time, segment them, and then use natural language to replace, modify, or restyle them!
Sound familiar? Scenes from science-fiction movies are becoming reality!
This means you could replace or modify content in any image or video in real time, even swapping out a particular character in a video.
Author: @skalskip92
Online demo: http://huggingface.co/spaces/SkalskiP/YOLO-World
The YOLO-World + EfficientSAM combination performs zero-shot segmentation on the source clip.
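In this combination, YOLO-World finds objects that match a text prompt and returns bounding boxes. EfficientSAM then turns each box into a mask. The sketch below shows only that data flow and is not the demo's actual code. The detector and segmenter are hypothetical pure-Python stand-ins (`fake_detector`, `segment_from_box`), and the 0.3 confidence threshold is an assumed value.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), end-exclusive pixel coordinates
Mask = List[List[bool]]


@dataclass
class Detection:
    label: str
    score: float
    box: Box


def segment_from_box(height: int, width: int, box: Box) -> Mask:
    # Hypothetical stand-in for a box-prompted segmenter such as EfficientSAM.
    # A real model returns a pixel-accurate object mask; this placeholder just fills the box.
    x0, y0, x1, y1 = box
    return [[x0 <= x < x1 and y0 <= y < y1 for x in range(width)] for y in range(height)]


def detect_then_segment(
    frame_size: Tuple[int, int],
    prompts: Sequence[str],
    detector: Callable[[Sequence[str]], List[Detection]],
    min_score: float = 0.3,  # assumed threshold, not taken from the demo
) -> List[Tuple[Detection, Mask]]:
    height, width = frame_size
    results = []
    for det in detector(prompts):
        if det.score < min_score:
            continue  # drop low-confidence open-vocabulary matches
        # Each surviving box becomes the prompt for the segmenter.
        results.append((det, segment_from_box(height, width, det.box)))
    return results


def fake_detector(prompts: Sequence[str]) -> List[Detection]:
    # Hypothetical detector output for one tiny 5x5 frame.
    return [
        Detection(prompts[0], 0.82, (1, 1, 3, 4)),
        Detection(prompts[0], 0.10, (0, 0, 2, 2)),
    ]


prompts = ["woman walking in red dress"]
out = detect_then_segment((5, 5), prompts, fake_detector)
print(len(out))                              # 1 detection survives the threshold
print(sum(sum(row) for row in out[0][1]))    # 6 mask pixels (2 columns x 3 rows)
```

For a video clip, the same function would run once per frame. The detection stage is what makes the approach zero-shot: changing the text prompt changes what gets segmented, with no retraining.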
Prompt: “woman walking in red dress”
Result: real-time detection of women walking in red dresses.
The prompt can be made more specific to detect only the red dresses worn by the women.
There is also a ComfyUI implementation of YOLO-World + EfficientSAM for anyone interested in trying it:
GitHub: https://github.com/ZHO-ZHO-ZHO/ComfyUI-YoloWorld-EfficientSAM