Show Lab

68 repositories

Awesome-GUI-Agent
Public
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
awesome graphical-user-interface ai-assistant llm-agent gui-agents
6 forks • 147 stars • 0 issues • 0 pull requests • Updated Oct 23, 2024
computer_use_ootb
Public
An out-of-the-box (OOTB) version of Anthropic Claude Computer Use
0 forks • 0 stars • 0 issues • 0 pull requests • Updated Oct 23, 2024
Awesome-Video-Diffusion
Public
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
awesome video-editing video-understanding video-generation diffusion-models text-to-video video-restoration text-to-motion
199 forks • 3.3k stars • 0 issues • 0 pull requests • Updated Oct 23, 2024
Show-o
Public
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
multimodal diffusion-models large-language-models
Python • Apache License 2.0 • 42 forks • 946 stars • 25 issues • 0 pull requests • Updated Oct 22, 2024
Awesome-Unified-Multimodal-Models
Public
📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.
4 forks • 185 stars • 0 issues • 0 pull requests • Updated Oct 22, 2024
videogui
Public
[NeurIPS2024] VideoGUI: A Benchmark for GUI Automation from Instructional Videos
gui video-language llm-agent
JavaScript • 0 forks • 20 stars • 0 issues • 0 pull requests • Updated Oct 22, 2024
LOVA3
Public
(NeurIPS 2024) Learning to Visual Question Answering, Asking and Assessment
benchmark visual-question-answering multimodal-deep-learning visual-question-generation multimodal-large-language-models data-asse
Python • 1 fork • 63 stars • 0 issues • 0 pull requests • Updated Oct 21, 2024
EvolveDirector
Public
[NeurIPS 2024] EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models.
Python • 0 forks • 34 stars • 0 issues • 0 pull requests • Updated Oct 14, 2024
Awesome-MLLM-Hallucination
Public
📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).
12 forks • 422 stars • 1 issue • 0 pull requests • Updated Oct 10, 2024
VideoLISA
Public
[NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
0 forks • 22 stars • 2 issues • 0 pull requests • Updated Oct 3, 2024
MovieSeq
Public
[ECCV2024] Learning Video Context as Interleaved Multimodal Sequences
Jupyter Notebook • 1 fork • 27 stars • 0 issues • 0 pull requests • Updated Oct 1, 2024
GUI-Narrator
Public
Repository of GUI Action Narrator
JavaScript • 0 forks • 3 stars • 0 issues • 0 pull requests • Updated Sep 22, 2024
RingID
Public
Python • 0 forks • 13 stars • 1 issue • 0 pull requests • Updated Aug 30, 2024
MotionDirector
Public
[ECCV 2024 Oral] MotionDirector: Motion Customization of Text-to-Video Diffusion Models.
video-generation diffusion-models text-to-video text-to-motion text-to-video-generation motion-customization
Python • Apache License 2.0 • 49 forks • 832 stars • 20 issues • 0 pull requests • Updated Aug 21, 2024
videollm-online
Public
VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
Python • Apache License 2.0 • 27 forks • 210 stars • 15 issues • 0 pull requests • Updated Aug 15, 2024
X-Adapter
Public
[CVPR 2024] X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model
Python • Apache License 2.0 • 43 forks • 736 stars • 17 issues • 4 pull requests • Updated Aug 14, 2024
afformer
Public
Affordance Grounding from Demonstration Video to Target Image (CVPR 2023)
deep-learning pytorch
Python • 2 forks • 38 stars • 6 issues • 0 pull requests • Updated Jul 26, 2024
BoxDiff
Public
[ICCV 2023] BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
text-to-image-synthesis diffusion-models
Python • 14 forks • 245 stars • 6 issues • 0 pull requests • Updated Jul 21, 2024
cvpr2024-tutorial-video-diffusion-models
Public
HTML • MIT License • 0 forks • 1 star • 0 issues • 0 pull requests • Updated Jul 16, 2024
DragAnything
Public
[ECCV 2024] DragAnything: Motion Control for Anything using Entity Representation
Python • 13 forks • 412 stars • 20 issues • 0 pull requests • Updated Jul 2, 2024
AssistGaze
Public
Python • 0 forks • 1 star • 0 issues • 0 pull requests • Updated Jun 25, 2024
VisInContext
Public
Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
efficient in-context-learning llm mllm
Python • 1 fork • 11 stars • 1 issue • 0 pull requests • Updated Jun 6, 2024
cosmo
Public
Python • 4 forks • 70 stars • 2 issues • 2 pull requests • Updated May 10, 2024
EgoVLP
Public
[NeurIPS2022] Egocentric Video-Language Pretraining
pretraining video-language egocentric-vision pytorch
Python • 20 forks • 224 stars • 5 issues • 0 pull requests • Updated May 9, 2024
UniVTG
Public
[ICCV2023] UniVTG: Towards Unified Video-Language Temporal Grounding
video-summarization video-grounding pretraining moment-retrieval highlight-detection video-language
Python • MIT License • 28 forks • 317 stars • 19 issues • 0 pull requests • Updated May 8, 2024
VisorGPT
Public
[NeurIPS 2023] Customize spatial layouts for conditional image synthesis models, e.g., ControlNet, using GPT
image-generation gpt diffusion-models controlnet
Python • MIT License • 2 forks • 131 stars • 4 issues • 0 pull requests • Updated May 4, 2024
Long-form-Video-Prior
Public
Python • 0 forks • 23 stars • 0 issues • 0 pull requests • Updated May 3, 2024
assistgui
Public
JavaScript • 1 fork • 23 stars • 1 issue • 0 pull requests • Updated Apr 16, 2024
T2VScore
Public
T2VScore: Towards A Better Metric for Text-to-Video Generation
1 fork • 77 stars • 3 issues • 0 pull requests • Updated Apr 10, 2024
sparseformer
Public
(ICLR 2024, CVPR 2024) SparseFormer
computer-vision transformer efficient-neural-networks vision-transformer sparseformer
Python • MIT License • 1 fork • 62 stars • 1 issue • 0 pull requests • Updated Mar 30, 2024