FFP-300K: Scaling First-Frame Propagation for Generalizable Video Editing
Publication List
A complete list of publications and preprints.
Published Papers
2026
VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Models
Visual Document Understanding and Reasoning: A Multi-Agent Collaboration Framework with Agent-Wise Adaptive Test-Time Scaling
Omni-Attack: Adversarial Attacks on Open-Ended VQA in Black-Box Multimodal LLMs
Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow
SwiftVideo: A Unified Framework for Few-Step Video Generation through Trajectory-Distribution Alignment
2025
Towards Reliable and Holistic Visual In-Context Learning Prompt Selection
Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation
StrandDesigner: Towards Practical Strand Generation with Sketch Guidance
AnyMaker: Zero-shot General Object Customization via Decoupled Dual-Level ID Injection
SVFR: A Unified Framework for Generalized Video Face Restoration
2024-2019
Towards Global Optimal Visual In-Context Learning Prompt Selection
Faster OreFSDet: A Lightweight and Effective Few-Shot Object Detector for Ore Images
PatchMix Augmentation to Identify Causal Features in Few-shot Learning
Exploring Efficient Few-shot Adaptation for Vision Transformers
Split-PU: Hardness-aware Training Strategy for Positive-Unlabeled Learning
The Image Local Autoregressive Transformer
Learning Salient Boundary Feature for Anchor-free Temporal Action Localization
Learning Dynamic Alignment via Meta-filter for Few-shot Learning
Learning a Few-shot Embedding Model by Contrastive Learning
Pose-Guided Person Image Synthesis in the Non-Iconic Views
An Embarrassingly Simple Baseline to One-Shot Learning
Instance Credibility Inference for Few-Shot Learning
Learning to Score Figure Skating Sport Videos
Preprint Papers
DiffFAE: Advancing High-fidelity One-shot Facial Appearance Editing with Space-sensitive Customization and Semantic Preservation
On the Theory of Cross-Modality Distillation with Contrastive Learning
StyleMaster: Towards Flexible Stylized Image Generation with Diffusion Models
VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation