An LLM semantic caching system that reduces response time by serving answers from cached query-result pairs instead of re-querying the model.
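The query-result caching described above can be sketched as a similarity lookup over stored query embeddings. This is a minimal illustration, not the project's implementation: the `embed` function here is a toy character-frequency stand-in for a real embedding model, and the `SemanticCache` class, its `threshold` parameter, and the cosine-similarity matching are assumptions for the sketch.

```python
import math
from typing import Optional


def embed(text: str) -> list[float]:
    # Toy embedding: character-frequency vector.
    # A real system would use a sentence or CLIP embedding model here.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity; 0.0 when either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


class SemanticCache:
    """Cache that returns a stored answer when a new query is
    semantically close enough to a previously seen query."""

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.entries: list[tuple[list[float], str, str]] = []

    def put(self, query: str, answer: str) -> None:
        # Store the query embedding alongside the query-result pair.
        self.entries.append((embed(query), query, answer))

    def get(self, query: str) -> Optional[str]:
        # Return the best-matching cached answer, or None on a miss.
        q = embed(query)
        best_answer, best_sim = None, 0.0
        for emb, _, answer in self.entries:
            sim = cosine(q, emb)
            if sim > best_sim:
                best_answer, best_sim = answer, sim
        return best_answer if best_sim >= self.threshold else None
```

A production cache would replace the linear scan with a vector index and the toy embedding with a real model, but the hit/miss flow is the same: embed the query, find the nearest stored query, and serve its answer if the similarity clears the threshold.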
Fixed a bug in the remove module. Added CLIP embedding support as a first step toward multi-modal scenarios.