Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max). A PyTorch LLM library that seamlessly integrates with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, etc.
Continue VNNI acceleration support, we add optimization for more CNN models including object detection models, enhance model scales generation support for VNNI.
Add attention based model support, we add Transformer implementation for both lanuage model and translation model.
RNN optimization, We support LSTM integration with MKL-DNN which acheives ~3x performance speedup.
mapPartition
Release Notes