Utilities to use the Hugging Face Hub API
The main features for this release are:
- request and streamingRequest
- Individual function imports as an alternative to the HfInference class: great for tree-shaking
- A new featureExtraction (the existing featureExtraction task was renamed to sentenceSimilarity, oops!), credits @radames
The other changes for recent versions are detailed at the end (including textGenerationStream for streaming text generation, ...)
Inference Endpoints are the next step up from the Inference API when you want to run a specific model in production.
The different tiers for inference are:
Here's how you can call an inference endpoint:
import { HfInference } from "@huggingface/inference";
// Pass your own access token here
const inference = new HfInference("hf_...");
const gpt2 = inference.endpoint('https://xyz.eu-west-1.aws.endpoints.huggingface.cloud/gpt2');
const { generated_text } = await gpt2.textGeneration({inputs: 'The answer to the universe is'});
You can even use the free inference API backend with this syntax:
const endpoint = inference.endpoint("https://api-inference.huggingface.co/models/google/flan-t5-xxl");
const { generated_text } = await endpoint.textGeneration({
  inputs: "one plus two equals",
});
It's easy to switch between Inference API & Inference Endpoints. So easy, that you can even do this:
await inference.textGeneration({
  model: 'https://xyz.eu-west-1.aws.endpoints.huggingface.cloud/gpt2',
  inputs: 'The answer to the universe is'
});
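One way to picture why the same model field can hold either a model ID or an endpoint URL is the sketch below. It assumes that anything that looks like a full URL is used as-is, and that a bare model ID is appended to the Inference API base URL seen above. resolveInferenceUrl is a hypothetical helper written for illustration, not a function exported by @huggingface/inference, and the library's real logic may differ.

```typescript
// Hypothetical helper for illustration only; not part of @huggingface/inference.
// Base URL taken from the Inference API example earlier in these notes.
const HF_INFERENCE_API_BASE = "https://api-inference.huggingface.co/models/";

function resolveInferenceUrl(model: string): string {
  // Assumption: a value starting with http:// or https:// is a full endpoint URL.
  if (/^https?:\/\//.test(model)) {
    return model;
  }
  // Otherwise treat the value as a model ID hosted on the Inference API.
  return HF_INFERENCE_API_BASE + model;
}
```

With this kind of resolution, switching from the Inference API to an Inference Endpoint only means changing the string passed as model.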
@huggingface/inference supports tasks from https://huggingface.co/tasks, and is typed accordingly. But what if your model has additional inputs, or even custom inputs or outputs?
You can now use .request and .streamingRequest!
const output = await inference.request({
  inputs: "blablabla",
  parameters: {
    custom_parameter_1: ...,
    ...
  }
});
For streaming responses, use streamingRequest rather than request.
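To make the idea of a custom call more concrete, here is a rough sketch of the kind of HTTP request such a call could boil down to, assuming a POST with a bearer-token header and a JSON body carrying inputs and parameters. prepareRequest, CustomPayload and PreparedRequest are hypothetical names for this illustration and are not part of @huggingface/inference. The sketch only builds the request and sends nothing.

```typescript
// Hypothetical types and helper for illustration only.
interface CustomPayload {
  inputs: unknown;
  // Any model-specific parameters; names are up to your model.
  parameters?: Record<string, unknown>;
}

interface PreparedRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function prepareRequest(accessToken: string, url: string, payload: CustomPayload): PreparedRequest {
  return {
    url,
    init: {
      method: "POST",
      headers: {
        // Assumption: the token is sent as a standard bearer token.
        Authorization: `Bearer ${accessToken}`,
        "Content-Type": "application/json",
      },
      // The payload is serialized untouched, so custom fields pass through as-is.
      body: JSON.stringify(payload),
    },
  };
}
```

The result could then be handed to fetch(prepared.url, prepared.init), which is roughly where an untyped call leaves you: you own the payload shape and the parsing of the response.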
All existing tasks can use request or streamingRequest instead :exploding_head:
const {generated_text} = await inference.textGeneration({model: "gpt2", inputs: "The answer to the universe is"});
// Small difference when switching from .textGeneration to .request: the raw response is an array
const [{generated_text}] = await inference.request({model: "gpt2", inputs: "The answer to the universe is"});
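Since the raw response is an array and is not validated for you, it can be worth unwrapping it defensively. The following is a minimal sketch, assuming the raw text-generation response has the shape shown above (an array of objects with a generated_text string). firstGeneratedText is a hypothetical helper, not part of the library.

```typescript
// Hypothetical helper for illustration only.
interface TextGenerationItem {
  generated_text: string;
}

function firstGeneratedText(raw: unknown): string {
  // The raw response is expected to be a non-empty array.
  if (!Array.isArray(raw) || raw.length === 0) {
    throw new TypeError("Expected a non-empty array as raw response");
  }
  const first = raw[0] as Partial<TextGenerationItem> | null;
  // Each item is expected to carry a generated_text string.
  if (typeof first?.generated_text !== "string") {
    throw new TypeError("Expected the first item to have a generated_text string");
  }
  return first.generated_text;
}
```

A call would look like firstGeneratedText(await inference.request({ model: "gpt2", inputs: "..." })).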
for await (const output of inference.textGenerationStream({
  model: "google/flan-t5-xxl",
  inputs: "Repeat 'one two three four'"
})) {}
// is equivalent to
for await (const output of inference.streamingRequest({
  model: "google/flan-t5-xxl",
  inputs: "Repeat 'one two three four'"
})) {}
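The loop bodies above are left empty, so here is a small sketch of what consuming such a stream might look like. It uses a fake async generator instead of a real network call, and it assumes each streamed chunk exposes the new text as token.text. StreamChunk, fakeTokenStream and collectText are hypothetical names for this illustration only.

```typescript
// Assumed chunk shape for illustration; check the library's types for the real one.
interface StreamChunk {
  token: { text: string };
}

// Stand-in for a streaming call: yields one chunk per token, no network involved.
async function* fakeTokenStream(tokens: string[]): AsyncGenerator<StreamChunk> {
  for (const text of tokens) {
    yield { token: { text } };
  }
}

// Accumulates the text of every chunk of any AsyncIterable of chunks.
async function collectText(stream: AsyncIterable<StreamChunk>): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk.token.text;
  }
  return text;
}
```

Because collectText only relies on the AsyncIterable interface, the same function would work on anything that yields chunks of that shape.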
Of course, request and streamingRequest can also be used with Inference Endpoints! Actually, if you make your own custom models and inputs / outputs for your business use case, it'll probably be what you use.
You don't like the current API, you don't like classes, and want the strict minimum in your bundle? No need to say more, I know which frontend framework (or should I say library ;)) you use.
Don't worry, you can import individual functions - this release of @huggingface/inference is all about choice and flexibility:
import { textGeneration } from "@huggingface/inference";
await textGeneration({
  accessToken: "hf_...", // new param
  model: "gpt2", // or your own inference endpoint
  inputs: "The best, most efficient and purest frontend framework is: "
});
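If passing accessToken on every call gets repetitive, one option is to bind it once around the functions you import. This is only a sketch of that design choice: withAccessToken and describeCall are hypothetical names, and describeCall is a local stand-in for a task function that makes no network call.

```typescript
// Hypothetical argument shape, loosely modeled on the call above.
type CallArgs = { accessToken?: string; model: string; inputs: string };

// Stand-in for a standalone task function; it only describes the call.
function describeCall(args: CallArgs): string {
  return `${args.model} <- ${args.inputs} (token ${args.accessToken ? "set" : "missing"})`;
}

// Returns a version of fn that no longer needs accessToken in its arguments.
function withAccessToken<A extends { accessToken?: string }, R>(
  accessToken: string,
  fn: (args: A) => R
): (args: Omit<A, "accessToken">) => R {
  return (args) => fn({ ...args, accessToken } as unknown as A);
}

// Usage: bind once, then call without repeating the token.
const describeWithToken = withAccessToken("hf_...", describeCall);
```

Only the functions you actually wrap end up in the bundle, which keeps the tree-shaking benefit while restoring some of the convenience of the class.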
- questionAnswer and tableQuestionAnswer have been renamed to questionAnswering and tableQuestionAnswering
- featureExtraction has been renamed to sentenceSimilarity, and a new featureExtraction was created :bow:
- textGenerationStream to generate streaming content by returning an AsyncIterable. Yay for for await! Credits to @vvmnnnkv.
- imageToText to caption images among other things. Credits to @vvmnnnkv.
- request or streamingRequest to skip this validation. Credits to @mishig25