oembed-extractor

Extract oEmbed content from a given URL.

Install & Usage

Node.js

npm i @extractus/oembed-extractor

# pnpm
pnpm i @extractus/oembed-extractor

# yarn
yarn add @extractus/oembed-extractor

// es6 module
import { extract } from '@extractus/oembed-extractor'

const result = await extract('https://www.youtube.com/watch?v=x2bqscVkGxk')
console.log(result)
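
The resolved value is the provider's oEmbed response. As a rough sketch, assuming the provider follows the oEmbed spec (where type is one of photo, video, link or rich), the hypothetical helper toEmbedMarkup below shows one way to turn a result into markup. The helper and the sample object are made up for illustration; they are not part of oembed-extractor and not real output.

```javascript
// Hypothetical helper, not part of oembed-extractor.
// Assumes the result follows the oEmbed spec: `type` is 'photo', 'video', 'link' or 'rich'.
function toEmbedMarkup (result) {
  switch (result.type) {
    case 'photo':
      // photo responses carry a direct image URL plus dimensions
      return `<img src="${result.url}" width="${result.width}" height="${result.height}">`
    case 'video':
    case 'rich':
      // video and rich responses carry ready-made HTML from the provider
      return result.html
    default:
      // 'link' (or anything unknown): nothing embeddable
      return null
  }
}

// Made-up sample shaped like a video response
const sample = {
  type: 'video',
  version: '1.0',
  title: 'Sample video',
  html: '<iframe src="https://example.com/embed/123"></iframe>',
  width: 640,
  height: 360
}

console.log(toEmbedMarkup(sample))
```

The provider's HTML is passed through unchanged here, so sanitize or sandbox it if you do not fully trust the provider.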

Deno

// deno < 1.28
import { extract } from 'https://esm.sh/@extractus/oembed-extractor'

// deno >= 1.28
import { extract } from 'npm:@extractus/oembed-extractor'

Browser

import { extract } from "https://esm.sh/@extractus/oembed-extractor@latest"

Please check the examples for reference.

APIs

.extract()

Load and extract oEmbed data.

Syntax

extract(String url)
extract(String url, Object params)
extract(String url, Object params, Object fetchOptions)

Parameters

url required

URL of a valid oEmbed resource, e.g. https://www.youtube.com/watch?v=x2bqscVkGxk

params optional

Use params to pass additional customizations along with the request.

Commonly used params:

  • maxwidth: maximum width of the embed
  • maxheight: maximum height of the embed
  • theme: e.g. dark or light
  • lang: e.g. 'en', 'fr', 'cn', 'vi', etc.

Note that some params are supported by certain providers but not by others. Check each provider's oEmbed API docs for the exact list.
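
To make the role of params more concrete, here is a minimal sketch of how such values would typically show up as query-string parameters on an oEmbed request, as the oEmbed spec describes for url, format, maxwidth and maxheight. The function buildOembedRequestUrl is hypothetical and only illustrates the idea; it is not the library's internal code.

```javascript
// Illustration only: how extra params would appear on an oEmbed request.
// `buildOembedRequestUrl` is a made-up name, not an oembed-extractor API.
function buildOembedRequestUrl (endpoint, resourceUrl, params = {}) {
  const requestUrl = new URL(endpoint)
  requestUrl.searchParams.set('url', resourceUrl)
  requestUrl.searchParams.set('format', 'json')
  // each extra param becomes one more query-string entry
  for (const [key, value] of Object.entries(params)) {
    requestUrl.searchParams.set(key, String(value))
  }
  return requestUrl.toString()
}

const requestUrl = buildOembedRequestUrl(
  'https://www.youtube.com/oembed',
  'https://www.youtube.com/watch?v=x2bqscVkGxk',
  { maxwidth: 640, maxheight: 360 }
)
console.log(requestUrl)
```

With the library itself, the same values go in the second argument: extract(url, { maxwidth: 640, maxheight: 360 }).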

fetchOptions optional

fetchOptions is an object that can have the following properties:

  • headers: to set request headers
  • proxy: another endpoint to forward the request to
  • agent: an HTTP proxy agent
  • signal: AbortController signal or AbortSignal timeout to terminate the request

For example, use headers to set custom request headers:

import { extract } from '@extractus/oembed-extractor'

const url = 'https://codepen.io/ndaidong/pen/LYmLKBw'
extract(url, null, {
  headers: {
    'user-agent': 'Opera/9.60 (Windows NT 6.0; U; en) Presto/2.1.1'
  }
})

You can also specify a proxy endpoint to load remote content, instead of fetching directly.

For example:

import { extract } from '@extractus/oembed-extractor'

const url = 'https://codepen.io/ndaidong/pen/LYmLKBw'
extract(url, null, {
  headers: {
    'user-agent': 'Opera/9.60 (Windows NT 6.0; U; en) Presto/2.1.1'
  },
  proxy: {
    target: 'https://your-secret-proxy.io/loadJson?url=',
    headers: {
      'Proxy-Authorization': 'Bearer YWxhZGRpbjpvcGVuc2VzYW1l...'
    }
  }
})

With this setting, the request is forwarded to https://your-secret-proxy.io/loadJson?url={OEMBED_ENDPOINT}.
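
The forwarding rule can be sketched as simple string concatenation. This is an illustration under two assumptions: that the oEmbed endpoint URL is percent-encoded before being appended to target, and that CodePen's endpoint is https://codepen.io/api/oembed. The helper buildProxiedUrl is hypothetical.

```javascript
// Sketch of the forwarding rule described above.
// Assumption: the oEmbed endpoint URL is percent-encoded before being appended.
function buildProxiedUrl (target, oembedEndpoint) {
  return target + encodeURIComponent(oembedEndpoint)
}

const forwarded = buildProxiedUrl(
  'https://your-secret-proxy.io/loadJson?url=',
  // assumed endpoint URL, for illustration
  'https://codepen.io/api/oembed?url=https://codepen.io/ndaidong/pen/LYmLKBw&format=json'
)
console.log(forwarded)
```

Encoding matters because the endpoint URL has its own query string; without it, the proxy would read &format=json as one of its own parameters.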

Another way to work with a proxy is to use the agent option instead of proxy:

import { extract } from '@extractus/oembed-extractor'

import { HttpsProxyAgent } from 'https-proxy-agent'

// placeholder proxy URL in the form http://user:password@host:port
const proxy = 'http://abc:password@proxy.example.com:31113'

const url = 'https://codepen.io/ndaidong/pen/LYmLKBw'

const oembed = await extract(url, null, {
  agent: new HttpsProxyAgent(proxy),
})
console.log('Run oembed-extractor with proxy:', proxy)
console.log(oembed)

For more info about https-proxy-agent, check its repo.

By default, there is no request timeout. Use the signal option to cancel a request when needed.

The common way is to use AbortController:

const controller = new AbortController()

// stop after 5 seconds
setTimeout(() => {
  controller.abort()
}, 5000)

const oembed = await extract(url, null, {
  signal: controller.signal,
})

A newer solution is AbortSignal's timeout() static method:

// stop after 5 seconds
const oembed = await extract(url, null, {
  signal: AbortSignal.timeout(5000),
})
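
If you use the AbortController approach in several places, it may help to wrap it so that the timer is always cleared once the request settles. The sketch below is one possible shape. extractWithTimeout is a hypothetical helper, and it takes the extract function as an argument so that it can be tried here with a stand-in instead of a real network call.

```javascript
// Hypothetical wrapper, not part of oembed-extractor.
// `extractFn` is expected to have the same signature as extract(url, params, fetchOptions).
async function extractWithTimeout (extractFn, url, ms, params = null) {
  const controller = new AbortController()
  const timer = setTimeout(() => controller.abort(), ms)
  try {
    return await extractFn(url, params, { signal: controller.signal })
  } finally {
    // always clear the timer so it cannot fire after the request settles
    clearTimeout(timer)
  }
}

// Stand-in for `extract` that never answers and rejects once aborted
const slowExtract = (url, params, { signal }) => new Promise((resolve, reject) => {
  signal.addEventListener('abort', () => reject(new Error('aborted')))
})

extractWithTimeout(slowExtract, 'https://example.com/video', 50)
  .catch((err) => console.log('Request cancelled:', err.message))
```

In real code you would pass the imported extract as the first argument, e.g. extractWithTimeout(extract, url, 5000).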

.setProviderList()

Apply a list of providers to use, overriding the default.

Syntax

setProviderList(Array providers)

Parameters

providers required

List of providers to apply.

For example:

import { setProviderList } from '@extractus/oembed-extractor'

const providers = [
  {
    provider_name: 'Alpha',
    provider_url: 'https://alpha.com',
    endpoints: [
      // endpoint definition here
    ]
  },
  {
    provider_name: 'Beta',
    provider_url: 'https://beta.com',
    endpoints: [
      // endpoint definition here
    ]
  }
]

setProviderList(providers)
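
For reference, entries in the oembed.com provider list describe each endpoint with a schemes array of URL patterns (using * as a wildcard) and the endpoint url. The sketch below fills in one made-up entry and shows, with an illustrative matcher, how such patterns can be tested against a link. The provider 'Alpha', its schemes, its endpoint URL, and the helpers schemeToRegExp and findEndpoint are all hypothetical; the matching logic is not the library's implementation.

```javascript
// Made-up provider entry in the shape used by the oembed.com provider list.
// 'Alpha', its schemes and its endpoint URL are placeholders.
const alpha = {
  provider_name: 'Alpha',
  provider_url: 'https://alpha.com',
  endpoints: [
    {
      schemes: ['https://alpha.com/videos/*', 'https://*.alpha.com/videos/*'],
      url: 'https://alpha.com/api/oembed'
    }
  ]
}

// Illustrative matcher: turn a scheme with `*` wildcards into a regular expression.
function schemeToRegExp (scheme) {
  const escaped = scheme
    .split('*')
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*')
  return new RegExp(`^${escaped}$`)
}

// Return the endpoint URL whose schemes match the given link, or null.
function findEndpoint (provider, url) {
  const match = provider.endpoints.find((endpoint) =>
    endpoint.schemes.some((scheme) => schemeToRegExp(scheme).test(url))
  )
  return match ? match.url : null
}

console.log(findEndpoint(alpha, 'https://alpha.com/videos/42'))
console.log(findEndpoint(alpha, 'https://alpha.com/about'))
```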

The default list of resource providers is synchronized from oembed.com.

To modify the provider list, please open a pull request on iamcal/oembed, then create an issue or PR here to request a sync.

Facebook and Instagram

To work with links from Facebook and Instagram, you need a reviewed Facebook app with the oEmbed Read permission.

When it sees a link from Facebook or Instagram, oembed-extractor looks for the environment variables FACEBOOK_APP_ID and FACEBOOK_CLIENT_TOKEN and uses your app credentials to retrieve the oEmbed data.

For example:

export FACEBOOK_APP_ID=your_app_id
export FACEBOOK_CLIENT_TOKEN=your_client_token

npm run eval https://www.instagram.com/tv/CVlR5GFqF68/
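
When calling the library from your own Node.js code, a small pre-flight check can make a missing variable easier to spot. The helper below is a hypothetical sketch, not something oembed-extractor provides; it only reports which of the two variables named above are unset.

```javascript
// Hypothetical pre-flight check, not part of oembed-extractor.
// Returns the names of the required variables that are missing or empty.
function missingFacebookCredentials (env = process.env) {
  const required = ['FACEBOOK_APP_ID', 'FACEBOOK_CLIENT_TOKEN']
  return required.filter((name) => !env[name])
}

const missing = missingFacebookCredentials()
if (missing.length > 0) {
  console.warn(`Facebook/Instagram links need: ${missing.join(', ')}`)
}
```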

Test

git clone https://github.com/extractus/oembed-extractor.git
cd oembed-extractor
npm i
npm test

Quick evaluation

git clone https://github.com/extractus/oembed-extractor.git
cd oembed-extractor
npm i
npm run eval {URL_TO_PARSE_OEMBED}

License

The MIT License (MIT)

Support the project

If you find value from this open source project, you can support in the following ways:

Thank you.

