Pyalex

A Python library for OpenAlex (openalex.org)

PyAlex - a Python wrapper for OpenAlex

PyAlex is a Python library for OpenAlex. OpenAlex is an index of hundreds of millions of interconnected scholarly papers, authors, institutions, and more. OpenAlex offers a robust, open, and free REST API to extract, aggregate, or search scholarly data. PyAlex is a lightweight and thin Python interface to this API. PyAlex tries to stay as close as possible to the design of the original service.

The following features of OpenAlex are currently supported by PyAlex:

Get single entities
Filter entities
Search entities
Group entities
Search filters
Select fields
Sample
Pagination
Autocomplete endpoint
N-grams
Authentication

We aim to cover the entire API and are looking for help. Pull Requests are welcome.

Key features

Pipe operations - PyAlex can handle multiple operations in a sequence. This allows the developer to write understandable queries. For examples, see code snippets.
Plaintext abstracts - OpenAlex doesn't include plaintext abstracts due to legal constraints. PyAlex can convert the inverted abstracts into plaintext abstracts on the fly.
Permissive license - OpenAlex data is CC0 licensed. PyAlex is published under the MIT license.

Installation

PyAlex requires Python 3.8 or later.

pip install pyalex

Getting started

PyAlex offers support for all Entity Objects: Works, Authors, Sources, Institutions, Concepts, Topics, Publishers, and Funders.

from pyalex import Works, Authors, Sources, Institutions, Concepts, Topics, Publishers, Funders

The polite pool

The polite pool has much faster and more consistent response times. To get into the polite pool, set your email address:

import pyalex

pyalex.config.email = "mail@example.com"
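For the curious: joining the polite pool amounts to identifying yourself in each request. The sketch below shows what such a raw request URL could look like, assuming the `mailto` query parameter described in the OpenAlex documentation. `with_mailto` is a hypothetical helper for illustration, not part of PyAlex.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_mailto(url: str, email: str) -> str:
    """Return `url` with a mailto query parameter appended (hypothetical helper)."""
    parts = urlsplit(url)
    # keep any existing query parameters and add mailto at the end
    query = parse_qsl(parts.query)
    query.append(("mailto", email))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_mailto("https://api.openalex.org/works?filter=is_oa:true", "mail@example.com"))
```

`urlencode` percent-encodes `:` and `@`, which the server decodes again, so the parameters arrive unchanged.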

Max retries

By default, PyAlex raises an error at the first failure when querying the OpenAlex API. Set max_retries to a number higher than 0 to let PyAlex retry when an error occurs. retry_backoff_factor controls the delay between two retries, and retry_http_codes lists the HTTP status codes that should trigger a retry.

from pyalex import config

config.max_retries = 0  # default: fail on the first error; e.g. 3 allows up to three retries
config.retry_backoff_factor = 0.1
config.retry_http_codes = [429, 500, 503]
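To get a feel for retry_backoff_factor, the sketch below computes a retry schedule. It assumes the exponential formula used by urllib3's `Retry` (`backoff_factor * 2 ** (n - 1)` seconds before the n-th retry). Whether PyAlex applies exactly this formula, and whether the first retry waits at all, depends on the installed library versions, so treat the numbers as indicative only.

```python
from typing import List


def backoff_delays(backoff_factor: float, max_retries: int) -> List[float]:
    """Delay in seconds before each retry, assuming exponential backoff."""
    # the n-th retry (1-based) waits backoff_factor * 2 ** (n - 1) seconds
    return [backoff_factor * 2 ** (n - 1) for n in range(1, max_retries + 1)]


print(backoff_delays(0.1, 3))
```

With max_retries = 0 the schedule is empty, matching the default of failing on the first error.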

Get single entity

Get a single Work, Author, Source, Institution, Concept, Topic, Publisher or Funder from OpenAlex by its OpenAlex ID, or by an external ID such as a DOI or ROR ID.

Works()["W2741809807"]

# same as
Works()["https://doi.org/10.7717/peerj.4375"]

The result is a Work object, which is very similar to a dictionary. Find the available fields with .keys().

For example, get the open access status:

Works()["W2741809807"]["open_access"]

{'is_oa': True, 'oa_status': 'gold', 'oa_url': 'https://doi.org/10.7717/peerj.4375'}

The same works for Authors, Sources, Institutions, Concepts, Topics, Publishers and Funders. For example, look up an author by OpenAlex ID or ORCID:

Authors()["A2887243803"]
Authors()["https://orcid.org/0000-0002-4297-0502"]  # same
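Each lookup above maps to a single request against the OpenAlex REST API, which accepts OpenAlex IDs as well as full external ID URLs (DOI, ORCID, ROR) in the path. The helper below is a hypothetical sketch of that mapping for illustration; it is not PyAlex's internal code.

```python
from urllib.parse import quote

API_ROOT = "https://api.openalex.org"


def entity_url(entity: str, identifier: str) -> str:
    """Build a single-entity URL, e.g. entity='works', identifier='W2741809807'."""
    # external IDs such as DOIs stay in URL form, so ':' and '/' are left unescaped
    return f"{API_ROOT}/{entity}/{quote(identifier, safe=':/')}"


print(entity_url("works", "W2741809807"))
print(entity_url("works", "https://doi.org/10.7717/peerj.4375"))
```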

Get random

Get a random Work, Author, Source, Institution, Concept, Topic, Publisher or Funder.

Works().random()
Authors().random()
Sources().random()
Institutions().random()
Concepts().random()
Topics().random()
Publishers().random()
Funders().random()

Get abstract

Only for Works. Request a work from the OpenAlex database:

w = Works()["W3128349626"]

All attributes documented for Works are available, as well as abstract (only if abstract_inverted_index is not None). This human-readable abstract is created on the fly.

w["abstract"]

'Abstract To help researchers conduct a systematic review or meta-analysis as efficiently and transparently as possible, we designed a tool to accelerate the step of screening titles and abstracts. For many tasks—including but not limited to systematic reviews and meta-analyses—the scientific literature needs to be checked systematically. Scholars and practitioners currently screen thousands of studies by hand to determine which studies to include in their review or meta-analysis. This is error prone and inefficient because of extremely imbalanced data: only a fraction of the screened studies is relevant. The future of systematic reviewing will be an interaction with machine learning algorithms to deal with the enormous increase of available text. We therefore developed an open source machine learning-aided pipeline applying active learning: ASReview. We demonstrate by means of simulation studies that active learning can yield far more efficient reviewing than manual reviewing while providing high quality. Furthermore, we describe the options of the free and open source research software and present the results from user experience tests. We invite the community to contribute to open source projects such as our own that provide measurable and reproducible improvements over current practice.'

Please respect the legal constraints when using this feature.
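The conversion itself is simple: abstract_inverted_index maps each word to the list of positions where it occurs, so rebuilding the text means placing every word at its positions and joining the words in order. A minimal sketch of that idea follows; PyAlex's own implementation may differ in details.

```python
from typing import Dict, List, Optional


def invert_abstract(inverted_index: Optional[Dict[str, List[int]]]) -> Optional[str]:
    """Rebuild plaintext from an OpenAlex-style inverted index."""
    if inverted_index is None:
        # no inverted index, no abstract
        return None
    # collect (position, word) pairs, then sort them by position
    positioned = [
        (pos, word)
        for word, positions in inverted_index.items()
        for pos in positions
    ]
    return " ".join(word for _, word in sorted(positioned))


example = {"Open": [0], "data": [1, 4], "is": [2], "useful": [3]}
print(invert_abstract(example))
```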

Get lists of entities

results = Works().get()

For lists of entities, you can also count the number of records found instead of returning the results. This also works for search queries and filters.

Works().count()
# 10338153

For lists of entities, you can return the result as well as the metadata. By default, only the results are returned.

results, meta = Topics().get(return_meta=True)

print(meta)
{'count': 65073, 'db_response_time_ms': 16, 'page': 1, 'per_page': 25}
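The meta dictionary is enough to work out how many pages a query spans. A small sketch using the values printed above (plain arithmetic, no extra API calls):

```python
import math

meta = {"count": 65073, "db_response_time_ms": 16, "page": 1, "per_page": 25}

# number of pages needed to cover every record at the current page size
n_pages = math.ceil(meta["count"] / meta["per_page"])
print(n_pages)
```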

Filter records

Works().filter(publication_year=2020, is_oa=True).get()

which is identical to:

Works().filter(publication_year=2020).filter(is_oa=True).get()
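Both calls should produce the same request, because OpenAlex combines filters as comma-separated attribute:value pairs that act as AND. The hypothetical filter_param helper below sketches that serialization, including lowercasing booleans; it is an illustration, not PyAlex's implementation.

```python
def filter_param(**filters) -> str:
    """Serialize keyword filters into OpenAlex's filter= syntax (hypothetical helper)."""
    def render(value) -> str:
        # OpenAlex expects lowercase true/false
        if isinstance(value, bool):
            return str(value).lower()
        return str(value)

    # comma-separated pairs are combined with AND by OpenAlex
    return ",".join(f"{key}:{render(value)}" for key, value in filters.items())


print(filter_param(publication_year=2020, is_oa=True))
```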

Nested attribute filters

OpenAlex separates the levels of nested attribute filters with dots, for example authorships.institutions.ror.

For nested attribute filters, use a dict to build the query:

Works() \
  .filter(authorships={"institutions": {"ror": "04pp8hn57"}}) \
  .get()
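A nested dict is a convenient way to spell a dotted attribute path. The sketch below flattens such a dict into the authorships.institutions.ror:... form that OpenAlex expects; flatten_filter is a hypothetical helper, not part of PyAlex.

```python
from typing import Any, Dict, List


def flatten_filter(filters: Dict[str, Any], prefix: str = "") -> List[str]:
    """Turn {'a': {'b': 'x'}} into ['a.b:x'], recursing into nested dicts."""
    parts = []
    for key, value in filters.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # descend one level and extend the dotted path
            parts.extend(flatten_filter(value, path))
        else:
            # OpenAlex expects lowercase true/false for booleans
            rendered = str(value).lower() if isinstance(value, bool) else str(value)
            parts.append(f"{path}:{rendered}")
    return parts


print(",".join(flatten_filter({"authorships": {"institutions": {"ror": "04pp8hn57"}}})))
```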

Search entities

OpenAlex reference: The search parameter

Works().search("fierce creatures").get()

Search filter

OpenAlex reference: The search filter

Authors().search_filter(display_name="einstein").get()

Works().search_filter(title="cubist").get()

Funders().search_filter(display_name="health").get()

Sort entity lists

OpenAlex reference: Sort entity lists.

Works().sort(cited_by_count="desc").get()

Select

OpenAlex reference: Select fields.

Works().filter(publication_year=2020, is_oa=True).select(["id", "doi"]).get()

Sample

OpenAlex reference: Sample entity lists.

Works().sample(100, seed=535).get()

Logical expressions

OpenAlex reference: Logical expressions

Inequality:

Sources().filter(works_count=">1000").get()

Negation (NOT):

Institutions().filter(country_code="!us").get()

Intersection (AND):

Works().filter(institutions={"country_code": ["fr", "gb"]}).get()

# same
Works().filter(institutions={"country_code": "fr"}).filter(institutions={"country_code": "gb"}).get()

Addition (OR):

Works().filter(institutions={"country_code": "fr|gb"}).get()
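In the request itself, these expressions are just characters inside the filter value: > or < for inequality, a leading ! for negation, + between values for AND and | for OR. The sketch below assumes a list value is joined with +, which is consistent with the list and chained AND examples above giving the same result; render_value is hypothetical.

```python
def render_value(value) -> str:
    """Render a filter value; lists become AND-joined with '+' (hypothetical helper)."""
    if isinstance(value, (list, tuple)):
        return "+".join(str(v) for v in value)
    # strings such as ">1000", "!us" or "fr|gb" pass through unchanged
    return str(value)


print("institutions.country_code:" + render_value(["fr", "gb"]))
print("institutions.country_code:" + render_value("fr|gb"))
print("works_count:" + render_value(">1000"))
```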

Paging

OpenAlex offers two methods for paging: basic (offset) paging and cursor paging. Both methods are supported by PyAlex.

Cursor paging (default)

Use the method paginate() to paginate results. Each returned page is a list of records, with at most per_page records (default 25). By default, the n_max argument of paginate() is set to 10000; set n_max=None to retrieve all results.

from pyalex import Authors

pager = Authors().search_filter(display_name="einstein").paginate(per_page=200)

for page in pager:
    print(len(page))

Looking for an easy method to iterate the records of a pager?

from itertools import chain
from pyalex import Authors

query = Authors().search_filter(display_name="einstein")

for record in chain(*query.paginate(per_page=200)):
    print(record["id"])
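Cursor paging follows a simple protocol in the OpenAlex API: the first request sends cursor=*, and each response's meta.next_cursor is passed to the next request until it comes back empty. The sketch below mimics that loop against an in-memory stand-in for the API so it runs offline. fake_api, RECORDS and cursor_pages are hypothetical, and real OpenAlex cursors are opaque strings rather than the offsets used here.

```python
from typing import Dict, Iterator, List, Optional

# Hypothetical in-memory stand-in for the API: 5 records, served page by page.
RECORDS = [{"id": f"A{i}"} for i in range(5)]


def fake_api(cursor: str, per_page: int) -> Dict:
    """Return one page plus meta.next_cursor, shaped like an OpenAlex list response."""
    # "*" starts from the beginning; other cursors are plain offsets in this stand-in
    start = 0 if cursor == "*" else int(cursor)
    end = start + per_page
    next_cursor = str(end) if end < len(RECORDS) else None
    return {"results": RECORDS[start:end], "meta": {"next_cursor": next_cursor}}


def cursor_pages(per_page: int = 2, n_max: Optional[int] = None) -> Iterator[List[Dict]]:
    """Yield pages until the cursor runs out or at least n_max records were yielded."""
    cursor: Optional[str] = "*"
    seen = 0
    while cursor is not None and (n_max is None or seen < n_max):
        response = fake_api(cursor, per_page)
        page = response["results"]
        if not page:
            break
        yield page
        seen += len(page)
        cursor = response["meta"]["next_cursor"]


print([len(page) for page in cursor_pages()])
```

Because whole pages are yielded, this sketch can return slightly more than n_max records; PyAlex may handle that boundary differently.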

Basic paging

See limitations of basic paging in the OpenAlex documentation.

from pyalex import Authors

pager = Authors().search_filter(display_name="einstein").paginate(method="page", per_page=200)

for page in pager:
    print(len(page))
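Basic paging requests numbered pages (page=1, 2, ...). Its main limitation, per the OpenAlex documentation, is that it only reaches the first 10,000 results of a query; beyond that, cursor paging is needed. A quick sketch of which page numbers are reachable, with the cap as a parameter in case the limit changes (reachable_pages is hypothetical):

```python
import math
from typing import List


def reachable_pages(count: int, per_page: int, limit: int = 10_000) -> List[int]:
    """Page numbers that basic paging can fetch, given the result cap `limit`."""
    reachable = min(count, limit)
    return list(range(1, math.ceil(reachable / per_page) + 1))


pages = reachable_pages(count=65073, per_page=200)
print(len(pages), pages[-1])
```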

Autocomplete

OpenAlex reference: Autocomplete entities.

Autocomplete a string:

from pyalex import autocomplete

autocomplete("stockholm resilience centre")

Autocomplete a string to get a specific type of entities:

from pyalex import Institutions

Institutions().autocomplete("stockholm resilience centre")

You can also use the filters to autocomplete:

from pyalex import Works

r = Works().filter(publication_year=2023).autocomplete("planetary boundaries")

Get N-grams

OpenAlex reference: Get N-grams.

Works()["W2023271753"].ngrams()

Code snippets

A list of awesome use cases of the OpenAlex dataset.

Cited publications (referenced works)

from pyalex import Works

# the work to extract the referenced works of
w = Works()["W2741809807"]

Works()[w["referenced_works"]]
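A work can cite hundreds of references, while OpenAlex caps how many values can be OR-ed together in one filter (check the OpenAlex documentation for the current limit). A hedged sketch that splits the IDs into batches first; the batch size of 50 is an assumption and batched is a hypothetical helper.

```python
from typing import Iterator, List, Sequence


def batched(ids: Sequence[str], size: int = 50) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids` with at most `size` items each."""
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


ids = [f"https://openalex.org/W{i}" for i in range(120)]
print([len(batch) for batch in batched(ids)])
```

Each batch could then be looked up with Works()[batch] and the results concatenated. On Python 3.12+, itertools.batched offers the same splitting but yields tuples.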

Get works of a single author

from pyalex import Works

Works().filter(author={"id": "A2887243803"}).get()

Dataset publications in the global south

from pyalex import Works

# dataset publications by Global South institutions, counted per country
w = Works() \
  .filter(institutions={"is_global_south": True}) \
  .filter(type="dataset") \
  .group_by("institutions.country_code") \
  .get()

Most cited publications in your organisation

from pyalex import Works

Works() \
  .filter(authorships={"institutions": {"ror": "04pp8hn57"}}) \
  .sort(cited_by_count="desc") \
  .get()

Experimental

Authentication

OpenAlex is currently experimenting with authenticated requests. Authenticate your requests with:

import pyalex

pyalex.config.api_key = "<MY_KEY>"

Alternatives

R users can use the excellent OpenAlexR library.

License

MIT

Contact

This library is a community contribution. The authors of this Python library aren't affiliated with OpenAlex.

Feel free to reach out with questions, remarks, and suggestions. The issue tracker is a good starting point. You can also email me at [email protected].

README source: J535D165/pyalex
