rcsb-api - Access RCSB PDB APIs and data from Python

The rcsb-api package provides a Python interface to RCSB PDB API services. Use it to search and fetch macromolecular structure data from RCSB PDB at RCSB.org.

Availability

Install it from PyPI via pip or uv:

pip install rcsb-api

# or, if using uv:
uv pip install rcsb-api

Alternatively, download the source from GitHub.
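After installing, a quick standard-library check can confirm that the package is visible to the active Python environment. This is a minimal sketch and not part of rcsb-api itself; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "rcsb-api") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    `installed_version` is a hypothetical helper for illustration only.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("rcsb-api is not installed in this environment")
    else:
        print(f"rcsb-api {found} is installed")
```

If the check reports that the package is missing, the most common cause is that pip or uv installed it into a different environment than the one running the script.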

Overview

The rcsb-api package provides a simple, Pythonic interface to the suite of RCSB PDB APIs for querying and fetching PDB data. Each API service is exposed as a separate module (sub-package) of the Python client, with the following key functionality:

  • Search API module (rcsbapi.search):

    • Perform all search types available through the RCSB.org Advanced Search builder (e.g., full-text, attribute-based, sequence and structure similarity, sequence and structure motif)

    • Use simple Boolean logic to intuitively construct complex or nested queries

    • Upload custom structure files for structure similarity searches

    • Include computed structure models (CSMs) in search results

  • Data API module (rcsbapi.data):

    • Retrieve any subset of metadata, features, and/or annotations for a given list of PDB IDs (e.g., experimental method details, structural annotations, binding sites, etc.)

    • Easily fetch data for all structures across the archive

    • Construct GraphQL queries using a simplified Python syntax

  • Sequence Coordinate API module (rcsbapi.sequence):

    • Query alignments between structural and sequence databases as well as protein positional annotations/features integrated from multiple resources

    • Alignment data is available for NCBI RefSeq (including protein and genomic sequences), UniProt and PDB sequences

    • Protein positional features are integrated from UniProt, CATH, SCOPe and RCSB PDB and collected from the RCSB PDB Data Warehouse

  • Model API module (rcsbapi.model):

    • Access molecular structure data (e.g., atomic coordinates) and related information in mmCIF or BCIF format

    • Query for various structural data types, such as full structure, ligands, atoms, residue interactions, and more

    • Extract specific slices of a structure data file (for bulk downloads, use the RCSB PDB download services instead)
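As a rough illustration of the Search and Data modules described above, the sketch below combines a full-text query with an attribute query using Boolean logic, then fetches experimental method details for a list of PDB IDs. It follows the usage shown in the project's README (`TextQuery`, `AttributeQuery`, and `DataQuery`), but the function names here are hypothetical, and the response layout read by `methods_by_entry` is an assumption about the Data API's GraphQL-style JSON. Check the module documentation pages for the current signatures.

```python
from typing import Any, Dict, List


def find_human_hemoglobin_ids() -> List[str]:
    """Search API sketch: full-text AND attribute query, combined with `&`."""
    # Imported lazily so that defining this function needs neither the
    # package nor network access; calling it needs both.
    from rcsbapi.search import AttributeQuery, TextQuery

    text_query = TextQuery(value="Hemoglobin")
    organism_query = AttributeQuery(
        attribute="rcsb_entity_source_organism.scientific_name",
        operator="exact_match",
        value="Homo sapiens",
    )
    combined = text_query & organism_query
    # Calling the query object runs the search and yields matching identifiers.
    return list(combined())


def fetch_experimental_methods(pdb_ids: List[str]) -> Dict[str, Any]:
    """Data API sketch: request `exptl.method` for the given entries."""
    from rcsbapi.data import DataQuery

    query = DataQuery(
        input_type="entries",
        input_ids=pdb_ids,
        return_data_list=["exptl.method"],
    )
    return query.exec()


def methods_by_entry(response: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each entry ID to its experimental methods.

    Assumes a response shaped like
    {"data": {"entries": [{"rcsb_id": ..., "exptl": [{"method": ...}]}]}}.
    """
    methods: Dict[str, List[str]] = {}
    for entry in (response.get("data") or {}).get("entries") or []:
        methods[entry.get("rcsb_id")] = [
            item["method"] for item in entry.get("exptl") or []
        ]
    return methods


# Example usage (requires network access and an installed rcsb-api):
#     ids = find_human_hemoglobin_ids()
#     print(methods_by_entry(fetch_experimental_methods(ids[:5])))
```

Keeping the response parsing in its own function means it can be exercised against a saved or hand-written response without contacting the API.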

Training Materials

  • Example usage for each module is available throughout these documentation pages (see sidebar menu navigation)

  • The project source code is available on GitHub, together with example Jupyter notebooks for all supported API services.

  • Watch our webinar, Streamlining Access to RCSB PDB APIs with Python, which provides an introduction to our Search and Data APIs along with hands-on tutorials.

License

Code is licensed under the MIT license. See the LICENSE file for details.

Citing

Please cite the rcsb-api package with the following reference:

Dennis W. Piehl, Brinda Vallat, Ivana Truong, Habiba Morsy, Rusham Bhatt, Santiago Blaumann, Pratyoy Biswas, Yana Rose, Sebastian Bittrich, Jose M. Duarte, Joan Segura, Chunxiao Bi, Douglas Myers-Turnbull, Brian P. Hudson, Christine Zardecki, Stephen K. Burley. rcsb-api: Python Toolkit for Streamlining Access to RCSB Protein Data Bank APIs, Journal of Molecular Biology, 2025. DOI: https://doi.org/10.1016/j.jmb.2025.168970

You should also cite the RCSB.org API services this package utilizes:

Yana Rose, Jose M. Duarte, Robert Lowe, Joan Segura, Chunxiao Bi, Charmi Bhikadiya, Li Chen, Alexander S. Rose, Sebastian Bittrich, Stephen K. Burley, John D. Westbrook. RCSB Protein Data Bank: Architectural Advances Towards Integrated Searching and Efficient Access to Macromolecular Structure Data from the PDB Archive, Journal of Molecular Biology, 2020. DOI: https://doi.org/10.1016/j.jmb.2020.11.003

Support

If you experience any issues installing or using the package, please submit an issue on GitHub and we will try to respond in a timely manner.