Quickstart

Installation

Get it from PyPI:

pip install rcsb-api

Or download it from GitHub.

Import

To import this package, use:

from rcsbapi.data import DataSchema, DataQuery

Getting Started

The RCSB PDB Data API supports requests using GraphQL, a language for API queries. This package simplifies generating queries in GraphQL syntax.

To generate a query with this package, create a DataQuery object. Execute the query with the .exec() method, which returns the JSON response and also stores it as an attribute of the DataQuery object. From the object, you can access the Data API response, get an interactive editor link, or access the arguments used to create the query.

The package can automatically build queries based on the “input_type” and the path segment passed into “return_data_list”. For code intended for long-term use, it’s recommended to use the fully qualified path (a complete path from the input type to the final data field). Whenever autocompletion is used, a WARNING message is printed as a reminder.

To suppress the warning, either use the fully qualified path (as in the below example) or set the suppress_autocomplete_warning argument to True. To suppress the warning for all queries, set the SUPPRESS_AUTOCOMPLETE_WARNING flag to True (see custom configuration).

from rcsbapi.data import DataQuery as Query

query = Query(
    input_type="entries",
    input_ids=["4HHB"],
    return_data_list=["exptl.method"]
)

result_dict = query.exec()
print(result_dict)
# print(query.get_response()) would be equivalent

Data is returned in JSON format:

{
  "data": {
    "entries": [
      {
        "rcsb_id": "4HHB",
        "exptl": [
          {
            "method": "X-RAY DIFFRACTION"
          }
        ]
      }
    ]
  }
}
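As a minimal sketch of working with this response, the snippet below walks the nested dictionary and collects the experimental methods per entry. The sample data is copied from the response above; the helper name methods_by_entry is hypothetical and not part of the package.

```python
from typing import Any

# Sample response copied from the example above.
response: dict[str, Any] = {
    "data": {
        "entries": [
            {
                "rcsb_id": "4HHB",
                "exptl": [{"method": "X-RAY DIFFRACTION"}],
            }
        ]
    }
}


def methods_by_entry(resp: dict[str, Any]) -> dict[str, list[str]]:
    """Map each entry's rcsb_id to its list of experimental methods.

    Hypothetical helper for illustration; not provided by rcsb-api.
    """
    out: dict[str, list[str]] = {}
    for entry in resp["data"]["entries"]:
        # "exptl" is a list in the response, so collect every method in it.
        out[entry["rcsb_id"]] = [e["method"] for e in entry.get("exptl") or []]
    return out


print(methods_by_entry(response))  # {'4HHB': ['X-RAY DIFFRACTION']}
```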

GraphQL

This is the equivalent query in GraphQL syntax.

{
  entries(entry_ids: ["4HHB"]) {  # returns type "CoreEntry"
    exptl {  # returns type "Exptl"
      method  # returns a scalar (string)
    }
  }
}

GraphQL is built on “types” and their associated “fields”. All types and their fields are defined in a “schema”. An example of a type in our schema is “CoreEntry”, and one field under CoreEntry is “exptl” (experimental). Upon initialization, the Data API package fetches the schema from the RCSB PDB website (see Implementation Details for more).

In GraphQL, you must begin your query at specific fields. These are fields like entries, polymer_entities, and polymer_entity_instances (see full list here). Each field can return a scalar (e.g. string, integer) or a type. Every query must ultimately request scalar value(s), as the example query above does by ending at method. As that example also shows, fields are explicitly included in queries while types are implicit. Types are named in CamelCase (CoreEntry) while fields are in snake case (exptl or audit_author).

Autocompletion of Queries

One way this package simplifies making requests is by adding fields that return scalars into the generated query if you request a field that returns a type.

from rcsbapi.data import DataQuery as Query

query = Query(
    input_type="entries",
    input_ids=["4HHB"],
    # Requesting "exptl" will return a query requesting exptl.method, exptl.details, etc
    return_data_list=["exptl"]
)
result_dict = query.exec()
print(result_dict)

This creates a valid query even though “exptl” doesn’t return a scalar. However, the resulting query will be more verbose, requesting all scalar fields under “exptl” (see return_data_list).
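The expansion step can be pictured with a toy model. The sketch below assumes a hand-written schema fragment in which "exptl" has only the two scalar sub-fields named in the comment above (method and details); the real schema is fetched by the package and contains more. The names TOY_SCHEMA and expand are hypothetical.

```python
# Toy schema: each key is a field that returns a type, and its value lists
# that type's sub-fields. Anything not listed as a key is treated as a scalar.
# This is an assumption for illustration, not the real RCSB PDB schema.
TOY_SCHEMA: dict[str, list[str]] = {"exptl": ["method", "details"]}


def expand(path: str, schema: dict[str, list[str]]) -> list[str]:
    """Expand a path ending at a type into paths ending at scalars.

    Simplification: the lookup uses only the last field name in the path.
    """
    children = schema.get(path.split(".")[-1])
    if children is None:
        return [path]  # already ends at a scalar; nothing to add
    expanded: list[str] = []
    for child in children:
        expanded.extend(expand(f"{path}.{child}", schema))
    return expanded


print(expand("exptl", TOY_SCHEMA))         # ['exptl.method', 'exptl.details']
print(expand("exptl.method", TOY_SCHEMA))  # ['exptl.method']
```

A fully qualified path passes through unchanged, which matches the recommendation to spell paths out in long-lived code.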

Jupyter Notebooks

Important changes to Jupyter behavior in Python 3.14+

Warning: Usage of the Data API module in Jupyter environments has changed starting with Python 3.14.

When working in Jupyter with Python 3.14+, calls to .exec() must be explicitly awaited, for example:

from rcsbapi.data import DataQuery

query = DataQuery(input_type="entries", input_ids=[...], return_data_list=[...])

results = await query.exec()

This change does not impact code run in standard Python scripts (of any Python version); it only affects code run in Jupyter that uses Python 3.14 or greater.

A notebook briefly summarizing this documentation is available in notebooks/data_quickstart.ipynb or online through Google Colab.

Another notebook, which uses both the Search and Data API packages in a COVID-19-related example, is available in notebooks/search_data_workflow.ipynb or online through Google Colab.