Quickstart

This is a quickstart guide to interacting with the RCSB PDB Sequence Coordinates API using the rcsb-api Python package.

Installation

Get it from PyPI:

pip install rcsb-api

Or download the source from GitHub.

Import

To import this module, use:

import rcsbapi.sequence

(Note: in the examples below we’ll import individual classes from the sequence module. Whether to import the whole module or individual classes is a matter of preference.)

Getting Started

The RCSB PDB Sequence Coordinates API allows querying for alignments between structural and sequence databases as well as protein positional annotations/features integrated from multiple resources. Alignment data is available for NCBI RefSeq (including protein and genomic sequences), UniProt and PDB sequences. Protein positional features are integrated from UniProt, CATH, SCOPe and RCSB PDB and collected from the RCSB PDB Data Warehouse.

The alignments and positional features provided by this API cover Experimental Structures from the PDB and select Computed Structure Models (CSMs). Data for CSMs can be requested with the same parameters used for Experimental Structures by supplying CSM Ids.

The API supports requests using GraphQL, a language for API queries. This package simplifies generating queries in GraphQL syntax.
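To give a rough idea of what a query in GraphQL syntax looks like, here is a minimal sketch that assembles a query string by hand. The helper `sketch_alignments_query` is hypothetical, and the root field `alignments` with arguments `from`, `to`, and `queryId` is an assumption about the API schema rather than something stated in this guide. It is not the package's actual output, and only leaf fields are handled.

```python
import json
from typing import List


def sketch_alignments_query(db_from: str, db_to: str, query_id: str, fields: List[str]) -> str:
    """Build an illustrative GraphQL query string (hypothetical helper).

    Assumption: the schema exposes a root field `alignments` that takes
    `from`, `to`, and `queryId`. Only scalar (leaf) fields are supported here;
    nested objects would need their own sub-selections.
    """
    field_block = "\n".join(f"    {field}" for field in fields)
    return (
        "query {\n"
        f"  alignments(from: {db_from}, to: {db_to}, queryId: {json.dumps(query_id)}) {{\n"
        f"{field_block}\n"
        "  }\n"
        "}"
    )


graphql_text = sketch_alignments_query(
    "UNIPROT", "PDB_ENTITY", "P01112", ["query_sequence", "alignment_length"]
)
print(graphql_text)
```

Writing and maintaining strings like this by hand gets tedious once nested fields are involved, which is the work the package takes over.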

There are two main types of queries: Alignments and Annotations.

Alignments

Alignments queries request data about alignments between an object in one supported database and all objects of another supported database.

from rcsbapi.sequence import Alignments

# Fetch alignments between a UniProt Accession and PDB Entities
query = Alignments(
    db_from="UNIPROT",
    db_to="PDB_ENTITY",
    query_id="P01112",
    return_data_list=["query_sequence", "target_alignments", "alignment_length"]
)
result_dict = query.exec()
print(result_dict)

| Argument | Description |
| --- | --- |
| db_from | From which structure/sequence database (see SequenceReference table below for possible values) |
| db_to | To which structure/sequence database (see SequenceReference table below for possible values) |
| query_id | Sequence identifier for database specified in db_from (see SequenceReference table below for examples) |
| range | Optional list of two integers that can be used to filter the alignment to a particular region (e.g., [1, 100]) |
| return_data_list | Data to fetch (e.g., ["query_sequence", "target_alignments", "alignment_length"]) |
| suppress_autocomplete_warning | Suppress warning message about field path autocompletion. Defaults to False. |
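The optional range argument can be checked before a request is sent. The sketch below is one way to do that: `build_alignment_kwargs` and `run_alignment` are hypothetical helpers, and the rule that positions are 1-based with start no greater than end is an assumption, not something this guide states. The import of Alignments is deferred so the validation part runs without the package or network access.

```python
from typing import Any, Dict, List, Optional


def build_alignment_kwargs(
    db_from: str,
    db_to: str,
    query_id: str,
    return_data_list: List[str],
    residue_range: Optional[List[int]] = None,
) -> Dict[str, Any]:
    """Collect keyword arguments for Alignments, checking the optional range."""
    kwargs: Dict[str, Any] = {
        "db_from": db_from,
        "db_to": db_to,
        "query_id": query_id,
        "return_data_list": return_data_list,
    }
    if residue_range is not None:
        if len(residue_range) != 2 or not all(isinstance(p, int) for p in residue_range):
            raise ValueError("range must be a list of two integers, e.g. [1, 100]")
        start, end = residue_range
        # Assumption: positions are 1-based and start must not exceed end.
        if start < 1 or start > end:
            raise ValueError("range must satisfy 1 <= start <= end")
        kwargs["range"] = [start, end]
    return kwargs


def run_alignment(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Execute the query; needs rcsb-api installed and network access."""
    from rcsbapi.sequence import Alignments  # deferred import

    return Alignments(**kwargs).exec()


kwargs = build_alignment_kwargs(
    db_from="UNIPROT",
    db_to="PDB_ENTITY",
    query_id="P01112",
    return_data_list=["query_sequence", "target_alignments", "alignment_length"],
    residue_range=[1, 100],
)
print(kwargs)
```

Calling `run_alignment(kwargs)` would then send the request restricted to the first 100 positions.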

SequenceReference and Corresponding Database Identifiers

The table below describes the type of database identifiers used for each SequenceReference value.

| SequenceReference | Database Identifier Description | Example |
| --- | --- | --- |
| NCBI_GENOME | NCBI RefSeq Chromosome Accession | NC_000001 |
| NCBI_PROTEIN | NCBI RefSeq Protein Accession | NP_789765 |
| UNIPROT | UniProt Accession | P01112 |
| PDB_ENTITY | RCSB PDB Entity Id / CSM Entity Id | 2UZI_3 / AF_AFP68871F1_1 |
| PDB_INSTANCE | RCSB PDB Instance Id / CSM Instance Id | 2UZI.C / AF_AFP68871F1.A |
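When identifiers come from mixed sources, it can help to guess which SequenceReference value an identifier belongs to before building a query. The following is a rough heuristic derived only from the example identifiers in the table above. The function `guess_reference` is hypothetical, and its rules are assumptions that will misclassify identifiers whose shape differs from these examples (for instance, RefSeq accessions with other prefixes).

```python
import re
from typing import Dict, List

# Example identifiers copied from the SequenceReference table.
SEQUENCE_REFERENCE_EXAMPLES: Dict[str, List[str]] = {
    "NCBI_GENOME": ["NC_000001"],
    "NCBI_PROTEIN": ["NP_789765"],
    "UNIPROT": ["P01112"],
    "PDB_ENTITY": ["2UZI_3", "AF_AFP68871F1_1"],
    "PDB_INSTANCE": ["2UZI.C", "AF_AFP68871F1.A"],
}


def guess_reference(identifier: str) -> str:
    """Guess a SequenceReference value from the shape of an identifier.

    Heuristic only: the rules are inferred from the table's examples.
    """
    if identifier.startswith("NC_"):
        return "NCBI_GENOME"
    if identifier.startswith("NP_"):
        return "NCBI_PROTEIN"
    if "." in identifier:
        # Instance Ids join an entry Id and a chain label with a dot.
        return "PDB_INSTANCE"
    if re.search(r"_\d+$", identifier):
        # Entity Ids end with an underscore and an entity number.
        return "PDB_ENTITY"
    return "UNIPROT"


for reference, examples in SEQUENCE_REFERENCE_EXAMPLES.items():
    for example in examples:
        print(f"{example:>16} -> {guess_reference(example)}")
```

The prefix checks run first because a RefSeq accession such as NC_000001 would otherwise match the entity pattern.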

Annotations

Annotations queries request annotation data about a sequence (e.g., residue-level annotations/features). Protein positional features are integrated from UniProt, CATH, SCOPe and RCSB PDB and collected from the RCSB PDB Data Warehouse.

from rcsbapi.sequence import Annotations

# Fetch UniProt positional features for a particular PDB Instance
query = Annotations(
    reference="PDB_INSTANCE",
    query_id="2UZI.C",
    sources=["UNIPROT"],
    return_data_list=["target_id", "features"]
)
result_dict = query.exec()
print(result_dict)

Argument

Description

reference

Structure/sequence database to request (see SequenceReference table above for possible values)

query_id

Sequence identifier for database specified in reference (see SequenceReference table above for examples)

sources

Enumerated list defining the annotation collections to be requested (possible values: "UNIPROT", "PDB_ENTITY", "PDB_INSTANCE", "PDB_INTERFACE")

return_data_list

Data to fetch (e.g., ["target_id", "features"])

filters

Optional list of AnnotationFilterInput that can be used to select what annotations will be retrieved. See Additional Examples.

suppress_autocomplete_warning

Suppress warning message about field path autocompletion. Defaults to False.
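Because sources accepts only a fixed set of values, a small pre-check can catch typos before a request is made. This is a minimal sketch: `normalize_sources` and `fetch_annotations` are hypothetical helpers, the allowed values are those listed in the table above, and upper-casing the input is a convenience of this sketch rather than documented behavior of the package.

```python
from typing import Any, Dict, Iterable, List

# Allowed values as listed for the sources argument.
ANNOTATION_SOURCES = ("UNIPROT", "PDB_ENTITY", "PDB_INSTANCE", "PDB_INTERFACE")


def normalize_sources(sources: Iterable[str]) -> List[str]:
    """Upper-case, de-duplicate (keeping order), and validate annotation sources."""
    normalized: List[str] = []
    for source in sources:
        name = source.strip().upper()
        if name not in ANNOTATION_SOURCES:
            raise ValueError(
                f"Unknown annotation source {source!r}; expected one of {ANNOTATION_SOURCES}"
            )
        if name not in normalized:
            normalized.append(name)
    return normalized


def fetch_annotations(
    reference: str, query_id: str, sources: Iterable[str], return_data_list: List[str]
) -> Dict[str, Any]:
    """Run an Annotations query; needs rcsb-api installed and network access."""
    from rcsbapi.sequence import Annotations  # deferred import

    query = Annotations(
        reference=reference,
        query_id=query_id,
        sources=normalize_sources(sources),
        return_data_list=return_data_list,
    )
    return query.exec()


print(normalize_sources(["uniprot", "PDB_INSTANCE", "UNIPROT"]))
# prints ['UNIPROT', 'PDB_INSTANCE']
```

A call such as `fetch_annotations("PDB_INSTANCE", "2UZI.C", ["uniprot"], ["target_id", "features"])` would then mirror the example above with validated sources.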

Additional Usage and Examples

For examples using other query types like GroupAlignments, GroupAnnotations, and GroupAnnotationsSummary or for examples using filters, check Additional Examples.

Jupyter Notebooks

A runnable Jupyter notebook is available in notebooks/sequence_coord_quickstart.ipynb. It can also be run online using Google Colab.