Quickstart
This is a quickstart guide to interacting with the RCSB PDB Sequence Coordinates API using the rcsb-api Python package.
Import
To import this module, use:
import rcsbapi.sequence
(Note: in the examples below we’ll import individual classes from the sequence module. Whether to import the whole module or individual classes is a matter of preference.)
Getting Started
The RCSB PDB Sequence Coordinates API allows querying for alignments between structural and sequence databases as well as protein positional annotations/features integrated from multiple resources. Alignment data is available for NCBI RefSeq (including protein and genomic sequences), UniProt and PDB sequences. Protein positional features are integrated from UniProt, CATH, SCOPe and RCSB PDB and collected from the RCSB PDB Data Warehouse.
Alignments and positional features provided by this API cover Experimental Structures from the PDB and select Computed Structure Models (CSMs). Alignments and positional features for CSMs can be requested with the same parameters as for Experimental Structures by providing CSM IDs.
The API supports requests using GraphQL, a language for API queries. This package simplifies generating queries in GraphQL syntax.
There are two main types of queries: Alignments and Annotations.
Alignments
Alignments queries request data about alignments between an object in one supported database and all objects in another supported database.
from rcsbapi.sequence import Alignments
# Fetch alignments between a UniProt Accession and PDB Entities
query = Alignments(
    db_from="UNIPROT",
    db_to="PDB_ENTITY",
    query_id="P01112",
    return_data_list=["query_sequence", "target_alignments", "alignment_length"],
)
result_dict = query.exec()
print(result_dict)
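The nesting of the printed result follows the fields requested in return_data_list, so one way to pull out a particular field is a small recursive search that does not hard-code the layout. The sketch below is illustrative only: `find_values` is a hypothetical helper that is not part of rcsb-api, and `sample` is a hand-written stand-in with placeholder values, not real API output.

```python
def find_values(node, key):
    """Recursively collect every value stored under `key` in nested dicts/lists."""
    found = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                found.append(value)
            # Keep descending so deeper occurrences are collected too.
            found.extend(find_values(value, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_values(item, key))
    return found


# Hand-written stand-in for a response (assumption: the real layout may differ).
sample = {
    "data": {
        "alignments": {
            "query_sequence": "ACDEFG",
            "target_alignments": [
                {"target_id": "ENTITY_A"},
                {"target_id": "ENTITY_B"},
            ],
        }
    }
}

print(find_values(sample, "target_id"))
```

With a real response, passing `result_dict` in place of `sample` would apply the same search to whatever fields were requested.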
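| Argument | Description |
|---|---|
| `db_from` | From which structure/sequence database (see SequenceReference) |
| `db_to` | To which structure/sequence database (see SequenceReference) |
| `query_id` | Sequence identifier for database specified in `db_from` |
| `range` | Optional list of two integers that can be used to filter the alignment to a particular region |
| `return_data_list` | Data to fetch (e.g., `["query_sequence", "target_alignments", "alignment_length"]`) |
| `suppress_autocomplete_warning` | Suppress warning message about field path autocompletion. Defaults to False. |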
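The optional region filter (a list of two integers) can be sanity-checked before a query is built. The following is a minimal sketch under stated assumptions: `region_filter` and `fetch_region_alignments` are hypothetical helpers, and the keyword name `range` for the region argument is an assumption that should be checked against the installed version of the package.

```python
def region_filter(start, end):
    """Return a two-integer region list after basic sanity checks."""
    for bound in (start, end):
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(bound, bool) or not isinstance(bound, int):
            raise TypeError("region bounds must be integers")
    if start > end:
        raise ValueError("region start must not exceed region end")
    return [start, end]


def fetch_region_alignments(query_id, start, end):
    """Run the quickstart alignment query restricted to one region.

    Needs the rcsb-api package and network access; the import is local
    so that region_filter can be used without either.
    """
    from rcsbapi.sequence import Alignments

    query = Alignments(
        db_from="UNIPROT",
        db_to="PDB_ENTITY",
        query_id=query_id,
        range=region_filter(start, end),  # assumed keyword name
        return_data_list=["target_alignments"],
    )
    return query.exec()


print(region_filter(1, 100))
```

Calling `fetch_region_alignments("P01112", 1, 100)` would then send the restricted query, provided the assumption about the keyword name holds.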
SequenceReference and Corresponding Database Identifiers
The table below describes the type of database identifiers used for each SequenceReference value.
| SequenceReference | Database Identifier Description | Example |
|---|---|---|
| `NCBI_GENOME` | NCBI RefSeq Chromosome Accession | |
| `NCBI_PROTEIN` | NCBI RefSeq Protein Accession | |
| `UNIPROT` | UniProt Accession | `P01112` |
| `PDB_ENTITY` | RCSB PDB Entity Id / CSM Entity Id | |
| `PDB_INSTANCE` | RCSB PDB Instance Id / CSM Instance Id | `2UZI.C` |
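When working with entity and instance identifiers, it can help to split them back into an entry ID and its suffix. The sketch below assumes the usual RCSB PDB conventions: an entity ID is an entry ID plus `_` and an entity number, and an instance ID is an entry ID plus `.` and a chain label (as in `2UZI.C`). Both helper functions are hypothetical and not part of rcsb-api. Splitting on the last separator keeps the sketch usable for IDs that contain underscores themselves.

```python
def split_entity_id(entity_id):
    """Split '<entry>_<entity number>' on the last underscore."""
    entry_id, sep, entity_number = entity_id.rpartition("_")
    if not sep or not entry_id or not entity_number:
        raise ValueError(f"not an entity ID: {entity_id!r}")
    return entry_id, entity_number


def split_instance_id(instance_id):
    """Split '<entry>.<chain label>' on the last period."""
    entry_id, sep, chain_label = instance_id.rpartition(".")
    if not sep or not entry_id or not chain_label:
        raise ValueError(f"not an instance ID: {instance_id!r}")
    return entry_id, chain_label


print(split_instance_id("2UZI.C"))
print(split_entity_id("2UZI_1"))
```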
Annotations
Annotations queries request annotation data about a sequence (e.g., residue-level annotations/features). Protein positional features are integrated from UniProt, CATH, SCOPe and RCSB PDB and collected from the RCSB PDB Data Warehouse.
from rcsbapi.sequence import Annotations
# Fetch positional features from UniProt for a particular PDB Instance
query = Annotations(
    reference="PDB_INSTANCE",
    query_id="2UZI.C",
    sources=["UNIPROT"],
    return_data_list=["target_id", "features"],
)
result_dict = query.exec()
print(result_dict)
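| Argument | Description |
|---|---|
| `reference` | Structure/sequence database to request (see SequenceReference) |
| `query_id` | Sequence identifier for database specified in `reference` |
| `sources` | Enumerated list defining the annotation collections to be requested (possible values include `UNIPROT`) |
| `return_data_list` | Data to fetch (e.g., `["target_id", "features"]`) |
| `filters` | Optional list of annotation filters |
| `suppress_autocomplete_warning` | Suppress warning message about field path autocompletion. Defaults to False. |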
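Once annotation data has been fetched, a common next step is to summarize which kinds of features came back. The sketch below is a rough illustration under loud assumptions: it assumes each annotation block is a dictionary with a `features` list whose items carry a `type` key, which should be verified against an actual response. `count_feature_types` is a hypothetical helper and `mock_blocks` is hand-written placeholder data, not API output.

```python
from collections import Counter


def count_feature_types(annotation_blocks):
    """Count features per type across a list of annotation blocks."""
    counts = Counter()
    for block in annotation_blocks:
        # Treat a missing or null feature list as empty.
        for feature in block.get("features") or []:
            counts[feature.get("type", "UNKNOWN")] += 1
    return counts


# Placeholder data mirroring the requested fields "target_id" and "features"
# (assumed shape; the feature type names are made up for illustration).
mock_blocks = [
    {
        "target_id": "TARGET_1",
        "features": [{"type": "TYPE_A"}, {"type": "TYPE_A"}, {"type": "TYPE_B"}],
    },
    {"target_id": "TARGET_2", "features": None},
]

for feature_type, count in count_feature_types(mock_blocks).most_common():
    print(feature_type, count)
```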
Additional Usage and Examples
For examples using other query types like GroupAlignments, GroupAnnotations, and GroupAnnotationsSummary or for examples using filters, check Additional Examples.
Jupyter Notebooks
A runnable Jupyter notebook is available in notebooks/sequence_coord_quickstart.ipynb and can also be run online using Google Colab.