# Quickstart
This is a quickstart guide to interacting with the RCSB PDB [Sequence Coordinates API](https://sequence-coordinates.rcsb.org/#sequence-coordinates-api) using the *rcsb-api* Python package.

## Installation
Get it from PyPI:

    pip install rcsb-api

Or, download from [GitHub](https://github.com/rcsb/py-rcsb-api).

## Import
To import this module, use:
```python
import rcsbapi.sequence
```
(***Note:*** in the examples below we'll import individual classes from the `sequence` module. Whether to import the whole module or individual classes is a matter of preference.)

## Getting Started
The [RCSB PDB Sequence Coordinates API](https://sequence-coordinates.rcsb.org/#sequence-coordinates-api) allows querying for alignments between structural and sequence databases, as well as protein positional annotations/features integrated from multiple resources. Alignment data is available for NCBI [RefSeq](https://www.ncbi.nlm.nih.gov/refseq/) (including protein and genomic sequences), UniProt, and PDB sequences. Protein positional features are integrated from [UniProt](https://www.uniprot.org/), [CATH](https://www.cathdb.info/), [SCOPe](https://scop.berkeley.edu/), and [RCSB PDB](https://www.rcsb.org/) and collected from the [RCSB PDB Data Warehouse](https://data.rcsb.org/#data-api).

Alignments and positional features provided by this API include Experimental Structures from the [PDB](https://www.rcsb.org/) and [select Computed Structure Models (CSMs)](https://www.rcsb.org/docs/general-help/computed-structure-models-and-rcsborg#what-csms-are-available). Alignments and positional features for CSMs can be requested with the same parameters as for Experimental Structures, by providing CSM Ids instead.

The API supports requests using [GraphQL](https://graphql.org/), a language for API queries. This package simplifies generating queries in GraphQL syntax.

There are two main types of queries: `Alignments` and `Annotations`.
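
To make the GraphQL point concrete, the sketch below hand-assembles a query string of the kind the package builds for you. It is an illustration only, not the package's implementation: the helper `build_alignments_query` is hypothetical, and the GraphQL names it emits (`alignments`, `from`, `to`, `queryId`) are assumptions about the API schema that should be checked against the Sequence Coordinates API documentation before use.

```python
def build_alignments_query(db_from, db_to, query_id, fields):
    """Assemble a GraphQL query string by hand (illustrative only)."""
    # Assumed schema: an `alignments` root field taking `from`, `to`, and `queryId`.
    field_block = "\n".join(f"    {name}" for name in fields)
    return (
        "query {\n"
        f'  alignments(from: {db_from}, to: {db_to}, queryId: "{query_id}") {{\n'
        f"{field_block}\n"
        "  }\n"
        "}"
    )


# Only scalar fields are requested here; nested objects would need sub-selections.
query_text = build_alignments_query(
    "UNIPROT", "PDB_ENTITY", "P01112", ["query_sequence", "alignment_length"]
)
print(query_text)
```

Writing such strings by hand gets tedious once nested fields are involved, since each nested object needs its own sub-selection. Generating that text from a few Python arguments is the convenience the query classes in this guide provide.
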
## Alignments
`Alignments` queries request data about alignments between an object in one supported database and all objects in another supported database.

```python
from rcsbapi.sequence import Alignments

# Fetch alignments between a UniProt Accession and PDB Entities
query = Alignments(
    db_from="UNIPROT",
    db_to="PDB_ENTITY",
    query_id="P01112",
    return_data_list=["query_sequence", "target_alignments", "alignment_length"]
)
result_dict = query.exec()
print(result_dict)
```

| Argument | Description |
|----------|-------------|
| `db_from` | Source structure/sequence database (see [`SequenceReference` table below](#sequencereference-and-corresponding-database-identifiers) for possible values) |
| `db_to` | Target structure/sequence database (see [`SequenceReference` table below](#sequencereference-and-corresponding-database-identifiers) for possible values) |
| `query_id` | Sequence identifier for the database specified in `db_from` (see [`SequenceReference` table below](#sequencereference-and-corresponding-database-identifiers) for examples) |
| `range` | Optional list of two integers that can be used to filter the alignment to a particular region (e.g., `[1, 100]`) |
| `return_data_list` | Data to fetch (e.g., `["query_sequence", "target_alignments", "alignment_length"]`) |
| `suppress_autocomplete_warning` | Suppress the warning message about field path autocompletion. Defaults to `False`. |

### SequenceReference and Corresponding Database Identifiers
The table below describes the type of database identifier used for each `SequenceReference` value.

| `SequenceReference` | Database Identifier Description        | Example                      |
|---------------------|----------------------------------------|------------------------------|
| `NCBI_GENOME`       | NCBI RefSeq Chromosome Accession       | `NC_000001`                  |
| `NCBI_PROTEIN`      | NCBI RefSeq Protein Accession          | `NP_789765`                  |
| `UNIPROT`           | UniProt Accession                      | `P01112`                     |
| `PDB_ENTITY`        | RCSB PDB Entity Id / CSM Entity Id     | `2UZI_3` / `AF_AFP68871F1_1` |
| `PDB_INSTANCE`      | RCSB PDB Instance Id / CSM Instance Id | `2UZI.C` / `AF_AFP68871F1.A` |

## Annotations
`Annotations` queries request annotation data about a sequence (e.g., residue-level annotations/features). Protein positional features are integrated from [UniProt](https://www.uniprot.org/), [CATH](https://www.cathdb.info/), [SCOPe](https://scop.berkeley.edu/), and [RCSB PDB](https://www.rcsb.org/) and collected from the [RCSB PDB Data Warehouse](https://data.rcsb.org/#data-api).

```python
from rcsbapi.sequence import Annotations

# Fetch the UniProt positional features for a particular PDB Instance
query = Annotations(  # type: ignore
    reference="PDB_INSTANCE",
    query_id="2UZI.C",
    sources=["UNIPROT"],
    return_data_list=["target_id", "features"]
)
result_dict = query.exec()
print(result_dict)
```

| Argument | Description |
|----------|-------------|
| `reference` | Structure/sequence database to request (see [`SequenceReference` table above](#sequencereference-and-corresponding-database-identifiers) for possible values) |
| `query_id` | Sequence identifier for the database specified in `reference` (see [`SequenceReference` table above](#sequencereference-and-corresponding-database-identifiers) for examples) |
| `sources` | List of the annotation collections to request (possible values: `"UNIPROT"`, `"PDB_ENTITY"`, `"PDB_INSTANCE"`, `"PDB_INTERFACE"`) |
| `return_data_list` | Data to fetch (e.g., `["target_id", "features"]`) |
| `filters` | Optional list of `AnnotationFilterInput` objects that select which annotations are retrieved. See [Additional Examples](additional_examples.md). |
| `suppress_autocomplete_warning` | Suppress the warning message about field path autocompletion. Defaults to `False`. |

## Additional Usage and Examples
For examples using other query types, such as `GroupAlignments`, `GroupAnnotations`, and `GroupAnnotationsSummary`, or for examples using filters, see [Additional Examples](additional_examples.md).

## Jupyter Notebooks
A runnable Jupyter notebook is available in [notebooks/sequence_coord_quickstart.ipynb](https://github.com/rcsb/py-rcsb-api/blob/master/notebooks/sequence_coord_quickstart.ipynb). It can also be run online using Google Colab.
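
When experimenting in a notebook, it can help to catch typos in the `Alignments` arguments before a request is sent. The following is a minimal sketch of such a pre-flight check, built only from the `SequenceReference` table and the description of `range` in this guide. The helper `check_alignment_args` is hypothetical and not part of *rcsb-api*, and the `start <= end` rule is an assumption rather than documented behavior.

```python
# Values copied from the SequenceReference table in this guide.
SEQUENCE_REFERENCES = {
    "NCBI_GENOME", "NCBI_PROTEIN", "UNIPROT", "PDB_ENTITY", "PDB_INSTANCE",
}


def check_alignment_args(db_from, db_to, seq_range=None):
    """Hypothetical pre-flight check; not part of rcsb-api."""
    for name, value in (("db_from", db_from), ("db_to", db_to)):
        if value not in SEQUENCE_REFERENCES:
            raise ValueError(f"{name}={value!r} is not a known SequenceReference")
    if seq_range is not None:
        is_pair = len(seq_range) == 2 and all(isinstance(n, int) for n in seq_range)
        # Assumption: the region is given as [start, end] with start <= end.
        if not is_pair or seq_range[0] > seq_range[1]:
            raise ValueError("range must be two integers [start, end] with start <= end")
    # Keyword names follow the Alignments argument table.
    kwargs = {"db_from": db_from, "db_to": db_to}
    if seq_range is not None:
        kwargs["range"] = list(seq_range)
    return kwargs


print(check_alignment_args("UNIPROT", "PDB_ENTITY", [1, 100]))
```

The returned dictionary uses the keyword names from the `Alignments` argument table, so it could be unpacked into the constructor alongside `query_id` and `return_data_list`.
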