Quickstart
This is a quickstart guide to interacting with the RCSB PDB ModelServer API using the rcsb-api Python package.
Getting Started
The RCSB PDB ModelServer API provides access to molecular structure data (e.g., atomic coordinates) and related information from PDB structures. It lets you extract specific structural components of a given structure, such as the full structure coordinates or the coordinates of particular chains, ligands, or surrounding/interacting residues or ligands.
The API supports queries for Experimental Structures. (Support for Computed Structure Models (CSMs) is not yet available.)
Model API Query Construction
Query Methods
The ModelQuery object supports the following types of queries/methods:
| Method | Description |
|---|---|
| .get_ligand() | Retrieve ligand coordinates from a given structure |
| .get_atoms() | Fetch the coordinates of any part of a given structure (e.g., a particular entity, chain, or ligand) |
| .get_residue_interaction() | Retrieve the coordinates of all residues and ligands within a specified distance from a given residue or ligand (takes crystal symmetry into account) |
| .get_residue_surroundings() | Retrieve the coordinates of all residues and ligands within a specified distance from a given residue or ligand (ignores crystal symmetry) |
| .get_surrounding_ligands() | Retrieve the coordinates of all ligands within a specified distance from a given residue or ligand (takes crystal symmetry into account) |
| .get_assembly() | Extract the coordinates of a structural assembly (selected group of instances or “chains”) from an entry |
| .get_full_structure() | Fetch the full structure coordinates for a given entry |
| .get_symmetry_mates() | Compute crystal symmetry mates for a given structure |
| .get_multiple_structures() | Fetch data for multiple structures |
Query Arguments
The specific set of available arguments depends on the ModelQuery method being used. In general, however, most methods accept the following common arguments:
Argument |
Description |
Default |
|---|---|---|
|
Structure identifier (e.g., |
— |
|
Response encoding format. Supported values: |
|
|
Whether to download the response to a file |
|
|
Output filename for downloaded data |
|
|
Directory where downloaded files will be saved |
|
|
Whether to gzip-compress the downloaded file |
|
|
Whether to include all metadata categories from the source entry file |
|
Depending on the particular method being used and the level of granularity of the structure you want to fetch, one or more of the following can be specified:
Argument |
Description |
Default |
Example |
|---|---|---|---|
|
PDB-assigned entity ID for the structural component |
|
|
|
PDB-assigned asymmetric (chain) ID |
|
|
|
Author-assigned asymmetric (chain) ID |
|
|
|
PDB-assigned chemical component or ligand ID |
|
|
|
Author-assigned chemical component or ligand ID |
|
|
|
PDB-assigned sequence number of the residue of interest |
|
|
|
Author-assigned sequence number of the residue of interest |
|
|
Ligand Data
Use the .get_ligand() method to fetch ligand-related data (metadata and coordinates) within a given structure.
Note that by default this returns only the first instance of the specified ligand (e.g., if there are 10 HEM ligands in the structure, only the first one is returned). If you want a specific instance of the ligand, you can specify label_asym_id and/or label_entity_id. If you want all occurrences of a specific ligand, you should use the .get_atoms() method below.
from rcsbapi.model import ModelQuery
# Fetch the first occurrence of the `HEM` ligand for entry "4HHB"
query = ModelQuery()
result = query.get_ligand(entry_id="4HHB", label_comp_id="HEM", download=True, filename="4HHB_HEM_ligand.cif", file_directory="model-output")
print(result)
Argument |
Description |
|---|---|
|
The ID of the structure (e.g., “2HHB”) |
|
The entity label for the ligand |
|
The asymmetric ID for the ligand |
|
The author asymmetric ID |
|
The label for the component |
|
The author component ID |
|
The label sequence ID |
|
The author sequence ID |
|
The insertion code (optional) |
|
The label for the atom |
|
The author atom ID |
|
The chemical type symbol for the ligand (optional) |
|
The model numbers to fetch (optional) |
|
The encoding format for the response ( |
|
Whether to copy all categories (default: False) |
|
Controls how the provided data source ID maps to the input file (as specified by the server instance config) (default: None) |
|
Apply any transformations (optional) |
|
Whether to download the file (True/False) |
|
The name of the file to save |
|
Directory to save the file |
|
Whether to compress the file (default: False) |
Atoms Data
Use the .get_atoms() method to fetch atom-level data (coordinates and metadata) from a given structure. This can be used to fetch all atom_site data for a particular component using label_comp_id (e.g., all HEM ligands, all water molecules HOH, or all CYS residues), a given entity, a specific residue in the sequence, and/or a combination of these criteria.
from rcsbapi.model import ModelQuery
# Fetch the metadata and `atom_site` coordinates of ALL occurrences of `HEM` in entry "4HHB"
query = ModelQuery()
result = query.get_atoms(entry_id="4HHB", label_comp_id="HEM", download=True, filename="4HHB_HEM_atoms.cif", file_directory="model-output")
print(result)
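Because .get_atoms() accepts a combination of criteria, it can help to collect only the criteria you actually need into a dictionary and pass them on as keyword arguments. The following is a minimal sketch of that idea. The helper name `build_selection` is hypothetical (it is not part of rcsb-api), and the sketch uses only argument names that appear in this guide.

```python
def build_selection(entry_id, **criteria):
    """Return keyword arguments for a ModelQuery call, skipping unset criteria."""
    kwargs = {"entry_id": entry_id}
    # Keep only the criteria that were actually provided (i.e., not None).
    kwargs.update({key: value for key, value in criteria.items() if value is not None})
    return kwargs


# Select CYS residues in chain "A" of entry 4HHB; label_seq_id is left unset.
selection = build_selection(
    "4HHB",
    label_comp_id="CYS",
    label_asym_id="A",
    label_seq_id=None,
)
print(selection)
```

The resulting dictionary could then be unpacked into a call such as `query.get_atoms(**selection)`.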
Argument |
Description |
|---|---|
|
The ID of the structure (e.g., “2HHB”) |
|
The entity label for the atom |
|
The asymmetric ID for the atom |
|
The author asymmetric ID |
|
The label for the component |
|
The author component ID |
|
The label sequence ID |
|
The author sequence ID |
|
The insertion code (optional) |
|
The label for the atom |
|
The author atom ID |
|
The chemical type symbol for the atom |
|
The model numbers to fetch (optional) |
|
The encoding format for the response ( |
|
Whether to copy all categories (default: False) |
|
Controls how the provided data source ID maps to the input file (as specified by the server instance config) (default: None) |
|
Apply any transformations (optional) |
|
Whether to download the file (True/False) |
|
The name of the file to save |
|
Directory to save the file |
|
Whether to compress the file (default: False) |
Residue Interaction
Use the .get_residue_interaction() method to fetch data (metadata and coordinates) on the surrounding residues and ligands of a given ligand or residue. If you only provide the label_comp_id, the server will return the interaction data for all occurrences of the component. This method takes crystal symmetry into account (returned data includes _molstar_atom_site_operator_mapping).
from rcsbapi.model import ModelQuery
# Fetch surrounding residues for ALL `HEM` ligands in entry "4HHB"
query = ModelQuery()
result = query.get_residue_interaction(entry_id="4HHB", label_comp_id="HEM", radius=5.0, download=True, file_directory="model-output")
print(result)
# Fetch surrounding residues for `HEM` chain `E` in entry "4HHB"
query = ModelQuery()
result = query.get_residue_interaction(
    entry_id="4HHB",
    label_comp_id="HEM",
    label_asym_id="E",
    radius=5.0,
    download=True,
    filename="4HHB_HEM_E_residue_interaction.cif",
    file_directory="model-output"
)
print(result)
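One way to confirm that a response is symmetry-aware is to look for the `_molstar_atom_site_operator_mapping` category in the returned text. The sketch below is only an illustration: it assumes the response was fetched as plain CIF text (not downloaded to a file), the helper `has_category` is hypothetical, and the sample fragment is hand-written with illustrative item names rather than taken from a real response.

```python
def has_category(cif_text, category):
    """Return True if any data item of `category` appears in the CIF text."""
    # The trailing dot avoids matching categories that merely share a prefix.
    prefix = category + "."
    return any(line.strip().startswith(prefix) for line in cif_text.splitlines())


# Tiny hand-written CIF fragment standing in for a real response.
sample = """data_demo
loop_
_molstar_atom_site_operator_mapping.label_asym_id
_molstar_atom_site_operator_mapping.auth_asym_id
_molstar_atom_site_operator_mapping.operator_name
E E 1_555
"""

print(has_category(sample, "_molstar_atom_site_operator_mapping"))  # True
print(has_category(sample, "_atom_site"))  # False
```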
Argument |
Description |
|---|---|
|
The ID of the structure (e.g., “2HHB”) |
|
The entity label for the residue interaction |
|
The asymmetric ID for the residue interaction |
|
The author asymmetric ID |
|
The label for the component |
|
The author component ID |
|
The label sequence ID |
|
The author sequence ID |
|
The insertion code (optional) |
|
The label for the atom |
|
The author atom ID |
|
The chemical type symbol for the residue |
|
The interaction radius for residue interaction (default: 5.0) |
|
The assembly name (optional) |
|
The model numbers to fetch (optional) |
|
The encoding format for the response ( |
|
Whether to copy all categories (default: False) |
|
Controls how the provided data source ID maps to the input file (as specified by the server instance config) (default: None) |
|
Apply any transformations (optional) |
|
Whether to download the file (True/False) |
|
The name of the file to save |
|
Directory to save the file |
|
Whether to compress the file (default: False) |
Residue Surroundings
Use the .get_residue_surroundings() method to fetch data (metadata and coordinates) on the residues and ligands surrounding a given ligand or residue. If you provide only the label_comp_id, the server returns the surroundings data for all occurrences of the component. This method is similar to Residue Interaction, but it does not take crystal symmetry into account (returned data does not include _molstar_atom_site_operator_mapping).
from rcsbapi.model import ModelQuery
# Fetch surrounding residues for `HEM` chain `E` in entry "4HHB"
query = ModelQuery()
result = query.get_residue_surroundings(
    entry_id="4HHB",
    label_comp_id="HEM",
    label_asym_id="E",
    radius=5.0,
    download=True,
    filename="4HHB_HEM_E_residue_surroundings.cif",
    file_directory="model-output"
)
print(result)
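To make the role of `radius` more concrete, here is a conceptual sketch of a distance-based selection without crystal symmetry. It is not the server's implementation: the toy coordinates, the helper `residues_within`, and the rule "a residue counts if any of its atoms lies within the radius of any target atom" are all assumptions made for illustration.

```python
import math

# Hypothetical toy data: (residue label, (x, y, z)) for a few atoms, in angstroms.
atoms = [
    ("HEM 1", (0.0, 0.0, 0.0)),
    ("HIS 87", (2.0, 1.0, 0.0)),
    ("LEU 91", (4.0, 2.0, 0.0)),
    ("VAL 1", (10.0, 0.0, 0.0)),
]


def residues_within(atoms, target, radius):
    """Return residues with at least one atom within `radius` of any target atom."""
    target_coords = [xyz for label, xyz in atoms if label == target]
    nearby = set()
    for label, xyz in atoms:
        if label == target:
            continue  # the target itself is not part of its own surroundings
        if any(math.dist(xyz, t) <= radius for t in target_coords):
            nearby.add(label)
    return sorted(nearby)


print(residues_within(atoms, "HEM 1", radius=5.0))  # ['HIS 87', 'LEU 91']
```

A symmetry-aware query such as Residue Interaction would additionally consider symmetry-related copies of the structure, which this sketch deliberately ignores.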
Argument |
Description |
|---|---|
|
The ID of the structure (e.g., “2HHB”) |
|
The entity label for the residue |
|
The asymmetric ID for the residue |
|
The author asymmetric ID |
|
The label for the component |
|
The author component ID |
|
The label sequence ID |
|
The author sequence ID |
|
The insertion code (optional) |
|
The label for the atom |
|
The author atom ID |
|
The chemical type symbol for the residue |
|
The interaction radius for residue surroundings (default: 5.0) |
|
The assembly name (optional) |
|
The model numbers to fetch (optional) |
|
The encoding format for the response ( |
|
Whether to copy all categories (default: False) |
|
Controls how the provided data source ID maps to the input file (as specified by the server instance config) (default: None) |
|
Apply any transformations (optional) |
|
Whether to download the file (True/False) |
|
The name of the file to save |
|
Directory to save the file |
|
Whether to compress the file (default: False) |
Surrounding Ligands
Use the .get_surrounding_ligands() method to fetch data on ligands that are within a certain proximity of a residue in a structure. This method takes crystal symmetry into account (returned data includes _molstar_atom_site_operator_mapping).
from rcsbapi.model import ModelQuery
# Fetch surrounding ligands for `ALA 284` in entry "1TQN"
query = ModelQuery()
result = query.get_surrounding_ligands(
    entry_id="1TQN",
    label_comp_id="ALA",
    label_seq_id=284,
    radius=5.0,
    download=True,
    file_directory="model-output"
)
print(result)
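Once you have the response as CIF text, you may want a quick list of which components it contains. The following is a naive sketch, assuming a text CIF response with a single `_atom_site` loop whose values contain no whitespace. The helper `comp_ids_in_atom_site` is hypothetical and the sample fragment is hand-written; for real work, a dedicated mmCIF parser is the safer choice.

```python
def comp_ids_in_atom_site(cif_text):
    """Collect distinct label_comp_id values from the _atom_site loop (naive parser)."""
    key = "_atom_site.label_comp_id"
    columns = []  # column names of the _atom_site loop, in file order
    comp_ids = set()
    for raw_line in cif_text.splitlines():
        line = raw_line.strip()
        if line.startswith("_atom_site."):
            columns.append(line.split()[0])
            continue
        if not columns:
            continue  # still before the _atom_site loop
        if not line or line.startswith(("#", "_", "loop_", "data_")):
            break  # the _atom_site loop has ended
        fields = line.split()
        if key in columns and len(fields) == len(columns):
            comp_ids.add(fields[columns.index(key)])
    return sorted(comp_ids)


# Hand-written fragment standing in for a real response.
sample_atoms = """data_demo
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
HETATM 1 FE HEM B
HETATM 2 NA HEM B
HETATM 3 O HOH C
#
"""

print(comp_ids_in_atom_site(sample_atoms))  # ['HEM', 'HOH']
```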
Argument |
Description |
|---|---|
|
The ID of the structure (e.g., “2HHB”) |
|
The entity label for the ligand |
|
The asymmetric ID for the ligand |
|
The author asymmetric ID |
|
The label for the ligand component |
|
The author component ID |
|
The label sequence ID for the ligand |
|
The author sequence ID for the ligand |
|
The insertion code (optional) |
|
The label for the ligand atom |
|
The author atom ID for the ligand |
|
The chemical type symbol for the ligand |
|
Whether to exclude water molecules from the surrounding ligands (default: |
|
The interaction radius for surrounding ligands (default: 5.0) |
|
The assembly name (optional) |
|
The model numbers to fetch (optional) |
|
The encoding format for the response ( |
|
Whether to copy all categories (default: |
|
Controls how the provided data source ID maps to the input file (as specified by the server instance config) (default: None) |
|
Apply any transformations (optional) |
|
Whether to download the file (True/False) |
|
The name of the file to save |
|
Directory to save the file |
|
Whether to compress the file (default: |
Assembly Data
Use the .get_assembly() method to extract a structural assembly (a selected group of instances or “chains”) from an entry.
from rcsbapi.model import ModelQuery
# Fetch assembly "3" for the entry "13PK"
query = ModelQuery()
result = query.get_assembly(entry_id="13PK", name="3", download=True, file_directory="model-output")
print(result)
Argument |
Description |
|---|---|
|
The ID of the structure (e.g., “2HHB”) |
|
The assembly id (default: “1”) |
|
The model numbers to fetch (optional) |
|
The encoding format for the response ( |
|
Whether to copy all categories (default: False) |
|
Controls how the provided data source ID maps to the input file (as specified by the server instance config) (default: None) |
|
Apply any transformations (optional) |
|
Whether to download the file (True/False) |
|
The name of the file to save |
|
Directory to save the file |
|
Whether to compress the file (default: False) |
Full Structure
Use the .get_full_structure() method to fetch complete structural data for a given entry.
from rcsbapi.model import ModelQuery
# Fetch the full structure for the entry "2HHB" and store content in `result` variable
query = ModelQuery()
result = query.get_full_structure(entry_id="2HHB")
print(result[:500])
# Or, download the structure:
result = query.get_full_structure(
    entry_id="2HHB",
    encoding="cif",
    download=True,
    file_directory="model-output"
)
print(result)
Argument |
Description |
|---|---|
|
The ID of the structure (e.g., “2HHB”) |
|
The model numbers to fetch (optional). If set, only include atoms with the corresponding |
|
The encoding format for the response ( |
|
Whether to copy all categories (default: False) |
|
Apply any transformations (optional) |
|
Whether to download the file (True/False) |
|
The name of the file to save |
|
Directory to save the file |
|
Whether to compress the file (default: False) |
|
Controls how the provided data source ID maps to the input file (as specified by the server instance config) (default: None) |
Symmetry Mates
Use the .get_symmetry_mates() method to compute crystal symmetry mates within a specified radius.
from rcsbapi.model import ModelQuery
# Generate the symmetry mates (unit cell replications) for the entry "1TQN"
query = ModelQuery()
result = query.get_symmetry_mates(entry_id="1TQN", download=True, file_directory="model-output")
print(result)
Argument |
Description |
|---|---|
|
The ID of the structure (e.g., “2HHB”) |
|
The interaction radius for symmetry mates (default: 5.0) |
|
The model numbers to fetch (optional) |
|
The encoding format for the response ( |
|
Whether to copy all categories (default: False) |
|
Controls how the provided data source ID maps to the input file (as specified by the server instance config) (default: None) |
|
Apply any transformations (optional) |
|
Whether to download the file (True/False) |
|
The name of the file to save |
|
Directory to save the file |
|
Whether to compress the file (default: False) |
Working with Multiple Structures
Let’s say you want to download or fetch data for several structures at once. You can do so by providing a list to the .get_multiple_structures() method:
from rcsbapi.model import ModelQuery
# List of structure IDs to query
entry_ids = ["1CBS", "4HHB"]
# Fetch multiple structures (e.g., "full" type) and save the result
query = ModelQuery()
results = query.get_multiple_structures(
    entry_ids,
    query_type="full",
    encoding="cif",
    download=True,
    compress_gzip=True,
    file_directory="model-output"
)
print(results)
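Files saved with `compress_gzip=True` need to be decompressed before you can read them as text. The sketch below uses only the Python standard library and detects gzip data by its magic bytes, so it makes no assumption about how the downloaded files are named. It does assume a text encoding such as `"cif"`, and the helper name `read_text_maybe_gzip` is hypothetical.

```python
import gzip
import tempfile
from pathlib import Path


def read_text_maybe_gzip(path):
    """Read a text file, transparently decompressing it if it is gzip-compressed."""
    data = Path(path).read_bytes()
    # Gzip streams start with the two magic bytes 0x1f 0x8b.
    if data[:2] == b"\x1f\x8b":
        data = gzip.decompress(data)
    return data.decode("utf-8")


# Demonstrate with a throwaway file instead of a real download.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo.cif.gz"
    demo_path.write_bytes(gzip.compress(b"data_demo\n"))
    print(read_text_maybe_gzip(demo_path))
```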
The .get_multiple_structures() method supports any of the available query types via the query_type argument:
Query type method |
Corresponding |
|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
ModelQuery Defaults
The ModelQuery class supports defining default values for common parameters at instantiation. These include:
- encoding
- file_directory
- download
- compress_gzip
These default values will be automatically applied to all subsequent method calls (e.g., .get_full_structure(), .get_ligand(), etc.) unless you explicitly override them.
from rcsbapi.model import ModelQuery
# Set defaults during instantiation
query = ModelQuery(
    encoding="cif",
    file_directory="model-output",
    download=True,
    compress_gzip=False
)
# Now run a query without repeating those arguments
result = query.get_full_structure(
    entry_id="2HHB",
    filename="2HHB_full_structure.cif"
)
print(result)
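The "defaults unless overridden" behavior described above follows a common pattern, sketched here with a toy class. This is not rcsb-api code and says nothing about how ModelQuery is implemented internally; `DefaultsSketch` and `resolve` are hypothetical names used only to show how explicit arguments take precedence over instance-level defaults.

```python
class DefaultsSketch:
    """Toy stand-in showing the defaults-then-overrides pattern (not rcsb-api code)."""

    def __init__(self, **defaults):
        self.defaults = defaults

    def resolve(self, **overrides):
        # Start from the instance defaults, then let explicit arguments win.
        return {**self.defaults, **overrides}


sketch = DefaultsSketch(encoding="cif", download=True, compress_gzip=False)

# No override: the instance defaults are used as-is.
print(sketch.resolve(entry_id="2HHB"))

# Explicit override: compress_gzip=True replaces the default for this call only.
print(sketch.resolve(entry_id="2HHB", compress_gzip=True))
```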