Schemas
There are four schemas in STRING that describe different aspects of the content.
schema |
description |
evidence |
contains information about the underlying evidence for interactions. |
homology |
BLAST hits used to propagate evidence to other species by means of homology. |
items |
info about entries (protein names, species, orthgroups, etc.). |
network |
the interactions and their scores. |
Table: evidence.abstracts
Abstracts are used for showing the text bodies in the text-mining view.
field |
description |
abstract_id |
abstract identifier (e.g. "PMID019234442", "OMIM000100070", etc.). |
publication_date |
date of publication (e.g. "2009"). |
publication_source |
source, i.e. the name of the journal (e.g. "Nature"). |
linkout_url |
link to publication. |
title |
title of publication. |
body |
the abstract of the publication. |
Table: evidence.collections
The different sources from which STRING imports data (for the channels 'experiments' and 'databases').
field |
description |
collection_id |
the name of the data source (e.g. "dip"). |
pubmed_id |
the pubmed identification number (e.g. "14681454"). |
comment |
short description of the data and the date it was imported. |
Table: evidence.evidence_transfers
Evidence of interaction propagated across species using homology.
field |
description |
target_protein_id_a |
interactor A that has received evidence from the source. |
target_protein_id_b |
interactor B that has received evidence from the source. |
transfer_score_c1 |
fraction of transfer score for interactor A. depends on how ambiguous the homology is. ranges from 0 to 1, higher score is better. |
transfer_score_c2 |
fraction of transfer score for interactor B. |
source_ascore |
coexpression score of the source interaction. |
source_escore |
experimental score of the source interaction. |
source_dscore |
database score of the source interaction. |
source_tscore |
textmining score of the source interaction. |
Table: evidence.fusion_evidence
Homology propagated evidence for fusion evidence, which originates from only one source protein.
field |
description |
target_protein_id_a |
interactor A that has received evidence from the source. |
target_protein_id_b |
interactor B that has received evidence from the source. |
source_protein |
the source from which evidence for the interaction has been transferred. |
source_species |
species_id for the species of the source of the evidence. |
transfer_score_c1 |
homology fraction of score. |
transfer_score_c2 |
homology fraction of score. |
fusion_score |
fusion score of the source interaction. |
Table: evidence.items_abstracts
The abstracts in which a protein is mentioned.
field |
description |
protein_id |
internal protein identifier. |
abstract_id |
abstract identifier. |
name |
name of gene in abstract. |
abstract_length |
length of abstract in number of characters. |
mesh_id |
identifier for MeSH (i.e. the controlled vocabulary of NCBI for different names that refer to the same concept). |
Table: evidence.orthgroups_abstracts
The abstracts that mention any member of an orthologous group (redundant table w.r.t. evidence.items_abstracts).
field |
description |
orthgroup_id |
internal orthologs group identifier. |
abstract_id |
abstract identifier. |
Table: evidence.orthgroups_sets
The sets that support evidence for interaction between orthologs groups.
field |
description |
orthgroup_id |
internal orthologs group identifier. |
set_id |
the identifier to external repository from where evidence was gathered |
is_database_set |
this flag is set to 'true' when the set concerns the 'database' channel in STRING (as opposed to the 'experiments' channel) |
Table: evidence.sets
The sets of evidence that support interactions.
field |
description |
set_id |
the identifier to external repository from where evidence was gathered (e.g. "BCRT1314"). |
collection_id |
the external repository (e.g. "biocarta"). |
title |
type of evidence (e.g. "curated pathway"). |
comment |
auxiliary information of the set. |
url |
link to original data. |
Table: evidence.sets_items
The members of the evidence sets. An interaction exists if two rows have the same set_id.
field |
description |
set_id |
identifier of a single set of proteins in an external repository (protein complex, pathway or binary pair). |
protein_id |
internal protein identifier. |
species_id |
taxonomy identifier. |
is_database_set |
this flag is set to 'true' when the set concerns the 'database' channel in STRING (as opposed to the 'experiments' channel). |
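The rule that an interaction exists whenever two rows share a set_id can be sketched in Python as follows. This is only an illustration: the function name, the row layout as (set_id, protein_id) tuples, the set_id "OTHER1" and the protein identifiers are all assumptions made for the example, not part of STRING.

```python
from collections import defaultdict
from itertools import combinations


def pairs_from_sets_items(rows):
    """Derive protein pairs from (set_id, protein_id) rows.

    Two proteins are paired when they appear under the same set_id.
    The row layout is an assumption made for this sketch.
    """
    members = defaultdict(set)
    for set_id, protein_id in rows:
        members[set_id].add(protein_id)

    pairs = []
    for set_id in sorted(members):
        # Every unordered pair of members of one set is an interaction.
        for a, b in combinations(sorted(members[set_id]), 2):
            pairs.append((set_id, a, b))
    return pairs


# Hypothetical rows: one set with three members, one with a single member.
rows = [
    ("BCRT1314", 101),
    ("BCRT1314", 102),
    ("BCRT1314", 103),
    ("OTHER1", 101),
]
print(pairs_from_sets_items(rows))
```

A set with a single member yields no pair, and a set with n members yields n(n-1)/2 pairs, so large complexes or pathways expand into many binary interactions.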
Table: evidence.sets_pubmedrefs
Supporting papers from external repository.
field |
description |
set_id |
identifier to external repository. |
pubmed_id |
pubmed identifier. |
Table: homology.best_hit_per_species
Derived homology information that is used for transfer of evidence.
field |
description |
protein_id |
internal protein identifier. |
species_id |
taxonomy identifier of the species to which the alignment is found. |
nr_high_scoring_hits |
number of proteins in the species that have blast bit scores higher than 60. |
best_hit_protein_id |
the id of the protein with the highest scoring alignment. |
best_hit_identifier |
the STRING preferred name of the highest scoring alignment. |
best_hit_bitscore |
the bitscore of the highest scoring alignment. |
best_hit_normscore |
score normalized by the self-hit of the longer protein. |
best_hit_alignment_length |
the length of the alignment. |
Table: homology.blast_data
Raw data of all-against-all BLAST alignments.
field |
description |
protein_id_a |
internal identifier of protein A. |
protein_id_b |
internal identifier of protein B. |
bitscore |
bitscore of alignment (higher is better). |
start_a |
amino acid where the alignment starts. |
end_a |
amino acid where the alignment stops (c.f. length_of_alignment_a = end_a |
start_b |
amino acid where the alignment starts. |
end_b |
amino acid where the alignment stops. |
size_b |
length of protein B. |
Table: items.funccats
The functional categories defined by the COG database.
field |
description |
funccat_id |
one-letter identifier of functional category. |
funccat_description |
description of function (e.g. "Transcription"). |
Table: items.genes
Information about the genes that are used for neighborhood evidence.
field |
description |
gene_id |
internal gene identifier. |
gene_external_id |
external gene identifier (e.g.: "257311.BPP1623.NC_002928.1731802") |
start_position_on_contig |
the nucleotide of the start of the ORF. |
end_position_on_contig |
the nucleotide where the ORF ends. |
protein_size |
size of the protein (usually the longest splice variant). |
Table: items.genes_proteins
Mapping between the internal identifier of genes and proteins.
field |
description |
protein_id |
internal protein identifier. |
gene_id |
internal gene identifier. |
Table: items.meshterms
MeSH (Medical Subject Headings) is a controlled vocabulary, used where names and categories cannot otherwise be distinguished.
field |
description |
mesh_id |
MeSH identifier (e.g. 2826) |
description |
description of the MeSH term (e.g. "Chorismate Mutase"). |
Table: items.orthgroups
Information about orthologous groups.
field |
description |
orthgroup_id |
internal orthologs groups identifier. |
orthgroup_external_id |
name of orthologs group (e.g. "COG0133"). |
description |
general description of biological functionality. |
protein_count |
number of members in orthologs group. |
species_count |
number of distinct species in group. |
Table: items.orthgroups_funccats
The functional category of an orthologous group.
field |
description |
orthgroup_id |
internal orthologs groups identifier. |
funccat_id |
one-letter identifier of functional category. |
Table: items.orthgroups_species
This describes how many genes in a given organism belong to a given orthologous group.
field |
description |
orthgroup_id |
internal orthologs groups identifier. |
species_id |
taxonomy identifier. |
count |
number of genes. |
Table: items.protein_image_match
Information about protein structure images used for the nodes in the network view.
field |
description |
protein_id |
internal protein identifier. |
image_id |
internal identifier of a protein structure image. |
identity |
the percentage identity to the most similar protein. |
source |
the origin of the protein structure. |
start_position_on_protein |
from which position of the protein a structure is mapped. |
end_position_on_protein |
to which position of the protein a structure is mapped. |
annotation |
the name of the structure. |
Table: items.proteins
Information about the proteins in STRING.
field |
description |
protein_id |
internal protein identifier. |
protein_external_id |
taxonomy identifier and name of protein concatenated. |
species_id |
taxonomy identifier. |
protein_checksum |
checksum of the protein sequence. |
protein_size |
length of the protein (in amino acids). |
annotation |
description of the functionality of protein. |
preferred_name |
the preferred name of STRING (e.g. "amiF") |
annotation_word_vectors |
internal use only: enables full-text searching. |
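Since protein_external_id is described as the taxonomy identifier and the protein name concatenated, it can be split back into its two parts. The sketch below assumes the two parts are joined by a dot, as in the gene_external_id example "257311.BPP1623.NC_002928.1731802"; the separator is not stated for proteins in this document, and both the function name and the sample identifier are illustrative.

```python
def split_external_id(protein_external_id):
    """Split an external id into (species_id, protein_name).

    Assumes the form "<taxonomy id>.<name>"; the dot separator is an
    assumption inferred from the gene_external_id example.
    """
    species, sep, name = protein_external_id.partition(".")
    if not sep or not species.isdigit() or not name:
        raise ValueError(f"unexpected external id: {protein_external_id!r}")
    return int(species), name


# Hypothetical identifier built from the gene example in this document.
print(split_external_id("257311.BPP1623"))
```

Splitting on the first dot only keeps any further dots as part of the name, which matters if protein names themselves contain dots.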
Table: items.proteins_meshterms
Mapping between MeSH and STRING.
field |
description |
mesh_id |
MeSH id. |
protein_id |
internal protein identifier. |
Table: items.proteins_names
Mapping of various names to STRING entries.
field |
description |
protein_name |
a name of the protein (e.g. "amiF", "spr1703", "AE008535", etc.) |
protein_id |
internal protein identifier. |
species_id |
taxonomy identifier. |
source |
the origin of the name (e.g. "Ensembl") |
is_preferred_name |
"true" if the name is the preferred STRING name. |
Table: items.proteins_orthgroups
Description of the members in the orthologs groups.
field |
description |
protein_id |
internal protein identifier. |
orthgroup_id |
to which orthgroup the protein belongs (internal orthgroup id). |
species_id |
taxonomy identifier of the protein. |
start_position |
residue within the protein where the orthologous group mapping starts. |
end_position |
residue within the protein where the orthologous group mapping ends. |
preferred_name |
preferred name of protein (redundant w.r.t items.proteins_names). |
protein_annotation |
annotated function of protein (redundant w.r.t items.proteins_names). |
Table: items.proteins_sequences
Describes the sequence of the protein.
field |
description |
protein_id |
internal protein identifier. |
sequence |
protein sequence. |
Table: items.proteins_smartlinkouts
Links to the SMART database describing the domain structures of a protein.
field |
description |
protein_id |
internal protein identifier. |
protein_size |
length of protein in amino acids. |
smart_url |
link to SMART database entry. |
Table: items.runs
Neighborhood evidence: this describes an uninterrupted group of neighboring genes (a 'run').
field |
description |
run_id |
internal id. |
species_id |
taxonomy identifier. |
contig_id |
from genome assembly information: which chromosome or otherwise identified contig the run is on. |
Table: items.runs_genes_proteins
Mapping between runs, genes and proteins.
field |
description |
run_id |
internal id. |
gene_id |
internal gene identifier. |
protein_id |
internal protein identifier. |
start_position_on_contig |
the nucleotide of the start of the ORF. |
end_position_on_contig |
the nucleotide where the ORF ends. |
preferred_name |
the preferred name of STRING. |
annotation |
functional annotation of protein. |
Table: items.runs_orthgroups
Describes which orthologous groups map to an un-interrupted group of genes on the chromosome.
field |
description |
run_id |
internal id. |
orthgroup_id |
internal orthologs groups identifier. |
Table: items.species
Information on the organisms in STRING.
field |
description |
species_id |
taxonomy identifier (e.g. "9606" for human). |
official_name |
scientific name of organism. |
compact_name |
other name, shortened version of the scientific name. |
kingdom |
to which of the three highest-level groupings in the taxonomy the organism belongs. |
type |
whether the organism is a core species or a periphery species. Core species are BLAST aligned all-against-all, periphery species only against the core. |
Table: items.species_names
NCBI taxonomy used for organism selection on input page.
field |
description |
species_id |
taxonomy identifier. |
species_name |
species synonym. |
official_name |
scientific name of organism. |
Table: items.species_nodes
Auxiliary table to NCBI organism selection (c.f. items.species_names).
field |
description |
species_id |
taxonomy identifier. |
species_name |
name of query. |
position |
position of the first clade member in a STRING clade. |
size |
number of STRING species in the NCBI clade. |
Table: network.actions
The type of an interaction
field |
description |
item_id_a |
internal protein identifier. |
item_id_b |
internal protein identifier. |
mode |
type of interaction ("reaction", "expression", "activation", "ptmod"(post-translational modifications), "binding", "catalysis") |
action |
the effect of the action ("inhibition", "activation") |
a_is_acting |
the directionality of the action if applicable (1 gives that item_id_a is acting upon item_id_b) |
score |
the best combined score of all interactions in string. |
Table: network.best_combined_scores_orthgroups
Derived table of best combined score between two orthologs groups.
field |
description |
orthgroup_id |
internal orthologs group identifier. |
best_score |
the highest score between any members of two orthologs groups. |
Table: network.best_combined_scores_proteins
The highest interaction scores of a protein.
field |
description |
protein_id |
internal protein identifier. |
best_score |
the best combined score of all interactions in string. |
Table: network.node_node_links
The interactions and their scores between proteins in a species (and orthologs groups)
field |
description |
node_id_a |
internal identifier (equivalent to protein_id). |
node_id_b |
internal identifier (equivalent to protein_id). |
node_type_b |
taxonomy identifier (equivalent to species_id). |
combined_score |
the combined score of all the evidence scores (including transferred scores). |
evidence_score |
the scores of the individual channels, represented as a list of score types and their scores. For example, {{4,626}} means that the cooccurrence score (4) is 0.626. The types of score can be found in table network.score_types. |
The combined_score is stored as an integer from 0 to 1000, i.e. the score in the range 0 to 1 multiplied by 1000.
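The encoding of the scores can be decoded with a few lines of Python. This is a minimal sketch that assumes evidence_score arrives as text in the form shown in the example above (for instance "{{4,626}}"); a database driver may instead return nested lists, in which case the parsing step is unnecessary. The function names are illustrative.

```python
import re


def decode_evidence_scores(text):
    """Turn a literal such as "{{4,626},{12,310}}" into {4: 0.626, 12: 0.31}.

    Assumes the textual array form shown in the documentation example.
    """
    pairs = re.findall(r"\{\s*(\d+)\s*,\s*(\d+)\s*\}", text)
    return {int(score_type): int(score) / 1000 for score_type, score in pairs}


def combined_score_to_fraction(combined_score):
    """Convert the stored integer (0 to 1000) to a score between 0 and 1."""
    return combined_score / 1000


print(decode_evidence_scores("{{4,626}}"))
print(combined_score_to_fraction(626))
```

The keys of the returned dictionary are score_type identifiers, which can be resolved to channel names through network.score_types.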
Table: network.score_types
field |
description |
score_id |
internal identifier |
score_type |
the type of the score, see below |
score_type |
name |
description |
1 |
equiv_nscore |
neighborhood score (computed from the inter-gene nucleotide count). |
2 |
equiv_nscore_transferred |
neighborhood score from other species (via homology). |
3 |
equiv_fscore |
fusion score (derived from fused proteins in other species). |
4 |
equiv_pscore |
cooccurrence score of the phyletic profile (derived from similar absence/presence patterns of genes). |
5 |
equiv_hscore |
homology score, the degree of homology of the interactors (trivial and normally not reported in STRING). |
6 |
array_score |
coexpression score (derived from similar patterns of mRNA expression measured by DNA arrays and similar technologies). |
7 |
array_score_transferred |
coexpression score transferred by homology from other species. |
8 |
experimental_score |
experimental score (derived from experimental data, such as, affinity chromatography). |
9 |
experimental_score_transferred |
experimental score transferred by homology from other species. |
10 |
database_score |
database score (derived from curated data of various databases). |
11 |
database_score_transferred |
database score transferred by homology from other species. |
12 |
textmining_score |
textmining score (derived from co-occurring mentioning of gene/protein names in abstracts). |
13 |
textmining_score_transferred |
textmining score transferred by homology from other species. |
14 |
neighborhood_score |
raw neighborhood counts for COG mode (deprecated). |
15 |
fusion_score |
raw fusion score for COG mode (deprecated). |
16 |
cooccurence_score |
raw cooccurrence score for COG mode (deprecated). |
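For client code, the table above can be mirrored as a lookup from score_type to channel name. The sketch below copies the identifiers and names exactly as listed (including the spelling of cooccurence_score); the helper name and the input format of (score_type, score) pairs are assumptions for illustration.

```python
# score_type -> name, copied from the table above.
SCORE_TYPES = {
    1: "equiv_nscore",
    2: "equiv_nscore_transferred",
    3: "equiv_fscore",
    4: "equiv_pscore",
    5: "equiv_hscore",
    6: "array_score",
    7: "array_score_transferred",
    8: "experimental_score",
    9: "experimental_score_transferred",
    10: "database_score",
    11: "database_score_transferred",
    12: "textmining_score",
    13: "textmining_score_transferred",
    14: "neighborhood_score",
    15: "fusion_score",
    16: "cooccurence_score",
}


def label_scores(pairs):
    """Map (score_type, integer score) pairs to {name: score in 0..1}."""
    return {SCORE_TYPES[score_type]: score / 1000 for score_type, score in pairs}


def is_transferred(name):
    """True for channels whose evidence was transferred by homology."""
    return name.endswith("_transferred")


labelled = label_scores([(4, 626), (13, 150)])
print(labelled)
print([name for name in labelled if is_transferred(name)])
```

Relying on the "_transferred" suffix keeps direct and homology-transferred evidence easy to separate without a second lookup table.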