Schemas
There are four schemas in STRING that describe different aspects of the content.
| schema | description |
|---|---|
| evidence | information about the underlying evidence for interactions. |
| homology | BLAST hits used to propagate evidence to other species by means of homology. |
| items | info about entries (protein names, species, orthgroups, etc.). |
| network | the interactions and their scores. |
Table: evidence.abstracts
Abstracts are used for showing the text bodies in the text-mining view.
| field | description |
|---|---|
| abstract_id | abstract identifier (e.g. "PMID019234442", "OMIM000100070", etc.). |
| publication_date | date of publication (e.g. "2009"). |
| publication_source | the name of the journal (e.g. "Nature"). |
| linkout_url | link to publication. |
| title | title of publication. |
| body | the abstract of the publication. |
Table: evidence.collections
The data sources from which STRING imports data (for the channels 'experiments' and 'databases').
| field | description |
|---|---|
| collection_id | the name of the data source (e.g. "dip"). |
| pubmed_id | the pubmed identification number (e.g. "14681454"). |
| comment | short description of the data and the date it was imported. |
Table: evidence.evidence_transfers
Evidence of interaction propagated across species using homology.
| field | description |
|---|---|
| target_protein_id_a | interactor A that has received evidence from the source. |
| target_protein_id_b | interactor B that has received evidence from the source. |
| transfer_score_c1 | fraction of transfer score for interactor A. Depends on how ambiguous the homology is. Ranges from 0 to 1; a higher score is better. |
| transfer_score_c2 | fraction of transfer score for interactor B. |
| source_ascore | coexpression score of the source interaction. |
| source_escore | experimental score of the source interaction. |
| source_dscore | database score of the source interaction. |
| source_tscore | textmining score of the source interaction. |
Table: evidence.fusion_evidence
Homology propagated evidence for fusion evidence, which originates from only one source protein.
| field | description |
|---|---|
| target_protein_id_a | interactor A that has received evidence from the source. |
| target_protein_id_b | interactor B that has received evidence from the source. |
| source_protein | the source from which evidence for the interaction has been transferred. |
| source_species | species_id for the species of the source of the evidence. |
| transfer_score_c1 | homology fraction of score. |
| transfer_score_c2 | homology fraction of score. |
| fusion_score | fusion score of the source interaction. |
Table: evidence.items_abstracts
The abstracts in which a protein is mentioned.
| field | description |
|---|---|
| protein_id | internal protein identifier. |
| abstract_id | abstract identifier. |
| name | name of gene in abstract. |
| abstract_length | length of abstract in number of characters. |
| mesh_id | identifier for MeSH (i.e. the controlled vocabulary of NCBI for different names that refer to the same concept). |
Table: evidence.orthgroups_abstracts
The abstracts that mention any member of an orthologous group (redundant table w.r.t. evidence.items_abstracts).
| field | description |
|---|---|
| orthgroup_id | internal orthologs group identifier. |
| abstract_id | abstract identifier. |
Table: evidence.orthgroups_sets
The sets that support evidence for interaction between orthologous groups.
| field | description |
|---|---|
| orthgroup_id | internal orthologs group identifier. |
| set_id | the identifier to external repository from where evidence was gathered |
| is_database_set | this flag is set to 'true' when the set concerns the 'database' channel in STRING (as opposed to the 'experiments' channel) |
Table: evidence.sets
The sets of evidence that support interactions.
| field | description |
|---|---|
| set_id | the identifier to external repository from where evidence was gathered (e.g. "BCRT1314"). |
| collection_id | the external repository (e.g. "biocarta"). |
| title | type of evidence (e.g. "curated pathway"). |
| comment | auxiliary information of the set. |
| url | link to original data. |
Table: evidence.sets_items
The members of the evidence sets. An interaction exists if two rows have the same set_id.
| field | description |
|---|---|
| set_id | identifier of a single set of proteins in an external repository (protein complex, pathway or binary pair). |
| protein_id | internal protein identifier. |
| species_id | taxonomy identifier. |
| is_database_set | this flag is set to 'true' when the set concerns the 'database' channel in STRING (as opposed to the 'experiments' channel). |
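The rule that an interaction exists between two rows sharing a set_id can be sketched as follows. This is only an illustration under stated assumptions: the function name `interactions_from_sets` is hypothetical, the rows are reduced to (set_id, protein_id) tuples, and the protein identifiers and the "SET2" id are made up ("BCRT1314" is the example id given for evidence.sets).

```python
from collections import defaultdict
from itertools import combinations


def interactions_from_sets(rows):
    """Derive unordered protein pairs from (set_id, protein_id) rows.

    Two proteins are paired when they appear under the same set_id.
    """
    members = defaultdict(set)
    for set_id, protein_id in rows:
        members[set_id].add(protein_id)

    pairs = set()
    for proteins in members.values():
        # Sorting gives every unordered pair one canonical orientation.
        for a, b in combinations(sorted(proteins), 2):
            pairs.add((a, b))
    return pairs


# Made-up rows for illustration only.
rows = [
    ("BCRT1314", 101),
    ("BCRT1314", 102),
    ("BCRT1314", 103),
    ("SET2", 102),
    ("SET2", 200),
]
pairs = interactions_from_sets(rows)
```

A set with n members yields n(n-1)/2 pairs, so large pathway sets expand quickly; a binary pair yields exactly one.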
Table: evidence.sets_pubmedrefs
Supporting papers from external repository.
| field | description |
|---|---|
| set_id | identifier to external repository. |
| pubmed_id | pubmed identifier. |
Table: homology.best_hit_per_species
Derived homology information that is used for transfer of evidence.
| field | description |
|---|---|
| protein_id | internal protein identifier. |
| species_id | taxonomy identifier of the species to which the alignment is found. |
| nr_high_scoring_hits | number of proteins in the species that have blast bit scores higher than 60. |
| best_hit_protein_id | the id of the protein with the highest scoring alignment. |
| best_hit_identifier | the STRING preferred name of the highest scoring alignment. |
| best_hit_bitscore | the bitscore of the highest scoring alignment. |
| best_hit_normscore | score normalized by the self-hit of the longer protein. |
| best_hit_alignment_length | the length of the alignment. |
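As a rough sketch of how the derived fields above relate to raw hits, the following computes them for one query protein against one species. Everything beyond the field names is an assumption: the function name `summarize_best_hit` and its inputs are hypothetical, each hit is taken to be one (protein_id, bitscore) pair per protein, and "normalized by the self-hit" is interpreted here as a plain division by the self-hit bitscore of the longer protein, which the table does not state explicitly.

```python
def summarize_best_hit(query_id, hits, self_bitscore, protein_size, threshold=60):
    """Summarize the hits of one query protein in one target species.

    hits          -- list of (protein_id, bitscore), one entry per target protein
    self_bitscore -- dict: protein_id -> bitscore of the protein aligned to itself
    protein_size  -- dict: protein_id -> protein length in amino acids
    """
    if not hits:
        return None

    best_id, best_bitscore = max(hits, key=lambda hit: hit[1])
    nr_high = sum(1 for _, bitscore in hits if bitscore > threshold)

    # Assumption: normalize by the self-hit of whichever protein is longer.
    longer = query_id if protein_size[query_id] >= protein_size[best_id] else best_id

    return {
        "best_hit_protein_id": best_id,
        "best_hit_bitscore": best_bitscore,
        "nr_high_scoring_hits": nr_high,
        "best_hit_normscore": best_bitscore / self_bitscore[longer],
    }


# Made-up values for illustration only.
hits = [(11, 250.0), (12, 75.0), (13, 40.0)]
summary = summarize_best_hit(
    1,
    hits,
    self_bitscore={1: 500.0, 11: 400.0},
    protein_size={1: 300, 11: 200},
)
```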
Table: homology.blast_data
Raw data of all-against-all BLAST alignments.
| field | description |
|---|---|
| protein_id_a | internal identifier of protein A. |
| protein_id_b | internal identifier of protein B. |
| bitscore | bitscore of alignment (higher is better). |
| start_a | amino acid where the alignment starts. |
| end_a | amino acid where the alignment stops (c.f. length_of_alignment_a = end_a |
| start_b | amino acid where the alignment starts. |
| end_b | amino acid where the alignment stops. |
| size_b | length of protein B. |
Table: items.funccats
The functional categories defined by the COG database.
| field | description |
|---|---|
| funccat_id | one-letter identifier of functional category. |
| funccat_description | description of function (e.g. "Transcription"). |
Table: items.genes
Information on the genes that are used for neighborhood evidence.
| field | description |
|---|---|
| gene_id | internal gene identifier. |
| gene_external_id | external gene identifier (e.g.: "257311.BPP1623.NC_002928.1731802") |
| start_position_on_contig | the nucleotide of the start of the ORF. |
| end_position_on_contig | the nucleotide where the ORF ends. |
| protein_size | size of the protein (usually the longest splice variant). |
Table: items.genes_proteins
Mapping between the internal identifier of genes and proteins.
| field | description |
|---|---|
| protein_id | internal protein identifier. |
| gene_id | internal gene identifier. |
Table: items.meshterms
MeSH (Medical Subject Headings) is a controlled vocabulary, used where names and categories cannot be distinguished.
| field | description |
|---|---|
| mesh_id | MeSH identifier (e.g. 2826). |
| description | description of the MeSH term (e.g. "Chorismate Mutase"). |
Table: items.orthgroups
Information of orthologous groups.
| field | description |
|---|---|
| orthgroup_id | internal orthologs groups identifier. |
| orthgroup_external_id | name of orthologs group (e.g. "COG0133"). |
| description | general description of biological functionality. |
| protein_count | number of members in orthologs group. |
| species_count | number of distinct species in group. |
Table: items.orthgroups_funccats
The functional category of an orthologous group.
| field | description |
|---|---|
| orthgroup_id | internal orthologs groups identifier. |
| funccat_id | one-letter identifier of functional category. |
Table: items.orthgroups_species
This describes how many genes in a given organism belong to a given orthologous group.
| field | description |
|---|---|
| orthgroup_id | internal orthologs groups identifier. |
| species_id | taxonomy identifier. |
| count | number of genes. |
Table: items.protein_image_match
Information about protein structure images used for the nodes in the network view.
| field | description |
|---|---|
| protein_id | internal protein identifier. |
| image_id | internal identifier of a protein structure image. |
| identity | the percentage identity to the most similar protein. |
| source | the origin of the protein structure. |
| start_position_on_protein | from which position of the protein a structure is mapped. |
| end_position_on_protein | to which position of the protein a structure is mapped. |
| annotation | the name of the structure. |
Table: items.proteins
Information about the proteins in STRING.
| field | description |
|---|---|
| protein_id | internal protein identifier. |
| protein_external_id | taxonomy identifier and name of protein concatenated. |
| species_id | taxonomy identifier. |
| protein_checksum | checksum of the protein sequence. |
| protein_size | length of the protein (in amino acids). |
| annotation | description of the functionality of protein. |
| preferred_name | the preferred name of STRING (e.g. "amiF") |
| annotation_word_vectors | internal use only: enables full-text searching. |
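Since protein_external_id is described as the taxonomy identifier and the protein name concatenated, a small helper can split it back into its parts. This is a sketch under an assumption the table does not state: that the two parts are joined by a dot, as the gene_external_id example "257311.BPP1623.NC_002928.1731802" suggests. The function name `split_external_id` and the sample identifiers are hypothetical.

```python
def split_external_id(protein_external_id):
    """Split an external id into (species_id, protein_name).

    Assumes the form "<taxonomy id>.<protein name>". Only the first dot is
    treated as the separator, because protein names may contain dots themselves.
    """
    species, _, name = protein_external_id.partition(".")
    if not species.isdigit() or not name:
        raise ValueError(f"unexpected external id: {protein_external_id!r}")
    return int(species), name


# Hypothetical identifiers for illustration only.
species_id, protein_name = split_external_id("257311.BPP1623")
dotted = split_external_id("9606.NAME.1")
```

Splitting on the first dot only is a deliberate choice here: the taxonomy identifier is purely numeric, while the name part is free-form.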
Table: items.proteins_meshterms
Mapping between MeSH and STRING.
| field | description |
|---|---|
| mesh_id | MeSH id. |
| protein_id | internal protein identifier. |
Table: items.proteins_names
Mapping of various names to STRING entries.
| field | description |
|---|---|
| protein_name | a name of the protein (e.g. "amiF", "spr1703", "AE008535", etc.) |
| protein_id | internal protein identifier. |
| species_id | taxonomy identifier. |
| source | the origin of the name (e.g. "Ensembl") |
| is_preferred_name | "true" if the name is the preferred STRING name. |
Table: items.proteins_orthgroups
Description of the members of the orthologous groups.
| field | description |
|---|---|
| protein_id | internal protein identifier. |
| orthgroup_id | to which orthgroup the protein belongs (internal orthgroup id). |
| species_id | taxonomy identifier of the protein. |
| start_position | residue within the protein where the orthologous group mapping starts. |
| end_position | residue within the protein where the orthologous group mapping ends. |
| preferred_name | preferred name of protein (redundant w.r.t items.proteins_names). |
| protein_annotation | annotated function of protein (redundant w.r.t items.proteins_names). |
Table: items.proteins_sequences
Describes the sequence of the protein.
| field | description |
|---|---|
| protein_id | internal protein identifier. |
| sequence | protein sequence. |
Table: items.proteins_smartlinkouts
Links to the SMART database describing the domain structures of a protein.
| field | description |
|---|---|
| protein_id | internal protein identifier. |
| protein_size | length of protein in amino acids. |
| smart_url | link to SMART database entry. |
Table: items.runs
Neighborhood evidence: this describes an uninterrupted group of neighboring genes (a 'run').
| field | description |
|---|---|
| run_id | internal id. |
| species_id | taxonomy identifier. |
| contig_id | from genome assembly information: which chromosome or otherwise identified contig the run is on. |
Table: items.runs_genes_proteins
Mapping between runs, genes and proteins.
| field | description |
|---|---|
| run_id | internal id. |
| gene_id | internal gene identifier. |
| protein_id | internal protein identifier. |
| start_position_on_contig | the nucleotide of the start of the ORF. |
| end_position_on_contig | the nucleotide where the ORF ends. |
| preferred_name | the preferred name of STRING. |
| annotation | functional annotation of protein. |
Table: items.runs_orthgroups
Describes which orthologous groups map to an un-interrupted group of genes on the chromosome.
| field | description |
|---|---|
| run_id | internal id. |
| orthgroup_id | internal orthologs groups identifier. |
Table: items.species
Information on the organisms in STRING.
| field | description |
|---|---|
| species_id | taxonomy identifier (e.g. "9606" for human). |
| official_name | scientific name of organism. |
| compact_name | other name, shortened version of the scientific name. |
| kingdom | to which of the three highest-level groupings in the taxonomy the organism belongs. |
| type | whether the organism is a core species or a periphery species. Core species are BLAST aligned all-against-all, periphery species only against the core. |
Table: items.species_names
NCBI taxonomy used for organism selection on input page.
| field | description |
|---|---|
| species_id | taxonomy identifier. |
| species_name | species synonym. |
| official_name | scientific name of organism. |
Table: items.species_nodes
Auxiliary table to NCBI organism selection (c.f. items.species_names).
| field | description |
|---|---|
| species_id | taxonomy identifier. |
| species_name | name of query. |
| position | position of first clade member in a STRING clade. |
| size | number of STRING species in the NCBI clade. |
Table: network.actions
The type of an interaction
| field | description |
|---|---|
| item_id_a | internal protein identifier. |
| item_id_b | internal protein identifier. |
| mode | type of interaction ("reaction", "expression", "activation", "ptmod"(post-translational modifications), "binding", "catalysis") |
| action | the effect of the action ("inhibition", "activation") |
| a_is_acting | the directionality of the action if applicable (1 gives that item_id_a is acting upon item_id_b) |
| score | the best combined score of all interactions in string. |
Table: network.best_combined_scores_orthgroups
Derived table of the best combined score between two orthologous groups.
| field | description |
|---|---|
| orthgroup_id | internal orthologs group identifier. |
| best_score | the highest score between any members of two orthologous groups. |
Table: network.best_combined_scores_proteins
The highest interaction scores of a protein.
| field | description |
|---|---|
| protein_id | internal protein identifier. |
| best_score | the best combined score of all interactions in string. |
Table: network.node_node_links
The interactions and their scores between proteins in a species (and orthologous groups).
| field | description |
|---|---|
| node_id_a | internal identifier (equivalent to protein_id). |
| node_id_b | internal identifier (equivalent to protein_id). |
| node_type_b | taxonomy identifier (equivalent to species_id). |
| combined_score | the combined score of all the evidence scores (including transferred scores). |
| evidence_score | the scores of the individual channels, represented as a list of score types and their scores. For example, {{4,626}} means that the cooccurrence score (4) is 0.626. The score types are listed in table network.score_types. |
The combined_score is stored multiplied by 1000, so a score ranging from 0 to 1 is represented as an integer from 0 to 1000.
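The score encoding described above can be decoded along these lines. This is a minimal sketch with assumptions: the evidence_score value is taken to arrive as text in the brace notation shown in the example, its integers are taken to use the same factor of 1000 as combined_score (which the {{4,626}} example implies), and the function names are hypothetical.

```python
import re

# Matches one "{type,score}" pair inside the outer braces.
_PAIR = re.compile(r"\{\s*(\d+)\s*,\s*(\d+)\s*\}")


def parse_evidence_score(text):
    """Turn a string such as "{{4,626}}" into {score_type: score in 0..1}."""
    return {int(kind): int(score) / 1000 for kind, score in _PAIR.findall(text)}


def decode_combined_score(stored):
    """Convert the stored integer (0..1000) back to a score in 0..1."""
    return stored / 1000


single = parse_evidence_score("{{4,626}}")
several = parse_evidence_score("{{4,626},{12,800}}")
combined = decode_combined_score(950)
```

If the database driver already returns the column as nested lists of integers, the regular expression is unnecessary and only the division by 1000 applies.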
Table: network.score_types
| field | description |
|---|---|
| score_id | internal identifier |
| score_type | the type of the score, see below |
| score_type | name | description |
|---|---|---|
| 1 | equiv_nscore | neighborhood score (computed from the inter-gene nucleotide count). |
| 2 | equiv_nscore_transferred | neighborhood score from other species (via homology). |
| 3 | equiv_fscore | fusion score (derived from fused proteins in other species). |
| 4 | equiv_pscore | cooccurrence score of the phyletic profile (derived from similar absence/presence patterns of genes). |
| 5 | equiv_hscore | homology score, the degree of homology of the interactors (trivial and normally not reported in STRING). |
| 6 | array_score | coexpression score (derived from similar patterns of mRNA expression measured by DNA arrays and similar technologies). |
| 7 | array_score_transferred | coexpression score transferred by homology from other species. |
| 8 | experimental_score | experimental score (derived from experimental data, such as affinity chromatography). |
| 9 | experimental_score_transferred | experimental score transferred by homology from other species. |
| 10 | database_score | database score (derived from curated data of various databases). |
| 11 | database_score_transferred | database score transferred by homology from other species. |
| 12 | textmining_score | textmining score (derived from co-occurring mentions of gene/protein names in abstracts). |
| 13 | textmining_score_transferred | textmining score transferred by homology from other species. |
| 14 | neighborhood_score | raw neighborhood counts for COG mode (deprecated). |
| 15 | fusion_score | raw fusion score for COG mode (deprecated). |
| 16 | cooccurence_score | raw cooccurrence score for COG mode (deprecated). |
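For readability, the numeric score types in an evidence_score list can be replaced by the names from the table above. The following is a sketch only: the lookup dictionary copies the table verbatim, while the function name `label_scores` and the input shape (a dict from score type to score) are assumptions made for illustration.

```python
# Score type identifiers and names, copied from the table above.
SCORE_TYPES = {
    1: "equiv_nscore",
    2: "equiv_nscore_transferred",
    3: "equiv_fscore",
    4: "equiv_pscore",
    5: "equiv_hscore",
    6: "array_score",
    7: "array_score_transferred",
    8: "experimental_score",
    9: "experimental_score_transferred",
    10: "database_score",
    11: "database_score_transferred",
    12: "textmining_score",
    13: "textmining_score_transferred",
    14: "neighborhood_score",
    15: "fusion_score",
    16: "cooccurence_score",
}

# Types 14 to 16 are marked as deprecated in the table.
DEPRECATED = {14, 15, 16}


def label_scores(scores, include_deprecated=False):
    """Map {score_type: score} to {score name: score}."""
    return {
        SCORE_TYPES[kind]: value
        for kind, value in scores.items()
        if include_deprecated or kind not in DEPRECATED
    }


labelled = label_scores({4: 0.626, 12: 0.8, 15: 0.1})
```

Unknown score types raise a KeyError here, which is intended to surface a mismatch with network.score_types rather than silently drop data.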