The Arabidopsis Reactome Data Model
Life on the cellular level is a network of molecular interactions. Molecules are synthesized and degraded, undergo a bewildering array of temporary and permanent modifications, are transported from one location to another, and form complexes with other molecules. Arabidopsis Reactome represents all of this complexity as reactions in which input physical entities are converted to output entities. These reactions can occur spontaneously or be facilitated by physical entities acting as catalysts, and their progress can be modulated by regulatory effects of other physical entities. Reactions are linked together by shared physical entities: a product from one reaction may be a substrate in another reaction and may catalyze yet a third. It is often convenient, if sometimes arbitrary, to group such sets of interlinked reactions into pathways.
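Because reactions are linked only through the physical entities they share, the network can be recovered by matching one reaction's outputs against another reaction's inputs and catalysts. The Python sketch below assumes a deliberately simplified representation in which entities are plain strings with the compartment written into the name. The `Reaction` class, the helper `shared_entity_links`, and the glycolysis-style example reactions are illustrative only, not part of the Arabidopsis Reactome schema or any Reactome API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    """Hypothetical minimal reaction: sets of input, output, and catalyst entity names."""
    name: str
    inputs: frozenset
    outputs: frozenset
    catalysts: frozenset = frozenset()


def shared_entity_links(reactions):
    """List (upstream, downstream, entity, role) tuples in which an output of one
    reaction is an input or a catalyst of another reaction."""
    links = []
    for up in reactions:
        for down in reactions:
            if up is down:
                continue
            for entity in sorted(up.outputs & down.inputs):
                links.append((up.name, down.name, entity, "input"))
            for entity in sorted(up.outputs & down.catalysts):
                links.append((up.name, down.name, entity, "catalyst"))
    return links


# Illustrative entities; the compartment is written into each name.
phosphorylation = Reaction(
    "glucose phosphorylation",
    inputs=frozenset({"glucose [cytosol]", "ATP [cytosol]"}),
    outputs=frozenset({"glucose-6-phosphate [cytosol]", "ADP [cytosol]"}),
)
isomerization = Reaction(
    "glucose-6-phosphate isomerization",
    inputs=frozenset({"glucose-6-phosphate [cytosol]"}),
    outputs=frozenset({"fructose-6-phosphate [cytosol]"}),
)

links = shared_entity_links([phosphorylation, isomerization])
print(links)
# [('glucose phosphorylation', 'glucose-6-phosphate isomerization', 'glucose-6-phosphate [cytosol]', 'input')]
```

Grouping such linked reactions into a pathway is then a matter of choosing which connected reactions to collect, which is why that grouping can be somewhat arbitrary.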
The functions of macromolecular entities such as proteins are often determined not only by their primary sequences but also by the chemical modifications they have undergone. In Arabidopsis Reactome, unmodified and modified forms of a protein are distinct physical entities, and the modification process is treated as an explicit reaction. A macromolecule's function may also depend on whether the molecule is free or complexed with specific other molecules. Arabidopsis Reactome treats complexes as physical entities distinct from their components, and the multimerization events that build up complexes are modeled explicitly as reactions.
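A minimal sketch of what this means in practice, assuming immutable Python dataclasses stand in for Arabidopsis Reactome instances: equality depends on every identifying attribute, so a phosphorylated protein is a different object from the unmodified one, and both the modification and the assembly of the complex appear as reactions. `Protein`, `Complex`, their attribute names, and proteins A and B are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    """A protein form: the same sequence with different modifications is a different entity."""
    name: str
    modifications: frozenset = frozenset()


@dataclass(frozen=True)
class Complex:
    """A complex is an entity in its own right, distinct from each of its components."""
    name: str
    components: tuple


@dataclass(frozen=True)
class Reaction:
    name: str
    inputs: tuple
    outputs: tuple


# "protein A" and "protein B" are hypothetical; ATP/ADP are omitted for brevity.
protein_a = Protein("protein A")
phospho_a = Protein("protein A", frozenset({"phosphoserine"}))
protein_b = Protein("protein B")
a_b_complex = Complex("phospho-protein A:protein B", (phospho_a, protein_b))

# Modification and complex assembly are both explicit reactions.
phosphorylation = Reaction("phosphorylation of protein A", (protein_a,), (phospho_a,))
assembly = Reaction("binding of phospho-protein A to protein B",
                    (phospho_a, protein_b), (a_b_complex,))

print(protein_a == phospho_a)          # False: distinct entities
print(a_b_complex in assembly.inputs)  # False: the complex is a new output entity
```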
Cellular compartments play a key role in biological processes. The segregation of molecules into different compartments often regulates the reactions in which those entities can participate, or can be responsible for driving a reaction forward. In Arabidopsis Reactome, a molecule in one compartment is distinct from that molecule in another compartment. Thus, extracellular and cytosolic glucose are different Arabidopsis Reactome entities and, e.g., the movement of glucose across the plasma membrane is a reaction that converts the extracellular glucose entity into the cytosolic one.
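Compartments can be handled the same way. In the sketch below, which assumes a hypothetical `SimpleEntity` carrying only a name and a compartment, extracellular and cytosolic glucose compare unequal, and moving glucose across the plasma membrane is an ordinary reaction from one entity to the other. The `is_transport` helper is an illustrative heuristic, not a rule taken from the data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleEntity:
    """The compartment is part of the entity's identity."""
    name: str
    compartment: str


@dataclass(frozen=True)
class Reaction:
    name: str
    inputs: frozenset
    outputs: frozenset


def is_transport(reaction):
    """Hypothetical heuristic: the same molecule names appear on both sides, but no
    input entity is identical to an output entity. Since SimpleEntity here has only
    a name and a compartment, that means the compartment changed."""
    in_names = {e.name for e in reaction.inputs}
    out_names = {e.name for e in reaction.outputs}
    return (bool(reaction.inputs)
            and in_names == out_names
            and reaction.inputs.isdisjoint(reaction.outputs))


extracellular_glucose = SimpleEntity("glucose", "extracellular region")
cytosolic_glucose = SimpleEntity("glucose", "cytosol")

glucose_uptake = Reaction(
    "glucose transport across the plasma membrane",
    inputs=frozenset({extracellular_glucose}),
    outputs=frozenset({cytosolic_glucose}),
)

print(extracellular_glucose == cytosolic_glucose)  # False
print(is_transport(glucose_uptake))                # True
```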
Many biochemical entities and processes appear redundant: there are two or more chemically distinct entities that can act more or less interchangeably. It is often useful to treat functionally equivalent protein isoforms, splice variants, and paralogues as a single entity, implying that any individual entity from the given set could fulfill the same role in a given situation. The Arabidopsis Reactome data model allows this type of generalization, but does so explicitly in a way that allows us to trace specific functions back to the individual molecules covered by the generalization.
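One way to keep a generalization traceable is to store its members explicitly and expand it on demand. This sketch assumes a hypothetical `EntitySet` holding a frozen set of members, which may themselves be sets; `expand` walks the nesting down to the individual molecules. The isoform and paralogue names are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    name: str


@dataclass(frozen=True)
class EntitySet:
    """Members that can act interchangeably in a given situation."""
    name: str
    members: frozenset


def expand(entity):
    """Trace a (possibly nested) EntitySet back to the individual molecules it covers."""
    if isinstance(entity, EntitySet):
        covered = set()
        for member in entity.members:
            covered |= expand(member)
        return covered
    return {entity}


# Hypothetical isoforms and paralogue.
isoforms = EntitySet("protein P isoforms",
                     frozenset({Protein("protein P isoform 1"), Protein("protein P isoform 2")}))
family = EntitySet("protein P family",
                   frozenset({isoforms, Protein("protein Q (paralogue)")}))

print(sorted(p.name for p in expand(family)))
# ['protein P isoform 1', 'protein P isoform 2', 'protein Q (paralogue)']
```

Any function attached to `family` can thus be traced back to exactly the molecules listed by `expand(family)`.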
The goal of the Arabidopsis Reactome knowledgebase is to represent Arabidopsis biological processes, but many of these processes have not been studied directly in Arabidopsis; instead, the Arabidopsis event is inferred from experiments on material from a model organism. In such cases, the model organism reaction is annotated in Arabidopsis Reactome, the inferred Arabidopsis reaction is annotated as a separate event, and the inferential link between the two reactions is explicitly noted.
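The inferential link can be kept as an ordinary attribute of the inferred event. The sketch below assumes a hypothetical `inferred_from` slot and uses the placeholder species label "model organism", since no particular source species is named here. `infer_reaction` creates a separate event rather than relabeling the original, and `evidence_species` follows the links back to the species in which the experiments were done.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Reaction:
    name: str
    species: str
    # Hypothetical slot holding the reaction(s) this one was inferred from.
    inferred_from: list = field(default_factory=list)


def infer_reaction(source, target_species):
    """Annotate a separate reaction for target_species with an explicit link to its source."""
    return Reaction(name=source.name, species=target_species, inferred_from=[source])


def evidence_species(reaction):
    """Species whose experiments directly support this reaction."""
    if not reaction.inferred_from:
        return {reaction.species}
    species = set()
    for source in reaction.inferred_from:
        species |= evidence_species(source)
    return species


# "model organism" is a placeholder label, not a real species name.
observed = Reaction("hypothetical enzymatic step", species="model organism")
inferred = infer_reaction(observed, "Arabidopsis thaliana")

print(inferred is observed)        # False: two separate events
print(evidence_species(inferred))  # {'model organism'}
```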
Arabidopsis Reactome uses a frame-based knowledge representation. The data model consists of classes (frames) that describe the different concepts (e.g., reaction, simple entity). Knowledge is captured as instances of these classes (e.g., "glucose transport across the plasma membrane", "cytosolic ATP"). Classes have attributes (slots) which hold properties of the instances (e.g., the identities of the molecules that participate as inputs and outputs in a reaction).
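To make the frame vocabulary concrete, here is a toy frame system in plain Python: a `Frame` declares slots and may inherit further slots from a parent frame, and an `Instance` may only fill slots its frame knows about. The frame and slot names (`Event`, `Reaction`, `name`, `input`, `output`) are assumptions for illustration and are not a copy of the actual schema.

```python
class Frame:
    """A class (frame): a name, an optional parent frame, and the slots it declares."""

    def __init__(self, name, slots=(), parent=None):
        self.name = name
        self.parent = parent
        self.own_slots = frozenset(slots)

    def all_slots(self):
        """Slots declared here plus those inherited from parent frames."""
        inherited = self.parent.all_slots() if self.parent else frozenset()
        return inherited | self.own_slots


class Instance:
    """A piece of knowledge: an instance of a frame with values for some of its slots."""

    def __init__(self, frame, **values):
        unknown = set(values) - frame.all_slots()
        if unknown:
            raise ValueError(f"{frame.name} has no slot(s) {sorted(unknown)}")
        self.frame = frame
        self.values = values


# Frame and slot names are illustrative, not the actual schema.
event = Frame("Event", slots=["name"])
reaction = Frame("Reaction", slots=["input", "output"], parent=event)

transport = Instance(
    reaction,
    name="glucose transport across the plasma membrane",
    input=["extracellular glucose"],
    output=["cytosolic glucose"],
)
print(sorted(reaction.all_slots()))  # ['input', 'name', 'output']
```

Rejecting unknown slots at construction time is one simple way a frame system keeps instances consistent with their class definitions.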
Key data classes
PhysicalEntities include individual molecules, multi-molecular complexes, and sets of molecules or complexes grouped together on the basis of shared characteristics. Molecules are further classified as genome encoded (DNA, RNA, and proteins) or not (all others). Attributes of a PhysicalEntity instance capture the chemical structure of an entity, including any covalent modifications in the case of a macromolecule, and its subcellular localization.
PhysicalEntity instances that represent, e.g., the same chemical in different compartments, or different post-translationally modified forms of a single protein, share numerous invariant features such as names, molecular structure and links to external databases like UniProt or ChEBI. To enable storage of this shared information in a single place, and to create an explicit link among all the variant forms of what can also be seen as a single chemical entity, Arabidopsis Reactome creates instances of the separate ReferenceEntity class. A ReferenceEntity instance captures the invariant features of a molecule. A PhysicalEntity instance is then the combination of a ReferenceEntity attribute (e.g., Glycogen phosphorylase UniProt:P06737) and attributes giving specific conditional information (e.g., localization to the cytosol and phosphorylation on serine residue 14).
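The split between invariant and conditional information maps naturally onto two record types, with the conditional one holding a reference to the shared one. This is a sketch under that assumption; the attribute names and the `variants_of` helper are hypothetical, while the glycogen phosphorylase values are the ones quoted in the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceEntity:
    """Invariant features stored once: a name plus an external database identifier."""
    name: str
    database: str
    identifier: str


@dataclass(frozen=True)
class PhysicalEntity:
    """A ReferenceEntity plus conditional details: location and modifications."""
    reference: ReferenceEntity
    compartment: str
    modifications: frozenset = frozenset()


def variants_of(reference, entities):
    """All PhysicalEntities that are forms of the same ReferenceEntity."""
    return [e for e in entities if e.reference == reference]


phosphorylase = ReferenceEntity("Glycogen phosphorylase", "UniProt", "P06737")
unmodified = PhysicalEntity(phosphorylase, "cytosol")
phosphorylated = PhysicalEntity(phosphorylase, "cytosol",
                                frozenset({"phosphoserine at residue 14"}))

print(unmodified == phosphorylated)                                   # False
print(len(variants_of(phosphorylase, [unmodified, phosphorylated])))  # 2
```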
The PhysicalEntity class has subclasses to distinguish between different kinds of entities and to ensure data integrity while enabling different handling rules for different categories:
GenomeEncodedEntity - a species-specific protein or nucleic acid whose sequence is unknown, such as an enzyme that has been characterized functionally but not yet purified and sequenced, e.g., cytosolic triokinase
SimpleEntity - other fully characterized molecules, e.g., nucleoplasmic ATP or cytosolic glutathione
Complex - a complex of two or more PhysicalEntities, e.g., FASL:FAS Receptor Trimer:FADD complex associated with the plasma membrane
EntitySet - a set of PhysicalEntities (molecules or complexes) which function interchangeably in a given situation, e.g., Notch ligand associated with the plasma membrane. This notation allows collective properties of multiple individual entities to be described explicitly.
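The subclasses above can be mirrored as a small class hierarchy, with integrity checks placed in the subclasses that contain other entities. The sketch below assumes Python dataclasses and enforces only the rule stated in the Complex definition (two or more PhysicalEntity components) plus a type check on EntitySet members; the attribute names and the example proteins are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalEntity:
    name: str
    compartment: str


@dataclass(frozen=True)
class GenomeEncodedEntity(PhysicalEntity):
    species: str


@dataclass(frozen=True)
class SimpleEntity(PhysicalEntity):
    pass


@dataclass(frozen=True)
class Complex(PhysicalEntity):
    components: tuple = ()

    def __post_init__(self):
        # Integrity rule from the definition: two or more PhysicalEntities.
        if len(self.components) < 2:
            raise ValueError(f"{self.name}: a Complex needs two or more components")
        if not all(isinstance(c, PhysicalEntity) for c in self.components):
            raise TypeError(f"{self.name}: components must be PhysicalEntities")


@dataclass(frozen=True)
class EntitySet(PhysicalEntity):
    members: frozenset = frozenset()

    def __post_init__(self):
        if not all(isinstance(m, PhysicalEntity) for m in self.members):
            raise TypeError(f"{self.name}: members must be PhysicalEntities")


# Hypothetical example entities.
atp = SimpleEntity("ATP", "nucleoplasm")
protein_a = GenomeEncodedEntity("protein A", "plasma membrane", "Arabidopsis thaliana")
protein_b = GenomeEncodedEntity("protein B", "plasma membrane", "Arabidopsis thaliana")
complex_ab = Complex("protein A:protein B", "plasma membrane", (protein_a, protein_b))

print(type(complex_ab).__name__, [c.name for c in complex_ab.components])
# Complex ['protein A', 'protein B']
```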
A CatalystActivity instance pairs a PhysicalEntity with a molecular function taken from the Gene Ontology molecular function controlled vocabulary to describe an instance of biological catalysis. An optional ActiveUnit attribute indicates the specific domain of a protein or subunit of a complex that mediates the catalysis. If a PhysicalEntity has multiple catalytic activities, a separate CatalystActivity is created for each. This strategy allows the association of specific activities with specific variant forms of a protein or complex, and also enables easy retrieval of all activities of a protein, or all proteins capable of mediating a specific molecular function.
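The two retrieval patterns mentioned above, all activities of a given entity and all entities with a given molecular function, amount to two inverse indexes over CatalystActivity instances. A sketch, assuming entities are referred to by name strings and GO molecular functions by their term names; the enzymes are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CatalystActivity:
    """One catalytic activity of one PhysicalEntity."""
    physical_entity: str               # the catalyst (an entity name here, for brevity)
    activity: str                      # a GO molecular function term name
    active_unit: Optional[str] = None  # domain or subunit that mediates catalysis


def build_indexes(catalyst_activities):
    """Index activities by entity, and entities by GO molecular function."""
    by_entity = defaultdict(set)
    by_function = defaultdict(set)
    for ca in catalyst_activities:
        by_entity[ca.physical_entity].add(ca.activity)
        by_function[ca.activity].add(ca.physical_entity)
    return by_entity, by_function


# Hypothetical catalysts; each activity of the bifunctional enzyme gets its own instance.
activities = [
    CatalystActivity("bifunctional enzyme E [cytosol]", "kinase activity",
                     active_unit="kinase domain"),
    CatalystActivity("bifunctional enzyme E [cytosol]", "phosphatase activity",
                     active_unit="phosphatase domain"),
    CatalystActivity("kinase K [cytosol]", "kinase activity"),
]
by_entity, by_function = build_indexes(activities)

print(sorted(by_entity["bifunctional enzyme E [cytosol]"]))
# ['kinase activity', 'phosphatase activity']
print(sorted(by_function["kinase activity"]))
# ['bifunctional enzyme E [cytosol]', 'kinase K [cytosol]']
```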
Events - the conversion of input PhysicalEntities to output PhysicalEntities - are the building blocks used in Arabidopsis Reactome to represent all biological processes. At present, only two subclasses of Event are recognized: Reaction and Pathway. A Reaction is an event that converts inputs to outputs in a single step. A Pathway is any grouping of related events. An event may be a member of more than one pathway.
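Because a Pathway may contain Reactions or other Pathways, and one event may belong to several pathways, the event structure is a graph rather than a strict tree. This sketch assumes a hypothetical `events` attribute on `Pathway` and uses identity-based equality (`eq=False`) so that each event object is a distinct node; all event names are placeholders.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Event:
    name: str


@dataclass(eq=False)
class Reaction(Event):
    """Converts inputs to outputs in a single step."""
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


@dataclass(eq=False)
class Pathway(Event):
    """Any grouping of related events; members may be Reactions or Pathways."""
    events: list = field(default_factory=list)


def reactions_in(pathway):
    """All Reactions reachable from a Pathway, each listed once.
    Assumes no pathway (directly or indirectly) contains itself."""
    found = []
    for event in pathway.events:
        candidates = reactions_in(event) if isinstance(event, Pathway) else [event]
        for reaction in candidates:
            if reaction not in found:
                found.append(reaction)
    return found


def pathways_containing(event, pathways):
    """Names of the pathways that list this event directly."""
    return [p.name for p in pathways if event in p.events]


step = Reaction("hypothetical reaction R")
other = Reaction("hypothetical reaction S")
pathway_1 = Pathway("hypothetical pathway 1", [step, other])
pathway_2 = Pathway("hypothetical pathway 2", [step])
grouping = Pathway("hypothetical grouping", [pathway_1, pathway_2])

print(pathways_containing(step, [pathway_1, pathway_2, grouping]))
# ['hypothetical pathway 1', 'hypothetical pathway 2']
print([r.name for r in reactions_in(grouping)])
# ['hypothetical reaction R', 'hypothetical reaction S']
```

The shared reaction is reported once by `reactions_in` even though it is reachable through both member pathways.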
The Pathway class is thus remarkably heterogeneous at present. Work is underway to extend this aspect of the Arabidopsis Reactome data model to support distinctions between conventional pathways, e.g., "fatty acyl CoA biosynthesis", and other useful groupings of events, e.g., "carbohydrate metabolism", "hydroxylation of xenobiotics", or "Cell cycle progression".
Full specification of the Arabidopsis Reactome data model
A full specification of all Arabidopsis Reactome classes and slots, together with a listing of all instances of each class, is available from the Schema page.