Synonym Database Schema
Background
We call gene databases in PathVisio? PGDB's, which stands for PathVisio? Gene Database (although they are not restricted to gene information). They are similar to GenMAPP's gene databases. Both are distributed as standalone files, PGDB's get a .pgdb extension.
Java Code
To see how this is used in practice, take a look at the Java code at our subversion repository ( http://svn.bigcat.unimaas.nl/pathvisio). In particular, take a look at these classes
- org.pathvisio.data.Gdb -> interface for the rest of the application for contacting gene databases. Theoretically, this could be implemented in something other than a relational database, e.g. via a webservice.
- org.pathvisio.data.SimpleGdb? -> simple jdbc-based implementation of the Gdb interface
- org.pathvisio.data.DBConnector -> interface for a glue layer to work with different database implementations (e.g. MySQL, server, etc)
- org.pathvisio.data.DataDerby? -> Derby-based implementation of DBConnector, can handle jar'ed single-file .pgdb files
Release process
Since pgdb's are intended to be released as stand-alone files, they store a schema version number so that we can write code to provide backwards compatibility. Especially gene databases lose their value with age (old annotation etc.) so we do expect users to upgrade reasonably quickly.
Proposed changes for Version 3.0
- Suggestion by John Dionisio: 64-bit integers as primary (local) keys -> faster and more efficient
Schema Version 3.0
NOTE: the databases are now .bridge files instead of .pgdb files.
Datanode
Table to store datanode identifier and database code. For a list of database codes, see http://conklinwolf.ucsf.edu/genmappwiki/Species_and_System_Codes
Table structure
| column | type | key | description |
| id | varchar(50) | PRI | datanode identifier |
| code | varchar(50) | PRI | identifier database code |
NOTE: backpageText was removed in schema version 3. The attribute table has taken over this function.
Indexes
| name | column(s) | generated |
| id, code | automatic | |
| I_DNCODE | code | manual |
| I_DNID | id | manual |
Link
Table to link datanode identifiers from different databases with each other.
Table structure
| column | type | key | description |
| idLeft | varchar(50) | PRI | left datanode identifier |
| codeLeft | varchar(50) | PRI | left database code identifier |
| idRight | varchar(50) | PRI | right datanode identifier |
| codeRight | varchar(50) | PRI | right database code identifier |
Note: Currently, the 'Left' columns are only used to store Ensembl identifiers, because that is the source of all current link information.
Indexes
| name | column(s) | generated |
| idLeft, codeLeft, idRight, codeRight | automatic | |
| I_CODELEFT | codeLeft | manual |
| I_RIGHT | idRight, codeRight | manual |
Attribute
Table used to store attributes of a datanode, including: symbol, synonyms, description, chromosome and type. As of version 3, the attributes are used to dynamically generate backpages.
Table structure
| column | type | key | description |
| id | varchar(50) | PRI | datanode identifier |
| code | varchar(50) | PRI | identifier database code |
| attrName | varchar(50) | PRI | attribute name (key) |
| attrValue | varchar(255) | attribute value |
Indexes
| name | column(s) | generated |
| id, code, attrName, attrValue | automatic | |
| I_ATTRID | id | manual |
| I_ATTRCODE | code | manual |
| I_ATTRNAME | attrName | manual |
| I_ATTRVALUE | attrValue | manual |
Info
Table used to store all meta data pertaining to the database, including the version of the database schema which is used for compatibility checks in PathVisio?. This table should have only one row.
Table structure
| column | type | key | description |
| schemaversion | integer | database schema version, e.g., '3' | |
| builddate | integer | date of database generation, e.g., '20090720' | |
| dataSourceName | VARCHAR(31) | name of data source, e.g., 'Ensembl' | |
| dataSourceVersion | VARCHAR(31) | version of data from data source, e.g., 'core_homo_sapiens_55_23c' | |
| species | VARCHAR(31) | genus species, e.g., 'homo sapiens' | |
| dataType | VARCHAR(31) | type of data, e.g., 'GeneProduct?' |
Schema Version 2.0
Datanode
Table to store datanode information such as identifier, database code and description For a list of database codes, see http://conklinwolf.ucsf.edu/genmappwiki/Species_and_System_Codes
Table structure
| column | type | key | description |
| id | varchar(50) | PRI | datanode identifier |
| code | varchar(50) | PRI | identifier database code |
| backpageText | varchar(800) | backpage text (html formatted) |
NOTE: This table was renamed from "Gene" for schema version 2
NOTE: backpageText is deprecated and will probably disappear in the future. The attribute table can take over this function.
Indexes
| name | column(s) | generated |
| id, code | automatic | |
| I_DNCODE | code | manual |
| I_DNID | id | manual |
Link
Table to link datanode identifiers from different databases with each other.
Table structure
| column | type | key | description |
| idLeft | varchar(50) | PRI | left datanode identifier |
| codeLeft | varchar(50) | PRI | left database code identifier |
| idRight | varchar(50) | PRI | right datanode identifier |
| codeRight | varchar(50) | PRI | right database code identifier |
thomaskelder 00:41, 7 June 2007 (CEST): Currently the 'left' columns are only used to store Ensembl identifiers<BR>
Indexes
| name | column(s) | generated |
| idLeft, codeLeft, idRight, codeRight | automatic | |
| I_CODELEFT | codeLeft | manual |
| I_RIGHT | idRight, codeRight | manual |
Attribute
Table used to store attributes of a datanode
Table structure
| column | type | key | description |
| id | varchar(50) | PRI | datanode identifier |
| code | varchar(50) | PRI | identifier database code |
| attrName | varchar(50) | PRI | attribute name (key) |
| attrValue | varchar(255) | attribute value |
We can add or remove any type of attribute to this table without breaking anything. Note: this table is new in Schema v2.
Indexes
| name | column(s) | generated |
| id, code, attrName | automatic |
Info
Table used to store the version of the database schema, used for compatibility checks in PathVisio? This table should have only one row.
Table structure
| column | type | key | description |
| schemaversion | integer | database schema version, in this case 2 | |
| builddate | integer | date of database generation, format: yyyymmdd |
Schema Version 1.0
Gene
Table to store datanode information such as identifier, database code and description
| column | type | key | description |
| id | varchar(50) | PRI | datanode identifier |
| code | varchar(50) | PRI | identifier database code |
| bpText | varchar | backpage text (html formatted) |
Link
Table to link datanode identifiers from different databases with each other.
| column | type | key | description |
| idLeft | varchar(50) | PRI | left datanode identifier |
| codeLeft | varchar(50) | PRI | left database code identifier |
| idRight | varchar(50) | PRI | right datanode identifier |
| codeRight | varchar(50) | PRI | right database code identifier |
| bridge | varchar(50) | GenMAPP legacy, unused by PathVisio? |
thomaskelder 00:41, 7 June 2007 (CEST): Currently the 'left' columns are only used to store Ensembl identifiers<BR>
Info
Table used to store the version of the database schema, used for compatibility checks in PathVisio?
| column | type | key | description |
| schemaversion | integer | PRI | database schema version, in this case 1 |
