Synonym Database Schema

Background

We call gene databases in PathVisio? PGDB's, which stands for PathVisio? Gene Database (although they are not restricted to gene information). They are similar to GenMAPP's gene databases. Both are distributed as standalone files, PGDB's get a .pgdb extension.

Java Code

To see how this is used in practice, take a look at the Java code at our subversion repository ( http://svn.bigcat.unimaas.nl/pathvisio). In particular, take a look at these classes

  • org.pathvisio.data.Gdb -> interface for the rest of the application for contacting gene databases. Theoretically, this could be implemented in something other than a relational database, e.g. via a webservice.
  • org.pathvisio.data.SimpleGdb? -> simple jdbc-based implementation of the Gdb interface
  • org.pathvisio.data.DBConnector -> interface for a glue layer to work with different database implementations (e.g. MySQL, server, etc)
  • org.pathvisio.data.DataDerby? -> Derby-based implementation of DBConnector, can handle jar'ed single-file .pgdb files

Release process

Since pgdb's are intended to be released as stand-alone files, they store a schema version number so that we can write code to provide backwards compatibility. Especially gene databases lose their value with age (old annotation etc.) so we do expect users to upgrade reasonably quickly.

Proposed changes for Version 3.0

  • Suggestion by John Dionisio: 64-bit integers as primary (local) keys -> faster and more efficient

Schema Version 3.0

NOTE: the databases are now .bridge files instead of .pgdb files.

Datanode

Table to store datanode identifier and database code. For a list of database codes, see  http://conklinwolf.ucsf.edu/genmappwiki/Species_and_System_Codes

Table structure

columntypekeydescription
id varchar(50) PRI datanode identifier
code varchar(50) PRI identifier database code

NOTE: backpageText was removed in schema version 3. The attribute table has taken over this function.

Indexes

namecolumn(s)generated
id, code automatic
I_DNCODE code manual
I_DNID id manual

Table to link datanode identifiers from different databases with each other.

Table structure

columntypekeydescription
idLeft varchar(50) PRI left datanode identifier
codeLeft varchar(50) PRI left database code identifier
idRight varchar(50) PRI right datanode identifier
codeRight varchar(50) PRI right database code identifier

Note: Currently, the 'Left' columns are only used to store Ensembl identifiers, because that is the source of all current link information.

Indexes

namecolumn(s)generated
idLeft, codeLeft, idRight, codeRight automatic
I_CODELEFT codeLeft manual
I_RIGHT idRight, codeRight manual

Attribute

Table used to store attributes of a datanode, including: symbol, synonyms, description, chromosome and type. As of version 3, the attributes are used to dynamically generate backpages.

Table structure

columntypekeydescription
id varchar(50) PRI datanode identifier
code varchar(50) PRI identifier database code
attrName varchar(50) PRI attribute name (key)
attrValue varchar(255) attribute value

Indexes

namecolumn(s)generated
id, code, attrName, attrValue automatic
I_ATTRID id manual
I_ATTRCODE code manual
I_ATTRNAME attrName manual
I_ATTRVALUE attrValue manual

Info

Table used to store all meta data pertaining to the database, including the version of the database schema which is used for compatibility checks in PathVisio?. This table should have only one row.

Table structure

columntypekeydescription
schemaversion integer database schema version, e.g., '3'
builddate integer date of database generation, e.g., '20090720'
dataSourceName VARCHAR(31) name of data source, e.g., 'Ensembl'
dataSourceVersion VARCHAR(31) version of data from data source, e.g., 'core_homo_sapiens_55_23c'
species VARCHAR(31) genus species, e.g., 'homo sapiens'
dataType VARCHAR(31) type of data, e.g., 'GeneProduct?'

Schema Version 2.0

Datanode

Table to store datanode information such as identifier, database code and description For a list of database codes, see  http://conklinwolf.ucsf.edu/genmappwiki/Species_and_System_Codes

Table structure

columntypekeydescription
id varchar(50) PRI datanode identifier
code varchar(50) PRI identifier database code
backpageText varchar(800) backpage text (html formatted)

NOTE: This table was renamed from "Gene" for schema version 2

NOTE: backpageText is deprecated and will probably disappear in the future. The attribute table can take over this function.

Indexes

namecolumn(s)generated
id, code automatic
I_DNCODE code manual
I_DNID id manual

Link

Table to link datanode identifiers from different databases with each other.

Table structure

columntypekeydescription
idLeft varchar(50) PRI left datanode identifier
codeLeft varchar(50) PRI left database code identifier
idRight varchar(50) PRI right datanode identifier
codeRight varchar(50) PRI right database code identifier

thomaskelder 00:41, 7 June 2007 (CEST): Currently the 'left' columns are only used to store Ensembl identifiers<BR>

Indexes

namecolumn(s)generated
idLeft, codeLeft, idRight, codeRight automatic
I_CODELEFT codeLeft manual
I_RIGHT idRight, codeRight manual

Attribute

Table used to store attributes of a datanode

Table structure

columntypekeydescription
id varchar(50) PRI datanode identifier
code varchar(50) PRI identifier database code
attrName varchar(50) PRI attribute name (key)
attrValue varchar(255) attribute value

We can add or remove any type of attribute to this table without breaking anything. Note: this table is new in Schema v2.

Indexes

namecolumn(s)generated
id, code, attrName automatic

Info

Table used to store the version of the database schema, used for compatibility checks in PathVisio? This table should have only one row.

Table structure

columntypekeydescription
schemaversion integer database schema version, in this case 2
builddate integer date of database generation, format: yyyymmdd

Schema Version 1.0

Gene

Table to store datanode information such as identifier, database code and description

columntypekeydescription
id varchar(50) PRI datanode identifier
code varchar(50) PRI identifier database code
bpText varchar backpage text (html formatted)

Link

Table to link datanode identifiers from different databases with each other.

columntypekeydescription
idLeft varchar(50) PRI left datanode identifier
codeLeft varchar(50) PRI left database code identifier
idRight varchar(50) PRI right datanode identifier
codeRight varchar(50) PRI right database code identifier
bridge varchar(50) GenMAPP legacy, unused by PathVisio?

thomaskelder 00:41, 7 June 2007 (CEST): Currently the 'left' columns are only used to store Ensembl identifiers<BR>

Info

Table used to store the version of the database schema, used for compatibility checks in PathVisio?

columntypekeydescription
schemaversion integer PRI database schema version, in this case 1