Ticket #64 (new)

Opened 21 months ago

Last modified 21 months ago

compare size and performance benefits of using integers in place of Ensembl IDs

Reported by: AlexanderPico Owned by: AlexanderPico
Milestone: Component: Derby ensembl
Version: Severity: feature
Keywords: Cc:

Description

Will replace the Ensembl IDs in the Link table with integers to see whether there are significant performance and size benefits. This is an early test of an internal global identifier for BridgeDb databases, which would make it easier to move away from the current dependence on Ensembl and to handle indirect mappings better.
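A minimal sketch of the idea, assuming integers are handed out sequentially from 1 the first time each Ensembl ID is seen. The class and method names (`GlobalIdRegistry`, `intern`, `ensemblIdOf`) are hypothetical and not part of BridgeDb; the point is only that the Link table would store the returned integer, and the original ID is recovered through a single lookup table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: assign a compact integer to each Ensembl ID so that a
 * Link table can store the integer instead of the VARCHAR identifier.
 */
public class GlobalIdRegistry {
    // Ensembl ID -> internal integer ID
    private final Map<String, Integer> toInternal = new HashMap<>();
    // internal integer ID -> Ensembl ID (list index = id - 1)
    private final List<String> toEnsembl = new ArrayList<>();

    /** Returns the existing integer for this ID, or assigns the next one (starting at 1). */
    public int intern(String ensemblId) {
        Integer existing = toInternal.get(ensemblId);
        if (existing != null) {
            return existing;
        }
        toEnsembl.add(ensemblId);
        int id = toEnsembl.size();
        toInternal.put(ensemblId, id);
        return id;
    }

    /** Maps an internal integer back to its Ensembl ID. */
    public String ensemblIdOf(int internalId) {
        return toEnsembl.get(internalId - 1);
    }

    /** Number of distinct Ensembl IDs registered so far. */
    public int size() {
        return toEnsembl.size();
    }

    public static void main(String[] args) {
        GlobalIdRegistry registry = new GlobalIdRegistry();
        int a = registry.intern("ENSG00000139618");
        int b = registry.intern("ENSG00000141510");
        int c = registry.intern("ENSG00000139618"); // same ID, same integer
        System.out.println(a + " " + b + " " + c);    // 1 2 1
        System.out.println(registry.ensemblIdOf(b)); // ENSG00000141510
    }
}
```

Because every mapping row then refers to the integer, switching the primary source away from Ensembl would only mean filling the registry from a different ID space.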

Change History

Changed 21 months ago by AlexanderPico

Three versions of the human Derby database were made:

 * Hs_Derby_20090929.bridge: Ensembl IDs encoded as VARCHAR(50), 507 MB
 * Hs_Derby_20090930.bridge: integers encoded as SMALLINT, 442 MB (13% smaller)
 * Hs_Derby_20090931.bridge: integers encoded as CHAR(5), 491 MB (3% smaller)
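The percentages follow from (507 − size) / 507. The short sketch below reproduces them and also prints how large an integer each encoding can hold, which matters when choosing between them: Derby's SMALLINT is a 2-byte signed type (maximum 32767), while a CHAR(5) holding decimal digits tops out at 99999. The class name `SizeSavings` is hypothetical, and the capacity figures assume IDs are non-negative and assigned from 1 upward.

```java
/**
 * Hypothetical sketch: reproduce the size comparison and show the largest
 * non-negative ID each integer encoding can store.
 */
public class SizeSavings {
    // Derby SMALLINT is 2-byte signed: -32768 .. 32767
    static final int SMALLINT_MAX = Short.MAX_VALUE;
    // Largest non-negative integer that fits in five decimal digits
    static final int CHAR5_MAX = 99_999;

    /** Percent reduction of candidate relative to baseline. */
    static double percentSmaller(double baselineMb, double candidateMb) {
        return 100.0 * (baselineMb - candidateMb) / baselineMb;
    }

    public static void main(String[] args) {
        double varchar = 507, smallint = 442, char5 = 491;
        System.out.printf("SMALLINT: %.1f%% smaller%n", percentSmaller(varchar, smallint)); // 12.8
        System.out.printf("CHAR(5):  %.1f%% smaller%n", percentSmaller(varchar, char5));    // 3.2
        System.out.println("SMALLINT holds IDs up to " + SMALLINT_MAX);
        System.out.println("CHAR(5) holds IDs up to " + CHAR5_MAX);
    }
}
```

If the number of distinct Ensembl IDs in the Link table ever exceeds 32767, SMALLINT would need negative values or a switch to INTEGER (4 bytes), which would eat into the savings.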

All three are available in the database directory of the wikipathways account on chihuahua.ucsf.edu. If the SMALLINT version performs as well as or better than the others, then that's the way to go.
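To compare performance fairly, the same mapping request can be timed repeatedly against each of the three files and the medians compared. The helper below is a hedged sketch of that measurement only: `MedianTimer` and `medianNanos` are hypothetical names, and the workload (for example, one lookup through a connection to a given .bridge file) is left to the caller as a `Runnable`.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: time a workload repeatedly and report the median,
 * which is less sensitive to GC pauses and disk-cache effects than the mean.
 */
public class MedianTimer {
    /** Runs the task `warmup` times untimed, then `reps` times, and returns the median in nanoseconds. */
    static long medianNanos(Runnable task, int warmup, int reps) {
        if (reps <= 0) {
            throw new IllegalArgumentException("reps must be positive");
        }
        for (int i = 0; i < warmup; i++) {
            task.run(); // let the JIT and caches settle before measuring
        }
        long[] samples = new long[reps];
        for (int i = 0; i < reps; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[reps / 2]; // upper middle when reps is even
    }

    public static void main(String[] args) {
        // Placeholder workload; replace with a lookup against each database in turn.
        Runnable placeholder = () -> {
            long sum = 0;
            for (int i = 0; i < 100_000; i++) sum += i;
            if (sum < 0) System.out.println(sum); // keeps the loop from being optimized away
        };
        System.out.println("median ns: " + medianNanos(placeholder, 10, 51));
    }
}
```

Running the identical workload against each file, with the same warm-up and repetition counts, keeps the comparison between the VARCHAR, SMALLINT and CHAR(5) versions like for like.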
