Specification for the ID Mapping / BridgeDb plugin (the result of a brainstorm with Alex, Isaac and Martijn)
The plugin consists of 6 functions that can be implemented more or less independently.
I) Annotation. A function to look up cross-references and/or symbols in the currently selected database / webservice and copy this information into one or more new attributes.
Suppose that a user starts with a network where each node is annotated with an Entrez Gene ID, and wishes to create new attributes containing (1) the GO annotation and (2) the primary symbol of each node.
This could take the form of a wizard:
- On the first page of the wizard, the user can review the current settings: are the primary attribute settings correct (see VI), and is the right database / webservice selected (see V)? Buttons jump to the respective dialogs to adjust these settings.
- On the second page of the wizard, the user selects the annotations they want to add. In our example, the user selects GO and Primary Symbol from a list. Default attribute names are generated (e.g. annotation.GO and annotation.PrimarySymbol, if these aren't already taken), but the user can override them.
- On the final page, the actual translation is performed. This is a potentially time-consuming operation, so a progress bar is needed. When the process is complete, a summary is shown, including the number of nodes for which the operation failed or no information was available.
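The work done on the final wizard page can be sketched as follows (Python for brevity; the plugin itself would be Java code against the Cytoscape API). This is a minimal sketch under stated assumptions: `lookup` is a hypothetical stand-in for a query against the selected database / webservice, node attributes are modeled as plain dictionaries, and the numeric-suffix policy for taken attribute names is our own choice, not something this spec fixes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

# Hypothetical lookup: (identifier, annotation type) -> value, or None if unavailable.
Lookup = Callable[[str, str], Optional[str]]


def default_attribute_name(annotation: str, taken: Set[str]) -> str:
    """Return 'annotation.<type>', adding a numeric suffix if that name is taken."""
    base = f"annotation.{annotation}"
    name, n = base, 2
    while name in taken:
        name = f"{base}.{n}"
        n += 1
    return name


@dataclass
class AnnotationSummary:
    annotated: int = 0
    failed: List[str] = field(default_factory=list)  # nodes with no information at all


def annotate(nodes: Dict[str, Dict[str, str]], id_attr: str,
             annotations: List[str], lookup: Lookup,
             progress: Optional[Callable[[int, int], None]] = None) -> AnnotationSummary:
    """Copy looked-up annotations into new attributes of each node.

    `nodes` maps node name -> attribute dict, standing in for Cytoscape node attributes.
    """
    # Reserve one new, unused attribute name per requested annotation.
    taken = {a for attrs in nodes.values() for a in attrs}
    names: Dict[str, str] = {}
    for ann in annotations:
        names[ann] = default_attribute_name(ann, taken)
        taken.add(names[ann])

    summary = AnnotationSummary()
    for i, (node, attrs) in enumerate(nodes.items(), start=1):
        identifier = attrs.get(id_attr)
        found = False
        for ann in annotations:
            value = lookup(identifier, ann) if identifier else None
            if value is not None:
                attrs[names[ann]] = value
                found = True
        if found:
            summary.annotated += 1
        else:
            summary.failed.append(node)
        if progress is not None:
            progress(i, len(nodes))  # e.g. drive the wizard's progress bar
    return summary
```

The generated names are only defaults; the second wizard page would still let the user replace them before `annotate` runs.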
Alternatively, there could be a default function that annotates nodes with Ensembl IDs, using the current primary key as input:
Suppose a user loads a network with Entrez Gene IDs and then a dataset with Affy IDs. By listening for "bridgedb_mappEnsembl" events (triggered by the GenMAPPImport plugin), the BridgeDb plugin could collect the given IDs, query a synonym database, return Ensembl IDs (by default) and add them as a new attribute. Because both the network and the dataset then share an Ensembl attribute, this would require minimal user interaction (e.g. only when the ID system cannot be guessed) and would appear to the user as a seamless, automatic mapping from Affy to Entrez, without any change to their network, node labels or data.
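A rough sketch of such an event handler (Python for brevity). The event name comes from the text above, but the payload shape (`{"ids": [...], "system": ...}`), the `mapper`, `guess_system` and `ask_user` callables, and the `Ensembl` attribute name are all hypothetical; the user is only consulted when the ID system cannot be guessed.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical synonym-database query: (ids, source system) -> {id: [Ensembl ids]}.
Mapper = Callable[[List[str], str], Dict[str, List[str]]]

EVENT_NAME = "bridgedb_mappEnsembl"


class EnsemblMappingListener:
    """Sketch of a handler for "bridgedb_mappEnsembl" events (payload shape assumed)."""

    def __init__(self, mapper: Mapper,
                 guess_system: Callable[[List[str]], Optional[str]],
                 ask_user: Callable[[List[str]], str],
                 target_attr: str = "Ensembl"):
        self.mapper = mapper
        self.guess_system = guess_system
        self.ask_user = ask_user
        self.target_attr = target_attr

    def handle(self, event_name: str, payload: dict,
               attributes: Dict[str, Dict[str, object]]) -> int:
        """Add Ensembl IDs as a new attribute; return how many IDs were mapped."""
        if event_name != EVENT_NAME:
            return 0
        ids = list(payload["ids"])
        system = payload.get("system") or self.guess_system(ids)
        if system is None:
            system = self.ask_user(ids)  # the only point that needs user interaction
        mapping = self.mapper(ids, system)
        mapped = 0
        for identifier in ids:
            targets = mapping.get(identifier)
            if targets:
                # Only a new attribute is written; existing labels and data stay untouched.
                attributes.setdefault(identifier, {})[self.target_attr] = list(targets)
                mapped += 1
        return mapped
```

Running the same handler over the network's Entrez IDs and the dataset's Affy IDs gives both a common Ensembl attribute, which is what makes the Affy-to-Entrez mapping look automatic.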
II) Provide link-outs to all cross-referenced DataSources in the right-click menu.
BridgeDb can generate link-out URLs for all cross-references of a given identifier. This feature would place these URLs in the right-click menu.
Note: Cytoscape already has a link-out feature, but it does not include cross-references. It is also non-specific: it generates all kinds of URLs from the identifier without checking what kind of identifier it actually is. See also http://www.cytoscape.org/cgi-bin/moin.cgi/Generic_Linkout
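The difference from the generic link-out can be sketched like this (Python for brevity): a URL is built only when the datasource of a cross-reference is known, so no URLs are guessed. The `$id` placeholder, the template table and its two URL patterns are assumptions for illustration, not BridgeDb's own data.

```python
from typing import Dict, List, Tuple
from urllib.parse import quote

# Assumed URL templates per datasource; "$id" is replaced by the identifier.
LINKOUT_TEMPLATES: Dict[str, str] = {
    "Entrez Gene": "https://www.ncbi.nlm.nih.gov/gene/$id",
    "Ensembl": "https://www.ensembl.org/id/$id",
}


def linkout_menu(xrefs: List[Tuple[str, str]],
                 templates: Dict[str, str] = LINKOUT_TEMPLATES) -> List[Tuple[str, str]]:
    """Return sorted (menu label, URL) pairs for (identifier, datasource) cross-references."""
    items = []
    for identifier, source in xrefs:
        template = templates.get(source)
        if template is None:
            continue  # unknown datasource: skip rather than guess a URL
        url = template.replace("$id", quote(identifier, safe=""))
        items.append((f"{source}: {identifier}", url))
    return sorted(items)
```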
III) Provide backpage information for a given node, just like GenMAPP / PathVisio
In GenMAPP / PathVisio, the backpage provides a summary of information about a gene. Usually this includes the gene symbol and a short description.
The information is rendered as an HTML fragment and displayed in CytoPanel 3 (on the right).
The panel should be updated whenever the node selection changes. If multiple nodes are selected, no information is displayed.
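A minimal sketch of the rendering rule (Python for brevity), assuming a hypothetical `info` mapping that stands in for the symbol and description lookup; it would be called again on every selection change.

```python
from html import escape
from typing import Dict, List, Optional


def backpage_html(selected: List[str],
                  info: Dict[str, Dict[str, str]]) -> Optional[str]:
    """Return an HTML fragment for a single selected node, or None otherwise."""
    if len(selected) != 1:
        return None  # zero or several nodes selected: show nothing
    node = selected[0]
    data = info.get(node, {})
    # Escape values so symbols or descriptions containing <, > or & render safely.
    symbol = escape(data.get("symbol", node))
    description = escape(data.get("description", "No description available."))
    return f"<h2>{symbol}</h2><p>{description}</p>"
```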
IV) Searching for a symbol or identifier.
Type part of an identifier or symbol in the search text box and click the search button.
There are usually many results from several datasources. The program shows a table of all matches, with columns for ID and datasource. The user can pick a single result from this table.
The selected identifier is set as a node attribute of the currently selected node. Optional feature: filtering by a preferred datasource (for example, showing only results from Entrez Gene).
Another optional feature: autocompletion in the search text box.
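The search, the optional datasource filter and autocompletion can be sketched as follows (Python for brevity). The in-memory `index` of (identifier, datasource, symbol) rows is a hypothetical stand-in for a query against a Derby database or webservice; the autocompletion relies on the fact that all terms sharing a prefix are contiguous in a sorted list.

```python
from bisect import bisect_left
from itertools import islice, takewhile
from typing import List, NamedTuple, Optional, Tuple


class Match(NamedTuple):
    identifier: str
    datasource: str


def search(query: str, index: List[Tuple[str, str, str]],
           preferred: Optional[str] = None) -> List[Match]:
    """Case-insensitive substring search over (identifier, datasource, symbol) rows."""
    q = query.lower()
    hits = {
        Match(identifier, datasource)
        for identifier, datasource, symbol in index
        if q in identifier.lower() or q in symbol.lower()
    }
    if preferred is not None:  # optional filter, e.g. only Entrez Gene
        hits = {m for m in hits if m.datasource == preferred}
    return sorted(hits)  # rows of the (ID, datasource) result table


def autocomplete(prefix: str, sorted_terms: List[str], limit: int = 10) -> List[str]:
    """Return up to `limit` terms starting with `prefix`; `sorted_terms` must be sorted."""
    start = bisect_left(sorted_terms, prefix)
    return list(islice(takewhile(lambda t: t.startswith(prefix), sorted_terms[start:]), limit))
```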
Note (Alex): The very first Cytoscape plugin we wrote was a GeneFinder tool that searched a Derby database. This code is probably rough and outdated (it only works on Cytoscape 2.1 or 2.2), but I'll provide a link to it in case any of it is salvageable.
PathVisio has some functionality in this area that we can build on. Alternatively, if this is written generically, some classes could be placed in BridgeDb and shared by multiple projects.
V) Dialog to select a source of ID mapping information. A source can be a webservice or a Derby database.
When the user selects this menu item, a dialog shows a list of available sources. This list should include all locally available Derby databases plus the available webservices.
The user can select one or more sources from the list.
Multiple sources can be stacked, but stacking too many is not recommended because it slows down mapping. Perhaps the user should get a warning when more than three sources are selected.
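A minimal sketch of a stack (Python for brevity), assuming each source is a hypothetical callable from an identifier to a set of cross-references. Querying every source and taking the union is one possible stacking policy, chosen here for illustration; it also shows why cost grows with each added source. In the plugin, the warning would be a dialog rather than a Python warning.

```python
import warnings
from typing import Callable, List, Set

Source = Callable[[str], Set[str]]  # hypothetical: identifier -> cross-references

MAX_RECOMMENDED_SOURCES = 3


class StackedMapper:
    def __init__(self, sources: List[Source]):
        if len(sources) > MAX_RECOMMENDED_SOURCES:
            warnings.warn(f"{len(sources)} sources selected; stacking more than "
                          f"{MAX_RECOMMENDED_SOURCES} may make mapping slow.")
        self.sources = list(sources)

    def map_id(self, identifier: str) -> Set[str]:
        """Union of the cross-references returned by every source in the stack."""
        result: Set[str] = set()
        for source in self.sources:  # each lookup hits every source
            result |= source(identifier)
        return result
```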
Optional feature: show some metadata for each source, such as a description, which datasources it does or does not cover, and how many identifiers it contains.
VI) Define how network / edge attributes relate to identifier information. This means selecting which attribute contains the identifier, and also which DataSource that identifier belongs to (e.g. Ensembl or Entrez Gene). These settings can be partly guessed, but there should also be a dialog to override the guesses.
The following settings need to be taken into account:
- Which attribute contains the identifier.
- Whether the datasource is uniform or mixed.
- If mixed: which attribute contains the datasource.
- If uniform: which datasource it is.
When a new network is opened, these settings can be guessed. A dialog lets the user override these guesses. The settings are saved as network attributes, so they are available again the next time the network is opened.
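Guessing and saving the settings could be sketched like this (Python for brevity). The regular expressions are simplified assumptions (real identifier formats are more varied), as are the `bridgedb.*` network-attribute names. The sketch picks the attribute whose values most often look like known identifiers and calls it uniform if only one datasource is recognized; for a mixed attribute, the datasource attribute is left for the override dialog.

```python
import re
from collections import Counter
from dataclasses import asdict, dataclass
from typing import Dict, Optional

# Simplified, assumed patterns, checked in order (Ensembl and Affy before bare digits).
PATTERNS = {
    "Ensembl": re.compile(r"ENS[A-Z]*G\d{11}"),
    "Affy": re.compile(r"\d+(_[a-z])?_at"),
    "Entrez Gene": re.compile(r"\d+"),
}


def guess_source(value: str) -> Optional[str]:
    for name, pattern in PATTERNS.items():
        if pattern.fullmatch(value):
            return name
    return None


@dataclass
class IdSettings:
    id_attribute: str
    mixed: bool
    datasource: Optional[str] = None            # used when uniform
    datasource_attribute: Optional[str] = None  # used when mixed


def guess_settings(nodes: Dict[str, Dict[str, str]]) -> Optional[IdSettings]:
    """Pick the attribute with the most values that look like known identifiers."""
    best = None  # (recognized count, attribute name, Counter of guessed sources)
    for attr in sorted({a for attrs in nodes.values() for a in attrs}):
        guesses = Counter(guess_source(attrs[attr]) for attrs in nodes.values() if attr in attrs)
        recognized = sum(n for source, n in guesses.items() if source is not None)
        if recognized and (best is None or recognized > best[0]):
            best = (recognized, attr, guesses)
    if best is None:
        return None  # nothing recognized: leave it to the override dialog
    _, attr, guesses = best
    sources = sorted(s for s in guesses if s is not None)
    if len(sources) == 1:
        return IdSettings(attr, mixed=False, datasource=sources[0])
    # Mixed: the attribute holding the datasource must still be chosen in the dialog.
    return IdSettings(attr, mixed=True)


def to_network_attributes(settings: IdSettings) -> Dict[str, str]:
    """Flatten settings into string-valued network attributes (names are hypothetical)."""
    return {f"bridgedb.{k}": "" if v is None else str(v) for k, v in asdict(settings).items()}
```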
