Getting Started
The four most important classes of BridgeDb are:
Suppose that you have an identifier in Ensembl, for example ENSG00000171105.
The BioDataSource class enumerates a set of constant data sources. For Ensembl Human the constant value is BioDataSource.ENSEMBL_HUMAN. The Xref is the combination of BioDataSource.ENSEMBL_HUMAN plus the identifier itself as a String. Gdb is an interface for
In Java code:
Xref ref = new Xref("ENSG00000171105", BioDataSource.ENSEMBL_HUMAN);
System.out.println("<a href=\"" + ref.getUrl() + "\">clicky</a>");
Note: you can also run the BridgeDb service on your local computer. See http://bridgedb.org/wiki/LocalService for instructions.
Mapping / Translating
If you want to translate an identifier from one system to another, you have to use an IDMapper. An IDMapper is a connection to a database or webservice that knows how to translate identifiers. You can use one of the webservices (Ensembl BioMart, CRONOS, PICR, Synergizer), or local text files and local Derby databases for increased efficiency and control. See also the guide to choosing a mapping service.
In Java code:
// First we have to load the driver
// and initialize information about DataSources.
Class.forName("org.bridgedb.webservice.bridgerest.BridgeRest");
BioDataSource.init();

// Now we connect to the driver and create an IDMapper instance.
IDMapper mapper = BridgeDb.connect("idmapper-bridgerest:http://webservice.bridgedb.org/Human");

// We create an Xref instance for the identifier that we want to look up.
// In this case we want to look up Entrez gene 3643.
Xref src = new Xref("3643", BioDataSource.ENTREZ_GENE);

// Let's see if there are cross-references to Ensembl Human.
Set<Xref> dests = mapper.mapID(src, DataSource.getBySystemCode("EnHs"));

// Print the results.
// With getURN we obtain valid MIRIAM URNs if possible.
System.out.println(src.getURN() + " maps to:");
for (Xref dest : dests) {
    System.out.println(" " + dest.getURN());
}
This produces the following output:
urn:miriam:entrez.gene:3643 maps to:
 urn:bridgedb:ensembl.human:ENSG00000171105
If you're not particular about the type of identifier you get back, you can simply leave off the second argument of mapper.mapID:
Set<Xref> dests = mapper.mapID(src);
If you use the above line in place of the original, you get dozens of different identifiers as a result.
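With dozens of mixed results, it can help to group them by data source before displaying them. The following is a minimal sketch, assuming the results have already been converted to strings with getURN() and follow the urn:authority:source:id layout seen in the output above. UrnGrouper and groupBySource are hypothetical names for illustration and are not part of BridgeDb.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of BridgeDb.
final class UrnGrouper {

    // Groups identifiers by the third URN field (the data source).
    // Strings that do not have four colon-separated fields go under "unknown".
    static Map<String, List<String>> groupBySource(List<String> urns) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String urn : urns) {
            // Limit of 4 keeps any colons inside the identifier itself.
            String[] parts = urn.split(":", 4);
            String source = parts.length == 4 ? parts[2] : "unknown";
            String id = parts.length == 4 ? parts[3] : urn;
            groups.computeIfAbsent(source, k -> new ArrayList<>()).add(id);
        }
        return groups;
    }

    public static void main(String[] args) {
        // URNs taken from the sample outputs on this page.
        List<String> urns = Arrays.asList(
            "urn:miriam:entrez.gene:3643",
            "urn:bridgedb:ensembl.human:ENSG00000171105",
            "urn:bridgedb:affymetrix:3643427");
        for (Map.Entry<String, List<String>> e : groupBySource(urns).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

A TreeMap is used here only so that the data sources come out in alphabetical order. Working on the Xref objects directly would avoid the string parsing, but this version has no dependency on the library.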
Mapping is not restricted to genes and proteins. You can do the same thing with metabolites, such as methionine in this example:
// We'll use the BridgeRest webservice in this case, as it does compound mapping fairly well.
// We'll use the human database, but it doesn't really matter which species we pick.
Class.forName("org.bridgedb.webservice.bridgerest.BridgeRest");
IDMapper mapper = BridgeDb.connect("idmapper-bridgerest:http://webservice.bridgedb.org/Human");

// Start with defining the Chebi identifier for
// Methionine, id 16811
Xref src = new Xref("16811", BioDataSource.CHEBI);

// the method returns a set, but in actual fact there is only one result
for (Xref dest : mapper.mapID(src, BioDataSource.PUBCHEM)) {
    // this should print 6137, the pubchem identifier for Methionine.
    System.out.println("" + dest.getId());
}
Searching
In the example above, you had to specify the DataSource of the input to mapper.mapID. This way there is no ambiguity about the type of identifier that you want to map.
What if you're given an identifier as a string and you don't know the input type? In that case you can use free search. Note that not all mappers support free search, but idmapper-bridgerest does.
After the same setup as the previous example, we can use the freeSearch method to do a query for an identifier string without specifying the id type.
String query = "3643";

// Let's do a free search without specifying the input type.
Set<Xref> hits = mapper.freeSearch(query, 100);

// Now print the results.
// With getURN we obtain valid MIRIAM URNs if possible.
System.out.println(query + " search results:");
for (Xref hit : hits) {
    System.out.println(" " + hit.getURN());
}
Here is a sample of the results. If you want to narrow them down further, you have to add your own filtering code.
3643 search results:
 urn:bridgedb:affymetrix:3643427
 urn:bridgedb:affymetrix:3463643
 urn:miriam:entrez.gene:283643
 urn:bridgedb:affymetrix:3643367
 urn:bridgedb:illumina:GI_21536438-A
 urn:bridgedb:affymetrix:3364396
 urn:miriam:refseq:NP_036430
 ... more results omitted ...
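The filtering step is left to the caller. One possible approach, shown here as a minimal sketch, is to keep only the hits whose URN starts with a chosen prefix. This assumes the hits have already been turned into strings with getURN(). UrnFilter and filterByPrefix are hypothetical names and are not part of BridgeDb.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of BridgeDb.
final class UrnFilter {

    // Keeps only the URNs that start with the given prefix,
    // for example "urn:miriam:entrez.gene:".
    static List<String> filterByPrefix(List<String> urns, String prefix) {
        List<String> kept = new ArrayList<>();
        for (String urn : urns) {
            if (urn.startsWith(prefix)) {
                kept.add(urn);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // A few URNs copied from the free search sample above.
        List<String> hits = Arrays.asList(
            "urn:bridgedb:affymetrix:3643427",
            "urn:miriam:entrez.gene:283643",
            "urn:bridgedb:illumina:GI_21536438-A",
            "urn:miriam:refseq:NP_036430");

        // Keep only the hits that have a MIRIAM URN.
        for (String urn : filterByPrefix(hits, "urn:miriam:")) {
            System.out.println(urn);
        }
    }
}
```

In a real application it is probably cleaner to compare the DataSource of each Xref rather than its URN string. The string version is used here only to keep the sketch free of library dependencies.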
Guessing
It's possible to guess the identifier type based on a predefined set of regular expression patterns. (These patterns are based on the MIRIAM registry, among other sources.) Of course this is not a foolproof way to determine the type of an identifier, but it can be helpful nonetheless. Identifiers like "ENSG00000171105" can be recognized without problem as coming from Ensembl. An identifier that is just an integer, like an Entrez Gene ID, is more ambiguous. In the example below, a numeric identifier will give a list of possible results.
In practice, if you need to know the type of identifier, we recommend that you let the end user select from a combo box, but use the guessing mechanism to set the default value of the combo box. That way the user has full control, but when the guessing mechanism gets it right you'll have prevented the need to scroll through a long list of possibilities, which makes the application more user-friendly.
// We have to initialize DataSource information,
// but we don't need a driver
BioDataSource.init();

String query = "NP_036430";
System.out.println("Which patterns match " + query + "?");

// DataSourcePatterns holds a registry of patterns
Map<DataSource, Pattern> patterns = DataSourcePatterns.getPatterns();

// loop over all patterns
for (DataSource key : patterns.keySet()) {
    // create a matcher for this pattern
    Matcher matcher = patterns.get(key).matcher(query);

    // see if the input matches, and print a message
    if (matcher.matches()) {
        System.out.println(key.getFullName() + " matches!");
    }
}
This produces the following output:
Which patterns match NP_036430?
RefSeq matches!
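The combo box recommendation above can be sketched without the library. The example below is only an illustration under stated assumptions: the four patterns are simplified stand-ins written for this sketch, not the ones BridgeDb ships, and GuessDefault, guess and defaultChoice are hypothetical names. It collects every data source whose pattern matches the whole input and uses the first match as the preselected value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of BridgeDb.
final class GuessDefault {

    // Simplified, illustrative patterns (assumptions for this sketch only).
    // LinkedHashMap keeps insertion order, so earlier entries win as default.
    static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();
    static {
        PATTERNS.put("Ensembl", Pattern.compile("ENS[A-Z]*G\\d{11}"));
        PATTERNS.put("RefSeq", Pattern.compile("[NX][MPR]_\\d+"));
        PATTERNS.put("Entrez Gene", Pattern.compile("\\d+"));
        PATTERNS.put("PubChem", Pattern.compile("\\d+"));
    }

    // Returns the names of all data sources whose pattern matches the whole id.
    static List<String> guess(String id) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Pattern> entry : PATTERNS.entrySet()) {
            if (entry.getValue().matcher(id).matches()) {
                matches.add(entry.getKey());
            }
        }
        return matches;
    }

    // Picks the value to preselect in the combo box: the first guess,
    // or the given fallback when nothing matches.
    static String defaultChoice(String id, String fallback) {
        List<String> matches = guess(id);
        return matches.isEmpty() ? fallback : matches.get(0);
    }

    public static void main(String[] args) {
        System.out.println(guess("3643"));
        System.out.println(defaultChoice("ENSG00000171105", "Unknown"));
    }
}
```

A purely numeric identifier matches more than one pattern here, which is why the user should still be able to override the preselected value.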
More
Free search sample
This blog post contains an advanced example in Groovy that uses free attribute search to match gene names with identifiers.
