GeneTerm Linker is a new algorithm for functional annotation of a list of genes that provides a set of functional metagroups in a single output. It includes a concurrent enrichment analysis (GeneCodis) followed by a non-redundant reciprocal linkage of genes and biological terms (GeneTerm Linker).


Functional analysis of large sets of genes and proteins is becoming increasingly necessary as experimental biomolecular data accumulate at omic scale. Enrichment analysis is by far the most popular methodology for deriving the functional implications of sets of cooperating genes. The problem with these techniques lies in the redundancy of the resulting information, which in most cases produces many trivial results that risk masking the key biological events. We present a computational methodology that filters and links enriched output data, identifying sets of genes and terms to produce metagroups of coherent biological significance. The method is called GeneTerm Linker, and it uses fuzzy reciprocal linkage between genes and terms to unravel their functional convergence and associations.


GeneTerm Linker can be used through this web site, and a short help guide for users is included below. The algorithm is designed to take the results of any enrichment analysis that provides a collection of GeneTerm-sets (defined as terms/genes/p-value itemsets derived from a functional annotation procedure) and to produce a simple result in which genes and terms (i.e. co-annotations) are associated in metagroups with consistent biological significance. These metagroups are evaluated with parameters that measure their significance and coherence, in order to identify the most relevant functions present in a given list of genes.
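To make the notion of a GeneTerm-set concrete, the sketch below models one enrichment result as an immutable record holding its terms, its genes and an adjusted p-value. The class name, field names and placeholder values are hypothetical and chosen for illustration only; they are not the internal format used by GeneTerm Linker or GeneCodis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneTermSet:
    """One enrichment result: a term, or a set of co-occurring terms, together
    with the query genes annotated with all of them and an adjusted p-value.
    Hypothetical field names; not GeneTerm Linker's internal format."""
    terms: frozenset
    genes: frozenset
    adj_pvalue: float

    @property
    def n_genes(self) -> int:
        # Number of query genes supporting this GeneTerm-set.
        return len(self.genes)


# Placeholder identifiers and p-value, for illustration only.
example = GeneTermSet(
    terms=frozenset({"term_A", "term_B"}),
    genes=frozenset({"gene1", "gene2", "gene3"}),
    adj_pvalue=0.01,
)
print(example.n_genes)  # 3
```

Frozen sets keep each record hashable, so GeneTerm-sets can be stored in sets or used as dictionary keys when grouping them.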


The complete description of the algorithm, with all its steps and mathematical procedures, is included in a manuscript that will be attached to this web site. The method can be applied to the output of well-known enrichment analysis tools (DAVID, GSEA, FatiGO, GeneCodis, etc.). In the current web version, users can provide just a list of genes, and the preliminary enrichment analysis is performed with the GeneCodis tool (Nogales-Cadenas et al. Nucleic Acids Res 37: W317-322, 2009).



Figure. Scheme illustrating the rationale of the GeneTerm Linker method. The method provides a single result combining all the annotation spaces in which a gene list has been interrogated, and it filters out generic and redundant terms/annotations.

Submitting analysis

1. Paste a list of genes (query list)

Include the list of genes to study, one gene per line, using appropriate IDs (see the IDs allowed by the GeneCodis tool). Replicated entries are removed, so each gene is used only once in the analysis.
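The de-duplication rule can be sketched as follows. This is a minimal illustration, assuming one ID per line, that surrounding whitespace and blank lines are ignored, and that the first occurrence of a repeated ID is the one kept; the function name `read_gene_list` is hypothetical, and the server's exact parsing rules (for example case sensitivity) may differ.

```python
def read_gene_list(text: str) -> list:
    """Parse one gene ID per line (hypothetical helper).

    Blank lines are skipped and only the first occurrence of a repeated
    ID is kept, preserving the original order of the list.
    """
    seen = set()
    genes = []
    for line in text.splitlines():
        gene = line.strip()
        if gene and gene not in seen:
            seen.add(gene)
            genes.append(gene)
    return genes


print(read_gene_list("BRCA1\nTP53\n\nBRCA1\n"))  # ['BRCA1', 'TP53']
```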

2. Paste a list of reference genes (optional)

By default, the method uses as reference list all genes annotated with terms for the selected annotation space(s) and organism. Optionally, users can provide their own reference list (e.g. the genes in a given microarray) by simply pasting all genes in the same format as the query list. Note: the input list should be a subset of the reference list.
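Before submitting a custom reference list, it can help to confirm locally that every query gene is present in it. The sketch below is a hypothetical pre-submission check, not part of the web tool; it simply reports the query IDs missing from the reference list.

```python
def genes_missing_from_reference(query, reference):
    """Return, sorted, the query gene IDs absent from the reference list.

    An empty result means the query list is a subset of the reference
    list, as the method expects. Hypothetical helper for local checks.
    """
    return sorted(set(query) - set(reference))


missing = genes_missing_from_reference(["A", "B", "C"], ["A", "B", "D"])
print(missing)  # ['C']
```

Genes reported here could either be added to the reference list or removed from the query before submission.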

3. Select the organism

Select the organism under study. GeneTerm Linker currently supports annotations for H. sapiens and S. cerevisiae.

4. Select the sources of biological annotations (i.e. annotation spaces)

Select one or more annotation spaces to include in the analysis. GeneTerm Linker currently supports:
  • GO, Gene Ontology categories from http://www.geneontology.com
  • KEGG pathways from http://www.genome.jp/kegg/
  • InterPro domain keywords from http://www.ebi.ac.uk/interpro

5. Select the minimum number of genes (i.e. minimum support)

Annotations or combinations of annotations that do not appear in at least the selected minimum number of genes (minimum support) will not be part of the enrichment analysis results. Recommended: use a minimum support of 4 (the default) or 3; values below 3 can produce too many GeneTerm-sets and overload the server.
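The effect of the minimum support can be illustrated with a brute-force sketch that enumerates combinations of up to `max_terms` co-occurring terms and keeps only those shared by at least `min_support` genes. This is not GeneCodis's implementation: the function name, the `max_terms` limit and the input layout (a mapping from gene ID to its set of terms) are assumptions, and production tools typically rely on dedicated frequent-itemset mining algorithms rather than exhaustive enumeration. The sketch also shows why low support values are costly: the number of surviving combinations grows quickly as the threshold drops.

```python
from itertools import combinations


def supported_term_sets(annotations, min_support=4, max_terms=2):
    """Return {frozenset(terms): frozenset(genes)} for every combination of
    up to `max_terms` terms annotated together in at least `min_support`
    genes. `annotations` maps gene ID -> set of term IDs (assumed layout).
    """
    all_terms = sorted({t for terms in annotations.values() for t in terms})
    result = {}
    for k in range(1, max_terms + 1):
        for combo in combinations(all_terms, k):
            wanted = set(combo)
            # Genes annotated with every term in this combination.
            genes = frozenset(g for g, terms in annotations.items()
                              if wanted <= set(terms))
            if len(genes) >= min_support:
                result[frozenset(combo)] = genes
    return result


ann = {"g1": {"A", "B"}, "g2": {"A", "B"}, "g3": {"A"}, "g4": {"A", "B"}}
print(sorted(sorted(s) for s in supported_term_sets(ann, min_support=3)))
# [['A'], ['A', 'B'], ['B']]
```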

6. Email (optional)

If you provide an e-mail address, a notification with the link to the results will be sent when the analysis is completed.

Interpreting results

GeneTerm Linker provides the final results, the metagroups of coherent biological significance, as tab-delimited text files and HTML tables. The web site also provides a text file with the initial enrichment analysis produced by the enrichment tool used.

Description of columns in the html table:

  • Genes: Genes included in each metagroup
  • #list: Number of annotated genes in the input list (Total number of genes in the input list)
  • #ref_list: Number of annotated genes in the reference list (Total number of genes in the reference list)
  • adjusted pValue: p-value calculated using the hypergeometric distribution and corrected for multiple testing with the FDR method.
  • Silhouette Width: Coefficient that measures how appropriately genes have been clustered in the metagroup. It takes into account intra-group compactness and inter-group proximity, and ranges from -1 to 1.
  • Terms: Annotation terms obtained from the different biological annotation resources selected (i.e. annotation spaces).
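The p-value column can be illustrated with the standard one-sided hypergeometric test: with N genes in the reference list, K of them annotated, and k annotated genes among the n genes of the input list, the p-value is P(X ≥ k). The sketch below computes this with the standard library and applies the Benjamini-Hochberg procedure, the most common FDR correction. That GeneCodis uses exactly this variant, and how a single p-value is assigned to a whole metagroup, are assumptions not confirmed here; the function names are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when drawing n genes from N, of which K are annotated."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1)) / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR-adjusted p-values, in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [hypergeom_pvalue(4, 10, 20, 1000), hypergeom_pvalue(2, 10, 50, 1000)]
print(benjamini_hochberg(raw))
```

In terms of the table columns, k and n correspond to the two numbers shown under #list, and K and N to those under #ref_list.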


Detailed information about each metagroup is available by clicking its “Show details” link.

  • Size: Number of GeneTerm-sets from the enrichment results that are included in the metagroup.
  • Diameter: Maximum Cosine Distance within the GeneTerm-sets of the metagroup. It ranges from 0 to 1
  • Similarity: Cosine Similarity Coefficient within the GeneTerm-sets of the metagroup. It ranges from 0 to 1
  • Silhouette Width: As indicated above, the coefficient that measures how appropriately genes have been clustered in each metagroup, taking into account intra-group compactness and inter-group proximity.
  • adjusted pValue: p-value calculated using the hypergeometric distribution and corrected for multiple testing with the FDR method.
  • Genes: Genes included in the metagroup
  • Non-generic Terms: Biological terms considered non-generic by the GeneTerm Linker method. GeneTerm Linker identifies and filters out "generic" and "promiscuous" biological terms, which on their own can be considered not very informative.
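The Diameter, Similarity and Silhouette Width parameters can be sketched as below. This is an illustration under stated assumptions, not the published procedure: each GeneTerm-set is represented as the set of its genes (a binary membership vector), Similarity is taken as the mean pairwise cosine similarity, and the silhouette follows the standard Rousseeuw definition using cosine distance, averaged over the members of one metagroup. All function names are hypothetical. Because the vectors are non-negative, cosine similarity and distance stay between 0 and 1, matching the ranges stated above.

```python
from itertools import combinations
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity of two sets seen as binary membership vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def cosine_distance(a, b):
    return 1.0 - cosine_similarity(a, b)


def diameter(group):
    """Largest pairwise cosine distance in a metagroup (0 for one member)."""
    return max((cosine_distance(a, b) for a, b in combinations(group, 2)),
               default=0.0)


def mean_similarity(group):
    """Mean pairwise cosine similarity (assumed definition of Similarity)."""
    pairs = list(combinations(group, 2))
    if not pairs:
        return 1.0
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


def silhouette_width(groups, index):
    """Mean silhouette s = (b - a) / max(a, b) over members of groups[index]."""
    own = groups[index]
    others = [g for j, g in enumerate(groups) if j != index and g]
    if len(own) < 2 or not others:
        return 0.0  # Rousseeuw's convention for singleton clusters
    scores = []
    for i, x in enumerate(own):
        # a: mean distance to the other members of its own metagroup.
        a = sum(cosine_distance(x, y)
                for j, y in enumerate(own) if j != i) / (len(own) - 1)
        # b: mean distance to the nearest other metagroup.
        b = min(sum(cosine_distance(x, y) for y in g) / len(g) for g in others)
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


g1 = [frozenset({"a", "b"}), frozenset({"a", "b", "c"})]
g2 = [frozenset({"x", "y"}), frozenset({"x", "y", "z"})]
print(diameter(g1), mean_similarity(g1), silhouette_width([g1, g2], 0))
```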


The GeneTerm-sets that support the metagroup are also available in a text format file and an html table.

Description of columns in the html table:

  • Genes: Genes annotated with the given term or set of concurrent terms
  • #list: Number of annotated genes in the input list (Total number of genes in the input list)
  • #ref_list: Number of annotated genes in the reference list (Total number of genes in the reference list)
  • adjusted pValue: p-value calculated using the hypergeometric distribution and corrected for multiple testing with the FDR method.
  • Terms: Annotation terms obtained from the different biological annotation resources selected (i.e. annotation spaces).