Version 17 (modified by chun, 14 years ago)


Satellite meeting for Literature and Text mining related issues


Different web tools allow researchers to search literature databases and integrate semantic information extracted from text with external databases and ontologies. Some of them even provide accessible APIs so users can build their own text mining pipelines.

Enhanced literature search and retrieval engines

 EBI text mining tools

Text mining web services

 BioAID workflows

 iHOP webservices


TM-related web services provided by DBCLS

Examples of Text mining approach

  • Related PNE Japanese articles.

PubMed provides "Related Articles" for each Medline abstract. DBCLS is developing a tool that provides "Related Protein, Nucleic acid and Enzyme (PNE) Japanese Articles" for all Medline abstracts. In this approach, we have applied dictionaries to recognize biomedical terms and translate them into Japanese, and have used GALAXY to show the results.
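The dictionary step described above can be illustrated with a minimal sketch. It is not the DBCLS implementation: the dictionary entries, the function names `build_pattern` and `recognize_terms`, and the simple plural handling are all assumptions made for illustration.

```python
import re

# Hypothetical toy dictionary: English biomedical term -> Japanese translation.
# The entries are illustrative only and are not taken from the DBCLS dictionaries.
TERM_DICTIONARY = {
    "transcription factor": "転写因子",
    "protein": "タンパク質",
    "nucleic acid": "核酸",
    "enzyme": "酵素",
}


def build_pattern(terms):
    """Compile one regex matching any dictionary term, longest terms first."""
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = "|".join(re.escape(term) for term in ordered)
    # Whole-word match, case-insensitive, with an optional plural "s".
    return re.compile(r"\b(" + alternatives + r")s?\b", re.IGNORECASE)


def recognize_terms(text, dictionary=TERM_DICTIONARY):
    """Return (start, end, matched_text, japanese) for each term found in text."""
    pattern = build_pattern(dictionary)
    hits = []
    for match in pattern.finditer(text):
        key = match.group(1).lower()  # dictionary keys are lowercase
        hits.append((match.start(), match.end(), match.group(0), dictionary[key]))
    return hits


if __name__ == "__main__":
    sentence = "The enzyme binds a transcription factor and other proteins."
    for start, end, surface, japanese in recognize_terms(sentence):
        print(f"{start}-{end}\t{surface}\t{japanese}")
```

A production system would likely need a much larger dictionary, normalization of spelling variants, and disambiguation of overlapping terms; exact string matching is used here only to keep the sketch short.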

  • Prediction of Protein Sub-cellular Localization.

The GENIA corpus is a collection of biomedical literature, compiled and annotated within the scope of the GENIA project. The goal of the project is to develop text mining (TM) systems for the domain of molecular biology, and the corpus was developed to provide reference material for building bio-TM systems. It currently contains 1,999 Medline abstracts, collected using the three MeSH terms "human", "blood cells", and "transcription factors", and has been annotated with various levels of linguistic and semantic information. For the cellular components in the GENIA ontology, the Japan Biological Information Research Center (JBIRC) has constructed a new corpus that annotates protein subcellular locations, and has developed a machine learning-based prediction tool that recognizes the subcellular locations of proteins.

The questions that this satellite meeting could try to answer are:

For developers:

  • What are the missing elements for better integration of the different services?
  • Which are the best formats for information extraction data exchange (ieXML, RDF)?
  • Which components of the text mining architecture can be moved from the server to the client (AJAX, browser plugins)?
  • Is it possible to build portable text mining systems? Can systems be adapted to specific domains and specific tasks without the assistance of an experienced language processing specialist?
  • How can we cope with full text data (format / copyright)?

For users:

  • What is the actual utility of text mining in the workflows of the various communities of potential users—model organism database curators, bedside clinicians, biologists utilizing high-throughput experimental assays, hospital billing departments?
  • How usable are biomedical text mining applications? How does the application fit into the workflow of a complex bioinformatics pipeline? What kind of training does a bioscientist require to be able to use an application?
  • How robust and reliable are biomedical text mining applications?



  • Developers of literature and text mining related software and services
  • End-users of these services


  • Yulia Kovarskaya
  • Hong-Woo Chun
  • Yasunori Yamamoto
  • Tobias Gattermayer


  • 2009/3/17 13:00-17:00


  • Seminar Room (1F)


  • Alberto Labarga
    • Review of text mining tools and web services
    • Attachments (coming soon)