
A Semantic Framework for Web service Annotation, Matching and Classification in Bioinformatics

A Framework for Resource Annotation and Classification in Bioinformatics

Nadia Yacoubi Ayadi †, Malika Charrad †, Soumaya Amdouni ‡ and Mohamed Ben Ahmed †

† National School of Computer Science, University of Manouba, 2010 Tunisia
nadia.yacoubi@asu.edu, malika.charrad@riadi.rnu.tn, mohamed.benahmed@riadi.rnu.tn
‡ High Institute of Management, Bardo City, Tunisia

Abstract. Semantic annotation is commonly recognized as one of the cornerstones of the semantic Web. In the context of Web services, semantic annotations can support effective and efficient discovery of services, and guide their composition into workflows. Because semantic annotation is a time-consuming and expensive task, (semi-)automatic approaches for semantic annotation extraction are required. In this paper, we propose a semi-automatic approach for extracting lightweight semantic annotations from textual descriptions of Web services. In contrast with most existing semi-automatic approaches to the semantic annotation of Web services, which rely on a predefined domain ontology, we investigate the use of NLP techniques to derive service properties from a corpus of textual descriptions of bioinformatics services. We evaluate the performance of the annotation extraction method and the importance of lightweight annotations for classifying bioinformatics Web services in order to bootstrap the service discovery process. Our framework relies on an unsupervised clustering approach based on a simultaneous clustering algorithm that determines biclusters of highly correlated Web services and semantic annotations.

Keywords: Semantic Annotation, Semantic Web Service, Block Clustering, Bioinformatics

1 Introduction

During the last decade, semantic Web services (SWS) [20] technology has been proposed and investigated to support effective and efficient service discovery, composition and invocation by machines.
Despite the appealing characteristics of semantic Web service principles, their uptake on a Web scale has been significantly less prominent than initially anticipated [21]. In fact, research on semantic Web services has mostly focused on devising domain-independent Web service description ontologies such as OWL-S [19] and WSMO [22]. Semantic Annotations for WSDL (SAWSDL) [15] adopts a bottom-up approach by adding semantics to existing Web service standards through mapping syntactic definitions to a set of ontological concepts. All of these approaches rely on a pre-determined domain ontology to make service semantics explicit. Reasoning tasks performed with semantic Web service descriptions are mainly conditioned by the quality of this domain ontology [4]. The existence of a domain ontology to capture domain knowledge in an explicit and formal way is crucial. In several fields, many domain ontologies have been developed for several purposes. The complexity of reasoning tasks increases when semantic service descriptions are generated by means of several domain ontologies. In the bioinformatics field, the OBO Foundry¹ lists around 60 ontologies for life sciences including molecular biology, anatomy, biochemistry, environment, neuroscience, etc. (for a survey, see [24]). None of these ontologies is suitable for annotating bioinformatics Web services: although they are rich in semantics, they are not generic enough to capture high-level concepts and their semantic relationships.

In this paper, we propose a bottom-up approach to extract domain-dependent lightweight semantic annotations from textual descriptions of Web services. Such annotations of Web services aim to capture static knowledge (i.e., domain concepts) and procedural knowledge (i.e., tasks) of a domain. Despite their importance, few domain ontologies exist for the purpose of Web service annotation, and thus building such ontologies is a challenging task.
Natural language documentation of Web services consists of short textual descriptions intended to close the "semantic gap" between low-level technical features of Web services (e.g., data types, port types, or data formats) and the high-level, meaning-bearing features a user is interested in and refers to when discovering a Web service. Hence, our semi-automatic approach combines different extraction patterns to generate lightweight annotations describing service properties such as inputs, outputs, or functionalities. We note that our extraction method provides a good starting point for ontology building.

Therefore, we rely on a simultaneous clustering algorithm, namely CROKI2 [13], to identify clusters (groups) of services that are described by a specific subset of highly correlated annotations. The simultaneous clustering step has two benefits. Firstly, clustering Web services based on semantic annotations would greatly boost the ability of Web service search engines to select suitable services given a discovery query. Secondly, it makes it possible to detect implicit associations (relationships) between highly correlated annotations, which is crucial in an ontology building process. In fact, the co-occurrence of a subset of annotations within a subset of Web services reflects implicit relationships, taxonomic or non-taxonomic, between these annotations. To the best of our knowledge, no existing approach uses block clustering; most approaches cluster either annotations [16,1] or services [17,12].

The paper is organized as follows. Section 2 reviews related work on automatic annotation of Web services and on block clustering. Section 3 presents our framework for semantic annotation and clustering of Web services. In Section 4, we present and discuss the results of our experiments. Section 5 concludes the paper and outlines our future work.
¹ http://www.obofoundry.org/

2 Related Work

2.1 Semantic annotation learning for Semantic Web services

Converting an existing Web service into a semantic Web service requires significant effort and must be repeated for each new Web service. In this section, we review research work that focuses on learning semantic annotations by exploiting textual descriptions, WSDL files or even Web forms. Hess et al. propose ASSAM (Automated Semantic Annotation with Machine Learning), a semi-automatic WSDL annotator application. ASSAM [14] relies on a pre-determined domain ontology and uses a machine learning algorithm to provide users with suggestions on how to describe the elements in the WSDL file. However, because of the intensive expert user intervention required, such a solution could be impractical for large-scale annotation of Web services, even though it tends to provide high-quality annotations. Sabou et al. [23] propose an automatic extraction method based on Natural Language Processing (NLP). Experiments were conducted in the bioinformatics field by learning an ontology from the documentation of Web services in the context of the myGrid project. The evaluation of the extracted ontology shows that the approach is a helpful tool to support the process of building domain ontologies for Web services. Our approach builds on that of [23] by also using NLP techniques to generate semantic annotations of Web services.

Also within the bioinformatics space, Afzal et al.
[2] developed a literature-based text mining approach to learn the semantic profile of bioinformatics resources. The approach identifies a set of semantic classes of descriptors that can be attached to a bioinformatics resource: data, data resource, task, and algorithm. The instances of these classes were collected by harvesting a corpus of scientific papers along with related sentences containing the resource name. However, the case study conducted in [2] shows that the breadth of coverage of the myGrid ontology used as annotation support is limited, especially for capturing functional service descriptions. The quality of extracted descriptors was only measured from the curator's perspective, which is not adequate in the semantic Web context, where Web services are supposed to be discovered and composed by agents.

Ambite et al. [3] present an approach to automatically discover and create semantic Web services. The idea behind their approach is to start with a set of known sources and the corresponding semantic descriptions and then discover similar sources, extract the source data, build semantic descriptions of the sources, and then turn them into semantic Web services. The authors implemented the Deimos system and evaluated it across five domains. In contrast to our work, the goal of Deimos is to build a semantic description that is sufficiently detailed to support automatic retrieval and composition. Our work aims to generate lightweight annotations useful for classifying Web services and bootstrapping the service discovery process in the bioinformatics field.

2.2 Web service Clustering

With the expected growth of the number of available Web services and service repositories, the need for mechanisms that enable the automatic organization and discovery of services becomes increasingly important.
In this context, most existing research relies on one-way clustering, of either annotations [16,1] or services [12,17]. With one-way clustering algorithms, each service in a given service cluster is described using all annotations. Similarly, each annotation in an annotation cluster characterizes all services. For instance, based on their approach presented in [2], Afzal et al. propose in [1] to use lexical kernel metrics to identify semantically related networks of resources by computing similarity between annotations. However, the goal of our work is to identify groups of services that are best described by a specific subset of annotations, which amounts to finding biclusters of highly correlated services and annotations in order to bootstrap the service discovery process. We rely on simultaneous clustering, an approach that finds local patterns where a subset of subjects might be similar to each other based on only a subset of attributes. Simultaneous clustering, usually referred to as biclustering, co-clustering or block clustering, aims to find sub-matrices, that is, subgroups of rows and subgroups of columns that exhibit a high correlation. A number of algorithms that perform simultaneous clustering on the rows and columns of a matrix have been proposed to date. This type of algorithm has been used in many fields, such as bioinformatics [18], Web mining [8] and text mining [6]. Table 1 outlines a comparison between one-way clustering and simultaneous clustering.

Table 1. Comparison between clustering and simultaneous clustering

| Clustering | Simultaneous clustering |
|---|---|
| Applied to either the rows or the columns of the data matrix separately ⇒ global model. | Performs clustering in the two dimensions simultaneously ⇒ local model. |
| Produces clusters of rows or clusters of columns. | Seeks blocks of rows and columns that are interrelated. |
| Each subject in a given subject cluster is defined using all the variables. Each variable in a variable cluster characterizes all subjects. | Each subject in a bicluster is selected using only a subset of the variables, and each variable in a bicluster is selected using only a subset of the subjects. |
| Clusters are exhaustive. | The clusters on rows and columns need not be exclusive and/or exhaustive. |

3 General Framework

The proposed framework comprises two main steps. The first performs a semi-automatic extraction of semantic annotations from the textual documentation of Web services. Semantic annotations describe service properties such as functionalities, inputs, outputs, and other domain-dependent features. One particularity of textual Web service descriptions is that they employ natural language in a specific way. In fact, such texts belong to what have been defined as sublanguages [23]. A sublanguage is a specialized form of natural language which is used within a particular domain and characterized by a specialized vocabulary, semantic relations, and syntax (e.g., medical test reports). The semantic annotation extraction step exploits the linguistic regularities of a sublanguage to identify semantic service properties. The second step of our approach consists of clustering Web services in terms of semantic annotations. This step discovers subgroups (biclusters) of Web services and subgroups of semantic annotations that exhibit a high correlation by applying the CROKI2 algorithm [13]. In the following, we present the two steps in further detail.

3.1 Semantic Annotation Extraction of Web services

The semantic annotation extraction phase identifies two types of knowledge: domain concepts and procedural knowledge describing service tasks.
First, a morphosyntactic analysis of the textual descriptions of Web services is performed. In this step, sentence splitter and tokeniser components are used to extract sentences and basic linguistic entities. Then, a POS (Part-Of-Speech) tagger is applied to associate a grammatical category with each word (token) and thus distinguish the morphology of the various entities. For example, in the sentence below, the tagger identifies a verb (i.e., compute), three nouns (i.e., structure, RNA, sequence), an adjective (i.e., secondary), and a preposition (i.e., for).

compute (VB) Secondary (JJ) Structure (NN) for (Prep) RNA (NN) sequence (NN).

We distinguish different types of syntactic patterns depending on the semantic annotation type. Syntactic patterns describe selectional constraints that exploit the particularities of sublanguages. We distinguish syntactic patterns that extract the inputs and outputs of services, service tasks, and domain-dependent features which are strongly related to the bioinformatics domain:

1. Identifying service tasks is crucial for service discovery and composition. We observed that, in the majority of textual descriptions of Web services, verbs identify the functionality performed by a Web service. In our work, we consider different classes of verbs which indicate the service task. For example, VBRetrieval is the class of verbs that indicates a retrieval process (e.g., get, retrieve, fetch, search, find, return, query). A frequently occurring pattern which involves this verb class and the preposition from can be used to easily determine the output and the retrieved resource, as described by the following selectional pattern:

VBRetrieval <Output> from <Source>.
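
The retrieval pattern above can be illustrated with a minimal sketch. It assumes a plain regular expression over the raw sentence instead of the POS-tagged tokens the paper works with, so it only approximates the selectional pattern. The verb list is taken from the text; the names RETRIEVAL_PATTERN and extract_retrieval_annotation, the output dictionary keys, and the example sentence are hypothetical and are not part of the authors' implementation.

```python
import re

# Verb class from the text: verbs that indicate a retrieval process.
VB_RETRIEVAL = ("get", "retrieve", "fetch", "search", "find", "return", "query")

# Hypothetical regex rendering of: VBRetrieval <Output> from <Source>.
# Assumption: we match raw strings; a real pipeline would match POS tags.
RETRIEVAL_PATTERN = re.compile(
    r"\b(?P<verb>" + "|".join(VB_RETRIEVAL) + r")s?\s+"
    r"(?P<output>.+?)\s+from\s+(?P<source>[^.,;]+)",
    re.IGNORECASE,
)


def extract_retrieval_annotation(description):
    """Return task, output and source of the first retrieval pattern, or None."""
    match = RETRIEVAL_PATTERN.search(description)
    if match is None:
        return None
    return {
        "task": match.group("verb").lower(),
        "output": match.group("output").strip(),
        "source": match.group("source").strip(),
    }


# Made-up service description used only for illustration.
example = "Retrieve protein sequences from the UniProt database."
annotation = extract_retrieval_annotation(example)
print(annotation)
```

A description with no retrieval verb, such as the "compute secondary structure for RNA sequence" example, yields None here; covering it would require a separate pattern for another verb class.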