About the robustness of Bilbo

In this post, we first look at how the nature of the training set affects the performance of automatic annotation. We then examine how well our system handles multilingual documents.

All experiments are based on 10-fold cross-validation, using the set of features described in this previous post.
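As a reminder of what 10-fold cross-validation involves, here is a minimal sketch in Python. It is not Bilbo's own code: the function name `k_fold_splits`, the fixed random seed and the placeholder reference identifiers are assumptions made for illustration. The corpus is shuffled and cut into ten folds, and each fold serves once as the test set while the other nine are used for training.

```python
import random

def k_fold_splits(items, k=10, seed=0):
    """Yield (train, test) pairs for k-fold cross-validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Fold i takes every k-th item starting at offset i,
    # so fold sizes differ by at most one item.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [ref for j, fold in enumerate(folds) if j != i for ref in fold]
        yield train, test

# Hypothetical stand-in for a corpus of 715 annotated references.
references = [f"ref_{n}" for n in range(715)]
splits = list(k_fold_splits(references, k=10))
```

Each reference ends up in exactly one test fold, so every reference is evaluated once over the ten runs.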

To conduct these experiments, we used five corpora of annotated bibliographical references:

Corpus       | Nature                                                        | Language     | Quantity (references) | Labels
Corpus 1     | Scientific literature, humanities and social sciences field   | Multilingual | 715                   | 18
Cora         | Scientific literature, computer sciences field                | English      | 496                   | 14
Umich        | Scientific literature, several other fields                   | English      | 80                    | 7
Corpus 1 (1) | Scientific literature, humanities and social sciences field   | French       | 412                   | 18
PubMed       | Scientific literature, biomedical sciences field              | Multilingual | 566                   | 16

Evaluation of the impact of the training set’s nature on performance

The following experiments are based on different kinds of corpora. Although all the corpora are extracted from scientific literature, they present a wide variety of structures, depending on both the field of publication and the type of document (articles, journals, etc.).

[Image: results table per corpus (tableau_corpus)]

This table lets us observe the behaviour of our system on different types of corpora. First, we see very stable (2) behaviour in the tests on Corpus 1 and on its French-only variant, as well as very good results on the Cora corpus. Second, the Umich corpus behaves very unstably, and the results on PubMed are weak overall, with some instability too. For Umich, this can be explained by the small amount of data. For PubMed, we are dealing with a varied bibliography that may cite audio and visual media or material on CD-ROM, DVD or disk.

Looking at the table in more detail, the variants of Corpus 1 (monolingual and multilingual) show a roughly linear progression, with somewhat better results as training data is added. The same increasing trend appears on the Cora corpus, whose average F-measure reaches 94.24% with the simple part-of-speech feature. The difference in behaviour between the variants of Corpus 1 and Cora may be explained by the much more heterogeneous fields and journals covered by the variants of Corpus 1, and by their much more complex structure (presence of nested references).

For the Umich corpus, performance is unstable and is affected both by the amount of training data and by the combination of features used. It is nevertheless interesting that, despite its small size, we reach an average F-measure of about 80%. As for the PubMed corpus, whose particular type of bibliography we have already noted, it is interesting that the splits using 50% of the data for training obtain results similar to those using 90%.
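For readers unfamiliar with the metric, the sketch below shows one common way to compute an F-measure per label and average it over labels. It is an illustration only: the post does not state whether the reported averages are macro- or micro-averaged, so the macro average used here is an assumption, and the label names and toy sequences are invented.

```python
def f_measure(gold, predicted, label):
    """Return (precision, recall, F1) for one label over aligned label sequences."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

def macro_f_measure(gold, predicted):
    """Average the per-label F1 over every label present in the gold annotation."""
    labels = sorted(set(gold))
    return sum(f_measure(gold, predicted, label)[2] for label in labels) / len(labels)

# Invented toy annotation, not data from the corpora above.
gold = ["author", "author", "title", "title", "date"]
pred = ["author", "title", "title", "title", "date"]
score = macro_f_measure(gold, pred)
```

In this toy case one author token is mislabelled as a title, which lowers recall for `author` and precision for `title` while leaving `date` untouched.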

Evaluation of the impact of the bibliographical references’ language on performance

In this section, we present an evaluation based on monolingual and multilingual corpora to see whether the system is sensitive to language. To conduct this experiment, we used Corpus 1 and the variant of Corpus 1 containing only French bibliographical references. We chose these corpora because of their similar nature.

[Image: diagrams for the two variants of Corpus 1 (language)]

These diagrams show different behaviours for the two variants of Corpus 1, despite similar performance on the 90% split. Behaviour is much more stable on the multilingual corpus than on the monolingual one. Some combinations of features also appear better suited to handling multilingualism, as we can see with the detailed part-of-speech feature, which gives lower results on the monolingual corpus. It is interesting to note that, for this corpus, the different variations on the 90% split give similar results (an F-measure of about 87%), except for the variation using simple part of speech.

This experiment shows that the presence of several languages within the bibliography does not cause any particular loss of performance compared to our previous experiments. However, the monolingual corpus makes our system less stable. This can be explained by the slightly smaller amount of data or by poorer coverage of the different reference structures. It is also possible that these combinations of features simply fit a monolingual corpus less well.
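Stability, in the sense of a small deviation in performance from one split to another, can be quantified in several ways. The sketch below is one possible reading, using the standard deviation and the spread of F-measures across splits. The function names and the scores are invented for illustration and are not results from these experiments.

```python
from statistics import mean, pstdev

def stability_summary(f_scores):
    """Summarise how much the F-measure varies from one split to another."""
    return {
        "mean": mean(f_scores),
        "std_dev": pstdev(f_scores),
        "spread": max(f_scores) - min(f_scores),
    }

def more_stable(scores_a, scores_b):
    """Return True if the first list of scores deviates less than the second."""
    return pstdev(scores_a) < pstdev(scores_b)

# Invented scores for illustration only; these are not results from the post.
steady = [0.86, 0.87, 0.88]
erratic = [0.70, 0.87, 0.80]
summary = stability_summary(steady)
```

Two corpora can share a similar best score while differing widely on these deviation measures, which is the kind of contrast described above.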

In conclusion, we found that the features were, in most cases, not dependent on the language of the corpus, whereas they are more sensitive to the nature of the corpus.

Notes

(1) This is the same corpus as Corpus 1, restricted to French bibliographical references. These references were found mainly in French journals.

(2) By stability, we mean a small deviation in performance from one split of the corpus to another.

