About the robustness of Bilbo

By Anaïs Ollagnier · 03/11/2014

In this post, we first examine how the nature of the training set affects the performance of automatic annotation. We then look at how well our system handles multilingual documents.

All experiments are based on 10-fold cross-validation and use the set of features described in this previous post.
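
As a reminder of what 10-fold cross-validation involves, here is a minimal sketch in plain Python. It is not Bilbo's own code: the function name `k_fold_splits`, the fixed shuffling seed and the placeholder reference strings are all assumptions made for illustration.

```python
import random

def k_fold_splits(items, k=10, seed=0):
    """Yield (train, test) pairs; each item lands in exactly one test fold."""
    shuffled = list(items)
    # Shuffle with a fixed seed (an assumption) so the folds are reproducible.
    random.Random(seed).shuffle(shuffled)
    # Deal items round-robin so that fold sizes differ by at most one.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

# Hypothetical stand-in for the 715 references of Corpus 1.
references = [f"ref-{n}" for n in range(715)]
splits = list(k_fold_splits(references, k=10))

for train, test in splits:
    # Here one would train on `train` and evaluate on `test`.
    assert len(train) + len(test) == len(references)
```

Each reference is used for testing exactly once and for training in the nine other folds, so the scores of the ten folds can be averaged into a single figure.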

To conduct these experiments, we used five corpora of annotated bibliographical references:

Corpus	Nature	Language	Quantity	Labels
Corpus 1	scientific literature, humanities and social sciences field	Multilingual	715 ref	18
Cora	scientific literature, computer science field	English	496 ref	14
Umich	scientific literature, several other fields	English	80 ref	7
Corpus 1¹	scientific literature, humanities and social sciences field	French	412 ref	18
PubMed	scientific literature, biomedical sciences field	Multilingual	566 ref	16

Evaluation of the impact of the training set’s nature on performance

The following experiments are based on different kinds of corpora. Although all the corpora are extracted from scientific literature, they can present a wide variety of structures, depending on both the publication domain and the type of document (articles, journals, etc.).

This table lets us observe the behaviour of our system on different types of corpora. First, we see very stable² behaviour in the tests on Corpus 1 and on the French-only Corpus 1, and very good results on the Cora corpus. Second, the Umich corpus behaves very unstably. For PubMed, results are fairly weak overall, with some instability too. In the case of Umich, this can be explained by too little data; in the case of PubMed, we are dealing with a varied bibliography that may cite audio and visual media and material on CD-ROM, DVD or disk.

Looking at the table in more detail, the variants of Corpus 1 (monolingual and multilingual) show a roughly linear trend, with somewhat better results as training data is added. The same increasing trend is observed on the Cora corpus, whose average F-measure reaches 94.24% with simple part of speech. The differences in behaviour between the variants of Corpus 1 and the Cora corpus may be explained by the much more heterogeneous journal domains in the variants of Corpus 1 and by their much more complex structure (presence of nested references) compared with Cora.

For the Umich corpus, performance is unstable and is affected both by the amount of training data and by the combination of features used. It is nonetheless interesting that, despite its small size, we reach an average F-measure of about 80%. As for the PubMed corpus, we have already noted the particular type of bibliography it contains; it is also interesting that, in its case, the splits using 50% of the data for training obtain results similar to those using 90%.
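
For readers unfamiliar with the metric, the sketch below shows one common way to compute a per-label F-measure and average it over labels. The post does not say which kind of average is reported, so the macro average here is an assumption, as are the function names and the three toy labels (`author`, `title`, `date`); the label sequences are invented for the example and are not taken from any of the corpora.

```python
from collections import Counter

def f_measures(gold, predicted):
    """Return {label: F-measure} from two aligned label sequences."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label was wrong
            fn[g] += 1  # gold label was missed
    scores = {}
    for label in set(gold) | set(predicted):
        found = tp[label] + fp[label]
        expected = tp[label] + fn[label]
        precision = tp[label] / found if found else 0.0
        recall = tp[label] / expected if expected else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores

def macro_average(scores):
    """Unweighted mean over labels (an assumed averaging scheme)."""
    return sum(scores.values()) / len(scores)

# Invented toy sequences, one label per token.
gold = ["author", "author", "title", "title", "date"]
pred = ["author", "title", "title", "title", "date"]
scores = f_measures(gold, pred)
average = macro_average(scores)
```

With a macro average, a rare label weighs as much as a frequent one, which matters when comparing corpora with 7 to 18 labels.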

Evaluation of the impact of the bibliographical references’ language on performance

In this section, we present an evaluation based on monolingual and multilingual corpora to see whether the system is sensitive to language. For this experiment we used Corpus 1 and the variant of Corpus 1 containing only French bibliographical references. We chose these corpora because of their similar nature.

These diagrams show different behaviours for the two variants of Corpus 1, despite similar performance on the 90% split. Behaviour is much more stable on the multilingual corpus and much less stable on the monolingual one. Some combinations of features also appear better suited to handling multilingualism, as we see with detailed part of speech: this feature gives lower results on the monolingual corpus. It is interesting to note that, for the monolingual corpus, the different variations on the 90% split give similar results (an F-measure of about 87%), except for the variation using simple part of speech.

This experiment shows that the presence of multilingualism within the bibliography does not cause any particular loss of performance compared to our previous experiments. However, the monolingual corpus makes our system more unstable: this can be explained by a slightly smaller amount of data or by a poorer representation of the different reference structures. It is also possible that these combinations of features do not fit a monolingual corpus as well.
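
Stability, in the sense of note 2, can be quantified in several ways. The sketch below uses the standard deviation and the range of per-split F-measures; this choice of statistics is an assumption, not necessarily what was used for the diagrams. The two score lists are made-up numbers for illustration only and are not results from these experiments.

```python
from statistics import mean, pstdev

def stability_report(split_scores):
    """Summarise how much performance moves from one split to another."""
    return {
        "mean": mean(split_scores),
        "std": pstdev(split_scores),  # population standard deviation
        "range": max(split_scores) - min(split_scores),
    }

# Made-up per-split F-measures for two hypothetical runs.
steady_run = [0.86, 0.87, 0.88, 0.87, 0.86]
erratic_run = [0.70, 0.90, 0.78, 0.95, 0.72]

steady = stability_report(steady_run)
erratic = stability_report(erratic_run)

# The steadier run has the smaller deviation across splits.
more_stable = "steady_run" if steady["std"] < erratic["std"] else "erratic_run"
```

Reporting the deviation next to the mean makes it possible to tell apart two runs with similar averages but very different behaviour from split to split.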

In conclusion, we found that the features were, in most cases, not dependent on the language of the corpus, whereas they are more sensitive to the nature of the corpus.

Notes

(1) This is the same corpus as Corpus 1, restricted to French bibliographical references. These references were found mainly in French journals.

(2) By stability we mean a small deviation in performance from one split of the corpus to another.
