* The system we reported on in our paper used the TensorFlow sequence-to-sequence code used in Britz et al. (2017). Later, an open source version of the Google Translation model (GNMT) was published as a tutorial. The ActiveQA version released today is based on this more recent, and actively developed implementation. For this reason the released system varies slightly from the paper’s. Nevertheless, the performance and behavior are qualitatively and quantitatively comparable.↩
1 In our paper, we posit that if automatic determination of names, locations, brands, etc. from the image is needed, it should be done as a separate task that may leverage image meta-information (e.g. GPS info), or complementary techniques such as OCR.↩