Archive for April, 2013

Answers, results of polls, and a brief description of the program

April 14, 2013

I have now closed the polls in the second mathematical writing experiment. Here are the results. I have also published the comments on the first and second experiments, which shed further light on the results and on (some) people’s reasons for voting the way they did.

The results were complicated slightly by the fact that after a couple of days the post was linked to from the front page of Hacker News, and suddenly the number of people who voted more than tripled in a few hours. Furthermore, there was a bit of discussion about the polls, so it is not clear that the votes were completely independent. Also, the profile of an average Hacker News reader is probably somewhat different from the profile of an average reader of this blog.

Fortunately, I had recorded the votes shortly before this happened, so below we (“we” means Mohan Ganesalingam and I) present both sets of results. As it happens, the proportions didn’t change too much. We begin with some bar charts. U stands for “undergraduate”, G for “graduate”, C for “computer” and D for “don’t know”. The portion coloured in blue represents people who claimed to be sure that they were correct, and the portion coloured in red represents those who were unsure.

At the end of the post we give the exact numbers.

A brief remark before we present the results is that none of the three “contestants” was explicitly trying to write proofs that would appear as human as possible. The two humans were asked to provide proofs and not told why. And the program was designed to produce passable write-ups, but not to fool people into thinking that those write-ups were written by a human. (There are some easy improvements that could have been made if we had intended to do that, but we did not originally envisage carrying out this second experiment.)

The general picture can be summarized as follows.

1. The computer was typically identified by around 50% of all those who voted.

2. Typically around half of those were confident that they were correct, and half not so confident.

3. A non-negligible percentage of respondents claimed to be sure that a write-up that was not by the computer was by the computer.

The results in bar-chart form

whichisthecomputer

Mohan Ganesalingam

There is quite a lot to say about these results, and also about the program. Most of the rest of this post is written jointly with my collaborator Mohan Ganesalingam (as indeed were the posts with the two experiments — we chose the wording very carefully), but first let me say a little about who Mohan is.
(more…)

A second experiment concerning mathematical writing.

April 2, 2013

The time has come to reveal what the experiment in the previous post was about. As with many experiments (the most famous probably being that of Stanley Milgram about obedience to authority), its real purpose was not its ostensive purpose.

Over the last three years, I have been collaborating with Mohan Ganesalingam, a computer scientist, linguist and mathematician (and amazingly good at all three) on a project to write programs that can solve mathematical problems. We have recently produced our first program. It is rather rudimentary: the only problems it can solve are ones that mathematicians would describe as very routine and not requiring any ideas, and even within that class of problems there are considerable restrictions on what it can do; we plan to write more sophisticated programs in the future. However, each of the problems in the previous post belongs to the class of problems that it can solve, and for each problem one write-up was by an undergraduate mathematician, one by a mathematics PhD student and one by our program. (To be clear, the program was given the problems and produced the proofs and the write-ups with no further help. I will have more to say about how it works in future posts.) We wanted to see whether anybody would suspect that not all the write-ups were human-generated. Nobody gave the slightest hint that they did.

Of course, there is a world of difference between not noticing a difference that you have not been told to look out for, and being unable to detect that difference at all. Our aim was merely to be able to back up a claim that our program produces passable human-style output, so we did not want to subject that output to full Turing-test-style scrutiny, but you may, if you were kind enough to participate in the experiment, feel slightly cheated. Indeed, in a certain sense you were cheated — that was the whole point. It seems only fair to give you the chance to judge the write-ups again now that you know how they were produced. For each problem I have created a poll, and each poll has seven possible answers. These are:

The computer-generated output is definitely (a).
I think the computer-generated output is (a) but am not certain.
The computer-generated output is definitely (b).
I think the computer-generated output is (b) but am not certain.
The computer-generated output is definitely (c).
I think the computer-generated output is (c) but am not certain.
I have no idea which write-up was computer generated.

I would also be interested in comments about how you came to your judgments. All comments on both experiments and all votes in the polls will be kept private until I decide that it is time to finish the second experiment. A small remark is that I transcribed by hand all the write-ups into a form suitable for WordPress, so the existence of a typo in a write-up is not a trivial proof that it was by a human.

If you did not participate in the first experiment but nevertheless want to try this one, that’s fine. [Update: I have now closed the polls. Very soon Mohan and I will post the results and a discussion of them. Further update: The results now appear below. They appear displayed in possibly a more convenient way in this post, which also contains a discussion of how the program works.]
(more…)

Another test

April 2, 2013

This time I want to test whether I can have polls where the results are not visible until the poll closes. So if you have a few seconds to vote, that would be very helpful. If the facility works, then my next post will include some secret ballots.