Last weekend was pretty interesting down here in Sheffy! My brother and I participated in the local hackathon: HackSheffield (fb event).
Not needing to explain what a hackathon is (previous link to wikipedia opens in a new tab), I’ll describe my point of view. The committee for HackSheffield was truly amazing. The sponsors, Major League Hacking and The University Of Sheffield, were very supportive! The organisation was spot on; everything was scheduled with great attention. But let’s proceed in order.
The event was a 24+ hour gathering of enthusiastic developers and curious non-techies. In our case I was the dev and my brother joined me as an interested non-techie.
The sponsors gave brief presentations to kick things off. A team from SkyBet was there, as well as the lead-management company DataBowl. Chersoft, Ossila, Wandisco and Huel completed the sponsor lineup.
The hack started at midday on Saturday. Most of the teams were already formed, but we found Chris Ingram. Our team focused on one of the datasets provided by the organisers, specifically by Sheffield Uni. It consisted of 64k entries representing human behaviour recorded by sensors. Many of these entries were plain wrong, and some were recorded as “unknown”, which adds little to no information. Our task was to come up with a solution to reduce the noise in the dataset and to visualise the data. As my brother does not have much experience in software development, Chris and I took care of that. Chris built the visualisation side using node.js, react and gulp. I used python, with numpy and matplotlib, to operate on the data.
We set up a GitHub repo (sadly the data is covered by non-disclosure agreements and could not be shared) and a Devpost project.
The Pasta brothers did some research and came up with a combination of rule-based and classification techniques to reduce the noise. We used some simple rules, e.g. collapse an unknown entry if it sits between coherent entries. After a couple of hours researching and writing test code for several unsupervised learning techniques (including k-nearest neighbours and Principal Component Analysis) we decided to use a neural network. We used the features of the dataset as inputs and normalised them (more work needed there) into a numerically stable range. There are a few classes we are trying to capture (e.g. WALKING, IN_VEHICLE, etc.), so we decided to use 6 output neurons (more testing needed here). By splitting the dataset into thirds we could use the first 2/3 for the training phase and the remaining third to test the quality of the classification.
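To give a flavour of the rule-based step and the split, here is a minimal Python sketch. The label names, the min-max normalisation and the helper names are illustrative assumptions, not our actual hackathon code:

```python
import numpy as np

def collapse_unknowns(labels):
    """Rule: collapse an "unknown" entry when the entries on both
    sides of it agree, since the neighbours make the gap coherent."""
    cleaned = list(labels)
    for i in range(1, len(labels) - 1):
        if labels[i] == "unknown" and labels[i - 1] == labels[i + 1] != "unknown":
            cleaned[i] = labels[i - 1]
    return cleaned

def normalise(features):
    """Min-max scale each feature column into the stable range [0, 1]."""
    lo, hi = features.min(axis=0), features.max(axis=0)
    return (features - lo) / np.where(hi > lo, hi - lo, 1.0)

def split_two_thirds(X, y):
    """First 2/3 of the entries for training, the last third for testing."""
    cut = 2 * len(X) // 3
    return X[:cut], y[:cut], X[cut:], y[cut:]

labels = ["WALKING", "unknown", "WALKING", "unknown", "IN_VEHICLE"]
print(collapse_unknowns(labels))
# ['WALKING', 'WALKING', 'WALKING', 'unknown', 'IN_VEHICLE']
```

Note that the last “unknown” survives: its neighbours disagree, so the rule has nothing coherent to collapse it into.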
We did not have enough time to carry out exhaustive testing, as manually marking the correctness of the predictions would have taken too long. We could see from the graphs produced that there was a certain degree of learning, but many instances were misclassified. In my opinion, this is due to the large skew in the class distribution of the data: one class (WALKING) was clearly overrepresented, with more than double the instances of the second most common class.
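That kind of imbalance is easy to quantify. A quick sketch, with made-up label counts purely for illustration:

```python
from collections import Counter

def class_skew(labels):
    """Return the most common class, its count, and its ratio
    to the second most common class."""
    (top, n_top), (_, n_second) = Counter(labels).most_common(2)
    return top, n_top, n_top / n_second

# Made-up distribution mirroring the skew we saw in the dataset
labels = ["WALKING"] * 50 + ["IN_VEHICLE"] * 20 + ["STILL"] * 10
top, n, ratio = class_skew(labels)
print(top, ratio)  # WALKING 2.5 -> over double the runner-up
```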
This technical approach won us one of the sponsors’ prizes. We all got a DBPOWER Hawkeye III Drone. I cannot say how chuffed we were when we received the award and the prize! Really happy times 🙂
The other hacks were truly amazing! The team who won all the remaining awards developed a visualisation tool showing events near the user’s location on a map, pulled directly from Facebook! Another team used governmental data (huge files describing London one square metre at a time) to plot a map of potential solar panel areas. Some people hacked together the Oculus Rift and the Leap Motion. There was some really cool controller work using the Myo (an armband which reads the electrical activity of the forearm muscles) to control a remote robot. The list of hacks goes on: incredible effort by all the teams and hackers, some of whom had not gone to bed the night before and were still on stage presenting their work.
Apart from the prize or the hack itself, what this experience really gifted me with was the inclusivity, the feeling of being part of a wider community of people. The participants had different backgrounds and knowledge areas, but all were striving to create, and to do so in a stimulating and creative environment.
After witnessing the news spread like wildfire about a piece of software, an artificial intelligence that supposedly passed the Turing Test, I have to admit I was a bit excited. After IBM Watson, who knew what else AI researchers would bring us.
I have to be honest about two main things. The first is that I am clearly biased: I work in a Machine Learning lab, my career as a student was focused on these tasks, and as a passionate technology enthusiast I could not be happier if something like this happened for real. The second is that I am also very skeptical: I am a researcher and I cannot trust claims until I have the proof. I still thank my dad for giving me this mindset. When you are a kid at school everyone calls you stubborn, meticulous, even polemical. But when I grew up and had to confront the massive amount of information we deal with every day, including generic news, what was considered a negative trait of my character became the extra gear!
Now, after reading the first articles, I have to admit I was a bit let down. I mean, we all had a go with Cleverbot, and that thing worked quite well if you wanted to creep out a naive friend.
This “Eugene” does nothing more than that. How come, all of a sudden, people realise AI can trick a human interlocutor into thinking they are actually chatting with another person and not with a sophisticated mechanical string composer?
I stumbled upon a nice blog post by Paolo Attivissimo.
I asked for his permission to translate it and he kindly granted it (I am sorry for potential mistakes, but I had to go through this in a bit of a rush, at 2AM).
What you are going to read from now on is his work. Enjoy.
(Here is the original version)
No, no “supercomputer” passed the Turing Test
The news is all over the place. A computer, or, as “Il Sole 24 Ore” titled it,
a “supercomputer”, has supposedly passed the legendary Turing Test, proving it is intelligent.
It is a hoax pushed by a researcher, Kevin Warwick, already known for pompous statements totally lacking any scientific basis.
The test has not been passed at all, even though Warwick altered the rules in his favour.
First of all, a bit of revision on what the Turing Test is.
There is no unique definition, but in 1950 the mathematician Alan Turing wrote a famous article, “Computing Machinery and Intelligence”.
He proposed “The Imitation Game”: an examiner talks freely via a chat (at the time it would have been a teleprinter) with a computer and with a human being.
If the examiner cannot distinguish which of the two is the computer and which the human, then it can be concluded that the computer “thinks”, or at least that it is capable of perfectly simulating human thought and is therefore as intelligent as a human being.
This test has many limits, including age-related ones: at the time, artificial intelligence was an unexplored field.
What happened at the London Royal Society instead, according to the press release from the University of Reading, is that the computer managed to convince only 33% of the examiners that it was a real person.
One out of three. Quote: “Eugene managed to convince 33% of the human judges that it was human.” The remaining 67% did not get fooled. This, where I come from, cannot be called “passing” a test.
Moreover, according to the press release, the Turing Test provides that “if a computer is mistaken for a human more than 30% of the time during a series of five minute keyboard conversations it passes the test”. False: Turing never wrote such a percentage as a pass criterion.
He wrote, instead, that the test is passed if the examiner misjudges with the same frequency both when he has to distinguish between man and woman and when he has to discern between human and computer (“We now ask the question, ‘What will happen when a machine takes the part of A in this game?’ Will the interrogator decide wrongly as often when the game is played like this as he does when the game is played between a man and a woman? These questions replace our original, ‘Can machines think?’”).
The closest claim in the press release is a prediction made by Turing, again in “Computing Machinery and Intelligence”: that by the year 2000 it would be possible to program a computer so that an average examiner would not have more than a 70% chance of identifying it correctly after 5 minutes of questioning (“I believe that in about fifty years’ time it will be possible, to programme computers […] to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning”).
But this is not a description of the criteria to pass the test: it is simply a prediction that in fifty years computer science would evolve to that point. That’s all.
In other words, the test announced in the newspapers does not correspond at all to the original criteria stated by Turing, which set no limits on time, topic or examiner competence. Instead, in this event:
– the interrogators were limited to sessions of 5 minutes (possibly repeated);
– not all the judges (there were probably thirty, or five in all; there is astonishing confusion about even the simplest facts like this one) were experts in evaluating AI software: the names disclosed so far are Robert Llewellyn (actor, known as the robot Kryten in the sitcom “Red Dwarf”), Aaron Sloman (Computer Science professor at the University of Birmingham) and Lord Sharkey (who does not seem to have any AI qualification, even if the BBC defines him as an “expert” in this field);
– the judges were told they had to decide whether they were communicating with a computer simulating a thirteen-year-old kid or with a real thirteen-year-old; but the choice of a teenager is equivalent to cheating, because it prevents the judges from asking complex cultural or ethical questions and therefore greatly simplifies the task of imitating a human being;
– the judges were also told that “Eugene Goostman” does not speak English very well because he is from Ukraine: a trick that provides a believable explanation for the software’s bad grammar and its inability to formulate idiomatic expressions.
As a matter of fact, Time published a transcript of a dialogue with the software that is claimed to have “passed” the Turing Test: it suffices to read it to notice that the program does nothing more than produce prepackaged sentences, deflecting the interviewer’s questions. When the software cannot find anything useful in a question, it answers “I prefer not to talk about this”. Anyone can play like this: this is not intelligence, it’s pure mechanical elaboration. Basically “Eugene Goostman” is a chatbot and nothing more. You can find a version here (if the website is not overloaded).
The Guardian published a picture of an excerpt of a conversation with “Eugene”: decide for yourself about its credibility. Note that twice the software merely takes the interlocutor’s sentence and re-uses it in the form of a question, and twice it answers by changing topic.
EUGENE: Hello, how are you?
Exam: I am fine, you?
EUG: I am pleased you are fine.
Exam: and you?
EUG: I do not want to answer that right now.
Exam: Why not?
EUG: Let's talk about something else.
Exam: Ok what?
EUG: What is on your mind?
Exam: I am wondering if you are a robot.
EUG: Would it be a problem if I was a robot?
Exam: Not really.
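To make Paolo’s point concrete: the echo-and-deflect behaviour in the transcript above takes only a few lines to reproduce. A deliberately dumb sketch (the sentence patterns and canned replies are my own assumptions, not Eugene’s actual code):

```python
import random

DEFLECTIONS = [
    "I do not want to answer that right now.",
    "Let's talk about something else.",
    "What is on your mind?",
]

def eugene_like_reply(user_sentence):
    """Either bounce the user's own words back as a question,
    or deflect with a canned sentence -- no understanding involved."""
    if user_sentence.lower().startswith("i am "):
        topic = user_sentence[5:].rstrip(".?!")
        return f"Would it be a problem if I was {topic}?"
    return random.choice(DEFLECTIONS)

print(eugene_like_reply("I am wondering if you are a robot."))
# Would it be a problem if I was wondering if you are a robot?
```

Pattern matching plus canned deflections: that is the whole trick, which is exactly why “mechanical string composer” is the right description.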
The contradiction between the press release statements and the reality of the facts is particularly clear in this BBC article, where the organisers underline how “crucially, the conversations were unrestricted”, and immediately afterwards state that simulating a thirteen-year-old Ukrainian kid astutely limited the conversation (“It was very clever ruse to pretend to be a 13-year-old Ukranian boy, which would constrain the conversation”). Make up your mind.
It is not over: even the statement “No computer has ever achieved this, until now” is false. Three years ago the Cleverbot chatbot already convinced 59% of its examiners that it was a human. Much more than “Eugene’s” 33%.
Professor Warwick is in himself a guarantee of hoax. Years ago he announced he was the first cyborg because he had implanted a chip in his arm (if so, all dogs and cats with a subcutaneous microchip would be cyborgs).
He also sensationally claimed to be the first human being infected by a computer virus: in reality he just implanted a chip containing a virus into a colleague’s arm. He has said so many things that The Register has a compilation of the sensationalist stupidities Warwick has announced.
A full-scale hoax, shameful for the Royal Society, surfing the wave of the sixtieth anniversary of Turing’s death and causing only confusion in public opinion. There is no AI coming: we will continue to be surrounded by natural stupidity and by the naivety of gullible journalists who write about things they don’t know and publish everything without verifying it. This “test” shows, if anything, that if it takes so little to imitate a thirteen-year-old, then thirteen-year-olds are not thinking beings. I have a few doubts about that for many journalists too.
Translation finished. Hope you enjoyed it.
I have two final things I want to say.
First, I want to thank Paolo for his invaluable work. We live in a time of disinformation and your effort makes a difference!
Second, an appeal to everybody out there: please do not trust everything you read, especially when it comes to technical stuff. Ignorance IS NOT bliss, and this kind of news has only two consequences: selling copies and keeping people ignorant.
Someone once said “Stay hungry, stay foolish”; now let’s not forget the first of the two: stay hungry for knowledge!