SpinVox secures £15m more, but the demo didn't really answer the big questions | TechCrunch


When I walked into SpinVox’s plush Buckinghamshire offices this morning, flanked by the _Register_’s Andrew Orlowski and Ben Smith and Dan Lane from The Really Mobile Project, the tension


in the building was obvious. There were nervously exchanged glances and bad jokes from senior staff. A smartly-dressed James Whatley eyed me reproachfully. But the guys managed to hold it


together for long enough to usher us into a conference room and ply us with pastries. We were not asked to sign an NDA, but we were asked not to record anything that happened in the room.


Ironic, really – and the reason that Ewan MacLeod from _Mobile Industry Review_ declined the invitation. CIO Rob Wheatley took us through a technical explanation that, while honest about the


existence of human agents in the process, didn’t give away as many secrets as he made out (between the four of us, there wasn’t much we didn’t already know), before leaping to what we all


came for: the demo. The big technical question surrounding SpinVox – the one they refuse to answer (as they did again today) – is what proportion of the messages they process are seen by a


human being. It’s the one sticking point that has fascinated journalists and customers alike. But SpinVox are staying quiet: all they’ll say is the proportion varies from country to country


and from carrier to carrier. So what happened in the demo, and what can we infer from it about those proportions? The demo was performed in a standalone test environment, which had only four


processing cores – as opposed to the main system’s 800 or so – and was not connected to the wider network. I saw no evidence that what we saw was a “set-up” or “prioritised” demo and I have


no reason to think it was (you’ll see why in a minute). We began with a short, simple message, read by Rob Wheatley himself and called in from his own BlackBerry. The system spat out a


perfect text version in a few seconds. Next Wheatley left something a little more complex. A few sentences this time. Again, a perfect and speedy result. But then, both messages were


straightforward and they were left in a loud, clear voice at a leisurely pace in a quiet room. You’d have been worried if the system _hadn’t_ got them right. It was then my turn to try. I


left a message, at a brisk speed, that included my full name, the word “TechCrunch” and an invitation for the “recipient” to call me back. I believe that the message was a reasonable and


realistic approximation of a real-world message, albeit with a few strange words in it. The SpinVox system failed to convert the whole message – OK, so most humans can’t spell Yiannopoulos –


and passed it to a human “agent” (who was sitting in the room with us). Here’s where it got ugly. From observing the “tenzing” process in action, it was clear to us that the system had


failed to pick up a single word in the message correctly. The agent in the room had to listen to and _manually type the entire message_, from beginning to end. SpinVox has previously claimed


that agents do not get to hear entire voicemail messages; only enough to give context and enable transcription. That’s not what I saw this morning. SpinVox’s people were quick to point out


that British English is actually SpinVox’s worst-performing language. According to them, the system is much better at US English and Spanish. But if all we have to go on is today’s demo


(given SpinVox’s refusal to give any indication of how many transcriptions involve human agents), then it’s hard to escape this implication: that the vast majority of messages left in


real-world conditions (like beside roads and in cafes) and containing more than “Hi Jim, can you call me back? Cheers, Bye” are processed to some extent by a human being. The aim of the day


had been to show us how the technology works. First of all: it didn’t, beyond transcribing a simple message in a quiet room. But secondly, and more importantly, that’s not actually what


people want to know about any more: since SpinVox refuses to go on the record about the level of human involvement, the media will be left having to continue to speculate about that number,


and no doubt to investigate it as well. The sorts of question we _did_ want to ask included:

* “Do you have a Chief Financial Officer?” They don’t, and – astonishingly – haven’t had one for eight months. This is a company with hundreds of millions in venture backing.
* “Are your investors comfortable with your CEO being paid over half a million pounds a year when the company doesn’t even break even yet?”
* “Does the fact that you just accepted another cash injection of £15 million mean – given the date of your previous round of investment – you’re burning through £3 million a month?”

In fact, that last question _was_ asked by Orlowski during a brief but fiery cameo appearance by CEO Christina Domecq, during which she revealed the new cash injection


(again, I wish we’d not been prevented from recording video). Domecq obviously didn’t appreciate the question, and retorted angrily that SpinVox is spending “much, much less” than that per


month. She added that the company expects to break even this month and become cash-flow positive very shortly after that. What else did we learn today? For one, that SpinVox is taking


security very seriously these days. “As we’ve matured as a business,” said CIO Rob Wheatley, “our relationships have matured. Our QC houses [the third-party processing centres contracted to


process SpinVox’s messages] are very professional environments in areas where it’s seen as a very good job to have. Our agents are very proud of what they do.” Proud, perhaps, but apparently


not proud enough to be trusted with the Web: we were told that agents’ computers have no web browser (in fact they have no software installed besides the tenzing application and an


anti-virus package) or USB ports; agents cannot take cameras or phones into the office; they have to wear ID and uniform at all times and there are background checks into all recruits. OK,


that’s reassuring. After all, they’re listening to our voicemails all the time so you’d hope they couldn’t just email their friends the contents or post them online. But it’s clear: although


SpinVox denies it’s in trouble and says it is poised to break even, it’s still burning masses of cash, hence the latest injection. And it must surely be clear now that the vast majority of


messages are seen by human eyes. So what does that suggest? It suggests that after five years of operation, after processing 130 million voicemails, SpinVox can only handle relatively simple


messages spoken in quiet rooms. And they have not reduced their call centre operation in the last five years as the system got “smarter”. If anything, they’ve scaled up their call centre


operation to deal with the contracts they’ve signed with carriers. If they were a normal call centre business, a cash business, then that would be fine. But this is a company that


effectively claims that at some point, as their voice recognition gets better, the human element will be substantially reduced and the VCs will be rewarded with a business which scales


massively. That was not what was suggested by today’s demonstration, which ultimately calls into question the entire SpinVox model.