I don't feel great about gibberlink. LLMs got AIs to interact the way humans do, and the same goes for multimodal models. Gibberlink could evolve into a highly efficient machine-to-machine channel that leaves humans out of the loop, for better or worse. We (or it) could make it even more efficient by applying AI.
If your concern is humans having some oversight or insight, then text is useful but in many cases not sufficient. The models are quite hard to understand and chaotic: a change that looks trivial or invisible to us can change the outcome completely. This is well demonstrated by adversarial attacks. There is also a lot of potential for steganography in text, where messages can be hidden in plain sight.
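To make the "hidden in plain sight" point concrete, here is a minimal sketch of one well-known text steganography trick: appending invisible zero-width characters to an innocent-looking sentence. The function names `hide` and `reveal` and the choice of U+200B/U+200C as the two symbols are my own assumptions for illustration, not anything taken from gibberlink or a particular tool.

```javascript
// Assumption: two zero-width code points stand in for the bits 0 and 1.
const ZERO = "\u200b"; // zero-width space encodes bit 0
const ONE = "\u200c"; // zero-width non-joiner encodes bit 1

// Append the UTF-8 bytes of `secret` to `cover` as invisible characters.
function hide(cover, secret) {
  const bytes = new TextEncoder().encode(secret);
  let hidden = "";
  for (const byte of bytes) {
    // Emit the most significant bit first.
    for (let bit = 7; bit >= 0; bit--) {
      hidden += (byte >> bit) & 1 ? ONE : ZERO;
    }
  }
  return cover + hidden;
}

// Collect the zero-width characters and turn them back into text.
function reveal(text) {
  const bits = [...text].filter((ch) => ch === ZERO || ch === ONE);
  const bytes = new Uint8Array(Math.floor(bits.length / 8));
  for (let i = 0; i < bytes.length; i++) {
    let value = 0;
    for (let j = 0; j < 8; j++) {
      value = (value << 1) | (bits[i * 8 + j] === ONE ? 1 : 0);
    }
    bytes[i] = value;
  }
  return new TextDecoder().decode(bytes);
}

const stego = hide("Sounds good, see you at noon.", "hi");
console.log(stego); // renders like the cover sentence in most fonts
console.log(reveal(stego));
```

A human reviewer reading the transcript would see only the cover sentence, which is why plain text alone doesn't guarantee oversight. Real schemes can be far subtler (word choice, punctuation, spacing) and don't need special characters at all.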
It's probably not slower than words; spoken English only runs at something like 150-200 words per minute.
That said, the "gibberlink" demo is definitely much slower than even a 28.8k modem (that's kilobit). It sounds cool because we can't understand it and it seems kinda fast, but this is a terribly inefficient way for machines to communicate. It's hard to say how fast they're exchanging data from just listening, but it can't be much more than ~100 bits/sec if I had to guess.
Even in the audible range you could absolutely go hundreds of times faster, but it's much easier to train an LLM that has some audio input capabilities if you keep this low rate and likely very distinct symbols, rather than implementing a proper modem.
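For a rough sense of scale, here's a back-of-the-envelope sketch of the numbers in this comment. Everything in it is an assumption on my part: 6 characters per word (5 letters plus a space), 8 bits per character with no compression, a phone-like channel with 3 kHz of bandwidth and 30 dB signal-to-noise ratio, and the ~100 bits/sec guess for the demo. The capacity estimate uses the standard Shannon-Hartley formula.

```javascript
// Raw bit rate of speech if you simply typed out the words as 8-bit characters.
// charsPerWord and bitsPerChar are assumed values, not measurements.
function speechBitsPerSecond(wordsPerMinute, charsPerWord = 6, bitsPerChar = 8) {
  return (wordsPerMinute * charsPerWord * bitsPerChar) / 60;
}

// Shannon-Hartley capacity: C = B * log2(1 + SNR), with SNR given in dB.
function shannonCapacity(bandwidthHz, snrDb) {
  const snrLinear = Math.pow(10, snrDb / 10);
  return bandwidthHz * Math.log2(1 + snrLinear);
}

const DEMO_GUESS_BPS = 100; // the by-ear guess from the comment above
const MODEM_BPS = 28800; // a 28.8 kilobit modem

console.log("speech at 150 wpm:", speechBitsPerSecond(150), "bits/s");
console.log("speech at 200 wpm:", speechBitsPerSecond(200), "bits/s");
console.log("modem vs demo guess:", MODEM_BPS / DEMO_GUESS_BPS, "x");
console.log(
  "capacity of assumed 3 kHz / 30 dB channel:",
  Math.round(shannonCapacity(3000, 30)),
  "bits/s"
);
```

Under those assumptions, spoken English and the demo land in the same ballpark, while a proper modem on a modest audio channel is a few hundred times faster. Change the assumed bandwidth or noise level and the capacity figure moves accordingly.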
But why use a modem at all? Limiting communication to audio only is a severe restriction. When AIs are going to "call" other AIs, they will use APIs… not ancient phone lines.
I assume the long-winded "shall we switch" dialog was more for effect in the demo, but there's no reason why it couldn't hear "I'm an AI" and just send a quick enquiry data burst without having to continue the conversation in English.
The original plan was to develop what were essentially "audio QR codes": short codes transmitted as sound that could be parsed by certain apps and used to drive different interactions.
The core idea was to work with a commercial TV broadcaster to embed the codes in certain ads, or have them as part of a TV show, so the "listener" would need an active app to handle them.
If it had all gone off well, the eventual plan was to have it be used on a live show where users could also interact. We had some prototypes ready with a native app - but then the Brexit referendum happened - and our company had a couple of clients pull out of upcoming projects - and the company got shuttered.
Turning data into audio is a big thing nowadays with amateur radio.
Ironic that the author overlaps so much with that field, without noticing that they chose the same name as probably the most used amateur radio programmer in the world.
If you're interested, the state of the art is VARA. It's closed source though, so NinoTNC may be a more interesting choice.
I'm struggling to find the protocol for VARA, although maybe my Google abilities are just failing me. The protocol at least should be openly available according to the FCC.
There's an established, Amazon-backed company in India called ToneTag, with close to $100M in funding, whose use case is sound-based data transfer for retail payments and the like. I still don't understand how it works from a consumer standpoint, but I find it fascinating.
const CHARACTER_DURATION = 0.07; // seconds - balanced for accuracy while still fast (up from 0.055s)
const CHARACTER_GAP = 0.03; // seconds - balanced for accuracy while still fast (up from 0.025s)
https://github.com/PennyroyalTea/gibberlink
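Taking the two timing constants quoted above at face value, you can estimate the throughput they imply. This is only a sketch: I'm assuming each "character" carries 8 bits and that there is no framing, preamble, or error-correction overhead, none of which is stated in the snippet. `BITS_PER_CHARACTER` is my own hypothetical name.

```javascript
const CHARACTER_DURATION = 0.07; // seconds, from the snippet above
const CHARACTER_GAP = 0.03; // seconds, from the snippet above
const BITS_PER_CHARACTER = 8; // assumption: one byte per character

// Each character occupies its tone duration plus the silent gap after it.
const secondsPerCharacter = CHARACTER_DURATION + CHARACTER_GAP;
const charactersPerSecond = 1 / secondsPerCharacter;
const bitsPerSecond = charactersPerSecond * BITS_PER_CHARACTER;

console.log("characters/s:", charactersPerSecond.toFixed(1));
console.log("bits/s:", bitsPerSecond.toFixed(1));
```

That works out to roughly 10 characters per second, so on the order of 80 bits/sec under these assumptions, which is in line with the "can't be much more than ~100 bits/sec" guess earlier in the thread.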