Ultrasonic communications for the MEGAphone: Testing speaker and microphone performance

This might sound like a rather strange thing to be working on, but there is a reason. The NLnet Foundation has taken an interest in the MEGAphone as a secure and "sovereign" communications device, that can play a role in civil society. This largely aligns with what I have previously said about the need for such self-sovereign communications systems in the face of the coming digital winter.


In short, NLnet have kindly agreed to fund a body of work on advancing the MEGAphone, and by implication, advancing the MEGA65 project. This means that I will be spending a considerable amount of time, in several pieces between now and early next year, working on the MEGAphone. It also means I am going to be spending more time working on the MEGA65 as a whole than I otherwise would have, so it's a positive all round. But I understand that some folks are only interested in the MEGA65 as a retro-computing platform. In which case, feel free to pay less attention to these posts. That said, they will still be very much focused on fun retro development, and solving very much real world problems with such a system. After all, it's not every day that someone makes an 8-bit smart-phone that can communicate via ultra-sound!


If borders open, and international travel becomes feasible again, this will be done in Darmstadt with the rest of the MEGA65 team, but until then, I'll continue to work from here, since there isn't really any alternative.


Speaking of the effects of COVID19, this has also slightly tweaked the work plan, as NLnet are keen for us to look at how the idea of a sovereign and fully open phone can be used to produce a more privacy-protecting and less error-prone form of contact tracing. In particular, they are interested to know how practical implementing near ultra-sound communications would be, to see if it makes sense as an alternative to Bluetooth-based proximity detection. Ultrasound has some nice theoretical advantages, like not working through walls or other barriers that are likely to also be effective barriers against virus transmission. Thus it has the potential to reduce the false-positive rate.


Exploring ultra-sonic communications is something that I have been wanting to do for a while, and was already on the plans for the MEGAphone as a means of resilient communications. Those who remember when I first talked about the MEGAphone might recall that it already has a bunch of weird communications modes available, including an IR LED that can probably turn TVs off from 200m away. It also has not just one microphone, but an array of 4.


The microphone array was intended for making it easier to cancel background noise, and detect the direction a sound is coming from, so that speaker-phone mode could do various party tricks with it. However, having read about ultrasonic communications around that time, I chose microphones that are sensitive well into the ultrasound range. So, it's quite possible that we may be able to achieve ultrasonic communications with the MEGAphone. And that is the first milestone in the NLnet project:


1. Assess near ultrasound capability of existing MEGAphone prototypes.

There already exist several revision 1 hardware prototypes of the MEGAphone, that form the basis for forward activity on the project. These include MEMS microphones and amplified speaker output functionalities that have the potential for near ultrasound communications. The purpose of this sub-task is to assess the feasibility of this, and gain an understanding of what may be possible. It is entirely possible that this will prove infeasible with the current hardware, in which case the reasons and potential remedies for this will be considered for inclusion in a future revision of the MEGAphone hardware.


Milestone(s)

Examine the existing components of the MEGAphone r1 and r1b prototype hardware, in particular the MEMS microphones and speakers, to determine their theoretical suitability for near ultra-sound communications.

Determine the ultrasonic frequencies and bandwidth that are likely to be possible to use, and consider the constraints that this is likely to place on any protocol designed to use this facility.

So my first goal here is to look at the speakers and microphones in the MEGAphone, and see what their theoretical properties are, and whether they have the potential to be used for ultrasonic communications, and if so, what frequency bands we expect to be able to use. Of course, we could also consider dedicated ultra-sound transducers, but the goal here is to use existing speakers and microphones, rather than increasing the bill of materials of a phone. So let's get started:


Microphones

The MEGAphone R1 prototypes have four MEMS microphones. As physically very small structures, they have a natural resonance that is well into the ultrasonic frequencies. The ones that we are using are the SPW0690LM4H. According to Table 2 of that datasheet, they have a Resonant Frequency Peak at 26KHz. Also, Table 2 tells us that at 15KHz they are 3dB more sensitive than at 1KHz. Thus we can expect that they will likely be competent up to probably around 40KHz or so. Figure 9 provides some more information, showing the frequency response:



Here we see a peak at 26kHz as promised. We also see that sensitivity is quite reasonable all the way to the 80kHz limit they show, and in fact from ~72kHz the sensitivity is above the sensitivity of the acoustic range. So while somewhere around 26kHz would be ideal, with a benefit of close to +20dB versus the acoustic band, any ultrasonic frequency up to at least 80kHz should be usable. So the receive side is not likely to be a problem.


Speaker

The speaker in the MEGAphone is a CMS-40504N-L152. As this is not on the PCB, we can in principle easily swap it out for another. However, hopefully we won't need to do that, because these speakers are simply fantastic for the MEGAphone. They are super-loud and take up to 2W for loud ringing and playing games, and have good frequency response thanks to their relatively large 40mm diameter. And for all that, they are only ~5mm thick.


Here we have a less positive prospect: It claims a maximum frequency of 7kHz. This will presumably be the maximum frequency with reasonably flat response, above which it will presumably roll off. Digging through the datasheet we find:



Well, this is much better than feared. Yes, above 7kHz, it is 10dB below the lower frequencies. But it isn't a flat roll-off. Rather there are a couple of interesting peaks, the first at ~9kHz, and then a second at what looks to be ~18kHz, after which there is a similar big drop-off. Unfortunately, they don't show any higher frequencies on the plot. This is a bit unfortunate, as 18kHz is still audible for some younger people (I can still just hear 17.4kHz, and maybe a bit higher, and I am in my 40s).


But I am just about willing to bet that there is another peak at ~3x9kHz = ~27kHz, which would nicely coincide with our microphone's peak sensitivity. If possible, that would be a particularly effective combination. The question is whether the speaker is still loud enough, and whether we would start to get audible distortion. This will require some experimentation.


But to summarise: 18kHz should be fine, and if we can confirm it, ~27kHz has potential. 27kHz also has the advantage that it is safer at high volume levels than frequencies below 20kHz. There is still the residual problem that some animals are quite sensitive even around 30kHz, cats in particular (although it looks like tuna fish won't be bothered in the least).


Either way, the frequency response around these frequencies is a bit sharp, so the bandwidth available is likely to be quite narrow -- perhaps less than 1kHz. This will affect what wave-forms we might try to use for a communications protocol. For example, frequency shift keying might be problematic, because we would need to allow for variation of the peak resonance frequency among the speakers, a bit of Doppler shift, etc. Similarly, chirped spread spectrum would be problematic, due to the very narrow frequency band available for the chirps.


Although for contact tracing, we can probably assume that if you are moving past someone fast enough to cause a large Doppler shift, you are probably unlikely to catch anything from them -- unless perhaps you are on a fast merry-go-round near a stationary person for a long time. Thus we can probably ignore the problem of large Doppler shifts due to high velocities.


Would other speakers be any better?

There are certainly other speakers that have higher rated frequencies, and thus might be capable of higher frequencies. We'll examine that a bit later, if it turns out to be necessary. This is because we need only a few metres range, and if we can achieve this with the existing speaker that is optimised for our other needs, then there is no point changing it. To determine that, we need to consider our total link budget, similar to if we were using radio.


Link-Budget Estimate

Let's now think about what the link budget at 18kHz and 27kHz would be. We will start by examining the microphone sensitivity and maximum speaker volume, and then subtract the free-space path loss as the ultrasound disperses through the air.


According to the speaker data-sheet, we should be able to produce a signal of ~94dB SPL (sound pressure level).


The free-space path loss through air is relatively minor for the frequencies and distances that we are concerned about. According to this calculator, the path loss will be <1dB per metre. However, the total path loss due to the inverse square law etc, is more like 20dB over 1m, ~34dB over 5m and ~41dB at 10m.
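As a rough sketch of where figures like these come from, the following combines inverse-square spreading with a small per-metre absorption term. The 0.1m reference distance and the 0.1dB per metre absorption are assumptions chosen here because they roughly reproduce the numbers above; they are not taken from the calculator or from any data-sheet.

```python
import math

def path_loss_db(distance_m, ref_m=0.1, absorption_db_per_m=0.1):
    """Estimate total path loss in dB at distance_m.

    ref_m and absorption_db_per_m are illustrative assumptions,
    not measured or data-sheet values.
    """
    # Inverse square law: 20*log10 of the distance ratio.
    spreading = 20.0 * math.log10(distance_m / ref_m)
    # Atmospheric absorption, modelled as linear in distance.
    absorption = absorption_db_per_m * distance_m
    return spreading + absorption

for d in (1.0, 5.0, 10.0):
    print(f"{d:4.1f} m: {path_loss_db(d):.1f} dB")
```

Changing the reference distance shifts every figure by a constant, so the absolute numbers should be treated with caution; the differences between distances are more trustworthy.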


Finally, we have the sensitivity of the MEMS microphone to consider. Here I am having to stretch my understanding of the data-sheet -- so if anyone reading this spots any errors in my reasoning, please let me know, so that I can fix it.


The MEMS microphones indicate a signal-to-noise ratio of 82dB for near-ultrasound frequencies (around 20kHz) with a 94dB SPL (sound pressure level) sound source. We know that this should improve as we approach the resonance at 26kHz, so we will assume that we can detect down to 94 - 82 = 12dB.


Pulling this together, at a range of 10 metres, we should have a link budget of 94dB SPL - 41dB - 12dB = 41dB. If we allow 20 or 30 dB for multi-path interference and all the usual horrors of propagation, we should have a signal that is 10 dB to 20 dB above the noise floor, i.e., 10x to 100x the background noise power. In reality, this might be somewhat worse due to interference from ambient noise sources, and muffling effects, such as having the device in your pocket or bag. That said, if we are aiming for 1.5m instead of 10m, there is better than 10dB margin to be gained. Also, all of this assumes an omni-directional sound source, which will not be true -- but it is a starting point.
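The arithmetic above can be written out as a small sketch. The function names are made up for illustration, and the inputs are simply the figures quoted in the text.

```python
def link_margin_db(tx_spl_db, path_loss_db, mic_floor_db, fade_allowance_db=0.0):
    """Remaining margin after subtracting losses from the transmit level."""
    return tx_spl_db - path_loss_db - mic_floor_db - fade_allowance_db

def db_to_power_ratio(db):
    """Convert a dB figure to a linear power ratio."""
    return 10.0 ** (db / 10.0)

# Figures from the text: 94dB SPL source, 82dB SNR at 94dB SPL, 41dB loss at 10m.
mic_floor = 94 - 82
raw_margin = link_margin_db(94, 41, mic_floor)
worst_case = link_margin_db(94, 41, mic_floor, fade_allowance_db=30)
best_case = link_margin_db(94, 41, mic_floor, fade_allowance_db=20)

print(raw_margin, worst_case, best_case)
print(db_to_power_ratio(10), db_to_power_ratio(20))
```

The worst and best cases come out at 11dB and 21dB, which is where the rounded 10dB to 20dB range comes from.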


However, overall, it seems like we should have a positive link budget at the end, with an SNR of somewhere around 10dB to 20dB. So for now, we don't need to immediately try a different speaker.


Constraints on Protocol Design

So it sounds like we should have a signal of a few hundred Hz bandwidth, and with a link budget of 10 -- 20 dB. Within this constraint, there is still considerable freedom for selecting a solution. Optimising that would require considerable effort and expertise beyond what we possess, or what is in this case actually required: This is because contact tracing requires only very modest data transfer rates. Each beacon can potentially be only a dozen or so bytes long, and need only be sent every minute or so. Given most guidelines are for 15 minutes of close proximity, this would allow significant redundancy in the beaconing, to help reduce the probability of false negatives.


If we allow for 15% channel efficiency for a simple ALOHA approach, and up to 100 people within range of each other at a time, this means that each beacon must consume less than 0.15% of each time step. For a 1 minute beaconing interval, this corresponds to an air-time of 90 milliseconds. Assuming we need a 32-bit synchronisation preamble and a 64-bit random token, this would require a data rate of ~96 bits / 90 milliseconds = ~1kbit / second. This seems totally achievable within the expected channel characteristics.
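The air-time budget above is easy to re-derive, which also makes it easy to try other assumptions about crowd size or beacon interval. This is only a sketch of the arithmetic in the text; the function names are illustrative.

```python
def beacon_airtime_s(channel_efficiency, nodes, interval_s):
    """Air-time each node may use per beacon interval."""
    return channel_efficiency / nodes * interval_s

def required_bitrate(bits, airtime_s):
    """Data rate needed to fit the beacon into its air-time."""
    return bits / airtime_s

# 15% ALOHA efficiency, 100 nodes in range, one beacon per minute.
airtime = beacon_airtime_s(0.15, 100, 60.0)
# 32 bits of preamble plus a 64-bit token.
rate = required_bitrate(32 + 64, airtime)

print(f"air-time: {airtime * 1000:.0f} ms, bit rate: {rate:.0f} bit/s")
```

Halving the number of nodes in range, or doubling the beacon interval, halves the required data rate.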


If the token consisted of a 48-bit unique token and 16-bit CRC, this would provide for robustness in the protocol, while still maintaining a very low false-positive rate. We can use the Birthday Paradox to compute the beacon collision rate with 2^48 tokens and not more than 2^32 simultaneous users of the system: The probability of any two users using the same beacon at the same time would be ~1/(2^(16/2)) = ~1/256. Thus we would expect of the order of 10^1 colliding beacons per day, globally. Assuming that less than 1% of users of the system test positive for the virus on any given day, this would result in ~10^(1-2) = ~1/10 false positive situations per day, i.e., of the order of one person per week being erroneously told that they had been in contact with someone with the virus. This is almost certainly below the noise floor of the viral testing procedures and various other factors. Thus we accept this false positive rate.


As can be read below, a channel bandwidth of around 1KHz seems quite possible -- provided that the other technical barriers can be resolved. Thus while careful protocol design and implementation would be necessary, the channel bandwidth would not impose any particularly troublesome constraints on the protocol design.


In fact, the very short range of communications that would be likely realised would result in rather more relaxed constraints than described above. This could allow for a reduction in data rate and/or longer packets containing more information, which could be used to effectively eliminate the remaining false-positive rate.


Conclusion and Recommendation

For the full detail on how the following conclusions were reached, read the "Appendix: Experimental Verification" section below, but the TL;DR version:


1. It is possible to perform bi-directional near-ultrasonic communications using the existing hardware of the MEGAphone prototypes.
2. There is likely sufficient bandwidth to implement a realistic contact tracing facility.
HOWEVER
3. My suspicions about the real-world practicality of this led me to go beyond the scope of the milestone and actually test the components, which identified several important factors, including:

4. The audible artefacts, limited communications range, and by implication, the power budget, of near-ultrasonic communications in this context are rather problematic.

5. It would seem that for all of its problems, Bluetooth is a better solution, due to its vastly superior performance in terms of range, lack of audible artefacts and existing wide-spread deployment.
THEREFORE
6. I do not recommend further pursuit of creating a near-ultrasonic contact tracing system using mobile phone hardware at this point in time. This conclusion does not impact on the creation of bespoke devices that avoid these problems through careful design.
IN ADDITION
7. While not well suited to a contact tracing use-case, we have established near ultra-sound as a quite feasible means of digital communications using off-the-shelf mobile phone parts. This may be of assistance to people operating in areas where electromagnetic communications are denied or heavily surveilled.


Appendix: Experimental Verification

It's all well and good to say that the above should work. But as we know, saying "should" in computer science usually means "won't despite all expectation". Or as my German friends like to remind me "eigentlich ist die stärkste Verneinung", i.e., "Actually" or "In Reality" is the strongest contradiction. Thus we want to make sure that we are not barking up the wrong tree, and want to at least observe and characterise the MEGAphone's actual near ultra-sound performance.


My approach here is to produce an ultra-sound tone using the speaker, and then pick that tone up using the microphone array. This will be done on the MEGAphone using some software I have written for this purpose.


The first step is to make sure that the MEGAphone microphones really are sensitive to ultra-sound. For this, I used a tone generator app on my boring Android phone to make sure that I can pick it up. The MEGAphone R1 prototype has one dead MEMS microphone, so I had to fiddle to pick one that was working, and that was conveniently located where I could get to it. This let me test that I could receive a 20kHz tone. I didn't measure the signal to noise ratio (SNR), because I don't know the actual SPL loudness of the phone's speaker at that frequency, or what funny filtering the Android phone has. I did discover that my daughter can hear up to 19kHz quite fine, when she started to complain while I was testing ;)


So now the next step is to modify the MEGA65 test programme I wrote so that it can also produce the tone. This will be fairly easy, as the MEGA65's audio output is continuously integrated using a pulse density encoder (PDR), so sample rates up in the ultrasound range shouldn't be a problem.


Also, running the CPU at 40MHz gives us plenty of horse power to do this -- or at least I hope so. I will run a non-maskable interrupt at some multiple of the target frequency, driving a small sine curve table of values at maximum volume.


The interrupt loop will need to be quite tight, but it should be okay. Something like:


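sampleplayernmi:

   PHA             ; save A and Y, as the NMI can arrive at any time
   PHY
   LDY $FD         ; current index into the sine table
   LDA ($E0),Y     ; fetch the sample via the table pointer at $E0/$E1
   STA AUDIOOUT    ; AUDIOOUT is a placeholder for the audio sample register
   INY
   TYA
   AND #$0F        ; wrap around after 16 table entries
   STA $FD
   PLY
   PLA
   RTI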

That will take something like ~35 cycles, including the interrupt entry and exit, plus a few cycles of jitter while the CPU finishes whatever instruction it was executing when the interrupt was triggered. So close to 40 cycles = 1uSec. Thus our over-sampled frequency can be up to 1MHz. If we have 16 entries in our sine table, then this allows up to 1MHz / 16 = ~62.5kHz as our maximum effective frequency. That should be sufficient, since we only need about 1/2 that. Thus about 1/2 the CPU time will be spent on the interrupt, leaving the other half to read the microphone samples and visualise them.
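The cycle budget can be sanity-checked with a few lines. This is a sketch only: the 40 cycles per interrupt is the estimate from above, not a measured figure, and the function names are illustrative.

```python
def max_tone_hz(cpu_hz, cycles_per_sample, table_entries):
    """Highest tone if the CPU did nothing but service the sample interrupt."""
    sample_rate = cpu_hz / cycles_per_sample
    return sample_rate / table_entries

def cpu_load(cpu_hz, cycles_per_sample, tone_hz, table_entries):
    """Fraction of CPU time spent in the interrupt for a given tone."""
    return tone_hz * table_entries * cycles_per_sample / cpu_hz

ceiling = max_tone_hz(40_000_000, 40, 16)
load_at_30k = cpu_load(40_000_000, 40, 30_000, 16)

print(f"ceiling: {ceiling / 1000:.1f} kHz, load at 30 kHz: {load_at_30k:.0%}")
```

A larger sine table gives a cleaner tone but lowers the ceiling in direct proportion, so 16 entries is about as large as this approach allows for a 30kHz target.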


Ideally we would synchronise the sample reading with the writing, as this would give us a display that is always synchronised with the tone we are transmitting. So I might rework the above to read and stash the microphone samples, as well. That will eat more CPU time, though, so I'll probably have to optimise it carefully. It might be, that instead of using an NMI to drive this, that I just have a tight loop that does both, and interleaves this with doing the visualisation. The goal here is not something that is perfect, but rather, something that demonstrates that we can produce and receive a signal.


As I was thinking about it overnight, it occurred to me that I might just be able to use the SID chip implementations in the MEGA65 to produce a tone at the target frequency. This would avoid all the CPU timing complications. The catch is that the SID doesn't do pure sine waves, but only triangle, saw-tooth and square wave waveforms.


The problem here is that this means that it will introduce harmonics. I don't know if the harmonics will only be at higher frequencies -- and thus not audible -- or whether there will also be audible harmonics on the lower end. I'm not a signal processing expert, but my intuition and little bit of radio experience tells me that we can expect at least heterodyning from reflections. That is, the signal will mix with its reflections, and this will produce the sum and difference of each frequency present in the signal. If any of those products are loud enough, they could produce an audible artefact. The good news is that this is, by definition, testable. So I can live with that.


So, coming back to producing tones with the SID chip, I need to find out what the maximum frequency that the SID can produce is. Fortunately I have done a bit of mucking about inside the VHDL SID implementation we are using, so I know, for example, that it internally uses a clock in the 10s of MHz to generate everything, so tens of KHz shouldn't be a problem. Also, the SID chip's frequency generation formulae are well known, so we should just be able to work out the correct register settings, and again, just try to produce something.


Let's start with working out the register settings... and here we hit a snag: Although the internals of the SID chip are capable of much higher frequencies, the registers for the frequency generators can't go above about 4KHz, because the 16-bit frequency values don't offer enough dynamic range.
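To put a number on that limit, here is a sketch using the classic SID relationship between the 16-bit frequency register and the output frequency, with a 24-bit oscillator accumulator. The PAL C64 clock of 985,248Hz is an assumption made for illustration; the MEGA65's VHDL SID may be clocked differently, although the ~4KHz ceiling described above suggests it behaves much the same.

```python
SID_ACCUMULATOR_BITS = 24
PAL_CLOCK_HZ = 985_248  # assumed C64 PAL clock, not confirmed for the MEGA65

def sid_freq_hz(reg_value, clock_hz=PAL_CLOCK_HZ):
    """Output frequency for a given 16-bit SID frequency register value."""
    return reg_value * clock_hz / (1 << SID_ACCUMULATOR_BITS)

def sid_reg_for(freq_hz, clock_hz=PAL_CLOCK_HZ):
    """Register value that would be needed for a target frequency."""
    return round(freq_hz * (1 << SID_ACCUMULATOR_BITS) / clock_hz)

print(f"max SID tone: {sid_freq_hz(0xFFFF):.0f} Hz")
print(f"register needed for 18 kHz: {sid_reg_for(18_000)} (limit is {0xFFFF})")
```

The register value needed for 18kHz is several times larger than a 16-bit register can hold, which is why the SID route is a dead end without changing the register layout.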


Back to the drawing board then. The 16-bit digital sample registers on the MEGA65 will be much more effective, and allow us to generate close to a sine-wave, as previously discussed. The only problem is feeding them fast enough. It might be possible to make a really tight CPU loop to do this, but it will be a bit of a pain for playing the tone while also processing the incoming signal.


What I am thinking for a solution to this is to add Amiga-style intelligent audio DMA logic, so that I can point the machine to a sample table, and have it play the sine curve over and over at arbitrary sample frequencies. It will also be handy for the MEGA65 retro-computer in any case. For this, we will need the following information for each channel:


1. Base address of the sample data (28-bit address)

2. Length of the sample (16-bit number of samples)

3. Sample frequency

4. Volume level (maybe)

5. Flag to select 4, 8 or 16-bit samples (maybe)


Those last two are maybes, because they aren't strictly required, but will give more flexible output and make more effective use of memory, respectively.


The next question is where to locate them in the system. Unlike the Amiga, the MEGA65 has the CPU always being the bus-master. This means that it will end up in the CPU one way or another. There is a DMA controller in the CPU already, and that could be hacked to provide the means of setting things up.


It also actually reminds me of another big problem for the MEGA65 at least: DMA jobs on the MEGA65 are not interruptable. This means that if I implement this sample playback method, the sound will pause whenever a DMA job is running. With a bit of clever pre-fetch and buffering, I can probably hide this problem for all but the longest running of DMA jobs. Given a CPU frequency of ~40MHz, and the maximum 64KB DMA copy requiring ~130Kcycles (or ~260Kcycles for swaps, when they are implemented), that corresponds to a pause of about 3 milliseconds for a copy, or about 7 milliseconds for a swap. If the programmer were disciplined, and broke the DMA jobs down into 1KB pieces, then we can get that figure down by close to two orders of magnitude, to ~0.1 milliseconds, which corresponds to a sample frequency >8KHz. That sounds like a reasonable situation for now, and I can always make the DMA jobs interruptable by the audio data fetch sometime down the track.


Anyway, all of the above means that a buffer of even just one sample per channel should allow decent audio, even with DMAs running. If I make the buffer even just 4 or 8 samples deep, then much higher frequency audio should be possible -- allowing us to produce ultra-sound while running any little DMA jobs that we might need in the test programmes I will need to write.
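The trade-off between DMA job size, buffer depth and sample rate can be sketched as follows. The 2 cycles per byte for a copy (4 for a swap) is inferred from the ~130Kcycle figure for 64KB, and the model simply assumes the buffer must cover the whole stall; both are simplifying assumptions rather than a description of the real hardware.

```python
def dma_stall_s(nbytes, cycles_per_byte=2, cpu_hz=40_000_000):
    """Time the bus is held by one uninterruptable DMA job."""
    return nbytes * cycles_per_byte / cpu_hz

def max_sample_rate_hz(stall_s, buffer_depth=1):
    """Highest sample rate whose buffer does not run dry during the stall."""
    return buffer_depth / stall_s

full_copy = dma_stall_s(64 * 1024)
small_swap = dma_stall_s(1024, cycles_per_byte=4)

print(f"64KB copy stalls for {full_copy * 1000:.2f} ms")
print(f"1KB swap stalls for {small_swap * 1e6:.0f} us")
for depth in (1, 4, 8):
    print(depth, f"{max_sample_rate_hz(small_swap, depth):.0f} Hz")
```

Under these assumptions, a 32-entry sine table at ultrasonic frequencies needs a sample rate far above what an 8-sample buffer can ride out across a 1KB swap, so the test programmes would still need to keep foreground DMA jobs very small or avoid them.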


Since I've just talked myself into using the DMA controller, I might just make the control for this via unused extended DMA job options, rather than having more memory mapped registers. It also means that setting up a particular sample will be easier in practice, because you will just be able to trigger the relevant audio DMA job, rather than having to stuff a pile of registers. I'm liking this plan even more... except it will be a pain for freezing. So I'll have to find some spare memory-mapped register space, after all. Oh well, it sounded like a great idea.


Time to get cracking: First I need to set up the data structures, and create the memory mapped registers to access them. Those are $D720-$D75F, with $10 registers per channel. To that I have added all the behind-the-scenes plumbing that does the sample fetching, volume control and mixing. The whole setup is not overly complicated, but simulation is still the best way to ensure correct behaviour. And that's where the problem has arisen...


When I simulate it, GHDL is not incrementing various counters in this new stuff correctly. This seems to be because it thinks that lines are being cross-driven. The trouble is, all the inputs to the calculations don't seem to have any undefined or other invalid values. This led to a whole rabbit-hole of investigation that took several full days to explore, trying to get backtraces working in GHDL, so that I can see where the meta-values are being produced, as well as dealing with GHDL crashing during compiling the code for synthesis, among others.


After various adventures, I got GHDL built using the GCC back-end with at least rudimentary back-trace support. So now I am working through the process of finding and eliminating the meta-value problems, so that I can hopefully get to the one that is causing my simulation problems. Simultaneously, I have been making small incremental changes to the VHDL and synthesising, to progressively inch towards making the audio DMA stuff actually work. The whole idea of simulation was to avoid this slow process, but here we are with both running neck and neck, as to which will yield results first.


One thing I realised I needed to add, is a mechanism to prevent the audio DMA from hogging the bus. To do this, I have implemented a hold-off timer, that prevents any two audio DMA cycles occurring with less than 8 cycles between them. Combined with the fact that the DMA cycles cannot interrupt a CPU instruction, this should allow for reasonable processor performance, even when the DMA rate is set reasonably high. Also, it means that we should be able to get around 2 million audio DMA operations per second. Since we need only one audio DMA channel for the ultrasound, this should be plenty.


Well, that all took longer than intended, and I'm not yet sure that it is all 100% correct. There are some niggling CPU timing problems, that mean that the audio DMA only works reliably when the CPU is set to 40MHz, and not in the Hypervisor. More precisely, it probably will get upset if the CPU is running in anything that is not the main memory. Anyway, for now, those are reasonable limitations that I can work within.


The audio DMA system works by adding a 24-bit fixed-point fractional increment to the sample counter. When it reaches 1, then it's time for a new sample. This means that the sample rate will be CPU CLOCK SPEED * FRACTION. If, for simplicity, we treat the fraction as a simple 24-bit integer, and substitute in the CPU speed of 40.5MHz, we get:


SAMPLE RATE = 40,500,000 * (SPEED / 2^24)


And thus by rearranging, we can find the SPEED value required for any particular sample rate:


SPEED / (2^24) = SAMPLE RATE / 40,500,000


SPEED = SAMPLE RATE * (2^24) / 40,500,000


SPEED = SAMPLE RATE * 0.414252


This means that we can achieve sample rates all the way up to the CPU clock speed, and down to about 2.4Hz. In practice, it is limited to around 1 - 2MHz due to the bus saturation limit, and also because the audio cross-bar mixer effectively places an upper-limit on the sample rate that will come out the speaker. If the audio cross-bar is limiting the frequency too much, we can make a bypass for that, so that we can increase our upper frequency limit.
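The formula above translates directly into a pair of helper functions. This is a sketch of the arithmetic only; the function names are illustrative, and how the resulting 24-bit value is split across the channel registers is not shown here.

```python
CPU_HZ = 40_500_000
FRACTION_BITS = 24

def speed_for_rate(sample_rate_hz):
    """24-bit SPEED value giving the closest achievable sample rate."""
    return round(sample_rate_hz * (1 << FRACTION_BITS) / CPU_HZ)

def rate_for_speed(speed):
    """Sample rate actually produced by a given SPEED value."""
    return CPU_HZ * speed / (1 << FRACTION_BITS)

speed = speed_for_rate(44_100)
print(speed, f"{rate_for_speed(speed):.1f} Hz")
print(f"lowest rate: {rate_for_speed(1):.2f} Hz")
```

Because SPEED is an integer, achievable sample rates come in steps of roughly 2.4Hz, so the rate actually produced can differ from the requested one by a little over 1Hz.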


For a little test, I have made a 16-entry 8-bit Sine table, so that I can produce a pure tone for calibrating frequency and testing the volume at varying ultrasonic frequencies. It sounds generally ok, but even I can hear that it isn't a pure tone. So I might increase the sample count and go to 16-bit samples, so that it sounds better.


Well, that didn't work. I even tried it on the MEGAphone prototype, in case it was the audio output circuitry. However, fiddling around, I did discover something important: The distortion changes based on the code the CPU is running. I even made a little loop that confirmed that the opcode of an instruction is ending up in the audio stream, by changing the bytes I am playing in the sample loop, and finding the silent point occurred when the bytes all matched the opcode of the loop. More investigation reveals that it isn't just opcode bytes that can show up, but seemingly any CPU memory access. Also, I was seeing that just occasionally the CPU would pick up a byte from the audio DMA data, and jump off into lala land as a result.


This is rather annoying and a bit worrying, as it means that there is a bigger problem with bus timing than I had expected. I already knew that I had to be careful with not allowing the audio DMAs in hypervisor mode, and only at 40MHz because of funny business. And it now seems that these problems are much more significant than I had hoped. But because they don't show up in simulation, they are a bit of a pain to track down.


What I might do here is a bit of a pragmatic solution, and make the bus wait an extra cycle when starting an audio DMA so that the value we want REALLY shows up, and then allow the bus to settle for a cycle back on whatever the CPU was asking for before the DMA. Another slightly more elegant solution would be to use the dead read wait states in the CPU. But that is rather more complex to implement.


Taking a look at the synthesis logs, it looks like the audio DMA stuff has pushed the tolerance for timing closure on the memory controller in the CPU out the window -- which would explain just about everything. The trick is how to simplify things back down, so that the logic becomes shallow enough again to get closure.


It looks like it might be easier to use the read wait-state cycles after all. Those were all reading from address $000002, so they were easy to find. I've now gone through and refactored all the audio DMA scheduling into a more generic "background DMA" framework, where the sample LSB and MSB reads for the four channels are considered 8 separate DMA targets. It also means we can add other interesting background DMA actions in the future.


For not the first time in this adventure, I have had something that ALMOST works, but not quite. Sometimes it would simulate, but not synthesise due to multiple drivers, or there would be funny corner-cases where it wouldn't correctly realise when the shadow RAM was reading the correct location for the background DMA activity. The whole interplay for reading the shadow RAM with effectively zero waitstates is just a pain, but we have to work with it for now.


I think I now have it so that whenever the shadow RAM is not being used, that the CPU does a background DMA read, and correctly latches the data into the audio registers. While I was doing all that, I also overhauled the audio mixer to use signed samples the whole way through, so that mixing the audio can be done more easily, and without introducing DC biases like unsigned samples do.


It now runs under simulation, with the background DMA reads happening when the CPU reads from IO, or has a wait-state: Basically it makes the background DMA activity the default, and it is only if a different address is presented to the shadow RAM bus that it does something else. So the moment of proof comes now, while I wait for it to synthesise again...


And again. And a few times after that. But I now finally have a nicely working DMA audio engine. In fact, it's now good enough that it can play Amiga MOD files with a crude little tracker that I wrote to test it. Which actually works quite nicely. It can already play a reasonable variety of MOD files, but doesn't yet support most of the MOD effects -- only tempo and instrument volume.


But it does already support repeating samples, too, since I need that for the ultrasound testing, because it would be a pain to have to keep feeding sample data in. Instead, I can just play a 32-byte sine wave loop indefinitely. The result is this:


From there, I have been working on a little test programme that just plays a continuous sine wave tone, with variable frequency. The Audio DMA can, in theory, play a sample every tick on the 40.5MHz CPU clock. However, in practice it is limited to about 1/16th of that, i.e., about 2.53MHz. Still not bad. Our sine wave sample is 32 bytes long, so that means we have a maximum of around 80KHz. This is comfortably well above our target range of 20 - 30KHz -- and remember that this is not the frequency at which it can play a horrible square wave, it is the frequency at which it can play a pretty nice sine wave -- and all this without the CPU needing to do anything once we set it going. So I'm pretty happy that we have the audio sub-system in place that will let us produce ultrasonic frequencies.
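
The arithmetic above can be checked with a short sketch. The clock, the 1/16th divisor and the 32-byte loop length come from the text. The signed 8-bit sample format and the helper names are assumptions made for illustration only.

```python
import math

CPU_CLOCK_HZ = 40_500_000  # 40.5MHz CPU clock
DMA_DIVISOR = 16           # practical limit: about 1/16th of the CPU clock
LOOP_LENGTH = 32           # bytes in the looped sine wave sample

def sine_loop(length=LOOP_LENGTH, amplitude=127):
    """One full sine cycle as signed 8-bit values (assumed sample format)."""
    return [round(amplitude * math.sin(2 * math.pi * i / length))
            for i in range(length)]

def sample_rate_for_tone(tone_hz, length=LOOP_LENGTH):
    """Sample rate needed so one pass through the loop lasts one tone period."""
    return tone_hz * length

max_sample_rate_hz = CPU_CLOCK_HZ / DMA_DIVISOR
max_tone_hz = max_sample_rate_hz / LOOP_LENGTH

loop = sine_loop()
print(f"max sample rate: {max_sample_rate_hz / 1e6:.2f} MHz")
print(f"max sine tone:   {max_tone_hz / 1e3:.1f} KHz")
print(f"rate for 20KHz:  {sample_rate_for_tone(20_000) / 1e3:.0f} K samples/s")
```

A 20 - 30KHz tone therefore needs only 640,000 to 960,000 samples per second, well inside the roughly 2.53MHz ceiling.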


Now, back to that test programme: It plays the 32-byte sine wave in an infinite loop, and lets us vary the frequency and volume. It also shows a nice little oscilloscope display of what it thinks it is playing:



Here we can see it set to ~40KHz, above what we need, and the 40MHz 8-bit CPU is doing a fine job of displaying this in real-time. It is currently fixed with respect to the sweep time, with the 256 pixel wide display corresponding to about 130 microseconds as a natural consequence of the tight sample reading loop. It took a little bit of fiddling to get the programme right, though:


First, I had to synchronise the sweep to always start at the beginning of a loop of the sample. This helps to hold the display with a fairly steady phase.


Second, I was using DMA to update the display with a double-buffered arrangement I had made for another programme. However, that can't be used here, because foreground DMAs, such as the double-buffer copying, cause the audio DMA in the background to pause. This was causing quite horrible audio artefacts as the tone would be interrupted tens of times per second for a number of milli-seconds. Eliminating the double-buffer and all other foreground DMAs fixed that problem. This is one of the things about this audio sub-system that leaves it clearly in the 8-bit home computer class, and not in the multimedia PC class where the Amiga lives: There are no prioritised media DMA slots to ensure stable audio at all times. Demo writers thus get to have fun working out how they can do parts with cool DMA effects *and* still have nice digital audio playing in the background.


Third, even when I fixed those problems, there was still a funny artefact where there are spikes in the audio playback. There are some hints that this might be a glitch when the next sample begins to play. However, the audio hardware is so fast (driving the PDM at 40.5MHz, resulting in ~25ns intervals) that it is hard to capture this reliably on my oscilloscope. The result is wave-forms that look like this:



This problem is manageable, however, as the sine wave is still otherwise intact, and produces an acoustically decent tone. The glitches do produce some audible artefacts when driving ultrasonic frequencies, but this is all at tolerable levels for the test. So I'll worry about fixing that later, unless it does turn out to be problematic.


The next step is to add the reading of the microphone data. Hopefully this will go smoothly, as I have already established that the microphone is sensitive well past 20KHz, and according to the data-sheet, all the way up to the maximum 79KHz that we would realistically generate.


Getting the code running on the MEGAphone prototype was easy, as it is fully compatible with the desktop MEGA65 that I have been developing it on. The only trick was I had to remember how to control the amplifier on the MEGAphone for the speakers so that I can get the volume loud enough for testing. This is controlled by $FFD7035 and $FFD7036, where $00 = +24dB, $40 = 0dB and $60 = -24dB and $FF = mute. It was set to $60 by default and was way too quiet.


I did some initial testing with $00 (+24dB) and it seems to work, although I think it might send the amplifier into over-current shutdown after a while. That would be ok for this application, as we only need a very low duty-cycle. What is more of a problem is that at +24dB there are a lot of audible artefacts which I will need to investigate. Once the kids have gone to sleep I'll start experimenting again to see if +0dB is loud enough to be detected at a decent range.


So, for the initial tests, I'm testing only at very close range, and having to deal with some distortion caused by a bug in the audio DMA subsystem that I have yet to figure out: Basically if the CPU is running, then the waveform is distorted. I'll likely deal with that by implementing a special "play sine wave" mode for the audio subsystem, so that we can side-step the whole DMA thing altogether. But it's still good enough that we can work out where the resonances are that are strong enough to receive anything. So:


There is a peak between about 20.0KHz and 20.2KHz, that ramps up more slowly on the low side, and falls off quite quickly on the high-side. Then there are also a few peaks in the 25KHz to 30KHz range, but they are lower than the 20.1KHz centred peak. Above 30KHz I haven't seen any signs of life as yet. But this DMA distortion problem gets worse the higher the frequency, so I think I need to implement the "play pure sine wave" function, and then test it again. But I must say: I'm pleasantly surprised that the speaker is in fact producing any energy at all above 20KHz. Whether it has enough range, and whether it can do so with quiet enough audible artefacts remains an open question, though.


I implemented the sine table ROM to avoid the distortion. There are still audible artefacts, presumably because the MEGA65's audio output is PDM, i.e., lots of 1s and 0s rather than a true sine wave. A side-effect of fixing this is that the peak response shifted down from ~20.1KHz to around 18KHz. There is some response up around 26KHz-30KHz, but it is quite weak.
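
For readers unfamiliar with PDM, the following is a minimal sketch of a textbook first-order delta-sigma encoder. It is only an illustration of the general technique; it is not taken from the MEGA65 VHDL, and the function name is hypothetical. The output is nothing but 1s and 0s, and it is the density of 1s that follows the input level, so everything that is not the intended tone has to be removed by filtering after the output pin.

```python
import math

def pdm_encode(samples):
    """Textbook first-order delta-sigma: inputs in [-1, 1] -> list of 0/1 bits."""
    bits = []
    integrator = 0.0
    feedback = 0.0  # previous output, expressed as -1.0 or +1.0
    for x in samples:
        # Accumulate the error between the input and what was last output.
        integrator += x - feedback
        bit = 1 if integrator >= 0 else 0
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits

# A constant input of 0.5 should give a stream that is about 75% ones.
steady = pdm_encode([0.5] * 1000)
print("density of ones:", sum(steady) / len(steady))

# One cycle of a sine wave, heavily oversampled, becomes a 1-bit stream.
sine = [math.sin(2 * math.pi * i / 512) for i in range(512)]
sine_bits = pdm_encode(sine)
print("first 32 bits of the sine stream:", "".join(map(str, sine_bits[:32])))
```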

It looks like the next step required will be to somehow filter the signal, so that we get closer to a true sine curve. The data-sheet for the amplifier suggests using ferrite beads on both leads to the speaker, and a 470uF capacitor to GND on the high-side. At that point, though, we cease to be working with the existing MEGA65 hardware without modification, and we are already well past the scope of this work unit.


What I will still do, is attempt to more adequately quantify the range of ultrasonic frequencies that respond, and what the fall-off looks like, so that we can identify the likely usable frequency range for communications. Minor speed-bump, though, in the form of the speaker lead breaking:



Fixed that. The next step was to improve the ultrasound test programme to show some kind of frequency domain break-down. An FFT would be ideal, but a fair bit of work to produce, when I just want to be able to see if there is noticeable energy at a frequency. So I made a simple programme that superimposes the sample train over itself at all possible time deltas, and then measures the energy of the result. If the delay corresponds to a periodicity of the signal, then it will result in a higher amplitude signal. I added a bit of filtering to it to deal with harmonics etc, and give or take a few wrinkles, it produces fairly clean peaks for the frequencies at which energy is present:



The red peaks correspond to the periods at which energy is present, i.e., further left is higher frequency, and further right is lower frequency. The very tall line-like peak on the left edge is an artefact of measuring energy by superimposing waves and looking for reinforcement, as it will basically pick up when adjacent samples don't zero-cross. Thus it should be ignored. But we can see the next peak corresponds to 1 cycle of the ~76KHz resonance of the microphone. The other peaks come and go a fair bit, and are less robust, and may correspond to some weaker lower frequency enveloping of the 76KHz resonance.
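
The superposition idea can be sketched roughly as follows. This is a Python illustration, not the programme that runs on the MEGAphone: it omits the extra harmonic filtering, and the function name and test tone are made up for the example. Each delay gets a score equal to the energy of the signal added to a delayed copy of itself, so delays matching a period of the signal reinforce and score highly.

```python
import math

def superposition_energy(samples, max_lag):
    """Energy of the signal summed with a copy of itself delayed by each lag."""
    n = len(samples) - max_lag  # same number of terms for every lag
    return [
        sum((samples[i] + samples[i + lag]) ** 2 for i in range(n))
        for lag in range(max_lag + 1)
    ]

# Hypothetical test input: a clean tone with a period of 16 samples.
tone = [math.sin(2 * math.pi * i / 16) for i in range(256)]
energies = superposition_energy(tone, 24)

# Very small lags always score highly (a sample added to its near neighbour),
# so they are skipped, much like the left-edge peak described above.
best_lag = max(range(4, 25), key=lambda lag: energies[lag])
print("strongest period:", best_lag, "samples")
```

At a delay of half a period the two copies cancel almost completely, which is why the peaks come out reasonably sharp even without a proper FFT.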


So I think we are finally set to try various frequencies and measure at which frequencies we reliably see energy, and to delineate that set of frequencies as those which are potentially usable for ultrasound communications on the existing MEGAphone hardware, and thus to give us sufficient characterisation of the available channel bandwidth.


I'll work with the amplifier set to $40, which is some way off the loudest setting, and the speaker leaning against the MEGAphone prototype about 1cm away from the phone. If we can't pick up a signal under those conditions, then we'll assume it won't be usable. But if we can, then we can repeat the test with some increased distance and power, and see what kind of range is possible. To support this, I have improved my test programmes to work out the period of the sample frequency requested, and to calculate a running average of the estimated power at that frequency. In this way, I have an objective and quantitative means of comparing the power at different frequencies, even if the units are undefined because of the estimation process. The display with this looks like this:



The light blue vertical line is the period we are looking for. This corresponds to exactly one full wave of the tone being played (visualised by the white points). The received return via the microphone is the yellow points, and the red line is the power at each period.


So we can see visually that there is some power at the target frequency here, as there is an observable periodicity in the yellow waveform, which is then reflected by the red peak at that point. The running average is then calculated and displayed in the lighter red coloured text. For each sampling, I will collect 63 samples. I'll also do a quick variance check by doing three successive collections with a single setting, so that we can get an estimate of how reliable the readings are. It was quite pleasing to be able to implement all of this directly on the MEGAphone's 40MHz 8-bit CPU.
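
The two calculations involved, finding the period to watch and keeping a running average of the power there, can be sketched like this. The capture rate, the helper names and the numbers fed in are all hypothetical, since the real figures depend on the timing of the sample-reading loop.

```python
def period_in_samples(tone_hz, capture_rate_hz):
    """Number of captured samples spanned by one full cycle of the tone."""
    return round(capture_rate_hz / tone_hz)

class RunningAverage:
    """Incremental mean, so that no history of readings needs to be stored."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def add(self, value):
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean

# Hypothetical example: a 20KHz tone captured at 640,000 samples per second.
target_period = period_in_samples(20_000, 640_000)
print("watch the power at a delay of", target_period, "samples")

avg = RunningAverage()
for power in (2.0, 4.0, 6.0):  # made-up power estimates, arbitrary units
    avg.add(power)
print("running average:", avg.mean)
```

The incremental form suits a small machine, as it needs only a counter and one accumulator regardless of how many readings are collected.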


Okay, so having batched a bunch of runs at different frequencies, we find the following:


1. Below ~19.4KHz the waveform is very loud.
2. Between ~19.4KHz and 19.7KHz it's not quite as loud, but still very loud, perhaps 1/2 as loud as just below 19.4KHz.

3. From 19.8KHz to 20.4KHz it's about 1/2 as loud again, with relatively flat response.
4. From 20.4KHz to about 21.4KHz it is about as loud as between 19.4KHz and 19.7KHz again.
5. From 21.4KHz it drops off very sharply, with no discernible signal beyond 21.7KHz.


So assuming we want to keep the frequency high enough to not annoy people, there is probably about 1.4KHz centred on 20.7KHz that would be usable. As tempting as it would be from a link-margin perspective, dropping down to below 19.4KHz is probably not feasible.


Increasing the amplifier level from 0dB to +12dB yielded a discernible waveform at distances of around 30 - 40 cm at 19.4KHz -- i.e., where the performance of the system is very good. The amplifier can go to +24dB, which would thus be expected to perhaps deliver the ~1.5m range, but at a power consumption during transmission of greater than 2W. It seems rather unlikely that 1.5m range would be obtainable in practice over the whole band, although somewhat shorter range may well be possible, or it may be possible to increase the amplifier power further.
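
As a rough cross-check of that range estimate, here is a small sketch that assumes sound pressure falls off in proportion to 1/distance (simple free-field spreading) and ignores air absorption, speaker directionality and amplifier limits. Those are simplifying assumptions for illustration, not measurements, and the function names are hypothetical.

```python
def amplitude_ratio(gain_db):
    """Convert a gain in dB into a linear amplitude (pressure) ratio."""
    return 10 ** (gain_db / 20)

def scaled_range(base_range_m, extra_gain_db):
    """Distance giving the same received amplitude, assuming 1/r fall-off."""
    return base_range_m * amplitude_ratio(extra_gain_db)

# Measured: a discernible waveform at 30 - 40 cm with the amplifier at +12dB.
# Going from +12dB to +24dB adds a further 12dB.
low = scaled_range(0.30, 12)
high = scaled_range(0.40, 12)
print(f"estimated range at +24dB: {low:.2f} m to {high:.2f} m")
```

Under those assumptions another 12dB is roughly a four-fold increase in amplitude, which puts the estimate in the same region as the ~1.5m figure, at least at the frequency where the system performs best.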


But whatever the frequency and amplifier arrangement, some kind of filtering is going to be required, to prevent the audible artefacts that are present across the whole frequency range tested.


That said, we have clearly proven that the microphones we are using are sensitive to ultrasound. Thus it could be possible to use a smaller speaker with improved ultrasonic properties to further boost performance.


But as I reflect on all this, it seems to me that there are a bunch of problems, that together make me a little sceptical about the utility of such a system in practice:


1. Transmit power of 2W or more is likely required to produce a useful range over a wide-enough ultrasonic band. Even at a low duty-cycle, this will create a significant power consumption.
2. The speakers we use, and most ultrasonic speakers, are rather directional, making detection of proximity rather unreliable. It was quite fiddly to get the results described above, demonstrating that this is not just a theoretical problem.

3. Point (2) is made worse if you have the device in a bag or pocket.
4. The audible artefacts are REALLY annoying.
5. Some people can likely hear up around 20KHz, and even the occasional short chirp of a packet is going to be very annoying to listen to.
6. Bluetooth is already widely available, requires no new hardware, and for all its problems, has much better performance than I have been able to observe here. In particular, the lack of risk of audible artefacts seems compelling in the circumstances.

7. The need for contact tracing apps seems to have somewhat dissipated, at least for the time being.


However, as a further communications channel between devices when faced with a hostile RF environment, it would seem to have potential. In that case, it would likely be possible to mask the audible artefacts by, for example, playing music or other sound while the ultrasonic transfer occurs. In this context the difficulty of creating an ultrasonic signal that can travel great distances becomes a strength, because it means that an effective jamming effort would require considerable proximity. In contrast, 2.4GHz Bluetooth or Wifi are rather easy to jam from a distance.


Thus, while this wasn't the primary objective of this investigation, it has revealed that such near-ultrasonic communication is quite possible using rather conventional components found in smart-phones and similar devices. It also means that from a privacy perspective we must take care, as it is similarly possible for devices to communicate via ultrasound without a user's knowledge, e.g., to exfiltrate data across air-gaps.