ESP32-A1S Guitar Multi-Effect Engine with Microphone and MIDI I/O
Since Espressif released the ESP32, many application-specific modules based on this chip have been produced. For example, Ai-Thinker produces an image processing module with artificial intelligence support for image recognition. In the audio processing area, one interesting product from Ai-Thinker is the ESP32-A1S module. Because it integrates the AC101 codec, this module enables a very compact and low-cost solution for audio DSP. The most interesting feature of the ESP32 is its dual-core 32-bit CPU with WiFi and Bluetooth connectivity. Since WiFi connectivity enables OTA (over-the-air) firmware upgrades, products can be released faster because bugs can be fixed later without a product recall. Now take a look at our tentative circuit schematic, shown in Figure 1.
Hardware Features of The Multi-Effect Engine
The schematic diagram is not perfect yet: as you can see, there is no reset circuitry for the EN pin, and some resistors don’t show any values. Before going into detailed explanations, let’s look at the features of this multi-effect engine design first:
- Guitar or line input, with support for single-ended and balanced modes.
- Microphone input, with support for single-ended and balanced modes and configurable bias current.
- Stereo outputs, with support for single-ended and balanced modes.
- OLED display module support.
- Up to 6 control knobs or mode selectors.
- Analog expression pedal support (as an alternative function of one of the control knobs).
- 2 indicator LEDs.
- Rotary encoder with push button.
- MIDI input and output.
- WiFi (for OTA update)
- Bluetooth (for pedal extension mobile app)
ESP32 CPU Power and Its Comparison with STM32F103 in Deepstomp Multi-Effect Platform
Before exploring what kinds of multi-effect pedals we can implement with this design concept, let’s first talk about the processing power of the CPU. The ESP32 uses a dual-core Xtensa 32-bit LX6 RISC microprocessor as its CPU. The processor can run at up to 240 MHz (up to 600 MIPS) [see reference 1]. To get an idea of how fast that is, let’s compare it with the STM32F103C8 microcontroller used in Deepstomp, the multi-effect platform we developed. Looking at the CPU clock, the ESP32 runs at up to 240 MHz while the STM32F103C8 runs at up to 72 MHz [see reference 2], which means one ESP32 core is more than 3 times faster (240 / 72 ≈ 3.3). Counting both cores, we can assume that the ESP32 is more than 6 times faster. This is not an accurate comparison, because although both CPUs are RISC (reduced instruction set computer) designs, their architectures differ, but we can still use it as a coarse approximation.
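To turn the clock comparison into something closer to a DSP budget, the sketch below computes how many CPU cycles are available between two audio samples at each clock. The 44.1 kHz sample rate and all names in the code are assumptions chosen for illustration, and cycles are not instructions, so this remains a coarse estimate only.

```cpp
#include <cstdint>

// Clock figures taken from the comparison above.
constexpr std::uint32_t kEsp32ClockHz = 240000000;  // one ESP32 core
constexpr std::uint32_t kStm32ClockHz = 72000000;   // STM32F103C8

// Assumed sample rate, for illustration only.
constexpr std::uint32_t kSampleRateHz = 44100;

// CPU cycles available between two consecutive audio samples (rounded down).
constexpr std::uint32_t cyclesPerSample(std::uint32_t clockHz,
                                        std::uint32_t sampleRateHz) {
    return clockHz / sampleRateHz;
}

constexpr std::uint32_t kEsp32Budget =
    cyclesPerSample(kEsp32ClockHz, kSampleRateHz);
constexpr std::uint32_t kStm32Budget =
    cyclesPerSample(kStm32ClockHz, kSampleRateHz);

// Per-core clock ratio between the two chips.
constexpr double kClockRatio =
    static_cast<double>(kEsp32ClockHz) / kStm32ClockHz;
```

Under these assumptions one ESP32 core gets 5442 cycles per sample against 1632 on the STM32F103C8, a ratio of about 3.3 per core.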
To get more insight into what can be done with such processing speed, let’s see what the Deepstomp platform can do, for comparison. In the first case, Deepstomp can run noise gate, compressor, distortion, and delay processing at the same time. In the second case, it can run the combination of noise gate, compressor, and spring reverb (based on a 5 multi-tap delay model). These are examples of multi-effect combinations that work within its processing power budget. In the first case, replacing the delay with the spring reverb would overload the CPU. Likewise, adding a distortion to the second combination would overload the CPU as well.
Having seen what the STM32F103C8 can handle, we can expect that one core of the ESP32 is likely capable of handling almost any combination of standard multi-effect processing, keeping in mind that one ESP32 core is about 3 times faster than the STM32F103C8. By dedicating one core to audio processing, we can use the second core of the ESP32 to handle the OLED display, Bluetooth, and the MIDI input/output.
Possible Effect Types Implementation
With the hardware features and CPU processing power described above, we can expect that many types of guitar effects can be implemented using this multi-effect engine.
- Standard Multi-Effect. Noise gate, compressor, distortion, tremolo, vibrato, chorus, phaser, flanger, wah, auto-wah, delay, and reverb. With its processing power, we believe that almost all common combinations of those effect types can run on the ESP32. Please note that some combinations would practically never be used, such as phaser with flanger, or other combinations that are ineffective or counterproductive.
- Voice Effects. With the microphone input, it is possible to implement many types of vocal effects such as band-limited filters, vocal-tuned reverb, auto-tune, anti-feedback, distortion, etc. By combining distortion with a band-limited filter, we can implement many types of device simulators, for example a telephone line, megaphone, or intercom.
- Vocoder. With both guitar and microphone inputs, we can simulate an analog channel vocoder or even a digital phase vocoder, which uses the power of FFT (Fast Fourier Transform) computation.
- Pitch Shifter and Vocal-Guitar Harmonizer. A polyphonic pitch shifter would be very interesting since it enables cool effects like a whammy pedal, or even a vocal-guitar harmonizer. A vocal-guitar harmonizer would need chord analysis, and the FFT performance benchmark of the ESP32 [see reference 3] suggests that such an application is worth exploring.
- Guitar Synthesizer. We are pretty sure that a monophonic synthesizer can be easily implemented using this engine; a polyphonic synth is more challenging but also worth exploring. A polyphonic synthesizer requires the pedal to perform polyphonic pitch detection. If we can implement this analysis, then the synthesis can proceed. If the polyphonic pitch detection runs on the engine but the synthesis overloads the CPU, then we can convert the analysis result into MIDI commands and let an external synthesizer generate the sound.
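As an illustration of the device-simulator idea in the voice effects item, here is a minimal sketch of a "telephone line" voice effect built from a band-limiting filter pair and a soft-clip distortion. The class names, the 300 Hz and 3.4 kHz cutoffs, the drive value, and the choice of simple one-pole filters are all assumptions made for this example, not part of the engine design.

```cpp
#include <cmath>
#include <cstddef>

// One-pole low-pass filter: y += a * (x - y), with a derived from the cutoff.
struct OnePoleLowPass {
    float a = 0.0f;  // smoothing coefficient in (0, 1)
    float y = 0.0f;  // filter state (previous output)

    OnePoleLowPass(float cutoffHz, float sampleRateHz)
        : a(1.0f - std::exp(-2.0f * 3.14159265f * cutoffHz / sampleRateHz)) {}

    float process(float x) {
        y += a * (x - y);
        return y;
    }
};

// Hypothetical "telephone line" voice effect: band-limit the signal to
// roughly 300 Hz..3.4 kHz and soft-clip it in between.
class TelephoneVoice {
public:
    explicit TelephoneVoice(float sampleRateHz, float drive = 4.0f)
        : lowCut_(300.0f, sampleRateHz),
          highCut_(3400.0f, sampleRateHz),
          drive_(drive) {}

    float process(float x) {
        float highPassed = x - lowCut_.process(x);       // remove lows
        float clipped = std::tanh(drive_ * highPassed);  // soft clip to [-1, 1]
        return highCut_.process(clipped);                // remove highs
    }

    void processBlock(const float* in, float* out, std::size_t count) {
        for (std::size_t i = 0; i < count; ++i) out[i] = process(in[i]);
    }

private:
    OnePoleLowPass lowCut_;   // used to build the high-pass stage
    OnePoleLowPass highCut_;  // final low-pass stage
    float drive_;             // gain applied before the soft clipper
};
```

One-pole filters give only gentle 6 dB/octave slopes; a more convincing simulator would likely use steeper filters, at a higher CPU cost.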
WiFi, Bluetooth, and MIDI for Maximum Design Flexibility
Next, let’s list how WiFi, Bluetooth, and MIDI give the engine flexibility in many areas.
- WiFi connectivity enables the engine to provide over-the-air (OTA) firmware upgrades. As a result, we can expect new pedals based on this design to be released quickly. Remember that the biggest obstacle to releasing a new product is the possibility of software bugs, and a product recall would be ridiculous for a pedal product. With OTA upgrades, we need not worry about such a disaster.
- Bluetooth connectivity makes it possible to provide a cheap pedal or quick-button extension through a mobile application, so the pedal itself can be built as a small stompbox for the most convenient mobility. Of course, we can still provide foot switch and expression pedal extensions through the MIDI input while offering the cheaper mobile app extension via Bluetooth.
- The MIDI input can be used to interface with an external controller, such as quick preset buttons or a MIDI expression pedal.
- Another possible use of the MIDI I/O is to implement many knobs, indicator LEDs, buttons, foot switches, and an expression pedal as an integrated standard multi-effect pedal. To do this, the MIDI I/O can be extended to provide all the required interfaces using the system-exclusive messages of the MIDI protocol. A dedicated microcontroller can bridge and buffer these messages internally, then pass the non-system-exclusive MIDI messages to the external MIDI I/O ports.
- The MIDI output can also be used to send MIDI commands if the engine is used to implement a guitar-to-MIDI converter. If polyphonic pitch detection can be implemented by the engine, then sending polyphonic MIDI commands should be straightforward. Otherwise, monophonic pitch analysis should be easy to implement, producing monophonic MIDI commands on the MIDI output.
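For the guitar-to-MIDI case, the last step after pitch detection is small. The sketch below maps a detected fundamental frequency to the nearest MIDI note using the standard equal-temperament relation (A4 = 440 Hz = note 69) and builds a 3-byte Note On message. The function names are hypothetical, and the pitch detector itself is assumed to exist elsewhere.

```cpp
#include <array>
#include <cmath>
#include <cstdint>

// Convert a detected fundamental frequency to the nearest MIDI note number.
// The frequency must be positive.
inline int frequencyToMidiNote(float frequencyHz) {
    float note = 69.0f + 12.0f * std::log2(frequencyHz / 440.0f);
    int rounded = static_cast<int>(std::lround(note));
    if (rounded < 0) rounded = 0;      // clamp to the valid MIDI range
    if (rounded > 127) rounded = 127;
    return rounded;
}

// Build a 3-byte MIDI Note On message.
// channel: 0..15, note and velocity: 0..127.
inline std::array<std::uint8_t, 3> makeNoteOn(std::uint8_t channel,
                                              std::uint8_t note,
                                              std::uint8_t velocity) {
    return {static_cast<std::uint8_t>(0x90 | (channel & 0x0F)),
            static_cast<std::uint8_t>(note & 0x7F),
            static_cast<std::uint8_t>(velocity & 0x7F)};
}
```

For example, the open low E string of a guitar (about 82.41 Hz) maps to note 40. The three bytes would then be written to whichever UART drives the MIDI output, at the standard MIDI rate of 31250 baud.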
That’s all we have in mind for the moment. Any discussion about this multi-effect engine design is welcome. Please share your thoughts in the comment section below.
New Year Update 2021: New Schematic and Codes!
Based on this circuit design, a rapid development platform for digital audio effect processors has now been released under the name Blackstomp. See the hardware and software development details here: https://www.deeptronic.com/blackstomp/
Hi, any progress on this project? I can work together with you, it is a very nice idea.
Best Regards
Koray
We are working on encrypted over-the-air updates and security. I might release some basic code as open source, since the hardware is basically open source by being published here, but not the full engine code. The next step is exploring the low-level setup of the built-in codec chip. There is some code out there that integrates the codec with the ESP-ADF library, but I think we need lower-level access for this application, so we will probably develop from scratch on top of ESP-IDF. What do you think you can do for this project? Please let me know your experience and interests.
Looking forward to updates!
The final schematic, the source code, and the programming manual have now been published here: https://www.deeptronic.com/blackstomp/
Humaro, thank you for your hard work. I may or may not use or modify the PCB, but I was looking for a simple framework like this for vocal and audio FX for a musical instrument. I have been looking at the Faust libraries. It seems like they would plug right into this framework. Have you tried them? I just flashed the code to my ESP32 A1S Kit board and had a little issue. I set both right and left inputs to use the mic (LeftMic1(true) and RightMic1(true)), but on the headphone output I only get 1 mic (I think it is the left one) and it comes out of both left and right channels, with nothing from the other onboard mic. I haven’t dug into the code yet, but is that due to some settings in the AC101 driver? Or could I have a bad board? Or is the A1S kit wired differently than your PCB? I will dig in myself, but just wondered if you know.
I downloaded the Blackstomp Arduino GitHub software framework. Also, I notice you are using 32 bitspersample. I might look to see if I can make it 24-bit to save a little RAM, if needed. But I will probably process at a sample rate of at least 48k (96k might be overkill? and there goes the RAM again). I want to add EQ, reverbs, limiters or gates on the mic, maybe even pitch shifting, and on the instrument side that plus other effects. I am working on 2 projects, both musical instruments. Both will require 2 ESP32 A1S’s. Both will have 2 separate internal mics/pickups that need to be processed separately, plus the vocal mic, so a minimum of 3 channels. A 4th channel could be available as line in for use with a wireless mic. And 1 of the projects is a very unique keyboard, with real strings, but the keys will also have IR detectors that give variable-level analog outputs that change with how far the key is pressed, so I will translate those into MIDI to make the keyboard dual purpose, with individual key aftertouch (pressure sensitive). So I will also need synth processing capabilities (if that requires a 3rd ESP32, no problem, since these are inexpensive enough).
Yes, the audio kit has different wiring than the Blackstomp board. The Blackstomp board has only one mic, i.e. mic1. The two methods, LeftMic1(true) and RightMic1(true), route the microphone (mic1) to the left and right channels. To enable the second mic of the A1S audio kit you have to write two additional functions yourself: LeftMic2(enable) and RightMic2(enable). The bits-per-sample setting on the ESP’s I2S interface should be set to 32 for the 24-bit codec setting, so that samples can be read quickly through a standard C++ pointer to a 32-bit data type. Moreover, the processing is done in 32-bit floating point anyway.
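A minimal sketch of that 32-bit read path, for illustration: each I2S slot is read as one signed 32-bit word and scaled to a float. This assumes the 24-bit sample sits left-justified in the upper bits of the 32-bit slot and that the buffer is interleaved as left, right, left, right; both are assumptions that should be checked against the actual I2S and codec configuration, and the function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Scale factor that maps the full int32 range onto [-1.0, 1.0).
constexpr float kInt32ToFloat = 1.0f / 2147483648.0f;

// Assumption: each 32-bit slot holds one 24-bit sample left-justified in the
// upper bits (lower 8 bits unused), so the whole word can be treated as a
// signed 32-bit value without shifting or sign extension.
inline float slotToFloat(std::int32_t slot) {
    return static_cast<float>(slot) * kInt32ToFloat;
}

// Split an interleaved stereo buffer (assumed L, R, L, R, ...) into two
// float channels ready for floating-point processing.
inline void deinterleaveToFloat(const std::int32_t* slots, std::size_t frames,
                                float* left, float* right) {
    for (std::size_t i = 0; i < frames; ++i) {
        left[i] = slotToFloat(slots[2 * i]);
        right[i] = slotToFloat(slots[2 * i + 1]);
    }
}
```

With a packed 24-bit format, each sample would instead have to be assembled from three bytes and sign-extended before conversion, which is the extra work the 32-bit setting avoids.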
I’d also go with the FAUST ESP32 build target, given that there are already lots of existing high-quality effects and synth implementations in FAUST both inside and outside the standard library (e.g. many of the Guitarix effects are written in FAUST).
Still waiting for my kit tho.
You state:
ESP32 is likely capable of handling almost any combination of standard multi-effect processing.
Are the libraries for all these effects written, tested and available or only the compiled files?
Unfortunately they are not available as libraries. You can install the Blackstomp Arduino library and see some tested examples there, but they are single-effect apps (stereo chorus and tap-tempo ping-pong delay). Just keep in mind that you can write a multi-effect application inside a single-effect application.
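To illustrate that last point, here is a rough sketch of several effects chained inside one processing function. It does not use the Blackstomp API; the class, its parameters, and the very simple gate, distortion, and delay stages are all hypothetical and only show the structure.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical multi-effect: gate -> distortion -> delay, all run from one
// per-block processing function. delaySamples must be greater than zero.
class MultiEffect {
public:
    explicit MultiEffect(std::size_t delaySamples, float gateThreshold = 0.01f,
                         float drive = 3.0f, float feedback = 0.4f,
                         float mix = 0.5f)
        : delayLine_(delaySamples, 0.0f),
          gateThreshold_(gateThreshold),
          drive_(drive),
          feedback_(feedback),
          mix_(mix) {}

    void process(const float* in, float* out, std::size_t count) {
        for (std::size_t i = 0; i < count; ++i) {
            float x = in[i];
            // 1. Noise gate: mute samples below the threshold (no smoothing).
            if (std::fabs(x) < gateThreshold_) x = 0.0f;
            // 2. Distortion: tanh soft clipper.
            x = std::tanh(drive_ * x);
            // 3. Delay: circular buffer with feedback.
            float delayed = delayLine_[writeIndex_];
            delayLine_[writeIndex_] = x + feedback_ * delayed;
            writeIndex_ = (writeIndex_ + 1) % delayLine_.size();
            // Blend the dry (distorted) signal with the delayed signal.
            out[i] = (1.0f - mix_) * x + mix_ * delayed;
        }
    }

private:
    std::vector<float> delayLine_;  // circular delay buffer
    float gateThreshold_;           // gate opens above this absolute level
    float drive_;                   // gain before the soft clipper
    float feedback_;                // amount of delayed signal fed back
    float mix_;                     // 0 = dry only, 1 = delayed only
    std::size_t writeIndex_ = 0;    // current position in the delay buffer
};
```

A real gate would smooth its opening and closing to avoid clicks, and each stage would normally have its own bypass switch, but the single-loop structure stays the same.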