Turn colors and pictures into sound by using a color scanner in a 3D VR environment.

The Synesthesizer is inspired by chromesthesia, the particular kind of synaesthesia that links sounds and colors.

While chromesthesia usually produces color perception in response to sound stimulation, this synthesizer does the opposite: sound is generated according to color detection. More precisely, RGB values are detected (one pixel at a time) and used to determine the behaviour of five different physical models developed with Modalys.

The motivation for creating such a synthesizer arose from the wish to generate a timbral continuum out of the color continuum, making it possible to explore the relation between the color spectrum and the sound spectrum. Such a solution also lends itself to two additional interpretations:

  • A picture can become a sort of score; graphic scores can find a new source of interpretation;

  • The Synesthesizer is an intuitive sound generation tool; a properly targeted implementation might allow even non-experts to explore the possibilities of a synthesizer.

Numerous color-based synthesizers have been implemented in the past twenty years, ranging from the gaming industry (such as «Specdrum», 2018, available on Kickstarter) to more elaborate patented systems (as in P. T. McClard, «Music generating system and method utilizing control of music based upon displayed color», US Patent US5689078A, 30 June 1995). The usual tendency has been to map different colors to pitches or chords. The Synesthesizer, by contrast, has been conceived as a tool for exploring timbres.

The Synesthesizer’s components are:

  • A Unity application for color detection in VR, which communicates through OSC (Open Sound Control) with:

  • Wekinator, mapping the three incoming RGB values to 35 output values, sent through OSC to:

  • A Max/MSP patch including five physical models developed with Modalys.
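The first hop of this chain (color data sent over OSC to Wekinator) can be illustrated with a minimal Python sketch. It is not the project's Unity code: it only shows what an OSC message carrying three RGB floats looks like on the wire. The address `/wek/inputs` and UDP port 6448 are Wekinator's default input settings; whether the Synesthesizer keeps those defaults, and whether RGB values are normalized to 0..1, are assumptions. The function names are hypothetical.

```python
import socket
import struct


def osc_pad(data: bytes) -> bytes:
    """Null-terminate a string and pad it to a multiple of 4 bytes (OSC 1.0 rule)."""
    data += b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def encode_rgb_message(r: float, g: float, b: float,
                       address: str = "/wek/inputs") -> bytes:
    """Build an OSC message with three float32 arguments (type tags ',fff')."""
    return (
        osc_pad(address.encode("ascii"))
        + osc_pad(b",fff")
        + struct.pack(">fff", r, g, b)  # OSC floats are big-endian float32
    )


def send_rgb(r: float, g: float, b: float,
             host: str = "127.0.0.1", port: int = 6448) -> None:
    """Send one RGB reading over UDP (assumed: Wekinator listening on its default port)."""
    packet = encode_rgb_message(r, g, b)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))
```

Calling `send_rgb(1.0, 0.5, 0.0)` once per detected pixel would deliver one orange reading to a Wekinator instance running with default settings on the same machine.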

A path is created from the first to the last model (according to the incoming RGB data), and consecutive models are connected through cross-synthesis. A neural network is trained in Wekinator using five different colors (one per physical model) as input. The outputs control different properties of the physical models (e.g. vibrating modes, high-frequency loss, etc.). For the remaining colors of the spectrum, the corresponding output values are interpolated by Wekinator. The result is a continuously changing timbral landscape.
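To give a feel for how five trained colors can yield 35 output values for any other color, here is a small Python sketch. It deliberately swaps in a much simpler technique, inverse-distance weighting, in place of the neural network trained in Wekinator, so it illustrates the idea of interpolating between anchor colors rather than the project's actual mapping. The anchor colors, the placeholder parameter vectors, and all names are hypothetical.

```python
import math

N_OUTPUTS = 35  # number of values sent to the Max/MSP patch

# Hypothetical anchor colors (RGB in 0..1), one per physical model.
ANCHOR_COLORS = [
    (1.0, 0.0, 0.0),
    (1.0, 1.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0),
    (1.0, 0.0, 1.0),
]

# Placeholder parameter vectors: anchor i sets every output to i / 4.
ANCHOR_PARAMS = [[i / 4.0] * N_OUTPUTS for i in range(len(ANCHOR_COLORS))]


def interpolate_params(rgb, colors=ANCHOR_COLORS, params=ANCHOR_PARAMS, power=2.0):
    """Blend the anchor parameter vectors, weighting each by 1 / distance**power."""
    weights = []
    for idx, color in enumerate(colors):
        dist = math.dist(rgb, color)
        if dist == 0.0:
            # Exactly on an anchor color: return that model's parameters.
            return list(params[idx])
        weights.append(1.0 / dist ** power)
    total = sum(weights)
    return [
        sum(w * p[k] for w, p in zip(weights, params)) / total
        for k in range(len(params[0]))
    ]
```

A color that coincides with an anchor reproduces that anchor's parameters exactly, while any other color yields a weighted blend, so moving smoothly through color space moves smoothly through parameter space.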

The current application has been developed in a VR environment. The main reason was to avoid sensor inaccuracy and the need for calibration in different environments with different lighting conditions. It also removes the need to print the images to be scanned: in that case, the resulting colors might be affected by the ink and by the kind of paper used. Additionally, VR makes it convenient to use more than one picture at the same time.


The Max/MSP interface. It is used only for setting up the values of the five main colors; after the first setup and the machine learning training, Wekinator controls the patch in real time.

The analysis of colors can be performed with a simple Raycaster (a line going from the VR remote towards infinity and detecting the objects it collides with) that intercepts the pixels of a chosen image.
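The geometry behind such a raycast can be sketched in a few lines of Python. This is not the Unity implementation: it assumes, for illustration only, that the image is an axis-aligned rectangle lying in the plane z = plane_z with its lower-left corner at x = 0, y = 0, and that pixels are stored as rows of (r, g, b) tuples with row 0 at the top. All names are hypothetical.

```python
def raycast_pixel(origin, direction, image, plane_z, width, height):
    """Return the (r, g, b) pixel hit by the ray, or None if the ray misses the image."""
    ox, oy, oz = origin
    dx, dy, dz = direction
    if dz == 0.0:
        return None  # ray is parallel to the image plane
    t = (plane_z - oz) / dz
    if t <= 0.0:
        return None  # image plane is behind the ray origin
    # Point where the ray crosses the image plane.
    hit_x = ox + t * dx
    hit_y = oy + t * dy
    # Normalized image coordinates (u to the right, v upwards).
    u = hit_x / width
    v = hit_y / height
    if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
        return None  # hit point lies outside the picture
    rows, cols = len(image), len(image[0])
    col = min(int(u * cols), cols - 1)
    row = min(int((1.0 - v) * rows), rows - 1)  # row 0 is the top of the image
    return image[row][col]
```

Pointing the ray at successive positions on the picture returns one pixel at a time, which matches the one-pixel-at-a-time RGB detection described earlier.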