GRANULAR SYNTHESIS
THE HISTORY OF ACOUSTICAL QUANTA
The notion that all matter is composed of individual elements goes back as far as the 5th century BC. The Greek philosophers Leucippus and, most famously, Democritus were the first to conceive the idea of the atom [atomos - something that cannot be divided]. They suggested that everything in our world, both matter and energy, is constructed either from a combination of atoms or from empty space.
In 1925 the mathematician Norbert Wiener made some important observations about the inner properties of sound. In a talk about quantum physics, he used sound as the basis for explaining his findings [11]. Wiener described the relationship between time and frequency in detail. He proposed that precise time information about a sound signal can be obtained only at the expense of frequency resolution, and vice versa. He justified his theory by saying that the pitch of a note played within a specific time frame is the result of other micro-time relationships that are not easily perceivable on our normal time scale.
Although references to sound particles exist from the days of Democritus, not much could be scientifically proven without the technology to carry out the necessary experiments. Around the middle of the 20th century this barrier gradually began to fade away, and the era of granular synthesis of sound essentially started with the acoustic theory and experiments of Dennis Gabor.
[1] DENNIS GABOR AND THE GABORET
Dennis Gabor was a Hungarian-born British physicist who won the Nobel Prize for the invention of holography. He also suggested that any sound can be decomposed into a collection of functions obtained by time and frequency shifts of a single Gaussian particle [1, p.57]. When these functions are recombined, they compose a larger audio event. Gabor extended Helmholtz’s idea that any sound can be represented by an infinite number of signals by giving equal weight to the time function. His theory of acoustical quanta involved a combination of a time-domain signal s(t) with a frequency-domain signal s(f). The characteristics of Gabor’s acoustical quanta are extracted by analyzing the s(t) signal over a short time window (Δt) and the s(f) signal over a short frequency window (Δf) [1, p.58]. From there, Gabor found that high frequency resolution requires larger time windows than low frequency resolution. Hence, he discovered that it is possible to identify all the specific frequencies of an analyzed sound, but then it is not possible to say exactly when they occurred. Conversely, it is possible to identify the temporal structure of an audio event very accurately, but without a great amount of frequency precision. This time-frequency analysis procedure is widely known as the Gabor Transform (GT) and results in the Gabor Matrix, an indivisible time and frequency representation of a signal (figure 1a). At the heart of the GT lies the Gaboret, a signal limited in duration by a Gaussian curve window [9].
To implement his ideas, Gabor built several sound granulation machines (figure 2a). These machines are known as electromechanical pitch-time changers, as they make it possible to change the pitch of a sound without changing its duration, and vice versa. They are considered to be the first audio sampling devices, the forerunners of the digital sampling technology we know today. An electromechanical pitch-time changer works in the following way: a sampling head spins across a tape or film of an audio recording, but only contacts the tape for a very short period at regular time intervals. In Gabor’s machines, the sampled sounds were recombined into a continuous stream on another recorder for playback. By changing the speed of the sampling head, it is possible to change the duration of the sampled signal. Slowing down the sampling head reduces the duration of the original signal, while speeding it up expands the duration, in both cases retaining the base frequency of the original sound. To change the pitch of the original recording without changing its duration, one would play back the original signal at a different speed and then use the previous time-granulation techniques to restore the sound to its original length.

Figure 1a: The Gabor Matrix

Figure 2a: The base for an Electromechanical pitch-time changer.
[2] IANNIS XENAKIS
Gabor’s theory had a profound impact on many musicians and scientists. One of the first was the musician, architect and mathematician Iannis Xenakis. Xenakis applied Gabor’s discoveries in many of his musical works, most noticeably in Metastaseis (1954), Concret PH (1958) and Analogique B (1959), and he is probably the first musician to have worked extensively with ‘grains of sound’, a term actually invented by him.
Long before the arrival of digital technology, Xenakis used analogue tape recorders in Analogique B to record the output of sine wave generators. He then cut these recordings into small fragments of sound, and by reassembling these fragments he developed his granular textures. He also proposed a way to organize the grains of sound, based on the notion of screens. Screens represent grains of sound scattered probabilistically in a three-dimensional grid of frequency, amplitude and time. By using simple Boolean operations, screens can be intersected to generate new screens. He also defined a degree of order or disorder in a succession of screens, encapsulated in the idea of ataxy (from the Greek word for disorder). Maximum ataxy in a succession of screens means large changes in the frequency and amplitude of the grains, resulting in effects similar to white noise. Conversely, a low degree of ataxy can represent simpler sound events such as sine waves [1.p66]. To control the degree of ataxy, Xenakis used Markov chains, in which a matrix of transition probabilities controls the flow between different events.

Figure 3a: Score of Xenakis’s Metastaseis
[3] CURTIS ROADS
The American composer Curtis Roads is responsible for many developments in Granular synthesis and other sound particle synthesis techniques. Over the years, he has used Granular synthesis in a number of his compositions, such as nscor, Field, Organic and Half-life. He has developed a method of Granular synthesis based on sonic clouds (Asynchronous Granular streams), along with software that implements the Asynchronous Granular approach, such as the Cloud Generator. Together with Alberto de Campo, Roads has also developed the PulsarGenerator, a software program based on Pulsar Synthesis, and the Creatovox, a digital synthesizer for real-time Granular composition. He is also responsible for a large number of papers and books on Computer music in general, most famously the Computer Music Tutorial and, more recently, Microsound, in which he explores the aesthetics and synthesis techniques of acoustic particles. Curtis Roads currently teaches electronic music composition at the University of California, Santa Barbara.
CHAPTER 2
GRANULAR SYNTHESIS METHODS
[1] A simple grain instrument.
The power of GS lies not in the complexity of the grain generator, but rather in the complexity of the parameters that drive it. A simple digital grain generator can be constructed from a sine oscillator controlled by an amplitude envelope generator with a suitable envelope curve (most often a Gaussian curve). This instrument can be extended to include more waveforms (Square, Saw, Sync (band-limited impulse)) or even sampled sounds. A simple digital granular instrument is shown in figure 4a.

Figure 4a: A simple grain instrument.
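The instrument described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function names, the 44.1 kHz sample rate and the width of the Gaussian curve (sigma_fraction) are choices made for this example and do not come from the cited sources.

```python
import math

def gaussian_envelope(n, sigma_fraction=0.15):
    """Return n envelope values following a bell-shaped Gaussian curve.

    sigma_fraction (an assumed value) sets the width of the bell
    relative to the grain length.
    """
    center = (n - 1) / 2.0
    sigma = sigma_fraction * n
    return [math.exp(-0.5 * ((i - center) / sigma) ** 2) for i in range(n)]

def make_grain(frequency_hz, duration_s, sample_rate=44100, amplitude=1.0):
    """Sine oscillator multiplied by a Gaussian amplitude envelope."""
    n = max(1, int(round(duration_s * sample_rate)))
    envelope = gaussian_envelope(n)
    return [
        amplitude * envelope[i]
        * math.sin(2.0 * math.pi * frequency_hz * i / sample_rate)
        for i in range(n)
    ]

# A 30 ms grain at 440 Hz.
grain = make_grain(440.0, 0.030)
```

Replacing the sine call with another waveform function, or with samples read from a recording, gives the extended instrument mentioned in the text.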
[2] Grain Duration
Granular synthesis develops very complex and dynamic sound events by combining a large number of elementary sonic events, the grains. A grain of sound is a very brief acoustical event with a duration near the threshold of human auditory perception. The duration of a grain is a significant parameter within the GS method. Generally, the grain duration falls into the range of 5 to 50 milliseconds. Gabor’s grains have been estimated at 10 ms. Green [3] has suggested that the human ear can detect sound events as short as 1 or 2 ms; grains with these durations are therefore also possible, although we perceive them as a ‘click’. It is possible to change the tone colour of the ‘click’ by changing the waveform within the grain, which still makes these very short grains useful. With longer grains the perception of pitch can be sustained, while very short grains make pitch recognition rather difficult. Roads [1.p88] suggests that pitch recognition becomes clearer at about 25 ms.
Undesirable effects, such as noise, arise when the grain duration is shorter than the period of the grain waveform. To completely represent one period of a given frequency, the grain duration must be at least equal to that period. For a 20 Hz waveform (the theoretical lower limit of the human hearing range) a grain can be no shorter than 50 ms (1/20 Hz) if the energy at 20 Hz is to be represented completely and without undesirable effects. Yet grains shorter than 50 ms are used, and it is possible to reduce the undesirable effects by applying band-pass filters before the final output.
Depending on the GS method used, the grain duration can be constant (all grains have the same duration), time-varying (the duration changes with time), random (the duration is random between upper and lower boundaries) or parameter-dependent (the duration is set according to the period of the grain’s fundamental frequency) [1.p101].
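The relationship between frequency and minimum grain duration can be checked with a small calculation. The helper names below are hypothetical and exist only for this illustration.

```python
def min_grain_duration_ms(frequency_hz):
    """Shortest grain (in ms) that still holds one full period."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1000.0 / frequency_hz

def cycles_in_grain(frequency_hz, duration_ms):
    """Number of waveform periods that fit inside a grain."""
    return frequency_hz * duration_ms / 1000.0

# A 20 Hz waveform needs a 50 ms grain to complete one period;
# a 25 ms grain holds only half a period of it.
lowest = min_grain_duration_ms(20.0)
half = cycles_in_grain(20.0, 25.0)
```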
[3] Grain Envelope
Each grain has an amplitude envelope that produces a pronounced amplitude modulation (AM) effect. This modulation results in sidebands around the carrier frequency of the grain, spaced at intervals equal to the inverse of the envelope period [1.p101]. If the grain’s duration is D, then the Fc (centre frequency) of the AM is 1/D.
Gabor’s original envelope was a bell-shaped Gaussian curve (figure 5a). He also suggested line-segment envelopes to save memory space and computation time [4]. By analyzing grains produced with line-segment envelopes, Keller and Rolfe [5] found that they have frequency response characteristics similar to those of the Gaussian method, with the addition of comb-shaped spectral effects [1.p88] [5].
Roads [6] has also used a quasi-Gaussian curve as the grain envelope, which has a smooth transition and maximizes the effective grain amplitude [1.p88].
Furthermore, a band-limited pulse envelope can be used at the expense of introducing strong modulation effects, or an exponentially decaying envelope, which has proven effective when convolution is used as a method of extracting grains.

Figure 5a: The result of a Gaussian amplitude envelope.
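The envelope shapes discussed in this section can be compared with a short sketch. It is an illustration only: the quasi-Gaussian curve is modelled here as a cosine-tapered envelope with a flat middle, and all function names and default parameter values are assumptions made for this example. The mean envelope value is used as a rough stand-in for "effective grain amplitude".

```python
import math

def gaussian(n, sigma_fraction=0.15):
    """Bell-shaped curve centred on the middle of the grain."""
    center = (n - 1) / 2.0
    sigma = sigma_fraction * n
    return [math.exp(-0.5 * ((i - center) / sigma) ** 2) for i in range(n)]

def line_segments(n, ramp_fraction=0.25):
    """Linear attack, flat sustain, linear decay (a trapezoid)."""
    ramp = max(1, int(n * ramp_fraction))
    return [min(1.0, (i + 1) / ramp, (n - i) / ramp) for i in range(n)]

def quasi_gaussian(n, taper_fraction=0.25):
    """Raised-cosine edges with a flat middle section."""
    ramp = max(1, int(n * taper_fraction))
    env = []
    for i in range(n):
        edge = min(i, n - 1 - i)  # samples to the nearest end of the grain
        if edge < ramp:
            env.append(0.5 * (1.0 - math.cos(math.pi * edge / ramp)))
        else:
            env.append(1.0)
    return env

def exponential_decay(n, decay_constant=5.0):
    """Starts at full amplitude and decays towards zero."""
    return [math.exp(-decay_constant * i / n) for i in range(n)]

def effective_amplitude(env):
    """Mean envelope value, a rough measure of how 'full' the grain is."""
    return sum(env) / len(env)

n = 1000
fullness = {
    "gaussian": effective_amplitude(gaussian(n)),
    "line_segments": effective_amplitude(line_segments(n)),
    "quasi_gaussian": effective_amplitude(quasi_gaussian(n)),
}
```

With these assumed settings the flat-topped envelopes spend more of the grain at full amplitude than the pure Gaussian does, which is the property attributed to the quasi-Gaussian curve above.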
CHAPTER 3 - GS METHODS
[1 ] Introduction.
Granular synthesis can be implemented through different sound synthesis approaches. We can divide these into two general groups: methods that rely on a prior sound analysis and use a model to extract the desired granular parameters, and methods that do not rely on prior analysis. In the first approach, a computer performs analysis/resynthesis to model and produce the final sound. In the granular approach to analysis/resynthesis, the grains act as detectors of the characteristics of the sound in the analysis stage, while in the resynthesis stage they are used to generate the basis functions from which the sound is re-synthesized. This is the case with pitch-synchronous GS. In the second approach, where no analysis is required, grains are synthesized using various sound synthesis techniques [7]. As we will see later, Asynchronous Granular Synthesis falls into this category.
We can sub-divide the above approaches into the following GS techniques:
STFT / Wavelet Transforms
Pitch-Synchronous overlapping streams
Synchronous and quasi-synchronous streams
Granulated sampled sound streams
Asynchronous clouds
[ 2 ] STFT / Wavelet Transforms
Mathematical transformations are, in general, applied to a signal to obtain further information about the signal itself. In the 19th century, the French mathematician J. Fourier showed that any periodic waveform can be expressed as an infinite sum of periodic complex exponential functions (Fourier analysis). In the 1940s, the development of stored-program computers made a digital implementation of the Fourier Transform (FT) possible, but it was enormously slow to compute [1.p244]. In the 1960s, the development of a set of algorithms commonly known as the Fast Fourier Transform (FFT) greatly reduced the computation time. However, for long signals the computation time was still significantly large. The Short Time Fourier Transform (STFT) is a digital implementation of the Fourier Transform with a window function ‘w’ applied to the signal. The window in the STFT is fixed and usually lasts between 1 and 100 ms. The envelope of this window is a bell-shaped curve [1.p246]. In the STFT, the signal is divided through decomposition (frequency fragmentation) into segments small enough that each segment of the signal can be assumed stationary. With the STFT, a sound is analyzed in the frequency and time domains so that a two-dimensional matrix is created. This matrix is used as a reference for the production of the grains.
Wavelet Transforms work in a similar way. The main difference is that the window used (called here the analyzing wavelet) is not fixed and varies according to the input frequency; this process is called dilation [1.p284]. This provides a better overall resolution, as higher frequencies are better resolved in time and lower frequencies are better resolved in frequency. Wavelet analysis associates both the time and frequency domains with a time-domain signal, the wavelet. A wavelet can be considered as the impulse response of a bandpass filter [1.p282]. A new signal can be reconstructed by adding the correct number of wavelets together.
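The windowed analysis described above can be sketched as follows. This is a deliberately naive illustration that computes a direct DFT of each frame instead of using an FFT; the Gaussian window width, the hop size and every name in the code are assumptions made for this example.

```python
import cmath
import math

def gaussian_window(n, sigma_fraction=0.15):
    """Bell-shaped analysis window of n samples."""
    center = (n - 1) / 2.0
    sigma = sigma_fraction * n
    return [math.exp(-0.5 * ((i - center) / sigma) ** 2) for i in range(n)]

def stft(signal, window_size, hop_size):
    """Return a time-frequency matrix: one row of bin magnitudes per frame."""
    window = gaussian_window(window_size)
    frames = []
    for start in range(0, len(signal) - window_size + 1, hop_size):
        # Fixed-size windowed segment, assumed stationary.
        segment = [signal[start + i] * window[i] for i in range(window_size)]
        spectrum = []
        for k in range(window_size // 2 + 1):
            acc = 0j
            for i, x in enumerate(segment):
                acc += x * cmath.exp(-2j * math.pi * k * i / window_size)
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames

# A 1000 Hz tone sampled at 8000 Hz falls on bin 8 of a 64-sample window.
sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 1000.0 * i / sample_rate) for i in range(256)]
matrix = stft(tone, window_size=64, hop_size=32)
peak_bins = [row.index(max(row)) for row in matrix]
```

Each row of the matrix corresponds to one time position and each column to one frequency bin, which is the two-dimensional reference the text refers to.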
[ 3 ] Pitch-Synchronous overlapping streams
Pitch-synchronous GS uses sound analysis followed by resynthesis to produce a granular sound. As a general approach, analysis/resynthesis methods work in the following way. In the analysis stage, an array of narrow bandpass filters, called analysis filters, divides the input sound into various channels. Because the bandwidth of each filter is small, it is possible to sample its output at a much lower sampling rate, thereby performing time decimation on the signal. Time decimation operates by keeping one sample out of every L samples of the input signal (L is the decimation factor). In the synthesis stage, the signal of each analyzed channel regains its original sampling rate by inserting L-1 zeros between successive samples, and is then fed to another bandpass filter, called the synthesis filter, which interpolates the signal to produce the output [8.p189].
In the pitch-synchronous approach, the purpose of the analysis is to assign each grain to a time-frequency channel as identified in the analysis stage. For each channel, an algorithm delivers the coefficients of a filter whose impulse response corresponds to the frequency response of the analyzed channel. A pitch detection algorithm also identifies the fundamental frequency for each channel. In the resynthesis stage, a pulse train set to the previously detected frequency is used to excite an array of parallel FIR filters. The final sonic result is the weighted sum of the impulse responses of these filters [1.p93, 8.p193].
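The two rate-changing steps mentioned above, decimation by a factor L and zero insertion, can be sketched as follows. The analysis and synthesis bandpass filters are deliberately left out, so this shows only the sample bookkeeping; the function names are hypothetical.

```python
def decimate(signal, factor):
    """Keep one sample out of every `factor` samples."""
    return signal[::factor]

def insert_zeros(signal, factor):
    """Restore the original rate by placing factor-1 zeros after each sample."""
    out = []
    for x in signal:
        out.append(x)
        out.extend([0.0] * (factor - 1))
    return out

channel = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
low_rate = decimate(channel, 4)        # two samples remain
restored = insert_zeros(low_rate, 4)   # same length as the original channel
```

In a full implementation the zero-stuffed signal would then pass through the synthesis filter, which fills in the gaps by interpolation.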
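The resynthesis stage can be sketched as a pulse train driving a bank of parallel FIR filters whose outputs are summed with weights. This is a simplified illustration: a single fundamental frequency is shared by all channels, the impulse responses are made-up numbers instead of the result of an analysis, and all names are assumptions made for this example.

```python
def pulse_train(f0_hz, n_samples, sample_rate):
    """Unit impulses spaced one fundamental period apart."""
    period = int(round(sample_rate / f0_hz))
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]

def fir_filter(signal, impulse_response):
    """Direct-form FIR filtering (convolution truncated to the input length)."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(impulse_response):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out

def resynthesize(f0_hz, impulse_responses, weights, n_samples, sample_rate):
    """Weighted sum of parallel FIR filters excited by the same pulse train."""
    excitation = pulse_train(f0_hz, n_samples, sample_rate)
    mix = [0.0] * n_samples
    for h, w in zip(impulse_responses, weights):
        filtered = fir_filter(excitation, h)
        mix = [m + w * y for m, y in zip(mix, filtered)]
    return mix

# Two toy channels; each pulse triggers one overlapped "grain".
output = resynthesize(
    f0_hz=100.0,
    impulse_responses=[[1.0, 0.5], [0.0, 0.25, 0.25]],
    weights=[1.0, 2.0],
    n_samples=20,
    sample_rate=1000,
)
```

Because the impulse responses here are shorter than the pulse period, the output is simply the same weighted grain repeated once per period.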
[ 4 ] Synchronous and quasi-synchronous streams
Synchronous GS (SGS) involves the generation of many streams of grains simultaneously. As the name suggests, in the synchronous approach grains follow each other at regular intervals, with a user-defined delay period between successive grains. Because the successive grains are identical, the output stream forms a periodic function. Therefore, SGS can be analyzed as a form of Amplitude Modulation. As we already know, AM occurs when the shape of one signal (known as the modulator) determines the amplitude of another signal (known as the carrier). This process produces sidebands (new frequencies above and below the frequency of the carrier), as is the case in SGS. In SGS the shape of the envelope function determines the amplitude of the sidebands, while the inverse of the period of the envelope function (most often a Gaussian curve) determines the spacing between the carrier frequency and the sidebands [7.p152]. Consider the case in which we set the duration of the grain at 40 ms, which corresponds to an envelope frequency of 25 Hz (1 / 0.040 s). The sidebands in the output stream are then spaced at 25 Hz intervals. The modulation process within SGS results in the creation of formants (frequency peaks) around the frequency of the carrier and thus makes the synthesis of voice-like sounds possible.
The difference between SGS and quasi-synchronous GS (QSGS) is that in QSGS the intervals between the grains are unequal over the duration of a granular stream. A random deviation factor determines the irregularity of the intervals. In this respect, QSGS is similar to Asynchronous GS, as we will see later. It should be noted that the modulation effects in QSGS are no longer predictable, as the envelope function is no longer a periodic function.
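The sideband spacing follows directly from the inverse of the envelope period, which a short calculation makes concrete. The function name and the 1000 Hz carrier are arbitrary choices for this illustration.

```python
def sideband_frequencies(carrier_hz, envelope_period_s, count=3):
    """Return (lower, upper) sideband frequencies around the carrier."""
    spacing_hz = 1.0 / envelope_period_s
    lower = [carrier_hz - k * spacing_hz for k in range(count, 0, -1)]
    upper = [carrier_hz + k * spacing_hz for k in range(1, count + 1)]
    return lower, upper

# A 40 ms envelope period repeats 25 times per second,
# so the sidebands sit 25 Hz apart on either side of the carrier.
lower, upper = sideband_frequencies(1000.0, 0.040, count=2)
```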
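The difference between the two stream types comes down to how grain onset times are generated. In the sketch below the random deviation factor is modelled as a fraction of the nominal interval, drawn uniformly; that choice, like the function and parameter names, is an assumption made for this illustration.

```python
import random

def grain_onsets(stream_duration_s, interval_s, deviation=0.0, seed=None):
    """Onset times for a grain stream.

    deviation = 0.0 gives a synchronous stream; larger values (below 1.0)
    stretch or shrink each interval by up to that fraction, giving a
    quasi-synchronous stream.
    """
    rng = random.Random(seed)
    onsets = []
    t = 0.0
    while t < stream_duration_s:
        onsets.append(t)
        jitter = rng.uniform(-deviation, deviation) * interval_s
        t += max(interval_s + jitter, 1e-6)  # always move forward in time
    return onsets

synchronous = grain_onsets(1.0, 0.125)
quasi = grain_onsets(1.0, 0.125, deviation=0.5, seed=1)
```

With deviation set to zero every gap equals the nominal interval, which is what makes the envelope function periodic in SGS.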
[ 5 ] Granulated sampled sound streams
In contrast with the previously discussed GS methods, granulation of a sampled or live input sound is purely a time-domain operation. The purpose of granulation is to chop a signal up into grains and then reassemble these grains in a new time order. Digital time granulation is similar in operation to Gabor’s electromechanical time/rate changing devices and therefore leads to two widely used studio techniques, pitch-shifting and time-stretching. For time-stretching, in order to halve the duration of the signal, every other grain is deleted. To double the duration of the signal, each grain is replicated once. The frequency content of each grain remains intact in both cases, while only the time representation changes. For pitch-shifting, in order to shift the pitch of the signal up an octave, the playback sampling rate is doubled and every grain is replicated to restore the duration of the signal. To shift the pitch down an octave, the playback sampling rate is halved, while every other grain is deleted to restore the duration of the signal [7.p177].
More possibilities are available in non-real-time situations, when the input signal is stored in a file. By keeping the signal in a memory table, it is possible to extract grains in any order: sequential, reversed or random. Roads [1.p189] suggests that very interesting sounds can be created when many different stored files are combined and used as sources for the grains. By controlling the distribution of grains across these files, it is possible to create unique sound textures that evolve from one sound to another.
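The deletion and replication of grains described above can be sketched on a plain list of samples. This is a bare illustration under stated assumptions: grains are cut back to back with no envelope or overlap (so a real signal would click at the joins), and the change of playback rate is imitated crudely by dropping or repeating samples. All function names are hypothetical.

```python
def split_into_grains(samples, grain_size):
    """Cut the signal into consecutive grains of grain_size samples."""
    return [samples[i:i + grain_size] for i in range(0, len(samples), grain_size)]

def join(grains):
    """Lay the grains end to end again."""
    return [s for grain in grains for s in grain]

def halve_duration(samples, grain_size):
    """Time-compress by deleting every other grain."""
    return join(split_into_grains(samples, grain_size)[::2])

def double_duration(samples, grain_size):
    """Time-expand by playing every grain twice."""
    grains = split_into_grains(samples, grain_size)
    return join([copy for grain in grains for copy in (grain, grain)])

def octave_up(samples, grain_size):
    """Double the playback rate, then replicate grains to restore the length."""
    faster = samples[::2]  # crude stand-in for doubling the playback rate
    return double_duration(faster, grain_size)

def octave_down(samples, grain_size):
    """Halve the playback rate, then delete grains to restore the length."""
    slower = [s for s in samples for _ in range(2)]  # crude rate halving
    return halve_duration(slower, grain_size)

signal = list(range(8))
shorter = halve_duration(signal, 2)
longer = double_duration(signal, 2)
up = octave_up(signal, 2)
down = octave_down(signal, 2)
```

In both pitch-shifting cases the output has the same number of samples as the input, which is the point of combining the rate change with grain replication or deletion.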
[ 6 ] Asynchronous Granular Synthesis
Curtis Roads developed the Asynchronous Granular Synthesis (AGS) method in 1978, using the MUSIC 5 music programming language. AGS is implemented quite differently from the previously discussed methods. In AGS, grains are no longer produced in linear streams; instead they are produced in asynchronous (irregular) streams and scattered probabilistically or stochastically over specific regions, the clouds [7] (figure 6a). A cloud is the unit that the composer works with, and it can be specified by the following parameters:
Start-time and duration of the cloud.
Grain duration.
Grain waveform.
Frequency band of the cloud.
Density of grains within the cloud (grains per second).
Spatial dispersion of the cloud.
Figure 6a: sonic clouds in Asynchronous Granular synthesis
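The cloud parameters listed above map naturally onto a small data structure. The sketch below is one possible arrangement, not the layout of any particular program: the field names, the uniform random scattering and the representation of spatial dispersion as a channel count are all assumptions made for this illustration.

```python
import random
from dataclasses import dataclass

@dataclass
class Cloud:
    start_s: float           # start-time of the cloud
    duration_s: float        # duration of the cloud
    grain_duration_s: float  # duration of each grain
    waveform: str            # grain waveform name
    freq_low_hz: float       # lower edge of the frequency band
    freq_high_hz: float      # upper edge of the frequency band
    density_per_s: float     # grains per second
    channels: int            # output channels available for dispersion

def scatter_grains(cloud, seed=None):
    """Scatter grain events at random over the cloud's time-frequency region."""
    rng = random.Random(seed)
    count = int(round(cloud.density_per_s * cloud.duration_s))
    events = []
    for _ in range(count):
        events.append({
            "onset_s": cloud.start_s + rng.uniform(0.0, cloud.duration_s),
            "duration_s": cloud.grain_duration_s,
            "waveform": cloud.waveform,
            "frequency_hz": rng.uniform(cloud.freq_low_hz, cloud.freq_high_hz),
            "channel": rng.randrange(cloud.channels),
        })
    events.sort(key=lambda event: event["onset_s"])
    return events

cloud = Cloud(2.0, 1.5, 0.030, "sine", 200.0, 800.0, 40.0, 2)
events = scatter_grains(cloud, seed=0)
```

Each event could then be rendered by a grain generator and mixed into the output at its onset time.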
(i)
AGS: Start-time and duration of the cloud
The start-time and duration are both used to specify the time boundaries of a cloud. When many clouds are present, time boundaries are necessary to specify when one cloud finishes and when the next one follows. Clouds can also run in parallel in time, so their time boundaries can overlap. Furthermore, the duration of a cloud can vary according to probabilistic or stochastic methods.
(ii)
AGS: Grain Duration
The grain duration has a profound effect in AGS. Within a cloud, the grain duration can be constant (all grains have the same duration), random (the grain duration takes random values between two boundaries) or frequency-dependent (the duration is set according to the fundamental frequency of the grain). Normally the grain duration lies in the region of 10 ms to 60 ms. Grains with a duration of more than 100 ms cause an obvious modulation effect in which the center frequency (Fc) of the AM is defined by 1/Dg (where Dg is the grain duration).
On the other hand, a grain duration of less than 50 ms can also produce modulation by-products. As we saw previously, a grain with a duration of 50 ms corresponds to one period of a 20 Hz waveform. With durations of less than 50 ms it is still possible to capture low frequency signals, but at the expense of modulation noise (pops and clicks) below and above the Fc of the grain’s waveform [7.p158]. However, these modulation artifacts can be minimized by using a band-pass filter at the output, centered on the fundamental frequency of the grain’s waveform.
The frequency bandwidth of the grain also depends on the grain duration. Basic acoustic principles tell us that the shorter the duration of a pulse, the greater its bandwidth. The total bandwidth of the grain is inversely proportional to the duration of the grain and can be calculated as: B (bandwidth) = 1 / Dg
(iii)
AGS: Grain Waveform
Grains can have different waveforms in AGS. The choice of the grain waveform has a profound impact on the output. Sine, Square, Saw and Sync (band-limited impulse) waveforms are normally used as grain waveforms. Sampled sound can also be used. In the AGS method, the grains of a cloud are in one of the following conditions:
Monochrome:
In this condition, only one type of waveform is used for all the grains inside a cloud.
Polychrome:
Two or more waveforms are used for the grains; they are randomly selected and evenly distributed inside a cloud.
Transchrome:
The grain waveform evolves from one waveform to another throughout the duration of a cloud.
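The three conditions can be sketched as a per-grain waveform choice. How the transchrome evolution is realized is an assumption here: the sketch uses a statistical crossfade in which the chance of picking the second waveform rises linearly with the grain's position in the cloud. The function and parameter names are hypothetical.

```python
import random

def choose_waveform(scheme, waveforms, position, rng):
    """Pick the waveform for one grain.

    position is the grain's onset as a fraction (0.0 to 1.0) of the
    cloud duration.
    """
    if scheme == "monochrome":
        return waveforms[0]
    if scheme == "polychrome":
        return rng.choice(waveforms)
    if scheme == "transchrome":
        # Assumed behaviour: statistical crossfade from the first
        # waveform to the second over the course of the cloud.
        return waveforms[1] if rng.random() < position else waveforms[0]
    raise ValueError("unknown scheme: " + scheme)

rng = random.Random(0)
palette = ["sine", "saw"]
start_of_cloud = [choose_waveform("transchrome", palette, 0.0, rng) for _ in range(20)]
end_of_cloud = [choose_waveform("transchrome", palette, 1.0, rng) for _ in range(20)]
mixed = [choose_waveform("polychrome", palette, 0.5, rng) for _ in range(20)]
```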
Finally, when the fundamental frequency of the grain is high and the grain waveform is complex, care has to be taken to avoid aliasing, especially when performing GS at the standard 44.1 kHz sampling rate. It is therefore advisable either to move to a higher sampling rate (e.g. 96 kHz) or to limit the choice of grain waveform depending on the fundamental frequency of the grain, in order to respect the Nyquist theorem. In any case, sine waves provide the safest way to apply GS without much concern about possible aliasing effects.
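Whether a given waveform is safe can be estimated by counting how many of its harmonics fit below the Nyquist frequency (half the sampling rate). The helper names and the 5000 Hz example are assumptions made for this illustration.

```python
def max_safe_harmonics(fundamental_hz, sample_rate=44100):
    """Number of harmonics that stay at or below the Nyquist frequency."""
    nyquist_hz = sample_rate / 2.0
    return int(nyquist_hz // fundamental_hz)

def will_alias(fundamental_hz, n_harmonics, sample_rate=44100):
    """True if the highest harmonic lies above the Nyquist frequency."""
    return fundamental_hz * n_harmonics > sample_rate / 2.0

# At 44.1 kHz a 5000 Hz grain can carry only 4 harmonics;
# at 96 kHz the same grain can carry 9.
at_44k = max_safe_harmonics(5000.0)
at_96k = max_safe_harmonics(5000.0, sample_rate=96000)
```

A sine wave has a single harmonic, which is why it is the safest choice at any fundamental below the Nyquist frequency.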
(iv)
AGS: Frequency Band of the Cloud
The large number of grains required to generate a larger audio event, a cloud, demands a method of organizing their frequencies so that evolving sound textures can be made. Therefore, we normally specify an upper and a lower frequency boundary between which the grains will be scattered. It would otherwise be almost impossible to specify a frequency for each grain individually. These frequency bands can have one of the following statuses:
Cumulus:
Grains are scattered homogeneously between the upper and the lower bands.
Stratus:
Grains adapt to specific frequencies specified by the composer [1.p104].
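The two statuses differ only in how each grain's frequency is drawn. In the sketch below, homogeneous scattering is modelled as a uniform random choice in Hz, which is an assumption; the function names and example frequencies are hypothetical.

```python
import random

def cumulus_frequency(low_hz, high_hz, rng):
    """Scatter the grain anywhere between the lower and upper bands."""
    return rng.uniform(low_hz, high_hz)

def stratus_frequency(layers_hz, rng):
    """Snap the grain to one of the frequencies chosen by the composer."""
    return rng.choice(layers_hz)

rng = random.Random(0)
cumulus = [cumulus_frequency(200.0, 800.0, rng) for _ in range(50)]
stratus = [stratus_frequency([220.0, 330.0, 440.0], rng) for _ in range(50)]
```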
(v)
AGS: Grain Density
The grain density parameter defines the number of grains within a cloud and is otherwise known as the Fill Factor (FF). A high density of grains within a cloud produces very rich sound textures, while a low density is useful for simpler sounds. The sonic result of a cloud depends on the frequency bands used in conjunction with the grain density. By increasing the density within a cloud, it is possible to create an effect that depends on the specified frequency bandwidth [7.p169]. For example, if we want to create intense, large clouds, we have to fill a granular cloud completely and set the frequency bandwidth to an octave or more.
However, because the grain duration can vary, simply counting the number of grains per second within a cloud is not enough to measure the cloud’s density. Therefore, three levels of density have been defined, describing the proportion of the cloud that is occupied by grains. A cloud can be:
Sparse:
More than 50% of the cloud is empty of sonic grains.
Filled:
The cloud is completely filled by sonic grains.
Dense:
The cloud is filled by a large number of sonic grains that overlap each other.
However, because AGS by definition produces grains at random times, it is not possible to guarantee that a cloud is completely filled at a specific moment in time. Curtis Roads [7.p169] suggests that in order to create a filled cloud we have to set the Fill Factor of the cloud at 2/Dg grains per second (Dg: grain duration in seconds). For example, if we want to fill a one-second cloud and the duration of the grains is 50 ms, then we need 40 grains (2/0.05) to fill the cloud completely. Roads [7.p169] also suggests that in order to hear each grain as an individual event, the ratio has to be set at 0.5/Dg.
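The two density rules quoted above reduce to a one-line calculation. The function name and the default ratio argument are assumptions made for this illustration.

```python
def grains_needed(cloud_duration_s, grain_duration_s, ratio=2.0):
    """Grain count for a cloud, using a density of ratio / Dg grains per second.

    ratio = 2.0 corresponds to the 'filled' rule and ratio = 0.5 to the
    rule for hearing each grain as a separate event.
    """
    density_per_s = ratio / grain_duration_s
    return round(density_per_s * cloud_duration_s)

# One-second cloud with 50 ms grains.
filled = grains_needed(1.0, 0.050)                # 2 / 0.05 grains per second
separate = grains_needed(1.0, 0.050, ratio=0.5)   # 0.5 / 0.05 grains per second
```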
(vi)
AGS: Spatial Effects
A further AGS parameter relates to the spatial position of the grains within a cloud. This consideration is not normally found in other synthesis methods and provides an extra creative tool for a composer interested in working with a multi-channel audio configuration. There are two ways of using AGS to spread the grains in a multi-channel environment: grains can be scattered randomly across n channels, or an envelope can be used to distribute the grains between the n channels.
Furthermore, the spatial position of the grains can be made to depend on the frequency of the grain or even on the grain duration. It is possible to assign different frequency bands to different channel outputs. For example, in a 5.1 surround system it is possible to send grains between 1000 and 1500 Hz to the Left-Rear speaker and grains between 1500 and 2000 Hz to the Right-Rear speaker, leaving the rest of the grains in the front channel positions. It is then possible to apply an envelope to the frequency bands that drive each grain to a channel position, thus achieving an even more evolving spatial distribution between the output channels.
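The frequency-dependent routing in the 5.1 example can be sketched as a lookup over band definitions. The channel labels, the half-open band edges (so that 1500 Hz belongs to only one band) and the random choice among the front channels are assumptions made for this illustration.

```python
import random

# (low_hz, high_hz, channel): low edge included, high edge excluded.
BAND_ROUTES = [
    (1000.0, 1500.0, "left_rear"),
    (1500.0, 2000.0, "right_rear"),
]
FRONT_CHANNELS = ["left_front", "center", "right_front"]

def route_by_frequency(frequency_hz, rng):
    """Send a grain to the channel assigned to its frequency band."""
    for low_hz, high_hz, channel in BAND_ROUTES:
        if low_hz <= frequency_hz < high_hz:
            return channel
    return rng.choice(FRONT_CHANNELS)

def route_randomly(n_channels, rng):
    """Scatter a grain at random across n output channels."""
    return rng.randrange(n_channels)

rng = random.Random(0)
rear_left = route_by_frequency(1200.0, rng)
rear_right = route_by_frequency(1700.0, rng)
front = route_by_frequency(500.0, rng)
```

Changing the band edges over time would give the envelope-driven variation mentioned in the text.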
--------------------------------------------------------------------------
REFERENCES:
[1] Roads C., Microsound
[3] Green D. 1971 “Temporal auditory acuity” Psychological Review 78(6): 540-551
[4] Truax B. 1988 “Real-time granular synthesis with a DSP computer” Computer Music Journal 12(2): 14-26
[5] Keller D. and Rolfe C. 1998 “The corner effect” Proceedings of the XII Colloquium on Musical Informatics / www.thirdmonk.com
[6] Roads C. 1978a “Automated Granular synthesis of sound” Computer Music Journal 2(2): 61-62
[7] Roads C. 1991, Representation of musical signals
[8] De Poli G., Representation of musical signals
[9] Arfib D. and Delprat N. 1993 “Music Transformations through modification of time-frequency images” Computer Music Journal 17(2): 66-72
[10] granularsynthesis.com
[11] find reference for Wiener at the web
[30] http://sound.media.mit.edu/mpeg4/sa-tools.html#SAOL [SAOL language]
Dimitris Barnias 2004
©sonicspace.org 2005