This site discusses, and has examples of, a kind of spectrogram video geared towards looking at and exploring the psychoacoustics of music.
The site was put together by Norm Spier.
For the purposes of looking at performed music, consider a spectral analysis (i.e., pitch breakdown) for a fixed time that looks like:
On this type of spectrogram, when you go around clockwise 1/12 of a revolution, you go up a half-step. The half-steps (against a particular tuning; here the piano was tuned to A3=222hz) lie on the white rays emanating from the center. All occurrences of the same note in different octaves therefore lie on the same ray, and going out one level on the blue spiral curve brings you up an octave. The labels of note and octave (e.g. "E8") give the octave for the note shown on the ray on the outermost loop of the spiral curve. (E.g., for the note E, the example shows some E3, E4, E5, and E6, with E5 the strongest.)
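The layout just described can be sketched in a few lines of Python. This is only an illustration, not the code behind the videos: the function name `spiral_position`, the choice to put the reference pitch at angle 0, and the default reference of 220 hz are all assumptions.

```python
import math

def spiral_position(freq_hz, ref_hz=220.0):
    """Map a frequency to (half_steps, chroma, turns, angle_deg) on the spiral.

    ref_hz is a hypothetical reference pitch placed at angle 0.
    """
    half_steps = 12.0 * math.log2(freq_hz / ref_hz)  # signed distance from the reference
    chroma = half_steps % 12.0                       # position within the octave, 0 up to 12
    turns = math.floor(half_steps / 12.0)            # whole octaves above (or below) the reference
    angle_deg = chroma * 30.0                        # 1/12 revolution = 30 degrees per half-step
    return half_steps, chroma, turns, angle_deg

for f in (220.0, 440.0, 880.0, 220.0 * 2 ** (7 / 12)):
    hs, chroma, turns, angle = spiral_position(f)
    print(f"{f:8.2f} Hz: {hs:6.2f} half-steps, turn {turns}, angle {angle:6.1f} deg")
```

Frequencies an octave apart get the same angle and differ only in the turn count, which is why every octave of a note falls on one ray.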
Now, if you do a spectrogram like this every tenth of a second or so, and look at it synchronized with the playing music, you get a dynamic picture of the music.
(People familiar with the science of hearing should note that what you are looking at in such a dynamic spectrogram is the raw data that the brain gets from the ear.)
Here
[.mov]<-(choose format)->[.avi]
is a 30-second segment of recorded music (from Bach's Goldberg Variations) as such a dynamic spectrogram. (NOTE: ".mov" format gives a sharper image, but on Windows, you need Apple's Quicktime player. ".avi" should work on any Windows machine. On a Windows computer, if you don't already have it, you can get the Quicktime player needed for the sharper ".mov" for free by clicking here. These files are about 4Mb each, and take about 20 seconds to load on a high-speed connection, longer on a low-speed one.)
Here
[.mov]<-(choose format)->[.avi]
is another segment of recorded music (about 35 seconds from the Brahms Requiem movement 2).
NOTE: You can pause these sample clips, go backwards, forwards, single frame, even in the I.E. browser view. To do so, use the controls on the bottom right for Apple Quicktime, and elsewhere for other players.
Note that what you are looking at in such a spectrogram is not just the notes in the score, but also the harmonics of those notes, as present in the recorded music. The amount of such harmonics will depend on the particular instruments and how they are played. (Where, exactly, will the harmonics be? --click here to see.) A detail here that winds up being important is that each fundamental and each harmonic is, by definition, a sine-wave shape. This is important because the ear (via the basilar membrane) discriminates and separates out the individual sine waves in a sound, i.e. it separates out the tones and harmonics. The spectrograms likewise separate out the sine waves, and display the strength of each sine component.
The fact that the spectrograms contain both the notes and some harmonics can be a bit confusing at first. It helps to look at a single note, intervals, and chords, which I have spectralized and which can be examined below (under MORE SAMPLE SPECTROGRAM CLIPS). One of the basic observations is that the patterns of a single note, or a consonant chord, have energy at a fundamental, at 4 half-steps up, and at 3 more up, i.e. 7 half-steps up (counting angle only, not octave). However, when a note is played softly, some of the harmonics may not show up, and either the 4 or the 3-more half-step-up position may not show. (They are caught by my analytical method, actually, but are below the start of the scale that I have used on my spectrogram.)
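Where the "4 up, then 3 more up" pattern comes from can be checked with a short calculation. The sketch below (the function name `harmonic_chroma` is made up for illustration) converts each harmonic number to half-steps above the fundamental and then drops whole octaves, leaving only the angle around the spiral.

```python
import math

def harmonic_chroma(n):
    """Half-steps above the fundamental for harmonic n, and its angle-only offset (mod 12)."""
    half_steps = 12.0 * math.log2(n)
    return half_steps, half_steps % 12.0

for n in range(1, 7):
    hs, chroma = harmonic_chroma(n)
    print(f"harmonic {n}: {hs:5.2f} half-steps up, ray offset {chroma:5.2f}")
```

Harmonics 2 and 4 land on the fundamental's own ray, harmonics 3 and 6 land about 7.02 half-steps around, and harmonic 5 lands about 3.86 half-steps around. So the energy sits close to, but not exactly on, the rays 4 and 7 half-steps up.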
Unification of Single Sounds: As in these spectrograms, the ear does output the information about all the different harmonics of each separate sound as separate signals, and, in its marvelousness, the brain puts it all together into the correct picture of actual distinct sounds. From the wonderful book edited by Perry Cook (Amazon | B & N), we learn this is done using the perceptual mechanism of "common outcome" -- that is, the harmonics of a given sound go through parallel changes in volume and frequency-shift (as well as having a prescribed harmonic frequency relationship) -- this allows the reconstruction. With a little practice, you can somewhat see the parallel changes and separate out different sounds in these spectrograms. (It is much harder to also incorporate the harmonic frequency relationship information -- everything is happening too fast.) Though the visual perceptive apparatus can't sort this out all that well (and further, my frame rate is a bit slow to give the visual perceptive apparatus its best shot) -- the aural perceptive apparatus apparently can.
NORM'S MUSIC VISUALIZER (Interactively play and display MIDI files with chord- and key-recognition tools, to aid comprehension of tonal aspects of the music, with optional overlaid pitch feedback for humming or singing along, and several other musical immersion features as well.)
MORE SAMPLE SPECTROGRAM CLIPS--(EACH TAKES 20 SECS OR MORE TO DOWNLOAD AT HIGH SPEED!) (Each plays for about 30 seconds)
[.mov] --
[.avi]
MOVING SINE AND FIXED SINE. The moving sine starts more than an octave down from the fixed one, and goes to more than an octave above. It is not dissonant when anywhere within a few half-steps of an octave down, or anywhere within a few half-steps of an octave above. It is only dissonant within a few half-steps of the fixed note (but not when almost exactly at the fixed note).
[.mov] --
[.avi]
of a playing of intervals on the piano (middle C is always the lower note):
Then imperfect-consonant: major 3rd (4 h-s), minor 6th (8 h-s), minor 3rd (3 h-s), major 6th (9 h-s)
Then dissonant: major 2nd (2 h-s), minor 7th (10 h-s), minor 2nd (1 h-s), major 7th (11 h-s), augmented 4th (6 h-s)
Note: The explanation in Cook's book of the presence of close tones seems to hold. However, if it also has to do a bit with the more dissonant patterns being simply unlike the patterns you get more used to from listening to harmonics in nature -- like the human voice (energy at the rays 4 and 7 half-steps up from the fundamental -- e.g. like the first sound--the unison), this wouldn't surprise me. (My moving sine example doesn't seem to support the latter, but I'd need to see more variations on that to be sure. And pass them through expert ears, not my own.)
[.mov] --
[.avi]
SOME TRIADS: a C, followed by a major triad, a minor triad, a diminished triad, and an augmented triad (all with C as root). Again, the explanation in Cook's book would do it. And again, I wonder if the deviation from the standard harmonic pattern adds some bite for the dissonant ones.
[.mov]--
[.avi]
APPLAUSE. The spectrum is diffuse (noiselike), running over about 3 octaves. (There is also at times some energy at a very low frequency. This is not the applause, but some recording equipment rumble.)
[.mov]--
[.avi]
SOME INDIAN MUSIC. There is a Sitar (Plucked fretted instrument playing the melody in an improvised fashion within the bounds of the "raga" or formula), Tamboura (Drone: Playing open-string always, tuned to the main tones of the raga), and a Tabla (tuned-pitch hand drums, tuned to some main tones of the raga). This is the Hindustani variant of Indian music, and is in the Dadra raga.
An Indian raga is roughly a formula combining particular notes and orders of playing of those notes. The notes are roughly (but not exactly) a subset of those spaced as within a tempered Western scale. Thus, after setting my software to show A at 227.5 hz (an unusual tuning for Western music), the fundamental tones of the music appear roughly where they would in Western Music -- that is, on my 12 outward rays.
[.mov] --
[.avi]
from the Beethoven 4th Piano Concerto.
[.mov] --
[.avi]
from the Beethoven 2nd String Quartet.
THANK YOU NAXOS: I am grateful to Naxos for making available a number of its high-quality professional recordings for this project. Here is the Naxos site.
TECHNICAL NOTES FOR PEOPLE WITH A MATH/ENGINEERING BACKGROUND
I have not used the somewhat standard Fourier techniques to do these spectrograms. Instead I use a battery of tuned damped mass-springs, which seems closer in functioning to the basilar membrane than Fourier transforms do. Further, the efficiency of the FFT does not come into play so much, since the evenly spaced musical intervals are not evenly spaced in frequency.
I have no knowledge of how the method I have used might compare with using windowed Fourier transforms. (My guess is that the results would be roughly similar. However, this comparison does not apply to DFT/FFT -- my technique is much sharper in tone distinctions.)
With Fourier transforms, there is a well-known tradeoff called the Uncertainty Principle (a purely mathematical fact about signals, not a statement about quantum physics, although Heisenberg's principle rests on the same mathematics) where the shorter the sample in time, the less precise the image in frequency. This tradeoff is clearly visible when I look at the examples in my method as well. In my dynamic spectrograms, I choose parameters to place a cap on the rate of spring slow-down after the sound signal is removed (keeping lag or sluggishness of response under control). Doing so yields frequency images which are less and less sharp (even on synthetic pure sines) as we go down in frequency. (Some charts in Cook confirm that the same type of thing happens in the basilar membrane. Further, the wider critical bandwidth (in terms of half-steps) may be another manifestation of this.)
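Here is a rough numerical sketch of why capping the decay time costs sharpness at low frequencies. It uses the standard textbook approximation for a lightly damped oscillator, not necessarily the parameters used for the videos. With damping constant c and mass m (as in the mass-spring equation given further down this page), the free oscillation dies away as exp(-(c/2m)t), and the resonance peak is about c/(2*pi*m) cycles per second wide at half power, whatever the tuned frequency. The 50 ms decay time and the function name `bandwidth_half_steps` are assumptions for illustration.

```python
import math

def bandwidth_half_steps(center_hz, decay_time_s):
    """Approximate resonance width, in half-steps, of a lightly damped oscillator.

    decay_time_s is the amplitude e-folding time 2m/c (an assumed cap).
    """
    gamma = 2.0 / decay_time_s              # c/m, in 1/s
    width_hz = gamma / (2.0 * math.pi)      # half-power width; the same in Hz at every pitch
    lo = center_hz - width_hz / 2.0
    hi = center_hz + width_hz / 2.0
    return 12.0 * math.log2(hi / lo)        # convert the fixed Hz width to half-steps

for f in (110.0, 220.0, 440.0, 880.0, 1760.0):
    print(f"{f:7.1f} Hz: about {bandwidth_half_steps(f, 0.05):.2f} half-steps wide")
```

Because the width is fixed in cycles per second while a half-step shrinks in cycles per second as you go down, the same resonator design smears across about twice as many half-steps for each octave you descend.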
The precise modelling I use for each damped mass-spring is:
x" = -(k/m)x - (c/m)x' + s/m
here, x(t) is the one-dimensional position as a function of time, k is the spring constant, c is a damping constant, m is the mass, and s(t) is the one-dimensional force placed on the mass by the sound vibration in the air.
This mass-spring model and differential equation is covered in virtually all basic physics and differential equations texts, and the solution to the equation is given (with or without proof).
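To make the equation concrete, here is a minimal simulation sketch of a few such mass-springs driven by a pure sine. It is not the software used for the videos: the step method (semi-implicit Euler), the 50 ms decay time, the step rate, and the function name `resonator_peak` are all assumptions chosen to keep the example short.

```python
import math

def resonator_peak(tuned_hz, drive_hz, decay_time_s=0.05,
                   duration_s=0.5, steps_per_s=20000):
    """Drive one damped mass-spring with a sine and return its late peak |x|."""
    m = 1.0                                   # mass (arbitrary units)
    k = m * (2.0 * math.pi * tuned_hz) ** 2   # spring constant giving the tuned frequency
    c = 2.0 * m / decay_time_s                # damping chosen from the assumed decay time
    dt = 1.0 / steps_per_s
    n_steps = int(duration_s * steps_per_s)
    x, v, peak = 0.0, 0.0, 0.0
    for i in range(n_steps):
        s = math.sin(2.0 * math.pi * drive_hz * i * dt)   # driving force s(t)
        a = -(k / m) * x - (c / m) * v + s / m             # x'' from the equation above
        v += a * dt                                        # semi-implicit Euler step
        x += v * dt
        if i >= n_steps // 2:                              # ignore the start-up transient
            peak = max(peak, abs(x))
    return peak

drive = 220.0
for offset in (-4, -1, 0, 1, 4):              # resonators tuned in half-steps around the drive
    tuned = drive * 2.0 ** (offset / 12.0)
    print(f"tuned {offset:+d} half-steps ({tuned:7.2f} Hz): peak {resonator_peak(tuned, drive):.3e}")
```

The resonator tuned to the driving frequency swings the most, and the response falls off as the tuning moves away. Plotting that peak for a whole bank of resonators, one per small step in pitch, gives one frame of a spectrogram.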
It is important to comment that, of course, it is not in the nature of biological things, like a basilar membrane, to be precisely engineered so that neurological detectors can be pre-wired to know that this position on the membrane resonates precisely 2 octaves above where this other point resonates. That correspondence would be learned, either from music, or from simply the experience of hearing sounds in nature. Thus, the spiral layout that I have used, with outward rays representing the same note in all octaves, presents the information not quite raw to the nervous system, but actually after a bit of neurological processing.
What Are the Frequencies of the Notes?: "A" right below middle C has a fundamental that is a sine of 220 cycles per second. Each time you go up a half-step, you multiply this by the 12th root of 2 (about 1.059463094359), down a half step, divide by the 12th root of 2. In each case, the harmonics are sines (in varying strengths, dying out as you go up) at 2 times the fundamental frequency, 3 times, 4 times, 5 times. Thus, the B right below middle C has a fundamental of 246.94 cycles per second, with harmonics 493.88, 740.82, 987.77, etc. DETAIL: This way of determining notes is the common standard way, called the tempered scale. Sometimes instead of the A below middle C being 220 cycles per second, it is a few cycles different. (My spectrogram software used to make the videos was adjustable, and sometimes I adjusted to the apparent tuning used by the ensemble in the recording -- e.g. 222hz in the first screenshot.)
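The arithmetic above is easy to reproduce. In this small sketch the function names and the `a_hz` parameter are made up for illustration; `a_hz` stands in for the adjustable tuning just mentioned.

```python
SEMITONE = 2.0 ** (1.0 / 12.0)   # the 12th root of 2, about 1.059463

def note_frequency(half_steps_from_a, a_hz=220.0):
    """Tempered-scale frequency a given number of half-steps above (or below) the reference A."""
    return a_hz * SEMITONE ** half_steps_from_a

def harmonics(fundamental_hz, count=4):
    """The fundamental followed by its integer multiples: 1x, 2x, 3x, ..."""
    return [fundamental_hz * n for n in range(1, count + 1)]

b = note_frequency(2)                            # B is two half-steps above A
print([round(f, 2) for f in harmonics(b)])       # [246.94, 493.88, 740.82, 987.77]
print(round(note_frequency(2, a_hz=222.0), 2))   # the same B with A tuned to 222 hz
```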
A discerning observer's question: why sines?: The notion of frequency and harmonics implicitly defines that a vibration of a certain frequency is a sine function of that frequency. Why does everybody seem to choose this (the set of sine waves) as our "basis"? Most books just start out by looking at the sine as the fundamental wave, without saying why.
I am not sure of all of the reasons, but the fundamental and best reason is that the sine is what the ear perceives. As above, the psychoacoustics literature seems to bear this out. (However, the common choice of sine probably predates the psychoacoustics literature.)
A second reason is that the general model for most acoustical transformation, the linear time-invariant system (supported by physics and usually reasonably accurate), preserves sines (just shifting them and changing their amplitude). Incidentally, since linear time-invariant systems are exactly those systems describable as the effect of an impulse response (i.e., essentially a large linear combination of shifted versions of the input wave), the fact that they preserve sine waves boils down to the well-known elementary trigonometric identity cos(x+y)=cos(x)cos(y)-sin(x)sin(y).
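This preservation of sines can be checked numerically. The sketch below feeds a cosine through a small discrete-time system and compares the result with a single scaled and shifted cosine of the same frequency. The impulse response values are made up for illustration, and the input is treated as a cosine that has been running forever.

```python
import cmath
import math

def filter_output(h, omega, n):
    """Output sample n of an LTI system with impulse response h fed cos(omega * n)."""
    # A linear combination of shifted copies of the input wave.
    return sum(h_k * math.cos(omega * (n - k)) for k, h_k in enumerate(h))

h = [0.5, 0.3, -0.2, 0.1]          # an arbitrary (made-up) impulse response
omega = 2.0 * math.pi * 0.05       # input frequency in radians per sample

# Frequency response at omega: one complex number gives the gain and phase shift.
H = sum(h_k * cmath.exp(-1j * omega * k) for k, h_k in enumerate(h))
gain, phase = abs(H), cmath.phase(H)

for n in range(5):
    direct = filter_output(h, omega, n)
    predicted = gain * math.cos(omega * n + phase)   # same frequency, new amplitude and phase
    print(f"n={n}: output {direct:+.6f}, scaled/shifted cosine {predicted:+.6f}")
```

The two columns agree because each shifted term cos(omega*(n-k)) expands, by the identity above, into a mix of cos(omega*n) and sin(omega*n), and any such mix is itself one cosine of the same frequency.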
If anyone knows of any other reasons for using sines, please let me know.
SPIRAL REPRESENTATION: Of course it's not new. The spiral representation is pretty obvious, so one does not expect it is new. Indeed, I have bumped into a few people who have used it recently, and the book by Perry R. Cook indicates that the German physicist Moritz Drobisch proposed a helical representation (essentially the same thing--just pull up my flattened spring to form a stretched-out spring -- a helix). I expect the representation is even older than that, and I do know of one of my old Math professors who would be pretty surprised if Archimedes didn't think of this arrangement. (Oh, by the way: there is a terminology for the angular position around the spiral or helix (i.e. the note without reference to octave) -- it is called the chroma of the tone.)
My new Windows-based software allows you to look at your own live or recorded sounds in the same fashion as the videos on this page, in real time.
It supports either the spiral representation as in the dynamic spectrograms on this page, or a multi-line (one octave per line) display.
The new software has a mode for directly displaying the pitch when you have single sounds. It also can analyze separately the sound from two separate devices, so you can check your pitch as you sing or play along.
Click on the link in this box for information.
About me, Norm Spier:
I am a free-lance mathematical statistician and computer programmer, living near Binghamton, New York, U.S.
DONATIONS (Suggested: 10 dollars / Euros or less)
to support this site and other music-related software and projects are accepted and appreciated
Ear Training Software:
I have, and recommend, EarMaster ear training software. These links, through Amazon, seem to be for the same product that I have: EarMaster 5. The prices are different: one through Amazon direct, one through a sub-vendor.
Schoenberg is a classic, at some points articulate, at others unclear. I have placed it here because it has considerable reference to overtones as explanations for the rules of harmony. However, some of these explanations may be speculative!