Explanation of Screenshot 1 of Spectratune



EXPLANATION OF SCREENSHOT 1 (Main Spectratune Panel):

Importantly, the way I have the program set for the picture, notes go up like reading: as you go right (within an octave range), and as you go down (each new row is the next octave). (You may think I have up and down inverted from what is natural, but somehow the analogy for Westerners of the way we read seemed most natural to me. Anyway, I actually have "up goes up" as an option, so you can run it that way if you like, and that's what I did for the videos on this site.)

The example above was taken with my electronic keyboard feeding my computer sound-card as device 1. I also happen to be listening to that keyboard with headphones, and humming along and matching pitch into my webcam microphone, which is set as device 2.

The keyboard is playing the E right above middle C. What shows in the spectral analysis for the keyboard, in blue, are the note (fundamental) and the first 12 overtones. (The electronic keyboard was set to a saxophone -- a timbre rich in overtones. On, say, the piano setting of the keyboard, the higher overtones are less conspicuous. For the sax setting that I did use, there are in fact some overtones beyond the first 12 visible in that last half-octave when I raise the "Plot Gain" a little beyond what I have it at for the screen shot.)
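To make the fundamental and its 12 overtones concrete, here is a minimal sketch that lists their frequencies and the nearest equal-tempered note for each. It is not Spectratune's code. It assumes A4 = 440 Hz tuning and overtones at exact whole-number multiples of the fundamental; the function names `nearest_note` and `partials` are made up for illustration.

```python
import math

A4_HZ = 440.0  # assumed tuning reference
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_note(freq_hz):
    """Return (note name with octave, deviation in cents) for a frequency."""
    # Position in half-steps on the MIDI scale, where A4 is note 69.
    midi_float = 69 + 12 * math.log2(freq_hz / A4_HZ)
    midi = round(midi_float)
    cents = 100 * (midi_float - midi)
    name = NOTE_NAMES[midi % 12] + str(midi // 12 - 1)
    return name, cents

def partials(fundamental_hz, overtone_count=12):
    """Fundamental plus overtones, assumed to be exact whole-number multiples."""
    return [fundamental_hz * k for k in range(1, overtone_count + 2)]

e4 = A4_HZ * 2 ** (-5 / 12)  # E above middle C, about 329.63 Hz
for k, freq in enumerate(partials(e4), start=1):
    name, cents = nearest_note(freq)
    label = "fundamental" if k == 1 else f"overtone {k - 1}"
    print(f"{label:12s} {freq:8.1f} Hz  {name:4s} {cents:+6.1f} cents")
```

Several of the overtones land noticeably off the nearest note (the 6th overtone is roughly 31 cents flat of it), so an overtone peak sitting a little off a note line does not by itself mean the instrument is out of tune.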

Because the keyboard generates the sax with a little vibrato, if you watch the fundamental and its overtones in motion, they actually move up and down in complete synchrony by about 1/5 of a half-step. At the moment I snapped the screen shot, the vibrato was putting the true frequency just a tad above the note.
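A small sketch of why the fundamental and overtones move together on the display: a pitch bend changes each overtone by a different number of Hz, but by the same number of half-steps. The 329.63 Hz fundamental and the 0.2 half-step depth are taken from the description above; the helper names are hypothetical, and overtones are assumed to be exact multiples of the fundamental.

```python
import math

def shifted(freq_hz, halfsteps):
    """Frequency after moving up by a (possibly fractional) number of half-steps."""
    return freq_hz * 2 ** (halfsteps / 12)

def interval_in_halfsteps(from_hz, to_hz):
    """Size of the interval between two frequencies, in half-steps."""
    return 12 * math.log2(to_hz / from_hz)

fundamental = 329.63  # E above middle C, in Hz
depth = 0.2           # about 1/5 of a half-step
bent = shifted(fundamental, depth)

for k in (1, 2, 3, 13):  # fundamental and a few of its multiples
    hz_change = k * bent - k * fundamental
    steps = interval_in_halfsteps(k * fundamental, k * bent)
    print(f"multiple {k:2d}: {hz_change:6.2f} Hz change = {steps:.3f} half-steps")
```

On a display laid out in half-steps, every line therefore shifts by the same distance at the same moment.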

If you were using the blue spectral display above to check tuning or intonation, you would look at the fundamental (the lowest peak) to get the exact note, rather than at an overtone.

When you have a single sound (with overtones, as there usually are, of course), you can also use the single-pitch detection mode, which works like a normal chromatic tuner. (It uses the same kind of algorithm, called autocorrelation.) This is shown above in red for the keyboard. (Beware that the autocorrelation algorithm is actually imperfect, and in particular it occasionally picks out the right note in the wrong octave -- whether in this software product, or another chromatic tuner product.)
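For readers curious what autocorrelation-based pitch detection looks like, here is a minimal sketch. It is not Spectratune's implementation: the function names, the 60 to 1000 Hz search range, and the `keep` threshold are all assumptions chosen for illustration. The idea is to slide the signal against a delayed copy of itself and find the delay (lag) at which the two line up best; that lag is one period of the note.

```python
import math

def autocorrelation(samples, min_lag, max_lag):
    """Raw autocorrelation score for each lag in [min_lag, max_lag]."""
    n = len(samples) - max_lag  # use the same number of terms for every lag
    return {
        lag: sum(samples[i] * samples[i + lag] for i in range(n))
        for lag in range(min_lag, max_lag + 1)
    }

def detect_pitch(samples, sample_rate, fmin=60.0, fmax=1000.0, keep=0.9):
    """Estimate pitch in Hz from the first strong autocorrelation peak."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    scores = autocorrelation(samples, min_lag, max_lag)
    strongest = max(scores.values())
    if strongest <= 0:
        return None  # silence or nothing periodic in range
    # Peaks repeat at 2, 3, ... periods, which is where octave errors come from.
    # Taking the shortest lag that is nearly as strong as the best one is a
    # common heuristic (an assumption here, not necessarily what a tuner does).
    for lag in range(min_lag + 1, max_lag):
        is_peak = scores[lag] >= scores[lag - 1] and scores[lag] >= scores[lag + 1]
        if is_peak and scores[lag] >= keep * strongest:
            return sample_rate / lag
    return None

# Synthetic test tone: E above middle C with two overtones.
rate = 22050
f0 = 329.63
tone = [
    sum(a * math.sin(2 * math.pi * k * f0 * i / rate)
        for k, a in ((1, 1.0), (2, 0.5), (3, 0.25)))
    for i in range(2048)
]
print(detect_pitch(tone, rate))
```

Because only whole-sample lags are tried, the estimate is a little coarse; real tuners typically refine it by interpolating around the peak.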

As I indicated, I was humming along trying to match the note in the webcam microphone when I took the shot. I am only running the single-pitch detection on the webcam microphone channel -- no spectrogram. The single pitch detection is showing as the yellow arrow. It is picking up only my humming and not the keyboard because I am listening to the keyboard through headphones.

You might wonder why I was humming along on the webcam microphone and not a microphone plugged into the soundcard. The reason is I wanted a separate analysis for the humming, and didn't want an analysis of the humming mixed in with the keyboard. With my soundcard, at least, there is no way to separate the sound-card microphone sound from the sound of anything else running through the sound card. (In my screen shot of the device panel below, this fact shows up in that all sound-card signals for computer analysis come through the "Line In/Mic In" "Windows sound input device". Your sound card might be more flexible.) Anyway, if you wanted an analysis from a separate mic of higher fidelity than a webcam mic, I believe you could put in a second sound-card. (You wouldn't need higher fidelity just to sing along, but if you were looking, for whatever reason, at high overtones while singing along, then you would.)

The spectrum display (blue above) works no matter how many sounds are present. When things are more complicated (say a singer with an instrument, or Peter, Paul, and Mary with several instruments), it gets a little difficult to figure out what's what. One hint is (a) that a single note always shows the straight-down pattern of the fundamental, then the first overtone one octave below, and the 3rd overtone one octave below that, plus occasionally the 7th overtone and 15th overtone straight down below those. Another hint, when you see it changing in time, is (b) that overtones from the same sound move together. You can often pick this up. Finally, (c) when you have more time to look at real detail, the overtones from any single sound must follow the same pattern of positions as in the screen shot (except with additional overtones after the 12th), and using this information, you can often deduce additional sounds. (Frequently, while looking at string quartet recordings or the audio from MIDI files, after finding the lowest fundamental, I can deduce two other fundamentals within the first two octaves down from that first fundamental.)
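Hint (a) can be checked with a little arithmetic. The sketch below assumes the layout described earlier (one row per octave, 12 half-steps across each row) and overtones at exact whole-number multiples of the fundamental. The function name `grid_position` and the choice of middle C (261.63 Hz) as the start of row 0 are made up for illustration; they are not Spectratune settings.

```python
import math

def grid_position(freq_hz, reference_hz):
    """Return (row, column): one row per octave, 12 half-step columns per row.

    Row 0 starts at reference_hz. The column is in half-steps and may be
    fractional.
    """
    halfsteps = 12 * math.log2(freq_hz / reference_hz)
    row = math.floor(halfsteps / 12)
    column = halfsteps - 12 * row
    return row, column

middle_c = 261.63     # assumed start of row 0
fundamental = 329.63  # E above middle C

for multiple in range(1, 17):
    row, column = grid_position(multiple * fundamental, middle_c)
    label = "fundamental" if multiple == 1 else f"overtone {multiple - 1}"
    print(f"{label:12s} row {row}  column {column:5.2f}")
```

The fundamental and overtones 1, 3, 7, and 15 (multiples 2, 4, 8, and 16) all come out in the same column on successive rows, which is the straight-down pattern. The other overtones land in different columns.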

The ear actually works much like the spectrogram. There is a long curled thing in the cochlea of the ear called the basilar membrane that vibrates at different sections corresponding to the fundamental and all overtones present. Several thousand nerves transmit information to the brain about what sections are vibrating. The brain then puts together single sounds basically by method (b) above. With my software, you won't be able to do as good a job at putting things together as your brain does with the ear, partly because my software doesn't respond as quickly as the basilar membrane and auditory nerves. But you get some idea what it does by looking at the software.

(NOTE: The last paragraph is a slight oversimplification. At the lowest frequencies, there is evidence that the brain may also use, in addition to or instead of the information about WHERE the basilar membrane is vibrating, an actual COUNT of the vibrations. The count would be transmitted via neural firing frequency.)

For deaf folks with auditory nerves still intact, a cochlear implant actually works by taking something like the signals from a scattering of points across the spectrogram, and sending them to the appropriate nerves along the basilar membrane.

Some may think the spectrograms (whether from my program, or another spectrogram program) are defective because the overtones don't show on the graph in one sharp point, but rather are a little spread out. But that actually parallels what's going on in the ear on the basilar membrane -- which is again the data that the brain gets. The basilar membrane won't vibrate in just one precise point (even if fed a perfect sine wave), but will vibrate in a small zone, with the amount of vibration peaking at one point. The ear doesn't work by detecting absolutely precise sine waves -- it works by picking up where the basilar membrane is vibrating and figures out the pattern.

As a technical note, spectrograms work by breaking signals down to sine waves. One might wonder what is the significance of the sine waves -- why sines, isn't the choice by all those engineers of that mathematical sine shape arbitrary? No, it's not arbitrary. It works out that that's what the ear, at the basilar membrane, picks up. It vibrates at sections according to what sines are present.
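One standard way to break a signal down into sine waves is the discrete Fourier transform (DFT). The sketch below is a slow, direct version for illustration only; whether Spectratune computes its spectrum exactly this way is an assumption, and the sample rate, snippet length, and helper names are invented. It also shows one reason peaks look spread out: a sine whose frequency falls between two analysis bins shows up in several neighboring bins.

```python
import cmath
import math

def sine(freq_hz, amplitude, count, sample_rate):
    """A snippet of a pure sine wave."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(count)]

def dft_magnitudes(samples):
    """Magnitude of each DFT bin from 0 up to half the sample count."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(samples))) / n
        for k in range(n // 2)
    ]

rate = 8000
count = 400  # bins are rate / count = 20 Hz apart

# Two sines that sit exactly on bins 16 (320 Hz) and 32 (640 Hz).
on_bin = [a + b for a, b in zip(sine(320, 1.0, count, rate),
                                sine(640, 0.3, count, rate))]
mags_on = dft_magnitudes(on_bin)

# One sine halfway between bins 16 and 17 (330 Hz).
mags_off = dft_magnitudes(sine(330, 1.0, count, rate))

for k in range(14, 20):
    print(f"{k * rate // count:4d} Hz  on-bin {mags_on[k]:.3f}  off-bin {mags_off[k]:.3f}")
```

In the on-bin case each sine shows up in exactly one bin (at half its amplitude, with this scaling). In the off-bin case the energy is shared among the neighboring bins, peaking at the two nearest.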

Oh, I need to explain the first plot down from the top. That is the autocorrelation used in the "single-pitch detection" algorithm. (The plot is split into halves, one for each device. At the exact moment I snapped the shot, the autocorrelation for device 1 was not on screen.) The autocorrelation is not that interesting, but I put it there because it can help you figure out when the single-pitch detection will work well, and make adjustments, or move your mic, to make it work well. (It will work well when the plot hits very near the top line in clear places, rather than being kind of ambiguous. In this case, for device 2 it is hitting in 7 clear places, and the single-pitch-detection algorithm is working well on that device. For device 1 there was a similar situation, and, thus, the red detected-pitch arrow is showing, and agreeing with the fundamental in the spectrogram.)

Also, I may need to explain about the amount of each frequency that the spectrogram shows. It is in decibels (dB), which is on a "logarithmic scale". Every time you go up 10 dB, the sound has 10 times as much power. (If you go up 20 dB, the sound has 100x as much power; up 30 dB, 1000x as much power.) Now, how many dB of power range (i.e. ratio) the full range of my plot shows depends on the dynamic-range setting you make. In the picture it's 74 dB, so the range in the screen shot above runs over a power ratio of about 25,000,000 to 1 (10 raised to the power 7.4). From "eyeballing" it, it looks like the first overtone is about 30 dB down from the fundamental, and so the first overtone has about one thousandth as much power. Some of the higher overtones are stronger than this. (The pattern of overtone strengths you get depends on the instrument and how it is played. The way these vary is perceived as "timbre".)
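The decibel arithmetic above can be checked in a couple of lines. This is just the standard definition of decibels for power ratios; the function names are chosen here for illustration.

```python
import math

def db_to_power_ratio(db):
    """Power ratio corresponding to a difference in decibels."""
    return 10 ** (db / 10)

def power_ratio_to_db(ratio):
    """Difference in decibels corresponding to a power ratio."""
    return 10 * math.log10(ratio)

for db in (10, 20, 30, 74):
    print(f"{db:3d} dB  ->  {db_to_power_ratio(db):,.0f} to 1")

# An overtone 30 dB below the fundamental carries this fraction of its power:
print(1 / db_to_power_ratio(30))
```

Going the other way, `power_ratio_to_db(1000)` gives 30 dB back.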


By the way, if you've downloaded the program, to move any of those 5 "sliders" (of which dynamic range is one), you just left-click on where you want to be on the slider.