In the POTS days, one could model a telephone channel as
roughly 3KHz BW (~300-3KHz) with the copper crapping out
at about 4KHz.
Digitizing to 8b at 8KHz and you were pretty much golden.
With cellular and the fact that legacy plant is still
in place (i.e., you have no real control over how the
signal is routed -- from second to second), is there
a better model?
Or, a technique to guesstimate the characteristics of
the "current" channel, dynamically?
On 05/06/2026 21:27, Don Y wrote:
In the POTS days, one could model a telephone channel as
roughly 3KHz BW (~300-3KHz) with the copper crapping out
at about 4KHz.
Digitizing to 8b at 8KHz and you were pretty much golden.
With cellular and the fact that legacy plant is still
in place (i.e., you have no real control over how the
signal is routed -- from second to second), is there
a better model?
Or, a technique to guesstimate the characteristics of
the "current" channel, dynamically?
Back in those days - well around year 2000 at least - a rather
different model actually applied. This was dc to 4kHz.
I tested phone calls between the UK and USA and Canada and
also across Europe and was usually able to pass dc across the
phone network.
This did require using ISDN equipment at each end. Even that
far back most of the global phone network was digital and it
was only the codecs at each end that defined the frequency
response. Many codecs had a high-pass corner at around
150Hz and an anti-aliasing filter at about 3.7kHz.
However, the GSM standard rate codec had a high-pass at
about 80Hz. There was a problem because the GSM specification
for that codec had an error in the filter coefficients which
meant that it only slightly attenuated dc. This was a problem
when echo cancelers treated dc just like any other frequency.
A major international project had severe difficulties because
the silence inserted between voice recognizer prompts was
accidentally coded in u-law when it should have been A-law.
The code mismatch resulted in a dc offset during the silences
which caused the silences to take priority when the GSM codec
happened to be working in half duplex mode causing the
user responses to be lost.
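The dc offset from a companding-law mismatch can be reproduced in a few lines. This is only a sketch: the two expanders follow the commonly circulated G.711 reference decoder, and the silence byte 0xFF is an assumption (the post does not say which code word the prompts actually used).

```python
# G.711 expanders, modelled on the widely circulated reference decoder.
# Output is on a 16-bit linear scale.

def ulaw_to_linear(code: int) -> int:
    """Expand one 8-bit mu-law code word to a linear sample."""
    code = ~code & 0xFF                  # mu-law bytes are stored inverted
    t = ((code & 0x0F) << 3) + 0x84      # mantissa plus bias
    t <<= (code & 0x70) >> 4             # scale by the segment number
    return (0x84 - t) if (code & 0x80) else (t - 0x84)


def alaw_to_linear(code: int) -> int:
    """Expand one 8-bit A-law code word to a linear sample."""
    code ^= 0x55                         # undo the alternate-bit inversion
    t = (code & 0x0F) << 4
    seg = (code & 0x70) >> 4
    if seg == 0:
        t += 8
    else:
        t += 0x108
        if seg > 1:
            t <<= seg - 1
    return t if (code & 0x80) else -t


MU_LAW_SILENCE = 0xFF  # assumption: silence stored as this mu-law code word

decoded_correctly = ulaw_to_linear(MU_LAW_SILENCE)
decoded_as_alaw = alaw_to_linear(MU_LAW_SILENCE)
print(decoded_correctly, decoded_as_alaw)  # prints: 0 848
```

Under that assumption every "silent" sample comes out as a constant +848 instead of 0, which is a steady dc level rather than silence.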
The bottom line is - if you really want to know the channel
response you will have to measure it yourself.
Telephone-company employees must sign documentation promising that they
will not enact unfaithful propagations of persons' voices, but
completely faithful telephones do not exist.
On 05/06/2026 21:27, Don Y wrote:
In the POTS days, one could model a telephone channel as
roughly 3KHz BW (~300-3KHz) with the copper crapping out
at about 4KHz.
Digitizing to 8b at 8KHz and you were pretty much golden.
With cellular and the fact that legacy plant is still
in place (i.e., you have no real control over how the
signal is routed -- from second to second), is there
a better model?
Or, a technique to guesstimate the characteristics of
the "current" channel, dynamically?
You may be able to infer a rough idea of the channel frequency response
by taking an FFT of a chunk digitised at 44kHz. But I'd be more inclined
to low pass filter everything above 4kHz and use 8b @ 8kHz.
It should be good enough for the intended purpose of speech at that.
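A rough sketch of the FFT idea, using only the standard library. The function name `occupied_band`, the -20 dB floor and the synthetic four-tone test chunk are all assumptions for illustration. Note that on real speech this measures the talker's spectrum multiplied by the channel response, so it only bounds the channel, it does not isolate it.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + tw, even[k] - tw
    return out


def occupied_band(samples, fs, floor_db=-20.0):
    """Return (low_hz, high_hz) spanned by bins within floor_db of the peak."""
    n = len(samples)
    # Hann window to keep spectral leakage from smearing the band edges.
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, s in enumerate(samples)]
    power = [abs(c) ** 2 for c in fft(windowed)[: n // 2]]
    threshold = max(power) * 10 ** (floor_db / 10)
    active = [k for k, p in enumerate(power) if p >= threshold]
    return active[0] * fs / n, active[-1] * fs / n


# Synthetic stand-in for a received chunk: tones at 500, 1000, 2000, 3000 Hz.
fs, n = 44100, 4096
chunk = [sum(math.sin(2 * math.pi * f * i / fs) for f in (500, 1000, 2000, 3000))
         for i in range(n)]
low, high = occupied_band(chunk, fs)
print(round(low), round(high))
```

Averaging the power spectrum over many chunks would give a steadier estimate than a single chunk.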
On 05/06/2026 21:27, Don Y wrote:
In the POTS days, one could model a telephone channel as
roughly 3KHz BW (~300-3KHz) with the copper crapping out
at about 4KHz.
Digitizing to 8b at 8KHz and you were pretty much golden.
With cellular and the fact that legacy plant is still
in place (i.e., you have no real control over how the
signal is routed -- from second to second), is there
a better model?
Or, a technique to guesstimate the characteristics of
the "current" channel, dynamically?
You may be able to infer a rough idea of the channel frequency response
by taking an FFT of a chunk digitised at 44kHz. But I'd be more inclined
to low pass filter everything above 4kHz and use 8b @ 8kHz.
It should be good enough for the intended purpose of speech at that.
Accuracy
Many file formats that compress audio discard some of the audio signal
information whilst doing so. Converting to such a format and then
converting back again will not produce an exact copy of the original
audio. This is the case for many formats used in telephony (e.g. A-law,
GSM) where low signal bandwidth is more important than high audio
fidelity, and for many formats used in portable music players (e.g. MP3,
Vorbis) where adequate fidelity can be retained even with the large
compression ratios that are needed to make portable players
practical.
Don Y <blockedofcourse@foo.invalid> wrote:
In the POTS days, one could model a telephone channel as
roughly 3KHz BW (~300-3KHz) with the copper crapping out
at about 4KHz.
Digitizing to 8b at 8KHz and you were pretty much golden.
With cellular and the fact that legacy plant is still
in place (i.e., you have no real control over how the
signal is routed -- from second to second), is there
a better model?
Or, a technique to guesstimate the characteristics of
the "current" channel, dynamically?
The Telco minimum standard is 300 to 3400 hz (or 3200 for some
equipment manufacturers, depending on who you talk to). LPF
is required before digitizing to avoid aliasing.
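To illustrate why the low-pass filter has to come first, here is a small sketch of where an unfiltered tone lands after sampling. The helper name `alias_frequency` is made up for this example.

```python
def alias_frequency(f_hz, fs_hz):
    """Frequency at which a tone of f_hz shows up after sampling at fs_hz."""
    r = f_hz % fs_hz
    # Anything above fs/2 folds back into the 0..fs/2 band.
    return fs_hz - r if r > fs_hz / 2 else r


for f in (1000, 3400, 5000, 7000):
    print(f, "->", alias_frequency(f, 8000))
# prints:
# 1000 -> 1000
# 3400 -> 3400
# 5000 -> 3000
# 7000 -> 1000
```

At 8 kHz sampling, a 5 kHz component that is not filtered out becomes indistinguishable from a genuine 3 kHz one.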
In some implementations, such as some special circuit voice
lines, the least significant bit is reserved for supervisory
signalling.
Cell uses a much narrower bandwidth. They do some data
compression to save bits, the nature of which i can't tell
you, but the saving is dramatic. You can tell the difference
in sound quality, tho, if you're calling from a landline to
another landline, as opposed to calling a cell phone.
On 6/6/2026 2:29 PM, Jeremiah Jones wrote:
Don Y <blockedofcourse@foo.invalid> wrote:
In the POTS days, one could model a telephone channel as
roughly 3KHz BW (~300-3KHz) with the copper crapping out
at about 4KHz.
Digitizing to 8b at 8KHz and you were pretty much golden.
With cellular and the fact that legacy plant is still
in place (i.e., you have no real control over how the
signal is routed -- from second to second), is there
a better model?
Or, a technique to guesstimate the characteristics of
the "current" channel, dynamically?
The Telco minimum standard is 300 to 3400 hz (or 3200 for some
equipment manufacturers, depending on who you talk to). LPF
is required before digitizing to avoid aliasing.
Yes, but that was when PSTN was *the* means of moving voice.
How do more modern network protocols address this? Do they
deliberately cripple themselves thinking there is a land-line
(over copper, SLIC06, etc.) on at least one end of the line
(so why go out of your way to provide MORE fidelity)?
In some implementations, such as some special circuit voice
lines, the least significant bit is reserved for supervisory
signalling.
Cell uses a much narrower bandwidth. They do some data
compression to save bits, the nature of which i can't tell
you, but the saving is dramatic. You can tell the difference
in sound quality, tho, if you're calling from a landline to
another landline, as opposed to calling a cell phone.
That is exactly the issue.
If you speak into a microphone hard-wired to a device that
digitizes your speech at a high frequency and resolution and
have models of your speech built in that environment, how
do you match those models to speech that has made its way to the
analyzer/models via channels of dubious characteristics?
POTS relied on the fact that humans can recognize even severely
distorted speech and can rely on knowledge of WHO they telephoned
(or who reassured themselves at the far end) to identify the
called parties.
Note how some vendors now rely on speech as a biometric
authenticator. How reliable could *that* be if I placed a
call to them from an old exchange through a dubious PBX, etc.?
With cellular and the fact that legacy plant is still
in place (i.e., you have no real control over how the
signal is routed -- from second to second), is there
a better model?
Or, a technique to guesstimate the characteristics of
the "current" channel, dynamically?
The Telco minimum standard is 300 to 3400 hz (or 3200 for some
equipment manufacturers, depending on who you talk to). LPF
is required before digitizing to avoid aliasing.
Yes, but that was when PSTN was *the* means of moving voice.
How do more modern network protocols address this? Do they
deliberately cripple themselves thinking there is a land-line
(over copper, SLIC06, etc.) on at least one end of the line
(so why go out of your way to provide MORE fidelity)?
The PSTN is still the go-to carrier for voice calls. There
are VOIP and various ATM vehicles, which would likely use
telco routers and switches or telco TDM facilities... if you
need wider sound bandwidth. I'm not sure what more modern
protocols you mean. I'm not up to date on that.
3400 hz sampled at 8 khz gives you pretty decent voice
quality. Not very good for music, but understandable for
speech, and for people like me with poor high-frequency
hearing, extra audio bandwidth wouldn't help much. Telephone
mikes and sound elements are also cheap and crappy. If you
want fidelity i'd start with a good quality headset.
Most of the information in a human voice signal is well within
the sub-1khz range. More so for men than women.
24 telco voice channels fit nicely on a T1. Bandwidth was not
always cheap. Do you remember back when it was easy to run up
$40 on one phone call? It wasn't a matter of deliberate
crippling, it was about what's doable at a reasonable price
that fits what the customer needs.
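The arithmetic behind "24 channels fit nicely on a T1" can be written out as a quick check. The constant names are mine; the figures are the standard ones for 8-bit PCM at 8 kHz and a DS1 frame with one framing bit.

```python
SAMPLE_RATE_HZ = 8000        # one 8-bit sample per channel per frame
BITS_PER_SAMPLE = 8
CHANNELS = 24
FRAMING_BITS_PER_FRAME = 1   # one framing bit is added to every frame

channel_rate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE                      # per voice channel
frame_bits = CHANNELS * BITS_PER_SAMPLE + FRAMING_BITS_PER_FRAME     # bits per frame
t1_rate = frame_bits * SAMPLE_RATE_HZ                                # line rate

print(channel_rate, frame_bits, t1_rate)  # prints: 64000 193 1544000
```

So each voice channel costs 64 kb/s, and 24 of them plus framing come to 1.544 Mb/s.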
In some implementations, such as some special circuit voice
lines, the least significant bit is reserved for supervisory
signalling.
Cell uses a much narrower bandwidth. They do some data
compression to save bits, the nature of which i can't tell
you, but the saving is dramatic. You can tell the difference
in sound quality, tho, if you're calling from a landline to
another landline, as opposed to calling a cell phone.
That is exactly the issue.
If you speak into a microphone hard-wired to a device that
digitizes your speech at a high frequency and resolution and
have models of your speech built in that environment, how
do you match those models to speech that has made its way to the
analyzer/models via channels of dubious characteristics?
Not sure what you mean. The only part of the "speech model"
that matters is the spectral content. Signal/noise matters,
but that's a hardware issue.
POTS relied on the fact that humans can recognize even severely
distorted speech and can rely on knowledge of WHO they telephoned
(or who reassured themselves at the far end) to identify the
called parties.
Note how some vendors now rely on speech as a biometric
authenticator. How reliable could *that* be if I placed a
call to them from an old exchange through a dubious PBX, etc.?
I don't know, but probably just fine. One would think they
are built for telco channel transmission.
In most cases the PBX or other customer facilities does the
A/D and D/A conversion at the customer premise, so analog
signals traveling down long copper spans are not a source of
degradation.
What's the alternative... a call over Instagram? A
high-priced special circuit from your carrier?
If I phone home and simply state "Meet me out front",
my other half will do so, even if annoyed at the terse,
imperative nature of my comment. Because she would
*recognize* me as the caller and would realize there
was a reason for my demand AND for its brevity as well as
the likelihood of my making such a demand.
I want a machine to be able to act with that same level
of reliability without having to resort to supplemental
interactions to bolster confidence in its authentication
of the caller.
So, if I am out of town, I can instruct the machine to
recognize a neighbor's "commands" (to perform a limited
set of actions, as my proxy) without having to quiz them
on other information (effectively, shared secrets -- even
if not truly "secret") to get that level of confidence
in their authentication:
  "What day did Don depart?"
  "When did he say he was going to return?"
  "Where was he going?"
  "What's your pet's name?"
I.e., a recording of said neighbor (or, an AI generated
simulant) could sound correct and yet not respond correctly.
Jeremiah Jones <jj@j.j> wrote:
[...]
You can tell the difference
in sound quality, tho, if you're calling from a landline to
another landline, as opposed to calling a cell phone.
Some mobile 'phones produce sounds that are barely recognisable as a
voice, let alone any particular voice. This is made even worse by
people who stand them on a table and switch them to 'hands-free' with a
loud television or radio in the background.
On 6/6/2026 11:25 PM, Jeremiah Jones wrote:
3400 hz sampled at 8 khz gives you pretty decent voice
quality. Not very good for music, but understandable for
speech, and for people like me with poor high-frequency
hearing, extra audio bandwidth wouldn't help much. Telephone
mikes and sound elements are also cheap and crappy. If you
want fidelity i'd start with a good quality headset.
What I want is repeatability and, ideally, with some form
of quantitative description that lets me relate THIS audio
stream to an audio stream that I would encounter when sharing
a room with the speaker.
Most of the information in a human voice signal is well within
the sub-1khz range. More so for men than women.
But, that's because the human listener is "accommodating".
One can distort their voice and still be understood by
someone intent on that understanding. Witness how you
can understand someone with a severe head cold, despite
their voice being quantitatively "different".
Or, someone with laryngitis.
A machine can similarly understand what they are *saying*.
But, it's harder for a machine to identify and authenticate
that speaker. "What does Bob sound like when he's got
a head cold?"
Anything that the channel does to the signal falls into the
same problem domain.
E.g., I have a bridge that lets my *wired* home phones
use my cellular connection. But, the quality of the
connection is beneath the quality of the wired phones
on a wired landline *or* the cell phone that is connected
to the bridge.
24 telco voice channels fit nicely on a T1. Bandwidth was not
always cheap. Do you remember back when it was easy to run up
$40 on one phone call? It wasn't a matter of deliberate
crippling, it was about what's doable at a reasonable price
that fits what the customer needs.
In some implementations, such as some special circuit voice
lines, the least significant bit is reserved for supervisory
signalling.
Cell uses a much narrower bandwidth. They do some data
compression to save bits, the nature of which i can't tell
you, but the saving is dramatic. You can tell the difference
in sound quality, tho, if you're calling from a landline to
another landline, as opposed to calling a cell phone.
That is exactly the issue.
If you speak into a microphone hard-wired to a device that
digitizes your speech at a high frequency and resolution and
have models of your speech built in that environment, how
do you match those models to speech that has made its way to the
analyzer/models via channels of dubious characteristics?
Not sure what you mean. The only part of the "speech model"
that matters is the spectral content. Signal/noise matters,
but that's a hardware issue.
But "you" sound different when your speech travels to "me"
via different channels. I can understand what you are trying to
*convey* -- much like you could understand me in a helium
environment. But, you likely wouldn't recognize my "voice"
in that environment. You would have to rely on other cues.
In most cases the PBX or other customer facilities does the
A/D and D/A conversion at the customer premise, so analog
signals traveling down long copper spans are not a source of
degradation.
What's the alternative... a call over Instagram? A
high-priced special circuit from your carrier?
I'm not asking (or requiring) anything special. Rather,
I am trying to learn the characteristics of a channel that
has presented itself to me (my device) without any input
from me (or the other party!) in that selection. I.e.,
you can't tell your provider what sort of connection you want
for THIS call.
Don Y <blockedofcourse@foo.invalid> wrote:
On 6/6/2026 11:25 PM, Jeremiah Jones wrote:
(snip for focus)
3400 hz sampled at 8 khz gives you pretty decent voice
quality. Not very good for music, but understandable for
speech, and for people like me with poor high-frequency
hearing, extra audio bandwidth wouldn't help much. Telephone
mikes and sound elements are also cheap and crappy. If you
want fidelity i'd start with a good quality headset.
What I want is repeatability and, ideally, with some form
of quantitative description that lets me relate THIS audio
stream to an audio stream that I would encounter when sharing
a room with the speaker.
That would be signal-to-noise ratio (SNR).
SNR = St / (St - Sr)
if St is the transmitted signal, ie out of the speaker's
mouth, and Sr is the received signal, that which goes into
your ear.
Simple to say, accurate measurement could be pretty difficult.
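For what it's worth, a minimal sketch of that measurement, using the more conventional power-ratio form: treat everything in the received signal that differs from the sent signal as noise. It assumes the two recordings are already time-aligned and gain-matched, which is the hard part in practice; `snr_db` is a made-up helper name.

```python
import math


def snr_db(sent, received):
    """SNR in dB, treating (received - sent) as the noise."""
    if not sent or len(sent) != len(received):
        raise ValueError("need two non-empty sequences of equal length")
    signal_power = sum(s * s for s in sent) / len(sent)
    noise_power = sum((r - s) ** 2 for s, r in zip(sent, received)) / len(sent)
    if noise_power == 0:
        return math.inf          # identical copies: no measurable noise
    return 10 * math.log10(signal_power / noise_power)


# Toy data: a unit-amplitude signal with a constant 0.1 error added.
sent = [1.0, -1.0] * 100
received = [s + 0.1 for s in sent]
print(round(snr_db(sent, received), 2))  # prints: 20.0
```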
You could send a known signal thru the network that would
quantify frequency and phase distortion and latency. I don't
know what off-the-shelf equipment would do that, but there
must be some.
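One way to sketch the known-signal idea without special equipment: send probe tones and compare their power before and after the channel with the Goertzel algorithm. Everything here is illustrative. `toy_channel` is a hypothetical stand-in for the network (a one-pole low-pass), and the probe frequencies are arbitrary. Low-rate speech codecs may not pass pure tones faithfully, so results over a cellular leg should be treated with suspicion. Phase and latency are not measured here.

```python
import math


def goertzel_power(samples, fs, freq):
    """Squared DFT magnitude of samples at freq, via the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / fs)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def tone(freq, fs, n):
    """n samples of a unit-amplitude sine at freq."""
    return [math.sin(2.0 * math.pi * freq * i / fs) for i in range(n)]


def gain_db(sent, received, fs, freq):
    """Channel gain at freq, in dB, from sent and received probe tones."""
    ratio = goertzel_power(received, fs, freq) / goertzel_power(sent, fs, freq)
    return 10.0 * math.log10(ratio)


def toy_channel(samples, a=0.5):
    """Hypothetical stand-in for the network: a one-pole low-pass filter."""
    out, y = [], 0.0
    for x in samples:
        y = a * x + (1.0 - a) * y
        out.append(y)
    return out


fs, n = 8000, 800            # 100 ms per probe; each tone fits a whole number of cycles
gains = {}
for f in (300, 1000, 3000):
    sent = tone(f, fs, n)
    gains[f] = gain_db(sent, toy_channel(sent), fs, f)
    print(f, "Hz:", round(gains[f], 1), "dB")
```

Repeating the probes during a call would show whether the response drifts from one moment to the next.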
Most of the information in a human voice signal is well within
the sub-1khz range. More so for men than women.
But, that's because the human listener is "accommodating".
No. It's because that's just the spectral content of a human
voice. That one can be objectively measured & quantified.
One can distort their voice and still be understood by
someone intent on that understanding. Witness how you
can understand someone with a severe head cold, despite
their voice being quantitatively "different".
Or, someone with laryngitis.
A machine can similarly understand what they are *saying*.
But, it's harder for a machine to identify and authenticate
that speaker. "What does Bob sound like when he's got
a head cold?"
Anything that the channel does to the signal falls into the
same problem domain.
Not exactly. What the channel does to the signal is noise,
which falls under SNR. If Bob has a head cold, that's just a
difference in the original signal.
If you are trying to identify the caller via a "voice print",
it still adds to the confusion all the same. IIRC a voice
print can't be spoofed by another human voice, but i'm pretty
sure it can be spoofed by an AI synthesis of Bob's voice. I
speculate the voice print would not be greatly affected if Bob
has a head cold, since it sounds like a product of the
speaker's physiology, like larynx, tongue etc, at least
according to my very incomplete understanding.
E.g., I have a bridge that lets my *wired* home phones
use my cellular connection. But, the quality of the
connection is beneath the quality of the wired phones
on a wired landline *or* the cell phone that is connected
to the bridge.
24 telco voice channels fit nicely on a T1. Bandwidth was not
always cheap. Do you remember back when it was easy to run up
$40 on one phone call? It wasn't a matter of deliberate
crippling, it was about what's doable at a reasonable price
that fits what the customer needs.
In some implementations, such as some special circuit voice
lines, the least significant bit is reserved for supervisory
signalling.
Cell uses a much narrower bandwidth. They do some data
compression to save bits, the nature of which i can't tell
you, but the saving is dramatic. You can tell the difference
in sound quality, tho, if you're calling from a landline to
another landline, as opposed to calling a cell phone.
That is exactly the issue.
If you speak into a microphone hard-wired to a device that
digitizes your speech at a high frequency and resolution and
have models of your speech built in that environment, how
do you match those models to speech that has made its way to the
analyzer/models via channels of dubious characteristics?
Not sure what you mean. The only part of the "speech model"
that matters is the spectral content. Signal/noise matters,
but that's a hardware issue.
But "you" sound different when your speech travels to "me"
via different channels. I can understand what you are trying to
*convey* -- much like you could understand me in a helium
environment. But, you likely wouldn't recognize my "voice"
in that environment. You would have to rely on other cues.
That would be a pretty serious signal degradation.
(snip)
In most cases the PBX or other customer facilities does the
A/D and D/A conversion at the customer premise, so analog
signals traveling down long copper spans are not a source of
degradation.
What's the alternative... a call over Instagram? A
high-priced special circuit from your carrier?
I'm not asking (or requiring) anything special. Rather,
I am trying to learn the characteristics of a channel that
has presented itself to me (my device) without any input
from me (or the other party!) in that selection. I.e.,
you can't tell your provider what sort of connection you want
for THIS call.
I see.
To the extent that a phone call is digitized the entire way,
except for the first and last mile (or fraction thereof),
different call paths aren't going to make any difference.
It's also probable that, if you use the PSTN, you are going to
get the same path every time anyway. The PSTN mostly uses TDM
transport which is pre-determined, rather than multiple paths
like the Internet does.
(snip the rest, as I have nothing useful to add to it)