Forum: Jacob's Hideout BBS

Telco (voice) channel

From Don Y@3:633/10 to All on Friday, June 05, 2026 13:27:53

In the POTS days, one could model a telephone channel as
roughly 3KHz BW (~300-3KHz) with the copper crapping out
at about 4KHz.

Digitizing to 8b at 8KHz and you were pretty much golden.

With cellular and the fact that legacy plant is still
in place (i.e., you have no real control over how the
signal is routed -- from second to second), is there
a better model?

Or, a technique to guesstimate the characteristics of
the "current" channel, dynamically?

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From John R Walliker@3:633/10 to All on Friday, June 05, 2026 23:23:12

On 05/06/2026 21:27, Don Y wrote:

In the POTS days, one could model a telephone channel as
roughly 3KHz BW (~300-3KHz) with the copper crapping out
at about 4KHz.

Digitizing to 8b at 8KHz and you were pretty much golden.

With cellular and the fact that legacy plant is still
in place (i.e., you have no real control over how the
signal is routed -- from second to second), is there
a better model?

Or, a technique to guesstimate the characteristics of
the "current" channel, dynamically?

Back in those days - well around year 2000 at least - a rather
different model actually applied. This was dc to 4kHz.
I tested phone calls between the UK and USA and Canada and
also across Europe and was usually able to pass dc across the
phone network.
This did require using ISDN equipment at each end. Even that
far back most of the global phone network was digital and it
was only the codecs at each end that defined the frequency
response. Many codecs had a high-pass corner at around
150Hz and an anti-aliasing filter at about 3.7kHz.
However, the GSM standard rate codec had a high-pass at
about 80Hz. There was a problem because the GSM specification
for that codec had an error in the filter coefficients which
meant that it only slightly attenuated dc. This was a problem
when echo cancelers treated dc just like any other frequency.
A major international project had severe difficulties because
the silence inserted between voice recognizer prompts was
accidentally coded in u-law when it should have been A-law.
The code mismatch resulted in a dc offset during the silences
which caused the silences to take priority when the GSM codec
happened to be working in half duplex mode causing the
user responses to be lost.
The bottom line is - if you really want to know the channel
response you will have to measure it yourself.
John
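
The codec mismatch described above can be reproduced with a minimal Python sketch, assuming standard G.711 companding on a 16-bit linear scale. The function names are illustrative and not taken from any particular library. Digital silence encoded as mu-law is the byte 0xFF; a decoder that expects A-law turns that same byte into a constant non-zero sample, i.e. a dc offset.

```python
# G.711 sketch on a 16-bit linear scale. Function names are illustrative.
ULAW_BIAS = 0x84   # 132, added to the magnitude before the segment search
ULAW_CLIP = 32635  # largest magnitude that still fits once the bias is added


def linear_to_ulaw(sample: int) -> int:
    """Encode one 16-bit linear sample as a G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), ULAW_CLIP) + ULAW_BIAS
    # Segment (exponent) = position of the highest set bit, from 7 down to 0.
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF  # all bits inverted on the wire


def ulaw_to_linear(byte: int) -> int:
    """Decode one mu-law byte to a 16-bit linear sample."""
    byte = ~byte & 0xFF
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + ULAW_BIAS) << exponent) - ULAW_BIAS
    return -magnitude if byte & 0x80 else magnitude


def alaw_to_linear(byte: int) -> int:
    """Decode one A-law byte to a 16-bit linear sample."""
    byte ^= 0x55  # undo the alternate-bit inversion
    magnitude = (byte & 0x0F) << 4
    segment = (byte >> 4) & 0x07
    if segment == 0:
        magnitude += 8
    else:
        magnitude = (magnitude + 0x108) << (segment - 1)
    return magnitude if byte & 0x80 else -magnitude  # in A-law a set sign bit means positive


silence = linear_to_ulaw(0)
print(hex(silence))             # mu-law code for digital silence
print(ulaw_to_linear(silence))  # decoded with the matching law
print(alaw_to_linear(silence))  # decoded with the wrong law
```

Decoded with the matching law the silence comes back as 0; decoded as A-law it comes back as 848, roughly 2.6% of full scale, which is the kind of steady offset that a dc-sensitive echo canceller or voice activity detector could mistake for signal.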

From Don Y@3:633/10 to All on Friday, June 05, 2026 16:26:30

On 6/5/2026 3:23 PM, John R Walliker wrote:

On 05/06/2026 21:27, Don Y wrote:

In the POTS days, one could model a telephone channel as
roughly 3KHz BW (~300-3KHz) with the copper crapping out
at about 4KHz.

Digitizing to 8b at 8KHz and you were pretty much golden.

With cellular and the fact that legacy plant is still
in place (i.e., you have no real control over how the
signal is routed -- from second to second), is there
a better model?

Or, a technique to guesstimate the characteristics of
the "current" channel, dynamically?

Back in those days - well around year 2000 at least - a rather
different model actually applied. This was dc to 4kHz.
I tested phone calls between the UK and USA and Canada and
also across Europe and was usually able to pass dc across the
phone network.
This did require using ISDN equipment at each end. Even that
far back most of the global phone network was digital and it
was only the codecs at each end that defined the frequency
response. Many codecs had a high-pass corner at around
150Hz and an anti-aliasing filter at about 3.7kHz.

But, one can't be sure of how point A and point B are
interconnected, even if you know the type of "connection"
at one end.

E.g., I can call a neighbor using an old-fashioned land line.
Or, call the same neighbor from a cell phone. Or, call the
*neighbor's* cell phone. etc.

However, the GSM standard rate codec had a high-pass at
about 80Hz. There was a problem because the GSM specification
for that codec had an error in the filter coefficients which
meant that it only slightly attenuated dc. This was a problem
when echo cancelers treated dc just like any other frequency.
A major international project had severe difficulties because
the silence inserted between voice recognizer prompts was
accidentally coded in u-law when it should have been A-law.
The code mismatch resulted in a dc offset during the silences
which caused the silences to take priority when the GSM codec
happened to be working in half duplex mode causing the
user responses to be lost.
The bottom line is - if you really want to know the channel
response you will have to measure it yourself.

That's just not practical. If I called you, today, how could
*you* characterize the connection? Would that characterization
remain unchanged even if the call went on for hours? What if
we dropped the connection and reestablished it?

Now, instead of "I" and "you" imagine two indiscriminate individuals
with no technical expertise and no *interest* in any of that...
Does party A "sound different" (than expectations) because of
a head cold? Or, some characteristic of the particular connection?

From Don Y@3:633/10 to All on Friday, June 05, 2026 16:27:46

On 6/5/2026 3:01 PM, Nioclás Pól Caileán de Ghloucester wrote:

Telephone-company employees must sign documentation promising that they
will not enact unfaithful propagations of persons' voices, but
completely faithful telephones do not exist.

It's not just that they aren't "faithful" but, rather, that
they aren't *repeatable*.

From Martin Brown@3:633/10 to All on Saturday, June 06, 2026 15:55:01

On 05/06/2026 21:27, Don Y wrote:

In the POTS days, one could model a telephone channel as
roughly 3KHz BW (~300-3KHz) with the copper crapping out
at about 4KHz.

Digitizing to 8b at 8KHz and you were pretty much golden.

With cellular and the fact that legacy plant is still
in place (i.e., you have no real control over how the
signal is routed -- from second to second), is there
a better model?

Or, a technique to guesstimate the characteristics of
the "current" channel, dynamically?

You may be able to infer a rough idea of the channel frequency response
by taking an FFT of a chunk digitised at 44kHz. But I'd be more inclined
to low pass filter everything above 4kHz and use 8b @ 8kHz.

It should be good enough for the intended purpose of speech at that.
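
A rough sketch of the FFT idea in Python, under loud assumptions: the "received" chunk is a synthetic multitone (full level from 300 to 3400 Hz, 40 dB down elsewhere) rather than real speech, the sample rate is 16 kHz instead of 44 kHz to keep the toy small, and a naive DFT stands in for an FFT. With real speech the source spectrum is unknown, so this can only bound the passband, not recover the exact response.

```python
import cmath
import math

FS = 16_000  # sample rate in Hz (assumed for this toy example)
N = 160      # chunk length, so each DFT bin is FS / N = 100 Hz wide


def make_test_chunk():
    """Multitone stand-in for a received signal: full level from 300 to
    3400 Hz, 40 dB down elsewhere. A real call would supply speech instead."""
    chunk = [0.0] * N
    for k in range(1, N // 2):
        freq = k * FS / N
        amp = 1.0 if 300 <= freq <= 3400 else 0.01
        for n in range(N):
            # Arbitrary per-tone phase so the tones do not all peak together.
            chunk[n] += amp * math.cos(2 * math.pi * k * n / N + 0.37 * k * k)
    return chunk


def power_spectrum(chunk):
    """Naive DFT power for bins 0 .. N/2 (an FFT gives the same numbers, faster)."""
    size = len(chunk)
    spectrum = []
    for k in range(size // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * n / size)
                  for n, x in enumerate(chunk))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def band_edges(spectrum, fs, drop_db=20.0):
    """Lowest and highest bin frequency whose power is within drop_db of the peak."""
    threshold = max(spectrum) * 10 ** (-drop_db / 10)
    size = 2 * (len(spectrum) - 1)
    inside = [k * fs / size for k, p in enumerate(spectrum) if p >= threshold]
    return inside[0], inside[-1]


low, high = band_edges(power_spectrum(make_test_chunk()), FS)
print(low, high)
```

On this synthetic chunk the reported edges are 300 Hz and 3400 Hz. On a live call one would average many chunks and treat the result as a coarse guess only.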

--
Martin Brown

From Don Y@3:633/10 to All on Saturday, June 06, 2026 12:36:33

On 6/6/2026 7:55 AM, Martin Brown wrote:

On 05/06/2026 21:27, Don Y wrote:

In the POTS days, one could model a telephone channel as
roughly 3KHz BW (~300-3KHz) with the copper crapping out
at about 4KHz.

Digitizing to 8b at 8KHz and you were pretty much golden.

With cellular and the fact that legacy plant is still
in place (i.e., you have no real control over how the
signal is routed -- from second to second), is there
a better model?

Or, a technique to guesstimate the characteristics of
the "current" channel, dynamically?

You may be able to infer a rough idea of the channel frequency response
by taking an FFT of a chunk digitised at 44kHz. But I'd be more inclined
to low pass filter everything above 4kHz and use 8b @ 8kHz.

It should be good enough for the intended purpose of speech at that.

I typically am dealing with speech that is conveyed in open air
or via a relatively high bandwidth channel. I continually update my
models based on my knowledge of the speaker's identity AND THE
"unimposing" CHARACTERISTICS OF THE CHANNEL.

But, also have to handle the likely possibility of a telecom channel
for the same speaker.

This both distorts their perceived speech AND would bias the
model that has been built without those constraints.

If all you are doing is *recognition*, it's not a problem. But,
for diarization and identification, it lowers their effectiveness.
Lack of sidetone and the latency of modern telco already leaves
you with a significant challenge!

As an "inverse filter" can't reconstruct characteristics that have
been discarded by a lower sampling frequency, I have two choices:
- run the trained models through an appropriate filter that
models the telco channel
- build a separate model for *just* the telco channel

But, if that channel can change from call to call... <shrug>

I *might* be able to use PRESUMED knowledge of the party's identity
to help characterize the current channel. But, that would defeat
identification.
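
The first option (pushing clean training audio through a filter that models the telco channel) might look something like the following Python sketch. Everything here is an assumption for illustration: a 16 kHz clean source, a 300-3400 Hz windowed-sinc band-pass, decimation to 8 kHz and a continuous mu-law round trip. It is a crude stand-in for the classic landline path only; cellular codecs are non-linear and are not captured by a filter like this.

```python
import math

FS_WIDE = 16_000  # assumed sample rate of the "clean" training audio


def bandpass_taps(low_hz, high_hz, fs, num_taps=101):
    """Hamming-windowed-sinc FIR band-pass (difference of two low-pass filters)."""
    mid = (num_taps - 1) / 2

    def lowpass(cutoff_hz, n):
        x = n - mid
        fc = cutoff_hz / fs
        return 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)

    taps = []
    for n in range(num_taps):
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append((lowpass(high_hz, n) - lowpass(low_hz, n)) * window)
    return taps


def fir_filter(samples, taps):
    """Direct-form convolution, same length as the input (zero initial state)."""
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for j, tap in enumerate(taps):
            if i - j >= 0:
                acc += tap * samples[i - j]
        out.append(acc)
    return out


def mulaw_roundtrip(x, mu=255.0, bits=8):
    """Continuous mu-law compress, quantise, expand. x is expected in [-1, 1]."""
    levels = 2 ** (bits - 1) - 1
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    quantised = round(compressed * levels) / levels
    return math.copysign(math.expm1(abs(quantised) * math.log1p(mu)) / mu, quantised)


def telco_channel(samples, fs=FS_WIDE):
    """Wideband in, 8 kHz narrowband out: band-limit, decimate, compand.
    Assumes fs is an integer multiple of 8000."""
    limited = fir_filter(samples, bandpass_taps(300.0, 3400.0, fs))
    decimated = limited[::fs // 8000]
    return [mulaw_roundtrip(max(-1.0, min(1.0, s))) for s in decimated]


def gain_at(taps, freq_hz, fs):
    """Magnitude response of the FIR filter at one frequency."""
    w = 2 * math.pi * freq_hz / fs
    return abs(sum(t * complex(math.cos(w * n), -math.sin(w * n))
                   for n, t in enumerate(taps)))


taps = bandpass_taps(300.0, 3400.0, FS_WIDE)
print(round(gain_at(taps, 1000.0, FS_WIDE), 3), round(gain_at(taps, 6000.0, FS_WIDE), 3))
```

Training features would then be extracted from `telco_channel(clean_samples)` instead of from the clean samples. If the real channel changes from call to call, a single fixed model like this only covers one of the cases.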

From Phil Hobbs@3:633/10 to All on Saturday, June 06, 2026 20:46:42

Martin Brown <'''newspam'''@nonad.co.uk> wrote:

On 05/06/2026 21:27, Don Y wrote:

In the POTS days, one could model a telephone channel as
roughly 3KHz BW (~300-3KHz) with the copper crapping out
at about 4KHz.

Digitizing to 8b at 8KHz and you were pretty much golden.

With cellular and the fact that legacy plant is still
in place (i.e., you have no real control over how the
signal is routed -- from second to second), is there
a better model?

Or, a technique to guesstimate the characteristics of
the "current" channel, dynamically?

You may be able to infer a rough idea of the channel frequency response
by taking an FFT of a chunk digitised at 44kHz. But I'd be more inclined
to low pass filter everything above 4kHz and use 8b @ 8kHz.

It should be good enough for the intended purpose of speech at that.

To use 8 bits, you normally had to use companding digitizers (mu-law over
here, A-law in Europe, iirc).

Haven't seen one of those in a looong time.

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs Principal Consultant ElectroOptical Innovations LLC / Hobbs ElectroOptics Optics, Electro-optics, Photonics, Analog Electronics

From Carlos E.R.@3:633/10 to All on Saturday, June 06, 2026 22:44:23

On 2026-06-06 00:01, Nioclás Pól Caileán de Ghloucester wrote:

...

Accuracy
Many file formats that compress audio discard some of the audio signal
information whilst doing so. Converting to such a format and then
converting back again will not produce an exact copy of the original
audio. This is the case for many formats used in telephony (e.g. A-law,

A-law does not compress. It only changes the amplitude encoding; it does
not change the audio bandwidth.

GSM) where low signal bandwidth is more important than high audio
fidelity, and for many formats used in portable music players (e.g. MP3,
Vorbis) where adequate fidelity can be retained even with the large
compression ratios that are needed to make portable players
practical.

Cheers, Carlos.
ES, EU;

From Jeremiah Jones@3:633/10 to All on Saturday, June 06, 2026 14:29:40

Don Y <blockedofcourse@foo.invalid> wrote:

In the POTS days, one could model a telephone channel as
roughly 3KHz BW (~300-3KHz) with the copper crapping out
at about 4KHz.

Digitizing to 8b at 8KHz and you were pretty much golden.

With cellular and the fact that legacy plant is still
in place (i.e., you have no real control over how the
signal is routed -- from second to second), is there
a better model?

Or, a technique to guesstimate the characteristics of
the "current" channel, dynamically?

The Telco minimum standard is 300 to 3400 hz (or 3200 for some
equipment manufacturers, depending on who you talk to). LPF
is required before digitizing to avoid aliasing.
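
A small Python illustration of why that low-pass filter matters, assuming the usual 8 kHz telephone sampling rate. Without a filter in front of the sampler, a 5 kHz tone produces exactly the same samples as a 3 kHz tone, so it lands inside the voice band and cannot be removed afterwards. The helper name is made up for this example.

```python
import math

FS = 8000  # telephone sampling rate in Hz


def sampled_tone(freq_hz, count, fs=FS):
    """Samples of a unit cosine at freq_hz taken at fs samples per second."""
    return [math.cos(2 * math.pi * freq_hz * n / fs) for n in range(count)]


in_band = sampled_tone(3000, 64)   # below the 4 kHz Nyquist limit
too_high = sampled_tone(5000, 64)  # above it, with no low-pass filter in front

# The two sample sequences agree to within floating-point rounding.
worst = max(abs(a - b) for a, b in zip(in_band, too_high))
print(f"largest sample difference: {worst:.2e}")
```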

In some implementations, such as some special circuit voice
lines, the least significant bit is reserved for supervisory
signalling.

Cell uses a much narrower bandwidth. They do some data
compression to save bits, the nature of which i can't tell
you, but the saving is dramatic. You can tell the difference
in sound quality, tho, if you're calling from a landline to
another landline, as opposed to calling a cell phone.

From Don Y@3:633/10 to All on Saturday, June 06, 2026 15:06:56

On 6/6/2026 2:29 PM, Jeremiah Jones wrote:

Don Y <blockedofcourse@foo.invalid> wrote:

In the POTS days, one could model a telephone channel as
roughly 3KHz BW (~300-3KHz) with the copper crapping out
at about 4KHz.

Digitizing to 8b at 8KHz and you were pretty much golden.

With cellular and the fact that legacy plant is still
in place (i.e., you have no real control over how the
signal is routed -- from second to second), is there
a better model?

Or, a technique to guesstimate the characteristics of
the "current" channel, dynamically?

The Telco minimum standard is 300 to 3400 hz (or 3200 for some
equipment manufacturers, depending on who you talk to). LPF
is required before digitizing to avoid aliasing.

Yes, but that was when PSTN was *the* means of moving voice.
How do more modern network protocols address this? Do they
deliberately cripple themselves thinking there is a land-line
(over copper, SLIC06, etc.) on at least one end of the line
(so why go out of your way to provide MORE fidelity)?

In some implementations, such as some special circuit voice
lines, the least significant bit is reserved for supervisory
signalling.

Cell uses a much narrower bandwidth. They do some data
compression to save bits, the nature of which i can't tell
you, but the saving is dramatic. You can tell the difference
in sound quality, tho, if you're calling from a landline to
another landline, as opposed to calling a cell phone.

That is exactly the issue.

If you speak into a microphone hard-wired to a device that
digitizes your speech at a high frequency and resolution and
have models of your speech built in that environment, how
do you match those models to speech that has made its way to the
analyzer/models via channels of dubious characteristics?

POTS relied on the fact that humans can recognize even severely
distorted speech and can rely on knowledge of WHO they telephoned
(or who reassured themselves at the far end) to identify the
called parties.

Note how some vendors now rely on speech as a biometric
authenticator. How reliable could *that* be if I placed a
call to them from an old exchange through a dubious PBX, etc.?

From Jeremiah Jones@3:633/10 to All on Saturday, June 06, 2026 23:25:14

Don Y <blockedofcourse@foo.invalid> wrote:

On 6/6/2026 2:29 PM, Jeremiah Jones wrote:

Don Y <blockedofcourse@foo.invalid> wrote:

In the POTS days, one could model a telephone channel as
roughly 3KHz BW (~300-3KHz) with the copper crapping out
at about 4KHz.

Digitizing to 8b at 8KHz and you were pretty much golden.

With cellular and the fact that legacy plant is still
in place (i.e., you have no real control over how the
signal is routed -- from second to second), is there
a better model?

Or, a technique to guesstimate the characteristics of
the "current" channel, dynamically?

The Telco minimum standard is 300 to 3400 hz (or 3200 for some
equipment manufacturers, depending on who you talk to). LPF
is required before digitizing to avoid aliasing.

Yes, but that was when PSTN was *the* means of moving voice.
How do more modern network protocols address this? Do they
deliberately cripple themselves thinking there is a land-line
(over copper, SLIC06, etc.) on at least one end of the line
(so why go out of your way to provide MORE fidelity)?

The PSTN is still the go-to carrier for voice calls. There
are VOIP and various ATM vehicles, which would likely use
telco routers and switches or telco TDM facilities... if you
need wider sound bandwidth. I'm not sure what more modern
protocols you mean. I'm not up to date on that.

3400 hz sampled at 8 khz gives you pretty decent voice
quality. Not very good for music, but understandable for
speech, and for people like me with poor high-frequency
hearing, extra audio bandwidth wouldn't help much. Telephone
mikes and sound elements are also cheap and crappy. If you
want fidelity i'd start with a good quality headset.

Most of the information in a human voice signal is well within
the sub-1khz range. More so for men than women.

24 telco voice channels fit nicely on a T1. Bandwidth was not
always cheap. Do you remember back when it was easy to run up
$40 on one phone call? It wasn't a matter of deliberate
crippling, it was about what's doable at a reasonable price
that fits what the customer needs.
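
The arithmetic behind "24 channels fit nicely on a T1" can be worked through in a few lines of Python. The single framing bit per frame is a detail not mentioned in this thread, added here because it is needed to arrive at the standard T1 line rate.

```python
SAMPLE_RATE_HZ = 8000        # one sample every 125 microseconds
BITS_PER_SAMPLE = 8          # companded PCM
CHANNELS_PER_T1 = 24
FRAMING_BITS_PER_FRAME = 1   # one framing bit precedes each group of 24 samples

channel_rate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE  # bit/s for one voice channel
frame_bits = CHANNELS_PER_T1 * BITS_PER_SAMPLE + FRAMING_BITS_PER_FRAME
t1_rate = frame_bits * SAMPLE_RATE_HZ            # one frame per sample period

print(channel_rate)  # 64000
print(frame_bits)    # 193
print(t1_rate)       # 1544000
```

So each voice channel costs 64 kbit/s, and 24 of them plus framing give the 1.544 Mbit/s T1 rate. Widening the audio band would have meant fewer channels per span.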

In some implementations, such as some special circuit voice
lines, the least significant bit is reserved for supervisory
signalling.

Cell uses a much narrower bandwidth. They do some data
compression to save bits, the nature of which i can't tell
you, but the saving is dramatic. You can tell the difference
in sound quality, tho, if you're calling from a landline to
another landline, as opposed to calling a cell phone.

That is exactly the issue.

If you speak into a microphone hard-wired to a device that
digitizes your speech at a high frequency and resolution and
have models of your speech built in that environment, how
do you match those models to speech that has made its way to the
analyzer/models via channels of dubious characteristics?

Not sure what you mean. The only part of the "speech model"
that matters is the spectral content. Signal/noise matters,
but that's a hardware issue.

POTS relied on the fact that humans can recognize even severely
distorted speech and can rely on knowledge of WHO they telephoned
(or who reassured themselves at the far end) to identify the
called parties.

Note how some vendors now rely on speech as a biometric
authenticator. How reliable could *that* be if I placed a
call to them from an old exchange through a dubious PBX, etc.?

I don't know, but probably just fine. One would think they
are built for telco channel transmission.

In most cases the PBX or other customer facilities does the
A/D and D/A conversion at the customer premise, so analog
signals traveling down long copper spans are not a source of
degradation.

What's the alternative... a call over Instagram? A
high-priced special circuit from your carrier?

From Don Y@3:633/10 to All on Sunday, June 07, 2026 00:07:40

On 6/6/2026 11:25 PM, Jeremiah Jones wrote:

With cellular and the fact that legacy plant is still
in place (i.e., you have no real control over how the
signal is routed -- from second to second), is there
a better model?

Or, a technique to guesstimate the characteristics of
the "current" channel, dynamically?

The Telco minimum standard is 300 to 3400 hz (or 3200 for some
equipment manufacturers, depending on who you talk to). LPF
is required before digitizing to avoid aliasing.

Yes, but that was when PSTN was *the* means of moving voice.
How do more modern network protocols address this? Do they
deliberately cripple themselves thinking there is a land-line
(over copper, SLIC06, etc.) on at least one end of the line
(so why go out of your way to provide MORE fidelity)?

The PSTN is still the go-to carrier for voice calls. There
are VOIP and various ATM vehicles, which would likely use
telco routers and switches or telco TDM facilities... if you
need wider sound bandwidth. I'm not sure what more modern
protocols you mean. I'm not up to date on that.

There are more ways of moving voice signals from speaker A to
listener B than in decades past. Each one "colors" the speech
in ways that are easily accommodated by human listeners but
regarded as distinct by analytical mechanisms.

3400 hz sampled at 8 khz gives you pretty decent voice
quality. Not very good for music, but understandable for
speech, and for people like me with poor high-frequency
hearing, extra audio bandwidth wouldn't help much. Telephone
mikes and sound elements are also cheap and crappy. If you
want fidelity i'd start with a good quality headset.

What I want is repeatability and, ideally, with some form
of quantitative description that lets me relate THIS audio
stream to an audio stream that I would encounter when sharing
a room with the speaker.

Most of the information in a human voice signal is well within
the sub-1khz range. More so for men than women.

But, that's because the human listener is "accommodating".
One can distort their voice and still be understood by
someone intent on that understanding. Witness how you
can understand someone with a severe head cold, despite
their voice being quantitatively "different".

Or, someone with laryngitis.

A machine can similarly understand what they are *saying*.
But, it's harder for a machine to identify and authenticate
that speaker. "What does Bob sound like when he's got
a head cold?"

Anything that the channel does to the signal falls into the
same problem domain.

E.g., I have a bridge that lets my *wired* home phones
use my cellular connection. But, the quality of the
connection is beneath the quality of the wired phones
on a wired landline *or* the cell phone that is connected
to the bridge.

24 telco voice channels fit nicely on a T1. Bandwidth was not
always cheap. Do you remember back when it was easy to run up
$40 on one phone call? It wasn't a matter of deliberate
crippling, it was about what's doable at a reasonable price
that fits what the customer needs.

In some implementations, such as some special circuit voice
lines, the least significant bit is reserved for supervisory
signalling.

Cell uses a much narrower bandwidth. They do some data
compression to save bits, the nature of which i can't tell
you, but the saving is dramatic. You can tell the difference
in sound quality, tho, if you're calling from a landline to
another landline, as opposed to calling a cell phone.

That is exactly the issue.

If you speak into a microphone hard-wired to a device that
digitizes your speech at a high frequency and resolution and
have models of your speech built in that environment, how
do you match those models to speech that has made its way to the
analyzer/models via channels of dubious characteristics?

Not sure what you mean. The only part of the "speech model"
that matters is the spectral content. Signal/noise matters,
but that's a hardware issue.

But "you" sound different when your speech travels to "me"
via different channels. I can understand what you are trying to
*convey* -- much like you could understand me in a helium
environment. But, you likely wouldn't recognize my "voice"
in that environment. You would have to rely on other cues.

POTS relied on the fact that humans can recognize even severely
distorted speech and can rely on knowledge of WHO they telephoned
(or who reassured themselves at the far end) to identify the
called parties.

Note how some vendors now rely on speech as a biometric
authenticator. How reliable could *that* be if I placed a
call to them from an old exchange through a dubious PBX, etc.?

I don't know, but probably just fine. One would think they
are built for telco channel transmission.

They rely on other information -- your caller ID (which can
be forged), your knowledge of an account number, etc. And,
if their (machine's) confidence in the authentication isn't
high enough, they let a human operator perform that task
before granting you access to "confidential" information
(even though it is YOUR information)

When someone you know phones you, you are able to *identify* the
caller without checking their CID or waiting for them to introduce
themselves. Because you likely have spoken with them over the
telephone (channel) previously and know what their "telephone
voice" sounds like.

*Learning* what they sound like requires some prior "training".
If someone called you "out of the blue" and made a demand that
you *should* honor (e.g., your boss), you would likely need
some confirmation prior to acting on it -- and would probably
try to get that confirmation (authentication) surreptitiously
(so you don't have to explain how you didn't recognize the caller).

In most cases the PBX or other customer facilities does the
A/D and D/A conversion at the customer premise, so analog
signals traveling down long copper spans are not a source of
degradation.

What's the alternative... a call over Instagram? A
high-priced special circuit from your carrier?

I'm not asking (or requiring) anything special. Rather,
I am trying to learn the characteristics of a channel that
has presented itself to me (my device) without any input
from me (or the other party!) in that selection. I.e.,
you can't tell your provider what sort of connection you want
for THIS call.

If I phone home and simply state "Meet me out front",
my other half will do so, even if annoyed at the terse,
imperative nature of my comment. Because she would
*recognize* me as the caller and would realize there
was a reason for my demand AND for its brevity as well as
the likelihood of my making such a demand.

I want a machine to be able to act with that same level
of reliability without having to resort to supplemental
interactions to bolster confidence in its authentication
of the caller.

So, if I am out of town, I can instruct the machine to
recognize a neighbor's "commands" (to perform a limited
set of actions, as my proxy) without having to quiz them
on other information (effectively, shared secrets -- even
if not truly "secret") to get that level of confidence
in their authentication:

"What day did Don depart?"
"When did he say he was going to return?"
"Where was he going?"
"What's your pet's name?"

I.e., a recording of said neighbor (or, an AI generated
simulant) could sound correct and yet not respond correctly.

From Don Y@3:633/10 to All on Sunday, June 07, 2026 00:40:41

On 6/7/2026 12:07 AM, Don Y wrote:

If I phone home and simply state "Meet me out front",
my other half will do so, even if annoyed at the terse,
imperative nature of my comment. Because she would
*recognize* me as the caller and would realize there
was a reason for my demand AND for its brevity as well as
the likelihood of my making such a demand.

I want a machine to be able to act with that same level
of reliability without having to resort to supplemental
interactions to bolster confidence in its authentication
of the caller.

So, if I am out of town, I can instruct the machine to
recognize a neighbor's "commands" (to perform a limited
set of actions, as my proxy) without having to quiz them
on other information (effectively, shared secrets -- even
if not truly "secret") to get that level of confidence
in their authentication:

"What day did Don depart?"
"When did he say he was going to return?"
"Where was he going?"
"What's your pet's name?"

I.e., a recording of said neighbor (or, an AI generated
simulant) could sound correct and yet not respond correctly.

A good summary of the various technologies involved:
<https://en.wikipedia.org/wiki/Speaker_recognition>
Including:

"In 2023 Vice News and The Guardian separately demonstrated
they could defeat standard financial speaker-authentication
systems using AI-generated voices generated from about five
minutes of the target's voice samples."

(I don't think you even need THAT much training material)

Note that the banking application inherently constrains the types
of "content" that is likely exchanged making it easier to spoof.

From Liz Tuddenham@3:633/10 to All on Sunday, June 07, 2026 08:54:59

Jeremiah Jones <jj@j.j> wrote:

[...]

You can tell the difference
in sound quality, tho, if you're calling from a landline to
another landline, as opposed to calling a cell phone.

Some mobile 'phones produce sounds that are barely recognisable as a
voice, let alone any particular voice. This is made even worse by
people who stand them on a table and switch them to 'hands-free' with a
loud television or radio in the background.

--
~ Liz Tuddenham ~
(Remove the ".invalid"s and add ".co.uk" to reply)
www.poppyrecords.co.uk

From Don Y@3:633/10 to All on Sunday, June 07, 2026 12:30:52

On 6/7/2026 12:54 AM, Liz Tuddenham wrote:

Jeremiah Jones <jj@j.j> wrote:

[...]

You can tell the difference
in sound quality, tho, if you're calling from a landline to
another landline, as opposed to calling a cell phone.

Some mobile 'phones produce sounds that are barely recognisable as a
voice, let alone any particular voice. This is made even worse by
people who stand them on a table and switch them to 'hands-free' with a
loud television or radio in the background.

But that's also true of wired (and "wireless") phones. I
had a "single chip" phone many decades ago that felt like
I was using a can-on-a-string -- a mild breeze would blow it
off the table!

The fact that they can get any sort of audio quality
(think: "music player") out of a cell phone suggests
they are putting SOME effort into that.

From Jeremiah Jones@3:633/10 to All on Sunday, June 07, 2026 16:49:04

Don Y <blockedofcourse@foo.invalid> wrote:

On 6/6/2026 11:25 PM, Jeremiah Jones wrote:

(snip for focus)

3400 hz sampled at 8 khz gives you pretty decent voice
quality. Not very good for music, but understandable for
speech, and for people like me with poor high-frequency
hearing, extra audio bandwidth wouldn't help much. Telephone
mikes and sound elements are also cheap and crappy. If you
want fidelity i'd start with a good quality headset.

What I want is repeatability and, ideally, with some form
of quantitative description that lets me relate THIS audio
stream to an audio stream that I would encounter when sharing
a room with the speaker.

That would be signal-to-noise ratio (SNR).
SNR = St / (St - Sr)
if St is the transmitted signal, ie out of the speaker's
mouth, and Sr is the received signal, that which goes into
your ear.

Simple to say, accurate measurement could be pretty difficult.

You could send a known signal thru the network that would
quantify frequency and phase distortion and latency. I don't
know what off-the-shelf equipment would do that, but there
must be some.
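
A minimal Python sketch of the known-signal idea, with several assumptions stated up front: the probe is plain pseudo-random noise, the "channel" is a made-up 5-sample delay with a 10% amplitude error, and the SNR above is read in power terms, i.e. signal power over the power of (St - Sr), expressed in dB. The function names are illustrative only.

```python
import math
import random


def snr_db(sent, received):
    """10*log10 of signal power over error power, where error = sent - received.
    Assumes the sequences are already time-aligned and that they differ
    somewhere (identical inputs would give zero error power)."""
    signal_power = sum(s * s for s in sent)
    error_power = sum((s - r) ** 2 for s, r in zip(sent, received))
    return 10 * math.log10(signal_power / error_power)


def estimate_delay(sent, received, max_lag):
    """Lag in samples at which the cross-correlation of sent and received peaks."""
    def correlation(lag):
        return sum(s * received[i + lag]
                   for i, s in enumerate(sent) if i + lag < len(received))
    return max(range(max_lag + 1), key=correlation)


rng = random.Random(1)
probe = [rng.uniform(-1.0, 1.0) for _ in range(400)]  # known wideband probe

# Hypothetical channel: 5 samples of delay plus a 10% amplitude error.
received = [0.0] * 5 + [0.9 * s for s in probe]

lag = estimate_delay(probe, received, max_lag=50)
aligned = received[lag:lag + len(probe)]
print(lag, round(snr_db(probe, aligned), 1))
```

For this toy channel the delay comes out as 5 samples and the SNR as 20 dB. On a real call the probe would also have to survive any speech codec in the path, which noise-like signals often do not, so the numbers would need careful interpretation.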

Most of the information in a human voice signal is well within
the sub-1khz range. More so for men than women.

But, that's because the human listener is "accommodating".

No. It's because that's just the spectral content of a human
voice. That one can be objectively measured & quantified.

One can distort their voice and still be understood by
someone intent on that understanding. Witness how you
can understand someone with a severe head cold, despite
their voice being quantitatively "different".

Or, someone with laryngitis.

A machine can similarly understand what they are *saying*.
But, it's harder for a machine to identify and authenticate
that speaker. "What does Bob sound like when he's got
a head cold?"

Anything that the channel does to the signal falls into the
same problem domain.

Not exactly. What the channel does to the signal is noise,
which falls under SNR. If Bob has a head cold, that's just a
difference in the original signal.

If you are trying to identify the caller via a "voice print",
it still adds to the confusion all the same. IIRC a voice
print can't be spoofed by another human voice, but i'm pretty
sure it can be spoofed by an AI synthesis of Bob's voice. I
speculate the voice print would not be greatly affected if Bob
has a head cold, since it sounds like a product of the
speaker's physiology, like larynx, tongue etc, at least
according to my very incomplete understanding.

E.g., I have a bridge that lets my *wired* home phones
use my cellular connection. But, the quality of the
connection is beneath the quality of the wired phones
on a wired landline *or* the cell phone that is connected
to the bridge.

24 telco voice channels fit nicely on a T1. Bandwidth was not
always cheap. Do you remember back when it was easy to run up
$40 on one phone call? It wasn't a matter of deliberate
crippling, it was about what's doable at a reasonable price
that fits what the customer needs.
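
The arithmetic behind "24 channels fit nicely on a T1" can be checked directly, assuming the standard DS1 layout of 8-bit samples at 8000 frames per second plus one framing bit per frame. The constant names are just for illustration.

```python
CHANNELS = 24
BITS_PER_SAMPLE = 8
SAMPLE_RATE_HZ = 8000          # one frame per sample period
FRAMING_BITS_PER_FRAME = 1

ds0_bps = BITS_PER_SAMPLE * SAMPLE_RATE_HZ                       # one voice channel
frame_bits = CHANNELS * BITS_PER_SAMPLE + FRAMING_BITS_PER_FRAME # bits per frame
t1_bps = frame_bits * SAMPLE_RATE_HZ                             # line rate

print(ds0_bps, frame_bits, t1_bps)  # 64000 193 1544000
```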

In some implementations, such as some special circuit voice
lines, the least significant bit is reserved for supervisory
signalling.
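
A small sketch of what reserving the least significant bit does to the voice codewords. It assumes the common robbed-bit arrangement in which only every sixth frame gives up its LSB (the post above doesn't say which frames), and it works on raw 8-bit codewords rather than decoded audio. The function name `rob_bits` is hypothetical.

```python
import numpy as np

def rob_bits(codewords, signalling_bit, every=6):
    """Overwrite the LSB of every `every`-th codeword with a signalling bit."""
    out = np.array(codewords, dtype=np.uint8)          # work on a copy
    out[every - 1::every] = (out[every - 1::every] & 0xFE) | (signalling_bit & 1)
    return out

rng = np.random.default_rng(2)
codewords = rng.integers(0, 256, size=48, dtype=np.uint8)   # stand-in for one channel
robbed = rob_bits(codewords, signalling_bit=1)

changed = np.flatnonzero(codewords != robbed)
print(changed)                 # only positions 5, 11, 17, ... can differ
```

The affected codewords are off by at most one code step, which is inaudible for speech but matters if you are trying to pass arbitrary data over the channel.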

Cell uses a much narrower bandwidth. They do some data
compression to save bits, the nature of which i can't tell
you, but the saving is dramatic. You can tell the difference
in sound quality, tho, if you're calling from a landline to
another landline, as opposed to calling a cell phone.

That is exactly the issue.

If you speak into a microphone hard-wired to a device that
digitizes your speech at a high frequency and resolution and
have models of your speech built in that environment, how
do you match those models to speech that has made its way to the
analyzer/models via channels of dubious characteristics?

Not sure what you mean. The only part of the "speech model"
that matters is the spectral content. Signal/noise matters,
but that's a hardware issue.

But "you" sound different when your speech travels to "me"
via different channels. I can understand what you are trying to
*convey* -- much like you could understand me in a helium
environment. But, you likely wouldn't recognize my "voice"
in that environment. You would have to rely on other cues.

That would be a pretty serious signal degradation.

(snip)

In most cases the PBX or other customer facilities does the
A/D and D/A conversion at the customer premise, so analog
signals traveling down long copper spans are not a source of
degradation.

What's the alternative... a call over Instagram? A
high-priced special circuit from your carrier?

I'm not asking (or requiring) anything special. Rather,
I am trying to learn the characteristics of a channel that
has presented itself to me (my device) without any input
from me (or the other party!) in that selection. I.e.,
you can't tell your provider what sort of connection you want
for THIS call.

I see.

To the extent that a phone call is digitized the entire way,
except for the first and last mile (or fraction thereof),
different call paths aren't going to make any difference.

It's also probable that, if you use the PSTN, you are going to
get the same path every time anyway. The PSTN mostly uses TDM
transport which is pre-determined, rather than multiple paths
like the Internet does.

(snip the rest, as I have nothing useful to add to it)

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Don Y@3:633/10 to All on Monday, June 08, 2026 00:02:21

On 6/7/2026 4:49 PM, Jeremiah Jones wrote:

Don Y <blockedofcourse@foo.invalid> wrote:

On 6/6/2026 11:25 PM, Jeremiah Jones wrote:

(snip for focus)

3400 hz sampled at 8 khz gives you pretty decent voice
quality. Not very good for music, but understandable for
speech, and for people like me with poor high-frequency
hearing, extra audio bandwidth wouldn't help much. Telephone
mikes and sound elements are also cheap and crappy. If you
want fidelity i'd start with a good quality headset.
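
To illustrate why the band is held to about 3400 Hz at an 8 kHz sample rate: anything above 4 kHz folds back into the band once sampled. In the sketch below (tone frequencies picked only for the example) a 5 kHz tone sampled at 8 kHz produces exactly the same samples as a 3 kHz tone.

```python
import numpy as np

FS = 8000                      # sample rate, Hz
n = np.arange(64)              # sample indices

tone_5k = np.cos(2 * np.pi * 5000 * n / FS)   # above the 4 kHz Nyquist limit
tone_3k = np.cos(2 * np.pi * 3000 * n / FS)   # its alias at FS - 5000 Hz

print(np.allclose(tone_5k, tone_3k))  # True
```

This is why an anti-aliasing filter has to sit ahead of the sampler: once the two tones have been sampled, nothing downstream can tell them apart.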

What I want is repeatability and, ideally, with some form
of quantitative description that lets me relate THIS audio
stream to an audio stream that I would encounter when sharing
a room with the speaker.

That would be signal-to-noise ratio (SNR).
SNR = St / (St - Sr)
if St is the transmitted signal, ie out of the speaker's
mouth, and Sr is the received signal, that which goes into
your ear.

I think it is more than that. I think the channel "colors"
the content.

What prompted the post was a demo I gave the other day.
With my ear buds in place, I was successfully authenticated
(context independent).

I borrowed one of my guests' cell phone and then placed a call
to "myself". And, was NOT authenticated. My voice hadn't
changed in the few minutes between those two tests. I didn't
choose a different utterance.

What had changed was the channel between my mouth and the recognizer.

When I had initially designed the mechanism, I verified it with a
POTS connection. Now, the station set was replaced with a cell phone
and the copper line with a packet-switched network operating over a
wireless connection.

To the machine, my voice was no longer strictly what it should have
been.

Simple to say, accurate measurement could be pretty difficult.

You could send a known signal thru the network that would
quantify frequency and phase distortion and latency. I don't
know what off-the-shelf equipment would do that, but there
must be some.

In theory, KNOWING the identity of the remote party AND recognizing
whatever utterance, I could dynamically determine an inverse function
that maps what I'm *hearing* from him to what he is likely *saying*.
That function would qualify the channel's characteristics.
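
A minimal sketch of that inverse-function idea, under strong assumptions: the channel is treated as a fixed linear filter (real cellular codecs are not), the expected utterance is available sample-for-sample, and there is no noise. White noise stands in for speech, and the three-tap `channel` is made up for the example. The estimate is H(f) = Heard(f) / Reference(f), and the equaliser applies a regularised inverse of H.

```python
import numpy as np

def estimate_channel(reference, heard, n_fft, eps=1e-6):
    """Regularised estimate of H(f) = Heard(f) / Reference(f)."""
    X = np.fft.rfft(reference, n_fft)
    Y = np.fft.rfft(heard, n_fft)
    return Y * np.conj(X) / (np.abs(X) ** 2 + eps)

def equalise(heard, H, n_fft, eps=1e-6):
    """Apply a regularised inverse of H to undo the channel's colouring."""
    Y = np.fft.rfft(heard, n_fft)
    X_hat = Y * np.conj(H) / (np.abs(H) ** 2 + eps)
    return np.fft.irfft(X_hat, n_fft)

rng = np.random.default_rng(3)
reference = rng.standard_normal(2048)        # stand-in for the expected utterance
channel = np.array([1.0, 0.5, -0.2])         # hypothetical channel impulse response
heard = np.convolve(reference, channel)      # what arrives at the recognizer
n_fft = 4096                                 # long enough to avoid circular wrap

H = estimate_channel(reference, heard, n_fft)
restored = equalise(heard, H, n_fft)[:len(reference)]
print(np.max(np.abs(restored - reference)))  # residual error after equalisation
```

The sketch only works because `reference` is known in advance, which is the crux of the objection that follows.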

But, if I want to normalize the channel SO I CAN AUTHENTICATE THE CALLER,
I've already corrupted that analysis by using my knowledge of his
speech to adjust the channel so it *matches* his speech. That's rife
with fallacy.
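
One way around needing the speaker's identity, offered only as a sketch and not as anything proposed in this thread, is blind normalisation: a fixed linear channel multiplies the spectrum, so it adds a constant offset to each bin of the log spectrum, and subtracting each bin's long-term average removes it (the log-spectrum counterpart of cepstral mean normalisation). The example below uses white noise as a stand-in for speech and a made-up three-tap filter as the coloured path. It would not undo nonlinear codec effects, and it also discards the speaker's own long-term spectral tilt.

```python
import numpy as np

def log_spectra(signal, frame=256, hop=128):
    """Log-magnitude spectrum of each Hann-windowed frame (rows = frames)."""
    window = np.hanning(frame)
    starts = range(0, len(signal) - frame + 1, hop)
    frames = np.array([signal[s:s + frame] * window for s in starts])
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-9)

def mean_normalise(features):
    """Subtract each frequency bin's average over time."""
    return features - features.mean(axis=0, keepdims=True)

rng = np.random.default_rng(4)
speech = rng.standard_normal(16000)              # stand-in for a speech signal
channel_b = np.array([1.0, -0.6, 0.2])           # hypothetical coloured path
via_a = speech                                   # "clean" path
via_b = np.convolve(speech, channel_b)[:len(speech)]

# Average feature mismatch between the two paths, before and after normalising.
raw_gap = np.mean(np.abs(log_spectra(via_a) - log_spectra(via_b)))
norm_gap = np.mean(np.abs(mean_normalise(log_spectra(via_a))
                          - mean_normalise(log_spectra(via_b))))
print(raw_gap, norm_gap)
```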

Most of the information in a human voice signal is well within
the sub-1khz range. More so for men than women.

But, that's because the human listener is "accommodating".

No. It's because that's just the spectral content of a human
voice. That one can be objectively measured & quantified.

But only if it is faithfully presented to the analyzer.
If it is altered in transit, then all bets are off.
(Well, you can claim to make a match by enlarging the
confidence interval you're willing to tolerate for
"certainty")

One can distort their voice and still be understood by
someone intent on that understanding. Witness how you
can understand someone with a severe head cold, despite
their voice being quantitatively "different".

Or, someone with laryngitis.

A machine can similarly understand what they are *saying*.
But, it's harder for a machine to identify and authenticate
that speaker. "What does Bob sound like when he's got
a head cold?"

Anything that the channel does to the signal falls into the
same problem domain.

Not exactly. What the channel does to the signal is noise,
which falls under SNR. If Bob has a head cold, that's just a
difference in the original signal.

Shirley the instrument used doesn't differ from every other
instrument that could have been used to convey the signal?
I.e., all cell phones, carriers, etc. are 100% interchangeable.
The mechanical dimensions of transducers don't attenuate or
amplify certain frequencies, etc.

It's just a fluke that the cell phone used in my test (above)
happened to corrupt the signal delivered to the network?

If you are trying to identify the caller via a "voice print",
it still adds to the confusion all the same. IIRC a voice
print can't be spoofed by another human voice, but i'm pretty
sure it can be spoofed by an AI synthesis of Bob's voice. I
speculate the voice print would not be greatly affected if Bob
has a head cold, since it sounds like a product of the
speaker's physiology, like larynx, tongue etc, at least
according to my very incomplete understanding.

Your physiology changes in each of these "abnormal" situations.

When you "get a cold", your larynx swells, which drives the
fundamental frequency of all of your voiced sounds lower. The vocal
cords can become coated with mucus, further deepening the voice.
Coughing or clearing phlegm irritates the vocal cords, making your
voice sound more raspy or "shredded". The nasal passages tend to get
blocked or narrowed, reducing the prominence of nasal sounds (m, ng,
etc.) as the resonance is altered by the physical changes to that
"chamber". Soft tissues don't vibrate as easily. Etc.

Most "impersonators" rely on other mechanisms to reinforce the
perception of their voice. A machine can note the difference
in spectral content while a human listener is more willing to
tolerate deviations.

When I am over-tired, my voice gets very breathy and more
monotonic. A human can understand what I am saying -- as
can a machine. But, expecting the machine to correctly
*authenticate* me from such speech samples would be a stretch.

An AI (or other synthetic voice) is actually using a model
built *from* the speaker's voice so it more accurately
reflects the original speaker's voice.

To defeat an AI, you have to possess one or more "secrets"
that the AI can't deduce from the situation. Or, hope
the AI is sluggish in responding to your prompts (which
can be spoken or visual, etc. -- "How many fingers am
I holding up?")

E.g., I have a bridge that lets my *wired* home phones
use my cellular connection. But, the quality of the
connection is beneath the quality of the wired phones
on a wired landline *or* the cell phone that is connected
to the bridge.

24 telco voice channels fit nicely on a T1. Bandwidth was not
always cheap. Do you remember back when it was easy to run up
$40 on one phone call? It wasn't a matter of deliberate
crippling, it was about what's doable at a reasonable price
that fits what the customer needs.

In some implementations, such as some special circuit voice
lines, the least significant bit is reserved for supervisory
signalling.

Cell uses a much narrower bandwidth. They do some data
compression to save bits, the nature of which i can't tell
you, but the saving is dramatic. You can tell the difference
in sound quality, tho, if you're calling from a landline to
another landline, as opposed to calling a cell phone.

That is exactly the issue.

If you speak into a microphone hard-wired to a device that
digitizes your speech at a high frequency and resolution and
have models of your speech built in that environment, how
do you match those models to speech that has made its way to the
analyzer/models via channels of dubious characteristics?

Not sure what you mean. The only part of the "speech model"
that matters is the spectral content. Signal/noise matters,
but that's a hardware issue.

But "you" sound different when your speech travels to "me"
via different channels. I can understand what you are trying to
*convey* -- much like you could understand me in a helium
environment. But, you likely wouldn't recognize my "voice"
in that environment. You would have to rely on other cues.

That would be a pretty serious signal degradation.

There are no standards for the station sets on each end of a
comm link. Because humans can recognize seriously degraded
speech, fidelity isn't an important criterion.

(snip)

In most cases the PBX or other customer facilities does the
A/D and D/A conversion at the customer premise, so analog
signals traveling down long copper spans are not a source of
degradation.

What's the alternative... a call over Instagram? A
high-priced special circuit from your carrier?

I'm not asking (or requiring) anything special. Rather,
I am trying to learn the characteristics of a channel that
has presented itself to me (my device) without any input
from me (or the other party!) in that selection. I.e.,
you can't tell your provider what sort of connection you want
for THIS call.

I see.

To the extent that a phone call is digitized the entire way,
except for the first and last mile (or fraction thereof),
different call paths aren't going to make any difference.

It's also probable that, if you use the PSTN, you are going to
get the same path every time anyway. The PSTN mostly uses TDM
transport which is pre-determined, rather than multiple paths
like the Internet does.

"ATM is a core protocol used in the synchronous optical networking
and synchronous digital hierarchy (SONET/SDH) backbone of the public
switched telephone network and in the Integrated Services Digital
Network (ISDN) but has largely been superseded in favor of
next-generation networks based on IP technology."

IP doesn't ensure that one datagram follows the same path as the
preceding or following.

(snip the rest, as I have nothing useful to add to it)

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
