Discussion:
Anticipating processor architectural evolution
Don Y
2024-04-27 23:11:30 UTC
I've had to refactor my RTOS design to accommodate the likelihood of SMT
in future architectures.

Thinking (hoping?) these logical cores to be the "closest to the code",
I call them "Processors" (hysterical raisins). Implicit in SMT is the
notion that they are architecturally similar/identical.

These are part of PHYSICAL cores -- that I appropriately call "Cores".

These Cores are part of "Hosts" (ick; term begs for clarity!)... what
one would casually call "chips"/CPUs. Note that a host can house dissimilar
Cores (e.g., big.LITTLE).

Two or more hosts can be present on a "Node" (the smallest unit intended to
be added to or removed from a "System"). Again, they can be dissimilar
(think CPU/GPU).
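
In rough C terms, the hierarchy looks something like this (an
illustrative sketch only; the names and fields are hypothetical,
not my actual declarations):

/* Hypothetical sketch of the naming hierarchy -- illustrative only. */
typedef struct processor {        /* SMT logical core: "closest to the code" */
    unsigned id;
} processor_t;

typedef struct core {             /* physical Core; holds one or more Processors */
    processor_t *processors;
    unsigned     n_processors;
} core_t;

typedef struct host {             /* a "chip"/CPU; Cores may be dissimilar (big.LITTLE) */
    core_t  *cores;
    unsigned n_cores;
} host_t;

typedef struct node {             /* smallest unit added to/removed from a System */
    host_t  *hosts;               /* Hosts may be dissimilar (CPU vs GPU) */
    unsigned n_hosts;
} node_t;

typedef struct system {           /* the whole deployment */
    node_t  *nodes;
    unsigned n_nodes;
} system_t;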

I believe this covers the composition/hierarchy of any (near) future
system architecture, and places the minimum constraints on it.

Are there any other significant developments in the pipeline that
could alter my conception of future hardware designs?
John Larkin
2024-04-28 02:52:55 UTC
On Sat, 27 Apr 2024 16:11:30 -0700, Don Y
Post by Don Y
[...]
Why not hundreds of CPUs on a chip, each assigned to one function,
with absolute hardware protection? They need not be identical, because
many would be assigned to simple functions.

The mess we have now is the legacy of thinking about a CPU as some
precious resource.
Bill Sloman
2024-04-28 03:48:19 UTC
Post by John Larkin
[...]
Why not hundreds of CPUs on a chip, each assigned to one function,
with absolute hardware protection? They need not be identical, because
many would be assigned to simple functions.
The mess we have now is the legacy of thinking about a CPU as some
precious resource.
The "mess" we have now reflects the fact that we are less constrained
than we used to be.

As soon as you could do multi-threaded processing, life became more
complicated, but you could do a great deal more.

Anything complicated will look like a mess if you don't understand
what's going on - and if you aren't directly involved, why would you
bother to do the work that would let you understand what was going on?

It would be nice if we could find some philosophical high ground from
which all the various forms of parallel processing could be sorted into
a coherent taxonomy, but the field doesn't seem to have found its Carl
Linnaeus yet.
--
Bill Sloman, Sydney
boB
2024-04-29 19:19:40 UTC
On Sat, 27 Apr 2024 19:52:55 -0700, John Larkin
Post by John Larkin
[...]
Why not hundreds of CPUs on a chip, each assigned to one function,
with absolute hardware protection? They need not be identical, because
many would be assigned to simple functions.
Isn't this what Waferscale is, kinda?

boB
Don Y
2024-04-29 22:03:57 UTC
Post by boB
Isn't this what Waferscale is, kinda?
WSI has proven to be a dead-end (for all but specific niche markets
and folks with deep pockets). "Here lie The Connection Machine,
The Transputer, etc."

Until recently, there haven't really been any mainstream uses for
massively parallel architectures (GPUs being the first real use, and
their subsequent co-option for AI and Expert Systems).

To exploit an array of identical processors you typically need a
problem that can be decomposed into many "roughly comparable" (in
terms of complexity) tasks that have few interdependencies.

Most problems are inherently serial and/or have lots of dependencies
that limit the amount of true parallelism that can be attained.
Or, they have widely differing resource needs/complexity that make them
ill-suited to being shoe-horned into a one-size-fits-all processor
model. E.g., controlling a motor and recognizing faces have
vastly different computational requirements.
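
For a rough feel of how hard that ceiling is, Amdahl's law (the
standard rule of thumb, quoted here from memory) says that if only a
fraction p of the work can be parallelized across N processors,

    speedup(N) = 1 / ((1 - p) + p/N)

so even with p = 0.9 and an unlimited number of processors, the best
you can ever do is 1/(1 - 0.9) = 10x.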

Communication is always the bottleneck in a processing application;
whether it be CPU to memory, task to task, thread to thread, etc.
It's also one of the ripest areas for bugs to creep into a design;
designing good "seams" (interfaces) is the biggest predictor of
success in any project of significance (that's why we have protection
domains and preach small modules, well-defined interfaces, and a
"contract" programming style).
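
As a toy illustration of what I mean by "contract" style (hypothetical
code, not from any real project):

#include <assert.h>
#include <stddef.h>

/* Toy example: the interface states its contract and enforces it at
 * the seam, so misuse is caught at the boundary instead of corrupting
 * state deep inside the module. */
typedef struct {
    unsigned char data[16];
    size_t        used;
} msgbuf_t;

/* Append one byte.
 * Precondition:  b is non-NULL and the buffer is not already full.
 * Postcondition: 'used' has grown by exactly one. */
void msgbuf_append(msgbuf_t *b, unsigned char v)
{
    assert(b != NULL && b->used < sizeof b->data);  /* precondition, checked at the seam */
    b->data[b->used++] = v;
    assert(b->used <= sizeof b->data);              /* postcondition */
}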

Sadly, few folks are formally taught about these interrelationships
(when was the last time you saw a Petri net?), so we have lots of
monolithic designs that are brittle because they break all the
Best Practices rules.

The smarter way of tackling increasingly complex problems is better
partitioning of hardware resources (with similarly architected
software atop) using FIFTY YEAR OLD protection mechanisms to enforce
the boundaries between "virtual processors".

This allows a processor having the capabilities required by the most
demanding "component" to also be leveraged to handle the needs of
those of lesser complexity. It also gives you a speedy way of exchanging
information between those processors without requiring specialized
fabric for that task.

And, that SHARED mechanism is easily snooped to see who is talking to
whom (as well as prohibiting interactions that *shouldn't* occur!)

E.g., I effectively allow for the creation of virtual processors of
specific capabilities and resource allocations AS IF they were discrete
hardware units interconnected by <something>. This lets me dole out
the fixed resources (memory, MIPS, time, watts) in the box to specific
uses and have "extra" for uses that require them.

(I can set a virtual processor to only have access to 64KB! -- or 16K
or 16MB -- of memory, only allow it to execute a million opcode fetches
per second, etc. and effectively have a tiny 8b CPU emulated within a
much more capable framework. And, not be limited to moving data via
a serial port to other such processors!)
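
Concretely, the sort of thing I mean (a hypothetical sketch of the
idea; the names and the constructor are made up, not my actual API):

#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch only -- illustrative, not the real interface. */
typedef struct vproc_limits {
    size_t   mem_bytes;        /* address space granted: 16K, 64K, 16M, ... */
    uint32_t fetches_per_sec;  /* opcode-fetch ("MIPS") budget enforced by the scheduler */
    uint32_t milliwatts;       /* share of the box's power budget */
} vproc_limits_t;

typedef struct vproc vproc_t;                         /* opaque handle (hypothetical) */
vproc_t *vproc_create(const vproc_limits_t *limits);  /* hypothetical constructor */

/* A "tiny 8b CPU" carved out of a much more capable framework: */
static const vproc_limits_t tiny8 = {
    .mem_bytes       = 64 * 1024,
    .fetches_per_sec = 1000000,
    .milliwatts      = 100,
};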
John Larkin
2024-04-30 00:17:35 UTC
On Mon, 29 Apr 2024 15:03:57 -0700, Don Y
Post by Don Y
[...]
To exploit an array of identical processors you typically need a
problem that can be decomposed into many "roughly comparable" (in
terms of complexity) tasks that have few interdependencies.
A PC doesn't solve massively parallel computational problems.

One CPU can be a disk file server. One, a keyboard handler. One for
the mouse. One can be the ethernet interface. One CPU for each
printer. One would be the "OS", managing all the rest.

Cheap CPUs can run idle much of the time.

We don't need to share one CPU doing everything any more. We don't
need virtual memory. If each CPU has a bit of RAM, we barely need
memory management.
Post by Don Y
[...]
E.g., I effectively allow for the creation of virtual processors of
specific capabilities and resource allocations AS IF they were discrete
hardware units interconnected by <something>. This lets me dole out
the fixed resources (memory, MIPS, time, watts) in the box to specific
uses and have "extra" for uses that require them.
(I can set a virtual processor to only have access to 64KB! -- or 16K
or 16MB -- of memory, only allow it to execute a million opcode fetches
per second, etc. and effectively have a tiny 8b CPU emulated within a
much more capable framework. And, not be limited to moving data via
a serial port to other such processors!)
Why virtual processors, if real ones are cheap?
Bill Sloman
2024-04-30 04:40:24 UTC
Post by John Larkin
[...]
Why virtual processors, if real ones are cheap?
Because you can reconfigure them on the fly, which is harder with real
processors.
--
Bill Sloman, Sydney
john larkin
2024-04-29 22:32:47 UTC
On Sat, 27 Apr 2024 16:11:30 -0700, Don Y
Post by Don Y
[...]
Vaguely related:

https://www.theregister.com/2023/10/30/arm_intel_comment/