
What Makes an ARM... Arm?

You're probably reading this on an ARM chip. No, really. If it's a phone, 100%. A new Mac? Yep. A Windows laptop? Uh oh, look out, Intel. Let's fall down the rabbit hole of what this 'ARM' thing actually is.

It's one of those things that’s everywhere, but no one ever really explains.

The First Rabbit Hole: What is ARM?

This was the first thing that tripped me up. I thought ARM was a company that made chips, like Intel.

Nope!

    /\_____/\
   /  o   o  \
  ( ==  ^  == )
   )         (
  (           )
 ( (  )   (  ) )
(__(__)___(__)__)

(Baaa!)

ARM (the company) is more like an architectural firm. They design blueprints. The big one is the ARM Instruction Set Architecture (ISA). This is, quite literally, the language that the processor speaks.

They license these blueprints out.

This is completely different from the x86 world. Intel and AMD own their x86 designs. They design 'em, they build 'em, they sell 'em. You can't license the "Core i9" blueprint and make your own. This licensing model is ARM's first superpower.

The Big Showdown: RISC vs. CISC

Okay, so ARM's blueprint is for a RISC chip. What the heck does that mean?

It stands for Reduced Instruction Set Computer. Its arch-nemesis is CISC, the Complex Instruction Set Computer.

This is the core, A-number-one, most important difference.

Your CISC (Intel x86) processor is a jack of all trades. 🌀 (O_o) It has tons of complex, specialized instructions. Want to add two numbers from memory and store the result back, all in one go? It's got an instruction for that! ADD [memory_A], [memory_B] (real x86 actually limits you to one memory operand per instruction, but you get the idea).

The philosophy is: "Let's make the hardware (the chip) smart, so the software (your code) can be dumber."

          CISC (x86)
+------------------------------------+
|  ONE BIG, COMPLEX INSTRUCTION      |
|  (e.g., "Add mem A to mem B")      |
+------------------------------------+
| Takes multiple clock cycles, but   |
| looks like one step to the coder.  |
+------------------------------------+

The RISC (ARM) processor is a minimalist. ✨ (^-^) It has a tiny, fixed set of very simple instructions. All instructions are the same length (e.g., 32 bits). All of them (basically) take one clock cycle to run.

Want to add two numbers from memory? Too bad. The RISC philosophy says, "Hey, only 'Load' and 'Store' instructions get to touch memory. That's it."

So you have to do this instead:

  1. LDR R1, [memory_A] (Load the value from memory A into a temporary bucket called Register 1)
  2. LDR R2, [memory_B] (Load the value from memory B into Register 2)
  3. ADD R3, R1, R2 (Add the numbers in R1 and R2, put the result in R3)
  4. STR R3, [memory_C] (Store the result from R3 into memory C)

                   RISC (ARM)
+--------+ +--------+ +--------+ +---------+
|  LDR   | |  LDR   | |  ADD   | |  STR    |
+--------+ +--------+ +--------+ +---------+
| Each step is tiny, simple, and takes     |
| one clock cycle. More steps, but faster  |
| and way more predictable.                |
+------------------------------------------+
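
Here's roughly what that four-step dance looks like in real ARM assembly: a minimal sketch in GNU assembler syntax for classic 32-bit ARM. The memory_A/B/C labels and their values are made up, and LDR Rn, =label is an assembler convenience that fetches a label's address into a register:

      .data
  memory_A:  .word 2          @ hypothetical inputs
  memory_B:  .word 3
  memory_C:  .word 0          @ the result will land here

      .text
      LDR   R1, =memory_A     @ get the ADDRESS of memory_A into R1
      LDR   R1, [R1]          @ load the VALUE sitting at that address
      LDR   R2, =memory_B
      LDR   R2, [R2]          @ same again for memory_B
      ADD   R3, R1, R2        @ math happens between registers, full stop
      LDR   R0, =memory_C
      STR   R3, [R0]          @ the one and only road back to memory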

This looks less efficient, right? Four instructions vs. one! But here's the magic: because every instruction is so simple and all the same size, the CPU can chew through them at a ludicrous speed. It can pipeline them, run them out of order, and predict what's coming next with spooky accuracy.

The CISC chip, with its janky, variable-length instructions, is like a Rube Goldberg machine. It's constantly trying to figure out "Wait, is this instruction 1 byte? Or 5 bytes? Or 15?"

David Patterson, one of the godfathers of RISC, pointed out that the x86 manual is over 3,600 pages long. The ARM manual is... also a beast (5,400 pages for ARMv8), but the core RISC philosophy is all about that simple, clean foundation.


The Heart of the Matter: The Instruction Set

Let's dive a little deeper into that RISC philosophy.

The Load/Store Philosophy (aka "The No Touchy Rule")

This is what we just talked about, but it's worth its own section. It's the defining feature.

On an ARM chip, math operations (like ADD, SUB, MUL) can only operate on registers. Registers are those tiny, super-fast "bucket" storage spaces right on the CPU. An x86-64 chip has 16 general-purpose ones. Classic 32-bit ARM gives you 16 too, and 64-bit ARM (AArch64) bumps that to 31.

[X0] [X1] [X2] ... [X30]
 (tiny, happy, super-fast buckets!)

This is a brilliant design. Why? Because accessing main RAM is slow. Glacially slow. The CPU is a Formula 1 car, and RAM is a horse-drawn buggy. (CPU 🏎️) ... (RAM 🐴)

The CISC (x86) approach lets you build an F1 car that can also be pulled by a horse (ADD [memory]...).

The RISC (ARM) approach says: "That's dumb. Let's build a tiny stable (the registers) right next to the F1 car. We'll use two really fast couriers (LDR and STR) to move hay into the stable. But the F1 car only ever deals with the stable."

This separation makes it way easier to optimize and run things in parallel.

"The RISC philosophy concentrates on reducing the complexity of instructions performed by the hardware. The RISC philosophy provides greater flexibility and intelligence in software rather than hardware."

Source: ACS College of Engineering

Conditional Love (and Execution)

This is a classic ARM feature that's just so cool. (^_−)☆ On older ARM architectures (and it's still around in a different form), you could make almost any instruction conditional.

In normal code, you'd say: IF (R1 == 5) THEN { ADD R2, R2, 1 }

This IF causes a "branch," a jump in the code that can mess up the CPU's perfect pipeline.

ARM let you do this: ADDEQ R2, R2, #1

This one instruction means: "Add 1 to R2, but only if the 'equal' flag is set." The CPU just runs the instruction, checks the flag, and if it's not set, does... nothing. It just skips it in one cycle. No jump. No pipeline flush. It's... elegant.
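
Here's a minimal sketch of how that plays out in classic 32-bit ARM assembly (the register names and the value 5 are just for illustration):

      CMP   R1, #5         @ compare R1 with 5; sets the 'equal' flag on a match
      ADDEQ R2, R2, #1     @ executes only if the equal flag is set...
      SUBNE R2, R2, #1     @ ...and this one only if it is NOT set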

The Squeezy Boy: The Thumb Instruction Set

Another classic ARM trick! Those 32-bit instructions were great for performance, but they were chunky. In the '90s, memory was precious. So, ARM introduced "Thumb"—a second instruction set, living right alongside the first, that used 16-bit instructions.

+---------------+   squeeeeeze!   +-------+
|  32-bit Instr |     (>_<)       | 16-bit|
+---------------+                 +-------+

It was a compressed version. The CPU could switch into "Thumb mode," run a bunch of code with double the density, and then switch back to 32-bit mode for the heavy lifting. This was a killer feature for early mobile phones and the Game Boy Advance.
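
The switch itself is explicit. Here's a minimal sketch in GNU assembler syntax (the label is made up; the trick is that bit 0 of a branch target selects Thumb state):

      .syntax unified
      .code 32                @ start out in 32-bit ARM state
      ADR   R0, tiny_code     @ get the address of the Thumb routine
      ORR   R0, R0, #1        @ set bit 0 to request Thumb state
      BX    R0                @ branch-and-exchange: switch modes and jump

      .thumb                  @ the assembler now emits 16-bit instructions
  tiny_code:
      ADDS  R1, R1, #1        @ the same ADD, encoded in half the space
      BX    LR                @ BX back; bit 0 of LR picks the return state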


The M1-llion Dollar Question: Why Did Apple Switch?

This is the story, right? Apple, who had been with Intel's x86 for over a decade, just... left. And their new M1 chips destroyed the Intel competition.

It wasn't just RISC vs. CISC. It was control.

  1. Performance-per-Watt: The Intel story for years was "more power, more power, more power." This also meant "more heat, more heat, more heat." Their 12" MacBook was a disaster—it was so hot it had to throttle itself constantly. Intel: (🔥_🔥) (too hot!) Meanwhile, Apple's iPads (running ARM chips) were getting scarily fast... and had no fans. ARM: (•‿•) (so cool) The writing was on the wall. ARM's simple RISC design is just way more power-efficient.

  2. The SoC (System on a Chip): Apple didn't just want to swap a CPU. They wanted to build a single, monolithic beast of a chip.

    +------------------------------------------+
    |           Apple 🍎 M1 (SoC)              |
    |                                          |
    | +-------+ +-------+ +----------+ +-----+ |
    | |  CPU  | |  GPU  | | Neural   | | RAM | |
    | | (ARM) | |       | | Engine   | |     | |
    | +-------+ +-------+ +----------+ +-----+ |
    |                                          |
    +------------------------------------------+
    

    They put the CPU, the GPU, and the AI cores on the same piece of silicon, with the RAM mounted right on the same package, millimeters away. This is called a "System on a Chip," and it makes data transfer insanely fast. Intel's model was to sell you a CPU. You'd get your RAM from one company, your GPU from another... Apple wanted to control the whole widget. The ARM license let them do that. Intel's x86 business model did not.

  3. The Gory Details: The M1's instruction decoders are monstrous. Because 64-bit ARM instructions are all the same simple 32-bit length, Apple could just build a super-wide decoder that shovels 8 instructions per clock cycle into the chip. Intel's best chips were struggling to do 4, because they're constantly parsing that messy, variable-length x86 code. (See the sketch below.)
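
A minimal illustration, in 64-bit ARM (AArch64) assembly, of why fixed width makes that easy: every instruction below assembles to exactly 4 bytes, so the decoder knows instruction N starts at byte 4*N before decoding a single thing. On x86, you can't even find instruction N+1 until you've parsed instruction N.

      add  x0, x1, x2       // 4 bytes
      ldr  x3, [x0]         // 4 bytes
      str  x3, [x4, #8]     // 4 bytes
      ret                   // 4 bytes: even 'return' gets a full word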

Lessons from the Rabbit Hole

Next Steps

This just scratches the surface. We didn't even really get into 64-bit (AArch64), privilege levels (EL0-EL3), or the new kid on the block, RISC-V (which is open source!).

 (\_/)
 (O.o)
(")_(")
(Down the *next* rabbit hole...)

But for now, I want to leave you with the official Arm developer documentation. It's... a lot. But now you know the philosophy. Unlike x86's 3,600 pages of "what if"s, it's a testament to the power of "Load, Store, and get out of the way."

What's your favorite "simple-is-actually-genius" tech? Let me know!

/tech/ /arm/ /cpu/ /architecture/