Language and Performance

1. Introduction

Is C always faster than other programming languages? The performance of a program is ultimately determined by the machine instructions the CPU executes and how efficiently they use the underlying hardware. A high-level programming language is an abstraction, and the way it is translated down to the processor’s Instruction Set Architecture (ISA), such as MIPS or ARM64, is a primary factor behind its performance.

Direct Compilation (C): A language like C is a relatively “thin” abstraction over the hardware. The C compiler’s job is to translate the code, ahead of time, into a sequence of machine instructions for a specific ISA. A C statement like is_prime[j] = 0; can be compiled into a very small number of assembly instructions: an address calculation followed by a store, such as a MIPS sw (store word) or sb (store byte) depending on the element type. Because the translation happens before the program runs, no interpreter or runtime compiler stands between the program and the CPU during execution.
Virtual Machines & JIT Compilation (Java): Java introduces a layer of abstraction: the Java Virtual Machine (JVM). Java code is first compiled into platform-independent bytecode, which is an instruction set for the virtual machine rather than for a physical processor. When the program starts, the JVM interprets this bytecode. Crucially, its Just-In-Time (JIT) compiler identifies frequently executed code, such as hot loops (hotspots), and translates it from bytecode into the native machine code of the host processor (e.g., ARM64). After a “warm-up” period the core logic therefore runs as native code, which explains Java’s high performance.
Interpretation (Python): Python sits at the highest level of abstraction of the three. The CPython interpreter, itself a compiled C program, compiles a Python script to bytecode and then executes that bytecode one instruction at a time in an interpreter loop. A statement like is_prime[j] = False does not translate to a single store instruction. Instead, the interpreter executes a series of machine-code routines to:
1. Look up the is_prime object.
2. Verify it’s a list.
3. Calculate the memory location for index j.
4. Assign a reference to the global False object.
  Each of these steps requires many more machine instructions than C’s direct approach.
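The steps above can be made visible with CPython’s standard dis module. The following is a minimal sketch; the function name clear_flag is a hypothetical wrapper introduced only so the statement can be disassembled, and the exact opcode names printed depend on the CPython version.

```python
import dis


def clear_flag(is_prime, j):
    # The statement discussed above, wrapped in a function for disassembly.
    is_prime[j] = False


# Collect the names of the bytecode instructions CPython generated.
opnames = [ins.opname for ins in dis.get_instructions(clear_flag)]
print(opnames)

# Human-readable listing; the output format differs between CPython versions.
dis.dis(clear_flag)
```

Each bytecode instruction in the listing is itself carried out by compiled C code inside the interpreter, which is where the extra machine instructions come from.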

2. Experiment Design

To empirically measure these differences, the Sieve of Eratosthenes algorithm was implemented in C, Java, and Python.

Objective: To measure how efficiently different language translation models utilize the CPU and memory hierarchy for a compute-bound task.
Algorithm: The Sieve of Eratosthenes was chosen because it combines a large number of simple integer operations with regular, predictable writes to a single contiguous array.
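For reference, a minimal Python sketch of the algorithm is shown here. It is not the code used in the experiment, which is not reproduced in this write-up; the function name sieve is an assumption, while is_prime and j follow the names used in the introduction.

```python
def sieve(n):
    """Return a list of all primes <= n using the Sieve of Eratosthenes."""
    if n < 2:
        return []
    # is_prime[k] stays True until k is found to be a multiple of a smaller prime.
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    i = 2
    while i * i <= n:
        if is_prime[i]:
            # Cross out multiples of i, starting at i*i (smaller ones are already marked).
            for j in range(i * i, n + 1, i):
                is_prime[j] = False
        i += 1
    return [k for k, flag in enumerate(is_prime) if flag]


print(sieve(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

The inner loop is the hot path: one addition, one comparison, and one array store per iteration.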
Methodology:
- The C code was built twice: once without optimization and once with -O3, which instructs the compiler to generate more efficient assembly code (e.g., better instruction scheduling to avoid pipeline stalls).
- Tests were run for N = 1,000,000 and N = 1,000,000,000.
- Each test was run three times and the average time is reported.
Environment: All tests were conducted on an Apple MacBook Air (M3 - ARM64 ISA, 16GB RAM).
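The three-run averaging described in the methodology could be scripted along the following lines. This is a sketch under assumptions: the helper names average_time and count_primes are hypothetical, and the original measurements may have been taken differently (for example, with an external timer).

```python
import time


def count_primes(n):
    # Stand-in workload: a compact sieve that returns the number of primes <= n.
    if n < 2:
        return 0
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    i = 2
    while i * i <= n:
        if is_prime[i]:
            for j in range(i * i, n + 1, i):
                is_prime[j] = False
        i += 1
    return sum(is_prime)


def average_time(func, arg, runs=3):
    # Time `runs` calls of func(arg) with a monotonic high-resolution clock
    # and return the arithmetic mean in seconds.
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        func(arg)
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)


print(f"{average_time(count_primes, 1_000_000):.5f} s")
```

time.perf_counter is used rather than time.time because it is monotonic and intended for measuring short intervals.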

3. Results and Analysis

The average execution times are summarized below.

Language / build	N	Avg. execution time (s)
C (No Optimization)	1,000,000	0.00403
C (-O3)	1,000,000	0.00115
Java	1,000,000	0.00423
Python	1,000,000	0.03222

C (No Optimization)	1,000,000,000	5.95437
C (-O3)	1,000,000,000	5.19292
Java	1,000,000,000	5.40840
Python	1,000,000,000	~1200 (about 20 minutes)
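To make the table easier to compare, the times can be expressed as slowdown factors relative to the optimized C build. The short script below does only that arithmetic on the values from the table; the function name slowdown_vs_baseline is an illustrative choice.

```python
# Average times in seconds, copied from the results table.
times = {
    1_000_000: {
        "C (No Optimization)": 0.00403,
        "C (-O3)": 0.00115,
        "Java": 0.00423,
        "Python": 0.03222,
    },
    1_000_000_000: {
        "C (No Optimization)": 5.95437,
        "C (-O3)": 5.19292,
        "Java": 5.40840,
        "Python": 1200.0,  # approximate value from the table
    },
}


def slowdown_vs_baseline(results, baseline="C (-O3)"):
    # Divide every time by the baseline time; the baseline itself maps to 1.0.
    base = results[baseline]
    return {name: t / base for name, t in results.items()}


for n, results in times.items():
    print(f"N = {n:,}")
    for name, ratio in slowdown_vs_baseline(results).items():
        print(f"  {name:<20} {ratio:8.2f}x")
```

Relative to C with -O3, Java comes out at about 3.7x for N = 1,000,000 and about 1.04x for N = 1,000,000,000, while Python comes out at about 28x and about 231x respectively.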

Discussion

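C: Peak Hardware Utilization: Optimized C is the fastest in both tests because its compiler emits an efficient sequence of machine instructions ahead of time. The contiguous array in C maps directly to a contiguous block of memory, which lets the processor prefetch data effectively and make good use of each cache line. The -O3 flag enables more aggressive optimizations, such as reordering instructions to reduce dependencies and keep the CPU’s execution pipeline busy. At N = 1,000,000,000, however, the unoptimized build is only about 15% slower than the -O3 build (5.95 s vs. 5.19 s), compared with about 3.5 times slower at N = 1,000,000; this suggests that at the larger size, where the array far exceeds the cache, memory access rather than instruction count dominates the run time.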
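Java: The Power of On-the-Fly Native Compilation: The JIT compiler is the hero of Java’s performance. It profiles the running bytecode and applies optimizations comparable to those of C’s -O3 flag, but at runtime, translating the critical for loops into optimized native ARM64 machine code. The overhead comes from the initial interpretation and from the JIT compilation itself, which is consistent with the results: at N = 1,000,000 Java is about 3.7 times slower than C with -O3 (0.00423 s vs. 0.00115 s), while at N = 1,000,000,000, where that fixed cost is amortized, it is within about 4% (5.41 s vs. 5.19 s).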
Python: The Cost of Abstraction: The results for Python show the price of its high-level abstraction. The interpreter adds a thick layer between the code and the ISA: a single line of Python executes far more machine instructions than the equivalent C statement. Measured against C with -O3, Python is about 28 times slower at N = 1,000,000 and about 230 times slower at N = 1,000,000,000. Furthermore, a Python list stores a pointer to an object for each element (8 bytes on a 64-bit build) rather than the value itself, so it is less cache-efficient than C’s raw byte array and likely causes more cache misses.
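The difference in memory footprint between a Python list and a raw byte buffer can be checked directly. The sketch below uses sys.getsizeof and a bytearray as a stand-in for C’s byte array; the variable names are illustrative and the exact byte counts depend on the CPython build.

```python
import sys

n = 1_000_000

as_list = [True] * n      # one object pointer per element
as_bytes = bytearray(n)   # one byte per element, zero-filled

# getsizeof reports the container itself; for the list that is the pointer
# array, not the objects it refers to (True and False are shared singletons).
list_bytes = sys.getsizeof(as_list)
bytes_bytes = sys.getsizeof(as_bytes)

print(f"list:      {list_bytes:>10,} bytes")
print(f"bytearray: {bytes_bytes:>10,} bytes")
print(f"ratio:     {list_bytes / bytes_bytes:.1f}x")
```

On a 64-bit build the list is roughly eight times larger, so the same number of flags spans roughly eight times as many cache lines.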

4. Conclusion

From a computer organization standpoint, program performance depends largely on the efficiency of the machine code that is ultimately executed and on how well it interacts with hardware features such as the memory cache.

This experiment demonstrates a clear hierarchy:

C provides the most direct path from source code to optimized machine instructions, offering the highest potential to exploit the underlying hardware.
Java uses a JIT compiler to bridge the gap, translating platform-independent bytecode into native code at runtime. Once warmed up it comes close to C, which makes it well suited to server-side and long-running applications.
Python prioritizes developer productivity by design. That choice imposes a thick abstraction layer, so far more machine instructions are executed for the same task, making pure Python less suitable for raw, CPU-bound number crunching.
