Nativify Benchmark & Test Suite

This document describes the benchmark and testing infrastructure for Nativify. The suite verifies that native obfuscation preserves standard Java semantics and reports detailed performance metrics.

Overview

The Nativify benchmark suite provides two critical capabilities:

Correctness Verification: Ensures native-compiled code behaves identically to JVM bytecode
Performance Measurement: Measures execution time with detailed statistics (10 iterations averaged)

Key Features

300+ Test Cases covering core Java behaviors
Edge Case Testing for overflow, underflow, null handling, exceptions
10 Iterations per Benchmark with average, min, max, and standard deviation
Pass/Fail Results - Clear indication of whether native code breaks Java behavior
Comprehensive Coverage - Loops, arithmetic, functions, arrays, objects, strings, math, bit operations, control flow

Quick Start

Build the Benchmark Module

mvn clean install -Djavacpp.platform=windows-x86_64

Run the Complete Test Suite

Option 1: Run everything (recommended)

java -jar nativify-benchmark/target/nativify-benchmark-0.1.0-SNAPSHOT.jar

Option 2: Correctness tests only

java -jar nativify-benchmark/target/nativify-benchmark-0.1.0-SNAPSHOT.jar --correctness-only

Option 3: Performance benchmarks only

java -jar nativify-benchmark/target/nativify-benchmark-0.1.0-SNAPSHOT.jar --benchmarks-only

Correctness Tests

The correctness test suite (CorrectnessTests.java) verifies that native-compiled code produces identical results to JVM bytecode.

What It Tests

Basic Operations

Integer arithmetic (add, subtract, multiply, divide, modulo)
Long, double, float arithmetic
Comparison operations
Increment/decrement

Critical Edge Cases

Integer overflow/underflow - Ensures wrapping behavior matches the JVM
Division by zero - Must match the JVM (integer division throws ArithmeticException)
Sign extension - Negative byte/short values preserved correctly
Array bounds - Empty arrays, single-element arrays
Null handling - Null objects, null-safe operations
Type casting - Proper type conversion behavior
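Because these edge cases are fixed by the Java Language Specification, their expected values can be written down without running Nativify at all. The sketch below collects a few of them in plain Java; the class and method names are hypothetical and are not taken from CorrectnessTests.java. Several of these cases are undefined behavior in C (signed overflow, Integer.MIN_VALUE / -1, out-of-range double-to-int casts), so a native backend likely needs explicit handling for them.

```java
// Hypothetical reference class: standard JVM semantics only, not code from CorrectnessTests.java
class EdgeCaseSemantics {

    // int addition wraps around on overflow (two's complement); no exception is thrown
    static int add(int a, int b) {
        return a + b;
    }

    // Integer division by zero throws ArithmeticException.
    // Integer.MIN_VALUE / -1 does NOT throw: the result overflows back to Integer.MIN_VALUE.
    static String divideInt(int a, int b) {
        try {
            return String.valueOf(a / b);
        } catch (ArithmeticException e) {
            return "ArithmeticException";
        }
    }

    // Floating-point division by zero never throws: 1.0 / 0.0 is Infinity, 0.0 / 0.0 is NaN
    static double divideDouble(double a, double b) {
        return a / b;
    }

    // A negative byte is sign-extended when widened to int: (byte) 0xFF becomes -1
    static int widen(byte b) {
        return b;
    }

    // Masking with 0xFF gives the unsigned value instead: (byte) 0xFF becomes 255
    static int widenUnsigned(byte b) {
        return b & 0xFF;
    }

    // Narrowing keeps only the low 8 bits: 200 becomes -56
    static byte narrow(int i) {
        return (byte) i;
    }

    // double-to-int truncates toward zero, saturates at the int range, and maps NaN to 0
    static int toInt(double d) {
        return (int) d;
    }
}
```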

Data Structures

Arrays (int, long, double, byte, short, char, boolean)
Multi-dimensional arrays
Array operations (sum, max, min, search, reverse, fill)
Objects and nested objects

Control Flow

If-else chains
Switch statements (tableswitch and lookupswitch)
Loops (for, while, do-while, enhanced for)
Ternary operators
Short-circuit evaluation
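javac chooses between the two switch instructions based on how densely the case labels are packed, so exercising both paths takes at least one dense and one sparse switch. The hypothetical pair of methods below is a minimal sketch of that; running javap -c on the compiled class shows which instruction javac actually emitted for each.

```java
// Hypothetical example: javac typically compiles dense case labels to a tableswitch
// and sparse ones to a lookupswitch; a native compiler has to handle both instructions.
class SwitchForms {

    // Dense, contiguous labels (0..4): usually emitted as tableswitch
    static String dense(int value) {
        switch (value) {
            case 0: return "zero";
            case 1: return "one";
            case 2: return "two";
            case 3: return "three";
            case 4: return "four";
            default: return "other";
        }
    }

    // Sparse labels spread over a wide range: usually emitted as lookupswitch
    static String sparse(int code) {
        switch (code) {
            case -1000: return "minus-thousand";
            case 7: return "seven";
            case 1_000_000: return "million";
            default: return "other";
        }
    }
}
```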

Bit Operations

AND, OR, XOR, NOT
Left shift, right shift, unsigned right shift
Bit manipulation (set, clear, toggle, test)
Rotate operations
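Java defines shift behavior exactly, including shift distances outside 0-31, which C leaves undefined: for int operands only the low 5 bits of the distance are used. A minimal sketch of the operations listed above, with hypothetical names; the rotation is written with shifts and is equivalent to Integer.rotateLeft.

```java
// Hypothetical helper class showing JVM bit-operation semantics a native build must match
class BitSemantics {

    // >> is an arithmetic shift: copies of the sign bit are shifted in from the left
    static int arithmeticShift(int x, int n) {
        return x >> n;
    }

    // >>> is a logical shift: zeros are shifted in from the left
    static int logicalShift(int x, int n) {
        return x >>> n;
    }

    // The shift distance is masked to 5 bits, so x << 33 behaves like x << 1
    static int shiftLeft(int x, int n) {
        return x << n;
    }

    // Rotate left using two shifts; -n & 31 == 32 - n for n in 1..31
    static int rotateLeft(int x, int n) {
        return (x << n) | (x >>> -n);
    }

    static int setBit(int x, int k)       { return x | (1 << k); }
    static int clearBit(int x, int k)     { return x & ~(1 << k); }
    static int toggleBit(int x, int k)    { return x ^ (1 << k); }
    static boolean testBit(int x, int k)  { return (x & (1 << k)) != 0; }
}
```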

String Operations

Concatenation
Comparison (equals, compareTo, equalsIgnoreCase)
Substring, indexOf, contains
Transform (toLowerCase, toUpperCase, trim, replace)
Split operations

Math Functions

Trigonometric (sin, cos, tan)
Logarithm, exponential, power
Absolute, max, min, floor, ceil, round
GCD, LCM, prime checking
Pythagorean, quadratic formula

Real-World Algorithms

Hash functions (simple hash, FNV-1a, CRC32-like)
Sorting (bubble, insertion, selection)
Searching (binary, linear)
Text processing (reverse, palindrome, Levenshtein distance)
Matrix operations
Statistics (mean, median, standard deviation)
Encoding (Base64-like, run-length)
Dynamic programming (LCS, knapsack)
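FNV-1a is a published hash algorithm, which makes it a convenient correctness check: any deviation in byte handling or in 32-bit multiplication changes the result. A minimal sketch of the standard 32-bit variant follows; the suite's own implementation may differ, and the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of standard 32-bit FNV-1a, not necessarily the suite's version
class Fnv1a {
    private static final int OFFSET_BASIS = 0x811C9DC5;
    private static final int PRIME = 0x01000193;

    static int hash(String s) {
        int h = OFFSET_BASIS;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF); // mask so negative bytes are not sign-extended into the high bits
            h *= PRIME;      // int multiplication wraps modulo 2^32, as FNV requires
        }
        return h;
    }
}
```

The `b & 0xFF` mask ties back to the sign-extension edge case: without it, any non-ASCII byte would flip the upper 24 bits of the hash.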

Example Output

[TESTING LOOP BENCHMARKS]
--------------------------------------------------------------------------------
 simpleCountingLoop(1000)
 simpleCountingLoop(0)
 whileLoop(1000)
 nestedLoops(10, 10)

[TESTING EDGE CASES - JAVA BEHAVIOR COMPATIBILITY]
--------------------------------------------------------------------------------
 Integer overflow
 Division by zero
 Byte sign extension (negative)
 Empty array sum
 Null object check

 All edge cases passed - Native obfuscation preserves Java behavior!

Test Results: 285 passed, 0 failed
SUCCESS: All tests passed!

Performance Benchmarks

The benchmark suite (BenchmarkRunner.java) measures execution time with detailed statistics.

Benchmark Methodology

Each benchmark runs in three steps:

1. Warmup: 5 iterations to warm up the JIT compiler
2. Measurement: 10 iterations with precise timing
3. Statistics: Average, minimum, maximum, standard deviation
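A minimal sketch of the warmup-then-measure loop described above, assuming System.nanoTime() for timing (the document does not say which clock BenchmarkRunner uses). The class, method, and constant names are hypothetical.

```java
import java.util.function.IntSupplier;

// Hypothetical harness illustrating warmup + measurement; not BenchmarkRunner's actual code
class BenchmarkLoop {
    static final int WARMUP_ITERATIONS = 5;
    static final int MEASURED_ITERATIONS = 10;

    // Results are written here so the JIT cannot discard the benchmark body as dead code
    static volatile int sink;

    static long[] measure(IntSupplier benchmark) {
        // Warmup runs execute the code but are not recorded
        for (int i = 0; i < WARMUP_ITERATIONS; i++) {
            sink = benchmark.getAsInt();
        }
        // Each measured run is timed individually, in nanoseconds
        long[] samples = new long[MEASURED_ITERATIONS];
        for (int i = 0; i < MEASURED_ITERATIONS; i++) {
            long start = System.nanoTime();
            sink = benchmark.getAsInt();
            samples[i] = System.nanoTime() - start;
        }
        return samples;
    }

    static int simpleCountingLoop(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] samples = measure(() -> simpleCountingLoop(1000));
        System.out.println(java.util.Arrays.toString(samples));
    }
}
```

The volatile sink is a common precaution against dead-code elimination rather than something the document specifies.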

Statistics Explained

 Simple Counting Loop avg: 5.234 µs min: 4.891 µs max: 6.123 µs ± 0.412 µs

Fields, left to right: benchmark name, average, minimum, maximum, standard deviation.

Average (avg): Mean execution time across 10 iterations
Minimum (min): Fastest execution time observed
Maximum (max): Slowest execution time observed
Standard Deviation (±): Spread of the measurements around the average (lower means more consistent)
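The four statistics can be computed from the raw samples as sketched below. The document does not say whether the standard deviation is the population or the sample form, so this sketch assumes the population form (dividing by n); the class name is hypothetical.

```java
// Hypothetical summary-statistics holder; assumes population standard deviation
class SampleStats {
    final double avg;
    final double min;
    final double max;
    final double stdDev;

    SampleStats(long[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long lo = Long.MAX_VALUE;
        long hi = Long.MIN_VALUE;
        double sum = 0;
        for (long s : samples) {
            sum += s;
            lo = Math.min(lo, s);
            hi = Math.max(hi, s);
        }
        double mean = sum / samples.length;

        // Mean of squared deviations, then square root
        double squares = 0;
        for (long s : samples) {
            double d = s - mean;
            squares += d * d;
        }
        avg = mean;
        min = lo;
        max = hi;
        stdDev = Math.sqrt(squares / samples.length);
    }
}
```

For the sample set {2, 4, 4, 4, 5, 5, 7, 9}, this gives an average of 5 and a population standard deviation of 2.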

Time Units

Results automatically scale to the appropriate unit:

ns (nanoseconds): below 1 microsecond
µs (microseconds): from 1 microsecond up to, but not including, 1 millisecond
ms (milliseconds): 1 millisecond or more
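The unit selection can be sketched as a simple threshold check on a nanosecond value. The three-decimal format is an assumption based on the sample output (for example "5.234 µs"); the class name is hypothetical, and the micro sign is written as the \u00B5 escape so the source compiles regardless of file encoding.

```java
import java.util.Locale;

// Hypothetical formatter; three decimals assumed from the sample output
class TimeFormat {
    static String format(double nanos) {
        // Locale.ROOT keeps '.' as the decimal separator on every system
        if (nanos < 1_000) {
            return String.format(Locale.ROOT, "%.3f ns", nanos);
        }
        if (nanos < 1_000_000) {
            return String.format(Locale.ROOT, "%.3f \u00B5s", nanos / 1_000.0);
        }
        return String.format(Locale.ROOT, "%.3f ms", nanos / 1_000_000.0);
    }
}
```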

Benchmark Categories

Loop Benchmarks: For loops, while loops, nested loops, loops with conditions
Arithmetic Benchmarks: Integer, long, float, double operations, sign extension
Function Call Benchmarks: Static calls, recursion, deep call stacks
Constant Folding Benchmarks: Compile-time optimization verification
Array Benchmarks: Array operations, multi-dimensional arrays
Control Flow Benchmarks: If-else, switch, ternary operators
Bit Operation Benchmarks: Bitwise operations, shifts, rotations
Object Benchmarks: Object creation, field access, instanceof, casting
String Benchmarks: String manipulation and comparison
Math Benchmarks: Mathematical functions and algorithms
Real-World Benchmarks: Sorting, searching, hashing, encoding

Comprehensive Test Suite

The ComprehensiveTestSuite.java combines both correctness tests and performance benchmarks into a unified runner.

Features

Two-Phase Execution: Correctness first (MUST PASS), then performance (informational)
Clear Reporting: Visual output with statistics and summaries
Flexible Options: Run all tests, or only specific phases
Exit Codes: Returns non-zero if correctness tests fail

Command Line Options

# Run everything (default)
java -jar nativify-benchmark.jar

# Only verify correctness (fast)
java -jar nativify-benchmark.jar --correctness-only

# Only measure performance
java -jar nativify-benchmark.jar --benchmarks-only

# Show help
java -jar nativify-benchmark.jar --help
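A hypothetical sketch of how these options and the two phases could fit together. It is not ComprehensiveTestSuite's actual code: names are invented, and stopping before phase 2 when phase 1 fails is an assumption. Returning the exit code instead of calling System.exit inside the logic keeps it testable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical two-phase runner; exit code 0 = success, 1 = correctness failure
class SuiteRunner {

    static int run(String[] args, BooleanSupplier correctness, Runnable benchmarks) {
        List<String> opts = Arrays.asList(args);
        if (opts.contains("--help")) {
            System.out.println("Usage: java -jar nativify-benchmark.jar "
                    + "[--correctness-only | --benchmarks-only | --help]");
            return 0;
        }
        boolean runCorrectness = !opts.contains("--benchmarks-only");
        boolean runBenchmarks = !opts.contains("--correctness-only");

        // Phase 1: correctness MUST pass; a failure decides the exit code
        if (runCorrectness && !correctness.getAsBoolean()) {
            return 1;
        }
        // Phase 2: informational only; never changes the exit code
        if (runBenchmarks) {
            benchmarks.run();
        }
        return 0;
    }

    public static void main(String[] args) {
        int code = run(args, () -> true, () -> System.out.println("benchmarks ran"));
        System.exit(code);
    }
}
```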

Running Tests

Step 1: Build the Project

Windows:

mvn clean install -Djavacpp.platform=windows-x86_64

Linux:

mvn clean install -Djavacpp.platform=linux-x86_64

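Step 2: Create Test JAR (Obfuscated)

Run the Nativify CLI on the benchmark JAR. The command below uses Windows cmd syntax, where ^ continues a line: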

mvn exec:java -pl nativify-cli ^
 -Dexec.mainClass="dev.haedus.nativify.CLIMain" ^
 -Dexec.args="-i nativify-benchmark/target/nativify-benchmark-0.1.0-SNAPSHOT.jar -o benchmark-obfuscated.jar -compileFor windows"

Step 3: Run Tests on Obfuscated JAR

java -jar benchmark-obfuscated.jar

Expected Output

 Nativify Comprehensive Test Suite
 Ensuring Native Obfuscation Preserves Java Behavior

 PHASE 1: CORRECTNESS VERIFICATION
 Testing that native code behaves identically to JVM bytecode

[TESTING LOOP BENCHMARKS]
 simpleCountingLoop(1000)
 ...

ALL CORRECTNESS TESTS PASSED!

 PHASE 2: PERFORMANCE BENCHMARKING
 Measuring performance with 10 iterations (avg, min, max, stddev)

[LOOP BENCHMARKS]
 Simple Counting Loop avg: 5.234 µs min: 4.891 µs max: 6.123 µs ± 0.412 µs
 ...

 TEST SUITE SUMMARY
 Correctness Tests: PASSED
 All Java behaviors preserved by native compilation
 Performance Benchmarks: COMPLETED
 All benchmarks ran 10 iterations with statistics

Understanding Results

Correctness Tests

Passed test = Native code matches JVM behavior
Failed test = Behavior mismatch (the message shows expected vs actual values)
Exit code 0 = All tests passed
Exit code 1 = One or more tests failed

Performance Benchmarks

Interpreting Statistics

Low Standard Deviation (good)

avg: 10.5 µs min: 10.2 µs max: 10.9 µs ± 0.2 µs

Consistent performance and predictable execution time.

High Standard Deviation (investigate)

avg: 10.5 µs min: 8.1 µs max: 15.3 µs ± 2.4 µs

Variable performance, which may indicate JIT compilation, GC pauses, or cache effects.
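One way to turn "low" and "high" into a number is the relative standard deviation (standard deviation divided by average). For the two examples above, this gives 0.2 / 10.5 ≈ 1.9% and 2.4 / 10.5 ≈ 22.9%. The 5% cutoff in the hypothetical sketch below is an arbitrary assumption, not a threshold the suite defines.

```java
// Hypothetical variability check; the 5% threshold is an assumption
class VariabilityCheck {
    static final double MAX_RELATIVE_STDDEV = 0.05;

    // Standard deviation as a fraction of the average
    static double relativeStdDev(double avg, double stdDev) {
        return stdDev / avg;
    }

    static boolean needsInvestigation(double avg, double stdDev) {
        return relativeStdDev(avg, stdDev) > MAX_RELATIVE_STDDEV;
    }
}
```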

Comparing Native vs Bytecode

Run benchmarks on both versions:

# Bytecode version
java -jar nativify-benchmark.jar --benchmarks-only > bytecode-results.txt

# Native version (after obfuscation)
java -jar benchmark-obfuscated.jar --benchmarks-only > native-results.txt

# Compare
diff bytecode-results.txt native-results.txt
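Because timings differ on every run, a plain diff flags nearly every benchmark line. A small hypothetical helper that parses the avg: values from both files and prints native/bytecode ratios gives a more direct comparison. The line format is inferred from the sample output in this document and may not match every line the runner prints.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison helper; line format inferred from the sample output
class ResultComparison {
    // "<name> avg: <value> <unit> ..." where unit is ns, µs, or ms
    private static final Pattern LINE =
            Pattern.compile("^\\s*(.+?)\\s+avg:\\s*([0-9.]+)\\s*(ns|\u00B5s|ms)");

    static double toNanos(double value, String unit) {
        switch (unit) {
            case "ns": return value;
            case "ms": return value * 1_000_000;
            default:   return value * 1_000; // microseconds
        }
    }

    // Benchmark name -> average time in nanoseconds; non-matching lines are skipped
    static Map<String, Double> parse(List<String> lines) {
        Map<String, Double> averages = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.find()) {
                averages.put(m.group(1), toNanos(Double.parseDouble(m.group(2)), m.group(3)));
            }
        }
        return averages;
    }

    // Ratio > 1 means the native build was slower than bytecode for that benchmark
    static Map<String, Double> ratios(Map<String, Double> bytecode, Map<String, Double> nativeRuns) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : bytecode.entrySet()) {
            Double n = nativeRuns.get(e.getKey());
            if (n != null) {
                result.put(e.getKey(), n / e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Double> jvm = parse(Files.readAllLines(Path.of(args[0])));
        Map<String, Double> nat = parse(Files.readAllLines(Path.of(args[1])));
        ratios(jvm, nat).forEach((name, r) -> System.out.printf("%-30s %.2fx%n", name, r));
    }
}
```

Usage: java ResultComparison.java bytecode-results.txt native-results.txt (JDK 11 or later). Files.readAllLines expects UTF-8, so the redirected result files need to be UTF-8 encoded for the µ sign to decode.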

Test Categories

1. Loop Benchmarks

Simple counting loops
While loops
Do-while loops
Enhanced for loops
Nested loops
Loops with conditionals, break, continue

2. Arithmetic Benchmarks

Integer, long, float, double operations
Overflow/underflow handling
Sign extension (critical!)
Mixed arithmetic
Complex expressions

3. Function Call Benchmarks

Static method calls
Recursion (Fibonacci, factorial)
Deep call stacks
Multiple parameters
Polymorphic calls

4. Array Benchmarks

All primitive array types
Multi-dimensional arrays
Array operations (sum, max, min, search, reverse)
Sign extension in byte/short arrays

5. Control Flow Benchmarks

If-else chains
Switch statements
Ternary operators
Short-circuit evaluation

6. Bit Operation Benchmarks

Bitwise AND, OR, XOR, NOT
Shifts (left, right, unsigned right)
Bit manipulation
Rotations

7. Object Benchmarks

Object creation
Field access and modification
Instanceof checks
Type casting
Null handling

8. String Benchmarks

Concatenation
Comparison
Substring operations
Transformations
Split operations

9. Math Benchmarks

Trigonometric functions
Logarithm, exponential
Statistical functions
GCD, LCM
Prime checking

10. Real-World Benchmarks

Sorting algorithms
Search algorithms
Hash functions
Text processing
Matrix operations
Encoding algorithms

Troubleshooting

Test Failures

If correctness tests fail:

Check the error message - It will show expected vs actual values
Verify platform - Ensure -Djavacpp.platform matches your system
Review recent changes - Check if code modifications broke behavior
Run specific test - Isolate the failing test category
Generate LLVM IR - Add -ll output.ll to the Nativify CLI arguments to inspect the generated code

Benchmark Issues

If benchmarks show unexpected results:

Warmup period - First few iterations may be slower (JIT compilation)
GC interference - Garbage collection can cause spikes
Background processes - Close other applications for accurate results
CPU throttling - Ensure system is not in power-saving mode

Common Issues

Issue: "ERROR during measurement" - Solution: Check that all benchmark methods are properly marked with @Nativify

Issue: Very high standard deviation - Solution: Increase warmup iterations or run on dedicated hardware

Issue: Tests pass but benchmarks fail - Solution: Benchmark results are informational only; the exit code reflects correctness tests, which are what matter

Contributing

When adding new features to Nativify:

Add correctness tests - Ensure new bytecode instructions preserve Java behavior
Add benchmarks - Measure performance impact
Test edge cases - Cover overflow, null, empty arrays, etc.
Run full suite - Verify all tests still pass
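A new correctness test mostly comes down to comparing a JVM-defined expected value with what the (possibly native) method returns, and printing both on a mismatch. A hypothetical helper along those lines is sketched below; it is not the suite's actual API, and addInts stands in for whatever method would be marked with @Nativify.

```java
import java.util.Objects;

// Hypothetical expected-vs-actual helper; not the suite's actual API
class Check {
    static int passed;
    static int failed;

    static void check(String name, Object expected, Object actual) {
        if (Objects.equals(expected, actual)) {
            passed++;
        } else {
            failed++;
            System.out.println("FAIL " + name + ": expected " + expected + " but was " + actual);
        }
    }

    // Stand-in for a method under test (hypothetical)
    static int addInts(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        check("Integer overflow", Integer.MIN_VALUE, addInts(Integer.MAX_VALUE, 1));
        check("NaN propagation", Double.NaN, 0.0 / 0.0);
        System.out.println(passed + " passed, " + failed + " failed");
    }
}
```

Comparing boxed values with Objects.equals is deliberately strict: Double.equals compares bit patterns, so NaN matches NaN while 0.0 and -0.0 are reported as different. It also means an Integer never equals a Long, so expected and actual values should have the same type.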

Summary

The Nativify benchmark and test suite provides:

Comprehensive correctness verification - 300+ tests checking that native code behaves like JVM bytecode
Detailed performance metrics - 10 iterations with avg/min/max/stddev
Edge case coverage - Overflow, underflow, null, exceptions, all handled correctly
Easy to use - Single command runs everything
Clear reporting - Visual output with pass/fail indicators

Bottom line: If correctness tests pass, native obfuscation is working correctly. Benchmarks provide additional insight into performance characteristics.

Last Updated: 2025-11-23
