Nativify Benchmark & Test Suite

This document describes the benchmark and testing infrastructure for Nativify. The suite verifies that native obfuscation preserves standard Java behavior and reports detailed performance metrics.


Overview

The Nativify benchmark suite provides two critical capabilities:

  1. Correctness Verification: Ensures native-compiled code behaves identically to JVM bytecode
  2. Performance Measurement: Measures execution time with detailed statistics (10 iterations averaged)

Key Features

  • 300+ test cases covering core Java behaviors
  • Edge case testing for overflow, underflow, null handling, and exceptions
  • 10 iterations per benchmark, reported as average, minimum, maximum, and standard deviation
  • Pass/fail results that clearly show whether native code breaks Java behavior
  • Broad coverage: loops, arithmetic, functions, arrays, objects, strings, math, bit operations, and control flow

Quick Start

Build the Benchmark Module

mvn clean install -Djavacpp.platform=windows-x86_64

Run the Complete Test Suite

Option 1: Run everything (recommended)

java -jar nativify-benchmark/target/nativify-benchmark-0.1.0-SNAPSHOT.jar

Option 2: Correctness tests only

java -jar nativify-benchmark/target/nativify-benchmark-0.1.0-SNAPSHOT.jar --correctness-only

Option 3: Performance benchmarks only

java -jar nativify-benchmark/target/nativify-benchmark-0.1.0-SNAPSHOT.jar --benchmarks-only


Correctness Tests

The correctness test suite (CorrectnessTests.java) verifies that native-compiled code produces identical results to JVM bytecode.
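At its core, each correctness check compares an observed value with the value the JVM produces. The sketch below is a hypothetical harness, not the actual CorrectnessTests API: the class `CheckHarness` and its `check`/`checkThrows` methods are illustrative names. It relies on `Objects.deepEquals`, which compares primitive arrays element by element and, through boxed `Double.equals`, treats NaN as equal to NaN while keeping 0.0 and -0.0 distinct, which is usually what a bit-exact comparison wants.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical correctness harness; not the actual CorrectnessTests API.
public class CheckHarness {
    private int passed = 0;
    private int failed = 0;

    public int passed() { return passed; }
    public int failed() { return failed; }

    // Compares an observed value with the JVM reference value. Objects.deepEquals
    // compares arrays element by element, and boxed Double/Float equality treats
    // NaN as equal to NaN while keeping 0.0 and -0.0 distinct.
    public void check(String name, Object expected, Object actual) {
        if (Objects.deepEquals(expected, actual)) {
            passed++;
        } else {
            failed++;
            System.out.println("FAIL " + name + ": expected " + describe(expected)
                    + " but got " + describe(actual));
        }
    }

    // Passes only if the action throws an exception of the given type,
    // e.g. ArithmeticException for integer division by zero.
    public void checkThrows(String name, Class<? extends Throwable> type, Runnable action) {
        Throwable thrown = null;
        try {
            action.run();
        } catch (Throwable t) {
            thrown = t;
        }
        if (type.isInstance(thrown)) { // isInstance(null) is false
            passed++;
        } else {
            failed++;
            String got = thrown == null ? "no exception" : thrown.getClass().getSimpleName();
            System.out.println("FAIL " + name + ": expected " + type.getSimpleName()
                    + " but got " + got);
        }
    }

    // Prints array contents instead of identity hash codes.
    private static String describe(Object value) {
        String s = Arrays.deepToString(new Object[] {value});
        return s.substring(1, s.length() - 1);
    }

    public static void main(String[] args) {
        CheckHarness h = new CheckHarness();
        int zero = 0; // not a compile-time constant, so javac does not fold the division
        h.check("Integer overflow", Integer.MIN_VALUE, Integer.MAX_VALUE + 1);
        h.check("Byte sign extension", new byte[] {-128}, new byte[] {(byte) 0x80});
        h.check("0.0 / 0.0 is NaN", Double.NaN, 0.0 / 0.0);
        h.checkThrows("Division by zero", ArithmeticException.class, () -> System.out.println(1 / zero));
        System.out.println("Test Results: " + h.passed() + " passed, " + h.failed() + " failed");
        System.exit(h.failed() == 0 ? 0 : 1);
    }
}
```

Note that boxing makes types part of the comparison: `check("x", 1, 1L)` fails because an `Integer` never equals a `Long`, which helps catch accidental widening.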

What It Tests

Basic Operations

  • Integer arithmetic (add, subtract, multiply, divide, modulo)
  • Long, double, and float arithmetic
  • Comparison operations
  • Increment/decrement

Critical Edge Cases

  • Integer overflow/underflow - Ensures wrapping behavior matches the JVM
  • Division by zero - Integer division must throw ArithmeticException and floating-point division must yield Infinity or NaN, as on the JVM
  • Sign extension - Negative byte/short values are preserved correctly
  • Array bounds - Empty arrays, single-element arrays
  • Null handling - Null objects, null-safe operations
  • Type casting - Type conversions behave as on the JVM
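These edge cases are where a straightforward translation to C or LLVM IR can diverge from the JVM: signed overflow and out-of-range double-to-int conversion are undefined behavior in C, and the x86 `idiv` instruction faults on `Integer.MIN_VALUE / -1`, whereas Java defines all three. The sketch below lists a few reference results that native code has to reproduce. The class name `JvmEdgeCases` is illustrative; the values follow from the Java Language Specification, not from the Nativify test sources.

```java
// Reference behavior defined by the Java Language Specification.
public class JvmEdgeCases {
    public static int add(int a, int b) { return a + b; }          // wraps modulo 2^32
    public static int div(int a, int b) { return a / b; }          // truncates toward zero
    public static int rem(int a, int b) { return a % b; }          // result takes the dividend's sign
    public static int signExtend(byte b) { return b; }             // widening keeps the sign
    public static int zeroExtend(byte b) { return b & 0xFF; }      // mask for the unsigned value
    public static int toInt(double d) { return (int) d; }          // saturates; NaN becomes 0

    public static void main(String[] args) {
        System.out.println(add(Integer.MAX_VALUE, 1));   // -2147483648
        System.out.println(div(Integer.MIN_VALUE, -1));  // -2147483648 (no exception)
        System.out.println(div(-7, 2));                  // -3
        System.out.println(rem(-7, 3));                  // -1
        System.out.println(signExtend((byte) 0x80));     // -128
        System.out.println(zeroExtend((byte) 0x80));     // 128
        System.out.println(toInt(1e20));                 // 2147483647
        System.out.println(toInt(Double.NaN));           // 0
        System.out.println(1.0 / 0.0);                   // Infinity
        try {
            div(1, 0);
            System.out.println("no exception");
        } catch (ArithmeticException e) {
            System.out.println("ArithmeticException");  // integer division by zero always throws
        }
    }
}
```

The values are passed through method parameters so that javac cannot constant-fold them; the arithmetic then happens in whatever code Nativify generates for each method.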

Data Structures

  • Arrays (int, long, double, byte, short, char, boolean)
  • Multi-dimensional arrays
  • Array operations (sum, max, min, search, reverse, fill)
  • Objects and nested objects

Control Flow

  • If-else chains
  • Switch statements (tableswitch and lookupswitch)
  • Loops (for, while, do-while, enhanced for)
  • Ternary operators
  • Short-circuit evaluation
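Switch statements and short-circuit operators are worth testing separately because they compile to different bytecode. javac typically emits `tableswitch` for dense case labels and `lookupswitch` for sparse ones (the choice is a javac cost heuristic), and `&&`/`||` compile to conditional branches that must skip the right-hand operand. The hypothetical class `ControlFlow` below exercises each path; running `javap -c ControlFlow` shows which switch instruction javac emitted.

```java
public class ControlFlow {
    // Dense, contiguous labels: javac typically emits a tableswitch.
    public static String dense(int value) {
        switch (value) {
            case 0: return "zero";
            case 1: return "one";
            case 2: return "two";
            case 3: return "three";
            default: return "other";
        }
    }

    // Widely spaced labels: javac typically emits a lookupswitch.
    public static String sparse(int value) {
        switch (value) {
            case 1: return "small";
            case 1000: return "medium";
            case 1_000_000: return "large";
            default: return "other";
        }
    }

    // Records that it was evaluated, then returns the given result.
    private static boolean touch(int[] counter, boolean result) {
        counter[0]++;
        return result;
    }

    // Exactly one of the two right-hand operands runs for either input, so the
    // result is 1; an eager translation that evaluates both sides would return 2.
    public static int countEvaluations(boolean left) {
        int[] counter = {0};
        boolean and = left && touch(counter, true);   // right side runs only if left is true
        boolean or = left || touch(counter, false);   // right side runs only if left is false
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(dense(2) + " " + sparse(1000) + " " + sparse(5)); // two medium other
        System.out.println(countEvaluations(true) + " " + countEvaluations(false)); // 1 1
    }
}
```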

Bit Operations

  • AND, OR, XOR, NOT
  • Left shift, right shift, unsigned right shift
  • Bit manipulation (set, clear, toggle, test)
  • Rotate operations
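Shifts are another place where Java's rules differ from C. Java masks the shift distance (to its low 5 bits for `int`, 6 bits for `long`), so `1 << 32` is 1, while shifting by the full type width is undefined in C and produces a poison value in LLVM IR. The sketch below (the class `BitOps` and its helpers are hypothetical) shows the reference behavior, using the JDK's `Integer.rotateLeft` and `Integer.rotateRight` for rotations.

```java
public class BitOps {
    public static int shl(int x, int n) { return x << n; }        // distance masked with n & 31
    public static int sar(int x, int n) { return x >> n; }        // arithmetic: copies the sign bit
    public static int shr(int x, int n) { return x >>> n; }       // logical: fills with zeros
    public static long shlLong(long x, int n) { return x << n; }  // distance masked with n & 63

    // Single-bit helpers matching the "set, clear, toggle, test" cases.
    public static int setBit(int x, int i) { return x | (1 << i); }
    public static int clearBit(int x, int i) { return x & ~(1 << i); }
    public static int toggleBit(int x, int i) { return x ^ (1 << i); }
    public static boolean testBit(int x, int i) { return (x & (1 << i)) != 0; }

    public static void main(String[] args) {
        System.out.println(shl(1, 32));                        // 1 (32 & 31 == 0)
        System.out.println(shl(1, 33));                        // 2
        System.out.println(shlLong(1L, 64));                   // 1
        System.out.println(sar(-8, 1));                        // -4
        System.out.println(shr(-8, 28));                       // 15
        System.out.println(Integer.rotateLeft(0x80000001, 1)); // 3
        System.out.println(Integer.toHexString(Integer.rotateRight(1, 1))); // 80000000
        System.out.println(toggleBit(setBit(0, 3), 0));        // 9
    }
}
```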

String Operations

  • Concatenation
  • Comparison (equals, compareTo, equalsIgnoreCase)
  • Substring, indexOf, contains
  • Transform (toLowerCase, toUpperCase, trim, replace)
  • Split operations

Math Functions

  • Trigonometric (sin, cos, tan)
  • Logarithm, exponential, power
  • Absolute, max, min, floor, ceil, round
  • GCD, LCM, prime checking
  • Pythagorean, quadratic formula

Real-World Algorithms

  • Hash functions (simple hash, FNV-1a, CRC32-like)
  • Sorting (bubble, insertion, selection)
  • Searching (binary, linear)
  • Text processing (reverse, palindrome, Levenshtein distance)
  • Matrix operations
  • Statistics (mean, median, standard deviation)
  • Encoding (Base64-like, run-length)
  • Dynamic programming (LCS, knapsack)
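Hash functions make effective correctness tests because a single wrong bit, such as a missing `& 0xFF` on a sign-extended byte or a multiply that does not wrap, changes the entire result. Below is a generic 32-bit FNV-1a using the published offset basis and prime. It is an illustration, not the suite's own implementation, and `Fnv1a` is a hypothetical class name.

```java
import java.nio.charset.StandardCharsets;

public class Fnv1a {
    private static final int OFFSET_BASIS = 0x811C9DC5; // 2166136261
    private static final int PRIME = 0x01000193;        // 16777619

    // 32-bit FNV-1a: XOR each byte into the hash, then multiply by the prime.
    // Java int multiplication wraps modulo 2^32, which is exactly what FNV needs.
    public static int hash(byte[] data) {
        int h = OFFSET_BASIS;
        for (byte b : data) {
            h ^= (b & 0xFF); // bytes are signed in Java; mask to avoid sign extension
            h *= PRIME;
        }
        return h;
    }

    public static void main(String[] args) {
        byte[] input = "a".getBytes(StandardCharsets.UTF_8);
        System.out.println(Integer.toHexString(hash(input))); // e40c292c
    }
}
```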

Example Output

[TESTING LOOP BENCHMARKS]
--------------------------------------------------------------------------------
 simpleCountingLoop(1000)
 simpleCountingLoop(0)
 whileLoop(1000)
 nestedLoops(10, 10)

[TESTING EDGE CASES - JAVA BEHAVIOR COMPATIBILITY]
--------------------------------------------------------------------------------
 Integer overflow
 Division by zero
 Byte sign extension (negative)
 Empty array sum
 Null object check

 All edge cases passed - Native obfuscation preserves Java behavior!

Test Results: 285 passed, 0 failed
SUCCESS: All tests passed!

Performance Benchmarks

The benchmark suite (BenchmarkRunner.java) measures execution time with detailed statistics.

Benchmark Methodology

Each benchmark runs in three stages:

  1. Warmup: 5 iterations to warm up the JIT compiler
  2. Measurement: 10 timed iterations
  3. Statistics: average, minimum, maximum, and standard deviation over the measured iterations
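The warmup, measurement, and statistics stages can be sketched with `System.nanoTime()`. This is a simplified illustration, not BenchmarkRunner's code: the class `MiniBench`, the `Runnable`-based interface, and the use of the population standard deviation (dividing by n rather than n - 1) are assumptions.

```java
public class MiniBench {
    static final int WARMUP_ITERATIONS = 5;
    static final int MEASURED_ITERATIONS = 10;

    /** Summary statistics in nanoseconds. */
    public static final class Stats {
        public final double avg, min, max, stdDev;
        Stats(double avg, double min, double max, double stdDev) {
            this.avg = avg; this.min = min; this.max = max; this.stdDev = stdDev;
        }
    }

    public static Stats measure(Runnable task) {
        for (int i = 0; i < WARMUP_ITERATIONS; i++) {
            task.run(); // give the JIT a chance to compile the hot path
        }
        double[] samples = new double[MEASURED_ITERATIONS];
        for (int i = 0; i < MEASURED_ITERATIONS; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        return summarize(samples);
    }

    // Mean, min, max, and population standard deviation of the samples.
    public static Stats summarize(double[] samples) {
        double sum = 0, min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double s : samples) {
            sum += s;
            min = Math.min(min, s);
            max = Math.max(max, s);
        }
        double avg = sum / samples.length;
        double squares = 0;
        for (double s : samples) {
            squares += (s - avg) * (s - avg);
        }
        return new Stats(avg, min, max, Math.sqrt(squares / samples.length));
    }

    public static void main(String[] args) {
        Stats s = measure(() -> {
            long acc = 0;
            for (int i = 0; i < 1000; i++) acc += i; // simple counting loop
            if (acc < 0) System.out.println(acc);     // use the result so the loop is not dead code
        });
        System.out.printf("avg: %.1f ns min: %.1f ns max: %.1f ns \u00b1 %.1f ns%n",
                s.avg, s.min, s.max, s.stdDev);
    }
}
```

Timing a single call of a microsecond-scale method is close to the timer's resolution, so individual samples can be noisy; the spread reported by the standard deviation reflects that.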

Statistics Explained

 Simple Counting Loop avg: 5.234 µs min: 4.891 µs max: 6.123 µs ± 0.412 µs

Each result line shows the benchmark name followed by four statistics:

  • Average (avg): Mean execution time across the 10 measured iterations
  • Minimum (min): Fastest execution time observed
  • Maximum (max): Slowest execution time observed
  • Standard Deviation (±): Spread of the measurements around the average; smaller values mean more consistent timings

Time Units

Results automatically scale to an appropriate unit:

  • ns (nanoseconds): below 1 microsecond
  • µs (microseconds): from 1 microsecond up to 1 millisecond
  • ms (milliseconds): 1 millisecond or more
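The unit scaling can be sketched as a small formatter. The thresholds follow the rules above; the class name `TimeUnits`, the three-decimal precision (chosen to match the sample output), and `Locale.ROOT` (so the decimal separator is always a period) are assumptions.

```java
import java.util.Locale;

public class TimeUnits {
    // Picks ns, µs, or ms so the printed number stays readable.
    public static String format(double nanos) {
        if (nanos < 1_000) {
            return String.format(Locale.ROOT, "%.0f ns", nanos);
        } else if (nanos < 1_000_000) {
            return String.format(Locale.ROOT, "%.3f \u00b5s", nanos / 1_000);
        } else {
            return String.format(Locale.ROOT, "%.3f ms", nanos / 1_000_000);
        }
    }

    public static void main(String[] args) {
        System.out.println(format(850));       // 850 ns
        System.out.println(format(5_234));     // 5.234 µs
        System.out.println(format(2_500_000)); // 2.500 ms
    }
}
```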

Benchmark Categories

  1. Loop Benchmarks: For loops, while loops, nested loops, loops with conditions
  2. Arithmetic Benchmarks: Integer, long, float, double operations, sign extension
  3. Function Call Benchmarks: Static calls, recursion, deep call stacks
  4. Constant Folding Benchmarks: Compile-time optimization verification
  5. Array Benchmarks: Array operations, multi-dimensional arrays
  6. Control Flow Benchmarks: If-else, switch, ternary operators
  7. Bit Operation Benchmarks: Bitwise operations, shifts, rotations
  8. Object Benchmarks: Object creation, field access, instanceof, casting
  9. String Benchmarks: String manipulation and comparison
  10. Math Benchmarks: Mathematical functions and algorithms
  11. Real-World Benchmarks: Sorting, searching, hashing, encoding

Comprehensive Test Suite

ComprehensiveTestSuite.java combines the correctness tests and the performance benchmarks into a single runner.

Features

  • Two-Phase Execution: Correctness first (MUST PASS), then performance (informational)
  • Clear Reporting: Visual output with statistics and summaries
  • Flexible Options: Run all tests, or only specific phases
  • Exit Codes: Returns non-zero if correctness tests fail

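Command Line Options

In these examples, nativify-benchmark.jar stands for the JAR built in the Quick Start (nativify-benchmark/target/nativify-benchmark-0.1.0-SNAPSHOT.jar).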

# Run everything (default)
java -jar nativify-benchmark.jar

# Only verify correctness (fast)
java -jar nativify-benchmark.jar --correctness-only

# Only measure performance
java -jar nativify-benchmark.jar --benchmarks-only

# Show help
java -jar nativify-benchmark.jar --help
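The two-phase flow and its exit-code rule can be sketched as follows. This is a hypothetical illustration, not ComprehensiveTestSuite's actual code: the class `TwoPhaseRunner`, the `BooleanSupplier`/`Runnable` phase interfaces, and the choice to skip Phase 2 after a correctness failure are assumptions. The exit code depends only on Phase 1.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical two-phase runner; not the actual ComprehensiveTestSuite code.
public class TwoPhaseRunner {

    /** Returns the process exit code: 0 on success, 1 if correctness tests failed. */
    public static int run(String[] args, BooleanSupplier correctnessPhase, Runnable benchmarkPhase) {
        List<String> options = Arrays.asList(args);
        if (options.contains("--help")) {
            System.out.println("Usage: java -jar nativify-benchmark.jar"
                    + " [--correctness-only | --benchmarks-only | --help]");
            return 0;
        }
        boolean runCorrectness = !options.contains("--benchmarks-only");
        boolean runBenchmarks = !options.contains("--correctness-only");

        // Phase 1 must pass; this sketch skips Phase 2 when it fails.
        if (runCorrectness && !correctnessPhase.getAsBoolean()) {
            System.out.println("Correctness Tests: FAILED");
            return 1;
        }
        // Phase 2 is informational: a failing benchmark is reported, not fatal.
        if (runBenchmarks) {
            try {
                benchmarkPhase.run();
            } catch (RuntimeException e) {
                System.out.println("ERROR during measurement: " + e.getMessage());
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        int code = run(args,
                () -> true,                                         // placeholder correctness phase
                () -> System.out.println("Running benchmarks...")); // placeholder benchmark phase
        System.exit(code);
    }
}
```

Returning the exit code from `run` instead of calling `System.exit` inside it keeps the logic testable; only `main` terminates the JVM.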

Running Tests

Step 1: Build the Project

Windows:

mvn clean install -Djavacpp.platform=windows-x86_64

Linux:

mvn clean install -Djavacpp.platform=linux-x86_64

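Step 2: Create the Obfuscated Test JAR

The command below runs the Nativify CLI through Maven and is written for the Windows Command Prompt, where ^ continues a line. In a Unix shell, use \ instead, and adjust -compileFor if you are targeting a different platform.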

mvn exec:java -pl nativify-cli ^
 -Dexec.mainClass="dev.haedus.nativify.CLIMain" ^
 -Dexec.args="-i nativify-benchmark/target/nativify-benchmark-0.1.0-SNAPSHOT.jar -o benchmark-obfuscated.jar -compileFor windows"

Step 3: Run Tests on Obfuscated JAR

java -jar benchmark-obfuscated.jar

Expected Output

 Nativify Comprehensive Test Suite
 Ensuring Native Obfuscation Preserves Java Behavior




 PHASE 1: CORRECTNESS VERIFICATION
 Testing that native code behaves identically to JVM bytecode


[TESTING LOOP BENCHMARKS]
 simpleCountingLoop(1000)
 ...

ALL CORRECTNESS TESTS PASSED!


 PHASE 2: PERFORMANCE BENCHMARKING
 Measuring performance with 10 iterations (avg, min, max, stddev)


[LOOP BENCHMARKS]
 Simple Counting Loop avg: 5.234 µs min: 4.891 µs max: 6.123 µs ± 0.412 µs
 ...


 TEST SUITE SUMMARY
 Correctness Tests: PASSED
 All Java behaviors preserved by native compilation
 Performance Benchmarks: COMPLETED
 All benchmarks ran 10 iterations with statistics

Understanding Results

Correctness Tests

  • Passed = native code matches JVM behavior
  • Failed = behavior mismatch; the output shows the expected and actual values
  • Exit code 0 = All tests passed
  • Exit code 1 = One or more tests failed

Performance Benchmarks

Interpreting Statistics

Low Standard Deviation (good)

avg: 10.5 µs min: 10.2 µs max: 10.9 µs ± 0.2 µs
  • Consistent performance and predictable execution time

High Standard Deviation (investigate)

avg: 10.5 µs min: 8.1 µs max: 15.3 µs ± 2.4 µs
  • Variable performance; may indicate JIT compilation, GC pauses, or cache effects
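To compare spread across benchmarks that run at different speeds, it can help to divide the standard deviation by the average (the coefficient of variation). The suite does not report this value; it is worked out here only from the two sample lines above:

```latex
\mathrm{CV} = \frac{\sigma}{\bar{t}}, \qquad
\frac{0.2\ \mu\text{s}}{10.5\ \mu\text{s}} \approx 1.9\%, \qquad
\frac{2.4\ \mu\text{s}}{10.5\ \mu\text{s}} \approx 22.9\%
```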

Comparing Native vs Bytecode

Run benchmarks on both versions:

# Bytecode version
java -jar nativify-benchmark.jar --benchmarks-only > bytecode-results.txt

# Native version (after obfuscation)
java -jar benchmark-obfuscated.jar --benchmarks-only > native-results.txt

# Compare
diff bytecode-results.txt native-results.txt
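Because timings change from run to run, a plain diff flags nearly every benchmark line, so comparing the avg values per benchmark is usually more informative. The hypothetical helper `CompareResults` below (not part of the suite) assumes each result line has the `Name avg: <value> <unit> ...` shape of the sample output and that both files are UTF-8. It prints the native/bytecode ratio, where values above 1.0 mean the native build was slower.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CompareResults {
    // Matches e.g. " Simple Counting Loop avg: 5.234 µs min: ..." (micro sign or Greek mu).
    private static final Pattern LINE = Pattern.compile(
            "^\\s*(.+?)\\s+avg:\\s*(\\d+(?:\\.\\d+)?)\\s*(ns|\u00b5s|\u03bcs|us|ms)");

    /** Maps each benchmark name to its average time in nanoseconds. */
    public static Map<String, Double> parse(List<String> lines) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (!m.find()) {
                continue; // headers, separators, and other non-result lines
            }
            double value = Double.parseDouble(m.group(2));
            String unit = m.group(3);
            double scale = unit.equals("ns") ? 1 : unit.equals("ms") ? 1_000_000 : 1_000;
            result.put(m.group(1), value * scale);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Double> bytecode = parse(Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8));
        Map<String, Double> nativeRun = parse(Files.readAllLines(Paths.get(args[1]), StandardCharsets.UTF_8));
        for (Map.Entry<String, Double> e : bytecode.entrySet()) {
            Double other = nativeRun.get(e.getKey());
            if (other != null) {
                System.out.printf("%-30s native/bytecode = %.2f%n", e.getKey(), other / e.getValue());
            }
        }
    }
}
```

On Java 11 or later it can be run directly from source, for example `java CompareResults.java bytecode-results.txt native-results.txt`.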

Test Categories

1. Loop Benchmarks

  • Simple counting loops
  • While loops
  • Do-while loops
  • Enhanced for loops
  • Nested loops
  • Loops with conditionals, break, continue

2. Arithmetic Benchmarks

  • Integer, long, float, double operations
  • Overflow/underflow handling
  • Sign extension (critical!)
  • Mixed arithmetic
  • Complex expressions

3. Function Call Benchmarks

  • Static method calls
  • Recursion (Fibonacci, factorial)
  • Deep call stacks
  • Multiple parameters
  • Polymorphic calls

4. Array Benchmarks

  • All primitive array types
  • Multi-dimensional arrays
  • Array operations (sum, max, min, search, reverse)
  • Sign extension in byte/short arrays

5. Control Flow Benchmarks

  • If-else chains
  • Switch statements
  • Ternary operators
  • Short-circuit evaluation

6. Bit Operation Benchmarks

  • Bitwise AND, OR, XOR, NOT
  • Shifts (left, right, unsigned right)
  • Bit manipulation
  • Rotations

7. Object Benchmarks

  • Object creation
  • Field access and modification
  • Instanceof checks
  • Type casting
  • Null handling

8. String Benchmarks

  • Concatenation
  • Comparison
  • Substring operations
  • Transformations
  • Split operations

9. Math Benchmarks

  • Trigonometric functions
  • Logarithm, exponential
  • Statistical functions
  • GCD, LCM
  • Prime checking

10. Real-World Benchmarks

  • Sorting algorithms
  • Search algorithms
  • Hash functions
  • Text processing
  • Matrix operations
  • Encoding algorithms

Troubleshooting

Test Failures

If correctness tests fail:

  1. Check the error message - It will show expected vs actual values
  2. Verify platform - Ensure -Djavacpp.platform matches your system
  3. Review recent changes - Check if code modifications broke behavior
  4. Run specific test - Isolate the failing test category
  5. Generate LLVM IR - Use -ll output.ll to inspect generated code

Benchmark Issues

If benchmarks show unexpected results:

  1. Warmup period - First few iterations may be slower (JIT compilation)
  2. GC interference - Garbage collection can cause spikes
  3. Background processes - Close other applications for accurate results
  4. CPU throttling - Ensure system is not in power-saving mode

Common Issues

Issue: "ERROR during measurement" - Solution: Check that all benchmark methods are properly marked with @Nativify

Issue: Very high standard deviation - Solution: Increase warmup iterations or run on dedicated hardware

Issue: Tests pass but benchmarks fail - Solution: Benchmark failures are informational only; the correctness results determine whether the suite passes


Contributing

When adding new features to Nativify:

  1. Add correctness tests - Ensure new bytecode instructions preserve Java behavior
  2. Add benchmarks - Measure performance impact
  3. Test edge cases - Cover overflow, null, empty arrays, etc.
  4. Run full suite - Verify all tests still pass

Summary

The Nativify benchmark and test suite provides:

  • Comprehensive correctness verification - 300+ tests checking that native code behaves like JVM bytecode
  • Detailed performance metrics - 10 iterations with avg/min/max/stddev
  • Edge case coverage - Overflow, underflow, null handling, and exceptions
  • Easy to use - A single command runs everything
  • Clear reporting - Visual output with pass/fail indicators

Bottom line: If correctness tests pass, native obfuscation is working correctly. Benchmarks provide additional insight into performance characteristics.


Last Updated: 2025-11-23