Nativify Benchmark & Test Suite
This document describes the benchmark and testing infrastructure for Nativify. The suite verifies that native obfuscation preserves standard Java semantics and reports detailed performance metrics.
Table of Contents
- Overview
- Quick Start
- Correctness Tests
- Performance Benchmarks
- Comprehensive Test Suite
- Running Tests
- Understanding Results
- Test Categories
- Troubleshooting
- Contributing
- Summary
Overview
The Nativify benchmark suite provides two critical capabilities:
- Correctness Verification: Ensures native-compiled code behaves identically to JVM bytecode
- Performance Measurement: Measures execution time with detailed statistics (10 iterations averaged)
Key Features
- 300+ Test Cases covering a broad range of Java behaviors
- Edge Case Testing for overflow, underflow, null handling, exceptions
- 10 Iterations per Benchmark with average, min, max, and standard deviation
- Pass/Fail Results - Clear indication if native code breaks Java behavior
- Comprehensive Coverage - Loops, arithmetic, functions, arrays, objects, strings, math, bit operations, control flow
Quick Start
Build the Benchmark Module
Run the Complete Test Suite
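Option 1: Run everything (recommended)
java -jar nativify-benchmark.jar
Option 2: Correctness tests only
java -jar nativify-benchmark.jar --correctness-only
Option 3: Performance benchmarks only
java -jar nativify-benchmark.jar --benchmarks-only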
Correctness Tests
The correctness test suite (CorrectnessTests.java) verifies that native-compiled code produces identical results to JVM bytecode.
What It Tests
Basic Operations
- Integer arithmetic (add, subtract, multiply, divide, modulo)
- Long, double, float arithmetic
- Comparison operations
- Increment/decrement
Critical Edge Cases
- Integer overflow/underflow - Ensures wrapping behavior matches the JVM
- Division by zero - Must match the JVM (integer division throws ArithmeticException)
- Sign extension - Negative byte/short values preserved correctly
- Array bounds - Empty arrays, single-element arrays
- Null handling - Null objects, null-safe operations
- Type casting - Proper type conversion behavior
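Why these edge cases matter for native code: in C, signed integer overflow, integer division by zero, Integer.MIN_VALUE / -1, shifting by the bit width or more, and converting NaN or out-of-range doubles to int are all undefined behavior, whereas the JVM defines each one precisely. A native backend therefore has to emit explicit code for them. The sketch below pins down the JVM-specified results such a test would expect; JvmEdgeCases and its methods are hypothetical names, not part of CorrectnessTests.java.

```java
// Hypothetical illustration: each method returns the result the JVM specification
// mandates for a case that is undefined behavior in C.
final class JvmEdgeCases {

    // int addition wraps modulo 2^32: MAX_VALUE + 1 == MIN_VALUE.
    static int overflowAdd(int x) {
        return x + 1;
    }

    // MIN_VALUE / -1 is the only overflowing int division; the JLS defines the result as the dividend.
    static int divide(int a, int b) {
        return a / b;
    }

    // Integer division by zero must throw ArithmeticException rather than crash or return garbage.
    static boolean intDivByZeroThrows(int a) {
        try {
            divide(a, 0);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    // Widening a byte to int sign-extends; masking with 0xFF yields the unsigned value.
    static int widen(byte b)    { return b; }
    static int unsigned(byte b) { return b & 0xFF; }

    // int shift distances are masked to their low 5 bits, so << 33 behaves like << 1.
    static int shiftLeft(int x, int distance) {
        return x << distance;
    }

    // double-to-int: NaN becomes 0, out-of-range values clamp to MIN_VALUE/MAX_VALUE.
    static int toInt(double d) {
        return (int) d;
    }
}
```

Taking the inputs as parameters keeps javac from folding the expressions at compile time, so the operations are actually executed by whichever backend runs the method.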
Data Structures
- Arrays (int, long, double, byte, short, char, boolean)
- Multi-dimensional arrays
- Array operations (sum, max, min, search, reverse, fill)
- Objects and nested objects
Control Flow
- If-else chains
- Switch statements (table and lookup)
- Loops (for, while, do-while, enhanced for)
- Ternary operators
- Short-circuit evaluation
Bit Operations
- AND, OR, XOR, NOT
- Left shift, right shift, unsigned right shift
- Bit manipulation (set, clear, toggle, test)
- Rotate operations
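The bit-manipulation and rotate cases reduce to a few one-line identities. A minimal sketch follows (BitOps and its method names are hypothetical); note that the manual rotate must use the unsigned shift >>> so that vacated bits are zero-filled, and that it agrees with the JDK's Integer.rotateLeft.

```java
// Hypothetical helpers for the bit-manipulation cases; bit is expected in 0..31.
final class BitOps {
    static int set(int x, int bit)      { return x | (1 << bit); }
    static int clear(int x, int bit)    { return x & ~(1 << bit); }
    static int toggle(int x, int bit)   { return x ^ (1 << bit); }
    static boolean test(int x, int bit) { return (x & (1 << bit)) != 0; }

    // Rotate left. >>> zero-fills; using >> here would smear the sign bit into the result.
    // For n == 0, x >>> 32 is x >>> 0 (distance masked to 5 bits), so the result is still x.
    static int rotateLeft(int x, int n) {
        return (x << n) | (x >>> (32 - n));
    }
}
```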
String Operations
- Concatenation
- Comparison (equals, compareTo, equalsIgnoreCase)
- Substring, indexOf, contains
- Transform (toLowerCase, toUpperCase, trim, replace)
- Split operations
Math Functions
- Trigonometric (sin, cos, tan)
- Logarithm, exponential, power
- Absolute, max, min, floor, ceil, round
- GCD, LCM, prime checking
- Pythagorean, quadratic formula
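Math results need more care than integer results: the java.lang.Math contract allows methods such as sin, cos and tan to be off by up to 1 ulp, so a native build that calls a different math library may legitimately differ from the JVM in the last bit (StrictMath is the bit-reproducible alternative). One way to compare such results, sketched here as an assumption rather than what CorrectnessTests.java actually does, is an ulp-based tolerance that also handles NaN and infinities consistently. FloatCompare is a hypothetical name.

```java
// Hypothetical comparison helper for floating-point test results.
final class FloatCompare {
    /**
     * True if actual is within maxUlps units in the last place of expected.
     * NaN matches only NaN; infinities match only an identical infinity.
     * +0.0 and -0.0 count as equal; use Double.compare if the sign of zero matters.
     */
    static boolean closeEnough(double expected, double actual, int maxUlps) {
        if (Double.isNaN(expected) || Double.isNaN(actual)) {
            return Double.isNaN(expected) && Double.isNaN(actual);
        }
        if (expected == actual) {
            return true;
        }
        if (Double.isInfinite(expected) || Double.isInfinite(actual)) {
            return false;
        }
        return Math.abs(expected - actual) <= maxUlps * Math.ulp(expected);
    }
}
```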
Real-World Algorithms
- Hash functions (simple hash, FNV-1a, CRC32-like)
- Sorting (bubble, insertion, selection)
- Searching (binary, linear)
- Text processing (reverse, palindrome, Levenshtein distance)
- Matrix operations
- Statistics (mean, median, standard deviation)
- Encoding (Base64-like, run-length)
- Dynamic programming (LCS, knapsack)
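As an example from the real-world category, here is a sketch of 32-bit FNV-1a using the standard offset basis 0x811C9DC5 and prime 0x01000193. It exercises two of the edge cases above at once: int multiplication must wrap modulo 2^32, and bytes must be masked with & 0xFF so that values of 0x80 and above are not sign-extended. The class name Fnv1a is illustrative; the suite's own implementation may differ.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical 32-bit FNV-1a over the UTF-8 bytes of a string.
final class Fnv1a {
    private static final int OFFSET_BASIS = 0x811C9DC5;
    private static final int PRIME = 0x01000193; // 16777619

    static int hash32(String s) {
        int hash = OFFSET_BASIS;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xFF); // mask: a raw byte >= 0x80 would sign-extend to 0xFFFFFFxx
            hash *= PRIME;      // int multiply wraps modulo 2^32, as FNV requires
        }
        return hash;
    }
}
```

Integer.toHexString(Fnv1a.hash32("a")) gives the familiar unsigned form, e40c292c.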
Example Output
[TESTING LOOP BENCHMARKS]
--------------------------------------------------------------------------------
simpleCountingLoop(1000)
simpleCountingLoop(0)
whileLoop(1000)
nestedLoops(10, 10)
[TESTING EDGE CASES - JAVA BEHAVIOR COMPATIBILITY]
--------------------------------------------------------------------------------
Integer overflow
Division by zero
Byte sign extension (negative)
Empty array sum
Null object check
All edge cases passed - Native obfuscation preserves Java behavior!
Test Results: 285 passed, 0 failed
SUCCESS: All tests passed!
Performance Benchmarks
The benchmark suite (BenchmarkRunner.java) measures execution time with detailed statistics.
Benchmark Methodology
Each benchmark:
1. Warmup: 5 iterations to warm up the JIT compiler
2. Measurement: 10 iterations with precise timing
3. Statistics: Average, minimum, maximum, standard deviation
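The warmup-then-measure procedure can be sketched as follows. This is a hypothetical outline (TimingLoop, measure, and the stand-in simpleCountingLoop body are not taken from BenchmarkRunner.java) that times each iteration with System.nanoTime and writes each result to a volatile field so the JIT cannot discard the benchmark body as dead code.

```java
import java.util.function.IntSupplier;

// Hypothetical sketch of the warmup-then-measure loop; not BenchmarkRunner's code.
final class TimingLoop {
    // Writing each result to a volatile field keeps it observable, so the JIT
    // cannot eliminate the benchmark body as dead code.
    private static volatile int sink;

    static long[] measure(IntSupplier body, int warmupIterations, int measuredIterations) {
        // Warmup: run the body without recording, giving the JIT a chance to compile it.
        for (int i = 0; i < warmupIterations; i++) {
            sink = body.getAsInt();
        }
        // Measurement: time each iteration separately so min/max/stddev can be derived.
        long[] elapsedNanos = new long[measuredIterations];
        for (int i = 0; i < measuredIterations; i++) {
            long start = System.nanoTime();
            sink = body.getAsInt();
            elapsedNanos[i] = System.nanoTime() - start;
        }
        return elapsedNanos;
    }

    // Illustrative stand-in for a benchmark body such as simpleCountingLoop.
    static int simpleCountingLoop(int n) {
        int count = 0;
        for (int i = 0; i < n; i++) {
            count++;
        }
        return count;
    }
}
```

Usage: TimingLoop.measure(() -> TimingLoop.simpleCountingLoop(1000), 5, 10) returns ten per-iteration timings in nanoseconds. Five warmup iterations may not be enough for HotSpot to fully compile a method, so early samples can still include interpreted or lower-tier code; a harness such as JMH handles this more rigorously.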
Statistics Explained
Simple Counting Loop avg: 5.234 µs min: 4.891 µs max: 6.123 µs ± 0.412 µs
Reading left to right, the fields are: benchmark name, average, minimum, maximum, and standard deviation.
- Average (avg): Mean execution time across 10 iterations
- Minimum (min): Fastest execution time observed
- Maximum (max): Slowest execution time observed
- Standard Deviation (±): Measure of variance/consistency
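Given the 10 per-iteration timings, the four statistics can be computed as below. This sketch assumes the population standard deviation (dividing by N); whether BenchmarkRunner.java uses that or the sample form (dividing by N - 1) is not stated here. Stats is a hypothetical name.

```java
// Hypothetical summary statistics over per-iteration timings in nanoseconds.
final class Stats {
    final double avg, min, max, stdDev;

    Stats(long[] nanos) {
        if (nanos.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long lo = Long.MAX_VALUE;
        long hi = Long.MIN_VALUE;
        double sum = 0;
        for (long n : nanos) {
            sum += n;
            lo = Math.min(lo, n);
            hi = Math.max(hi, n);
        }
        avg = sum / nanos.length;
        double squaredDeviations = 0;
        for (long n : nanos) {
            double d = n - avg;
            squaredDeviations += d * d;
        }
        // Population standard deviation (divide by N); a sample estimate would divide by N - 1.
        stdDev = Math.sqrt(squaredDeviations / nanos.length);
        min = lo;
        max = hi;
    }
}
```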
Time Units
Results automatically scale to appropriate units:
- ns (nanoseconds): < 1 microsecond
- µs (microseconds): 1-999 microseconds
- ms (milliseconds): ≥ 1 millisecond
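A minimal sketch of that unit scaling, assuming three decimal places as in the example result line and a fixed Locale so the decimal separator is always a dot. TimeFormat is a hypothetical helper, not the suite's formatter.

```java
import java.util.Locale;

// Hypothetical formatter: picks ns, µs, or ms based on the magnitude of a nanosecond value.
final class TimeFormat {
    static String format(double nanos) {
        if (nanos < 1_000) {
            return String.format(Locale.ROOT, "%.3f ns", nanos);
        } else if (nanos < 1_000_000) {
            return String.format(Locale.ROOT, "%.3f \u00b5s", nanos / 1_000);
        }
        return String.format(Locale.ROOT, "%.3f ms", nanos / 1_000_000);
    }
}
```

For example, an average of 5234 ns is printed as 5.234 µs, matching the sample line.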
Benchmark Categories
- Loop Benchmarks: For loops, while loops, nested loops, loops with conditions
- Arithmetic Benchmarks: Integer, long, float, double operations, sign extension
- Function Call Benchmarks: Static calls, recursion, deep call stacks
- Constant Folding Benchmarks: Compile-time optimization verification
- Array Benchmarks: Array operations, multi-dimensional arrays
- Control Flow Benchmarks: If-else, switch, ternary operators
- Bit Operation Benchmarks: Bitwise operations, shifts, rotations
- Object Benchmarks: Object creation, field access, instanceof, casting
- String Benchmarks: String manipulation and comparison
- Math Benchmarks: Mathematical functions and algorithms
- Real-World Benchmarks: Sorting, searching, hashing, encoding
Comprehensive Test Suite
ComprehensiveTestSuite.java combines the correctness tests and the performance benchmarks into a single runner.
Features
- Two-Phase Execution: Correctness first (MUST PASS), then performance (informational)
- Clear Reporting: Visual output with statistics and summaries
- Flexible Options: Run all tests, or only specific phases
- Exit Codes: Returns non-zero if correctness tests fail
Command Line Options
# Run everything (default)
java -jar nativify-benchmark.jar
# Only verify correctness (fast)
java -jar nativify-benchmark.jar --correctness-only
# Only measure performance
java -jar nativify-benchmark.jar --benchmarks-only
# Show help
java -jar nativify-benchmark.jar --help
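The three options map onto the two phases in a straightforward way. The sketch below is a hypothetical model of that mapping; CliMode, and its choices for unknown or repeated flags (reject unknown ones, last phase flag wins), are assumptions, and the real runner may handle them differently.

```java
// Hypothetical model of the runner's command-line options.
final class CliMode {
    enum Mode { ALL, CORRECTNESS_ONLY, BENCHMARKS_ONLY, HELP }

    static Mode parse(String... args) {
        Mode mode = Mode.ALL; // default: run both phases
        for (String arg : args) {
            switch (arg) {
                case "--help":
                    return Mode.HELP;
                case "--correctness-only":
                    mode = Mode.CORRECTNESS_ONLY;
                    break;
                case "--benchmarks-only":
                    mode = Mode.BENCHMARKS_ONLY;
                    break;
                default:
                    throw new IllegalArgumentException("Unknown option: " + arg);
            }
        }
        return mode;
    }

    static boolean runsCorrectness(Mode m) { return m == Mode.ALL || m == Mode.CORRECTNESS_ONLY; }
    static boolean runsBenchmarks(Mode m)  { return m == Mode.ALL || m == Mode.BENCHMARKS_ONLY; }
}
```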
Running Tests
Step 1: Build the Project
Windows:
Linux:
Step 2: Create Test JAR (Obfuscated)
Windows (cmd, where ^ continues a line):
mvn exec:java -pl nativify-cli ^
-Dexec.mainClass="dev.haedus.nativify.CLIMain" ^
-Dexec.args="-i nativify-benchmark/target/nativify-benchmark-0.1.0-SNAPSHOT.jar -o benchmark-obfuscated.jar -compileFor windows"
Step 3: Run Tests on Obfuscated JAR
java -jar benchmark-obfuscated.jar
Expected Output
Nativify Comprehensive Test Suite
Ensuring Native Obfuscation Preserves Java Behavior
PHASE 1: CORRECTNESS VERIFICATION
Testing that native code behaves identically to JVM bytecode
[TESTING LOOP BENCHMARKS]
simpleCountingLoop(1000)
...
ALL CORRECTNESS TESTS PASSED!
PHASE 2: PERFORMANCE BENCHMARKING
Measuring performance with 10 iterations (avg, min, max, stddev)
[LOOP BENCHMARKS]
Simple Counting Loop avg: 5.234 µs min: 4.891 µs max: 6.123 µs ± 0.412 µs
...
TEST SUITE SUMMARY
Correctness Tests: PASSED
All Java behaviors preserved by native compilation
Performance Benchmarks: COMPLETED
All benchmarks ran 10 iterations with statistics
Understanding Results
Correctness Tests
- Pass marker = Test passed (native code matches JVM behavior)
- Fail marker = Test failed (behavior mismatch)
- Exit code 0 = All tests passed
- Exit code 1 = One or more tests failed
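The pass/fail bookkeeping and exit-code convention can be sketched as a small harness. CheckHarness and check are hypothetical names. Comparing with Objects.equals has a useful property here: boxed Double values compare by bit pattern, so NaN matches NaN while 0.0 and -0.0 are told apart, which is exactly the distinction a native backend must preserve.

```java
import java.util.Objects;

// Hypothetical correctness harness: counts passes/failures and derives the exit code.
final class CheckHarness {
    private int passed;
    private int failed;

    /** Compares a result against the JVM-expected value and records pass/fail. */
    void check(String name, Object expected, Object actual) {
        if (Objects.equals(expected, actual)) {
            passed++;
        } else {
            failed++;
            System.out.println("FAIL " + name + ": expected " + expected + ", actual " + actual);
        }
    }

    int passed() { return passed; }
    int failed() { return failed; }

    /** Exit code convention: 0 if every check passed, 1 otherwise. */
    int exitCode() { return failed == 0 ? 0 : 1; }
}
```

A main method built on this would print a summary such as "Test Results: N passed, M failed" and end with System.exit(harness.exitCode()).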
Performance Benchmarks
Interpreting Statistics
Low Standard Deviation (good)
- Consistent performance
- Predictable execution time
High Standard Deviation (investigate)
- Variable performance
- May indicate JIT compilation, GC pauses, or cache effects
Comparing Native vs Bytecode
Run benchmarks on both versions:
# Bytecode version
java -jar nativify-benchmark.jar --benchmarks-only > bytecode-results.txt
# Native version (after obfuscation)
java -jar benchmark-obfuscated.jar --benchmarks-only > native-results.txt
# Compare side by side
diff -y bytecode-results.txt native-results.txt
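Because timings differ on every run, a line-by-line diff mostly shows noise. Comparing the per-benchmark averages as a ratio is often more telling. A sketch, assuming the result-line format shown in the examples (name, then avg: value and unit); ResultCompare is a hypothetical helper, and it accepts either U+00B5 or U+03BC as the micro sign.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for comparing bytecode and native result lines.
final class ResultCompare {
    // Matches the "avg:" field, e.g. "Simple Counting Loop avg: 5.234 µs min: ...".
    private static final Pattern AVG =
            Pattern.compile("avg:\\s*([0-9]+(?:\\.[0-9]+)?)\\s*(ns|[\\u00b5\\u03bc]s|ms)");

    /** Average time in nanoseconds, or NaN if the line has no avg field. */
    static double avgNanos(String line) {
        Matcher m = AVG.matcher(line);
        if (!m.find()) {
            return Double.NaN;
        }
        double value = Double.parseDouble(m.group(1));
        switch (m.group(2)) {
            case "ns": return value;
            case "ms": return value * 1_000_000;
            default:   return value * 1_000; // microseconds
        }
    }

    /** Bytecode average divided by native average: above 1.0 means the native build was faster. */
    static double speedup(String bytecodeLine, String nativeLine) {
        return avgNanos(bytecodeLine) / avgNanos(nativeLine);
    }
}
```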
Test Categories
1. Loop Benchmarks
- Simple counting loops
- While loops
- Do-while loops
- Enhanced for loops
- Nested loops
- Loops with conditionals, break, continue
2. Arithmetic Benchmarks
- Integer, long, float, double operations
- Overflow/underflow handling
- Sign extension (critical!)
- Mixed arithmetic
- Complex expressions
3. Function Call Benchmarks
- Static method calls
- Recursion (Fibonacci, factorial)
- Deep call stacks
- Multiple parameters
- Polymorphic calls
4. Array Benchmarks
- All primitive array types
- Multi-dimensional arrays
- Array operations (sum, max, min, search, reverse)
- Sign extension in byte/short arrays
5. Control Flow Benchmarks
- If-else chains
- Switch statements
- Ternary operators
- Short-circuit evaluation
6. Bit Operation Benchmarks
- Bitwise AND, OR, XOR, NOT
- Shifts (left, right, unsigned right)
- Bit manipulation
- Rotations
7. Object Benchmarks
- Object creation
- Field access and modification
- Instanceof checks
- Type casting
- Null handling
8. String Benchmarks
- Concatenation
- Comparison
- Substring operations
- Transformations
- Split operations
9. Math Benchmarks
- Trigonometric functions
- Logarithm, exponential
- Statistical functions
- GCD, LCM
- Prime checking
10. Real-World Benchmarks
- Sorting algorithms
- Search algorithms
- Hash functions
- Text processing
- Matrix operations
- Encoding algorithms
Troubleshooting
Test Failures
If correctness tests fail:
- Check the error message - It shows the expected and actual values
- Verify the platform - Ensure -Djavacpp.platform matches your system
- Review recent changes - Check whether code modifications broke behavior
- Run a specific test - Isolate the failing test category
- Generate LLVM IR - Use -ll output.ll to inspect the generated code
Benchmark Issues
If benchmarks show unexpected results:
- Warmup period - First few iterations may be slower (JIT compilation)
- GC interference - Garbage collection can cause spikes
- Background processes - Close other applications for accurate results
- CPU throttling - Ensure system is not in power-saving mode
Common Issues
Issue: "ERROR during measurement"
- Solution: Check that all benchmark methods are properly marked with @Nativify
Issue: Very high standard deviation
- Solution: Increase warmup iterations or run on dedicated hardware
Issue: Tests pass but benchmarks fail
- Solution: Benchmark failures are informational; correctness is what matters
Contributing
When adding new features to Nativify:
- Add correctness tests - Ensure new bytecode instructions preserve Java behavior
- Add benchmarks - Measure performance impact
- Test edge cases - Cover overflow, null, empty arrays, etc.
- Run the full suite - Verify all tests still pass
Summary
The Nativify benchmark and test suite provides:
- Comprehensive correctness verification - 300+ tests checking that native code behaves identically to JVM bytecode
- Detailed performance metrics - 10 iterations with avg/min/max/stddev
- Edge case coverage - Overflow, underflow, null, exceptions, all handled correctly
- Easy to use - Single command runs everything
- Clear reporting - Visual output with pass/fail indicators
Bottom line: If correctness tests pass, native obfuscation is working correctly. Benchmarks provide additional insight into performance characteristics.
Last Updated: 2025-11-23