Benchmark & Tool Comparison¶
This document explains how each ABI checking tool works, what analysis method it uses, benchmark results across real-world test cases, and why the numbers come out the way they do.
Note: abicheck detects 100+ change types (see Change Kind Reference). The cross-tool benchmark table below uses 42 representative cases (case01-41 + case26b). The full
examples/directory has 63 cases — abicheck passes all of them.
How each tool analyses ABI¶
abicheck (compare mode)¶
.so (v1) ──► ELF reader: exported symbols, SONAME, visibility
castxml (Clang AST): types, methods, vtable, noexcept
DWARF reader: size cross-check
──► snapshot (JSON)
├──► checker engine ──► verdict
.so (v2) ──► (same) ──► snapshot (JSON) ┘
Analysis basis: ELF symbol table + Clang AST via castxml + DWARF.
Header requirement: Yes — headers are passed to castxml for full type analysis.
Compiler requirement: None — castxml runs separately as a standalone tool.
This gives abicheck three independent data sources per symbol: ELF (what is exported), AST (what the C++ type contract says), and DWARF (actual compiled layout for cross-check).
abicheck (compat mode)¶
Same analysis engine as compare, but accepts ABICC-format XML descriptors
instead of snapshots:
<descriptor>
<version>1.0</version>
<headers>/path/to/include/foo.h</headers>
<libs>/path/to/libfoo.so</libs>
</descriptor>
Used as a drop-in for ABICC-based CI pipelines (abicheck compat check -lib foo -old v1.xml -new v2.xml).
Why compat scores lower (40/42 vs 42/42):
compat follows ABICC's verdict vocabulary: COMPATIBLE, BREAKING, NO_CHANGE.
It cannot emit API_BREAK — the verdict for source-level-only breaks that are binary-safe
(e.g. a parameter rename, or reduced access level in a class method).
Two benchmark cases (case31_enum_rename, case34_access_level) expect API_BREAK,
so they score as misses in compat mode.
This is intentional and documented in examples/ground_truth.json as expected_compat.
When to use compat: When you have an existing ABICC XML pipeline and want to
migrate to abicheck without rewriting scripts.
When to use compare: For all new integrations — full verdict set including API_BREAK.
abicheck (strict mode)¶
compat with -s / --strict flag. Promotes COMPATIBLE → BREAKING and
API_BREAK → BREAKING.
Two sub-modes via --strict-mode:
- full (default with -s): COMPATIBLE + API_BREAK → BREAKING (matches ABICC -strict)
- api: only API_BREAK → BREAKING, additive COMPATIBLE changes stay COMPATIBLE
Why strict scores 31/42 (not 100%):
Nine benchmark cases are legitimately COMPATIBLE (additive changes, SONAME addition,
symbol versioning policy, etc.). --strict-mode full promotes these to BREAKING —
intentionally, just like ABICC -strict. These are correct tool outputs for the
strict policy, but score as misses against the ground truth which says COMPATIBLE.
Why strict (31/42) > ABICC dumper (20/30 = 66%):
ABICC dumper fails to even produce a result on 12 cases (ERROR + TIMEOUT), so its
denominator is only 30. abicheck strict runs on all 42 cases — even cases where the
verdict is intentionally promoted. If you normalise ABICC dumper to 42 cases,
its effective accuracy is 20/42 = 48%, well below strict's 31/42 = 73%.
When to use strict: CI gates where any COMPATIBLE addition (e.g. new symbol) should
fail the build. Use --strict-mode api to avoid false positives on purely additive changes.
abidiff (ELF mode, no headers)¶
Analysis basis: DWARF (primary), CTF/BTF fallback; pure ELF symbol table if no debug info present.
Header requirement: None (in ELF mode).
Compiler requirement: None.
abidiff reads type information from DWARF sections of the .so when available. If DWARF
is absent it falls back to CTF (Oracle/Solaris-style binaries) or BTF (Linux kernel/eBPF
modules), and finally to ELF symbol names only when no debug info is present.
For our benchmark, all .so files are built with -g so DWARF is used throughout.
Benchmark result: 11/42 (26%)
abidiff misses anything that is not directly a symbol removal or a change that DWARF
fully describes. Specifically:
- Struct layout, vtable, return type changes → DWARF often marks as COMPATIBLE because
it cannot determine binary impact without header type context
- Enum value semantics, typedef chains → COMPATIBLE
- noexcept, static qualifier, const qualifier, access level → not in DWARF at all
Stripped binaries (no debug info): abidiff degrades to ELF-only (symbol names). abicheck continues to work via castxml — header-based type analysis does not need debug symbols. This makes abicheck significantly more useful for production binaries.
abidw + headers → abidiff¶
.so (v1) ──► abidw --headers-dir /path/to/headers/ ──► ABI XML ──┐
├──► abidiff ──► report
.so (v2) ──► abidw --headers-dir /path/to/headers/ ──► ABI XML ──┘
Note:
--headers-diris a flag forabidw(the dumper), notabidiffitself. The filtering happens at dump time;abidiffonly compares the resulting XML.
--headers-dir role: Filters which symbols are considered public API.
It does not provide additional type information — abidw still reads types from DWARF.
Why abidiff+headers = abidiff in our suite (both 11/42):
Our benchmark examples are compiled with -fvisibility=default, meaning all symbols
are exported by default. None of the headers use __attribute__((visibility("hidden"))).
So the header filter changes nothing — all symbols are already public in both modes.
The fundamental limitation is that abidiff relies on DWARF for types, not AST.
Even with perfect headers, it cannot see noexcept, static-qualifier changes, or
source-level-only changes that have no ELF/DWARF representation.
When would --headers-dir help? If the library uses visibility("hidden") for internal
symbols in the headers, --headers-dir would filter them out and reduce false positives.
It does not improve detection of semantic changes.
ABICC (abi-dumper workflow)¶
.so (v1, compiled with -g) ──► abi-dumper ──► v1.abi ──┐
├──► abi-compliance-checker ──► report
.so (v2, compiled with -g) ──► abi-dumper ──► v2.abi ──┘
Analysis basis: DWARF — same as abidiff, but through Perl-based abi-dumper.
Header requirement: Optional (pass -public-headers to filter to public API).
Compiler requirement: None. Debug build (-g) required.
Benchmark result: 20/30 scored (66%) — 12 cases ERROR/TIMEOUT:
- case09_cpp_vtable: 122s timeout (abi-compliance-checker with complex vtable)
- case28/30/31/32/33/34/35/36/40: ERROR — abi-dumper fails on these C++ patterns
(typedef chains, access levels, field renames, anon structs, etc.)
Why ABICC(dump) accuracy is 66% but effective is only 48%:
Scoring 20/30 looks reasonable, but 12 out of 42 cases don't even run. For a fair comparison:
20/42 = 48% effective accuracy. Meanwhile abicheck gets 42/42 = 100%.
ABICC (XML / legacy mode)¶
v1.xml (headers dir + .so path) ──► abi-compliance-checker (invokes GCC internally) ──► report
v2.xml (headers dir + .so path) ──┘
Analysis basis: GCC-compiled AST from headers.
Header requirement: Yes — must point to headers directory.
Compiler requirement: Yes — GCC only. Clang and icpx are not supported.
Why ABICC(xml) is slow and unreliable:
1. GCC invocation per case — even for 5-line headers, GCC startup costs dominate
2. Directory input causes redefinition errors — if the descriptor's <headers> tag
points to a directory, abi-compliance-checker includes ALL .h files found there,
including duplicates from build subdirs → redefinition errors → wrong verdicts
3. GCC compatibility — abi-compliance-checker uses gcc -fdump-lang-class internally,
whose output format changed between GCC major versions. ABICC 2.3 prints a compatibility
warning on every run when used with GCC 11+. Results may differ across GCC versions.
4. case16_inline_to_non_inline: reliably hits 120s timeout
Our fix in PR #72: Pass a specific header file path instead of a directory in
<headers>. This drops runtime from 120s → ~1s and fixes wrong verdicts.
Benchmark result: 25/41 (60%) — 1 case TIMEOUT, rest scored.
Verdict vocabulary comparison¶
| Verdict | abicheck compare | abicheck compat | abidiff | ABICC |
|---|---|---|---|---|
NO_CHANGE |
✅ | ✅ | ✅ (exit 0) | ⚠️ reports 100% compat |
COMPATIBLE |
✅ | ✅ | ✅ (exit 4) | ⚠️ reports 100% compat |
API_BREAK |
✅ | ❌ not supported | ❌ | ❌ |
BREAKING |
✅ | ✅ | ✅ (exit 8+) | ✅ |
API_BREAK = source-level break, binary-compatible. Example: parameter renamed,
access level changed, pure API contract violation with no ABI binary change.
Only abicheck compare can emit this verdict.
Why abicheck achieves 100%¶
abicheck uses three independent analysis passes per comparison:
- ELF pass — symbol table diff: detects visibility changes, SONAME, symbol binding, symbol version policy, added/removed/renamed exported symbols
- castxml pass — Clang AST diff: detects noexcept, static qualifier, const qualifier, method-became-static, pure virtual additions, access level, parameter/return type changes that are invisible in ELF/DWARF
- DWARF cross-check — validates actual compiled type sizes, struct/class member offsets,
vtable slot offsets, base class offsets, and
#pragma pack/-march-sensitive alignment that header analysis alone may compute incorrectly
Neither abidiff nor ABICC runs all three passes. abidiff has no AST (misses noexcept, static, const). ABICC has no ELF pass (misses SONAME, visibility). ABICC(dump) has no AST (same gaps as abidiff plus instability on complex C++).
Benchmark summary (2026-03-11, 42 cases)¶
| Tool | Scored | Correct | Accuracy | Not scored | Time |
|---|---|---|---|---|---|
| abicheck (compare) | 42 | 42 | 100% | 0 | 212s |
| abicheck (compat) | 42 | 40 | 95% | 0 — 2 API_BREAK n/a | 79s |
| abicheck (strict) | 42 | 31 | 73% | 0 — 9 intentional FP | 78s |
| abidiff | 42 | 11 | 26% | 0 | 2.5s |
| abidiff+headers | 42 | 11 | 26% | 0 | 3.9s |
| ABICC(dump) | 30 | 20 | 66% (48% of 42) | 12 ERROR/TIMEOUT | 294s |
| ABICC(xml) | 41 | 25 | 61% (60% of 42 effective) | 1 TIMEOUT | 445s |
Full results (42 cases)¶
| Case | Expected | abicheck | compat | strict | abidiff | abidiff+hdr | ABICC(dump) | ABICC(xml) |
|---|---|---|---|---|---|---|---|---|
| case01_symbol_removal | BREAKING | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| case02_param_type_change | BREAKING | ✅ | ✅ | ✅ | ⚠️ COMPAT | ⚠️ COMPAT | ✅ | ✅ |
| case03_compat_addition | COMPATIBLE | ✅ | ✅ | ❌ BREAKING¹ | ✅ | ✅ | ✅ | ✅ |
| case04_no_change | NO_CHANGE | ✅ | ✅ | ✅ | ✅ | ✅ | ⚠️ COMPAT | ⚠️ COMPAT |
| case05_soname | COMPATIBLE | ✅ | ✅ | ❌ BREAKING¹ | ⚠️ BREAKING | ⚠️ BREAKING | ✅ | ✅ |
| case06_visibility | COMPATIBLE | ✅ | ✅ | ❌ BREAKING¹ | ⚠️ BREAKING | ⚠️ BREAKING | ❌ BREAKING | ❌ BREAKING |
| case07_struct_layout | BREAKING | ✅ | ✅ | ✅ | ⚠️ COMPAT | ⚠️ NO_CHANGE | ⚠️ COMPAT | ⚠️ COMPAT |
| case08_enum_value_change | BREAKING | ✅ | ✅ | ✅ | ⚠️ COMPAT | ⚠️ NO_CHANGE | ✅ | ✅ |
| case09_cpp_vtable | BREAKING | ✅ | ✅ | ✅ | ⚠️ COMPAT | ⚠️ NO_CHANGE | ⏱️ TIMEOUT | ✅ |
| case10_return_type | BREAKING | ✅ | ✅ | ✅ | ⚠️ COMPAT | ⚠️ COMPAT | ✅ | ⚠️ COMPAT |
| case11_global_var_type | BREAKING | ✅ | ✅ | ✅ | ⚠️ COMPAT | ⚠️ COMPAT | ✅ | ✅ |
| case12_function_removed | BREAKING | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| case13_symbol_versioning | COMPATIBLE | ✅ | ✅ | ❌ BREAKING¹ | ✅ NO_CHANGE | ✅ NO_CHANGE | ✅ | ✅ |
| case14_cpp_class_size | BREAKING | ✅ | ✅ | ✅ | ⚠️ COMPAT | ⚠️ COMPAT | ✅ | ⚠️ COMPAT |
| case15_noexcept_change | BREAKING | ✅ | ✅ | ✅ | ⚠️ NO_CHANGE | ⚠️ NO_CHANGE | ⚠️ COMPAT | ⚠️ COMPAT |
| case16_inline_to_non_inline | COMPATIBLE | ✅ | ✅ | ❌ BREAKING¹ | ✅ | ✅ | ❌ ERROR | ⏱️ TIMEOUT |
| case17_template_abi | BREAKING | ✅ | ✅ | ✅ | ⚠️ COMPAT | ⚠️ COMPAT | ⚠️ COMPAT | ⚠️ COMPAT |
| case18_dependency_leak | BREAKING | ✅ | ✅ | ✅ | ⚠️ COMPAT | ⚠️ COMPAT | ⚠️ COMPAT | ⚠️ COMPAT |
| case19_enum_member_removed | BREAKING | ✅ | ✅ | ✅ | ⚠️ COMPAT | ⚠️ COMPAT | ⚠️ COMPAT | ⚠️ COMPAT |
| case20_enum_member_value_changed | BREAKING | ✅ | ✅ | ✅ | ⚠️ NO_CHANGE | ⚠️ NO_CHANGE | ⚠️ COMPAT | ⚠️ COMPAT |
| case21_method_became_static | BREAKING | ✅ | ✅ | ✅ | ⚠️ COMPAT | ⚠️ COMPAT | ✅ | ✅ |
| case22_method_const_changed | BREAKING | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ⚠️ COMPAT |
| case23_pure_virtual_added | BREAKING | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ⚠️ COMPAT |
| case24_union_field_removed | BREAKING | ✅ | ✅ | ✅ | ⚠️ NO_CHANGE | ⚠️ NO_CHANGE | ⚠️ COMPAT | ⚠️ COMPAT |
| case25_enum_member_added | COMPATIBLE | ✅ | ✅ | ❌ BREAKING¹ | ✅ NO_CHANGE | ✅ NO_CHANGE | ✅ | ✅ |
| case26_union_field_added | BREAKING | ✅ | ✅ | ✅ | ⚠️ COMPAT | ⚠️ COMPAT | ✅ | ✅ |
| case26b_union_field_added_compatible | COMPATIBLE | ✅ | ✅ | ❌ BREAKING¹ | ✅ NO_CHANGE | ✅ NO_CHANGE | ✅ | ✅ |
| case27_symbol_binding_weakened | COMPATIBLE | ✅ | ✅ | ❌ BREAKING¹ | ✅ NO_CHANGE | ✅ NO_CHANGE | ✅ | ✅ |
| case28_typedef_opaque | BREAKING | ✅ | ✅ | ✅ | ⚠️ NO_CHANGE | ⚠️ NO_CHANGE | ❌ ERROR | ⚠️ COMPAT |
| case29_ifunc_transition | COMPATIBLE | ✅ | ✅ | ❌ BREAKING¹ | ✅ NO_CHANGE | ✅ NO_CHANGE | ✅ | ✅ |
| case30_field_qualifiers | BREAKING | ✅ | ✅ | ✅ | ⚠️ NO_CHANGE | ⚠️ NO_CHANGE | ❌ ERROR | ⚠️ COMPAT |
| case31_enum_rename | API_BREAK | ✅ | ⚠️ API_BREAK² | ✅ BREAKING | ⚠️ NO_CHANGE | ⚠️ NO_CHANGE | ❌ ERROR | ⚠️ COMPAT |
| case32_param_defaults | NO_CHANGE | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ ERROR | ⚠️ COMPAT |
| case33_pointer_level | BREAKING | ✅ | ✅ | ✅ | ⚠️ NO_CHANGE | ⚠️ NO_CHANGE | ❌ ERROR | ⚠️ COMPAT |
| case34_access_level | API_BREAK | ✅ | ⚠️ API_BREAK² | ✅ BREAKING | ⚠️ NO_CHANGE | ⚠️ NO_CHANGE | ❌ ERROR | ⚠️ COMPAT |
| case35_field_rename | BREAKING | ✅ | ✅ | ✅ | ⚠️ NO_CHANGE | ⚠️ NO_CHANGE | ❌ ERROR | ⚠️ COMPAT |
| case36_anon_struct | BREAKING | ✅ | ✅ | ✅ | ⚠️ NO_CHANGE | ⚠️ NO_CHANGE | ❌ ERROR | ⚠️ COMPAT |
| case37_base_class | BREAKING | ✅ | ✅ | ✅ | ⚠️ NO_CHANGE | ⚠️ NO_CHANGE | ⚠️ COMPAT | ⚠️ COMPAT |
| case38_virtual_methods | BREAKING | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| case39_var_const | BREAKING | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ ERROR | ✅ |
| case40_field_layout | BREAKING | ✅ | ✅ | ✅ | ⚠️ NO_CHANGE | ⚠️ NO_CHANGE | ❌ ERROR | ⚠️ COMPAT |
| case41_type_changes | BREAKING | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Legend: ✅ correct · ⚠️ wrong/undercounted · ❌ wrong in opposite direction · ⏱️ timed out (120s cutoff)
¹ strict false positive: COMPATIBLE → BREAKING is expected with --strict-mode full; use --strict-mode api to avoid.
² compat known limitation: API_BREAK verdict not supported; maps to COMPATIBLE (scored as miss).
Timing¶
| Tool | Total (42 cases) | Notes |
|---|---|---|
| abicheck | ~212s | castxml per case; sequential, parallelisable |
| abicheck compat | ~79s | XML descriptor mode |
| abidiff | ~2.5s | ELF+DWARF, very fast |
| ABICC (dumper) | ~294s | abi-dumper + abi-compliance-checker per case |
| ABICC (xml) | ~445s | GCC compilation per case; case09+case16 TIMEOUT |
Measured on: Ubuntu 22.04, 8 vCPU, 32GB RAM. All runs sequential.
Run the benchmark yourself¶
# Select specific cases or tools
python3 scripts/benchmark_comparison.py --cases case01 case09 case21
python3 scripts/benchmark_comparison.py --tools abicheck abidiff
Choosing the right tool¶
| Scenario | Recommended |
|---|---|
| New CI pipeline, full accuracy | abicheck compare |
| Migrating from ABICC XML pipeline | abicheck compat check |
| Strict gate (any addition = fail) | abicheck compat check -s |
| Debug build available, DWARF check | abicheck compare (castxml already better) |
| Quick ELF-only sanity check | abidiff (fast, 26% but catches symbol removals) |