* Reduce marshal and unmarshal overhead
Targeted optimizations to reduce the performance overhead introduced by
recent feature additions and the removal of package unsafe.
Unmarshal:
- parseKeyval: access the node directly in the builder's slice to set
  Raw, bypassing NodeAt, which triggers a GC write barrier on the
  nodes pointer for every key-value expression.
- Iterator.Next: cache the *nodes slice dereference in a local variable
to avoid repeated pointer-to-slice indirection in the hot loop.
Marshal:
- Guard shouldOmitZero calls with an inlineable options.omitzero check.
shouldOmitZero has inlining cost 1145 (budget 80), so avoiding the
function call when omitzero is not set removes per-field overhead.
- Inline the isNil check in encodeMap. isNil has inlining cost 93
(budget 80), so expanding it at the single hot call site avoids
per-map-entry function call overhead.
Update README benchmarks.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Fix benchmark script replacing internal package imports
The sed command in bench() was replacing all occurrences of the go-toml
module path, including sub-package imports like internal/assert. This
caused the BurntSushi/toml benchmark to fail because it tried to import
github.com/BurntSushi/toml/internal/assert which doesn't exist.
Fix by anchoring the sed pattern to only match the import path when
followed by a closing quote, preserving internal package imports.
Also add a guard in the benchstathtml Python script to give a clear
error instead of an IndexError when no benchmark results are available.
https://claude.ai/code/session_016JGASo49PeFSfCaDxvrGFE
* Update benchmark results in README
---------
Co-authored-by: Claude <noreply@anthropic.com>
* Fix ci.sh for new benchmarks
Nice + taskset are more stable on my machine. We want to exclude
non-high-level benchmarks. BurntSushi/toml now supports canada.toml.
* Update latest benchmarks in README
* Use pointers instead of copying around ast.Node
Node is a 56-byte struct that is constantly in the hot path. Passing
nodes around by copy had a cost that started to add up. This change
replaces the copies with pointers. Using unsafe pointer arithmetic and
converting sibling/child indexes to relative offsets removes the need
to carry around a pointer to the root of the tree, saving 8 bytes per
Node. That space will be used to store an extra []byte slice to provide
contextual error handling on all nodes, including those whose data
differs from the raw input (for example, strings with escaped
characters), while staying under the size of a cache line.
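The relative-offset trick can be sketched as follows; the node layout
and field names are hypothetical, and the real implementation differs,
but the idea is the same: a node reaches its sibling from its own
address, so no root pointer is needed.

```go
package main

import (
	"fmt"
	"unsafe"
)

// node stores its next sibling as a relative offset (in nodes) instead
// of an absolute index into the tree's backing slice; 0 means none.
type node struct {
	value string
	next  int
}

// nextSibling computes the sibling's address from the node's own
// address via unsafe pointer arithmetic, instead of root[index].
func (n *node) nextSibling() *node {
	if n.next == 0 {
		return nil
	}
	return (*node)(unsafe.Add(unsafe.Pointer(n), uintptr(n.next)*unsafe.Sizeof(*n)))
}

func main() {
	// All nodes live in one contiguous slice, as in an AST builder.
	nodes := []node{
		{value: "a", next: 2}, // sibling is 2 slots ahead
		{value: "child-of-a"},
		{value: "b"},
	}
	for n := &nodes[0]; n != nil; n = n.nextSibling() {
		fmt.Println(n.value)
	}
	// prints "a" then "b"
}
```

This only stays within the Go unsafe rules because all nodes share one
backing array, so the arithmetic never leaves a single allocation.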
* Remove conditional
* Add Raw to track range in data for parsed values
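A minimal sketch of what such a range might look like; the Range type
and field names here are illustrative assumptions, not the actual API:

```go
package main

import "fmt"

// Range records where a parsed value lives in the original input,
// so errors can point back at the exact bytes (hypothetical layout).
type Range struct {
	Offset uint32
	Length uint32
}

func main() {
	data := []byte(`key = "value"`)
	// After parsing, the string node's Raw range would cover the
	// quoted literal in the input, escapes and all.
	raw := Range{Offset: 6, Length: 7}
	fmt.Printf("%s\n", data[raw.Offset:raw.Offset+raw.Length]) // "value"
}
```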
* Simplify reference tracking