Move regression tests to public Unmarshal API with full error string assertions

Remove the unstable package test and consolidate all test cases into TestDecodeError_PositionAfterComment, which exercises toml.Unmarshal and validates the complete human-readable error output including context lines and tilde markers. https://claude.ai/code/session_01EXYfFXc3DDGpQ27sWdXTKq
Fix incorrect error positions for non-suffix subslices in unstable.Parser.Range (#1047)
2026-04-12 11:59:46 +00:00 · 2026-04-12 11:52:16 +00:00 · 2026-03-24 11:08:39 +00:00 · 2026-03-23 22:00:18 -04:00
6 changed files with 131 additions and 46 deletions
@@ -235,17 +235,17 @@ the AST level. See https://pkg.go.dev/github.com/pelletier/go-toml/v2/unstable.
 Execution time speedup compared to other Go TOML libraries:

 <table>
-	<thead>
-		<tr><th>Benchmark</th><th>go-toml v1</th><th>BurntSushi/toml</th></tr>
-	</thead>
-	<tbody>
-		<tr><td>Marshal/HugoFrontMatter-2</td><td>1.9x</td><td>2.2x</td></tr>
-		<tr><td>Marshal/ReferenceFile/map-2</td><td>1.7x</td><td>2.1x</td></tr>
-		<tr><td>Marshal/ReferenceFile/struct-2</td><td>2.2x</td><td>3.0x</td></tr>
-		<tr><td>Unmarshal/HugoFrontMatter-2</td><td>2.9x</td><td>2.7x</td></tr>
-		<tr><td>Unmarshal/ReferenceFile/map-2</td><td>2.6x</td><td>2.7x</td></tr>
-		<tr><td>Unmarshal/ReferenceFile/struct-2</td><td>4.6x</td><td>5.1x</td></tr>
-	 </tbody>
+    <thead>
+        <tr><th>Benchmark</th><th>go-toml v1</th><th>BurntSushi/toml</th></tr>
+    </thead>
+    <tbody>
+        <tr><td>Marshal/HugoFrontMatter-2</td><td>2.1x</td><td>2.0x</td></tr>
+        <tr><td>Marshal/ReferenceFile/map-2</td><td>2.0x</td><td>2.0x</td></tr>
+        <tr><td>Marshal/ReferenceFile/struct-2</td><td>2.3x</td><td>2.5x</td></tr>
+        <tr><td>Unmarshal/HugoFrontMatter-2</td><td>3.3x</td><td>2.8x</td></tr>
+        <tr><td>Unmarshal/ReferenceFile/map-2</td><td>2.9x</td><td>3.0x</td></tr>
+        <tr><td>Unmarshal/ReferenceFile/struct-2</td><td>4.8x</td><td>5.0x</td></tr>
+     </tbody>
 </table>
 <details><summary>See more</summary>
 <p>The table above has the results of the most common use-cases. The table below
@@ -253,22 +253,22 @@ contains the results of all benchmarks, including unrealistic ones. It is
 provided for completeness.</p>

 <table>
-	<thead>
-		<tr><th>Benchmark</th><th>go-toml v1</th><th>BurntSushi/toml</th></tr>
-	</thead>
-	<tbody>
-		<tr><td>Marshal/SimpleDocument/map-2</td><td>1.8x</td><td>2.7x</td></tr>
-		<tr><td>Marshal/SimpleDocument/struct-2</td><td>2.7x</td><td>3.8x</td></tr>
-		<tr><td>Unmarshal/SimpleDocument/map-2</td><td>3.8x</td><td>3.0x</td></tr>
-		<tr><td>Unmarshal/SimpleDocument/struct-2</td><td>5.6x</td><td>4.1x</td></tr>
-		<tr><td>UnmarshalDataset/example-2</td><td>3.0x</td><td>3.2x</td></tr>
-		<tr><td>UnmarshalDataset/code-2</td><td>2.3x</td><td>2.9x</td></tr>
-		<tr><td>UnmarshalDataset/twitter-2</td><td>2.6x</td><td>2.7x</td></tr>
-		<tr><td>UnmarshalDataset/citm_catalog-2</td><td>2.2x</td><td>2.3x</td></tr>
-		<tr><td>UnmarshalDataset/canada-2</td><td>1.8x</td><td>1.5x</td></tr>
-		<tr><td>UnmarshalDataset/config-2</td><td>4.1x</td><td>2.9x</td></tr>
-		<tr><td>geomean</td><td>2.7x</td><td>2.8x</td></tr>
-	 </tbody>
+    <thead>
+        <tr><th>Benchmark</th><th>go-toml v1</th><th>BurntSushi/toml</th></tr>
+    </thead>
+    <tbody>
+        <tr><td>Marshal/SimpleDocument/map-2</td><td>2.0x</td><td>2.9x</td></tr>
+        <tr><td>Marshal/SimpleDocument/struct-2</td><td>2.5x</td><td>3.6x</td></tr>
+        <tr><td>Unmarshal/SimpleDocument/map-2</td><td>4.2x</td><td>3.4x</td></tr>
+        <tr><td>Unmarshal/SimpleDocument/struct-2</td><td>5.9x</td><td>4.4x</td></tr>
+        <tr><td>UnmarshalDataset/example-2</td><td>3.2x</td><td>2.9x</td></tr>
+        <tr><td>UnmarshalDataset/code-2</td><td>2.4x</td><td>2.8x</td></tr>
+        <tr><td>UnmarshalDataset/twitter-2</td><td>2.7x</td><td>2.5x</td></tr>
+        <tr><td>UnmarshalDataset/citm_catalog-2</td><td>2.3x</td><td>2.3x</td></tr>
+        <tr><td>UnmarshalDataset/canada-2</td><td>1.9x</td><td>1.5x</td></tr>
+        <tr><td>UnmarshalDataset/config-2</td><td>5.4x</td><td>3.0x</td></tr>
+        <tr><td>geomean</td><td>2.9x</td><td>2.8x</td></tr>
+     </tbody>
 </table>
 <p>This table can be generated with <code>./ci.sh benchmark -a -html</code>.</p>
 </details>
@@ -147,7 +147,7 @@ bench() {
    pushd "$dir"

    if [ "${replace}" != "" ]; then
-        find ./benchmark/ -iname '*.go' -exec sed -i -E "s|github.com/pelletier/go-toml/v2|${replace}|g" {} \;
+        find ./benchmark/ -iname '*.go' -exec sed -i -E "s|github.com/pelletier/go-toml/v2\"|${replace}\"|g" {} \;
        go get "${replace}"
    fi

@@ -195,6 +195,11 @@ for line in reversed(lines[2:]):
        "%.1fx" % (float(line[3])/v2),  # v1
        "%.1fx" % (float(line[7])/v2),  # bs
    ])
+
+if not results:
+    print("No benchmark results to display.", file=sys.stderr)
+    sys.exit(1)
+
 # move geomean to the end
 results.append(results[0])
 del results[0]
@@ -301,6 +301,73 @@ OtherMissing = 1
 	assert.Equal(t, 2, len(strictErr.Unwrap()))
 }

+func TestDecodeError_PositionAfterComment(t *testing.T) {
+	// Regression test for https://github.com/pelletier/go-toml/issues/1047
+	// Error positions must be correct when the error occurs after comments or
+	// other content that was already scanned past.
+	examples := []struct {
+		desc        string
+		doc         string
+		expectedRow int
+		expectedCol int
+		expectedStr string
+	}{
+		{
+			desc:        "invalid key after comment",
+			doc:         "# comment\n= \"value\"",
+			expectedRow: 2,
+			expectedCol: 1,
+			expectedStr: "1| # comment\n2| = \"value\"\n | ~ invalid character at start of key: =",
+		},
+		{
+			desc:        "invalid key after two comments",
+			doc:         "# one\n# two\n= \"value\"",
+			expectedRow: 3,
+			expectedCol: 1,
+			expectedStr: "1| # one\n2| # two\n3| = \"value\"\n | ~ invalid character at start of key: =",
+		},
+		{
+			desc:        "invalid key after key-value pair",
+			doc:         "a = 1\n= 2",
+			expectedRow: 2,
+			expectedCol: 1,
+			expectedStr: "1| a = 1\n2| = 2\n | ~ invalid character at start of key: =",
+		},
+		{
+			desc:        "invalid key after blank line",
+			doc:         "a = 1\n\n= 2",
+			expectedRow: 3,
+			expectedCol: 1,
+			expectedStr: "1| a = 1\n2|\n3| = 2\n | ~ invalid character at start of key: =",
+		},
+	}
+
+	for _, e := range examples {
+		t.Run(e.desc, func(t *testing.T) {
+			var v interface{}
+			err := Unmarshal([]byte(e.doc), &v)
+			if err == nil {
+				t.Fatal("expected an error")
+			}
+
+			var derr *DecodeError
+			if !errors.As(err, &derr) {
+				t.Fatalf("error not a *DecodeError: %T: %v", err, err)
+			}
+
+			row, col := derr.Position()
+			if row != e.expectedRow {
+				t.Errorf("row: got %d, want %d (error: %s)", row, e.expectedRow, derr.String())
+			}
+			if col != e.expectedCol {
+				t.Errorf("col: got %d, want %d (error: %s)", col, e.expectedCol, derr.String())
+			}
+
+			assert.Equal(t, e.expectedStr, derr.String())
+		})
+	}
+}
+
 func ExampleDecodeError() {
 	doc := `name = 123__456`

@@ -704,15 +704,18 @@ func (enc *Encoder) encodeMap(b []byte, ctx encoderCtx, v reflect.Value) ([]byte
 	for iter.Next() {
 		v := iter.Value()

-		if isNil(v) {
-			// For nil pointers, convert to zero value of the element type.
-			// This allows round-trip marshaling of maps with nil pointer values.
-			// For nil interfaces and nil maps, skip since we can't derive a type.
-			if v.Kind() == reflect.Ptr {
+		// Handle nil values: convert nil pointers to zero value,
+		// skip nil interfaces and nil maps.
+		switch v.Kind() {
+		case reflect.Ptr:
+			if v.IsNil() {
 				v = reflect.Zero(v.Type().Elem())
-			} else {
+			}
+		case reflect.Interface, reflect.Map:
+			if v.IsNil() {
 				continue
 			}
+		default:
 		}

 		k, err := enc.keyToString(iter.Key())
@@ -936,7 +939,7 @@ func (enc *Encoder) encodeTable(b []byte, ctx encoderCtx, t table) ([]byte, erro
 		if shouldOmitEmpty(kv.Options, kv.Value) {
 			continue
 		}
-		if shouldOmitZero(kv.Options, kv.Value) {
+		if kv.Options.omitzero && shouldOmitZero(kv.Options, kv.Value) {
 			continue
 		}
 		hasNonEmptyKV = true
@@ -958,7 +961,7 @@ func (enc *Encoder) encodeTable(b []byte, ctx encoderCtx, t table) ([]byte, erro
 		if shouldOmitEmpty(table.Options, table.Value) {
 			continue
 		}
-		if shouldOmitZero(table.Options, table.Value) {
+		if table.Options.omitzero && shouldOmitZero(table.Options, table.Value) {
 			continue
 		}
 		if first {
@@ -995,7 +998,7 @@ func (enc *Encoder) encodeTableInline(b []byte, ctx encoderCtx, t table) ([]byte
 		if shouldOmitEmpty(kv.Options, kv.Value) {
 			continue
 		}
-		if shouldOmitZero(kv.Options, kv.Value) {
+		if kv.Options.omitzero && shouldOmitZero(kv.Options, kv.Value) {
 			continue
 		}

@@ -28,12 +28,16 @@ func (c *Iterator) Next() bool {
 	if c.nodes == nil {
 		return false
 	}
+	nodes := *c.nodes
 	if !c.started {
 		c.started = true
-	} else if c.idx >= 0 {
-		c.idx = (*c.nodes)[c.idx].next
+	} else {
+		idx := c.idx
+		if idx >= 0 && int(idx) < len(nodes) {
+			c.idx = nodes[idx].next
+		}
 	}
-	return c.idx >= 0 && int(c.idx) < len(*c.nodes)
+	return c.idx >= 0 && int(c.idx) < len(nodes)
 }

 // IsLast returns true if the current node of the iterator is the last
@@ -3,6 +3,7 @@ package unstable
 import (
 	"bytes"
 	"fmt"
+	"reflect"
 	"unicode"

 	"github.com/pelletier/go-toml/v2/internal/characters"
@@ -83,10 +84,14 @@ func (p *Parser) rangeOfToken(token, rest []byte) Range {
 }

 // subsliceOffset returns the byte offset of subslice b within p.data.
-// b must be a suffix (tail) of p.data.
+// b must be a subslice of p.data (sharing the same backing array).
 func (p *Parser) subsliceOffset(b []byte) int {
-	// b is a suffix of p.data, so its offset is len(p.data) - len(b)
-	return len(p.data) - len(b)
+	if len(b) == 0 {
+		return 0
+	}
+	dataPtr := reflect.ValueOf(p.data).Pointer()
+	subPtr := reflect.ValueOf(b).Pointer()
+	return int(subPtr - dataPtr)
 }

 // Raw returns the slice corresponding to the bytes in the given range.
@@ -363,9 +368,10 @@ func (p *Parser) parseKeyval(b []byte) (reference, []byte, error) {
 	p.builder.Chain(valRef, key)
 	p.builder.AttachChild(ref, valRef)

-	// Set Raw to span the entire key-value expression
-	node := p.builder.NodeAt(ref)
-	node.Raw = p.rangeOfToken(startB[:len(startB)-len(b)], b)
+	// Set Raw to span the entire key-value expression.
+	// Access the node directly in the slice to avoid the write barrier
+	// that NodeAt's nodes-pointer setup would trigger.
+	p.builder.tree.nodes[ref].Raw = p.rangeOfToken(startB[:len(startB)-len(b)], b)

 	return ref, b, err
 }
Author: Claude
SHA1: b7ffaf15eb
Date: 2026-04-12 11:59:46 +00:00
Message:
    Move regression tests to public Unmarshal API with full error string assertions

    Remove the unstable package test and consolidate all test cases into TestDecodeError_PositionAfterComment, which exercises toml.Unmarshal and validates the complete human-readable error output including context lines and tilde markers.

    https://claude.ai/code/session_01EXYfFXc3DDGpQ27sWdXTKq

Author: Claude
SHA1: 0248fc4c8c
Date: 2026-04-12 11:52:16 +00:00
Message:
    Fix incorrect error positions for non-suffix subslices in unstable.Parser.Range (#1047)

    The unsafe removal (#1021) replaced danger.SubsliceOffset (pointer arithmetic) with len(p.data)-len(b), which only works for suffix slices. Parser.Range is called with arbitrary interior subslices (e.g. ParserError.Highlight), so the offset was wrong whenever the error occurred after previously scanned content like comments.

    Fix by using reflect.ValueOf().Pointer() to recover the actual data pointer, matching the approach already used in errors.go.

    https://claude.ai/code/session_01EXYfFXc3DDGpQ27sWdXTKq

Author: Thomas Pelletier
SHA1: f36a3ece9e
Date: 2026-03-24 11:08:39 +00:00
Message:
    Reduce marshal and unmarshal overhead (#1044)

    * Reduce marshal and unmarshal overhead

    Targeted optimizations to reduce performance overhead introduced by recent feature additions and the unsafe removal.

    Unmarshal:
    - parseKeyval: access the node directly in the builder's slice to set Raw, bypassing NodeAt which triggers a GC write barrier for the nodes-pointer on every key-value expression.
    - Iterator.Next: cache the *nodes slice dereference in a local variable to avoid repeated pointer-to-slice indirection in the hot loop.

    Marshal:
    - Guard shouldOmitZero calls with an inlineable options.omitzero check. shouldOmitZero has inlining cost 1145 (budget 80), so avoiding the function call when omitzero is not set removes per-field overhead.
    - Inline the isNil check in encodeMap. isNil has inlining cost 93 (budget 80), so expanding it at the single hot call site avoids per-map-entry function call overhead.

    Update README benchmarks.

    Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Author: Thomas Pelletier
SHA1: 77f3862df4
Date: 2026-03-23 22:00:18 -04:00
Message:
    Fix benchmark script replacing internal package imports (#1042)

    * Fix benchmark script replacing internal package imports

    The sed command in bench() was replacing all occurrences of the go-toml module path, including sub-package imports like internal/assert. This caused the BurntSushi/toml benchmark to fail because it tried to import github.com/BurntSushi/toml/internal/assert which doesn't exist.

    Fix by anchoring the sed pattern to only match the import path when followed by a closing quote, preserving internal package imports.

    Also add a guard in the benchstathtml Python script to give a clear error instead of an IndexError when no benchmark results are available.

    https://claude.ai/code/session_016JGASo49PeFSfCaDxvrGFE

    * Update benchmark results in README

    https://claude.ai/code/session_016JGASo49PeFSfCaDxvrGFE

    ---------

    Co-authored-by: Claude <noreply@anthropic.com>