fix short usage formatting

add some handy builtin character classes as split separators (#84 )
fix builder go version
2025-12-18 21:11:03 +01:00 · 2025-10-09 23:16:07 +02:00 · 2025-10-09 23:03:57 +02:00 · 2025-10-08 10:36:09 +02:00 · 2025-10-06 23:27:48 +02:00 · 2025-10-06 23:02:28 +02:00
14 changed files with 358 additions and 69 deletions
--- a/.github/workflows/release.yaml
+++ b/.github/workflows/release.yaml
@@ -15,7 +15,7 @@ jobs:
      - name: Set up Go
        uses: actions/setup-go@v6
        with:
-          go-version: 1.22.11
+          go-version: 1.24.0

      - name: Build the executables
        run: ./mkrel.sh tablizer ${{ github.ref_name}}
--- a/2
+++ b/2
@@ -65,7 +65,7 @@ clean:
 	rm -rf $(tool) releases coverage.out

 test: clean
-	go test -cover ./... $(OPTS)
+	go test -count=1 -cover ./... $(OPTS)

 singletest:
 	@echo "Call like this: 'make singletest TEST=TestPrepareColumns MOD=lib'"
--- a/README.md
+++ b/README.md
@@ -192,10 +192,9 @@ hesitate to ask me about it, I'll add it.
 ## Documentation

 The  documentation  is  provided  as  a unix  man-page.   It  will  be
-automatically installed if  you install from source.  However, you can
-read the man-page online:
+automatically installed if  you install from source.

-https://github.com/TLINDEN/tablizer/blob/main/tablizer.pod
+[However, you can read the man-page online](https://github.com/TLINDEN/tablizer/blob/main/tablizer.pod).

 Or if you cloned  the repository you can read it  this way (perl needs
 to be installed though): `perldoc tablizer.pod`.
--- a/cfg/config.go
+++ b/cfg/config.go
@@ -27,13 +27,26 @@ import (
 	"github.com/hashicorp/hcl/v2/hclsimple"
 )

-const DefaultSeparator string = `(\s\s+|\t)`
-const Version string = "v1.5.7"
-const MAXPARTS = 2
+const (
+	Version  = "v1.5.9"
+	MAXPARTS = 2
+)

-var DefaultConfigfile = os.Getenv("HOME") + "/.config/tablizer/config"
+var (
+	DefaultConfigfile = os.Getenv("HOME") + "/.config/tablizer/config"
+	VERSION           string // maintained by -x

-var VERSION string // maintained by -x
+	SeparatorTemplates = map[string]string{
+		":tab:":      `\s*\t\s*`,                               // tab but eats spaces around
+		":spaces:":   `\s{2,}`,                                 // 2 or more spaces
+		":pipe:":     `\s*\|\s*`,                               // one pipe eating spaces around
+		":default:":  `(\s\s+|\t)`,                             // 2 or more spaces or tab
+		":nonword:":  `\W`,                                     // any non-word character
+		":nondigit:": `\D`,                                     // any non-digit character
+		":special:":  `[\*\+\-_\(\)\[\]\{\}?\\/<>=&$§"':,\^]+`, // match any special char
+		":nonprint:": `[[:^print:]]+`,                          // non printables
+	}
+)

 // public config, set via config file or using defaults
 type Settings struct {
@@ -356,6 +369,13 @@ func (conf *Config) ApplyDefaults() {
 	if conf.OutputMode == Yaml || conf.OutputMode == CSV {
 		conf.Numbering = false
 	}
+
+	if len(conf.Separator) > 1 && conf.Separator[0] == ':' && conf.Separator[len(conf.Separator)-1] == ':' {
+		separator, ok := SeparatorTemplates[conf.Separator]
+		if ok {
+			conf.Separator = separator
+		}
+	}
 }

 func (conf *Config) PreparePattern(patterns []*Pattern) error {
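
The template lookup added to `ApplyDefaults` above can be tried in isolation. The following is a minimal sketch, not tablizer's code: `resolveSeparator` and the reduced `separatorTemplates` map are hypothetical names introduced here for illustration, and the length check before indexing is an assumption of this sketch so that an empty separator passes through unchanged.

```go
package main

import "fmt"

// separatorTemplates mirrors three entries of cfg.SeparatorTemplates
// from the diff; the map name is hypothetical.
var separatorTemplates = map[string]string{
	":pipe:":    `\s*\|\s*`,
	":spaces:":  `\s{2,}`,
	":default:": `(\s\s+|\t)`,
}

// resolveSeparator (hypothetical helper) returns the regexp behind a
// :class: name. Anything that is not a known class is returned as-is.
func resolveSeparator(sep string) string {
	// Check the length first: indexing an empty string would panic.
	if len(sep) > 1 && sep[0] == ':' && sep[len(sep)-1] == ':' {
		if tmpl, ok := separatorTemplates[sep]; ok {
			return tmpl
		}
	}

	return sep
}

func main() {
	for _, sep := range []string{":pipe:", ":unknown:", "", ";"} {
		fmt.Printf("%q -> %q\n", sep, resolveSeparator(sep))
	}
}
```

Unknown class names such as `:unknown:` fall through untouched, so they end up being treated like any other user-supplied separator string.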
--- a/cmd/root.go
+++ b/cmd/root.go
@@ -123,7 +123,7 @@ func Execute() {
 		"Use alternating background colors")
 	rootCmd.PersistentFlags().StringVarP(&ShowCompletion, "completion", "", "",
 		"Display completion code")
-	rootCmd.PersistentFlags().StringVarP(&conf.Separator, "separator", "s", cfg.DefaultSeparator,
+	rootCmd.PersistentFlags().StringVarP(&conf.Separator, "separator", "s", cfg.SeparatorTemplates[":default:"],
 		"Custom field separator")
 	rootCmd.PersistentFlags().StringVarP(&conf.Columns, "columns", "c", "",
 		"Only show the speficied columns (separated by ,)")
--- a/cmd/tablizer.go
+++ b/cmd/tablizer.go
@@ -14,7 +14,7 @@ SYNOPSIS
          -n, --numbering                    Enable header numbering
          -N, --no-color                     Disable pattern highlighting
          -H, --no-headers                   Disable headers display
-          -s, --separator <string>           Custom field separator
+          -s, --separator <string>           Custom field separator (may be a char, string or :class:)
          -k, --sort-by <int|name>           Sort by column (default: 1)
          -z, --fuzzy                        Use fuzzy search [experimental]
          -F, --filter <field[!]=reg>        Filter given field with regex, can be used multiple times
@@ -141,6 +141,57 @@ DESCRIPTION
    Finally the -d option enables debugging output which is mostly useful
    for the developer.

+  SEPARATOR
+    The option -s can be a single character, in which case the CSV parser
+    will be invoked. You can also specify a string as separator. The string
+    will be interpreted as a literal string unless it is a valid Go regular
+    expression. For example:
+
+        -s '\t{2,}'
+
+    is used as a regexp and will match two or more consecutive tabs.
+
+        -s 'foo'
+
+    on the other hand is not a regular expression and will be used literally.
+
+    To make life easier, there are a couple of predefined regular
+    expressions, which you can specify as classes:
+
+        * :tab:
+
+        Matches a tab and eats spaces around it.
+
+        * :spaces:
+
+        Matches 2 or more spaces.
+
+        * :pipe:
+
+        Matches a pipe character and eats spaces around it.
+
+        * :default:
+
+        Matches 2 or more spaces or a tab. This is the default separator
+        if none is specified.
+
+        * :nonword:
+
+        Matches a non-word character.
+
+        * :nondigit:
+
+        Matches a non-digit character.
+
+        * :special:
+
+        Matches one or more special chars like brackets, dollar sign,
+        slashes etc.
+
+        * :nonprint:
+
+        Matches one or more non-printable characters.
+
  PATTERNS AND FILTERING
    You can reduce the rows being displayed by using one or more regular
    expression patterns. The regexp language being used is the one of
@@ -458,7 +509,7 @@ Operational Flags:
  -n, --numbering                    Enable header numbering
  -N, --no-color                     Disable pattern highlighting
  -H, --no-headers                   Disable headers display
-  -s, --separator <string>           Custom field separator
+  -s, --separator <string>           Custom field separator (may be a char, string or :class:)
  -k, --sort-by <int|name>           Sort by column (default: 1)
  -z, --fuzzy                        Use fuzzy search [experimental]
  -F, --filter <field[!]=reg>        Filter given field with regex, can be used multiple times
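
To see what the predefined classes documented above match, here is a small sketch that applies three of the regular expressions from `cfg.SeparatorTemplates` to sample lines with Go's `regexp` package. The helper `splitLine` is a hypothetical name used only here; whether tablizer's parser splits lines in exactly this way is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

// splitLine (hypothetical helper) splits one input line into fields
// using the given separator regexp.
func splitLine(pattern, line string) []string {
	return regexp.MustCompile(pattern).Split(line, -1)
}

func main() {
	// :pipe: eats the spaces around each pipe character
	fmt.Printf("%q\n", splitLine(`\s*\|\s*`, "alpha | beta|delta"))

	// :spaces: needs at least two whitespace characters
	fmt.Printf("%q\n", splitLine(`\s{2,}`, "alpha   beta  delta"))

	// :tab: eats the spaces around each tab
	fmt.Printf("%q\n", splitLine(`\s*\t\s*`, "alpha \t beta\tdelta"))
}
```

Each of the three calls yields the fields `alpha`, `beta` and `delta`.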
--- a/lib/helpers.go
+++ b/lib/helpers.go
@@ -22,7 +22,7 @@ import (
 	"fmt"
 	"os"
 	"regexp"
-	"sort"
+	"slices"
 	"strconv"
 	"strings"

@@ -30,16 +30,6 @@ import (
 	"github.com/tlinden/tablizer/cfg"
 )

-func contains(s []int, e int) bool {
-	for _, a := range s {
-		if a == e {
-			return true
-		}
-	}
-
-	return false
-}
-
 func findindex(s []int, e int) (int, bool) {
 	for i, a := range s {
 		if a == e {
@@ -172,48 +162,32 @@ func PrepareColumnVars(columns string, data *Tabdata) ([]int, error) {
 		}
 	}

-	// deduplicate: put all values into a map (value gets map key)
-	// thereby  removing duplicates,  extract keys into  new slice
-	// and sort it
-	imap := make(map[int]int, len(usecolumns))
+	// deduplicate columns, preserve order
+	deduped := []int{}
 	for _, i := range usecolumns {
-		imap[i] = 0
+		if !slices.Contains(deduped, i) {
+			deduped = append(deduped, i)
+		}
 	}

-	// fill with deduplicated columns
-	usecolumns = nil
-
-	for k := range imap {
-		usecolumns = append(usecolumns, k)
-	}
-
-	sort.Ints(usecolumns)
-
-	return usecolumns, nil
+	return deduped, nil
 }

 // prepare headers: add numbers to headers
 func numberizeAndReduceHeaders(conf cfg.Config, data *Tabdata) {
-	numberedHeaders := []string{}
+	numberedHeaders := make([]string, len(data.headers))
+
 	maxwidth := 0 // start from scratch, so we only look at displayed column widths

+	// add numbers to headers if needed, get widest cell width
 	for idx, head := range data.headers {
 		var headlen int

-		if len(conf.Columns) > 0 {
-			// -c specified
-			if !contains(conf.UseColumns, idx+1) {
-				// ignore this one
-				continue
-			}
-		}
-
 		if conf.Numbering {
-			numhead := fmt.Sprintf("%s(%d)", head, idx+1)
-			headlen = len(numhead)
-			numberedHeaders = append(numberedHeaders, numhead)
+			newhead := fmt.Sprintf("%s(%d)", head, idx+1)
+			numberedHeaders[idx] = newhead
+			headlen = len(newhead)
 		} else {
-			numberedHeaders = append(numberedHeaders, head)
 			headlen = len(head)
 		}

@@ -222,7 +196,24 @@ func numberizeAndReduceHeaders(conf cfg.Config, data *Tabdata) {
 		}
 	}

+	if conf.Numbering {
 		data.headers = numberedHeaders
+	}
+
+	if len(conf.UseColumns) > 0 {
+		// re-align headers based on user requested column list
+		headers := make([]string, len(conf.UseColumns))
+
+		for i, col := range conf.UseColumns {
+			for idx := range data.headers {
+				if col-1 == idx {
+					headers[i] = data.headers[col-1]
+				}
+			}
+		}
+
+		data.headers = headers
+	}

 	if data.maxwidthHeader != maxwidth && maxwidth > 0 {
 		data.maxwidthHeader = maxwidth
@@ -234,18 +225,18 @@ func reduceColumns(conf cfg.Config, data *Tabdata) {
 	if len(conf.Columns) > 0 {
 		reducedEntries := [][]string{}

+		for _, entry := range data.entries {
 			var reducedEntry []string

-		for _, entry := range data.entries {
-			reducedEntry = nil
-
-			for i, value := range entry {
-				if !contains(conf.UseColumns, i+1) {
-					continue
-				}
+			for _, col := range conf.UseColumns {
+				col--

+				for idx, value := range entry {
+					if idx == col {
 						reducedEntry = append(reducedEntry, value)
 					}
+				}
+			}

 			reducedEntries = append(reducedEntries, reducedEntry)
 		}
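
The two changes to `lib/helpers.go` above (order-preserving deduplication and picking cells in the order requested with `-c`) can be sketched as standalone functions. `dedupe` and `pick` are hypothetical names, and `pick` indexes the row directly instead of using the nested loop from the diff; this is an illustration of the idea, not the project's implementation.

```go
package main

import (
	"fmt"
	"slices"
)

// dedupe removes repeated column numbers and keeps the order in
// which each number was first seen.
func dedupe(cols []int) []int {
	deduped := []int{}

	for _, col := range cols {
		if !slices.Contains(deduped, col) {
			deduped = append(deduped, col)
		}
	}

	return deduped
}

// pick returns the cells of row in the order given by the 1-based
// column numbers in cols. Out-of-range columns are skipped.
func pick(row []string, cols []int) []string {
	picked := make([]string, 0, len(cols))

	for _, col := range cols {
		if col >= 1 && col <= len(row) {
			picked = append(picked, row[col-1])
		}
	}

	return picked
}

func main() {
	cols := dedupe([]int{3, 1, 3, 2, 1})
	fmt.Println(cols) // [3 1 2]

	fmt.Println(pick([]string{"a", "b", "c"}, cols)) // [c a b]
}
```

Because the deduplicated list is no longer sorted, a column list such as `3,1,2` shows the third column first.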
--- a/lib/helpers_test.go
+++ b/lib/helpers_test.go
@@ -19,6 +19,7 @@ package lib

 import (
 	"fmt"
+	"slices"
 	"testing"

 	"github.com/stretchr/testify/assert"
@@ -38,7 +39,7 @@ func TestContains(t *testing.T) {
 	for _, tt := range tests {
 		testname := fmt.Sprintf("contains-%d,%d,%t", tt.list, tt.search, tt.want)
 		t.Run(testname, func(t *testing.T) {
-			answer := contains(tt.list, tt.search)
+			answer := slices.Contains(tt.list, tt.search)

 			assert.EqualValues(t, tt.want, answer)
 		})
@@ -72,7 +73,8 @@ func TestPrepareColumns(t *testing.T) {
 	}

 	for _, testdata := range tests {
-		testname := fmt.Sprintf("PrepareColumns-%s-%t", testdata.input, testdata.wanterror)
+		testname := fmt.Sprintf("PrepareColumns-%s-%t",
+			testdata.input, testdata.wanterror)
 		t.Run(testname, func(t *testing.T) {
 			conf := cfg.Config{Columns: testdata.input}
 			err := PrepareColumns(&conf, &data)
--- a/lib/parser.go
+++ b/lib/parser.go
@@ -25,6 +25,7 @@ import (
 	"fmt"
 	"io"
 	"log"
+	"math"
 	"regexp"
 	"strings"

@@ -222,6 +223,32 @@ func parseRawJSON(conf cfg.Config, input io.Reader) (Tabdata, error) {
 					row[idxmap[currentfield]] = val
 				}
 			}
+
+		case float64:
+			var value string
+
+			// we set precision to 0 if the float is a whole number
+			if val == math.Trunc(val) {
+				value = fmt.Sprintf("%.f", val)
+			} else {
+				value = fmt.Sprintf("%f", val)
+			}
+
+			if !haveheaders {
+				row = append(row, value)
+			} else {
+				row[idxmap[currentfield]] = value
+			}
+
+		case nil:
+			// we ignore here if a value  shall be an int or a string,
+			// because tablizer only works with strings anyway
+			if !haveheaders {
+				row = append(row, "")
+			} else {
+				row[idxmap[currentfield]] = ""
+			}
+
 		case json.Delim:
 			if val.String() == "}" {
 				data = append(data, row)
@@ -240,6 +267,8 @@ func parseRawJSON(conf cfg.Config, input io.Reader) (Tabdata, error) {
 				haveheaders = true
 			}
 			isjson = true
+		default:
+			fmt.Printf("unknown token: %v type: %T\n", t, t)
 		}

 		iskey = !iskey
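
The `float64` case added to the JSON parser above decides between two format verbs. A minimal sketch of that decision follows; `formatNumber` is a hypothetical name. The last line shows `strconv.FormatFloat` with precision `-1` purely as a point of comparison, it is not what the diff uses.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// formatNumber (hypothetical helper) renders a JSON number the way
// the new float64 case does: whole numbers without decimals,
// everything else with the default six decimals of %f.
func formatNumber(val float64) string {
	if val == math.Trunc(val) {
		return fmt.Sprintf("%.f", val)
	}

	return fmt.Sprintf("%f", val)
}

func main() {
	fmt.Println(formatNumber(12))     // 12
	fmt.Println(formatNumber(34.222)) // 34.222000

	// For comparison only: the shortest representation that
	// round-trips, which would drop the trailing zeros.
	fmt.Println(strconv.FormatFloat(34.222, 'f', -1, 64)) // 34.222
}
```

This matches the expectations in the `niljson` test case of this change, where `12` stays `12` and `34.222` becomes `34.222000`.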
--- a/lib/parser_test.go
+++ b/lib/parser_test.go
@@ -34,7 +34,7 @@ var input = []struct {
 }{
 	{
 		name:      "tabular-data",
-		separator: cfg.DefaultSeparator,
+		separator: cfg.SeparatorTemplates[":default:"],
 		text: `
 ONE    TWO    THREE  
 asd    igig   cxxxncnc  
@@ -148,7 +148,7 @@ asd    igig
 19191  EDD 1  X`

 	readFd := strings.NewReader(strings.TrimSpace(table))
-	conf := cfg.Config{Separator: cfg.DefaultSeparator}
+	conf := cfg.Config{Separator: cfg.SeparatorTemplates[":default:"]}
 	gotdata, err := wrapValidateParser(conf, readFd)

 	assert.NoError(t, err)
@@ -180,6 +180,38 @@ func TestParserJSONInput(t *testing.T) {
 			expect: Tabdata{},
 		},

+		{
+			// contains nil, int and float values
+			name:      "niljson",
+			wanterror: false,
+			input: `[
+  {
+    "NAME": "postgres-operator-7f4c7c8485-ntlns",
+    "READY": "1/1",
+    "STATUS": "Running",
+    "RESTARTS": 0,
+    "AGE": null,
+    "X": 12,
+    "Y": 34.222
+  }
+]`,
+			expect: Tabdata{
+				columns: 7,
+				headers: []string{"NAME", "READY", "STATUS", "RESTARTS", "AGE", "X", "Y"},
+				entries: [][]string{
+					[]string{
+						"postgres-operator-7f4c7c8485-ntlns",
+						"1/1",
+						"Running",
+						"0",
+						"",
+						"12",
+						"34.222000",
+					},
+				},
+			},
+		},
+
 		{
 			// one field missing + different order
 			// but shall not fail
@@ -282,6 +314,58 @@ func TestParserJSONInput(t *testing.T) {
 	}
 }

+func TestParserSeparators(t *testing.T) {
+	list := []string{"alpha", "beta", "delta"}
+
+	tests := []struct {
+		input string
+		sep   string
+	}{
+		{
+			input: `🎲`,
+			sep:   ":nonprint:",
+		},
+		{
+			input: `|`,
+			sep:   ":pipe:",
+		},
+		{
+			input: `   `,
+			sep:   ":spaces:",
+		},
+		{
+			input: "   \t  ",
+			sep:   ":tab:",
+		},
+		{
+			input: `-`,
+			sep:   ":nonword:",
+		},
+		{
+			input: `//$`,
+			sep:   ":special:",
+		},
+	}
+
+	for _, testdata := range tests {
+		testname := fmt.Sprintf("parse-%s", testdata.sep)
+		t.Run(testname, func(t *testing.T) {
+			header := strings.Join(list, testdata.input)
+			row := header
+			content := header + "\n" + row
+
+			readFd := strings.NewReader(strings.TrimSpace(content))
+			conf := cfg.Config{Separator: testdata.sep}
+			conf.ApplyDefaults()
+
+			gotdata, err := wrapValidateParser(conf, readFd)
+
+			assert.NoError(t, err)
+			assert.EqualValues(t, [][]string{list}, gotdata.entries)
+		})
+	}
+}
+
 func wrapValidateParser(conf cfg.Config, input io.Reader) (Tabdata, error) {
 	data, err := Parse(conf, input)

--- a/lib/printer_test.go
+++ b/lib/printer_test.go
@@ -292,6 +292,7 @@ func TestPrinter(t *testing.T) {
 				conf.UseSortByColumn = []int{testdata.column}
 			}

+			conf.Separator = cfg.SeparatorTemplates[":default:"]
 			conf.ApplyDefaults()

 			// the test checks the len!
--- a/tablizer.1
+++ b/tablizer.1
@@ -133,7 +133,7 @@
 .\" ========================================================================
 .\"
 .IX Title "TABLIZER 1"
-.TH TABLIZER 1 "2025-10-01" "1" "User Commands"
+.TH TABLIZER 1 "2025-10-09" "1" "User Commands"
 .\" For nroff, turn off justification.  Always turn off hyphenation; it makes
 .\" way too many mistakes in technical documents.
 .if n .ad l
@@ -152,7 +152,7 @@ tablizer \- Manipulate tabular output of other programs
 \&      \-n, \-\-numbering                    Enable header numbering
 \&      \-N, \-\-no\-color                     Disable pattern highlighting
 \&      \-H, \-\-no\-headers                   Disable headers display
-\&      \-s, \-\-separator <string>           Custom field separator
+\&      \-s, \-\-separator <string>           Custom field separator (may be a char, string or :class:)
 \&      \-k, \-\-sort\-by <int|name>           Sort by column (default: 1)
 \&      \-z, \-\-fuzzy                        Use fuzzy search [experimental]
 \&      \-F, \-\-filter <field[!]=reg>        Filter given field with regex, can be used multiple times
@@ -293,6 +293,62 @@ Sorts timestamps.
 .PP
 Finally the  \fB\-d\fR option  enables debugging  output which  is mostly
 useful for the developer.
+.SS "\s-1SEPARATOR\s0"
+.IX Subsection "SEPARATOR"
+The option \fB\-s\fR can be a single character, in which case the \s-1CSV\s0
+parser will be invoked. You can also specify a string as
+separator. The string will be interpreted as a literal string unless it
+is a valid Go regular expression. For example:
+.PP
+.Vb 1
+\&    \-s \*(Aq\et{2,}\*(Aq
+.Ve
+.PP
+is used as a regexp and will match two or more consecutive tabs.
+.PP
+.Vb 1
+\&    \-s \*(Aqfoo\*(Aq
+.Ve
+.PP
+on the other hand is not a regular expression and will be used literally.
+.PP
+To make life easier, there are a couple of predefined regular
+expressions, which you can specify as classes:
+.Sp
+.RS 4
+* 		:tab:
+.Sp
+Matches a tab and eats spaces around it.
+.Sp
+*		:spaces:
+.Sp
+Matches 2 or more spaces.
+.Sp
+*		:pipe:
+.Sp
+Matches a pipe character and eats spaces around it.
+.Sp
+*		:default:
+.Sp
+Matches 2 or more spaces or a tab. This is the default separator if none
+is specified.
+.Sp
+*		:nonword:
+.Sp
+Matches a non-word character.
+.Sp
+*		:nondigit:
+.Sp
+Matches a non-digit character.
+.Sp
+*		:special:
+.Sp
+Matches one or more special chars like brackets, dollar sign, slashes etc.
+.Sp
+*		:nonprint:
+.Sp
+Matches one or more non-printable characters.
+.RE
 .SS "\s-1PATTERNS AND FILTERING\s0"
 .IX Subsection "PATTERNS AND FILTERING"
 You can reduce  the rows being displayed by using  one or more regular
--- a/tablizer.pod
+++ b/tablizer.pod
@@ -13,7 +13,7 @@ tablizer - Manipulate tabular output of other programs
      -n, --numbering                    Enable header numbering
      -N, --no-color                     Disable pattern highlighting
      -H, --no-headers                   Disable headers display
-      -s, --separator <string>           Custom field separator
+      -s, --separator <string>           Custom field separator (may be a char, string or :class:)
      -k, --sort-by <int|name>           Sort by column (default: 1)
      -z, --fuzzy                        Use fuzzy search [experimental]
      -F, --filter <field[!]=reg>        Filter given field with regex, can be used multiple times
@@ -153,6 +153,62 @@ Sorts timestamps.
 Finally the  B<-d> option  enables debugging  output which  is mostly
 useful for the developer.

+=head2 SEPARATOR
+
+The option B<-s> can be a single character, in which case the CSV
+parser will be invoked. You can also specify a string as
+separator. The string will be interpreted as a literal string unless it
+is a valid Go regular expression. For example:
+
+    -s '\t{2,}'
+
+is used as a regexp and will match two or more consecutive tabs.
+
+    -s 'foo'
+
+on the other hand is not a regular expression and will be used literally.
+
+To make life easier, there are a couple of predefined regular
+expressions, which you can specify as classes:
+
+=over
+
+* 		:tab:      
+
+Matches a tab and eats spaces around it.
+
+*		:spaces:
+
+Matches 2 or more spaces.
+
+*		:pipe:
+
+Matches a pipe character and eats spaces around it.
+
+*		:default:
+
+Matches 2 or more spaces or a tab. This is the default separator if none
+is specified.
+
+*		:nonword:
+
+Matches a non-word character.
+
+*		:nondigit:
+
+Matches a non-digit character.
+
+*		:special:
+
+Matches one or more special chars like brackets, dollar sign, slashes etc.
+
+*		:nonprint:
+
+Matches one or more non-printable characters.
+
+
+=back
+
 =head2 PATTERNS AND FILTERING

 You can reduce  the rows being displayed by using  one or more regular
Author	SHA1	Message	Date
Thomas von Dein	4ce6c30f54	fix short usage formatting	2025-10-09 23:16:07 +02:00
T.v.Dein	ec0b210167	add some handy builtin character classes as split separators (#84 )	2025-10-09 23:03:57 +02:00
Thomas von Dein	253ef8262e	fix builder go version	2025-10-08 10:36:09 +02:00
Thomas von Dein	da48994744	fix comment	2025-10-06 23:27:48 +02:00
Thomas von Dein	39f06fddc8	md fix	2025-10-06 23:02:28 +02:00
T.v.Dein	50a9378d92	use column order of -c when specified (#81 )	2025-10-06 22:55:04 +02:00
T.v.Dein	35b726fee4	Fix json parser (#80 ) * fix #77: parse floats and nils as well and convert them to string	2025-10-06 22:54:31 +02:00