tokenizer

package module
v0.0.0-...-7931447
Published: Feb 1, 2021 License: MIT Imports: 8 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func BySpace

func BySpace(a string) []string

BySpace splits a string into a slice of strings on space characters.
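
A minimal usage sketch, assuming the package is imported under a hypothetical path (the real module path is elided above):

package main

import (
	"fmt"

	tokenizer "example.com/tokenizer" // hypothetical import path; substitute the module's real path
)

func main() {
	// BySpace splits on spaces, so this prints the individual words.
	words := tokenizer.BySpace("the quick brown fox")
	fmt.Println(words)
}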

func Pipeline

func Pipeline(steps ...Step) func(a string) ([]string, error)
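
Pipeline appears to compose a sequence of Steps into a single string-in, tokens-out function. A sketch, assuming steps run in the order given and using the Transform and Split types documented below; the import path is hypothetical:

package main

import (
	"fmt"
	"strings"

	tokenizer "example.com/tokenizer" // hypothetical import path
)

func main() {
	// Compose two Steps: lower-case the input, then split it on spaces.
	pipe := tokenizer.Pipeline(
		tokenizer.Transform(strings.ToLower),
		tokenizer.Split(tokenizer.BySpace),
	)

	tokens, err := pipe("Hello Pipeline World")
	if err != nil {
		fmt.Println("pipeline error:", err)
		return
	}
	fmt.Println(tokens)
}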

Types

type Err

type Err struct {
	// contains filtered or unexported fields
}

func (Err) Err

func (e Err) Err() error

func (Err) String

func (e Err) String() string

func (Err) Strings

func (e Err) Strings() []string

type Input

type Input interface {
	String() string
	Strings() []string
}

type Normalizer

type Normalizer struct {
	// contains filtered or unexported fields
}

Normalizer is just a byte buffer.

func NewNormalizer

func NewNormalizer() *Normalizer

NewNormalizer creates a new Normalizer.

func (*Normalizer) Do

func (n *Normalizer) Do(a Input) Result

func (Normalizer) MustNorm

func (n Normalizer) MustNorm(a string) string

func (*Normalizer) Norm

func (n *Normalizer) Norm(a string) (string, error)
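
The kind of normalization Norm performs is not documented in this index, so the sketch below only shows the call pattern; Normalizer also satisfies Step via Do, so it can sit inside a Pipeline. The import path is hypothetical:

package main

import (
	"fmt"

	tokenizer "example.com/tokenizer" // hypothetical import path
)

func main() {
	n := tokenizer.NewNormalizer()

	// Norm returns the normalized string or an error.
	out, err := n.Norm("Héllo  World")
	if err != nil {
		fmt.Println("normalize error:", err)
		return
	}
	fmt.Println(out)

	// MustNorm drops the error return, presumably panicking on failure
	// in line with the usual Go Must* convention.
	fmt.Println(n.MustNorm("Héllo  World"))
}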

type Result

type Result interface {
	Input
	Err() error
}

type S

type S string

func (S) Err

func (s S) Err() error

func (S) String

func (s S) String() string

func (S) Strings

func (s S) Strings() []string

type SS

type SS []string

func (SS) Err

func (s SS) Err() error

func (SS) String

func (s SS) String() string

func (SS) Strings

func (s SS) Strings() []string
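
S and SS are adapter types that let a plain string or string slice flow through the Step machinery: both provide String, Strings, and Err, so they satisfy Input and Result. A compile-time sketch with a hypothetical import path:

package main

import (
	tokenizer "example.com/tokenizer" // hypothetical import path
)

// Compile-time checks: S and SS satisfy both Input and Result.
var (
	_ tokenizer.Input  = tokenizer.S("hello world")
	_ tokenizer.Result = tokenizer.S("hello world")
	_ tokenizer.Input  = tokenizer.SS{"hello", "world"}
	_ tokenizer.Result = tokenizer.SS{"hello", "world"}
)

func main() {}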

type Split

type Split func(a string) []string

Split is a function that splits a string into a slice of strings.

func (Split) Do

func (s Split) Do(a Input) Result
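
Because Split is a func type with a Do method, any func(string) []string converts directly into a Step. A sketch wrapping BySpace and feeding it an S value (hypothetical import path):

package main

import (
	"fmt"

	tokenizer "example.com/tokenizer" // hypothetical import path
)

func main() {
	// Convert the plain BySpace function into a Step.
	step := tokenizer.Split(tokenizer.BySpace)

	res := step.Do(tokenizer.S("split me on spaces"))
	if err := res.Err(); err != nil {
		fmt.Println("step error:", err)
		return
	}
	fmt.Println(res.Strings())
}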

type Step

type Step interface {
	Do(Input) Result
}
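
Any type with a Do(Input) Result method is a Step, so pipelines can mix the built-in steps with custom ones. A sketch of a hypothetical custom step that drops empty tokens, returning an SS value (which already satisfies Result); the import path is hypothetical:

package main

import (
	"fmt"

	tokenizer "example.com/tokenizer" // hypothetical import path
)

// dropEmpty is a hypothetical custom Step that removes empty tokens.
type dropEmpty struct{}

func (dropEmpty) Do(in tokenizer.Input) tokenizer.Result {
	var out tokenizer.SS
	for _, tok := range in.Strings() {
		if tok != "" {
			out = append(out, tok)
		}
	}
	return out
}

func main() {
	res := dropEmpty{}.Do(tokenizer.SS{"a", "", "b"})
	fmt.Println(res.Strings(), res.Err())
}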

type Tokenizer

type Tokenizer struct {
	// contains filtered or unexported fields
}

func NewTokenizer

func NewTokenizer(enc bpe.Encoder) *Tokenizer

func (*Tokenizer) Do

func (t *Tokenizer) Do(a Input) Result

func (*Tokenizer) Tokenize

func (t *Tokenizer) Tokenize(a string) ([]string, error)

func (*Tokenizer) Untokenize

func (t *Tokenizer) Untokenize(a []string) string
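
Tokenizer wraps a bpe.Encoder, but the bpe package and how to construct an Encoder are not covered by this index, so the sketch below takes an already-built encoder as a parameter. Both import paths are hypothetical:

package main

import (
	"fmt"

	bpe "example.com/bpe"             // hypothetical import path for the bpe package
	tokenizer "example.com/tokenizer" // hypothetical import path
)

// tokenizeRoundTrip assumes enc was constructed elsewhere.
func tokenizeRoundTrip(enc bpe.Encoder, text string) error {
	t := tokenizer.NewTokenizer(enc)

	tokens, err := t.Tokenize(text)
	if err != nil {
		return err
	}
	fmt.Println(tokens)

	// Untokenize goes the other way, rebuilding a string from the tokens.
	fmt.Println(t.Untokenize(tokens))
	return nil
}

func main() {
	// Constructing a bpe.Encoder is outside the scope of this index, so
	// tokenizeRoundTrip is defined above but not invoked here.
}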

type Transform

type Transform func(a string) string

Transform is a function that transforms a string.

func (Transform) Do

func (t Transform) Do(a Input) Result
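
Like Split, Transform is a func type with a Do method, so any func(string) string (strings.ToLower, strings.TrimSpace, a Normalizer's MustNorm, and so on) can act as a Step. A sketch with a hypothetical import path:

package main

import (
	"fmt"
	"strings"

	tokenizer "example.com/tokenizer" // hypothetical import path
)

func main() {
	lower := tokenizer.Transform(strings.ToLower)

	res := lower.Do(tokenizer.S("MiXeD Case"))
	fmt.Println(res.String(), res.Err())
}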
