crawldatabase

package
v0.0.0-...-82d9017 Latest
Warning

This package is not in the latest version of its module.

Published: Feb 20, 2023 License: BSD-3-Clause Imports: 21 Imported by: 0

Documentation

Constants

const (
	TypeNothing  byte = 0
	TypeKnow     byte = 1
	TypeRedirect byte = 2

	TypeFile        byte = 3
	TypeFileRobots  byte = 3
	TypeFileHTML    byte = 4
	TypeFileRSS     byte = 5
	TypeFileSitemap byte = 6
	TypeFileFavicon byte = 7

	TypeError           byte = 128
	TypeErrorNetwork    byte = 128
	TypeErrorParsing    byte = 129
	TypeErrorFilterURL  byte = 130
	TypeErrorFilterPage byte = 131
	TypeErrorRobot      byte = 132
	TypeErrorNoIndex    byte = 133
)
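
The constants appear to form three ranges: simple states below TypeFile, file types from TypeFile up to (but excluding) TypeError, and error types from TypeError upward. The sketch below illustrates that reading. The helper `kind` is hypothetical and not part of the package, the range boundaries are an assumption drawn only from the values listed above, and the constants are copied locally so the example compiles on its own.

```go
package main

import "fmt"

// Local copies of some constants from the listing above, so that the
// sketch compiles without importing the package.
const (
	TypeNothing    byte = 0
	TypeKnow       byte = 1
	TypeRedirect   byte = 2
	TypeFile       byte = 3
	TypeFileHTML   byte = 4
	TypeError      byte = 128
	TypeErrorRobot byte = 132
)

// kind is a hypothetical helper, not part of the package API.
// It assumes every value in [TypeFile, TypeError) is a file type and
// every value >= TypeError is an error type.
func kind(t byte) string {
	switch {
	case t >= TypeError:
		return "error"
	case t >= TypeFile:
		return "file"
	case t == TypeRedirect:
		return "redirect"
	default:
		return "simple"
	}
}

func main() {
	for _, t := range []byte{TypeNothing, TypeKnow, TypeRedirect, TypeFileHTML, TypeErrorRobot} {
		fmt.Printf("%3d -> %s\n", t, kind(t))
	}
}
```

Note that TypeFile and TypeFileRobots share the value 3, and TypeError and TypeErrorNetwork share the value 128, so the first name of each pair can serve as the lower bound of its range.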

Variables

var (
	NotExist = errors.New("Not exist")
	NotFile  = errors.New("This value is not a file")
)

Functions

This section is empty.

Types

type Database

type Database[T any] struct {
	// contains filtered or unexported fields
}

func Open

func Open[T any](logger *slog.Logger, base string, logStatistics bool) ([]*url.URL, *Database[T], error)

Open the database, but return no URLs.

func OpenMemory

func OpenMemory[T any](logger *slog.Logger, _ string, _ bool) ([]*url.URL, *Database[T], error)

Open a database in memory, so it is not persistent. Use only for tests. It always returns nil for the URL slice and for the error.

func OpenWithKnow

func OpenWithKnow[T any](logger *slog.Logger, base string, logStatistics bool) ([]*url.URL, *Database[T], error)

Open the DB and return all known URLs.

func (*Database[_]) AddURL

func (db *Database[_]) AddURL(urls map[keys.Key]*url.URL) error

Add unknown URLs.

If a URL is already known, it is deleted from urls; otherwise it is saved in the DB files. Errors are logged and returned.

func (*Database[_]) Close

func (db *Database[_]) Close() error

Close the database. After Close, calls to database methods can block indefinitely.

func (*Database[_]) CountHTML

func (db *Database[_]) CountHTML() int

Return the number of HTML pages.

func (*Database[T]) ForHTML

func (db *Database[T]) ForHTML(f func(keys.Key, *T)) (returnErr error)

Iterate over each element of type TypeFileHTML.

Log the progress with the internal logger.

func (*Database[T]) GetValue

func (db *Database[T]) GetValue(key keys.Key) (*T, time.Time, error)

Get the value from the DB. If the value is not a file, return NotFile. If the value does not exist, return NotExist. The time is the instant at which the value was stored.

func (*Database[_]) Redirections

func (db *Database[_]) Redirections() map[keys.Key]keys.Key

Return all redirections to a valid file. If r1 -> r2 -> r3 -> p, the map contains:

  • m[r1] = p
  • m[r2] = p
  • m[r3] = p

The redirection chain is limited to 10.
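
The flattening of redirection chains can be sketched as follows. This is an illustration and not the package's code: `flatten` and `maxChain` are hypothetical names, string keys stand in for keys.Key, and the limit of 10 is interpreted here as "at most 10 hops are followed", which is an assumption.

```go
package main

import "fmt"

// maxChain is the assumed meaning of "limited to 10": at most 10 hops.
const maxChain = 10

// flatten is a hypothetical helper. next maps a redirecting key to its
// destination, and files is the set of keys that hold a valid file.
// The result maps every redirecting key to the file it finally reaches.
func flatten(next map[string]string, files map[string]bool) map[string]string {
	out := make(map[string]string)
	for start := range next {
		cur := start
		for hop := 0; hop < maxChain; hop++ {
			dest, ok := next[cur]
			if !ok {
				break // the chain ends on a key that is not a redirection
			}
			cur = dest
			if files[cur] {
				out[start] = cur // reached a valid file
				break
			}
		}
	}
	return out
}

func main() {
	next := map[string]string{"r1": "r2", "r2": "r3", "r3": "p"}
	files := map[string]bool{"p": true}
	fmt.Println(flatten(next, files))
}
```

The hop limit also guarantees termination on redirection loops such as a -> b -> a, which simply produce no entry in the result.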

func (*Database[_]) SetRedirect

func (db *Database[_]) SetRedirect(key, destination keys.Key) error

Set a redirection from key to destination.

func (*Database[_]) SetSimple

func (db *Database[_]) SetSimple(key keys.Key, t byte) error

Set a simple type in the DB: nothing, known or error. If t is a file type, it returns an error and does not modify the DB.

func (*Database[T]) SetValue

func (db *Database[T]) SetValue(key keys.Key, value *T, t byte) error

Set the value in the DB, overwriting any previous value. t must be a regular file type.

func (*Database[_]) Statistics

func (db *Database[_]) Statistics() Statistics

Return the statistics of the database.

type Statistics

type Statistics struct {
	// Number of elements by type.
	Count [256]int

	// Total number of elements.
	Total      int
	TotalFile  int
	TotalError int

	// The size of the compressed chunks, indexed by entry type.
	FileSize [TypeError]int64

	// Sum of the sizes of the compressed chunks of data.
	TotalFileSize int64
}

All statistics of a database.
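
One plausible reading of how the counters relate is sketched below. It is an assumption, not documented behaviour: `stats` and `tally` are hypothetical, and the sketch supposes that TotalFile counts types in [TypeFile, TypeError) and TotalError counts types from TypeError upward. The size fields are left out.

```go
package main

import "fmt"

// Local copies of the two range boundaries from the Constants section.
const (
	TypeFile  byte = 3
	TypeError byte = 128
)

// stats mirrors only the counting fields of Statistics.
type stats struct {
	Count      [256]int
	Total      int
	TotalFile  int
	TotalError int
}

// tally is a hypothetical helper that counts entries by type byte.
// The split between files and errors is an assumption based on the
// constant ranges, not on the package's source.
func tally(types []byte) stats {
	var s stats
	for _, t := range types {
		s.Count[t]++
		s.Total++
		switch {
		case t >= TypeError:
			s.TotalError++
		case t >= TypeFile:
			s.TotalFile++
		}
	}
	return s
}

func main() {
	s := tally([]byte{0, 1, 4, 4, 6, 128, 133})
	fmt.Println(s.Total, s.TotalFile, s.TotalError, s.Count[4])
}
```

The declared length of FileSize, [TypeError], is consistent with this reading, since it leaves room for sizes of non-error types only.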

func (Statistics) Log

func (stats Statistics) Log(logger *slog.Logger)

Log the total count and size.

func (Statistics) LogAll

func (stats Statistics) LogAll(logger *slog.Logger)

Log the details of the statistics (total and each type) for count and size.
