krip

package module
v1.0.8 Latest
Published: Dec 28, 2025 License: GPL-3.0 Imports: 4 Imported by: 0

README

Krip* 🇺🇦


Krip is a Go library designed for fast, comprehensive, and generalised scraping of culinary recipes from any website or HTML file.

The project aims to provide a robust solution for extracting structured culinary data from unstructured or semi-structured web pages, normalizing it into a strict Schema.org Recipe model.

  • Krip is a Ukrainian word for dill. The bud of the dill looks like web pages connected to a single database.

I started this project because I wanted to build my own recipe keeper and found that there is only one library everyone uses for scraping recipes: recipe-scrapers, written in Python. The library is great, but I was naive enough to think that it could be improved.

From the beginning I focused on speed and flexibility, aiming to cover most schemas and websites and to retrieve a rich model. The library also supports per-domain customisation for sites that do not publish schema markup.

Install

go get -u github.com/borschtapp/krip

Features

  • Multi-Strategy Extraction: Combines Microdata, OpenGraph and JSON-LD.
  • Robust Parsing: Handles erroneous JSON and sanitises HTML content.
  • Standardized Output: Produces Recipe structs compatible with Schema.org/Recipe.
  • Extensible: If needed, it's easy to add support for a custom website via the Scraper interface.
  • Performance: Fast execution with minimal external dependencies.
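To make the JSON-LD strategy from the list above more concrete, here is a minimal sketch that pulls a recipe name out of embedded JSON-LD using only the standard library. It is an illustration, not Krip's implementation: the `findRecipeName` helper and the sample page are assumptions made for this example, and a regular expression stands in for a proper HTML parser.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// ldScript matches the body of <script type="application/ld+json"> elements.
// A regular expression keeps the sketch short; a real scraper would parse the HTML.
var ldScript = regexp.MustCompile(`(?is)<script[^>]*type=["']application/ld\+json["'][^>]*>(.*?)</script>`)

// page is a made-up document with two JSON-LD blocks.
const page = `<html><head>
<script type="application/ld+json">{"@type":"WebSite","name":"Example"}</script>
<script type="application/ld+json">{"@type":"Recipe","name":"Original Plum Torte"}</script>
</head></html>`

// findRecipeName (hypothetical helper) returns the "name" of the first
// JSON-LD object whose @type is "Recipe".
func findRecipeName(html string) (string, bool) {
	for _, m := range ldScript.FindAllStringSubmatch(html, -1) {
		var obj map[string]any
		if err := json.Unmarshal([]byte(m[1]), &obj); err != nil {
			continue // skip malformed blocks instead of failing the whole page
		}
		if obj["@type"] != "Recipe" {
			continue
		}
		if name, ok := obj["name"].(string); ok {
			return name, true
		}
	}
	return "", false
}

func main() {
	name, ok := findRecipeName(page)
	fmt.Println(name, ok)
}
```

The sketch only handles a top-level object with a plain string `@type`; pages that wrap recipes in `@graph` arrays or use Microdata or OpenGraph need the additional strategies listed above.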

Usage

Command-line tool

go install github.com/borschtapp/krip/cmd/krip@latest
krip --help
krip https://cooking.nytimes.com/recipes/3783-original-plum-torte

Go library

recipe, err := krip.ScrapeUrl("https://cooking.nytimes.com/recipes/3783-original-plum-torte")
if err != nil {
  // handle err
}

// Retrieve the recipe data
name := recipe.Name
ingredients := recipe.Ingredients
instructions := recipe.Instructions

// Print the recipe as JSON
fmt.Println(recipe)

{
  "@id": "https://cooking.nytimes.com/recipes/3783-original-plum-torte",
  "name": "Original Plum Torte",
  "thumbnailUrl": "https://static01.nyt.com/images/2019/09/07/dining/plumtorte/plumtorte-articleLarge-v4.jpg",
  "author": {
    "name": "Marian Burros"
  },
  "publisher": {
    "name": "NYT Cooking",
    "url": "https://cooking.nytimes.com"
  },
  "inLanguage": "en-US",
  "description": "The Times published Marian Burros’s recipe for Plum Torte every September from 1983 until 1989, when the editors determined that enough was enough. The recipe was to be printed for the last time that year. “To counter anticipated protests,” Ms. Burros wrote a few years later, “the recipe was printed in larger type than usual with a broken-line border around it to encourage clipping.” It didn’t help. The paper was flooded with angry letters. “The appearance of the recipe, like the torte itself, is bittersweet,” wrote a reader in Tarrytown, N.Y. “Summer is leaving, fall is coming. That's what your annual recipe is all about. Don't be grumpy about it.” We are not! And we pledge that every year, as summer gives way to fall, we will make sure that the recipe is easily available to one and all. The original 1983 recipe called for 1 cup sugar; the 1989 version reduced that to 3/4 cup. We give both options below. Here are \u003ca href=\" http://www.nytimes.com/interactive/2016/09/14/dining/marian-burros-plum-torte-recipe-variations.html\"\u003efive ways to adapt the torte\u003c/a\u003e.",
  "totalTime": 75,
  "recipeCategory": [
    "breakfast",
    "brunch",
    "easy",
    "weekday",
    "times classics",
    "dessert"
  ],
  "keywords": [
    "flour",
    "plum",
    "unsalted butter",
    "nut-free",
    "vegetarian"
  ],
  "recipeYield": 8,
  "recipeIngredient": [
    "3/4 to 1 cup sugar",
    "1/2 cup unsalted butter, softened",
    "1 cup unbleached flour, sifted",
    "1 teaspoon baking powder",
    "Pinch of salt (optional)",
    "2 eggs",
    "24 halves pitted purple plums",
    "Sugar, lemon juice and cinnamon, for topping"
  ],
  "recipeInstructions": [
    {
      "text": "Heat oven to 350 degrees."
    },
    {
      "text": "Cream the sugar and butter in a bowl. Add the flour, baking powder, salt and eggs and beat well."
    },
    {
      "text": "Spoon the batter into a springform pan of 8, 9 or 10 inches. Place the plum halves skin side up on top of the batter. Sprinkle lightly with sugar and lemon juice, depending on the sweetness of the fruit. Sprinkle with about 1 teaspoon of cinnamon, depending on how much you like cinnamon."
    },
    {
      "text": "Bake 1 hour, approximately. Remove and cool; refrigerate or freeze if desired. Or cool to lukewarm and serve plain or with whipped cream. (To serve a torte that was frozen, defrost and reheat it briefly at 300 degrees.)"
    }
  ],
  "nutrition": {
    "calories": "350",
    "carbohydrateContent": "57 grams",
    "fatContent": "13 grams",
    "fiberContent": "3 grams",
    "proteinContent": "4 grams",
    "saturatedFatContent": "8 grams",
    "sodiumContent": "63 milligrams",
    "sugarContent": "42 grams",
    "transFatContent": "0 grams",
    "unsaturatedFatContent": "4 grams"
  },
  "aggregateRating": {
    "ratingCount": 8717,
    "ratingValue": 5
  }
}
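If the JSON above is stored or sent to another service, it can be read back with the standard library alone. The following is a minimal sketch under stated assumptions: `recipeDoc` is a hypothetical struct defined only for this example (it is not the library's Recipe type), and it covers just a few of the keys shown in the output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// recipeDoc is a hypothetical struct for this sketch; it mirrors a few keys
// from the JSON output and is not the library's Recipe type.
type recipeDoc struct {
	Name         string   `json:"name"`
	TotalTime    int      `json:"totalTime"`
	Yield        int      `json:"recipeYield"`
	Ingredients  []string `json:"recipeIngredient"`
	Instructions []struct {
		Text string `json:"text"`
	} `json:"recipeInstructions"`
}

// sample is a trimmed copy of the output shown above.
const sample = `{
  "name": "Original Plum Torte",
  "totalTime": 75,
  "recipeYield": 8,
  "recipeIngredient": ["3/4 to 1 cup sugar", "2 eggs"],
  "recipeInstructions": [{"text": "Heat oven to 350 degrees."}]
}`

// decodeRecipe parses the JSON document into the sketch struct.
func decodeRecipe(data []byte) (recipeDoc, error) {
	var doc recipeDoc
	err := json.Unmarshal(data, &doc)
	return doc, err
}

func main() {
	doc, err := decodeRecipe([]byte(sample))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%s: %d ingredients, %d steps\n", doc.Name, len(doc.Ingredients), len(doc.Instructions))
}
```

Keys that the struct does not declare, such as `nutrition` or `aggregateRating`, are silently ignored by `encoding/json`, so the struct can be extended field by field as needed.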

Project Structure

  • cmd/: Entry points for the CLI application.
  • web/: HTTP Web Server implementation.
  • krip.go: Facade layer and public API.
  • model/: Domain data structures (Recipe, DataInput).
  • scraper/: Core scraping engine.
    • common/: Orchestration logic.
    • schema/: Schema.org (JSON-LD/Microdata) strategies.
    • opengraph/: OpenGraph metadata strategies.
    • website/: Site-specific scraper implementations.
  • utils/: Helper functions for parsing, HTTP, and string manipulation.

Contributing

Contributions are welcome, whether that means adding a new website scraper or improving the core logic.

Implementing Custom Scrapers

All you need to do is implement the Scraper function type and register it via krip.RegisterScraper().

Take a look at the custom scrapers already implemented in scraper/website/. To add one to the repository:

  1. Create a new file in scraper/website/ (e.g., mysite.go).
  2. Implement the Scraper function signature: func(data *model.DataInput, r *model.Recipe) error.
  3. Register the scraper in scraper/website/0_scraper.go.
  4. Add test cases in testdata/.
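The steps above can be sketched as a small self-contained program. It uses the function signature from step 2, but everything else is an assumption made for illustration: `DataInput` and `Recipe` are simplified local stand-ins for the model types (their real fields may differ), and `registerScraper` and `applyCustom` are hypothetical helpers showing one way a hostname-keyed registry could dispatch, not how Krip does it internally.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// DataInput and Recipe are simplified stand-ins for the model package types.
// The fields here are assumptions for this sketch.
type DataInput struct {
	Url  string
	Text string
}

type Recipe struct {
	Name string
}

// Scraper mirrors the function signature given in step 2.
type Scraper func(data *DataInput, r *Recipe) error

// scrapers maps a hostname to its site-specific scraper.
var scrapers = map[string]Scraper{}

// registerScraper (hypothetical) stores a scraper under a hostname.
func registerScraper(hostname string, fn Scraper) {
	scrapers[hostname] = fn
}

// applyCustom (hypothetical) runs the scraper registered for the input's host, if any.
// Stripping a leading "www." is an assumption of this sketch.
func applyCustom(data *DataInput, r *Recipe) error {
	u, err := url.Parse(data.Url)
	if err != nil {
		return err
	}
	host := strings.TrimPrefix(u.Hostname(), "www.")
	if fn, ok := scrapers[host]; ok {
		return fn(data, r)
	}
	return nil // no custom scraper: leave the recipe untouched
}

// mySite is an example site-specific scraper that fills in a missing name.
func mySite(data *DataInput, r *Recipe) error {
	if data.Text == "" {
		return errors.New("empty page")
	}
	if r.Name == "" {
		r.Name = strings.TrimSpace(data.Text)
	}
	return nil
}

func main() {
	registerScraper("mysite.example", mySite)
	r := &Recipe{}
	err := applyCustom(&DataInput{Url: "https://www.mysite.example/torte", Text: " Plum Torte "}, r)
	fmt.Println(r.Name, err)
}
```

Keeping the scraper a plain function rather than a struct with methods means a site-specific fix can be a few lines that only patch the fields the generic strategies missed.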

Supported Websites

Below is a list of websites the scraper has been tested against and is known to work correctly.

This means the scraped recipe contains all the important fields, including but not limited to:

  • url
  • name
  • inLanguage
  • thumbnailUrl
  • recipeIngredient
  • recipeInstructions
  • publisher (including name and url)
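A check along these lines can be expressed in a few lines of Go. This is a minimal sketch, assuming the recipe is available as JSON with the key names from the list above; `missingFields` and `isEmpty` are hypothetical helpers written for this example and are not part of the project's test suite.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// required lists the top-level keys named in the list above.
var required = []string{
	"url", "name", "inLanguage", "thumbnailUrl",
	"recipeIngredient", "recipeInstructions",
}

// missingFields (hypothetical helper) reports which important fields
// are absent or empty in a recipe JSON document.
func missingFields(raw []byte) ([]string, error) {
	var doc map[string]any
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, err
	}
	var missing []string
	for _, key := range required {
		if isEmpty(doc[key]) {
			missing = append(missing, key)
		}
	}
	// publisher must carry both a name and a url.
	pub, _ := doc["publisher"].(map[string]any)
	for _, key := range []string{"name", "url"} {
		if isEmpty(pub[key]) {
			missing = append(missing, "publisher."+key)
		}
	}
	return missing, nil
}

// isEmpty treats absent values, empty strings, and empty arrays as missing.
func isEmpty(v any) bool {
	switch t := v.(type) {
	case nil:
		return true
	case string:
		return t == ""
	case []any:
		return len(t) == 0
	}
	return false
}

func main() {
	doc := `{"name":"Original Plum Torte","inLanguage":"en-US",
	         "recipeIngredient":["2 eggs"],"publisher":{"name":"NYT Cooking"}}`
	missing, err := missingFields([]byte(doc))
	fmt.Println(missing, err)
}
```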

The automatically generated list (based on testdata) is as follows:

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func RegisterScraper

func RegisterScraper(hostname string, fn model.Scraper)

func Scrape

func Scrape(input *model.DataInput) (*model.Recipe, error)

func ScrapeFile

func ScrapeFile(fileName string) (*model.Recipe, error)

ScrapeFile reads content and scrapes a recipe from the file

func ScrapeUrl

func ScrapeUrl(url string) (*model.Recipe, error)

ScrapeUrl retrieves and scrapes a recipe from the url

Types

type AggregateRating added in v0.1.1

type AggregateRating = model.AggregateRating

type DataInput added in v0.1.1

type DataInput = model.DataInput

type HowToSection added in v0.1.1

type HowToSection = model.HowToSection

type HowToStep added in v0.1.1

type HowToStep = model.HowToStep

type ImageObject added in v0.1.1

type ImageObject = model.ImageObject

type InputOptions added in v0.1.1

type InputOptions = model.InputOptions

type NutritionInformation added in v0.1.1

type NutritionInformation = model.NutritionInformation

type Organization added in v0.1.1

type Organization = model.Organization

type Person added in v0.1.1

type Person = model.Person

type Recipe added in v0.1.1

type Recipe = model.Recipe

type Scraper added in v0.1.1

type Scraper = model.Scraper

type VideoObject added in v0.1.1

type VideoObject = model.VideoObject

Directories

Path Synopsis
cmd/krip (command)
