Metrics for GenAI Text


Here's a breakdown of some common metrics used to evaluate generative AI models, including BLEU, ROUGE, METEOR, and GLEU.

Metrics Based on N-gram Overlap:

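BLEU (Bilingual Evaluation Understudy): BLEU measures how closely the generated text matches one or more human-written reference texts. It is precision-based: it counts the n-grams (sequences of n words) in the generated text that also appear in a reference, clips each count at the maximum number of times that n-gram occurs in any single reference, and applies a brevity penalty so that very short outputs do not score well. Higher BLEU scores indicate closer overlap with the references. BLEU is often criticized because it captures word order only within each n-gram and ignores meaning, so a valid paraphrase or synonym earns no credit.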
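The clipped n-gram precision at the core of BLEU can be sketched in a few lines. This is a minimal illustration that assumes pre-tokenized input (lists of words); the function names `ngrams` and `modified_precision` are hypothetical and do not come from any particular library.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against several references."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    # For each n-gram, the largest count seen in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    # A candidate n-gram is credited at most as often as a reference uses it.
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


candidate = "the the the the the the the".split()
references = ["the cat is on the mat".split(),
              "there is a cat on the mat".split()]
print(modified_precision(candidate, references, 1))  # 2/7, about 0.286
```

Without clipping, the degenerate candidate above would score a perfect unigram precision, because every one of its words appears in a reference. Clipping limits the credit for "the" to the two occurrences found in the first reference.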

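BLEU-n: BLEU-n denotes BLEU computed with n-grams up to length n. In the usual cumulative form, the precisions for lengths 1 through n are combined with a geometric mean and multiplied by the brevity penalty. BLEU-4, for example, combines 1-, 2-, 3-, and 4-word sequence matches and is the most commonly reported variant. Some tools also report an individual score for a single length n.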
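As a rough sketch of how a cumulative BLEU-n score is put together, the following computes the geometric mean of the clipped precisions for lengths 1 through `max_n` and applies the brevity penalty. It assumes a single tokenized reference and no smoothing, so it returns 0 whenever any n-gram length has no match; the helper names are illustrative only.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Cumulative BLEU-max_n against one reference, without smoothing."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        total = sum(cand.values())
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # unsmoothed: one empty precision zeroes the score
        log_precisions.append(math.log(clipped / total))
    c, r = len(candidate), len(reference)
    # Brevity penalty: 1 if the candidate is longer than the reference.
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


candidate = "the cat sat on the mat".split()
reference = "the cat sat on a mat".split()
print(bleu(candidate, reference, max_n=1))  # BLEU-1
print(bleu(candidate, reference, max_n=4))  # BLEU-4, lower than BLEU-1 here
```

Because longer n-grams are harder to match, BLEU-4 is typically lower than BLEU-1 for the same pair of texts. Production toolkits add smoothing and support multiple references, which this sketch leaves out.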

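GLEU (Google-BLEU): Like BLEU, GLEU is based on n-gram overlap, but it was designed to behave better on single sentences. It collects all n-grams of length 1 to 4 from the generated text and from the reference, computes both precision and recall over them, and reports the smaller of the two. As a result, GLEU penalizes missing reference content as well as unmatched generated words, whereas BLEU relies on the brevity penalty alone to discourage short outputs.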

GLEU-n: By analogy with BLEU-n, GLEU-n restricts the calculation to n-grams up to length n instead of the default maximum of 4.
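One common definition of GLEU, the Google-BLEU variant, takes the minimum of n-gram precision and recall over all n-grams up to a maximum length. The sketch below follows that definition for a single tokenized reference; the names `all_ngrams` and `gleu` and the `max_n` parameter are assumptions made for illustration.

```python
from collections import Counter


def all_ngrams(tokens, max_n=4):
    """Count every contiguous n-gram of length 1..max_n in a token list."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def gleu(candidate, reference, max_n=4):
    """Minimum of n-gram precision and recall (Google-BLEU style)."""
    cand = all_ngrams(candidate, max_n)
    ref = all_ngrams(reference, max_n)
    if not cand or not ref:
        return 0.0
    # Counter intersection keeps the smaller count for each shared n-gram.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return min(precision, recall)


print(gleu("the cat".split(), "the cat sat".split()))  # 0.5
```

In the example, every n-gram of the short candidate appears in the reference, so precision is 1.0, but the candidate covers only 3 of the 6 reference n-grams, so recall (and therefore GLEU) is 0.5. Passing a smaller `max_n` gives the length-restricted variant.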

Recall-Oriented and Semantics-Aware Metrics:

ROUGE (Recall-Oriented Understudy for Gisting Evaluation): ROUGE is a family of recall-oriented metrics, originally developed for summarization, that measure how much of the reference text is recovered by the generated text. ROUGE-N counts overlapping n-grams, while ROUGE-L uses the Longest Common Subsequence, which goes beyond fixed-length n-grams by rewarding words that appear in the same order even when they are not adjacent. Each variant therefore measures a different aspect of similarity.
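To make the difference between ROUGE-N and ROUGE-L concrete, here is a minimal recall-only sketch for a single tokenized reference. Published ROUGE implementations usually also report precision and an F-score and may apply stemming; those parts are omitted, and the function names are illustrative.

```python
from collections import Counter


def rouge_n_recall(candidate, reference, n=1):
    """Fraction of reference n-grams that also occur in the candidate."""
    cand = Counter(tuple(candidate[i:i + n])
                   for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n])
                  for i in range(len(reference) - n + 1))
    total = sum(ref.values())
    if total == 0:
        return 0.0
    return sum((cand & ref).values()) / total


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(candidate, reference):
    """LCS length divided by the reference length."""
    if not reference:
        return 0.0
    return lcs_length(candidate, reference) / len(reference)


candidate = "the cat was found under the bed".split()
reference = "the cat was under the bed".split()
print(rouge_n_recall(candidate, reference, n=2))  # 0.8
print(rouge_l_recall(candidate, reference))       # 1.0
```

The inserted word "found" breaks one reference bigram ("was under"), so ROUGE-2 recall drops to 4 of 5. The full reference still appears in order as a subsequence of the candidate, so ROUGE-L recall stays at 1.0.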

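METEOR (Metric for Evaluation of Translation with Explicit ORdering): METEOR aligns words in the generated text with words in the reference, matching not only exact forms but also stems and synonyms (and, in later versions, paraphrases). It combines unigram precision and recall, weighting recall more heavily, and applies a penalty when the matched words are fragmented into many out-of-order chunks. This makes it more sensitive to meaning and word order than BLEU when judging how well the generated text aligns with the reference.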
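The scoring step of METEOR can be illustrated with a heavily simplified sketch. It assumes exact word matches only (no stemming, synonyms, or paraphrases) and a greedy left-to-right alignment, which does not always find the alignment with the fewest chunks that full METEOR looks for. The constants follow the original METEOR formulation: a recall-weighted F-mean of 10PR / (R + 9P) and a penalty of 0.5 * (chunks / matches)^3. The name `simple_meteor` is hypothetical.

```python
def simple_meteor(candidate, reference):
    """Exact-match-only METEOR-style score with a greedy alignment."""
    used = [False] * len(reference)
    alignment = []  # (candidate index, reference index) pairs
    for i, token in enumerate(candidate):
        # Pair each candidate token with the first unused identical
        # reference token (a simplification of METEOR's alignment search).
        for j, ref_token in enumerate(reference):
            if not used[j] and token == ref_token:
                used[j] = True
                alignment.append((i, j))
                break
    matches = len(alignment)
    if matches == 0:
        return 0.0
    precision = matches / len(candidate)
    recall = matches / len(reference)
    f_mean = 10 * precision * recall / (recall + 9 * precision)
    # A chunk is a run of matches adjacent in both candidate and reference.
    chunks = 1
    for (i1, j1), (i2, j2) in zip(alignment, alignment[1:]):
        if not (i2 == i1 + 1 and j2 == j1 + 1):
            chunks += 1
    penalty = 0.5 * (chunks / matches) ** 3
    return f_mean * (1 - penalty)


reference = "the cat sat on the mat".split()
in_order = "the cat sat on the mat".split()
scrambled = "on the mat sat the cat".split()
print(simple_meteor(in_order, reference))
print(simple_meteor(scrambled, reference))
```

Both candidates contain exactly the reference words, so a pure unigram metric would score them equally. The fragmentation penalty is what makes the scrambled candidate score lower here.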

