LLM Overview Slides | LLM & RAG Guide

AGENDA
LLM TOPICS
LLM OVERVIEW
APPLICATIONS OF LLM
NLP APPLICATIONS
SOFTWARE APPLICATIONS
EVOLVING APPLICATIONS OF LLMS
OPEN SOURCE VS CLOSE
BUILDING BLOCKS OF LLMS
MULTI MODEL LLMS
COMMON TERMINOLOGY LLMS
AI ASSISTANTS TYPES
TYPES OF AI ASSISTANTS
FEATURES OF AI ASSISTANTS
EXAMPLE FEATURES OF AI ASSISTA
AI ASSISTANT EVALUATION METRIC
ASSISTANT BOT METRICS
METRICS TO EVALUATE AI ASSISTA
TECHNICAL METRICS AI ASSISTANT
METRICS FOR SEARCH BOT
METRICS FOR RECOMMENDATION BOT
BEHAVIORAL METRICS FOR RECOMME
CRITERIA TO COMPARE LLMS
LLM TECHNOLOGY SLIDES
AI ASSISTANT TECH STACK
AI ASSISTANT ARCHITECTURE
CONSIDERATIONS FOR BOT ARCHITEC
RAG SLIDES
WHEN TO USE RAG
WHEN NOT TO USE RAG
AI ASSISTANT WRAPPER
AI ASSISTANT ON YOUR DATA
AI ASSISTANT FINETUNE MODEL
AI ASSISTANT CUSTOM MODEL
AI ASSISTANT BUILDING BLOCKS
LLM FUNCTION CALLING
RAG OVERVIEW SLIDES
RAG ARCHITECTURE SLIDE
RAG RETRIEVER OPTIONS
RAG NODE PROCESSOR
RAG NODE POST PROCESSOR
HOW TO FORM RESPONSE IN AI ASS
LLM ARCHITECTURE FOR BOT
LLM CONCERNS SLIDES
LLM THREATS SLIDES
LLM CHALLENGES SLIDE
LLM ETHICAL CONCERNS SLIDES
LLM UNCONTROLLED BEHAVIOR SLID
LLM ETHICAL ISSUES
DATA OWNERSHIP ISSUES LLM
LLM GENERATED OUTPUT ISSUES
LLM ENVIRONMENT ISSUES
ENTERPRISE GRADE ANSWERS AI AS
APPROACHES TO VERIFY AI ASSIST
EXAMPLE BOT ASSISTANTS
CUSTOMER ONBOARDING BOT
CUSTOMER ONBOARDING AI ASSISAT
ARCHITECTURE SLIDE FOR CUSTOME
SLIDE59
LLM TRAINING STEPS
ARCHITECTURE LLM TRAINING


Large Language Models (LLMs)


LLMs are a type of artificial intelligence (AI) capable of processing and generating human-like text in response to a wide range of prompts and questions. Trained on massive datasets of text and code, they can perform various tasks such as:

Generating text in many formats: poems, code, scripts, musical pieces, emails, letters, etc.
Answering open-ended, challenging, or unusual questions informatively: drawing on the knowledge captured during training.
Translating languages: converting text from one language to another.
Writing creative content: stories, poems, and scripts that can be hard to distinguish from human-written content.

Retrieval Augmented Generation (RAG)


RAG is a novel approach that combines the strengths of LLMs with external knowledge sources. It works by:

Retrieval: When given a prompt, RAG searches through an external database of relevant documents to find information related to the query.
Augmentation: The retrieved information is then used to enrich the context provided to the LLM. This can be done by incorporating facts, examples, or arguments into the prompt.
Generation: Finally, the LLM uses the enhanced context to generate a response that is grounded in factual information and tailored to the specific query.
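
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the in-memory DOCUMENTS list, the bag-of-words cosine similarity, and the fake_llm stand-in are all assumptions made for the example. A real system would typically use an embedding model, a vector database, and an actual LLM call.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base; a real system would store chunks in a vector DB.
DOCUMENTS = [
    "The return policy allows refunds within 30 days of purchase.",
    "Standard shipping takes 5 business days within the country.",
    "Support is available by email from Monday to Friday.",
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, documents, top_k=1):
    """Retrieval: rank documents by similarity to the query and keep the best ones."""
    q = Counter(tokenize(query))
    ranked = sorted(documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:top_k]

def augment(query, passages):
    """Augmentation: place the retrieved passages into the prompt as context."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

def generate(prompt, llm):
    """Generation: hand the enriched prompt to whatever LLM callable is supplied."""
    return llm(prompt)

def fake_llm(prompt):
    """Stand-in for a real model call, so the sketch runs without any API."""
    return f"[stub answer for a prompt of {len(prompt)} characters]"

if __name__ == "__main__":
    question = "What is the return policy for refunds?"
    passages = retrieve(question, DOCUMENTS)
    print(generate(augment(question, passages), fake_llm))
```

Keeping retrieval, augmentation, and generation as separate functions mirrors the three steps and makes each one easy to swap out independently.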
RAG offers several advantages over traditional LLM approaches:

Improved factual accuracy: By anchoring responses in real-world data, RAG reduces the risk of generating false or misleading information.
Greater adaptability: As external knowledge sources are updated, RAG can access the latest information, making it more adaptable to changing circumstances.
Transparency: RAG facilitates a clear understanding of the sources used to generate responses, fostering trust and accountability.
However, RAG also has its challenges:

Data quality: The accuracy and relevance of RAG's outputs depend heavily on the quality of the external knowledge sources.
Retrieval efficiency: Finding the most relevant information from a large database can be computationally expensive.
Integration complexity: Combining two different systems (retrieval and generation) introduces additional complexity in terms of design and implementation.

Prompt Engineering


Prompt engineering is a crucial technique for guiding LLMs towards generating desired outputs. It involves crafting prompts that:

Clearly define the task: Specify what the LLM should do with the provided information.
Provide context: Give the LLM enough background knowledge to understand the prompt and generate an appropriate response.
Use appropriate language: Frame the prompt in a way that aligns with the LLM's capabilities and training data.
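
One way to apply these three points consistently is to assemble prompts from labeled parts. The sketch below is only illustrative: the function name build_prompt, its parameters, and the section labels are assumptions for this example, not a standard API.

```python
def build_prompt(task, context, text, output_format=""):
    """Assemble a prompt that states the task, supplies context, and frames the input."""
    parts = [
        f"Task: {task.strip()}",        # clearly define the task
        f"Context: {context.strip()}",  # give the background the model needs
    ]
    if output_format:
        # Optional hint that keeps the wording and format within what the model handles well.
        parts.append(f"Output format: {output_format.strip()}")
    parts.append(f"Input:\n{text.strip()}")
    return "\n\n".join(parts)

if __name__ == "__main__":
    print(build_prompt(
        task="Summarize the text in one sentence.",
        context="The reader is a new customer with no technical background.",
        text="Our store opens at 9am and closes at 6pm on weekdays.",
        output_format="Plain English, no jargon.",
    ))
```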



Advantages of using RAG


Better Accuracy: When factual correctness is crucial, RAG is a strong fit. It retrieves information from external sources, so the AI assistant can ground its responses in them and provide well-sourced answers.
Domain Knowledge: Imagine an AI assistant for medical diagnosis, legal questions, or up-to-date tax law. In the medical case, RAG can access medical databases to enhance its responses and keep them aligned with established medical knowledge.
Reduce Hallucination: LLMs can sometimes fabricate information, a phenomenon called hallucination. RAG mitigates this risk by grounding the response in retrieved data.
Building Trust: By citing sources, RAG fosters trust with users. Users can verify the information and see the reasoning behind the response.

Disadvantages of using RAG


Speed is Crucial: RAG adds a retrieval step, which adds latency to each response. If real-time response is essential, a pre-trained LLM without retrieval might be sufficient.
Limited Context: RAG works best when the user's query and context are clear. If the conversation is ambiguous, retrieved information might not be relevant.
Privacy Concerns: If the AI assistant deals with sensitive user data, RAG might raise privacy concerns. External retrievals could potentially expose user information.



When to fine-tune an LLM


Consider fine-tuning a large language model (LLM) when you want it to perform better at a specific task or adapt to a particular domain. Here are example scenarios where fine-tuning is a good fit:

Domain-Specific Nuances: If you need an LLM for financial analysis, legal documents, or medical documents, fine-tuning is the better choice. For example, the word "capital" has a specific meaning in the finance domain. While a pre-trained LLM understands general language, it may not grasp legal, financial, or medical terms and jargon. Fine-tuning on finance Q&A, financial documents, or legal documents adapts the LLM to that specific domain.

Instruction Fine-Tuning: Here you provide the LLM with instructions or demonstrations alongside training data. This can be useful for tasks where you want the LLM to follow a certain style or format, like writing safety instructions in a specific tone.

Specialized Tasks: Imagine you want an LLM to write different kinds of creative content, like poems or code. Fine-tuning on a dataset of poems can improve its poetry generation skills, while fine-tuning on code samples can enhance its code writing abilities.

However, fine-tuning isn't always the right answer. Start with a standard LLM and prompt engineering, then try RAG, and only later consider fine-tuning. Here are examples of when you should not fine-tune:

General Use Cases: If you need a broad LLM for various tasks, a pre-trained model will do a better job. Pre-trained models are versatile, trained on diverse data, and can handle many tasks well enough without task-specific fine-tuning.

Limited Data: Fine-tuning works well when you have a lot of data related to your specific task or domain. If you only add a few records, as most demos show, fine-tuning might not be effective and could even harm the model's performance.

Knowledge Integration: If your goal is to add proprietary, recent, or specific knowledge to the LLM, retrieval-augmented generation (RAG) is the better approach. In RAG, the LLM is given relevant information retrieved from a knowledge base. You can use RAG out of the box, and you can further optimize it by summarizing knowledge, creating embeddings with metadata, etc.
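
The guidance in this section (prompt engineering first, then RAG, then fine-tuning) can be written down as a small decision helper. This is a rough sketch for illustration only: the function name, its parameters, and especially the min_examples threshold of 1000 are assumptions, since the text above says only that fine-tuning needs "a lot of data".

```python
def suggest_approach(needs_external_knowledge, needs_domain_adaptation,
                     labeled_examples, min_examples=1000):
    """Return the approaches to try, in order, following the guidance in this section.

    min_examples is an illustrative placeholder, not an established threshold.
    """
    steps = ["prompt engineering on a standard LLM"]  # always the starting point
    if needs_external_knowledge:
        # Proprietary, recent, or specific knowledge is better served by retrieval.
        steps.append("RAG over your knowledge base")
    if needs_domain_adaptation and labeled_examples >= min_examples:
        # Fine-tune only when there is enough task- or domain-specific data.
        steps.append("fine-tuning")
    return steps

if __name__ == "__main__":
    print(suggest_approach(True, True, labeled_examples=50))
    print(suggest_approach(False, True, labeled_examples=5000))
```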



Fine Tuning Steps



How can LLMs be fine-tuned for summarization?
LLMs (Large Language Models) such as GPT-3 can be adapted for summarization using the following approaches, which range from full fine-tuning to no fine-tuning at all:
Supervised training - The simplest approach is to fine-tune the LLM using a large dataset of text-summary pairs. The model is trained to generate the corresponding summary given the input text.
This requires a sizable supervised dataset, which can be expensive to create. Public datasets like CNN/DailyMail can be used.

Self-supervised training - The LLM is trained using the original text as input and the first few sentences as the "summary". This creates weak supervision from the data itself.
The model is then fine-tuned on a smaller set of human-written summaries to improve accuracy. This approach requires less labeled data.
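
The weak-supervision idea above, using the first few sentences of a document as its "summary", can be sketched as follows. The regex-based sentence splitter and the "text"/"summary" record layout are simplifying assumptions for illustration. Real pipelines usually use a proper sentence tokenizer and whatever record format their training framework expects.

```python
import re

def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]

def lead_pseudo_pair(document, n_lead=2):
    """Build a weakly supervised training pair using the first n_lead sentences as the summary."""
    sentences = split_sentences(document)
    return {"text": document, "summary": " ".join(sentences[:n_lead])}

if __name__ == "__main__":
    doc = ("The company opened a new office. It will host fifty engineers. "
           "Hiring starts next month. The office is near the station.")
    print(lead_pseudo_pair(doc)["summary"])
```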

Reinforcement learning - The LLM is first trained with an autoencoding objective, that is, to reproduce the input text. Then, rewards are given based on the quality and conciseness of the generated summary.
The model learns to generate better summaries through trial-and-error to maximize these rewards. However, this requires defining a good reward function.
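
To make "defining a good reward function" concrete, here is a toy reward that combines a crude quality proxy with a conciseness penalty. Everything in it is an assumption made for illustration: vocabulary coverage as the quality signal, the target_ratio and length_weight parameters, and their default values. Practical systems generally rely on learned reward models or human preference data rather than a hand-written formula like this.

```python
import re

def words(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def summary_reward(source, summary, target_ratio=0.3, length_weight=0.5):
    """Toy reward: source-vocabulary coverage minus a penalty for overly long summaries."""
    src, summ = words(source), words(summary)
    if not src or not summ:
        return 0.0
    # Quality proxy: fraction of distinct source words that appear in the summary.
    coverage = len(set(summ) & set(src)) / len(set(src))
    # Conciseness: penalize only the length beyond target_ratio of the source length.
    excess = max(0.0, len(summ) / len(src) - target_ratio)
    return coverage - length_weight * excess

if __name__ == "__main__":
    source = "The cat sat on the mat and the dog slept on the rug."
    print(summary_reward(source, "The cat sat and the dog slept."))
    print(summary_reward(source, "Birds fly very high in the sky today."))
```

Note that a reward like this is easy to game (for example, by listing keywords), which is one reason the text stresses that the reward function has to be chosen carefully.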

Filtering and post-processing - Generated summaries from the LLM can be filtered and refined using techniques like:
• Extracting sentences with the highest similarity to human references
• Removing repetitive sentences
• Combining overlapping content into a single sentence, etc.
This requires minimal fine-tuning of the base LLM but provides less control over the summary style.
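
As an illustration of one of these post-processing steps, removing repetitive sentences, the sketch below drops any sentence whose word set overlaps heavily with a sentence already kept. The Jaccard similarity measure and the 0.8 threshold are assumptions chosen for the example, not values taken from the text.

```python
import re

def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]

def jaccard(a, b):
    """Jaccard similarity between two sets of words."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def remove_repetitive_sentences(summary, threshold=0.8):
    """Keep each sentence only if it is not a near-duplicate of one kept earlier."""
    kept = []  # list of (sentence, word_set) pairs in original order
    for sentence in split_sentences(summary):
        word_set = set(re.findall(r"\w+", sentence.lower()))
        if not word_set:
            continue
        if any(jaccard(word_set, seen) >= threshold for _, seen in kept):
            continue  # too similar to an earlier sentence
        kept.append((sentence, word_set))
    return " ".join(sentence for sentence, _ in kept)

if __name__ == "__main__":
    print(remove_repetitive_sentences("Sales rose in May. Sales rose in May. Costs fell."))
```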

Prompting - The LLM can be "prompted" to generate a summary using natural language instructions. For example:
In 2-3 short paragraphs, summarize the main points of the following text:
This relies more on the pre-trained LLM abilities and requires less labeled data. But accuracy tends to be lower.
In short, there is a range of approaches for adapting LLMs to summarization, from fully supervised to minimally supervised. The choice depends on the available data, the required accuracy, and any custom needs.



Verify LLM and AI Assistant Answers

If you are using an AI assistant, you should cross-check the facts and numbers it gives.

Check in Vector DB: If you are using a vector DB with RAG, you can check which values retrieval returned. This helps ensure that the generated response is in line with the values stored in the vector DB.
Use a Second LLM: Another, or additional, approach is to ask a narrower question of a second LLM (or the same one) and see what answer you get. For example, if there is one page of text and it says the company Dataknobs has revenue of $78M, you can ask the narrower question "How much revenue does Dataknobs have?". However, you need to consider the additional cost of the second call. The text may contain more than one fact, and each fact may need its own call.
Call a Search Engine: You can run a query on a search engine programmatically and check the response. However, depending on the domain, this may or may not work, and it may require parsing the search engine results.
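
The first check, comparing the answer against what was retrieved from the vector DB, can be partly automated for numeric facts. The sketch below is a simplified illustration: it assumes the retrieved chunks are already available as plain strings, and the regex for figures and the function names are hypothetical. It flags any figure that appears in the answer but in none of the retrieved chunks.

```python
import re

# Matches figures such as "78", "1,200", "3.5%", "$78M" (an illustrative pattern, not exhaustive).
FIGURE = re.compile(r"\$?\d[\d,]*(?:\.\d+)?[KMB%]?")

def extract_figures(text):
    """Return the figures found in text, normalized by dropping '$' and thousands separators."""
    return {m.group(0).replace("$", "").replace(",", "") for m in FIGURE.finditer(text)}

def unsupported_figures(answer, retrieved_chunks):
    """Figures stated in the answer that do not appear in any retrieved chunk."""
    supported = set()
    for chunk in retrieved_chunks:
        supported |= extract_figures(chunk)
    return extract_figures(answer) - supported

if __name__ == "__main__":
    chunks = ["Dataknobs reported revenue of $78M last year."]
    print(unsupported_figures("Dataknobs has revenue of $78M.", chunks))
    print(unsupported_figures("Dataknobs has revenue of $80M.", chunks))
```

An empty result means every figure in the answer was found in the retrieved text. A non-empty result is a signal to review the answer or to fall back on the second-LLM or search-engine checks.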





How to evaluate an LLM


Method | Description
Perplexity | Perplexity measures how well a language model predicts a sample of text. It is the exponential of the average negative log-likelihood per token. Lower perplexity indicates better performance.
BLEU Score | BLEU (Bilingual Evaluation Understudy) Score is commonly used to evaluate the quality of machine-translated text by comparing it to human-generated translations.
ROUGE Score | ROUGE (Recall-Oriented Understudy for Gisting Evaluation) Score is used to evaluate the quality of summaries produced by a language model by comparing them to reference summaries.
Human Evaluation | Human evaluation involves having human judges assess the quality of text generated by the language model based on criteria such as fluency, coherence, and relevance.
Word Error Rate (WER) | WER measures the word-level edit distance (substitutions, insertions, and deletions) between the generated text and a reference text, divided by the number of words in the reference. Lower WER indicates better performance.
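
Two of these metrics are simple enough to compute by hand, which helps make the definitions concrete. The sketch below uses only the standard library. It assumes that per-token probabilities are already available from the model (for perplexity) and that words are separated by whitespace (for WER). In practice, established evaluation libraries are the usual choice.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability the model gave each token."""
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))

def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, keeping only the previous row of the table.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # reference word missing from hypothesis
                cur[j - 1] + 1,          # extra word in hypothesis
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)

if __name__ == "__main__":
    print(perplexity([0.25, 0.25, 0.25, 0.25]))
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A model that assigns probability 0.25 to every token has perplexity 4, which matches the intuition of choosing uniformly among four options at each step.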





