SIGIR 2024 Tutorial:
Preventing and Detecting Misinformation Generated by Large Language Models

1Tsinghua University; 2Institute of Computing Technology, Chinese Academy of Sciences; 3The Hong Kong University of Science and Technology (Guangzhou)

Sunday, July 14, 13:30–17:00 (EDT), South American B @ Washington, D.C.
Our slides are available here.

About this tutorial

As large language models (LLMs) become increasingly capable and widely deployed, the risk of them generating misinformation poses a critical challenge. Misinformation from LLMs can take various forms, from factual errors due to hallucination to intentionally deceptive content, and can have severe consequences in high-stakes domains.

This tutorial covers comprehensive strategies to prevent and detect misinformation generated by LLMs. We first introduce the types of misinformation LLMs can produce and their root causes. We then explore two broad categories of approaches:

Preventing misinformation generation:

  • a) Enhancing LLM Knowledge:
    • [Internal Knowledge] Constructing more truthful datasets
    • [Internal Knowledge] LLM knowledge editing
    • [External Knowledge] Retrieval-augmented generation (see the sketch after this list)
  • b) Enhancing Knowledge Inference in LLMs:
    • Factual decoding methods
    • Factual alignment
    • Adversarial training
  • c) Promoting Ethical Values in LLMs:
    • Safety alignment
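
To make the retrieval-augmented generation item above concrete, here is a minimal sketch of the core idea: fetch evidence for a query from an external corpus and condition the model on it, so that answers are grounded in sources rather than in parametric memory alone. Everything in this snippet is an illustrative assumption rather than part of the tutorial materials: the toy corpus, the bag-of-words retriever, and the generate() stub standing in for a real LLM call.

import math
from collections import Counter

# Toy in-memory corpus; a real system would use a search or vector index.
CORPUS = [
    "SIGIR 2024 takes place in Washington, D.C., in July 2024.",
    "Retrieval-augmented generation grounds an LLM's answer in retrieved evidence.",
    "Watermarking embeds a detectable statistical signal into generated text.",
]

def term_freq(text):
    return Counter(text.lower().split())

def cosine(query, doc):
    # Bag-of-words cosine similarity as a stand-in for a learned retriever.
    q, d = term_freq(query), term_freq(doc)
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0

def retrieve(query, k=2):
    # Return the k most similar documents to the query.
    return sorted(CORPUS, key=lambda doc: cosine(query, doc), reverse=True)[:k]

def build_prompt(query, docs):
    # Prepend retrieved evidence and instruct the model to abstain otherwise.
    evidence = "\n".join(f"- {d}" for d in docs)
    return (
        "Answer using ONLY the evidence below; reply 'unknown' if it is insufficient.\n"
        f"Evidence:\n{evidence}\n"
        f"Question: {query}\n"
        "Answer:"
    )

def generate(prompt):
    # Stub: a real pipeline would send the prompt to an LLM API here.
    return "<LLM response to:\n" + prompt + "\n>"

query = "Where does SIGIR 2024 take place?"
print(generate(build_prompt(query, retrieve(query))))

The abstention instruction in build_prompt() is the part that targets hallucination: the model is steered toward the retrieved sources and away from answering from unsupported parametric memory.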

Detecting misinformation after generation, including:

  • a) LLM-Generated Text Detection:
    • Watermarking-based detection (see the sketch after this list)
    • Post-generation detection
  • b) Misinformation Detection:
    • General misinformation detection
    • LLM-generated misinformation detection
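
As a concrete instance of the watermarking-based detection item above, the sketch below follows the spirit of the green-list scheme of Kirchenbauer et al. (2023): at generation time the vocabulary is pseudo-randomly split, keyed on the previous token, into a "green" and a "red" list, and green tokens are slightly favored; the detector then counts green tokens and applies a one-sided z-test. The hash-based split and the toy input here are illustrative stand-ins for the keyed vocabulary partition a real implementation would share between generator and detector.

import hashlib
import math

def is_green(prev_token: str, token: str, gamma: float = 0.5) -> bool:
    # Deterministic stand-in for the keyed green/red vocabulary split:
    # hash the (previous token, token) pair and call a gamma fraction "green".
    h = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return int.from_bytes(h[:8], "big") / 2**64 < gamma

def watermark_z_score(tokens, gamma: float = 0.5) -> float:
    # One-sided z-test: how far the observed green-token count exceeds the
    # gamma * T count expected under the null (unwatermarked) hypothesis.
    T = len(tokens) - 1
    if T <= 0:
        return 0.0
    green = sum(is_green(p, t, gamma) for p, t in zip(tokens, tokens[1:]))
    return (green - gamma * T) / math.sqrt(T * gamma * (1 - gamma))

def looks_watermarked(tokens, gamma: float = 0.5, threshold: float = 4.0) -> bool:
    # A threshold around z = 4 keeps the false-positive rate on human text very low.
    return watermark_z_score(tokens, gamma) >= threshold

print(watermark_z_score("the quick brown fox jumps over the lazy dog".split()))

Note that detection needs only the text and the (secret) partition rule, not access to the model: ordinary human text yields z-scores near zero, while text generated with a green-list bias scores far above the threshold.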

We also discuss the challenges and limitations of detecting LLM-generated misinformation.

Schedule

Our tutorial will be held on Sunday, July 14 (all times are in EDT, Washington, D.C. local time).

Time | Section | Presenter
13:30–13:45 | Section 1: Overview of LLM-Generated Misinformation | Aiwei (for Xuming, visa issue)
13:45–14:15 | Section 2: [Preventing Misinformation] Enhancing LLM Knowledge | Aiwei (for Xuming, visa issue)
14:15–14:45 | Section 3: [Preventing Misinformation] Enhancing Knowledge Inference in LLMs | Aiwei
14:45–14:55 | Section 4: [Preventing Misinformation] Promoting Ethical Values in LLMs | Aiwei
14:55–15:00 | Q & A Session I | –
15:00–15:30 | Coffee break | –
15:30–15:50 | Section 5: [Detecting Misinformation] Watermarking-Based Detection | Aiwei
15:50–16:15 | Section 6: [Detecting Misinformation] Post-Generation Detection | Qiang
16:15–16:30 | Section 7: [Detecting Misinformation] General Misinformation Detection | Qiang
16:30–16:45 | Section 8: [Detecting Misinformation] LLM-Generated Misinformation Detection | Qiang
16:45–16:50 | Section 9: Conclusion and Discussion | Qiang
16:50–17:00 | Q & A Session II | –

Reading List


Section 1: Overview of LLM-Generated Misinformation


Section 2: [Preventing Misinformation] Enhancing LLM Knowledge


Section 3: [Preventing Misinformation] Enhancing Knowledge Inference in LLMs


Section 4: [Preventing Misinformation] Promoting Ethical Values in LLMs


Section 5: [Detecting Misinformation] Watermarking-Based Detection


Section 6: [Detecting Misinformation] Post-Generation Detection


Section 7: [Detecting Misinformation] General Misinformation Detection


Section 8: [Detecting Misinformation] LLM-Generated Misinformation Detection


BibTeX

@inproceedings{10.1145/3626772.3661377,
      author = {Liu, Aiwei and Sheng, Qiang and Hu, Xuming},
      title = {Preventing and Detecting Misinformation Generated by Large Language Models},
      year = {2024},
      isbn = {9798400704314},
      publisher = {Association for Computing Machinery},
      address = {New York, NY, USA},
      url = {https://doi.org/10.1145/3626772.3661377},
      doi = {10.1145/3626772.3661377},
      pages = {3001–3004},
      numpages = {4},
      keywords = {hallucination, large language models, misinformation},
      location = {Washington DC, USA},
      series = {SIGIR '24}
}