YoBook API

An open-source Nepal school textbook catalog and API.

YoBook API collects public educational-book metadata from official and public sources, with CEHRD as the primary source. It generates real cover images from the first page of each PDF and serves the catalog through a simple Flask API and browser UI.

Open Source License

This project is free to use, copy, modify, distribute, and build on.

Please give credit when you use it by preserving the LICENSE and NOTICE files, and by mentioning:

Powered by YoBook API

The project code is released under the MIT License. Source textbook PDFs, book covers generated from those PDFs, trademarks, and third-party metadata remain owned by their original publishers and providers.

Why CEHRD First?

CEHRD Learning Portal is the primary source because it currently offers the cleanest official structure.

Other sources are still useful as secondary enrichment.

Sources

Source                 Role           What It Provides
CEHRD Learning Portal  Primary        Official grade/subject textbook PDFs from learning.cehrd.gov.np
CDC Nepal              Secondary      Official CDC publication links and curated textbook records
E-Pustakalaya          Secondary      Public digital-library records for Nepal education
Internet Archive       Supplementary  Digitized Nepal-related books and documents
Open Library           Supplementary  Additional public catalog metadata

Features

Quick Start

pip install -r requirements.txt
python api.py

Open:

http://127.0.0.1:5000/

Scraping

Scrape the primary CEHRD source:

python scraper.py --source cehrd

Scrape one grade:

python scraper.py --source cehrd --grade 5

Scrape everything:

python scraper.py

Generate real covers from PDF first pages:

python generate_pdf_covers.py --source cehrd-learning

The generated covers are saved in:

data/covers/
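
The example record under Data Shape suggests that a cover file is named after the book ID and served under /covers/. A minimal sketch of that mapping, assuming the naming pattern holds for every record; `cover_paths` is a hypothetical helper, not a function from this project:

```python
from pathlib import PurePosixPath

# Directory named in this README; the "<id>.jpg" pattern is inferred from the
# example record's coverUrl and is an assumption, not documented behavior.
COVERS_DIR = PurePosixPath("data/covers")


def cover_paths(book_id):
    """Return (local file path, public URL path) for a book's cover image."""
    filename = f"{book_id}.jpg"
    return str(COVERS_DIR / filename), f"/covers/{filename}"


local_path, url_path = cover_paths("cehrd-learning-g1-mathematics-40")
print(local_path)
print(url_path)
```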

API

List Books

GET /api/books

Useful filters:

Query    Example
source   /api/books?source=cehrd-learning
grade    /api/books?grade=10
subject  /api/books?subject=Science
q        /api/books?q=mathematics
limit    /api/books?limit=20
page     /api/books?page=2
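
A minimal client-side sketch for building these URLs. It assumes the local server from Quick Start and that several filters can be combined as ordinary query parameters, which this README does not state explicitly; `build_books_url` is a hypothetical helper, not part of the project:

```python
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:5000"  # local address from Quick Start

# The six filters documented in the table above.
ALLOWED_FILTERS = ("source", "grade", "subject", "q", "limit", "page")


def build_books_url(base_url=BASE_URL, **filters):
    """Build a /api/books URL from the documented filters, skipping unset ones."""
    unknown = set(filters) - set(ALLOWED_FILTERS)
    if unknown:
        raise ValueError(f"unsupported filter(s): {sorted(unknown)}")
    # Emit parameters in a fixed order so the resulting URL is predictable.
    params = [
        (name, filters[name])
        for name in ALLOWED_FILTERS
        if filters.get(name) is not None
    ]
    query = urlencode(params)
    return f"{base_url}/api/books" + (f"?{query}" if query else "")


print(build_books_url(grade=10, subject="Science", limit=20))
```

Rejecting unknown filter names early keeps typos such as `subjet=` from silently returning an unfiltered list.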

Other Endpoints

GET /api/books/<id>
GET /api/sources
GET /api/stats
GET /docs

Data Shape

{
  "id": "cehrd-learning-g1-mathematics-40",
  "title": "Mathematics - Grade 1",
  "author": "Centre for Education and Human Resource Development",
  "grade": 1,
  "subject": "Mathematics",
  "language": "en",
  "country": "np",
  "curriculum": "CDC Nepal",
  "source": "cehrd-learning",
  "sourceUrl": "https://learning.cehrd.gov.np/mod/resource/view.php?id=40",
  "readUrl": "https://learning.cehrd.gov.np/mod/resource/view.php?id=40",
  "pdfUrl": "https://learning.cehrd.gov.np/pluginfile.php/...",
  "coverUrl": "/covers/cehrd-learning-g1-mathematics-40.jpg",
  "category": "Textbook",
  "keywords": ["CEHRD", "CDC", "textbook", "Nepal", "class 1", "Mathematics"]
}
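
A consumer might load such a record into a typed structure. The sketch below assumes every record carries exactly the fields of the example above, which may not hold for secondary sources; the `Book` class and `parse_book` function are illustrative names, not part of the project:

```python
import json
from dataclasses import dataclass, field


@dataclass
class Book:
    """One catalog record, mirroring the field names of the example above."""
    id: str
    title: str
    author: str
    grade: int
    subject: str
    language: str
    country: str
    curriculum: str
    source: str
    sourceUrl: str
    readUrl: str
    pdfUrl: str
    coverUrl: str
    category: str
    keywords: list = field(default_factory=list)


def parse_book(raw):
    """Parse one JSON record; raises TypeError on a missing or unexpected field."""
    return Book(**json.loads(raw))


SAMPLE = json.dumps({
    "id": "cehrd-learning-g1-mathematics-40",
    "title": "Mathematics - Grade 1",
    "author": "Centre for Education and Human Resource Development",
    "grade": 1,
    "subject": "Mathematics",
    "language": "en",
    "country": "np",
    "curriculum": "CDC Nepal",
    "source": "cehrd-learning",
    "sourceUrl": "https://learning.cehrd.gov.np/mod/resource/view.php?id=40",
    "readUrl": "https://learning.cehrd.gov.np/mod/resource/view.php?id=40",
    "pdfUrl": "https://learning.cehrd.gov.np/pluginfile.php/...",
    "coverUrl": "/covers/cehrd-learning-g1-mathematics-40.jpg",
    "category": "Textbook",
    "keywords": ["CEHRD", "CDC", "textbook", "Nepal", "class 1", "Mathematics"],
})

book = parse_book(SAMPLE)
print(book.title, book.grade)
```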

Project Structure

yobook-api/
  api.py                    Flask API and UI server
  scraper.py                Source scrapers
  generate_pdf_covers.py    Generates covers from PDF first pages
  requirements.txt          Python dependencies
  Procfile                  Production start command
  openapi.json              API schema
  templates/
    index.html              Browser UI
  data/
    all_books.json          Merged catalog, CEHRD first
    cehrd_learning.json     Primary CEHRD data
    cdc_nepal.json          CDC data
    pustakalaya.json        E-Pustakalaya data
    archive_org.json        Internet Archive data
    open_library.json       Open Library data
    covers/                 Generated local book covers
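
The "Merged catalog, CEHRD first" ordering of all_books.json could be produced along these lines. This is a sketch under stated assumptions, not the project's actual merge logic: it assumes each per-source file holds a list of records with an `id` field, and that duplicates are dropped by `id`. The function name `merge_catalogs` and every source key other than `cehrd-learning` are hypothetical:

```python
def merge_catalogs(catalogs, primary="cehrd-learning"):
    """Merge per-source record lists, primary source first, dropping repeated ids."""
    # sorted() is stable: the primary source moves to the front while the
    # remaining sources keep the order in which they were given.
    names = sorted(catalogs, key=lambda name: name != primary)
    merged, seen = [], set()
    for name in names:
        for record in catalogs[name]:
            if record["id"] not in seen:
                seen.add(record["id"])
                merged.append(record)
    return merged


# Tiny in-memory stand-ins for the per-source JSON files (hypothetical data).
catalogs = {
    "open-library": [{"id": "ol-1"}],
    "cehrd-learning": [{"id": "c-1"}, {"id": "c-2"}],
    "cdc-nepal": [{"id": "c-1"}, {"id": "cdc-9"}],
}
print([record["id"] for record in merge_catalogs(catalogs)])
```

Because the primary source is processed first, its version of a record wins whenever another source repeats the same `id`.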

Deployment

Recommended start command:

gunicorn api:app

Good hosting options:

For a simple public deployment, commit the JSON data and generated covers so the app works immediately after deploy.

Attribution

If you use this project in an app, website, API, dataset, research project, or redistributed package, please include visible or documented credit:

Powered by YoBook API

Also keep the original LICENSE and NOTICE files with the code or distribution.

Content Notice

YoBook API does not claim ownership of CEHRD, CDC, E-Pustakalaya, Internet Archive, Open Library, or other third-party source content.

The scraper and API code, catalog structure, normalization logic, and documentation are open source. Textbook PDFs and generated PDF-cover images may be subject to the original publishers’ terms.

Contributing

Contributions are welcome. Good first improvements include:

Please read CONTRIBUTING.md before opening a pull request.