Hugging Face Hub Python SDK

ML Tooling
NLP
HuggingFace
Author

Ravi Kalia

Published

April 6, 2025

The Hugging Face Hub is a powerful platform for sharing, discovering, and using machine learning models, datasets, and demos. It integrates seamlessly with the Hugging Face Python ecosystem (transformers, datasets, huggingface_hub, diffusers), and supports workflows for both research and production.

Model Hub

The Model Hub hosts over 500,000 models across tasks like NLP, vision, audio, tabular, and multimodal learning.

Each model repo includes:

- Pretrained weights (pytorch_model.bin, tf_model.h5, etc.)
- Config files (config.json, tokenizer.json, etc.)
- A README with metadata, examples, and citations
- Git-based version control

Code
from transformers import pipeline
pipe = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")
print(pipe("I love the Hugging Face Hub!"))
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
Device set to use mps:0
[{'label': 'POSITIVE', 'score': 0.999834418296814}]
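
The same checkpoint can also be loaded directly from the files listed above rather than through pipeline(); a minimal sketch using the Auto classes, with the label read from the checkpoint's own config:

Code
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_id)  # pulls tokenizer.json / vocab files from the repo
model = AutoModelForSequenceClassification.from_pretrained(model_id)  # pulls config.json + weights

inputs = tokenizer("I love the Hugging Face Hub!", return_tensors="pt")
logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(dim=-1).item()])  # POSITIVE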

Dataset Hub

The Dataset Hub provides thousands of datasets, from common NLP corpora to genomics and code.

Code
from datasets import load_dataset
ds = load_dataset("ag_news")
print(ds["train"][0])
{'text': "Wall St. Bears Claw Back Into the Black (Reuters) Reuters - Short-sellers, Wall Street's dwindling\\band of ultra-cynics, are seeing green again.", 'label': 2}

Features:

• Supports streaming and lazy loading
• Efficient storage using Apache Arrow / Parquet
• Built-in filtering, mapping, and shuffling (sketched below)
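
A rough sketch of these features, assuming the same ag_news columns shown in the sample record above:

Code
from datasets import load_dataset

# Stream records lazily instead of downloading the whole split
streamed = load_dataset("ag_news", split="train", streaming=True)
print(next(iter(streamed))["text"][:60])

# Map, filter, and shuffle a fully downloaded split
ds = load_dataset("ag_news", split="train")
ds = ds.map(lambda row: {"n_chars": len(row["text"])})  # add a derived column
ds = ds.filter(lambda row: row["label"] == 2)           # keep one class
print(ds.shuffle(seed=42)[0])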

Spaces

Spaces are interactive demos powered by Gradio or Streamlit.

Use cases:

• Prototypes
• Research artifacts
• Community tutorials

Example: Stable Diffusion Space

Spaces can be deployed from just an app.py pushed via Git; a minimal sketch follows.
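
For illustration, a hypothetical minimal Gradio app.py (not the Stable Diffusion Space referenced above) could look like this:

Code
import gradio as gr
from transformers import pipeline

pipe = pipeline("sentiment-analysis")

def classify(text):
    result = pipe(text)[0]
    return f"{result['label']} ({result['score']:.3f})"

# Gradio serves this Interface automatically when the Space builds
gr.Interface(fn=classify, inputs="text", outputs="text").launch()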

Versioning & Access Control

Every repo is a Git repository. You can:

• Push and pull as with GitHub
• Track changes and create branches
• Set permissions (private/public), also possible via the Python API shown below
• Create organizations and teams

huggingface-cli login
git clone https://huggingface.co/your-username/your-model
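
The same administration can be done from Python with HfApi; a sketch, assuming you are already logged in and using a placeholder repo id:

Code
from huggingface_hub import HfApi

api = HfApi()  # picks up the token stored by `huggingface-cli login`
api.create_repo("your-username/your-model", private=True, exist_ok=True)
api.create_branch("your-username/your-model", branch="experiment-1")
api.update_repo_visibility("your-username/your-model", private=False)  # make it public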

Python SDK: huggingface_hub

Install:

pip install huggingface_hub

Common operations:

Code
from huggingface_hub import snapshot_download
snapshot_download(repo_id="google/flan-t5-small")
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
'/Users/ravikalia/.cache/huggingface/hub/models--google--flan-t5-small/snapshots/0fc9ddf78a1e988dac52e2dac162b0ede4fd74ab'

You can also create, delete, or update models and datasets programmatically.
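
For example, a sketch of creating a repo, uploading a file, and deleting it again (placeholder repo id and file name):

Code
from huggingface_hub import HfApi

api = HfApi()
api.create_repo("your-username/demo-model", exist_ok=True)
api.upload_file(
    path_or_fileobj="./pytorch_model.bin",   # local file to push
    path_in_repo="pytorch_model.bin",
    repo_id="your-username/demo-model",
)
api.delete_repo("your-username/demo-model")  # removes the repo entirely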

Enterprise Features

For teams and orgs:

• Private model hosting
• Inference Endpoints (example below)
• Model cards with ethical considerations
• Compliance with data governance requirements
• Integration with cloud platforms (AWS/GCP/Azure)
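
Hosted models can be queried remotely with InferenceClient; the sketch below targets the serverless Inference API, though the same `model` argument also accepts a dedicated Inference Endpoint URL:

Code
from huggingface_hub import InferenceClient

client = InferenceClient(model="distilbert-base-uncased-finetuned-sst-2-english")
print(client.text_classification("I love the Hugging Face Hub!"))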

The Hugging Face Hub is a cornerstone of the open-source ML ecosystem. Whether you’re a researcher sharing checkpoints or a developer integrating state-of-the-art models, it’s an essential tool in your stack.