Dimension scores are derived from public data and fields, then weighted into the composite score. They are for reference only.
HUPD (Harvard USPTO Patent Dataset) is a dataset of English-language utility patent applications filed with the USPTO, released by researchers from Stanford, Oxford, Harvard, and other institutions. According to the site, it covers applications filed from January 2004 to December 2014, and is positioned as a “large-scale, structured, multi-purpose” corpus. It is better understood as research data infrastructure than as a traditional SaaS developer tool.
In terms of features and use cases, HUPD is suitable for patent-text NLP, machine learning modeling, patent classification, search, summarization, legal/technical text analysis, and related scenarios. The page provides a paper, a GitHub Codebase, dataset downloads, and Google Colab Notebooks, indicating an emphasis on reproducible research and quick experimentation. As for language support, the dataset itself is in English. The crawled content does not mention dedicated support for frameworks such as Python, PyTorch, or TensorFlow, nor does it disclose an API or SDK.
Because the page provides access to a GitHub repository and dataset downloads, developers should be able to use it locally or in their own computing environments, making self-hosting practically feasible. However, the main text does not clearly specify an open-source license, data license, field definitions, dataset size, version update policy, or commercial-use restrictions. In terms of documentation quality, the paper and Colab resources are very helpful for researchers, but based only on the crawled text, engineering documentation and data governance details remain insufficient.
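Since field definitions are not documented in the crawled text, one practical first step after downloading is to inventory the fields yourself. The sketch below is only an illustration under stated assumptions: it assumes the downloaded data has been unpacked into a local directory of JSON files, one object per application, which the crawled page does not confirm. The function name `inventory_fields` and the directory layout are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def inventory_fields(data_dir):
    """Count how often each top-level key appears across the JSON files in data_dir.

    Assumption (not confirmed by the source): each *.json file holds one
    JSON object describing a single patent application.
    """
    counts = Counter()
    for path in sorted(Path(data_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            record = json.load(handle)
        # Skip files whose top-level value is not an object (e.g. a list).
        if isinstance(record, dict):
            counts.update(record.keys())
    return counts
```

Calling `inventory_fields("path/to/unpacked/data").most_common()` on a small sample lists the keys that actually occur and how consistently they appear, which helps when judging whether the data fits a given pipeline before committing to a full download.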
The main text does not mention any fees, subscriptions, or enterprise editions. Since it offers “Download the Dataset,” it can be regarded as a free/open-download resource, though the exact licensing should still be verified on the actual download page. Access from China cannot be determined from the text alone; GitHub and Google Colab may be unstable or restricted in mainland China, so practical use may require alternative download sources or a proxy environment. No payment methods are provided.
Its strengths are an authoritative data source, a clearly defined time range, and solid supporting research materials, making it suitable for universities, labs, NLP engineers, and patent analytics teams. Its limitations are that coverage ends in 2014, the scope is limited to U.S. English utility patent applications, and it lacks API access, commercial support, and clear licensing details. If you need ongoing updates, visual search, or production-grade interfaces, alternatives worth comparing include USPTO bulk data, Google Patents Public Datasets, PatentsView, and The Lens.
⚠ This review is compiled from public sources and does not constitute a purchase recommendation. Verify all facts on the official patentdataset.org site.
patentdataset.org is a United States Dev Tools provider. TG4G tracks its product information, an overall rating of 8.0/10, and a China-accessibility rating of "China direct-connect friendly".