What is site-bench.com?

site-bench.com is a United States-based Dev Tools provider aimed at AI agent and frontend evaluation. It is relatively niche, so knowing about it offers a real information advantage.

Is site-bench.com usable in China?

site-bench.com offers good direct-connect performance in mainland China and works in most regions without a proxy. The provider is headquartered in the United States and primarily serves overseas markets.

How do I sign up for site-bench.com?

Sign up on the site-bench.com official site. Registration typically requires an email address (Gmail or Outlook recommended) and a payment method; most overseas services accept credit cards, PayPal, or crypto.

🔧 Dev Tools 📍 HQ: United States


site-bench.com

Name: site-bench.com
Brand: site-bench.com
Rating: 7.0 (1 review)

Overall Rating: 7.0/10

China Access: ★★★ China direct-connect friendly

Data source: ai_crawl · Last updated 2026-06-08


In-Depth Review · TG4G Review · 2026-06-08 · For reference only

What It Is

WcodeW is a “web → code → web” closed-loop benchmark and viewer designed to measure how faithfully an LLM Agent can recreate real web pages from static specifications, without access to the actual URL. It first archives the real page with Playwright, then asks the agent to generate a single-file, self-contained index.html, and finally compares the generated result against the archived snapshot item by item.
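The benchmark asks the agent for a single-file, self-contained index.html. One way to sanity-check that requirement before submitting a run is to scan the file for tags that would fetch external resources. The sketch below is a minimal illustration under an assumption about what "self-contained" means (no http(s) or protocol-relative URLs in resource attributes); it is not WcodeW's own validator, and the function name `external_refs` is hypothetical.

```python
from html.parser import HTMLParser

# Attributes that can pull in another resource when the page loads.
_RESOURCE_ATTRS = {"src", "href", "srcset", "poster"}
# Tags whose href is navigation rather than a resource fetched on load.
_NAVIGATION_TAGS = {"a", "area"}


class _ExternalRefFinder(HTMLParser):
    """Collects attribute values that point outside the document."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag in _NAVIGATION_TAGS:
            return
        for name, value in attrs:
            if name in _RESOURCE_ATTRS and value:
                v = value.strip().lower()
                if v.startswith(("http://", "https://", "//")):
                    self.found.append((tag, name, value))


def external_refs(html_text):
    """Return (tag, attribute, value) tuples for externally hosted resources."""
    finder = _ExternalRefFinder()
    finder.feed(html_text)
    finder.close()
    return finder.found
```

An empty result suggests, but does not prove, that the page is self-contained: inline `data:` URIs pass as intended, but URLs inside CSS `url(...)`, `@import`, or inline scripts are not inspected by this sketch.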

Core Capabilities

Its evaluation covers four main dimensions: visual, DOM, interaction, and agentic judge. Visual SSIM carries a 50% weight, DOM similarity 30%, interaction execution and post-state matching 5%, and LLM-based semantic judgment 15%. The viewer also provides a per-pixel diff percentage, making it easier to quickly identify differences visible to the naked eye. The frontend supports four modes: iframe, screenshot, diff, and code. Users can browse different bundles, viewports, and scroll steps via a slider, matrix view, and gallery.
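The four weights combine into a single score as a weighted sum. A minimal sketch, assuming each component score is already normalized to [0, 1]; the function names and the pixel-diff definition (share of pixels where any RGB channel differs by more than a tolerance) are illustrative assumptions, not WcodeW's code.

```python
# Weights as described above: visual SSIM 50%, DOM 30%, interaction 5%, LLM judge 15%.
WEIGHTS = {"visual_ssim": 0.50, "dom": 0.30, "interaction": 0.05, "judge": 0.15}


def composite_score(scores):
    """Weighted sum of per-dimension scores, each assumed to lie in [0, 1]."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores: {sorted(missing)}")
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)


def pixel_diff_percent(img_a, img_b, tolerance=0):
    """Percentage of pixels that differ between two equally sized images.

    Images are lists of rows of (r, g, b) tuples. A pixel counts as different
    when any channel differs by more than `tolerance`.
    """
    if len(img_a) != len(img_b) or any(len(ra) != len(rb) for ra, rb in zip(img_a, img_b)):
        raise ValueError("images must have the same dimensions")
    total = sum(len(row) for row in img_a)
    if total == 0:
        return 0.0
    changed = sum(
        1
        for ra, rb in zip(img_a, img_b)
        for pa, pb in zip(ra, rb)
        if any(abs(ca - cb) > tolerance for ca, cb in zip(pa, pb))
    )
    return 100.0 * changed / total
```

For example, `composite_score({"visual_ssim": 0.8, "dom": 0.6, "interaction": 1.0, "judge": 0.5})` gives about 0.705, with visual similarity contributing 0.4 of that on its own.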

Technology and Ecosystem

The project uses Playwright to capture the DOM, accessibility tree, network responses, and screenshots. The viewer is built with plain HTML, ES modules, and CSS, with no build dependency. It can run offline or be deployed to GitHub Pages. The data layer provides wclone-export.csv and multiple JSON indexes, making it suitable for import into spreadsheets or pandas for further analysis. The source code is released on GitHub under the MIT license and has the basics needed for self-hosting.
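Because the export is a plain CSV, a few lines of standard-library Python are enough to summarize it. The column names `agent` and `composite` below are hypothetical placeholders, since the actual schema of wclone-export.csv is not described here; adjust them to the real header. With pandas, the equivalent would be `pd.read_csv(path).groupby("agent")["composite"].mean()`.

```python
import csv
from collections import defaultdict
from statistics import mean


def mean_score_by_agent(lines, agent_col="agent", score_col="composite"):
    """Average one score column per agent.

    `lines` is any iterable of CSV text lines (an open file works). The default
    column names are hypothetical placeholders, not the documented schema.
    """
    per_agent = defaultdict(list)
    for row in csv.DictReader(lines):
        value = (row.get(score_col) or "").strip()
        if value:  # skip rows whose score cell is empty or missing
            per_agent[row[agent_col]].append(float(value))
    return {agent: mean(values) for agent, values in per_agent.items()}


# Usage (the column names must match the real header of the export):
# with open("wclone-export.csv", newline="", encoding="utf-8") as f:
#     print(mean_score_by_agent(f))
```

Taking lines rather than a path keeps the function easy to test and lets it read from a file, a string, or a network response alike.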

Pricing and Documentation

No commercial pricing or paid plans are mentioned in the main text. As an MIT-licensed open-source project, it is presumably free to use. The documentation explains the scoring weights, view modes, limitations, keyboard shortcuts, and data export fairly clearly. However, adding a new bundle or agent run requires placing HTML files in the correct directories and running scripts, and bundle creation is only covered by a reference to the annotator playbook. Overall, it feels more like a research and engineering tool than a polished product.

Pros, Cons, and Who It’s For

Its strengths include a transparent evaluation workflow, strong visual comparison, analysis-friendly metric exports, and simple deployment. The limitations are also clear: it evaluates static visual replication, not a production-ready functional replacement; JavaScript is disabled in sandboxed iframes, so complex dynamic effects cannot be covered; and pixel diffs are sensitive to tiny shifts. It is best suited for LLM Agent researchers, AI coding tool teams, and evaluators of web page generation models.
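The sensitivity of pixel diffs to tiny shifts is easy to see with a toy example. The sketch below (pure Python, hypothetical helper names, zero tolerance, not the viewer's implementation) shifts two synthetic grayscale images right by one pixel: a fine alternating-column pattern, standing in for text or thin borders, and a solid square.

```python
def shift_right(img, dx=1, fill=255):
    """Shift a grayscale image (list of rows) right by dx pixels, padding with `fill`."""
    if dx < 0 or any(dx > len(row) for row in img):
        raise ValueError("dx must be between 0 and the image width")
    return [[fill] * dx + row[: len(row) - dx] for row in img]


def diff_percent(a, b):
    """Percentage of pixels whose values differ at all (zero tolerance)."""
    total = sum(len(row) for row in a)
    changed = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return 100.0 * changed / total


# Hypothetical 8x8 test images (0 = black, 255 = white).
# Fine detail: alternating black/white columns, like text or 1px borders.
stripes = [[0 if x % 2 == 0 else 255 for x in range(8)] for _ in range(8)]
# Coarse detail: a 4x4 black square on a white background.
square = [[0 if 2 <= x <= 5 and 2 <= y <= 5 else 255 for x in range(8)] for y in range(8)]

print(diff_percent(stripes, shift_right(stripes)))  # 100.0
print(diff_percent(square, shift_right(square)))    # 12.5
```

A one-pixel shift flags every pixel of the fine pattern as changed and 12.5% of the coarse one, even though a viewer would barely notice either shift. This is why a high diff percentage on text-heavy pages should be read with care.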

Access from China

The main text does not provide information about mainland China network access, mirrors, payment, or service support, so site accessibility can only be marked as unknown. If GitHub access is unstable, self-hosting the static files may be more reliable. Alternatives include WebArena, VisualWebArena, and BrowserGym; for visual regression testing, Playwright screenshot diffs, Percy, or Chromatic can be used.

⚠ This review is compiled from public sources and does not constitute a purchase recommendation. Verify all facts on the site-bench.com official site.

About this entry

site-bench.com is a United States-based Dev Tools provider. TG4G tracks its product information, rates it 7.0/10 overall, and rates its China accessibility as "China direct-connect friendly".