What is claw-mark.com?

claw-mark.com is a United States-based AI Apps provider. It is useful for tracking Agent evaluations and monitoring model capabilities.

Is claw-mark.com good? Is it worth it?

claw-mark.com scores 7.0/10 on TG4G, a solid rating. The provider is based in the United States. See the in-depth review below for pros, cons, and China accessibility.

Is claw-mark.com usable in China?

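The claw-mark.com website offers good direct-connect performance in mainland China and works in most regions without a proxy. The provider is headquartered in the United States and primarily serves overseas markets. Running the benchmark itself depends on other services, so see the in-depth review below for caveats.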

How do I sign up for claw-mark.com?

Visit the claw-mark.com official site to complete sign-up. Registration typically requires an email (Gmail/Outlook recommended) and a payment method. Most overseas services accept credit card / PayPal / crypto. See the "Visit Official Site" button on this page for the direct link.

🤖 AI Apps 📍 HQ: United States

claw-mark.com

Name: claw-mark.com
Brand: claw-mark.com
Rating: 7.0 (1 review)

Overall Rating

★★★⯨☆ 7.0/10

China Access

★★★ China direct-connect friendly

Data source

ai_crawl · Last updated 2026-06-07

⚡ Score breakdown

5-dimension weighted score, each dimension out of 10

Performance (weight 25%): 7.0

Value (weight 20%): 7.0

China access (weight 20%): 10.0

Reputation (weight 20%): 6.0

Support (weight 15%): 6.5

Dimension scores are derived from public data and listing fields, then weighted into the composite score. For reference only.
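As a rough check on how the five dimensions could combine, here is a minimal sketch that assumes the composite is a plain weighted mean of the listed scores. That formula is an assumption; the page does not document TG4G's actual method, and the names DIMENSIONS and weighted_composite are made up for illustration.

```python
# Weights and scores copied from the score breakdown on this page.
DIMENSIONS = {
    "Performance": (0.25, 7.0),
    "Value": (0.20, 7.0),
    "China access": (0.20, 10.0),
    "Reputation": (0.20, 6.0),
    "Support": (0.15, 6.5),
}


def weighted_composite(dimensions):
    """Plain weighted mean (assumed formula, not TG4G's documented method)."""
    total_weight = sum(weight for weight, _ in dimensions.values())
    weighted_sum = sum(weight * score for weight, score in dimensions.values())
    return weighted_sum / total_weight


composite = weighted_composite(DIMENSIONS)
print(f"Weighted mean of listed dimensions: {composite:.2f}")
```

Under that assumption the listed values come to roughly 7.3 rather than the published 7.0, so the composite presumably involves rounding or adjustments that the page does not describe.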

Editorial Highlights

Useful for tracking Agent evaluations and monitoring model capabilities.

In-Depth Review

TG4G Review · 2026-06-07 · For reference only

What It Is

ClawMark is a benchmark for AI coworker agents, launched by Evolvent AI/ClawMark Team. Its goal is not to provide a ready-to-use office Agent product, but to systematically evaluate models' long-horizon, multi-tool, multimodal collaboration capabilities in real-world workflows. It includes 100 tasks across 13 professional domains. Each task simulates 1–3 workdays and covers scenarios such as research, content operations, HR, ecommerce, news, product management, and insurance.

Core Capabilities

Its core design is the “omni setting”: an Agent must switch between environments such as the file system, email, Notion mock, Google Sheets mock, and Calendar, while processing raw evidence including screenshots, photos, PDFs, CSVs, audio, and video. Tasks also include implicit state changes over time, such as new emails arriving, database updates, and calendar adjustments, testing whether a model can proactively refresh external state and make decisions continuously. Scoring is based entirely on 10–25 deterministic Python checkers, with no LLM-as-judge involved, so results are reproducible. It can also output per-checker pass/fail results, messages.jsonl, and the final workspace, making it easier to identify why a run failed.
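To illustrate what deterministic, rule-based checking with per-checker pass/fail output can look like, here is a minimal sketch. It is not ClawMark's actual checker API: the CheckResult type, the checker functions, and the report.csv file are all hypothetical names chosen for this example.

```python
import csv
import tempfile
from pathlib import Path
from typing import NamedTuple


class CheckResult(NamedTuple):
    name: str
    passed: bool
    detail: str


def check_report_exists(workspace: Path) -> CheckResult:
    """Pass if the agent left a report.csv in the final workspace."""
    ok = (workspace / "report.csv").is_file()
    return CheckResult("report_exists", ok, "" if ok else "report.csv is missing")


def check_report_has_total_row(workspace: Path) -> CheckResult:
    """Pass if report.csv contains a row whose 'item' column is TOTAL."""
    path = workspace / "report.csv"
    if not path.is_file():
        return CheckResult("report_has_total_row", False, "report.csv is missing")
    with path.open(newline="") as handle:
        rows = list(csv.DictReader(handle))
    ok = any(row.get("item") == "TOTAL" for row in rows)
    return CheckResult("report_has_total_row", ok, "" if ok else "no TOTAL row")


# Hypothetical checker list; the same workspace always yields the same results.
CHECKERS = [check_report_exists, check_report_has_total_row]


def run_checkers(workspace: Path) -> list[CheckResult]:
    return [checker(workspace) for checker in CHECKERS]


# Demo against a throwaway workspace standing in for an agent's final output.
with tempfile.TemporaryDirectory() as tmp:
    demo_workspace = Path(tmp)
    (demo_workspace / "report.csv").write_text("item,amount\nwidgets,40\nTOTAL,40\n")
    results = run_checkers(demo_workspace)

for result in results:
    print(result.name, "PASS" if result.passed else f"FAIL ({result.detail})")
```

Because each checker only inspects files in the workspace, rerunning it on the same output gives the same verdict, which is the property that makes rule-based scoring reproducible without an LLM judge.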

Models and Pricing

The page shows benchmark rankings, token usage, and estimated costs for GPT-5.4, Claude 4.6 Sonnet, Qwen 3.6 Plus, Gemini 3.1 Pro Preview, and MiniMax M2.7. ClawMark itself does not disclose any pricing. The project can be cloned from GitHub and run locally with Docker, but a full evaluation requires bringing your own model API keys and paying for model usage. The costs listed on the page range from about $53 for MiniMax M2.7 to about $946 for Claude 4.6 Sonnet.

Pros and Cons

Its strengths are that the tasks closely resemble real office work, cover multimodal inputs and cross-tool collaboration, and use rule-based scoring that is more stable than subjective review. It also provides a clear project structure, Quick Start, environment configuration, and smoke tests, making it suitable for reproducible experiments. The drawbacks are that it is not an out-of-the-box tool for ordinary users: it requires configuring Docker, uv, model APIs, and Notion and Google Sheets credentials. Full runs are not cheap, and the page does not explain commercial support, privacy compliance, or SLA.

Who It’s For and Access from China

ClawMark is better suited to AI Agent researchers, model evaluation teams, enterprise AI application labs, and framework developers. It can be used for model comparisons, validation of complex office automation, and regression testing. The page does not provide clear information about access from China. Because it depends on GitHub, some overseas model APIs, Notion/Google services, and similar resources, actual usability may be affected by network and payment conditions. Alternative or comparative benchmarks to watch include GAIA, WebArena, OSWorld, AgentBench, SWE-bench, and τ-bench.

⚠ This review is compiled from public sources and does not constitute a purchase recommendation. Verify all facts on the claw-mark.com official site.

About this entry

claw-mark.com is a United States-based AI Apps provider. TG4G tracks its product information, its overall rating of 7.0/10, and its China-accessibility rating ("China direct-connect friendly"). Click "Visit Official Site" to reach claw-mark.com directly.