New initiative enhances AI access to Wikipedia information

By: Bitget-RWA | 2025/10/01 13:25

On Wednesday, Wikimedia Deutschland revealed a new database designed to make Wikipedia’s extensive information more easily available to AI systems.

Called the Wikidata Embedding Project, the system applies vector-based semantic search, a technique that helps computers capture the meanings of words and the relationships between them, to the existing data on Wikipedia and its sister sites, which together hold close to 120 million records.
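To illustrate what vector-based semantic search means in practice, here is a minimal sketch, not the project's actual implementation. It assumes each term has already been mapped to an embedding vector; the three-dimensional vectors and the `semantic_search` helper below are hypothetical and exist only for illustration. Terms with related meanings point in similar directions, so ranking by cosine similarity surfaces them first.

```python
import math

# Hypothetical toy embeddings. Real systems use learned vectors with
# hundreds of dimensions produced by an embedding model.
TOY_EMBEDDINGS = {
    "scientist": [0.9, 0.8, 0.1],
    "researcher": [0.85, 0.75, 0.2],
    "scholar": [0.7, 0.9, 0.15],
    "banana": [0.1, 0.05, 0.95],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query, embeddings, top_k=2):
    """Rank every other term by similarity to the query term's vector."""
    query_vec = embeddings[query]
    scored = [
        (term, cosine_similarity(query_vec, vec))
        for term, vec in embeddings.items()
        if term != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


results = semantic_search("scientist", TOY_EMBEDDINGS)
for term, score in results:
    print(f"{term}: {score:.3f}")
```

With these toy vectors, "researcher" and "scholar" rank well ahead of the unrelated "banana", even though none of the terms share a keyword with the query.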

The project also supports the Model Context Protocol (MCP), a standard that helps AI systems communicate with data sources, which makes the data more accessible to natural language queries from LLMs.

Wikimedia’s German branch developed the project in collaboration with neural search company Jina.AI and DataStax, a real-time training-data company owned by IBM.

For years, Wikidata has provided machine-readable information from Wikimedia sites, but previous tools only supported keyword searches and SPARQL, a specialized query language. The updated system is better suited for retrieval-augmented generation (RAG) setups, which let AI models incorporate external knowledge, giving developers the ability to anchor their models in content reviewed by Wikipedia editors.
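The retrieval-augmented generation pattern described above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the project's API: the `retrieve` function is a hypothetical stand-in that scores passages by word overlap (a real setup would call a vector search service instead), and the passages are statements taken from this article. The key step is that retrieved passages are placed into the prompt so the model answers from them.

```python
import re

# Example passages, taken from statements in this article.
DOCUMENTS = [
    "The Wikidata Embedding Project applies vector-based semantic search to Wikidata.",
    "Wikipedia and its related sites hold close to 120 million records.",
    "The project was developed with Jina.AI and DataStax.",
]


def tokenize(text):
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question, documents, top_k=1):
    """Hypothetical stand-in retriever: rank passages by shared words.

    A production RAG system would query a vector index here instead.
    """
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, passages):
    """Place the retrieved passages ahead of the question for the model."""
    context = "\n".join(f"- {passage}" for passage in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )


question = "How many records do Wikipedia and its related sites hold?"
passages = retrieve(question, DOCUMENTS)
prompt = build_prompt(question, passages)
print(prompt)
```

The resulting prompt would then be sent to whichever language model the developer uses; that call is omitted here because it depends on the provider.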

The data is organized to deliver essential semantic context. For example, searching for “scientist” in the database will yield lists of notable nuclear scientists, scientists affiliated with Bell Labs, translations of “scientist” in various languages, an approved Wikimedia image of scientists at work, and related terms like “researcher” and “scholar.”

Anyone can access the database on Toolforge. Additionally, Wikidata will host a webinar for developers interested in the project on October 9th.

The initiative arrives as AI developers are scrambling for reliable, high-quality data to fine-tune their models. Training environments have become more sophisticated, and are often assembled as complex systems rather than simple datasets, but they still depend on carefully curated information. For applications that demand high accuracy, trustworthy data is essential. Wikipedia has its critics, but its content is significantly more fact-oriented than catchall datasets like Common Crawl, a massive collection of web pages scraped from across the internet.

Sometimes, the pursuit of top-tier data can be costly for AI companies. For instance, in August, Anthropic agreed to pay $1.5 billion to settle a lawsuit with a group of authors whose works were used for training, resolving all related claims.

In a statement to the media, Wikidata AI project manager Philippe Saadé highlighted the project’s independence from major tech firms or leading AI labs. “The launch of this Embedding Project demonstrates that advanced AI doesn’t need to be dominated by a few corporations,” Saadé said. “It can be open, collaborative, and designed to benefit everyone.”


