Local Byte Fusion for Neural Machine Translation

By: Netmind Blog
2024/08/07 08:24
Figure (1): n-gram Convolution Fusion (nCF)

Together with researchers from the University of Wisconsin-Madison, we propose a tokenization method for byte-based machine translation that we call Local Byte Fusion (LOBEF). The method outperforms competing techniques, particularly on multilingual translation.

Subword tokenization, the currently dominant technique, has limitations on multilingual corpora. LOBEF instead uses byte n-grams and word boundaries to aggregate local semantic information. Because it operates on bytes, it applies one universal tokenization scheme to every language, which gives it an advantage on multilingual corpora. In our experiments, LOBEF outperforms traditional byte-based models and subword techniques on multilingual translation, zero-shot cross-lingual transfer, and domain adaptation.
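To make the contrast with subword vocabularies concrete, here is a minimal Python sketch of byte-level input preparation. It assumes UTF-8 encoding and treats whitespace as the word boundary. `byte_tokenize` is a hypothetical helper, not part of the LOBEF release, and whitespace splitting is only an assumption: it does not hold for scripts written without spaces.

```python
def byte_tokenize(text: str) -> tuple[list[int], list[int]]:
    """Split text into UTF-8 byte ids (a 256-symbol vocabulary for every language)
    and, for each byte, the index of the word it belongs to.

    Assumption: words are whitespace-delimited; whitespace bytes get word id -1.
    """
    byte_ids: list[int] = []
    word_ids: list[int] = []
    word = -1          # index of the current word
    in_word = False    # whether the previous character belonged to a word
    for ch in text:
        encoded = ch.encode("utf-8")  # 1 to 4 bytes per character
        if ch.isspace():
            in_word = False
            wid = -1
        else:
            if not in_word:
                word += 1
                in_word = True
            wid = word
        byte_ids.extend(encoded)              # iterating over bytes yields ints 0..255
        word_ids.extend([wid] * len(encoded))
    return byte_ids, word_ids


ids, words = byte_tokenize("hi мир")
print(ids)    # [104, 105, 32, 208, 188, 208, 184, 209, 128]
print(words)  # [0, 0, -1, 1, 1, 1, 1, 1, 1]
```

The Cyrillic word needs two bytes per character, but it is drawn from the same 256-symbol vocabulary as the Latin one, so no language-specific subword vocabulary is needed. The per-byte word ids are the kind of boundary information a word-level fusion step can consume.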

LOBEF consists of two components: n-gram Convolution Fusion (nCF) and Word-based Self-attention Fusion (WSF). nCF uses four convolutional layers to aggregate character-level information, while WSF uses word boundaries with block-wise self-attention to aggregate word-level information. Our results show that byte-based models outperform subword baselines on the Flores-101 dataset. Byte-based models are also smaller than comparable subword models and train 20% faster.
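The n-gram convolution idea can be illustrated with a minimal pure-Python sketch. The post only states that nCF uses four convolutional layers; the kernel widths 1 to 4 (one per n-gram order), the zero padding that preserves sequence length, the averaging of the four outputs, and the random untrained weights are all assumptions made for illustration, and `conv1d_same`, `ngram_conv_fusion`, and `make_convs` are hypothetical names. Since a Cyrillic character spans two bytes, a short convolution window is enough to recombine such bytes into a character-like unit.

```python
import random

Vector = list[float]


def conv1d_same(x: list[Vector], weights: list[list[Vector]], bias: Vector) -> list[Vector]:
    """1-D convolution over sequence positions with zero padding.

    x: seq_len x d_in, weights: width x d_out x d_in, bias: d_out.
    The output has the same length as x, so each position summarizes a local n-gram.
    """
    width = len(weights)
    left = (width - 1) // 2  # positions taken from the left; the rest come from the right
    out: list[Vector] = []
    for t in range(len(x)):
        y = list(bias)
        for j in range(width):
            pos = t - left + j
            if 0 <= pos < len(x):  # positions outside the sequence act as zeros
                for o in range(len(y)):
                    y[o] += sum(w * v for w, v in zip(weights[j][o], x[pos]))
        out.append(y)
    return out


def ngram_conv_fusion(x: list[Vector], convs: list[tuple[list[list[Vector]], Vector]]) -> list[Vector]:
    """Run one convolution per n-gram order and average the outputs position-wise (assumed)."""
    outputs = [conv1d_same(x, w, b) for w, b in convs]
    return [
        [sum(vals) / len(outputs) for vals in zip(*(o[t] for o in outputs))]
        for t in range(len(x))
    ]


def make_convs(d_model: int, orders=(1, 2, 3, 4), seed: int = 0):
    """Random (untrained) kernels, one per n-gram order; for illustration only."""
    rng = random.Random(seed)
    convs = []
    for n in orders:
        w = [[[rng.gauss(0.0, 0.1) for _ in range(d_model)] for _ in range(d_model)] for _ in range(n)]
        convs.append((w, [0.0] * d_model))
    return convs


# Toy usage: embed 9 byte ids into 4-dim vectors, then fuse local n-grams.
d_model = 4
rng = random.Random(1)
embedding = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(256)]
byte_ids = list("hi мир".encode("utf-8"))
x = [embedding[b] for b in byte_ids]
fused = ngram_conv_fusion(x, make_convs(d_model))
print(len(fused), len(fused[0]))  # 9 4
```

Keeping the sequence length unchanged makes the sketch easy to check; a real implementation would use a tensor library and trained weights, and may combine or downsample the outputs differently.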

Figure (2): Word-based Self-attention Fusion (WSF)
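Word-based fusion can be sketched as self-attention with a block-diagonal mask: a byte may attend only to other bytes of the same word. This minimal Python version assumes per-byte word ids are already available (whitespace bytes carry id -1 and attend only to themselves) and, to stay short, omits the learned query/key/value projections and multiple heads a real layer would have. `block_self_attention` is a hypothetical name, not the LOBEF code.

```python
import math

Vector = list[float]


def block_self_attention(x: list[Vector], word_ids: list[int]) -> list[Vector]:
    """Scaled dot-product self-attention restricted to word blocks.

    Position i attends to position j only if word_ids[i] == word_ids[j];
    positions with word id -1 (e.g. whitespace) form singleton blocks.
    Queries, keys and values are x itself (no learned projections).
    """
    d = len(x[0])
    scale = 1.0 / math.sqrt(d)
    out: list[Vector] = []
    for i, xi in enumerate(x):
        if word_ids[i] < 0:
            block = [i]
        else:
            block = [j for j, w in enumerate(word_ids) if w == word_ids[i]]
        scores = [scale * sum(a * b for a, b in zip(xi, x[j])) for j in block]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w * x[j][k] for w, j in zip(weights, block)) / z for k in range(d)])
    return out


x = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [2.0, -1.0]]
word_ids = [0, 0, -1, 1]
y = block_self_attention(x, word_ids)
```

Because attention never crosses a word boundary, each byte's output mixes only information from its own word, which is how the block-wise mask turns byte representations into word-level ones in this sketch.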
NetMind.Ai © 2021-2024
