FUTO Releases 1M Open-Source Swipe Typing Dataset

FUTO has released a dataset of over one million QWERTY English swipes, available on HuggingFace under the MIT license. Collection began in August 2024 on a voluntary, mobile-based platform where users swiped Wikipedia-sourced sentences word by word. After filtering for quality, the final dataset was released in March 2025. The release is intended to improve swipe typing models and to serve as a benchmark for evaluating different swipe typing systems. FUTO has used the data to train and evaluate its own models, and the permissive license lets other developers do the same for their own mobile keyboards.

Key Takeaways

Massive Scale: The dataset contains over 1 million high-quality QWERTY English swipes collected from voluntary users.
Open Source Accessibility: Released under the MIT license, the data is freely available on HuggingFace for developers and researchers.
Rigorous Methodology: Data was collected word-by-word using Wikipedia sentences and underwent a filtering process to ensure quality.
Practical Application: FUTO has already utilized this dataset to train its own models and evaluate various swipe typing systems.
Timeline: The project spanned from initial collection in August 2024 to the public release in March 2025.

In-Depth Analysis

The Lifecycle of the FUTO Swipe Dataset

The FUTO Swipe dataset was developed in several stages. The initiative began in August 2024 with the launch of a dedicated collection domain, swipe.futo.org, designed for mobile users to contribute QWERTY English swipes. The process was built on user consent: participants were given detailed instructions and information about the dataset before agreeing to contribute, so every swipe in the collection comes from an informed volunteer.

Between the start of collection and the eventual release, the project focused on gathering a diverse range of inputs. Users were presented with sentences primarily sourced from Wikipedia, which provided a broad vocabulary and varied sentence structures. The specific instruction to swipe "word-by-word" allowed for a more granular and accurate mapping of swipe gestures to specific English words. By March 2025, the effort had resulted in over 1 million swipes, which were then subjected to a filtering process. This quality control phase was essential to remove low-quality or erroneous swipes, ensuring that the final dataset would be a reliable resource for machine learning applications.
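The source does not describe the filtering criteria FUTO applied, so the following is only a minimal sketch of what a quality filter for swipe records could look like. The `Swipe` record layout, the `is_plausible` checks, and the `min_points` threshold are all assumptions made for illustration, not FUTO's actual method or schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Swipe:
    # Hypothetical record layout; the real dataset schema may differ.
    word: str                                   # word the user was asked to swipe
    points: List[Tuple[float, float, float]]    # (x, y, t_ms) touch samples


def is_plausible(swipe: Swipe, min_points: int = 3) -> bool:
    """Illustrative checks only: a non-empty target word, enough samples,
    and timestamps that never go backwards."""
    if not swipe.word.strip():
        return False
    if len(swipe.points) < min_points:
        return False
    times = [t for _, _, t in swipe.points]
    return all(a <= b for a, b in zip(times, times[1:]))


def filter_swipes(swipes: List[Swipe]) -> List[Swipe]:
    """Keep only the swipes that pass every illustrative check."""
    return [s for s in swipes if is_plausible(s)]
```

A real filter would likely also look at the geometry of the path relative to the keyboard layout; the checks here only catch structurally broken records.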

Methodology and Data Integrity

FUTO's methodology prioritizes data integrity and practical utility. By using a web-based mobile interface, FUTO was able to capture swipes in a naturalistic environment: on the actual devices where swipe typing is used. Choosing Wikipedia as the primary text source gave the dataset a wide range of common and technical English terms, which should help models trained on it hold up in general-purpose typing.

The decision to release the dataset under the MIT license is a significant move for the open-source community. By hosting more than 1 million swipes on HuggingFace, FUTO has made the data easily accessible to researchers worldwide. This accessibility matters for the advancement of mobile input systems because it allows multiple parties to evaluate different swipe typing architectures against the same benchmark. FUTO's own use of the data to train and evaluate its models shows that the dataset is usable in practice for improving gesture-based text entry.

Industry Impact

The release of the FUTO Swipe dataset has several implications for the AI and mobile technology industries. First, it addresses a common bottleneck in the development of mobile keyboards: the lack of large-scale, open-source gesture data. While proprietary datasets exist, the availability of a 1-million-swipe dataset under the MIT license levels the playing field for independent developers and smaller tech firms.

Furthermore, the dataset provides a standardized way to evaluate swipe typing systems. When different systems are scored against the same shared data, with evaluation swipes kept separate from any swipes used for training, their results become directly comparable. This transparency can lead to faster iterations and improvements in swipe typing accuracy, speed, and user experience. FUTO's contribution reinforces the importance of open data in driving innovation within the niche but essential field of mobile human-computer interaction.
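One way to make such comparisons concrete is to fix a held-out split and score every system on it with the same metric. The sketch below is an assumption-laden illustration, not a benchmark protocol published by FUTO: `is_test_example`, `top1_accuracy`, the 10% split size, and the `(path, word)` example shape are all hypothetical.

```python
import hashlib
from typing import Callable, Iterable, Sequence, Tuple

Path = Sequence[Tuple[float, float]]   # assumed: a swipe as (x, y) points
Example = Tuple[Path, str]             # assumed: (swipe path, target word)


def is_test_example(key: str, test_percent: int = 10) -> bool:
    """Assign a stable identifier to the test split by hashing it, so the
    split is identical on every machine and every run."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < test_percent


def top1_accuracy(decoder: Callable[[Path], str],
                  examples: Iterable[Example]) -> float:
    """Fraction of swipes whose decoded word matches the target word
    (case-insensitive). Returns 0.0 for an empty evaluation set."""
    total = 0
    correct = 0
    for path, word in examples:
        total += 1
        if decoder(path).lower() == word.lower():
            correct += 1
    return correct / total if total else 0.0
```

The key passed to `is_test_example` could be the target word or, if the dataset exposes one, a contributor identifier; hashing on a contributor-level key would keep one person's swipes from landing in both splits.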

Frequently Asked Questions

Question: What is the licensing for the FUTO Swipe dataset?

The dataset is released under the MIT license, which allows for broad use, modification, and distribution in both open-source and commercial projects.

Question: Where can developers access the dataset?

The dataset of over 1 million swipes is available for download on HuggingFace, which makes it straightforward to integrate into existing machine learning workflows.
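As a rough sketch of what that integration might look like, the snippet below streams a dataset with the HuggingFace `datasets` library and counts the most frequent target words. The repository id, the `train` split name, and the `word` field are assumptions that should be checked against the dataset page before use.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Tuple

# Assumption: placeholder repository id. Verify it on the HuggingFace dataset page.
REPO_ID = "futo-org/swipe.futo.org"


def load_swipes(repo_id: str = REPO_ID, split: str = "train"):
    """Stream records instead of downloading everything up front.
    Requires `pip install datasets`; the split name is an assumption."""
    from datasets import load_dataset  # imported lazily so the helper below has no dependency
    return load_dataset(repo_id, split=split, streaming=True)


def most_common_words(records: Iterable[Mapping[str, object]],
                      field: str = "word",
                      n: int = 5) -> List[Tuple[str, int]]:
    """Count target words across records; `field` is a hypothetical column name."""
    counts = Counter(str(record[field]).lower() for record in records)
    return counts.most_common(n)
```

Under those assumptions, `most_common_words(load_swipes())` would scan the whole stream, so wrapping the stream in `itertools.islice` is a sensible way to inspect only the first few thousand records.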

Question: How was the data quality ensured during collection?

FUTO applied a filtering process after the initial collection phase to remove a small set of low-quality swipes, so that the final set of over 1 million swipes is suitable for training and evaluation.
