nyuuzyou 
posted an update Mar 26
📚 Archive of Our Own (AO3) Dataset - nyuuzyou/archiveofourown

Collection of approximately 12.6 million fanfiction works (from 63.2M processed IDs) featuring:
- Full text content from diverse fandoms across television, film, books, anime, and more
- Comprehensive metadata including warnings, relationships, characters, and tags
- Multilingual content with works in 40+ languages, though English predominates
- Rich classification data preserving author-created folksonomy and content categorization

P.S. This is the most expensive dataset I've created so far! And also, thank you all for the 100 followers on Hugging Face!
deleted
This comment has been hidden (marked as Abuse)

you're welcome

what the actual fuck is this. the other comment is right.

This comment has been hidden

In creating this dataset, you are essentially re-uploading the entire contents of Ao3 to another website.

The official Ao3 terms of service state the following: "AO3 maintains that fanworks are transformative and that a fanwork's creator owns the rights to the expressions in their work that are unique to them. A fanwork creator holds the rights to their own content, just the same as any professional author, artist, or other creator." (https://archiveofourown.org/tos_faq#fanwork_copyright).

I am not a part of the generative AI community, but I understand this project may have taken you a long time to complete. I understand you may be proud of your accomplishment. I don't know who you are. I don't know what circumstances led you to compile this data. I have no grounds on which to judge your character, nor will I try to. I only ask you take note of the legal implications of this dataset, and act accordingly.

To the people threatening violence, I understand your frustration but this is not the way to be taken seriously.

I speak for no one but myself and I have linked my own data removal request here: https://huggingface.co./datasets/nyuuzyou/archiveofourown/discussions/135.

Sending threats to the uploader is not the way to go, but what the uploader did is both legally questionable and disgusting.

If any other NLP and ML engineers are reading this and are siding with the uploader... please take a good look at your life choices. Just because you can, doesn’t mean you should. This isn't some heroic act of open-source principles or “sticking it to the man”; it's punching down. You're in one of the cushiest, best-paid professions, you have everything, and you still want more.

This comment has been hidden

Sad that people lack creativity and joy so much they think stealing is okay

you're welcome

You’re awful and I have all the time in the world. Ao3 is a site we donate to so we can have a place where we legally own our fic. Take this down or we will keep DMCA-ing you.

https://huggingface.co./datasets/nyuuzyou/archiveofourown/discussions/2#680b5fa727163e0c419e4045

The OTW are the only ones making money off of you. If a non-profit organization acts against its own principles, there are big questions about the rest of its activities, including financial ones.

This comment has been hidden (marked as Abuse)

you're welcome

This comment has been hidden

You're probably 12 and mommy didn't let you have the ice cream you wanted, that's why you're here throwing insults like a dumb ass toddler.

wtf is wrong with u stop stealing other peoples shit