Hello everyone,

Has anyone considered building a platform similar to the Internet Archive, but based on ActivityPub?

This could serve as a decentralized network to document, preserve, and protect online content from loss, censorship, and other threats, ensuring its availability for future generations.

For those unfamiliar, the Internet Archive is a non-profit that has been preserving digital media and promoting universal access to knowledge since 1996.

It’s famous for services like the Wayback Machine and Archive-It.

Given the importance of preserving digital heritage, especially in the context of censorship and data loss, a Fediverse-based equivalent could fill a crucial role.

The decentralized nature of ActivityPub could provide a robust alternative to centralized solutions.

I’d love to see this kind of project come to life, but, unfortunately, I lack the motivation, time, and energy to take it on alone.

Has anyone else ever considered something similar?

Are there any existing projects that might be interested in this direction?


Link: Internet Archive (Wikipedia)

  • 9point6@lemmy.world

    ActivityPub seems like the wrong tool for this job.

    What you're really looking for is a decentralised, distributed file system or object store as the base for this.

    And it's going to require a lot of participants in the network to reach the storage capacity and redundancy it needs to function well.

  • asudox@lemmy.asudox.dev

    IPFS? I assure you, no individual here can afford to host even a single copy of the whole Internet Archive.

  • haverholm@kbin.earth

    As I understand it, ActivityPub is fine for various forms of decentralised communication, but what you're suggesting sounds more like a general peer-to-peer network or distributed file storage (see DAT or IPFS).

  • AbouBenAdhem@lemmy.world

    The issue I see is ensuring that a distributed archive is comprehensive. How do you know what’s missing and needs to be added unless there’s a central coordinating process aware of what everyone already has?
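
    To make the problem concrete, here is a toy sketch of such a coordinating process, assuming every node honestly reports the set of URLs it stores. The function name `find_gaps`, the `min_copies` threshold, and the node names and URLs are all hypothetical. The index counts how many nodes hold each wanted URL and lists anything below the threshold; note that this index is itself the central piece the question is about.

```python
from collections import defaultdict


def find_gaps(wanted, holdings, min_copies=2):
    """Return the URLs in `wanted` held by fewer than `min_copies` nodes.

    wanted:   iterable of URLs the archive aims to cover (hypothetical seed list)
    holdings: mapping of node name -> set of URLs that node reports storing
    """
    copies = defaultdict(int)
    for urls in holdings.values():
        for url in urls:
            copies[url] += 1
    # Anything never reported has a count of 0 and is therefore a gap.
    return sorted(url for url in set(wanted) if copies[url] < min_copies)


if __name__ == "__main__":
    holdings = {  # hypothetical nodes and their reported holdings
        "node-a.example": {"https://example.org/", "https://example.net/"},
        "node-b.example": {"https://example.org/"},
    }
    wanted = ["https://example.org/", "https://example.net/", "https://example.com/"]
    print(find_gaps(wanted, holdings))  # ['https://example.com/', 'https://example.net/']
```

    With the sample data, example.net has one copy and example.com has none, so both are reported as gaps.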

    • floofloof@lemmy.ca

      There are distributed filesystems with redundancy, but the last time I tried something like that, it was extremely slow for both reading and writing. For an offline archive it might be feasible, but you'd have to build in a lot of redundancy and error correction to be sure you didn't lose chunks. Plus, the Internet Archive is so big that even with the data distributed, each participant might have to store a prohibitively large amount.
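
      A back-of-the-envelope sketch of the per-participant burden, assuming data is spread evenly across nodes. The helper names and all figures are hypothetical placeholders, not measurements of the Internet Archive; the point is that redundancy multiplies the total, and only more participants divide it back down. Erasure coding with k data shards and m parity shards costs (k + m) / k times the raw size, which is typically cheaper than keeping several full copies.

```python
def erasure_overhead(k: int, m: int) -> float:
    """Storage multiplier for an erasure code with k data and m parity shards."""
    return (k + m) / k


def per_node_storage_tb(total_tb: float, nodes: int, overhead: float) -> float:
    """Terabytes each node must hold if total_tb * overhead is split evenly."""
    if nodes <= 0:
        raise ValueError("need at least one node")
    return total_tb * overhead / nodes


if __name__ == "__main__":
    total_tb = 100_000  # hypothetical archive size (100 PB), a placeholder only
    nodes = 10_000      # hypothetical number of participants

    # Three full replicas of everything.
    print(f"{per_node_storage_tb(total_tb, nodes, 3.0):.1f} TB per node")  # 30.0 TB per node
    # 10 data + 4 parity shards; an MDS code such as Reed-Solomon tolerates any 4 lost shards.
    print(f"{per_node_storage_tb(total_tb, nodes, erasure_overhead(10, 4)):.1f} TB per node")  # 14.0 TB per node
```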

      • floofloof@lemmy.ca

        Is that like the usual blockchains where every computer has to store a complete copy? That would get huge with the Internet Archive.

        • asudox@lemmy.asudox.dev

          No, just some metadata:

          Filecoin is an open protocol and uses a blockchain to record participation in the network.

  • Mike Wooskey@lemmy.thewooskeys.com

    This is well beyond my skillset (or knowledge level), but something like ArchiveBox combined with ActivityPub might be able to distribute the work of archiving the internet, with each instance sharing with the fediverse what it has archived.
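
    A rough sketch of what "each instance sharing what it has archived" could look like on the wire, assuming an instance announces each snapshot as an ActivityStreams 2.0 `Create` activity wrapping a `Page`. This is not presented as an existing ArchiveBox feature; the function name `snapshot_activity`, the example URLs, and the use of the Memento (RFC 7089) `original` and `memento` link relations to tie the live page to the stored copy are all assumptions. Delivering the activity to followers' inboxes (with signed HTTP requests) is not shown.

```python
import json
from datetime import datetime, timezone


def snapshot_activity(actor, original_url, snapshot_url, archived_at=None):
    """Build a hypothetical ActivityStreams 2.0 "Create" activity announcing
    that `actor` has archived `original_url` and stores it at `snapshot_url`."""
    archived_at = archived_at or datetime.now(timezone.utc)
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Create",
        "actor": actor,
        "published": archived_at.isoformat(),
        "object": {
            "type": "Page",
            "attributedTo": actor,
            "name": f"Archived copy of {original_url}",
            # Memento (RFC 7089) link relations: "original" is the live page,
            # "memento" is this instance's stored snapshot of it.
            "url": [
                {"type": "Link", "href": original_url, "rel": ["original"]},
                {"type": "Link", "href": snapshot_url, "rel": ["memento"]},
            ],
        },
    }


if __name__ == "__main__":
    activity = snapshot_activity(
        actor="https://archive.example/actor",          # hypothetical instance actor
        original_url="https://example.org/",
        snapshot_url="https://archive.example/snapshots/1",  # hypothetical path
        archived_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
    )
    print(json.dumps(activity, indent=2))
```

    Other instances receiving these activities could build a shared index of who holds what, which is roughly the coordination problem raised earlier in the thread, just solved by gossip instead of a central server.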