2019-12-03: We regret to inform users that the FoolFuuka/Asagi archiver stack used by Rbt.asia, Desuarchive, Archived.moe, and others is suffering from reduced accuracy and missing posts when scraping 4chan. New, more efficient, and more accurate scrapers are in development, but they need assistance and testing from C# .NET and MySQL/Percona developers before they are ready for use. On behalf of all archivers: unless FoolFuuka/Asagi is replaced, we will be unable to scrape properly under the strain of deep software inefficiency and unsustainable costs (as happened to Fireden). More details here.
Donations to our site would help to ensure a normal lifecycle replacement of drives in our RAID.
Please refrain from spamming the ghostposting system or it may not be around for long.

## Developer No.4019
BTW if anyone knows any 4chan admins, please tell them to unblock or increase the Cloudflare limits on the desuarchive/rbt.asia and archived.moe scrapers. The method used to hold us over for the past year does not seem to be working well anymore and it is failing to archive threads periodically.

For reference, these are the relevant actions taken by the archived.moe admins; we are also using the same temporary fix: https://archived.moe/talk/thread/14/#q141_200

This reveals that our time is up with the FoolFuuka/Asagi archiver stack. It has not had active development since the collapse of Archive.moe in 2016. It is too inefficient with bandwidth to function any further. It is not worth feeding both with additional RAM and resources when the stack is too inefficient with requests as well; it is already consuming 168GB of RAM as it is. It is only a matter of time before the other archivers meet this fate too.

This is an unexpected development, and a fix could take weeks if not months. The C# .NET developer of Hayden is working hard on it, with huge improvements in efficiency and request limit compliance, such as reducing RAM use to 100MB. But his time is limited, and we wish we could have bought some more time.

To anyone who can help: call up all C# .NET developers and skilled MySQL/Percona DBAs to try and bring the Hayden code up to scratch as a suitable drop-in Asagi replacement. Hayden is already demonstrably more efficient and accurate, but it is not fully tested and often hits deadlock issues. Once the scraper is replaced, we can also work together with Python developers to build a new frontend replacement for FoolFuuka as described in the previous thread. Using Hayden will allow us to consolidate our operations on s2.desuarchive.org instead of s1.desuarchive.org on a separate continent, immediately saving $90 a month and removing a Sword of Damocles by eliminating s1, which the previous admin (peace be upon his wrists) is likely to default on due to crippling medical bills.

Some download reliability issues still need testing before Hayden can be used in production, and we wish we had some more time before this happened.

https://github.com/bbepis/Hayden

In the future we would like to overhaul the FoolFuuka/Asagi stack with a total replacement for the sole use of Desuarchive, but a drop-in Asagi replacement is crucial, as the vast majority of other archiver admins such as archived.moe have lives to live and are unlikely to take up any SQL schema changes.

Feel free to drop by the chat to help brainstorm what can be done.

https://matrix.to/#/@bibanon-chat:matrix.org

https://discord.gg/phPHTEs

It is helpful to read our previous thread on the topic as well: https://desuarchive.org/desu/thread/3894/

EDIT: Modified to clarify that all of FoolFuuka/Asagi (the frontend, the backend, and the inefficient MySQL schema) is to blame for the situation.

This post was modified by Desuarchive Administrator on 2019-12-07

## Developer No.3894
## TL;DR Regarding the Absolute State of 4chan Archival

At Desuarchive we have long struggled with many issues and unsolved mysteries left by the previous admin (peace be upon his wrists), but we have now set up the archiver on a more stable footing and there is some development going on with the scraper at least, so things are looking up, as you may have seen this past year.

It is imperative for the survival of all 4chan archivers that Java-based Asagi is replaced (especially given the downfall of Fireden) and that its replacement significantly cuts both excessive HTTP requests and RAM usage while providing the same reliability and accuracy. If this is not done, then as 4chan grows all archivers will be in grave danger of dying under the strain of deep software inefficiency and unsustainable costs.
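One way to cut excessive HTTP requests, sketched here in Python: poll the board's thread list once and only re-fetch the threads whose `last_modified` timestamp changed since the previous poll. This is only an illustration, not how Asagi, eve, or Hayden actually work. The function name `changed_threads` and the sample data are made up, and the input is assumed to follow the shape of the 4chan API's `threads.json` (a list of pages, each holding threads with `no` and `last_modified`).

```python
from typing import Dict, List


def changed_threads(pages: List[dict], seen: Dict[int, int]) -> List[int]:
    """Return thread numbers that are new or modified since the last poll.

    `pages` is assumed to look like a parsed threads.json response.
    `seen` maps thread number -> last_modified from earlier polls and is
    updated in place, so unchanged threads are skipped next time.
    """
    changed = []
    for page in pages:
        for thread in page.get("threads", []):
            no = thread["no"]
            modified = thread["last_modified"]
            if seen.get(no, -1) < modified:
                changed.append(no)
                seen[no] = modified
    return changed


# Hypothetical sample data, not a real API response.
sample = [
    {"page": 1, "threads": [
        {"no": 100, "last_modified": 1575300000},
        {"no": 101, "last_modified": 1575300050},
    ]},
]
seen = {100: 1575300000}                      # thread 100 was already archived
first_poll = changed_threads(sample, seen)    # only thread 101 needs a fetch
second_poll = changed_threads(sample, seen)   # nothing changed, no fetches
```

A real scraper would still need to respect the API's rate limits and use conditional requests on top of this; the sketch only covers the "don't re-download what hasn't changed" part.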

Archives are not going to be sustainable as seen with Fireden if only one dude has to shoulder the weight of thousands of dollars of equipment and bandwidth usage. The next archiver on the deathwatch appears to be Warosu.

On behalf of all 4chan archives, we need your help with the two scrapers being developed. These scrapers are currently set to be Asagi compatible as a future drop-in replacement for all other archivers. That is no small feat: Asagi regularly uses 40-60GB of RAM at full load, while these could use as little as 30-150MB.

https://github.com/bibanon/eve - Python based scraper. We currently actively test it in production with /wsg/ scraping.

https://github.com/bbepis/Hayden - C# .NET light scraper, still needs testing for evaluation. But it's doing real great.

As such, we hope to be able to build a brand new archival stack based on these that dispenses with the inefficiencies of the scrapers of the past using PostgreSQL JSONB to store threads exactly as they are from the 4chan API (NoSQL style). While we are not frontend developers, we can sidestep this by building middleware to emit a 4chan compatible API, so that 4chan-X can be used as the JavaScript webapp and Android apps (Chanu, Clover) and iPhone apps could be modified with a few lines of code to work with the archive.
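To make the "store threads exactly as they are" idea concrete, here is a rough Python sketch. It is not the planned implementation: it uses the standard library's `sqlite3` with a TEXT column as a stand-in for a PostgreSQL JSONB column so that it runs anywhere, and the table layout and the helper names `open_store`, `save_thread`, and `emit_thread` are assumptions for illustration. The thread is assumed to follow the 4chan API shape of a `posts` list whose first entry is the OP.

```python
import json
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory store; in PostgreSQL `body` would be JSONB."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE thread ("
        " board TEXT, no INTEGER, body TEXT,"
        " PRIMARY KEY (board, no))"
    )
    return db


def save_thread(db: sqlite3.Connection, board: str, thread: dict) -> None:
    """Store the thread JSON unmodified, keyed by board and OP number."""
    no = thread["posts"][0]["no"]
    db.execute(
        "INSERT OR REPLACE INTO thread (board, no, body) VALUES (?, ?, ?)",
        (board, no, json.dumps(thread)),
    )


def emit_thread(db: sqlite3.Connection, board: str, no: int):
    """Return the stored thread in the same shape it was scraped in."""
    row = db.execute(
        "SELECT body FROM thread WHERE board = ? AND no = ?", (board, no)
    ).fetchone()
    return None if row is None else json.loads(row[0])


# Hypothetical thread, trimmed to a few fields.
scraped = {"posts": [
    {"no": 500, "resto": 0, "com": "OP text"},
    {"no": 501, "resto": 500, "com": "a reply"},
]}
db = open_store()
save_thread(db, "wsg", scraped)
served = emit_thread(db, "wsg", 500)
missing = emit_thread(db, "wsg", 999)
```

Because nothing is reshaped on the way in, the middleware has very little to do on the way out: whatever the scraper saved is what a 4chan-compatible client gets back.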

In support of research and onboarding for this effort, 4plebs has generously developed partial 4chan API compatibility for the FoolFuuka frontend, which is slowly being rolled out. This will allow Android and iPhone applications to view the FoolFuuka archivers (but not ghostpost yet). If you are a PHP developer, we need your help here.

https://github.com/pleebe/foolfuuka-plugin-fourchan-api

They also developed 4plebs X, which uses 4chan-X to function as a webapp frontend and may be able to utilize this 4chan API to replace the user-facing part of the PHP HHVM FoolFuuka stack with a familiar alternative. It has flaws such as the lack of search and ghostposting, but hopefully developers could step in regarding that.

https://github.com/pleebe/4plebs-x

Demo: https://test.4plebs.org (to use disable 4chan-X to avoid conflicts).

If you know any third party 4chan app devs, please refer them to us so we can direct them on how to set the proper configurations for their app to access FoolFuuka archives. (There was already an old FoolFuuka API, but it predates the 4chan API so it is not directly compatible; it is best to move off it.)

We are willing to provide support and troubleshooting for better understanding of FoolFuuka/Asagi instances for the construction of new ones or development of replacement scrapers, or if anyone wants to pick up the boards of Fireden. We have institutional knowledge and experience running many major archival websites gathered over 2 years, so don't hesitate to drop by.

https://matrix.to/#/@bibanon-chat:matrix.org

https://discord.gg/phPHTEs

Our guide could use some work but it will guide you there with some hiccups.

https://wiki.bibanon.org/FoolFuuka

## Regarding the Absolute State of Fireden and /v/ and /vg/ archival

Fireden is infamous in the community for never reaching out for help or advice, and never acting on anything other than abuse emails. I don't think they ever planned to operate for this long: they were set up on a whim in 2015 after archive.moe died, so they probably just had enough. It costs a lot to operate a site that can scrape /v/ and /vg/ images. But if the Fireden admin is reading this, be the prodigal son: we can provide any assistance or backup you need so that your hard work is not in vain.

The next archiver I expect to collapse under pressure is Warosu. As for us we are pretty stable after a $500 chassis upgrade and hot spare SSDs, but it really sucks to be one of the few people in the world who puts a large amount of capital into 4chan archival.

We refuse to pump more money in to bail out any more archivers for barely any returns. We have had to bail out 4 of them already and have paid $7000 to date out of pocket, plus $200 a month. Can't someone else pony up?

The best bet is for a large capital investment to be made in arch.b4k.co so it can be significantly upgraded to our standards to match the levels of Fireden, providing /vg/ scraping and full images for both. It will actually not cost too much to start out: maybe only 5x10TB drives for $700, $300 for a new case, maybe $600 for a new AMD Ryzen with 160GB of RAM for Asagi and MySQL, and $100 for colocation. We will probably never see the Fireden images again, so that saves a lot of space.
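Adding up the figures quoted above gives a rough starting budget. The snippet below only sums the numbers from this post; the dictionary labels are shorthand, and whether the $100 for colocation is a one-time or recurring charge is not stated, so it is simply counted once here.

```python
# Figures quoted in the post above, in US dollars.
parts = {
    "5x10TB drives": 700,
    "new case": 300,
    "AMD Ryzen with 160GB of RAM": 600,
    "colocation": 100,  # assumption: counted once, billing period not stated
}

total = sum(parts.values())
print(f"Estimated startup cost: ${total}")  # prints "Estimated startup cost: $1700"
```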

4plebs refuses to take on any more boards as they are barely able to handle the ones they have.

## Basic Details about the Maintenance Done

This weekend we managed to do a major case upgrade for $500 for our backend image server to allow it to host more services such as scrapers and frontend content. All SSDs were moved out of the internal bay and into hotswap bays, and a hot spare SSD for booting was added. Without those changes the server was really difficult to service, which made it hard to consider using it to host databases safely. It may be possible to attach at least 6 more 3.5" drives, which will be necessary as only 10TB of storage is available.

This may make it possible to halve what we currently spend on cloud servers and bandwidth by consolidating services onto a single server.

One drive with bad sectors was replaced safely for $150 and a ZFS resilver completed. The other drives do not appear to have issues, but we continue to monitor the situation.

Tests done with the bibanon/eve scraper for scraping /wsg/ have been extremely promising, though development is still ongoing to put it on par with the Asagi scraper. It is possible that any new deployment of the scraper will utilize either this or Hayden, but proper testing will still be necessary.

## Admin No.3026
Welcome to /desu/. Use this board to report issues, request features, and for other discussions regarding desuarchive.org & rbt.asia. Other posts will be removed.

When reporting a technical issue, be sure to include the full URL of the page/image.

Do not use this board for removal requests, which must be emailed to [email protected]. Other rule violations can be reported by clicking the "Report" button on the post.

No.4001
Having difficulty viewing images in this thread: https://desuarchive.org/aco/thread/971573/
Most of the images in the thread give a 404 Not Found error when opened. It doesn't seem to be purely a filesize issue (some of the largest images in the thread load fine), and the thumbnails are all visible.
Was this simply an issue with how the thread was archived in the first place, or is this a more recent image storage problem?

No.3970
Is there any way I could download the entirety of the /mlp/ archives hosted here? I know it would be huge, but I want to know if there is a way.

No.4004
I'm aware mods have asked people not to reply to that spammer on /a/, so I'm just pointing out some of his posts that have been up for a while and have not been deleted.

Please also delete my post here so as to avoid potential problems.

>>>/a/195507130,1 >>>/a/195439787,2 >>>/a/195413754,2 >>>/a/195396542,4 >>>/a/195316797,3 >>>/a/195297454,1 >>>/a/195194913,1 >>>/a/194719540,1 >>>/a/195413754,2 >>>/a/194883995,4 >>>/a/194876565,6 >>>/a/195396542,4 >>>/a/195439787,2 >>>/a/195155611,1 >>>/a/195155611,2 >>>/a/194883995,4 >>>/a/194618960,2 >>>/a/194543346,1

No.3994
What happened to the ban page of /tg/? According to that page, nobody was banned on that board for the past couple of days, which I find very unlikely.

No.3983
https://desuarchive.org/m/thread/17375507/#17375728_19

WHY THE FUCK ARE YOU STUPID FAGGOTS DEFENDING DUEL BY DELETING POSTS THAT CALL HIM OUT? IT WAS BAD ENOUGH WHEN THE JANITOR COVERED HIS ASS, SO WHY THE FUCK ARE YOU PROTECTING HIM TOO?

No.3973
In the /m/ ghost posting there are a pair of shitposter posts that were not deleted, but my comment about how janitors deleting legit recommendations should be called out got deleted, and I am not able to ghost post in said thread. Is there any particular reason for this?