
## Developer No.4019
BTW if anyone knows any 4chan admins, please tell them to unblock or increase the Cloudflare limits on the desuarchive/rbt.asia and archived.moe scrapers. The workaround that has held us over for the past year no longer works well, and the scraper periodically fails to archive threads.

For reference, here are the relevant actions taken by the archived.moe admins; we are using the same temporary fix: https://archived.moe/talk/thread/14/#q141_200

This reveals that our time is up with the FoolFuuka/Asagi archiver stack. It has not had active development since the collapse of Archive.moe in 2016. It is too inefficient with bandwidth to function any further. It is not worth feeding it additional RAM and resources when it is too inefficient with requests as well; the stack is already consuming 168GB of RAM as it is. It is only a matter of time before the other archivers meet this fate too.

This is an unexpected development, and a fix could take weeks if not months. The C# .NET developer of Hayden is working hard on it, with huge improvements in efficiency and request limit compliance, such as reducing RAM use to 100MB. But his time is limited, and we wish we could have bought some more time.

To anyone who can help: call up all C# .NET developers and skilled MySQL/Percona DBAs to try to bring the Hayden code up to scratch as a suitable drop-in Asagi replacement. Hayden is already demonstrably more efficient and accurate, but it is not fully tested and often hits deadlock issues. Once the scraper is replaced, we can also work with Python developers to build a new frontend replacement for FoolFuuka, as described in the previous thread. Using Hayden will allow us to consolidate our operations on s2.desuarchive.org instead of s1.desuarchive.org on a separate continent, immediately saving $90 a month and removing a Sword of Damocles by eliminating s1, which the previous admin (peace be upon his wrists) is likely to default on due to crippling medical bills.

There are some download reliability issues that need testing before Hayden can be used in production; we wish we had had some more time before this happened.

https://github.com/bbepis/Hayden

In the future we would like to replace the FoolFuuka/Asagi stack entirely with a new one for the sole use of Desuarchive, but a drop-in Asagi replacement is crucial, as the vast majority of other archiver admins, such as those at archived.moe, have lives to live and are unlikely to take up any SQL schema changes.

Feel free to drop by the chat to help brainstorm what can be done.

https://matrix.to/#/@bibanon-chat:matrix.org

https://discord.com/invite/3jxxGDC

It is helpful to read our previous thread on the topic as well: https://desuarchive.org/desu/thread/3894/

EDIT: Modified to clarify that the whole FoolFuuka/Asagi stack (the frontend, the backend, and the inefficient MySQL schema) is to blame for the situation.

This post was modified by Desuarchive Administrator on 2019-12-07

## Developer No.3894
## TL;DR Regarding the Absolute State of 4chan Archival

At Desuarchive we have long struggled with many issues and unsolved mysteries inherited from the previous admin (peace be upon his wrists), but we have now set up the archiver on a more stable footing and there is at least some development going on with the scraper, so things are looking up, as you may have seen this past year.

It is imperative for the survival of all 4chan archivers that the Java-based Asagi is replaced (especially given the downfall of Fireden) and that significant efficiency improvements are made in both HTTP request volume and RAM usage, while providing the same reliability and accuracy. If this is not done, then as 4chan grows all archivers will be in grave danger of dying under the strain of deep software inefficiency and unsustainable costs.
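As a rough sketch of what cutting excessive HTTP requests can look like (an illustration only, not the actual logic of Asagi, eve, or Hayden): the 4chan API publishes a per-board `threads.json` index listing each thread's `no` and `last_modified`, so a scraper can compare two polls of that index and refetch only the threads that changed. The function names and sample data below are hypothetical.

```python
from typing import Dict, List, Tuple


def flatten_threads(threads_json: List[dict]) -> Dict[int, int]:
    """Map thread number -> last_modified from a threads.json-style list of pages."""
    return {
        thread["no"]: thread["last_modified"]
        for page in threads_json
        for thread in page["threads"]
    }


def plan_fetches(previous: Dict[int, int], current: Dict[int, int]) -> Tuple[List[int], List[int]]:
    """Return (threads to refetch, threads that dropped off the board)."""
    # New threads and threads whose last_modified moved need a fetch.
    changed = sorted(no for no, ts in current.items() if previous.get(no) != ts)
    # Threads missing from the new index were pruned or archived.
    gone = sorted(no for no in previous if no not in current)
    return changed, gone


# Hypothetical sample data shaped like two consecutive polls of threads.json.
old_poll = [
    {"page": 1, "threads": [{"no": 99, "last_modified": 1575000000},
                            {"no": 100, "last_modified": 1575000010}]},
    {"page": 2, "threads": [{"no": 101, "last_modified": 1575000020}]},
]
new_poll = [
    {"page": 1, "threads": [{"no": 102, "last_modified": 1575000200},
                            {"no": 101, "last_modified": 1575000300}]},
    {"page": 2, "threads": [{"no": 100, "last_modified": 1575000010}]},
]

changed, gone = plan_fetches(flatten_threads(old_poll), flatten_threads(new_poll))
print("refetch:", changed, "finalize:", gone)
```

In this sample, thread 100 is untouched between polls and costs no request at all; only the modified and new threads are fetched.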

As seen with Fireden, archives are not going to be sustainable if only one dude has to shoulder the weight of thousands of dollars of equipment and bandwidth usage. The next archiver on the deathwatch appears to be Warosu.

On behalf of all 4chan archives, we need your help with the two scrapers being developed. These scrapers are currently set to be Asagi compatible, as a future drop-in replacement for all other archivers. That is no small feat: Asagi regularly uses 40-60GB of RAM at full load, while these could use as little as 30-150MB.

https://github.com/bibanon/eve - Python based scraper. We currently actively test it in production with /wsg/ scraping.

https://github.com/bbepis/Hayden - C# .NET light scraper, still needs testing for evaluation. But it's doing real great.

As such, we hope to build a brand new archival stack based on these scrapers, one that dispenses with the inefficiencies of past scrapers by using PostgreSQL JSONB to store threads exactly as they come from the 4chan API (NoSQL style). While we are not frontend developers, we can sidestep this by building middleware that emits a 4chan-compatible API, so that 4chan-X can be used as the JavaScript webapp, and Android apps (Chanu, Clover) and iPhone apps could be modified with a few lines of code to work with the archive.
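To make the middleware idea concrete, here is a minimal sketch, assuming a small subset of Asagi-style columns (`num`, `thread_num`, `op`, `timestamp`, `name`, `comment`, `media_orig`, `media_filename`) and a small subset of 4chan API post fields (`no`, `resto`, `time`, `name`, `com`, `tim`, `ext`, `filename`). The function names and sample rows are hypothetical, and a real middleware would likely also need to convert comment markup and timestamps.

```python
import json
from typing import Dict, List, Tuple


def split_name(filename: str) -> Tuple[str, str]:
    """Split 'cat.jpg' into ('cat', '.jpg'); names without a dot get an empty ext."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:
        return filename, ""
    return stem, dot + ext


def row_to_post(row: Dict) -> Dict:
    """Convert one archive row into a 4chan-API-style post object."""
    post = {
        "no": row["num"],
        # In the 4chan API, resto is 0 for the opening post, else the thread number.
        "resto": 0 if row["op"] else row["thread_num"],
        "time": row["timestamp"],
        "name": row["name"],
        "com": row["comment"],
    }
    if row.get("media_orig"):
        # Assumes media_orig looks like '<numeric id>.<ext>'.
        tim, ext = split_name(row["media_orig"])
        original, _ = split_name(row.get("media_filename") or row["media_orig"])
        post.update({"tim": int(tim), "ext": ext, "filename": original})
    return post


def thread_to_api(rows: List[Dict]) -> Dict:
    """Emit a thread in the {"posts": [...]} shape, ordered by post number."""
    return {"posts": [row_to_post(r) for r in sorted(rows, key=lambda r: r["num"])]}


# Hypothetical rows for one thread.
rows = [
    {"num": 1001, "thread_num": 1000, "op": 0, "timestamp": 1575000100,
     "name": "Anonymous", "comment": "first reply",
     "media_orig": None, "media_filename": None},
    {"num": 1000, "thread_num": 1000, "op": 1, "timestamp": 1575000000,
     "name": "Anonymous", "comment": "opening post",
     "media_orig": "1575000000123.jpg", "media_filename": "cat.jpg"},
]

thread = thread_to_api(rows)
print(json.dumps(thread, indent=2))
```

If threads were stored as raw API JSON in the first place, as proposed above, most of this translation layer would reduce to reading the stored document back out.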

In support of research and onboarding for this effort, 4plebs has generously developed partial 4chan API compatibility for the FoolFuuka frontend, which is slowly being rolled out. This will allow Android and iPhone applications to view the FoolFuuka archivers (but not ghostpost yet). If you are a PHP developer, we need your help here.

https://github.com/pleebe/foolfuuka-plugin-fourchan-api

They also developed 4plebs X, which uses 4chan-X as a webapp frontend and may be able to utilize this 4chan API to replace the user-facing part of the PHP HHVM FoolFuuka stack with a familiar alternative. It has flaws, such as the lack of search and ghostposting, but hopefully developers could step in on that.

https://github.com/pleebe/4plebs-x

Demo: https://test.4plebs.org (to use it, disable 4chan-X to avoid conflicts).

If you know any third party 4chan app devs, please refer them to us so we can direct them on how to set the proper configurations for their app to access FoolFuuka archives. (There was an old FoolFuuka API already, but it predates the 4chan API and is not directly compatible, so it is best to move off it.)

We are willing to provide support and troubleshooting for better understanding of FoolFuuka/Asagi instances for the construction of new ones or development of replacement scrapers, or if anyone wants to pick up the boards of Fireden. We have institutional knowledge and experience running many major archival websites gathered over 2 years, so don't hesitate to drop by.

https://matrix.to/#/@bibanon-chat:matrix.org

https://discord.com/invite/3jxxGDC

Our guide could use some work, but it will get you there with some hiccups.

https://wiki.bibanon.org/FoolFuuka

## Regarding the Absolute State of Fireden and /v/ and /vg/ archival

Fireden is infamous in the community for never reaching out for help or advice, and never acting on anything other than abuse emails. I don't think they ever planned to operate for this long: they were set up on a whim in 2015 after archive.moe died, so they probably just had enough. It costs a lot to operate a site that can scrape /v/ and /vg/ images. But if the Fireden admin is reading this, be the prodigal son: we can provide any assistance or backup you need so that your hard work is not in vain.

The next archiver I expect to collapse under pressure is Warosu. As for us, we are pretty stable after a $500 chassis upgrade and hot spare SSDs, but it really sucks to be one of the few people in the world who puts a large amount of capital into 4chan archival.

We refuse to pump in more money to bail out any more archivers for barely any returns. We have had to bail out 4 of them already and have paid $7000 to date out of pocket, plus $200 a month. Can't someone else pony up?

The best bet is for a large capital investment to be made in arch.b4k.co so it can be significantly upgraded to our standards and match the level of Fireden, providing /vg/ scraping and full images for both. It will actually not cost too much to start out: maybe 5x10TB drives for $700, $300 for a new case, maybe $600 for a new AMD Ryzen with 160GB of RAM for Asagi and MySQL, and $100 for colocation. We will probably never see the Fireden images ever again, so that saves a lot of space.

4plebs refuses to take on any more boards as they are barely able to handle the ones they have.

## Basic Details about the Maintenance Done

This weekend we managed to do a major $500 case upgrade for our backend image server to allow it to host more services such as scrapers and frontend content. All SSDs were moved out of the internal bay and into hotswap bays, and a hot spare SSD for booting was added. Without those changes the server was really difficult to service, which made it hard to consider using it to host databases safely. It may be possible to attach at least 6 more 3.5" drives, which will be necessary as only 10TB of storage is available.

This may make it possible to halve the costs of the cloud servers and bandwidth that we currently use by consolidating services onto a single server.

1 drive with bad sectors was replaced safely for $150 and a ZFS resilver completed. The other drives do not appear to have issues, but we continue to monitor the situation.

Tests done with the bibanon/eve scraper on /wsg/ have been extremely promising, though development is still ongoing to put it on par with the Asagi scraper. It is possible that any new deployment of the scraper will utilize either this or Hayden, but proper testing will still be necessary.

## Admin No.3026
Welcome to /desu/. Use this board to report issues, request features, and for other discussions regarding desuarchive.org & rbt.asia. Other posts will be removed.

When reporting a technical issue, be sure to include the full URL of the page/image.

Do not use this board for removal requests, which must be emailed to [email protected]. Other rule violations can be reported by clicking the "Report" button on the post.

No.4400
Hi zoruQQlar

I came here from Trchan; come, let's get acquainted. I don't have a Turkish flag because I'm connecting through a VPN, sorry about that.

Salvage

No.4396
https://desuarchive.org/trash/search/image/gJNKtZc5UWPijbZOQXEB4A/

Could this image possibly be salvaged? Or is it gone?

No.4397
I can't ghostpost with uMatrix. Basically the captcha window doesn't show up, even though I have Google whitelisted; this doesn't happen on other sites. To ghostpost I have to disable uMatrix, click post, wait for the reCAPTCHA to show up (the rectangle with the checkbox), then enable uMatrix again and solve the captcha.

No.4378
Hi, since I don't know what else to do, this is my last hope of solving my problem.
There is a thread that contains personal information about me.
I sent an email to [email protected] on 18 October, but the thread is still up. On 2 November I reported the thread and sent another email, this time to [email protected]
Other archives removed the thread quickly, but here it is still up.
What can I do? Is this a temporary situation?

Desuarchive data analysis

## Mod No.4382
Hi, I noticed that there has been a desire for us administrators to do bulk data analysis on the Desuarchive database.
This isn't just for academic researchers; we'll probably help out hobbyists etc. if someone bothers asking for something.

e.g. Finding all pastebin links on a board we archive.
(Pastebin has gotten a little unpredictable lately, so people are wanting to do archival projects for it.)

If you contact me or the admins in general, we can run SQL queries and more expensive searches manually and provide the results.

The general requirement to get a request fulfilled is to make it easy for us to do whatever it is you need done.
This might include:
Composing a SQL query to get the information you need.
Figuring out the formatting parameters you need to have fed to mysql.
Composing the actual raw search to feed into the search software.
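A sketch of what a ready-to-run request of this kind might look like, using the pastebin example. It assumes an Asagi-style per-board table (here `mlp`) with `num` and `comment` columns, and it uses an in-memory SQLite database purely as a stand-in so the query can be tried locally; the real database is MySQL and the sample rows are made up.

```python
import sqlite3

# The query a requester might hand over, with the search pattern as a parameter.
QUERY = "SELECT num, comment FROM mlp WHERE comment LIKE ? ORDER BY num"

# Stand-in database with a hypothetical, heavily reduced board table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mlp (num INTEGER PRIMARY KEY, comment TEXT)")
conn.executemany(
    "INSERT INTO mlp (num, comment) VALUES (?, ?)",
    [
        (1, "no links in this post"),
        (2, "new chapter: https://pastebin.com/AbCd1234"),
        (3, "mirror at pastebin.com/raw/Zz9Yy8Xx for anyone who missed it"),
    ],
)

matches = conn.execute(QUERY, ("%pastebin.com/%",)).fetchall()
for num, comment in matches:
    print(num, comment)
```

Stating the expected output format alongside the query (for example one post number and comment per line) covers the formatting step as well.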

Please visit the Bibanon Matrix channel to let us know that there is something to do, because we are busy and I don't even get around to reading the admin boards bar a few times a year.
https://riot.im/app/#/room/#bibanon-chat:matrix.org
http://qchat.rizon.net/?channels=bibanon&uio=d4
https://discord.com/invite/3jxxGDC


Post your board's booru shit

## Developer No.4157
Occasional-assistant-with-Desuarchive here; got bored a few weeks back and decided to look into the suggestions in >>3834. tl;dr: find missing images for /mlp/, and if the filenames look like the pony booru filename format, download them and add them to the archive.

It was a pain in the ass, but I did it, and /mlp/ now has ~80,000 more images in the archive than it used to. There are still some questions around how to regenerate thumbnails for webms and rough edges, but adding shit into the archive from external sources is now down to 'tedious and time-consuming' levels of difficulty.

If your board has a popular booru or other media site with distinctive filenames, we can look at scraping those as well. Post that shit, and we'll look into it with our usual focus and responsiveness.
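For anyone wondering what "distinctive filenames" means in practice, here is a minimal sketch of the matching step. The post does not spell out the exact booru filename format, so the pattern below (`<numeric id>__<tags>.<ext>`) is an assumption for illustration, and `booru_id` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumed booru-style name: numeric ID, double underscore, tag text, extension.
# Requiring the "__<tags>" part matters: a bare numeric name such as
# "1575000000123.jpg" is indistinguishable from 4chan's own timestamp filenames.
BOORU_RE = re.compile(r"^(\d{1,8})__.+\.(?:png|jpe?g|gif|webm)$", re.IGNORECASE)


def booru_id(filename: str) -> Optional[int]:
    """Return the booru image ID if the filename matches the assumed format."""
    match = BOORU_RE.match(filename)
    return int(match.group(1)) if match else None


# Hypothetical original filenames of posts whose images are missing.
missing = [
    "1234567__safe_artist-colon-example_solo.png",
    "1575000000123.jpg",
    "vacation photo.jpg",
]

candidates = {name: booru_id(name) for name in missing}
print(candidates)
```

Only filenames that yield an ID would then be looked up on the booru; everything else stays marked as missing.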

search doesnt work

No.4265
i tried doing a search on /int/ and it gave me "Error!
No results found." despite search word being in recent threads on the archive.
example
https://desuarchive.org/int/search/text/%D7%92%D7%9C%D7%99%D7%93%D7%94/

can you please fix this? thank you

Request - dump of all pastebin urls in archive

No.4357
Pastebin is shitting itself. They are deleting a bunch of pastes. /mlp/ (and other boards?) rely on pastebins for hosting stories.

The kind folks at ArchiveTeam are willing to archive URLs - but they need a list of URLs to grab.

There is a 4chan thread where a response is being coordinated:
https://boards.4channel.org/mlp/thread/36031882#p36035985

This is the ArchiveTeam IRC for pastebin archiving:

https://webirc.hackint.org/#irc://irc.hackint.org/#pastalavista

I'll be in the IRC for the next 10ish hours. I know this is a big request, especially given all the edge cases for pastebin urls. Maybe you could start with a query like this? (Either /mlp/ or all boards)

https://desuarchive.org/mlp/search/text/%22pastebin.com%2F%22/

I don't want to be some shitter telling you what to do with your database so let us know if you're willing to do this request at all and if so, what help you need. (i.e. query construction / regex )
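As a starting point for the regex side of this, here is a minimal sketch of pulling paste links out of comment text and normalizing them to one URL per paste. It assumes paste IDs are 8 alphanumeric characters and only handles the plain, `/raw/`, and `raw.php?i=` forms; the function name and sample text are hypothetical.

```python
import re
from typing import List

# Assumption: paste IDs are exactly 8 alphanumeric characters.
PASTE_RE = re.compile(
    r"(?:https?://)?(?:www\.)?pastebin\.com/(?:raw/|raw\.php\?i=)?([A-Za-z0-9]{8})\b",
    re.IGNORECASE,
)


def extract_paste_urls(text: str) -> List[str]:
    """Return unique, normalized paste URLs in order of first appearance."""
    seen = set()
    urls = []
    for paste_id in PASTE_RE.findall(text):
        if paste_id not in seen:
            seen.add(paste_id)
            urls.append(f"https://pastebin.com/{paste_id}")
    return urls


comment = (
    "story here pastebin.com/AbCd1234 and raw https://pastebin.com/raw/Zz9Yy8Xx "
    "also http://www.pastebin.com/AbCd1234 again, profile pastebin.com/u/someanon"
)

urls = extract_paste_urls(comment)
print("\n".join(urls))
```

User pages such as `pastebin.com/u/...` are skipped, but any 8-letter site path (a settings page, say) would still slip through, so the resulting list would need a sanity pass before being handed over.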