• 0 Posts
  • 17 Comments
Joined 5 months ago
Cake day: March 31st, 2025


  • Hrrmm. Webrings it is. But also, the search engine problem seems like one calling out for a creative solution. I’ll try to look into it some more I guess. Maybe there’s a way that you could distribute which peer indexes which sites. I would even be fine sharing some local processing power when I browse to run a local page ranking that then gets shared with peers…maybe it could be done in a way where attributes of the page are measured by prevalence and then the relative positive or negative weighting of those attributes could be adjusted per-user.

    Hope it’s not annoying for me to spitball ideas in random Lemmy comments.
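
To make the per-user weighting idea a bit more concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the attribute names (`inbound_links`, `ad_density`, `text_depth`), the 0-to-1 scale, and the idea that peers have already measured and shared those values. It only shows how the same shared measurements could produce different rankings once each user applies their own positive or negative weights.

```python
# Hypothetical sketch: peers share per-page attribute measurements,
# and each user ranks pages with their own attribute weights.

def score_page(attributes, weights):
    """Weighted sum of a page's attributes; unknown attributes count as 0."""
    return sum(weights.get(name, 0.0) * value for name, value in attributes.items())


def rank_pages(pages, weights):
    """Return page URLs ordered from highest to lowest score for this user."""
    return sorted(pages, key=lambda url: score_page(pages[url], weights), reverse=True)


# Attribute values are made up; in the idea above they would come from peers.
shared_index = {
    "https://example.org/blog": {"inbound_links": 0.4, "ad_density": 0.1, "text_depth": 0.9},
    "https://example.org/portal": {"inbound_links": 0.9, "ad_density": 0.8, "text_depth": 0.3},
}

# One user penalizes ads heavily, another only cares about inbound links.
ad_averse_user = {"inbound_links": 1.0, "ad_density": -2.0, "text_depth": 1.0}
link_focused_user = {"inbound_links": 1.0}

if __name__ == "__main__":
    print(rank_pages(shared_index, ad_averse_user))
    print(rank_pages(shared_index, link_focused_user))
```

The shared part (the measurements) and the personal part (the weights) stay separate, so nobody has to agree on what a "good" page is in order to share the indexing work.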


  • Never heard of Kagi before, article convinced me I don’t wanna use it anyways…lol.

    Wasn’t the original Google search algorithm published in a research paper? Maybe someone with more domain knowledge than I could help me understand this: is there any obstacle to starting a search engine today that just works like that? No AI, no login, no crazy business…just something nice and rudimentary. I do understand all the ways that system could be gamed, but given the dominance of Google, Bing, etc., I feel like a smaller search engine doesn’t really need to worry about people trying to game its algorithm.
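
For reference, the published link-ranking idea is PageRank, and its core fits in a few lines. The sketch below is a plain power-iteration version on a toy link graph, not what any production search engine runs; the damping value of 0.85 is the commonly cited default, and the graph and function name are made up for illustration. Crawling, indexing, and spam resistance are the hard parts and are not shown.

```python
# Minimal PageRank sketch (power iteration) on a toy link graph.

def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with rank spread evenly over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on evenly to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over everyone.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, value in sorted(pagerank(toy_web).items(), key=lambda item: -item[1]):
        print(page, round(value, 3))
```

In the toy graph, the page that most others link to ends up on top and the page nobody links to ends up last, which is the whole intuition: links act as votes, weighted by the rank of the voter.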




  • Yeah, you’re absolutely right and I agree. So then do we have to resign the situation to being an eternal back-and-forth of just developing random new challenges every time the scrapers adapt to them? Like antibiotics and resistant bacteria? Maybe that is the way it is. And honestly that’s what I suspect. But Anubis feels so clever and so close to something that would work. The concept of making it about a cost that adds up, so that it intrinsically only affects massive processes significantly, is really smart…since it’s not about coming up with a challenge a computer can’t complete, but just a challenge that makes it economically not worth it to complete. But it’s disappointing to see that, at least with the current wait times, it doesn’t seem like it will cost enough to dissuade scrapers. And worse, the cost is so low that it seems like making the cost significant to the scrapers will require really insufferable wait times for users.
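
The trade-off described above can be seen in a generic hashcash-style proof of work. This is a sketch of the general technique only, not Anubis's actual code: the use of SHA-256, the challenge string, and measuring difficulty in leading zero hex digits are all assumptions made for illustration.

```python
# Generic proof-of-work sketch: find a nonce whose hash has enough leading zeros.
import hashlib
import itertools


def solve(challenge: str, difficulty: int) -> int:
    """Brute-force a nonce so sha256(challenge + nonce) starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Checking a solution costs a single hash, however hard it was to find."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


def expected_hashes(difficulty: int) -> int:
    """Average number of attempts: each extra hex digit multiplies the work by 16."""
    return 16 ** difficulty


if __name__ == "__main__":
    nonce = solve("example-challenge", 4)
    print(nonce, verify("example-challenge", nonce, 4), expected_hashes(4))
```

The knob is the same for everyone: raising the difficulty by one hex digit makes the challenge 16 times more expensive for a scraper, but also 16 times slower for a human visitor on a phone. That is the squeeze the comment is pointing at.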


  • By negligence, I meant that the cost is negligible to the companies running scrapers, not that the solution itself is negligent. I should have said “negligibility” of Anubis, sorry - that was poor clarity on my part.

    But I do think that the cost of it is indeed negligible, as the article shows. It doesn’t really matter if the author is biased or not, their analysis of the costs seems reasonable. I would need a counter-argument against that to think they were wrong. Just because they’re biased isn’t enough to discount the quantification they attempted to bring to the debate.

    Also, I don’t think there’s any hypocrisy in me saying I’ve only thought about other solutions here and there - I’m not maintaining an anti-scraping library. And there have already been indications that scrapers are just accepting the cost of Anubis on Codeberg, right? So I’m not trying to say I’m some sort of tech genius who has the right idea here, but from what Codeberg was saying, and from the numbers in this article, it sure looks like Anubis isn’t the right idea. I am indeed only having fun with my suggestions, not making whole libraries out of them and pronouncing them to be solutions. I personally haven’t seen evidence that Anubis is so clearly working? As the author points out, it seems like it’s only working right now because of how new it is, but if scrapers want to go through it, they easily can - which puts us in a sort of bacteria/antibiotic eternal war of attrition. And of course that is the case with many things in computing as well. So I guess my open wondering is just whether there’s ever any way to develop a countermeasure that the scrapers won’t find “worth it” to force through?

    Edit for tone clarity: I don’t want to be antagonistic, rude, or hurtful in any way. Just trying to have a discussion and understand this situation. Perhaps I was arrogant; if so, I apologize. It was also not my intent, fwiw. Also, thanks for helping me understand why I was getting downvoted. I intended my post to just be constructive spitballing about what I see as the eventual inevitable weakness in Anubis. I think it’s a great project and it’s great that people are getting use out of it even temporarily, and of course the devs deserve lots of respect for making the thing. But as much as I wish I could like it and believe it will solve the problem, I still don’t think it will.


  • Yeah, well-written stuff. I think Anubis will come and go. This beautifully demonstrates and, best of all, quantifies the negligible cost of Anubis to scrapers.

    It’s very interesting to try to think of what would work, even conceptually. Some sort of purely client-side captcha type of thing perhaps. I keep thinking about it in half-assed ways for minutes at a time.

    Maybe something that scrambles the characters of the site according to some random “offset” of some sort, e.g. by randomly selecting a modulus size and an offset to cycle them, or even just a good ol’ cipher. And the “captcha” consists of a slider that adjusts the offset. You as the viewer know it’s solved when the text becomes something legible - so there’s no need for the client code to store a readable key that could be used to auto-undo the scrambling. You could maybe even have some values of the slider randomly chosen to produce English text if the scrapers got smart enough to check for legibility (not sure how to hide which slider positions would be these red herring ones though) - which could maybe be enough to trick the scraper into picking up junk text sometimes.
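
Here is a rough sketch of the slider idea using the simplest possible cipher, a Caesar shift over the 26 ASCII letters. All names are hypothetical, and the tiny word list is only a stand-in for a real legibility check. It also shows the weak spot mentioned above: with so few slider positions, a scraper can try them all and keep whichever one looks most like English.

```python
# Sketch: scramble text with a letter shift, undo it with a "slider" position.

COMMON_WORDS = {"the", "and", "of", "to", "is", "in", "a"}


def shift_text(text: str, offset: int) -> str:
    """Rotate ASCII letters by `offset` positions; leave everything else alone."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("a") if ch.islower() else ord("A")
            out.append(chr(base + (ord(ch) - base + offset) % 26))
        else:
            out.append(ch)
    return "".join(out)


def scramble(text: str, offset: int) -> str:
    """What the server would send: text shifted by a secret offset."""
    return shift_text(text, offset)


def slider_view(scrambled: str, slider_position: int) -> str:
    """What the viewer sees for a given slider position."""
    return shift_text(scrambled, -slider_position)


def guess_offset(scrambled: str) -> int:
    """What a scraper could do: try every position, keep the most English-looking one."""
    def score(candidate: str) -> int:
        return sum(word in COMMON_WORDS for word in candidate.lower().split())
    return max(range(26), key=lambda position: score(slider_view(scrambled, position)))


if __name__ == "__main__":
    page = scramble("the cat sat in the hat and the dog is happy", 7)
    print(page)
    print(slider_view(page, guess_offset(page)))
```

A bigger key space or decoy positions would raise the scraper's cost, but the legibility check that lets a human know the puzzle is solved is available to a program too.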






  • As others have pointed out, I don’t think you have solid evidence to suspect that this is a neurotypical vs ADHD thing.

    Personally I think it’s just a matter of poor taste. The sad truth is most people cannot appreciate good art, and the only reason most works of art are as high quality as they are is that artists make them, and artists do appreciate good art and have high standards. From the artist’s point of view, their piece needs to meet criteria X, Y, Z, etc. to be a good satisfying piece. But from the point of view of the tasteless plebeian masses, it probably only needs to meet criteria X. I first noticed this when I saw that almost every highly upvoted artwork on Reddit years ago was a really hyper realistic pencil drawing, usually of a pretty girl. Most people don’t appreciate form, composition, subtle meanings, abstraction, etc. Those things require more thinking and are therefore too difficult for many people to engage with. Instead, “how hard does this seem to make” and “how much do I like this at first glance” become the proxy standards used by tasteless lazy people to judge art, and hence the “best” art by those standards is a super realistic pencil drawing of a pretty woman, which draws comments like “zomg I thought this was a photo!!!” and “I couldn’t do this in a million years!!! So impressive!!!” As if the point of art is just to flex on people?

    But it gets worse, because even when people decide to half-ass their ingestion of art by flattening it down to a single dimension of “how realistic is it”, again, because people aren’t artists and have never even tried to engage in art (and this I actually don’t hold against them, unlike their prior laziness), they don’t have a trained eye. So sometimes you’ll see just a mediocre pencil drawing of a pretty girl, and people with fewer art skills will be like “wow 10/10 it’s perfect!!!”, but people with art skills will be able to notice things like “well if the shadow on the neck is like that the shadow on the nose should be going the other way, you mixed up your light sources”, or “the perspective is off on the angle of the eyes here”. Sometimes fixing these things would be subconsciously picked up by the masses, but many times not. Often the subtleties that make an artwork go from mediocre to amazing are lost on the masses. As a result, the masses are just as satisfied with poor quality AI-generated images as they are with high quality human-generated images.

    TL;DR: The lack of media literacy among many people strikes again.



  • Sorry, my examples maybe didn’t make clear what my issue with the post is. The fact that public support for Israel in Western Europe is at the lowest point ever recorded is not really a “YSK”; it’s not a piece of advice or a tip that I can use in my daily life. It’s good information, but it belongs under News, or Politics. It’s not, as the sidebar says, “things that can make your life easier”, unless you want to argue that it psychologically makes my life easier, in which case I can fit just about anything into this community, in which case why do I even have the community? If everything belongs in the community, then the community may as well not exist.

    Just think of how much better and more honest this post would have been if it had been made in a news community with a title that was just the title of the article and then a link to the article. But by being posted here in this manner, it comes across as engagement bait - and yes, the title is definitely contributing to that. Is it really news to anyone that people don’t like genocidal murderous bastards? Is that really something “I should know”?

    Technically anything that’s news could also be posted here, if we take the definition of the community at its most literal level. But if that’s the case, why should we have a separate news community and a ysk community? Clearly, there should be some sort of distinction between things that belong in ysk versus in the various news communities.


  • But in all practicality, every Lemmy user already knows about Israeli genocidal behavior in Gaza. If every community just becomes format-differentiated reposts of the same stuff, all of Lemmy becomes one big content-blob.

    Even if I totally agree that, for example, Elon Musk is obnoxious, and I want to hear some news that he got punched in the face - I don’t want to open Lemmy and see:

    You should know Elon Musk got punched in the face

    Mildly interesting: Elon Musk got punched in the face

    Mildly infuriating: Whoever punched Elon Musk in the face didn’t punch him hard enough

    Map porn: Countries where Elon Musk has been punched in the face

    Gaming: Would you play a Punch Elon Musk In The Face Simulator?

    Am I the asshole: for thinking Elon Musk deserved to be punched in the face?

    Programmer Humor: if(isElonMusk){punchedInFace = True;}

    Privacy: If it’s illegal to punch Elon in the face why is it legal to punch my privacy in the face with tracking?

    LinuxMemes: sudo punch Elon Musk in face

    Uplifting news: Elon Musk punched in face

    Depressing news: Elon Musk not punched twice in face

    Television: Just watched this character get punched in the face. Remind you of anyone?

    Classic Rock: “Facepunch” - 1982

    Piracy: Links to movies where billionaires get punched in the face?


  • I love this comment so much. One of the biggest things that destroyed the quality of Reddit, although this is almost never talked about, was the trend of shoehorning the same topic into every subreddit, no matter how niche. Then to make matters worse, people will insist on leaving the post in an unsuitable community just because they like the sentiment of the post. But over time this means that the purpose of communities completely breaks down, and the whole site just becomes “different formats for us all to express the same take on the same current event”. Absolutely insidious. The entire purpose of communities is so that people can customize their experience and see different types of content depending on what they’re interested in. Forcing the same topic into every community not only makes the service insufferable, but it also means there’s no point to joining small communities or contributing to them. You devolve to everyone just looking at the most popular stuff, because all they would see anywhere else is just cutesy forced variants on that same thing anyways. Do not force topics into every community.

    Again: Do not force topics into every community.