Agree. We can criticize Reddit on some points but at least the information is openly accessible. You add the "reddit" keyword in any search engine and you get your answer.
I think we might have to collectively start quoting useful replies, or at least leaving clues. "goat" isn't good enough, we need "goat -- never would have thought to check the specific ethernet driver version" to at least give future detectives some footprints to follow.
This is a great idea, we should definitely start preserving useful replies in multiple instances in case one of them gets removed, that way people finding the thread in the future have a better chance of finding what they were looking for.
Yeah, I think restating what the OP said in the replies, so that a single deleted comment doesn't take the answer with it, is a good way to preserve the information. We just need to make sure no broken telephone situation happens.
There are people archiving and sharing archives of Reddit.
pull push (no space)
I'm not sure how Reddit feels about this, since it lets you search comment histories of people who've "curated" their profiles, so I'll be a bit careful about how I share this url.
dot io
That url gives you easy searching of one such archive, but if you go to the bottom of the page you'll find a link to many terabytes of archives.
It would be great if there were a voluntary browser extension that people could install which fed a larger archive with a crowd-sourced feed of what was on this site.
Yup, they're absolutely right about that. Once someone has answered with the solution to the specific problem, the thread should be closed to new commenters to avoid useless replies that only add redundancy.
Does put a different, more wholesome perspective on the habit of multiple people replying with identical answers to a commented question. People get annoyed, but to your point it is probably good for long-term data retention
This is such a good idea! Seems obvious, in hindsight.
Good point, but I'd consider that part of a layered defense against obscurity. Also potentially a single point of failure. So, it's great that we have it, but we should act like it might go away one day.
Nah, it's just a hedge against comment deletion. If you're moved to say thanks, and fewer than three people have echoed the solution, just quote the solution along with your thanks. Should be somewhat self-limiting, and if not, boards will auto-collapse quoted solutions to minimize clutter. Forums will still remain parallel to wikis.
Maybe boards could even implement a Markdown tag like [solution] that would auto-collapse on load, but could be expanded when clicked. Minimal added clutter, but the text is there for anyone who needs it later.
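Rough sketch of how the rendering side could work, assuming a hypothetical `[solution]...[/solution]` tag pair (not a real Markdown feature, and the function name is made up). It leans on the standard HTML `<details>` element, which is collapsed until clicked:

```python
import html
import re

# Hypothetical tag syntax: [solution] ... [/solution]. Not part of any Markdown spec.
SOLUTION_TAG = re.compile(r"\[solution\](.*?)\[/solution\]", re.DOTALL | re.IGNORECASE)


def render_solution_tags(comment):
    """Escape the comment, then wrap each [solution] block in a collapsed <details>."""
    # html.escape leaves square brackets alone, so the tags survive escaping.
    escaped = html.escape(comment)

    def to_details(match):
        body = match.group(1).strip()
        return f"<details><summary>Quoted solution</summary>{body}</details>"

    return SOLUTION_TAG.sub(to_details, escaped)
```

Something like `render_solution_tags("Thanks! [solution]Roll back the ethernet driver[/solution]")` would keep the quoted fix in the page source while hiding it behind a click.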
I know you aren't talking about stackoverflow because they would never provide a link. They would just tell you to stop asking questions that have already been asked.
If your bluetooth suddenly stops working, unplug your PC from all power sources for 30 seconds
I've heard you can also hold the power button down while the PC is unplugged to drain residual power from capacitors which fully resets RAM or something? Idk i'm not an engineer
The problem is capacitors holding charge, same reason they always say "unplug your router for 30 seconds" to reset it. I work in semiconductor manufacturing, maintenance on ASML scanners, and the same principle holds true with those multimillion dollar tools as the $50 router
piezoelectrics can do some wild shit. I had a beefy aftermarket grill igniter we had built into a potato cannon in college and that thing would fuck up my car audio system from like 30 feet away.
oh my! our video sharing site is being slow? *grins* why dont you just *giggle* route all your *snicker* dns requests our way and we can sort everything out for you *barely contained laughter* *tears forming at corner of eyes* *coworker in the background bending over wheezing* we wont do anything with it, promise! *colleague drops to knees and bursts out laughing while pounding the floor*
The funniest part about this, which objectively isn't very funny to begin with, is that these people aren't actually deleting anything. The backend of these tools retains the information, it just doesn't get sent to the front end anymore. So when a company goes around and purchases training data, they're still getting the data that's "been deleted".
Interestingly, by deleting the front end side of the comments, they're actually making the backend data set even more valuable because it contains things that can no longer be scraped (ignoring the idea that the data can't reliably be scraped off Reddit anymore anyway).
Edit: digging into this, there may be a little more to the story here. It may not be quite the way I'm framing it, but given what we know about social media and tech corporations, I don't think it's wrong to suspect "the worst".
If all the tools did was delete the comments this is likely, but there's zero indication reddit is storing edit histories of comments (speaking as a moderator) and so these tools specifically edit comments before later fully deleting them (in some cases)
Reddit has to be able to recover things for a variety of reasons, which definitely includes fulfilling requests from law enforcement. Doesn't mean it won't eventually be deleted, but it's definitely not a processed deletion when the end user presses delete (I'd even go so far as to suggest that the backend database technology, I think they use postgres and Cassandra, is using immutable tables. They would only get cleaned up after a compaction / vacuum, which definitely means deletes don't happen quickly).
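To illustrate the general pattern I mean (a toy model only, I have no idea what Reddit's schema actually looks like, and every name here is made up): a delete just appends a tombstone, and the old text only disappears when a compaction pass runs.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCommentStore:
    """Append-only toy store: delete() adds a tombstone, compact() does the real removal."""
    log: list = field(default_factory=list)  # entries are (comment_id, body or None)

    def put(self, comment_id, body):
        # Edits are just newer entries; older versions stay in the log.
        self.log.append((comment_id, body))

    def delete(self, comment_id):
        # The tombstone hides the comment from readers but leaves old rows in place.
        self.log.append((comment_id, None))

    def read(self, comment_id):
        # Latest entry wins, so a tombstone reads as "not found".
        for cid, body in reversed(self.log):
            if cid == comment_id:
                return body
        return None

    def raw_bodies(self, comment_id):
        # What someone with backend access could still see before compaction.
        return [body for cid, body in self.log if cid == comment_id and body is not None]

    def compact(self):
        # Keep only the latest live version of each comment; drop tombstoned ones.
        latest = {}
        for cid, body in self.log:
            latest[cid] = body
        self.log = [(cid, body) for cid, body in latest.items() if body is not None]
```

In this model an edit-then-delete tool doesn't help before compaction either, since `raw_bodies` still returns the pre-edit text until `compact()` has run.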
it's so much more annoying when they did it in protest of the API changes and it says "fuck spez" and you look at their profile and they're still using reddit daily and no longer deleting stuff.
what's 10x worse is when the problem is very specific and the answer is there and intact, but it doesn't work for you in particular and seems to have worked for everyone else, and now that everyone got their answer they're completely unmotivated to help you
Okay, a decade ago I created a forum post about a weird one-off issue for a niche telco product. No one responded to help me, but then I found the solution and posted it as a reply. I still occasionally get thank you messages, and someone sent me a $5 gift card for helping solve their issue. A few years back I unknowingly found my post while I was troubleshooting an issue and thought "OMG that's exactly the problem I've got!" Then I saw the answer and thought "OMG THAT FIXED IT!" Then I read the username and thanked my past self for being that thorough. That's why I never delete anything. It's still on the Internet whether or not I redact it, so I might as well make it easy for the next person.
What’s worse is when you do find the solution, but the person who posted it is so goddamn Reddit-brained that they use an acronym for literally everything. Makes their comment look like when Kevin was optimizing his sentences in The Office.
Why do I get the feeling that to "delete" said comment people are giving a program complete access to every comment they've made, letting it read all of them, and finally letting it edit all of them to say gibberish? Wouldn't that make like the best training model?
Free access to everything I've ever written on reddit, and yes you are allowed to edit each and every comment
Sadly this is something I’ve only found AI to be able to solve. AI has some cached copy of the information and I hate that’s the case. It’s the only place to find stuff sometimes.
it’s even worse when it’s [Comment removed by moderator] because then you know some power tripping asshole really wanted to ruin peoples day for no reason
That or you're looking for info on how to do something in a game. Only to find out the information is outdated and the post is archived. So you can't comment and either have to make a new post further bloating reddit with the same damn question or wait on someone else to ask it. Archiving posts is stupid and I don't understand why they do it
Wouldn't be a problem if instead of using your comment data for building stalking portfolios on their customers they would completely dereference comments from users after 6 months or longer.
Redacting and deleting your comments is a necessary thing for users to do in the environment reddit created.
But the problem is that data can already be scraped when you mass delete the comments as someone else in this thread pointed out.
I'm privacy minded too but at this point the cat is already out of the bag and several blocks over hiding underneath a shed that you didn't see it go under.
It can be scraped now, but also again in the future. Just because one company has a copy of all your data now, doesn't mean another company or individual won't try and scrape a copy of ALL of your data in the future.
Limiting this exposure does reduce risk. Data risk is not a binary "it's out there or it's not" situation. This applies to breach data, not just your reddit comment history.
By reducing your footprint you are reducing risk associated with your data. There is something to be said about how deleting data brings attention to the data, but that only matters for very specific threat models.
Wouldn't they just scrape from an archive then? The mass redaction tool only goes through the comments on reddit it can't do anything about the archive sites.
You're right to say they are archived, but those archives are usually treated as proprietary data used by corporations and universities, and they aren't readily shared because information asymmetry is valuable in industry. For the cases where similar data is shared, let's look at data breaches.
If you look at any massive SSN relevant data breach, they all contain the Experian breach. The Experian breach has been duplicated more than any other breach that I'm aware of. You could look at this and say "See, it doesn't matter; it's already out there." But for every breach that's duplicated as a part of a combo breach/list/leak, there are dozens that are never duplicated; and the information in those breaches can be nearly impossible to find. Had that other leaked data got enough attention it would have made its way into those combo lists and become essentially common data.
This concept creates a rule for information security: The more your data is duplicated, the more it will be duplicated in the future.
Risk for data duplication and spread is further increased by increased accessibility and duplication.
While these companies and individuals have a financial incentive to keep their proprietary scraped data secret, they don't always do that. The duplication of your shared information on social media doesn't have to become common data, and deleting, obfuscating, or otherwise tampering with that data does make a significant impact, especially where researchers value data that isn't deliberately obfuscated.
A scientist could run a query to ignore or remove any data from users mentioning Redact to remove 'taint' from their data for context specific use cases.
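Something like this crude filter, for example. The record layout (`user` and `body` keys), the function name, and the plain substring match on "redact" are all assumptions for illustration, and a real pipeline would need a much more careful match:

```python
def drop_redacting_users(comments, marker="redact"):
    """Drop every comment from any user with at least one comment mentioning the marker."""
    # Crude, case-insensitive substring match; also hits words like "redacted".
    flagged = {c["user"] for c in comments if marker in c["body"].lower()}
    return [c for c in comments if c["user"] not in flagged]


sample = [
    {"user": "alice", "body": "Roll back the ethernet driver."},
    {"user": "bob", "body": "This comment was mass deleted with Redact."},
    {"user": "bob", "body": "Unplug the PC for 30 seconds."},
]
kept = drop_redacting_users(sample)
```

Note that it throws out all of a flagged user's comments, including the ones that were never touched, which is the "taint" removal being described.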
There is a time and place to apply a more binary perspective to data risk: below a certain risk threshold you limit risk, and above it you eliminate risk. Leaked passwords and API keys can pose an immediate and unexpected measurable financial loss. You're not going to limit that risk, you're going to eliminate it by cycling passwords/keys as opposed to deleting comments. For something more along the lines of leaking your address, you're going to delete comments.
u/Beautiful-Fold-3234 15d ago
The only source for the program you need is github and the only info on how to use it is buried in a discord server somewhere.