Will Wikipedia Be Written by AI? Jimmy Wales is Thinking About It - Slashdot

 1 year ago
source link: https://news.slashdot.org/story/23/04/03/040214/will-wikipedia-be-written-by-ai-jimmy-wales-is-thinking-about-it

Will Wikipedia Be Written by AI? Jimmy Wales is Thinking About It (standard.co.uk) 52

Posted by EditorDavid

on Monday April 03, 2023 @03:34AM from the citation-needed dept.

The Evening Standard interviewed Wikipedia founder Jimmy Wales, in a piece headlined "Will Wikipedia be written by AI?"

"The discussion in the Wikipedia community that I've seen so far is...people are cautious in the sense that we're aware that the existing models are not good enough but also intrigued because there seems like there's a lot of possibility here," Wales said. "I think we're still a way away from: 'ChatGPT, please write a Wikipedia entry about the empire state building', but I don't know how far away we are from that, certainly closer than I would have thought two years ago," he said.

Wales says that as much as ChatGPT has gripped the world's imagination over the past few weeks, his own tests of the technology show there are still plenty of flaws. "One of the issues with the existing ChatGPT is what they call in the field 'hallucinating' — I call it lying," he said. "It has a tendency to just make stuff up out of thin air which is just really bad for Wikipedia — that's just not OK. We've got to be really careful about that...."

But while full AI authorship is off the cards in the near-term, there's already plenty of discussion at Wikipedia on what role AI technology could have in improving the encyclopaedia in the months ahead. "I do think there are some interesting opportunities for human assistance where if you had an AI that were trained on the right corpus of things — to say, for example here are two Wikipedia entries, check them and see if there are any statements that contradict each other and identify tensions where one article seems to be saying something slightly different to the other," Wales said. "A human could detect this but you'd have to read both articles side by side and think it through — if you automate feeding it in so you get out hundreds of examples I think our community could find that quite useful."

Wales says another problem is AI technology's failure to spot internal contradictions within its responses. He once called out ChatGPT on this — "And it said, you're right, I apologise for my error."

  • I don't really get his position. Wikipedia is basically a bunch of humans trawling the internet/archives and writing up summaries of what they've found. That is essentially ChatGPT in a nutshell, but it has the added advantages that it will generate answers on anything you ask, whereas wiki will only have pages for things with broad popularity.

    I regularly use wiki as a starting point, but almost always have to deep dive into reddit/blogs/archives to find more specific information. This is exactly the sort of job that ChatGPT is good at, and eventually why would I even bother starting with wikipedia in the first place?

    It will be sad but remember there were people manually creating search engine trees once upon a time, and this was argued by many as a better way to do it until Google came along.

    • Re:

      It's not "good enough" at it, not (yet) by a long shot. Wikipedia is right 99.9% of the time (on noncontroversial topics at least), chatgpt is what, 80%? It might get there one day, but it's definitely not today.

      • I think ChatGPT is fundamentally the wrong approach.

        It works out to very fancy statistics. Which is very cool, but it doesn't think. So it can get mixed up very easily, and will say nonsense with great confidence. Eg, earlier I tried this:

        What are the lyrics to Theatre of Tragedy's "Cassandra"?

        Sure, here are the lyrics to Theatre of Tragedy's "Cassandra":

        The sun sets in the west,
        a breeze from the fields of harvest.
        The sky, red as blood,
        reflects the sign of the eternal oath.

        This not only doesn't match the actual song (https://www.youtube.com/watch?v=_J4Onll7VNs) but as far as I can tell it seems to have completely made it up.

          • Re:

            >It is true, but at the same time it is nonsense, said with great confidence.

            Chatsplaining.

        • Re:

          Is that any worse than having "controversial topics" that get hijacked by highly political editors that slant the content?

          For the mundane stuff "tell me everything you know about the periodic table" kinds of queries the status quo seems adequate, though it would be interesting to have bots that constantly refresh the content or generate new content on the fly for something that isn't "popular enough" to attract a few editors to curate the information.

          • Re:

            Because this comes completely out of the blue. It's not the case of a political difference about how the world should be, or even the case of that a song was misheard and the wrong lyrics wormed into the public consciousness. It's that it somehow managed to invent the completely wrong lyrics for an existing song. As far as I can tell, what it came up with was its own invention, and doesn't exist anywhere on the internet.

            It's really a new failure mode that didn't exist before. Like you'll have people unc

    • would I even bother starting with wikipedia in the first place

      It provides a consistent overview. Say you're looking up a historical battle from the Middle Ages, or want to learn about fluid dynamics. You can't really turn that into a ChatGPT question; you want information that is already organized. It's also something you can bookmark and come back to later on. If you can come up with a specific question like "what were the changes in use of the auto keyword in C++14 vs C++11" then indeed a chat could be quicker.

      there were people manually creating search engine trees once upon a time, and this was argued by many as a better way to do it until Google came along.

      It would still be superior, but the trees disappeared because at the same time, the main place for information moved from individual websites (everybody learning HTML to write their Geocities page) to a centralized places (Wikipedia), and the opinion moved from individual blogs to centralized social networks (Facebook).

    • Re:

      That's right. The moment Wikipedia can be written by AI, it's the moment Wikipedia will become just a stale cache of that AI.

      • Re:

        No no no. The AI will be able to use Wikipedia as a reference and constantly improve itself.

    • deep dive into reddit/blogs/archives to find more specific information. This is exactly the sort of job that ChatGPT is good at

      No, it's not. How do you think that program works?

      No matter. The real problem with trying to use ChatGPT like a search engine is that you can't trust anything it tells you. It will very confidently tell you things that are false, even going so far as to defend the nonsense with more nonsense.

      I had a conversation with it a day or so ago where I asked it some basic things about binary tree mazes after it failed spectacularly writing a function to generate one. It insisted, against all reason, that binary tree mazes were not always perfect mazes. (A perfect maze has exactly one path between any two cells, no loops or isolations, a subset of which is the only kind of maze you can represent with a binary tree.) Asking it follow up questions netted some bizarre contradictions and confusing lies.
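For readers who want to check the claim above, here is a minimal sketch, assuming the usual "binary tree" maze algorithm in which every cell carves one passage either north or east. The function names `binary_tree_maze` and `is_perfect` are made up for this illustration and are not the function from the conversation. It generates a maze and tests the "perfect" property by checking that the passages form a spanning tree (connected, with exactly one fewer passage than cells).

```python
import random
from collections import deque


def binary_tree_maze(rows, cols, rng=None):
    """Return the carved passages as a set of frozenset({cell_a, cell_b})."""
    rng = rng or random.Random()
    passages = set()
    for r in range(rows):
        for c in range(cols):
            options = []
            if r > 0:
                options.append((r - 1, c))  # north neighbour
            if c < cols - 1:
                options.append((r, c + 1))  # east neighbour
            if options:  # only the top-right cell has no option
                passages.add(frozenset({(r, c), rng.choice(options)}))
    return passages


def is_perfect(rows, cols, passages):
    """True if the passages connect every cell without forming a loop."""
    n = rows * cols
    # A connected graph on n nodes with n - 1 edges is a tree (no loops).
    if len(passages) != n - 1:
        return False
    adjacency = {(r, c): [] for r in range(rows) for c in range(cols)}
    for edge in passages:
        a, b = tuple(edge)
        adjacency[a].append(b)
        adjacency[b].append(a)
    # Breadth-first search from one corner to confirm no cell is isolated.
    seen = {(0, 0)}
    queue = deque([(0, 0)])
    while queue:
        cell = queue.popleft()
        for nxt in adjacency[cell]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) == n


maze = binary_tree_maze(5, 7, random.Random(1))
print(is_perfect(5, 7, maze))
```

Under this construction each cell other than the top-right one contributes exactly one distinct passage, which is why the check should succeed for any seed.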

      Wikipedia has some serious problems, but it's not that bad.

      • Re:

        Cool. It's not peer-reviewed scientific journal level good, it's not Wikipedia level good either, but Stack Overflow good? Yeah. At the very least in the same league, while being far easier to query for knowledge, especially if you have a problem that you can describe but don't know the formal name of, and thus, don't know the keywords to put into google search.
      • Re:

        > The real problem with trying to use ChatGPT like a search engine is that you can't trust
        > anything it tells you. It will very confidently tell you things that are false, even going so
        > far as to defend the nonsense with more nonsense.

        I asked all three AI engines I could access the same question, "how many neutrons are in a liter of water" (ChatGPT insisted I spell it wrong; it is, of course, litre). Here are the answers:

        The older one in bing: there are no neutrons in a liter of water

        ChatGPT 3.5: t
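As a point of comparison for the answers quoted above, a rough back-of-the-envelope estimate can be sketched as follows. It rests on stated assumptions: a litre of water weighs about 1000 g, every hydrogen atom is hydrogen-1 (no neutrons) and every oxygen atom is oxygen-16 (8 neutrons), so heavier isotopes are ignored. The function name `neutrons_in_water` is hypothetical.

```python
AVOGADRO = 6.02214076e23    # particles per mole (exact SI value)
MOLAR_MASS_WATER = 18.015   # g/mol, approximate
NEUTRONS_PER_MOLECULE = 8   # assumption: oxygen-16 has 8 neutrons, hydrogen-1 has none


def neutrons_in_water(litres, grams_per_litre=1000.0):
    """Estimate the neutron count in a volume of water under the assumptions above."""
    moles = litres * grams_per_litre / MOLAR_MASS_WATER
    return moles * AVOGADRO * NEUTRONS_PER_MOLECULE


print(f"{neutrons_in_water(1.0):.2e}")
```

Under these assumptions the estimate comes out on the order of 10^26 neutrons per litre, so "no neutrons" is not a defensible answer.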

        • Re:

          That's pretty much it in a nutshell. The chatbots simply don't truly understand the questions and they come up with an answer that is clear, simple, and wrong. Not to mention that they don't really know how to vet sources. They are much better at fudging their way through the Turing test than older systems like, for example, ELIZA, but they're ultimately in the same category: conversation simulators rather than actual conversationalists.

    • Re:

      That is essentially ChatGPT in a nutshell

      Yeah maybe, except ChatGPT cannot do it accurately - it is even less accurate than what kids on the internet can write on Wikipedia.

      The moment you start asking ChatGPT for something like schedule details of a past event, it starts getting many details horribly wrong. Even matters of attribution, such as who authored a certain item or gave a certain talk at an event.

    • Re:

      Wikipedia is often a major and reliable training source of LLMs. LLMs don't do anything they aren't trained to do, it's all inference. If they don't have quality data going in they can't generate anything of quality. What ChatGPT can do is essentially distill down information and say it in multiple different ways, you still need that information. It's still 100% Human-powered, it's just that the bulk of the contributors aren't being compensated.
  • It would certainly be a step up from Wikipedia being written by the kind of person who likes to edit Wikipedia all day.
    • This is a pertinent point. There are thousands of WP articles that could do with regular updates, for eg. latest MLB scores. Other types of things where this could be safely set loose are article portals, sidebars and formatting. Ensuring notes and references are accurate, navigation is consistent with other articles on the same subject, etc.

      The question for you then is: what do all the pedantic quasi-aspy eds move on to now that these sort of edits are done in milliseconds? Truth Social?

  • WP is already worthless (other than to pump up Jimmy's stock market value); ChatGPT isn't going to make it any worse.

    More broadly, WP's success is due entirely to Google's failure to produce a good search engine. When you go to a WP page and there's a list of citations and references at the bottom - THAT's the stuff Google should be showing you, and more besides, when you did your original search.

    • Re:

      "WP is already worthless"... here, let me describe how useful it is and why it is one of the most used sites on the 'net...

    • Re:

      One would hope so. Nonetheless it doesn't actually understand what it read, it just predicts what comes next based on that.
      • it doesn't actually understand what it read, it just predicts what comes next based on that.

        We need to crunch the digits ourselves and filter everything we speak of online through an algorithm that swaps out the elements of phrases for less likely or less common outcomes.

        POISON THE WELLES, if you like fries with that.

        Spoonerisms with a twist. Discombobulated phrases. Make malapropisms part of everyday speaking. Our pizza resistance. Make it deep seeded. Make ChatGPT know it has another think coming. May

      • Still, you would think Wikipedia has a heavy weight in the model.
    • Re:

      Just my thought! I would expect that almost any answers about actual facts from ChatGPT and their ilk will have come from Wikipedia. I guess it's one of the prime inputs for training.

      I wondered if the referenced was an April Fool, but a check shows it was written 31st March, so no.

  • The internet was meant for humans to communicate with each other, and gather & research information, AI will not be a good substitute
  • ...because regardless of Jimmy Wales' eventual thinking, how can he possibly know that someone is not already using ChatGPT to write their edits?
  • ChatGPT often states things with such confidence that one simply wants to believe it. When it came out, I tried it on math problems, computations and mathematical proofs. There were often subtle mistakes, circular reasoning or simply hand-waving arguments included. We live in a time where every statement needs to be examined for mistakes. These days this of course includes all media. While trivial for social media, it applies more and more to established media (which these days often depend on advertisements, funding, grants, the influence of governments or businesses, and good old lobbying or PR work) and even to published articles (where a small guild of folks in a field decide what can pass peer review). What is missing in ChatGPT is complete access to the data that were used, complete access to the sources that were used, complete transparency about who has written the text, and access to all algorithms that were used (including filters). Wikipedia is already very good in this respect. Most claims have sources attached. There is a history of what has been edited. As of now, I do not trust GPT. And as we speak, it is going to be ruined by a large company that plans to spike it with advertisements.
    • Re:

      Completely agree. The danger is in the confidence it shows. I often ask it 'why?' and it starts with an apology and rambles on with new mistakes. People seem to forget it was designed to predict the next word in a sentence. To say it is stupid would be too much credit, simply because being stupid requires one to think.

    • Re:

      Rather than writing Wikipedia, ChatGPT could edit it. I often see awkward sentences or questionable grammar on Wikipedia. I'd correct it but they don't allow editing from VPNs, and I'm not motivated enough to disconnect and do it.

      ChatGPT could offer suggested fixes for that kind of thing.

      • Re:

        > Rather than writing Wikipedia, ChatGPT could edit it

        I think the most value is in summarizing it.

        Someone ran it on something I wrote, and it was a superb summary, hitting most of the important points and leaving out the cruft.

        And when summarizing, it's less likely to just make crap up.

    • I've come to the conclusion that the best word to describe what ChatGPT does is mimicry. Any person can take text written about things they do not understand in any way, and write more text about that text with zero understanding of the meaning or context of that text. You can rephrase bits and pieces, and refactor all you want, and that is fine - it is merely restating things in different ways.

      However, the moment you begin to extrapolate that into something new and different with no actual understanding o

  • "One of the issues with the existing ChatGPT is what they call in the field 'hallucinating' - I call it lying,"

    I'm not sure this is fair. Lying is knowingly making a false statement, and I'm not sure GPT3/4 knows it's making false statements. I'd hazard a guess that, since GPT3/4 doesn't experience the world directly, it may suppose that "evidence", like language, is a social construction.

    • Re:

      If part of its instruction prompt (that you and I don't get to see) is to appear confident that what it spews is correct, then, yes, it's lying and fraudulent. I'm pretty sure it's trained on enough material to be able to sound convincing, no matter the quality of the input content. Unless it can show you its sources and how it came up with the answers, trusting it is foolish.
    • Re:

      True. ChatGPT is more of a young child that cannot tell fantasy from reality and will just say anything that you want to hear. It takes a child a few more years to learn what a lie is (and how to use them, and why adults say it's wrong).

  • If I want to use AI, I use it directly and not Wikipedia. There is no need to dump the contents in Wikipedia format.

  • The field of AI could right now already be useful for translations. Quite often some information is only available in an English, German, Spanish or French wiki entry. And yes, translate tools exist but may not be as good as a quality controlled AI translation.

    Having said that, my main worry would be the future feedback loop, as AI generated information will be used to train future models. It's highly recommended to tag any page or paragraph that was generated by AI as having an AI author. Apart that, i do n

  • Wikipedia is already unreliable. Especially the articles about history and politics. Unreliable sources such as posts in social media or news site or even homemade websites.
    • Re:

      It is unreliable but it cites its sources (in principle). You can know some information came from homemade website, or that it's not sourced, and you can decide to trust it or not (or even edit it). You can do none of that with chatGPT. Nobody knows for sure what sources of information were included in the database, if a bias (advertisement) was voluntarily added; it will not tell you where the information came from and you can do nothing about it when you can tell it's wrong. And it's not clear if even the

      • Re:

        Wikipedia is very slow to adapt to facts or the lack of facts on the ground. There is no reason to believe that ChatGPT does not rely on mass of references over quality metrics for the short term.

        Covid Lab Leak had editorial fights for months, then suddenly as one news source (now without a blue check) changed their editorial board stance the entire wiki editor base crumbled.

        How long does it take to adapt to quantum change in discovery and consensus of the informed behind a topic?
  • AI is most likely using Wikipedia as one of its main sources of training data. The issue is that if AI is trained on Wikipedia and other things, and the other things are less accurate, and you feed that into Wikipedia... AI will progressively get lower and lower quality training data and get worse and worse.

    You would essentially start training AI on AI data, and how would you weight what is higher and lower quality? The algos would spiral down and contradict themselves until things just don't work.

    This is the c

  • It has over six million articles all written by bots. Nobody uses it and there have been complaints from actual speakers that it destroyed the ability to have an actual Wikipedia in that language.
  • A good test would be to ask an AI to participate in the moderation process and see how things go.

  • I figured the gross inaccuracies could only be due to AI or petulant children. guess I can rule one out.
  • We know that ChatGPT can make up stuff that vaguely matches its training set. And you could train it on a corpus of peer-reviewed academic papers and newspapers (not including Wikipedia or other websites in the corpus) and it might then be able to write encyclopaedia articles. But as Jimmy Wales noted, you'd have to take its output with a very large pinch of salt, and by that point you might as well write the article by hand.

    A more fruitful approach, as well as one that's more of a challenge for current te

  • Why bother with a wiki database, if the AI can simply generate anything you want on the fly?
