Hey you! Prior to transcribing, please look at our style guide: https://wiki.c3subtitles.de/en:styleguide. If you have questions you can either ask us personally or write to us at https://webirc.hackint.org/#irc://hackint.org/#subtitles. Please don't forget to mark your progress in the progress bar on the talk's website. Thank you very much for your commitment!

======================================================================

Hello everyone, hello to the stream and the worldwide Internet, welcome to the WikipakaWG. Now I want to ask you all to give a warm round of applause for Amir, who will explain to you how artificial intelligence and Wikipedia come together.

Hello everyone. Let's start with my name: my name is Amir Sarabadani. I've been a developer on Wikipedia and Wikimedia projects for more than 12 years now, I've been on the staff of Wikimedia Deutschland for almost two years, and I live in Berlin. I will explain a little bit more about my work and where I work. I work at a place called Wikimedia Deutschland, which is the German chapter of the Wikimedia Foundation. The Wikimedia Foundation is the organization that runs Wikipedia and all of its sister projects, which include Wikimedia Commons, Wikidata and all sorts of other things. The total number of engineers who run Wikipedia is around 300, which is an incredibly low number compared to other companies, because it is a nonprofit. So it is more of an honor than a job to be among those 300 engineers doing this type of work. I'm in a team called the Scoring Platform team, which spans organizations: there's one other employee of Wikimedia Deutschland, and there are others from the Wikimedia Foundation who live in the United States. So it's lots of different time zones and coordination, and our responsibility is to build services around machine learning and AI that empower and help users; I'll get to that in more detail later. What we build is something called ORES, and I want to talk about what ORES is and why it is needed. It started with something called edit quality. You probably know about this problem of Wikipedia: there is a lot of vandalism, and finding the vandals among all sorts of edits is really hard. So some AI was built to help with that, and after we built it we thought: maybe we should just make it a platform for AI in general, and we moved forward. So the first project was edit quality, and then it moved on to other things. Let's talk about why it's needed. This is planet Earth. By the way, this data is from March of 2012, because I was too lazy to update it. So there are seven billion people, and the number of monthly Wikipedia readers is 500 million.
So basically one person in ten uses it at least once a month. And of all these people, 500 million readers, only about 100,000 edit Wikipedia, which means roughly one editor for every 5,000 readers. How many of you consider yourself an editor of Wikipedia? And that explains it. So, how is Wikipedia built? All of Wikipedia's content is volunteer work, and the money it gets basically goes to running the servers, paying these 300 engineers and all sorts of other things. All of this content is really huge: it's five million articles, and that's just a number, right? Five million articles is really hard to grasp, so I'll start with an example. There's an article in Wikipedia called "List of lists of lists". If you click on the first entry, there's a list of lists of ancient kings; you click on the next one and there's another list, the list of pharaohs; and if you click on that and then on the first entry, you finally get a real article and not a list. It's three layers of lists until you get to a real article. And how does it work? Wikipedia is powered by something called MediaWiki. How many of you know about MediaWiki? That's really good. MediaWiki is a really ugly and really old piece of software that was written way back; it is basically a sort of revision-based content management system that powers everything, and it's really good at doing that at scale. The mentality and design of this software is: you first make an edit, it goes live, and then it gets reviewed. It also has the idea of shared authorship, so it's really hard to say "I own this article"; there's no way to say that, and there are even policies against it, so you cannot claim you own an article even if you started it or wrote most of it. So how does quality control in Wikipedia work? It's a really complex situation, because there are several lines of defense against vandalism. There's one fully automated bot, ClueBot NG, which is built on a neural network; it checks edits, reviews them really quickly and reverts the ones that are really bad. Then there are tools like Huggle and some others that people run on their laptops. The way those people contribute to Wikipedia is not that they write or improve articles: they're at work, they're bored, and they check the recent changes to see whether the edits are OK. That's their way of contributing to Wikipedia. In the case of Huggle, for example, the software basically shows you the edit that's most likely to be vandalism first, and then you can move on to the ones that are less likely to be vandalism. And then there are admins. I think there are around 3,000 admins in English Wikipedia, and fewer in other Wikipedias because they're smaller. Adminship is completely community elected and completely volunteer, and admins can ban people or protect pages at several levels. And here comes the machine classifier, the AI part; machine learning is a part of AI.
And if you want to describe what machine learning is in really simple terms: in ordinary cases you write some sort of algorithm that decides whether an edit is vandalism or not, like "if the edit contains this bad word, it's vandalism", and there's no other way. What we do here instead is say: look, we have these edits that we know are vandalism, and we have these edits that are not vandalism, so go and figure out what vandalism looks like, and then we can use that pattern for future cases. Another really simple way to put it: you show it ten pictures of oranges, then you show it pictures of apples, and you say, go figure out what an apple looks like. So what we do is basically describe an edit as features, turn it into a bunch of numbers or, mathematically speaking, a vector. For example: whether the edit was made by a user that is not registered, because those tend to be more likely to be vandalism; how many characters have been added; how many characters have been removed, because, for example, a person might remove a large part of a page; how many repeated characters have been added, because a lot of vandalism just adds lots of exclamation marks everywhere; the longest token; or the number of bad words added. Finding bad words is actually really funny: I used an NLP system to build this, and it goes through the history of Wikipedia and looks for the words that are most likely to appear in edits that get reverted but not in other types of edits. It has some funny results: for Norwegian, "Scientology" came up as a bad word, I don't know why. And then it's a black box: I don't know how it finds its patterns, it just gives me a number between zero and one, and that's all. So how is this useful? In English Wikipedia there are 160,000 edits per day. That's a really vast amount of edits, and the machine classifier just gives each of them a number from zero to one. What we do is put a threshold that has the highest recall. It doesn't have really good precision, which means that in the 10 percent of edits we keep, a lot of them might be perfectly good, but we still want to catch everything that might be vandalism. So we drop the other 90 percent of edits and say: just review this 10 percent, and we give that to humans to review. With that machine prediction, it used to take something like 33 people working eight hours a day just to review this for English Wikipedia, and now it's four people. Lots of machine classifiers had been written for Wikipedia before, like STiki, ClueBot NG or Huggle, but they were written by people who are computer scientists with a lot of experience and skills in machine learning, and who can see through the whole system, because it's a really large system. So what we did was build just an API. We say: we give you that number, we give you this machine learning score, and you go figure it out and build anything you want on top of that. And I want to talk about the use cases we had for this. In this example, the one at the top is a really good edit that fixes a link in Wikipedia, and it says with a probability of 94 percent that it's not vandalism. And then someone just removes the references and writes "it almost grows on trees", I don't know, and the score for this one is 92 percent: this is probably vandalism and somebody needs to review it.
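To make the feature idea concrete, here is a minimal sketch in Python of how an edit might be turned into a feature vector and how a high-recall threshold trims the review queue. The feature set, the bad-word list and the threshold value are illustrative assumptions, not ORES's actual implementation (ORES does this with the revscoring library, which comes up later in the talk).

```python
import re

def edit_features(added_text: str, removed_text: str, is_registered: bool) -> list:
    """Turn one edit into a small feature vector (illustrative only)."""
    # A toy bad-word list; ORES learns these per language from revert history.
    bad_words = {"stupid", "idiot"}
    tokens = re.findall(r"\w+", added_text.lower())
    return [
        0 if is_registered else 1,                               # anonymous editor?
        len(added_text),                                         # characters added
        len(removed_text),                                       # characters removed
        max((len(m.group()) for m in
             re.finditer(r"(.)\1+", added_text)), default=0),    # longest repeated-char run
        max((len(t) for t in tokens), default=0),                # longest token
        sum(t in bad_words for t in tokens),                     # bad words added
    ]

def needs_review(p_vandalism: float, threshold: float = 0.1) -> bool:
    """High-recall filter: keep anything the model is not confident is fine."""
    return p_vandalism >= threshold

# Example: an anonymous edit that adds shouting and removes a paragraph.
features = edit_features("this is STUPID!!!!!!", "a" * 500, is_registered=False)
print(features)
print(needs_review(p_vandalism=0.92))   # True -> goes into the human review queue
```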
So we don't do any edits on our own; we just provide this API. It's sort of, I would call it, machine learning as a service. That makes it really easy for people who don't have much knowledge about machine learning or about software engineering to just build a tool on top of our tools, and I'm going to talk about that. So one thing that happened: first of all, how many of you know about Special:RecentChanges? OK, I'll explain it a little bit further. The way Wikipedia finds out which edits are vandalism, the way a lot of people find out, is this: if you make an edit, it immediately shows up on an automatically generated special page called Special:RecentChanges. Anyone who checks that page can say, OK, this one probably needs to be checked, and go look it up. And there was a product team at the Wikimedia Foundation who said: oh, it's really nice that we have this API, so they built a tool that basically highlights the edits that are likely to be vandalism for anyone who checks the recent changes, and then you can say, OK, I will review this, or I will revert this. So that's that. There are other things too, like bots that automatically revert, which are built on top of ORES. But then we realized: OK, we have this machine learning service, we can do all sorts of things on top of it, so why stick to predicting whether an edit is vandalism or not? So we moved forward with two more. There are lots of other classification and machine learning models that we built, but I really like these two. The first is article quality. It basically quantifies the quality of an article. In English Wikipedia there are six levels of quality for an article: Featured Article, Good Article, and then B, C, Start and Stub; it's a spectrum. We turn this into a number, so we give people a number saying: oh, this article seems to be a C article. And here's one thing about this: there are five million articles and they change all the time, so how are you going to assess the quality of all of them? People go there and say "oh, this is a C-level article", and then five years pass, nobody re-reviews the article, but the article has improved a lot. So one thing we did was go through all of these assessments and say: OK, it seems these articles have improved a lot, you might want to re-assess them. I'll come back to this. The other one is called drafttopic. It basically uses word2vec, an NLP technique, to say, without knowing anything else about an article, whether it belongs to a category like space, physics, chemistry, or culture. It just takes the content of the article and tells you what type of subject it is. That's really useful for the huge number of new articles being added to Wikipedia, where people don't know what topic they are. So I know a little bit about physics and I can review those kinds of articles, but I don't want to see anything about, say, art, because I can't judge whether it's a good article or not.
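To give a sense of what building on this "machine learning as a service" looks like, for instance for the RecentChanges highlighting mentioned above, here is a rough sketch of querying the public ORES scoring endpoint for a couple of revisions. The URL shape and response layout follow the ORES v3 API as I remember it, and the revision IDs are made up, so treat the details as assumptions.

```python
import requests

# Ask ORES for two models' scores on a couple of English Wikipedia revisions.
url = "https://ores.wikimedia.org/v3/scores/enwiki/"
params = {"models": "damaging|articlequality", "revids": "123456789|123456790"}
resp = requests.get(url, params=params, timeout=30)
resp.raise_for_status()
data = resp.json()

for rev_id, scores in data["enwiki"]["scores"].items():
    p_damaging = scores["damaging"]["score"]["probability"]["true"]
    quality = scores["articlequality"]["score"]["prediction"]
    # A RecentChanges-style tool would highlight anything with a high
    # damaging probability and leave the rest alone.
    flag = "REVIEW" if p_damaging > 0.5 else "ok"
    print(f"rev {rev_id}: damaging={p_damaging:.2f} ({flag}), quality={quality}")
```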
So one thing someone built on top of this is called the Outreach Dashboard. It's basically an education program, run by an organization that works closely with the Wikimedia Foundation. They go to students, give them articles and show them how to improve Wikipedia; they turn university students into Wikipedia editors. And they built a sort of dashboard where you can give it an article and it suggests other, similar articles that you could improve, and it gives a really high priority to things that need to be completed. This completeness score, I don't know if you can see it here, comes from ORES. So it gives priority to the things that ORES thinks are not complete. And this one is one of my favorites. Do you know this woman, Keilana, or Emily Temple-Wood? You really should know her. She was Wikipedian of the Year two or three years ago. She did a lot of editing on articles about women scientists, and she got lots of hate mail. So one day she said: that's enough, I vow to write an article about a woman scientist every time I get hate mail. And she did, and she did a lot of work to improve the articles about women scientists. We published a paper and we were able to measure her impact. I really love this graph, it's amazing. We basically scored the quality of all of Wikipedia over time and took the average quality, and then we took another average, the average quality of articles about women scientists. And we realized that starting from the early 2000s they didn't get much attention, so their quality was below the Wikipedia average. Then comes 2012, and Keilana, Emily Temple-Wood, started working on the articles, started a WikiProject and did all sorts of outreach to universities, and it improved until around 2014, when it actually passed the average quality of all of Wikipedia. Now it's way better than the average and it keeps improving. And there's another use case. Sorry, how many of you know about Wikidata? That's nice. It's basically a repository that has the basic information about everything, so Google uses it, Siri uses it, everything uses it, and it has a structured way of handling the data. But because it's a wiki, vandalism can happen in it too. There was this famous case that if you asked Siri what the national anthem of Algeria was, it would say "Despacito", because someone had vandalized that item in Wikidata and changed the national anthem of Algeria to "Despacito". And I think that after that, Amazon started using ORES: when they gather data from Wikidata, they look at the ORES scores, and if the ORES score for an item drops really low, it seems that vandalism is probably happening there, and they just don't put that item into their internal systems. And there's another thing: the ethical layer is really important here. Everything about ORES is open: all of its data, all of its code, everything about it is completely open. And it's an API, so you can just flip everything. On Facebook you usually get the things that Facebook thinks are most important to you, but you cannot flip the switch and say: OK, I want to see the things that are least important to me. But you can do this in Wikipedia. You can say: no, I don't want to see the edits that are likely to be vandalism, I want to see the best edits. You can just change the order and you get the results.
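One practical detail behind a quality-over-time graph like the one for the women-scientists articles: the articlequality model returns a probability for each class (Stub through FA), and a common way to turn that into a single number you can average over many articles, which I believe is roughly what this kind of analysis does, is a weighted sum of the class probabilities. A minimal sketch with made-up probabilities:

```python
# Convert articlequality class probabilities into one continuous score
# (Stub=0 ... FA=5), so quality can be averaged and tracked over time.
CLASS_WEIGHTS = {"Stub": 0, "Start": 1, "C": 2, "B": 3, "GA": 4, "FA": 5}

def weighted_quality(probabilities: dict) -> float:
    """Weighted sum of class probabilities; higher means closer to FA."""
    return sum(CLASS_WEIGHTS[cls] * p for cls, p in probabilities.items())

# Example with made-up model output for one revision of one article.
probs = {"Stub": 0.02, "Start": 0.08, "C": 0.30, "B": 0.45, "GA": 0.10, "FA": 0.05}
print(round(weighted_quality(probs), 2))   # 2.68, i.e. between C and B
```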
And there's another problem that we used to have. There was a bot that was basically reverting all of this vandalism, and it was really good, but the problem was it was too good, because there were lots of new editors who would make a good-faith mistake: they wanted to fix something, but because Wikipedia is super complex you cannot get everything right in your first edit, and in a lot of cases it just got reverted by a bot and people just ran away. There were studies showing that changing this, having a human instead of the bot do the revert, would change the course of things. The way it was shown: this bot was broken for a couple of days for some reason, and at the end, new-editor retention for that month went up. So one thing that we did here was build two models: not just determining whether an edit is vandalism, but determining separately whether it is damaging and whether it was made in good faith. And we tell people not to bot-revert edits that are damaging but made in good faith. A lot of people use this: they go through these kinds of edits using ORES, go to the user and say: thank you for contributing, but this edit has these small problems, please take care. It gives newcomers a more social and more welcoming space, and we are working on making that better. So now I want to talk about the components of ORES and how it's built together. Any questions up to here? Nothing, OK. The first thing is something called Wiki Labels. The source code, because everything about ORES is public, is on GitHub at wikimedia/wikilabels, and you can try it out at labels.wmflabs.org. People, usually volunteers, can go there and get a work set: you scroll down, you see an edit, and you say whether it's damaging and whether it was made in good faith, and it collects all of this data to improve the models or to build models for new things. revscoring is basically the scientific part behind ORES. It's basically a wrapper around the Python library called scikit-learn. With it you build a model, load it up from pickles, give it an API endpoint, and then you say: OK, this revision, revision ID 123456789, give me the probability that it's vandalism; it extracts all of the features and tells you whether it's a good edit or a bad one. But also, the thing is: we build all of these models, but how are we going to gather the data, how are we going to build these models, and how are we going to ship them to production? That is organized in model repositories. The one that detects whether an edit is vandalism is called editquality, the one that assesses the quality of articles is called articlequality, and so on. And ORES itself just ties everything together: it's basically a Flask app with nice caching around it, so it doesn't have to recompute scores for repeated requests, it can run at scale, and it's accessible to everyone in the community.
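To make that architecture concrete, here is a minimal sketch of the pattern described: a pickled scikit-learn classifier loaded behind a small Flask endpoint with a trivial cache. This is not ORES's actual code (ORES uses revscoring models and a much more serious caching layer); the model file name, the feature extraction stub and the route are assumptions for illustration.

```python
import pickle
from functools import lru_cache

from flask import Flask, jsonify

app = Flask(__name__)

# A scikit-learn classifier trained on edit feature vectors and pickled to disk.
# The file name is made up for this sketch.
with open("enwiki.damaging.model.pkl", "rb") as f:
    model = pickle.load(f)

def extract_features(rev_id: int) -> list:
    # In a real system this would fetch the diff via Wikipedia's API and
    # compute features like the ones shown earlier; here it's a dummy vector.
    return [1, 42, 0, 3, 7, 0]

@lru_cache(maxsize=10_000)          # crude stand-in for ORES's score cache
def score_revision(rev_id: int) -> float:
    features = extract_features(rev_id)
    # Probability of the "damaging" class (second column for a binary model).
    return float(model.predict_proba([features])[0][1])

@app.route("/scores/enwiki/<int:rev_id>")
def scores(rev_id: int):
    return jsonify({"rev_id": rev_id, "damaging": score_revision(rev_id)})

if __name__ == "__main__":
    app.run(port=8080)
```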
So I want to quickly show you one of the results. This is a little small, probably. Can you read it? OK. So this is just the output of ORES: ores.wikimedia.org, v3, scores, and then a number, which is the revision ID of an edit to an article. It says: OK, we support these models, and these are the versions of these models. And this is articlequality: it thinks it's a B-level article; it's below GA and FA, but above C, Start and Stub, and it gives us some numbers for how confident it is. Then the damaging model says whether this edit is damaging or not: it says with 93 percent confidence that it's not a damaging edit. And draftquality says whether it's a spam, vandalism or attack article, and it says no, it's OK. And drafttopic says it's most probably STEM, with the subcategory Space; we have all of these categories here. And the last one, goodfaith, says this edit was made in good faith, with really high confidence. So basically it's really simple, it's just JSON, prettified. If you want to see it in reality, it looks like this; you can just parse the JSON and everything will be fine. So this is just the output. That's the only thing that we provide, we don't do anything after that, and people build tools on top of this.
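For reference, the response being walked through here has roughly this shape. This is a sketch from memory of ORES v3 output with made-up revision ID, version numbers and probabilities; the exact field names and values on the slide may differ.

```python
# Rough shape of an ORES v3 response for one revision (values are invented).
example_response = {
    "enwiki": {
        "models": {"articlequality": {"version": "0.8.2"},
                   "damaging": {"version": "0.5.1"}},
        "scores": {
            "123456789": {
                "articlequality": {"score": {
                    "prediction": "B",
                    "probability": {"Stub": 0.01, "Start": 0.04, "C": 0.25,
                                    "B": 0.52, "GA": 0.13, "FA": 0.05},
                }},
                "damaging": {"score": {
                    "prediction": False,
                    "probability": {"true": 0.07, "false": 0.93},
                }},
                "goodfaith": {"score": {
                    "prediction": True,
                    "probability": {"true": 0.98, "false": 0.02},
                }},
                "drafttopic": {"score": {"prediction": ["STEM.Space"]}},
            }
        },
    }
}

# A consumer just walks the dictionary, for example:
score = example_response["enwiki"]["scores"]["123456789"]
print(score["articlequality"]["score"]["prediction"])       # "B"
print(score["damaging"]["score"]["probability"]["false"])   # 0.93
```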
OK, let's see. OK. Yes. Any questions? That's basically all of it. Yeah.

[Audience question, largely unintelligible in the recording.]

So one thing about this is that we usually crowdsource all of the labels. With individual labels, you can go there and look at what people actually labeled, how many edits, and you can even check who labeled them. So it's really open, and everyone can see what you did. But we never give a score to any person. We try to be, how can I say it, more objective; the O in ORES actually stands for Objective. So we want to be more objective and not assign values to people. But in the end, if there is vandalism in the labels, if some people just go and mislabel everything, we can simply remove everything from that user and remove them from the database, and that would be it. Yeah, another question?

What I really liked about the systems that you talked about is that there's always a human in the loop, right? They don't really try to replace anyone's work, they just try to focus it better. That's something I really like about them. To what extent is that a philosophical principle that you have? If you could have a classifier that perfectly divides good from bad edits, would you want to deploy it, or would you still keep humans in the loop?

Hmm. So there are two parts to it. One thing is that Wikimedia Foundation engineers usually try to build a platform and an infrastructure; they don't want to intervene in anything in the community. Anything that's related to content is the community's work, and it's really our principle to just not meddle with that. So we build tools, we make life easier for them. For example, this kind of anti-vandalism bot only existed in English Wikipedia, but after ORES came to life, lots of people who didn't know much about machine learning were able to use it, so this sort of revert bot came to life on Spanish Wikipedia, on Wikidata, on Persian Wikipedia and on some other wikis. So it removed the barrier to doing that; it's like a catalyst. But we are never going to do anything on behalf of the community. Did that answer your question?

Sorry if this question was already answered, but can I query ORES with any text that I like, or only with Wikipedia articles? So, can I do topic modeling for any text?

Can you be more specific, what do you mean? Because in Wikipedia we also score specific things, like single Wikidata items. Do you mean something specific inside Wikipedia itself, or something outside, if I may ask?

Hello. OK, I was a little late, so maybe I didn't get all of it. Does this have some kind of text input field where I can basically put in my own text and get topic labeling for that specific text?

Something that you can do, and some people actually do, is this: normally you get a score for a revision that was made to an article, but you can also tell ORES to change a feature. For example, for article quality it goes and looks at how many headings the article has, how many sections it has, how many paragraphs it has. And you can say: imagine it doesn't have ten paragraphs, imagine it has twelve, and give me another number. So you can make ORES give you a score for modified features. Someone actually used this to build a sort of recommender system for outreach people: they go to the students and say, this article looks like a B-level article, and if you want to make it a Good Article, just add ten more references to it. So they would know: OK, if I add just one paragraph and ten more references, it will level up this article. So it can be used that way. But we don't take any text from users, and also for security reasons we don't let it get text or data sources directly from users; it has to come from Wikipedia, so it goes through Wikipedia's API. But you can change features.
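The "what if" idea behind changing features can be sketched like this. I am not reproducing ORES's actual feature-injection syntax from memory here; instead the sketch re-scores locally with a loaded model, and the feature names, their order and the model file are assumptions for illustration.

```python
import pickle

# Hypothetical feature order for an article-quality style model.
FEATURES = ["num_headings", "num_paragraphs", "num_references", "num_images"]

with open("enwiki.articlequality.model.pkl", "rb") as f:   # hypothetical file
    model = pickle.load(f)

current = {"num_headings": 5, "num_paragraphs": 10, "num_references": 4, "num_images": 1}
what_if = dict(current, num_paragraphs=12, num_references=14)   # imagined improvements

for label, feats in [("as it is now", current), ("with suggested additions", what_if)]:
    vector = [feats[name] for name in FEATURES]
    prediction = model.predict([vector])[0]      # e.g. "B" or "GA"
    print(f"{label}: predicted class {prediction}")
```

This is the kind of comparison a recommender for students could surface: "add this many references and the article is predicted to move up a class."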
I'm a bit interested in the differences between Wikipedia and Wikidata. Which is easier to classify? Is it easier to classify Wikidata edits? I assume there are lots of statements that are kind of eternally true; the national anthem of a country would change very rarely.

Yeah. So to answer the question, there is one pro and one con, if I can simplify it like that. One thing about Wikidata is that accuracy is really high, because you can easily say: OK, this item is about a living person, or it's a date item; really common patterns of vandalism, where they happen, can easily be added as features, because it's structured data. So it's really accurate; once we added a few features, it was very quickly able to pick everything up. But there are two things. First, a lot of vandalism comes from editing descriptions, and those are free text, and it's a little bit harder to detect that. And the other part is that around 99.2 percent of these edits are really good edits. The percentage of vandalism in Wikipedia is seven percent, but in Wikidata it's 0.1 percent or something; it's really, really small. So finding a really good dataset to label was really hard. I think we basically got around half a million revisions, extracted all of the features, and then decided which ones we should even bother labeling. It was hard because Wikidata is sort of a bot-dominated wiki; it's not like English Wikipedia, which is much more human.

Did you think about the case that someone trains their own AI against yours and then hides their vandalism in a cloud of words? Did you think about that?

All the time, I think. So there are two things. One good thing about this being really public is that people can come and point out problems it has. For example, someone came and said it has some bias against newcomers, because newcomers tend to be flagged as vandalizing more often. So we do have problems, but the good thing about being public is that people can just come to us and say these problems exist. On the other hand, we are not trying to catch sneaky vandalism; there is a thing called sneaky vandalism, and there are other lines of defense for that. So let me reassure you: what we want to catch with this is the type of vandalism that, according to research, is mostly made by bored 14-year-old high school kids. It's not trying to catch sophisticated attacks against Wikipedia.

Any more questions? If you want to reach out to us, these are my emails. The first one, at wikimedia.de, is my work account; Ladsgroup at Gmail is my personal account. Don't ask why I chose this username; I was young when I chose it. These are my usernames in Wikipedia: the one with the WMDE suffix is my work account, and this is my personal account. On Twitter you can find me at these accounts; one of them is in plain English, the other is mostly Persian. And one thing about our team, the Scoring Platform team: some people live in San Francisco, some live in Minneapolis, I live in Berlin, so people live all around the world. We have a sort of online office, which is an IRC channel called #wikimedia-ai on Freenode. You can just come there, ask questions, say "hey, I saw this presentation, I have a question", and have discussions. And I think that's it. Thank you.