[Music]
Hello everyone!
Next up is going to be "How to become a MediaWiki hacker" by Andrew.
So yeah, that's that. Thank you.
Welcome everybody, thanks for your interest.
This is not going to take, I guess, the full 90 minutes,
except if you come up with lots and lots of questions in the end.
This is going to be, I think, a pretty overwhelming overview of all the many technical areas we have in the Wikimedia movement when it comes to software development and software maintenance.
It covers a little bit of an overview of the development infrastructure we're using and, mostly, ideas on where to get involved.
That also means that, hopefully, I'm going to provide some links with more information and show a few things, and afterwards: questions and answers. Feel free to ask really anything and everything when it comes to software-development-related things in Wikimedia.
To make more people leave, here is what this session is not:
It's not a hands-on workshop. I'm not going to install MediaWiki or register any accounts on stage, or teach how to program in any programming language.
First of all, as I never really know which kind of audience I have: the terms are very confusing. There is this website, which is pretty popular, called Wikipedia, available in many languages, which is a free online encyclopedia.
There is Wikimedia, which is the name of the entire movement: all the volunteers and also the organizations that have paid staff, like Wikimedia Deutschland or the Wikimedia Foundation.
There is a software called MediaWiki, one of the wiki softwares out there, which hosts and provides the functionality to view pages on Wikipedia and our other projects and to edit them.
And there's the term wiki, which literally means fast in Hawaiian, but it means the general kind of software where anyone can edit something on a website.
As a last overview on a very high level: many people only know Wikipedia, but there are a few more projects and websites we have, which are also about sharing and providing open and free knowledge.
This list, I would even say, is incomplete, because we have a few more websites, but they're more technical, development-related or cross-movement.
Any questions so far?
Okay.
Before I dive into the technical areas for the basic understanding:
How do we communicate?
We use a lot of on-wiki discussions and talk pages. We do use IRC; many developers are still on IRC.
We have mailing lists. I think those are the main general discussion areas, also for discussing higher-level plans. That is mostly about software development, but not only, because some chapters also use our ticket system for planning their activities or sometimes even conferences.
We use a software called Phabricator. It is mostly for software bug reports and feature requests. We have quite some projects in there. You probably know issue trackers: you can prioritize things, you can plan in your team, assign tasks, all these things.
I hope this is large enough.
We have this information page on MediaWiki.org, which is called Phabricator. It also links to the help page and some basic information, and it has some videos on how to use it. The tool itself looks like this on the default front page.
If you've seen things like, let's say, Bugzilla, Jira, Trello, Mingle: I would say it's pretty similar to these.
If you have an account on MediaWiki.org, you don't need a separate password on Phabricator: you can just log in via OAuth.
For the development of code we mostly use Git repositories, and we use Gerrit for the code review part. As Wikimedia is pretty large and pretty complex on many levels, with many different people, backgrounds and languages, it's also pretty much up to each developer or team what they prefer, at least when it comes to volunteer developers. So there's also a good bunch of projects which are on GitHub instead.
But in Wikimedia we mostly offer Git and Gerrit for that. Who has seen, who uses Git, who is used to Git? That's a majority, okay.
Gerrit? Okay, that's already less.
So, to take a quick look at it: on MediaWiki.org we have a page called Gerrit, which also links to getting started, tutorials, things like these.
And the software itself: that's the main view. I'm not logged in here. You see the latest proposed code changes that somebody proposed.
And if my Wi-Fi is stable (it is), you can see here, if I expand this area of a proposed code change, the reviewers, and then you can also comment on things at the bottom.
And to use Gerrit, we have a tutorial on MediaWiki.org which explains the basic steps.
You register a so-called developer account. This is separate from the account you have on MediaWiki.org or Wikipedia, I think both for historical reasons and also for security reasons.
It's free, and there's nobody who has to give their OK first. Then you upload your SSH key, basically, and install the software on your machine.
And then, once you've checked out a Git repository, you can propose patches that then get uploaded into Gerrit for somebody else to take a look and review, because we usually have a policy that four eyes need to take a look at a proposed code change, so at least one other person reviews it.
Any comments, questions so far? OK.
And I said mostly Gerrit, and to some extent GitHub, because we also have some code on wiki, in wiki pages. I'll come to that in the next slide.
And for the social organizational structure, because that's also pretty often asked: how does this all work with hundreds or thousands of volunteers working on code together?
A lot of work is done by employees or contractors of organizations. Probably the biggest is the Wikimedia Foundation, which is also running the servers and is responsible for them, but there are also large entities like Wikimedia Deutschland, who are pretty present here. Wikimedia Sweden is doing some development work.
We have third-party companies who use Wikimedia software, for example the MediaWiki software.
There are people at Wikia who develop on this and sometimes contribute proposed code changes back that they want to see available for everybody, rather than maintain them in their installation as a custom patch.
Or other companies: for example, NASA is also using MediaWiki internally. I think we even lately saw a few patches by Microsoft employees, proposing them for us to include.
And last but not least, because it's also a large number: lots of volunteers who do this in their free time.
When it comes to decision making or prioritization or ownership, that's also sometimes pretty confusing.
You've probably experienced it: you want to contribute some code to some project and sometimes don't get a reply, but you really would like to see that included. And then you start to find out whom to contact, in different ways or via different media, like sending private emails, to maybe realize it's unmaintained.
In Wikimedia, the organizations like the Wikimedia Foundation or Wikimedia Deutschland have their internal annual plans, quarterly goals, planning processes like these. Internal, but not in the sense that it's non-public: when it comes to the Wikimedia Foundation, you can look up the annual plans for each engineering team, and also for other technology and non-technology teams, on our wiki site.
But for those software projects that are mainly or only maintained by volunteers in their free time: volunteers sometimes have less time, or are away without announcing it, or might not realize that they have way less time now, because sometimes you're in this mood like "I'm going to work on this again, I just need to find the time", which never happens.
And we try to track who's maintaining, or who is the owner of, which of our pieces on the Developers/Maintainers page we have. We also try to make explicit that a piece of software is missing a maintainer or owner, and things like this.
So that is sometimes helpful, and sometimes I also need that myself when I try to track down somebody who might be able to review a contributed patch or a proposed code change.
When it comes to non-developers trying to influence what should be worked on by somebody else, whoever that somebody else might be, the Wikimedia Foundation has something called the Community Wishlist, which takes place once a year. I think the last edition of it finished five or six weeks ago.
There, anybody, not necessarily a developer, so this can also be readers or authors, editors of Wikimedia websites, can propose technical changes. Then there is a voting phase, and the top 10, at least the top 10, that's a promise, are going to be looked at and worked on by development teams of the Foundation.
And Wikimedia Deutschland has something similar called Technische Wünsche, and also a team called, what is it called, Community [unclear], I think, nowadays, or maybe it has changed. I'm not sure.
So this is one way how priorities in development can be influenced. But that's only once a year.
And pretty often it feels a bit unclear to outsiders who decides, and things like these. So, oops, sorry for that. That's why I'm mentioning this here, because it often feels like a black box where you don't understand how things work.
And as I sometimes like to look at statistics: at least for the code repositories that are in Git and Gerrit, we have some very shiny graphs on an external website by a company that also provides these services to other open source organizations. There you can also see how many volunteers have contributed, so which percentage is volunteers versus other organizations, things like these.
Comments, questions?
So I once tried to make some kind of scheme, something graphical. It still doesn't feel complete. It never feels complete, because there are so many bits and pieces, and there are also enough things missing here. For example, I don't have Wikidata somewhere here.
But this was at least an approach, in one simple image (simple, well, in one image), to give a bit of an overview of the areas we have and the related programming languages, because pretty often volunteers come up, mostly on IRC or mailing lists, and ask: "Hey, I want to contribute to Wikimedia, how can I start?"
I think in half of the cases they also mention their programming language. So they want to go by programming language, and we have a pretty good choice when it comes to this.
So I'm going to start bombing you with lots of technical areas, and I hope that some of them might interest you enough to investigate further.
Here, as I mentioned earlier, not all code is in code repositories; some code is on wiki. That is the case for gadgets and user scripts.
The difference is: a user script is something that you have for yourself under your namespace. So on User, then your username, slash common.js, you can put some custom JavaScript, and whenever you're logged in, this is automatically loaded and changes your experience in your browser.
And when user scripts get pretty popular, they can become an option site-wide for other logged-in users. Then this becomes available in your user preferences, so you don't have to add this code yourself anymore to load somebody else's user script, but you have a wonderful checkbox that you can enable.
This is obviously JavaScript, and there's a page called Gadget kitchen. Pretty often you have to look up things in the core software of MediaWiki (that's the next slide, I think) to know the names of functions and things you want to call.
Let me switch to my browser. I think, yeah, I changed the order, of course, before I started with this talk. That's why.
This is the Gadget kitchen page, which has some basic introduction: how you can try out things, loading a first gadget from somebody else.
We have statistics per wiki on which gadgets are popular. You can see here that some gadgets are on by default for every user, and it's sorted by some of the most popular ones which are not enabled by default.
All of them add a certain functionality that is not included in the MediaWiki software itself, that somebody considered helpful for themselves, I guess, at first, and that at some point got very popular. And sometimes, of course, functionality in gadgets is at some point also transferred into code in the main MediaWiki code base.
This is the preferences section. When you're logged in, up there is the Preferences link on every Wikimedia website like Wikipedia or Commons, and then you have a list here where you can play with things.
And this can, for some users, get pretty long. This is the code of one user who has a lot of custom things in there.
And of course, you can also load JavaScript gadgets; you don't have to have this all in one single file, which is named common.js under the username. You can, of course, also have this in separate pages and then load them from common.js.
And it is a wiki page, so it is exactly like an article on Wikipedia. You can also, at least on one that's not that popular, edit the source. Here I can only view the source because this is used on many pages. And instead of a classic git log change history, you also have the history like on any Wikipedia article, to see when the last changes happened.
So the advantage of gadgets is that you don't have to fiddle with Git and Gerrit; it's a bit more low-level when it comes to entrance barriers. That's also why I put this here first. And if you're used to editing wiki pages already, it's the same interface, more or less.
Any questions, comments? I think we have a microphone. Yeah, OK.
Are there restrictions on what I can put there in JavaScript? Because, I mean, can I track users, can I call APIs, whatever, kind of log user data?
Yeah, Daniel, if you want to go ahead, I mean, I won't stop you.
Hi, I'm Daniel. I work for the Wikimedia Foundation as a software engineer, and maybe I can answer this. So, yes, there
are a lot of restrictions, actually, on this.
You can do pretty much anything you want in your own JavaScript, but it will only load for yourself. So, yes, you can track yourself.
When it comes to writing JavaScript that runs for all the users, you need elevated privileges, right? Actually, it used to be the case that any wiki admin could do this. And for a couple of months now, there is an additional restriction that you would actually need to, well, be approved for by the community.
And there are review processes, basically. Well, OK, this is per community, but essentially you have code review there. And if you try to put something there that will track user data, then you will hopefully be told off.
And, you know, every now and then it actually happens that an admin goes rogue and does things that they shouldn't do, sometimes intentionally, like putting a crypto miner on there. We actually had that happen for like a few minutes.
But sometimes people are actually well-intentioned but don't really pay attention. I think some Wikipedia, I don't recall the language, had added just a page view counter by Google there. They weren't aware that they were exposing private data. So, yeah, that happens. But it gets rolled back pretty quickly, usually.
Thanks. And this sometimes happens also, for example, with custom JavaScript or CSS that is loaded per site, but for every visitor, also for non-logged-in ones, because that is also possible.
It's rare nowadays because we usually check for this now, but sometimes it happens that a font, for example, is loaded from a third-party website, and that is against the privacy policy. I think I saw that two or three times in the last year, and it was quickly removed, usually once it's detected.
Yeah. Further comments, questions on gadgets, on on-wiki code? OK.
Then I'm going to continue with things that are in Git repositories.
The very base of all of this is the software called MediaWiki. It is 15 years old, written in the classical LAMP stack: Linux, Apache, MySQL. It is PHP, nearly everything; there is some JavaScript, jQuery in there. You might find a little bit of CSS, LESS, these things.
If you're new to the Wikimedia code stack, I do not recommend starting with this code base, because it's huge and some of its code areas are 15 years old, and nowadays you would probably use different concepts when it comes to software architecture and things like this.
However, for many things that are based on MediaWiki, if you want to work on or change the code of those things, you need to install MediaWiki, and we have a dedicated page for that: How to become a MediaWiki hacker, on MediaWiki.org.
And you have Vagrant and you have Docker, and you can do that manually. So this is a page hopefully guiding you through this journey.
Yeah: set up your development environment, steps for Vagrant, Docker, how to manually install, some general suggested reading for new contributors who might not be aware of, or exposed before to, the culture and the expectations, like where to ask questions and which questions to ask. "How can I start?" is not the best question ever; it should be more specific. Things like these.
And yeah, it's gotten way shorter, I realize, than I had in mind, which is nice, because I have this general impression that we actually have too much documentation as a problem. Usually it's like the documentation is missing, but in the Wikimedia community I often have the feeling: OK, five people wrote five different pages on that topic, and which one shall we recommend? Which one is the best, depending on your knowledge, things like these? And that can sometimes be a pretty tough choice.
I'm sorry? Five years old? Oh, yeah. Yeah. Looking at the age of the page can sometimes be helpful: when was the last edit done, did somebody care about it? Yeah, that's true.
So you need MediaWiki. It's going
to be pretty central for many things, not for all the things I'm going to show, but at least when it comes to extensions and skins you will have to install MediaWiki. You don't necessarily have to touch its code base, though.
On top of MediaWiki we have extensions. At least yesterday, I think it was close to 180 extensions that are deployed and available on the Wikimedia servers which provide Wikipedia.
And on MediaWiki.org, in that category, we have nearly 2000 extensions listed. There are probably more that don't have a page, where nobody created a page on MediaWiki.org. So there are very likely more on GitHub and in other places, but harder to find.
And that's PHP, JavaScript, also some LESS and CSS.
I picked one example here for an extension because it's pretty well maintained and the code review also usually happens pretty fast, which is not always the case, depending on how maintained it is: the Newsletter extension.
On that page, as on all the pages, there should be links to where to find tasks to work on, and generally how to find things to work on as a starter. That also comes at the end.
And how do things look? There are skins.
There are a lot of skins out there. I don't know an exact number; dozens, definitely. I know about six different skins that are available and deployed on Wikimedia sites, and you can also change that in your preferences. Some are more popular; some are nowadays less popular, or were popular 10 years ago.
You can see the difference in skins here on four sites. On top you see the Minerva skin, which originally comes from mobile. I think it's still being decoupled, but the code base doesn't depend any more on the mobile front end. And down there is an older one, which is called MonoBook.
So if you're more into the design parts, like CSS, then this is an area where also lots of small bits and pieces can be improved, because skins have bugs and content sometimes overlays other content, especially on smaller screens. Or
there's also, of course, the whole... yeah, that's related to smaller screens, mobile devices and things like these.
This part does not necessarily require installing software: if you're using or developing your own software and want to use data from, for example, Wikipedia, we have a Web API and the REST API. I think we have three APIs now, because there's also the internal one, and naming things is hard.
As one example, in the screenshot (I probably shouldn't do that, and I didn't ask for sound beforehand): there is a website which visualizes, with sounds, whenever an article is edited, creating circles. There are many things like these.
And this is usually the starting point, where you have our API documentation. It's on MediaWiki.org; MediaWiki.org slash API will redirect to this one.
And you can see here, per section, things you might want to do, like the basics, how you can authenticate, especially page operations, searching things. And quite some pages nowadays have examples. For example, pretty often there is an API sandbox where you can enter things and just try out things. But this documentation is also currently being worked on to improve it.
Questions, comments, confusion?
When it comes to this data, you can use the APIs we have, but of course you can also get data dumps, for example for offline use, or if you're into research, which is another area here. We have different data dumps available, which you can play with for whatever you would like to find out.
And we have some offline applications. I think the most known one, or at least the most visible one to me, is called Kiwix. That also uses dumps. Basically you can take, for example, Wikipedia with you and then read it without any Internet connectivity.
And they're also very friendly people and welcome more developers. Kiwix, I think, runs on quite some operating systems nowadays, also mobile and things like these.
Mobile, true: there are Wikipedia applications for mobile. This
is for Wikipedia, actually, on both iOS and Android.
There were other systems supported in the past, but I think that's now mostly volunteer-based work, if at all. This is really the current concentration on the market, I guess.
And it's a Wikipedia app; it's not a generic application you could use for your own wiki that you host or something, though there have also been some people experimenting with making it a bit more generic at a past hackathon.
And a volunteer-based application, which is very nice, is the Android application for Wikimedia Commons, where you can upload freely licensed images, videos, media files. It is on GitHub, and they also welcome contributions.
I won't and I can't say too many things about this, because for the last days here there have been already many Wikidata people around. Daniel has also been involved in Wikidata for many years, if you have some specific questions about that.
It's basically a centralized repository of structured data, with JavaScript. And if you want to run queries or formulate queries, questions, this is in SPARQL.
If you're wondering how long this list is: I think I have 24 slides or something. We're on 16. Yeah, 23. So you will survive, I hope.
We have cloud services. It's an OpenStack-based hosting environment, because we also have a lot of developers who want to run bots, for example to automatically revert vandalism when it's detected, and things like this.
So, of course, there are also some rules and terms of service; there are some restrictions on what you're not allowed to do, especially when it comes to privacy policies. But you do have access to some of the data that you can play with.
And there are currently, I think, over (OK, I forgot to show the data dumps and the mobile app) over 1000 tools we're hosting here.
And if the things that you're allowed to do with tools are not enough, we also have full Cloud VPS, which, of course, has a bit more restrictions. So you have to argue why Toolforge is not sufficient for the things you want to do if you want Cloud VPS for projects.
And of those we also have more than one hundred sixty, I think; I'd have to check.
Comments, questions? Microphone.
So who will use those services?
I couldn't hear. Could you speak up?
Who uses the cloud services?
Who uses those services? Yeah, that's a good question.
Actually, many people in the community rely on the services provided on Toolforge, and any developer can create a project on it. Many of the tools we have are actually quite popular and common.
For example, one tool is for showing you all your latest contributions, page edits, across all the Wikimedia wikis. I think we have about 900 wikis nowadays, and that's not functionality you would have in MediaWiki, which is isolated on one website. This works across all wikis; Global contributions, I think it's called.
And that's linked, I think, on all wikis from the bottom of your list of contributions for only that one wiki. That's, for example, one pretty popular tool.
We have so many things that people have come up with to run on Toolforge or on Cloud VPS.
There was another one I like, which shows an OpenStreetMap map in the window and then an overlay, based on GPS data, of items on Wikidata which do not have photos yet. I think it's on Toolforge. That is pretty nice when I realize in my area: oh, this thing still needs a photo. Maybe I can upload that to Wikimedia Commons and then, in Wikidata, connect that item to a photo on Wikimedia Commons. Such things, for example.
Does that answer your question? OK, Daniel.
So the general idea how this was born is really: there were people, you know, editors on Wikipedia, users of Wikipedia, who had ideas of what tools they needed, and they were actually developers, they could write them, but they didn't have
a place to put them and didn't have the access they would need for these tools.
And so this platform was created, where you could just basically ask for an account and you get webspace and a place to put your stuff and provide the tool you want.
Basically, many people create these tools for their own use and then just share them with the other editors on Wikipedia and other sister projects.
Then there are bots, which are pretty similar: they're also automated tools that can revert vandalism or fix common spelling mistakes.
The difference to a tool, simplified, is that a tool runs by itself all the time on the server, while a bot you usually start manually yourself on your own machine.
The most common one, which is also a framework, is called Pywikibot, written in Python. It already provides you a lot of functionality, like, for example, handling logging in, so you don't have to reinvent the wheel.
This page also, for example, has introductions on how to write a bot yourself, or on developing Pywikibot, the framework itself, if you're into Python.
Something totally different: machine learning. Unfortunately, I mean, he's already gone. He was here till yesterday evening. He's pretty much into that.
If you're into that area, we're also currently experimenting with that, which basically means having artificial intelligence identify how likely it is that an edit is vandalism, and things like these, in order to revert that edit. That's, for example, one approach.
External desktop applications: that's another thing, but this already requires knowing a bit about how Wikimedia works, because they do some things you don't want to do yourself, either via the API or via the web browser user interface, because it's too much work.
So Huggle is an anti-vandalism tool which runs on your desktop and lets you automatically or semi-automatically revert changes. And there are other similar projects, like AutoWikiBrowser and Wikipedia Cleaner.
As you can see, people are free to come up with things they want or create them in parallel. And I guess sometimes the case is also that you're not necessarily aware of all the stuff happening in the developer areas.
So sometimes you even work on a project for two or three weeks and then you realize: oh, there is already something similar. In such a large place like the Wikimedia movement, it's hard to keep an overview of everything going on.
And this is, well, I probably shouldn't say for completeness, because I can never come up with a complete list.
There's more: you can play with a lot of statistics, apart from the data dumps I mentioned earlier. Creating visualizations is possible with the stuff we have in Analytics. We also have tutorials, if that interests you.
Browser testing, QA automation: apart from the continuous integration we have when new patches are proposed in Gerrit, which is Jenkins, we also run browser tests with Selenium.
Now, on the lower level of the stack: we do have data centers, we do have servers we host ourselves, so there is configuration and maintenance of that. This is mostly Python, some Puppet, some shell scripts.
There's no real reference or page I could come up with, so I put the one about Puppet coding here.
But on this level, on the network level of our stack, the best place you can find is yet another wiki, which is called wikitech.wikimedia.org, which is for the really technical parts and documenting servers and stuff like that.
And lately, when it comes to changes in the architecture, we have more and more services-based, Node.js-based architecture, also things like Citoid (I've never tried to pronounce it; Citoid, I guess) for locating citation data, or Mathoid, converting math input into MathML plus SVG or into bitmap images; things like these.
It's a pretty big, beautiful, complex system. I hope I could cover maybe something that might
interest you, if you're interested in checking out or playing with things in Wikimedia.
As the last slide, these are the basic links.
Nowadays I usually try to give out a generic How to contribute page, which also covers non-technical areas, because sometimes we also have new contributors, or people who are interested in contributing, who at some point realize they might not yet have the skills or might have different interests, and then wander off into other areas we have.
That can be translation or design or documentation, or maybe even handling or triaging of incoming bug reports, or communicating technical changes to non-technical communities. All of these things happen; we have, for example, tech ambassadors for this, volunteers.
We have a page for recommended software projects. They are developed by people who actively said: yeah, I'm very happy to mentor new contributors, to help them onboard, to help them get started.
So there are only a few (whoops), only a few of them listed here, but it's the most condensed, shortest page we could come up with for a new developer to join, explaining the basics and then listing some of the software projects that we would recommend to start with. Some of them I already mentioned and some of them I haven't.
And as the last one: most of the pages for each project that I've shown should list, should link, in theory, to the issue tracking system, which very often is the Wikimedia Phabricator instance that I showed earlier, but sometimes can also be GitHub or something like that; it depends entirely on the developers and maintainers.
But we try to collect at least some links on a page called Good first bugs. This is more for self-learners: it is a list of tasks, of things somebody should do at some point, which shouldn't be too complicated for somebody who just started or wants to get started. But you do not have a dedicated mentor or something; you're more on your own. So
that's called Good first bugs, on MediaWiki.org.
And yeah, we also have some links here to some of the projects, or at least areas, that we have, like multimedia or the entire search stack, front end and back end, and all these things.
Why is MediaWiki core listed here? Oh, well, anyway, that's nothing I would recommend.
That's it so far. I hope you have found some ideas or interest, or at least that it was interesting, informative, or just confusing how complex this entire wiki universe is when it comes to technical things on many levels.
And I'm happy to answer and discuss anything that comes to your mind when it comes to our software development process, I guess. Feel free to go ahead: comments, questions, ideas, criticism.
I would say no one has a question. Applause! Thank you.
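As a concrete starting point for the Web API mentioned in the talk, here is a minimal sketch in Python of how a request for the latest edits could be put together. It is an illustration under assumptions, not something shown in the session: the endpoint is English Wikipedia's `api.php`, the function names `build_recent_changes_url` and `titles_from_response` are hypothetical helpers, and the sample response in the usage note is made up to show the expected shape.

```python
from urllib.parse import urlencode

# Assumption: English Wikipedia's Action API endpoint.
# Every MediaWiki site exposes its own api.php.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_recent_changes_url(limit=5):
    """Build an Action API URL that lists the most recent changes."""
    params = {
        "action": "query",         # the general-purpose query module
        "list": "recentchanges",   # list the latest changes on the wiki
        "rcprop": "title|timestamp|user",
        "rclimit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def titles_from_response(data):
    """Extract page titles from an already decoded JSON response."""
    return [change["title"] for change in data["query"]["recentchanges"]]
```

Fetching the URL is left out on purpose: any HTTP client works, and requests to Wikimedia sites should carry a descriptive User-Agent header. With a decoded response of the shape `{"query": {"recentchanges": [{"title": "Example"}]}}` (a made-up sample), `titles_from_response` returns the list of titles. The API sandbox mentioned in the talk is a convenient place to try the same parameters interactively first.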