Wikimedia Research/Showcase: Difference between revisions

Older revision by Pablo (WMF); newer revision by KGordon (WMF). (39 intermediate revisions by 6 users not shown)
The '''Monthly Wikimedia Research Showcase''' is a public showcase of recent research by the Wikimedia Foundation's [[Wikimedia Research|Research Team]] and guest presenters from the academic community. The showcase is hosted virtually '''every 3rd Wednesday of the month at 9:30 a.m. Pacific Time/18:30 CET''' and is [https://www.youtube.com/playlist?list=PLhV3K_DS5YfLQLgwU3oDFiGaU3K7pUVoW '''live-streamed on YouTube''']. The schedule may change; see the calendar below for a list of confirmed showcases.
{{toclimit|limit=3}}
==How to attend==
 
=Upcoming Events=
=== {{Ym|2024|6}} ===
No Research Showcase due to Wiki Workshop. [https://pretix.eu/wikimedia/wikiworkshop2024/ Register here]

=== {{Ym|2024|7}} ===

;Time
: Wednesday, July 24, 16:30 UTC: Find your local time [https://zonestamp.toolforge.org/1721838600 here]
;Theme: Machine Translation on Wikipedia

=Archive=
For information about past research showcases (2013-present), you can search below or see the [https://www.mediawiki.org/wiki/Special:PrefixIndex?prefix=Wikimedia+Research%2FShowcase%2FArchive&namespace=0&stripprefix=1 listing of all months here].
<inputbox>
type=fulltext
width=25
searchbuttonlabel=Search
break=no
prefix=Wikimedia_Research/Showcase/Archive
</inputbox>

== 2024 ==
=== {{Ym|2024|5}} ===

;Time
: Wednesday, May 15, 16:30 UTC: Find your local time [https://zonestamp.toolforge.org/1715790600 here]
;Theme: Reader to Editor Pipeline
{{:Wikimedia_Research/Showcase/Event
| date = May 15, 2024
| youtube-url = https://www.youtube.com/watch?v=G-8CbpcwGV8
|talk1-presenter=Mike Raish and Daisy Chen
|talk2-presenter=Morten Warncke-Wang and Kirsten Stoller
|talk1-title=Journey Transitions|talk1-abstract=What kinds of events do readers and editors identify as separating the stages of their relationship with Wikipedia, and which of these kinds of events might the Wikimedia Foundation possibly support through design interventions? In the Journey Transitions qualitative research project, the WMF Design Research team interviewed readers and editors in Arabic, Spanish, and English in order to answer these questions and provide guidance to WMF Product teams making strategic decisions. A series of semi-structured interviews revealed that readers and editors describe their relationships with Wikipedia in different ways, with readers describing a static and transactional relationship, and that even many experienced editors express confusion about core functions of the Wikimedia ecosystem, such as the role of Talk pages. This presentation will describe the Journey Transitions research, as well as present its implications for the sponsoring Product teams in order to shed light on the way that qualitative research is used to inform strategic decisions in the Wikimedia Foundation.
:::* Project: [[Journey transitions]]
|talk2-title=Increasing participation in peer production communities with the Growth features|talk2-abstract=For peer production communities to be sustainable, they must attract and retain new contributors. Studies have identified social and technical barriers to entry and discovered some potential solutions, but these solutions have typically focused on a single highly successful community, the English Wikipedia, been tested in isolation, and rarely evaluated through controlled experiments. In this talk, we show how the Wikimedia Foundation’s Growth team collaborates with Wikipedia communities to develop and experiment with new features to improve the newcomer experience in Wikipedia. We report findings from a large-scale controlled experiment using the Newcomer Homepage, a central place where newcomers can learn how peer production works and find opportunities to contribute, and show how the effectiveness depends on the newcomer’s context. Lastly, we show how the Growth team has continued developing features that further improve the newcomer experience while adapting to community needs.
:::* Paper: https://arxiv.org/abs/2308.09642}}
=== {{Ym|2024|4}} ===

;Time
: Wednesday, April 17, 16:30 UTC: Find your local time [https://zonestamp.toolforge.org/1713371400 here]
;Theme: Supporting Multimedia on Wikipedia
{{:Wikimedia_Research/Showcase/Event
| date = April 17, 2024
| youtube-url = https://www.youtube.com/watch?v=wpSQD9Bc8Ek
| talk1-presenter = Elisa Kreiss
| talk2-presenter = Daniel Nkemelu
| talk1-title=Towards image accessibility solutions grounded in communicative principles
| talk1-abstract=Images have become an omnipresent communicative tool, and Wikipedia is no exception. However, the undeniable benefits they carry for sighted communicators turn into a serious accessibility challenge for people who are blind or have low vision (BLV). BLV users often have to rely on textual descriptions of those images to equally participate in an ever-increasing image-dominated online lifestyle. In this talk, I will present how framing accessibility as a communication problem highlights important ways forward in redefining image accessibility on Wikipedia. I will present the Wikipedia-based dataset Concadia and use it to discuss the successes and shortcomings of image captions and alt texts for accessibility, and how the usefulness of accessibility descriptions is fundamentally contextual. I will conclude by highlighting the potential and risks of AI-based solutions and discussing implications for different Wikipedia editing communities.
:::* Code: https://github.com/elisakreiss/concadia
:::* Paper: https://arxiv.org/abs/2104.08376
| talk2-title=Automatic Multi-Path Web Story Creation from a Structural Article
| talk2-abstract=Web articles such as Wikipedia serve as one of the major sources of knowledge dissemination and online learning. However, their in-depth information, often in a dense text format, may not be suitable for mobile browsing, even in a responsive user interface. We propose an automatic approach that converts a structured article of any length into a set of interactive Web Stories that are ideal for mobile experiences. We focused on Wikipedia articles and developed Wiki2Story, a pipeline based on language and layout models, to demonstrate the concept. Wiki2Story dynamically slices an article and plans one to multiple Story paths according to the document hierarchy. For each slice, it generates a multi-page summary Story composed of text and image pairs in visually appealing layouts. We derived design principles from an analysis of manually created Story practices. We executed our pipeline on 500 Wikipedia documents and conducted user studies to review selected outputs. Results showed that Wiki2Story effectively captured and presented salient content from the original articles and sparked interest in viewers.
:::* Paper: https://arxiv.org/abs/2310.02383
}}
=== {{Ym|2024|3}} ===

;Time
: Wednesday, March 20, 16:30 UTC: Find your local time [https://zonestamp.toolforge.org/1710952200 here]
;Theme: Addressing Gender Gaps
{{:Wikimedia_Research/Showcase/Event
| date = March 20, 2024
| youtube-url = https://www.youtube.com/live/D6wrr9WShTk?si=Lo7CT1K81EGkv11i
| talk1-presenter = Mo Houtti
| talk2-presenter = Nicole Schwitter
| talk1-title=Leveraging Recommender Systems to Reduce Content Gaps on Wikipedia
| talk1-abstract=Many Wikipedians use algorithmic recommender systems to help them find interesting articles to edit. The algorithms underlying those systems are driven by a straightforward assumption: we can look at what someone edited in the past to figure out what they’ll most likely want to edit next. But the story of what Wikipedians want to edit is almost definitely more complex than that. For example, our own prior research shows that Wikipedians prefer prioritizing articles that would minimize content gaps. So, we asked, what would happen if we incorporated that value into Wikipedians’ personalized recommendations? Through a controlled experiment on SuggestBot, we found that recommending more content gap articles didn’t significantly impact editing, despite those articles being less “optimally interesting” according to the recommendation algorithm. In this presentation, I will describe our experiment, our results, and their implications, including how recommender systems can be one useful strategy for tackling content gaps on Wikipedia.
:::* Paper: https://arxiv.org/abs/2307.08669
| talk2-title=Bridging the offline and online: Offline meetings of Wikipedians
| talk2-abstract=Wikipedia is primarily known as an online encyclopaedia, but it also features a noteworthy offline component: Wikipedia and particularly its German-language edition – which is one of the largest and most active language versions – is characterised by regular local offline meetups which give editors the chance to get to know each other. This talk will present the recently published dewiki meetup dataset which covers (almost) all offline gatherings organised on the German-language version of Wikipedia. The dataset covers almost 20 years of offline activity of the German-language Wikipedia, containing 4418 meetups with information on attendees, apologies, date and place of meeting, and minutes recorded. The talk will explain how the dataset can be used for research, highlight the importance of considering offline meetings among Wikipedians, and place these insights within the context of addressing gender gaps within Wikipedia.
:::* Paper: https://link.springer.com/article/10.1007/s42001-023-00225-8
|talk2-slides=https://commons.wikimedia.org/wiki/File:March_2024_Research_Showcase_Slides_(offline_meetings_of_Wikipedians).pdf}}
=== {{Ym|2024|2}} ===

;Time
: Wednesday, February 21, 16:30 UTC: Find your local time [https://zonestamp.toolforge.org/1708533000 here]
;Theme: Platform Governance and Policies
{{:Wikimedia_Research/Showcase/Event
| date = February 21, 2024
| youtube-url = https://www.youtube.com/live/Q1xYwRw1rHU?si=zuY42MxbdCuVeHph
| talk1-presenter = Amy X. Zhang, University of Washington
|talk1-title=Sociotechnical Designs for Democratic and Pluralistic Governance of Social Media and AI|talk1-abstract=Decisions about policies when using widely-deployed technologies, including social media and more recently, generative AI, are often made in a centralized and top-down fashion. Yet these systems are used by millions of people, with a diverse set of preferences and norms. Who gets to decide what the rules are, and what should the procedures be for deciding them? And must we all abide by the same ones? In this talk, I draw on theories and lessons from offline governance to reimagine how sociotechnical systems could be designed to provide greater agency and voice to everyday users and communities. This includes the design and development of: 1) personal moderation and curation controls that are usable and understandable to laypeople, 2) tools for authoring and carrying out governance to suit a community's needs and values, and 3) decision-making workflows for large-scale democratic alignment that are legitimate and consistent.}}

=== {{Ym|2024|1}} ===

;Time
: Wednesday, January 17, 17:30 UTC: Find your local time [https://zonestamp.toolforge.org/1705512600 here]
;Theme: Connecting Actions with Policy
{{:Wikimedia_Research/Showcase/Event
| date = January 17, 2024
| youtube-url = https://www.youtube.com/live/UUuC6Q1SIoM?si=Ui_arT9nCi2zHNDk
| talk1-presenter = Amber Berson and Monika Sengul-Jones
| talk1-title = Presenting the report "Unreliable Guidelines"
| talk1-abstract = The goal behind the report Unreliable Guidelines: Reliable Sources and Marginalized Communities in French, English and Spanish Wikipedias was to understand the effects of the set of reliable source guidelines and rules on the participation of and the content about marginalized communities on three Wikipedias. Two years after the release of their report, researchers Berson and Sengul-Jones reflect on the impact of their research as well as the actionable next steps.
:::* Paper: https://artandfeminism.org/resources/research/unreliable-guidelines/
| talk2-presenter=Lucie-Aimée Kaffee and Arnav Arora|talk2-title=Why Should This Article Be Deleted? Transparent Stance Detection in Multilingual Wikipedia Editor Discussions
| talk2-abstract=The moderation of content on online platforms is usually non-transparent. On Wikipedia, however, such discussions are carried out publicly and the editors are encouraged to use the content moderation policies as explanations for making moderation decisions. However, currently only a few comments explicitly mention those policies. To aid in this process of understanding how content is moderated, we construct a novel multilingual dataset of Wikipedia editor discussions along with their reasoning in three languages. We demonstrate that stance and corresponding reason (policy) can be predicted jointly with a high degree of accuracy, adding transparency to the decision-making process.
:::* Paper: Kaffee, Lucie-Aimée, Arnav Arora, and Isabelle Augenstein. [https://arxiv.org/abs/2310.05779 Why Should This Article Be Deleted? Transparent Stance Detection in Multilingual Wikipedia Editor Discussions]. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023.}}

== 2023 ==
=== {{Ym|2023|12}} ===
;Time
: Tuesday, December 12, 17:30 UTC: Find your local time [https://zonestamp.toolforge.org/1702402200 here]
;Theme: A year of Generative AI: future directions for Wikimedia
{{:Wikimedia_Research/Showcase/Event
| date = December 12, 2023
| youtube-url = https://www.youtube.com/watch?v=UnAsD7-hZpo
}}

=== {{Ym|2023|11}} ===
;Time: 16.30 UTC
;Theme: Bibliometrics
{{/Event
| date = November 15, 2023
| youtube-url = https://www.youtube.com/watch?v=IxNa6vgMCDY
| talk1-title = Contextualizing the bibliographic references of Wikipedia
| talk1-presenter = Wenceslao Arroyo-Machado, Universidad de Granada
| talk1-abstract = This study aims to enhance the value of bibliographic references in Wikipedia articles by moving beyond just citation counts and exploiting Wikipedia article features and engagement metrics, like page views and talks, to enrich the context of references and deepen the understanding of the relationship between science and society.
:::*Papers:
::::Arroyo-Machado, W., Torres-Salinas, D., & Costas, R. (2022). Wikinformetrics: Construction and description of an open Wikipedia knowledge graph data set for informetric purposes. Quantitative Science Studies, 1-22. https://doi.org/10.1162/qss_a_00226
::::Arroyo-Machado, W., Díaz-Faes, A. A., Herrera-Viedma, E., & Costas, R. (2023). From academic to media capital: To what extent does the scientific reputation of universities translate into Wikipedia attention? arXiv preprint arXiv:2307.05366. https://doi.org/10.48550/arXiv.2307.05366
::::Arroyo-Machado, W., & Costas, R. (2023, April). Do popular research topics attract the most social attention? A first proposal based on OpenAlex and Wikipedia. In 27th International Conference on Science, Technology and Innovation Indicators (STI 2023). https://doi.org/10.55835/6442bb04903ef57acd6dab9e
| talk2-title = Gender and country biases in Wikipedia citations to scholarly publications
| talk2-presenter = Chaoqun Ni, University of Wisconsin-Madison
| talk2-abstract = Ensuring Wikipedia cites scholarly publications based on quality and relevancy without biases is critical to credible and fair knowledge dissemination. We investigate gender- and country-based biases in Wikipedia citation practices using linked data from the Web of Science and a Wikipedia citation dataset. Using coarsened exact matching, we show that publications by women are cited less by Wikipedia than expected, and publications by women are less likely to be cited than those by men. Scholarly publications by authors affiliated with non-Anglosphere countries are also disadvantaged in getting cited by Wikipedia, compared with those by authors affiliated with Anglosphere countries. The level of gender- or country-based inequalities varies by research field, and the gender-country intersectional bias is prominent in math-intensive STEM fields. To ensure the credibility and equality of knowledge presentation, Wikipedia should consider strategies and guidelines to cite scholarly publications independent of the gender and country of authors.
:::* Paper: Zheng, X., Chen, J., Yan, E., & Ni, C. (2023). Gender and country biases in Wikipedia citations to scholarly publications. Journal of the Association for Information Science and Technology, 74(2), 219-233. https://asistdl.onlinelibrary.wiley.com/doi/full/10.1002/asi.24723
}}

=== {{Ym|2023|10}} ===
;Theme: Data Privacy
{{/Event
| date = October 18, 2023
| youtube-url = https://www.youtube.com/watch?v=ntgRsMaDlsw
| talk1-title = Wikipedia Reader Navigation<nowiki>:</nowiki> When Synthetic Data Is Enough
| talk1-presenter = Akhil Arora, EPFL
| talk1-abstract = Every day millions of people read Wikipedia. When navigating the vast space of available topics using hyperlinks, readers describe trajectories on the article network. Understanding these navigation patterns is crucial to better serve readers’ needs and address structural biases and knowledge gaps. However, systematic studies of navigation on Wikipedia are hindered by a lack of publicly available data due to the commitment to protect readers' privacy by not storing or sharing potentially sensitive data. In this paper, we ask: How well can Wikipedia readers' navigation be approximated by using publicly available resources, most notably the Wikipedia [https://wikinav.toolforge.org clickstream data]? We systematically quantify the differences between real navigation sequences and synthetic sequences generated from the clickstream data, in 6 analyses across 8 Wikipedia language versions. Overall, we find that the differences between real and synthetic sequences are statistically significant, but with small effect sizes, often well below 10%. This constitutes quantitative evidence for the utility of the Wikipedia clickstream data as a public resource: clickstream data can closely capture reader navigation on Wikipedia and provides a sufficient approximation for most practical downstream applications relying on reader data. More broadly, this study provides an example for how clickstream-like data can generally enable research on user navigation on online platforms while protecting users’ privacy.
:::* Paper: Akhil Arora, Martin Gerlach, Tiziano Piccardi, Alberto García-Durán, Robert West. 2022. [https://arxiv.org/abs/2201.00812 Wikipedia Reader Navigation: When Synthetic Data Is Enough]. Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining (WSDM '22). https://doi.org/10.1145/3488560.3498496
| talk2-title = How to tell the world about data you cannot show them<nowiki>:</nowiki> Differential privacy at the Wikimedia Foundation
| talk2-presenter = Hal Triedman, Wikimedia Foundation
| talk2-abstract = The Wikimedia Foundation (WMF), by virtue of its centrality on the internet, collects lots of data about platform activities. Some of that data is made public (e.g. global daily pageviews); other data types are not shared (or are pseudonymized prior to sharing), largely due to privacy concerns. Differential privacy is a statistical definition of privacy that has gained prominence in academia, but is still an emerging technology in industry. In this talk, I share the story of how we put differential privacy into production at the WMF, through looking at the case study of geolocated daily pageview counts.
:::* Paper: Temilola Adeleye, Skye Berghel, Damien Desfontaines, Michael Hay, Isaac Johnson, Cléo Lemoisson, Ashwin Machanavajjhala, Tom Magerlein, Gabriele Modena, David Pujol, Daniel Simmons-Marengo, Hal Triedman. 2023. [https://arxiv.org/abs/2308.16298 Publishing Wikipedia usage data with strong privacy guarantees]. Theory and Practice of Differential Privacy (TPDP) 2023. https://doi.org/10.48550/arXiv.2308.16298
:::* Meta page: https://meta.wikimedia.org/wiki/Differential_privacy
|talk2-slides=https://commons.wikimedia.org/wiki/File:Research_Showcase_Differential_Privacy.pdf}}
 
=== {{Ym|2023|9}} ===
 
;Theme
: Rules on Wikipedia
{{/Event
| date = September 20, 2023
| youtube-url = https://www.youtube.com/watch?v=h89l9JWZBCU
| talk1-title = Wikipedia Community Policies and Experiential Epistemology: Critical Information Literacy, Social Justice, and Inclusive Practices
| talk1-presenter = Zachary J. McDowell, University of Illinois at Chicago and Matthew Vetter, Indiana University of Pennsylvania
| talk1-abstract =Drawing from a meta-analysis of research on learning outcomes in Wikipedia-based education, this presentation addresses Wikipedia community policies and practices through the Framework for Information Literacy in Higher Education from the Association of College and Research Libraries (ACRL). Wikipedia-based educational practices, which promote newcomers’ active engagement in the encyclopedia, have been shown to support experiential learning in critical information literacy, communication and research outcomes, and social justice. Exploring the connections between participation in Wikipedia and transferable skills for information literacy in the context of the current new media landscape, this presentation grapples with new questions for the future of information literacies alongside the implications of large language models (LLMs), systemic biases, and the representation and inclusion of non-western and indigenous knowledge sources.
:::*Papers:
 
::::McDowell, Z. J., & Vetter, M. A. (2022). Wikipedia as Open Educational Practice: Experiential Learning, Critical Information Literacy, and Social Justice. ''Social Media + Society, 8''(1). https://doi.org/10.1177/20563051221078224
 
::::McDowell, Z. J., & Vetter, M. A. (2020). It Takes a Village to Combat a Fake News Army: Wikipedia’s Community and Policies for Information Literacy. ''Social Media + Society, 6''(3). https://doi.org/10.1177/2056305120937309
 
::::McDowell, Z., & Vetter, M. (2022). Fast “Truths” and Slow Knowledge; Oracular Answers and Wikipedia’s Epistemology. ''Fast Capitalism, 19''(1). https://doi.org/10.32855/fcapital.202201.009
 
:::*Book: McDowell, Z.J. & Vetter, M.A. ''Wikipedia and the representation of reality''. Routledge, 2021. https://doi.org/10.4324/9781003094081
| talk2-title = Variation and overlap in the peer production of community rules: the case of five Wikipedias
| talk2-presenter = Sohyeon Hwang, Northwestern University
| talk2-abstract =In this talk, I present work analyzing the rules and rule-making on Wikipedia. The governance of many online communities relies on rules created by participants. However, work predominantly focuses on efforts within a single community or on a platform as a whole. Here we investigate the comparative and relational dimensions of online self-governance in a set of similar communities by looking at the five largest language editions of Wikipedia. Using exhaustive trace data spanning almost 20 years since their founding, we examine patterns in rule-making and overlaps in rule sets. Our findings show that language editions have similar trajectories of rule-making activity, replicating and extending a rich body of work that has focused on English-language Wikipedia alone. We also find that the language editions have increasingly unique rule sets, even as editing activity concentrates on rules shared between them. The results suggest that self-governing communities aligned in key ways may share a common core of rules and rule-making practices even as they develop and sustain institutional variations.
:::* Paper: Hwang, S., & Shaw, A. (2022, May). Rules and Rule-Making in the Five Largest Wikipedias. In Proceedings of the International AAAI Conference on Web and Social Media (Vol. 16, pp. 347-357). https://ojs.aaai.org/index.php/ICWSM/article/view/19297
|talk1-slides=File:McDowell_%26_Vetter_Sept._2023_WMF_Research_Showcase.pdf|talk2-slides=File:Wikimedia_Research_Showcase_September_2023_-_Rules_-_Sohyeon%27s_Slides.pdf}}
 
=== {{Ym|2023|8}} ===
No Showcase due to [[wmania:2023:Wikimania|Wikimania]]
 
=== {{ym|2023|7}} ===
;Time: 16.30 UTC: Find your local time [https://zonestamp.toolforge.org/1689784227 here]
;Theme: Improving knowledge integrity in Wikimedia projects
{{/Event
| date = July 19, 2023
| youtube-url = https://www.youtube.com/watch?v=_8DevIsi44s
| talk1-title = Assessment of Reference Quality on Wikipedia
| talk1-presenter = Aitolkyn Baigutanova, KAIST
| talk1-abstract =In this talk, I will present our research on the reliability of Wikipedia through the lens of its references. I will primarily discuss our paper on the longitudinal assessment of reference quality on English Wikipedia, where we operationalize the notion of reference quality by defining reference need (RN), i.e., the percentage of sentences missing a citation, and reference risk (RR), i.e., the proportion of non-authoritative references. I will share our research findings on two key aspects: (1) the evolution of reference quality over a 10-year period and (2) factors that affect reference quality. We discover that the RN score has dropped by 20 percentage points, with more than half of verifiable statements now accompanied by references. The RR score has remained below 1% over the years as a result of the efforts of the community to eliminate unreliable references. As an extension of this work, we explore how community initiatives, such as the perennial source list, help with maintaining reference quality across multiple language editions of Wikipedia. We hope our work encourages more active discussions within Wikipedia communities to improve reference quality of the content.
:::*Paper: Aitolkyn Baigutanova, Jaehyeon Myung, Diego Saez-Trumper, Ai-Jou Chou, Miriam Redi, Changwook Jung, and Meeyoung Cha. 2023. Longitudinal Assessment of Reference Quality on Wikipedia. In Proceedings of the ACM Web Conference 2023 (WWW '23). Association for Computing Machinery, New York, NY, USA, 2831–2839. https://doi.org/10.1145/3543507.3583218
| talk2-title = Multilingual approaches to support knowledge integrity in Wikipedia
| talk2-presenter = Diego Saez-Trumper & Pablo Aragón, Wikimedia Foundation
| talk2-abstract =Knowledge integrity in Wikipedia is key to ensuring the quality and reliability of information. For that reason, editors devote a substantial amount of their time to patrolling tasks in order to detect low-quality or misleading content. In this talk we will cover recent multilingual approaches to support knowledge integrity. First, we will present a novel design of a system aimed at assisting the Wikipedia communities in addressing vandalism. This system was built by collecting a massive multilingual dataset and then applying advanced filtering and feature engineering techniques, including multilingual masked language modeling to build the training dataset from human-generated data. Second, we will showcase the Wikipedia Knowledge Integrity Risk Observatory, a dashboard that relies on a language-agnostic version of the former system to monitor high-risk content in hundreds of Wikipedia language editions. We will conclude with a discussion of different challenges to be addressed in future work.
:::* Papers:
::::Trokhymovych, M., Aslam, M., Chou, A. J., Baeza-Yates, R., & Saez-Trumper, D. (2023). Fair multilingual vandalism detection system for Wikipedia. arXiv preprint arXiv:2306.01650. https://arxiv.org/pdf/2306.01650.pdf
::::Aragón, P., & Sáez-Trumper, D. (2021). A preliminary approach to knowledge integrity risk assessment in Wikipedia projects. arXiv preprint arXiv:2106.15940. https://arxiv.org/abs/2106.15940
:::* Slides: https://figshare.com/articles/presentation/Multilingual_approaches_to_support_knowledge_integrity_in_Wikipedia_-_Wikimedia_Research_Showcase_-_July_2023/23716152
}}
 
=== {{ym|2023|6}} ===
;Time: 16.30 UTC: Find your local time [https://zonestamp.toolforge.org/1687365012 here]
;Theme: Wikimedia and LGBTQIA+
{{/Event
| date = June 21, 2023
| youtube-url = https://www.youtube.com/watch?v=AOD2ZdxRNfo
| talk1-title = Multilingual Contextual Affective Analysis of LGBT People Portrayals in Wikipedia
| talk1-presenter = Chan Park, Carnegie Mellon University
| talk1-abstract = In this talk, I present our research on analyzing the portrayal of LGBT individuals in their biographies on Wikipedia, with a particular focus on subtle word connotations and cross-cultural comparisons. We aim to address two primary research questions: 1) How can we effectively measure the nuanced connotations of words in multilingual texts, which reflect sentiments, power dynamics, and agency? 2) How can we analyze the portrayal of a specific group, such as the LGBT community, and compare these portrayals across different languages? To answer these questions, we collect the Multilingual Contextualized Connotation Frames dataset, comprising 2,700 examples in English, Spanish, and Russian. We also develop a new multilingual model based on pre-trained multilingual language models. Additionally, we devise a matching algorithm to construct a comparison corpus for the target corpus, isolating the attribute of interest. Finally, we showcase how our developed models and constructed corpora enable us to conduct cross-cultural analysis of LGBT people portrayals on Wikipedia. Our results reveal systematic differences in how the LGBT community is portrayed across languages, surfacing cultural differences in narratives and signs of social biases.
:::*Paper: [https://arxiv.org/pdf/2010.10820.pdf Park, C. Y., Yan, X., Field, A., & Tsvetkov, Y. (2021, May). Multilingual contextual affective analysis of LGBT people portrayals in Wikipedia. In Proceedings of the International AAAI Conference on Web and Social Media (Vol. 15, pp. 479-490).]
:::*Slides: https://doi.org/10.6084/m9.figshare.23589054.v1
| talk2-title = How do you represent my gender? Challenges and opportunities from the Wikidata Gender Diversity project
| talk2-presenter = Daniele Metilli, University College London
| talk2-abstract = Wikidata Gender Diversity (WiGeDi) is a one-year project funded through the Wikimedia Research Fund. The project is studying gender diversity in Wikidata, focusing on marginalized gender identities such as those of trans and non-binary people, and adopting a queer and intersectional feminist perspective. The project is organised in three strands: model, data, and community. First, we are looking at how the current Wikidata ontology model represents gender, and the extent to which this representation is inclusive of marginalized gender identities. Second, we are analysing the data stored in the knowledge base to gather insights and identify possible gaps and biases. Finally, we are looking at how the community has handled the move towards the inclusion of a wider spectrum of gender identities by studying a corpus of user discussions through computational linguistics methods. This presentation will report on the current status of the Wikidata Gender Diversity project and the envisioned outcomes. We will discuss the main challenges that we are facing and the opportunities that our project will potentially enable, on Wikidata and beyond.
:::*Paper: [https://wigedi.com/chapter.pdf Metilli D. & Paolini C. (in press). ‘Non-binary gender representation in Wikidata’. In: Provo A., Burlingame K. & Watson B.M. Ethics in Linked Data. Litwin Books.]
}}
 
=== {{ym|2023|5}} ===
No Showcase this month. Join us instead at the 10th edition of [https://wikiworkshop.org/2023/ Wiki Workshop] on May 11, starting at 12:00 UTC.
 
=== {{ym|2023|4}} ===
;Time: 16.30 UTC: Find your local time [https://zonestamp.toolforge.org/1681921857 here]
;Theme: Images on Wikipedia
{{/Event
| date = April 19, 2023
| youtube-url = https://youtube.com/live/vW0waU-QArU?feature=share
| talk1-title = A large scale study of reader interactions with images on Wikipedia
| talk1-presenter = Daniele Rama, University of Turin
| talk1-abstract = Wikipedia is the largest source of free encyclopedic knowledge and one of the most visited sites on the Web. To increase reader understanding of the article, Wikipedia editors add images within the text of the article’s body. However, despite their widespread usage on web platforms and the huge volume of visual content on Wikipedia, little is known about the importance of images in the context of free knowledge environments. To bridge this gap, we collect data about English Wikipedia reader interactions with images during one month and perform the first large-scale analysis of how interactions with images happen on Wikipedia. First, we quantify the overall engagement with images, finding that one in 29 pageviews results in a click on at least one image, one order of magnitude higher than interactions with other types of article content. Second, we study what factors associate with image engagement and observe that clicks on images occur more often in shorter articles and articles about visual arts or transports and biographies of less well-known people. Third, we look at interactions with Wikipedia article previews and find that images help support readers' information needs when navigating through the site, especially for more popular pages. The findings in this study deepen our understanding of the role of images for free knowledge and provide a guide for Wikipedia editors and web user communities to enrich the world’s largest source of encyclopedic knowledge.
:::*Paperː https://epjdatascience.springeropen.com/articles/10.1140/epjds/s13688-021-00312-8
| talk2-title = Visual gender biases in Wikipediaː A systematic evaluation across the ten most spoken languages
| talk2-presenter = Pablo Beytia, Catholic University of Chile
| talk2-abstract = The existing research suggests a significant gender gap in Wikipedia biographical articles, with a minimal representation of women and gender asymmetries in the textual content. However, the visual aspects of this gap (e.g., image volume and quality) have received little attention. This study examined asymmetries between women's and men's biographies, exploring written and visual content across the ten most widely spoken languages. The cross-lingual analysis reveals that (1) the most salient male biases appear when editors select which personalities should have a Wikipedia page, (2) the trends in written and visual content are dissimilar, (3) male biographies tend to have more images across languages, and (4) female biographies have better visual quality on average. The open database of this study provides eight indicators of gender asymmetries in ten occupational domains and ten languages. That information allows for a granular view of gender biases, as well as exploring more macroscopic phenomena, such as the similarity between Wikipedia versions according to their gender bias structures.
:::*Papersː
::::Beytía, P., Agarwal, P., Redi, M., &amp; Singh, V. K. (2022). Visual Gender Biases in Wikipedia: A Systematic Evaluation across the Ten Most Spoken Languages. Proceedings of the International AAAI Conference on Web and Social Media, 16(1), 43-54. https://doi.org/10.1609/icwsm.v16i1.19271
::::https://ojs.aaai.org/index.php/ICWSM/article/view/19271
::::Beytía, P. &amp; Wagner, C. (2022). Visibility layers: a framework for systematizing the gender gap in Wikipedia content. Internet Policy Review, 11(1). https://doi.org/10.14763/2022.1.1621
::::https://policyreview.info/articles/analysis/visibility-layers-framework-systematising-gender-gap-wikipedia-content
:::*Slidesː https://commons.wikimedia.org/w/index.php?title=File:Visual_Gender_Biases_in_Wikipedia.pdf
}}
 
=== {{ym|2023|3}} ===
;Time: 9:30am PDT / 12:30pm EDT / 16:30 UTC: Find your local time [https://zonestamp.toolforge.org/1678897840 here]
;Theme: Gender and Equity
{{/Event
| date = March 15, 2023
| youtube-url = https://www.youtube.com/watch?v=lw4MzJgDIzo
| talk1-title = Men Are Elected, Women Are Marriedː Events Gender Bias on Wikipedia
| talk1-presenter = Jiao Sun, University of Southern California
| talk1-abstract = Human activities can be seen as sequences of events, which are crucial to understanding societies. Disproportional event distribution for different demographic groups can manifest and amplify social stereotypes, and potentially jeopardize the ability of members in some groups to pursue certain goals. Our study discovers that Wikipedia pages tend to intermingle personal life events with professional events for females but not for males, which calls for the awareness of the Wikipedia community to formalize guidelines and train the editors to mind the implicit biases that contributors carry.
:::*Paperː [https://aclanthology.org/2021.acl-short.45.pdf Sun, J. & Peng, N. (2021). Men Are Elected, Women Are Married: Events Gender Bias on Wikipedia. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Conference on Natural Language Processing, 350-360.]
| talk2-title = Twitter reacts to absence of women on Wikipediaː a mixed-methods analysis of #VisibleWikiWomen campaign
| talk2-presenter = Sneh Gupta, Guru Gobind Singh Indraprastha University
| talk2-abstract = Digital gender divide (DGD) is visible in access, participation, representation, and biases against women embedded in Wikipedia, the largest digital reservoir of co-created content. This article examined the content of #VisibleWikiWomen, a global digital advocacy campaign aimed at encouraging inclusion of women voices in the global technology conversation and improving digital sustainability of feminist data on Wikipedia. In a mixed-methods study, Sentiment Analysis followed by a Feminist Critical Discourse Analysis of the campaign tweets reveals how digital gender divide manifested in the public response. An overwhelming majority of tweets expressed positive sentiment towards the objective of the campaign. An inductive reading of the coded tweets (n = 1067) generated five themes: Feminist Activism, Invisibility & Marginalization of Women, Technology for Women Empowerment, Gendered Knowledge Inequity, and Power Dynamics in the Digital Sphere. Twitter discourse presented many agitated digital users calling out the epistemic injustice on Wikipedia that goes beyond the invisibility of women. Their tweets reveal that they want an equal social platform inclusive of women of color and varied identities currently absent in the Wikipedia universe. Extracting ideas, values, and themes from new media campaigns holds unparalleled potential in the diffusion of interventions and messages on a larger scale.
:::*Paperː [https://www.researchgate.net/publication/356909618_Twitter_reacts_to_absence_of_women_on_Wikipedia_a_mixed-methods_analysis_of_VisibleWikiWomen_campaign Gupta, S., & Trehan, K. (2022). Twitter reacts to absence of women on Wikipedia: a mixed-methods analysis of #VisibleWikiWomen campaign. Media Asia, 49(2), 130-154.]
}}
 
=== {{ym|2023|2}} ===
;Time: 9:30am PST / 12:30pm EST / 17:30 UTC: Find your local time [https://zonestamp.toolforge.org/1676482256 here]
;Theme: The Free Knowledge Ecosystem
{{/Event
| date = February 15, 2023
| youtube-url = https://www.youtube.com/watch?v=8VJmR-3lTac
| talk1-title = The evolution of humanitarian mapping in OpenStreetMap (OSM) and how it affects map completeness and inequalities in OSM
| talk1-presenter = Benjamin Herfort, Heidelberg Institute for Geoinformation Technology
| talk1-abstract = Mapping efforts of communities in OpenStreetMap (OSM) over the previous decade have created a unique global geographic database, which is accessible to all with no licensing costs. The collaborative maps of OSM have been used to support humanitarian efforts around the world as well as to fill important data gaps for implementing major development frameworks such as the Sustainable Development Goals (SDGs). Besides the well-examined Global North–Global South bias in OSM, the OSM data as of 2023 shows a much more spatially diverse spread pattern than previously considered, which was shaped by regional, socio-economic and demographic factors across several scales. Humanitarian mapping efforts of the previous decade have already made OSM more inclusive, helping to diversify and expand the spatial footprint of the areas mapped. However, methods to quantify and account for the remaining biases in OSM’s coverage are needed so that researchers and practitioners will be able to draw the right conclusions, e.g. about progress towards the SDGs in cities.
:::*Slidesː https://figshare.com/articles/presentation/The_evolution_of_humanitarian_mapping_in_OpenStreetMap_OSM_and_how_it_affects_map_completeness_and_inequalities_in_OSM/22101728
| talk2-title = Dataset reuseː Toward translating principles to practice
| talk2-presenter = Laura Koesten, University of Vienna
| talk2-abstract = The web provides access to millions of datasets. These data can have additional impact when used beyond the context for which they were originally created. But using a dataset beyond the context in which it originated remains challenging. Simply making data available does not mean it will be or can be easily used by others. At the same time, we have little empirical insight into what makes a dataset reusable and which of the existing guidelines and frameworks have an impact. In this talk, I will discuss our research on what makes data reusable in practice. This is informed by a synthesis of literature on the topic, our studies on how people evaluate and make sense of data, and a case study on datasets on GitHub. In the case study, we describe a corpus of more than 1.4 million data files from over 65,000 repositories. Building on reuse features from the literature, we use GitHub’s engagement metrics as proxies for dataset reuse and devise an initial model, using deep neural networks, to predict a dataset’s reusability. This demonstrates the practical gap between principles and actionable insights that might allow data publishers and tool designers to implement functionalities that facilitate reuse.
:::*Papersː [https://www.sciencedirect.com/science/article/pii/S1071581920301646 Talking datasets – Understanding data sensemaking behaviours], [https://www.sciencedirect.com/science/article/pii/S2666389920301847 Dataset Reuse: Toward Translating Principles to Practice]
}}
 
=== {{ym|2023|1}} ===
;Time: 9:30am PST / 12:30pm EST / 17:30 UTC: Find your local time [https://zonestamp.toolforge.org/1674063059 here]
;Theme: Editor Retention
{{/Event
| date = January 18, 2023
| youtube-url = https://www.youtube.com/watch?v=gS8ELcVZ8Q4
| talk1-title = Learning to Predict the Departure Dynamics of Wikidata Editors
| talk1-presenter = Guangyuan Piao, Maynooth University
| talk1-abstract = Wikidata, one of the largest open collaborative knowledge bases, has drawn much attention from researchers and practitioners since its launch in 2012. Because it is collaboratively developed and maintained by a large community of volunteer editors, understanding and predicting the departure dynamics of those editors is crucial, but has not been studied extensively in previous work. In this paper, we investigate the synergistic effect of two different types of features, statistical and pattern-based, using DeepFM as our classification model, which has not previously been explored in a similar context, to predict whether a Wikidata editor will stay on or leave the platform. Our experimental results show that using both feature sets with DeepFM provides the best performance in terms of AUROC (0.9561) and F1 score (0.8843), and achieves substantial improvement over using either feature set alone and over a wide range of baselines.
:::*Paperː [https://parklize.github.io/publications/ISWC2021.pdf Learning to Predict the Departure Dynamics of Wikidata Editors]
:::*Slidesː https://figshare.com/articles/presentation/Learning_to_Predict_the_Departure_Dynamics_of_Wikidata_Editors/21922146
}}
 
== 2022 ==
=== {{ym|2022|12}} ===
;Time: 9:30am PST / 12:30pm EST / 17:30 UTC: Find your local time [https://zonestamp.toolforge.org/1671039024 here]
;Theme: A year in review from the WMF Research teamː Tying our work to the research community
{{/Event
| date = December 14, 2022
| youtube-url = https://www.youtube.com/watch?v=a0ss9ckUlvQ
| talk1-title = Research as a service
| talk1-presenter = [https://research.wikimedia.org/team.html The WMF Research team]
| talk1-abstract = The Wikimedia Research community is key to tackling the many strategic challenges of the Wikimedia movement. As we are ending the year, the Research team will reflect on why working with the community is important to us. We will share the initiatives, tools, and resources developed throughout 2022 to bring the community together, facilitate researchers’ contributions to the Wikimedia projects, and encourage a diversity of research questions.
:::*Slides on [https://figshare.com/articles/presentation/A_year_in_review_from_the_WMF_Research_team_Tying_our_work_to_the_research_community_-_Wikimedia_Research_Showcase_-_December_2022/21719402 figshare]
}}
 
=== {{ym|2022|11}} ===
;Time: 9:30am PST / 12:30pm EST / 17:30 UTC: View your local time [https://zonestamp.toolforge.org/1668619830 here]
;Theme: Libraries and Wikimedia knowledge
{{/Event
| date = November 16, 2022
| youtube-url = https://www.youtube.com/watch?v=sFanZoHjUnY
| talk1-title = [https://en.wikisource.org/wiki/Wikipedia_and_Academic_Libraries:_A_Global_Project Wikipedia and Academic Libraries]
| talk1-presenter = Laurie Bridges (Oregon State University)
| talk1-abstract = In 2021, an open-access edited book, Wikipedia and Academic Libraries: A Global Project, was published, featuring 20 chapters from over 50 authors (https://doi.org/10.3998/mpub.11778416). In this presentation, Laurie Bridges, one of the co-editors, will discuss the process of creating and publishing an open-access edited book. Michael David Miller, one of the chapter authors, will discuss his chapter about contributions to local Québécois LGBTQ+ content in Francophone Wikipedia.
| talk2-title = [https://en.wikisource.org/wiki/Wikipedia_and_Academic_Libraries:_A_Global_Project/Chapter_7#Chapter_7:_WP:Cat%C3%A9gorie_Is_._._._Liaison_Librarian_Contribution_to_Local_Qu%C3%A9b%C3%A9cois_LGBTQ+_Content_in_Francophone_Wikipedia Liaison Librarian Contribution to Local Quebecois LGBTQ+ Content in Francophone Wikipedia]
| talk2-presenter = Michael David Miller (McGill University)
| talk2-abstract =
| talk3-title = Ethical Considerations of Including Gender Information in Open Knowledge Platforms
| talk3-presenter = Nerissa Lindsey (San Diego State University)
| talk3-abstract = In recent years, galleries, libraries, archives, and museums (GLAMs) have sought to leverage open knowledge platforms such as Wikidata to highlight or provide more visibility for traditionally marginalized groups and their work, collections, or contributions. Efforts like Art + Feminism, local edit-a-thons, and, more recently, GLAM institution-led projects have promoted open knowledge initiatives to a broader audience of participants. One such open knowledge project, the Program for Cooperative Cataloging (PCC) Wikidata Pilot, has brought together over seventy GLAM organizations to contribute linked open data for individuals associated with their institutions, collections, or archives. However, these projects have brought up ethical concerns around including potentially sensitive personal demographic information, such as gender identity, sexual orientation, race, and ethnicity, in entries in an open knowledge base about living persons. GLAM institutions are thus in a position of balancing open access with ethical cataloging, which should include adhering to the personal preferences of the individuals whose data is being shared. People working in libraries and archives have been increasingly focusing their energies on issues of diversity, equity, and inclusion in their descriptive practices, including remediating legacy data and addressing biased language. Moving this work into a more public sphere and scaling up in volume creates potential risks to the individuals being described. While adding demographic information on living people to open knowledge bases has the potential to enhance, highlight, and celebrate diversity, it could also potentially be used to the detriment of the subjects through surveillance and targeting activities. 
In our research we investigated the changing role of metadata and open knowledge in addressing, or not addressing, issues of under- and misrepresentation, especially as they pertain to gender identity as described in the sex or gender property in Wikidata. We reported our findings from a survey investigating how organizations participating in open knowledge projects are addressing ethical concerns around including personal demographic information as part of their projects, including what, if any, policies they have implemented and what implications these activities may have for the living people being described.
:::* Related paper: [https://kula.uvic.ca/index.php/kula/article/view/228 Ethical Considerations of Including Gender Information in Open Knowledge Platforms], KULA ([https://kula.uvic.ca/index.php/kula/article/view/228/457 pdf])
:::* Slidesː https://doi.org/10.6084/m9.figshare.21574686.v1
}}
 
=== {{ym|2022|10}} ===
;Time: 9:30am PDT / 12:30pm EDT / 16:30 UTC: Find your local time [https://zonestamp.toolforge.org/1666197004 here]
;Theme: Panel discussion celebrating Wikidata's 10th birthday!
{{/Event
| date = October 19, 2022
| youtube-url = https://www.youtube.com/watch?v=ML-ULyARpU4
| talk1-title =
| talk1-presenter = Denny Vrandečić (WMF) with panelists Lydia Pintscher (WMDE), Elena Simperl (King's College London), Katherine Thornton (Yale), and Markus Krötzsch (Technical University of Dresden).
| talk1-abstract = October 2022 marks the tenth anniversary of the launch of Wikidata (www.wikidata.org). In ten years, this project has become the largest community-driven free knowledge graph in the world, enabling a common knowledge base for Wikimedia projects. The language-independent nature of Wikidata has greatly improved the maintenance and consistency of knowledge across Wikipedia language editions, fostering knowledge equity in Wikimedia. In addition, since Wikidata is a collaborative project that can be read and edited by humans and machines alike, it is also widely used in third-party applications delivering knowledge as a service for all. The Wikimedia Research community has devoted significant effort and resources to studying the foundations, capabilities and applications of Wikidata, from the complex requirements of representing real-world knowledge in a multilingual environment to the need to assess the quality of data and sources in Wikidata. To learn more about the state of the art of Wikidata and research challenges in the era of AI/ML, we will celebrate this tenth anniversary with a panel that will bring together established researchers and practitioners in this field.
}}
 
=== {{ym|2022|9}} ===
No Showcase this month. The [https://research.wikimedia.org/team.html Research team] will meet for an in-person offsite in Prague, September 19-22. We are very excited that we can finally meet each other in person after almost 3 years of not being able to. If you are in Prague during that period, feel free to ping us. We would be happy to catch up with you in person if we can align schedules. Otherwise, see you all in October.
=== {{ym|2022|8}} ===
No Showcase due to [https://wikimania.wikimedia.org/wiki/Wikimania Wikimania]!
=== {{ym|2022|7}} ===
;Time: 9:30am PDT / 12:30pm EDT / 18:30 CEST: View your local time [https://zonestamp.toolforge.org/1658334607 here]
;Theme: 2022 Wikimedia Foundation Research of the Year Award Winners!
{{/Event
| date = July 20, 2022
| youtube-url = https://www.youtube.com/watch?v=KMvXOQU5fX4
| talk1-title = Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning
| talk1-presenter = Krishna Srinivasan (Google)
| talk1-abstract = The milestone improvements brought about by deep representation learning and pre-training techniques have led to large performance gains across downstream NLP, IR and Vision tasks. Multimodal modeling techniques aim to leverage large high-quality visio-linguistic datasets for learning complementary information across image and text modalities. In this talk, I introduce the Wikipedia-based Image Text (WIT) Dataset to better facilitate multimodal, multilingual learning. WIT is composed of a curated set of 37.5 million entity rich image-text examples with 11.5 million unique images across 108 Wikipedia languages.
 
WIT’s unique advantages include:
* WIT is the largest multimodal dataset by number of image-text examples, by a factor of 3 (at the time of writing).
* WIT is massively multilingual (the first of its kind), covering more than 100 languages.
* WIT represents a more diverse set of concepts and real-world entities than previous datasets cover.
 
WIT Dataset is available for download and use via a Creative Commons license here: https://github.com/google-research-datasets/wit.
 
I conclude the talk with future directions to expand and extend the WIT dataset.
Relevant links: [https://arxiv.org/pdf/2103.01913.pdf paper]
| talk2-title = Assessing the Quality of Sources in Wikidata Across Languages
| talk2-presenter = Gabriel Amaral (King's College London)
| talk2-abstract = Wikidata is one of the most important sources of structured data on the web, built by a worldwide community of volunteers. As a secondary source, its contents must be backed by credible references; this is particularly important as Wikidata explicitly encourages editors to add claims for which there is no broad consensus, as long as they are corroborated by references. Nevertheless, despite this essential link between content and references, Wikidata’s ability to systematically assess and assure the quality of its references remains limited. To this end, we carry out a mixed-methods study to determine the relevance, ease of access, and authoritativeness of Wikidata references, at scale and in different languages, using online crowdsourcing, descriptive statistics, and machine learning. The findings help us ascertain the quality of references in Wikidata, and identify common challenges in defining and capturing the quality of user-generated multilingual structured data on the web.
Relevant links: [https://dl.acm.org/doi/abs/10.1145/3484828 paper], [https://figshare.com/articles/presentation/Wikimedia_Research_Showcase_Assessing_the_quality_of_sources_in_Wikidata_across_languages/20384322 slides]
}}
 
=== {{ym|2022|6}} ===
;Time: {{zonestamp|2022-06-15 11:00|June 15, 11:00 UTC}} (4:00am PDT / 7:00am EDT / 13:00 CEST)
;Theme: Wikipedia's languages
{{/Event
| date = June 15, 2022
| youtube-url = https://www.youtube.com/watch?v=AZQM1dtn3g0
| commons-file =
| talk1-title = Quantifying knowledge synchronisation in the 21st century
| talk1-presenter = Jisung Yoon (Pohang University of Science and Technology)
| talk1-slides =
| talk1-abstract = Humans acquire and accumulate knowledge through language usage and eagerly exchange their knowledge for advancement. Although geographical barriers had previously limited communication, the emergence of information technology has opened new avenues for knowledge exchange. However, it is unclear which communication pathway is dominant in the 21st century. Here, we explore the dominant path of knowledge diffusion in the 21st century using Wikipedia, the largest communal dataset. We evaluate the similarity of shared knowledge between population groups, distinguished based on their language usage. When population groups are more engaged with each other, their knowledge structure is more similar, where engagement is indicated by socio-economic connections, such as cultural, linguistic, and historical features. Moreover, geographical proximity is no longer a critical requirement for knowledge dissemination. Furthermore, we integrate our data into a mechanistic model to better understand the underlying mechanism and suggest that the knowledge "Silk Road" of the 21st century is based online.
Relevant links: [https://arxiv.org/abs/2202.01466 paper (preprint)], [https://figshare.com/articles/presentation/Quantifying_knowledge_synchronisation_in_the_21st_century_-_Wikimedia_Research_Showcase_-_June_2022/20306481 slides]
 
| talk2-title = The Language Geography of Wikipedia
| talk2-presenter = Martin Dittus
| talk2-slides =
| talk2-abstract = Every language is a system of being, doing, knowing, and imagining. With over 7,000 active languages in the world, how many languages are fully represented online? To answer this question, digital non-profit Whose Knowledge? initiated the first ever report on the State of the Internet's Languages. As part of this report, Martin Dittus and Mark Graham have investigated the languages of Wikipedia. Wikipedia began with a single English-language edition more than two decades ago, and now offers more than 300 language editions, which places it at the forefront of digital language support. However, this does not mean that speakers of these languages get access to the same content: Wikipedia’s language editions vary widely in scale. We further find that this inequality is also reflected in Wikipedia’s geographic coverage: not all places are captured in every language. Wikipedia's coverage often follows the global distribution of speakers of the respective language. Yet even when we account for the distribution of language populations, certain language communities are much more strongly represented on Wikipedia than others. As a consequence, we find that for many countries in Africa, Central and South America, and South Asia, most of the content about those countries is in a foreign language, often a European-colonial language. In other words, in many of these places, people may need to be able to speak a second (possibly foreign) language in order to access Wikipedia information about their own places. Why do we see these differences? And what can be done to improve things?
Relevant links: [https://internetlanguages.org/en/numbers/wikipedia-language-geography/ The Language Geography of Wikipedia], [https://internetlanguages.org/en/ State of the Internet's Languages Report], [https://commons.wikimedia.org/wiki/File:Wikimedia_Research_Showcase_June_2022_STIL_Slide_Deck.pdf Slides]
}}
 
=== {{ym|2022|5}} ===
;Time: {{zonestamp|2022-05-18 16:30|May 18, 16:30 UTC}} (9:30am PDT / 12:30pm EDT / 18:30 CEST)
;Theme: Gaps and Biases in Wikipedia
{{/Event
| date = May 18, 2022
| youtube-url = https://www.youtube.com/watch?v=Q8FlunZ0mH4
| commons-file =
| talk1-title = Ms. Categorized: Gender, notability, and inequality on Wikipedia
| talk1-presenter = [https://ftripodi.com/ Francesca Tripodi] (University of North Carolina at Chapel Hill)
| talk1-slides =
| talk1-abstract = For the last five decades, sociologists have argued that gender is one of the most pervasive and insidious forms of inequality. Research demonstrates how these inequalities persist on Wikipedia - arguably the largest encyclopedic reference in existence. Roughly eighty percent of Wikipedia's editors are men and pages about women and women's interests are underrepresented. English language Wikipedia contains more than 1.5 million biographies about notable writers, inventors, and academics, but fewer than nineteen percent of these biographies are about women. To try to improve these statistics, activists host “edit-a-thons” to increase the visibility of notable women. While this strategy helps create biographies that previously did not exist, it fails to address a more inconspicuous form of gender exclusion. Drawing on ethnographic observations, interviews, and quantitative analysis of web-scraped metadata, this talk demonstrates that women’s biographies are more frequently considered non-notable and nominated for deletion compared to men’s biographies. This disproportionate rate is another dimension of gender inequality on Wikipedia previously unexplored by social scientists and provides broader insights into how women’s achievements are (under)valued in society.
Relevant paperː [https://journals.sagepub.com/doi/10.1177/14614448211023772 Ms. Categorized: Gender, notability, and inequality on Wikipedia - Francesca Tripodi, 2021 (sagepub.com)]
| talk2-title = Controlled Analyses of Social Biases in Wikipedia Bios
| talk2-presenter = [https://homes.cs.washington.edu/~yuliats/ Yulia Tsvetkov] (University of Washington)
| talk2-slides =
| talk2-abstract = Social biases on Wikipedia could greatly influence public opinion. Wikipedia is also a popular source of training data for NLP models, and subtle biases in Wikipedia narratives are liable to be amplified in downstream NLP models. In this talk I'll present two approaches to unveiling social biases in how people are described on Wikipedia, across demographic attributes and across languages. First, I'll present a methodology that isolates dimensions of interest (e.g., gender), from other attributes (e.g., occupation). This methodology allows us to quantify systemic differences in coverage of different genders and races, while controlling for confounding factors. Next, I'll show an NLP case study that uses this methodology in combination with people-centric sentiment analysis to identify disparities in Wikipedia bios of members of the LGBTQIA+ community across three languages: English, Russian, and Spanish. Our results surface cultural differences in narratives and signs of social biases. Practically, these methods can be used to automatically identify Wikipedia articles for further manual analysis—articles that might contain content gaps or an imbalanced representation of particular social groups.
Relevant papers: [https://arxiv.org/pdf/2101.00078.pdf TheWebConf'22], [https://arxiv.org/pdf/2010.10820.pdf ICWSM'21]
}}
 
=== {{ym|2022|4}} ===
''No showcase'' this month. See you in [https://wikiworkshop.org/2022/ Wiki Workshop 2022] and [[metawiki:Wiki-M3L|Wiki-M3L]].
 
=== {{ym|2022|3}} ===
;Theme: Patterns and dynamics of article quality
{{/Event
| date = March 16, 2022
| youtube-url = https://www.youtube.com/watch?v=o5e6S7ac4q4
| commons-file =
| talk1-title = Quality monitoring in Wikipedia - A computational perspective
| talk1-presenter = [https://cse.iitkgp.ac.in/~animeshm/ Animesh Mukherjee] (Indian Institute of Technology, Kharagpur)
| talk1-slides =
| talk1-abstract = In this talk, I shall summarize the highlights of our five years of research on Wikipedia. In particular, I shall deep dive into two of our recent works: the first attempts to understand the early indications of which editors would soon go "missing" (aka missing editors) [1], while the second investigates how the quality of a Wikipedia article transitions over time and whether computational models could be built to understand the characteristics of future transitions [2]. In each case, I will present a suite of key results and the main insights that we obtained.
:::* [1] [https://link.springer.com/chapter/10.1007/978-3-030-91669-5_23 When expertise gone missing: Uncovering the loss of prolific contributors in Wikipedia], ICADL 2021 ([https://arxiv.org/pdf/2109.09979 pdf])
:::* [2] [https://arxiv.org/abs/2111.01496 Quality Change: norm or exception? Measurement, Analysis and Detection of Quality Change in Wikipedia], CSCW 2022 ([https://arxiv.org/pdf/2111.01496 pdf])
:::* Slides on [https://figshare.com/articles/presentation/Quality_monitoring_in_Wikipedia_A_computational_perspective_-_Wikimedia_Research_Showcase_-_March_2022/19382390 figshare]
| talk2-title = Automatically Labeling Low Quality Content on Wikipedia by Leveraging Editing Behaviors
| talk2-presenter = [http://sumitasthana.xyz/ Sumit Asthana] (University of Michigan, Ann Arbor)
| talk2-slides = File:WMF research showcase automatically label low quality content.pdf
| talk2-abstract = Wikipedia articles aim to be definitive sources of encyclopedic content. Yet only 0.6% of Wikipedia articles are rated high quality on its quality scale, owing to the small number of editors relative to the enormous number of articles. Supervised machine learning (ML) approaches to quality improvement that can automatically identify and fix content issues rely on manual labels of the quality of individual Wikipedia sentences. However, current labeling approaches are tedious and produce noisy labels. In this talk, I will discuss an automated labeling approach that identifies the semantic category (e.g., adding citations, clarifications) of historic Wikipedia edits and uses the sentences as they stood before the edit as examples that require that semantic improvement. Sentences from the highest-rated articles serve as examples that no longer need semantic improvements. I will discuss the performance of models trained with this labeling approach compared with models trained with existing labeling approaches, as well as the implications of such a large-scale semi-supervised labeling approach for capturing the editing practices of Wikipedia editors and helping them improve articles faster.
:::* Related paper: [https://dl.acm.org/doi/10.1145/3479503 Automatically Labeling Low Quality Content on Wikipedia By Leveraging Patterns in Editing Behaviors], CSCW 2021 ([https://arxiv.org/pdf/2108.02252 pdf])
}}
 
=== {{ym|2022|2}} ===
;Theme: Collective Attention in Wikipedia
{{/Event
| date = February 16, 2022
| youtube-url = https://www.youtube.com/watch?v=bg2aE2m08Qo
| commons-file =
 
| talk1-title = Modeling Collective Anticipation and Response on Wikipedia
| talk1-presenter = [https://www.maths.ox.ac.uk/people/renaud.lambiotte Renaud Lambiotte] (University of Oxford)
| talk1-slides =
| talk1-abstract = The dynamics of popularity in online media are driven by a combination of endogenous spreading mechanisms and response to exogenous shocks including news and events. However, little is known about the dependence of temporal patterns of popularity on event-related information, e.g., which types of events trigger long-lasting activity. Here we propose a simple model that describes the dynamics around peaks of popularity by incorporating key features, i.e., the anticipatory growth and the decay of collective attention together with circadian rhythms. The proposed model allows us to develop a new method for predicting future page view activity and for clustering time series. To validate our methodology, we collect a corpus of page view data from Wikipedia associated with a range of planned events, that is, events which we know in advance will take place on a fixed future date, such as elections and sporting events. Our methodology is superior to existing models in both prediction and clustering tasks. Furthermore, restricting to Wikipedia pages associated with association football, we observe that the specific realization of the event, in our case which team wins a match or the type of the match, has a significant effect on the response dynamics after the event. Our work demonstrates the importance of appropriately modeling all phases of collective attention, as well as the connection between temporal patterns of attention and characteristic underlying information of the events they represent.
:::*Related paper: [https://ojs.aaai.org/index.php/ICWSM/article/view/18063 Modeling Collective Anticipation and Response on Wikipedia], ICWSM 2021 ([https://ojs.aaai.org/index.php/ICWSM/article/view/18063/17866 pdf])
:::*Slides on [https://figshare.com/articles/presentation/Modeling_Collective_Anticipation_and_Response_on_Wikipedia_-_Wikimedia_Research_Showcase_-_February_2022/19187873 figshare]
| talk2-title = Sudden Attention Shifts on Wikipedia During the COVID-19 Crisis
| talk2-presenter = [https://kristinagligoric.github.io/ Kristina Gligorić] (EPFL)
| talk2-slides =
| talk2-abstract = We study how the COVID-19 pandemic, alongside the severe mobility restrictions that ensued, has impacted information access on Wikipedia, the world’s largest online encyclopedia. A longitudinal analysis that combines pageview statistics for 12 Wikipedia language editions with mobility reports published by Apple and Google reveals massive shifts in the volume and nature of information seeking patterns during the pandemic. Interestingly, while we observe a transient increase in Wikipedia’s pageview volume following mobility restrictions, the nature of information sought was impacted more permanently. These changes are most pronounced for language editions associated with countries where the most severe mobility restrictions were implemented. We also find that articles belonging to different topics behaved differently; e.g., attention towards entertainment-related topics is lingering and even increasing, while the interest in health- and biology-related topics was either small or transient. Our results highlight the utility of Wikipedia for studying how the pandemic is affecting people’s needs, interests, and concerns.
:::*Related paper: [https://ojs.aaai.org/index.php/ICWSM/article/view/18054 Sudden Attention Shifts on Wikipedia During the COVID-19 Crisis], ICWSM 2021 ([https://ojs.aaai.org/index.php/ICWSM/article/view/18054/17857 pdf])
:::*Slides on [https://figshare.com/articles/presentation/Sudden_Attention_Shifts_on_Wikipedia_During_the_COVID-19_Crisis_-_Wikimedia_Research_Showcase_-_February_2022/19187921 figshare]
}}
 
 
=== {{ym|2022|1}} ===
;Theme: Beyond English Wikipedia
{{/Event
| date = January 19, 2022
| youtube-url = https://www.youtube.com/watch?v=PRaCa-v8nfQ
| commons-file =
 
| talk1-title = Comparing Language Communities - Characterizing Collaboration in the English, French and Spanish Language Editions of Wikipedia
| talk1-presenter = [https://tarynbipat.me Taryn Bipat] (Microsoft, formerly University of Washington)
| talk1-slides =
| talk1-abstract = Is Wikipedia a standardized platform with a common model of collaboration, or is it a set of 312 active language editions with distinct collaborative models? In the last 20 years, researchers have extensively analyzed the complexities of group work that enable the creation of quality articles in the English Wikipedia, but most of our intellectual assumptions about collaborative practices on Wikipedia remain solely based on an Anglocentric perspective. This research extends the current Anglocentric body of literature in human-computer interaction (HCI) and computer-supported cooperative work (CSCW) through three studies that mutually help build an understanding of collaboration models in the English (EN), French (FR), and Spanish (ES) editions of Wikipedia. In the first study, I replicated a model by Viégas et al. (2007) based on editors' behaviors in the English Wikipedia. This model was used as a lens to examine collaborative activity in EN, FR, and ES. In the second study, I leveraged a collaboration model by Kriplean et al. (2007) that suggested editors used “power plays” – how groups of editors claim control over article content through the discourse of Wikipedia policy – in their talk page debates to justify the edits they made to articles. In the third study, I interviewed editors from each language edition to build a typology of collaborative behavior and further understand editors' perceptions of power and authority on Wikipedia.
:::*Related papers:
::::* [https://dl.acm.org/doi/10.1145/3449129 Wikipedia Beyond the English Language Edition: How do Editors Collaborate in the Farsi and Chinese Wikipedias?], CSCW 2021 ([https://dl.acm.org/doi/pdf/10.1145/3449129 pdf])
::::* [https://dl.acm.org/doi/10.1145/3233391.3233542 Do We All Talk Before We Type?: Understanding Collaboration in Wikipedia Language Editions], OpenSym '18 ([https://www.opensym.org/wp-content/uploads/2018/07/OpenSym2018_paper_14.pdf pdf])
 
| talk2-title = Understanding Wikipedia Practices Through Hindi, Urdu, and English Takes on an Evolving Regional Conflict
| talk2-presenter = [https://jacob.thebault-spieker.com Jacob Thebault-Spieker] (Information School, University of Wisconsin – Madison)
| talk2-slides =
| talk2-abstract = Wikipedia is the product of thousands of editors working collaboratively to provide free and up-to-date encyclopedic information to the project’s users. This article asks to what degree Wikipedia articles in three languages — Hindi, Urdu, and English — achieve Wikipedia’s mission of making neutrally-presented, reliable information on a polarizing, controversial topic available to people around the globe. We chose the topic of the recent revocation of Article 370 of the Constitution of India, which, along with other recent events in and concerning the region of Jammu and Kashmir, has drawn attention to related articles on Wikipedia. This work focuses on the English Wikipedia, being the preeminent language edition of the project, as well as the Hindi and Urdu editions. Hindi and Urdu are the two standardized varieties of Hindustani, a lingua franca of Jammu and Kashmir. We analyzed page view and revision data for three Wikipedia articles to gauge popularity of the pages in our corpus, and responsiveness of editors to breaking news events and problematic edits. Additionally, we interviewed editors from all three language editions to learn about differences in editing processes and motivations, and we compared the text of the articles across languages as they appeared shortly after the revocation of Article 370. Across languages, we saw discrepancies in article tone, organization, and the information presented, as well as differences in how editors collaborate and communicate with one another. Nevertheless, in Hindi and Urdu, as well as English, editors predominantly try to adhere to the principle of neutral point of view (NPOV), and for the most part, the editors quash attempts by other editors to push political agendas.
:::* Related paper: [https://dl.acm.org/doi/10.1145/3392561.3397586 Understanding Wikipedia Practices Through Hindi, Urdu, and English Takes on an Evolving Regional Conflict], CSCW 2021 ([https://jacob.thebault-spieker.com/papers/CSCW21_WikiHue.pdf pdf])
}}
 
== 2021 ==
 
=== {{ym|2021|12}} ===
;Theme: Online Education Landscapes
{{/Event
| date = December 15, 2021
| youtube-url = https://www.youtube.com/watch?v=HKODaHgmQWw
| commons-file =
 
| talk1-title = Latin American Youth and their Information Ecosystem - Finding, Evaluating, Creating, and Sharing Content Online
| talk1-presenter = Lionel Brossi and Ana María Castillo (Artificial Intelligence and Society Hub, University of Chile)
| talk1-slides =
| talk1-abstract = The growing importance of the Internet as a core source of information in young people’s lives, now underscored by the pandemic, gives new urgency to the need to better understand young people’s information habits and attitudes. Answers to questions such as where young people go to look for information, what information they decide to trust, and how they share the information they find have important implications for the knowledge they obtain, the beliefs they form, and the actions they take in areas ranging from personal health to employment and education. In this research showcase, we will summarize insights from focus group interviews in Latin America that offer a window into the experiences of young people themselves. Taken together, these perspectives might help us develop a more comprehensive understanding of how young people in Latin America use the Internet in general and interact with information from online sources in particular.
 
| talk2-title = Characterizing the Online Learning Landscape - What and How People Learn Online
| talk2-presenter = Sean Kross (University of California San Diego)
| talk2-slides =
| talk2-abstract = Hundreds of millions of people learn something new online every day. Simultaneously, the study of online education has blossomed, with new systems, experiments, and observations creating and exploring previously undiscovered online learning environments. In this talk I will discuss our study, in which we endeavor to characterize this entire landscape of online learning experiences using a national survey of 2,260 U.S. adults, balanced to match the demographics of the U.S. We examine the online learning resources that they consult, and we analyze the subjects that they pursue using those resources. Furthermore, we compare both formal and informal online learning experiences on a larger scale than has ever been done before, to our knowledge, to better understand which subjects people are seeking for intensive study. We find that there is a core set of online learning experiences that are central to other experiences and that are shared among the majority of people who learn online.
:::* Related paper: [https://dl.acm.org/doi/abs/10.1145/3449220 Characterizing the Online Learning Landscape: What and How People Learn Online], CSCW 2021 ([https://dl.acm.org/doi/pdf/10.1145/3449220 pdf])
}}
 
=== {{ym|2021|11}} ===
;Theme: Content moderation
{{/Event
| date = November 17, 2021
| youtube-url = https://www.youtube.com/watch?v=Rx3xesDkp2o
| commons-file =
| talk1-title = Is Deplatforming Censorship? What happened when controversial figures were deplatformed, with philosophical musings on the nature of free speech.
| talk1-presenter = [https://www.cc.gatech.edu/~asb/ Amy S. Bruckman] (Georgia Institute of Technology)
| talk1-slides =
| talk1-abstract = When a controversial figure is deplatformed, what happens to their online influence? In this talk, first, I’ll present results from a study of the deplatforming from Twitter of three figures who repeatedly broke platform rules (Alex Jones, Milo Yiannopoulos, and Owen Benjamin). Second, I’ll discuss what happened when this study was on the front page of Reddit, and the range of angry reactions from people who say that they’re in favor of “free speech.” I’ll explore the nature of free speech, and why our current speech regulation framework is fundamentally broken. Finally, I’ll conclude with thoughts on the strength of Wikipedia’s model in contrast to other platforms, and highlight opportunities for improvement.
:::* Related paper: [https://dl.acm.org/doi/10.1145/3479525 Evaluating the Effectiveness of Deplatforming as a Moderation Strategy on Twitter], CSCW 2021 ([https://dl.acm.org/doi/pdf/10.1145/3479525 pdf])
 
| talk2-title = Effects of Algorithmic Flagging on Fairness. Quasi-experimental Evidence from Wikipedia
| talk2-presenter = [https://teblunthuis.cc/ Nathan TeBlunthuis] (University of Washington / Northwestern University)
| talk2-slides = File:Effects_of_algorithmic_flagging_on_fairness-wikiresearch_11-2021.pdf
| talk2-abstract = Online community moderators often rely on social signals such as whether or not a user has an account or a profile page as clues that users may cause problems. Reliance on these clues can lead to "overprofiling" bias when moderators focus on these signals but overlook the misbehavior of others. We propose that algorithmic flagging systems deployed to improve the efficiency of moderation work can also make moderation actions more fair to these users by reducing reliance on social signals and making norm violations by everyone else more visible. We analyze moderator behavior in Wikipedia as mediated by RCFilters, a system which displays social signals and algorithmic flags, and estimate the causal effect of being flagged on moderator actions. We show that algorithmically flagged edits are reverted more often, especially those by established editors with positive social signals, and that flagging decreases the likelihood that moderation actions will be undone. Our results suggest that algorithmic flagging systems can lead to increased fairness in some contexts but that the relationship is complex and contingent.
:::* Related paper: [https://dl.acm.org/doi/10.1145/3449130 Effects of algorithmic flagging on fairness: quasi-experimental evidence from Wikipedia], CSCW 2021 ([https://dl.acm.org/doi/pdf/10.1145/3449130 pdf])
}}
 
=== {{ym|2021|10}} ===
;Theme: Bridging knowledge gaps
{{/Event
| date = October 27, 2021
| youtube-url = https://www.youtube.com/watch?v=d0Qg98EVmuI
| commons-file =
| talk1-title = Automatic approaches to bridge knowledge gaps in Wikimedia projects
| talk1-presenter = [https://research.wikimedia.org/team.html WMF Research Team]
| talk1-slides =
| talk1-abstract = In order to advance knowledge equity as part of the [[meta:Strategy/Wikimedia_movement/2018-20|Wikimedia Movement’s 2030 strategic direction]], the Research team at the Wikimedia Foundation has been conducting research to [https://research.wikimedia.org/knowledge-gaps.html “Address Knowledge Gaps” as one of its main programs]. One core component of this program is to develop technologies to bridge knowledge gaps. In this talk, we give an overview of how we approach this task using tools from Machine Learning in four different contexts: section alignment in content translation, link recommendation in structured editing, image recommendation in multimedia knowledge gaps, and the equity of the recommendations themselves. We will present how these models can assist contributors in addressing knowledge gaps. Finally, we will discuss the impact of these models in applications deployed across Wikimedia projects supporting different Product initiatives at the Wikimedia Foundation.
::: More information on the individual projects:
::: * Section alignment: [[meta:Research:Expanding_Wikipedia_articles_across_languages/Inter_language_approach#Section_Alignment]]
::: * Link recommendation: [[meta:Research:Link_recommendation_model_for_add-a-link_structured_task]]
::: * Image recommendation: [[meta:Research:Recommending_Images_to_Wikipedia_Articles]]
::: * Equity in recommendations: [[meta:Research:Prioritization_of_Wikipedia_Articles/Recommendation]]
::: Slide deck:
::: * Slides on [https://figshare.com/articles/presentation/Automatic_approaches_to_bridge_knowledge_gaps_in_Wikimedia_projects_-_Wikimedia_Research_Showcase_-_October_2021/16895356 figshare]
}}
 
=== {{ym|2021|9}} ===
;Theme: Socialization on Wikipedia
{{/Event
| date = September 15, 2021
| youtube-url = https://www.youtube.com/watch?v=YVqabVvLIZU
| commons-file =
| talk1-title = Unlocking the Wikipedia clubhouse to newcomers. Results from two studies.
| talk1-presenter = [http://rosta-farzan.net/index.html Rosta Farzan] (School of Computing and Information, University of Pittsburgh)
| talk1-slides = File:WikimediaShowcaseSep2021.pdf
| talk1-abstract = It is no news to any of us that the success of online production communities such as Wikipedia relies heavily on a continuous stream of newcomers to replace the inevitable high turnover and to bring on board new sources of ideas and workforce. However, these communities have struggled to attract newcomers, especially from diverse populations of users, and to retain them. In this talk, I will present two different approaches to engaging new editors in Wikipedia: (1) newcomers joining through the Wiki Ed program, an online program in which college students edit Wikipedia articles as class assignments; and (2) newcomers joining through a Wikipedia Art+Feminism edit-a-thon. I will show how each approach incorporated techniques for engaging newcomers and how successful each was at attracting and retaining them.
::: * [https://link.springer.com/chapter/10.1007/978-3-319-47880-7_2 Bring on Board New Enthusiasts! A Case Study of Impact of Wikipedia Art + Feminism Edit-A-Thon Events on Newcomers], SocInfo 2016 (pdf [http://saviaga.com/wp-content/uploads/2016/06/socinfo_ediathons.pdf author's copy])
::: * [https://dl.acm.org/doi/abs/10.1145/3392857 Successful Online Socialization: Lessons from the Wikipedia Education Program], CSCW 2020 (pdf [https://www.cc.gatech.edu/~dyang888/docs/cscw_li_2020_wiki.pdf author's copy])
| talk2-title = The Effect of Receiving Appreciation on Wikipedias. A Community Co-Designed Field Experiment.
| talk2-presenter = [https://natematias.com/ J. Nathan Matias] ([http://citizensandtech.org/ Citizens and Technology Lab], Cornell University Departments of Communication and Information Science)
| talk2-slides =
| talk2-abstract = Can saying “thank you” make online communities stronger and more inclusive? Or does thanking others for their voluntary efforts have little effect? To ask this question, the Citizens and Technology Lab (CAT Lab) organized 344 volunteers to send thanks to Wikipedia contributors in the Arabic, German, Polish, and Persian language editions. We then observed the behavior of 15,558 newcomers and experienced contributors to Wikipedia. On average, we found that organizing volunteers to thank others increases two-week retention of newcomers and experienced accounts. It also caused people to send more thanks to others. This study was a field experiment, a randomized trial that sent thanks to some people and not to others. These experiments can help answer questions about the impact of community practices and platform design. But they can sometimes face community mistrust, especially when researchers conduct them without community consent. In this talk, I will describe CAT Lab's approach to community-led research and discuss open questions about best practices.
::: * [https://osf.io/ueq5f/ The Diffusion and Influence of Gratitude Expressions in Large-Scale Cooperation: A Field Experiment in Four Knowledge Networks], paper preprint
::: * [https://citizensandtech.org/2020/06/effects-of-saying-thanks-on-wikipedia/ Volunteers Thanked Thousands of Wikipedia Editors to Learn the Effects of Receiving Thanks], blogpost (in EN, DE, AR, PL, FA)
}}
 
=== {{ym|2021|8}} ===
''No showcase'' due to [https://wikimania.wikimedia.org/wiki/Wikimania Wikimania 2021]
 
=== {{ym|2021|7}} ===
;Theme: Effects of campaigns to close content gaps
{{/Event
| date = July 21, 2021
| youtube-url = https://www.youtube.com/watch?v=otN3H-hIImQ
| commons-file =
| talk1-title = Content Growth and Attention Contagion in Information Networks. Addressing Information Poverty on Wikipedia
| talk1-presenter = [https://kaizhu.me/ Kai Zhu] (McGill University, Canada)
| talk1-slides =
| talk1-abstract = Open collaboration platforms have fundamentally changed the way that knowledge is produced, disseminated, and consumed. In these systems, contributions arise organically with little to no central governance. Although such decentralization provides many benefits, a lack of broad oversight and coordination can leave questions of information poverty and skewness to the mercy of the system’s natural dynamics. Unfortunately, we still lack a basic understanding of the dynamics at play in these systems and specifically, how contribution and attention interact and propagate through information networks. We leverage a large-scale natural experiment to study how exogenous content contributions to Wikipedia articles affect the attention that they attract and how that attention spills over to other articles in the network. Results reveal that exogenously added content leads to significant, substantial, and long-term increases in both content consumption and subsequent contributions. Furthermore, we find significant attention spillover to downstream hyperlinked articles. Through both analytical estimation and empirically informed simulation, we evaluate policies to harness this attention contagion to address the problem of information poverty and skewness. We find that harnessing attention contagion can lead to as much as a twofold increase in the total attention flow to clusters of disadvantaged articles. Our findings have important policy implications for open collaboration platforms and information networks.
::: Related papers:
::: * Content Growth and Attention Contagion in Information Networks: Addressing Information Poverty on Wikipedia. Information Systems Research (2020) ([https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3191128 Link to pdf])
::: * Slides on [https://figshare.com/articles/presentation/Content_Growth_and_Attention_Contagion_in_Information_Networks_Addressing_Information_Poverty_on_Wikipedia_-_Wikimedia_Research_Showcase_-_July_2021/15052116 figshare]
 
| talk2-title = Bridging Wikipedia’s Gender Gap. Quantifying and Assessing the Impact of Two Feminist Interventions
| talk2-presenter = [https://www.asc.upenn.edu/people/graduate-student/isabelle-langrock Isabelle Langrock] (University of Pennsylvania, USA)
| talk2-slides =
| talk2-abstract = Wikipedia has a well-known gender divide affecting its biographical content. This bias not only shapes social perceptions of knowledge, but it can also propagate beyond the platform as its contents are leveraged to correct misinformation, train machine-learning tools, and enhance search engine results. What happens when feminist movements intervene to try to close existing gaps? In this talk, we present a recent study of two popular feminist interventions designed to counteract digital knowledge inequality. Our findings show that the interventions are successful at adding content about women that would otherwise be missing, but they are less successful at addressing several structural biases that limit the visibility of women within Wikipedia. We argue for more granular and cumulative analysis of gender divides in collaborative environments and identify key areas of support that can further aid the feminist movements in closing Wikipedia’s gender gaps.
::: Related papers:
::: * The Gender Divide in Wikipedia: Quantifying and Assessing the Impact of Two Feminist Interventions (2021) ([https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3739176 Link to pdf])
::: * Slides on [https://figshare.com/articles/presentation/Bridging_Wikipedia_s_Gender_Gap_Quantifying_and_Assessing_the_Impact_of_Two_Feminist_Interventions_-_Wikimedia_Research_Showcase_-_July_2021/15059532 figshare]
}}
 
=== {{ym|2021|6}} ===
;Theme: AI model governance
{{/Event
| date = June 23, 2021
| youtube-url = https://www.youtube.com/watch?v=USSBuwebWt4
| commons-file =
| talk1-title = Bridging AI and HCI. Incorporating Human Values into the Development of AI Technologies
| talk1-presenter = [https://haiyizhu.com/ Haiyi Zhu] (Carnegie Mellon University)
| talk1-slides =
| talk1-abstract = The increasing accuracy and falling costs of AI have stimulated the increased use of AI technologies in mainstream user-facing applications and services. However, there is a disconnect between mathematically rigorous AI approaches and the human stakeholders’ needs, motivations, and values, as well as organizational and institutional realities, contexts, and constraints; this disconnect is likely to undermine practical initiatives and may sometimes lead to negative societal impacts. In this presentation, I will discuss my research on incorporating human stakeholders’ values and feedback into the creation process of AI technologies. I will describe a series of projects in the context of the Wikipedia community to illustrate my approach. I hope this presentation will contribute to the rich ongoing conversation concerning bridging HCI and AI and using HCI methods to address AI challenges.
::: * Slides on [https://figshare.com/articles/presentation/Bridging_AI_and_HCI_Incorporating_Human_Values_into_the_Development_of_AI_Technologies_-_Wikimedia_Research_Showcase_-_June_2021/15022422 figshare]
| talk2-title = ML Governance. First Steps
| talk2-presenter = [https://www.mediawiki.org/wiki/User:ACraze_(WMF) Andy Craze] (Wikimedia Foundation, Machine Learning Team)
| talk2-slides =
| talk2-abstract = The WMF Machine Learning team is upgrading the Foundation's infrastructure to support the modern machine learning ecosystem. As part of this work, the team seeks to understand its ethical and legal responsibilities for developing and hosting predictive models within a global context. Drawing from previous WMF research related to ethical & human-centered machine learning, the team wishes to begin a series of conversations to discuss how we can deploy responsible systems that are inclusive to newcomers and non-experts, while upholding our commitment to free and open knowledge.
::: * Slides on [https://figshare.com/articles/presentation/ML_Model_Governance_First_Steps_-_Wikimedia_Research_Showcase_-_June_2021/14838243 figshare]
}}
 
=== {{ym|2021|5}} ===
;Theme: The value and importance of Wikipedia
{{/Event
| date = May 19, 2021
| youtube-url = https://www.youtube.com/watch?v=VoX5rFNzkXs
| commons-file =
| talk1-title = The Importance of Wikipedia to Search Engines and Other Systems
| talk1-presenter = [http://nickmvincent.com/#/ Nick Vincent] (Northwestern University)
| talk1-slides =
| talk1-abstract = A growing body of work has highlighted the important role that Wikipedia’s volunteer-created content plays in helping search engines achieve their core goal of addressing the information needs of hundreds of millions of people. In this talk, I will discuss a recent study looking at how often, and where, Wikipedia links appear in search engine results. In this study, we found that Wikipedia links appeared prominently and frequently in Google, Bing, and DuckDuckGo results, though less often for searches from a mobile device. I will connect this study to past work looking at the value of Wikipedia links to other online platforms, and to ongoing discussions around Wikipedia's value as a training source for modern AI.
::: * Related paper: A Deeper Investigation of the Importance of Wikipedia Links to Search Engine Results. To Appear in CSCW 2021. ([https://nickmvincent.com/static/wikiserp_cscw.pdf Link to pdf])
::: * Slides on [https://figshare.com/articles/presentation/WMF_Showcase_Wikipedia_in_SERPs_Other_Systems_pdf/14636994 figshare]
| talk2-title = On the Value of Wikipedia as a Gateway to the Web
| talk2-presenter = [https://piccardi.me/ Tiziano Piccardi] (EPFL)
| talk2-slides =
| talk2-abstract = By linking to external websites, Wikipedia can act as a gateway to the Web. However, little is known about the amount of traffic generated by Wikipedia's external links. We fill this gap with a detailed analysis of usage logs gathered from Wikipedia users' client devices. We discovered that in one month, English Wikipedia generated 43M clicks to external websites, with the highest click-through rate on the official links listed in the infoboxes. Our analysis highlights that the articles about businesses, educational institutions, and websites show the highest engagement, and for some content, Wikipedia acts as a stepping stone to the intended destination. We conclude our analysis by quantifying the hypothetical economic value of the clicks received by external websites. We estimate that the respective website owners would need to pay a total of $7–13 million per month to obtain the same volume of traffic via sponsored search. These findings shed light on Wikipedia's role not only as an important source of information but also as a high-traffic gateway to the broader Web ecosystem.
 
::: Related papers:
::: * On the Value of Wikipedia as a Gateway to the Web. WWW 2021. ([https://arxiv.org/pdf/2102.07385 Link to pdf])
::: * Slides on [https://figshare.com/articles/presentation/On_the_Value_of_Wikipedia_as_a_Gateway_to_the_Web_-_Wikimedia_Research_Showcase_-_May_2021/14687190 figshare]
}}
 
=== {{ym|2021|4}} ===
''No showcase'' due to [https://wikiworkshop.org/2021/ Wiki Workshop 2021]
 
=== {{ym|2021|3}} ===
 
;Theme: Curiosity
{{/Event
| date = March 17, 2021
| youtube-url = https://www.youtube.com/watch?v=jw2s_Y4J2tI
| commons-file =
| talk1-title = The curious human
| talk1-presenter = [https://directory.seas.upenn.edu/danielle-s-bassett/ Danielle S. Bassett] (University of Pennsylvania)
| talk1-slides =
| talk1-abstract = The human mind is curious. It is strange, remarkable, and mystifying; it is eager, probing, questioning. Despite its pervasiveness and its relevance for our well-being, scientific studies of human curiosity that bridge both the organ of curiosity and the object of curiosity remain in their infancy. In this talk, I will integrate historical, philosophical, and psychological perspectives with techniques from applied mathematics and statistical physics to study individual and collective curiosity. In the former, I will evaluate how humans walk on the knowledge network of Wikipedia during unconstrained browsing. In doing so, we will capture idiosyncratic forms of curiosity that span multiple millennia, cultures, languages, and timescales. In the latter, I will consider the fruition of collective curiosity in the building of scientific knowledge as encoded in Wikipedia. Throughout, I will make a case for the position that individual and collective curiosity are both network building processes, providing a connective counterpoint to the common acquisitional account of curiosity in humans.
::: Related papers:
::: * Lydon-Staley, D. M., Zhou, D., Blevins, A. S., Zurn, P., & Bassett, D. S. (2019). Hunters, busybodies, and the knowledge network building associated with curiosity. https://doi.org/10.31234/osf.io/undy4
::: * Ju, H., Zhou, D., Blevins, A. S., Lydon-Staley, D. M., Kaplan, J., Tuma, J. R., & Bassett, D. S. (2020). The network structure of scientific revolutions. http://arxiv.org/abs/2010.08381
}}
 
=== {{ym|2021|2}} ===
;Theme: Censorship
 
{{/Event
| date = February 17, 2021
| youtube-url = https://www.youtube.com/watch?v=z52wPt34rJc
| commons-file =
| talk1-title = Shocking the Crowd - The Effect of Censorship Shocks on Chinese Wikipedia
| talk1-presenter = [http://dromero.org/ Daniel Romero] (University of Michigan)
| talk1-slides =
| talk1-abstract = Collaborative crowdsourcing has become a popular approach to organizing work across the globe. Being global also means being vulnerable to shocks – unforeseen events that disrupt crowds – that originate from any country. In this study, we examine changes in collaborative behavior of editors of Chinese Wikipedia that arise due to the 2005 government censorship in mainland China. Using the exogenous variation in the fraction of editors blocked across different articles due to the censorship, we examine the impact of reduction in group size, which we denote as the shock level, on three collaborative behavior measures: volume of activity, centralization, and conflict. We find that activity and conflict drop on articles that face a shock, whereas centralization increases. The impact of a shock on activity increases with shock level, whereas the impact on centralization and conflict is higher for moderate shock levels than for very small or very high shock levels. These findings provide support for threat rigidity theory – originally introduced in the organizational theory literature – in the context of large-scale collaborative crowds.
::: * [https://ojs.aaai.org/index.php/ICWSM/article/view/14895/14745 paper published at ICWSM 2017]
::: * [https://figshare.com/articles/presentation/Shocking_the_Crowd_The_Effect_of_Censorship_Shocks_on_Chinese_Wikipedia_-_Wikimedia_Research_Showcase_-_February_2021/14060906 slides on figshare]
| talk2-title = Censorship's Effect on Incidental Exposure to Information - Evidence from Wikipedia
| talk2-presenter = [http://www.margaretroberts.net/ Margaret Roberts] (University of California San Diego)
| talk2-slides =
| talk2-abstract = The fast-growing body of research on internet censorship has examined the effects of censoring selective pieces of political information and the unintended consequences of censorship of entertainment. However, we know very little about the broader consequences of coarse censorship or censorship that affects a large array of information such as an entire website or search engine. In this study, we use China’s complete block of Chinese language Wikipedia (zh.wikipedia.org) on May 19, 2015, to disaggregate the effects of coarse censorship on proactive consumption of information—information users seek out—and on incidental consumption of information—information users are not actively seeking but consume when they happen to come across it. We quantify the effects of censorship of Wikipedia not only on proactive information consumption but also on opportunities for exploration and incidental consumption of information. We find that users from mainland China were much more likely to consume information on Wikipedia about politics and history incidentally rather than proactively, suggesting that the effects of censorship on incidental information access may be politically significant.
}}
 
=== {{ym|2021|1}} ===
;Theme: Macro-level organizational analysis of peer production communities
{{/Event
| date = January 20, 2021
| youtube-url = https://www.youtube.com/watch?v=ujd8S82YfmA
| commons-file =
| talk1-title = The importance of thinking big. Convergence, divergence, and interdependence among wikis and peer production communities
| talk1-presenter = Aaron Shaw (Northwestern University)
| talk1-slides =
| talk1-abstract = Designing and governing collaborative, peer production communities can benefit from large-scale, macro-level thinking that focuses on communities as the units of analysis. For example, understanding how and why seemingly comparable communities may follow convergent, divergent, and/or interdependent patterns of behavior can inform more parsimonious theoretical and empirical insights as well as more effective strategic action. This talk gives a sneak peek at research-in-progress by members of the [http://communitydata.science/ Community Data Science Collective] to illustrate these points. In particular, I focus on studies of (1) convergent trends of formalization in several large Wikipedias; (2) divergent editor engagement among three small Wikipedias; and (3) commensal patterns of ecological interdependence across communities. Together, the studies underscore the value and challenges of macro-level organizational analysis of peer production and social computing systems.
}}
 
== 2020 ==
=== {{ym|2020|12}} ===
;Theme: Disinformation and reliability of sources in Wikipedia
 
{{/Event
| date = December 16, 2020
| youtube-url = https://www.youtube.com/watch?v=v9Wcc-TeaEY
| commons-file =
| talk1-title = Quality assessment of Wikipedia and its sources
| talk1-presenter = Włodzimierz Lewoniewski (Poznań University of Economics and Business, Poland)
| talk1-slides =
| talk1-abstract = Information in Wikipedia can be edited independently in over 300 languages. As a result, the same subject is often described differently depending on the language edition. Comparing information across editions usually requires understanding each of the languages considered. We work on solutions that can help to automate this process. They leverage machine learning and artificial intelligence algorithms. The crucial component, however, is the assessment of article quality; therefore, we need to know how to define and extract different quality measures. This presentation briefly introduces some of the recent activities of the Department of Information Systems at Poznań University of Economics and Business related to quality assessment of multilingual content in Wikipedia. In particular, we demonstrate some of the approaches for the reliability assessment of sources in Wikipedia articles. Such solutions can help to enrich various language editions of Wikipedia and other knowledge bases with information of better quality.
:::* Modeling Popularity and Reliability of Sources in Multilingual Wikipedia, https://doi.org/10.3390/info11050263
:::* Multilingual Ranking of Wikipedia Articles with Quality and Popularity Assessment in Different Topics, https://doi.org/10.3390/computers8030060
:::* Measures for Quality Assessment of Articles and Infoboxes in Multilingual Wikipedia, https://doi.org/10.1007/978-3-030-04849-5_53
:::* [https://figshare.com/articles/presentation/Quality_assessment_of_Wikipedia_and_its_sources/13406039 slides on figshare]
| talk2-title = Challenges on fighting Disinformation in Wikipedia: Who has the (ground-)truth?
| talk2-presenter = Diego Saez-Trumper (Research, Wikimedia Foundation)
| talk2-slides = File:Challenges on fighting Disinformation in Wikipedia.pdf
| talk2-abstract = Unlike major social media websites, where the fight against disinformation mainly means preventing users from massively replicating fake content, fighting disinformation in Wikipedia requires tools that allow editors to apply the content policies of verifiability, no original research, and neutral point of view. Moreover, while other platforms apply automatic fact-checking techniques that use Wikipedia as the ground truth for verification, for obvious reasons we cannot follow the same pipeline for fact-checking content on Wikipedia. In this talk, we will explain the ML approach we are developing to build tools that efficiently support Wikipedians in discovering suspicious content, and how we collaborate with external researchers on this task. We will also describe a group of datasets we are preparing to share with the research community in order to produce state-of-the-art algorithms to improve the verifiability of content on Wikipedia.
}}
:::* Online Disinformation and the Role of Wikipedia, https://arxiv.org/abs/1910.12596
 
=== {{ym|2020|11}} ===
;Theme: Interpersonal communication between editors
 
{{/Event
| date = November 18, 2020
| youtube-url = https://www.youtube.com/watch?v=G35OEDJ53bY
| commons-file =
| talk1-title = Talk before you type - Interpersonal communication on Wikipedia
| talk1-presenter = Dr Anna Rader, Research Consultant
| talk1-slides =
| talk1-abstract = Formally, the work of Wikipedia’s community of volunteers is asynchronous and anarchic: around the world, editors labor individually and in disorganized ways on the collective project. Yet this work is also underscored by informal and vibrant interpersonal communication: in the lively exchanges of talk pages and the labor-sharing of editorial networks, anonymous strangers communicate their intentions and coordinate their efforts to maintain the world’s largest online encyclopaedia. This working paper offers an overview of academic research into editors’ communication networks and patterns, with a particular focus on the role of talk pages. It considers four communication dynamics of editor interaction: cooperation, deliberation, conflict and coordination; and reviews key recommendations for enhancing peer-to-peer communication within the Wikipedia community.
:::[https://figshare.com/articles/presentation/Talk_before_you_type_Interpersonal_communication_on_Wikipedia_-_Wikipedia_Research_Showcase_-_November_2020/13289348 slides on figshare]
| talk2-title = All Talk - How Increasing Interpersonal Communication on Wikis May Not Enhance Productivity
| talk2-presenter = Sneha Narayan, Assistant Professor, Carleton College
| talk2-slides =
| talk2-abstract = What role does interpersonal communication play in sustaining production in online collaborative communities? This paper sheds light on that question by examining the impact of a communication feature called "message walls" that allows for faster and more intuitive interpersonal communication in a population of wikis on Wikia. Using panel data from a sample of 275 wiki communities that migrated to message walls and a method inspired by regression discontinuity designs, we analyze these transitions and estimate the impact of the system's introduction. Although the adoption of message walls was associated with increased communication among all editors and newcomers, it had little effect on productivity, and was further associated with a decrease in article contributions from new editors. Our results imply that design changes that make communication easier in a social computing system may not always translate to increased participation along other dimensions.
:::* [https://dl.acm.org/doi/10.1145/3359203 Related paper]
}}
 
=== {{ym|2020|10}} ===
''No Showcase in October.''
 
=== {{ym|2020|9}} ===
;Theme: Knowledge gaps
 
{{/Event
| date = September 23, 2020
| youtube-url = https://www.youtube.com/watch?v=GJDsKPsz64o
| commons-file =
| talk1-title = A first draft of the knowledge gaps taxonomy for Wikimedia projects
| talk1-presenter = [https://research.wikimedia.org/ WMF Research Team]
| talk1-slides = File:%28Research_Showcase%29_The_Knowledge_Gaps_Taxonomy.pdf
| talk1-abstract = In response to the [[meta:Strategy/Wikimedia_movement/2018-20|Wikimedia Movement’s 2030 strategic direction]], the [https://research.wikimedia.org/team.html Research team] at the Wikimedia Foundation is developing a framework to understand and measure knowledge gaps. The goal is to capture the multi-dimensional nature of knowledge gaps and inform long-term decision making. The first milestone was to develop a taxonomy of knowledge gaps, which groups and describes the different Wikimedia knowledge gaps. The first draft of the taxonomy [https://arxiv.org/abs/2008.12314 is now published], and we seek your feedback to improve it. In this talk, we will give an overview of the first draft of the taxonomy of knowledge gaps in Wikimedia projects. Following that, we will host an extended Q&A in which we would like to get your feedback and discuss the taxonomy and knowledge gaps more generally.
 
:::* More information: [[meta:Research:Knowledge_Gaps_Index/Taxonomy]]
 
}}
 
=== {{ym|2020|8}} ===
;Theme:Readership and navigation
 
{{/Event
| date = August 19, 2020
| youtube-url = https://www.youtube.com/watch?v=MeUl0zjHdF8&list=PLhV3K_DS5YfLQLgwU3oDFiGaU3K7pUVoW&index=4&t=0s
| commons-file =
| talk1-title = What matters to us most and why? Studying popularity and attention dynamics via Wikipedia navigation data.
| talk1-presenter = [https://tahayasseri.com/ Taha Yasseri] (University College Dublin), [https://www.oii.ox.ac.uk/people/patrick-gildersleve/ Patrick Gildersleve] (Oxford Internet Institute)
| talk1-slides = File:Wikipedia_Viewership_Data_August_2020_Research_Showcase_Slides.pdf
| talk1-abstract = While early Wikipedia research focused largely on editorial behaviour and content, researchers soon realized the value of navigation data, both as a reflection of readers' interests and, more generally, as a proxy for the behaviour of online information seekers. In this talk we will report on various projects in which we used pageview statistics or readers' navigation data to study: movies' financial success [1], electoral popularity [2], disaster-triggered collective attention [3] and collective memory [4], general navigation patterns and article typology [5], and attention patterns in relation to breaking news.
* [1] Early Prediction of Movie Box Office Success Based on Wikipedia Activity Big Data. ''PLoS One'' (2013). https://doi.org/10.1371/journal.pone.0071226
* [2] Wikipedia traffic data and electoral prediction: towards theoretically informed models. ''EPJ Data Science'' (2016). https://doi.org/10.1140/epjds/s13688-016-0083-3
* [3] Dynamics and biases of online attention: the case of aircraft crashes. ''Royal Society Open Science'' (2016). https://doi.org/10.1098/rsos.160460
* [4] The memory remains: Understanding collective memory in the digital age. ''Science Advances'' (2018). https://doi.org/10.1126/sciadv.1602368
* [5] Inspiration, captivation, and misdirection: Emergent properties in networks of online navigation. ''Springer'' (2018). https://ora.ox.ac.uk/objects/uuid:73baed3c-d3fe-4200-8e90-2d80b11f21cf
* additional slides from 2nd part of the talk [https://commons.wikimedia.org/wiki/File:Navigation,_Networks,_News,_Wikipedia_%E2%80%93_Patrick_Gildersleve.pdf Navigation, Networks, News, Wikipedia (P. Gildersleve)]
 
| talk2-title = Query for Architecture, Click through Military. Comparing the Roles of Search and Navigation on Wikipedia
| talk2-presenter = [http://dimitardimitrov.info/ Dimitar Dimitrov] (GESIS - Leibniz Institute for the Social Sciences)
| talk2-slides = File:Search-vs-navigation wikimedia showcase.pdf
| talk2-abstract = As one of the richest sources of encyclopedic information on the Web, Wikipedia generates an enormous amount of traffic. In this paper, we study large-scale article access data of the English Wikipedia in order to compare articles with respect to the two main paradigms of information seeking, i.e., search by formulating a query, and navigation by following hyperlinks. To this end, we propose and employ two main metrics to characterize articles: (i) searchshare, the relative share of views an article receives through search, and (ii) resistance, the ability of an article to relay traffic to other Wikipedia articles. We demonstrate how articles in distinct topical categories differ substantially in terms of these properties. For example, architecture-related articles are often accessed through search and are simultaneously a "dead end" for traffic, whereas historical articles about military events are mainly navigated. We further link traffic differences to varying network, content, and editing activity features. Lastly, we measure the impact of the article properties by modeling access behavior on articles with a gradient boosting approach. The results of this paper constitute a step towards understanding human information seeking behavior on the Web.
* Different Topic, Different Traffic: How Search and Navigation Interplay on Wikipedia. ''Journal of Web Science'' (2019). https://doi.org/10.34962/jws-71
}}
 
 
=== {{ym|2020|7}} ===
;Theme:Medical knowledge on Wikipedia
 
{{/Event
| date = July 15, 2020
| youtube-url = https://www.youtube.com/watch?v=qIV26lWrD9c
| commons-file =
| talk1-title = Wikipedia for health information - Situating Wikipedia as a health information resource
| talk1-presenter = [http://orcid.org/0000-0001-8226-2263 Denise Smith] (McMaster University, Health Sciences Library & Western University, Faculty of Information & Media Studies)
| talk1-slides =File:Wikipedia as a health information resource in various contexts.pdf
| talk1-abstract = Wikipedia is the most frequently accessed web site for health information, but the various ways users engage with Wikipedia’s health content have not been thoroughly investigated or reported. This talk will summarize the findings of a comprehensive literature review published in February. It explores all the contexts, reported in the academic literature, in which Wikipedia’s health content is used. The talk will focus on the findings reported in this paper and on the potential impact of this study on health and medical librarianship, the practice of medicine, and medical or health education.
* D.A. Smith (2020). "Situating Wikipedia as a health information resource in various contexts: A scoping review". ''PLoS ONE.'' doi: [https://doi.org/10.1371/journal.pone.0228786 10.1371/journal.pone.0228786]
| talk2-title = COVID-19 research in Wikipedia
| talk2-presenter = [https://www.uva.nl/en/profile/c/o/g.colavizza/g.colavizza.html Giovanni Colavizza] (University of Amsterdam, Netherlands)
| talk2-slides =
| talk2-abstract = Wikipedia is one of the main sources of free knowledge on the Web. During the first few months of the pandemic, over 4,500 new Wikipedia pages on COVID-19 were created, and they had accumulated close to 250M pageviews by early April 2020. At the same time, an unprecedented number of scientific articles on COVID-19 and the ongoing pandemic have been published online. Wikipedia’s contents are based on reliable sources, primarily scientific literature. Given its public function, it is crucial for Wikipedia to rely on representative and reliable scientific results, especially in a time of crisis. We assess the coverage of COVID-19-related research in Wikipedia via citations. We find that Wikipedia editors are integrating new research at an unprecedentedly fast pace. While doing so, they are able to provide a largely representative coverage of COVID-19-related research. We show that all the main topics discussed in this literature are proportionally represented in Wikipedia, after accounting for article-level effects. We further use regression analyses to model citations from Wikipedia and show that, despite the pressure to keep up with novel results, Wikipedia editors rely on literature that is highly cited, widely shared on social media, and peer-reviewed.
*G. Colavizza (2020). "COVID-19 research in Wikipedia". ''bioRxiv.'' doi:[https://doi.org/10.1101/2020.05.10.087643 10.1101/2020.05.10.087643]
* [https://zenodo.org/record/3946495#.Xw8ukhGxWal presentation slides]
}}
 
=== {{ym|2020|6}} ===
;Theme:Credibility and Verifiability
 
{{/Event
| date = June 17, 2020
| youtube-url = https://www.youtube.com/watch?v=GS9Jc3IFhVQ
| commons-file =
| talk1-title = Today’s News, Tomorrow’s Reference, and The Problem of Information Reliability - An Introduction to NewsQ
| talk1-presenter = Connie Moon Sehat, NewsQ, Hacks/Hackers
| talk1-slides =File:NewsQ_Slides_-_Wikimedia_Research_Showcase_June_2020.pdf
| talk1-abstract = The effort to make Wikipedia more reliable is related to the larger challenges facing the information ecosystem overall. These challenges include the discovery of and accessibility to reliable news amid the transformation of news distribution through platform and social media products. Connie will present some of the challenges related to the ranking and recommendation of news that are addressed by the NewsQ Initiative, a collaboration between the Tow-Knight Center for Entrepreneurial Journalism at the Craig Newmark Graduate School of Journalism and Hacks/Hackers. In addition, she’ll share some of the ways that the project intersects with Wikipedia, such as supporting research around the [https://en.wikipedia.org/wiki/Wikipedia:Reliable_sources/Perennial_sources US Perennial Sources list].
Related resources:
 
*NewsQ Initiative site (https://newsq.net/)
 
*Social Science Research Council Call for Papers, “News Quality in the Platform Era” (deadline June 15, 2020): https://www.ssrc.org/programs/component/media-democracy/news-quality-in-the-platform-era/
 
*M. Bhuiyan, A. Zhang, C. Sehat, T. Mitra, 2020. Investigating "Who" in the Crowdsourcing of News Credibility, C+J 2020 (https://cpb-us-w2.wpmucdn.com/express.northeastern.edu/dist/d/53/files/2020/02/CJ_2020_paper_32.pdf)
| talk2-title = Quantifying Engagement with Citations on Wikipedia
| talk2-presenter = Tiziano Piccardi, EPFL
| talk2-slides = File:Quantifying Engagement with Citations on Wikipedia.pdf
| talk2-abstract = Wikipedia, the free online encyclopedia that anyone can edit, is one of the most visited sites on the Web and a common source of information for many users. As an encyclopedia, Wikipedia is not a source of original information, but was conceived as a gateway to secondary sources: according to Wikipedia's guidelines, facts must be backed up by reliable sources that reflect the full spectrum of views on the topic. Although citations lie at the very heart of Wikipedia, little is known about how users interact with them. To close this gap, we built client-side instrumentation for logging all interactions with links leading from English Wikipedia articles to cited references for one month and conducted the first analysis of readers' interaction with citations on Wikipedia. We find that overall engagement with citations is low: about one in 300 page views results in a reference click (0.29% overall; 0.56% on desktop; 0.13% on mobile). Matched observational studies of the factors associated with reference clicking reveal that clicks occur more frequently on shorter pages and on pages of lower quality, suggesting that references are consulted more commonly when Wikipedia itself does not contain the information sought by the user. Moreover, we observe that recent content, open access sources, and references about life events (births, deaths, marriages, etc.) are particularly popular. Taken together, our findings open the door to a deeper understanding of Wikipedia's role in a global information economy where reliability is ever less certain, and source attribution ever more vital.
 
*Tiziano Piccardi, Miriam Redi, Giovanni Colavizza, Robert West (2020). Quantifying Engagement with Citations on Wikipedia. https://arxiv.org/abs/2001.08614
}}
 
=== {{ym|2020|5}} ===
;Theme: Human in the Loop Machine Learning
 
{{/Event
| date = May 20, 2020
| youtube-url = https://www.youtube.com/watch?v=8nDiu2ebdOI
| commons-file =
| talk1-title = OpenCrowd -- A Human-AI Collaborative Approach for Finding Social Influencers via Open-Ended Answers Aggregation
| talk1-presenter = Jie Yang, Amazon (current), Delft University of Technology (starting soon)
| talk1-slides =
| talk1-abstract = Finding social influencers is a fundamental task in many online applications ranging from brand marketing to opinion mining. Existing methods heavily rely on the availability of expert labels, whose collection is usually a laborious process even for domain experts. Using open-ended questions, crowdsourcing provides a cost-effective way to find a large number of social influencers in a short time. Individual crowd workers, however, only possess fragmented knowledge that is often of low quality. To tackle those issues, we present OpenCrowd, a unified Bayesian framework that seamlessly incorporates machine learning and crowdsourcing for effectively finding social influencers. To infer a set of influencers, OpenCrowd bootstraps the learning process using a small number of expert labels and then jointly learns a feature-based answer quality model and the reliability of the workers. Model parameters and worker reliability are updated iteratively, allowing their learning processes to benefit from each other until an agreement on the quality of the answers is reached. We derive a principled optimization algorithm based on variational inference with efficient updating rules for learning OpenCrowd parameters. Experimental results on finding social influencers in different domains show that our approach substantially improves the state of the art by 11.5% AUC. Moreover, we empirically show that our approach is particularly useful in finding micro-influencers, who are very directly engaged with smaller audiences. [https://dl.acm.org/doi/fullHtml/10.1145/3366423.3380254 Paper]
| talk2-title = Keeping Community in the Machine-Learning Loop
| talk2-presenter = C. Estelle Smith, MS, PhD Candidate, GroupLens Research Lab at the University of Minnesota
| talk2-slides = File:C.EstelleSmith_ResearchShowcase_5_20_20.pdf
| talk2-abstract =On Wikipedia, sophisticated algorithmic tools are used to assess the quality of edits and take corrective actions. However, algorithms can fail to solve the problems they were designed for if they conflict with the values of communities who use them. In this study, we take a Value-Sensitive Algorithm Design approach to understanding a community-created and -maintained machine learning-based algorithm called the Objective Revision Evaluation System (ORES)—a quality prediction system used in numerous Wikipedia applications and contexts. Five major values converged across stakeholder groups that ORES (and its dependent applications) should: (1) reduce the effort of community maintenance, (2) maintain human judgement as the final authority, (3) support differing peoples’ differing workflows, (4) encourage positive engagement with diverse editor groups, and (5) establish trustworthiness of people and algorithms within the community. We reveal tensions between these values and discuss implications for future research to improve algorithms like ORES. [https://commons.wikimedia.org/wiki/File:Keeping_Community_in_the_Loop-_Understanding_Wikipedia_Stakeholder_Values_for_Machine_Learning-Based_Systems.pdf Paper]
}}
 
=== {{ym|2020|3}} ===
;Theme: Topic modeling
 
{{/Event
| date = March 18, 2020
| youtube-url = https://www.youtube.com/watch?v=fiD9QTHNVVM
| commons-file =
| talk1-title = Big Data Analysis with Topic Models: Evaluation, Interaction, and Multilingual Extensions
| talk1-presenter = Jordan Boyd-Graber, University of Maryland
| talk1-slides =
| talk1-abstract = A common information need is to understand large, unstructured datasets: millions of e-mails during e-discovery, a decade's worth of science correspondence, or a day's tweets. In the last decade, topic models have become a common tool for navigating such datasets, even across languages. This talk investigates the foundational research that allows successful tools for these data exploration tasks: how to know when you have an effective model of the dataset; how to correct bad models; how to measure topic model effectiveness; and how to detect framing and spin using these techniques. After introducing topic models, I argue why traditional measures of topic model quality, borrowed from machine learning, are inconsistent with how topic models are actually used. In response, I describe interactive topic modeling, a technique that enables users to impart their insights and preferences to models in a principled, interactive way. I will then address measuring topic model effectiveness in real-world tasks.
:::* [http://users.umiacs.umd.edu/~jbg/temp/2020_tm.pdf '''Presentation slides''']
:::* [https://mimno.infosci.cornell.edu/papers/2017_fntir_tm_applications.pdf Overview of topic models]
:::* [http://umiacs.umd.edu/~jbg//docs/nips2009-rtl.pdf Topic model evaluation]
:::* [http://umiacs.umd.edu/~jbg//docs/2014_mlj_itm.pdf Interactive topic modeling]
:::* [http://umiacs.umd.edu/~jbg//docs/2016_acl_doclabel.pdf Topic Models for Categorization]
| talk2-title = Topic Classification for Wikipedia
| talk2-presenter = [[User:Isaac_(WMF)|Isaac Johnson]], Wikimedia Foundation
| talk2-slides =
| talk2-abstract = This talk will provide a high-level overview of how the Wikimedia Foundation is approaching the challenges of topic classification and topic modeling for Wikipedia. It will explain why the ability to model topics matters for Wikipedia [[Inuka_team|readers]] and [[Growth/Personalized_first_day/Newcomer_tasks#Recommending_tasks|editors]], describe some of the existing technologies ([[ORES#Topic_routing|ORES articletopic API]]; [https://tools.wmflabs.org/wiki-topic/ Wikidata-based topic API]), and outline future work in this space. ([https://figshare.com/articles/Topic_Classification_for_Wikipedia_March_2020_Research_Showcase/12003105 '''Presentation slides'''])
 
}}
=== {{ym|2020|2}} ===
{{/Event
| date = February 19, 2020
| youtube-url = https://www.youtube.com/watch?v=fj0z20PuGIk
| commons-file =
| talk1-title = Autonomous tools and the design of work
| talk1-presenter = Jeffrey V. Nickerson, Stevens Institute of Technology
| talk1-slides =
| talk1-abstract = Bots and other software tools that exhibit autonomy can appear in an organization to be more like employees than commodities. As a result, humans delegate to machines. Sometimes the machines turn and delegate part of the work back to humans. This talk will discuss how the design of human work is changing, drawing on a recent study of editors and bots in Wikipedia, as well as a study of game and chip designers. The Wikipedia bot ecosystem, and how bots evolve, will be discussed. Humans are working together with machines in complex configurations; this puts constraints on not only the machines but also the humans. Both software and human skills change as a result. [https://dl.acm.org/doi/pdf/10.1145/3359317?download=true Paper]
| talk2-title = When Humans and Machines Collaborate: Cross-lingual Label Editing in Wikidata
| talk2-presenter = Lucie-Aimée Kaffee, University of Southampton
| talk2-slides =
| talk2-abstract = The quality and maintainability of any knowledge graph are strongly influenced by the way it is created. In the case of Wikidata, the knowledge graph is created and maintained by a hybrid approach of human editing supported by automated tools. We analyse the editing of natural language data, i.e. labels. Labels are the entry point for humans to understand the information, and therefore need to be carefully maintained. Wikidata is a good example of a hybrid multilingual knowledge graph, as it has a large and active community of humans and bots working together, covering over 300 languages. In this work, we analyse the different editor groups and how they interact with the different language data to understand the provenance of the current label data. This presentation is based on the paper “When Humans and Machines Collaborate: Cross-lingual Label Editing in Wikidata”, published in OpenSym 2019 in collaboration with Kemele M. Endris and Elena Simperl. [https://opensym.org/wp-content/uploads/2019/08/os19-paper-A16-kaffee.pdf Paper]
}}
 
=== {{ym|2020|1}} ===
''No Showcase in January.''
 
==2019==
=== {{ym|2019|12}} ===
{{/Event
| date = December 18, 2019
| youtube-url = https://www.youtube.com/watch?v=b4VrphM_TTA
| commons-file =
| talk1-title = Making Knowledge Bases More Complete
| talk1-presenter = [https://suchanek.name Fabian Suchanek], Télécom Paris, Institut Polytechnique de Paris
| talk1-slides =
| talk1-abstract = A Knowledge Base (KB) is a computer-readable collection of facts about the world (examples are Wikidata, DBpedia, and YAGO). The problem is that these KBs are often missing entities or facts. In this talk, I present some new methods to combat this incompleteness. I will also quickly talk about some other research projects we are currently pursuing, including a new version of YAGO. ([https://figshare.com/articles/Making_knowledge_bases_more_complete_-_Wikimedia_Research_Showcase_-_December_2019/11401416 presentation slides], [https://suchanek.name/work/publications/ related publications])
| talk2-title = The Dynamics of Peer-Produced Political Information During the 2016 U.S. Presidential Campaign
| talk2-presenter = [[w:User:Madcoverboy|Brian Keegan, Ph.D.]], Assistant Professor, Department of Information Science, University of Colorado Boulder
| talk2-slides =
| talk2-abstract = Wikipedia plays a crucial role in online information seeking, and its editors have a remarkable capacity to rapidly revise its content in response to [[w:Portal:Current events|current events]]. How did the production and consumption of political information on Wikipedia mirror the dynamics of the [[w:2016 United States presidential election|2016 U.S. Presidential campaign]]? Drawing on [[w:System justification|systems justification]] theory and methods for measuring the enthusiasm gap among voters, this paper quantitatively analyzes the candidates' biographical and related articles and their editors. Information production and consumption patterns match major events over the course of the campaign, but [[w:Category:Donald Trump|Trump-related articles]] show consistently higher levels of engagement than [[w:Category:Hillary Clinton|Clinton-related]] articles. Analysis of the editors' participation and backgrounds shows analogous shifts in the composition and durability of the collaborations around each candidate. The implications for using Wikipedia to monitor political engagement are discussed. ([https://figshare.com/articles/The_dynamics_of_peer-produced_political_information_-_Wikimedia_Research_Showcase_-_December_2019/11401410 Presentation slides], [http://www.brianckeegan.com/papers/CSCW_2019_Elections.pdf Paper])
}}
 
==={{ym|2019|11}}===
{{/Event
| date = November 20, 2019
| youtube-url = https://www.youtube.com/watch?v=tIko_V1k09s
| commons-file =
| talk1-title = Wikipedia Text Reuse<nowiki>:</nowiki> Within and Without
| talk1-presenter = Martin Potthast, Leipzig University
| talk1-slides =
| talk1-abstract = We study text reuse related to Wikipedia at scale by compiling the first corpus of text reuse cases within Wikipedia as well as without (i.e., reuse of Wikipedia text in a sample of the Common Crawl). To discover reuse beyond verbatim copy and paste, we employ state-of-the-art text reuse detection technology, scaling it for the first time to process the entire Wikipedia as part of a distributed retrieval pipeline. We further report on a pilot analysis of the 100 million reuse cases inside, and the 1.6 million reuse cases outside Wikipedia that we discovered. Text reuse inside Wikipedia gives rise to new tasks such as article template induction, fixing quality flaws, or complementing Wikipedia’s ontology. Text reuse outside Wikipedia yields a tangible metric for the emerging field of quantifying Wikipedia’s influence on the web. To foster future research into these tasks, and for reproducibility’s sake, the Wikipedia text reuse corpus and the retrieval pipeline are made freely available ([https://webis.de/publications.html#?q=stein_2019c paper, slides, and related resources], [https://demo.webis.de/wikipedia-text-reuse/ Demo])
| talk2-title = Characterizing Wikipedia Reader Demographics and Interests
| talk2-presenter = [[meta:User:Isaac_(WMF)|Isaac Johnson]], Wikimedia Foundation
| talk2-slides =
| talk2-abstract = Building on two past surveys on the motivation and needs of Wikipedia readers ([[Wikimedia_Research/Showcase#November_2016|Why We Read Wikipedia]]; [[Wikimedia_Research/Showcase#December_2018|Why the World Reads Wikipedia]]), we examine the relationship between Wikipedia reader demographics and their interests and needs. Specifically, we run surveys in thirteen different languages that ask readers three questions about their motivation for reading Wikipedia (motivation, needs, and familiarity) and five questions about their demographics (age, gender, education, locale, and native language). We link these survey results with the respondents' reading sessions (i.e., sequences of Wikipedia page views) to gain a more fine-grained understanding of how a reader's context relates to their activity on Wikipedia. We find that readers have a diversity of backgrounds but that the high-level needs of readers do not correlate strongly with individual demographics. We also find, however, that there are relationships between demographics and specific topic interests that are consistent across many cultures and languages. This work provides insights into the reach of various Wikipedia language editions and the relationship between content or contributor gaps and reader gaps. See the [[meta:Research:Characterizing_Wikipedia_Reader_Behaviour/Demographics_and_Wikipedia_use_cases#Reader_Surveys|meta page]] for more details. [https://figshare.com/articles/Reader_Demographics_November_2019_Wikimedia_Research_Showcase_Presentation/10565882 Slides (figshare)].
}}
 
==={{ym|2019|10}}===
{{/Event
| date = October 16, 2019
| youtube-url = https://www.youtube.com/watch?v=KZ35weAVlIU
| commons-file =
| talk1-title = Elections Without Fake: Deploying Real Systems to Counter Misinformation Campaigns
| talk1-presenter = Fabrício Benevenuto, Computer Science Department, Universidade Federal de Minas Gerais (UFMG), Brazil
| talk1-slides =
| talk1-abstract = The political debate and electoral dispute in the online space during the 2018 Brazilian elections were marked by an information war. In order to mitigate the misinformation problem, we created the project [http://www.eleicoes-sem-fake.dcc.ufmg.br Elections Without Fake] and developed a few technological solutions able to reduce the abuse of misinformation campaigns in the online space. In particular, we created a system to monitor public groups in WhatsApp and a system to monitor ads in Facebook. Our systems proved fundamental for fact-checking and investigative journalism, and are currently being used by over 150 journalists with editorial lines and various fact-checking agencies.
| talk2-title = Protecting Wikipedia from Disinformation: Detecting Malicious Editors and Pages to Protect
| talk2-presenter = Francesca Spezzano, Computer Science Department, Boise State University
| talk2-slides =
| talk2-abstract = Wikipedia is based on the idea that anyone can make edits in order to create reliable and crowd-sourced content. Yet with the cover of internet anonymity, some users make changes to the online encyclopedia that do not align with Wikipedia’s intended uses. In this talk, we present different forms of disinformation on Wikipedia, including vandalism and spam, and introduce the mechanisms that Wikipedia implements to protect its integrity, such as blocking malicious editors and page protection. Next, we provide an overview of effective algorithms, based on user editing behavior, that we have developed to detect malicious editors and pages to protect across multiple languages. ([https://figshare.com/articles/Protecting_Wikipedia_from_Disinformation_-_Wikimedia_Research_Showcase_-_October_2019/11497047 Slides on Figshare], related research papers<ref>https://arxiv.org/pdf/1507.01272.pdf</ref><ref>https://static.aminer.org/pdf/fa/cikm2016/shp1112-suyehiraA.pdf</ref><ref>https://www.aaai.org/ocs/index.php/ICWSM/ICWSM17/paper/viewFile/15678/14872</ref>)
}}
 
=== {{ym|2019|9}} ===
{{/Event
| date = September 18, 2019
| youtube-url = https://www.youtube.com/watch?v=fDhAnHrkBks
| commons-file =
| talk1-title = Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia's Verifiability
| talk1-presenter = Miriam Redi, Research, Wikimedia Foundation
| talk1-slides =
| talk1-abstract = Among Wikipedia's core guiding principles, verifiability policies have a particularly important role. Verifiability requires that information included in a Wikipedia article be corroborated against reliable secondary sources. Because of the manual labor needed to curate and fact-check Wikipedia at scale, however, its contents do not always evenly comply with these policies. Citations (i.e., references to external sources) may not conform to verifiability requirements or may be missing altogether, potentially weakening the reliability of specific topic areas of the free encyclopedia. In [[meta:Research:Identification_of_Unsourced_Statements|this project]], we aimed to provide an empirical characterization of the reasons why and how Wikipedia cites external sources to comply with its own verifiability guidelines. First, we constructed a taxonomy of reasons why inline citations are required by collecting labeled data from editors of multiple Wikipedia language editions. We then collected a large-scale crowdsourced dataset of Wikipedia sentences annotated with categories derived from this taxonomy. Finally, we designed and evaluated algorithmic models to determine if a statement requires a citation, and to predict the citation reason based on our taxonomy. We evaluated the robustness of such models across different classes of Wikipedia articles of varying quality, as well as on an additional dataset of claims annotated for fact-checking purposes. ''[https://figshare.com/articles/Citation_Needed_September_2019_Wikimedia_Research_Showcase_Presentation/11387646 Slides on FigShare]''
 
::Redi, M., Fetahu, B., Morgan, J., & Taraborelli, D. (2019, May). ''Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia's Verifiability''. In The World Wide Web Conference (pp. 1567-1578). ACM. https://arxiv.org/abs/1902.11116
 
 
| talk2-title = Patrolling on Wikipedia
| talk2-presenter = Jonathan T. Morgan, Research, Wikimedia Foundation
| talk2-slides = File:Patrolling_on_Wikipedia_September_2019_research_showcase_slides.pdf
| talk2-abstract = I will present initial findings from an [[meta:Research:Patrolling_on_Wikipedia|ongoing research study]] of patrolling workflows on Wikimedia projects. Editors patrol recent pages and edits to ensure that Wikimedia projects maintain high quality as new content comes in. Patrollers revert vandalism and review newly-created articles and article drafts. Patrolling of new pages and edits is vital work. In addition to making sure that new content conforms to Wikipedia project policies, patrollers are the first line of defense against disinformation, copyright infringement, libel and slander, personal threats, and other forms of vandalism on Wikimedia projects. This research project is focused on understanding the needs, priorities, and workflows of editors who patrol new content on Wikimedia projects. The findings of this research can inform the development of better patrolling tools as well as non-technological interventions intended to support patrollers and the activity of patrolling.
}}
 
==={{ym|2019|7}}===
{{/Event
| date = July 17, 2019
| youtube-url = https://www.youtube.com/watch?v=i9vvwV5KfW4
| commons-file =
| talk1-title = Characterizing Incivility on Wikipedia
| talk1-presenter = Elizabeth Whittaker, University of Michigan School of Information
| talk1-slides =
| talk1-abstract = In a society whose citizens have a variety of viewpoints, there is a question of how citizens can govern themselves in ways that allow these viewpoints to co-exist. Online deliberation has been posited as a problem-solving mechanism in this context, and civility can be thought of as a mechanism that facilitates this deliberation. Civility can thus be thought of as a method of interaction that encourages collaboration, while incivility disrupts collaboration. However, it is important to note that the nature of online civility is shaped by its history and the technical architecture scaffolding it. Civility as a concept has been used both to promote equal deliberation and to exclude the marginalized from deliberation, so we should be careful to ensure that our conceptualizations of incivility reflect what we intend them to in order to avoid unintentionally reinforcing inequality.
 
:::To this end, we examined Wikipedia editors’ perceptions of interactions that disrupt collaboration through 15 semi-structured interviews. Wikipedia is a highly deliberative platform, as editors need to reach consensus about what will appear on the article page, a process that often involves deliberation to coordinate, and any disruption to this process should be apparent. We found that incivility on Wikipedia typically occurs in one of three ways: through weaponization of Wikipedia’s policies, weaponization of Wikipedia’s technical features, and through more typical vitriolic content. These methods of incivility were gendered, and had the practical effect of discouraging women from editing. We implicate this pattern as one of the underlying causes of Wikipedia’s gender gap.
| talk2-title = Hidden Gems in the Wikipedia Discussions - The Wikipedians’ Rationales
| talk2-presenter = Lu Xiao, Syracuse University School of Information Studies
| talk2-slides =
| talk2-abstract = I will present a series of completed and ongoing studies that are aimed at understanding the role of the Wikipedians’ rationales in Wikipedia discussions. We define a rationale as one’s justification of her viewpoint and suggestions. Our studies demonstrate the potential of leveraging the Wikipedians’ rationales in discussions as resources for future decision-making and as resources for eliciting knowledge about the community’s norms, practices and policies. We view these rationales as rich digital traces in these environments and consider them beneficial for community members, for example by helping newcomers familiarize themselves with the commonly accepted justificatory reasoning styles. We call for more research attention to the discussion content from this rationale study perspective.
}}
==={{ym|2019|6}}===
{{/Event
| date = June 26, 2019
| youtube-url = https://www.youtube.com/watch?v=WiUfpmeJG7E
| commons-file =
| talk1-title = Trajectories of Blocked Community Members: Redemption, Recidivism and Departure
| talk1-presenter = Jonathan Chang, Cornell University
| talk1-slides = File:Trajectories_of_Blocked_Community_Members_-_Slides.pdf
| talk1-abstract = Community norm violations can impair constructive communication and collaboration online. As a defense mechanism, community moderators often address such transgressions by temporarily blocking the perpetrator. Such actions, however, come with the cost of potentially alienating community members. Given this tradeoff, it is essential to understand to what extent, and in which situations, this common moderation practice is effective in reinforcing community rules. In this work, we introduce a computational framework for studying the future behavior of blocked users on Wikipedia. After their block expires, they can take several distinct paths: they can reform and adhere to the rules, but they can also recidivate, or straight-out abandon the community. We reveal that these trajectories are tied to factors rooted both in the characteristics of the blocked individual and in whether they perceived the block to be fair and justified. Based on these insights, we formulate a series of prediction tasks aiming to determine which of these paths a user is likely to take after being blocked for their first offense, and demonstrate the feasibility of these new tasks. Overall, this work builds towards a more nuanced approach to moderation by highlighting the tradeoffs that are in play. For more information, see [http://www.cs.cornell.edu/%7Ejpchang/papers/recidivism_online.pdf the full paper].
| talk2-title = [[:meta:University of Virginia/Automatic Detection of Online Abuse|Automatic Detection of Online Abuse in Wikipedia]] <-- see project page
| talk2-presenter = Lane Rasberry, University of Virginia
| talk2-slides = File:Automated detection of Wiki misconduct!.pdf
| talk2-abstract = '''Please see the [[:File:Automated detection of Wikipedia misconduct!.webm|researchers' own video]] and [[:File:Automatic Detection of Online Abuse in Wikipedia.pdf|their own slides]]!''' This presentation comes from the research coordinator and will consider the research administration more than the research process. Researchers analyzed all English Wikipedia blocks prior to 2018 using machine learning. With insights gained, the researchers examined all English Wikipedia users who are not blocked against the identified characteristics of blocked users. The results were a ranked set of predictions of users who are not blocked, but who have a history of conduct similar to that of blocked users. This research and process models a system for the use of computing to aid human moderators in identifying conduct on English Wikipedia which merits a block.
| talk3-title = First Insights from Partial Blocks in Wikimedia Wikis
| talk3-presenter = Morten Warncke-Wang, Wikimedia Foundation
| talk3-slides = File:First Insights from Partial Blocks in Wikimedia Wikis.pdf
| talk3-abstract =The Anti-Harassment Tools team at the Wikimedia Foundation released the partial block feature in early 2019. Where previously blocks on Wikimedia wikis were sitewide (users were blocked from editing an entire wiki), partial blocks make it possible to block users from editing specific pages and/or namespaces. The Italian Wikipedia was the first wiki to start using this feature, and it has since been rolled out to other wikis as well. In this presentation, we will look at how this feature has been used in the first few months since release.
}}
 
=== {{ym|2019|5}} ===
''No showcase''
 
==={{ym|2019|4}}===
{{/Event
| date = April 17, 2019
| youtube-url = https://www.youtube.com/watch?v=zmb5LoJzOoE
| commons-file =
| talk1-title = Group Membership and Contributions to Public Information Goods: The Case of WikiProject
| talk1-presenter = Ark Fangzhou Zhang
| talk1-slides =
| talk1-abstract = We investigate the effects of group identity on contribution behavior on the English Wikipedia, the largest online encyclopedia that gives free access to the public. Using an instrumental variable approach that exploits the variations in one’s exposure to WikiProjects, we find that joining a WikiProject has a significant impact on one’s level of contribution, with an average increase of 79 revisions or 8,672 characters per month. To uncover the potential mechanism underlying the treatment effect, we use the size of a WikiProject’s home page as a proxy for the number of recommendations from a project. The results show that the users who join a WikiProject with more recommendations significantly increase their contribution to articles under the joined project, but not to articles under other projects.
| talk2-title = Thanks for Stopping By: A Study of “Thanks” Usage on Wikimedia
| talk2-presenter = Swati Goel
| talk2-slides =
| talk2-abstract = The Thanks feature on Wikipedia, also known as "Thanks," is a tool with which editors can quickly and easily send one another positive feedback. The aim of this project is to better understand this feature: its scope, the characteristics of a typical "Thanks" interaction, and the effects of receiving a thank on individual editors. We study the motivational impacts of "Thanks" because maintaining editor engagement is a central problem for crowdsourced repositories of knowledge such as Wikimedia. Our main findings are that most editors have not been exposed to the Thanks feature (meaning they have never given nor received a thank), thanks are typically sent upwards (from less experienced to more experienced editors), and receiving a thank is correlated with having high levels of editor engagement. Though the prevalence of "Thanks" usage varies by editor experience, the impact of receiving a thank seems mostly consistent for all users. We empirically demonstrate that receiving a thank has a strong positive effect on short-term editor activity across the board and provide preliminary evidence that thanks could compound to have long-term effects as well. More information is available [[meta:Research:Understanding_thanks|on the research project page]].
}}
==={{ym|2019|3}}===
{{/Event
| date = March 20, 2019
| youtube-url =https://www.youtube.com/watch?v=6p62PMhkVNM
| commons-file =
| talk1-title =Learning How to Correct a Knowledge Base from the Edit History
| talk1-presenter =Thomas Pellissier Tanon (Télécom ParisTech), Camille Bourgaux (DI ENS, CNRS, ENS, PSL Univ. & Inria), Fabian Suchanek (Télécom ParisTech), WWW'19.
| talk1-slides = File:Learning How to Correct Wikidata from the Edit History - WM Research Showcase slides.pdf
| talk1-abstract =The curation of Wikidata (and other knowledge bases) is crucial to keep the data consistent, to fight vandalism and to correct good faith mistakes. However, manual curation of the data is costly. In this work, we propose to take advantage of the edit history of the knowledge base in order to learn how to correct constraint violations automatically. Our method is based on rule mining, and uses the edits that solved violations in the past to infer how to solve similar violations in the present. For example, our system is able to learn that the value of the [[d:Property:P21|sex or gender]] property [[d:Q467|woman]] should be replaced by [[d:Q6581072|female]]. We provide [https://tools.wmflabs.org/wikidata-game/distributed/#game=43 a Wikidata game] that suggests our corrections to the users in order to improve Wikidata. Both the evaluation of our method on past corrections, and the Wikidata game statistics show significant improvements over baselines.
| talk2-title =TableNet: An Approach for Determining Fine-grained Relations for Wikipedia Tables
| talk2-presenter =Besnik Fetahu
| talk2-slides = File:Tablenet_www2019.pdf
| talk2-abstract =Wikipedia tables represent an important resource in which information is organized w.r.t. table schemas consisting of columns. In turn, each column may contain instance values that point to other Wikipedia articles, or primitive values (e.g., numbers, strings). In this work, we focus on the problem of interlinking Wikipedia tables for two types of table relations: equivalent and subPartOf. Through such relations, we can further harness semantically related information by accessing related tables or facts therein. Determining the relation type of a table pair is not trivial, as it is dependent on the schemas, the values therein, and the semantic overlap of the cell values in the corresponding tables. We propose TableNet, an approach that constructs a knowledge graph of interlinked tables with subPartOf and equivalent relations. TableNet consists of two main steps: (i) for any source table, an efficient algorithm finds all candidate related tables with high coverage, and (ii) a neural approach, which takes into account the table schemas and the corresponding table data, determines with high accuracy the relation for a table pair. We perform an extensive experimental evaluation on the entire Wikipedia, with more than 3.2 million tables. We show that we retain more than 88% of the relevant candidate table pairs for alignment. Consequently, we are able to align tables with subPartOf or equivalent relations with an accuracy of 90%. Comparisons with existing competitors show that TableNet has superior performance in terms of coverage and alignment accuracy.
}}
==={{ym|2019|2}}===
{{/Event
| date = February 20, 2019
| youtube-url = https://www.youtube.com/watch?v=_jpJIFXwlEg
| commons-file =
| talk1-title =The_Tower_of_Babel.jpg: Diversity of Visual Encyclopedic Knowledge Across Wikipedia Language Editions
| talk1-presenter =Shiqing He (presenting, University of Michigan), Brent Hecht (presenting, Northwestern University), Allen Yilun Lin (Northwestern University), Eytan Adar (University of Michigan), ICWSM'18.
| talk1-slides =
| talk1-abstract =Across all Wikipedia language editions, millions of images augment text in critical ways. This visual encyclopedic knowledge is an important form of wikiwork for editors, a critical part of reader experience, an emerging resource for machine learning, and a lens into cultural differences. However, Wikipedia research--and cross-language edition Wikipedia research in particular--has thus far been limited to text. In this paper, we assess the diversity of visual encyclopedic knowledge across 25 language editions and compare our findings to those reported for textual content. Unlike text, translation in images is largely unnecessary. Additionally, the Wikimedia Foundation, through Wikimedia Commons, has taken steps to simplify cross-language image sharing. While we may expect that these factors would reduce image diversity, we find that cross-language image diversity rivals, and often exceeds, that found in text. We find that diversity varies between language pairs and content types, but that many images are unique to different language editions. Our findings have implications for readers (in what imagery they see), for editors (in deciding what images to use), for researchers (who study cultural variations), and for machine learning developers (who use Wikipedia for training models).
| talk2-title =A Warm Welcome, Not a Cold Start: Eliciting New Editors' Interests via Questionnaires
| talk2-presenter =Ramtin Yazdanian (presenting, Ecole Polytechnique Federale de Lausanne)
| talk2-slides =
| talk2-abstract =Every day, thousands of users sign up as new Wikipedia contributors. Once joined, these users have to decide which articles to contribute to, which users to reach out to and learn from or collaborate with, etc. Any such task is a hard and potentially frustrating one given the sheer size of Wikipedia. Supporting newcomers in their first steps by recommending articles they would enjoy editing or editors they would enjoy collaborating with is thus a promising route toward converting them into long-term contributors. Standard recommender systems, however, rely on users' histories of previous interactions with the platform. As such, these systems cannot make high-quality recommendations to newcomers without any previous interactions -- the so-called cold-start problem. Our aim is to address the cold-start problem on Wikipedia by developing a method for automatically building short questionnaires that, when completed by a newly registered Wikipedia user, can be used for a variety of purposes, including article recommendations that can help new editors get started. Our questionnaires are constructed based on the text of Wikipedia articles as well as the history of contributions by the already onboarded Wikipedia editors. We have assessed the quality of our questionnaire-based recommendations in an offline evaluation using historical data, as well as an online evaluation with hundreds of real Wikipedia newcomers, concluding that our method provides cohesive, human-readable questions that perform well against several baselines. By addressing the cold-start problem, this work can help with the sustainable growth and maintenance of Wikipedia's diverse editor community. [https://commons.wikimedia.org/wiki/File:A_Warm_Welcome,_Not_a_Cold_Start_-_Eliciting_New_Editors%27_Interests_via_Questionnaires.pdf Slides]
}}
 
==={{ym|2019|1}}===
{{/Event
| date = January 16, 2019
| youtube-url = https://www.youtube.com/watch?v=Fc51jE_KNTc
| commons-file =
| talk1-title = Understanding participation in Wikipedia: Studies on the relationship between new editors’ motivations and activity
| talk1-presenter = Martina Balestra, New York University
| talk1-slides =
| talk1-abstract = Peer production communities like Wikipedia often struggle to retain contributors beyond their initial engagement. Theory suggests this may be related to their levels of motivation, though prior studies either center on contributors’ activity or use cross-sectional survey methods, and overlook accompanying changes in motivation. In this talk, I will present a series of studies aimed at filling this gap. We begin by looking at how Wikipedia editors’ early motivations influence the activities that they come to engage in, and how these motivations change over the first three months of participation in Wikipedia. We then look at the relationship between editing activity and intrinsic motivation specifically over time. We find that new editors’ early motivations are predictive of their future activity, but that these motivations tend to change with time. Moreover, newcomers’ intrinsic motivation is reinforced by the amount of activity they engage in over time: editors who had a high level of intrinsic motivation entered a virtuous cycle where the more they edited the more motivated they became, whereas those who initially had low intrinsic motivation entered a vicious cycle. Our findings shed new light on the importance of early experiences and reveal that the relationship between motivation and activity is more complex than previously understood.
| talk2-title = Geography and knowledge. Reviving an old relationship with Wiki Atlas
| talk2-presenter = Anastasios Noulas, New York University
| talk2-slides =
| talk2-abstract = [https://www.wiki-atlas.org Wiki Atlas] is an interactive cartography tool. The tool renders Wikipedia content in a 3-dimensional, web-based cartographic environment. The map acts as a medium that enables the discovery and exploration of articles in a manner that explicitly associates geography and information. In its current prototype form, a Wikipedia article is represented on the map as a 3D element whose height is proportional to the number of views the article has on the website. This property enables the discovery of relevant content in a manner that reflects the significance of the target element, as measured by the collective attention of the site’s audience.
}}
 
== 2018 ==
=== {{ym|2018|12}} ===
 
{{/Event
| date = 12 December 2018
| youtube-url = https://www.youtube.com/watch?v=RKMFvi_CCB0
| commons-file =
| talk1-title = Why the World Reads Wikipedia
| talk1-presenter = Florian Lemmerich, RWTH Aachen University; [[User:Diego (WMF)|Diego Sáez-Trumper]], Wikimedia Foundation; Robert West, EPFL; and [[User:LZia (WMF)|Leila Zia]], Wikimedia Foundation
| talk1-slides = File:(WtWRW-20181211)_Research_Showcase_Presentation.pdf
| talk1-abstract = So far, little is known about why users across the world read Wikipedia's various language editions. To bridge this gap, we conducted a comparative study by combining a large-scale survey of Wikipedia readers across 14 language editions with a log-based analysis of user activity. For analysis, we proceeded in three steps: First, we analyzed the survey results to compare the prevalence of Wikipedia use cases across languages, discovering commonalities, but also substantial differences, among Wikipedia languages with respect to their usage. Second, we matched survey responses to the respondents' traces in Wikipedia's server logs to characterize behavioral patterns associated with specific use cases, finding that distinctive patterns consistently mark certain use cases across language editions. Third, we could show that certain Wikipedia use cases are more common in countries with certain socio-economic characteristics; e.g., in-depth reading of Wikipedia articles is substantially more common in countries with a low Human Development Index. The outcomes of this study provide a deeper understanding of Wikipedia readership in a wide range of languages, which is important for Wikipedia editors, developers, and the reusers of Wikipedia content.
| talk2-title =
| talk2-presenter =
| talk2-slides =
| talk2-abstract =
}}
 
=== {{ym|2018|11}} ===
There was no showcase in November due to US holidays.
 
=== {{ym|2018|10}} ===
 
{{/Event
| date = 17 October 2018
| youtube-url = https://www.youtube.com/watch?v=UJrJLWuNvXo
| commons-file =
| talk1-title = "Welcome" Changes? Descriptive and Injunctive Norms in a Wikipedia Sub-Community
| talk1-presenter = Jonathan T. Morgan, Wikimedia Foundation; and Anna Filippova, GitHub
| talk1-slides =
| talk1-abstract = Open online communities rely on social norms for behavior regulation, group cohesion, and sustainability. Research on the role of social norms online has mainly focused on one source of influence at a time, making it difficult to separate different normative influences and understand their interactions. In this study, we use the Focus Theory to examine interactions between several sources of normative influence in a Wikipedia sub-community: local descriptive norms, local injunctive norms, and norms imported from similar sub-communities. We find that exposure to injunctive norms has a stronger effect than descriptive norms, that the likelihood of performing a behavior is higher when both injunctive and descriptive norms are congruent, and that conflicting social norms may negatively impact pro-normative behavior. We contextualize these findings through member interviews, and discuss their implications for both future research on normative influence in online groups and the design of systems that support open collaboration. (''[https://osf.io/84gvh/ research paper], [https://figshare.com/articles/Injunctive_and_Descriptive_Norms_in_a_Wikipedia_Sub-Community_-_Wikimedia_Research_Showcase_October_2018/7221395 slides with notes]'')
| talk2-title = The pipeline of online participation inequalities: The case of Wikipedia editing
| talk2-presenter = Aaron Shaw, Northwestern University; and Eszter Hargittai, University of Zurich
| talk2-slides =
| talk2-abstract = Participatory platforms like the Wikimedia projects have unique potential to facilitate more equitable knowledge production. However, digital inequalities such as the Wikipedia gender gap undermine this democratizing potential. In this talk, I present new research in which Eszter Hargittai and I conceptualize a "pipeline" of online participation and model distinct levels of awareness and behaviors necessary to become a contributor to the participatory web. We test the theory in the case of Wikipedia editing, using new survey data from a diverse, national sample of adult internet users in the U.S.
 
:::The results show that Wikipedia participation consistently reflects inequalities of education and internet experiences and skills. We find that the gender gap only emerges later in the pipeline whereas gaps along racial and socioeconomic lines explain variations earlier in the pipeline. Our findings underscore the multidimensionality of digital inequalities and suggest new pathways toward closing knowledge gaps by highlighting the importance of education and internet skills.
 
:::We conclude that future research and interventions to overcome digital participation gaps should not focus exclusively on gender or class differences in content creation, but expand to address multiple aspects of digital inequality across pipelines of participation. In particular, when it comes to overcoming gender gaps in the case of Wikipedia, our results suggest that continued emphasis on recruiting female editors should include efforts to disseminate the knowledge that Wikipedia can be edited. Our findings support broader efforts to overcome knowledge- and skill-based barriers to entry among potential contributors to the open web.
}}
 
=== {{ym|2018|9}} ===
 
{{/Event
| date = 19 September 2018
| youtube-url = https://www.youtube.com/watch?v=OY8vZ6wES9o
| commons-file =
| talk1-title = The impact of news exposure on collective attention in the United States during the 2016 Zika epidemic
| talk1-presenter = Michele Tizzoni, André Panisson, Daniela Paolotti, Ciro Cattuto
| talk1-slides =
| talk1-abstract = In recent years, many studies have drawn attention to the important role of collective awareness and human behaviour during epidemic outbreaks. A number of modelling efforts have investigated the interaction between the disease transmission dynamics and human behaviour change mediated by news coverage and by information spreading in the population. Yet, given the scarcity of data on public awareness during an epidemic, few studies have relied on empirical data. Here, we use fine-grained, geo-referenced data from three online sources (Wikipedia, the GDELT Project, and the Internet Archive) to quantify population-scale information seeking about the 2016 Zika virus epidemic in the U.S., explicitly linking this behavioural signal to epidemiological data. Geo-localized Wikipedia pageview data reveal that visiting patterns of Zika-related pages in Wikipedia were highly synchronized across the United States and largely explained by exposure to national television broadcasts. Contrary to the assumption of some theoretical models, news volume and Wikipedia visiting patterns were not significantly correlated with the magnitude or the extent of the epidemic. Attention to Zika, in terms of Zika-related Wikipedia pageviews, was high at the beginning of the outbreak, when public health agencies raised an international alert and triggered media coverage, but subsequently exhibited an activity profile that suggests nonlinear dependencies and memory effects in the relationship between information seeking, media pressure, and disease dynamics. This calls for a new and more general modelling framework to describe the interaction between media exposure, public awareness, and disease dynamics during epidemic outbreaks.
| talk2-title = Deliberation and resolution on Wikipedia<nowiki>:</nowiki> A case study of requests for comments
| talk2-presenter = Jane Im (University of Michigan) and Amy X. Zhang (MIT)
| talk2-slides =
| talk2-abstract = Resolving disputes in a timely manner is crucial for any online production group. We present [[meta:Research:Supporting_deliberation_and_resolution_on_Wikipedia|an analysis of Requests for Comments]] (RfCs), one of the main vehicles on Wikipedia for formally resolving a policy or content dispute. We collected an exhaustive dataset of 7,316 RfCs on English Wikipedia over the course of 7 years and conducted a qualitative and quantitative analysis into what issues affect the RfC process. Our analysis was informed by 10 interviews with frequent RfC closers. We found that a major issue affecting the RfC process is the prevalence of RfCs that could have benefited from formal closure but that linger indefinitely without one, with factors including participants' interest and expertise impacting the likelihood of resolution. From these findings, we developed a model that predicts whether an RfC will go stale with 75.3% accuracy, a level that is approached as early as one week after dispute initiation. ''([https://figshare.com/articles/rfc_sql/7038575 RfC Dataset], [https://trusttri.github.io/papers/final_version_CSCW_2018.pdf CSCW paper])''
}}
 
=== {{ym|2018|8}} ===
{{/Event
| date = 13 August 2018
| youtube-url = https://www.youtube.com/watch?v=OGPMS4YGDMk
| commons-file =
| talk1-title = Quicksilver: Training an ML system to generate draft Wikipedia articles and Wikidata entries simultaneously
| talk1-presenter = John Bohannon and Vedant Dharnidharka, [https://primer.ai/ Primer]
| talk1-slides = File:Primer Quicksilver research presentation at Wikimedia Aug 2018.pdf
| talk1-abstract = The automatic generation and updating of Wikipedia articles is usually approached as a multi-document summarization task: Given a set of source documents containing information about an entity, summarize the entity. Purely sequence-to-sequence neural models can pull that off, but getting enough data to train them is a challenge. Wikipedia articles and their reference documents can be used for training, [https://arxiv.org/abs/1801.10198 as was recently done] by a team at Google AI. But how do you find new source documents for new entities? And besides having humans read all of the source documents, how do you fact-check the output? What is needed is a self-updating knowledge base that learns jointly with a summarization model, keeping track of data provenance. Lucky for us, the world’s most comprehensive public encyclopedia is tightly coupled with Wikidata, the world’s most comprehensive public knowledge base. We have built a system called Quicksilver that uses them both.
| talk2-title =
| talk2-presenter =
| talk2-slides =
| talk2-abstract =
}}
 
=== {{ym|2018|7}} ===
 
{{/Event
| date = 11 July 2018
| youtube-url = https://www.youtube.com/watch?v=uK7AvNKq0sg
| commons-file =
| talk1-title = Mind the (Language) Gap: Neural Generation of Multilingual Wikipedia Summaries from Wikidata for ArticlePlaceholders
| talk1-presenter = Lucie-Aimée Kaffee, Hady Elsahar, Pavlos Vougiouklis
| talk1-slides = File:Wikimedia_Research_Showcase_2018_ArticlePlaceholder.pdf
| talk1-abstract = While Wikipedia exists in 287 languages, its content is unevenly distributed among them. It is therefore of the utmost social and cultural interest to address languages whose native speakers have access only to an impoverished Wikipedia. In this work, we investigate the generation of summaries for Wikipedia articles in underserved languages, given structured data as an input.<br />In order to address the information bias towards widely spoken languages, we focus on an important support for such summaries: ArticlePlaceholders, which are dynamically generated content pages in underserved Wikipedia versions. They enable native speakers to access existing information in Wikidata, a structured Knowledge Base (KB). Our system provides a generative neural network architecture, which processes the triples of the KB as they are dynamically provided by the ArticlePlaceholder, and generates a comprehensible textual summary. This data-driven approach is tested with the goal of understanding how well it matches the communities' needs for two underserved languages on the Web: Arabic, a language with a large community but disproportionately limited access to knowledge online, and Esperanto.<br />With the help of Arabic and Esperanto Wikipedians, we conduct an extended evaluation that demonstrates not only the quality of the generated text but also the applicability of our end-system to any underserved Wikipedia version.
| talk2-title = Token-level change tracking: Data, tools and insights
| talk2-presenter = Fabian Flöck
| talk2-slides = File:WM-showcase apr18 token-level-change-tracking.pdf
| talk2-abstract = This talk first gives an overview of the WikiWho infrastructure, which tracks changes to single tokens (roughly, words) in articles of different Wikipedia language versions. It exposes APIs for accessing this data in near-real time and is complemented by a published static dataset. Several insights are presented regarding provenance, partial reverts, token-level conflict, and other metrics that only become available with such data. Lastly, the talk covers several tools and scripts that already use the API and discusses their application scenarios, such as investigation of authorship, conflicted content, and editor productivity.
}}
 
=== {{ym|2018|6}} ===
 
{{/Event
| date = 18 June 2018
| youtube-url = https://www.youtube.com/watch?v=Q1sSzKKoHB8
| commons-file =
| talk1-title = Conversations Gone Awry: Detecting Early Signs of Conversational Failure
| talk1-presenter = Justine Zhang and Jonathan Chang, Cornell University
| talk1-slides = File:Conversations_Gone_Awry_(slides).pdf
| talk1-abstract = One of the main challenges online social systems face is the prevalence of antisocial behavior, such as harassment and personal attacks. In this work, we introduce the task of predicting from the very start of a conversation whether it will get out of hand. As opposed to detecting undesirable behavior after the fact, this task aims to enable early, actionable prediction at a time when the conversation might still be salvaged. To this end, we develop a framework for capturing pragmatic devices—such as politeness strategies and rhetorical prompts—used to start a conversation, and analyze their relation to its future trajectory. Applying this framework in a controlled setting, we demonstrate the feasibility of detecting early warning signs of antisocial behavior in online discussions.
| talk2-title = Building a rich conversation corpus from Wikipedia Talk pages
| talk2-presenter = TBA
| talk2-slides =
| talk2-abstract = We present a corpus of conversations that encompasses the complete history of interactions between contributors to English Wikipedia's talk pages. This captures a new view of these interactions by containing not only the final form of each conversation but also detailed information on all the actions that led to it: new comments, as well as modifications, deletions and restorations. This level of detail supports new research questions pertaining to the process (and challenges) of large-scale online collaboration. As an example, we present a small study of removed comments highlighting that contributors successfully take action on more toxic behavior than was previously estimated.
}}
 
=== {{ym|2018|5}} ===
 
{{/Event
| date = 8 May 2018
| youtube-url = https://www.youtube.com/watch?v=t7cHxlGgEt4
| commons-file =
| talk1-title = Case studies in the appropriation of ORES
| talk1-presenter = [[User:Halfak (WMF)|Aaron Halfaker]], Wikimedia Foundation
| talk1-slides = File:ORES_appropriation_and_reflection_(Research_Showcase,_May_2018).pdf
| talk1-abstract = [[ORES]] is an open, transparent, and auditable machine prediction platform for Wikipedians to help them do their work. It's currently used in 33 different Wikimedia projects to measure the quality of content, detect vandalism, recommend changes to articles, and identify good-faith newcomers. The primary way that Wikipedians use ORES' predictions is through the tools developed by volunteers. These JavaScript gadgets, MediaWiki extensions, and web-based tools make up a complex ecosystem of Wikipedian processes encoded into software. In this presentation, Aaron will walk through three key tools that Wikipedians have developed that make use of ORES, and he'll discuss how these novel process support technologies and the discussions around them have prompted Wikipedians to reflect on their work processes.
 
| talk2-title = Exploring Wikimedia Donation Patterns
| talk2-presenter = [https://www.hcde.washington.edu/hsieh Gary Hsieh], University of Washington
| talk2-slides = File:Reciprocity & Donation Research Showcase presentation.pdf
| talk2-abstract = Every year, the Wikimedia Foundation relies on fundraising campaigns to help maintain the services it provides to millions of people worldwide. However, despite a large number of individuals who donate through these campaigns, these donors represent only a small percentage of Wikimedia users. In this work, we seek to advance our understanding of donors and their donation behaviors. Our findings offer insights to improve fundraising campaigns and to limit the burden of these campaigns on Wikipedia visitors.
}}
 
 
=== {{ym|2018|4}} ===
{{/Event
| date = 18 April 2018
| youtube-url = https://www.youtube.com/watch?v=Z1pa-pr6xis
| commons-file =
| talk1-title = The Critical Relationship of Volunteer-Created Wikipedia Content to Large-Scale Online Communities
| talk1-presenter = Nicholas Vincent, Northwestern University
| talk1-slides =
| talk1-abstract = The extensive Wikipedia literature has largely considered Wikipedia in isolation, outside of the context of its broader Internet ecosystem. Very recent research has demonstrated the significance of this limitation, identifying critical relationships between Google and Wikipedia that are highly relevant to many areas of Wikipedia-based research and practice. In this talk, I will present a study which extends this recent research beyond search engines to examine Wikipedia’s relationships with large-scale online communities, Stack Overflow and Reddit in particular. I will discuss evidence of consequential, albeit unidirectional, relationships. Wikipedia provides substantial value to both communities, with Wikipedia content increasing visitation, engagement, and revenue, but we find little evidence that these websites contribute to Wikipedia in return. Overall, these findings highlight important connections between Wikipedia and its broader ecosystem that should be considered by researchers studying Wikipedia. More broadly, this talk will emphasize the key role that volunteer-created Wikipedia content plays in improving other websites, even contributing to revenue generation.
| talk2-title = The Rise and Decline of an Open Collaboration System: A Closer Look
| talk2-presenter = Nate TeBlunthuis, University of Washington
| talk2-slides =
| talk2-abstract = Do patterns of growth and stabilization found in large peer production systems such as Wikipedia occur in other communities? This study assesses the generalizability of Halfaker et al.’s influential 2013 paper on “The Rise and Decline of an Open Collaboration System.” We replicate its tests of several theories related to newcomer retention and norm entrenchment using a dataset of hundreds of active peer production wikis from Wikia. We reproduce the subset of the findings from Halfaker and colleagues that we are able to test, comparing both the estimated signs and magnitudes of our models. Our results support the external validity of Halfaker et al.’s claims that quality control systems may limit the growth of peer production communities by deterring new contributors and that norms tend to become entrenched over time.
}}
 
=== {{ym|2018|3}} ===
{{/Event
| date = 21 March 2018
| youtube-url =https://www.youtube.com/watch?v=ACevHs0sMMw
| commons-file =
| talk1-title = Using Wikipedia categories for research<nowiki>:</nowiki> opportunities, challenges, and solutions
| talk1-presenter = Tiziano Piccardi, EPFL
| talk1-slides = File:Using Wikipedia categories for research.pdf
| talk1-abstract = The category network in Wikipedia is used by editors as a way to label articles and organize them in a hierarchical structure. This manually created and curated network of 1.6 million nodes in English Wikipedia, generated by arranging the categories in child-parent relations (e.g., Scientists-People, Cities-Human Settlement), allows researchers to infer valuable relations between concepts. A clean structure in this format would be a valuable resource for a variety of tools and applications, including automatic reasoning tools. Unfortunately, the Wikipedia category network contains some "noise", since in many cases the subcategory association does not define an is-a relation (Scientists is-a People vs. Billionaires is-a Wealth). Motivated by the goal of recommending sections to add to existing Wikipedia articles, we developed a method to clean this network and keep only the categories that have a high chance of being associated with their children by an is-a relation. The strategy is based on the concept of "pure" categories, and the algorithm uses the types of the attached articles to determine how homogeneous a category is. The approach does not rely on any linguistic feature and is therefore suitable for all Wikipedia languages. In this talk, we will give a high-level overview of the algorithm and discuss some of the possible applications of the generated network beyond article section recommendations.
| talk2-title = Beyond Automatic Translation<nowiki>:</nowiki> Aligning Wikipedia sections across multiple languages
| talk2-presenter =Diego Saez-Trumper
| talk2-slides = File:Beyond Automatic Translation, Aligning Wikipedia sections across multiple languages.pdf
| talk2-abstract = Sections are the building blocks of Wikipedia articles. For editors, they can be used as an entry point for creating and expanding articles. For readers, they enhance the readability of Wikipedia content. In this talk, we present ongoing research on aligning article sections across Wikipedia languages. We show that available automatic translation technology is not good enough for translating section titles. We then show a complementary approach to section alignment, using Wikidata and cross-lingual word embeddings. We will present some of the use cases of a methodology for aligning sections across languages, including improved section recommendation, especially in medium- to small-sized languages, where the language itself may not contain enough signal about the structure of articles and signals can be inferred from larger Wikipedia languages.
}}
=== {{ym|2018|2}} ===
{{/Event
| date = 21 February 2018
| youtube-url =https://www.youtube.com/watch?v=fpmRWCE7F_I
| commons-file =
| talk1-title = Visual Enrichment of Collaborative Knowledge Bases
| talk1-presenter = [[User:Miriam (WMF)|Miriam Redi]], Wikimedia Foundation
| talk1-slides = File:Visual enrichment of collaborative KB.pdf
| talk1-abstract =
 
Images allow us to explain, enrich and complement knowledge without language barriers.<ref>{{cite journal|url=http://journals.sagepub.com/doi/abs/10.1177/1475240910395788|last=Van Hook|first=Steven R.|title=Modes and models for transcending cultural differences in international classrooms|journal=Journal of Research in International Education|volume=10|issue=1|year=2011|pages=5-27}}</ref> They can help illustrate the content of an item in a language-agnostic way to external data consumers. Images can be extremely helpful in multilingual collaborative knowledge bases such as Wikidata.
 
::: However, a large proportion of Wikidata items lack images. More than 3.6M Wikidata items are about humans (Q5), but only 17% of them have an associated image. Overall, only 2.2M of 40M Wikidata items have an image. A wider presence of images in such a rich, cross-lingual repository could enable a more complete representation of human knowledge.
 
::: In this talk, we will discuss challenges and opportunities faced when using machine learning and computer vision tools for the visual enrichment of collaborative knowledge bases. We will share research to help Wikidata contributors make Wikidata more “visual” by recommending high-quality Commons images to Wikidata items. We will show the first results on free-licence image quality scoring and recommendation and discuss future work in this direction.
 
| talk2-title = Backlogs—backlogs everywhere: Using machine classification to clean up the new page backlog
| talk2-presenter = [[User:Halfak (WMF)|Aaron Halfaker]], Wikimedia Foundation
| talk2-slides = File:Backlogs! Backlogs everywhere... (Research Showcase, Feb. 2018).pdf
| talk2-abstract =
If there's one insight that I've had about the functioning of Wikipedia and other wiki-based online communities, it's that eventually self-directed work breaks down and some form of organization becomes important for task routing. In Wikipedia specifically, the notion of "backlogs" has become dominant. There are backlogs of articles to create, articles to clean up, articles to assess, new editor contributions to review, manual of style rules to apply, etc. To a community of people working on a backlog, the state of that backlog has deep effects on their emotional well-being. A backlog that only grows is frustrating and exhausting.
::: Backlogs aren't inevitable, though, and there are many shapes that backlogs can take. In my presentation, I'll tell the story of how English Wikipedia editors defined a process and a set of roles that formed a backlog around new page creations. I'll make the argument that this formalization of quality control practices has created a choke point and that alternatives exist. Finally, I'll present a vision for such an alternative using models that we have developed for [[:mw:ORES|ORES]], the open machine prediction service my team maintains.}}
 
=== {{ym|2018|1}} ===
{{/Event
| date = 17 January 2018
| youtube-url =https://www.youtube.com/watch?v=L-1uzYYneUo
| commons-file =
| talk1-title = What motivates experts to contribute to public information goods? A field experiment at Wikipedia
| talk1-presenter = Yan Chen, University of Michigan
| talk1-slides =
| talk1-abstract = Wikipedia is among the most important information sources for the general public. Motivating domain experts to contribute to Wikipedia can improve the accuracy and completeness of its content. In a field experiment, we examine the incentives that might motivate scholars to contribute their expertise to Wikipedia. We vary whether we mention likely citation, public acknowledgement, and the number of views an article receives. We find that experts are significantly more interested in contributing when citation benefit is mentioned. Furthermore, cosine similarity between a Wikipedia article and the expert's paper abstract is the most significant factor leading to more and higher-quality contributions, indicating that better matching is a crucial factor in motivating contributions to public information goods. Other factors correlated with contribution include social distance and researcher reputation.
| talk2-title =Wikihounding on Wikipedia
| talk2-presenter = Caroline Sinders, Wikimedia Foundation
| talk2-slides =
| talk2-abstract = Wikihounding (a form of digital stalking on Wikipedia) is both highly qualitative and quantitative. What makes wikihounding different from mentoring? It's the context of the action, or the intention. However, every interaction inside a digital space has a quantitative aspect to it: every comment, revert, etc. is a data point. By comparatively analyzing data points within wikihounding cases and reading some of the cases, we can establish a baseline of the actual overlapping similarities across wikihounding cases and study what makes up wikihounding. Wikihounding currently has a fairly loose definition. Wikihounding, as defined by the Harassment policy on English Wikipedia, is: “the singling out of one or more editors, joining discussions on multiple pages or topics they may edit or multiple debates where they contribute, to repeatedly confront or inhibit their work. This is with an apparent aim of creating irritation, annoyance or distress to the other editor. Wikihounding usually involves following the target from place to place on Wikipedia.” This definition doesn't outline parameters around cases such as frequency of interaction, duration, or minimum reverts, nor is much known about what a standard or canonical case of wikihounding looks like. What is the average wikihounding case? This talk will cover the approaches that I and members of the research team (Diego Saez-Trumper, Aaron Halfaker, and Jonathan Morgan) are taking in starting this research project.
}}
 
 
Note: If you'd like to learn more about this research, [[:m:Research:Wikihounding_and_Machine_Learning_Analysis|we have started to document it]] (the page is a ''work in progress'').
 
==2017==
=== {{ym|2017|12}} ===
{{/Event
| date = 13 December 2017
| youtube-url =https://www.youtube.com/watch?v=OoVwus1Owtk
| talk1-title = The State of the Article Expansion Recommendation System
| talk1-presenter = [https://meta.wikimedia.org/wiki/User:LZia_(WMF) Leila Zia]
| talk1-slides = File:Research Showcase December 2017.pdf
 
| talk1-abstract = Only 1% of English Wikipedia articles are labeled with quality class ''Good'' or better, and 37% of the articles are stubs. We are building an article expansion recommendation system to change this in Wikipedia, across many languages. In this presentation, I will share our current thinking on the vision and direction of the research that can help us build such a recommendation system, and go into more detail about one specific area of research we have focused on heavily in the past months: building a recommendation system that can help editors identify what ''sections'' to add to an already existing article. I will present some of the challenges we faced, the methods we devised or used to overcome them, and the results of the first line of experiments on the quality of such recommendations (teaser: the results are very promising; precision and recall at 10 are 80%).
}}
=== {{ym|2017|11}} ===
{{/Event
| date = 15 November 2017
| youtube-url = https://www.youtube.com/watch?v=nMENRAkeHnQ
| commons-file =
| talk1-title = Conversation Corpora, Emotional Robots, and Battles with Bias
| talk1-presenter = Lucas Dixon (Google/Jigsaw)
| talk1-slides =
| talk1-abstract = I'll talk about interesting experimental setups for doing large-scale analysis of conversations in Wikipedia, and what it even means to grapple with the concept of conversation when one is talking about revisions on talk pages. I'll also describe challenges with having good conversations at scale, some of the dreams one might have for AI in the space, and I'll dig into measuring unintended bias in machine learning and what one can do to make ML more inclusive. This talk will cover work from the [[:m:Research:Detox|WikiDetox]] project as well as ongoing research on the [[:m:Research:Study_of_harassment_and_its_impact|nature and impact of harassment in Wikipedia discussion spaces]] – part of a collaboration between Jigsaw, Cornell University, and the Wikimedia Foundation. The ML model training code, datasets, and the supporting tooling developed as part of this project are openly available. [https://github.com/conversationai/unintended-ml-bias-analysis/blob/master/presentations/Conversation%20Corpora%2C%20Emotional%20Robots%2C%20and%20Battles%20with%20Bias%20-%20Wikimedia%20Research%20Talk%20-%20Nov%202017.pdf '''(slides)''']
| talk2-title =
| talk2-presenter =
| talk2-slides =
| talk2-abstract =
}}
 
=== {{ym|2017|10}} ===
''There was no showcase in October 2017. We attended [[d:Wikidata:WikidataCon_2017|WikidataCon in Berlin]]. We'll be back in November.''
 
=== {{ym|2017|9}} ===
{{/Event
| date = [https://www.timeanddate.com/worldclock/fixedtime.html?msg=Wikimedia+Research+Showcase+-+September+2017&iso=20170920T1830&p1=%3A&ah=1 September 20, 2017, 11:30am PDT]
| youtube-url = https://www.youtube.com/watch?v=VR5JwqyVGSk
| commons-file =
| talk1-title = A Glimpse into Babel: An Analysis of Multilinguality in Wikidata
| talk1-presenter = Lucie-Aimée Kaffee
| talk1-slides = File:Multilingual Wikidata - Wikimedia Research Showcase 2017.pdf
| talk1-abstract = Multilinguality is an important topic for knowledge bases, especially Wikidata, which was built to serve the multilingual requirements of an international community. Its labels are the way for humans to interact with the data. In this talk, we explore the current state of languages in Wikidata, especially with regard to its ontology, and their relationship to Wikipedia. Furthermore, we set the multilinguality of Wikidata in the context of the real world by comparing it to the distribution of native speakers. We find an existing language maldistribution, which is less severe in the ontology, and promising results for future improvements. We also give an outlook on how users interact with languages on Wikidata.
 
:::See the paper<ref>Kaffee, Lucie-Aimée, et al. "A Glimpse into Babel: An Analysis of Multilinguality in Wikidata." Proceedings of the 13th International Symposium on Open Collaboration. ACM, 2017. https://eprints.soton.ac.uk/413433/1/Open_Sym_Short_Paper_Wikidata_Multilingual.pdf</ref>
| talk2-title = Science is Shaped by Wikipedia: Evidence from a Randomized Control Trial
| talk2-presenter = Neil C. Thompson and Douglas Hanley
| talk2-slides =
| talk2-abstract = As the largest encyclopedia in the world, Wikipedia unsurprisingly reflects the state of scientific knowledge. However, Wikipedia is also one of the most accessed websites in the world, including by scientists, which suggests that it also has the potential to shape science. This paper shows that it does. Incorporating ideas into a Wikipedia article leads to those ideas being used more in the scientific literature. This paper documents this in two ways: correlationally across thousands of articles in Wikipedia and causally through a randomized experiment where we added new scientific content to Wikipedia. We find that fully a third of the correlational relationship is causal, implying that Wikipedia has a strong shaping effect on science. Our findings speak not only to the influence of Wikipedia, but more broadly to the influence of repositories of scientific knowledge. The results suggest that increased provision of information in accessible repositories is a very cost-effective way to advance science. We also find that such gains are equity-improving, disproportionately benefiting those without traditional access to scientific information.
 
:::See the paper<ref>Thompson, Neil and Hanley, Douglas, Science Is Shaped by Wikipedia: Evidence from a Randomized Control Trial (September 19, 2017). Available at SSRN: https://ssrn.com/abstract=3039505</ref>
}}
 
=== {{ym|2017|8}} ===
{{/Event
| date = [https://www.timeanddate.com/worldclock/fixedtime.html?msg=Wikimedia+Research+Showcase+-+August+2017&iso=20170823T1830&p1=%3A&ah=1 August 23, 2017, 11:30am PDT]
| youtube-url = https://www.youtube.com/watch?v=Fa0Ztv2iF4w
| commons-file =
| talk1-title = The Wikipedia Adventure: Field Evaluation of an Interactive Tutorial for New Users
| talk1-presenter = Sneha Narayan
| talk1-slides =
| talk1-abstract = Integrating new users into a community with complex norms presents a challenge for peer production projects like Wikipedia. We present The Wikipedia Adventure (TWA): an interactive tutorial that offers a structured and gamified introduction to Wikipedia. In addition to describing the design of the system, we present two empirical evaluations. First, we report on a survey of users, who responded very positively to the tutorial. Second, we report results from a large-scale invitation-based field experiment that tests whether using TWA increased newcomers' subsequent contributions to Wikipedia. We find no effect of either using the tutorial or of being invited to do so over a period of 180 days. We conclude that TWA produces a positive socialization experience for those who choose to use it, but that it does not alter patterns of newcomer activity. We reflect on the implications of these mixed results for the evaluation of similar social computing systems.
 
:::See the paper<ref>Sneha Narayan, Jake Orlowitz, Jonathan Morgan, Benjamin Mako Hill, and Aaron Shaw. 2017. The Wikipedia Adventure: Field Evaluation of an Interactive Tutorial for New Users. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW '17). ACM, New York, NY, USA, 1785-1799. DOI: https://doi.org/10.1145/2998181.2998307 [https://mako.cc/academic/narayan_etal-the_wikipedia_adventure-cscw2017.pdf PDF]</ref> and slides.<ref>https://figshare.com/articles/Narayan_TWA_wikiresearch_pdf/5339425</ref>
 
| talk2-title = The Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledge
| talk2-presenter = Andrew Su
| talk2-slides =
| talk2-abstract = The Gene Wiki project began in 2007 with the goal of creating a collaboratively-written, community-reviewed, and continuously-updated review article for every human gene within Wikipedia. In 2013, shortly after the creation of the Wikidata project, the project expanded to include the organization and integration of structured biomedical data. This talk will focus on our current and future work, including efforts to encourage contributions from biomedical domain experts, to build custom applications that use Wikidata as the back-end knowledge base, and to promote CC0-licensing among biomedical knowledge resources.
 
:::Comments, feedback and contributions are welcome at https://github.com/SuLab/genewikicentral and https://www.wikidata.org/wiki/WD:MB. See the slides<ref>https://www.slideshare.net/andrewsu/the-gene-wiki-using-wikipedia-and-wikidata-to-organize-biomedical-knowledge</ref>
}}
 
=== {{ym|2017|7}} ===
{{/Event
| date = July 26, 2017, 11:30am PDT
| youtube-url = https://www.youtube.com/watch?v=yC1jgK8C8aQ
| commons-file = :File:Wikimedia Research Showcase - July 2017.webm
| talk1-title = Freedom versus Standardization<nowiki>:</nowiki> Structured Data Generation in a Peer Production Community
| talk1-presenter = Andrew Hall
| talk1-slides =
| talk1-abstract = In addition to encyclopedia articles and software, peer production communities produce ''structured data'', e.g., Wikidata and OpenStreetMap’s metadata. Structured data from peer production communities has become increasingly important due to its use by computational applications, such as CartoCSS, MapBox, and Wikipedia infoboxes. However, this structured data is usable by applications only if it follows ''standards.'' We did an interview study focused on OpenStreetMap’s knowledge production processes to investigate how – and how successfully – this community creates and applies its data standards. Our study revealed a fundamental tension between the need to produce structured data in a standardized way and OpenStreetMap’s tradition of contributor freedom. We extracted six themes that manifested this tension and three overarching concepts, '''correctness, community,''' and '''code,''' which help make sense of and synthesize the themes. We also offer suggestions for improving OpenStreetMap’s knowledge production processes, including new data models, sociotechnical tools, and community practices.
 
:::See the paper<ref>Andrew Hall, Sarah McRoberts, Jacob Thebault-Spieker, Yilun Lin, Shilad Sen, Brent Hecht, and Loren Terveen. "Freedom versus Standardization: Structured Data Generation in a Peer Production Community", CHI 2017. https://doi.org/10.1145/3025453.3025940 [http://www-users.cs.umn.edu/~hall/freedom_versus_standardization.pdf PDF]</ref> and slides<ref>https://doi.org/10.6084/m9.figshare.5270962</ref>.
}}
 
=== {{ym|2017|6}} ===
{{/Event
| date = June 21, 2017, 11:30am PDT
| youtube-url = https://www.youtube.com/watch?v=i2jpKRwPT-Q
| commons-file = :File:Wikimedia_Research_Showcase_-_June_2017.webm
| talk1-title = Problematizing and Addressing the Article-as-Concept Assumption in Wikipedia
| talk1-presenter = Allen Yilun Lin
| talk1-slides = File:Problematizing_and_Addressing_the_Article-as-Concept_Assumption_in_Wikipedia.pdf
| talk1-abstract = Wikipedia-based studies and systems frequently assume that each article describes a separate concept. However, in this paper, we show that this article-as-concept assumption is problematic due to editors’ tendency to split articles into parent articles and sub-articles when articles get too long for readers (e.g. “United States” and “American literature” in the English Wikipedia). We present evidence that this issue can have significant impacts on Wikipedia-based studies and systems and introduce the sub-article matching problem. The goal of the sub-article matching problem is to automatically connect sub-articles to parent articles to help Wikipedia-based studies and systems retrieve complete information about a concept. We then describe the first system to address the sub-article matching problem. We show that, using a diverse feature set and standard machine learning techniques, our system can achieve good performance on most of our ground truth datasets, significantly outperforming baseline approaches.
:::* Related CSCW 2017 paper: ([http://www.brenthecht.com/publications/cscw17_subarticles.pdf preprint], [http://dl.acm.org/citation.cfm?id=2998274 citation]), [http://z.umn.edu/WikiSubarticles Open-source code]
:::* Slides: [https://commons.wikimedia.org/w/index.php?title=File:Problematizing_and_Addressing_the_Article-as-Concept_Assumption_in_Wikipedia.pdf Commons], [https://figshare.com/articles/Problematizing_and_Addressing_the_Article-as-Concept_Assumption_in_Wikipedia/5393191 Figshare]
| talk2-title = Understanding Wikidata Queries
| talk2-presenter = Markus Kroetzsch
| talk2-slides =
| talk2-abstract =
Wikimedia provides a public service that lets anyone answer complex questions over the sum of all knowledge stored in Wikidata. These questions are expressed in the query language SPARQL and range from the simplest fact retrievals ("What is the birthday of Douglas Adams?") to complex analytical queries ("[[d:Wikidata:SPARQL_query_service/queries/examples#Average_lifespan_by_occupation|Average lifespan of people by occupation]]"). The talk presents ongoing efforts to analyse the server logs of the millions of queries that are answered each month. It is an important but difficult challenge to draw meaningful conclusions from this dataset. One might hope to learn relevant information about the usage of the service and Wikidata in general, but at the same time one has to be careful not to be misled by the data. Indeed, the dataset turned out to be highly heterogeneous and unpredictable, with strongly varying usage patterns that make it difficult to draw conclusions about "normal" usage. The talk will give a status report, present preliminary results, and discuss possible next steps. ([[m:Research:Understanding_Wikidata_Queries|Project page on meta]])
}}
 
=== {{ym|2017|5}} ===
''There was no showcase in May 2017. The team attended the [[Wikimedia_Hackathon_2017|Wikimedia Hackathon in Vienna]] and [https://meta.wikimedia.org/wiki/WikiCite_2017 WikiCite]. :)''
 
=== {{ym|2017|4}} ===
{{/Event
| date = April 19, 2017
| youtube-url = https://www.youtube.com/watch?v=_Prf0Vb-k1I
| commons-file =
| talk1-title = Using WikiBrain to visualize Wikipedia's neighborhoods
| talk1-presenter = Dr. [[User:Shilad|Shilad Sen]]
| talk1-slides = File:Shilad_Sen,_Wikimedia_Research_Showcase,_April_2017.pdf
| talk1-abstract = While Wikipedia serves as the world's most widely used reference for humans, it also represents the most widely used body of knowledge for algorithms that must reason about the world. I will provide an overview of [//shilad.github.io/wikibrain/ WikiBrain], a software project that serves as a platform for Wikipedia-based algorithms. I will also demo a brand new system built on WikiBrain that visualizes any dataset as a topographic map whose neighborhoods correspond to related Wikipedia articles. I hope to get feedback about which directions for these tools are most useful to the Wikipedia research community.
| talk1-related-info =
&nbsp;
::;See also
::* http://shilad.github.io/wikibrain/
::* http://atlasify.northwestern.edu/
::* http://matt.might.net/articles/crapl/
::* http://cartograph.info
::* [[:m:Grants:IEG/WikiBrainTools]]
::* [[Phab:T161554|T161554 -- Provide large disk space to WikiBrain for memory-mapped file]]
| talk2-title =
| talk2-presenter =
| talk2-slides =
| talk2-abstract =
}}
 
=== {{ym|2017|3}} ===
''There was no showcase in March 2017.''
 
=== {{ym|2017|2}} ===
{{/Event
| date = February 15, 2017
| youtube-url =https://www.youtube.com/watch?v=m6smzMppb-I
| commons-file =
| talk1-title = Wikipedia and the Urban-Rural Divide
| talk1-presenter = Isaac Johnson (GroupLens/University of Minnesota)
| talk1-slides =File:Wikipedia and Urban-Rural Divide (commons version).pdf
| talk1-abstract = Wikipedia articles about places, OpenStreetMap features, and other forms of peer-produced content have become critical sources of geographic knowledge for humans and intelligent technologies. We explore the effectiveness of the peer production model across the rural/urban divide, a divide that has been shown to be an important factor in many online social systems. We find that in Wikipedia (as well as OpenStreetMap), peer-produced content about rural areas is of systematically lower quality, less likely to have been produced by contributors who focus on the local area, and more likely to have been generated by automated software agents (i.e. “bots”). We continue to explore and codify the systemic challenges inherent to characterizing rural phenomena through peer production as well as discuss potential solutions. ([http://www.brenthecht.com/publications/CHI2016_ruralurbanpeerproduction.pdf read more in this paper])
| talk2-title = Wikipedia Navigation Vectors
| talk2-presenter = [[User:Ewulczyn (WMF)|Ellery Wulczyn]]
| talk2-slides =
| talk2-abstract = In this project, we learned embeddings for Wikipedia articles and [[d:Wikidata:Main_Page|Wikidata]] items by applying [[:en:Word2vec|Word2vec]] models to a corpus of reading sessions. Although Word2vec models were developed to learn word embeddings from a corpus of sentences, they can be applied to any kind of sequential data. The learned embeddings have the property that items with similar neighbors in the training corpus have similar representations (as measured by the [[:en:Cosine_similarity|cosine similarity]], for example). Consequently, applying Word2vec to reading sessions results in article embeddings, where articles that tend to be read in close succession have similar representations. Since people usually generate sequences of semantically related articles while reading, these embeddings also capture semantic similarity between articles. ([[:m:Research:Wikipedia_Navigation_Vectors|read more...]])
}}
 
 
=== {{ym|2017|1}} ===
''There was no showcase in January 2017.''
 
==2016==
=== {{ym|2016|12}} ===
{{/Event
| date = December 21, 2016
| youtube-url = https://www.youtube.com/watch?v=nmrlu5qTgyA
| commons-file =
| talk1-title = English Wikipedia Quality Dynamics and the Case of WikiProject Women Scientists
| talk1-presenter = [[:m:User:Halfak (WMF)|Aaron Halfaker]]
| talk1-slides = File:English Wikipedia Quality Dynamics (Women Scientists & Visual Arts) -- Research Showcase (December 2016).pdf
| talk1-abstract = With every productive edit, Wikipedia is steadily progressing towards higher and higher quality. In order to track quality improvements, Wikipedians have developed an article quality assessment rating scale that ranges from "Stub" at the bottom to "Featured Articles" at the top. While this quality scale has the promise of giving us insights into the dynamics of quality improvements in Wikipedia, it is hard to use due to the sporadic nature of manual re-assessments. By developing a highly accurate prediction model (based on work by Warncke-Wang et al.), we've created a method to assess an article's quality at any point in history. Using this model, we explore general trends in quality in Wikipedia and compare these trends to those of an interesting cross-section: articles tagged by WikiProject Women Scientists. Results suggest that articles about women scientists were of lower quality than the rest of the wiki until mid-2013, after which a dramatic shift occurred towards higher quality. This shift may correlate with (and may even be caused by) this WikiProject's initiatives.
| talk2-title = Privacy, Anonymity, and Perceived Risk in Open Collaboration. A Study of Tor Users and Wikipedians
| talk2-presenter = [[User:Andicat | Andrea Forte]]
| talk2-slides = File:WMFTalk12-21-16ForteAnonymity.pdf
| talk2-abstract = In a recent qualitative study to be published at CSCW 2017, collaborators Rachel Greenstadt, Naz Andalibi, and I examined privacy practices and concerns among contributors to open collaboration projects. We collected interview data from people who use the anonymity network Tor who also contribute to online projects and from Wikipedia editors who are concerned about their privacy to better understand how privacy concerns impact participation in open collaboration projects. We found that risks perceived by contributors to open collaboration projects include threats of surveillance, violence, harassment, opportunity loss, reputation loss, and fear for loved ones. We explain participants’ operational and technical strategies for mitigating these risks and how these strategies affect their contributions. Finally, we discuss chilling effects associated with privacy loss, the need for open collaboration projects to go beyond attracting and educating participants to consider their privacy, and some of the social and technical approaches that could be explored to mitigate risk at a project or community level.
}}
 
=== {{ym|2016|11}} ===
{{/Event
| date = November 16, 2016
| youtube-url = https://www.youtube.com/watch?v=xIaMuWA84bY
| commons-file =
| talk1-title = Why We Read Wikipedia
| talk1-presenter = [[User:LZia_(WMF)|Leila Zia]]
| talk1-slides = File:Why_We_Read_Wikipedia.pdf
| talk1-abstract = Every day, millions of readers come to Wikipedia to satisfy a broad range of information needs; however, little is known about what these needs are. In this presentation, I share the results of research that sets out to help us understand Wikipedia readers better. Based on an initial user study on English, Persian, and Spanish Wikipedia, we build a taxonomy of Wikipedia use-cases along several dimensions, capturing users’ motivations to visit Wikipedia, the depth of knowledge they are seeking, and their knowledge of the topic of interest prior to visiting Wikipedia. Then, we quantify the prevalence of these use-cases via a large-scale user survey conducted on English Wikipedia. Our analyses highlight the variety of factors driving users to Wikipedia, such as current events, media coverage of a topic, personal curiosity, work or school assignments, or boredom. Finally, we match survey responses to the respondents’ digital traces in Wikipedia’s server logs, enabling the discovery of behavioral patterns associated with specific use-cases. Our findings advance our understanding of reader motivations and behavior on Wikipedia and have potential implications for developers aiming to improve Wikipedia’s user experience, editors striving to cater to (a subset of) their readers’ needs, third-party services (such as search engines) providing access to Wikipedia content, and researchers aiming to build tools such as article recommendation engines.
}}
 
=== {{ym|2016|10}} ===
{{/Event
| date = October 19, 2016
| youtube-url = https://www.youtube.com/watch?v=BsYi4evMlV0
| commons-file =
| talk1-title = Human centered design for using and editing structured data in Wikipedia infoboxes
| talk1-presenter = [[User:Charlie_Kritschmar_(WMDE)|Charlie Kritschmar]] UX Intern, [[m:Wikimedia_Deutschland|Wikimedia Deutschland]]
| talk1-slides = File:Human_centered_design_for_using_and_editing_structured_data_in_Wikipedia_infoboxes.pdf
| talk1-abstract = Wikidata is a Wikimedia project which stores structured data to be used by other Wikimedia projects like Wikipedia. Currently, integrating its data in Wikipedia is difficult for users, since there is no predefined way to do so and it requires some technical knowledge. To tackle these issues, human-centered design methods were applied to find needs from which solutions were generated and evaluated with the help of the community. The concept may serve as a basis which may be implemented into various Wiki projects in the future to make editing Wikidata from within another Wikimedia project more user-friendly and improve the project’s acceptance in the community.
| talk2-title =Emergent Work in Wikipedia
| talk2-presenter =[http://oferarazy.com/ Ofer Arazy] (University of Haifa)
| talk2-slides =
| talk2-abstract = Online production communities present an exciting opportunity for investigating novel organizational forms. Extant theoretical accounts of knowledge co-production point to organizational policies, norms, and communication as key mechanisms enabling the coordination of work. Yet, in practice participants in initiatives such as Wikipedia are often occasional contributors who are unaware of community policies and do not communicate with other members. How then is work coordinated, and how does the organization maintain stability in the face of dynamics in individuals’ task enactment? In this study we develop a conceptualization of emergent roles – the prototypical activity patterns that organically emerge from individuals’ spontaneous actions – and investigate the temporal dynamics of emergent role behaviors. Conducting a multi-level large-scale empirical study stretching over a decade, we tracked co-production of a thousand Wikipedia articles, logging two hundred thousand distinct participants and seven hundred thousand co-production activities. Using a combination of manual tagging and machine learning, we annotated each activity's type, and then clustered participants’ activity profiles to arrive at seven prototypical emergent roles. Our analysis shows that participants’ behavior is turbulent, with substantial flow in and out of co-production work and across roles. Our findings at the organizational level, however, show that work is organized around a highly stable set of emergent roles, despite the absence of traditional stabilizing mechanisms such as pre-defined work procedures or role expectations. We conceptualize this dualism in emergent work as “Turbulent Stability”. Further analyses suggest that co-production is artifact-centric, where contributors mutually adjust according to the artifact’s changing needs. Our study advances the theoretical understandings of self-organizing knowledge co-production and particularly the nature of emergent roles.
}}
 
=== {{ym|2016|9}} ===
{{/Event
| date = September 21, 2016
| youtube-url = https://www.youtube.com/watch?v=fTDkVeqjw80
| commons-file =
| talk1-title = Finding News Citations for Wikipedia
| talk1-presenter = [http://www.l3s.de/~fetahu/ Besnik Fetahu] (Leibniz University of Hannover)
| talk1-slides =
| talk1-abstract = Slides: [http://www.slideshare.net/BesnikFetahu/finding-news-citations-for-wikipedia]<br />An important editing policy in Wikipedia is to provide citations for added statements in Wikipedia pages, where statements can be arbitrary pieces of text, ranging from a sentence to a paragraph. In many cases citations are either outdated or missing altogether. In this work we address the problem of finding and updating news citations for statements in entity pages. We propose a two-stage supervised approach for this problem. In the first step, we construct a classifier to find out whether statements need a news citation or other kinds of citations (web, book, journal, etc.). In the second step, we develop a news citation algorithm for Wikipedia statements, which recommends appropriate citations from a given news collection. Apart from IR techniques that use the statement to query the news collection, we also formalize three properties of an appropriate citation, namely: (i) the citation should entail the Wikipedia statement, (ii) the statement should be central to the citation, and (iii) the citation should be from an authoritative source. We perform an extensive evaluation of both steps, using 20 million articles from a real-world news collection. Our results are quite promising, and show that we can perform this task with high precision and at scale.
| talk2-title = Designing and Building Online Discussion Systems
| talk2-presenter = [http://people.csail.mit.edu/axz/ Amy X. Zhang] (MIT)
| talk2-slides =
| talk2-abstract = Today, conversations are everywhere on the Internet and come in many different forms. However, there are still many problems with discussion interfaces today. In my talk, I will first give an overview of some of the problems with discussion systems, including difficulty dealing with large scales, which exacerbates additional problems with navigating deep threads containing lots of back-and-forth and getting an overall summary of a discussion. Other problems include dealing with moderation and harassment in discussion systems and gaining control over filtering, customization, and means of access. Then I will focus on a few projects I am working on in this space now. The first is Wikum, a system I developed to allow users to collaboratively generate a wiki-like summary from threaded discussion. The second, which I have just begun, is exploring the design space of presentation and navigation of threaded discussion. I will next discuss Murmur, a mailing list hybrid system we have built to implement and test ideas around filtering, customization, and flexibility of access, as well as combating harassment. Finally, I'll wrap up with what I am working on at Google Research this summer: developing a taxonomy to describe online forum discussion and using this information to extract meaningful content useful for search, summarization of discussions, and characterization of communities.
}}
 
=== {{ym|2016|8}} ===
 
{{/Event
| date = August 17, 2016
| youtube-url = https://www.youtube.com/watch?v=rsFmqYxtt9w
| commons-file =
| talk1-title = Computational Fact Checking from Knowledge Networks
| talk1-presenter = [[User:Junkie.dolphin|Giovanni Luca Ciampaglia]]
| talk1-slides =
| talk1-abstract = Traditional fact checking by expert journalists cannot keep up with the enormous volume of information that is now generated online. Fact checking is often a tedious and repetitive task and even simple automation opportunities may result in significant improvements to human fact checkers. In this talk I will describe how we are trying to approximate the complexities of human fact checking by exploring a knowledge graph under a properly defined proximity measure. Framed as a network traversal problem, this approach is feasible with efficient computational techniques. We evaluate this approach by examining tens of thousands of claims related to history, entertainment, geography, and biographical information using the public knowledge graph extracted from Wikipedia by the DBpedia project, showing that the method does indeed assign higher confidence to true statements than to false ones. One advantage of this approach is that, together with a numerical evaluation, it also provides a sequence of statements that can be easily inspected by a human fact checker.
| talk2-title = Deploying and maintaining AI in a socio-technical system. Lessons learned
| talk2-presenter = [[User:Halfak (WMF)|Aaron Halfaker]]
| talk2-slides = File:Deploying_and_maintaining_AI_in_a_socio-technical_system_--_Research_Showcase_(August_2016).pdf
| talk2-abstract = We should exercise great caution when deploying AI into our social spaces. The algorithms that make counter-vandalism in Wikipedia orders of magnitude more efficient also have the potential to perpetuate biases and silence whole classes of contributors. This presentation will describe the system efficiency characteristics that make AI so attractive for supporting quality control activities in Wikipedia. Then, Aaron will tell two stories of how the algorithms brought new, problematic biases to quality control processes in Wikipedia and how the [[:m:R:Revision scoring as a service|Revision Scoring team]] learned about and addressed these issues in [[:m:ORES|ORES]], a production-level AI service for Wikimedia Wikis. He'll also make an overdue call to action toward leveraging human review of AIs' biases in the practice of AI development.}}
 
 
=== {{ym|2016|7}} ===
 
{{/Event
| date = July 20, 2016
| youtube-url = https://www.youtube.com/watch?v=eZgqzVuRDRs
| commons-file =
| talk1-title = Detecting Personal Attacks on Wikipedia
| talk1-presenter = [[m:User:Ewulczyn (WMF)|Ellery Wulczyn]], [[m:User:nthain|Nithum Thain]]
| talk1-slides = File:July 2016 Research Showcase - Understanding Personal Attacks on Wikipedia.pptx.pdf|thumb
| talk1-abstract = Ellery Wulczyn (WMF) and Nithum Thain (Jigsaw) will be speaking about their recent work on [[m:Research:Detox|Project Detox]], a research project to develop tools to detect and understand online personal attacks and harassment on Wikipedia. Their talk will cover the whole research pipeline to date, including data acquisition, machine learning model building, and some analytical insights as to the nature of personal attacks on Wikipedia talk pages.
| talk2-title = Wikipedia.org Portal Research: Search behaviors and New Language by article count Dropdown
| talk2-presenter = [[m:User:Dchen (WMF)|Daisy Chen]]
| talk2-slides = File:Discovery - Wikipedia.org Portal Study Findings - July 2016.pdf|thumb
| talk2-abstract = What part do the Wikipedia.org portal and on-wiki search mechanisms play in users' experiences finding information online? These findings reflect research participants' responses to a combination of generative and evaluative questions about their general online search behaviors, on-wiki search behaviors, interactions with the Wikipedia.org portal, and their thoughts about a partial re-design of the portal page, the new language by article count dropdown.
}}
 
=== {{ym|2016|6}} ===
 
''There was no showcase in June 2016.''
 
=== {{ym|2016|5}} ===
 
''There was no showcase in May 2016.''
 
=== {{ym|2016|4}} ===
 
''There was no showcase in April 2016.''
 
=== {{ym|2016|3}} ===
 
{{/Event
| date = March 16, 2016
| youtube-url = https://www.youtube.com/watch?v=Xle0oOFCNnk
| commons-file =
| talk1-title = Evolution of Privacy Loss in Wikipedia
| talk1-presenter = [http://www.rizoiu.eu/ Marian-Andrei Rizoiu] (Australian National University)
| talk1-slides = File:Wikimedia Research Showcase -- March 2016.pdf|thumb
| talk1-abstract = The cumulative effect of collective online participation has an important and adverse impact on individual privacy. As an online system evolves over time, new digital traces of individual behavior may uncover previously hidden statistical links between an individual’s past actions and her private traits. To quantify this effect, we [http://cm.cecs.anu.edu.au/post/wikiprivacy/ analyze the evolution of individual privacy loss] by studying the edit history of Wikipedia over 13 years, including more than 117,523 different users performing 188,805,088 edits. We trace each Wikipedia contributor using apparently harmless features, such as the number of edits performed on predefined broad categories in a given time period (e.g. Mathematics, Culture or Nature). We show that even at this unspecific level of behavior description, it is possible to use off-the-shelf machine learning algorithms to uncover usually undisclosed personal traits, such as gender, religion or education. We provide empirical evidence that the prediction accuracy for almost all private traits consistently improves over time. Surprisingly, the prediction performance for users who stopped editing after a given time still improves. The activities performed by new users seem to have contributed more to this effect than additional activities from existing (but still active) users. Insights from this work should help users, system designers, and policy makers understand and make long-term design choices in online content creation systems.
}}
 
=== {{ym|2016|2}} ===
 
''There was no showcase in February 2016.''
 
=== {{ym|2016|1}} ===
 
{{/Event
| date = January 20, 2016
| youtube-url = https://www.youtube.com/watch?v=vRpUby3MoqU
| commons-file =
| talk1-title = Anon productivity and productive efficiency in English Wikipedia
| talk1-presenter = Aaron Halfaker ([[User:Halfak (WMF)|Halfak]]/[[User:EpochFail|EpochFail]])
| talk1-slides = File:Anon productivity and productive efficiency in English Wikipedia (Showcase, Jan. 2016).pdf
| talk1-abstract = Building from a [https://wikimania2014.wikimedia.org/wiki/Submissions/WikiCredit_-_Calculating_%26_presenting_value_contributed_to_Wikipedia call to action around measuring value-adding behavior in Wikipedia] from Wikimania 2014, I'll show preliminary results of measuring editor productivity in English Wikipedia. From this analysis some surprising results have emerged: (1) IP editors contribute about 20% of good new content to Wikipedia articles, and (2) the overall productivity of registered editors has held constant since 2007, despite declines in the community and in labor hours invested in editing. Finding (1) suggests that we should consider better supporting editing without an account, and (2) suggests that Wikipedians are somehow contributing more efficiently than they used to.
| talk2-title = Cooperation in a Peer Production Economy: '''Experimental Evidence from Wikipedia'''
| talk2-presenter = [[:m:User:SalimJah|Jérôme Hergueux]]
| talk2-slides =
| talk2-abstract = Relying on [[m:Research:Dynamics_of_Online_Interactions_and_Behavior| the behavior of Wikipedia contributors in a (game-theoretic) social experiment]], I will seek to engage the community in a reflection about ways to create a more inclusive Wikipedia. First, I will identify the underlying demographic and social determinants of anti-social behavior within Wikipedia, an often-cited driver of its declining retention rates. Second, I will study the relationship between Wikipedia administrators' trust in anonymous strangers and their policing activity patterns, asking the question of the optimal level of trust that admins should exhibit in order to efficiently protect Wikipedia from malicious users while avoiding driving well-intentioned ones away from the project.
}}
 
 
==2015==
=== {{ym|2015|12}} ===
 
''There was no showcase in December 2015.''
 
==={{ym|2015|11}}===
 
{{/Event
| date = November 18, 2015
| youtube-url = https://www.youtube.com/watch?v=kXCI6whgdUA
| commons-file =
| talk1-title = Impact, Characteristics, and Detection of Wikipedia Hoaxes
| talk1-presenter = [[User:Srijankedia|Srijan Kumar]]
| talk1-slides =
| talk1-abstract = False information on Wikipedia raises concerns about its credibility. One way in which false information may be presented on Wikipedia is in the form of hoax articles, i.e. articles containing fabricated facts about nonexistent entities or events. In this talk, we study false information on Wikipedia by focusing on the hoax articles that have been created throughout its history. First, we assess the real-world impact of hoax articles by measuring how long they survive before being debunked, how many pageviews they receive, and how heavily they are referred to by documents on the Web. We find that, while most hoaxes are detected quickly and have little impact on Wikipedia, a small number of hoaxes survive for a long time and are well cited across the Web. Second, we characterize the nature of successful hoaxes by comparing them to legitimate articles and to failed hoaxes that were discovered shortly after being created. We find characteristic differences in terms of article structure and content, embeddedness into the rest of Wikipedia, and features of the editor who created the hoax. Third, we successfully apply our findings to address a series of classification tasks, most notably to determine whether a given article is a hoax. Finally, we describe and evaluate a task involving humans distinguishing hoaxes from non-hoaxes. We find that humans are not particularly good at the task and that our automated classifier outperforms them by a wide margin.
Please see the latest version of the slides at http://www.cs.umd.edu/~srijan/hoax/
}}
 
==={{ym|2015|10}}===
{{/Event
| date = October 21, 2015
| youtube-url = https://www.youtube.com/watch?v=T73vRiNsRxo
| commons-file =
| talk1-title = The impact of the Wikipedia Teahouse on new editor retention
| talk1-presenter = [[User:Jmorgan (WMF)|Jonathan Morgan]], [[User:Halfak (WMF)|Aaron Halfaker]]
| talk1-slides = File:Teahouse_research_showcase_October_2015.pdf
| talk1-abstract = New Wikipedia editors face a variety of social and technical barriers to participation. These barriers have been shown to cause even promising, highly motivated newcomers to give up and leave Wikipedia shortly after joining.<ref>[[meta:Research:The_Rise_and_Decline]]</ref> The Wikipedia Teahouse was launched in 2012 to provide new editors with a space on Wikipedia where they could ask questions, introduce themselves, and learn the ropes of editing in a friendly and supportive environment, with the goal of increasing the percentage of good-faith newcomers who go on to become productive Wikipedians. Research has shown<ref>[[meta:Research:Teahouse/Phase_2_report]]</ref><ref>[[meta:Research:Teahouse/Phase 2 report/Metrics]]</ref> that the Teahouse provided a positive experience for participants, and suggested<ref>http://dl.acm.org/citation.cfm?doid=2441776.2441871</ref> that participating in the Teahouse led to more editing activity and longer survival for new editors who participated. The current study<ref>[[meta:Research:Teahouse_long_term_new_editor_retention]]</ref> examines the impact of Teahouse invitations on [[meta:Research:Metrics_standardization#Surviving_new_editor|new editor survival]] over a longer period of time (2-6 months). It also presents findings on contextual factors within editors' first few sessions that are associated with overall survival rate, and on editing patterns associated with an increased likelihood of visiting the Teahouse.
}}
 
==={{ym|2015|9}}===
{{/Event
| date = September 16, 2015
| youtube-url = https://www.youtube.com/watch?v=eJk6mxJZhH8
| commons-file =
| talk1-title = Fun or Functional? The Misalignment Between Content Quality and Popularity in Wikipedia
| talk1-presenter = [[User:Nettrom|Morten Warncke-Wang]]
| talk1-slides = File:Fun or Functional? The Misalignment Between Content Quality and Popularity in Wikipedia (WMF Research Showcase 2015-09-16).pdf
| talk1-abstract = In peer production communities like Wikipedia, individual community members typically decide for themselves where to make contributions, often driven by factors such as “fun” or a belief that “information should be free”. However, the extent to which this bottom-up, interest-driven content production paradigm ''meets the needs of consumers of this content'' is unclear. In this talk, I analyse four large Wikipedia language editions, finding extensive misalignment between production and consumption of quality content in all of them, and I show how this greatly impacts Wikipedia’s readers. I also examine misalignment in more detail by studying how it relates to specific topics, and to what extent high popularity is related to sudden changes in demand (i.e. “breaking news”). Finally, I discuss technologies and community practices that can help reduce misalignment in Wikipedia. See the paper<ref>Warncke-Wang, M, Ranjan, V., Terveen, L., and Hecht, B. "Misalignment Between Supply and Demand of Quality Content in Peer Production Communities", ICWSM 2015. [http://www-users.cs.umn.edu/~morten/publications/icwsm2015-popularity-quality-misalignment.pdf pdf] See also: [[w:Wikipedia:Wikipedia Signpost/2015-04-29/Recent research|Signpost/Research Newsletter coverage]]</ref>.
| talk2-title = Automated News Suggestions for Populating Wikipedia Entity Pages
| talk2-presenter = [http://www.l3s.de/~fetahu/ Besnik Fetahu]
| talk2-slides =
| talk2-abstract = Wikipedia entity pages are a valuable source of information for direct consumption and for knowledge-base construction, update and maintenance. Facts in these entity pages are typically supported by references. Recent studies show that as much as 20% of the references are from online news sources. However, many entity pages are incomplete even if relevant information is already available in existing news articles. Even for references that are already present, there is often a delay between the publication of the news article and the time it is added as a reference. In this work, we therefore look at Wikipedia through the lens of news and propose a novel news-article suggestion task to improve news coverage in Wikipedia and reduce the lag of newsworthy references. Our work applies directly, as a precursor, to Wikipedia page generation and knowledge-base acceleration tasks that rely on relevant and high-quality input sources. We propose a two-stage supervised approach for suggesting news articles to entity pages for a given state of Wikipedia. First, we suggest news articles to Wikipedia entities (article-entity placement) relying on a rich set of features which take into account the salience and relative authority of entities, and the novelty of news articles to entity pages. Second, we determine the exact section in the entity page for the input article (article-section placement) guided by class-based section templates. We perform an extensive evaluation of our approach based on ground-truth data that is extracted from external references in Wikipedia. We achieve precision of up to 93% in the article-entity suggestion stage and up to 84% in the article-section placement stage. Finally, we compare our approach against competitive baselines and show significant improvements.
}}
 
==={{ym|2015|8}}===
The August showcase was canceled due to scheduling conflicts.
 
==={{ym|2015|7}}===
{{/Event
| date = July 29, 2015
| youtube-url = https://www.youtube.com/watch?v=vGyrVg_qKSM
| commons-file =
| talk1-title = VisualEditor's effect on newly registered users
| talk1-presenter = [[User:Halfak (WMF)|Aaron Halfaker]]
| talk1-slides = File:Visual editors effect (Showcase, July'15).pdf
| talk1-abstract = It's been nearly two years since we ran [[:m:Research:VisualEditor's effect on newly registered editors/June 2013 study|an initial study]] of VisualEditor's effect on newly registered editors. While most of the results of this study were positive (e.g. workload on Wikipedians did not increase), we still saw a significant decrease in newcomer productivity. In the meantime, the [[Editing]] team has made substantial improvements to performance and functionality. In this presentation, I'll report on the results of a new experiment designed to test the effects of enabling this improved VisualEditor software for newly registered users by default. I'll show what we learned from the experiment and discuss some results that have opened larger questions about what, exactly, is difficult about being a newcomer to English Wikipedia.
| talk2-title = Wikipedia knowledge graph with DeepDive
| talk2-presenter = Juhana Kangaspunta and Thomas Palomares
| talk2-slides = File:Presentation_Wikimedia_-_Knowledge_graph_final.pdf
| talk2-abstract = Despite the tremendous amount of information present on Wikipedia, only a very small fraction of it is structured. Most of the information is embedded in text, and extracting it is a non-trivial challenge. In this project, we try to populate Wikidata, a structured component of Wikipedia, using the [http://deepdive.stanford.edu/ DeepDive] tool to extract relations embedded in the text. In the end, we extracted more than 140,000 relations with more than 90% average precision. This report is structured as follows: first, we present DeepDive and the data that we use for this project. Second, we describe the relations we have focused on so far and explain the implementation and pipeline, including our model, features and extractors. Finally, we detail our results with a thorough precision and recall analysis.
}}
 
==={{ym|2015|6}}===
The June showcase was canceled due to scheduling conflicts.
 
==={{ym|2015|5}}===
'''May 13, 2015''' Video: ''[https://www.youtube.com/watch?v=Hj7o5d-OEis YouTube]''
:'''The people's classifier: Towards an open model for algorithmic infrastructure'''
[[File:The_people%27s_classifier_--_Research_Showcase_(May,_2015).pdf|thumb|right]]
::By [[User:Halfak (WMF)|''Aaron Halfaker'']]
:::Recent research has suggested that Wikipedia's algorithmic infrastructure is perpetuating social issues. However, these same algorithmic tools are critical to maintaining the efficiency of open projects like Wikipedia at scale. Rather than simply critiquing algorithmic wiki-tools and calling for less algorithmic infrastructure, I'll propose a different strategy -- an open approach to building this algorithmic infrastructure. In this presentation, I'll demo a set of services that are designed to open up a critical part of Wikipedia's quality control infrastructure -- machine classifiers. I'll also discuss how this strategy unites critical/feminist HCI with more dominant narratives about efficiency and productivity.<br style="clear:right;height:0px;" />
:'''Social transparency online'''
[[File:Social_Transparency_Online.pdf|thumb|right]]
::By [http://www.aboutjmarlow.com/ ''Jennifer Marlow''] and [http://www.lauradabbish.com/ ''Laura Dabbish'']
:::An emerging Internet trend is greater social transparency, such as the use of real names in social networking sites, feeds of friends' activities, traces of others' re-use of content, and visualizations of team interactions. There is a potential for this transparency to radically improve coordination, particularly in open collaboration settings like Wikipedia. In this talk, we will describe some of our research identifying how transparency influences collaborative performance in online work environments. First, we have been studying professional social networking communities. Social media allows individuals in these communities to create an interest network of people and digital artifacts, and get moment-by-moment updates about actions by those people or changes to those artifacts. It affords an unprecedented level of transparency about the actions of others over time. We will describe qualitative work examining how members of these communities use transparency to accomplish their goals. Second, we have been looking at the impact of making workflows transparent. In a series of field experiments we are investigating how socially transparent interfaces, and activity trace information in particular, influence perceptions and behavior towards others and evaluations of their work.
 
==={{ym|2015|4}}===
'''April 30, 2015''' Video: [https://www.youtube.com/watch?v=upQXecRNcdw YouTube]
:'''Creating, remixing, and planning in open online communities'''
::By [http://web.stevens.edu/jnickerson/indextopic.htm ''Jeff Nickerson'']
:::Paradoxically, users in remixing communities don’t remix very much. But an analysis of one remix community, Thingiverse, shows that those who actively remix end up producing work that is in turn more likely to be remixed. What does this suggest about Wikipedia editing? Wikipedia allows more types of contribution, because creating and editing pages are done in a planning context: plans are discussed on particular loci, including project talk pages. Plans on project talk pages lead to both creation and editing; some editors specialize in making article changes and others, who tend to have more experience, focus on planning rather than acting. Contributions can happen at the level of the article and also at a series of meta levels. Some patterns of behavior – with respect to creating versus editing and acting versus planning – are likely to lead to more sustained engagement and to higher quality work. Experiments are proposed to test these conjectures.
:'''Authority, power and culture on Wikipedia: The oral citations debate'''
:: By [[:en:Heather Ford|Heather Ford]]
:::In 2011, Wikimedia Foundation Advisory Board member Achal Prabhala was funded by the WMF to run a project called 'People are knowledge', or the [[:m:Research:Oral_Citations|Oral citations project]]. The goal of the project was to respond to the dearth of published material about topics of relevance to communities in the developing world. Although the majority of articles in languages other than English remain intact, the English editions of these articles have had their oral citations removed. I ask why this happened, what the policy implications are for oral citations generally, and what steps can be taken in the future to respond to the problem that this project (and [[:m:Research:Indigenous_Knowledge|more recent versions of it]]) set out to solve. This talk comes out of an ethnographic project in which I have interviewed some of the actors involved in the original oral citations project, including the majority of editors of the [[:en:surr|surr]] article that I trace in a chapter of my PhD [http://www.oii.ox.ac.uk/people/?id=286].
 
==={{ym|2015|3}}===
'''March 25, 2015''' Video: [https://www.youtube.com/watch?v=PHQqicVoVx4#t=1503 YouTube]
[[File:Temporal_regularities_in_activity_sessions_--_Research_Showcase_(October,_2014).pdf|thumb|right]]
:'''User Session Identification Based on Strong Regularities in Inter-activity Time'''
::By ''[[User:Halfak (WMF)|Aaron Halfaker]]''
::Session identification is a common strategy used to develop metrics for web analytics and behavioral analyses of user-facing systems. Past work has argued that session identification strategies based on an inactivity threshold are inherently arbitrary, or has advocated setting thresholds at about 30 minutes. In this work, we demonstrate a strong regularity in the temporal rhythms of user-initiated events across several different domains of online activity (incl. video gaming, search, page views and volunteer contributions). We describe a methodology for identifying clusters of user activity and argue that the regularity with which these activity clusters appear implies a good rule-of-thumb inactivity threshold of about 1 hour. We conclude with implications that these temporal rhythms may have for system design based on our observations and theories of goal-directed human activity.
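::The abstract describes threshold-based session identification only in prose. As a minimal sketch, assuming one user's event timestamps in seconds and the roughly 1-hour rule-of-thumb threshold from the abstract, activity could be split into sessions as follows. The names `split_sessions` and `INACTIVITY_THRESHOLD` are hypothetical, and this is not the presenter's implementation.

```python
from typing import List

# Hypothetical constant: the ~1 hour rule-of-thumb cutoff, expressed in seconds.
INACTIVITY_THRESHOLD = 60 * 60


def split_sessions(timestamps: List[float],
                   threshold: float = INACTIVITY_THRESHOLD) -> List[List[float]]:
    """Group one user's event timestamps (seconds) into sessions.

    A new session starts whenever the gap since the previous event
    is strictly greater than `threshold`.
    """
    sessions: List[List[float]] = []
    for t in sorted(timestamps):
        # Continue the current session if the gap is within the threshold.
        if sessions and t - sessions[-1][-1] <= threshold:
            sessions[-1].append(t)
        else:
            sessions.append([t])
    return sessions
```

For example, `split_sessions([0, 600, 1200, 9000, 9300])` yields two sessions, `[0, 600, 1200]` and `[9000, 9300]`, because the 7,800-second gap exceeds the threshold. Treating a gap exactly equal to the threshold as part of the same session is an assumption of this sketch.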
[[File:Bob_west_wikipedia_research_showcase_2015-03-25.pdf|thumb|right]]
: '''Mining Missing Hyperlinks from Human Navigation Traces'''
::By ''[http://infolab.stanford.edu/~west1/ Bob West]''
::Wikipedia relies crucially on the links between articles, but important links are often missing. In most prior work, the problem of detecting missing links is addressed by constructing a model of the existing link structure and then predicting the missing links based on this model. In this work we propose a novel method that does not rely on such a model of the static structure of existing links, but rather starts from data capturing how these links are used by people. The approach is guided by the intuition that the ultimate purpose of hyperlinks is to aid navigation, so we argue that the objective should be to suggest links that are likely to be clicked by users. In a nutshell, our algorithm suggests an as yet non-existent link from ''S'' to ''T'' for addition if users who open ''S'' are much more likely than random to later also open ''T''. We show that this simple algorithm yields good link suggestions when run on data from the human-computation game [http://www.wikispeedia.net Wikispeedia.net]. Finally, we present preliminary results showing that the method also works "in the wild", i.e., on navigation data mined directly from Wikipedia's server logs.
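::The abstract states the decision rule in words only. One hypothetical way to make "much more likely than random" concrete is a lift score: among navigation paths that contain ''S'', the fraction in which ''T'' is opened later, divided by the fraction of all paths that contain ''T''. The sketch below is not the authors' algorithm; the path format (a list of article titles per navigation path), the name `suggest_links`, and the `min_lift`/`min_count` thresholds are assumptions chosen for illustration.

```python
from collections import Counter
from typing import List, Set, Tuple


def suggest_links(
    paths: List[List[str]],
    existing_links: Set[Tuple[str, str]],
    min_lift: float = 2.0,   # hypothetical: how much "more likely than random"
    min_count: int = 2,      # hypothetical: minimum supporting paths per pair
) -> List[Tuple[str, str, float]]:
    """Suggest new links S -> T from navigation paths (illustrative sketch).

    lift = P(T opened later | path contains S) / P(path contains T)
    """
    n_paths = len(paths)
    page_count: Counter = Counter()  # number of paths containing each page
    pair_count: Counter = Counter()  # number of paths in which S is followed later by T

    for path in paths:
        page_count.update(set(path))
        later_pairs = set()
        for i, source in enumerate(path):
            for target in path[i + 1:]:
                if target != source:
                    later_pairs.add((source, target))
        pair_count.update(later_pairs)  # each pair counted at most once per path

    suggestions = []
    for (source, target), count in pair_count.items():
        if (source, target) in existing_links or count < min_count:
            continue
        p_target_given_source = count / page_count[source]
        p_target = page_count[target] / n_paths
        lift = p_target_given_source / p_target
        if lift >= min_lift:
            suggestions.append((source, target, lift))

    # Strongest candidates first.
    suggestions.sort(key=lambda item: item[2], reverse=True)
    return suggestions
```

With the paths `[["A", "B", "C"], ["A", "C"], ["B", "D"], ["D", "E"], ["E", "B"]]` and no existing links, only A to C passes both thresholds: C is opened after A in both paths that contain A, against a baseline of C appearing in 2 of 5 paths, for a lift of 2.5.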
 
==={{ym|2015|2}}===
'''February 18, 2015''' Video: [https://www.youtube.com/watch?v=yaj9dfHjkOA YouTube]
[[File:GS survey - research showcase.pdf|thumb|Presentation slides.]]
:'''Global South User Survey 2014'''
::By ''[[m:User:HaithamS (WMF)|Haitham Shammaa]]''
::User trends in the Global South have changed significantly over the past two years. Given the growing interest in Global South communities and their activities, we wanted this survey to focus on understanding the statistics and needs of our users (both readers and editors) in the regions listed in the [[:File:WMF's New Global South Strategy.pdf|WMF's New Global South Strategy]]. This survey aims to provide a better understanding of the specific needs of local user communities in the Global South, as well as data that supports decision-making in product and program development.
[[File:Wikimedia presentation - osm imports.pdf|thumb|Presentation slides.]]
:'''Ingesting Open Geodata: Observations from OpenStreetMap'''
::By [http://stamen.com/studio/alan ''Alan McConchie'']
::As Wikidata grapples with the challenges of ingesting external data sources such as Freebase, what lessons can we learn from other open knowledge projects that have had similar experiences? OpenStreetMap, often called "The Wikipedia of Maps", is a crowdsourced geospatial data project covering the entire world. Since the earliest years of the project, OSM has combined user contributions with existing data imported from external sources. Within the OSM community, these imports have been controversial; some core OSM contributors complain that imported data is lower quality than user-contributed data, or that it discourages the growth of local mapping communities. In this talk, I'll review the history of data imports in OSM, and describe how OSM's best practices have evolved over time in response to these critiques.
 
==={{ym|2015|1}}===
'''January 14, 2015''' Video: [https://www.youtube.com/watch?v=fr29FobogSM YouTube]
:'''Functional roles and career paths in Wikipedia'''
[[File:Functional roles Wikipedia v1.pdf|thumb|right|Presentation slides]]
::By ''[[User:GlimmerPhoenix|Felipe Ortega]]''
::An understanding of participation dynamics within online production communities requires an examination of the roles assumed by participants. Recent studies have established that the organizational structure of such communities is not flat; rather, participants can take on a variety of well-defined functional roles. What is the nature of functional roles? How have they evolved? And how do participants assume these functions? Prior studies focused primarily on participants' activities, rather than functional roles. Further, extant conceptualizations of role transitions in production communities, such as the Reader to Leader framework, emphasize a single dimension: organizational power, overlooking distinctions between functions. In contrast, in this paper we empirically study the nature and structure of functional roles within Wikipedia, seeking to validate existing theoretical frameworks. The analysis sheds new light on the nature of functional roles, revealing the intricate "career paths" resulting from participants' role transitions.
 
:'''Free Knowledge Beyond Wikipedia'''
::A conversation facilitated by ''[[User:Benjamin Mako Hill|Benjamin Mako Hill]]''
::In [http://mako.cc/academic/buechley_hill_DIS_10.pdf some of my research with Leah Buechley], I've explored the way that increasing engagement and diversity in technology communities often means not just attacking systematic barriers to participation but also designing for new genres and ''types'' of engagement. I hope to facilitate a conversation about how WMF might engage new readers by supporting more non-encyclopedic production. I'd like to call out some examples from the [[:meta:Proposals for new projects|new Wikimedia project proposals list]], encourage folks to share entirely new ideas, and ask for ideas about how we could dramatically better support Wikipedia's sister projects.
 
==2014==
==={{ym|2014|12}}===
'''December 18, 2014''' Video: [https://www.youtube.com/watch?v=xPO8XhmeUAU YouTube]
:'''Mobile Madness: The Changing Face of Wikimedia Readers'''
[[File:Mobile madness.pdf|thumb|right|Presentation slides]]
::By [[User:Ironholds|''Oliver Keyes'']]
::A dive into the data we have around readership that investigates the rising popularity of the mobile web, countries and projects that are racing ahead of the pack, and what changes in user behaviour we can expect to see as mobile grows.
:'''Global Disease Monitoring and Forecasting with Wikipedia'''
::By [http://www.lanl.gov/expertise/profiles/view/reid-priedhorsky ''Reid Priedhorsky''] (Los Alamos National Laboratory)
::Infectious disease is a leading threat to public health, economic stability, and other key social structures. Efforts to mitigate these impacts depend on accurate and timely monitoring to measure the risk and progress of disease. Traditional, biologically-focused monitoring techniques are accurate but costly and slow; in response, new techniques based on social internet data, such as social media and search queries, are emerging. These efforts are promising, but important challenges in the areas of scientific peer review, breadth of diseases and countries, and forecasting hamper their operational usefulness. We examine a freely available, open data source for this use: access logs from the online encyclopedia Wikipedia. Using linear models, language as a proxy for location, and a systematic yet simple article selection procedure, we test 14 location-disease combinations and demonstrate that these data feasibly support an approach that overcomes these challenges. Specifically, our proof-of-concept yields models with r² up to 0.92, forecasting value up to the 28 days tested, and several pairs of models similar enough to suggest that transferring models from one location to another without re-training is feasible. Based on these preliminary results, we close with a research agenda designed to overcome these challenges and produce a disease monitoring and forecasting system that is significantly more effective, robust, and globally comprehensive than the current state of the art.
 
==={{ym|2014|11}}===
'''November 14, 2014''' Video: [https://www.youtube.com/watch?v=-FQ-TtTCdJo YouTube]
 
:'''Does Team Competition Increase Pro-Social Lending? Evidence from Online Microfinance.'''
[[File:YanChen_Kiva_Team_Recommendation_2014_1114_WMF.pdf|thumb|right|Presentation slides]]
:By [http://yanchen.people.si.umich.edu/ ''Yan Chen'']
:In the first half of the talk, I will present our empirical analysis of the effects of team competition on pro-social lending activity on Kiva.org, the first microlending website to match lenders with entrepreneurs in developing countries. Using naturally occurring field data, we find that lenders who join teams contribute 1.2 more loans per month than those who do not. Furthermore, teams differ in activity levels. To investigate this heterogeneity, we run a field experiment by posting forum messages. Compared to the control, we find that lenders from inactive teams make significantly more loans when exposed to a goal-setting message and that team coordination increases the magnitude of this effect.
 
:In the second part of the talk, I will discuss a randomized field experiment we ran in May 2014, in which we recommended teams to lenders on Kiva. We find that lenders are more likely to join teams in their local area. However, after joining teams, those who join popular teams (on the leaderboard) are more active in lending.
 
==={{ym|2014|10}}===
[[File:Emotions under discussion.pdf|thumb|Slides]]
'''October 15, 2014''' Video: [https://www.youtube.com/watch?v=-We4GZbH3Iw YouTube]
:'''Emotions under Discussion: Gender, Status and Communication in Wikipedia'''
:''By [[User:Sdivad|David Laniado]]'': I will present a large-scale analysis of emotional expression and communication style of editors in Wikipedia discussions. The talk will focus especially on how emotion and dialogue differ depending on the status, gender, and communication network of the roughly 12,000 editors who have written at least 100 comments on the English Wikipedia's article talk pages. The analysis is based on three different predefined lexicon-based methods for quantifying emotions: ANEW, LIWC and SentiStrength. The results unveil significant differences in the emotional expression and communication style of editors according to their status and gender, and can help to address issues such as the gender gap and editor stagnation.
 
:'''Wikipedia as a socio-technical system'''
[[File:Wikipedia is socio-technical -- Research Showcase (October, 2014).pdf|thumb|slides]]
:''By [[User:Halfak (WMF)|Aaron Halfaker]]'': Wikipedia is a ''socio-technical'' system. In this presentation, I'll explain how the integration of human collective behavior ("social") and information technology ("technical") has led to phenomena that, while massively productive, are poorly understood due to a lack of precedent. Based on my work in this area, I'll describe five critical functions that healthy, Wikipedia-like socio-technical systems must serve in order to continue to function: allocation, regulation, quality control, community management and reflection. Finally, I'll conclude with an overview of three classes of new projects that should provide critical opportunities to understand, both practically and academically, the maintenance of Wikipedia's socio-technical fitness.
{{-}}
 
==={{ym|2014|9}}===
'''September 17, 2014'''
''The September showcase was canceled because of a conflict with other events scheduled by WMF. We will resume showcases in October.''
 
==={{ym|2014|8}}===
'''August 20, 2014''' Video: [https://www.youtube.com/watch?v=wgnnVG7sLQ0 YouTube]
:'''Everything You Know About Mobile Is WrW^Right: Editing and Reading Pattern Variation Between User Types'''
[[File:Everything You Know About Mobile Is Wrong.pdf|thumb|slides]]
:''By [[User:Ironholds|Oliver Keyes]]'': Using new geolocation tools, we look at reader and editor behaviour to understand how and when people access and contribute to our content. This is largely exploratory research, but it has potential implications for our A/B testing, for how we understand cultural divides between reader and editor groups from different countries, and for how we understand the differences between types of edits and the editors who make them.
 
:'''Wikipedia Article Curation: Understanding Quality, Recommending Tasks'''
[[File:Wikipedia Article Curation - Understanding Quality, Recommending Tasks (WMF Research Showcase Aug 2014).pdf|thumb|slides]]
:''By [[User:Nettrom|Morten Warncke-Wang]]'': In this talk we look at article curation in Wikipedia through the lens of task suggestions and article quality. The first part of the talk presents SuggestBot, the Wikipedia article recommender. SuggestBot connects contributors with articles similar to those they previously edited. In the second part of the talk, we discuss Wikipedia article quality using “actionable” features, i.e. features that contributors can easily act upon to improve article quality. We will first discuss these features’ ability to predict article quality, before coming back to SuggestBot and showing how these predictions and actionable features can be used to improve the suggestions.
 
==={{ym|2014|7}}===
'''July 16, 2014''' Video: [https://www.youtube.com/watch?v=1E4JcxTgmco YouTube]
:'''Halfak's wiki research libraries (v0.0.1)'''
[[File:Halfak%27s_wiki_research_libraries_-_WMF_R%26D_showcase_(Jul._2014).pdf|thumb|right]]
:''By [[User:Halfak_(WMF)|Aaron Halfaker]]'': Along with quantitative research comes data and analysis code. In this presentation, Aaron will introduce you to four Python libraries that capture code he uses on a regular basis to get his wiki research done. [http://pythonhosted.org/mediawiki-utilities MediaWiki Utilities] is a general data processing library that includes connectors for the API and MySQL databases as well as an XML dump parser and revert detection. [http://pythonhosted.org/wikiclass Wiki-Class] is a machine learning library that is designed to train, test and deploy automatic detection of quality assessment classes for Wikipedia articles. [http://pythonhosted.org/mwoauth MediaWiki-OAuth] provides a simple interface for performing an OAuth handshake with a MediaWiki installation (e.g. Wikipedia). [http://pythonhosted.org/deltas Deltas] is an experimental text difference detection library that implements cutting-edge research to track changes to Wikipedia articles and attribute authorship of content.
 
<br style="clear:right;"/>
:'''Using Open Data and Stories to Broaden Crowd Content'''
[[File:Using_Open_Data_and_Stories_to_Broaden_Crowd_Content.pdf|thumb|right]]
:''By [https://twitter.com/natematias Nathan Matias]'': Nathan will share a series of research projects on gender diversity online and on designs for collaborative content creation that foster learning and community. He will also demo a prototype of a system that could leverage open data to attract and support new Wikipedia contributors.
 
<br style="clear:right;"/>
 
==={{ym|2014|6}}===
'''June 18, 2014''' Video: [[:File:Wikimedia Research Showcase - June 2014.webmhd.webm|Commons]] [https://www.youtube.com/watch?v=Rn4-cBYxttA YouTube]
;[[File:Moodbar -- lightweight socialization improves long-term editor retention.pdf|thumb|slides]]MoodBar -- lightweight socialization improves long-term editor retention
:''by [[:w:User:Junkie.dolphin|Giovanni Luca Ciampaglia]]'' -- I will talk about [[:m:R:MoodBar|MoodBar]], an experimental feature deployed on the English Wikipedia from 2011 to 2013 to streamline the socialization of newcomers. I will present results from a natural experiment that measured the effect of MoodBar on the short-term engagement and long-term retention of newly registered users attempting to edit Wikipedia for the first time. Our results indicate that a mechanism to elicit lightweight feedback and to provide early mentoring to newcomers significantly improves their chances of becoming long-term contributors.
 
;[[File:Active editor survival.pdf|thumb|Slides.]]Active Editors' Survival Models
:''by [[User:LZia (WMF)|Leila Zia]]'' -- I will talk about first results in building prediction models for active editors' survival. A sample of such prediction models, their performance, and the important variables in predicting survival will be presented.
{{-}}
 
==={{ym|2014|5}}===
'''May 21, 2014''' Video: [[:File:Wikimedia Research Showcase - May 2014.webm|Commons]] [https://www.youtube.com/watch?v=AUupsnvV1oA YouTube]
;A bird's eye view of editor activation
 
:''by [[:w:User:DarTar|Dario Taraborelli]]'' -- In this talk I will give a high-level overview of data on [[:m:R:Editor activation|new editor activation]], presenting longitudinal data from the largest Wikipedias, a comparison between desktop and mobile registrations and the relative activation rates of different cohorts of newbies.{{-}}
;Collaboration patterns in Articles for Creation
[[File:AfC Process Efficiency -- Research Showcase (May, 2014).pdf|thumb|slides]]
:''by [[User:Halfak (WMF)|Aaron Halfaker]]'' -- Wikipedia needs to attract and retain newcomers while also increasing the quality of its content. Yet new Wikipedia users are disproportionately affected by the quality assurance mechanisms designed to thwart spammers and promoters. English Wikipedia’s [[:en:WP:Articles for Creation]] provides a protected space for newcomers to draft articles, which are reviewed against minimum quality guidelines before they are published. In this presentation, I'll describe a study of how this drafting process has affected the productivity of newcomers in Wikipedia. Using a mixed qualitative and quantitative approach, I'll show that the process's pre-publication review, which is intended to improve the success of newcomers, in fact decreases newcomer productivity in English Wikipedia, and I'll offer recommendations for system designers.
{{-}}
 
==={{ym|2014|4}}===
'''April 16, 2014''' Video: [[:File:Wikimedia Research Showcase - April 2014.webm|Commons]] [https://www.youtube.com/watch?v=Pps__TkfrMs YouTube]
;WikiProjects yesterday, today and tomorrow
:[[File:Morgan_WMFresearchShowcase04162014_slides.pdf|thumb|slides ([[:File:Morgan_WMFresearchShowcase04162014_02_notes.pdf|presenter notes]])]] ''by [[:w:User:Jtmorgan|Jonathan Morgan]]'' -- In this talk I'll give an overview of some research[http://dub.washington.edu/djangosite/media/papers/cscw2014_wikiprojects_final_archival.pdf][http://opensym.org/wsos2013/proceedings/p0102-morgan.pdf] on English Wikipedia WikiProjects: what kind of work they do, how they do it, and how they have changed over time. {{-}}
;Visualizing Wikipedia Communities using Gephi
[[File:Visualizing Wikipedia Communities using Gephi.pdf|thumb]]
:''by [[m:User:Haithams|Haitham Shammaa]]'' -- I will introduce [https://gephi.org/ Gephi] as a tool for generating visual representations of Wikimedia project communities. Gephi is open-source network analysis and visualization software; here it is used to generate graphs that represent users and their interactions, based on how frequently they send each other messages on their talk pages.
{{-}}
 
==={{ym|2014|3}}===
'''March 19, 2014''' Video: [[:File:Wikimedia Research Showcase - March 2014.webmhd.webm|Commons]] [https://www.youtube.com/watch?v=bozyc1z25aQ YouTube]
;Metrics standardization
:[[File:Metrics Standardization - Wikimedia Research & Data showcase - March 2014.pdf|thumb]]''by [[User:DarTar|Dario Taraborelli]]'' -- In this talk I'll present the most recent updates on our work on [[:m:R:Metrics standardization|metrics standardization]] and give a teaser of the [[Analytics/Epics/Editor Engagement Vital Signs|Editor Engagement Vital Signs]] project. {{-}}
;<nowiki>Wikipedia: maintaining production efficiency</nowiki>
:[[File:Maintaining_production_efficiency_(March,_2014).pdf|thumb|right]]''by [[User:Halfak (WMF)|Aaron Halfaker]]'' -- In [http://www-users.cs.umn.edu/~halfak/publications/The_Rise_and_Decline/ Halfaker et al. (2013)] we present data that show that several changes the Wikipedia community made to manage quality and consistency in the face of a massive growth in participation have ironically crippled the very growth they were designed to manage. Specifically, the restrictiveness of the encyclopedia's primary quality control mechanism and the algorithmic tools used to reject contributions are implicated as key causes of decreased newcomer retention.
{{-}}
 
==={{ym|2014|2}}===
'''February 26, 2014'''
Video: [[:File:Wikimedia Research & Data Showcase - February 2014.webm|Commons]] [https://www.youtube.com/watch?v=arO9YzcTWGE YouTube]
 
;Mobile session times
:[[File:Mobile_sessions_presentation_(Feb_2014).pdf|thumb|right]] ''by [[User:Okeyes (WMF)|Oliver Keyes]]'' -- A prerequisite to many pieces of interesting reader research is being able to accurately identify the length of users' 'sessions'. I will explain one potential way of doing it, how I've applied it to mobile readers, and what research this opens up. ([[:File:Mobile sessions presentation (Feb 2014).pdf|slides]], [[:m:Research:Mobile sessions|read more]])
{{-}}
;Wikipedia article creation research
: [[File:Wikipedia article creation (Nov, 2013).pdf|thumb|right]] ''by [[User:Halfak (WMF)|Aaron Halfaker]]'' -- A brief overview of research examining trends in newcomer article creation across 10 languages with a focus on English and German Wikipedias. In wikis where anonymous users can create articles, their articles are less likely to be deleted than articles created by newly registered editors. An in-depth analysis of Articles for Creation (AfC) suggests that while AfC's process seems to result in the publication of high quality articles, it also dramatically reduces the rate at which good new articles are published. ([[:File:Wikipedia article creation (Nov, 2013).pdf|slides]], [[:m:Research:Wikipedia_article_creation|read more]])
{{-}}
 
=== {{ym|2014|1}} ===
'''January 15, 2014'''
;IP reliability tracking: ''by Oliver Keyes''
;The Wikipedia Adventure, quantitative and qualitative results from the pilot: ''by Jake Orlowitz'' ([[User:Ocaasi]]) We made a seven-mission, gamified, interactive onboarding tutorial that teaches people how to edit Wikipedia in one hour. The journey involves badges, barnstars, challenges, and simulated interaction throughout a realistic quest to edit the article [[Earth]]. Game dynamics were used to create a sense of understanding, belonging, deep value identification, and technical proficiency. The use of games in open source and free culture online communities has great potential to drive participation. This talk will share the inspiration for taking a gamified approach, review the design highlights, and discuss the quantitative and qualitative data and survey analysis. ([https://docs.google.com/presentation/d/1xAFrk8_VOqyBXux7lH4owEX3lOKxhig3Jpap0CY2PhQ/edit?usp=sharing slides], [https://meta.wikimedia.org/wiki/Grants:IEG/The_Wikipedia_Adventure/Final read more])
 
==2013==
=== {{ym|2013|12}} ===
'''December 18, 2013'''
 
;Metrics standardization: [[File:Metrics Standardization 10 Dec 2013.pdf|thumb|right]] ''by [[User:DarTar|Dario Taraborelli]]''{{-}}
;On the nature of Anonymous Editors
: [[File:Anonymous_editors_-_WMF_R%26D_showcase_(Dec._2013).pdf|thumb|right]] ''by [[User:Halfak (WMF)|Aaron Halfaker]]'' -- A brief discussion & critique of the use of the term "anonymous" to refer to IP editors and a presentation of research results that suggest that newly registered users who edit anonymously right before registering their account are highly productive. ([[:File:Anonymous_editors_-_WMF_R%26D_showcase_(Dec._2013).pdf|slides]], [[:m:Research:Anonymous_editor_acquisition/Volume|read more]])
{{-}}
;Overview of [https://meta.wikimedia.org/wiki/PE%26D_Reports Program Evaluation (beta) Reports]
: [[File:Program Evaluation overall responses - 2013.png|thumb|right]] ''by [[User:JAnstee (WMF)|Jaime Anstee]]'' -- A brief overview of the first round of program reporting, including a summary of the target measures, along with strategies and challenges in metric standardization. [https://docs.google.com/a/wikimedia.org/document/d/1LTsQ-uW-2pZ4R-QcJ7jGsks-1QdOpA3l0WVl2RomueE/edit?usp=sharing Overview outline]
 
== References ==
 
[[Category:Wikimedia Research]]