Tag: Wikimedia Foundation

The internet’s favourite website for information

Wikipedia at 17.

  • The world’s biggest encyclopedia will turn eighteen in January 2019.
  • English Wikipedia has 5.7m articles (full list of all 302 language Wikipedias)
  • 500 million visitors per month
  • 1.5 billion unique devices per month.
  • 17 billion pageviews per month.
  • Completely open process and more reliable than you think
  • All edits are recorded in the View History of a page in permanent links so pages can be rolled back to their last good state if need be. e.g. View History page for Jeremy Hunt.
  • Vandalism removed more quickly than you think (only 7% of edits are considered vandalism)
  • Used in schools & universities to teach information literacy & help combat fake news.
  • Guidelines around use of reliable sources, conflict of interest, verifiability, and neutral point of view.
  • Articles ‘looked after’ (monitored and maintained) by editors from 2000+ WikiProjects.
  • Includes a quality and ratings scale – the two highest quality levels of articles are community reviewed.
  • Information organised in categories using a category tree. These categories can help create dynamic timelines.
  • Knowledge discussed on Talk pages  and at the Wikipedia Tea House where you can ask questions.
  •  87.5% of students report using Wikipedia for their academic work (Selwyn and Gorard, 2016) in “an introductory and/or clarificatory role” as part of their information gathering and research and finding it ‘academically useful’ in this context.
  • Used by 90% of medical students and 50-75% of physicians. (Masukume, Kipersztok, Shafee, Das, and Heilman, 2017)
  • Research from the Harvard Business School has also found that, unlike other more partisan areas of the internet, Wikipedia’s focus on NPOV (neutral point of view) means editors actually become more moderate over time; the researchers see this as evidence that editing “Wikipedia helps break people out of their ideological echo chambers”.
  • It is the place people turn to orientate themselves on a topic.

 

More reading

Did Media Literacy backfire?

“Too many students I met were being told that Wikipedia was untrustworthy and were, instead, being encouraged to do research. As a result, the message that many had taken home was to turn to Google and use whatever came up first. They heard that Google was trustworthy and Wikipedia was not.” (Boyd, 2017)

Don’t cite Wikipedia, write Wikipedia.

  • Wikipedia does not want you to cite it. It considers itself a tertiary resource; an online encyclopedia built from articles which in turn are based on reliable, published, secondary sources.
  • Wikipedia is relentlessly transparent. Everything on Wikipedia can be checked, challenged and corrected. Cite the sources Wikipedia uses, not Wikipedia itself.
Own work by Stinglehammer, CC-BY-SA

Wikipedia does need more subject specialists to engage with it to improve its coverage, however. More eyes on a page helps address omissions and improves the content.

Six in six minutes – 3 students and 3 staff discuss Wikipedia in the Classroom

  1. Karoline Nanfeldt – 4th year Psychology undergraduate student.
  2. Tomas Sanders – 4th year History undergraduate student.
  3. Aine Kavanagh – Senior Hons. Reproductive Biology student.
  4. Ruth Jenkins – Academic Support Librarian at the University of Edinburgh Medical School.
  5. Dr. Jenni Garden – Christina Miller Research Fellow at the University of Edinburgh’s School of Chemistry.
  6. Dr. Michael Seery – Reader in Education at the University of Edinburgh’s School of Chemistry.

Wikipedia has a problem with systemic bias.

A 2011 survey suggests that on English Wikipedia around 90% of editors are male, and are typically formally educated, in white-collar jobs (or students) and living in the Global North.

“if there is a typical Wikipedia editor, he has a college degree, is 30- years-old, is computer savvy but not necessarily a programmer, doesn’t actually spend much time playing games, and lives in US or Europe.”

This means that the articles within Wikipedia typically reflect this bias. For example only 17% of biographies in English Wikipedia are of women. Many articles reflect the perspective of English speakers in the northern hemisphere, and many of the topics covered reflect the interests of this relatively small group of editors. Wikipedia needs a diverse community of editors to bring diverse perspectives and interests.

Wikipedia is also a community that operates with certain expectations and social norms in mind. Sometimes new editors can have a less than positive experience when they aren’t fully aware of this.

“5 Pillars of Wikipedia” flickr photo by giulia.forsythe https://flickr.com/photos/gforsythe/21684596874 shared under a Creative Commons (BY) license

There are only 80,000 regular contributors to Wikipedia. Of these, only 3,000 are considered ‘very active’. That’s the population of a small village like Pitlochry trying to curate the world’s knowledge.

We need to increase the diversity and number of Wikipedia editors. One way to do that is to run edit-a-thons and other facilitated activities that introduce some of these norms and expectations while teaching participants how to edit Wikipedia.

Isn’t editing Wikipedia hard?

Maybe it was a little hard once but not now. It’s all dropdown menus now with the Visual Editor interface. So super easy, intuitive and “addictive as hell“!

Do you need a quick overview of what all the buttons and menu options on Wikimedia do? Luckily we have just the very thing for you.

By Zeromonk (Own work) [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons

“Search is the way we live now” – Google and Wikipedia

  • Google depends on Wikipedia. Click through rate decreases by 80% if Wikipedia links are removed. (McMahon, Johnson and Hecht, 2017)
  • Wikipedia depends on Google. 84.5% of visits to Wikipedia attributable to Google. (McMahon, Johnson and Hecht, 2017)
  • Google processed 91% of searches internationally and 97.4% of the searches made using mobile devices according to 2011 figures in Hillis, Petit & Jarrett (2013).
  • Google’s ranking algorithm also has a ‘funnelling effect’ according to Beel & Gipp (2009); narrowing the sources clicked upon 90% of the time to just the first page of results with a 42% clickthrough on first choice alone.
  • This means that addressing knowledge gaps on Wikipedia will surface the knowledge to Google’s top ten results and increase clickthrough and knowledge-sharing. Wikipedia editing can therefore be seen as a form of activism in the democratisation of access to information.

 

The Symbiotic Relationship between Wikipedia and Google.

Learn how to edit Wikipedia in 30 mins

More Reading

Search failure – Information Retrieval in an age of Infoglut

Search failure:

The challenges facing information retrieval in an age of information explosion.

 

Abstract:

This article takes, as its starting point, the news that Wikipedia were reportedly developing a ‘Knowledge Engine’ and focuses on the most dominant web search engine, Google, to examine the “consecrated status” (Hillis, Petit & Jarrett, 2013) it has achieved and its transparency, reliability & trustworthiness for everyday searchers.

A bit of light reading on information retrieval – Own work, CC-BY-SA.

“Commercial search engines dominate search-engine use of the Internet, and they’re employing proprietary technologies to consolidate channels of access to the Internet’s knowledge and information.” (Cuthbertson, 2016)

 

On 16th February 2016, Newsweek published a story entitled ‘Wikipedia Takes on Google with New ‘Transparent’ Search Engine’. The figure applied for, and granted by the Knight Foundation, was a reported $250,000 as part of the Wikimedia Foundation’s $2.5 million programme to build ‘the Internet’s first transparent search engine’.

The sum applied for was relatively insignificant when compared to Google’s reported $75 billion revenue in 2015 (Robinson, 2016). Yet, it posed a significant question; a fundamental one. Just how transparent is Google?

 

Two further concerns can be identified from the letter to Wikimedia granting the application: “supporting stage one development of the Knowledge Engine by Wikipedia, a system for discovering reliable and trustworthy public information on the Internet.” (Cuthbertson, 2016). This goes to the heart of the current debate on modern information retrieval: transparency, reliability and trustworthiness. How then are we faring in these three measures?

 

  1. Defining Information Retrieval

Information Retrieval is defined as “a field concerned with the structure, analysis, organisation, storage, searching, and retrieval of information.” (Salton in Croft, Metzler & Strohman, 2010, p.1).

Croft et al (2010) identify three crucial concepts in information retrieval:

  • Relevance – Does the returned value satisfy the user searching for it?
  • Evaluation – Evaluating the ranking algorithm on its precision and recall.
  • Information Needs – What needs generated the query in the first place?
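Precision and recall, the two evaluation measures named above, can be illustrated with a minimal sketch. Precision is the share of returned results that are relevant; recall is the share of relevant documents that were returned. The page identifiers and relevance judgements below are invented purely for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: the documents the search engine returned
    relevant:  the documents a human judged relevant to the query
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set  # relevant documents that were returned

    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical query: 4 pages returned, 3 of them among the 6 relevant pages.
retrieved = ["p1", "p2", "p3", "p9"]
relevant = ["p1", "p2", "p3", "p4", "p5", "p6"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up case three of the four results are relevant (high precision), yet half of the relevant pages were never returned (modest recall).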

Today, since the advent of the internet, this definition needs to be understood in terms of how pervasive ‘search’ has become. “Search is the way we now live.” (Darnton in Hillis, Petit & Jarrett, 2013, p.5). We are all now ‘searchers’ and the act of ‘searching’ (or ‘googling’) has become intrinsic to our daily lives.

By Typing_example.ogv: NotFromUtrecht derivative work: Parzi [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0) or GFDL (http://www.gnu.org/copyleft/fdl.html)], via Wikimedia Commons
  1. Dominance of one search engine

 

“When you turn on a tap you expect clean water to come out and when you do a search you expect good information to come out” (Swift in Hillis, Petit & Jarrett, 2013)

 

With over 60 trillion pages (Fichter and Wisniewski, 2014) and terabytes of unstructured data to navigate, the need for speedy & accurate responses to millions of queries has never been more important.

 

Navigating the vast sea of information present on the web means the field of Information Retrieval necessitates wrestling with, and constantly tweaking, the design of complex computer algorithms (determining a top 10 list of ‘relevant’ page results through over 200 factors).

 

Google, powered by its PageRank algorithm, has dominated I.R. since the late 1990s, indexing the web like a “back-of-the-book” index (Chowdhury, 2010, p.5). While this oversimplifies the complexity of the task, modern information retrieval, in searching through increasingly multimedia online resources, has necessitated the addition of newer more sophisticated models. Utilising ‘artificial intelligence’ & semantic search technology to complement the PageRank algorithm, Google now navigates through the content of pages & generates suggested ‘answers’ to queries as well as the 10 clickable links users commonly expect.

 

According to 2011 figures in Hillis, Petit & Jarrett (2013), Google processed 91% of searches internationally and 97.4% of the searches made using mobile devices. This undoubted & sustained dominance has led to accusations of abuse of power in two recent instances.

 

Nicas & Kendall (2016) report that the Federal Trade Commission along with European regulators are examining claims that Google has been abusing its position in terms of smartphone companies feeling they had to give Google Services preferential treatment because of Android’s dominance.

 

In addition, Robinson (2016) states that the Authors Guild are petitioning the Supreme Court over Google’s alleged copyright infringement, going back a decade to when over 20 million library books were digitised without compensation or author/publisher permission. The argument is that the content taken has since been utilised by Google for commercial gain to generate more traffic and more advertising money, and thus to confer on them market leader status. This echoes the New Yorker article’s response to Google’s aspiration to build a digital universal library: “Such messianism cannot obscure the central truth about Google Book Search: it is a business” (Toobin in Hillis, Petit & Jarrett, 2013).

 

  1. PageRank

Google’s business is powered, like every search engine, by its ranking algorithm. For Cahill et al (2009), Google’s “PageRank is a quantitative rather than qualitative system”.  PageRank works by ranking pages in terms of how well linked a page is, how often it is clicked on and the importance of the page(s) that links to it. In this way, PageRank assigns importance to a page.
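The link-based part of this description can be sketched with the power-iteration form of PageRank as it is usually presented in textbooks. This is a toy illustration, not Google’s production system: it models only link structure (not clicks or any other signal), the damping factor of 0.85 is the commonly quoted default and is assumed here, and the four-page web is invented.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict mapping each page to its score; scores sum to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal importance

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current importance among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its importance evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web.
toy_web = {
    "home": ["about", "news"],
    "about": ["home"],
    "news": ["home", "about"],
    "blog": ["home"],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

Here "home" comes out on top because every other page links to it, while "blog", which nothing links to, keeps only the baseline share. A link from a highly ranked page is worth more than a link from an obscure one, which is the sense in which the algorithm is quantitative rather than qualitative.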

 

Other parameters are taken into consideration including, most notably, the anchor text which provides a short descriptive summary of the page it links to. However, the anchor text has been shown to be vulnerable to manipulation, primarily from bloggers, by the process known as ‘Google bombing’. Google bombing is defined as “the activity of designing Internet links that will bias search engine results so as to create an inaccurate impression of the search target” (Price in Bar-Ilan, 2007). Two famous examples include when Microsoft came as top result for the query ‘More evil than Satan’ and when President Bush ranked as first result for ‘miserable failure’. Bar-Ilan (2007) suggests google bombs come about for a variety of reasons: ‘fun’, ‘personal promotion’, ‘commercial’, ‘justice’, ‘ideological’ and ‘political’.
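To make the mechanism concrete, here is a deliberately naive sketch in which a page’s score for a query is simply the number of inbound links whose anchor text contains the query phrase. Real search engines combine anchor text with many other signals, so this is an assumption-laden toy; the function name and the example.org URLs are hypothetical.

```python
from collections import Counter


def rank_by_anchor_text(links, query):
    """Rank target pages by how many inbound links mention the query.

    links: iterable of (anchor_text, target_url) pairs
    Returns target URLs, most-linked first.
    """
    query = query.lower()
    votes = Counter(
        target for anchor, target in links if query in anchor.lower()
    )
    return [target for target, _ in votes.most_common()]


# Links that arose naturally (hypothetical).
organic_links = [
    ("a miserable failure of an opening", "example.org/chess-blunders"),
    ("miserable failure in the endgame", "example.org/chess-blunders"),
]

# A coordinated campaign: five bloggers use the same anchor text for one target.
bomb_links = [("miserable failure", "example.org/politician-bio")] * 5

before = rank_by_anchor_text(organic_links, "miserable failure")
after = rank_by_anchor_text(organic_links + bomb_links, "miserable failure")
print("before:", before[0])
print("after: ", after[0])
```

The target page never needs to contain the phrase itself; a handful of coordinated links is enough to displace the page that earned its position organically.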

 

Although Google was reluctant to alter search results, the reputational damage google bombs were causing necessitated a response. In the end, Google altered the algorithm to defuse a number of google bombs. Despite this, “spam or joke sites still float their way to the top.” (Cahill et al, 2009) so there is a clear argument to be had about Google, as a private corporation, continuing to ‘tinker’ with the results delivered by its algorithm and how much its coders should, or should not, arbitrate access to the web in this way. After all, the algorithm will already bear hallmarks of their own assumptions without any transparency on how these decisions are arrived at. Further, Google Bombs, Byrne (2004) argues, empower those web users whom the ranking system, for whatever reason, has disenfranchised.

 

Just how reliable & trustworthy is Google?

 

“Easy, efficient, rapid and total access to Truth is the siren song of Google and the culture of search. The price of access: your monetizable information.” (Hillis, Petit & Jarrett, 2013, p.7)

For Cahill et al (2009), Google has made the process of searching too easy and searchers have become lazier as a result, accepting Google’s ranking at face value. Markland in van Dijck (2010) makes the point that students’ favouring of Google means they are dispensing with the services libraries provide. The implication is that, despite library information services delivering a more relevant & higher quality search result, Google’s quick & easy ‘fast food’ approach is hard to compete with.

This seemingly default trust in the neutrality of Google’s ranking algorithm also has a ‘funnelling effect’ according to Beel & Gipp (2009); narrowing the sources clicked upon 90% of the time to just the first page of results with a 42% click through on the first choice alone. This then creates a cosy consensus in terms of the fortunate pages clicked upon which will improve their ranking while “smaller, less affluent, alternative sites are doubly punished by ranking algorithms and lethargic searchers.” (Pan et al. in van Dijck, 2010)

 

While Google would no doubt argue that all search engines closely guard how their ranking algorithms are calibrated to protect them from aggressive competition, click fraud and SEO marketing, the secrecy is clearly at odds with principles of public librarianship. Further, Van Dijck (2010) argues that this worrying failure to disclose is concealing how knowledge is produced through Google’s network and the commercial nature of Google’s search engine. After all, a search engine’s greatest asset is the metadata each search leaves behind. This data can be aggregated and used by the search engine to create profiles of individual search behaviour and collective profiles which can then be passed on to other commercial companies for profit. That is not to say it always does but there is little legislation to stop it in an area that is largely unregulated. The right to privacy does not, it seems, extend to metadata and ‘in an era in which knowledge is the only bankable commodity, search engines own the exchange floor.’ (Halavais in van Dijck, 2010)

The University of Edinburgh by Mihaela Bodlovic – http://www.aliceboreasphotography.com/ (CC-BY-SA)

 

  1. Scholarly knowledge and the reliability of Google Scholar

When considering the reliability, transparency & trustworthiness of Google and Google Scholar it is pertinent to look at its scope and how it differs from other similar sites. Unlike Pubmed and Web of Science, Google Scholar is not a human-curated database but an internet search engine; its accuracy & content therefore vary greatly depending on what has been submitted to it. One advantage Google Scholar does have is that it searches the full text of articles, so users may find searching easier on Scholar than on WoS or Pubmed, which are limited to searching the abstract, citations or tags.

Where Google Scholar could be more transparent is in its coverage as some notable publishers have been known, according to van Dijck (2010), to refuse to give access to their databases. Scholar has also been criticised for the lack of completeness of its citations, as well as its covering of social science and humanities databases; the latter an area of strength for Wikipedia according to Park (2011). But the searcher utilising Google Scholar would be unaware of these problems of scope when they came to use it.

Further, Beel & Gipp (2009) state that the ranking system on Google Scholar gives articles with lots of citations higher rankings and that, as a result, these articles receive even more citations. Hence, while the digitization of sources on the internet opens up new avenues for scholarly exploration, ranking systems can be seen to close ranks on a select few to the exclusion of others.
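This feedback loop can be sketched as a small deterministic simulation: papers are ranked by citation count, readers cite mostly what they see at the top, and the new citations feed back into the next ranking. All numbers are invented for illustration. Only the 42% top-position share echoes the first-result click-through figure quoted earlier, and applying it to citations is an assumption of this sketch rather than a finding of Beel & Gipp.

```python
def simulate_citation_feedback(citations, position_share_percent, rounds,
                               new_citations_per_round=100):
    """Simulate a rank-by-citations feedback loop.

    citations: dict mapping each paper to its starting citation count
    position_share_percent: percentage of each round's new citations that go
        to the paper at rank 1, rank 2, ... (hypothetical values)
    """
    counts = dict(citations)
    for _ in range(rounds):
        # Rank papers by current citation count, most cited first.
        ranking = sorted(counts, key=lambda paper: counts[paper], reverse=True)
        for position, paper in enumerate(ranking):
            if position < len(position_share_percent):
                percent = position_share_percent[position]
                counts[paper] += new_citations_per_round * percent // 100
    return counts


# Four hypothetical papers that start out almost level.
start = {"paper_a": 12, "paper_b": 10, "paper_c": 9, "paper_d": 8}
share = [42, 25, 20, 13]  # invented shares by rank position

end = simulate_citation_feedback(start, share, rounds=10)
print(end)
```

A starting lead of two citations becomes a lead of 172 after ten rounds, without any difference in the quality of the papers ever entering the model.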

As Van Dijck (2010) points out: “Popularity in the Google-universe has everything to do with quantity and very little with quality or relevance.” In effect, ranking systems determine which sources we can see but conceal how this determination has come about. This means that we are unable to truly establish the scope & relevance of our search results. In this way, search engines cannot be viewed as neutral, passive instruments but are instead active “actor networks” and “co-producers of academic knowledge.” (van Dijck, 2010).

Further, it can be argued that Google decides which sites are included in its top ten results. With so much to gain commercially from being discoverable on Google’s first page of results, the practice of Search Engine Optimisation (SEO), or manipulating the algorithm to get your site in the top ten search results, has become widespread. SEO techniques can be split into ‘white hat’ (legitimate businesses with a relevant product to sell) and ‘black hat’ (sites that just want clicks and tend not to care about the ‘spamming’ techniques they employ to get them). As a result, PageRank has to be constantly adjusted, as with Google bombs, to counteract the effects of increasingly sophisticated ‘black hat’ techniques. Hence, the need for an improved vigilance & critical evaluation of the searches returned by Google has become a crucial skill in modern information retrieval.

 

  1. The solution: Google’s response to modern information retrieval – Answer Engines

Google is the great innovator and is always seeking newer, better ways of keeping users on its sites and improving its search algorithm. Hence, the arrival of Google Instant in 2010 to autofill suggested keywords to assist searchers. This was followed by Google’s Knowledge Graph (and its Microsoft equivalent Bing Snapshot). These new services seek not just to provide the top ten links to a search query but also to ‘answer’ it by providing a number of the most popular suggested answers on the page results screen (usually showing an excerpt of the related Wikipedia article & images along the side panel), based on, & learning from, previous users’ searches on that topic.

Google’s Knowledge Graph is supported by sources including Wikipedia & Freebase (and the linked data they provide) along with a further innovation, RankBrain, which utilises artificial intelligence to help decipher the 15% of queries Google has not seen before. As Barr (2016) recognises: “A.I. is becoming increasingly important to extract knowledge from Google’s sea of data, particularly when it comes to classifying and recognizing patterns in videos, images, speech and writing.”

Bing Snapshot does much the same. The difference being that Bing provides links to the sources it uses as part of the ‘answers’ it provides. Google provides information but does not attribute it. Without this, it is impossible to verify their accuracy. This seems to be one of the thorniest issues in modern information retrieval; link decay and the disappearing digital provenance of sources. This is in stark contrast to Wikimedia’s efforts in creating Wikidata: “an open-license machine-readable knowledge base” (Dewey 2016) capable of storing digital provenance & structured bibliographic data. Therefore, while Google Knowledge Panels are a step forward, there are issues again over its transparency, reliability & trustworthiness.
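The difference between an attributed and an unattributed ‘answer’ can be sketched as a data structure. The classes below are a hypothetical illustration of storing provenance alongside a claim; they are not Wikidata’s actual data model or API, and the field names are assumptions. The example values are taken from this article and its bibliography.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """Where a claim came from (hypothetical structure)."""
    title: str
    publisher: str
    year: int


@dataclass
class Statement:
    """A single machine-readable claim with optional provenance."""
    subject: str
    predicate: str
    value: str
    sources: list = field(default_factory=list)

    def is_verifiable(self):
        # A reader can only check a claim that names at least one source.
        return len(self.sources) > 0


attributed = Statement(
    subject="Google",
    predicate="reported revenue (2015)",
    value="$75 billion",
    sources=[Source("How Google Stole the Work of Millions of Authors",
                    "Wall Street Journal", 2016)],
)
unattributed = Statement("Google", "reported revenue (2015)", "$75 billion")

print(attributed.is_verifiable(), unattributed.is_verifiable())
```

Both statements display the same ‘answer’, but only the first gives a searcher anything to follow up.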

Moreover, the 2014 EU Court ruling on ‘the right to be forgotten’, which Google have stated they will honour, also muddies the waters on issues of transparency & link decay/censorship:

“Accurate search results are vanishing in Europe with no public explanation, no real proof, no judicial review, and no appeals process… the result is an Internet riddled with memory holes – places where inconvenient information simply disappears.” (Fioretti, 2014).

The balance between an individual’s “right to be forgotten” and the freedom of information clearly still has to be found. At the moment, in the name of transparency, both Google and Wikimedia are posting notifications to affected pages that they have received such requests. For those wishing to be ‘forgotten’ this only highlights the matter & fuels speculation unnecessarily.

Wikipedia

 

  1. The solution: Wikipedia’s ‘transparent’ search engine: Discovery

Since the setup of the ‘Discovery’ team in April 2015 and the disclosure of the Knight Foundation grant, there have been mixed noises from Wikimedia with some claiming that there was never any plan to rival Google because a newer ‘internal’ search engine was only ever being developed in order to integrate Wikimedia projects through one search portal.

Ultimately, a lack of consultation between the board and the wider Wikimedia community members reportedly undermined the project & culminated in the resignation of Lila Tretikov, Executive Director of the Wikimedia Foundation, at the end of February 2016, and the plans for Discovery were shelved.

However, Sentance (2016) reveals that, in their leaked planning documents for Discovery, the Foundation were indeed looking at the priorities of proprietary search engines, their own reliance on them for traffic and how they could recoup traffic lost to Google (through Google’s Knowledge Graph) at the same time as providing a central hub for information from across all their projects through one search portal. Wikipedia results, after all, regularly featured in the top page of Google results anyway – why not skip the middle man?

Quite how internet searchers may have taken to a completely transparent, non-commercial search engine we’ll possibly never know. However, it remains a tantalizing prospect.

 

  1. The solution: Alternative Search Engines

An awareness of the alternative search engines available for use and their different strengths and weaknesses is a key component of the information literacy needed to navigate this sea of information. Bing Snapshot, for instance, currently does more than Google to provide the digital provenance of its sources.

Notess (2016) serves notice that computational searching (e.g. Wolfram Alpha) continues to flourish along with search engines geared towards data & statistics (e.g. Zanran, DataCite.org and Google Public Data Explorer).

However, knowing about the existence of these differing search engines is one thing but knowing how to successfully navigate them is quite another, as Notess (2016) himself concludes: “Finding anything beyond the most basic of statistics requires perseverance and experimenting with a variety of strategies.”

Information literacy, it seems, is key.

Information Literacy
By Ewa Rozkosz via Flickr (CC-BY-SA)

 

  1. The solution: The need for information literacy

Given that electronic library services are maintained by information professionals, “values such as quality assessment, weighed evaluation & transparency” (van Dijck, 2010) are in much greater evidence than in commercial search engines. That is not to say that there aren’t still issues in library OPAC systems: whether it be in terms of the changes in the classification system used over time or the differing levels of adherence by staff to these classification protocols; or the communication to users of best practice in utilising the system.

The use of any search engine requires literacy among the user group. The fundamental problem remains the disconnect between what a user inputs and what they can feasibly expect at the results stage. Understanding the nature of the search engine being used (proprietary or otherwise), a critical awareness of how knowledge is formed through its network, and the type of search statement that will maximise your chances of success are all vital. As van Dijck (2010) states: “Knowledge is not simply brokered (‘brought to you’) by Google or other search engines… Students and scholars need to grasp the implications of these mechanisms in order to understand thoroughly the extent of networked power”.

Educating users about this broadens the search landscape and defuses SEO attempts to circumvent our choices. Information literacy cannot be left to academics or information professionals alone, though they can play a large part in its dissemination. As mentioned at the beginning, we are all ‘searchers’. Therefore, it is incumbent on all of us to become literate in the ways of ‘search’ and pass it on, creating our own knowledge networks. Social media offers us a means of doing this, allowing us to filter information as never before, and filtering is “transforming how the web works and how we interact with our world.” (Swanson, 2012)

 

Conclusion

Google may never become any more transparent. Hence, its reliability & trustworthiness will always be hard to judge. Wikipedia’s Knowledge Engine may have offered a distinctive model more in line with these terms but it is unlikely, at least for now, to be able to compete as a global crawler search engine.

 

 

Therefore, it is incumbent on searchers not to presume neutrality or ascribe any kind of benign munificence to any one search engine. Rather, by educating themselves as to the merits & drawbacks of Google and other search engines, users will be able to formulate their searches, and their use of search engines, with a degree of information literacy. Only then can they hope the returned results will match their individual needs with any degree of satisfaction or success.

Bibliography

  1. Arnold, A. (2007). Artificial intelligence: The dawn of a new search-engine era. Business Leader, 18(12), pp. 22.
  2. Bar‐Ilan, Judit (2007). “Manipulating search engine algorithms: the case of Google”. Journal of Information, Communication and Ethics in Society 5 (2/3): 155–166. doi:10.1108/14779960710837623. ISSN 1477-996X.
  3. Barr, A. (2016). WSJ.D Technology: Google Taps A.I. Chief To Replace Departing Search-Engine Head. Wall Street Journal. ISSN 00999660.
  4. Beel, J.; Gipp, B. (2009). “Google Scholar’s ranking algorithm: The impact of citation counts (An empirical study)”. 2009 Third International Conference on Research Challenges in Information Science: 439–446. doi:10.1109/RCIS.2009.5089308.
  5. Byrne, S. (2004). Stop worrying and learn to love the Google-bomb. Fibreculture, (3).
  6. Cahill, Kay; Chalut, Renee (2009). “Optimal Results: What Libraries Need to Know About Google and Search Engine Optimization”. The Reference Librarian 50 (3): 234–247. doi:10.1080/02763870902961969. ISSN 0276-3877.
  7. Chowdhury, G.G. (2010). Introduction to modern information retrieval. Facet. ISBN 9781856046947.
  8. Croft, W. Bruce; Metzler, Donald; Strohman, Trevor (2010). Search Engines: Information Retrieval in Practice. Pearson Education. ISBN 9780131364899.
  9. Cuthbertson, A. (2016). “Wikipedia takes on Google with new ‘transparent’ search engine”. Available at: http://europe.newsweek.com/wikipedia-takes-google-new-transparent-search-engine-427028. Retrieved 2016-05-08.
  10. Dewey, Caitlin (2016). “You probably haven’t even noticed Google’s sketchy quest to control the world’s knowledge”. The Washington Post. ISSN 0190-8286. Retrieved 2016-05-13.
  11. Fichter, D. and Wisniewski, J. (2014). Being Findable: Search Engine Optimization for Library Websites. Online Searcher, 38(5), pp. 74-76.
  12. Fioretti, Julia (2014). “Wikipedia fights back against Europe’s right to be forgotten”. Reuters. Retrieved 2016-05-02.
  13. Foster, Allen; Rafferty, Pauline (2011). Innovations in Information Retrieval: Perspectives for Theory and Practice. Facet. ISBN 9781856046978.
  14. Gunter, Barrie; Rowlands, Ian; Nicholas, David (2009). The Google Generation: Are ICT Innovations Changing Information-seeking Behaviour?. Chandos Publishing. ISBN 9781843345572.
  15. Halcoussis, Dennis; Halverson, Aniko; Lowenberg, Anton D.; Lowenberg, Susan (2002). “An Empirical Analysis of Web Catalog User Experiences”. Information Technology and Libraries 21 (4). ISSN 0730-9295.
  16. Hillis, Ken; Petit, Michael; Jarrett, Kylie (2012). Google and the Culture of Search. Routledge. ISBN 9781136933066.
  17. Hoffman, A.J. (2016). Reflections: Academia’s Emerging Crisis of Relevance and the Consequent Role of the Engaged Scholar. Journal of Change Management, 16(2), pp. 77.
  18. Kendall, Susan. “LibGuides: PubMed, Web of Science, or Google Scholar? A behind-the-scenes guide for life scientists.  : So which is better: PubMed, Web of Science, or Google Scholar?”. libguides.lib.msu.edu. Retrieved 2016-05-02.
  19. Koehler, W.C. (1999). “Classifying Web sites and Web pages: the use of metrics and URL characteristics as markers”. Journal of Librarianship and Information Science 31 (1): 21–31. doi:10.1177/0961000994244336. ISSN 0000-0000.
  20. LaFrance, Adrienne (2016). “The Internet’s Favorite Website”. The Atlantic. Retrieved 2016-05-12.
  21. Lecher, Colin (2016). “Google will apply the ‘right to be forgotten’ to all EU searches next week”. The Verge. Retrieved 2016-04-29.
  22. Mendez-Wilson, D (2000). ‘Humanizing The Online Experience’, Wireless Week, 6, 47, p. 30, Business Source Premier, EBSCOhost, viewed 1 May 2016.
  23. Milne, David N.; Witten, Ian H.; Nichols, David M. (2007). “A Knowledge-based Search Engine Powered by Wikipedia”. Proceedings of the Sixteenth ACM Conference on Conference on Information and Knowledge Management. CIKM ’07 (New York, NY, USA: ACM): 445–454. doi:10.1145/1321440.1321504. ISBN 9781595938039.
  24. Moran, Wes & Tretikova, Lila (2016). “Clarity on the future of Wikimedia search – Wikimedia blog”. Retrieved 2016-05-10.
  25. Nicas, J. and Kendall, B. (2016). “U.S. Expands Google Probe”. Wall Street Journal. ISSN 00999660.
  26. Notess, G.R., (2013). Search Engine to Knowledge Engine? Online Searcher, 37(4), pp. 61-63.
  27. Notess, G.R. (2016). SEARCH ENGINE update. Online Searcher, 40(2), pp. 8-9.
  28. Notess, G.R., (2016). SEARCH ENGINE update. Online Searcher, 40(1), pp. 8-9.
  29. Notess, G.R., (2014). Computational, Numeric, and Data Searching. Online Searcher, 38(4), pp. 65-67.
  30. Park, Taemin Kim (2011). “The visibility of Wikipedia in scholarly publications”. First Monday 16 (8). doi:5210/fm.v16i8.3492. ISSN1396-0466.
  31. Price, Gary (2016). “Digital Preservation Coalition Releases New Tech Watch Report on Preserving Social Media | LJ INFOdocket”. www.infodocket.com. Retrieved 2016-05-01.
  32. Ratfcliff, Chris (2016).“Six of the most interesting SEM news stories of the week” | Search Engine Watch”. Retrieved 2016-05-10.
  33. Robinson, R. (2016) How Google Stole the Work of Millions of Authors. Wall Street Journal. ISSN 00999660.
  34. Rowley, J. E.; Hartley, Richard J. (2008). Organizing Knowledge: An Introduction to Managing Access to Information. Ashgate Publishing, Ltd. ISBN9780754644316.
  35. Sandhu, A. K.; Liu, T. (2014). “Wikipedia search engine: Interactive information retrieval interface design”. 2014 3rd International Conference on User Science and Engineering (i-USEr): 18–23. doi:1109/IUSER.2014.7002670
  36. Sentance, R. (2016). “Everything you need to know about Wikimedia’s ‘Knowledge Engine’ so far | Search Engine Watch. Retrieved 2016-05-02.
  37. Simonite, Tom (2013).“The Decline of Wikipedia”. MIT Technology Review. Retrieved 2016-05-09.
  38. Swanson, Troy (2012). Managing Social Media in Libraries: Finding Collaboration, Coordination, and Focus. Elsevier. ISBN9781780633770.
  39. Van Dijck, José (2010). “Search engines and the production of academic knowledge”. International Journal of Cultural Studies 13 (6): 574–592. doi:1177/1367877910376582. ISSN1367-8779.
  40. Wells, David (2007). “What is a library OPAC?”. The Electronic Library 25 (4): 386–394. doi:1108/02640470710779790. ISSN0264-0473.

 

Bibliographic databases utilised

 

#1Lib1Ref – Wikipedia turns 16

Getting citations into Wikipedia – can you spare 16 minutes to mark Wikipedia’s 16th birthday?

 

#1Lib1Ref – 1 Librarian adding 1 Reference

 

It’s been quite the week in politics. #CitationDefinitelyNeeded

#1Lib1Ref – 1 Librarian adding 1 Reference

On Sunday 15th January 2017, Wikipedia will turn 16 years old. How often do you think you have used the free online encyclopaedia in this time?

In this Google Talk, the Wikimedia Foundation’s Executive Director, Katherine Maher, speaks engagingly about Wikipedia’s humble beginnings in 2001, where it is now and, importantly, where it is going.

To mark Wikipedia’s birthday, the Wikipedia Library are repeating their successful #1Lib1Ref campaign from last year. This global campaign “1 Librarian 1 Reference” (#1Lib1Ref) is to get Information Services professionals and educators adding citations to Wikipedia.

Events are taking place at the National Library of Scotland, the Bodleian Library in Oxford and all over the globe from January 15th to February 3rd 2017 but here at the University of Edinburgh we are kicking things off by asking you to spare a mere 16 minutes to mark Wikipedia’s 16 years on Friday 20th January 2017. (You won’t even need to leave your desk.)

Your 1,2,3 to taking part in next Friday’s #1Lib1Ref event.

 

  1. Have a nosy at what is involved: https://meta.wikimedia.org/wiki/The_Wikipedia_Library/1Lib1Ref – This link runs through the process (essentially finding one reference to back up a statement on Wikipedia that has no citation).
  2. Create a Wikipedia account ahead of Friday’s event. This 3 minute video shows what you need to do to set up your account. (NB: It is better to create an account at home ahead of time, as Wikipedia limits the number of accounts that can be created from a single IP address within a 24 hour period to a mere 6.)
  3. On the day itself – This 5 minute video demos what you need to do: essentially, using the Citation Hunt tool to find a Wikipedia page that is missing a citation and that you are interested in helping out, then finding a suitable reference to fill that knowledge gap. NB: This post from the Biodiversity Heritage Library also illustrates the process.

 

As you save your citation, please remember to add the hashtag #1Lib1Ref in your edit summary so that we can track participation in the event. We will announce these contributions on social media with the aim of strengthening Wikipedia’s links to scholarly publications and celebrating the collective expertise of the world’s Information Service professionals (so any pics you can share with the #1Lib1Ref hashtag would be greatly appreciated).

This is a chance to create incoming links or citations from articles that are usually the top Google hit for their topic. Citations can be to paper or electronic sources, that you are interested in professionally or otherwise. If you can supply citations for topics or authors that are under-represented in Wikipedia, then all the better. In January 2016, librarians around the world made thousands of edits to Wikipedia, with publicity seen by millions of people. You can read more about last year’s event here.

“We live in the information age and the aphorism ‘one who possesses information possesses the world’ of course reflects the present-day reality.” (Vladimir Putin in Interfax: Russia & CIS Presidential Bulletin, 30 June 2016).

To mark Wikipedia’s 16th birthday, and to assert that facts really do matter, let’s find Wikipedia pages we can help improve… and spend a few moments improving them with a reference (or two).

#FactsMatter #1Lib1Ref

Hot Topics and Cool Cats – Wikimania 2016 (22-26 June)

 

Wikimania

The annual conference celebrating Wikipedia and its sister projects was held in the alpine town of Esino Lario in the province of Lecco, Northern Italy, this year.

It was my first but I am led to believe that this year’s venue, and this year’s conference in general, was quite different from the ones in years gone by; certainly the rural location was quite different from the Hilton Hotel in Mexico City in 2015 and the Barbican in London in 2014.

This time Wikimania really was going outdoors.


Listen to a podcast roundup of Wikimania 2016 in Esino Lario, Italy, recorded on a bus after the Wikimania conference.

There was another gathering going on the day I left for the conference however: the EU referendum vote. Given that I was due to catch a 7.45am flight from Glasgow Airport on the day of the EU referendum, I left my vote in the hands of my girlfriend to cast on my behalf. (The thunderstorms that delayed the flight from landing at London Heathrow should have been a portent for the political turmoil to come.)


However, I was in good spirits despite the delay and, even when the consequence of the London storm was that I missed my bus connection from Milan airport to Esino Lario, I was busy contemplating how it might be nice to spend a bit more time travelling by train from Milan Central to Varenna-Esino. Fortunately, I found myself in the same boat as Lucy Crompton-Reid, CEO of Wikimedia UK, who had been on the same flight. A quick chat with a terrifically pleasant Italian gentleman at the Wikimania greeters’ table at the airport and a taxi was arranged to take us both the rest of the way to Esino Lario.

While we waited, and our charming Italian saviour checked our names off his list of expected delegates, we were told the sad tale of one particular delegate who earlier in the day had been told that his name definitely wasn’t on the list and would he mind checking the FIVE pages of names on the list himself to see that this was the case. Perplexed, the man had taken one long look at the list and replied, “But I’m Jimmy Wales.” (Needless to say, I think he probably made it back to Esino Lario okay after that, especially after a few selfies were taken with the volunteers from the local high school.)


A picturesque drive through Alpine country to Esino Lario in the company of Lucy’s incredibly entertaining, but incredibly dark, sense of humour and I got settled into the family-run hotel I was to spend the next four nights in. Once registered, I was able to wind my way through the narrow cobbled side streets to meet with my fellow Wikimaniacs at the central reception area.


The experience of the first night’s good-humoured chats was typical of the whole conference; here were Wikimaniacs from all over the world ostensibly divided by different backgrounds, languages & cultures but who were all united by their passion for working collaboratively & sharing open knowledge through Wikimedia’s projects.

So it was with some shock that I discovered the next morning that the referendum result had been that the UK had chosen to turn its back on working together as part of the EU. It just ran contrary to everything that Wikimania, and Wikimedia in general, was all about. Consequently, Jimmy Wales in his keynote address at the opening ceremony could not help but address this seismic decision back home in Britain. Clearly emotional, Jimmy Wales referenced the murder of his friend Jo Cox MP, the EU referendum & Donald Trump, when he asserted that Wikipedia was not about the rhetoric of hate or division or of building walls but rather was about building bridges. Wikipedia was instead a “force for knowledge and knowledge is a force for peace and understanding.”


The focus of the programme for Wikimania 2016, therefore, was on Wikipedia as a ‘driver for change’.

Watch Jimmy Wales’ keynote address here

Of course, I couldn’t get in to see the keynote in person. The venue, the Gym Palace, could only hold around six hundred people and with around 1200 Wikimaniacs, plus curious townspeople attending too, the venue and the wi-fi soon became saturated. Hence, a great many people, myself included, got turned away to watch the keynote opening ceremony via the live stream at a nearby hall. Unfortunately, the one thing that everyone had been worried about prior to the conference occurred; the wi-fi couldn’t cope and we were left with a pixelated image of the opening ceremony that got stuck in buffering limbo. Little wonder then that a massive cheer went up when the young Esino Lario volunteers put on a YouTube clip of ‘Cool cats doing crazy things’ to keep the audience entertained while they desperately tried to fix the live stream.


The town of Esino Lario itself only has a population of around 760 inhabitants so the people of Esino Lario really did invite the 1000+ Wikimaniacs into their homes and I can honestly say that we were treated extremely well by our hosts. The hope is that the experience of hosting Wikimania in such a small town will have an enormous impact on the local economy & a legacy such that their young people, who worked as volunteers to help run the events and made sure we were well looked after in terms of espresso & soft drinks while we walked in the heat of the afternoon sun from venue to venue, may hopefully look to careers in tech and become the next generation of Wikimedians.

The rest of the conference brought no further technical problems and everyone seemed to enjoy the relaxed atmosphere, and stunning views of the surrounding Alpine mountains, to learn & share both in formal presentations and informal discussions in-between times. There was also a preponderance of egalitarian community discussions to determine how each project should move forward which were recorded on Etherpad discussion pages (I made good use of these during the few days I was at the conference to follow real-time discussions at several venues at once.)


The ticketing system for meal times was a hit too as it meant you were allocated to a certain venue at a certain time so that you couldn’t stay in the same clique & always encountered new people to chat to over a delicious plate of pasta. The evening events – chocolate tasting, cheese & wine, evening hikes, line dancing, a live band, a falcon playing a theremin – all allowed for further discussions and it was a real pleasure to be able to learn through ‘play’ in such relaxed surroundings.


In terms of content, Wikidata proved its growing importance in the Wikimedia movement with a number of sessions threading through the conference and I was also pleased to see Open Street Map and Wikisource, the free content library, garnering greater attention & affection. The additional focus on education, especially higher education, with sessions on Wikipedia’s verifiability, the state of research on Wikipedia and the tidying up of citations was terrific to see. Overall though, it was great to see further focus on translation between Wikipedias and on areas of under-representation: on the gender gap and on the Global South in particular. As one session put it, there is only one international language: translation.

Watch all the talks at Wikimania 2016 on their Youtube channel


In a nutshell, the weather was hot, the espresso was hot and the whole town was a hotbed of ideas with people on every street corner discussing the projects they were working on or wanted to find out more about. #Brexit was the hot topic of conversation too but it felt a million miles away; completely unreal & counter-intuitive when the fruits of cross-border collaboration were there for all to see at every turn. People I had encountered only in the online world I was finally able to meet in the flesh and warmly discuss past, present & future collaborations. It was especially pleasing to be able to meet the Wikipedia Library’s Alex Stinson and my Edinburgh Spy Week: Women in Espionage editathon collaborator, Rosie Stephenson-Goodknight from WikiProject Women in Red, who deservedly had just been made Wikipedian of the Year for the work WikiWomeninRed had done in helping to address the gender gap. Warm hugs and warm handshakes about working together were what Wikimania meant to me.


Boarding the bus for the airport home on the Monday morning, I was able to listen in on Andrew Lih’s (author of ‘The Wikipedia Revolution’) roundtable discussion with the Wikimedia Foundation’s James Forrester and Cambridge University’s Wikimedian, Deryck Chan, about their reflections on Wikimania 2016 (as it was recorded as a podcast on the bus at the table of seats nearby).

Listening to their summary of proceedings while I looked out the window at the rolling Alpine foothills & waterfalls proved a nice full-stop to proceedings as it confirmed what UNESCO Wikimedian in Residence, John Cummings, had told me first and many, many others had said since… this was the best Wikimania ever.

A little light Summer reading – Wikipedia & the PGCAP course

I was pleased we were able to host a week themed on ‘Wikimedia & Open Knowledge’ as part of the University of Edinburgh’s Postgraduate Certificate of Academic Practice.

Participants on the course were invited to think critically about the role of Wikipedia in academia.

In particular, to read, consider, contrast and discuss four articles:

  • The first, by Dr. Martin Poulter, Wikimedian in Residence at the University of Oxford, is highly recommended for the way it articulates the role of Wikipedia & its sister projects in allowing digital ‘shiver-inducing’ contact with library & archival material;
Search Failure: The Challenge of Modern Information Retrieval in an age of information explosion.

In addition – RECOMMENDED reading on Wikipedia’s role in academia.

 

  1. https://wikiedu.org/blog/2014/10/14/wikipedia-student-writing/ – HIGHLY RECOMMENDED
  2. https://outreach.wikimedia.org/wiki/Education/Reasons_to_use_Wikipedia
  3. http://www.theatlantic.com/technology/archive/2016/05/people-love-wikipedia/482268/
  4. https://medium.com/@oiioxford/wikipedia-s-ongoing-search-for-the-sum-of-all-human-knowledge-6216fb478bcf#.5gf0mu71b  RECOMMENDED
  5. https://wikiedu.org/blog/2016/01/14/wikipedia-15-and-education/
  6. https://www.refme.com/blog/2016/01/15/wikipedia-the-digital-gateway-to-academic-research

This was my response to the reading (and some additional reading).

Title:

Search failure: the challenges facing information retrieval in an age of information explosion.

 

Abstract:

This article takes, as its starting point, the news that Wikipedia were reportedly developing a ‘Knowledge Engine’ and focuses on the most dominant web search engine, Google, to examine the “consecrated status” (Hillis, Petit & Jarrett, 2013) it has achieved and its transparency, reliability & trustworthiness for everyday searchers.

 

Introduction:

The purpose of this article is to examine the pitfalls of modern information retrieval & attempts to circumnavigate them, with a focus on the main issues surrounding Google as the world’s most dominant search engine.

 

“Commercial search engines dominate search-engine use of the Internet, and they’re employing proprietary technologies to consolidate channels of access to the Internet’s knowledge and information.” (Cuthbertson, 2016)

 

On 16th February 2016, Newsweek published a story entitled ‘Wikipedia Takes on Google with New ‘Transparent’ Search Engine’. The figure applied for, and granted by the Knight Foundation, was a reported $250,000 as part of the Wikimedia Foundation’s $2.5 million programme to build ‘the Internet’s first transparent search engine’.

The sum applied for was relatively insignificant when compared to Google’s reported $75 billion revenue in 2015 (Robinson, 2016). Yet, it posed a significant question; a fundamental one. Just how transparent is Google?

 

Two further concerns can be identified from the letter to Wikimedia granting the application: “supporting stage one development of the Knowledge Engine by Wikipedia, a system for discovering reliable and trustworthy public information on the Internet.” (Cuthbertson, 2016). This goes to the heart of the current debate on modern information retrieval: transparency, reliability and trustworthiness. How then are we faring in these three measures?

 

  1. Defining Information Retrieval

Information Retrieval is defined as “a field concerned with the structure, analysis, organisation, storage, searching, and retrieval of information.” (Salton in Croft, Metzler & Strohman, 2010, p.1).

Croft et al (2010) identify three crucial concepts in information retrieval:

  • Relevance – Does the returned result satisfy the user searching for it?
  • Evaluation – How well does the ranking algorithm perform in terms of precision and recall?
  • Information Needs – What need generated the query in the first place?
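Precision and recall, the two evaluation measures named above, can be made concrete with a small sketch. The page names and the relevance judgements below are hypothetical and purely illustrative; the definitions themselves are the standard ones (precision is the fraction of returned results that are relevant, recall is the fraction of relevant items that were returned).

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: items the search engine returned
    relevant:  items a human judge considers relevant to the query
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set  # relevant items that were actually returned

    # Guard against empty inputs so we never divide by zero.
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical query: four pages returned, three pages judged relevant.
retrieved = ["page_a", "page_b", "page_c", "page_d"]
relevant = ["page_a", "page_c", "page_e"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In this toy case half of what was returned is relevant, while one relevant page was missed altogether, which is the kind of trade-off a ranking algorithm is evaluated on.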

Today, since the advent of the internet, this definition needs to be understood in terms of how pervasive ‘search’ has become. “Search is the way we now live.” (Darnton in Hillis, Petit & Jarrett, 2013, p.5). We are all now ‘searchers’ and the act of ‘searching’ (or ‘googling’) has become intrinsic to our daily lives.

 

  1. Dominance of one search engine

 

“When you turn on a tap you expect clean water to come out and when you do a search you expect good information to come out” (Swift in Hillis, Petit & Jarrett, 2013)

 

With over 60 trillion pages (Fichter and Wisniewski, 2014) and terabytes of unstructured data to navigate, the need for speedy & accurate responses to millions of queries has never been more important.

 

Navigating the vast sea of information present on the web means the field of Information Retrieval necessitates wrestling with, and constantly tweaking, the design of complex computer algorithms (determining a top 10 list of ‘relevant’ page results through over 200 factors).

 

Google, powered by its PageRank algorithm, has dominated I.R. since the late 1990s, indexing the web like a “back-of-the-book” index (Chowdhury, 2010, p.5). While this oversimplifies the complexity of the task, modern information retrieval, in searching through increasingly multimedia online resources, has necessitated the addition of newer, more sophisticated models. Utilising ‘artificial intelligence’ & semantic search technology to complement the PageRank algorithm, Google now navigates through the content of pages & generates suggested ‘answers’ to queries as well as the 10 clickable links users commonly expect.

 

According to 2011 figures in Hillis, Petit & Jarrett (2013), Google processed 91% of searches internationally and 97.4% of the searches made using mobile devices. This undoubted & sustained dominance has led to accusations of abuse of power in two recent instances.

 

Nicas & Kendall (2016) report that the Federal Trade Commission along with European regulators are examining claims that Google has been abusing its position in terms of smartphone companies feeling they had to give Google Services preferential treatment because of Android’s dominance.

 

In addition, Robinson (2016) states that the Authors Guild are petitioning the Supreme Court over Google’s alleged copyright infringement, going back a decade to when over 20 million library books were digitised without compensation or author/publisher permission. The argument is that the content taken has since been utilised by Google for commercial gain to generate more traffic and more advertising money, and thus to confer on them market leader status. This echoes the New Yorker article’s response to Google’s aspiration to build a digital universal library: “Such messianism cannot obscure the central truth about Google Book Search: it is a business” (Toobin in Hillis, Petit & Jarrett, 2013).

 

  1. PageRank

Google’s business is powered, like every search engine, by its ranking algorithm. For Cahill et al (2009), Google’s “PageRank is a quantitative rather than qualitative system”. PageRank works by ranking pages in terms of how well linked a page is, how often it is clicked on and the importance of the pages that link to it. In this way, PageRank assigns importance to a page.
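The link-based part of this idea can be illustrated with a minimal sketch of the PageRank calculation as originally published: a page’s score depends on how many pages link to it and on how highly those linking pages score themselves. This is a simplified teaching model, not Google’s production system; click data is not modelled here, and the four-page toy web and the function name are assumptions made for illustration. The damping factor of 0.85 is the value commonly quoted from the original PageRank paper.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of scores.

    links: dict mapping each page to the list of pages it links to.
           Every link target must itself be a key of the dict.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal importance

    for _ in range(iterations):
        # Every page receives a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page passes its importance on, split across its links.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: "home" is linked to by every other page,
# while nothing links to "orphan".
toy_web = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "orphan": ["home"],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

Even in this tiny example the best-linked page comes out on top and the page nobody links to comes last, whatever the quality of its content, which is the sense in which the measure is quantitative rather than qualitative.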

 

Other parameters are taken into consideration including, most notably, the anchor text which provides a short descriptive summary of the page it links to. However, the anchor text has been shown to be vulnerable to manipulation, primarily from bloggers, by the process known as ‘Google bombing’. Google bombing is defined as “the activity of designing Internet links that will bias search engine results so as to create an inaccurate impression of the search target” (Price in Bar-Ilan, 2007). Two famous examples include when Microsoft came up as the top result for the query ‘More evil than Satan’ and when President Bush ranked as the first result for ‘miserable failure’. Bar-Ilan (2007) suggests google bombs come about for a variety of reasons: ‘fun’, ‘personal promotion’, ‘commercial’, ‘justice’, ‘ideological’ and ‘political’.

 

Although Google was reluctant to alter search results, the reputational damage google bombs were causing necessitated a response. In the end, Google altered the algorithm to defuse a number of google bombs. Despite this, “spam or joke sites still float their way to the top.” (Cahill et al, 2009) so there is a clear argument to be had about Google, as a private corporation, continuing to ‘tinker’ with the results delivered by its algorithm and how much its coders should, or should not, arbitrate access to the web in this way. After all, the algorithm will already bear hallmarks of their own assumptions without any transparency on how these decisions are arrived at. Further, Google Bombs, Byrne (2004) argues, empower those web users whom the ranking system, for whatever reason, has disenfranchised.

 

Just how reliable & trustworthy is Google?

 

“Easy, efficient, rapid and total access to Truth is the siren song of Google and the culture of search. The price of access: your monetizable information.” (Hillis, Petit & Jarrett, 2013, p.7)

For Cahill et al (2009), Google has made the process of searching too easy and searchers have become lazier as a result, accepting Google’s ranking at face value. Markland in van Dijck (2010) makes the point that students’ favouring of Google means they are dispensing with the services libraries provide. The implication is that, despite library information services delivering more relevant & higher quality search results, Google’s quick & easy ‘fast food’ approach is hard to compete with.

This seemingly default trust in the neutrality of Google’s ranking algorithm also has a ‘funnelling effect’ according to Beel & Gipp (2009); narrowing the sources clicked upon 90% of the time to just the first page of results with a 42% click through on the first choice alone. This then creates a cosy consensus in terms of the fortunate pages clicked upon which will improve their ranking while “smaller, less affluent, alternative sites are doubly punished by ranking algorithms and lethargic searchers.” (Pan et al. in van Dijck, 2010)

 

While Google would no doubt argue that all search engines closely guard how their ranking algorithms are calibrated to protect them from aggressive competition, click fraud and SEO marketing, the secrecy is clearly at odds with principles of public librarianship. Further, Van Dijck (2010) argues that this worrying failure to disclose is concealing how knowledge is produced through Google’s network and the commercial nature of Google’s search engine. After all, a search engine’s greatest asset is the metadata each search leaves behind. This data can be aggregated and used by the search engine to create profiles of individual search behaviour and collective profiles which can then be passed on to other commercial companies for profit. That is not to say it always is but there is little legislation to stop it in an area that is largely unregulated. The right to privacy does not, it seems, extend to metadata and ‘in an era in which knowledge is the only bankable commodity, search engines own the exchange floor.’ (Halavais in van Dijck, 2010)

 

  1. Scholarly knowledge and the reliability of Google Scholar

When considering the reliability, transparency & trustworthiness of Google and Google Scholar it is pertinent to look at its scope and how it differs from other similar sites. Unlike PubMed and Web of Science, Google Scholar is not a human-curated database but an internet search engine; its accuracy & content therefore vary greatly depending on what has been submitted to it. One advantage Google Scholar does have is that it searches the full text of articles, so users may find searching easier on Scholar than on WoS or PubMed, which are limited to searching according to the abstract, citations or tags.

Where Google Scholar could be more transparent is in its coverage as some notable publishers have been known, according to van Dijck (2010), to refuse to give access to their databases. Scholar has also been criticised for the lack of completeness of its citations, as well as its coverage of social science and humanities databases; the latter an area of strength for Wikipedia according to Park (2011). But the searcher utilising Google Scholar would be unaware of these problems of scope when they came to use it.

Further, Beel & Gipp (2009) state that the ranking system on Google Scholar leads to articles with lots of citations receiving higher rankings and, as a result, receiving even more citations. Hence, while the digitization of sources on the internet opens up new avenues for scholarly exploration, ranking systems can be seen to close ranks on a select few to the exclusion of others.
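The feedback loop described here can be sketched as a toy simulation. Everything in it is an assumption made for illustration: the three articles, their starting citation counts, and the rule that the top-ranked article attracts five new citations per round while the second and third attract three and two. It is not a model of how Google Scholar actually ranks or how citations really accrue; it only shows how a small early lead can be amplified when visibility follows rank.

```python
def simulate_citation_feedback(citations, new_per_position, rounds):
    """Toy model: rank articles by citations, then hand out new
    citations according to rank position, and repeat.

    citations:        dict of article -> starting citation count
    new_per_position: new citations gained per round by rank position
                      (first entry goes to the top-ranked article)
    rounds:           number of ranking-and-citing cycles to run
    """
    counts = dict(citations)  # copy so the starting data is left untouched
    for _ in range(rounds):
        # Highest citation count is ranked first.
        ranked = sorted(counts, key=lambda title: counts[title], reverse=True)
        for title, gained in zip(ranked, new_per_position):
            counts[title] += gained
    return counts


# Hypothetical starting point: three articles of near-equal standing.
start = {"article_a": 12, "article_b": 10, "article_c": 9}
end = simulate_citation_feedback(start, new_per_position=[5, 3, 2], rounds=10)

share_before = start["article_a"] / sum(start.values())
share_after = end["article_a"] / sum(end.values())
print(end)
print(f"leader's share of citations: {share_before:.0%} -> {share_after:.0%}")
```

Under these assumed numbers the early leader’s share of all citations grows steadily, even though nothing about the relative quality of the three articles has changed.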

As Van Dijck (2010) points out: “Popularity in the Google-universe has everything to do with quantity and very little with quality or relevance.” In effect, ranking systems determine which sources we can see but conceal how this determination has come about. This means that we are unable to truly establish the scope & relevance of our search results. In this way, search engines cannot be viewed as neutral, passive instruments but are instead active “actor networks” and “co-producers of academic knowledge.” (van Dijck, 2010).

Further, it can be argued that Google decides which sites are included in its top ten results. With so much to gain commercially from being discoverable on Google’s first page of results, the practice of Search Engine Optimisation (SEO), or manipulating the algorithm to get your site into the top ten search results, has become widespread. SEO techniques can be split into ‘white hat’ (legitimate businesses with a relevant product to sell) and ‘black hat’ (sites that just want clicks and tend not to care about the ‘spamming’ techniques they employ to get them). As a result, PageRank has to be constantly adjusted, as with Google bombs, to counteract the effects of increasingly sophisticated ‘black hat’ techniques. Hence, vigilance & critical evaluation of the results returned by Google have become crucial skills in modern information retrieval.

 

  1. The solution: Google’s response to modern information retrieval – Answer Engines

Google is the great innovator and is always seeking newer, better ways of keeping users on its sites and improving its search algorithm. Hence, the arrival of Google Instant in 2010 to autofill suggested keywords to assist searchers. This was followed by Google’s Knowledge Graph (and its Microsoft equivalent Bing Snapshot). These new services seek not just to provide the top ten links to a search query but also to ‘answer’ it by providing a number of the most popular suggested answers on the page results screen (usually showing an excerpt of the related Wikipedia article & images along the side panel), based on, & learning from, previous users’ searches on that topic.

Google’s Knowledge Graph is supported by sources including Wikipedia & Freebase (and the linked data they provide) along with a further innovation, RankBrain, which utilises artificial intelligence to help decipher the 15% of queries Google has not seen before. As Barr (2016) recognises: “A.I. is becoming increasingly important to extract knowledge from Google’s sea of data, particularly when it comes to classifying and recognizing patterns in videos, images, speech and writing.”

Bing Snapshot does much the same. The difference is that Bing provides links to the sources it uses as part of the ‘answers’ it provides. Google provides information but does not attribute it and, without attribution, it is impossible to verify the accuracy of its answers. This seems to be one of the thorniest issues in modern information retrieval: link decay and the disappearing digital provenance of sources. This is in stark contrast to Wikimedia’s efforts in creating Wikidata: “an open-license machine-readable knowledge base” (Dewey, 2016) capable of storing digital provenance & structured bibliographic data. Therefore, while Google Knowledge Panels are a step forward, there are again issues over their transparency, reliability & trustworthiness.

Moreover, the 2014 EU Court ruling on ‘the right to be forgotten’, which Google have stated they will honour, also muddies the waters on issues of transparency & link decay/censorship:

“Accurate search results are vanishing in Europe with no public explanation, no real proof, no judicial review, and no appeals process … the result is an Internet riddled with memory holes — places where inconvenient information simply disappears.” (Fioretti, 2014).

The balance between an individual’s “right to be forgotten” and the freedom of information clearly still has to be found. At the moment, in the name of transparency, both Google and Wikimedia are posting notifications to affected pages that they have received such requests. For those wishing to be ‘forgotten’ this only highlights the matter & fuels speculation unnecessarily.

 

  1. The solution: Wikipedia’s ‘transparent’ search engine: Discovery

Since the ‘Discovery’ team was set up in April 2015 and the Knight Foundation grant was disclosed, there have been mixed messages from Wikimedia, with some claiming that there was never any plan to rival Google because a newer ‘internal’ search engine was only ever being developed to integrate Wikimedia projects through one search portal.

Ultimately, a reported lack of consultation between the board and the wider Wikimedia community undermined the project & culminated in the resignation of Lila Tretikov, Executive Director of the Wikimedia Foundation, at the end of February 2016, after which the plans for Discovery were shelved.

However, Sentance (2016) reveals that, in their leaked planning documents for Discovery, the Foundation were indeed looking at the priorities of proprietary search engines, their own reliance on them for traffic and how they could recoup traffic lost to Google (through Google’s Knowledge Graph) at the same time as providing a central hub for information from across all their projects through one search portal. Wikipedia results, after all, regularly featured in the top page of Google results anyway – why not skip the middle man?

Quite how internet searchers may have taken to a completely transparent, non-commercial search engine we’ll possibly never know. However, it remains a tantalizing prospect.

 

  1. The solution: Alternative search engines

An awareness of the alternative search engines available and their different strengths and weaknesses is a key component of the information literacy needed to navigate this sea of information. Bing Snapshot, for instance, currently does more than Google to provide the digital provenance of its sources.

Notess (2016) serves notice that computational searching (e.g. Wolfram Alpha) continues to flourish along with search engines geared towards data & statistics (e.g. Zanran, DataCite.org and Google Public Data Explorer).

However, knowing that these differing search engines exist is one thing; knowing how to navigate them successfully is quite another, as Notess (2016) himself concludes: “Finding anything beyond the most basic of statistics requires perseverance and experimenting with a variety of strategies.”

Information literacy, it seems, is key.

 

  1. The solution: The need for information literacy

Given that electronic library services are maintained by information professionals, “values such as quality assessment, weighed evaluation & transparency” (van Dijck, 2010) are in much greater evidence than in commercial search engines. That is not to say that there aren’t still issues in library OPAC systems: whether it be in terms of the changes in the classification system used over time or the differing levels of adherence by staff to these classification protocols; or the communication to users of best practice in utilising the system.

The use of any search engine requires literacy among its users. The fundamental problem remains the disconnect between what a user inputs and what they can feasibly expect at the results stage. Understanding the nature of the search engine being used (proprietary or otherwise), a critical awareness of how knowledge is formed through its network, and the type of search statement that will maximise your chances of success are all vital. As van Dijck (2010) states: “Knowledge is not simply brokered (‘brought to you’) by Google or other search engines… Students and scholars need to grasp the implications of these mechanisms in order to understand thoroughly the extent of networked power”.

Educating users about this broadens the search landscape and defuses SEO attempts to circumvent our choices. Information literacy cannot be left to academics or information professionals alone, though they can play a large part in its dissemination. As mentioned at the beginning, we are all ‘searchers’. Therefore, it is incumbent on all of us to become literate in the ways of ‘search’ and to pass that literacy on, creating our own knowledge networks. Social media offers us a means of doing this, allowing us to filter information as never before, and filtering is “transforming how the web works and how we interact with our world” (Swanson, 2012).

 

Conclusion

Google may never become any more transparent. Hence, its reliability & trustworthiness will always be hard to judge. Wikipedia’s Knowledge Engine may have offered a distinctive model more in line with these terms but it is unlikely, at least for now, to be able to compete as a global crawler search engine.

 

 

Therefore, it is incumbent on searchers not to presume neutrality or to attribute any kind of benign munificence to any one search engine. Rather, by educating themselves as to the merits & drawbacks of Google and other search engines, users will be able to formulate their searches, and their use of search engines, with a degree of information literacy. Only then can they hope that the returned results will match their individual needs with any degree of satisfaction or success.

Bibliography

  1. Arnold, A. (2007). Artificial intelligence: The dawn of a new search-engine era. Business Leader, 18(12), pp. 22.
  2. Bar‐Ilan, Judit (2007). “Manipulating search engine algorithms: the case of Google”. Journal of Information, Communication and Ethics in Society 5 (2/3): 155–166. doi:10.1108/14779960710837623. ISSN 1477-996X.
  3. Barr, A. (2016). WSJ.D Technology: Google Taps A.I. Chief To Replace Departing Search-Engine Head. Wall Street Journal. ISSN 0099-9660.
  4. Beel, J.; Gipp, B. (2009). “Google Scholar’s ranking algorithm: The impact of citation counts (An empirical study)”. 2009 Third International Conference on Research Challenges in Information Science: 439–446. doi:10.1109/RCIS.2009.5089308.
  5. Byrne, S. (2004). Stop worrying and learn to love the Google-bomb. Fibreculture, (3).
  6. Cahill, Kay; Chalut, Renee (2009). “Optimal Results: What Libraries Need to Know About Google and Search Engine Optimization”. The Reference Librarian 50 (3): 234–247. doi:10.1080/02763870902961969. ISSN 0276-3877.
  7. Chowdhury, G.G. (2010). Introduction to modern information retrieval. Facet. ISBN 9781856046947.
  8. Croft, W. Bruce; Metzler, Donald; Strohman, Trevor (2010). Search Engines: Information Retrieval in Practice. Pearson Education. ISBN 9780131364899.
  9. Cuthbertson, A. (2016). “Wikipedia takes on Google with new ‘transparent’ search engine”. Available at: http://europe.newsweek.com/wikipedia-takes-google-new-transparent-search-engine-427028. Retrieved 2016-05-08.
  10. Dewey, Caitlin (2016). “You probably haven’t even noticed Google’s sketchy quest to control the world’s knowledge”. The Washington Post. ISSN 0190-8286. Retrieved 2016-05-13.
  11. Fichter, D. and Wisniewski, J. (2014). Being Findable: Search Engine Optimization for Library Websites. Online Searcher, 38(5), pp. 74-76.
  12. Fioretti, Julia (2014). “Wikipedia fights back against Europe’s right to be forgotten”. Reuters. Retrieved 2016-05-02.
  13. Foster, Allen; Rafferty, Pauline (2011). Innovations in Information Retrieval: Perspectives for Theory and Practice. Facet. ISBN 9781856046978.
  14. Gunter, Barrie; Rowlands, Ian; Nicholas, David (2009). The Google Generation: Are ICT Innovations Changing Information-seeking Behaviour?. Chandos Publishing. ISBN 9781843345572.
  15. Halcoussis, Dennis; Halverson, Aniko; Lowenberg, Anton D.; Lowenberg, Susan (2002). “An Empirical Analysis of Web Catalog User Experiences”. Information Technology and Libraries 21 (4). ISSN 0730-9295.
  16. Hillis, Ken; Petit, Michael; Jarrett, Kylie (2012). Google and the Culture of Search. Routledge. ISBN 9781136933066.
  17. Hoffman, A.J. (2016). Reflections: Academia’s Emerging Crisis of Relevance and the Consequent Role of the Engaged Scholar. Journal of Change Management, 16(2), pp. 77.
  18. Kendall, Susan. “LibGuides: PubMed, Web of Science, or Google Scholar? A behind-the-scenes guide for life scientists: So which is better: PubMed, Web of Science, or Google Scholar?”. libguides.lib.msu.edu. Retrieved 2016-05-02.
  19. Koehler, W.C. (1999). “Classifying Web sites and Web pages: the use of metrics and URL characteristics as markers”. Journal of Librarianship and Information Science 31 (1): 21–31. doi:10.1177/0961000994244336. ISSN 0000-0000.
  20. LaFrance, Adrienne (2016). “The Internet’s Favorite Website”. The Atlantic. Retrieved 2016-05-12.
  21. Lecher, Colin (2016). “Google will apply the ‘right to be forgotten’ to all EU searches next week”. The Verge. Retrieved 2016-04-29.
  22. Mendez-Wilson, D (2000). ‘Humanizing The Online Experience’, Wireless Week, 6, 47, p. 30, Business Source Premier, EBSCOhost, viewed 1 May 2016.
  23. Milne, David N.; Witten, Ian H.; Nichols, David M. (2007). “A Knowledge-based Search Engine Powered by Wikipedia”. Proceedings of the Sixteenth ACM Conference on Conference on Information and Knowledge Management. CIKM ’07 (New York, NY, USA: ACM): 445–454. doi:10.1145/1321440.1321504. ISBN 9781595938039.
  24. Moran, Wes & Tretikov, Lila (2016). “Clarity on the future of Wikimedia search – Wikimedia blog”. Retrieved 2016-05-10.
  25. Nicas, J. and Kendall, B. (2016). “U.S. Expands Google Probe”. Wall Street Journal. ISSN 0099-9660.
  26. Notess, G.R. (2013). Search Engine to Knowledge Engine? Online Searcher, 37(4), pp. 61-63.
  27. Notess, G.R. (2016). SEARCH ENGINE update. Online Searcher, 40(2), pp. 8-9.
  28. Notess, G.R. (2016). SEARCH ENGINE update. Online Searcher, 40(1), pp. 8-9.
  29. Notess, G.R. (2014). Computational, Numeric, and Data Searching. Online Searcher, 38(4), pp. 65-67.
  30. Park, Taemin Kim (2011). “The visibility of Wikipedia in scholarly publications”. First Monday 16 (8). doi:10.5210/fm.v16i8.3492. ISSN 1396-0466.
  31. Price, Gary (2016). “Digital Preservation Coalition Releases New Tech Watch Report on Preserving Social Media | LJ INFOdocket”. www.infodocket.com. Retrieved 2016-05-01.
  32. Ratcliff, Chris (2016). “Six of the most interesting SEM news stories of the week | Search Engine Watch”. Retrieved 2016-05-10.
  33. Robinson, R. (2016). How Google Stole the Work of Millions of Authors. Wall Street Journal. ISSN 0099-9660.
  34. Rowley, J. E.; Hartley, Richard J. (2008). Organizing Knowledge: An Introduction to Managing Access to Information. Ashgate Publishing, Ltd. ISBN 9780754644316.
  35. Sandhu, A. K.; Liu, T. (2014). “Wikipedia search engine: Interactive information retrieval interface design”. 2014 3rd International Conference on User Science and Engineering (i-USEr): 18–23. doi:10.1109/IUSER.2014.7002670.
  36. Sentance, R. (2016). “Everything you need to know about Wikimedia’s ‘Knowledge Engine’ so far | Search Engine Watch”. Retrieved 2016-05-02.
  37. Simonite, Tom (2013). “The Decline of Wikipedia”. MIT Technology Review. Retrieved 2016-05-09.
  38. Swanson, Troy (2012). Managing Social Media in Libraries: Finding Collaboration, Coordination, and Focus. Elsevier. ISBN 9781780633770.
  39. Van Dijck, José (2010). “Search engines and the production of academic knowledge”. International Journal of Cultural Studies 13 (6): 574–592. doi:10.1177/1367877910376582. ISSN 1367-8779.
  40. Wells, David (2007). “What is a library OPAC?”. The Electronic Library 25 (4): 386–394. doi:10.1108/02640470710779790. ISSN 0264-0473.

 

Bibliographic databases utilised