SCROLL DOWN TO READ THE POST
JSTOR Text Analyzer
Upload or drag a document–an article, a Google document, a paper you are writing, a PDF or even an image–into what JSTOR is calling its magic box, and Text Analyzer will analyze it to identify prioritized terms.
Prioritized terms allow researchers to easily identify relevant content from within the JSTOR database. Users may manipulate these terms–add more terms or phrases, delete terms or adjust their importance–to improve the relevance of their results.
JSTOR’s announcement blog post by Alex Humphreys explains how Text Analyzer might support the novice as well as the more experienced academic researcher, noting that keyword search is not always a perfect strategy.
Thinking only of keyword search within an academic context: junior researchers sometimes flail and thrash as they figure out the right keywords for their search – they know what they want, but what set of jargon-y terms will help them find it? At the other end of the spectrum, more experienced researchers can find themselves caught in discipline- or citation-based silos, unaware of what they are unaware of (until the peer review feedback comes in…). I think JSTOR Labs might have something to help with these problems.
It’s pretty flexible. It’ll accept most kinds of documents: PDFs, Word, html, etc. You can cut-and-paste text into it. If you paste or drag a URL into what we’ve been calling “the magic box,” the tool will go to the web page and analyze the text of that page (this works for Google Docs, too). If you access the tool on your phone, it will encourage you to use your phone’s camera to take a picture of a page of text, which it will read and then search based on that text. Heck, if you upload a picture without any text in it, it will try to recognize what’s in that picture and search on that (with, to be honest, varying degrees of success – when I uploaded my headshot, the tool searched for the terms “bald hill person”).
Text Analyzer can examine any of the following file types: csv, doc, docx, gif, htm, html, jpg, jpeg, json, pdf, png, pptx, rtf, tif (tiff), txt, xlsx.
Text Analyzer works on text and PDF documents by extracting existing text. On image-based documents, the tool makes use of Optical Character Recognition (OCR). The identified text is then analyzed for topics and entities–people, places and organizations and leverages the controlled vocabulary of JSTOR’s Thesaurus.
About Joyce Valenza
Joyce is an Assistant Professor of Teaching at Rutgers University School of Information and Communication, a technology writer, speaker, blogger and learner. Follow her on Twitter: @joycevalenza
SLJ Blog Network