Do you ever wonder what the Indiana University Archives is doing to capture the university’s online presence, such as web sites and social media?
Since 2005 we’ve been capturing and archiving exact dated copies of web sites produced by administrative offices, schools, departments, service units, institutes, and faculty, student, and alumni organizations on the Indiana University Bloomington campus using Archive-It, a service of the Internet Archive. Web pages are captured and preserved exactly as they appear at a given time, so that in the future, even if a website changes in appearance or is no longer online, users will be able to access exact copies of the site’s appearance and operation at the time of the capture. Essentially, this wonderful preservation tool keeps an “online paper trail” of the updates and progressions that sites have made through the years. For example, this is how the web site for the IU Libraries appeared in September 2007!
Until recently, however, there was one area of the web that the IU Archives had yet to tackle in its online archive: the various Indiana University-affiliated social media sites on platforms such as Twitter, Facebook, and Instagram. This summer, we’ve taken on the exciting project of crawling the University’s social media sites for the first time. With the completion of this project, a collection of Indiana University’s social media sites from 2017 onward will be made publicly available for future users to access!
Web crawlers (the technology that Archive-It uses to capture copies of websites) have a lot of important applications in online work. A crawler is essentially a piece of software that acts as a URL discovery tool: when you give a crawler a URL to start with, it follows all of the links on that page, then follows any new links that it discovers on those pages, and so on. Ultimately, you should end up with a set of data about every page that can be reached within a given website, limited only by how much content you would like to capture. Crawlers are what search engines like Google and Bing use to gather and index information about websites, so that they can retrieve a list of matching sites when a search query is entered. Crawlers are also used by web developers to gather information from sites, which can then be used for all sorts of data analysis.
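For the curious, here is a minimal Python sketch of the link-following idea described above. It is only an illustration and is not how Archive-It's own crawler is written. To keep it runnable without touching the network, it crawls a tiny made-up site stored in a dictionary; the example.edu URLs, the `fetch` callable, and the `max_pages` limit are all assumptions invented for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_url, fetch, max_pages=100):
    """Breadth-first crawl from seed_url, staying on the seed's host.

    fetch is a hypothetical callable that maps a URL to an HTML string,
    or returns None when the page is unavailable.
    Returns the URLs that were captured, in the order they were visited.
    """
    seed_host = urlparse(seed_url).netloc
    queue = deque([seed_url])   # URLs discovered but not yet visited
    seen = {seed_url}           # every URL ever queued, to avoid loops
    captured = []
    while queue and len(captured) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        captured.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" suffixes.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == seed_host and link not in seen:
                seen.add(link)
                queue.append(link)
    return captured


# A tiny in-memory "website" (made-up URLs) standing in for real HTTP requests.
SITE = {
    "https://example.edu/": '<a href="/about">About</a> <a href="/news">News</a>',
    "https://example.edu/about": '<a href="/">Home</a> <a href="https://other.example/">Elsewhere</a>',
    "https://example.edu/news": '<a href="/news/2017#top">2017</a>',
    "https://example.edu/news/2017": "<p>No further links.</p>",
}

pages = crawl("https://example.edu/", SITE.get)
print(pages)
```

The "stay on the seed's host" check and the `max_pages` cap are a very rough stand-in for deciding how much content you would like to capture; a real archiving crawl would also save each page's content rather than just listing its URL.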
And of course, as demonstrated by our social media archiving project, crawlers are also very useful for web archiving: capturing and saving information about how a website exists at a given time so that it can be used for reference in the future. The Archive-It.org platform is a great resource for doing this kind of work. It has an extensive and frequently updated help center with a lot of useful reference pages, including a page with information about scoping crawls for specific types of social media sites.
In addition to the aforementioned Twitter, Facebook, and Instagram pages, we are also working on archiving any YouTube, Google Plus, LinkedIn, Flickr, and Pinterest pages that are associated with various departments, units, and groups within the Indiana University community. It is amazing to look at all of the different social media platforms that these organizations are utilizing in order to share great content and to interact with people from all over the world. We can imagine that the internet users of the future will be fascinated to see what these sites looked like and what everyone at Indiana University was talking about in 2017.
Check out Indiana University at Archive-It.org to access all of the recently archived Indiana University social media sites, along with captures of many other University web pages through the years!