Title | ICM561 CASE Study IM2494ST1 |
---|---|
Author | ANIQ DANIEL RASHID |
Course | Web Archiving |
Institution | Universiti Teknologi MARA |
FACULTY OF INFORMATION MANAGEMENT
BACHELOR OF INFORMATION SCIENCE (HONS.) CONTENT MANAGEMENT (IM249)
WEB ARCHIVING (ICM561)
INDIVIDUAL ASSIGNMENT: CASE STUDY
GROUP: IM2494ST1
PREPARED BY: ANIQ DANIEL BIN RASHID (2019219582)
PREPARED FOR: DR FARRAH DIANA BINTI SAIFUL BAHRY
DATE OF SUBMISSION: 3RD JUNE 2021
ACKNOWLEDGEMENT
Praise be to Allah, the Most Gracious, the Most Merciful. By the grace and help of Almighty God, I have been able to complete this individual assignment successfully. All praise and thanks to Allah for the favour bestowed upon me in completing this assignment. I would also like to thank my lecturer, Dr Farrah Diana Binti Saiful Bahry from the Faculty of Information Management. I am very grateful to her for the guidance she gave throughout this assignment; with that guidance, I was able to complete it within the allotted time. I would also like to thank all my friends for their great help while I was completing this assignment. Every time I did not understand something, they taught, supported and guided me until the assignment was done. Lastly, I would like to thank my family members, who always gave me encouragement and support to complete this assignment. I hope their tremendous support will lead me to good and worthy things.
TABLE OF CONTENTS
1.0 WEBSITE SELECTION
1.1 SCOPE
2.0 WEB ARCHIVING PROJECTS OBJECTIVES
2.1 PROBLEM STATEMENT
3.0 IMPLEMENTATION WORKFLOW
4.0 PREFERRED WEB ARCHIVING TOOLS
5.0 PRELIMINARY STUDY AND FINDINGS
5.1 WEBSITE INFRASTRUCTURE AND PLATFORM IDENTIFICATION
5.2 WEBSITE ARCHIVE EXPLORATION
6.0 DISCUSSION OF WEB ARCHIVE FINDINGS
7.0 RECOMMENDATION OF REPOSITORY AND WEB ARCHIVE CONTENT UTILIZATION
8.0 CONCLUSION
REFERENCES
1.0 WEBSITE SELECTION
Many things that memory institutions collect, such as scholarly publications, campaign materials, works of art, government documents, correspondence and news, are now available only on the Web. Web pages are increasingly dynamic and constantly changing, so to make sure this content survives for the next generation, it must be captured in real time. Web archiving is the process of collecting portions of the World Wide Web, preserving the collections in an archival format, and then serving the archives for access and use. Web archiving is important not only for future research but also for organisations' records management processes. Web archivists need to address technical, organisational, legal and social issues, some general and some specific to types of content or to archiving operations of a given scope. Terms such as appraisal and selection, acquisition, organization and storage, description and metadata, access, use and conclusion are related to web archiving. In this case study, the chosen figure is Zen Cho, and all related websites are listed in Table 1 below.
No. | Website Name | Scope of Content | Website Authorization and Owner | Website's URL |
---|---|---|---|---|
1 | Zen Cho Fantasy Author | Local | Writer Sites | https://zencho.org/ |
2 | Goodreads | Local/International | Amazon | https://www.goodreads.com/author/show/4632661.Zen_Cho |
3 | TheStar | Local | Star Media Group | https://www.thestar.com.my/lifestyle/culture/2020/08/23/zen-cho-rolls-out-a-charming-wuxia-influenced-tale-of-bandits-and-obscure-malaysian-history |
4 | Barnes & Noble | Local/International | Barnes & Noble | https://www.barnesandnoble.com/blog/sci-fi-fantasy/in-which-author-zen-cho-is-interviewed-by-her-husband-about-the-true-queen/ |
5 | Malaysiakini | Local | Malaysiakini | https://www.malaysiakini.com/news/329672 |
6 | Options | Local | The Edge Malaysia | https://www.optionstheedge.com/topic/culture/malaysia-born-author-zen-cho-wins-hugo-awards-sci-fi-novelette |
7 | Sine Theta Magazine | Local/International | Sino Diaspora | https://sinethetamag.medium.com/conversation-zen-cho-20dcc6e443d2 |
8 | RojakDaily | Local | RojakDaily | https://rojakdaily.com/lifestyle/article/10025/the-art-of-turning-local-hantu-stories-into-fantasy-novels-according-to-m-sian-author-zen-cho |
9 | Los Angeles Public Library | International | Los Angeles Public Library | https://www.lapl.org/collections-resources/blogs/lapl/interview-author-zen-cho |
10 | Pan Macmillan | International | Pan Macmillan | https://www.panmacmillan.com/authors/zen-cho/the-true-queen/9781509801077 |
11 | Independent | International | Independent | https://www.independent.co.uk/arts-entertainment/books/features/zen-cho-tackling-questions-race-gender-and-social-justice-fantasy-fiction-10496819.html |
12 | BFM | International | BFM | https://www.bfm.my/podcast/evening-edition/by-the-book/by-the-book-zen-cho-and-her-second-novel-syndrome |
13 | sfadb | International | Sfadb | http://www.sfadb.com/Zen_Cho |
14 | Uncanny | Local | Clockpunk Studio | https://uncannymagazine.com/article/first-witch-damansara/ |
15 | Locusmag | International | Locusmag | https://locusmag.com/2019/05/gary-k-wolfe-reviews-the-true-queen-by-zen-cho/ |
16 | Banana Writers | Local | Banana Writers | https://www.bananawriters.com/zenchointerview |
17 | Fatbidin | Local | Fatbidin | https://fatbidin.com/2016/12/06/siapa-kata-darah-seni-tidak-boleh-diwarisi/ |
18 | Tatler | Local | Tatler | https://my.asiatatler.com/society/the-first-malaysian-women-honoured-at-these-acclaimed-international-awards |
19 | Bitchmedia | Local | Bitchmedia | https://www.bitchmedia.org/post/ordinary-malaysians-meeting-magical-creatures-isnt-my-only-interest-in-life-a-qa-with-zen-cho |
20 | Eksentrika | International | Eksentrika | https://eksentrika.com/zen-cho-not-zen-writing-sequel/ |

Table 1: List of selected websites related to Zen Cho
Table 1 lists the websites related to the Malaysian author Zen Cho. The list contains both local and international content, which means that some of the sites are aimed mainly at overseas communities. These websites were chosen in order to archive data about her background, career development, achievements and contribution to the nation. The links are later used for platform identification and website archive exploration. This work uses two web identification platforms, XML-Sitemaps and BuiltWith, and two web archiving tools, HTTrack and Conifer.
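XML-Sitemaps produces a `sitemap.xml` file in the standard sitemaps.org format. A page count of the kind reported for each website in Section 5.1 can be read from such a file by collecting its `<loc>` entries. The following is a minimal sketch. It assumes the sitemap has already been downloaded as text and uses the sitemaps.org 0.9 namespace. The two-page sample is hypothetical and does not describe the real zencho.org site. Sitemap index files, which point to further sitemaps rather than to pages, are not handled.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol, version 0.9.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_sitemap_urls(sitemap_xml: str) -> list[str]:
    """Return every page URL (<url><loc>) listed in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical two-page sitemap, for illustration only.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://zencho.org/</loc></url>
  <url><loc>https://zencho.org/about/</loc></url>
</urlset>"""

if __name__ == "__main__":
    urls = list_sitemap_urls(SAMPLE)
    print(len(urls), "pages listed")
    for url in urls:
        print(url)
```

Counting the `<loc>` entries only reflects what the sitemap declares. A crawler can still discover more pages by following links, which is one reason an indexed count and a crawled count may differ.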
1.1 SCOPE
The theme set for this case study is Malaysian authors, and I chose the local novelist Zen Cho. Although she is a local author, the content of the websites and her achievements during her prime were widely known among novelists and readers, both local and international. Zen Cho deserves more recognition among readers because she is a Malaysian who has achieved something honorable as a novelist. She also reworks Malaysian folk tales about ghosts and mysteries into a series of fantasy and adventure novels. Because of the novelty of her mystery, horror and thriller stories, her work gained a lot of attention in the United Kingdom. That is where she began to gain popularity and make Malaysians proud.
2.0 WEB ARCHIVING PROJECTS OBJECTIVES
There are three objectives for archiving websites in this case study:
- First, to preserve all websites that contain information about Zen Cho, including her background, education and other basic profile data.
- Second, to maintain the recognition Zen Cho has gained from the international community, including her achievements during her prime in the United Kingdom.
- Lastly, to make this information accessible to future researchers.

The third objective is at the core of this case study, because it relates to web archive exploration using web archiving tools. To meet these objectives efficiently, one section identifies the websites using two online platforms, XML-Sitemaps and BuiltWith. After the identification, two web archiving tools, HTTrack and Conifer, are used. These tools are compared at the end of this case study to show which one is more capable for web archiving.
All three objectives show that the person I chose, the Malaysian author Zen Cho, deserves attention for her honorable recognition and her contribution to the nation. This contribution includes her series of fantasy novels, with their prequels and sequels. If websites about Zen Cho and her achievements are not archived, this history could be lost, and future researchers would have a hard time researching her.
2.1 PROBLEM STATEMENT
Zen Cho is a Malaysian fantasy author based in the United Kingdom. She draws on local stories and turns them into novels, which have earned her several international awards. Through these achievements, she has been recognized as one of the most honoured Malaysian women. Even though she is recognized overseas, some communities, especially the younger generation and students, do not even know that this fantasy author exists. Therefore, websites containing information about Zen Cho will be archived using ready-to-use web archiving tools. Researchers who want to study her in the future can then explore the links or the platform that holds the archived websites.
3.0 IMPLEMENTATION WORKFLOW
Figure 1: personal workflow for case study
In this case study, the web archiving workflow follows the Web Archiving Life Cycle Model. Technological tools for archiving the web have been evolving steadily for more than a decade, but best practices and a common model of web archiving have yet to emerge. The Web Archiving Life Cycle Model is an attempt to bring the technological and programmatic arms of web archiving into one framework. The aim is a framework relevant to any organization seeking to archive the web. Archive-It, a leading web archiving service, developed this model based on its work with memory institutions around the world.

Appraisal and selection is the first phase. Here an institution or individual chooses the websites that suit its own objectives. In this case study, 20 websites in total were selected and carried forward to the later stages. Before the selection was finalised, a search for websites related to Zen Cho found 25 websites. Using individual judgement and the criteria set by the lecturer, the list was reduced to 20.

Next, scoping determines what will be harvested from each website. Depending on the partner or organization, the scope may cover an entire website or only the nested URLs that hold specific content. In this case study, only the specific content of each chosen website is crawled, in order to save space on the device or in online storage. For example, on Zen Cho's portfolio website only the content about her background is crawled, and other data is excluded.

For data capture, it is important for any individual or organization to test the selected websites first. The efficiency of web archiving tools depends heavily on the size of the website.
For example, in this case study some websites are as large as 1GB. For such sites, the HTTrack mirroring process skips several parts of the site and is stopped once the specific content has been mirrored. After the selected websites have been archived, their quality is reviewed in terms of content and completeness. That is why this case study has two main tables: the first identifies the websites, and the second covers the exploration with web archiving tools. These tables provide information about each website, such as size, pages archived, duration and topic. During this phase, all data and limitations of the web archiving tools are recorded in the case study report, followed by feedback and recommendations.
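The scoping and data-capture steps described above can be sketched together as a small planning function. A scope rule decides which discovered URLs belong to the chosen section of a site. A byte budget stands in for stopping the mirror once the specific content has been captured. This is a minimal sketch under stated assumptions, not how HTTrack or Conifer work internally. It assumes scope means one host plus a path prefix, and that page sizes are known in advance. The `/about/` paths, page sizes and 1 MB budget are hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Page:
    url: str
    size_bytes: int  # size reported by the crawler (hypothetical values below)


def in_scope(url: str, seed: str) -> bool:
    """True if `url` is on the seed's host and at or below the seed's path."""
    u, s = urlsplit(url), urlsplit(seed)
    if u.scheme not in ("http", "https") or u.netloc.lower() != s.netloc.lower():
        return False
    # Treat the seed path as a directory prefix, so that "/about/" does not
    # also match an unrelated path such as "/about-the-books/".
    prefix = s.path if s.path.endswith("/") else s.path + "/"
    return u.path == s.path or u.path.startswith(prefix)


def plan_capture(pages: list[Page], seed: str, budget_bytes: int) -> tuple[list[Page], int]:
    """Keep in-scope pages in crawl order until the size budget would be exceeded."""
    captured: list[Page] = []
    used = 0
    for page in pages:
        if not in_scope(page.url, seed):
            continue  # scoping: skip content outside the chosen section
        if used + page.size_bytes > budget_bytes:
            break  # data capture: stop instead of mirroring the rest of a large site
        captured.append(page)
        used += page.size_bytes
    return captured, used


if __name__ == "__main__":
    # Hypothetical crawl queue for a portfolio site; paths and sizes are invented.
    frontier = [
        Page("https://zencho.org/about/", 400_000),
        Page("https://zencho.org/books/", 900_000),
        Page("https://zencho.org/about/press/", 300_000),
        Page("https://zencho.org/about/photos/", 500_000),
    ]
    kept, used = plan_capture(frontier, "https://zencho.org/about/", budget_bytes=1_000_000)
    for page in kept:
        print(page.url)
    print(f"{used:,} bytes captured")
```

On the budget check, the sketch uses `break` rather than `continue`. This follows the workflow's choice to stop the capture outright instead of searching further down the queue for smaller pages. HTTrack has its own limit settings, which are documented in its manual.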
4.0 PREFERRED WEB ARCHIVING TOOLS
Web archiving is as valuable as gold, because it can capture websites and keep them safely in online storage, also known as a repository. As the internet has evolved, many websites have been removed because of sensitivity or, more simply, because they became outdated. For example, in the United States, many websites containing manifestos and other campaign data appear before an election, and these websites are taken down once the election is over. This makes it difficult for researchers to find that information later. That is why so many web archiving tools exist to secure important websites for parties such as governments, lecturers, institutions and students. Each tool has its own advantages and limitations, which also depend on the size of the websites. Their purpose, however, is the same: to archive selected websites on the internet.

HTTrack and Conifer are both capable of archiving websites. Their functions and interfaces differ, and both require a strong internet connection. In my view, Conifer is the better tool for archiving websites. First, the link of the chosen website is entered into an empty bar, and Conifer automatically records the pages as they are browsed. The size of each session depends on the user's activity during recording. For instance, if the user clicks another internal link on the website while recording, the recorded size increases. A session usually does not take long to finish, even when the website is large. To summarize, Conifer is both a tool for creating high-fidelity, interactive captures of any website a user browses and a platform for making those captures accessible. It offers a limited free tier with 5GB of storage space and some networking quota restrictions.
Access to collections that users have made public is always free of charge and unlimited. Conifer can also handle complex websites that involve complicated interactions.
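Every internal link followed during recording makes a Conifer session larger, so it helps to keep a running check against the 5GB free tier. The following is a minimal sketch of that arithmetic. It assumes session sizes are noted in megabytes and that 5GB is counted as 5,000 MB; Conifer's own accounting may differ. The session names and sizes are hypothetical.

```python
# Assumption: the 5GB free tier is counted in decimal units (5GB = 5,000 MB);
# Conifer's own storage accounting may differ.
FREE_TIER_MB = 5_000


def quota_usage(sessions_mb: dict[str, float]) -> dict[str, float]:
    """Summarise how much of the free-tier storage a set of recorded sessions uses."""
    total = sum(sessions_mb.values())
    return {
        "total_mb": total,
        "percent_used": round(total * 100 / FREE_TIER_MB, 2),
        "remaining_mb": FREE_TIER_MB - total,  # a negative value means the quota is exceeded
    }


if __name__ == "__main__":
    # Hypothetical sessions; sizes grow as more internal links are clicked while recording.
    sessions = {"session-a": 250.0, "session-b": 500.0, "session-c": 250.0}
    for key, value in quota_usage(sessions).items():
        print(f"{key}: {value}")
```

For these hypothetical sessions the total is 1,000 MB, which is 20% of the assumed quota.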
5.0 PRELIMINARY STUDY AND FINDINGS
All 20 websites that were explored are listed in the two sections below, according to the selected columns and data.
5.1 WEBSITE INFRASTRUCTURE AND PLATFORM IDENTIFICATION
Website Name & URL | Website Type | Content Management System | Operating System & Servers | SSL Certificates | No. of Indexed Pages | No. of Crawled Pages | Web Content Size |
---|---|---|---|---|---|---|---|
1. Zen Cho https://zencho.org/ | Portfolio | WordPress | IPv6 | SSL by Default | 36 | 53 | 2.47MB |
2. Goodreads https://www.goodreads.com/author/show/4632661.Zen_Cho | News | 15 to 19 ZMS Publishing | ccTLD Redirects | Amazon SSL | 500 | 502 | 78.89MB |
3. TheStar https://www.thestar.com.my/lifestyle/culture/2020/08/23/zen-cho-rolls-out-a-charming-wuxia-influenced-tale-of-bandits-and-obscure-malaysian-history | News | Zoho Books | Amazon Route 53 | LetsEncrypt | 1 | 1 | 0.56MB |
4. Barnes and Noble sci-fi fantasy blog https://www.barnesandnoble.com/blog/sci-fi-fantasy/in-which-author-zen-cho-is-interviewed-by-her-husband-about-the-true-queen/ | Blog | WordPress | Akamai DNS | DigiCert SSL | 1 | 1 | 0.06MB |
5. Malaysiakini https://www.malaysiakini.com/news/329672 | News | Atlassian Cloud | Cloudflare DNS | Cloudflare SSL | 76 | 76 | 9.41MB |
6. Options https://www.optionstheedge.com/topic/culture/malaysia-born-author-zen-cho-wins-hugo-awards-sci-fi-novelette | News | Drupal 7 | IPv6 | Cloudflare SSL | 7 | 7 | 0.36MB |
7. Sine Theta Magazine https://sinethetamag.medium.com/conversation-zen-cho-20dcc6e443d2 | Blog Magazine | Medium | Amazon Route 53 | Cloudflare SSL | 78 | 78 | 11.04MB |
8. RojakDaily https://rojakdaily.com/lifestyle/article/10025/the-art-of-turning-local-hantu-stories-into-fantasy-novels-according-to-m-sian-author-zen-cho | Blog | Kentico Xperience | Amazon Route 53 | Sectigo SSL | 1 | 2 | 0.11MB |
9. Los Angeles Public Library https://www.lapl.org/collections-resources/blogs/lapl/interview-author-zen-cho | Web Portal | Drupal 7 | Apache | SSL by Default | 429 | 485 | 44.24MB |
10. Pan Macmillan https://www.panmacmillan.com/authors/zen-cho/the-true-queen/9781509801077 | Business Website | Netlify | Cloudflare DNS | LetsEncrypt | 2 | 2 | 0.43MB |
11. Independent https://www.independent.co.uk/arts-entertainment/books/features/zen-cho-tackling-questions-race-gender-and-social-justice-fantasy-fiction-10496819.html | News | Escenic | Ubuntu | GlobalSign | 1 | 3 | 0.33MB |
12. BFM https://www.bfm.my/podcast/evening-edition/by-the-book/by-the-book-zen-cho-and-her-second-novel-syndrome | News | October CMS | mod_pagespeed | SSL by Default | 2 | 2 | 0.09MB |
13. sfadb http://www.sfadb.com/Zen_Cho | Directory | WordPress | Debian | LetsEncrypt | 109 | 122 | 7.51MB |
14. Uncanny Magazine https://uncannymagazine.com/article/first-witch-damansara/ | Blog | WordPress | IPv6 | Cloudflare SSL | 1 | 1 | 0.07MB |
15. Locusmag https://locusmag.com/2019/05/gary-k-wolfe-reviews-the-true-queen-by-zen-cho/ | Blog | WordPress | Debian | GoDaddy SSL | 1 | 1 | 0.12MB |
16. Banana Writers SSL by https://www.bananawriters.com/zenchoi Blog Wix Premium Wix DNS Default nterview 17. Fatbidin WordPress Blog WordPress LetsEncrypt https://fatbidin.com/2016/12/06/siapaDNS kata-darah-seni-tidak-boleh...