ABSTRACT
Internationalized domain names (IDNs) allow Unicode characters to be used in domain names. The set of Unicode characters contains several pairs of characters that are visually identical to each other, e.g., the Latin character 'a' (U+0061) and the Cyrillic character 'а' (U+0430). Such visually identical characters are generally known as homoglyphs. IDN homograph attacks, which are widely known, abuse Unicode homoglyphs to create lookalike URLs. Although the threat posed by IDN homograph attacks is not new, the recent rise of IDN adoption by both domain name registries and web browsers has made these attacks increasingly widespread, leading to large-scale phishing attacks such as those targeting cryptocurrency exchange companies. In this work, we develop "ShamFinder," an automated framework for detecting IDN homographs. Our key contribution is the automatic construction of a homoglyph database, which can be used for direct countermeasures against the attack and to inform users about the context of an IDN homograph. Using the ShamFinder framework, we perform a large-scale measurement study that aims to understand the IDN homographs that exist in the wild. Drawing on this study, we provide insights into effective countermeasures against the threats posed by IDN homograph attacks.
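To make the homoglyph idea concrete, the following is a minimal Python sketch, not the ShamFinder implementation. It assumes a tiny hand-written homoglyph table (`HOMOGLYPHS`) and hypothetical helper names (`skeleton`, `is_homograph`, `to_ascii_label`); the database described in the abstract is built automatically and is far larger. The sketch shows that Latin 'a' and Cyrillic 'а' are distinct code points, and that a label containing the Cyrillic letter collapses to the same string as its ASCII lookalike once homoglyphs are mapped.

```python
import unicodedata

# Hypothetical, hand-written table for illustration only: it maps a few
# Cyrillic letters to the Latin letters they resemble. A real homoglyph
# database would cover far more characters.
HOMOGLYPHS = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}


def skeleton(label: str) -> str:
    """Replace each known homoglyph with its Latin lookalike."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in label)


def is_homograph(candidate: str, reference: str) -> bool:
    """True if the labels differ as strings but match after mapping."""
    return candidate != reference and skeleton(candidate) == skeleton(reference)


def to_ascii_label(label: str) -> str:
    """Encode a single label in its ASCII-compatible (Punycode) form."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


if __name__ == "__main__":
    fake = "\u0430pple"  # first letter is Cyrillic, the rest is Latin
    for ch in ("a", "\u0430"):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
    print(is_homograph(fake, "apple"))
    print(to_ascii_label(fake))
```

The Punycode form is what registries and DNS resolvers actually handle, so comparing mapped labels against a list of reference domains is one simple way to flag candidates for closer inspection.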
Index Terms
- ShamFinder: An Automated Framework for Detecting IDN Homographs