DOI: 10.1145/3355369.3355587
research-article
Open Access

ShamFinder: An Automated Framework for Detecting IDN Homographs

Published: 21 October 2019

ABSTRACT

The internationalized domain name (IDN) is a mechanism that enables the use of Unicode characters in domain names. The Unicode character set contains several pairs of characters that are visually identical to each other, e.g., the Latin character 'a' (U+0061) and the Cyrillic character 'а' (U+0430). Such visually identical characters are generally known as homoglyphs. IDN homograph attacks, which are widely known, abuse Unicode homoglyphs to create lookalike URLs. Although the threat posed by IDN homograph attacks is not new, the recent rise of IDN adoption by both domain name registries and web browsers has made these attacks increasingly widespread, leading to large-scale phishing attacks such as those targeting cryptocurrency exchange companies. In this work, we developed a framework named "ShamFinder," an automated scheme for detecting IDN homographs. Our key contribution is the automatic construction of a homoglyph database, which can be used for direct countermeasures against the attack and to inform users about the context of an IDN homograph. Using the ShamFinder framework, we perform a large-scale measurement study that aims to understand the IDN homographs that exist in the wild. On the basis of our approach, we provide insights into effective countermeasures against the threats caused by IDN homograph attacks.
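The homoglyph idea in the abstract can be illustrated with a minimal Python sketch. It is not the ShamFinder implementation: the `HOMOGLYPHS` table is a tiny hand-written assumption standing in for a real homoglyph database, and the helper names `skeleton`, `is_homograph`, and `a_label` are hypothetical. The `a_label` helper also skips the normalization and validation steps that full IDNA processing requires, and only shows how a non-ASCII label ends up as an `xn--` Punycode label.

```python
import unicodedata

# Tiny illustrative homoglyph table (an assumption for this sketch, NOT the
# ShamFinder database): a few Cyrillic letters and their Latin lookalikes.
HOMOGLYPHS = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}


def skeleton(label: str) -> str:
    """Replace every known homoglyph with its Latin lookalike."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in label)


def is_homograph(candidate: str, reference: str) -> bool:
    """True if the labels differ as strings but look the same under the table."""
    return candidate != reference and skeleton(candidate) == skeleton(reference)


def a_label(label: str) -> str:
    """Simplified ASCII form of a label: 'xn--' plus its Punycode encoding.

    Real IDNA processing also normalizes and validates the label; that is
    deliberately omitted here.
    """
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


latin = "apple"        # all Latin letters
mixed = "\u0430pple"   # first letter is Cyrillic U+0430

print(unicodedata.name(latin[0]))   # LATIN SMALL LETTER A
print(unicodedata.name(mixed[0]))   # CYRILLIC SMALL LETTER A
print(a_label(mixed))               # an ASCII label starting with "xn--"
print(is_homograph(mixed, latin))   # True
```

The two labels render almost identically in many fonts, yet they are different code point sequences and resolve to different domain names, which is why a lookup table of confusable characters is useful for flagging a registered IDN as a lookalike of a known reference domain.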


      Published in
      IMC '19: Proceedings of the Internet Measurement Conference
      October 2019
      497 pages
      ISBN: 9781450369480
      DOI: 10.1145/3355369

      Copyright © 2019 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      IMC '19 Paper Acceptance Rate: 39 of 197 submissions, 20%
      Overall Acceptance Rate: 277 of 1,083 submissions, 26%
