Academic research related to the Open Directory Project (ODP). The listed papers may cite ODP as an example of a large web directory, describe studies or tests based on ODP data, or focus on ODP itself.
By A. Maguitman, F. Menczer, F. Erdinc, H. Roinestad and A. Vespignani, Indiana University. In: World Wide Web, Volume 9, Issue 4, 2006. An information-theoretic measure of semantic similarity between pages exploiting both hierarchical and non-hierarchical ODP structure improves on taxonomy-based approaches.
By Inderjit S. Dhillon, Subramanyam Mallela and Rahul Kumar, University of Texas, Austin, USA. Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, 2002. The authors propose a new information-theoretic divisive algorithm for word clustering applied to text classification. Experimental results are based on a 20 Newsgroups data set and a 3-level hierarchy of HTML documents collected from ODP's Science top-level category.
Ph.D. thesis by Evgeniy Gabrilovich, Technion, Israel Institute of Technology.
By Benyu Zhang, Hua Li, Yi Liu, Lei Ji, Wensi Xi, Weiguo Fan, Zheng Chen and Wei-Ying Ma. In: Proceedings of the 28th Annual International ACM SIGIR Conference, August 2005. The authors propose a ranking scheme named Affinity Ranking (AR). Yahoo, ODP and newsgroup data are used for the experiments.
By Stephen L. Reed and Douglas B. Lenat, Cycorp Inc., Austin, USA, 2002. The authors present the process by which several ontologies have been mapped or integrated with Cyc, a large commonsense knowledge base, over 15 years. ODP was among the chosen ontologies but was removed because the constant enhancements in the directory created a high maintenance burden.
By Adam L. Berger, Carnegie Mellon University, and Vibhu O. Mittal, Just Research, Pittsburgh, USA. In: Proceedings of the 23rd Annual International ACM SIGIR Conference, 2000. Probabilistic models are used to select and order words into a gist. The paper describes a technique for learning these models automatically from a collection of human-summarized web pages; the authors used ODP data for this purpose.
By Ruj Akavipat, Le-Shin Wu and Filippo Menczer. Collaborative peer network applications offer a possible solution for the scalability limitations of centralized search engines. The authors suggest using a local adaptive routing algorithm to dynamically change the topology of the peer network based on a learning scheme driven by query response interactions among neighbors. ODP data are used to model simulated users.
By Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis and Karsten Tolle. 2001. The ODP RDF dump was used as a testbed for a suite of tools for RDF validation, storage and querying.
By Charles L. A. Clarke, University of Waterloo, Eugene Agichtein, Emory University, and Susan Dumais and Ryen W. White, Microsoft. In: Proceedings of the 30th Annual International ACM SIGIR Conference, July 2007. The results of the study suggest that relatively simple caption features, such as the presence of all query terms, the readability of the snippet, and the length of the URL shown in the caption, can significantly influence users' Web search behavior. The experiments are based on the Windows Live search engine, which may use ODP titles and descriptions when generating captions.
By Paul Alexandru Chirita, Wolfgang Nejdl, Raluca Paiu and Christian Kohlschütter, L3S and University of Hannover, Germany. In: Proceedings of the 28th Annual International ACM SIGIR Conference, August 2005. The paper discusses how ODP metadata can be exploited to achieve high-quality personalized web search.
By Zoltan Gyongyi and Hector Garcia-Molina, Stanford University, and Jan Pedersen, Yahoo. Technical Report, June 2006. Introduces a link-based approach to classification, which can be used in isolation or in conjunction with text-based classification. The Yahoo web index and ODP are used for the experiments.
By Jian-Tao Sun, Dou Shen, HuaJun Zeng, Qiang Yang, Yuchang Lu and Zheng Chen. In: Proceedings of the 28th Annual International ACM SIGIR Conference, August 2005. The authors propose two adapted summarization methods that take advantage of the relationships discovered from clickthrough data. For those pages not covered by clickthrough data, they put forward a thematic lexicon approach to generate implicit knowledge. The methods are evaluated on a relatively small dataset consisting of manually annotated pages as well as a large dataset crawled from ODP.