Web Usage Mining, also known as Web Log Mining, is the application of data mining techniques to the
data generated by user interaction with a Web server, including Web logs, click streams and database
transactions, or by the visits of search engine crawlers to a Website. Log files provide an immense
source of information about the behavior of users as well as of search engine crawlers. Web Usage
Mining is concerned with discovering common browsing patterns, i.e. sequences of requested pages,
from Web logs. These patterns can be used to improve the design of a Website and to guide its
modification. Analyzing and discovering user behavior helps to understand what information users
look for online and how they behave. The results of the analysis can be used in intelligent online
applications, to refine Websites, to improve search accuracy when users seek information, and to
lead decision makers towards better decisions in changing markets, for instance by placing
advertisements in ideal positions.
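As a minimal sketch of what "pages requested in sequence" means in practice, the Python below parses Common Log Format lines, groups successful requests by client host in time order, and counts the most frequent page-to-page transitions. It is not the method used in this study: treating the host as the user, ignoring session time-outs, and the helper names `page_sequences` and `frequent_transitions` are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Common Log Format: host ident authuser [time] "method path protocol" status bytes
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def page_sequences(lines):
    """Return {host: [path, ...]} of successful (2xx) requests in time order.

    Assumption: the client host stands in for the user; no session
    time-out, cookie or user-agent information is used.
    """
    visits = defaultdict(list)
    for line in lines:
        m = CLF.match(line)
        if not m or not m.group("status").startswith("2"):
            continue  # skip unparsable lines and unsuccessful requests
        ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        visits[m.group("host")].append((ts, m.group("path")))
    return {host: [path for _, path in sorted(reqs)] for host, reqs in visits.items()}


def frequent_transitions(sequences, top=3):
    """Count consecutive page pairs (A -> B) over all users' sequences."""
    pairs = Counter()
    for seq in sequences.values():
        pairs.update(zip(seq, seq[1:]))
    return pairs.most_common(top)


if __name__ == "__main__":
    log = [
        '10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 512',
        '10.0.0.1 - - [10/Oct/2023:13:56:01 +0000] "GET /products.html HTTP/1.1" 200 734',
        '10.0.0.2 - - [10/Oct/2023:14:01:12 +0000] "GET /index.html HTTP/1.1" 200 512',
        '10.0.0.2 - - [10/Oct/2023:14:01:40 +0000] "GET /products.html HTTP/1.1" 200 734',
        '10.0.0.2 - - [10/Oct/2023:14:02:05 +0000] "GET /missing.html HTTP/1.1" 404 0',
    ]
    print(frequent_transitions(page_sequences(log)))
    # prints [(('/index.html', '/products.html'), 2)]
```

Counting consecutive pairs is the simplest form of sequential pattern; longer patterns would need a dedicated sequential pattern mining algorithm, which this sketch does not attempt.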
Similarly, crawlers (or spiders) access Websites to index new and updated pages.
The traces these visits leave in the logs help to analyze the behavior of search engine crawlers.
Log files are unstructured and very large, so they must be extracted and pre-processed before any
data mining can be applied, and the pre-processing differs from one application to another. Two
pre-processing algorithms are proposed, based on indiscernibility relations in rough set theory,
which generate Equivalence Classes. The first algorithm generates a pre-processed file containing
only successful user requests, while the second generates a pre-processed file for pre-fetching
and caching purposes. Two further algorithms are proposed to extract usage analytics. The first
identifies the origin of visits, the top referring sites, and the most popular keywords used by
visitors to arrive at a Website. The second extracts the user agents, such as browsers and
operating systems, used by visitors to access a Website. In this study, users are clustered based
on their Entry Pages to a Website in order to analyze the deep-linked traffic at the Website. The
Top Ten Entry Pages, their traffic, and their temporal information are also studied.