This article is a work in progress; parts will be added and updated over time.
Be your own Google. Starting your own search engine is a cool and weirdly fun way to spend some time. You might want one as a company, school or group search portal, or maybe you wanna take on Google. Either way, I’ll show you how to build your own search engine, which will work in a similar way to Google. Obviously, you won’t get the same results as Google, because Google has its own techniques for searching.
Some points to consider:
- Do you have somewhere to store this data?
- How much bandwidth do you have available?
- Should you host at home?
As your crawler (the thing going around the web adding sites to your database) gathers information, more and more disk space will be needed to store it. The crawler will be collecting the site name, site contents, size and some other pieces of information about the websites it accesses, but each site will only come to around 1K, as we’re not storing images, stylesheets or any other big pieces of data.
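To get a feel for how fast the disk fills up, here is a rough back-of-the-envelope sketch. It is written in Python purely for illustration, it assumes the roughly 1K per site figure above, and the function name and site counts are made up. A real database will add overhead for indexes on top of this.

```python
# Rough disk-space estimate for the crawler's database.
# Assumption: each stored site takes about 1 KB, as estimated above.
BYTES_PER_SITE = 1024


def estimated_storage_gb(site_count, bytes_per_site=BYTES_PER_SITE):
    """Return the approximate storage needed, in gigabytes (1 GB = 1024**3 bytes)."""
    return site_count * bytes_per_site / (1024 ** 3)


if __name__ == "__main__":
    # Hypothetical crawl sizes, just to show the scale.
    for sites in (10_000, 1_000_000, 10_000_000):
        print(f"{sites:>10} sites -> {estimated_storage_gb(sites):.2f} GB")
```

At that rate even a 10GB drive has room for millions of sites, which is why an old spare machine is enough to get started.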
I recommend, only because it is fun, setting up and running your server from home. The reason is that I doubt you’re willing to spend much money on dedicated servers with hosting providers at this stage. It also lets you add more storage as you need it, and you can build your own server farm when it becomes necessary.
Bandwidth, the bitch that makes things sad, is going to be a problem. Some ISPs offer unlimited downloads, but most of them have policies about running your own webserver, and some even block the ports you need for it. If bandwidth is going to be your problem, I will show further on in this tutorial how to cap your crawler’s monthly downloads so you don’t get any nasty bills down the road.
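The basic idea behind a cap is simple enough to sketch already. This is only a rough illustration in Python, not necessarily how the tutorial will end up doing it: the crawler keeps a running total of bytes downloaded and refuses to fetch anything that would push it over the monthly limit. The class name, method names and the 5GB figure are all hypothetical.

```python
class BandwidthBudget:
    """Tracks bytes downloaded against a monthly cap (hypothetical helper)."""

    def __init__(self, monthly_cap_bytes):
        self.monthly_cap_bytes = monthly_cap_bytes
        self.used_bytes = 0

    def can_download(self, size_bytes):
        """True if fetching size_bytes more would stay within the cap."""
        return self.used_bytes + size_bytes <= self.monthly_cap_bytes

    def record(self, size_bytes):
        """Add a finished download to the running total."""
        self.used_bytes += size_bytes

    def reset(self):
        """Call at the start of each billing month."""
        self.used_bytes = 0


if __name__ == "__main__":
    budget = BandwidthBudget(5 * 1024 ** 3)  # assumed 5 GB monthly allowance
    page_size = 20 * 1024                    # assumed size of one fetched page
    if budget.can_download(page_size):
        budget.record(page_size)
    print(budget.used_bytes, "bytes used so far")
```

A real crawler would also need to save the running total to disk so it survives a restart.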
Getting started : Setting up your server.
Your fresh online business is getting ready, and at this point in time you don’t really need much computing power. A spare old computer with a bit of disk space will do (10GB is okay, but obviously if you have a 500GB drive, that’s better). If you must, you can use your normal computer, but for security purposes having a dedicated PC is better.
So, turn it on. I don’t recommend having Vista on any machine, so strip that off and plop on a copy of Win XP, only because it isn’t a huge disk user, actually WORKS and isn’t going to bum around asking you for your credentials all the time… Download a copy of XAMPP. XAMPP is a program which quite easily turns your machine into a webserver, adds PHP and allows databases to be created.
Now, you’ll need to set up your router to route port 80 requests to that machine. First, give your dedicated machine an IP address which will stay the same (a static IP), something like 192.168.1.5 or 10.0.0.5 depending on what your current setup is. That’s something for you to Google.
Once your machine gets the same IP from your router every time it turns on, it’s time to set up the router so it will route web requests to this computer. Log in to your router’s admin page; depending on the router, you’ll have to add a port forwarding (inbound) rule. That’s generally something like Port: 80 and IP: whatever address your dedicated machine always gets. I’ll try and make this part clearer; if anyone doesn’t get it, leave a comment.
To test that everything is working, first make sure XAMPP is running. Go to www.whatsmyipaddress.com, which will give you the IP of your data connection. Now go to a proxy server, such as www.proxem.info, and type in your IP. If a XAMPP page comes up, you’re set up. If not, you’ll have to go back and recheck. BTW: you had to use a proxy because you were in the same location as the server, but anyone outside your data connection can just type your IP into the address bar.
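If the proxy test fails, it helps to know whether the problem is the router or the webserver itself. Here is a small Python sketch (illustration only, not part of XAMPP) that checks from another computer on your home network whether anything is answering on port 80. The address 192.168.1.5 is just the example from earlier, so swap in whatever your machine actually uses; the function name is made up.

```python
import socket


def port_is_open(host, port=80, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Example static address from the router setup above; use your own.
    if port_is_open("192.168.1.5", 80):
        print("Webserver is answering on the local network.")
    else:
        print("Nothing answering on port 80; check that XAMPP is running.")
```

If this says the webserver is answering but the proxy test still fails, the router’s forwarding rule (or your ISP blocking the port) is the more likely culprit.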
Setting up : Homepage
Now we’re going to start setting up your homepage. As this page will, at this stage, only be used for testing the basic search functionality, we’re not going to bother adding pages like “Add a link”, “Advanced Search” etc. Let’s not get ahead of ourselves. We’re going to have a basic front page like Google’s for now; if you want, you can design your own logo and plop it in as well.
<html>
<head>
<title>Your search engine's name</title>
</head>
<body>
<form method="post" action="search.php">
<table align="center">
<tr><td><input type="text" name="query"></td><td><input type="submit" name="go" value="Search"></td></tr>
</table>
</form>
</body>
</html>
Now, if you’re not familiar with HTML, I seriously recommend visiting www.w3schools.com/html in the near future. But basically, I’ll run you through what each line of the code does.
Line 1: Telling the browser it’s HTML (Hypertext Markup Language).
Line 2: Head, the section where some general info is kept (not the page contents).
Line 3: What the title bar at the top says; for example, the top of this page is “Hid3 -”.
Line 4: Closing the head section.
Line 5: Opening the body section, where the content is.
Line 6: Telling the browser we’re writing a form (input) and what to do when the “Search” button is pressed, which is to send the form to the search page (search.php).
Line 7: Make a table.
Line 8: Show the input box and the submit button.
Lines 9, 10, 11 & 12: Closing off the table, form, body and html tags.
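If you’re curious what the browser actually sends to search.php when the button is pressed, here is a small Python sketch (Python rather than PHP, purely for illustration) that builds the same form body. The field names query and go come from the form above; the search terms and the function name are made up.

```python
from urllib.parse import urlencode, parse_qs


def build_form_body(search_terms):
    """Encode the form's two fields the way a browser does for method="post"."""
    fields = {"query": search_terms, "go": "Search"}
    return urlencode(fields)


if __name__ == "__main__":
    body = build_form_body("cheap web hosting")
    print(body)            # the raw body posted to search.php
    print(parse_qs(body))  # the same fields once decoded on the receiving side
```

So the search page only has to read the query field to know what the visitor typed.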
From now on, I won’t be explaining HTML code in this tutorial, so reading an HTML tutorial will really benefit you. Try changing your homepage.
Thanks for reading. More is on its way.
Btw, I’d LOVE to see some of these fresh homepages, so send them as comments!
Angus McDowell PHP, Webdev