Friday, 11 May 2018

MVC Questions

MVC is:

  • a useful way to organize code, markup and control flow
  • easy to learn
  • designed to help you build and maintain software better


MVC addresses the Separation of Concerns problem by splitting an application into three parts:

  • Models - Data and Business Logic
  • Views - Presentation and UI 
  • Controllers - User actions and Control Flow 
ActionResults: 
ActionResult is the abstract parent class from which ViewResult (and the other result classes) derive. The following response types can be sent to the end user (a short sketch of a few of them follows the list):


  • ViewResult - Renders a specified view to the response stream
  • PartialViewResult - Renders a specified partial view to the response stream
  • EmptyResult - An empty response is returned
  • RedirectResult - Performs an HTTP redirection to a specified URL
  • RedirectToRouteResult - Performs an HTTP redirection to a URL that is determined by the routing engine, based on given route data
  • JsonResult - Serializes a given object to JSON format
  • JavaScriptResult - Returns a piece of JavaScript code that can be executed on the client
  • ContentResult - Writes content to the response stream without requiring a view
  • FileContentResult - Returns a file to the client from an in-memory byte array
  • FileStreamResult - Returns a file to the client, read from a Stream
  • FilePathResult - Returns a file to the client, read from a path on disk
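Below is a minimal sketch of how a few of these result types are returned from ordinary actions. It assumes ASP.NET MVC 5; the DemoController and its routes are hypothetical.

using System.Web.Mvc;

namespace WebApp.Controllers
{
    public class DemoController : Controller
    {
        // ViewResult - renders Views/Demo/Index.cshtml to the response stream
        public ViewResult Index()
        {
            return View();
        }

        // JsonResult - serializes the given object to JSON
        public JsonResult Ping()
        {
            return Json(new { status = "ok" }, JsonRequestBehavior.AllowGet);
        }

        // ContentResult - writes plain text without requiring a view
        public ContentResult Hello()
        {
            return Content("Hello from MVC", "text/plain");
        }

        // RedirectToRouteResult - lets the routing engine build the target URL
        public RedirectToRouteResult GoHome()
        {
            return RedirectToAction("Index", "Home");
        }
    }
}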


Questions
I have created a controller method Foo(), but I want it to render the view Bar when this method is called. How do I achieve that?
Ans: In the Foo method, pass the name of the view as a parameter when returning the view.
Example: return View("Bar");
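For instance, a sketch with a hypothetical FooController (the action is named Foo but renders the view named Bar):

public class FooController : Controller
{
    // Looks for Views/Foo/Bar.cshtml (or Views/Shared/Bar.cshtml) instead of a view named Foo
    public ActionResult Foo()
    {
        return View("Bar");
    }
}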



Difference between ActionResult and ViewResult

public ActionResult Index()
{
      return View(); // View() actually returns a ViewResult instance
}

The above code returns a “ViewResult” object; due to polymorphism, that object is implicitly treated as its parent class type, i.e., “ActionResult”.

In polymorphism, a parent class reference can point to any one of its child class objects at runtime. For example, in the below code snippet we have a parent “Customer” class which is inherited by the “GoldCustomer” and “SilverCustomer” classes.


public class Customer
{}

public class GoldCustomer : Customer
{}

public class SilverCustomer : Customer
{}

So later, at runtime, your parent “Customer” reference can point to any one of the child objects, i.e., “GoldCustomer” or “SilverCustomer”.

Customer obj = new Customer();
obj = new GoldCustomer();
obj = new SilverCustomer();

In MVC, when a request is sent, the response can be many things: an HTML view, a JSON payload, a redirect, a file and so on.

The MVC team therefore created a general base class called ActionResult, from which the different result types derive, so an action can declare ActionResult as its return type and hand back whichever concrete result fits the request.
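A short sketch of why that matters in practice (the OrdersController and its data are made up): the same action, declared with the base type, can return either a ViewResult or a JsonResult depending on the request.

public class OrdersController : Controller
{
    // Declared return type is the abstract ActionResult, so at runtime
    // the method can return any derived result type.
    public ActionResult Details(int id, bool asJson = false)
    {
        var order = new { Id = id, Status = "Shipped" }; // stand-in for real data

        if (asJson)
            return Json(order, JsonRequestBehavior.AllowGet); // JsonResult

        return View(order); // ViewResult
    }
}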

Tuesday, 17 January 2017

API; what and why?


Let’s start right at the start and figure out what an API is and why you should consider using one.
The term API stands for ‘Application Programming Interface’. In the world of web development the term ‘API’ is synonymous with online web services which client apps can use to retrieve and update data. These online services have had several names/formats over the years, such as SOAP; however, the current popular choice is to create a REST (or RESTful) API.
We’ll come onto REST and why REST is now preferred later, but first let’s examine why you would even bother with an API.
Let’s consider a modern application which may include several mobile apps on various platforms and usually some kind of web application too. Without an API, a basic architecture may look like this where each client app has its own embedded business logic.

Notice that each client app has its own embedded business logic (probably written in several different languages) which connects directly to the database to get, update and manipulate data. This local business logic means that the client apps can easily become complex and all need to be maintained in sync with each other. When a new feature is required, you’ll have to update each app accordingly. This can be a very expensive process which often leads to feature fragmentation, bugs and can really hold back innovation.
Now let’s consider the same architecture with a central API which holds all the business logic.

Each app uses the same API to get, update and manipulate data. All apps have feature parity and when you need to make a change you just make it in one place in line with the ‘Don’t Repeat Yourself’ (DRY) principle of software development. The apps themselves then become relatively lightweight UI layers.

Friday, 6 January 2017

Where to use HttpHandler in ASP.Net Application

In this article we will understand a few more scenarios where we can implement HttpHandler in an ASP.NET application.

Count number of requests to a certain page
There might be other solutions to this problem, but we can use an HttpHandler to count the number of hits to a certain page. Here is an implementation that counts the hits on a Home.aspx page. First we check the request URL; if the URL contains “Home.aspx”, we increment the count value by one.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
 
namespace WebApp
{
    public class countHandler : IHttpHandler
    {
        static int CountRequest = 0;
        public void ProcessRequest(HttpContext context)
        {
            if (context.Request.RawUrl.Contains("Home.aspx"))
            {
                CountRequest++;
            }
         }
        public bool IsReusable
        {
            get
            {
                return false;
            }
        }
    }
}
The handler is really simple, and now we need to register it. Add the following code to your web.config file.
<system.webServer>
  <handlers>
    <add name="myImageHandler" verb="*" path="*.aspx" type="WebApp.countHandler"/>
  </handlers>
</system.webServer>
Ok, we have finished the implementation. Now let’s call the Home.aspx page in a browser. After calling the Home.aspx page, our handler will execute and increase the “CountRequest” value by one.

We can see that the CountRequest value is now 2, because the screenshot was taken on the second request.


Redirect from one page to another
This is another scenario where we can implement an HttpHandler as a solution. Think of a situation where a web page used to exist, no longer does, but your client still believes the page is part of the application.

Now to solve this problem we need to redirect the HTTP request of that page to some other page. For the redirect operation we can use HttpHandler. Have a look at the following code.

In this implementation, if someone tries to reach the Home.aspx page, she will be redirected to the abc.html page. In this way we can implement redirection.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
 
namespace WebApp
{
    public class loginHandler : IHttpHandler
    {
        public void ProcessRequest(HttpContext context)
        {
            if (context.Request.RawUrl.Contains("Home.aspx"))
            {
                context.Response.Redirect("abc.html"); 
            }
        }
 
        public bool IsReusable
        {
            get
            {
                return false;
            }
        }
    }
}
The registration of an HttpHandler is very similar to the previous example. Now if we try to reach the Home.aspx page, it will automatically be redirected to the abc.html page.

Check whether or not a cookie is present in the current HTTP request
This is another possible scenario. Again, there might be another solution for it, but we can implement an HttpHandler to detect cookies in HTTP requests.

The Count property of the Request.Cookies collection tells us the number of cookies in the current HTTP request. In this example we check the cookie count; if the count is more than 0, we consider that a cookie is present in the current HTTP request and we drop that request immediately.

Here is the implementation:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
 
namespace WebApp
{
    public class loginHandler : IHttpHandler
    {
        public void ProcessRequest(HttpContext context)
        {
            if (context.Request.Cookies.Count > 0)
            {
                context.Response.ContentType = "text/plain";
                context.Response.Write("Please don't send cookie in HTTP Request");
            }
        }
 
        public bool IsReusable
        {
            get
            {
                return false;
            }
        }
    }
}
The registration process for the HttpHandler is the same as in the previous example, so we will skip that part.

When we now try to browse the “Home.aspx” page (or any .aspx page, because our handler is registered for every .aspx request), we will see the handler's message rather than the page contents. The reason is that the browser is sending the SessionId cookie with the request.


Monday, 25 July 2016

Fixing Orphaned Users

  
This is a problem that plagues DBAs everywhere. When you restore a database, you run the risk of orphaning the users in the database. All users are linked via a SID to a login, and if you have SQL Server logins, whose SIDs are managed by SQL Server, you are at risk. Typically a restore to the same server from which the backup was taken won’t cause an issue unless you dropped and recreated the login. Generally the problem rears its ugly head when you restore a backup to a server that was not the original location.

You planned ahead and created the same logins on the new server as existed on the old server, so why do the users end up orphaned? As I mentioned earlier, SQL Server manages the SIDs for SQL Server logins, so there is no guarantee that the new login has the same SID as the original login did. Then when you restore your database, the users in that database are expecting SIDs that are not there, and the next thing you know you have orphaned users. Just a note: this does not occur with Windows logins, because the SID is controlled by Windows or Active Directory. Unless you drop and re-create the user in Windows, the SID of an Active Directory user will be the same on all SQL Servers, and hence your user accounts see the SID they are looking for.

So, the million dollar question is, how do you fix the problem without dropping and re-creating the user and messing up the permissions in the process? Microsoft provides us with a handy little stored procedure called sp_change_users_login that you can use to fix orphaned users. This procedure can do several things: it can tell you which users are orphaned, it lets you fix an orphaned user manually, and it can attempt to automatically fix your issues.

So let’s look at an example. I have deliberately orphaned a user called Annie in the AdventureWorks2008 database. When I run sp_change_users_login with the REPORT option, I can see that I indeed have an orphaned user.
EXEC sp_change_users_login 'REPORT'

UserName UserSID
-------- -----------------------------------
Annie 0xA5B5548F3DC81D4693E769631629CE1D

To fix this orphaned user all I have to do is run sp_change_users_login with the UPDATE_ONE action and tell SQL Server the name of my orphaned user and the name of the appropriate login.
EXEC sp_change_users_login 'UPDATE_ONE','Annie','Annie'
There you have it, a simple quick fix for orphaned users that you can use next time you have an issue. I just want to add one more thing regarding the AUTO_FIX action of sp_change_users_login. If you use this option, the procedure tries to automatically fix your orphaned users by matching user name to login name. If no match is found, it will create the appropriate login for you. The only reason I don’t like it is that it has the potential to create logins you don’t want, especially if your login names happen to deliberately differ from your user names.

Saturday, 9 July 2016

Web Service , WCF , WCF Rest , Web API

Web Service

  1. It is based on SOAP and returns data in XML form.
  2. It supports only the HTTP protocol.
  3. It is not open source, but can be consumed by any client that understands XML.
  4. It can be hosted only on IIS.

WCF

  1. It is also based on SOAP and returns data in XML form.
  2. It is the evolution of the web service (ASMX) and supports various protocols like TCP, HTTP, HTTPS, Named Pipes and MSMQ.
  3. The main issue with WCF is its tedious and extensive configuration.
  4. It is not open source, but can be consumed by any client that understands XML.
  5. It can be hosted within an application, on IIS or via a Windows service.

WCF Rest

  1. To use WCF as a WCF REST service you have to enable webHttpBindings.
  2. It supports the HTTP GET and POST verbs via the [WebGet] and [WebInvoke] attributes respectively.
  3. To enable other HTTP verbs you have to configure IIS to accept requests for that particular verb on .svc files.
  4. Passing data through parameters using WebGet needs configuration; the UriTemplate must be specified.
  5. It supports the XML, JSON and ATOM data formats.

Web API

  1. This is the new framework for building HTTP services in an easy and simple way.
  2. Web API is open source and an ideal platform for building RESTful services on the .NET Framework.
  3. Unlike a WCF REST service, it uses the full features of HTTP (like URIs, request/response headers, caching, versioning and various content formats).
  4. It also supports MVC features such as routing, controllers, action results, filters, model binders, IoC containers/dependency injection and unit testing, which makes it more simple and robust.
  5. It can be hosted within an application or on IIS.
  6. It has a lightweight architecture and is good for devices with limited bandwidth, like smartphones.
  7. Responses are formatted by Web API’s MediaTypeFormatter into JSON, XML or whatever format you want to add as a MediaTypeFormatter (see the sketch after this list).
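To make the comparison concrete, here is a minimal Web API sketch. It assumes ASP.NET Web API 2; the ProductsController and its in-memory data are hypothetical. The MediaTypeFormatter mentioned in point 7 turns the return values into JSON or XML depending on the client's Accept header.

using System.Collections.Generic;
using System.Web.Http;

namespace WebApp.Controllers
{
    // Routing maps HTTP verbs onto matching method names:
    // GET /api/products, GET /api/products/1, POST /api/products
    public class ProductsController : ApiController
    {
        private static readonly List<string> Products = new List<string> { "Keyboard", "Mouse" };

        // GET api/products
        public IEnumerable<string> Get()
        {
            return Products;
        }

        // GET api/products/1
        public string Get(int id)
        {
            return Products[id];
        }

        // POST api/products (value comes from the request body)
        public void Post([FromBody] string value)
        {
            Products.Add(value);
        }
    }
}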

How the web works : HTTP

Introduction

About this tutorial

This is an attempt to give a basic understanding of how the web works.

It covers the HTTP protocol, which is used to transmit and receive web pages, as well as some server workings and scripting technologies. It is assumed that you already know how to make web pages, and preferably know some HTML as well. It is also assumed that you have some basic knowledge of URLs. (A URL is the address of a document, what you need to be able to get hold of the document.)
I'm not entirely happy with this, so feedback would be very welcome. If you're not really a technical person and this tutorial leaves you puzzled or does not answer all your questions I'd very much like to hear about it. Corrections and opinions are also welcome.

Some background

When you browse the web the situation is basically this: you sit at your computer and want to see a document somewhere on the web, to which you have the URL.
Since the document you want to read is somewhere else in the world and probably very far away from you some more details are needed to make it available to you. The first detail is your browser. You start it up and type the URL into it (at least you tell the browser somehow where you want to go, perhaps by clicking on a link).
However, the picture is still not complete, as the browser can't read the document directly from the disk where it's stored if that disk is on another continent. So for you to be able to read the document, the computer that contains the document must run a web server. A web server is just a computer program that listens for requests from browsers and then executes them.
So what happens next is that the browser contacts the server and requests that the server deliver the document to it. The server then gives a response which contains the document and the browser happily displays this to the user. The server also tells the browser what kind of document this is (HTML file, PDF file, ZIP file etc) and the browser then shows the document with the program it was configured to use for this kind of document.
The browser will display HTML documents directly, and if there are references to images, Java applets, sound clips etc in it and the browser has been set up to display these it will request these also from the servers on which they reside. (Usually the same server as the document, but not always.) It's worth noting that these will be separate requests, and add additional load to the server and network. When the user follows another link the whole sequence starts anew.
These requests and responses are issued in a special language called HTTP, which is short for HyperText Transfer Protocol. What this article basically does is describe how this works. Other common protocols that work in similar ways are FTP and Gopher, but there are also protocols that work in completely different ways. None of these are covered here, sorry. (There is a link to some more details about FTP in the references.)
It's worth noting that HTTP only defines what the browser and web server say to each other, not how they communicate. The actual work of moving bits and bytes back and forth across the network is done by TCP and IP, which are also used by FTP and Gopher (as well as most other internet protocols).
When you continue, note that any software program that does the same as a web browser (ie: retrieves documents from servers) is called a client in network terminology and a user agent in web terminology. Also note that the server is properly the server program, not the computer on which the server program runs. (Sometimes called the server machine.)

What happens when I follow a link?

Step 1: Parsing the URL

The first thing the browser has to do is to look at the URL of the new document to find out how to get hold of the new document. Most URLs have this basic form: "protocol://server/request-URI". The protocol part describes how to tell the server which document you want and how to retrieve it. The server part tells the browser which server to contact, and the request-URI is the name used by the web server to identify the document. (I use the term request-URI since it's the one used by the HTTP standard, and I can't think of anything else that is general enough to not be misleading.)

Step 2: Sending the request

Usually, the protocol is "http". To retrieve a document via HTTP the browser transmits the following request to the server: "GET /request-URI HTTP/version", where version tells the server which HTTP version is used. (Usually, the browser includes some more information as well. The details are covered later.)
One important point here is that this request string is all the server ever sees. So the server doesn't care if the request came from a browser, a link checker, a validator, a search engine robot or if you typed it in manually. It just performs the request and returns the result.

Step 3: The server response

When the server receives the HTTP request it locates the appropriate document and returns it. However, an HTTP response is required to have a particular form. It must look like this:
HTTP/[VER] [CODE] [TEXT]
Field1: Value1
Field2: Value2

...Document content here...
The first line shows the HTTP version used, followed by a three-digit number (the HTTP status code) and a reason phrase meant for humans. Usually the code is 200 (which basically means that all is well) and the phrase "OK". The first line is followed by some lines called the header, which contains information about the document. The header ends with a blank line, followed by the document content. This is a typical header:
HTTP/1.0 200 OK
Server: Netscape-Communications/1.1
Date: Tuesday, 25-Nov-97 01:22:04 GMT
Last-modified: Thursday, 20-Nov-97 10:44:53 GMT
Content-length: 6372
Content-type: text/html

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
...followed by document content...
We see from the first line that the request was successful. The second line is optional and tells us that the server runs the Netscape Communications web server, version 1.1. We then get what the server thinks is the current date and when the document was modified last, followed by the size of the document in bytes and the most important field: "Content-type".
The content-type field is used by the browser to tell which format the document it receives is in. HTML is identified with "text/html", ordinary text with "text/plain", a GIF is "image/gif" and so on. The advantage of this is that the URL can have any ending and the browser will still get it right.
An important concept here is that to the browser, the server works as a black box. Ie: the browser requests a specific document and the document is either returned or an error message is returned. How the server produces the document remains unknown to the browser. This means that the server can read it from a file, run a program that generates it, compile it by parsing some kind of command file or (very unlikely, but in principle possible) have it dictated by the server administrator via speech recognition software. This gives the server administrator great freedom to experiment with different kinds of services as the users don't care (or even know) how pages are produced.

What the server does

When the server is set up it is usually configured to use a directory somewhere on disk as its root directory and to use a default file name (say "index.html") for each directory. This means that if you ask the server for the file "/" (as in "http://www.domain.tld/") you'll get the file index.html in the server root directory. Usually, asking for "/foo/bar.html" will give you the bar.html file from the foo directory directly beneath the server root.
Usually, that is. The server can be set up to map "/foo/" into some other directory elsewhere on disk or even to use server-side programs to answer all requests that ask for that directory. The server does not even have to map requests onto a directory structure at all, but can use some other scheme.

HTTP versions

So far there are three versions of HTTP. The first one was HTTP/0.9, which was truly primitive and never really specified in any standard. This was corrected by HTTP/1.0, which was issued as a standard in RFC 1945. (See references). HTTP/1.0 is the version of HTTP that is in common use today (usually with some 1.1 extensions), while HTTP/0.9 is rarely, if ever, used by browsers. (Some simpler HTTP clients still use it since they don't need the later extensions.)
RFC 2068 describes HTTP/1.1, which extends and improves HTTP/1.0 in a number of areas. Very few browsers support it (MSIE 4.0 is the only one known to the author), but servers are beginning to do so.
The major differences are some extensions in HTTP/1.1 for authoring documents online via HTTP, and a feature that lets clients request that the connection be kept open after a request so that it does not have to be re-established for the next request. This can save some waiting and server load if several requests have to be issued quickly.
This document describes HTTP/1.0, except some sections that cover the HTTP/1.1 extensions. Those will be explicitly labeled.

The request sent by the client

The shape of a request

Basically, all requests look like this:
[METH] [REQUEST-URI] HTTP/[VER]
[fieldname1]: [field-value1]
[fieldname2]: [field-value2]

[request body, if any]
The METH (for request method) gives the request method used, of which there are several, and which all do different things. The above example used GET, but below some more are explained. The REQUEST-URI is the identifier of the document on the server, such as "/index.html" or whatever. VER is the HTTP version, like in the response. The header fields are also the same as in the server response.
The request body is only used for requests that transfer data to the server, such as POST and PUT. (Described below.)

GETting a document

There are several request types, with the most common one being GET. A GET request basically means "send me this document" and looks like this: "GET document_path HTTP/version". (Like it was described above.) For the URL "http://www.yahoo.com/" the document_path would be "/", and for "http://www.w3.org/Talks/General.html" it is "/Talks/General.html".
However, this first line is not the only thing a user agent (UA) usually sends, although it's the only thing that's really necessary. The UA can include a number of header fields in the request to give the server more information. These fields have the form "fieldname: value" and are all put on separate lines after the first request line.
Some of the header fields that can be used with GET are:
User-Agent
This is a string identifying the user agent. An English version of Netscape 4.03 running under Windows NT would send "Mozilla/4.03 [en] (WinNT; I ;Nav)". (Mozilla is the old name for Netscape. See the references for more details.)

Referer
The referer field (yes, it's misspelled in the standard) tells the server where the user came from, which is very useful for logging and keeping track of who links to one's pages.

If-Modified-Since
If a browser already has a version of the document in its cache it can include this field and set it to the time it retrieved that version. The server can then check if the document has been modified since the browser last downloaded it and send it again if necessary. The whole point is of course that if the document hasn't changed, then the server can just say so and save some waiting and network traffic.

From
This header field is a spammer's dream come true: it is supposed to contain the email address of whoever controls the user agent. Very few, if any, browsers use it, partly because of the threat from spammers. However, web robots should use it, so that webmasters can contact the people responsible for the robot should it misbehave.

Authorization
This field can hold username and password if the document in question requires authorization to be accessed.




To put all these pieces together: this is a typical GET request, as issued by my browser (Opera):
GET / HTTP/1.0
User-Agent: Mozilla/3.0 (compatible; Opera/3.0; Windows 95/NT4)
Accept: */*
Host: birk105.studby.uio.no:81

HEAD: checking documents

One may sometimes want to see the headers returned by the server for a particular document, without actually downloading the document. This is exactly what the HEAD request method provides. HEAD looks and works exactly like GET, only with the difference that the server only returns the headers and not the document content.
This is very useful for programs like link checkers, people who want to see the response headers (to see what server is used or to verify that they are correct) and many other kinds of uses.

Playing web browser

You can actually play web browser yourself and write HTTP requests directly to web servers. This can be done by telnetting to port 80, writing the request and hitting enter twice, like this:
larsga - tyrfing>telnet www.w3.org 80
Trying 18.23.0.23...
Connected to www.w3.org.
Escape character is '^]'.
HEAD / HTTP/1.0

HTTP/1.1 200 OK
Date: Tue, 17 Feb 1998 22:24:53 GMT
Server: Apache/1.2.5
Last-Modified: Wed, 11 Feb 1998 18:22:22 GMT
ETag: "2c3136-23c1-34e1ec5e"
Content-Length: 9153
Accept-Ranges: bytes
Connection: close
Content-Type: text/html; charset=ISO-8859-1

Connection closed by foreign host.
larsga - tyrfing>
However, this works best under Unix as the Windows telnet clients I've used are not very suitable for this (and hard to set up so that it works). Instead you can use HTTPTest, a CGI script I've linked to in the references.
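If telnet is awkward on your platform, you can also write the same raw request from code. Here is a sketch using TcpClient against the same server as in the session above; no HTTP library is involved, just bytes over TCP.

using System;
using System.IO;
using System.Net.Sockets;
using System.Text;

class RawHttpDemo
{
    static void Main()
    {
        // Open a plain TCP connection to port 80 and speak HTTP by hand,
        // just like the telnet session shown above.
        using (var client = new TcpClient("www.w3.org", 80))
        using (var stream = client.GetStream())
        {
            byte[] request = Encoding.ASCII.GetBytes("HEAD / HTTP/1.0\r\nHost: www.w3.org\r\n\r\n");
            stream.Write(request, 0, request.Length);

            using (var reader = new StreamReader(stream, Encoding.ASCII))
            {
                Console.WriteLine(reader.ReadToEnd()); // status line + response headers
            }
        }
    }
}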

The response returned by the server

Outline

What the server returns consists of a line with the status code, a list of header fields, a blank line and then the requested document, if it is returned at all. Sort of like this:
HTTP/1.0 code text
Field1: Value1
Field2: Value2

...Document content here...

The status codes

The status codes are all three-digit numbers that are grouped by the first digit into 5 groups. The reason phrases given with the status codes below are just suggestions. Servers can return any reason phrase they wish.

1xx: Informational

No 1xx status codes are defined, and they are reserved for experimental purposes only.

2xx: Successful

Means that the request was processed successfully.
200 OK
Means that the server did whatever the client wanted it to, and all is well.

Others
The rest of the 2xx status codes are mainly meant for script processing and are not often used.

3xx: Redirection

Means that the resource is somewhere else and that the client should try again at a new address.
301 Moved permanently
The resource the client requested is somewhere else, and the client should go there to get it. Any links or other references to this resource should be updated.

302 Moved temporarily
This means the same as the 301 response, but links should now not be updated, since the resource may be moved again in the future.

304 Not modified
This response can be returned if the client used the if-modified-since header field and the resource has not been modified since the given time. Simply means that the cached version should be displayed for the user.



4xx: Client error

Means that the client screwed up somehow, usually by asking for something it should not have asked for.
400: Bad request
The request sent by the client didn't have the correct syntax.

401: Unauthorized
Means that the client is not allowed to access the resource. This may change if the client retries with an authorization header.

403: Forbidden
The client is not allowed to access the resource and authorization will not help.

404: Not found
Seen this one before? :) It means that the server has not heard of the resource and has no further clues as to what the client should do about it. In other words: dead link.



5xx: Server error

This means that the server screwed up or that it couldn't do as the client requested.
500: Internal server error
Something went wrong inside the server.

501: Not implemented
The request method is not supported by the server.

503: Service unavailable
This sometimes happens if the server is too heavily loaded and cannot service the request. Usually, the solution is for the client to wait a while and try again.


The response header fields

These are the header fields a server can return in response to a request.
Location
This tells the user agent where the resource it requested can be found. The value is just the URL of the new resource.

Server
This tells the user agent which web server is used. Nearly all web servers return this header, although some leave it out.

Content-length
This gives the size of the resource, in bytes.

Content-type
This describes the file format of the resource.

Content-encoding
This means that the resource has been coded in some way and must be decoded before use.

Expires
This field can be set for data that are updated at a known time (for instance if they are generated by a script). It is used to prevent browsers from caching the resource beyond the given date.

Last-modified
This tells the browser when the resource was last modified. Can be useful for mirroring, update notification etc.






Caching: agents between the server and client

The browser cache

You may have noticed that when you go back to a page you've looked at not too long before the page loads much quicker. That's because the browser stored a local copy of it when it was first downloaded. These local copies are kept in what's called a cache. Usually one sets a maximum size for the cache and a maximum caching time for documents.
This means that when a new page is visited it is stored in the cache, and if the cache is full (near the maximum size limit) some document that the browser considers unlikely to be visited again soon is deleted to make room. Also, if you go to a page that is stored in the cache the browser may find that you've set 7 days as the maximum storage time and 8 days have now passed since the last visit, so the page needs to be reloaded.
Exactly how caches work differ between browsers, but this is the basic idea, and it's a good one because it saves both time for the user and network traffic. There are also some HTTP details involved, but they will be covered later.

Proxy caches

Browser caches are a nice feature, but when many users browse from the same site one usually ends up storing the same document in many different caches and refreshing it over and over for different users. Clearly, this isn't optimal.
The solution is to let the users share a cache, and this is exactly what proxy caches are all about. Browsers still have their local caches, but HTTP requests for documents not in the browser cache are not sent to the server any more, instead they are sent to the proxy cache. If the proxy has the document in its cache it will just return the document (like the browser cache would), and if it doesn't it will submit the request on behalf of the browser, store the result and relay it to the browser.
So the proxy is really a common cache for a number of users and can reduce network traffic rather dramatically. It can also skew log-based statistics badly. :)
A more advanced solution than a single proxy cache is a hierarchy of proxy caches. Imagine a large ISP that has one proxy cache for each part of the country and sets up each of the regional proxies to use a national proxy cache instead of going directly to the source web servers. This solution can reduce network traffic even further. More detail on this is linked to in the references.

Server-side programming

What is it and why do it?

Server-side scripts or programs are simply programs that are run on the web server in response to requests from the client. These scripts produce normal HTML (and sometimes HTTP headers as well) as output which is then fed back to the client as if the client had requested an ordinary page. In fact, there is no way for the client software to tell whether scripting has been used or not.
Technologies such as JavaScript, VBScript and Java applets all run in the client and so are not examples of this. There is a major difference between server-side and client-side scripting as the client and server are usually different computers. So if all the data the program needs are located on the server it may make sense to use server-side scripting instead of client-side. (There is also the problem that the client may not have a browser that supports the scripting technology or it may be turned off.) If the program and user need to interact often client-side scripting is probably best, to reduce the number of requests sent to the server.
So: in general, if the program needs a lot of data and infrequent interactions with the server server-side scripting is probably best. Applications that use less data and more interaction are best put in the client. Also, applications that gather data over time need to be on the server where the data file is kept.
An example of the former would be search engines like Altavista. Obviously it's not feasible to download all the documents Altavista has collected to search in them locally. An example of the latter would be a simple board game. No data are needed and having to send a new request to the server for each move you make quickly gets tedious.
There is one kind of use that has been left out here: what do you do when you want a program to work on data with a lot of interaction with the user? There is no good solution right now, but there is one on the horizon, called XML.

Saturday, 26 September 2015

What really happens when you navigate to a URL


As a software developer, you certainly have a high-level picture of how web apps work and what kinds of technologies are involved: the browser, HTTP, HTML, web server, request handlers, and so on.
In this article, we will take a deeper look at the sequence of events that take place when you visit a URL.

1. You enter a URL into the browser

It all starts here:

2. The browser looks up the IP address for the domain name

The first step in the navigation is to figure out the IP address for the visited domain. The DNS lookup proceeds as follows:
  • Browser cache – The browser caches DNS records for some time. Interestingly, the OS does not tell the browser the time-to-live for each DNS record, and so the browser caches them for a fixed duration (varies between browsers, 2 – 30 minutes).
  • OS cache – If the browser cache does not contain the desired record, the browser makes a system call (gethostbyname in Windows). The OS has its own cache.
  • Router cache – The request continues on to your router, which typically has its own DNS cache.
  • ISP DNS cache – The next place checked is your ISP’s DNS server. With a cache, naturally.
  • Recursive search – Your ISP’s DNS server begins a recursive search, from the root nameserver, through the .com top-level nameserver, to Facebook’s nameserver. Normally, the DNS server will have names of the .com nameservers in cache, and so a hit to the root nameserver will not be necessary.
Here is a diagram of what a recursive DNS search looks like:
[Diagram: an example of theoretical DNS recursion]
One worrying thing about DNS is that the entire domain like wikipedia.org or facebook.com seems to map to a single IP address. Fortunately, there are ways of mitigating the bottleneck:
  • Round-robin DNS is a solution where the DNS lookup returns multiple IP addresses, rather than just one. For example, facebook.com actually maps to four IP addresses.
  • Load-balancer is the piece of hardware that listens on a particular IP address and forwards the requests to other servers. Major sites will typically use expensive high-performance load balancers.
  • Geographic DNS improves scalability by mapping a domain name to different IP addresses, depending on the client’s geographic location. This is great for hosting static content so that different servers don’t have to update shared state.
  • Anycast is a routing technique where a single IP address maps to multiple physical servers. Unfortunately, anycast does not fit well with TCP and is rarely used in that scenario.
Most of the DNS servers themselves use anycast to achieve high availability and low latency of the DNS lookups.
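To see the result of such a lookup yourself, the short sketch below asks the OS resolver for the addresses of facebook.com (the domain used in this walkthrough); a domain behind round-robin DNS may return several addresses.

using System;
using System.Net;

class DnsLookupDemo
{
    static void Main()
    {
        // Resolve the domain name the same way the OS does for the browser.
        IPAddress[] addresses = Dns.GetHostAddresses("facebook.com");
        foreach (IPAddress address in addresses)
        {
            Console.WriteLine(address); // one line per address returned
        }
    }
}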

3. The browser sends a HTTP request to the web server

You can be pretty sure that Facebook’s homepage will not be served from the browser cache because dynamic pages expire either very quickly or immediately (expiry date set to past).
So, the browser will send this request to the Facebook server:
GET http://facebook.com/ HTTP/1.1
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, [...]
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; [...]
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Host: facebook.com
Cookie: datr=1265876274-[...]; locale=en_US; lsd=WW[...]; c_user=2101[...]
The GET request names the URL to fetch: “http://facebook.com/”. The browser identifies itself (User-Agent header), and states what types of responses it will accept (Accept and Accept-Encoding headers). The Connection header asks the server to keep the TCP connection open for further requests.
The request also contains the cookies that the browser has for this domain. As you probably already know, cookies are key-value pairs that track the state of a web site in between different page requests. And so the cookies store the name of the logged-in user, a secret number that was assigned to the user by the server, some of user’s settings, etc. The cookies will be stored in a text file on the client, and sent to the server with every request.
There is a variety of tools that let you view the raw HTTP requests and corresponding responses. My favorite tool for viewing the raw HTTP traffic is fiddler, but there are many other tools (e.g., FireBug) These tools are a great help when optimizing a site.
In addition to GET requests, another type of requests that you may be familiar with is a POST request, typically used to submit forms. A GET request sends its parameters via the URL (e.g.: http://robozzle.com/puzzle.aspx?id=85). A POST request sends its parameters in the request body, just under the headers.
The trailing slash in the URL “http://facebook.com/” is important. In this case, the browser can safely add the slash. For URLs of the form http://example.com/folderOrFile, the browser cannot automatically add a slash, because it is not clear whether folderOrFile is a folder or a file. In such cases, the browser will visit the URL without the slash, and the server will respond with a redirect, resulting in an unnecessary roundtrip.
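As a rough illustration of the GET/POST difference described above, here is a sketch using HttpWebRequest. The GET URL is the robozzle example from the text; the POST URL is a made-up placeholder, so substitute a real form endpoint before running it.

using System;
using System.IO;
using System.Net;
using System.Text;

class GetVsPostDemo
{
    static void Main()
    {
        // GET: parameters travel in the URL's query string.
        var getRequest = (HttpWebRequest)WebRequest.Create("http://robozzle.com/puzzle.aspx?id=85");
        using (var getResponse = (HttpWebResponse)getRequest.GetResponse())
        {
            Console.WriteLine((int)getResponse.StatusCode);
        }

        // POST: parameters travel in the request body, just under the headers.
        var postRequest = (HttpWebRequest)WebRequest.Create("http://example.com/submit"); // placeholder URL
        postRequest.Method = "POST";
        postRequest.ContentType = "application/x-www-form-urlencoded";
        byte[] body = Encoding.ASCII.GetBytes("name=Annie&id=85");
        postRequest.ContentLength = body.Length;
        using (Stream requestStream = postRequest.GetRequestStream())
        {
            requestStream.Write(body, 0, body.Length);
        }
        using (var postResponse = (HttpWebResponse)postRequest.GetResponse())
        {
            Console.WriteLine((int)postResponse.StatusCode);
        }
    }
}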

4. The facebook server responds with a permanent redirect

This is the response that the Facebook server sent back to the browser request:
HTTP/1.1 301 Moved Permanently
Cache-Control: private, no-store, no-cache, must-revalidate, post-check=0,
      pre-check=0
Expires: Sat, 01 Jan 2000 00:00:00 GMT
Location: http://www.facebook.com/
P3P: CP="DSP LAW"
Pragma: no-cache
Set-Cookie: made_write_conn=deleted; expires=Thu, 12-Feb-2009 05:09:50 GMT;
      path=/; domain=.facebook.com; httponly
Content-Type: text/html; charset=utf-8
X-Cnection: close
Date: Fri, 12 Feb 2010 05:09:51 GMT
Content-Length: 0
The server responded with a 301 Moved Permanently response to tell the browser to go to “http://www.facebook.com/” instead of “http://facebook.com/”.
There are interesting reasons why the server insists on the redirect instead of immediately responding with the web page that the user wants to see.
One reason has to do with search engine rankings. See, if there are two URLs for the same page, say http://www.igoro.com/ and http://igoro.com/, search engine may consider them to be two different sites, each with fewer incoming links and thus a lower ranking. Search engines understand permanent redirects (301), and will combine the incoming links from both sources into a single ranking.
Also, multiple URLs for the same content are not cache-friendly. When a piece of content has multiple names, it will potentially appear multiple times in caches.
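On the server side, this kind of canonical-host redirect can be produced in many ways. One sketch, assuming ASP.NET 4 or later and hypothetical host names, uses an IHttpHandler (like the ones in the earlier HttpHandler article) and Response.RedirectPermanent, which sends the 301 status and the Location header:

using System.Web;

namespace WebApp
{
    public class CanonicalHostHandler : IHttpHandler
    {
        public void ProcessRequest(HttpContext context)
        {
            // Send browsers (and search engines) from example.com to www.example.com.
            if (context.Request.Url.Host == "example.com")
            {
                string target = "http://www.example.com" + context.Request.Url.PathAndQuery;
                context.Response.RedirectPermanent(target); // 301 Moved Permanently + Location header
            }
        }

        public bool IsReusable
        {
            get { return false; }
        }
    }
}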

5. The browser follows the redirect

The browser now knows that “http://www.facebook.com/” is the correct URL to go to, and so it sends out another GET request:
GET http://www.facebook.com/ HTTP/1.1
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, [...]
Accept-Language: en-US
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; [...]
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Cookie: lsd=XW[...]; c_user=21[...]; x-referer=[...]
Host: www.facebook.com
The meaning of the headers is the same as for the first request.

6. The server ‘handles’ the request

The server will receive the GET request, process it, and send back a response.
This may seem like a straightforward task, but in fact there is a lot of interesting stuff that happens here – even on a simple site like my blog, let alone on a massively scalable site like facebook.
  • Web server software – The web server software (e.g., IIS or Apache) receives the HTTP request and decides which request handler should be executed to handle this request. A request handler is a program (in ASP.NET, PHP, Ruby, …) that reads the request and generates the HTML for the response. In the simplest case, the request handlers can be stored in a file hierarchy whose structure mirrors the URL structure, and so for example the http://example.com/folder1/page1.aspx URL will map to the file /httpdocs/folder1/page1.aspx. The web server software can also be configured so that URLs are manually mapped to request handlers, and so the public URL of page1.aspx could be http://example.com/folder1/page1.
  • Request handler – The request handler reads the request, its parameters, and cookies. It will read and possibly update some data stored on the server. Then, the request handler will generate a HTML response.
One interesting difficulty that every dynamic website faces is how to store data. Smaller sites will often have a single SQL database to store their data, but sites that store a large amount of data and/or have many visitors have to find a way to split the database across multiple machines. Solutions include sharding (splitting up a table across multiple databases based on the primary key), replication, and usage of simplified databases with weakened consistency semantics.
One technique to keep data updates cheap is to defer some of the work to a batch job. For example, Facebook has to update the newsfeed in a timely fashion, but the data backing the “People you may know” feature may only need to be updated nightly (my guess, I don’t actually know how they implement this feature). Batch job updates result in staleness of some less important data, but can make data updates much faster and simpler.

7. The server sends back a HTML response

Here is the response that the server generated and sent back:
HTTP/1.1 200 OK
Cache-Control: private, no-store, no-cache, must-revalidate, post-check=0,
    pre-check=0
Expires: Sat, 01 Jan 2000 00:00:00 GMT
P3P: CP="DSP LAW"
Pragma: no-cache
Content-Encoding: gzip
Content-Type: text/html; charset=utf-8
X-Cnection: close
Transfer-Encoding: chunked
Date: Fri, 12 Feb 2010 09:05:55 GMT

2b3
��������T�n�@����[...]
The entire response is 36 kB, the bulk of it in the byte blob at the end, which I trimmed.
The Content-Encoding header tells the browser that the response body is compressed using the gzip algorithm. After decompressing the blob, you’ll see the HTML you’d expect:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"   
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" 
      lang="en" id="facebook" class=" no_js">
<head>
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-language" content="en" />
...
In addition to compression, headers specify whether and how to cache the page, any cookies to set (none in this response), privacy information, etc.
Notice the header that sets Content-Type to text/html. The header instructs the browser to render the response content as HTML, instead of say downloading it as a file. The browser will use the header to decide how to interpret the response, but will consider other factors as well, such as the extension of the URL.
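If you fetch the page programmatically, you can let the HTTP client undo the gzip Content-Encoding for you. Here is a sketch with HttpWebRequest (URL taken from this walkthrough); setting AutomaticDecompression makes the framework advertise gzip/deflate support and decompress the body before handing it back.

using System;
using System.IO;
using System.Net;

class GzipResponseDemo
{
    static void Main()
    {
        var request = (HttpWebRequest)WebRequest.Create("http://www.facebook.com/");
        // Adds Accept-Encoding: gzip, deflate and transparently decompresses the response body.
        request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            string html = reader.ReadToEnd();
            Console.WriteLine(response.ContentType);                     // e.g. text/html; charset=utf-8
            Console.WriteLine(html.Length + " characters of decompressed HTML");
        }
    }
}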

8. The browser begins rendering the HTML

Even before the browser has received the entire HTML document, it begins rendering the website:

9. The browser sends requests for objects embedded in HTML

As the browser renders the HTML, it will notice tags that require fetching of other URLs. The browser will send a GET request to retrieve each of these files.
Here are a few URLs that my visit to facebook.com retrieved:
  • Images
    http://static.ak.fbcdn.net/rsrc.php/z12E0/hash/8q2anwu7.gif
    http://static.ak.fbcdn.net/rsrc.php/zBS5C/hash/7hwy7at6.gif
  • CSS style sheets
    http://static.ak.fbcdn.net/rsrc.php/z448Z/hash/2plh8s4n.css
    http://static.ak.fbcdn.net/rsrc.php/zANE1/hash/cvtutcee.css
  • JavaScript files
    http://static.ak.fbcdn.net/rsrc.php/zEMOA/hash/c8yzb6ub.js
    http://static.ak.fbcdn.net/rsrc.php/z6R9L/hash/cq2lgbs8.js
Each of these URLs will go through a process similar to what the HTML page went through. So, the browser will look up the domain name in DNS, send a request to the URL, follow redirects, etc.
However, static files – unlike dynamic pages – allow the browser to cache them. Some of the files may be served up from cache, without contacting the server at all. The browser knows how long to cache a particular file because the response that returned the file contained an Expires header. Additionally, each response may also contain an ETag header that works like a version number – if the browser sees an ETag for a version of the file it already has, it can stop the transfer immediately.
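A conditional request of this kind takes only a few lines to reproduce. The sketch below uses a made-up static-file URL and an ETag shaped like the one in the earlier HEAD example; it sends both validators and treats a 304 as "use the cached copy".

using System;
using System.Net;

class ConditionalGetDemo
{
    static void Main()
    {
        var request = (HttpWebRequest)WebRequest.Create("http://static.example.com/style.css"); // placeholder URL
        // Pretend we cached a copy earlier and send its validators back to the server.
        request.IfModifiedSince = new DateTime(2010, 2, 12);
        request.Headers["If-None-Match"] = "\"2c3136-23c1-34e1ec5e\"";

        try
        {
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                Console.WriteLine((int)response.StatusCode); // 200: resource changed, full body follows
            }
        }
        catch (WebException ex)
        {
            var response = ex.Response as HttpWebResponse;
            if (response != null && response.StatusCode == HttpStatusCode.NotModified)
            {
                Console.WriteLine("304 Not Modified - use the cached copy");
            }
        }
    }
}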
Can you guess what “fbcdn.net” in the URLs stands for? A safe bet is that it means “Facebook content delivery network”. Facebook uses a content delivery network (CDN) to distribute static content – images, style sheets, and JavaScript files. So, the files will be copied to many machines across the globe.
Static content often represents the bulk of the bandwidth of a site, and can be easily replicated across a CDN. Often, sites will use a third-party CDN provider, instead of operating a CDN themselves. For example, Facebook’s static files are hosted by Akamai, the largest CDN provider.
As a demonstration, when you try to ping static.ak.fbcdn.net, you will get a response from an akamai.net server. Also, interestingly, if you ping the URL a couple of times, you may get responses from different servers, which demonstrates the load-balancing that happens behind the scenes.

10. The browser sends further asynchronous (AJAX) requests

In the spirit of Web 2.0, the client continues to communicate with the server even after the page is rendered.
For example, Facebook chat will continue to update the list of your logged in friends as they come and go. To update the list of your logged-in friends, the JavaScript executing in your browser has to send an asynchronous request to the server. The asynchronous request is a programmatically constructed GET or POST request that goes to a special URL. In the Facebook example, the client sends a POST request to http://www.facebook.com/ajax/chat/buddy_list.php to fetch the list of your friends who are online.
This pattern is sometimes referred to as “AJAX”, which stands for “Asynchronous JavaScript And XML”, even though there is no particular reason why the server has to format the response as XML. For example, Facebook returns snippets of JavaScript code in response to asynchronous requests.
Among other things, the fiddler tool lets you view the asynchronous requests sent by your browser. In fact, not only can you observe the requests passively, but you can also modify and resend them. The fact that it is this easy to “spoof” AJAX requests causes a lot of grief to developers of online games with scoreboards. (Obviously, please don’t cheat that way.)
Facebook chat provides an example of an interesting problem with AJAX: pushing data from server to client. Since HTTP is a request-response protocol, the chat server cannot push new messages to the client. Instead, the client has to poll the server every few seconds to see if any new messages arrived.
Long polling is an interesting technique to decrease the load on the server in these types of scenarios. If the server does not have any new messages when polled, it simply does not send a response back. And, if a message for this client is received within the timeout period, the server will find the outstanding request and return the message with the response.
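A client-side long-polling loop can be sketched roughly like this. The poll URL is made up, and a real implementation would also need cancellation and smarter back-off; the point is simply that one request stays outstanding until the server has something to say.

using System;
using System.IO;
using System.Net;
using System.Threading;

class LongPollingClient
{
    static void Main()
    {
        while (true)
        {
            // Keep one request outstanding; the (hypothetical) endpoint holds the
            // connection open until a message arrives or its timeout expires.
            var request = (HttpWebRequest)WebRequest.Create("http://chat.example.com/poll?user=42");
            request.Timeout = 60000; // give the server up to a minute to answer

            try
            {
                using (var response = (HttpWebResponse)request.GetResponse())
                using (var reader = new StreamReader(response.GetResponseStream()))
                {
                    Console.WriteLine("New message: " + reader.ReadToEnd());
                }
            }
            catch (WebException)
            {
                Thread.Sleep(2000); // timed out or failed: wait briefly, then poll again
            }
        }
    }
}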

Conclusion

Hopefully this gives you a better idea of how the different web pieces work together.