Tuesday, September 17, 2013

Web Fundamentals

History of the Internet


The invention of the telegraph, telephone, radio, and computer set the stage for an unprecedented revolution in communication. The Internet is at once a world-wide broadcasting capability, a mechanism for information dissemination, and a medium for collaboration and interaction between individuals and their computers without regard for geographic location. The Internet today is a widespread information infrastructure, the initial prototype of what is often called the National (or Global or Galactic) Information Infrastructure. Its history is complex and involves many aspects - technological, organizational, and community. And its influence reaches not only the technical fields of computer communications but throughout society.
The first recorded description of the social interactions that could be enabled through networking was a series of memos written by J.C.R. Licklider of MIT in August 1962 discussing his "Galactic Network" concept. He envisioned a globally interconnected set of computers through which everyone could quickly access data and programs from any site. Licklider was the first head of the computer research program at DARPA, starting in October 1962. The public was first introduced to the concepts that would lead to the Internet when a message was sent over the ARPANET from computer science Professor Leonard Kleinrock's laboratory at the University of California, Los Angeles (UCLA), after the second piece of network equipment was installed at the Stanford Research Institute (SRI). By the end of 1969, four host computers were connected together into the initial ARPANET, and the budding Internet was off the ground.
Computers were added quickly to the ARPANET during the following years, and work proceeded on completing a functionally complete Host-to-Host protocol and other network software. Packet switched networks such as ARPANET were developed in the late 1960s and early 1970s using a variety of protocols. The ARPANET in particular led to the development of protocols for internetworking, in which multiple separate networks could be joined together into a network of networks.
In 1982, the Internet protocol suite (TCP/IP) was standardized, and consequently, the concept of a world-wide network of interconnected TCP/IP networks, called the Internet, was introduced. Access to the ARPANET was expanded in 1981 when the National Science Foundation (NSF) developed the Computer Science Network (CSNET) and again in 1986 when NSFNET provided access to supercomputer sites in the United States from research and education organizations. Commercial Internet service providers (ISPs) began to emerge in the late 1980s and early 1990s. The ARPANET was decommissioned in 1990. The Internet was commercialized in 1995 when NSFNET was decommissioned, removing the last restrictions on the use of the Internet to carry commercial traffic.
Since the mid-1990s, the Internet has had a revolutionary impact on culture and commerce, including the rise of near-instant communication by electronic mail, instant messaging, Voice over Internet Protocol (VoIP) "phone calls", two-way interactive video calls, and the World Wide Web with its discussion forums, blogs, social networking, and online shopping sites. The research and education community continues to develop and use advanced networks such as NSF's very high speed Backbone Network Service (vBNS), Internet2, and National LambdaRail. Increasing amounts of data are transmitted at higher and higher speeds over fibre optic networks operating at 1-Gbit/s, 10-Gbit/s, or more. The Internet's takeover of the global communication landscape was almost instant in historical terms: it carried only 1% of the information flowing through two-way telecommunications networks in 1993, already 51% by 2000, and more than 97% of the telecommunicated information by 2007. Today the Internet continues to grow, driven by ever greater amounts of online information, commerce, entertainment, and social networking.

Basic Services
Some of the basic services available to Internet users are:
  • Email: A fast, easy, and inexpensive way to communicate with other Internet users around the world.
  • Telnet: Allows a user to log into a remote computer as though it were a local system.
  • FTP: Allows a user to transfer virtually every kind of file that can be stored on a computer from one Internet-connected computer to another.
  • Usenet news: A distributed bulletin board that offers a combination news and discussion service on thousands of topics.
  • World Wide Web (WWW): A hypertext interface to Internet information resources.
  • Social Networking
  • E-commerce
  • Online examinations
Search engines
Search engines are the primary tools people use to find information on the web. Today, you perform searches with keywords, but the future of web search may use natural language. Currently, when you enter a keyword or phrase, the search engine finds matching web pages and shows you a search engine results page (SERP) with recommended web pages listed and sorted by relevance. People-assisted search engines have also emerged, such as Mahalo, which pays people to develop search results.
A search engine operates in the following order:
  • Web crawling
  • Indexing
  • Searching
Web search engines work by storing information about many web pages, which they retrieve from the page's HTML. These pages are retrieved by a Web crawler (sometimes also known as a spider) — an automated Web browser which follows every link on the site. The contents of each page are then analysed to determine how it should be indexed (for example, words can be extracted from the titles, page content, headings, or special fields called meta tags).
Data about web pages are stored in an index database for use in later queries. The index helps find information relating to the query as quickly as possible. When a user enters a query into a search engine (typically by using keywords), the engine examines its index and provides a listing of best-matching web pages according to its criteria, usually with a short summary containing the document's title and sometimes parts of the text.
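As a toy illustration of the indexing and querying steps just described, here is a minimal inverted index in Python. The pages dictionary stands in for content a crawler has already fetched; the function names and data are purely illustrative, not any real engine's internals.

```python
from collections import defaultdict

def build_index(pages):
    """Build an inverted index: each word maps to the set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the pages that contain every keyword in the query."""
    word_sets = [index.get(word, set()) for word in query.lower().split()]
    if not word_sets:
        return set()
    return set.intersection(*word_sets)
```

A real engine would additionally rank the matching pages by relevance; here the query simply intersects the per-word page sets.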
Currently, Google is the leading search engine, followed by Yahoo and Microsoft.

Google Search
Google is the leading search and online advertising company, founded by Larry Page and Sergey Brin while they were Ph.D. students at Stanford University. Google’s success in search is largely based on its PageRank algorithm (patented by Larry Page) and its unique infrastructure of servers that uses linked PCs to achieve faster responses and increased scalability at lower costs. Estimates put the number of Google servers at over one million.
The PageRank algorithm considers the number of links into a web page and the quality of the linking sites (among other factors) to determine the importance of the page.
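The idea can be sketched with a short power-iteration implementation. This is the textbook simplification of PageRank, not Google's actual code; the damping factor of 0.85 is the commonly cited value, and the link graph below is invented.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: {page: [pages it links to]}. Returns an approximate rank per page."""
    pages = list(links)
    n = len(pages)
    ranks = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # a page passes its rank, scaled by damping, to the pages it links to
                share = damping * ranks[page] / len(outlinks)
                for target in outlinks:
                    new[target] += share
            else:
                # a page with no outlinks spreads its rank evenly over all pages
                for target in pages:
                    new[target] += damping * ranks[page] / n
        ranks = new
    return ranks
```

Pages with many incoming links from highly ranked pages end up with higher ranks, which is exactly the "importance" signal described above.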
In addition to its regular search engine, Google offers speciality search engines for images, news, videos, blogs and more.

Vertical search engines
Vertical search engines are specialists (focusing on specific topics), in comparison to generalists like Google and Yahoo. They enable you to search for resources in a specific area, with the goal of providing a smaller number of more relevant results.

Location-based search

Location-based search (offered by most major search engines as well as some smaller specialized ones) uses geographic information about the searcher to provide more relevant search results. For example, search engines can ask the user for a ZIP code or estimate the user’s general location based on IP address. The engine can then use this information to give higher priority to search results physically located near the user. This is particularly useful when searching for businesses such as restaurants or car services.
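A crude sketch of the prioritization step: given an estimated user location, re-rank candidate results by distance. Real engines blend proximity with relevance; the coordinates and the simple squared-distance comparison here are purely illustrative.

```python
def rank_by_proximity(user_location, results):
    """results: list of (name, (lat, lon)) pairs; returns them nearest-first.
    Squared planar distance is enough for *comparing* nearby points."""
    def squared_distance(result):
        lat, lon = result[1]
        return (lat - user_location[0]) ** 2 + (lon - user_location[1]) ** 2
    return sorted(results, key=squared_distance)
```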

e-mail
One of the most popular Internet services is electronic mail (e-mail). At the beginning of the Internet era, the messages sent by electronic mail were short and consisted of text only; they let people exchange quick memos. Today, electronic mail is much more complex. It allows a message to include text, audio, and video. It also allows one message to be sent to one or more recipients.
An e-mail system has three main components: user agent, message transfer agent, and message access agent.
User Agent: It provides service to the user to make the process of sending and receiving a message easier. A user agent is a software package (program) that composes, reads, replies to, and forwards messages. It also handles mailboxes.
Message Transfer Agent: The actual mail transfer is done through message transfer agents. The process of transferring a mail message occurs in three phases: connection establishment, mail transfer, and connection termination. It uses SMTP protocol.
Message Access Agent: Mail access starts with the client when the user needs to download e-mail from the mailbox on the mail server. The client opens a connection to the server on a well-known TCP port. It then sends its user name and password to access the mailbox. The user can then list and retrieve the mail messages, one by one. It uses protocols such as POP3 and IMAP.
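The user agent's composing role can be illustrated with Python's standard email library; the addresses below are placeholders. Handing the finished message to a message transfer agent would then be a single smtplib call, left as a comment here since it needs a live SMTP server.

```python
from email.message import EmailMessage

# Compose a message, as a user agent would (addresses are placeholders).
msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.com"
msg["Subject"] = "Quick memo"
msg.set_content("Hello Bob, the lab is booked for Friday.")

# A message transfer agent would now carry it via SMTP, e.g.:
# import smtplib
# with smtplib.SMTP("mail.example.com") as server:  # hypothetical server name
#     server.send_message(msg)

print(msg.as_string())
```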
Web-Based Mail
E-mail is such a common application that some websites today provide this service to anyone who accesses the site. Common examples are Gmail, Hotmail, and Yahoo! Mail. Mail transfer from the browser to the mail server is done through HTTP.

WWW
The World Wide Web (abbreviated as WWW or W3, commonly known as the web) is a system of interlinked hypertext documents accessed via the Internet. A broader definition comes from the organization that Web inventor Tim Berners-Lee helped found, the World Wide Web Consortium (W3C):
The World Wide Web is the universe of network-accessible information, an embodiment of human knowledge.
The WWW project was initiated by CERN (European Laboratory for Particle Physics) to create a system to handle distributed resources necessary for scientific research.
The WWW today is a distributed client-server service, in which a client using a browser can access a service using a server. However, the service provided is distributed over many locations called sites. Each site holds one or more documents, referred to as Web pages. Each Web page can contain a link to other pages in the same site or at other sites. The pages can be retrieved and viewed by using browsers.
In simple terms, the World Wide Web is a way of exchanging information between computers on the Internet, tying them together into a vast collection of interactive multimedia resources.

Web Server
A web server is specialized software that responds to client requests (typically from a web browser) by providing resources such as HTML/XHTML documents. For example, when users enter a Uniform Resource Locator (URL) address, such as graduatingcs.blogspot.in, into a web browser, they are requesting a specific document from a web server. The web server maps the URL to a resource on the server (or to a file on the server’s network) and returns the requested resource to the client. During this interaction, the web server and the client communicate using the platform-independent Hypertext Transfer Protocol (HTTP), a protocol for transferring requests and files over the Internet.
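The URL-to-resource mapping step can be sketched as follows. The document root and the index.html default-document convention are assumptions for illustration, not the configuration of any particular server.

```python
import posixpath
from urllib.parse import urlparse

def map_url_to_file(url, document_root="/var/www/html"):
    """Sketch of mapping a requested URL onto a file under the document root."""
    path = urlparse(url).path or "/"
    if path.endswith("/"):
        path += "index.html"  # a common default-document convention
    # Normalize so that '..' segments cannot escape the document root.
    safe = posixpath.normpath("/" + path).lstrip("/")
    return posixpath.join(document_root, safe)
```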
Multitier Application Architecture
Web-based applications are multitier applications that divide functionality into separate tiers (i.e., logical groupings of functionality). Although tiers can be located on the same computer, the tiers of web-based applications often reside on separate computers. The figure presents the basic structure of a three-tier web-based application.
3 Tier Architecture
The bottom tier (also called the data tier or the information tier) maintains the application’s data. This tier typically stores data in a relational database management system (RDBMS), which may reside on one or more computers.
The middle tier implements business logic, controller logic and presentation logic to control interactions between the application’s clients and its data. The middle tier acts as an intermediary between data in the information tier and the application’s clients. The middle-tier controller logic processes client requests and retrieves data from the database. The middle-tier presentation logic then processes data from the information tier and presents the content to the client. Web applications typically present data to clients as XHTML documents.
Business logic in the middle tier enforces business rules and ensures that data is reliable before the application updates a database or presents data to users. Business rules dictate how clients access data, and how applications process data. For example, a business rule in the middle tier of a retail store’s web-based application might ensure that all product quantities remain positive. A client request to set a negative quantity in the bottom tier’s product information database would be rejected by the middle tier’s business logic.
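That retail-store rule can be expressed as a small middle-tier check. The dictionary stands in for the bottom tier's database, and all names here are illustrative.

```python
def update_quantity(database, product_id, new_quantity):
    """Middle-tier business logic: reject negative quantities before the
    update ever reaches the data tier."""
    if new_quantity < 0:
        raise ValueError("product quantity must remain non-negative")
    database[product_id] = new_quantity
```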

The top tier, or client tier, is the application’s user interface, which gathers input and displays output. Users interact directly with the application through the user interface, which is typically a web browser, keyboard and mouse, or a mobile device. In response to user actions (e.g., clicking a hyperlink), the client tier interacts with the middle tier to make requests and to retrieve data from the information tier. The client tier then displays the data retrieved for the user. The client tier never directly interacts with the information tier.

The Apache HTTP Server
The Apache HTTP Server, commonly referred to as Apache, is web server software that has played a key role in the growth of the World Wide Web. Apache has consistently been the most popular web server on the Internet since 1996. It is typically run on a Unix-like operating system.
Apache is developed and maintained by an open community of developers under the auspices of the Apache Software Foundation. The application is available for a wide variety of operating systems, including Unix, Linux, Solaris, Novell NetWare, Microsoft Windows etc. Released under the Apache License, Apache is open-source software.
As of June 2013, Apache was estimated to serve 54% of all active websites.
Apache supports a variety of features, many implemented as compiled modules which extend the core functionality.  These can range from server-side programming language support to authentication schemes like password authentication. It also features configurable error messages. Some common language interfaces supported are Perl, Python and PHP.

Internet Information Services (IIS)
Internet Information Services (IIS) – formerly called Internet Information Server – is a web server software application created by Microsoft for use with Microsoft Windows. IIS 7.5 supports HTTP, HTTPS, FTP, FTPS, SMTP and NNTP. It is part of certain editions of Windows XP, Windows Vista and Windows 7. IIS is not turned on by default. The IIS Manager is accessed through the Administrative Tools in the Control Panel.
All versions of IIS prior to 7.0 running on client operating systems supported only 10 simultaneous connections and a single web site.
With IIS 8 you can share information with users on the Internet, an intranet, or an extranet. IIS 8 is a unified web platform that integrates IIS, ASP.NET, FTP services, PHP, and Windows Communication Foundation (WCF).

IIS is the second most popular web server in the world, with about a 20% market share.

Protocols
In computer networks, communication occurs between entities in different systems. An entity is anything capable of sending or receiving information. However, two entities cannot simply send bit streams to each other and expect to be understood. For communication to occur, the entities must agree on a protocol. A protocol is a set of rules that govern data communications. A protocol defines what is communicated, how it is communicated, and when it is communicated.

The key elements of a protocol are syntax, semantics, and timing.


  • Syntax: The term syntax refers to the structure or format of the data, meaning the order in which they are presented.
  • Semantics: The word semantics refers to the meaning of each section of bits. How is a particular pattern to be interpreted, and what action is to be taken based on that interpretation?
  • Timing: The term timing refers to two characteristics: when data should be sent and how fast they can be sent.
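A toy frame format makes the distinction concrete: the byte layout below is the protocol's syntax, while the meaning assigned to each field, such as which message type means what, is its semantics. The format and type values are invented purely for illustration.

```python
import struct

# Syntax: network byte order, a 1-byte message type, a 2-byte payload length,
# then the payload itself. Semantics: what each type value *means* is a
# separate agreement between the two parties (invented values below).
HEADER = struct.Struct("!BH")
MSG_HELLO, MSG_DATA = 1, 2

def pack_frame(msg_type, payload):
    """Serialize one frame according to the agreed syntax."""
    return HEADER.pack(msg_type, len(payload)) + payload

def unpack_frame(frame):
    """Parse a frame back into its type and payload."""
    msg_type, length = HEADER.unpack_from(frame)
    return msg_type, frame[HEADER.size:HEADER.size + length]
```

Timing, the third element, is not visible in the byte layout at all: it governs when and how fast frames like these may be sent.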

Hyper Text Transfer Protocol (HTTP)
The Hypertext Transfer Protocol (HTTP) is a protocol used mainly to access data on the World Wide Web. HTTP functions like a combination of FTP and SMTP. The client initiates the transaction by sending a request message. The server replies by sending a response. A request message consists of a request line, a header, and sometimes a body. A response message consists of a status line, a header, and sometimes a body.
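Because HTTP/1.x messages are plain lines of text separated by CRLF, the two message shapes can be sketched directly as strings; the host name below is a placeholder.

```python
def build_request(method, path, host):
    """A minimal HTTP/1.1 request: request line, headers, then a blank line."""
    return (f"{method} {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n"
            "\r\n")

def parse_status_line(response):
    """Split a response's status line into protocol version, code, and reason."""
    version, code, reason = response.split("\r\n", 1)[0].split(" ", 2)
    return version, int(code), reason
```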

File Transfer Protocol (FTP)
File Transfer Protocol (FTP) is the standard mechanism provided by TCP/IP for copying a file from one host to another. Transferring files from one system to another requires dealing with several problems. For example, two systems may use different file name conventions, different ways to represent text and data, or different directory structures. FTP solves all of these problems in a simple and elegant way.
FTP differs from other client/server applications in that it establishes two connections between the hosts. One connection is used for data transfer, the other for control information (commands and responses). Separating commands from data transfer makes FTP more efficient: the control connection needs to carry only a line of command or a line of response at a time, while the data connection needs more complex rules due to the variety of data types transferred.
The control connection remains connected during the entire interactive FTP session. The data connection is opened and then closed for each file transferred. It opens each time commands that involve transferring files are used, and it closes when the file is transferred.
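On the control connection, each server reply begins with a three-digit code whose first digit tells the client how the command fared (RFC 959). A sketch of interpreting that first digit:

```python
REPLY_CLASSES = {
    1: "positive preliminary",    # action begun, another reply will follow
    2: "positive completion",
    3: "positive intermediate",   # more input expected (e.g. PASS after USER)
    4: "transient negative",      # failed now, but worth retrying
    5: "permanent negative",      # failed, do not retry as-is
}

def classify_reply(line):
    """Return the numeric code and its class for one control-connection reply."""
    code = int(line[:3])
    return code, REPLY_CLASSES[code // 100]
```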

TELNET
In the Internet, users may want to run application programs at a remote site and create results that can be transferred to their local site. For example, students may want to connect to their university computer lab from their home to access application programs for doing homework assignments or projects.
TELNET is a general-purpose client/server program that lets a user access any application program on a remote computer; in other words, it allows the user to log on to a remote computer. After logging on, a user can use the services available on the remote computer and transfer the results back to the local computer.
TELNET is an abbreviation for TErminaL NETwork. TELNET enables the establishment of a connection to a remote system in such a way that the local terminal appears to be a terminal at the remote system.
When a user wants to access an application program or utility located on a remote machine, the user performs a remote log-in using a user name and password. TELNET uses only one TCP connection. The same connection is used for sending both data and control characters. TELNET accomplishes this by embedding the control characters in the data stream.
Most TELNET implementations operate in one of three modes: default mode, character mode, or line mode.
  • Default Mode: The default mode is used if no other modes are invoked through option negotiation. In this mode, the echoing is done by the client. The user types a character, and the client echoes the character on the screen (or printer) but does not send it until a whole line is completed.
  • Character Mode: In the character mode, each character typed is sent by the client to the server. The server normally echoes the character back to be displayed on the client screen.
  • Line Mode: A new mode has been proposed to compensate for the deficiencies of the default mode and the character mode. In this mode, called the line mode, line editing (echoing, character erasing, line erasing, and so on) is done by the client. The client then sends the whole line to the server.
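The buffering behaviour of the default mode can be simulated in a few lines: characters are echoed locally as typed, but nothing is sent until the line is complete. This is purely an illustration of that buffering, not a TELNET implementation.

```python
def default_mode(keystrokes):
    """Echo every character locally; send a line to the 'server' only when
    the user completes it with a newline."""
    buffer, sent, echoed = "", [], []
    for ch in keystrokes:
        echoed.append(ch)               # local echo by the client
        if ch == "\n":
            sent.append(buffer + "\n")  # the whole line crosses the network
            buffer = ""
        else:
            buffer += ch
    return "".join(echoed), sent
```

In character mode, by contrast, every iteration of the loop would transmit immediately and the echo would come back from the server.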
