Wednesday, August 18, 2010

XML INTERVIEW QUESTION


What is a WSDL endpoint?
The WSDL defines services as collections of network endpoints, or ports. The WSDL specification provides an XML format for documents for this purpose.
WSDL is often used in combination with SOAP and an XML Schema to provide web services over the Internet. A client program connecting to a web service can read the WSDL to determine what operations are available on the server. Any special datatypes used are embedded in the WSDL file in the form of XML Schema. The client can then use SOAP to actually call one of the operations listed in the WSDL.


Objects in a WSDL 1.1/WSDL 2.0
Service/Service: The service can be thought of as a container for a set of system functions that have been exposed to web-based protocols.


Port/Endpoint: The port does nothing more than define the address or connection point of a web service. It is typically represented by a simple HTTP URL string.


Binding/Binding: Specifies the interface and defines the SOAP binding style (RPC/Document) and transport (SOAP Protocol). The binding section also defines the operations.


PortType/Interface: The <portType> element, which has been renamed to <interface> in WSDL 2.0, defines a web service, the operations that can be performed, and the messages that are used to perform the operation.


Operation/Operation: Each operation can be compared to a method or function call in a traditional programming language. Here the SOAP actions are defined, along with the way the message is encoded, for example, "literal."


Message/N.A.: Typically, a message corresponds to an operation. The message contains the information needed to perform the operation. Each message consists of one or more logical parts. Each part is associated with a message-typing attribute. The message name attribute provides a unique name among all messages. The part name attribute provides a unique name among all the parts of the enclosing message. Parts are a description of the logical content of a message. In RPC binding, a binding may reference the name of a part in order to specify binding-specific information about the part. A part may represent a parameter in the message; the bindings define the actual meaning of the part. Messages were removed in WSDL 2.0, where you simply refer directly to XML Schema types to define the bodies of inputs, outputs and faults.


Types/Types: The purpose of the types in WSDL is to describe the data. XML Schema is used (inline or referenced) for this purpose.

Different types of XML classes in the .NET System.Xml namespace
  System.Xml.Formatting
         System.Xml.ReadState
         System.Xml.ValidationType
         System.Xml.WhitespaceHandling
         System.Xml.WriteState
         System.Xml.XmlNodeChangedAction
         System.Xml.XmlNodeOrder
         System.Xml.XmlNodeType
         System.Xml.XmlSpace
         System.Xml.XmlTokenizedType
   System.Xml.XmlConvert
   System.Xml.XmlImplementation
   System.Xml.XmlNamedNodeMap
      System.Xml.XmlAttributeCollection ---- System.Collections.ICollection
   System.Xml.XmlNamespaceManager ---- System.Collections.IEnumerable
   System.Xml.XmlNameTable
      System.Xml.NameTable
   System.Xml.XmlNode
      System.Xml.XmlAttribute
      System.Xml.XmlDocument
         System.Xml.XmlDataDocument
      System.Xml.XmlDocumentFragment
      System.Xml.XmlEntity
      System.Xml.XmlLinkedNode
         System.Xml.XmlCharacterData
            System.Xml.XmlCDataSection
            System.Xml.XmlComment
            System.Xml.XmlSignificantWhitespace
            System.Xml.XmlText
            System.Xml.XmlWhitespace
         System.Xml.XmlDeclaration
         System.Xml.XmlDocumentType
         System.Xml.XmlElement
         System.Xml.XmlEntityReference
         System.Xml.XmlProcessingInstruction
      System.Xml.XmlNotation
   System.Xml.XmlNodeChangedEventArgs
   System.Xml.XmlNodeList ---- System.Collections.IEnumerable
   System.Xml.XmlParserContext
   System.Xml.XmlQualifiedName
   System.Xml.XmlReader
      System.Xml.XmlNodeReader
      System.Xml.XmlTextReader ---- System.Xml.IXmlLineInfo
      System.Xml.XmlValidatingReader ---- System.Xml.IXmlLineInfo
   System.Xml.XmlResolver
      System.Xml.XmlSecureResolver
      System.Xml.XmlUrlResolver
   System.Xml.XmlWriter
      System.Xml.XmlTextWriter

What are the differences between SAX and DOM parsers?
Both SAX and DOM are used to parse XML documents. Both have advantages and disadvantages, and either can be used depending on the situation.

SAX | DOM
Parses node by node | Stores the entire XML document in memory before processing
Doesn't store the XML in memory | Occupies more memory
We can't insert or delete a node | We can insert or delete nodes
Top-to-bottom traversal | Traverses in any direction
SAX is an event-based parser | DOM is a tree-model parser
SAX is the Simple API for XML | Document Object Model (DOM) API
import javax.xml.parsers.*; import org.xml.sax.*; import org.xml.sax.helpers.*; | import javax.xml.parsers.*; import org.w3c.dom.*;
Doesn't preserve comments | Preserves comments

SAX generally runs a little faster than DOM.
If we only need to find a node and don't need to insert or delete anything, SAX is enough; otherwise use DOM, provided enough memory is available.

XML Parsing

Because XML is so widely used, it is important that XML documents be parsed efficiently, especially in applications that must handle large volumes of data. Improper parsing increases memory usage and processing time, which directly reduces scalability.
Many XML parsers are available, and choosing the right one for your situation can be challenging. This section looks at three popular XML parsing techniques for Java and helps you choose the right one based on the application and its requirements.
An XML (Extensible Markup Language) parser takes a raw serialized string as input and performs a series of operations on it. First, the XML data is checked for syntax errors and well-formedness: every start tag must have a matching end tag, and elements must not overlap. Many parsers can also validate the document against a Document Type Definition (DTD) or an XML Schema to verify that its structure and content are as specified. Finally, the parser gives the application access to the document's content through its API.
The three popular XML parsing techniques for Java are the Document Object Model (DOM), a mature standard from the W3C; the Simple API for XML (SAX), the first widely adopted API for XML in Java and a de facto standard; and the Streaming API for XML (StAX), a newer parsing model that is efficient and promising. Each technique has its advantages and disadvantages.

Parsing with DOM

The Document Object Model (DOM) technique is based on tree-structure parsing: it builds the entire parse tree in memory, which gives the application dynamic access to the whole XML document.
The DOM is a tree-like structure. The document is the root of the DOM tree and has at least one child node, the root element, which in a sample catalog document would be the catalog element. Another node is the DocumentType, which represents the DTD declaration. The catalog element has child nodes, and these child nodes are elements.
A typical DOM program takes the XML filename and creates the DOM tree. It calls getElementsByTagName() to find all the DOM element nodes that are title elements, and then prints the text associated with each title element. It does this by walking the list of title elements and examining the first child of each one, which is the text located between the element's start and end tags, using the getFirstChild() method.
The DOM is direct and straightforward. Because the whole tree is held in memory, the XML document can be accessed randomly at any time. The DOM APIs can also modify the tree, for example by appending a child, or by updating or deleting a node. The DOM has good support for navigating the in-memory tree, but there are parsing issues to consider. The entire document must be parsed in one go; it cannot be parsed partially or in intervals. If the XML document is huge, building the entire tree in memory is expensive, and the DOM tree can consume a lot of memory. Interoperability is the DOM's biggest strength, but it is poorly suited to object binding.
Many applications are well suited to DOM parsing. If the application needs random access to the XML document, DOM parsing is appropriate. For example, an Extensible Stylesheet Language (XSL) processor has to navigate through the entire file repeatedly while it processes templates. Because the DOM can be updated dynamically, it is also convenient for applications, such as XML editors, that need to modify data frequently.
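The DOM steps described above can be sketched in Java roughly as follows. This is a minimal illustration, not the original article's sample: the catalog/book/title document is a hypothetical stand-in, and the XML is read from a string rather than from a file to keep the example self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomTitles {
    // Hypothetical sample document; the element names are assumptions.
    static final String SAMPLE =
        "<catalog>"
        + "<book><title>XML Basics</title></book>"
        + "<book><title>Java and XML</title></book>"
        + "</catalog>";

    /** Builds the whole DOM tree in memory, then collects the text of every title element. */
    static List<String> titles(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("title");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // The first child of a title element is the text node between its tags.
                Node text = nodes.item(i).getFirstChild();
                if (text != null) {
                    result.add(text.getNodeValue());
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titles(SAMPLE));
    }
}
```

Because the whole tree is available after parse() returns, the same Document could be queried again or modified without re-reading the input.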

Parsing with SAX

The SAX processing model is an event-driven model for processing XML documents, based entirely on a stream of events. Although it is not a W3C standard, it is a very well-known API that many parsers implement in a compliant way. Unlike a DOM parser, which builds an entire tree to represent the data, a SAX parser streams a series of events while it reads the document. These events are forwarded to event handlers, which provide access to the data of the document. There are three basic types of event handlers: the DTDHandler, for accessing the contents of XML DTDs; the ErrorHandler, for low-level access to parsing errors; and, last but not least, the ContentHandler, for accessing the content of the document.
This difference from DOM gives the SAX parser a great benefit in terms of performance: it provides efficient, low-level access to the XML document's contents. The major advantage of the SAX model is its extremely low memory consumption, because the entire document does not need to be loaded into memory at one time; this enables a SAX parser to parse a document much larger than the system's memory. In addition, you do not need to create an object for each and every node, as you do with DOM. Finally, the SAX "push" model can be used in a broadcast context: multiple content handlers can be registered and receive events in parallel, instead of receiving them one by one in a pipeline.
One of the disadvantages of SAX is that you have to implement the event handlers to handle all incoming events, and you must maintain the event state in your application code. Because the SAX parser does not convey structural information such as DOM's parent/child support, you also have to keep track of the parser's position in the document hierarchy. The application logic gets harder as the document gets bigger and more complicated. Although the entire document need not be loaded into memory, a SAX parser still has to parse the whole document, just as a DOM parser does.
One of the biggest problems with SAX is that it lacks built-in support for document navigation of the kind provided by XPath. In addition, one-pass parsing rules out random access. These limitations also affect namespace handling. These shortcomings make SAX a poor choice for manipulating or modifying an XML document.
Applications that can read the document's content in a single pass benefit greatly from SAX parsing. Many business-to-business portals and applications use XML to encapsulate data in a format that can be received and retrieved with a simple process. In this scenario SAX may win hands down over DOM, purely because of its efficiency, which results in high throughput. SAX 2.0 also has a built-in filtering mechanism that makes it easy to output a subset of the document. SAX parsing is also useful for validating documents against DTDs and XML schemas.
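To make the push model concrete, here is a minimal Java sketch of a content handler that collects title text. The catalog/book/title document is a hypothetical example, and the handler shows the kind of state (whether the parser is currently inside a title element) that the application has to maintain itself.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxTitles {
    // Hypothetical sample document; the element names are assumptions.
    static final String SAMPLE =
        "<catalog>"
        + "<book><title>XML Basics</title></book>"
        + "<book><title>Java and XML</title></book>"
        + "</catalog>";

    /** Content handler that tracks its own position: inside a title element or not. */
    static class TitleHandler extends DefaultHandler {
        final List<String> titles = new ArrayList<>();
        private StringBuilder current; // non-null only while inside a title element

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("title".equals(qName)) {
                current = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text may be delivered in several chunks, so append rather than assign.
            if (current != null) {
                current.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("title".equals(qName)) {
                titles.add(current.toString());
                current = null;
            }
        }
    }

    /** The parser pushes events to the handler; no tree is ever built. */
    static List<String> titles(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            TitleHandler handler = new TitleHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.titles;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titles(SAMPLE));
    }
}
```

DefaultHandler supplies empty implementations of the handler interfaces, so only the three callbacks of interest need to be overridden here.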

Parsing with StAX
StAX is a newer parsing technique that is very similar to SAX and an improvement on it. Like SAX, StAX uses an event-driven model. The difference is that SAX uses a push model for event processing, whereas StAX uses a pull model: instead of using callbacks, a StAX parser returns events as the application requests them.
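The pull model might look like this in Java, using the cursor-style XMLStreamReader from javax.xml.stream. It is a sketch under the same assumptions as before: the catalog/book/title document is hypothetical, and the application, not the parser, drives the loop by asking for the next event.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class StaxTitles {
    // Hypothetical sample document; the element names are assumptions.
    static final String SAMPLE =
        "<catalog>"
        + "<book><title>XML Basics</title></book>"
        + "<book><title>Java and XML</title></book>"
        + "</catalog>";

    /** Pulls events one at a time and reads the text of each title element. */
    static List<String> titles(String xml) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            List<String> result = new ArrayList<>();
            while (reader.hasNext()) {
                int event = reader.next(); // the application requests the next event
                if (event == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    // Reads the text up to the matching end tag of this element.
                    result.add(reader.getElementText());
                }
            }
            reader.close();
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titles(SAMPLE));
    }
}
```

Since the application controls the loop, it can stop reading as soon as it has what it needs, which is harder to arrange with callbacks.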

What is the difference between the InProc, State Server and SQL Server session modes? Which one is fastest, and when should each be used?

Session state in ASP.NET can be stored in three ways:
1. InProc
2. StateServer
3. SQLServer
InProc mode is the fastest of the available storage modes; it stores the session data in the ASP.NET worker process. Performance is affected if the amount of data stored in the session is large. The session is stored in the memory space of an application domain and is volatile, so if the ASP.NET worker process (aspnet_wp.exe) restarts, the session state is lost. The session state depends entirely on the lifetime of the application domain it runs in. Note that the Session_End event, which is fired internally by the web server, is supported only in InProc mode. Note also that even if the session state is set to read-only using the EnableSessionState attribute, in InProc mode one can still modify the session. The Session_OnEnd event is invoked by the runtime when we call the Session.Abandon() method or when the user's session times out. Further, any change to the settings in the web.config file unloads the application domain, and the session state with it.
StateServer mode uses a stand-alone Microsoft Windows service that is independent of IIS and can run on a separate server. The session state is serialized and stored in memory in a separate process, aspnet_state.exe. This has some performance drawbacks because of the overhead of serializing and deserializing objects. The main advantage of storing session state in a state server is that it is not in the same process as ASP.NET, so a crash of ASP.NET does not destroy the session data. Secondly, this mode allows session information to be shared across a web garden or a web farm.
Remember that this mode is slower than InProc mode because the state is stored in an external process.
SQLServer mode provides reliable, secure and centralized storage of session state. The session data is serialized and stored in a table in a SQL Server database. It is typically used in web farms. Like StateServer mode, it has performance bottlenecks because of the overhead of serializing and deserializing the objects that are stored in and retrieved from the session.
SQLServer mode is more secure than the InProc or StateServer modes, because the data can be secured easily by configuring SQL Server security.

Describe the differences between XML and HTML.

It's amazing how many developers claim to be proficient at programming with XML, yet do not understand the basic differences between XML and HTML. Anyone with a fundamental grasp of XML should be able to describe some of the main differences outlined in the table below.

Table 1. Differences Between XML and HTML
XML | HTML
User-definable tags | Defined set of tags designed for web display
Content driven | Format driven
End tags required for well-formed documents | End tags not required
Quotes required around attribute values | Quotes not required
Slash required in empty tags | Slash not required

Describe the role that XSL can play when dynamically generating HTML pages from a relational database.

Even if candidates have never participated in a project involving this type of architecture, they should recognize it as one of the common uses of XML. Querying a database and then formatting the result set so that it can be validated as an XML document allows developers to translate the data into an HTML table using XSLT rules. Consequently, the format of the resulting HTML table can be modified without changing the database query or application code since the document rendering logic is isolated to the XSLT rules.
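As a rough illustration of this architecture, the Java sketch below applies an XSLT stylesheet to an XML result set and produces an HTML table. Everything specific in it is an assumption made for the example: the rows/row/name/city document stands in for a formatted query result, and the stylesheet is one possible set of rendering rules. The database query itself is left out.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ResultSetToHtml {
    // Hypothetical XML form of a query result; names and values are made up.
    static final String ROWS =
        "<rows>"
        + "<row><name>Alice</name><city>Paris</city></row>"
        + "<row><name>Bob</name><city>Oslo</city></row>"
        + "</rows>";

    // Rendering rules live only in the stylesheet, not in the application code.
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"html\" indent=\"no\"/>"
        + "<xsl:template match=\"/rows\">"
        + "<table><xsl:apply-templates select=\"row\"/></table>"
        + "</xsl:template>"
        + "<xsl:template match=\"row\">"
        + "<tr><td><xsl:value-of select=\"name\"/></td>"
        + "<td><xsl:value-of select=\"city\"/></td></tr>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to the XML and returns the generated markup. */
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(ROWS, XSLT));
    }
}
```

Changing the table layout, for instance swapping the two columns, only requires editing the XSLT string; the transform() method and the data stay the same.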

Give a few examples of types of applications that can benefit from using XML.

There are literally thousands of applications that can benefit from XML technologies. The point of this question is not to have the candidate rattle off a laundry list of projects that they have worked on, but, rather, to allow the candidate to explain the rationale for choosing XML by citing a few real world examples. For instance, one appropriate answer is that XML allows content management systems to store documents independently of their format, which thereby reduces data redundancy. Another answer relates to B2B exchanges or supply chain management systems. In these instances, XML provides a mechanism for multiple companies to exchange data according to an agreed upon set of rules. A third common response involves wireless applications that require WML to render data on hand held devices.

What is DOM and how does it relate to XML?

The Document Object Model (DOM) is an interface specification maintained by the W3C DOM Workgroup that defines an application-independent mechanism to access, parse, or update XML data. In simple terms, it is a hierarchical model that allows developers to manipulate XML documents easily. Any developer who has worked extensively with XML should be able to discuss the concept and use of DOM objects freely. Additionally, it is not unreasonable to expect advanced candidates to thoroughly understand its internal workings and be able to explain how DOM differs from an event-based interface like SAX.

What is SOAP and how does it relate to XML?

The Simple Object Access Protocol (SOAP) uses XML to define a protocol for the exchange of information in distributed computing environments. SOAP consists of three components: an envelope, a set of encoding rules, and a convention for representing remote procedure calls. Unless experience with SOAP is a direct requirement for the open position, knowing the specifics of the protocol, or how it can be used in conjunction with HTTP, is not as important as identifying it as a natural application of XML.

Can you walk us through the steps necessary to parse XML documents?

Superficially, this is a fairly basic question. However, the point is not to determine whether candidates understand the concept of a parser but rather have them walk through the process of parsing XML documents step-by-step. Determining whether a non-validating or validating parser is needed, choosing the appropriate parser, and handling errors are all important aspects to this process that should be included in the candidate's response.

Give some examples of XML DTDs or schemas that you have worked with.

Although XML does not require data to be validated against a DTD, many of the benefits of using the technology are derived from being able to validate XML documents against business or technical architecture rules. Polling for the list of DTDs that developers have worked with provides insight to their general exposure to the technology. The ideal candidate will have knowledge of several of the commonly used DTDs such as FpML, DocBook, HRML, and RDF, as well as experience designing a custom DTD for a particular project where no standard existed.

Using XSLT, how would you extract a specific attribute from an element in an XML document?

Successful candidates should recognize this as one of the most basic applications of XSLT. If they are not able to construct a reply similar to the example below, they should at least be able to identify the components necessary for this operation: xsl:template to match the appropriate XML element, xsl:value-of to select the attribute value, and the optional xsl:apply-templates to continue processing the document.

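Example 1. Extract Attributes from XML Data

<xsl:template match="element-name">
   Attribute Value:
   <xsl:value-of select="@attribute"/>
   <xsl:apply-templates/>
</xsl:template>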

When constructing an XML DTD, how do you create an external entity reference in an attribute value?

Every interview session should have at least one trick question. Although possible when using SGML, XML DTDs don't support defining external entity references in attribute values. It's more important for the candidate to respond to this question in a logical way than for the candidate to know the somewhat obscure answer.

How would you build a search engine for large volumes of XML data?

The way candidates answer this question may provide insight into their view of XML data. For those who view XML primarily as a way to denote structure for text files, a common answer is to build a full-text search and handle the data similarly to the way Internet portals handle HTML pages. Others consider XML as a standard way of transferring structured data between disparate systems. These candidates often describe some scheme of importing XML into a relational or object database and relying on the database's engine for searching. Lastly, candidates who have worked with vendors specializing in this area often say that the best way to handle this situation is to use a third-party software package optimized for XML data.
Obviously, some important areas of XML technologies were not included in this list -- namespaces, XPointer, XLink,

What is an XML DTD?
XML is a very handy format for storing and communicating your data between disparate systems in a platform-independent fashion. XML is more than just a format for computers -- a guiding principle in its creation was that it should be Human Readable and easy to create.
XML allows UNIX systems written in C to communicate with Web Services that, for example, run on the Microsoft .NET architecture and are written in ASP.NET. XML is, however, only the meta-language that the systems understand -- and they both need to agree on the format that the XML data will be in. Typically, one of the partners in the process will offer a service to the other: one is in charge of the format of the data.
The definition serves two purposes: the first is to ensure that the data that makes it past the parsing stage is at least in the right structure. As such, it's a first level at which 'garbage' input can be rejected. Secondly, the definition documents the protocol in a standard, formal way, which makes it easier for developers to understand what's available.
DTD - The Document Type Definition
The first method used to provide this definition was the DTD, or Document Type Definition. This defines the elements that may be included in your document, what attributes these elements have, and the ordering and nesting of the elements.
The DTD is declared in a DOCTYPE declaration beneath the XML declaration contained within an XML document:
Inline Definition:
<!DOCTYPE root-element [ ...declarations... ]>

External Definition:
<!DOCTYPE root-element SYSTEM "dtd-location">

The actual body of the DTD itself contains definitions in terms of elements and their attributes. For example, the following short DTD defines a bookstore. It states that a bookstore has a name, and stocks books on at least one topic.
Each topic has a name and 0 or more books in stock. Each book has a title, author and ISBN number. The name of the topic and the name of the bookstore are defined as the same type of element, which stores PCDATA: just text data. The title and author of the book are stored as CDATA -- text data that won't be parsed for further characters by the XML parser. The ISBN number is stored as an attribute of the book:

 ]>

An example of a book store's inline definition might be:

 ]>

 Mike's Store
 
   XML
    Mike's Guide To DTD's and XML Schemas
     Mike Jervis
   
 

Using an inline definition is handy when you only have a few documents and they're offline, as the definition is always in the file. However, if, for example, your DTD defines the XML protocol used to talk between two separate systems, re-transmitting the DTD with each document adds an overhead to the communications. Having an external DTD eliminates the need to re-send it each time. We could remove the DTD from the document, and place it in a DTD file on a Web server that's accessible by the two systems:

 Mike's Store
 
   XML
   
     Mike's Guide To DTD's and XML Schemas
     Mike Jervis
   
 

The file bookstore.dtd would contain the full definition in a plain text file:
   

The lowest level of definition in a DTD is that something is either CDATA or PCDATA: Character Data, or Parsed Character Data. We can only define an element as text, and with this limitation, it is not possible, for example, to force an element to be numeric. Attributes can be forced to a range of defined values, but they can't be forced to be numeric.
So, for example, if you stored your application's settings in an XML file, it could be manually edited so that the window start coordinates were strings -- and you'd still need to validate this in your code, rather than have the parser do it for you.
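The bookstore structure described above, and the limitation on numeric values, can be tried out with a validating parser in Java. The DTD below is a hypothetical sketch written from the prose description, not the article's original: in particular, title and author are declared as #PCDATA, because a DTD cannot declare element content as CDATA, and the isbn value is made up.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DtdValidation {
    // Hypothetical inline DTD reconstructed from the description of the bookstore.
    static final String DTD =
        "<!DOCTYPE bookstore [\n"
        + "  <!ELEMENT bookstore (name, topic+)>\n"
        + "  <!ELEMENT topic (name, book*)>\n"
        + "  <!ELEMENT name (#PCDATA)>\n"
        + "  <!ELEMENT book (title, author)>\n"
        + "  <!ELEMENT title (#PCDATA)>\n"
        + "  <!ELEMENT author (#PCDATA)>\n"
        + "  <!ATTLIST book isbn CDATA #REQUIRED>\n"
        + "]>\n";

    // Valid structure, but the isbn is deliberately not a number.
    static final String VALID =
        "<bookstore><name>Mike's Store</name><topic><name>XML</name>"
        + "<book isbn=\"not-a-number\"><title>Mike's Guide To DTD's and XML Schemas</title>"
        + "<author>Mike Jervis</author></book></topic></bookstore>";

    // Invalid structure: a bookstore must stock at least one topic.
    static final String NO_TOPIC = "<bookstore><name>Mike's Store</name></bookstore>";

    /** Returns true if the document body is valid against the inline DTD. */
    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
            return true;
        } catch (SAXException e) {
            return false; // a validity or well-formedness error was reported
        } catch (Exception e) {
            throw new IllegalStateException("parser setup or I/O failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("valid structure, non-numeric isbn: " + isValid(VALID));
        System.out.println("missing topic: " + isValid(NO_TOPIC));
    }
}
```

The parser rejects the document with no topic, but accepts the non-numeric isbn, so a numeric check of that kind would still have to live in application code.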

