Developing Financial Applications using XBRL

 

XBRL (eXtensible Business Reporting Language) is a data format for company financial reporting which can be easily consumed by software applications. The ‘Language’ in the name is actually a misnomer, XBRL is simply a data format as opposed to a computer language capable of performing operations. XBRL is a variant of the XML format which ‘marks-up’ data in a tag based syntax which is human readable. For example, to report Revenues the below code could be used :

<us-gaap:Revenues> 1500000 </us-gaap:Revenues>

This is clearly a simple format and for this reason XML based formats have become popular across a wide range of applications, since Office 2007 both Microsoft Word and Microsoft Excel use XML based formats (.docx and .xlsx respectively).

However, despite the promise of an easily consumable format for financial reporting data, XBRL is deeply flawed and has several major issues.

Issues with XML

XML is very useful for representing basic data structures, however as the complexity of the data structure grows the XML file(s) becomes extremely large and unwieldy. A set of company financial reports is a large and heavily structured set of data and so the XBRL files to represent them is very large. For a single quarterly financial report there will typically be at least six interlinked documents which will in total be about five times the size of the PDF.

XML has never been a performant data format, even for simple data structures, however it becomes incredibly sluggish for heavily structured data. To read and parse and single XBRL document took a dual-processor server between 50 to 80 seconds. There are a multitude of optimisations which could be applied but none are going to be sufficient to enable scaling and compete with database access times which are measured in milliseconds.

Thus, for any serious financial analysis we will need to develop a database schema and first read the XBRL document into the database which will then be the source of all analysis queries.

The XBRL Spec

Many of the elements for the XBRL specification were directly ported from the regulatory reporting requirements. The reporting requirements are sufficient for human analysis of financial reports but perform very poorly when translated into a data format to be consumed by software applications.

When reading financial reports it is very obvious which figures relate to the current reporting period. However to determine this for an XBRL document is not nearly so easy. For any given line item such as Revenue there are numerous Revenue items which relate to various periods and business segments, only one of these elements relates to the firm’s current Revenue for the actual period and determining which is error-prone. For starters, XBRL does not define the start and end dates for the current period and some companies use non-standard dates (for example Apple ends its reporting periods not on the last day of the period but the last business day). Thus to extract the actual reporting elements related to the current period some relatively convoluted (and hence error-prone) code logic is required.

In financial reports a zero balance is often not required to be reported. This works fine when reading reports but for a data format it is not very robust. For example, there is no debt on Apple’s Balance Sheet so the Debt elements are simply not included in the XBRL document. If we read the data into an application and attempt to calculate a metric which uses debt as an input an error will be thrown since the Debt element is simply not found as opposed to being found and having a zero balance. Thus, a zero balance will need to be programatically substituted when no element is found but this is definitely not robust (for example, we would like to know if an element could not be read properly for the XBRL doc and so was excluded for this reason).

A major omission from the XBRL spec is non-financial data which is often crucial for an analyst. For example, Apple includes unit sales (eg number of iPhone sales) in its reports but not in it XBRL filings.

XBRL Taxonomies

The above issues pale into insignificance when compared with the minefield of XBRL taxonomies.

The XBRL taxonomy is basically a template for the tagging and structuring various items which comprise a financial report.

XBRL allows firms a lot of latitude in selecting and even creating taxonomies, as well as changing taxonomies between reporting periods. Taking Apple again as an example, up until 2010 it reported its Fixed Assets using the XBRL tag aaplPropertyPlantAndEquipmentAndCapitalizedSoftwareNet , from 2011 it changed to using us-gaapPropertyPlantAndEquipmentGross. There was no notification of the change or mapping information so software parsing the two documents would not be able to associate the items as being the same balance for different periods.
The differences between XBRL documents of different companies is even greater, making financial comparisons between companies almost impossible.

The above issues (and especially the taxonomy issue) means that the manual intervention is almost always necessary original promise of XBRL to allow fully automated data analysis still unfulfilled.

Protecting Against SQL Injection In .NET Applications

Despite being so well understood, SQL Injection remains one of the most common vulnerabilities in web applications.

What is SQL Injection

Any SQL which is dynamically created has the potential for having malicious SQL injected into it. For example, the below code receives a querystring and adds it to a SQL select string which will then be executed against the database.

//Querystring for example could be ?userid=1
var inputStr = Request.QueryString("userid"); 

var sqlQuery = "select createDate from users where userid = '" + inputStr  + "'";
//sqlQuery is 'select createDate from users where userid = '1' 

All the attacker has to do is append sql to the querystring for it to execute. Thus adding 1; delete from users;-- will cause the sql statement to be select createDate from users where userid = '1'; delete from users;-- and two statements will be executed by the database. The first statement returns the createDate for the ‘jude’ user, the malicious second statement deletes all records from the database. Note that no statements added to the sqlQuery will be executed since they are commented out using the — characters.

There are a multitude of variations on these basic scenarios, as well as executing operations on the database, attackers can retrieve sensitive data by displaying it on the site which does not require execute permissions on the database. Take for example a search box on a site which places the search term in the querystring which is then used to form the SQL query as below :

var searchTerm = Request.QueryString("searchterm");
var sqlQuery = "select * from articles where article.title like '%" + "%'";

The sql is then executed against the database and the search results are output to the webpage.
A malicious user can inject sql into the search term and return a listing of table names in the database using the below ‘search’ term:

' union select name from sysobjects where type = 'u' --

Once armed with the table name listing, the attacker can target a table to return the column names using the below term:

' union select * from information_schema.columns where table name = 'users' --

Then it is just a straightforward select statement to return the sensitive data:

' union select socialsecuritynumber from users --

There are a large number of sources of SQL Injection, as noted above querystrings and form values are common sources of injection, but cookies and Http Headers are also potential sources. Http header values in particular are commonly stored in the database, for example logging the referring url is common practice on many sites:

var referringUrl = ServerVariables("HTTP_REFERER")
var insertReferralSql = "insert into trafficreferrals   values ('" + referringUrl + "')"

The Http Post can easily be tampered with using a tool such as Fiddler and malicious code entered in the Referrer field in the header. Note in this particular instance, the .NET framework provides the Request.UrlReferrer property which will pass back a valid Uri and should be clear from injected code.

Problematic Fix – Manually Cleaning Strings

Before looking at robust defenses against SQL injection it is worth looking at ad-hoc protections which are often ineffective and should not be used.

Writing custom code to ‘clean’ the input string of dangerous characters is effective in theory but needs to be done with extreme rigour and many solutions can very easily be bypassed.

For example, the potentially hazardous ‘ and – – characters can be removed as below :

var inputStr = Request.QueryString("userid"); 
var cleanStr = inputStr.Replace("--", "") 
                       .Replace("'", "");

This may also be performed on SQL Server using the TSQL REPLACE function:

...
set @input = REPLACE(@input, '--', '')
select @input
set @input = REPLACE(@input, '--', '')
select @input
...

In this example the cleanStr may be consider safe of the ‘ and – – sequences. However this is not the case. Consider the below input string:

; delete from orders -'-

Note the -‘- sequence, in this case the ‘ character will be removed after the – – characters are tested for and the — character sequence will then be created leaving the below string:

; delete from orders -'-

The ‘ and – characters are not always necessary for attacks. Consider the first example we looked at

var inputStr = Request.QueryString("userid"); 
var sqlQuery = "select createDate from users where userid = " + inputStr 

In this case the below SQL could be injected without the ‘ and – characters :

19; delete from users;

If the database has SET QUOTED-IDENTIFIER OFF then attacker could simply replace the ‘ character with ” .

Defenses Against SQL Injection

Ensure Correct DataTypes Are Input
In many of the above scenarios an integer was expected and the attacker passed in an SQL string instead.
In the application we can test the input is in the correct format :

int orderId = int.Parse(Request.QueryString["OrderId"]);

//The below code can be problematic when users have different date formats
//Consider also DateTime.TryParse
DateTime orderDate = DateTime.Parse(Request.QueryString["orderDate"]);

This ensures that strings are not input when integers, date formats and other datatypes are expected. This step is sometimes seen as unnecessary overhead when parameterised queries are used but it is does provide additional protection.

Parameterised Queries
Parameterised queries are a primary defense against SQL injection. The below stored procedure uses a parameter on the where clause which will prevent malicious code being injected.

CREATE PROCEDURE GetUserJoinDate 
    @UserName nvarchar(50) 
AS 
    SELECT JoinDate
    FROM Users
    WHERE UserName = @UserName 
GO

Parameters are not always effective however, and building dynamic SQL using string concatenation can introduce vulnerabilities. The below stored procedure performs the same operation by using dynamic SQL.

CREATE PROCEDURE GetUserJoinDate(@UserName nvarchar(45))  

 AS
  DECLARE @sql nvarchar(255)
  SET @sql = 'SELECT JoinDate FROM Users WHERE UserName = ' + @UserName + '
  EXECUTE sp_executesql @sql
GO

In this case the below SQL passed in as the UserName parameter will execute malicious SQL on the database:

'; delete from users --

This is a very simplistic sample and is an unusual use of dynamic SQL, dynamic SQL is more commonly used in other scenarios where the developer may believe it is the only option. For example SQL is often passed in to build such as sorting where parameters

CREATE PROCEDURE GetUsers 
    @Sort nvarchar(50) 
AS
  DECLARE @sql nvarchar(255)
  SET @sql = 'SELECT UserName FROM Users   ' + @Sort 
  EXECUTE sp_executesql @sql
GO

This allows partial sql to be passed in for sorting:

exec @Sort = 'order by UserName ASC'

This stored procedure could be purged of dynamic SQL and written as below:

CREATE PROCEDURE GetUsers 
    @Sort int = 1 
AS
   SELECT UserName FROM Users
   ORDER BY CASE
     WHEN @Sort = 1 THEN ( Rank() OVER (ORDER BY UserName ASC) )
     WHEN @Sort = 2 THEN ( Rank() OVER (ORDER BY UserName DESC) )
     WHEN @Sort = 3 THEN ( Rank() OVER (ORDER BY CreateDate ASC) )
GO

There are numerous scenarios such as this where dynamic SQL can be removed. If dymanic SQL is absolutely necessary then string concatenation should be avoided and parameters placed within the SQL which will will ensure that the parameter is properly escaped:

CREATE PROCEDURE GetUserJoinDate 
  @UserName nvarchar(45)
AS
  DECLARE @sql nvarchar(255)
  SET @sql = N'SELECT JoinDate FROM Users WHERE UserName=@UserName'
  EXECUTE sp_executesql @sql
GO

These examples have focused on stored procedures on the database, but this applies equally to SQL created in the application :

//Vulnerable
SqlConnection conn = new SqlConnection(connectionString);
SqlCommand command = new SqlCommand("SELECT JoinDate FROM Users 
                                     WHERE UserName='" + userName + "'", conn); 

//Secure SqlConnection conn = new SqlConnection(connectionString); SqlCommand command = new SqlCommand("SELECT JoinDate FROM Users WHERE UserName=@UserName", conn); command.Parameters.Add(new SqlParameter("UserName", userName);

Permissions

The database is only vulnerable to SQL injection to the extent of the user’s permissions on the SQL Server database. It is thus essential to audit and limit the permissions on the database, a detailed discussed of permissions is beyond the scope of this article but a good starting point for auditing SQL Server permissions is Auditing SQL Server Permissions

IIS Global Filtering
As an additional defense for querystring injection, IIS can filter requests for dangerous keywords. This can be done globally using the applicationhost.config (located at %systemroot%system32inetsrvconfigapplicationhost.config ). Under the tag add your rules as below:

 <filteringRules> 
    <filteringRule name="SQLInjection" scanQueryString="true"> 
        <appliesTo> 
            <add fileExtension=".asp" /> 
            <add fileExtension=".aspx" /> 
        </appliesTo> 
        <denyStrings> 
            <add string="--" /> 
            <add string=";" /> 
            <add string="/*" /> 
            <add string="@" /> 
            <add string="char" /> 
            <add string="alter" /> 
            <add string="begin" /> 
            <add string="cast" /> 
            <add string="create" /> 
            <add string="cursor" /> 
            <add string="declare" /> 
            <add string="delete" /> 
            <add string="drop" /> 
            <add string="end" /> 
            <add string="exec" /> 
            <add string="fetch" /> 
            <add string="insert" /> 
            <add string="kill" /> 
            <add string="open" /> 
            <add string="select" /> 
            <add string="sys" /> 
            <add string="table" /> 
            <add string="update" /> 
        </denyStrings> 
    </filteringRule> 
</filteringRules>

/* Courtesy Wade Hilmo */

This will deny requests with these keywords in the querystring.

Leave ValidateRequest On
ValidateRequest is inbuilt in ASP.NET and throws an error whenever a page is submitted with potentially malicious content. However, ValidateRequest is very aggressive and out of frustration developers frequently resort to turning it off in across the entire application. Instead, it can be selectively disabled for controls, pages in ASP.NET and for controllers in ASP.NET MVC (using [ValidateInput(false)] ). This gives more granular control over where there are potential exposures to SQL injection.

Consider Using an ORM
Object Relation Mappers such as Entity Framework or Nhibernate take control of the SQL generation and execution and generally do an excellent job of protecting against SQL Injection.

Primer on Selecting Data Using Entity Framework

This question about selecting data using Entity Framework on StackOverflow got me thinking that a lot of LINQ to Entities code samples show very little consistency in how data is selected. Find(), Single(), SingleOrDefault(), First(), and FirstOrDefault() are often used interchangeably.

The first issue to address is using Where() in the data selection statement. Where() should only be used when several entities are required. There is no advantage in using Where() as below :

var db = new MyDBContext();
var employee = db.Employees.Where(x => x.ID == 1).Single();

This can be written as :

var employee = db.Employees.Single(x => x.ID == 1);

Note that there is a fundamental difference between using Where() in isolation and appending a standard LINQ operator such as Single() or ToList(). Where() and OrderBy() will return IQueryable and IOrderedQueryable objects respectively which contain core SQL Statements to be executed against the database but have not yet requested or received data from the database.
Appending Single(), First(), ToList() or similar will subsequently execute the query:

var emps = db.Where(x => x.Salary > 100000); // emps is an IQueryable() object, no database interaction yet
//.....
var highSalaryEmps = emps.ToList() //database query executed and an IEnumerable() returned as highSalaryEmps

This can have both performance and data integrity implications. In the above example, data is only read from the database when ToList() is called, intervening update/add operations will be reflected in the highSalaryEmps object which may not be the intended outcome.
In terms of performance, the primary issue to be aware of is using filtering operations after executing a large database query. For example :

var emps = db.Employees.ToList();
var highestSalaryEmp = emps.OrderBy(x => x.Salary).Single();

This will first return all employees from the database and then performance an order and selection. Far better to just query the database using these criteria :

var highestSalaryEmp = db.Employees.OrderBy(x => x.Salary).Single();

Find()

Find() is set apart from other element operations in LINQ-To-Entities in that it will first query the in-memory entities that EF is tracking and only hit the database in the event that none is found. Thus it can have superior performance and should always be used where possible – especially in lengthy operations when the entity is likely to already be in memory. Find() is unfortunately not very flexible and can only be used to look up an entity by its Primary Key :

var emp = db.Employees.Find(empId);

Note that you can use the Local property on a DbSet to access the in-memory data if you require more advanced logic :

var emp = db.Employees.Local.SingleOrDefault(x => x.Name = "Jude")
        ?? db.Employees.SingleOrDefault(x => x.Name = "Jude");

Note the use of the null coalescor  (??) above which tests if the first statement evaluates to null and if so then proceeds to execute the second statement. Thus EF will first test if there is a match in-memory and only if no match is found will then hit the database.

Single or First

A lot of EF code samples contain First() when it is clearly not optimal. First() is often used interchangeably with Single() although they are quite different. Single mandates that one and only one entity can be returned, if two matching records are found in the database an error will be thrown. First() is only concerned with returning the first record and performs no check to determine if there are multiple matching records.
Thus Single performs a data integrity test. In many scenarios there should only but one matching record in the database, and so Single will ensure this – although this will necessitate addition error trapping logic since Single will throw errors is more than one match is found. Under the covers, Single does this by using ‘SELECT TOP (2)’ for its queries and then EF inspects the returned records and throws an error if two records are returned. First by contrast simply queries using  ‘SELECT TOP (1)’.

As a side note, using .Take(1) instead of .First() has no difference in the SQL used to query the database, but Take() returns an IQueryable object and not an entity as with First() so the above discussed issues will be relevant.

OrDefault()

SingleOrDefault() or FirstOrDefault() should only be used when it is not known if the entity has already been persisted to the database. Thus SingleOrDefault() will query the database for the entity and return null if no entities are found (note that an error will still be thrown if more than one match is found). A typical pattern is querying and returning an entity if it exists or creating and adding it if there is no matching entity. In such a scenario the ?? null coalescor introduced above can be very useful.
Take a scenario where users can create Tags for Articles, if the Tag exists it shouldnt be returned as an entity but it needs to be created if it does not yet exist:

var tag = db.Tags.SingleOrDefault(x => x.TagName ="EntityFramework") ??
new Tag { TagName ="EntityFramework", CreateDate = DateTime.UtcNow };

article.Tag = tag;
db.SaveChanges(); //Note the new tag will not be persisted to the database until SaveChanges is called.

 

 

Implementing PRG (POST-Redirect-GET) In ASP.NET MVC

Site visitors are conditioned to believe that hitting the browser’s refresh button will simply refresh the page. In reality the browser re-issues the last Http Request. That’s fine if it was just a GET Request which typically only loads a page, but if the request was a POST Request which typically updates values on the server that could result in undesirable results such as a duplicate order.

Fortunately there is a very simple pattern to protect against a re-POST due to page refresh – the PRG (POST-Redirect-GET) pattern. This is very easily implemented in an MVC web framework such as ASP.NET MVC. The controller which handles the POST Request performs the necessary processing and then redirects to another controller using a GET Request which then renders the final view to be returned to the site user.

A simple implementation is shown below. Here the Order controller accepts the POST Request from the user’s browser. If the ItemOrder object has no validation errors some processing is performed and then the ASP.NET MVC RedirectToAction method is called which issues a GET Request and passes the order id to the OrderSuccess controller. The OrderSuccess controller passes the OrderSuccess view (ie page) to the user. Now, if the user refreshes the browser a GET Request for  the OrderSuccess page is issued and no duplicate POST is made.

[HttpPost]
public ActionResult Order(ItemOrder itemOrder) {
    if (ModelState.IsValid) {
      //do processing
         return RedirectToAction("OrderSuccess", new { id = itemOrder.Id })
    }
    return View(model);  //if there is an error in the model the page is returned
} 

public ViewResult OrderSuccess(int id) {

    return View(id); 
}

Preventing Parameter Tampering in ASP.NET MVC

Never trust data submitted from a web page is a core security concept for web development. When using an MVC framework, this statement takes on added relevance. MVC frameworks rely heavily on binding querystrings, route values and form values to in-code objects.

Take for example the scenario where an authenticated user needs to update their profile. A common pattern in MVC would be to first load the the user in a controller and pass the user object to a View which renders the html to be sent to the browser:

[Authorize]
ActionResult UserDetails(int userId)
{
   var user = respository.GetUser(userId); 
   return View(user);
}

Another controller will then receive the Http Post from the update page and process the update:

[HttpPost]
[Authorize]
ActionResult UserDetails(User user)
{
   respository.UpdateUser(user);
   return View(user);
}

This works seamlessly, and developers are often comforted by the [Authorize] attribute which mandates that the user must first have logged in to the service and been authenticated to access the update page.

The vulnerability here is that when the page is sent to the user the html will contain the userid (typically in a hidden field), this forms part of the Http Post sent back to the server and is used as the primary key to determine which record to update. Using a tool such as Fiddler it is trivial to change this userid to any value and hence update any user record (attackers will often change a userid to 1 which is usually an admin account).

Accepting Data From a Trusted Source

The first defense against parameter tampering to only accept data from a trusted source where possible. In the above scenario, this could mean only accepting the data from the ASP.NET Membership provider:

[HttpPost]
[Authorize]
ActionResult UserDetails(User user)
   {
	 //Test is submitted user.Name is the same as the current logged in user
     if(user.Name == User.Identity.Name)
		{
   			respository.UpdateUser(user);
   			return View(user); 
		}
		else
		{
			//error
		}
     }

Encrypting and Decrypting Key Values

A more generalised solution is to pass down an encrypted value to the webpage and then decrypt and test this against the key value on the Http Post. Take for example an orderId which is used to update and order details (such as the address the order is to be sent to).

First, in the controller we can create an encrypted value of the key orderId field using the MachineKey API:

ActionResult OrderDetails(int orderId)
{
   var order = respository.GetOrder(orderId); 

     var orderidBytes = Encoding.UTF8.GetBytes(orderId);
     var encryptedOrderId =  MachineKey.Encode(plaintextBytes, MachineKeyProtection.All);
     ViewBag.OrderIdEncrypt = encryptedOrderId;


   return View(order);
}

In the Razor view, simply add the ViewBag.OrderIdEncrypt in a hidden field within the form:

...
@Html.Hidden("OrderIdEncrypt", ViewBag.OrderIdEncrypt)
...

Now when the form is posted back there will be a OrderIdEncrypt value on the form. This can then be decrypted with the MachineKey API and compared to the submitted value of OrderId:

[HttpPost]
[Authorize]
ActionResult OrderDetails(Order order, string orderIdEncrypt)
{
   var decryptedOrderIdBytes = MachineKey.Decode(encryptedValue, MachineKeyProtection.All);
   var unEncryptedOrderId =  Encoding.UTF8.GetString(decryptedBytes);
    
   if(unEncryptedOrderId == order.Id)
    { 
       respository.UpdateOrder(order);
       return View(order);
    }
   else
    {
       //test failed - send error message
    }

}

This solution could of course be cleaned up a lot – an enhancement would be firstly to have an computed field on the Order model which returns an encrypted value of the orderId and a helper method IsOrderIdValid which would compare an encrypted orderId value to a plain text orderId. This would clean up the solution and minimise the amount of logic in the controller.

In addition to tampering with existing values in a form, MVC frameworks are also vulnerable to adding additional parameters which are then bound to objects. This is commonly referred to as a Mass Assignment vulnerability and will be dealt with in my next article article.

Security – Simulating And Protecting Against A DoS Attack

On a recent project, I was created a web service which parsed a set of financial statements into name/value pairs from an XBRL document. The complexity of the XBRL   specification means that parsing an XBRL document takes approximately 90 seconds on mid-spec server. To avoid users having to wait for the 90 seconds for the data, the parsed data was saved to a database. However, since the service covers over 5000 companies it would need to store approximately 100,000 sets of financial statements which places a strain on database resources as each set of statements contains several thousand data points. The obvious solution is to only store the most commonly requested financial statements in the database and load the others on-demand with a message to users that the financial statements were being loaded. However this inoccuous design decision creates a major security vulnerability.

Slow Loading Pages = Security Vulnerability
One of the hardest attacks to defend against is a Denial of Service (DoS) attack which is simply a repeated set of requests for a server resource such as a web page. The repeated requests generate an excess of load on the server which is then rendered incapable of serving other legitimate site users and hence the web site is taken down.
The difficulty in launching a DoS attack is that most web pages involve very little CPU processing and hence the flood of requests necessary to bring the site down would be extremely large and hard to generate. Attackers therefore will target slow loading pages which may be consuming a lot of server resources.

Simulating a DoS Attack
To test a web application’s vulernability to DoS attacks we can use a tool to issue multiple simultaneous requests. For this demo I will use  LOIC (Low Orbit Ion Cannon) with the source code available at https://github.com/NewEraCracker/LOIC.
LOIC comes with a simple Windows application front-end, although you could just use the source code in the XXPFlooder.cs file to programatically launch the attack (note the simplicity of the code for running DoS attacks – the XXPFlooder.cs file is a mere 89 lines).
Using LOIC Windows app I configured it to make Http requests to the /loadaccounts?id=1002 page on my development server:

This quickly overwhelmed the development server, bringing the CPU utilization close to 100%:

Solution
DoS attacks are one of the most challenging security issues as there are numerous different ways of launching a DoS attack. Typically the defense will be at the network or server level, but in the case of a resource intensive page there are several potential application level optimisations. Firstly, the resource intensive page can be configured as a separate application with a dedicated application pool. A attack will therefore be isolated to the specific page (or area in the website) and leave the rest of the site unaffected.
Next, the page itself can be protected by using a short term in-memory cache. After executing the the resource intensive operation the resulting object can be placed in memory for a short period and subsequent requests will be served from the in-memory cache. A basic outline of the code is below:

FinancialStatements financialStatements;

//First test if there is an an object in the cache
if(Cache["FinancialStatements" + id] == null)
{
//If there is no object in the cache, create it an then load it into the cache
financialStatements = LoadFinancialStatements(id); //This is the resouce intensive operation
CacheItemPolicy policy = new CacheItemPolicy();
policy.AbsoluteExpiration = DateTime.Now + TimeSpan.FromMinutes(1);

Cache.Add(new CacheItem("FinancialStatements" + id, financialStatements), policy);
}
else
{
//If the object is already in the memory, simply load it from the
//cache which is a quick low impact operation.
financialStatements = Cache["FinancialStatements" + id];
}

Thus the first request will run resource intensive operation, but subsequent requests made within 60 seconds will simply be loaded from the memory.

Now running the LOIC attack again results in the below CPU resouce utilisation:

Note that CPU now only reaches a manageable 31% and only one of the cores is under significant load.

WebSite Performance Optimisation – Core Concepts

When it comes to performance tuning a site, there are a multitude of possible optimisations so I thought it best to distill these down to several core concepts.

Central to most of these concepts is an oveview of how a web page is loaded in a user’s browser. The below ‘waterfall’ diagram shows the loading process for the page. The bars represent the total time to load each page asset, the first request is for the page and then subsequent separate requests are made for each asset referenced page (such as image file, css etc, javascript file etc).

Note that in this article I will focus solely on front-end optimisations since for most web-pages  server processing typically only accounts for between 10% – 15% of the total page load.

Distribute Requests
Older browsers were only permitted to open a maximum of two simultaneous connections to a domain. Thus if the page contained references to numerous assets (such as images, css and javascript files) these would be queued and loaded two at a time. Thus a quick optimization was to distribute the assets on subdomains (eg images could be on img.mydomain.com) which would allow for more files to be loaded simultaneously.
Modern browsers allow for more files to be loaded concurrently (Chrome for example allows for six), but this is still a very powerful optimization although you should now use a Content Delivery Network (CDN) to host static files. A CDN is a globally disbtributed network of servers which caches static files and serves these to a user from the closest physical server (or ‘edge location’). Using a CDN has the benefit of reducing the network latency in loading static files, since the site visitor will be served the static files (such a page images) from a server close to them.

One caveat in using a CDN is it can be problematic when working with CSS and Javascript files which may need to be updated. A CDN caches files at its various edge locations around the world and even after a file is updated, the old cached file may still be served until the TTL (Time to Live) expires which can often be days or even weeks. Thus site visitors may be served out of date CSS and Javascript files after a site update. One remedy to this issue is to version the files (eg adopting a naming convention such as ‘myjsfile1_001.js’) and so create a fresh file name for each update for which no cache exists.

Reduce The Number of Requests
In the belowexcerpt of the waterfall diagram of the page load, the yellow portion of each bar is the time taken to to download the file, the blue portion at the start is the time to open the connection to the server. Note that for the bottom three files the largest portion of the load time is opening the connection.

The time to open and close each connection is unrelated to the file size and so the time to open a connection to download a large file is the same as for a small file. Thus it would save a lot of load time if numerous small files where combined into large files. The most obvious candidates for this are CSS and Javascript files. When working with templates and frameworks dozens of these files are often requested, each of which requires separate load-time overhead.
Combining these files is often problematic in development since working with a single huge javascript would be very inefficient, however there are numerous solutions for combining these files upon deployment. ASP.NET MVC 4 ships with inbuilt bundling or other open source solutions such as SquishIt   could also be used.

 

Reduce The Size of Files
Reducing the size of the files served to the user is an obvious and necessary step in performance optimisation. The first step is to look at the html itself – ensure there are no inline styles in the html, these are not only inefficient for development purposes but they impair performance since styles in external stylesheets are cached by the user’s browser and so do not need to be loaded on each page request.

Images files should be in an appropriate format. In general jpg files are larger and should only be used for pictures or graphics which make heavy use of gradients. A quality setting of above 80 is almost always overkill (although this may change with retina displays), typically a setting of jpg quality setting of 60-70 is the sweet spot for the quality/size trade-off. Simpler graphics such as logos or screenshots should be either gif or png formats, png is certainly the preferred format now for images of any complexity since it offers very good image quality (the screenshots in this article are in png format). The simplest page elements such as arrows, pointers, lines etc should normally be gif since this format is capable of the smallest image file sizes (note that these elements can be combined into a single larger image using CSS sprites  ).

Static files such as CSS and Javascript files benefit from minification (which is typically removing whitespace and replacing long variable names with shorter ones). There are several open-source minifiers such as the YUI compressor.

 

Perceived Performance

Web pages load sequentially so placing large Javascript files at the top of the page can block the html below it from rendering until the file is fully loaded. Placing  Javascript file at the bottom of the page does not affect the total page load time but allows the Html content to be shown to the user and then the Javascript file to be downloaded. Note that this technique should not be used for CSS files which are generally integral to the page being displayed.

Testing ASP.NET MVC Server Side Validation

I came across an online random number generator recently which provided an input for the number of random numbers to be generated. Since I was looking for large samples I entered 1000000 to see what the response would be. Since such a request would likely cause too much server load, the site developer had protected against this and the site popped up the message:

 

My suspicions were aroused as to how completely this had been implemented. I fired up Fiddler, and modified the Http Request to set the random number to 8500. The site then happily returned the response with 8500 random numbers. Clearly the developers had only implemented this as a Javascript test on the browser and the server itself would accept and process any request. An attack could easily be mounted by sending off several simultaneous requests for a massive number of rows which would likely overload the server and bring the site down. In reality, modifying the Http post request is not even required as disabling Javascript on the browser would circumvent the number test.

 

Framework Protection

Frameworks such as ASP.NET MVC and Ruby on Rails provide client side and server side testing out-of-the-box. In ASP.NET MVC for example, adding a the [Range] data annotation in the model will result in both server side and client side tests being implemented:

public class RandonNumberViewModel    
{        
[Range(1, 2500)] //Only numbers between 1 and 2500 will be accepted 
public int RandomNumbers{ get; set; } 
}

Now when the developer uses this model to generated a strongly typed page it will automatically implement a Javascript test, in addition the model returned by the Http Post will be marked as invalid on the server (for full details on how to implement this see ASP.NET MVC Model Validation ).

When model validation is correctly implemented it provides excellent protection from Http Post modification attacks. However, small code mistakes such as omitting  a check of ModelState.IsValid prior to process a request will leave the application open to attack. Thus it is essential that developers do not blindly trust the framework to fully protect a site from attack, and test the site’s security by simulating an attack prior to going into production.

 

Server Side Validation Testing With Http Request Modification

The first step is to install an Http debugging proxy, the most popular of these on Windows is Fiddler which is free and well supported although the interface can be a little difficult to navigate,  a good alternative is  Charles Proxy which is a paid product but easier to use. I will use Fiddler in this demo.

The default setup in Fiddler shows a list of Web Sessions in the left pane (see below). Typically each ‘web session’ is an Http  Request and corresponding Http Response. Selecting a Web Session, and then clicking the Inspectors tab as below will load the Request in the top right pane and the Response in the bottom right pane.

 

Fiddler provides a wealth of information on the Http traffic but for this demo we are interested in modifying and sending a Post request. To do this first send a post request from your browser by submitting a form, this will then be captured in the Web Sessions pane of Fiddler. Then select the Composer tab on the right of the screen. For this demo I will use the default ASP.NET MVC site that is included in new Visual Studio ASP.NET MVC Internet projects. I will attempt to circumvent the restriction for passwords to be a minimum of six characters.

First I will attempt to register an account with a short password, this yields the below result when the register form is submitted from the website.

This is the client side Javascript preventing me from entering a short password. Now that I know the Javascript prevents an incorrect form submission I will send off a correct form submission with a valid password. This succeeds and I now have a post submission in my Web Sessions listing:

Now, select the Composer tab and then drag the Web Session to the right hand pane (this is just a short cut for  copying the Http Post so we can edit it). Now in the Request Body pane simply modify the Password and Confirm password values to invalid values (ie ‘judo’) as below and click Execute.

This will manually send an Http Post request and there will be a Web Session for the request and response. Select the session and the lower pane will show the response. The WebView tab will show a crude rendering of the Html in the response :

As can be seen the server tested the password and found that it was of insufficient length and did not register an account, and so both the client and server side validation are working correctly.

MVC Security – Model Binding Vulnerability

In my article on Parameter Tampering in ASP.NET MVC I focused on the general techniques of parameter tampering and defenses against it. In this article I will examine a specific type of parameter tampering which is often termed Mass Assignment.

In most MVC web development frameworks (including ASP.NET MVC). Model binding works by assigning an incoming html Form’s values to an object’s properties. This makes for very efficient development but one that contains a critical security flaw.

Take the below Employee model:

public class Employee
{

public int ID {get; set;}
public string FullName {get; set;}
public DateTime DateOfBirth {get; set;}
public bool IsAdmin {get; set;}

}

Note for now that the IsAdmin property is for determining if the Employee has admin rights on the application.

Assuming an employee creates a record themselves in the web app the relevant part of the html form will be as below (note that the id field is automatically set when the record is created in the database and so is not present and the IsAdmin field is not exposed).

<input id="fullname" name="fullname" type="text"  >
<input id="dateofbirth" name="dateofbirth" type="text"  >

When the form is submitted it will produce the below Http Post Request:

Request URL:http://yoursite/register
Request Method:POST
Status Code:200 OK
...
...
fullname:Jude OKelly
dateofbirth:22-04-1972

In an MVC framework like ASP.NET MVC an incoming Http Post/Get request is handled by a Controller’s method. In the scenario of a form being posted to the server, the MVC framework can automatically create an object and populate its properties with the form values:

[HttpPost]
public ActionResult Register(Employee emp)
{

//the emp object is automatically created and assigned the values from the html form:
db.Employees.Add(emp);
db.SaveChanges();

}

The above example uses an ORM such as Entity Framework to save the emp object to the database. So the database will contain the posted values for FullName and DateofBirth. The ID property is automatically assigned and since the IsAdmin value is not provided it is given a default value of False. Thus, in this case the newly created user will not have admin rights on the system.

However, using a tool such as Fiddler it is relatively simple to alter the html Post to include ‘IsAdmin:True’:

Request URL:http://yoursite/register
Request Method:POST
Status Code:200 OK
...
...
fullname:Jude OKelly
dateofbirth:22-04-1972
isadmin:true

Once the ASP.NET MVC model binder receives this http post request it will attempt to match as many properties of the incoming form with the Employee object, since the IsAdmin property is now provided it will be set and saved to the database. Thus the new user will now have full admin rights to the system.

Another way to inject ‘IsAdmin=true’ would be to simply add it to the querystring when posting back the page. ASP.NET MVC (as well as other MVC frameworks) will look for model values in the html form values, the querystring and route values, and cookies.

Solutions

There are several solutions to this vulnerability:

Matching Incoming Parameters

Instead of using a full object in the method’s constructor, just declare the parameters that are expected and manually bind them to the Employee object:

[HttpPost]
public ActionResult Register(string fullName, DateTime dateOfBirth)
{

var emp = new Employee{FullName = fullName, DateOfBirth=dateOfBirth};
db.Employees.Add(emp);
db.SaveChanges();

}

In this case that any additional data such as IsAdmin=True in the form post will be ignored and not make it into the database.

Use A ViewModel

A very common practise in MVC web development is to create custom models for each page (or ‘View’ in MVC parlance). These models can be a composite of different domain models or just a subset of properties of a single domain model.
In this case we would simply declare a ViewModel as below:

public class RegisterEmployeeViewModel
{

public string FullName {get; set;}
public DateTime DateOfBirth {get; set;} 
}

Note this ViewModel only contains the two properties of the Employee object we wish to receive from the html form.

Now in the Controller we will receive this model and use the MVC model binder to automatically create the RegisterEmployeeViewModel object and populate its properties with the values of the html form:


[HttpPost]
public ActionResult Register(RegisterEmployeeViewModel empViewModel)
{

var emp = new Employee{FullName = empViewModel.FullName, 
                       DateOfBirth= empViewModel.DateOfBirth};
db.Employees.Add(emp);
db.SaveChanges();

}

In this case we created a new Employee object and populated is values with the ViewModel object’s properties, in practice various mapping solutions can do this automatically to simplify the process.

Whitelist / Blacklist Parameters To Bind

ASP.NET MVC provides the Bind attribute which can be used to either specifically include or exclude parameters for model binding.

Whitelist parameters to be included in model binding (note only the UserName will be used in binding) :

 

[HttpPost]
[Authorize]
ActionResult UserDetails([Bind(Include = "UserName")] User user)
{ 
	//Update Logic
}

Blacklist parameters to be excluded in model binding (all parameters except IsAdmin will be used in binding) :

 

[HttpPost]
[Authorize]
ActionResult UserDetails([Bind(Exclude = "IsAdmin")] User user)
{ 
	//Update Logic
}

This approach has the weakness that it requires the Bind attribute to be used on every controller action where a mass assignment model binding vulnerability may exist, which is a reliable approach – a new controller action could easily be added in an update which omits the crucial bind attribute. In general this defense should be thought of as an extra layer of security to be used in conjunction with other approaches.

Unfortunately, so many of the early demos of ASP.NET MVC featured this simplified model binding scenario that the security flaw is very common in production sites (or at least that is my experience).

Five Reasons Not To Be On The Cloud

Hosting applications on the ‘Cloud’ was often viewed as a no-brainer, however after a mini counter revolution of companies moving applications off the cloud I thought I would compile a listing of the primary reasons to be cautious before moving your application to the cloud.

Performance

All  major cloud  providers use generic hardware – that is low-spec IDE drives, slow memory and old processors. As well as the low-spec servers you will be sharing system resources with other customers and virtualisation canot fully isolate users, hard drives in particular cannot be dedicated to a single user.

The result of all this is a sluggish performance. Performance issues can of course be addressed by applying several techniques such as utilising Content Delivery Networks, employing a variety of caching techniques and aggressive front-end optimisation, but the issues remains that out-of-the-box a cloud setup will almost certainly be less responsive and performant than a dedicated setup.

Another persistant issue is variable performance. Cloud customers repeatedly report performance ‘black holes’ when system response times sudden dip for several hours for no apparent reason. The cause of this is likely a ‘noisy neighbour’ (ie another cloud user hogging resources before being shutdown) or system maintenance/updates, but it remains a constant frustration as it is impossible to predict when performance will suddenly drop.

Cloud providers  usually quote the CPU in standard units which are not comparable to the cores on CPUs of modern servers.  Amazon Web Services, for example, quotes a ‘Compute Unit’ as equivalent to an “early-2006 1.7 GHz Xeon processor”. Thus even a Large Instance with 4 compute units will be significantly inferior to a 2012 quad-core processor. Adding more cloud instances (ie servers) or increasing the size of the instance   may seem a solution to this, however this will only be a solution if the application can take advantage of running work in multiple threads. Web sites are usually good candidates for splitting work out across multiple threads (as each request for a page can handled by a different  thread). However, more processor intensive work such as statistical analysis require significant development effort to efficiently run on multiple threads.

 

Cost

The issue isn’t the high cost of cloud hosting, it is that it is so difficult to estimate. For starters it is difficult to purchase a server with specs resembling the ‘instances’ offered by cloud providers. As noted above, cloud providers spec their offerings with CPU ‘units’ corresponding to a five or six year-old processor. How to compare these CPU units with what will be offered on a dedicated server today? At a guess I’d say 8 compute units would  be the equivalent a quad-core Intel ‘Ivy Bridge’ Xeon – but it is impossible to know without benchmarking the actual application you plan to run.

A dedicated host server bundled with most of the features needed to get an application live – storage, a bandwidth allocation, IP addresses etc. Cloud providers have an a-la-carte menu of a myriad of options and charges so it is difficult  to estimate the final monthly cost. There are several third-party services such as the Cloud Comparison Tool for estimating cloud costs.

An important cost which is often overlooked is support. For a regular provider of dedicated physical servers support is included in the price (although premium support is additional) , however most cloud providers offer no support and charge for even the basic email support.

A major benefit of the cloud is that testing and development servers can be spun up and down on-demand and so minimise the cost of having servers dedicated to these functions which only uses the server resources for a small portion of the month. In practice, cloud servers often get spun up and then forgotten about, silently running up the cloud bill. Cloud providers offer only the minimal tooling to track and prevent this and so purchasing third party monitoring tools  is often essential.

Scaling

The cloud is often touted as a packaged solution to scaling out applications, in reality most cloud solutions currently provide little in the way of in-built scaling beyond an easy method to spin up servers on-demand. All the tricky scaling issues such as maintaining user state remain as these are part of the application architecture and have little to do with the infrastructure the cloud provides.

On the contrary the cloud can often force developers to handle scaling issues before they would otherwise need to. Both Amazon and Microsoft require that a minimum of two instances to be running before their SLA (Service Level Agreement) becomes valid. In addition virtual severs on clouds are brittle and will regularly be destroyed and automatically rebuilt. The result of this is that an application needs to be designed for running across multiple servers from the moment it lands on the cloud. It could be a benefit to force developers to consider multi-server scaling early but it is worth noting that a decent specified server can cope with an impressive load – a content focused site such as a news/magazine site should be able to comfortably serve 10 million page views a month from a single physical server.

 

Support

Your site is down and it looks like a server issue, so who to call? For a lot of cloud providers there is no-one and you are left to the online forums to troubleshoot issues. Amazon (AWS) acquired a notorious reputation for referring all but the largest customers (who payed a princely $15k per month for support) to their online forums for assistance. The only worthwhile support from AWS starts at 10% of the monthly bill. A notable exception to this is RackSpace who provide 24/7 online (IM/Chat) support at no charge.

Support is also trickier with cloud services. A physical server is isolated from other users and identifying issues is comparatively straightforward. If an application is running on the cloud it can be difficult to distinguish between an issue caused by the underlying network or the application itself.

 

One Size Fits All

Different types of applications have very different hardware requirements. A Content management System which powers a news website would not require a lot CPU resources but would benefit from a large amount of memory (since a massive performance gain can be had from keeping the content in memory as opposed to on disk). By contrast a statistical analysis app would be a heavy user of processing power.

Specifying servers for different user cases is possible when physical servers are purchased/rented as each server can be custom configured with suitable processors and memory. Cloud providers only provide a small set of generic configurations designed, AWS is the only major provider to offer instances targeting both processor and memory intensive applications but it is a limited range with no opportunity for customization.

Cloud  providers built their networks using only generic commodity hardware and so do not offer access to cutting edge hardware. A current issue is the access to SSD drives which have been available on physical severs for several years but is still not available for standard cloud setups. This performance benchmark shows the enormous benefit of hosting a database on an SSD.