Developing Financial Applications using XBRL

 

XBRL (eXtensible Business Reporting Language) is a data format for company financial reporting which can be easily consumed by software applications. The ‘Language’ in the name is actually a misnomer, XBRL is simply a data format as opposed to a computer language capable of performing operations. XBRL is a variant of the XML format which ‘marks-up’ data in a tag based syntax which is human readable. For example, to report Revenues the below code could be used :

<us-gaap:Revenues> 1500000 </us-gaap:Revenues>

This is clearly a simple format and for this reason XML based formats have become popular across a wide range of applications, since Office 2007 both Microsoft Word and Microsoft Excel use XML based formats (.docx and .xlsx respectively).

However, despite the promise of an easily consumable format for financial reporting data, XBRL is deeply flawed and has several major issues.

Issues with XML

XML is very useful for representing basic data structures, however as the complexity of the data structure grows the XML file(s) becomes extremely large and unwieldy. A set of company financial reports is a large and heavily structured set of data and so the XBRL files to represent them is very large. For a single quarterly financial report there will typically be at least six interlinked documents which will in total be about five times the size of the PDF.

XML has never been a performant data format, even for simple data structures, however it becomes incredibly sluggish for heavily structured data. To read and parse and single XBRL document took a dual-processor server between 50 to 80 seconds. There are a multitude of optimisations which could be applied but none are going to be sufficient to enable scaling and compete with database access times which are measured in milliseconds.

Thus, for any serious financial analysis we will need to develop a database schema and first read the XBRL document into the database which will then be the source of all analysis queries.

The XBRL Spec

Many of the elements for the XBRL specification were directly ported from the regulatory reporting requirements. The reporting requirements are sufficient for human analysis of financial reports but perform very poorly when translated into a data format to be consumed by software applications.

When reading financial reports it is very obvious which figures relate to the current reporting period. However to determine this for an XBRL document is not nearly so easy. For any given line item such as Revenue there are numerous Revenue items which relate to various periods and business segments, only one of these elements relates to the firm’s current Revenue for the actual period and determining which is error-prone. For starters, XBRL does not define the start and end dates for the current period and some companies use non-standard dates (for example Apple ends its reporting periods not on the last day of the period but the last business day). Thus to extract the actual reporting elements related to the current period some relatively convoluted (and hence error-prone) code logic is required.

In financial reports a zero balance is often not required to be reported. This works fine when reading reports but for a data format it is not very robust. For example, there is no debt on Apple’s Balance Sheet so the Debt elements are simply not included in the XBRL document. If we read the data into an application and attempt to calculate a metric which uses debt as an input an error will be thrown since the Debt element is simply not found as opposed to being found and having a zero balance. Thus, a zero balance will need to be programatically substituted when no element is found but this is definitely not robust (for example, we would like to know if an element could not be read properly for the XBRL doc and so was excluded for this reason).

A major omission from the XBRL spec is non-financial data which is often crucial for an analyst. For example, Apple includes unit sales (eg number of iPhone sales) in its reports but not in it XBRL filings.

XBRL Taxonomies

The above issues pale into insignificance when compared with the minefield of XBRL taxonomies.

The XBRL taxonomy is basically a template for the tagging and structuring various items which comprise a financial report.

XBRL allows firms a lot of latitude in selecting and even creating taxonomies, as well as changing taxonomies between reporting periods. Taking Apple again as an example, up until 2010 it reported its Fixed Assets using the XBRL tag aaplPropertyPlantAndEquipmentAndCapitalizedSoftwareNet , from 2011 it changed to using us-gaapPropertyPlantAndEquipmentGross. There was no notification of the change or mapping information so software parsing the two documents would not be able to associate the items as being the same balance for different periods.
The differences between XBRL documents of different companies is even greater, making financial comparisons between companies almost impossible.

The above issues (and especially the taxonomy issue) means that the manual intervention is almost always necessary original promise of XBRL to allow fully automated data analysis still unfulfilled.

Protecting Against SQL Injection In .NET Applications

Despite being so well understood, SQL Injection remains one of the most common vulnerabilities in web applications.

What is SQL Injection

Any SQL which is dynamically created has the potential for having malicious SQL injected into it. For example, the below code receives a querystring and adds it to a SQL select string which will then be executed against the database.

//Querystring for example could be ?userid=1
var inputStr = Request.QueryString("userid"); 

var sqlQuery = "select createDate from users where userid = '" + inputStr  + "'";
//sqlQuery is 'select createDate from users where userid = '1' 

All the attacker has to do is append sql to the querystring for it to execute. Thus adding 1; delete from users;-- will cause the sql statement to be select createDate from users where userid = '1'; delete from users;-- and two statements will be executed by the database. The first statement returns the createDate for the ‘jude’ user, the malicious second statement deletes all records from the database. Note that no statements added to the sqlQuery will be executed since they are commented out using the — characters.

There are a multitude of variations on these basic scenarios, as well as executing operations on the database, attackers can retrieve sensitive data by displaying it on the site which does not require execute permissions on the database. Take for example a search box on a site which places the search term in the querystring which is then used to form the SQL query as below :

var searchTerm = Request.QueryString("searchterm");
var sqlQuery = "select * from articles where article.title like '%" + "%'";

The sql is then executed against the database and the search results are output to the webpage.
A malicious user can inject sql into the search term and return a listing of table names in the database using the below ‘search’ term:

' union select name from sysobjects where type = 'u' --

Once armed with the table name listing, the attacker can target a table to return the column names using the below term:

' union select * from information_schema.columns where table name = 'users' --

Then it is just a straightforward select statement to return the sensitive data:

' union select socialsecuritynumber from users --

There are a large number of sources of SQL Injection, as noted above querystrings and form values are common sources of injection, but cookies and Http Headers are also potential sources. Http header values in particular are commonly stored in the database, for example logging the referring url is common practice on many sites:

var referringUrl = ServerVariables("HTTP_REFERER")
var insertReferralSql = "insert into trafficreferrals   values ('" + referringUrl + "')"

The Http Post can easily be tampered with using a tool such as Fiddler and malicious code entered in the Referrer field in the header. Note in this particular instance, the .NET framework provides the Request.UrlReferrer property which will pass back a valid Uri and should be clear from injected code.

Problematic Fix – Manually Cleaning Strings

Before looking at robust defenses against SQL injection it is worth looking at ad-hoc protections which are often ineffective and should not be used.

Writing custom code to ‘clean’ the input string of dangerous characters is effective in theory but needs to be done with extreme rigour and many solutions can very easily be bypassed.

For example, the potentially hazardous ‘ and – – characters can be removed as below :

var inputStr = Request.QueryString("userid"); 
var cleanStr = inputStr.Replace("--", "") 
                       .Replace("'", "");

This may also be performed on SQL Server using the TSQL REPLACE function:

...
set @input = REPLACE(@input, '--', '')
select @input
set @input = REPLACE(@input, '--', '')
select @input
...

In this example the cleanStr may be consider safe of the ‘ and – – sequences. However this is not the case. Consider the below input string:

; delete from orders -'-

Note the -‘- sequence, in this case the ‘ character will be removed after the – – characters are tested for and the — character sequence will then be created leaving the below string:

; delete from orders -'-

The ‘ and – characters are not always necessary for attacks. Consider the first example we looked at

var inputStr = Request.QueryString("userid"); 
var sqlQuery = "select createDate from users where userid = " + inputStr 

In this case the below SQL could be injected without the ‘ and – characters :

19; delete from users;

If the database has SET QUOTED-IDENTIFIER OFF then attacker could simply replace the ‘ character with ” .

Defenses Against SQL Injection

Ensure Correct DataTypes Are Input
In many of the above scenarios an integer was expected and the attacker passed in an SQL string instead.
In the application we can test the input is in the correct format :

int orderId = int.Parse(Request.QueryString["OrderId"]);

//The below code can be problematic when users have different date formats
//Consider also DateTime.TryParse
DateTime orderDate = DateTime.Parse(Request.QueryString["orderDate"]);

This ensures that strings are not input when integers, date formats and other datatypes are expected. This step is sometimes seen as unnecessary overhead when parameterised queries are used but it is does provide additional protection.

Parameterised Queries
Parameterised queries are a primary defense against SQL injection. The below stored procedure uses a parameter on the where clause which will prevent malicious code being injected.

CREATE PROCEDURE GetUserJoinDate 
    @UserName nvarchar(50) 
AS 
    SELECT JoinDate
    FROM Users
    WHERE UserName = @UserName 
GO

Parameters are not always effective however, and building dynamic SQL using string concatenation can introduce vulnerabilities. The below stored procedure performs the same operation by using dynamic SQL.

CREATE PROCEDURE GetUserJoinDate(@UserName nvarchar(45))  

 AS
  DECLARE @sql nvarchar(255)
  SET @sql = 'SELECT JoinDate FROM Users WHERE UserName = ' + @UserName + '
  EXECUTE sp_executesql @sql
GO

In this case the below SQL passed in as the UserName parameter will execute malicious SQL on the database:

'; delete from users --

This is a very simplistic sample and is an unusual use of dynamic SQL, dynamic SQL is more commonly used in other scenarios where the developer may believe it is the only option. For example SQL is often passed in to build such as sorting where parameters

CREATE PROCEDURE GetUsers 
    @Sort nvarchar(50) 
AS
  DECLARE @sql nvarchar(255)
  SET @sql = 'SELECT UserName FROM Users   ' + @Sort 
  EXECUTE sp_executesql @sql
GO

This allows partial sql to be passed in for sorting:

exec @Sort = 'order by UserName ASC'

This stored procedure could be purged of dynamic SQL and written as below:

CREATE PROCEDURE GetUsers 
    @Sort int = 1 
AS
   SELECT UserName FROM Users
   ORDER BY CASE
     WHEN @Sort = 1 THEN ( Rank() OVER (ORDER BY UserName ASC) )
     WHEN @Sort = 2 THEN ( Rank() OVER (ORDER BY UserName DESC) )
     WHEN @Sort = 3 THEN ( Rank() OVER (ORDER BY CreateDate ASC) )
GO

There are numerous scenarios such as this where dynamic SQL can be removed. If dymanic SQL is absolutely necessary then string concatenation should be avoided and parameters placed within the SQL which will will ensure that the parameter is properly escaped:

CREATE PROCEDURE GetUserJoinDate 
  @UserName nvarchar(45)
AS
  DECLARE @sql nvarchar(255)
  SET @sql = N'SELECT JoinDate FROM Users WHERE UserName=@UserName'
  EXECUTE sp_executesql @sql
GO

These examples have focused on stored procedures on the database, but this applies equally to SQL created in the application :

//Vulnerable
SqlConnection conn = new SqlConnection(connectionString);
SqlCommand command = new SqlCommand("SELECT JoinDate FROM Users 
                                     WHERE UserName='" + userName + "'", conn); 

//Secure SqlConnection conn = new SqlConnection(connectionString); SqlCommand command = new SqlCommand("SELECT JoinDate FROM Users WHERE UserName=@UserName", conn); command.Parameters.Add(new SqlParameter("UserName", userName);

Permissions

The database is only vulnerable to SQL injection to the extent of the user’s permissions on the SQL Server database. It is thus essential to audit and limit the permissions on the database, a detailed discussed of permissions is beyond the scope of this article but a good starting point for auditing SQL Server permissions is Auditing SQL Server Permissions

IIS Global Filtering
As an additional defense for querystring injection, IIS can filter requests for dangerous keywords. This can be done globally using the applicationhost.config (located at %systemroot%system32inetsrvconfigapplicationhost.config ). Under the tag add your rules as below:

 <filteringRules> 
    <filteringRule name="SQLInjection" scanQueryString="true"> 
        <appliesTo> 
            <add fileExtension=".asp" /> 
            <add fileExtension=".aspx" /> 
        </appliesTo> 
        <denyStrings> 
            <add string="--" /> 
            <add string=";" /> 
            <add string="/*" /> 
            <add string="@" /> 
            <add string="char" /> 
            <add string="alter" /> 
            <add string="begin" /> 
            <add string="cast" /> 
            <add string="create" /> 
            <add string="cursor" /> 
            <add string="declare" /> 
            <add string="delete" /> 
            <add string="drop" /> 
            <add string="end" /> 
            <add string="exec" /> 
            <add string="fetch" /> 
            <add string="insert" /> 
            <add string="kill" /> 
            <add string="open" /> 
            <add string="select" /> 
            <add string="sys" /> 
            <add string="table" /> 
            <add string="update" /> 
        </denyStrings> 
    </filteringRule> 
</filteringRules>

/* Courtesy Wade Hilmo */

This will deny requests with these keywords in the querystring.

Leave ValidateRequest On
ValidateRequest is inbuilt in ASP.NET and throws an error whenever a page is submitted with potentially malicious content. However, ValidateRequest is very aggressive and out of frustration developers frequently resort to turning it off in across the entire application. Instead, it can be selectively disabled for controls, pages in ASP.NET and for controllers in ASP.NET MVC (using [ValidateInput(false)] ). This gives more granular control over where there are potential exposures to SQL injection.

Consider Using an ORM
Object Relation Mappers such as Entity Framework or Nhibernate take control of the SQL generation and execution and generally do an excellent job of protecting against SQL Injection.

Primer on Selecting Data Using Entity Framework

This question about selecting data using Entity Framework on StackOverflow got me thinking that a lot of LINQ to Entities code samples show very little consistency in how data is selected. Find(), Single(), SingleOrDefault(), First(), and FirstOrDefault() are often used interchangeably.

The first issue to address is using Where() in the data selection statement. Where() should only be used when several entities are required. There is no advantage in using Where() as below :

var db = new MyDBContext();
var employee = db.Employees.Where(x => x.ID == 1).Single();

This can be written as :

var employee = db.Employees.Single(x => x.ID == 1);

Note that there is a fundamental difference between using Where() in isolation and appending a standard LINQ operator such as Single() or ToList(). Where() and OrderBy() will return IQueryable and IOrderedQueryable objects respectively which contain core SQL Statements to be executed against the database but have not yet requested or received data from the database.
Appending Single(), First(), ToList() or similar will subsequently execute the query:

var emps = db.Where(x => x.Salary > 100000); // emps is an IQueryable() object, no database interaction yet
//.....
var highSalaryEmps = emps.ToList() //database query executed and an IEnumerable() returned as highSalaryEmps

This can have both performance and data integrity implications. In the above example, data is only read from the database when ToList() is called, intervening update/add operations will be reflected in the highSalaryEmps object which may not be the intended outcome.
In terms of performance, the primary issue to be aware of is using filtering operations after executing a large database query. For example :

var emps = db.Employees.ToList();
var highestSalaryEmp = emps.OrderBy(x => x.Salary).Single();

This will first return all employees from the database and then performance an order and selection. Far better to just query the database using these criteria :

var highestSalaryEmp = db.Employees.OrderBy(x => x.Salary).Single();

Find()

Find() is set apart from other element operations in LINQ-To-Entities in that it will first query the in-memory entities that EF is tracking and only hit the database in the event that none is found. Thus it can have superior performance and should always be used where possible – especially in lengthy operations when the entity is likely to already be in memory. Find() is unfortunately not very flexible and can only be used to look up an entity by its Primary Key :

var emp = db.Employees.Find(empId);

Note that you can use the Local property on a DbSet to access the in-memory data if you require more advanced logic :

var emp = db.Employees.Local.SingleOrDefault(x => x.Name = "Jude")
        ?? db.Employees.SingleOrDefault(x => x.Name = "Jude");

Note the use of the null coalescor  (??) above which tests if the first statement evaluates to null and if so then proceeds to execute the second statement. Thus EF will first test if there is a match in-memory and only if no match is found will then hit the database.

Single or First

A lot of EF code samples contain First() when it is clearly not optimal. First() is often used interchangeably with Single() although they are quite different. Single mandates that one and only one entity can be returned, if two matching records are found in the database an error will be thrown. First() is only concerned with returning the first record and performs no check to determine if there are multiple matching records.
Thus Single performs a data integrity test. In many scenarios there should only but one matching record in the database, and so Single will ensure this – although this will necessitate addition error trapping logic since Single will throw errors is more than one match is found. Under the covers, Single does this by using ‘SELECT TOP (2)’ for its queries and then EF inspects the returned records and throws an error if two records are returned. First by contrast simply queries using  ‘SELECT TOP (1)’.

As a side note, using .Take(1) instead of .First() has no difference in the SQL used to query the database, but Take() returns an IQueryable object and not an entity as with First() so the above discussed issues will be relevant.

OrDefault()

SingleOrDefault() or FirstOrDefault() should only be used when it is not known if the entity has already been persisted to the database. Thus SingleOrDefault() will query the database for the entity and return null if no entities are found (note that an error will still be thrown if more than one match is found). A typical pattern is querying and returning an entity if it exists or creating and adding it if there is no matching entity. In such a scenario the ?? null coalescor introduced above can be very useful.
Take a scenario where users can create Tags for Articles, if the Tag exists it shouldnt be returned as an entity but it needs to be created if it does not yet exist:

var tag = db.Tags.SingleOrDefault(x => x.TagName ="EntityFramework") ??
new Tag { TagName ="EntityFramework", CreateDate = DateTime.UtcNow };

article.Tag = tag;
db.SaveChanges(); //Note the new tag will not be persisted to the database until SaveChanges is called.

 

 

Preventing Parameter Tampering in ASP.NET MVC

Never trust data submitted from a web page is a core security concept for web development. When using an MVC framework, this statement takes on added relevance. MVC frameworks rely heavily on binding querystrings, route values and form values to in-code objects.

Take for example the scenario where an authenticated user needs to update their profile. A common pattern in MVC would be to first load the the user in a controller and pass the user object to a View which renders the html to be sent to the browser:

[Authorize]
ActionResult UserDetails(int userId)
{
   var user = respository.GetUser(userId); 
   return View(user);
}

Another controller will then receive the Http Post from the update page and process the update:

[HttpPost]
[Authorize]
ActionResult UserDetails(User user)
{
   respository.UpdateUser(user);
   return View(user);
}

This works seamlessly, and developers are often comforted by the [Authorize] attribute which mandates that the user must first have logged in to the service and been authenticated to access the update page.

The vulnerability here is that when the page is sent to the user the html will contain the userid (typically in a hidden field), this forms part of the Http Post sent back to the server and is used as the primary key to determine which record to update. Using a tool such as Fiddler it is trivial to change this userid to any value and hence update any user record (attackers will often change a userid to 1 which is usually an admin account).

Accepting Data From a Trusted Source

The first defense against parameter tampering to only accept data from a trusted source where possible. In the above scenario, this could mean only accepting the data from the ASP.NET Membership provider:

[HttpPost]
[Authorize]
ActionResult UserDetails(User user)
   {
	 //Test is submitted user.Name is the same as the current logged in user
     if(user.Name == User.Identity.Name)
		{
   			respository.UpdateUser(user);
   			return View(user); 
		}
		else
		{
			//error
		}
     }

Encrypting and Decrypting Key Values

A more generalised solution is to pass down an encrypted value to the webpage and then decrypt and test this against the key value on the Http Post. Take for example an orderId which is used to update and order details (such as the address the order is to be sent to).

First, in the controller we can create an encrypted value of the key orderId field using the MachineKey API:

ActionResult OrderDetails(int orderId)
{
   var order = respository.GetOrder(orderId); 

     var orderidBytes = Encoding.UTF8.GetBytes(orderId);
     var encryptedOrderId =  MachineKey.Encode(plaintextBytes, MachineKeyProtection.All);
     ViewBag.OrderIdEncrypt = encryptedOrderId;


   return View(order);
}

In the Razor view, simply add the ViewBag.OrderIdEncrypt in a hidden field within the form:

...
@Html.Hidden("OrderIdEncrypt", ViewBag.OrderIdEncrypt)
...

Now when the form is posted back there will be a OrderIdEncrypt value on the form. This can then be decrypted with the MachineKey API and compared to the submitted value of OrderId:

[HttpPost]
[Authorize]
ActionResult OrderDetails(Order order, string orderIdEncrypt)
{
   var decryptedOrderIdBytes = MachineKey.Decode(encryptedValue, MachineKeyProtection.All);
   var unEncryptedOrderId =  Encoding.UTF8.GetString(decryptedBytes);
    
   if(unEncryptedOrderId == order.Id)
    { 
       respository.UpdateOrder(order);
       return View(order);
    }
   else
    {
       //test failed - send error message
    }

}

This solution could of course be cleaned up a lot – an enhancement would be firstly to have an computed field on the Order model which returns an encrypted value of the orderId and a helper method IsOrderIdValid which would compare an encrypted orderId value to a plain text orderId. This would clean up the solution and minimise the amount of logic in the controller.

In addition to tampering with existing values in a form, MVC frameworks are also vulnerable to adding additional parameters which are then bound to objects. This is commonly referred to as a Mass Assignment vulnerability and will be dealt with in my next article article.

Security – Simulating And Protecting Against A DoS Attack

On a recent project, I was created a web service which parsed a set of financial statements into name/value pairs from an XBRL document. The complexity of the XBRL   specification means that parsing an XBRL document takes approximately 90 seconds on mid-spec server. To avoid users having to wait for the 90 seconds for the data, the parsed data was saved to a database. However, since the service covers over 5000 companies it would need to store approximately 100,000 sets of financial statements which places a strain on database resources as each set of statements contains several thousand data points. The obvious solution is to only store the most commonly requested financial statements in the database and load the others on-demand with a message to users that the financial statements were being loaded. However this inoccuous design decision creates a major security vulnerability.

Slow Loading Pages = Security Vulnerability
One of the hardest attacks to defend against is a Denial of Service (DoS) attack which is simply a repeated set of requests for a server resource such as a web page. The repeated requests generate an excess of load on the server which is then rendered incapable of serving other legitimate site users and hence the web site is taken down.
The difficulty in launching a DoS attack is that most web pages involve very little CPU processing and hence the flood of requests necessary to bring the site down would be extremely large and hard to generate. Attackers therefore will target slow loading pages which may be consuming a lot of server resources.

Simulating a DoS Attack
To test a web application’s vulernability to DoS attacks we can use a tool to issue multiple simultaneous requests. For this demo I will use  LOIC (Low Orbit Ion Cannon) with the source code available at https://github.com/NewEraCracker/LOIC.
LOIC comes with a simple Windows application front-end, although you could just use the source code in the XXPFlooder.cs file to programatically launch the attack (note the simplicity of the code for running DoS attacks – the XXPFlooder.cs file is a mere 89 lines).
Using LOIC Windows app I configured it to make Http requests to the /loadaccounts?id=1002 page on my development server:

This quickly overwhelmed the development server, bringing the CPU utilization close to 100%:

Solution
DoS attacks are one of the most challenging security issues as there are numerous different ways of launching a DoS attack. Typically the defense will be at the network or server level, but in the case of a resource intensive page there are several potential application level optimisations. Firstly, the resource intensive page can be configured as a separate application with a dedicated application pool. A attack will therefore be isolated to the specific page (or area in the website) and leave the rest of the site unaffected.
Next, the page itself can be protected by using a short term in-memory cache. After executing the the resource intensive operation the resulting object can be placed in memory for a short period and subsequent requests will be served from the in-memory cache. A basic outline of the code is below:

FinancialStatements financialStatements;

//First test if there is an an object in the cache
if(Cache["FinancialStatements" + id] == null)
{
//If there is no object in the cache, create it an then load it into the cache
financialStatements = LoadFinancialStatements(id); //This is the resouce intensive operation
CacheItemPolicy policy = new CacheItemPolicy();
policy.AbsoluteExpiration = DateTime.Now + TimeSpan.FromMinutes(1);

Cache.Add(new CacheItem("FinancialStatements" + id, financialStatements), policy);
}
else
{
//If the object is already in the memory, simply load it from the
//cache which is a quick low impact operation.
financialStatements = Cache["FinancialStatements" + id];
}

Thus the first request will run resource intensive operation, but subsequent requests made within 60 seconds will simply be loaded from the memory.

Now running the LOIC attack again results in the below CPU resouce utilisation:

Note that CPU now only reaches a manageable 31% and only one of the cores is under significant load.

A Simple Visual Random Number Test

Random numbers are very difficult to test since there are so many ways a series of numbers can exhibit non-random properties. Correlation between consecutive number in a random sequence and an uneven distribution of numbers across the entire sequence are just two important tests.

I was writing a simple random routine in C# and made the mistake of placing the declaration of the Random object inside the loop:

private ListRandNumbers()
{
var numbers = new List();

for (int i = 0; i < 100000; i++)
{
var rnd = new Random();
numbers.Add(rnd.Next(1, 1000));
}

return numbers;
}

In .NET this will produce a very poor series of random numbers since by default the Random object is automatically seeded with a DateTime and so if it is placed within a loop it will be created and seeded multiple times with identical values. Developers normally write tests to check an application’s data quality , but writing a battery of statistical tests to check a simple random number routine is working correctly is overkill.

A neat solution to this is to use a simple visual inspector with plots the values on a two dimensional surface. The below ASP.NET MVC controller would output the a bitmap of the above random number routine.

public ActionResult RandomNumberImage()
{
var path = @"c:usersyourfilepathfilename.bmp";
Bitmap bm = new Bitmap(1000, 1000);
var randNumbers = RandNumbers();
var randNumberCount = randNumbers.Count();

for (int i = 0; i {
bm.SetPixel(randNumbers[i], randNumbers[i+1], Color.Black);
}

bm.Save(path, System.Drawing.Imaging.ImageFormat.Bmp);
return File(path, "image/bmp");
}

This outputs the below visualisation:

Definitely not a random data set! Now we can correct the mistake and instantiate the Random object outside the loop:

private ListRandNumbers()
{
var numbers = new List();
var rnd = new Random();

for (int i = 0; i < 100000; i++)
{
numbers.Add(rnd.Next(1, 1000));
}
return numbers;
}

Now lets look at the output of the correctly implemented routine:

This appears much better. In addition to a broader scatter, note that there appears to be more data points, this is because in the previous routine numbers were often identical and hence plotted on top of eachother.

Visualisation can also help surface more subtle issues in random number generation. The below image is from a data set generated using the PHP random number generator on a Windows machine:


Source : random.org

The relationships between these numbers may be quite difficult to discern using statistical tests but for the above image there are clearly some relationships within the data set and hence it would not be random.

Note that this is not a robust statistical test, the inbuilt random number generation in .NET is generally considered to perform poorly (as is the case with Excel random numbers)

Testing ASP.NET MVC Server Side Validation

I came across an online random number generator recently which provided an input for the number of random numbers to be generated. Since I was looking for large samples I entered 1000000 to see what the response would be. Since such a request would likely cause too much server load, the site developer had protected against this and the site popped up the message:

 

My suspicions were aroused as to how completely this had been implemented. I fired up Fiddler, and modified the Http Request to set the random number to 8500. The site then happily returned the response with 8500 random numbers. Clearly the developers had only implemented this as a Javascript test on the browser and the server itself would accept and process any request. An attack could easily be mounted by sending off several simultaneous requests for a massive number of rows which would likely overload the server and bring the site down. In reality, modifying the Http post request is not even required as disabling Javascript on the browser would circumvent the number test.

 

Framework Protection

Frameworks such as ASP.NET MVC and Ruby on Rails provide client side and server side testing out-of-the-box. In ASP.NET MVC for example, adding a the [Range] data annotation in the model will result in both server side and client side tests being implemented:

public class RandonNumberViewModel    
{        
[Range(1, 2500)] //Only numbers between 1 and 2500 will be accepted 
public int RandomNumbers{ get; set; } 
}

Now when the developer uses this model to generated a strongly typed page it will automatically implement a Javascript test, in addition the model returned by the Http Post will be marked as invalid on the server (for full details on how to implement this see ASP.NET MVC Model Validation ).

When model validation is correctly implemented it provides excellent protection from Http Post modification attacks. However, small code mistakes such as omitting  a check of ModelState.IsValid prior to process a request will leave the application open to attack. Thus it is essential that developers do not blindly trust the framework to fully protect a site from attack, and test the site’s security by simulating an attack prior to going into production.

 

Server Side Validation Testing With Http Request Modification

The first step is to install an Http debugging proxy, the most popular of these on Windows is Fiddler which is free and well supported although the interface can be a little difficult to navigate,  a good alternative is  Charles Proxy which is a paid product but easier to use. I will use Fiddler in this demo.

The default setup in Fiddler shows a list of Web Sessions in the left pane (see below). Typically each ‘web session’ is an Http  Request and corresponding Http Response. Selecting a Web Session, and then clicking the Inspectors tab as below will load the Request in the top right pane and the Response in the bottom right pane.

 

Fiddler provides a wealth of information on the Http traffic but for this demo we are interested in modifying and sending a Post request. To do this first send a post request from your browser by submitting a form, this will then be captured in the Web Sessions pane of Fiddler. Then select the Composer tab on the right of the screen. For this demo I will use the default ASP.NET MVC site that is included in new Visual Studio ASP.NET MVC Internet projects. I will attempt to circumvent the restriction for passwords to be a minimum of six characters.

First I will attempt to register an account with a short password, this yields the below result when the register form is submitted from the website.

This is the client side Javascript preventing me from entering a short password. Now that I know the Javascript prevents an incorrect form submission I will send off a correct form submission with a valid password. This succeeds and I now have a post submission in my Web Sessions listing:

Now, select the Composer tab and then drag the Web Session to the right hand pane (this is just a short cut for  copying the Http Post so we can edit it). Now in the Request Body pane simply modify the Password and Confirm password values to invalid values (ie ‘judo’) as below and click Execute.

This will manually send an Http Post request and there will be a Web Session for the request and response. Select the session and the lower pane will show the response. The WebView tab will show a crude rendering of the Html in the response :

As can be seen the server tested the password and found that it was of insufficient length and did not register an account, and so both the client and server side validation are working correctly.

MVC Security – Model Binding Vulnerability

In my article on Parameter Tampering in ASP.NET MVC I focused on the general techniques of parameter tampering and defenses against it. In this article I will examine a specific type of parameter tampering which is often termed Mass Assignment.

In most MVC web development frameworks (including ASP.NET MVC). Model binding works by assigning an incoming html Form’s values to an object’s properties. This makes for very efficient development but one that contains a critical security flaw.

Take the below Employee model:

public class Employee
{

public int ID {get; set;}
public string FullName {get; set;}
public DateTime DateOfBirth {get; set;}
public bool IsAdmin {get; set;}

}

Note for now that the IsAdmin property is for determining if the Employee has admin rights on the application.

Assuming an employee creates a record themselves in the web app the relevant part of the html form will be as below (note that the id field is automatically set when the record is created in the database and so is not present and the IsAdmin field is not exposed).

<input id="fullname" name="fullname" type="text"  >
<input id="dateofbirth" name="dateofbirth" type="text"  >

When the form is submitted it will produce the below Http Post Request:

Request URL:http://yoursite/register
Request Method:POST
Status Code:200 OK
...
...
fullname:Jude OKelly
dateofbirth:22-04-1972

In an MVC framework like ASP.NET MVC an incoming Http Post/Get request is handled by a Controller’s method. In the scenario of a form being posted to the server, the MVC framework can automatically create an object and populate its properties with the form values:

[HttpPost]
public ActionResult Register(Employee emp)
{

//the emp object is automatically created and assigned the values from the html form:
db.Employees.Add(emp);
db.SaveChanges();

}

The above example uses an ORM such as Entity Framework to save the emp object to the database. So the database will contain the posted values for FullName and DateofBirth. The ID property is automatically assigned and since the IsAdmin value is not provided it is given a default value of False. Thus, in this case the newly created user will not have admin rights on the system.

However, using a tool such as Fiddler it is relatively simple to alter the html Post to include ‘IsAdmin:True’:

Request URL:http://yoursite/register
Request Method:POST
Status Code:200 OK
...
...
fullname:Jude OKelly
dateofbirth:22-04-1972
isadmin:true

Once the ASP.NET MVC model binder receives this http post request it will attempt to match as many properties of the incoming form with the Employee object, since the IsAdmin property is now provided it will be set and saved to the database. Thus the new user will now have full admin rights to the system.

Another way to inject ‘IsAdmin=true’ would be to simply add it to the querystring when posting back the page. ASP.NET MVC (as well as other MVC frameworks) will look for model values in the html form values, the querystring and route values, and cookies.

Solutions

There are several solutions to this vulnerability:

Matching Incoming Parameters

Instead of using a full object in the method’s constructor, just declare the parameters that are expected and manually bind them to the Employee object:

[HttpPost]
public ActionResult Register(string fullName, DateTime dateOfBirth)
{

var emp = new Employee{FullName = fullName, DateOfBirth=dateOfBirth};
db.Employees.Add(emp);
db.SaveChanges();

}

In this case that any additional data such as IsAdmin=True in the form post will be ignored and not make it into the database.

Use A ViewModel

A very common practise in MVC web development is to create custom models for each page (or ‘View’ in MVC parlance). These models can be a composite of different domain models or just a subset of properties of a single domain model.
In this case we would simply declare a ViewModel as below:

public class RegisterEmployeeViewModel
{

public string FullName {get; set;}
public DateTime DateOfBirth {get; set;} 
}

Note this ViewModel only contains the two properties of the Employee object we wish to receive from the html form.

Now in the Controller we will receive this model and use the MVC model binder to automatically create the RegisterEmployeeViewModel object and populate its properties with the values of the html form:


[HttpPost]
public ActionResult Register(RegisterEmployeeViewModel empViewModel)
{

var emp = new Employee{FullName = empViewModel.FullName, 
                       DateOfBirth= empViewModel.DateOfBirth};
db.Employees.Add(emp);
db.SaveChanges();

}

In this case we created a new Employee object and populated is values with the ViewModel object’s properties, in practice various mapping solutions can do this automatically to simplify the process.

Whitelist / Blacklist Parameters To Bind

ASP.NET MVC provides the Bind attribute which can be used to either specifically include or exclude parameters for model binding.

Whitelist parameters to be included in model binding (note only the UserName will be used in binding) :

 

[HttpPost]
[Authorize]
ActionResult UserDetails([Bind(Include = "UserName")] User user)
{ 
	//Update Logic
}

Blacklist parameters to be excluded in model binding (all parameters except IsAdmin will be used in binding) :

 

[HttpPost]
[Authorize]
ActionResult UserDetails([Bind(Exclude = "IsAdmin")] User user)
{ 
	//Update Logic
}

This approach has the weakness that it requires the Bind attribute to be used on every controller action where a mass assignment model binding vulnerability may exist, which is a reliable approach – a new controller action could easily be added in an update which omits the crucial bind attribute. In general this defense should be thought of as an extra layer of security to be used in conjunction with other approaches.

Unfortunately, so many of the early demos of ASP.NET MVC featured this simplified model binding scenario that the security flaw is very common in production sites (or at least that is my experience).

Entity Framework Performance Optimization

Query Tuning

Minimise the Data Requested

Minimising the data queried is a fundamental tenet of performance tuning for any method of data access. In querying a database using SQL, the below query should almost never make it into production :

Select * From [Table]

Instead, the query should specify the fields to be returned from the table. This is extremely familiar to any database developer or DBA. However this principal is often ignored when developing with Entity Framework.

In a recent project I was working on an alerting system which would send an email alert when a company released an SEC filing. The Alert entity had a related Company entity:

public class Alert
{
public virtual int ID { get; set; } 
public virtual DateTime CreateDate { get; set; } public virtual Company Company { get; set; } //......more properties } public class Company { [Key] public virtual string Ticker { get; set; } public virtual string Name { get; set; } //......more properties }

The UI needed to display the CreateDate of an Alert and the Name of the Company. How should this be written in LINQ-to-Entities?
Often, the related entity (Company in this case) would be requested using Include() as below:

var testQuery = db.Alerts   // db is the instance of the DbContext
.Where(x => x.ID == 1)
.Include(y => y.Company);

This is a common pattern, however, it is actually the equivalent of loading the entire rows in both the Company and Alert tables in the database. Look at the SQL this generates:

That is a lot of data requested when all that was needed was the CreateDate and Company Name. One way to solve this is by using an anonymous type which only selects the data requested :

var testQuery = db.Alerts.Where(x => x.ID == 1)
.Select(y => new {
CreateDate = y.CreateDate,
Company = y.Company.Name
});

This would generate the below SQL which is a vast improvement.

Anonymous types are not always that convenient since they cannot be used outside the scope of the method they are declared in. There are several alternatives such as creating public classes with only the properties which are required (these public classes would in some ways be analogous to Views in SQL Server since they would be a light-weight composite of the properties/fields of other entities).

Do Not Count After Query Execution

To perform a simple count, ensure that you include Count() in the LINQ statement (which will only generate a SQL COUNT query) and don’t return an IEnumberable and then execute a Count() on it which returns all the entities and then performs an in memory count. There are several varieties of this issue:

var empCount = db.Employees
.Where(x => x.CreateDate < DateTime.Now.AddDays(-10)
.Count() //only executes a SQL COUNT
var emps = db.Employees
.Where(x => x.CreateDate < DateTime.Now.AddDays(-10)
var empCount = emps.Count() // returns all matching employees and then executes a count

Filter Before ToList()

Another issue is executing ToList() prior to any filtering. ToList() will instantly trigger a database query for all LINQ statements prior to it being hit, subsequent LINQ filter statements such as Where() Take() Skip() etc will then be executed on the in-memory List:

//ToList() executed too early
var emps = db.Employees
.Where(x => x.CreateDate < DateTime.Now.AddDays(-10)
.ToList() // All employees created within last 10 days queried
.Take(50) // List is filtered for the top 50
//ToList() executed last
var emps = db.Employees
.Where(x => x.CreateDate < DateTime.Now.AddDays(-10)
.Take(50)
.ToList() //Query made for top 50 employees created within last 10 days

Don’t Load Entities To Delete Them

A common problem is loading an entity which needs to be deleted. The DbContext Remove() method requires an entity to be passed in, so if only provided with a primary key the below code would effect a delete on the database:

var emp = db.Employees.Find(empID); //Executes a query against the database
db.Employees.Remove(emp);
db.SaveChanges(); //Executes a second hit to the database

Instead of this, create a temporary entity and use Attach() to attach it to the context and then call Remove (it is not relevant that the entity will not have all its properties loaded – all that is needed for the Remove() method is the primary key):

var emp = new Employee { ID = empID }
db.Employees.Attach(emp);
db.Employees.Remove(emp);
db.SaveChanges(); //Executes a single hit to the database

Overall, the primary issue to bear in mind is that when an entity is requested all the properties on that entity are queried and returned unless the query states otherwise.

 

Profile Generated SQL

A problem in using Entity Framework (especially versions 4.1 and later) is that it encourages developers to focus solely on code and ignore the data access. This can result in an over-reliance on EF to handle all data access and SQL generation. It is often difficult to determine the efficiency of an operation simply by looking at the code, and so bloated queries due to poor LINQ structure as noted above or from EF’s SQL generation can arise. It is thus essential to look at the actual SQL generated by EF.

The simplest way to profile the SQL is to call the ToString() method on an IQueryable<> object. In the previous examples I generated the SQL by using:

var  sql = testQuery.ToString(); //Note this is only in EF 4.1 +

Beyond that, most SQL Server DBAs/developers should be familiar with SQL Profiler which would provide more indepth analytics.

Julie Lerman wrote a good roundup of EF profiling methods in Profiling Database Activity in EF (note that this was written prior to the release of EF 4.1 so the coding examples use the ObjectContext and not the DbContext which has some simpler methods such as calling ToString(0 directly on an IQueryable).

 

Working With Change Tracking

EF works in the background to track the changes of any in-memory entities. There are two methods by which it performs this function. Snapshot Change Tracking, keeps an in-memory snapshot of the entity when it originally sees it and compares it to the current values. Change Tracking Proxies by contrast generates a dynamic proxy for the entity at run-time which notifies EF of changes when they are made.

EF will use change tracking proxies if an entity meets all its criteria (the class must be public as well as neither sealed nor abstract, every navigation property must be marked ‘virtual’ plus several other criteria), otherwise the class will use snapshot change tracking.

The advantage of change tracking proxies is that they can automatically notify EF when a entity is updated and save the overhead of EF having to scan through numerous entities comparing them to their previous values. Thus if you have a large number of tracked entities and only a small number of entities whose values change then using change tracking proxies is likely to be very efficient.

Unfortunately, change tracking proxies can be extremely inefficient in scenarios when there are a large number of changes since each proxy has to notify EF of a change. Snapshot change tracking will only perform a single scan when DetectChanges() is triggered (this is automatically called by EF when an event such as SaveChanges() occurs).

In many scenarios the performance difference between the two change tracking methods may not be great but each should be considered when looking at performance optimisation. A scan through the questions of StackOverflow reveals that change tracking proxies does seem to cause quite a few performance and development issues as the entity model is much more restricted when using them, so I would typically default to using snapshot tracking.

When using snapshot change tracking note that change tracking can be temporarily disabled. This may be useful in performance tuning as DetectChanges() is called (and all entities in memory therefore scanned) if any LINQ query is executed on the DbContext object or when any of the below methods are executed :

  • DbSet.Add()
  • DbSet.Find()
  • DbSet.Remove()
  • DbSet.Local
  • DbSet.Attach()
  • DbContext.SaveChanges()
  • DbConext.GetValidationErrors()
  • DbContext.Entry()
  • DbChangeTracker.Entries()
In circumstances where there are a lot of entities being tracked in memory and when you are sure that the existing entities will not be changed it may make sense to test the performance improvement from briefly turning of change tracking:
db.Configuration.AutoDetectChangesEnabled = false;
db.Employees.Add(new Employee {name=”jude”});
db.Employees.SaveChanges();
db.Configuration.AutoDetectChangesEnabled = true;

The above code sample avoids an unnecessary call to DetectChanges for Add and SaveChanges, since we know neither of these will change the existing entities we can safely turn change tracking off while this work is performed.

 

Use Eager Loading Instead of Lazy Loading

Lazy loading (when EF automagically loads related entities when they are required) is convenient but leads to several performance issues. Most notable is when a related entity is called within a loop. When using lazy loading this can trigger a separate database query each time the related entity is requested :

var employees = db.Employees;
foreach(var emp in employees)
{
var address = emp.Address; //Database query is executed every time this line is hit
}

By contrast, the below code eagerly loads the related Address entity and only executes a single query on the database.

var employees = db.Employees .Include(x => x.Address) .ToList(); //All data loaded from database
foreach(var emp in employees)
{
var address = emp.Address; //No database query executed
}
Lazy loading also causes a variety of development headaches (as noted in Entity Framework Gotchas) and can be globally turned off in the constructor for the context:
public MyContext() : base() {     this.Configuration.LazyLoadingEnabled = false; } 

 

Query In-Memory Entities First

EF keeps track of in-memory entities so retrieving an entity from memory is preferable to executing a database query. The easiest to implement example of this is to use the Find() instead of Where() when retrieving an entity by primary key. The Find() method first queries the in-memory entities and if no match is found it then executes a database query, whereas Where() will always hit the database:

var emp = db.Employees.Find(empID); // Queries in memory first
var emp = db.Employees.Where(x => x.ID == empID); //Always hits the database

It is not always convenient to do so but can manually query in-memory entities using Local if required :

var emp = db.Employees.Local.Where(x => x.ID == empID) ;

 

Don’t Rely on Entity Framework Exclusively For Data Access

Finally, EF is great for productivity but executing raw SQL or a stored procedure may often produce performance benefits.

In EF 4.0 and above the DbContext the SQLQuery() method to execute raw SQL:

var emps = db.Database.SQLQuery("select email from Employees");

Similarly use ExecuteSqlCommand() to execute a stored procedure:

var emps = db.Database.ExecuteSqlCommand("...");

 

Wrap Up

I have of course ignored the actual SQL Server database – which is a massive subject by itself, but these are the primary performance issues I encountered. Please do let me know if you think anything has been left out.

Entity Framework Gotchas

Entity Framework is fast becoming the default data access method for small to medium sized apps in .NET. Whilst it offers a lot of benefits in terms of productivity and security, working with EF is often very different from working with the standard ADO.NET data-access objects and poses several common issues or ‘gotchas’:

Using Different DBContext objects

This is probably the most common ‘gotcha’ when starting development with EF. When an entity is loaded using a DBContext object it can only be tracked by that DBContext object and attempting to perform work on the entity using a different DBContext object will give rise to a variety of errors.

For example:

        public void LoadUser(AppUser user)
        {
           var db = new FilingContext() ;
           user = db.AppUsers.Find("username");
           UpdateUser(user);
        }
        private void UpdateUser(AppUser user)
        {
           var db = new FilingContext();
            user.Email = "newEmail";
            db.Entry(user).State = EntityState.Modified;
           db.SaveChanges();
        }

This will raise the error:

An entity object cannot be referenced by 
multiple instances of IEntityChangeTracker.

This is a very simplistic example (especially since opening two contexts would be a bad idea for performance reasons) but as the model and data access code grows this issue can often arise.

It is normally best practice in development to limit each method to performing a single task. However, as in the above case, if an entity is created and then passed to another method for additional processing an error will be throw as it is being tracked by more than one DBContext.

The cleanest solution to this is to use AsNoTracking() when loading an entity and then later manually Attach the entity to the DBContext object. This informs EF that the change tracking features are not required and hence the DBContext object does not need exclusive control over the object:

        public void Test(AppUser user)
        {
           var db = new FilingContext() ;
           user = db.AppUsers.AsNoTracking()
                    .Where(x => x.UserName == "userName").Single();
           UpdateUser(user);
        }
        public void UpdateUser(AppUser user)
        {
           var db = new FilingContext();
           db.AppUsers.Attach(user);
           user.Email = "newEmail";
           db.SaveChanges();
        }

As a general point you should consider using AsNoTracking() when loading entities which you know will not be modified, as there is a performance overhead in the DBContext having to track the changes of entities.

Lazy Loading

Lazy loading is when EF determines which related entities need to be loaded and automatically loads these at run-time. For example, if an Employee object contained a Job entity you would not explicity have to request the loading of the Job entity when loading the Employee – just attempt to access the Job as a property of the Employee and EF will automatically load the Job.

Lazy loading looks extremely efficient and convenient but in practice it is totally impractical for most applications. It is a shame that so many of the code samples from Microsoft feature lazy loading, and hence developers typically start off by relying on it. However, to use lazy loading the DBContext which loaded the object needs to still be available and if any form of a Data Access Layer is implemented it has likely been destroyed.

Most experienced EF users recommend wrapping the DBContext object in a using block to ensure it is destroyed after performing its work, in such a circumstance lazy loading could only be used with the using block.

The most common head-scratcher for beginners is that lazy loading is turned off for validation. This leads to run-time errors which an entity has related entities which are marked as [Required] as below:

  public class Employee
        {
        public virtual int ID { get; set; }
        public virtual string Name { get; set; }
        [Required]
        public virtual  Job  Job { get; set; }
        }
  public class Job
        {
        public virtual int ID { get; set; }
        public virtual string JobName { get; set; }
        }
......


var emp = db.Employees.Find(primaryKey); 
emp.Name = "New Name";
db.SaveChanges(); //throws an error at run-time

The above sample looks fine at first sight (and more importantly compiles without errors), but it throws a runtime error since the SaveChanges() method calls into EF Validation which notes the Required entity is not loaded and throws an error.

The simple solution to this is to an an Include statement to explicitly load the required entity:

var emp = db.Employees.Where(x => x.ID == primaryKey)
                      .Include(x => x.Jobs);

In general a good alternative to marking entities as [Required] is to expose a non-nullable foreign key in the model as below. This avoids the common scenario above when EF throws an exception on validation due to a Required entity not being loaded.

  public class Employee
        {
        public virtual int ID { get; set; }
        public virtual string Name { get; set; }
        public virtual  int JobID { get; set; }
        public virtual  Job  Job { get; set; }
        }

SQL Generation From LINQ

LINQ is extremely powerful for querying objects, however, SQL is less so and since EF has to translate LINQ into SQL when a database is queried not all LINQ queries can be successfully executed.

One notable issue is using methods on the DateTime object. The Add() as well as the ToShortDateString()/ToLongDateString() methods of DateTime are commonly used in LINQ however there is no equivalent in SQL. Using these will therefore cause run-time errors. To rectify this you should use methods such as these outside of the LINQ statement.

//Below code causes an error as AddDays can't be translated to SQL for the query.
var employees = db.Employees.Where(x => x.CreateDate > DateTime.Now.AddDays(-7)).ToList();
//Below code executes fine as AddDays is outside the LINQ query.
var oneWeekCutoff = DateTime.Now.AddDays(-7);
var employees = db.Employees.Where(x => x.CreateDate > oneWeekCutoff ).ToList();

Another common issue is using the Contains() method for checking if an object which implements IEnumerable (such as a List) contains another object. For example this method checks if a user is already stored:

  public bool UserExists(AppUser user)
  {
   using (var db = new MyContext())
     {
       return db.AppUsers.Contains(user);
     }
      }

This will compile fine, but will throw the below error at runtime:

Unable to create a constant value of type 'AppUser'. 

Only primitive types or enumeration types are 
supported in this context.

The reason is that LINQ-to-entities cannot translate Contains into a SQL Statement when an object is passed in.

A common solution to this is to use the Any() method and use a lamba to check for if any objects in the context have the primary key of the object you are testing for:

  public bool UserExists(AppUser user)
{
using (var db2 = new FilingContext())
 {
 return var bool1 = db2.AppUsers.Any(x => x.ID = user.ID);
 }
 }

Automatic Database Creation

EF can automatically update the database or create it if one does not exist – at least that is the headline. It would be more accurate to say that EF can automatically create or update the database when the first attempt is made to access the database.

This distinction can lead to a lot of confusion when a developer creates or updates the EF model, then loads the home page and checks the database which is mysteriously unchanged. The solution is to load a page which attempts to access some of the EF entities, this will then trigger the database creation/update.

Even adding an initializer in Application_Start() as below will not change this behaviour since this only specifies the strategy for when the first access attempt is made on the database.

protected void Application_Start()
{
Database.SetInitializer(new CreateDatabaseIfNotExists());
//....
}

To trigger creation of the database upon startup, create an instance of your Context and then run its Initialize method:

protected void Application_Start()
{
    Database.SetInitializer(new
            CreateDatabaseIfNotExists());

    var db = new YourContext();
     db.Database.Initialize(true);
//....
}

Using EF with ASP.NET Membership

This is more of a headache than a ‘gotcha’ since EF makes no pretense of supporting or implementing the commonly used ASP.NET Membership system. It does however help to know that there is no easy solution to this problem – one very bad idea is to have the ASP.NET Membership tables in the same database as the EF entities since the EF database is likely to be dropped and recreated.

If you are looking to have a Foreign Key in an EF entities pointing at note that the primary key for users in ASP.NET Membership is the UserID not UserName and can be programatically accessed as below:

  MembershipUser currentUser = Membership.GetUser();
 string currentUserID = currentUser.ProviderUserKey.ToString();

Also note that having a ASP.NET Membership foreign key in your EF model leaves your app vulnerable to any changes Microsoft might make to the ASP.NET Membership schema in future as noted in this critique which alas doesn’t provide any clean solutions (since there really are none!).

There are a lot unique issues in EF, but these are the ones that I find are most frequently encountered when getting started using EF. Let me know in the comments if there are any other common gotchas you have come across.