James Ball

Custom Magento Model Not Saving Properly?

Posted on May 22, 2012 by James Ball

Today I wasted almost an hour trying to figure out why the save() method on a custom Magento model was failing to save a field that I had just added to the database.

After trawling through numerous Magento source code files, I discovered that (unsurprisingly) Magento only saves the fields that appear in a DESCRIBE SQL query of the table. However, the underlying Zend Framework caches these descriptions so that if an extra field is added, the cached description is missing the new field and so that field will not be updated.

The solution, of course, is to clear Magento’s cache.
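The failure mode is easy to reproduce in miniature. The sketch below is plain Python, not Magento or Zend Framework code; `describe_table`, `save`, `SCHEMA` and `ROWS` are hypothetical names invented purely to illustrate how a cached table description can silently drop a newly added field.

```python
from functools import lru_cache

# Hypothetical stand-in for the database schema: table name -> column names.
SCHEMA = {"my_model": ["id", "name"]}
ROWS = {}  # Hypothetical stand-in for stored rows, keyed by (table, id).


@lru_cache(maxsize=None)
def describe_table(table):
    """Mimics a cached DESCRIBE: the result is remembered after the first call."""
    return tuple(SCHEMA[table])


def save(table, data):
    """Persists only the fields that appear in the (possibly stale) description."""
    known = describe_table(table)
    row = {field: value for field, value in data.items() if field in known}
    ROWS[(table, data["id"])] = row
    return row


first = save("my_model", {"id": 1, "name": "a"})   # warms the cache
SCHEMA["my_model"].append("colour")                # the new column is added
stale = save("my_model", {"id": 1, "name": "a", "colour": "red"})
describe_table.cache_clear()                       # the equivalent of clearing the cache
fresh = save("my_model", {"id": 1, "name": "a", "colour": "red"})
```

Here `stale` is missing the `colour` field even though the column exists, while `fresh` contains it once the cached description has been thrown away.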

Posted in Magento | No Comments

Entity Framework – Generically Fetching Entities by Primary Key

Posted on April 10, 2012 by James Ball

Recently I have come across a situation where it would be desirable to retrieve an entity based only on its type and its primary key.

One possible way of doing this would be to add a Find method to a partial class of each entity type. This is feasible for a small project but quickly becomes unmaintainable in medium-sized or large projects.

Neither can we implement a Find method as an extension method to EntityObject because that would require an instance of an EntityObject, which we don’t necessarily have.

The solution is to implement a generic find method that makes use of raw SQL queries inside either a helper class or a repository.

Let’s consider the signature of such a method:

public List<TEntity> Find<TEntity>(int[] keys) where TEntity : EntityObject

In order to fetch the entities of type TEntity that have the given primary keys we need to know the following:

The table name corresponding to TEntity
The name of the primary key field

Fortunately, we can get both pieces of information from the EntitySetBase for the entity type TEntity:

/// <summary>
/// Gets the EntitySetBase to which the specified entity type belongs.
/// </summary>
/// <remarks>social.msdn.microsoft.com/Forums/en-US/adodotnetentityframework/thread/a8cef6de-3cd1-464f-9b19-865b9794fe09/</remarks>
/// <typeparam name="TEntity"></typeparam>
/// <returns></returns>
private EntitySetBase GetEntitySet<TEntity>() where TEntity : EntityObject
{
  Contract.Requires(_context != null, "Context not set");

  EntityContainer container =
    _context.MetadataWorkspace
      .GetEntityContainer(_context.DefaultContainerName, DataSpace.CSpace);

  EntitySetBase entitySet = container.BaseEntitySets
    .Where(item => item.ElementType.Name == typeof(TEntity).Name).FirstOrDefault();

  return entitySet;
}

We can now complete our Find method as follows:

/// <summary>
/// Retrieves entities of the specified type that have the specified primary keys.
/// </summary>
/// <typeparam name="TEntity">Type of entity to retrieve.</typeparam>
/// <param name="keys">Primary keys of the entities to retrieve.</param>
/// <remarks>This method will not work correctly with entities that have composite keys.</remarks>
/// <returns>Eagerly retrieved and detached entities.</returns>
public List<TEntity> Find<TEntity>(int[] keys) where TEntity : EntityObject
{
  var set = GetEntitySet<TEntity>();

  var entitySetName = set.ElementType.Name;
  var entityPrimaryKey = set.ElementType.KeyMembers.First().Name;

  var command = string.Format("SELECT * FROM {0} WHERE {1} IN ({2})", entitySetName, entityPrimaryKey, string.Join(",", keys));

  var query = _context.ExecuteStoreQuery<TEntity>(command);

  return query.ToList();
}

Notice that there are several limitations to this approach that are left as an exercise to the reader:

No support for composite keys
No support for non-integer keys
Merge options?
Entity is not attached to the DomainContext
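To see the same idea outside of Entity Framework, here is a rough sketch in Python using the standard sqlite3 module. It is not a port of the C# above: `primary_key_column`, `find` and the `Customer` table are hypothetical names for the illustration, the key column is discovered with SQLite's PRAGMA table_info instead of EF metadata, and the keys are bound as parameters rather than joined into the SQL string. It shares the single-column-key limitation, and the table name is interpolated directly, so it must come from trusted code.

```python
import sqlite3


def primary_key_column(conn, table):
    """Returns the name of the single primary-key column of `table`."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) rows;
    # pk is non-zero for columns that are part of the primary key.
    info = conn.execute('PRAGMA table_info("{0}")'.format(table)).fetchall()
    key_columns = [row[1] for row in info if row[5] > 0]
    if len(key_columns) != 1:
        raise ValueError("expected exactly one primary key column")
    return key_columns[0]


def find(conn, table, keys):
    """Fetches the rows of `table` whose primary key is in `keys`."""
    if not keys:
        return []  # most database engines reject an empty IN () list
    key = primary_key_column(conn, table)
    placeholders = ",".join("?" for _ in keys)
    sql = 'SELECT * FROM "{0}" WHERE "{1}" IN ({2})'.format(table, key, placeholders)
    return conn.execute(sql, list(keys)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany("INSERT INTO Customer VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob"), (3, "Cy")])
found = sorted(find(conn, "Customer", [1, 3]))
```

The early return for an empty key list is worth copying back into the C# version, since an empty array there would produce an empty IN () clause.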

Posted in .NET, C#, Entity Framework | No Comments

Design-Time T4 Templates

Posted on December 7, 2011 by James Ball

Design-time T4 text templates are a useful feature of Visual Studio that allow us to do a little meta-programming and generate code (or text) which is then compiled as part of a project. One problem I recently came up against which is nicely solved by T4 is a slight difficulty in using F#’s discriminated unions from C#.

In an F# class library project called Foo, I have the following types defined:

namespace Foo

type Vehicle =
    | LandVehicle of LandVehicle
    | WaterVehicle of WaterVehicle
    | AirVehicle of AirVehicle
    | SpaceVehicle of SpaceVehicle

and LandVehicle =
    | PersonalLandVehicle  of PersonalLandVehicle
    | PublicLandVehicle of PublicLandVehicle

and PersonalLandVehicle =
    | Car
    | Bike

... *snip* ...

module Something =
    let GetVehicle() =
        Vehicle.LandVehicle(LandVehicle.PersonalLandVehicle(PersonalLandVehicle.Bike))

I also have a C# project called Bar which makes use of the GetVehicle function defined in Foo.

Vehicle vehicle = Foo.Something.GetVehicle();

In order to take an appropriate action depending on whether or not vehicle is any of LandVehicle, WaterVehicle, AirVehicle, SpaceVehicle, it’s necessary to do something like:

if (vehicle.IsAirVehicle)
{
	AirVehicle airVehicle = ((Vehicle.AirVehicle)vehicle).Item;
}
else if (vehicle.IsLandVehicle)
{
	LandVehicle landVehicle = ((Vehicle.LandVehicle)vehicle).Item;
}
else if (vehicle.IsSpaceVehicle)
{
	SpaceVehicle spaceVehicle = ((Vehicle.SpaceVehicle)vehicle).Item;
}
else if (vehicle.IsWaterVehicle)
{
	WaterVehicle waterVehicle = ((Vehicle.WaterVehicle)vehicle).Item;
}

Note the ugly and awkward typecast, which quickly becomes cumbersome when you have many of these unions in a structure such as an AST. However, it can be remedied with the use of extension methods like this:

public static class Extensions
{
	public static AirVehicle AsAirVehicle(this Vehicle vehicle)
	{
		return ((Vehicle.AirVehicle)vehicle).Item;
	}

	public static LandVehicle AsLandVehicle(this Vehicle vehicle)
	{
		return ((Vehicle.LandVehicle)vehicle).Item;
	}
}

It’s then possible to replace the ugly typecasting with a more eloquent expression:

AirVehicle airVehicle = vehicle.AsAirVehicle();

Of course, we have to write an extension method for each case of each discriminated union, so for Vehicle and LandVehicle alone we will need 6 extension methods that are nearly identical.

This is where T4 comes into play. First, let’s add a “Text Template” named “Extensions.tt” to the C# project (Bar). The next thing to do is open it up, and change the line

<#@ output extension=".txt" #>

to read

<#@ output extension=".cs" #>

This tells T4 to output C# source code. We also need to add the following lines to load the required assemblies:

<#@ assembly name="System.Core" #>
<#@ assembly name="FSharp.Core" #>
<#@ import namespace="System.Linq" #>
<#@ import namespace="Microsoft.FSharp.Core" #>

Now add the skeleton of the class we want to generate:

using Foo;

namespace Bar
{
	public static class Extensions
	{
		<#
		#>
	}
}

Note that any C# code between the <# and #> symbols will be executed when the template is processed. It's possible to output text from within the code block by calling "WriteLine" (not Console.WriteLine!). Now for the interesting bit: we need to somehow loop over all of the types defined in Foo and output the C# source for their relevant extension methods. The final T4 template looks like this:

<#@ template debug="false" hostspecific="false" language="C#" #>
<#@ output extension=".cs" #>
<#@ assembly name="System.Core" #>
<#@ assembly name="FSharp.Core" #>
<#@ import namespace="System.Linq" #>
<#@ import namespace="Microsoft.FSharp.Core" #>

using Foo;

namespace Bar
{
	public static class Extensions
	{
		<#
			PushIndent("\t");
			PushIndent("\t");

			var asm = System.Reflection.Assembly.LoadFrom(@"Z:\Path\To\Foo.dll");
			foreach (var type in asm.GetTypes().Where(t => t.Namespace == "Foo" && t.IsNested == false))
			{
				foreach(var attr in type.GetCustomAttributes(typeof(CompilationMappingAttribute), true))
				{
					if (((CompilationMappingAttribute)attr).SourceConstructFlags == SourceConstructFlags.SumType)
					{
						foreach (var nested in type.GetNestedTypes().Where(t => t.Name != "Tags" && t.Name != type.Name))
						{
							WriteLine("public static {0} As{1}(this {2} node)", nested.Name, nested.Name, type.Name);
							WriteLine("{");
							PushIndent("\t");

							WriteLine("return (({0}.{1})node).Item;", type.Name, nested.Name);

							PopIndent();
							WriteLine("}");
							WriteLine("");
						}
					}
				}
			}

			PopIndent();
			PopIndent();
		#>
	}
}

Important: Note the line var asm = System.Reflection.Assembly.LoadFrom(@"Z:\Path\To\Foo.dll");. We need to give an explicit path to the Foo.dll as the T4 engine runs outside of the scope of the project and we therefore could not access the Foo assembly through AppDomain for example.

The final step is to save "Extensions.tt", right click it in the solution explorer and select "Run Custom Tool". A new file, "Extensions.cs", should now be generated with all the required extension methods. It looks something like this:

using Foo;

namespace Bar
{
	public static class Extensions
	{
		public static LandVehicle AsLandVehicle(this Vehicle node)
		{
			return ((Vehicle.LandVehicle)node).Item;
		}

		public static WaterVehicle AsWaterVehicle(this Vehicle node)
		{
			return ((Vehicle.WaterVehicle)node).Item;
		}

		... *snip* ...
	}
}
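Stripped of the T4 and reflection machinery, the template is just a loop that fills in a text pattern for every case of every union. The following Python sketch shows that core idea only; it is not a replacement for the template. The `UNIONS` table is a hand-written, hypothetical stand-in for what the template discovers by reflecting over Foo.dll, and `METHOD` and `generate_extensions` are names made up for the example.

```python
# Hypothetical description of the unions: union name -> case names.
# The real template discovers these by reflection instead.
UNIONS = {
    "Vehicle": ["LandVehicle", "WaterVehicle", "AirVehicle", "SpaceVehicle"],
    "LandVehicle": ["PersonalLandVehicle", "PublicLandVehicle"],
}

# C# text pattern for one extension method; doubled braces are literal braces.
METHOD = (
    "public static {case} As{case}(this {union} node)\n"
    "{{\n"
    "\treturn (({union}.{case})node).Item;\n"
    "}}\n"
)


def generate_extensions(unions):
    """Returns C# source text with one As<Case> method per union case."""
    methods = [
        METHOD.format(union=union, case=case)
        for union, cases in unions.items()
        for case in cases
    ]
    return "\n".join(methods)


source = generate_extensions(UNIONS)
print(source)
```

The advantage of doing this in T4 rather than in an external script is that the generated file stays inside the project and can be regenerated from Visual Studio.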

Posted in .NET, C#, F#, T4 | No Comments

Computing Perfect Numbers in F#

Posted on October 17, 2011 by James Ball

In this (brief) post I’ll quickly look at how we can naively determine if a number is perfect using F#.

A perfect number is an integer, $b$, whose proper factors sum to $b$ (where the proper factors of $b$ are all the factors of $b$ except for $b$ itself).

Now, a factor of $b$ is any integer, $a$, in the set $\{1, 2, \ldots, b\}$ such that $b / a$ is a whole number.

This can be expressed rather eloquently in F# as a lazily evaluated sequence:

let get_factors b =
    seq { 1 .. b } |> Seq.filter(fun a -> b % a = 0)

let get_proper_factors b =
    seq { 1 .. (b-1) } |> Seq.filter(fun a -> b % a = 0)

The method for determining if a number is perfect should now be clear:

let is_perfect b =
    match b with
    | 0 -> false
    | _ -> (b |> get_proper_factors |> Seq.sum) = b
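As a quick sanity check of the definition, here is the same naive approach sketched in Python (chosen only as a neutral scripting language; `proper_factors` and `is_perfect` are my own names mirroring the F# functions above). A generator expression plays the role of the lazy F# sequence.

```python
def proper_factors(b):
    """Lazily yields the factors of b that are smaller than b."""
    return (a for a in range(1, b) if b % a == 0)


def is_perfect(b):
    """True when b is positive and equals the sum of its proper factors."""
    return b > 0 and sum(proper_factors(b)) == b


perfect_below_1000 = [b for b in range(1, 1000) if is_perfect(b)]
print(perfect_below_1000)
```

Like the F# version, this tests every candidate divisor, so it is only practical for small numbers.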

Posted in F#, Mathematics, Number Theory | No Comments

F# – Sequences and Lists

Posted on July 3, 2011 by James Ball

Two useful features of F# are its lists and sequences. Both provide similar functionality but with quite drastically different implementations.

A sequence is simply an alias for .NET’s IEnumerable<T> and is therefore evaluated lazily; elements are only computed as they are needed (enumerated over).

A list (despite its name) is unrelated to .NET’s List<T>. It is an unchangeable (immutable) collection of elements that is evaluated eagerly. Internally it is implemented as a linked list, in contrast to .NET’s List<T>, which internally uses an array. In practical terms, this means that reading elements in F#’s list becomes progressively slower the further down the list one goes, whilst in .NET’s List<T> element access times remain constant. However, the fact that F# lists are immutable gives them a sharp advantage over standard .NET Lists in some scenarios.

First, let’s look at how a sequence is defined.

let squares_seq n =
    seq { for i in 0..n -> (i, i*i) }

The squares_seq function returns a sequence of tuples; the first element of each tuple is a number and the second element is that number’s square.

Now for the same function, but returning a list:

let squares_list n =
    [ for i in 0..n -> (i, i*i) ]

The code is almost identical except for the square brackets rather than curly braces, as well as the obvious omission of the seq keyword.

If each of these functions was iterated over in a for loop, the effect would be identical:

let n = 10

for (i, s) in (squares_seq n) do
    printfn "The square of %d is %d" i s

for (i, s) in (squares_list n) do
    printfn "The square of %d is %d" i s

However, drastic differences can be seen when we dig deeper and give thought to the eagerness of the list and the laziness of the sequence. Consider the following code:

open System.Diagnostics

// The stopwatch used by all of the timings below
let stopwatch = Stopwatch()
let n = 1000000

stopwatch.Start()
squares_seq n |> ignore
stopwatch.Stop()
stopwatch.ElapsedMilliseconds |> printfn "Seq time: %d"

stopwatch.Restart()
squares_list n |> ignore
stopwatch.Stop()
stopwatch.ElapsedMilliseconds |> printfn "List time: %d"

Which produces the following output:

Seq time: 1
List time: 693

No matter how many elements are in the sequence, instantiating it takes roughly the same (tiny) amount of time, because all computation is deferred until the sequence is enumerated.
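Timings vary from machine to machine, so a more deterministic way to see the difference is to count how many elements have actually been computed. The sketch below does this in Python rather than F#, with a generator expression standing in for the sequence and a list comprehension for the list; `calls` and `square` are hypothetical helpers added only to do the counting. One caveat: a Python generator can only be enumerated once, whereas an F# sequence recomputes its elements each time it is enumerated.

```python
calls = []  # records every index whose square has actually been computed


def square(i):
    calls.append(i)
    return (i, i * i)


def squares_seq(n):
    """Lazy: a generator expression computes nothing until iterated."""
    return (square(i) for i in range(n + 1))


def squares_list(n):
    """Eager: the list comprehension computes every element immediately."""
    return [square(i) for i in range(n + 1)]


lazy = squares_seq(5)
computed_after_seq = len(calls)       # nothing has been computed yet
eager = squares_list(5)
computed_after_list = len(calls)      # all six elements were computed
first_two = [next(lazy), next(lazy)]
computed_after_peek = len(calls)      # only two more were computed
```

Creating the lazy version costs nothing up front, and asking for its first two elements computes exactly two squares, no matter how large n is.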

This is further demonstrated if we time how long it takes to iterate over both the sequence and the list once they have already been instantiated.

stopwatch.Start()
for (i, s) in squares_seq n do
    (i, s) |> ignore
stopwatch.Stop()
stopwatch.ElapsedMilliseconds |> printfn "Seq time: %d"

stopwatch.Reset()
for (i, s) in squares_list n do
    (i, s) |> ignore
stopwatch.Stop()
stopwatch.ElapsedMilliseconds |> printfn "List time: %d"

Which produces the output:

Seq time: 304
List time: 0

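The 0 reported for the list should not be taken at face value. Stopwatch.Reset() stops the stopwatch and zeroes it without starting it again (unlike Restart(), which was used in the first test), so the stopwatch was not running while the list loop executed and the elapsed time is 0 regardless of how long the loop took. Note too that squares_list n is evaluated inside the timed region, so a corrected measurement would include the cost of building the list as well as iterating over it. The sequence for loop, on the other hand, really does have quite a lot of hidden work to do, because every element is computed during the iteration.

It is still interesting to note that it takes 304 milliseconds to iterate over the sequence and 693 milliseconds to instantiate the list; the list is roughly twice as slow for the same end result (in this particular non-rigorous test).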

Posted in .NET, F# | No Comments

Real and Protected Segmentation

Posted on April 27, 2011 by James Ball

One of the things that led to the downfall of my first attempt to write an operating system last summer was not fully understanding how segmentation works, and that there are two different types depending on the CPU’s mode.

In real mode, segmentation is quite simple. A logical address consists of a segment and an offset. This logical address is then converted to a physical address internally by the CPU. To convert the logical address to a physical one, the CPU shifts the segment value 4 bits to the left (equivalent to multiplication by 16) and then adds the value of the offset.

Let’s look at an example:

mov word ax, [0x07c0:0x0200]

So, the segment is 0x07c0. (A real mode segment can start at any multiple of 16 bytes and spans 64KiB from there.) First, the segment is shifted 4 bits to the left, leaving us with 0x7c00. Then the offset (0x0200) is added, leaving us with a physical address of 0x7e00.

Pretty simple, huh?

One problem you may notice with this is that there isn’t a unique logical address for each physical address, for example:

mov word ax, [0x07c0:0x0200]
mov word ax, [0x07e0:0x0000]

Both instructions refer to exactly the same physical address (0x7e00)!
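The arithmetic is simple enough to check with a couple of lines of code. This is a small Python sketch of the calculation only; `real_mode_physical` is a name made up for the example, and it does not model the wrap-around at 1MiB that happens on CPUs with only 20 address lines.

```python
def real_mode_physical(segment, offset):
    """Physical address for a 16-bit segment:offset pair in real mode."""
    # Shift the segment left by 4 bits (multiply by 16), then add the offset.
    # Both values are masked to 16 bits, as they would be in a register.
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


first = real_mode_physical(0x07C0, 0x0200)
second = real_mode_physical(0x07E0, 0x0000)
print(hex(first), hex(second))
```

Both logical addresses from the example come out as the same physical address.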

The story in protected land becomes a little more complex. Segments no longer refer to a 64KiB block of memory; instead, the 16-bit segment value (now called a segment selector) has the following form:

The lowest 2 bits of the segment describe the privilege level that the segment expects to be accessed in (i.e. 0 is ring 0, 1 is ring 1, etc)

The next bit describes which table the CPU should look the segment index up in. A clear bit indicates the segment will be found in the GDT, whilst a set bit indicates it’ll be found in the LDT.

The remaining 13 bits are the segment index, the index points to an entry in either the LDT or GDT.

In protected mode, a logical address is converted first to a linear address by the CPU. If paging is enabled, the linear address is then sent through the paging unit to be converted to a physical address; otherwise the linear address is the physical address.

The conversion to a linear address is performed by taking the upper 13 bits of the segment selector, multiplying that index by 8 (the size of an entry in both the GDT and the LDT) and using the resulting value as an offset into either the GDT or LDT to find the segment descriptor. The segment descriptor contains the linear base address of the segment, which is taken by the CPU and added to the offset.

So for example,

jmp 0x08:0x0200

The segment is 0x08, or 00001000 in binary. The lower two bits (bits 0 and 1) with a value of 0 show that the segment should only be accessed in ring 0, and the next bit (bit 2), which is clear, shows that the segment descriptor can be found in the GDT. The remaining 13 bits simply equal 1, which tells the CPU that the corresponding segment descriptor is at position 1 in the GDT.

If the segment descriptor at position 1 in the GDT had a base address of 0xcafe, then the CPU adds the base to the offset (0x0200) to give the linear address of 0xccfe. Other checks are made, such as checking the privilege level, read and write access, etc., and if they fail, the CPU will raise an exception.
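The selector layout and the base-plus-offset step can be sketched in a few lines of Python. This is only a model of the arithmetic described above: `decode_selector`, `linear_address` and `GDT_BASES` are hypothetical names, the table holds nothing but base addresses, and none of the limit or privilege checks are performed.

```python
def decode_selector(selector):
    """Splits a 16-bit segment selector into (index, table, rpl)."""
    rpl = selector & 0b11                          # bits 0-1: privilege level
    table = "LDT" if selector & 0b100 else "GDT"   # bit 2: which table to use
    index = (selector >> 3) & 0x1FFF               # bits 3-15: descriptor index
    return index, table, rpl


def linear_address(selector, offset, gdt_bases):
    """Adds the offset to the base of the descriptor the selector names."""
    index, table, _ = decode_selector(selector)
    if table != "GDT":
        raise NotImplementedError("this sketch only models the GDT")
    return gdt_bases[index] + offset


# Hypothetical GDT bases: entry 0 is the null descriptor, entry 1 has the
# base address 0xcafe used in the example above.
GDT_BASES = [0, 0xCAFE]

decoded = decode_selector(0x08)
address = linear_address(0x08, 0x0200, GDT_BASES)
print(decoded, hex(address))
```

Running it on the selector 0x08 from the example gives index 1 in the GDT at ring 0, and a linear address of 0xccfe.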

I will be discussing segment descriptors and the GDT in a later post.

Exercise: Look up segment descriptors, the GDT and the LDT in the Intel x86 Manual, Volume 3.

Posted in Assembly, OS Development | No Comments

Converting Integers to Hex Strings

Posted on April 15, 2011 by James Ball

This article focuses on how one can go about converting the value of a register to a hex string in x86 real mode assembly. The conversion is almost trivial if we remember that each hex digit directly maps to 4 bits.

Let’s first define the routine which we are going to create: itoa_ax_hex takes the value of AX and creates a 5-character hex string at the location pointed to by ES:DI. Note that the created string is not null terminated (the 5th character is for ‘h’). Not null terminating the string allows the calling code to easily copy the string into other strings.

Consider the value C0DEh. We can get the first hex digit by performing a bitwise AND with F000h, and then shifting the resulting value 12 bits to the right.

mov ax, 0xc0de
mov bx, 0xf000
and ax, bx
shr ax, 12

The value of AX is now simply Ch. We now need to convert AX into its correct ASCII code (43h). Notice that if AX was less than Ah, we could get the ASCII code by adding 30h. Because AX is greater than 9h, then we must add 37h to get the correct code. This can be expressed in assembly as:

	cmp ax, 0x09
	jle .1
	add ax, 0x07
.1:	add ax, 0x30

The value of AL (we can discard the high bits of AX as it will never be greater than Fh) can then be copied to the location specified by ES:DI.

mov [es:di], al
inc di

So, now for the 2nd hex digit. We can repeat exactly the same process as before if we simply take the value C0DEh and shift it 4 bits to the left, which removes the C and puts 0 as the most significant hex digit.

Getting the rest of the digits follows exactly the same method. If we keep a counter of how many digits have been converted, then we know that all the digits are done when the counter reaches 4.

Once all the digits have been converted, the final character is then added.

mov al, 'h'
mov [es:di], al

One problem remains: we have incremented DI four times, so ES:DI now points to the last character in the string rather than the first, which obviously presents a problem for the calling code. Simply subtracting 4 from DI solves the problem.

Exercise 1: Write the entire itoa_ax_hex routine to convert all 4 digits of AX in a loop.

Solution:

itoa_ax_hex:
; Converts the value of AX to its ASCII hex representation.
; IN:	ES:DI - String destination (must be at least 5 bytes).
;	AX - number to convert
	push bx
	push cx		; Be nice and preserve BX and CX
	mov bx, 0xf000
	xor cx, cx	; CX = 0, counts the number of digits converted
.next:	push ax		; Save AX because AND will alter it
	and ax, bx	; AX now only contains the most significant digit
	shr ax, 12
	cmp ax, 9	; Is the HEX digit a letter?
	jle ._
	add ax, 0x07	; If it is, we need to add an additional 7
._:	add ax, 0x30	; Add 0x30 to bring the digit up to its ASCII code
	mov [es:di], al
	inc di		; Move to the next character.
	inc cx		; Increment the counter. That's another digit done
	pop ax
	shl ax, 4	; Make the next digit the most significant
	cmp cx, 4
	jne .next	; If CX doesn't yet equal 4, then there are more digits
			; left. Otherwise, we are done and can add the 'h'
			; suffix.
	mov byte [es:di], 'h'
	sub di, 4	; Restore DI to its previous value
	pop cx
	pop bx		; Restore original registers
	ret
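If you want to check the routine's output without firing up an emulator, the same steps can be modelled in a high-level language. The Python sketch below follows the assembly above step by step (mask, shift, add 30h, add a further 7 for letters, then shift the next digit into place); it returns a string instead of writing to ES:DI, and reusing the name `itoa_ax_hex` is just a convenience of mine.

```python
def itoa_ax_hex(ax):
    """Mirrors the assembly routine: 4 hex digits of a 16-bit value plus 'h'."""
    ax &= 0xFFFF                          # AX is a 16-bit register
    out = []
    for _ in range(4):
        digit = (ax & 0xF000) >> 12       # isolate the most significant digit
        code = digit + 0x30               # ASCII '0'..'9'
        if digit > 9:
            code += 0x07                  # skip ahead to ASCII 'A'..'F'
        out.append(chr(code))
        ax = (ax << 4) & 0xFFFF           # next digit becomes most significant
    return "".join(out) + "h"


print(itoa_ax_hex(0xC0DE))
```

Comparing this model's output with what the assembly writes to memory is a handy way to test the real routine.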

Exercise 2: Adapt the routine to convert all 8 digits of EAX.