Native On-Chip GDB Remote Protocol Support

November 24th, 2012

A typical software debug solution for an embedded system might involve a JTAG connection to the board, plus some kind of protocol translation software that handles communication between GDB’s remote serial protocol and the target JTAG port (see OpenOCD, for instance). The FPGA systems I’m working with include JTAG ports, and the vendors also provide JTAG IP cores for interfacing them to your digital logic. On the other hand, these systems also have nice UARTs that are easy to talk to. We have the opportunity to dramatically simplify the debug toolchain by including support for GDB’s remote protocol directly on chip. This would be a hardware implementation of the protocol, with no software stubs required.

The GDB Target Engine IP core is essentially a state machine that reads GDB packets coming over the UART (a microusb connection to my laptop). It has direct access to the MoxieLite core through some additional wires for extracting register values. It also acts as a bus master so that it can read and write memory directly. Until now the Marin SoC has had only one bus master: the moxie core. The nice thing here is that we don’t have to add any bus arbitration logic for the second master, because only one master will ever be active at a time. We’re either running in debug mode (active connection to GDB over the UART), in which case the GDB Target Engine is the bus master, or we’re running in regular mode, where the moxie core is in control.

The GDB remote protocol includes many commands these days, but only a small number are required to be supported by the target: read/write registers, read/write memory, step and continue.
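To give a feel for how little the target has to understand, here is a rough Python sketch of the packet framing. Every packet has the form $payload#checksum, where the checksum is the sum of the payload bytes modulo 256, written as two hex digits. The function names are invented for illustration, escaping and the +/- acknowledgements are left out, and the real Target Engine does this in hardware rather than software.

```python
def checksum(payload: bytes) -> int:
    """Modulo-256 sum of the payload bytes."""
    return sum(payload) % 256


def frame(payload: str) -> bytes:
    """Wrap a payload as $payload#xx, where xx is the checksum in hex."""
    data = payload.encode("ascii")
    return b"$" + data + b"#" + b"%02x" % checksum(data)


def unframe(packet: bytes) -> str:
    """Check the framing and checksum, then return the payload."""
    if not (packet.startswith(b"$") and packet[-3:-2] == b"#"):
        raise ValueError("malformed packet")
    data = packet[1:-3]
    if int(packet[-2:], 16) != checksum(data):
        raise ValueError("bad checksum")
    return data.decode("ascii")


# The small command set mentioned above, keyed by the packet's first letter.
COMMANDS = {
    "g": "read all registers",
    "G": "write all registers",
    "m": "read memory (m addr,length)",
    "M": "write memory (M addr,length:data)",
    "s": "single step",
    "c": "continue",
}
```

For example, frame("g") gives the bytes $g#67, and unframe() rejects a packet whose trailing two digits do not match the payload.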

Current status is that I can connect GDB directly to the SoC using “target remote /dev/ttyUSB0”, at which point GDB negotiates with the target to determine what features are supported. I can hit Ctrl-C in GDB to tell the SoC to enter debug mode. The Target Engine core then talks to MoxieLite to extract register values, converts them to ASCII text and sends them back to the debugger over the wire. This includes the PC, so GDB knows where to go. Given that this is working, I’m not too worried about the rest of it, but only time will tell…
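The register-to-ASCII step can be sketched in a few lines of Python. In a reply to GDB's g packet, each register is sent as hex digits in target byte order, so on a big-endian moxie a 32-bit register reads naturally from left to right. The function name and the sample values are made up; the number and order of registers GDB expects for moxie is not shown here.

```python
def encode_registers(regs, little_endian=False):
    """Hex-encode 32-bit register values in target byte order."""
    order = "little" if little_endian else "big"
    return "".join((r & 0xFFFFFFFF).to_bytes(4, order).hex() for r in regs)


# Hypothetical register values, purely for illustration.
sample = [0x00001234, 0xDEADBEEF]
big = encode_registers(sample)
little = encode_registers(sample, little_endian=True)
```

With these sample values, big is "00001234deadbeef" and little is "34120000efbeadde".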

Tags: gdb, JTAG, marin, SoC
Posted in moxie

Running a C Program on the Marin SoC

November 15th, 2012

I’ve just committed the bits required to run a C program on the Marin SoC.

Rather than hook up the Nexys3 external RAM module, I’m using extra space on the FPGA itself for RAM. Most of the hard work was sorting out the linker script magic required to generate an appropriate image.

I’ve also added a UART with 1k hardware FIFO transmit and receive buffers. The 1k is probably overkill, so I’ll likely shrink them once everything else is working.

I’ve moved all memory mapped IO devices up to 0xF0000000. So, for instance, the 7-segment display LED is at 0xF0000000, and the UART transmit register is at 0xF0000004. I’ll just keep going from there.

Next comes libgloss hacking to map stdout/stdin to the UART (which I talk to with minicom on my Linux box). We’re very close to “Hello World” now!

Tags: fpga, gcc, marin, newlib, Nexys3, SoC
Posted in moxie

Moxie SoC Progress

November 10th, 2012

Time for a quick update!

“Marin” is the name of my test SoC consisting of a wishbone-wrapped 75MHz big-endian MoxieLite bus master, along with two slave devices: embedded ROM and the Nexys3’s 7-segment display. So, right now I can write some code into FPGA embedded ROM to manipulate the display. For example…

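        .text
        .p2align        1
        .global MarinDisplayTest

        .equ BIG_ENDIAN,1

        # This is where the 7-segment display is mapped to memory
        .equ DISPLAY_ADDR,0x00100000

MarinDisplayTest:
        # $r1 holds the value shown on the display; $r3 is a constant zero
        ldi.l   $r1, 0x1234
        ldi.l   $r3, 0x0
        # Write the low 16 bits of $r1 to the display, then count down
loop:   sta.s   DISPLAY_ADDR, $r1
        dec     $r1, 1
        # Busy-wait so that each value stays visible for a while
        ldi.l   $r2, 5000000
delay:  dec     $r2, 1
        cmp     $r2, $r3
        bne     delay
        jmpa    loop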

This displays a countdown on the hex display starting at 1234.

Here’s what I think will be next:

  • I need to be able to access RAM, which means implementing a module to support the Nexys3’s CellularRAM device and wrapping that up as a wishbone slave.
  • Once I can access RAM, I can test C compiler output, but only small code that I can embed into the FPGA’s ROM.
  • Next comes a UART wishbone slave so I can talk to it over the microusb serial port from my Linux host. I’ll need to hack up libgloss to map I/O to my memory-mapped UART.
  • One of the annoying things about this Xilinx toolchain is that AFAICT Digilent doesn’t provide the tool you need for programming memory (Flash, RAM, or otherwise) from your Linux host. So I plan on writing some ROMable firmware to download code (srecords?) over the UART (xmodem?) to program memory. This is the point at which we should be able to run larger programs. I already have a u-boot port, so I think that will be first on my list.
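If the download firmware does end up using srecords, the per-line work is small. Here is a rough Python sketch of encoding and checking an S1 record (16-bit address). The checksum is the ones' complement of the low byte of the sum of the count, address and data bytes. The function names are invented, and addresses above 0xFFFF would need S3 records, which carry four address bytes instead of two.

```python
def srec_checksum(body: bytes) -> int:
    """Ones' complement of the low byte of the sum of count, address, data."""
    return ~sum(body) & 0xFF


def encode_s1(address: int, data: bytes) -> str:
    """Build one S1 record for up to 252 data bytes."""
    count = 2 + len(data) + 1  # address bytes + data bytes + checksum byte
    body = bytes([count]) + address.to_bytes(2, "big") + data
    return "S1" + body.hex().upper() + "%02X" % srec_checksum(body)


def decode_s1(line: str):
    """Validate an S1 record and return (address, data)."""
    line = line.strip()
    if not line.startswith("S1"):
        raise ValueError("not an S1 record")
    raw = bytes.fromhex(line[2:])
    body, check = raw[:-1], raw[-1]
    if body[0] != len(raw) - 1:
        raise ValueError("bad byte count")
    if srec_checksum(body) != check:
        raise ValueError("bad checksum")
    return int.from_bytes(body[1:3], "big"), body[3:]


payload = bytes([0x0A, 0x0A, 0x0D]) + bytes(13)
record = encode_s1(0x7AF0, payload)
```

A loader in ROM would run the equivalent of decode_s1() on each line received over the UART and copy the data bytes to the decoded address.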

It’s great to have Brad Robinson’s MoxieLite implementation for Marin. It’s very small but can still run at quite a clip. Once the surrounding infrastructure is working, however, I’m going to get back to Muskoka, my 4-stage pipelined moxie SoC, to see if I can crank up the MHz.

As usual everything is in github. However, the HDL cores and SoC designs are no longer embedded in the moxiedev tree. They’re in a new top-level git repo called moxie-cores. Check it out here: github.com/atgreen/moxie-cores

Tags: fpga, marin, MoxieLite, Nexys3, SoC, u-boot, wishbone, Xilinx
Posted in moxie

Moxie and Free Software EDA at FSOSS

October 16th, 2012

I’ll be speaking at FSOSS in Toronto next week on moxie and Free Software EDA tools. Check it out here: fsoss.senecac.on.ca/2012/node/150.

Tags: FSOSS
Posted in moxie

MoxieLite in Action

September 22nd, 2012

Brad Robinson just sent me this awesome shot of MoxieLite in action. His Xilinx Spartan-6 FPGA based SoC features a moxie core handling VGA video, keyboard and FAT-on-flash filesystem duties using custom firmware written in C. This is all in support of a second z80-based core on the same FPGA used to emulate an ’80s era computer called the MicroBee. Those files in the listing above are actually audio cassette contents used to load the MicroBee software. The moxie core is essentially a peripheral emulator for his final product.

Keep up the great work, Brad!

The most recent compiler patch was the addition of -mno-crt0, which tells the compiler not to include the default C runtime startup object at link time. This is common practice for many embedded projects, where some system-specific housekeeping is often required before C programs can start running. For instance, you may need to copy the program’s .data section from ROM into RAM before jumping to main().

I’m going back to my pipelined moxie implementation. Last I looked I had to move memory reads further up the pipeline…

Tags: fpga, gcc, MoxieLite, vga, Xilinx
Posted in moxie

It’s Alive!

September 14th, 2012

There’s a working hardware implementation of moxie in the wild!

Intrepid hacker Brad Robinson created this moxie-compatible core as a peripheral controller for his SoC. He had been using a simple 8-bit core, but needed to address more memory than was possible with the 8-bit part. Moxie is a nice alternative because it has a compact instruction encoding, a supported GNU toolchain and a full 32-bit address space. FPGA space was a real concern, so he started with a non-pipelined VHDL implementation, and by all accounts it is running code and flashing LEDs on a Nexys3 board!

The one major “ask” was that there be a little-endian moxie architecture and toolchain in addition to the default big-endian design. I had somewhat arbitrarily selected big-endian for moxie, noting that this is the natural byte order for TCP. In Brad’s design, however, the moxie core will be handling FAT filesystem duties, which is largely a little-endian task. At low clock speeds every cycle counts, so I agreed to produce a bi-endian toolchain and, for the most part, it’s all committed in the upstream FSF repositories (with the exception of gdb and the simulator). moxie-elf-gcc is big-endian by default, but compile with -mel and you’ll end up with little-endian binaries.
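A small Python illustration of why byte order matters for FAT: the on-disk fields are little-endian, so a big-endian core has to swap bytes on every multi-byte field it reads. The two bytes below are the usual encoding of a 512-byte sector size as it appears in a FAT boot sector; swap16 is a made-up helper standing in for the extra work a big-endian core would do.

```python
import struct

# Bytes-per-sector field as stored on disk for 512-byte sectors.
raw = bytes([0x00, 0x02])

as_little = struct.unpack("<H", raw)[0]  # what a little-endian load sees
as_big = struct.unpack(">H", raw)[0]     # what a naive big-endian load sees


def swap16(value: int) -> int:
    """Swap the two bytes of a 16-bit value."""
    return ((value & 0xFF) << 8) | ((value >> 8) & 0xFF)


fixed = swap16(as_big)  # the big-endian core needs this extra step
```

Here as_little is 512 straight away, while as_big is 2 until it has been swapped.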

Brad also suggested several other useful tweaks to the architecture, including changing the PC-relative offset encodings for branches. They had originally been encoded relative to the start of the branch instruction. Brad noted, however, that changing them to be relative to the end of the branch instruction saved an adder in his design. I made this change throughout the toolchain and (*cough*) documentation.
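The difference between the two encodings is easy to see in a short Python sketch. It assumes a 2-byte branch instruction carrying a 10-bit signed offset counted in 16-bit units; treat those sizes, and the function names, as assumptions for illustration rather than a statement of the exact encoding.

```python
INSN_BYTES = 2    # assumed size of a branch instruction
OFFSET_BITS = 10  # assumed width of the signed offset field


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def target_from_start(pc: int, field: int) -> int:
    """Old scheme: offset is relative to the start of the branch."""
    return pc + (sign_extend(field, OFFSET_BITS) << 1)


def target_from_end(pc: int, field: int) -> int:
    """New scheme: offset is relative to the end of the branch."""
    return (pc + INSN_BYTES) + (sign_extend(field, OFFSET_BITS) << 1)
```

With the new scheme an offset of zero simply falls through to the next instruction. A plausible reading of the saved adder is that the hardware has already computed the address of the next instruction by the time the branch resolves, so it can reuse that value instead of recovering the branch's own address.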

I’ll write more about this as it develops… Have to run now.

Oh. Here’s the VHDL on github: github.com/toptensoftware/MoxieLite. Go Brad!

AG

Tags: architecture, fpga, gcc, gdb, Nexys3, SoC, VHDL, Xilinx
Posted in moxie

The case against the [L]GPL for Semiconductor Core Licensing

September 9th, 2012

Eli Greenbaum wrote a terrific article for the Harvard Journal of Law & Technology last fall called ‘Open Source Semiconductor Core Licensing’. I’m using the GPL as a placeholder in my Verilog source, but I’ve always felt that the GPL/LGPL were inappropriate licenses for digital logic. Eli’s article makes clear arguments on why this is the case.

Here’s the link to Eli’s article: jolt.law.harvard.edu/articles/pdf/v25/25HarvJLTech131.pdf

Tags: copyright, legal, licensing
Posted in moxie

vfork() for uClinux forces an architecture change

September 3rd, 2012

Moxie uses a simple software interrupt instruction (swi) to implement system calls. The swi instruction creates a call frame on the stack and then jumps to a global exception handler routine. The exception handler for moxie-uClinux switches to the kernel stack before jumping to the relevant kernel routine. Returning from an exception becomes a simple ret instruction because we have a nice call frame on our stack. Very simple.

vfork(), a kludge that was ejected from POSIX but is still required for MMU-less uClinux ports, throws this for a loop. The vfork system call creates a child process that shares memory with the parent, including a shared stack. This means that the vfork system call returns twice on the shared stack: once for the child, and then again for the parent. The problem is that the child, once returned, is going to write over the swi call frame on the shared stack as it continues to do work. This sends the parent off into randomland when it eventually returns using the corrupted call frame.

Actually, it’s not just the swi call frame. There’s also the vfork() stack frame from the C library to worry about.

This problem isn’t unique to moxie. If you examine the x86 uClibc vfork() implementation, you’ll see that it stashes all the info it needs for the return in registers that are preserved over the vfork system call.

For moxie, I’ll likely need to do the same thing in uClibc’s vfork(), but I’m also going to change the semantics of the swi instruction. This means formalizing the notion of user mode and kernel mode. The uClinux port already does this by convention. One of the special registers is used to store the Linux kernel-mode stack pointer. The swi instruction will be changed to immediately switch stacks and push the userland return info onto the non-shared kernel stack, leaving the shared user stack completely untouched. The exception handler will have a bit more housekeeping to do, but vfork() should work.

Tags: architecture, linux
Posted in moxie

Forking bugs

August 15th, 2012

I found some time to look at the Linux kernel port again, and discovered a bug in the forking code (the child process must return 0 after a fork!). What we’re looking at here is the start of userland, post kernel boot, where busybox is trying to run an init script. It’s still not working, but some cool things are, like the stack trace. I think the next troubling bit is where busybox tries to exec itself (/proc/self/exe) and /proc isn’t mounted. Or something like that.

___  ___           _              _____ _ _                  
|  \/  |          (_)            /  __ \ (_)                 
| .  . | ___ __  ___  ___   _   _| /  \/ |_ _ __  _   ___  __
| |\/| |/ _ \\ \/ / |/ _ \ | | | | |   | | | '_ \| | | \ \/ /
| |  | | (_) |>  <| |  __/ | |_| | \__/\ | | | | | |_| |>  < 
\_|  |_/\___//_/\_\_|\___|  \__,_|\____/_|_|_| |_|\__,_/_/\_\
sh: can't execute 'ls': No such file or directory.
/bin/sh: option requires an argument -- c
BusyBox v1.19.0.git (2012-08-14 23:32:21 EDT) multi-call binary.

Usage: sh [-nxl] [-c 'SCRIPT' [ARG0 [ARGS]] / FILE [ARGS]]

Unix shell interpreter

Kernel panic - not syncing: Attempted to kill init!
Rebooting in 120 seconds..
Machine restart...

Stack:
  03819e8c ffffffff 03819fb8 ffffffff ffffffff 03819ec0 0000408a 000fdfd2 
  0022438c 00000063 fa3c0000 00000000 00000000 03819ee4 03819ee4 0001e800 
  03bb8d14 0002990e ffffffff ffffffff 000003e8 000fceac 0001d4bf 03819f34 
Call Trace: 

[<0000408a>] machine_restart+0x14/0x1a
[<000fdfd2>] bust_spinlocks+0x0/0x4a
[<0001e800>] emergency_restart+0xa/0xc
[<0002990e>] up_read+0x8/0xa
[<000fceac>] __muldi3+0x0/0x92
[<0001d4bf>] do_notify_parent+0x193/0x240
[<0004038c>] panic+0x11c/0x162
[<00012ea8>] exit_files+0x1e/0x26
[<000130c6>] do_exit+0x6e/0x706
[<0001377a>] sys_exit+0x0/0x18
[<00013792>] do_group_exit+0x0/0xac
[<00057fe2>] sys_write+0x0/0x96
[<000017fa>] .return_from_exception+0x0/0x18

Tags: linux
Posted in moxie

Multiported Registers, Microcode and Register Forwarding

July 1st, 2012

When I last wrote about tackling the ‘pop’ instruction I noted that I needed the ability to write to multiple registers before retiring that one instruction – something that would require extra instruction cycles or loads more logic. I recently came across some work by Charles Eric LaForest on Efficient Multi-Ported Memories for FPGAs. His Live Value Table (LVT) approach solves my problem quite neatly, and I was able to adapt some of his sample code for a new register file implementation that supports 2 simultaneous writes as well as 4 reads.
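The idea behind the Live Value Table can be modelled in a few lines of Python. Each write port gets its own bank of registers, and a small table records which port most recently wrote each register, so a read just picks the right bank. This is a behavioural sketch only: the class and method names are invented, the register count of 16 is an assumption, and the real design replicates each bank per read port in FPGA block RAM.

```python
class LVTRegisterFile:
    """Behavioural model of a register file with one bank per write port."""

    def __init__(self, num_regs=16, write_ports=2):
        self.banks = [[0] * num_regs for _ in range(write_ports)]
        self.lvt = [0] * num_regs  # which port last wrote each register

    def write(self, writes):
        """Apply one cycle of writes, given as {port: (reg, value)}."""
        for port, (reg, value) in writes.items():
            self.banks[port][reg] = value
            self.lvt[reg] = port

    def read(self, reg):
        """Read from whichever bank holds the live value."""
        return self.banks[self.lvt[reg]][reg]


rf = LVTRegisterFile()
# A pop-like cycle: port 0 writes the loaded value into register 2,
# port 1 writes the updated stack pointer into register 1.
rf.write({0: (2, 0xAA), 1: (1, 0x1004)})
```

After that single cycle both results are visible, which is what the pop instruction needs. Two ports writing the same register in the same cycle is left undefined here, as it would be a design error in hardware.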

One more recent change is the addition of microcoded pipeline control signals. I simply created a text file, managed with Emacs org-mode, that describes the pipeline control signals used for each instruction. A little Lisp script reads this and turns it into a binary table that is read during the instruction decode stage. Passing the signals down the pipeline is much simpler than hand-coding behaviours in a big switch statement.
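The table-to-bits step looks roughly like this. The original script is written in Lisp; the sketch below uses Python instead, and the signal names and values in the sample org-mode table are invented purely to show the shape of the transformation.

```python
# A made-up org-mode table: one row per instruction, one column per signal.
ORG_TABLE = """
| insn  | reg_write | mem_read | mem_write | branch |
|-------+-----------+----------+-----------+--------|
| ldi.l |         1 |        0 |         0 |      0 |
| sta.s |         0 |        0 |         1 |      0 |
| bne   |         0 |        0 |         0 |      1 |
"""


def parse_org_table(text):
    """Return (signal names, {instruction: [bits]}) from an org-mode table."""
    rows = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|") or line.startswith("|-"):
            continue  # skip anything that is not a data row
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    header, body = rows[0], rows[1:]
    return header[1:], {row[0]: [int(b) for b in row[1:]] for row in body}


def pack(bits):
    """Pack a list of 0/1 values into an integer, first signal in the MSB."""
    word = 0
    for bit in bits:
        word = (word << 1) | bit
    return word


signals, table = parse_org_table(ORG_TABLE)
microcode = {insn: pack(bits) for insn, bits in table.items()}
```

Each entry in microcode is one word of the control table. In hardware the decode stage would index the table by opcode and pass the resulting bits down the pipeline alongside the instruction.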

Also, quite some time ago I wrote about handling Read-After-Write pipeline hazards by inserting bubbles into the pipeline. I replaced that with some register forwarding logic, so you can read a register immediately after writing to it without introducing any delays.
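Register forwarding amounts to checking in-flight results before falling back to the register file. A minimal Python model, with an invented function name and a simplified view of the pipeline as a list of pending (destination, value) pairs:

```python
def read_operand(reg, regfile, in_flight):
    """Return the newest value of reg.

    in_flight lists (dest_reg, value) results that have been computed but
    not yet written back, newest first.
    """
    for dest, value in in_flight:
        if dest == reg:
            return value  # forward the pending result, no bubble needed
    return regfile[reg]


regfile = [0] * 16
regfile[1] = 10
# An older instruction has computed 42 for register 1 but not written it yet.
forwarded = read_operand(1, regfile, [(1, 42)])
untouched = read_operand(2, regfile, [(1, 42)])
```

Here forwarded is 42 even though the register file still holds 10, and untouched comes straight from the register file.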

So… progress is being made! I think I’ll be running my first C program soon.

Tags: muskoka, verilog
Posted in moxie
